
Analysis of online news media through visualisation and text clustering

Online news has grown in frequency and popularity as a convenient source of information over the past several years. One result of this surge is increased competition among online news websites for viewership and prolonged relevance. Higher demands by internet audiences have led to the use of sensationalism, such as ‘clickbait’ articles or ‘fake news’, to attract more viewers. The subsequent shift in the journalistic approach in new media has opened new opportunities to study the behaviour and intent behind news content. As news publications tailor their news to a specific target audience, conclusions about these outlets and their readers can be drawn from the content they choose to broadcast. To understand the reasoning behind a publication’s choice of content, this thesis uses automated text categorisation to analyse the words and phrases used by most news outlets. The thesis is a case study of approximately 143,000 online news articles from 15 different publications focused on the United States, published between 2016 and 2017. Its focus is to create a framework that observes how news articles group themselves based on the most relevant terms in their corpora. Other forms of analysis were also performed to find comparable insights into the structure of the news over a certain period of time. A preliminary quantitative analysis was conducted before data processing, and K-means clustering was then applied to the articles after cleansing. The overall categorisation approach and visual analysis provided sufficient data to re-use this framework with further adjustments. The clusters indicated that the most common news categories or genres for the selected publications were either politics, with a special focus on the U.S. presidential elections, or crime-related news within the U.S. and around the world.
The visual formations of these clusters strongly implied that these two categories were distributed even within groups containing other genres, such as finance or infotainment. Moreover, the practice of churning out multiple articles and stories per day suggests that mainstream online news websites continue to use broadcast journalism as their main form of communication with their audiences.
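
As a rough illustration of the kind of grouping described in the abstract, the sketch below clusters a handful of short texts with K-means on term-weight vectors. It is not the thesis's pipeline: the abstract does not say how terms were weighted, so TF-IDF is an assumption here, and the four sample documents, the function names, and the choice of two clusters with fixed starting points are all invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # One common IDF variant (assumed here): log(N / df) + 1
    idf = {tok: math.log(n_docs / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seed_indices, max_iter=20):
    """Lloyd's algorithm, with centroids seeded from the given documents."""
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
            for vec in vectors
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, k):
    """Return the k vocabulary terms with the largest centroid weights."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for _, term in ranked[:k]]


# Invented toy headlines, already cleaned and lower-cased.
raw_docs = [
    "election vote president campaign",
    "president election debate vote",
    "police arrest crime suspect",
    "crime suspect police court",
]
docs = [text.split() for text in raw_docs]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, seed_indices=[0, 2])

for c, centroid in enumerate(centroids):
    print(c, top_terms(centroid, vocab, 3))
```

Inspecting the highest-weighted terms of each centroid is one simple way to give a cluster a provisional label such as politics or crime. On a corpus of the size the abstract mentions, a library implementation with randomised initialisation would be the more practical choice.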

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-361562
Date January 2018
Creators Pasi, Niharika
Publisher Uppsala universitet, Informationssystem
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
