  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines : An experiment comparing the accuracy of classifications using test sets from 2005 and 2017

Rydberg, Filip; Tornfors, Jonas. January 2017
Text categorisation is an important feature for organising text data and making it easier to find information on the World Wide Web. The categorisation of text data can be done through the use of machine learning classifiers. These classifiers need to be trained with data in order to predict a result for future input. The authors chose to investigate how accurate two classifiers are when classifying recent news articles with a model trained on older news articles. To reach a result, the authors chose the Naive Bayes and Support Vector Machine classifiers and conducted an experiment. The experiment involved training models of both classifiers with news articles from 2005 and testing the models with news articles from 2005 and 2017 to compare the results. The results showed that both classifiers did considerably worse when classifying the news articles from 2017 than when classifying news articles from the same year as the training data.
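
The experimental design described in this abstract (train on older articles, then compare accuracy on a same-period test set and on a recent one) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small multinomial Naive Bayes written from scratch with add-one smoothing, the Support Vector Machine half of the experiment is omitted, and every headline and label below is invented toy data standing in for the 2005 and 2017 articles.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (multinomial model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties break deterministically
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training add no evidence
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical toy data: "old" training articles and two test sets.
train_docs = [
    "nokia launches new mobile phone",
    "microsoft releases windows update",
    "election vote parliament minister",
    "minister announces budget vote",
]
train_labels = ["tech", "tech", "politics", "politics"]

same_period_docs = ["nokia mobile update", "parliament budget election"]
recent_docs = ["streaming app smartphone", "referendum tweet populism"]
test_labels = ["tech", "politics"]

model = train_naive_bayes(train_docs, train_labels)
same_period_accuracy = accuracy(model, same_period_docs, test_labels)
recent_accuracy = accuracy(model, recent_docs, test_labels)
print("same-period accuracy:", same_period_accuracy)
print("recent accuracy:", recent_accuracy)
```

In this toy setup the recent test set shares no vocabulary with the training data, so the model falls back on class priors alone. That is an exaggerated version of the vocabulary drift the abstract points to, not a reproduction of the reported results.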
2

Analysis of online news media through visualisation and text clustering

Pasi, Niharika. January 2018
Online news has grown in frequency and popularity as a convenient source of information for several years. A result of this drastic surge is increased competition for viewership and for the prolonged relevance of online news websites. Higher demands by internet audiences have led to the use of sensationalism such as ‘clickbait’ articles or ‘fake news’ to attract more viewers. The subsequent shift in the journalistic approach in new media opened new opportunities to study the behaviour and intent behind the news content. As news publications cater their news to a specific target audience, conclusions about said news outlets and their readers can be deduced from the content they wish to broadcast. In order to understand the nature behind the publication’s choice of producing content, this thesis uses automated text categorisation as a means to analyse the words and phrases used by most news outlets. The thesis acts as a case study for approximately 143,000 online news articles from 15 different publications focused on the United States between the years 2016 and 2017. The focus of this thesis is to create a framework that observes how news articles group themselves based on the most relevant terms in their corpora. Other forms of analysis were also performed to find insights into the news structure over a certain period of time. For this thesis, a preliminary quantitative analysis was conducted before data processing, followed by applying K-means clustering to these articles post-cleansing. The overall categorisation approach and visual analysis provided sufficient data to re-use this framework with further adjustments. The cluster groups indicated that the most common news categories or genres for the selected publications were either politics, with special focus on the U.S. presidential elections, or crime-related news within the U.S. and around the world.
The visual formations of these clusters heavily implied that the above two categories were distributed even within groups containing other genres like finance or infotainment. Moreover, the added factor of churning out multiple articles and stories per day suggests that mainstream online news websites continue to use broadcast journalism as their main form of communication with their audiences.
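
The clustering step mentioned in this abstract (K-means over articles, then reading off each cluster's most relevant terms) can be sketched as follows. The abstract does not say how articles were vectorised or how the clusters were initialised, so the raw term counts, the fixed initial centroids, and the four invented headlines below are all assumptions made for illustration. This is not the thesis's framework.

```python
from collections import Counter


def vectorise(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def top_terms(vocab, centroid, n=3):
    """Highest-weighted terms in a centroid, ties broken alphabetically."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical headlines standing in for cleaned news articles.
headlines = [
    "election campaign vote president",
    "president election debate vote",
    "police arrest suspect crime",
    "crime suspect police court",
]

vocab, vectors = vectorise(headlines)
# Assumption: seed the two clusters with the first and third headline.
assignment, centroids = kmeans(vectors, [vectors[0], vectors[2]])
for j, centroid in enumerate(centroids):
    print("cluster", j, top_terms(vocab, centroid))
```

Fixed seeds keep the sketch deterministic. On a real corpus, a weighting scheme such as TF-IDF and several random restarts would usually be preferred, since K-means is sensitive to its starting centroids.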
