1 |
Exploring NMF and LDA Topic Models of Swedish News Articles
Svensson, Karin, Blad, Johan January 2020
Automatically analyzing and segmenting news articles by their content is a growing research field. This thesis explores topic modeling, an unsupervised machine learning method, applied to Swedish news articles to generate topics that describe and segment the articles. Specifically, the algorithms non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA) are implemented and evaluated. Their usefulness in the news media industry is assessed by their ability to serve as a uniform categorization framework for news articles. The thesis fills a research gap by studying the application of topic modeling to Swedish news articles and contributes by showing that this can yield meaningful results. It is shown that Swedish text data requires extensive data preparation for successful topic models and that restricting the vocabulary to nouns, especially common nouns, gives the most suitable words to use. Furthermore, the results show that both NMF and LDA are valuable as content analysis tools and categorization frameworks, but that they have different characteristics and are hence optimal for different use cases. Lastly, the conclusion is that topic models have issues, since they can generate unreliable topics that could be misleading for news consumers, but that they can nonetheless be powerful methods for organizations to analyze and segment articles efficiently at scale internally. The thesis project is a collaboration with one of Sweden’s largest media groups, and its results have led to a topic modeling implementation for large-scale content analysis to gain insight into readers’ interests.
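As a rough illustration of the NMF approach named in the abstract, the sketch below factorizes a tiny document-term count matrix with the standard multiplicative update rules. It is not the thesis implementation: the four noun-only toy documents, the choice of two topics, and the helper names (matmul, transpose, nmf, top_words) are all assumptions made for this example.

```python
import random

# Hypothetical toy corpus: each "article" is already reduced to nouns,
# mirroring the abstract's finding that nouns work best.
docs = [
    ["match", "mål", "lag"],
    ["lag", "match", "tränare"],
    ["regering", "val", "parti"],
    ["parti", "val", "riksdag"],
]

vocab = sorted({word for doc in docs for word in doc})
# Document-term count matrix V (documents x vocabulary).
v = [[doc.count(word) for word in vocab] for doc in docs]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate V by W H with non-negative W (docs x k) and H (k x terms),
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def top_words(h, vocab, n=3):
    """Return the n highest-weighted vocabulary words for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


w, h = nmf(v, k=2)
for topic_id, words in enumerate(top_words(h, vocab)):
    print(f"topic {topic_id}: {', '.join(words)}")
```

Each row of H weights the vocabulary for one topic, and each row of W says how strongly a document loads on each topic, which is what makes the factorization usable as a categorization framework. A real pipeline would more likely use a library implementation and TF-IDF weighting; those choices are outside what the abstract states.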
|
2 |
Large-scale Exploratory Text Visualisation
Axelsson, Wilma, Engström, Nellie January 2023
The amount of available text data has increased rapidly in recent years, making it difficult for an everyday user to find relevant information. To address this, NLP and visualisation methods have been developed for extracting valuable information from text and presenting it to the user. The aim of this project is to implement a proof-of-concept visualisation prototype for exploring a large number of Swedish news articles with related metadata and to investigate the temporal and relational aspects of the data. The project was divided into three major parts. In the first part, sketches of the visualisation were designed and evaluated through user tests. The second part consisted of designing and implementing an NLP pipeline using BERTopic, in which both Dynamic Topic Modeling (DTM) and Hierarchical Topic Modeling (HTM) were used. Some components of the pipeline, for instance a Swedish sentence transformer, were evaluated using evaluation metrics and through visual inspection. The final part consisted of implementing and evaluating the visualisation prototype. The project resulted in a web-based visualisation presenting the NLP results in two different views: a top 10 topics view and a hierarchical view containing all topics. The prototype has various features, e.g., clicking and hovering for details-on-demand and options for changing and altering the view. The prototype was then evaluated through an internal case study and user tests. For the user tests, there were two groups of participants: people working in the journalism field and people working close to the NLP field. Both groups thought there was more value in the top 10 topics view than in the hierarchical view. Furthermore, the quality of the top 10 topics view was considered higher overall compared to the hierarchical view. In the end, the result of this project is a proof-of-concept visualisation prototype presenting topics of Swedish news articles, over time and in relation to each other.
Possible improvements include better hierarchical relations between the topics and shorter run times for the topic model and prototype. The prototype may also be further improved with additional features, e.g., real-time data, a map, the full text of the articles and a search function. / The thesis work was carried out at the Department of Science and Technology (ITN) at the Faculty of Science and Engineering, Linköping University.
|