
Jointly Mining News and User-Generated Content: Machine Learning, Information and Social Network Perspective

Alshehri, Jumanah, 0000-0002-0077-7173 January 2023
The number of published news articles is steadily increasing, and readers are shifting toward online platforms because of their convenience and the low cost of technology (Shearer, 2021). Users have become more engaged with online news articles. This engagement creates a rich corpus, which is a powerful means of understanding public opinion, emerging events, and how they evolve. Therefore, many organizations invest in mining this large-scale user-generated content to improve their products, services, and, more importantly, their decision-making process. Studying users’ reactions to online news is essential for social scientists, policymakers, and journalists. This type of engagement has been studied before. In the statistical and machine learning community, many survey-based studies have tried to understand user behavior by characterizing and categorizing comments on online news. Some studies focus on mining user opinions from social media and online news comments. Other works look into bias in the news and its influence on user-generated content. At the same time, the social network community addresses the problem of mining large-scale online news from different angles. Some work focuses on constructing knowledge graphs from the text. Other work focuses on building high-level graphs, where nodes are users and posts or documents, and links represent the relationships between nodes. Another line of work looks at the word level of the text, extracting entities and topics by combining Natural Language Processing and graph techniques. From a Machine Learning perspective, all these studies face three main challenges: 1) jointly mining massive user-generated data, 2) drawing that data from multiple sources and platforms, and 3) coping with the unpredictable quality of user-generated content. To address these issues, we tackle the problem of jointly learning and mining valuable information from online news articles and user-generated content.
We start by studying and understanding the relationship between users’ comments and articles in online news. The focus is on the level of relevance between articles and their comments, and for this work we labeled a small set of article-comment pairs. We proposed BERTAC (Alshehri et al., 2021), a BERT-based model that jointly learns article-comment embeddings and infers the relevance class of a comment. However, we found that the disagreement among annotators that arises in a human (expert) labeling process produces noisy labels, which degrade the performance of supervised learning algorithms. On the other hand, working only with high-agreement annotations introduces another challenge: the data imbalance problem (Alshehri et al., 2022). As in many machine learning problems, labeling a sufficient number of examples is costly and time-consuming. Therefore, we propose a framework for aligning comments and news articles under a constrained budget (Alshehri et al., 2023a). The proposed model accounts for data imbalance, where only a few examples are available from one class, as well as for the degree of annotator disagreement. Within the framework, we consider two solutions: 1) semi-automatic labeling based on human-AI collaboration and 2) synthetic data augmentation. Another critical aspect of mining news articles and user-generated content is understanding emerging events and their associated entities. However, this is challenging, especially with the massive growth of online articles and user-generated content across different platforms. Therefore, we proposed MultiLayerET (Alshehri et al., 2023b), a unified representation of online news articles and comments. This work highlights the relationship between entities and topics in news articles and user-generated content. It projects entities and topics as a multi-layer graph, which gives a high-level understanding of the story behind a large corpus.
We showed that such graphs enrich the textual representation and improve model performance in many downstream applications, such as media bias classification and fake news detection. / Computer and Information Science
