Many news outlets allow users to comment on articles about daily world events. News articles are the seeds that spark users' interest in contributing content, i.e., comments. An article may attract apathetic user engagement (several tens of comments) or spontaneous, fervent user engagement (thousands of comments). This environment creates a social dynamic that is little studied. The social dynamics around articles have the potential to reveal interesting facets of the user population at a news outlet. We report some salient findings about these social media based on data collected from 17 news outlets. Analysis of the data reveals interesting insights, for example that the relationship between news outlets and their user populations is uneven across outlets. To our knowledge, these and other observations have not been reported before. We also study the problem of predicting the total number of user comments a news article will receive. Our main insight is that the early dynamics of user comments contribute the most to an accurate prediction, while article-specific factors have surprisingly little influence. We show that the early arrival rate of comments is the best indicator of the eventual number of comments. We conduct an in-depth analysis of this feature across several dimensions, such as news outlets and news article categories.
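To make the notion of an early arrival rate concrete, the following is a minimal sketch, not the thesis's actual feature definition. It assumes the rate is simply the number of comments posted within a fixed window after publication, divided by the window length; the function name `early_arrival_rate` and the one-hour default window are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta


def early_arrival_rate(published, comment_times, window_hours=1.0):
    """Return comments per hour posted within `window_hours` of publication.

    Assumption: the observation window starts at the publication time and
    is half-open, i.e. [published, published + window).
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    cutoff = published + timedelta(hours=window_hours)
    # Count only comments that fall inside the early observation window.
    early = sum(1 for t in comment_times if published <= t < cutoff)
    return early / window_hours


if __name__ == "__main__":
    published = datetime(2020, 1, 1, 12, 0)
    comments = [
        datetime(2020, 1, 1, 12, 5),
        datetime(2020, 1, 1, 12, 30),
        datetime(2020, 1, 1, 12, 59),
        datetime(2020, 1, 1, 13, 30),
    ]
    print(early_arrival_rate(published, comments))
```

A feature of this kind could then be fed to any regression model that predicts the eventual comment count; the choice of window length is a free parameter here.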
Online comments submitted by readers of news articles can provide valuable feedback and critique, personal views and perspectives, and opportunities for discussion. Previously, we manually collected user comments from 17 news outlets for data analysis. However, this kind of manual solution is very limited and does not generalize to a variety of websites. Therefore, we need an automatic solution that works for thousands of news outlets. We find that most news websites employ third-party commenting systems to create and manage their comment sections. One can crawl the comments by sending URL requests to the servers of those commenting systems. We propose an approach that crawls user comments by instantiating URL templates supported by these web servers. We propose a hybrid framework that combines the advantages of labeling functions, in the form of regular expressions, and deep learning. We conduct extensive experiments with thousands of web pages to show that our approach can crawl comments from many websites. / Computer and Information Science
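The two ideas named above, instantiating a URL template and using a regular expression as a labeling function, can be sketched roughly as follows. Everything specific here is an assumption for illustration: the template string, the `comments.example.com` host, the parameter names, and the regular expression are hypothetical, and the abstract does not say what the thesis's labeling functions actually target.

```python
import re
from urllib.parse import quote

# Hypothetical URL template; real commenting systems define their own
# endpoints and parameter names.
TEMPLATE = "https://comments.example.com/api/threads?url={article_url}&limit={limit}"


def instantiate(template, article_url, limit=50):
    """Fill the template's slots, percent-encoding the article URL."""
    return template.format(article_url=quote(article_url, safe=""), limit=limit)


# Illustrative labeling function: a regular expression that flags URLs
# whose path or host suggests a comment-serving endpoint.
COMMENT_URL_PATTERN = re.compile(r"/(comments?|threads?|discussions?)\b", re.IGNORECASE)


def looks_like_comment_endpoint(url):
    """Return True if the URL matches the heuristic comment pattern."""
    return bool(COMMENT_URL_PATTERN.search(url))


if __name__ == "__main__":
    request_url = instantiate(TEMPLATE, "https://news.example.org/a/1?x=2", limit=10)
    print(request_url)
    print(looks_like_comment_endpoint(request_url))
```

In a hybrid setup of the kind the abstract mentions, noisy labels from such heuristics would typically serve as training signal for a learned model rather than as the final decision.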
Identifier | oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/6542 |
Date | January 2021 |
Creators | He, Lihong |
Contributors | Dragut, Eduard Constantin; Obradovic, Zoran; Vucetic, Slobodan; Meng, Weiyi |
Publisher | Temple University. Libraries |
Source Sets | Temple University |
Language | English |
Detected Language | English |
Type | Thesis/Dissertation, Text |
Format | 126 pages |
Rights | IN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/ |
Relation | http://dx.doi.org/10.34944/dspace/6524, Theses and Dissertations |