Online social media has made information ever easier to access, yet the truthfulness of that information is often not guaranteed. Incorrect information, commonly called misinformation, can take several modalities and can spread across multiple social media platforms and languages, with destructive consequences for society. However, neither academia nor industry has automated ways to assess the impact of misinformation on social media, preventing the adoption of effective strategies to curb its prevalence. In this dissertation, I present my research on building computational pipelines that help measure and detect misinformation on social media. My work is divided into three parts.
The first part focuses on processing misinformation in text form. I first show how to group political news articles from both trustworthy and untrustworthy news outlets into stories. I then present a measurement analysis of how these stories spread, characterizing how mainstream and fringe Web communities influence each other.
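As a concrete illustration of the story-grouping step, the minimal sketch below clusters articles by textual similarity. The TF-IDF features, the DBSCAN algorithm, and the distance threshold are illustrative assumptions, not the exact pipeline used in the dissertation.

```python
# Illustrative sketch: group news articles into "stories" by textual
# similarity. TF-IDF + DBSCAN are assumptions for demonstration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

articles = [
    "Senate passes the new budget bill after a late-night vote.",
    "Late-night vote sees budget bill clear the Senate.",
    "Wildfires force evacuations across the northern counties.",
]

# Represent each article as a TF-IDF vector.
vectors = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Cluster with cosine distance; eps is a similarity threshold to tune.
labels = DBSCAN(eps=0.7, min_samples=1, metric="cosine").fit_predict(vectors)

for label, article in zip(labels, articles):
    print(f"story {label}: {article}")
```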
The second part analyzes image-based misinformation and comprises two studies: fauxtography and generic image misinformation. Fauxtography is a type of image misinformation in which images are manipulated or used out of context. In this research, I show how to identify fauxtography on social media using a fact-checking website (Snopes.com), and I develop a computational pipeline to facilitate the measurement of these images at scale. I then focus on generic misinformation images related to COVID-19. While textual misinformation during the pandemic has been studied extensively, very little research has covered image misinformation. Here, I develop a technique that clusters visually similar images together, facilitating manual annotation and making subsequent analysis possible.
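To make the idea of clustering visually similar images concrete, the sketch below groups images by perceptual hash. The pHash function, the greedy grouping, the Hamming-distance threshold, and the file names are illustrative assumptions rather than the dissertation's exact technique.

```python
# Illustrative sketch: group visually similar images via perceptual hashing.
from PIL import Image
import imagehash

paths = ["img_a.jpg", "img_b.jpg", "img_c.jpg"]  # hypothetical files
hashes = [imagehash.phash(Image.open(p)) for p in paths]

# Greedy clustering: an image joins a cluster if its hash differs from the
# cluster's representative by at most `threshold` bits (Hamming distance).
threshold = 8
clusters = []
for path, h in zip(paths, hashes):
    for cluster in clusters:
        if h - cluster["hash"] <= threshold:
            cluster["members"].append(path)
            break
    else:
        clusters.append({"hash": h, "members": [path]})

for i, cluster in enumerate(clusters):
    print(f"cluster {i}: {cluster['members']}")
```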
The last part concerns the detection of misinformation in text form from a multilingual perspective. This research aims to detect textual COVID-19-related misinformation, and the stances Twitter users take toward it, in both English and Chinese. To this end, I experiment with several natural language processing (NLP) models to investigate their performance on misinformation detection and stance detection in both monolingual and multilingual settings. The results show that two models, COVID-Twitter-BERT v2 and BERTweet, are generally effective at both tasks in both settings. These models hold promise for misinformation moderation on social media platforms, which depends heavily on identifying misinformation and the author's stance toward it.
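As a rough illustration of this kind of classifier, the sketch below runs a tweet through the public BERTweet checkpoint with a sequence-classification head via Hugging Face Transformers. The three-way label set is an assumption, and a real pipeline would fine-tune the model on annotated tweets before inference.

```python
# Illustrative sketch: tweet classification with BERTweet. The label set
# and the untuned classification head are assumptions for demonstration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=3  # e.g., support / deny / neutral
)

tweet = "Drinking hot water cures COVID-19."
inputs = tokenizer(tweet, return_tensors="pt", truncation=True)

# The head is randomly initialized here; after fine-tuning, the argmax of
# the logits would give the predicted misinformation / stance label.
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label id:", logits.argmax(dim=-1).item())
```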
Overall, the results of this dissertation shed light on online misinformation, and my proposed computational tools are applicable to social media moderation, potentially contributing to a healthier online ecosystem.
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/46251
Date | 24 May 2023
Creators | Wang, Yuping
Contributors | Stringhini, Gianluca
Source Sets | Boston University
Language | en_US
Detected Language | English
Type | Thesis/Dissertation
Rights | Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/