Global ETD Search

Automatic identification of representative content on Twitter

Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 97-103).

Microblogging services, most notably Twitter, have become popular avenues to voice opinions and be active participants of discourse on a wide range of topics. As a consequence, Twitter has become an important part of the political battleground that journalists and political analysts can harness to analyze and understand the narratives that organically form, spread and decline among the public in a political campaign. A challenge with social media is that important discussions around certain issues can be overpowered by majoritarian or controversial topics that provoke strong reactions and attract large audiences. In this thesis we develop a method to identify the specific ideas and sentiments that represent the overall conversation surrounding a topic or event as reflected in collections of tweets. We have developed this method in the context of the 2016 US presidential elections.

We present and evaluate a large-scale data analytics framework, based on recent advances in deep neural networks, for identifying and analyzing election-related conversation on Twitter on a continuous, longitudinal basis in order to identify representative tweets across prominent election issues. The framework consists of two main components: (1) a dynamic topic model that identifies all tweets related to election issues using knowledge from news stories and continuous learning of Twitter's evolving vocabulary, and (2) a semantic model of tweets called Tweet2Vec that generates general-purpose tweet embeddings used for identifying representative tweets by robust semantic clustering. The topic model performed with an average F-1 score of 0.90 across 22 different election topics on a manually annotated dataset. Tweet2Vec outperformed state-of-the-art algorithms on widely used semantic relatedness and sentiment classification evaluation tasks.

To demonstrate the value of the framework, we analyzed tweets leading up to a primary debate and contrasted the automatically identified representative tweets with those that were actually used in the debate. The system was able to identify tweets that represented more semantically diverse conversations around each of the major election issues, in comparison to those that were presented during the debate. This framework may have a broad range of applications, from enabling exemplar-based methods for understanding the gist of large collections of tweets, extensible perhaps to other forms of short text documents, to providing an input for new forms of data-grounded journalism and debate.

by Prashanth Vijayaraghavan.
S.M.

Program in Media Arts and Sciences

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/106045
Date	January 2016
Creators	Vijayaraghavan, Prashanth
Contributors	Deb Roy, Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher	Massachusetts Institute of Technology
Source Sets	M.I.T. Theses and Dissertation
Language	English
Detected Language	English
Type	Thesis
Format	103 pages, application/pdf
Rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582
