Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2017. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (pages 175-185). / This thesis presents and evaluates methods for searching and analyzing social media data in order to improve situational awareness. We begin by proposing a method for network vertex search that looks for the target vertex by sequentially examining the neighbors of a set of "known" vertices. Using a dynamic programming approach, we show that there is always an optimal "block" search policy, in which all of the neighbors of a known vertex are examined before moving on to another vertex. We provide a precise characterization of the optimal policy in two specific cases: (1) when the connections between the known vertices and the target vertex are independent, and (2) when the target vertex is connected to at most one known vertex. We then apply this result to the problem of finding new accounts belonging to Twitter users whose previous accounts had been suspended for extremist activity, quantifying the performance of our optimal search policy in this application against other policies. In this application we use thousands of Twitter accounts related to the Islamic State in Iraq and Syria (ISIS) to develop behavioral models for these extremist users. These models are used to identify new extremist accounts, identify pairs of accounts belonging to the same user, and predict to whom a user will connect when opening an account. We use this final model to inform our network search application. Finally, we develop a more general application of network search and classification that obtains a set of social media users from a specified location or group.
We propose an expand-classify methodology that recursively collects users who have social network connections to users inside the target location, and then classifies all of the users by maximizing the probability over a factor graph model. This factor graph model accounts for the implications of both observed user profile features and social network connections in inferring location. Using geo-located data to evaluate our method, we find that our classification method typically outperforms Twitter's native search methods in building a dataset of Twitter users in a specific location. / by Christopher E. Marks. / Ph. D.
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/112012 |
Date | January 2017 |
Creators | Marks, Christopher E. (Christopher Edward) |
Contributors | John Irvine and Tauhid Zaman., Massachusetts Institute of Technology. Operations Research Center. |
Publisher | Massachusetts Institute of Technology |
Source Sets | M.I.T. Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | 185 pages, application/pdf |
Rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission., http://dspace.mit.edu/handle/1721.1/7582 |