1

The Insignificance of Feature Frequency in Classifying Gender of Twitter Tweets

Kroft, Amanda Marie 11 April 2013
In 2011, Internet users spent almost 23% of their time on social media sites such as Twitter and Facebook. Twitter alone was estimated to have over 200 million active users. With social media being such a popular online pastime, a tremendous amount of information becomes available from the posts that users put on social media sites. This information has the potential to reveal details about the social media users, such as the relationship between characteristics of the users and what they post. This relationship is a hot research topic, and one of the most frequently studied characteristics is the gender of a user. Feature frequency is often included in such a task, but this thesis shows that for Twitter tweets it either does not contribute significantly to gender classification or hinders classification. / McAnulty College and Graduate School of Liberal Arts / Computational Mathematics / MS / Thesis
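The comparison at the heart of this thesis, raw term counts versus simple presence/absence features, can be prototyped with standard tooling. Below is a minimal, hypothetical sketch in scikit-learn; the toy tweets, labels, and classifier choice are illustrative assumptions, not the thesis's actual data or pipeline.

```python
# Hypothetical sketch: frequency (count) features vs. binary
# presence/absence features for tweet classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

tweets = ["love this song so much", "great game last night",
          "shopping with my bestie", "fixing the car engine today"]
labels = [1, 0, 1, 0]  # placeholder gender labels, purely illustrative

for use_counts in (True, False):
    vec = CountVectorizer(binary=not use_counts)  # counts vs. presence only
    clf = make_pipeline(vec, LogisticRegression())
    scores = cross_val_score(clf, tweets, labels, cv=2)
    kind = "frequency" if use_counts else "binary"
    print(f"{kind} features: mean accuracy {scores.mean():.2f}")
```

If frequency carries no extra signal, the two runs score about the same, which is the kind of outcome the abstract reports.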
2

Database for Storing and Analyzing Tweets Posted During Disasters

Saha, Debarshi January 1900
Master of Science / Department of Computer Science / Doina Caragea / In the last few decades, we have witnessed many natural disasters that have shaken nations across the world. Millions of people have lost their lives, cities have been destroyed, and people have been left homeless, injured, and with their lives upended. Sometimes hours or even days after a disaster, people are still stuck in the disaster sites, powerless, homeless, and without food, as the rescue teams do not always get information about people in need in a timely manner. Whenever there is a natural disaster like a hurricane or an earthquake, people start tweeting about it. Most of the tweets are posted by users who are in the disaster sites, and may contain information about victims of the disaster: where they are and what the problem is, in what areas the rescue teams should work or focus on, or if someone needs special help. Such information can be very useful for the response teams, which can leverage it in the recovery or rescue process. However, rescue teams are faced with an information overload problem, due to the large number of tweets they need to sift through. To help with this issue, computational approaches can be used to analyze and prioritize information that may be useful to the rescue teams. In this project, we have crawled tweets related to natural disasters, and extracted useful information into CSV files. Then, we have designed and developed a database to store the tweets. The design of the database is such that it helps us query and gain information about a natural disaster. We have also performed some statistical analysis, such as deriving word clouds of the tweets posted during natural disasters. The analysis shows the areas that most concern users who tweet about a disaster. The word cloud analysis can help in comparing multiple natural disasters, to understand patterns that are common or specific to disasters in terms of how Twitter users talk about them.
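One way such a tweet database and word-frequency analysis could be set up is sketched below with Python's standard library. The table layout and CSV column names are assumptions for illustration; the project's actual schema is not reproduced here.

```python
# Minimal sketch: store crawled tweets in SQLite, then compute word
# frequencies per disaster (the raw input for a word cloud).
import csv
import sqlite3
from collections import Counter

conn = sqlite3.connect("disaster_tweets.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   TEXT PRIMARY KEY,
    disaster   TEXT,            -- e.g., 'hurricane_harvey' (assumed tag)
    created_at TEXT,
    user_loc   TEXT,
    text       TEXT)""")

# 'harvey.csv' and its column names are hypothetical placeholders.
with open("harvey.csv", newline="", encoding="utf-8") as f:
    rows = [(r["id"], "hurricane_harvey", r["created_at"],
             r["user_location"], r["text"])
            for r in csv.DictReader(f)]
conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?,?,?,?,?)", rows)
conn.commit()

words = Counter()
for (text,) in conn.execute("SELECT text FROM tweets WHERE disaster=?",
                            ("hurricane_harvey",)):
    # len > 3 is a crude stand-in for proper stop-word removal
    words.update(w for w in text.lower().split() if len(w) > 3)
print(words.most_common(10))
```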
3

A Large Collection Learning Optimizer Framework

Chakravarty, Saurabh 30 June 2017
Content is generated on the web at an increasing rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social media is the microblogging site Twitter. Twitter is known for its high level of activity during live events, natural disasters, and events of global importance. Challenges with the data in the Twitter universe include the limit of 140 characters on the text length. Because of this limitation, the vocabulary in the Twitter universe includes short abbreviations of sentences, emojis, hashtags, and other non-standard usage. Consequently, traditional text classification techniques are not very effective on tweets. Fortunately, sophisticated text processing techniques like cleaning, lemmatizing, and removal of stop words and special characters give us clean text, which can be further processed to derive richer word semantic and syntactic relationships using state-of-the-art feature representation techniques like Word2Vec. Machine learning techniques that use word features capturing semantic and context relationships can benefit classification accuracy. Improving text classification results on Twitter data would pave the way to categorizing tweets relative to human-defined real-world events. This would allow diverse stakeholder communities to interactively collect, organize, browse, visualize, analyze, summarize, and explore content and sources related to crises, disasters, human rights, inequality, population growth, resiliency, shootings, sustainability, violence, etc. Having the events classified into different categories would help us study causality and correlations among real-world events. To check the efficacy of our classifier, we compare our experimental results with an Association Rules (AR) classifier. This classifier composes its rules around the most discriminating words in the training data. The hierarchy of rules, along with an ability to tune to a support threshold, makes it an effective classifier for scenarios where short text is involved. Traditionally, developing classification systems for these purposes requires a great degree of human intervention. Constantly monitoring new events, and curating training and validation sets, is tedious and time intensive. Significant human capital is required for such annotation endeavors, and involved efforts are required to tune the classifier for best performance. Developing and tuning classifiers manually would not be a viable option if we are to monitor events and trends in real time. We want to build a framework that requires very little human intervention to build and choose the best-performing among the classification techniques available in our system. Another challenge with classification systems is their performance on unseen data. For the classification of tweets, we are continually faced with a situation where a given event is associated with certain closely related keywords. If a classifier built for a particular event overfits what is a biased sample with limited generality, its accuracy may be reduced when it is faced with new tweets containing different keywords. We propose building a system that uses very little training data in the initial iteration and is augmented with automatically labelled training data from a collection that stores all the incoming tweets.
A system that is trained on incoming tweets that are labelled using sophisticated techniques based on rich word vector representation would perform better than a system that is trained on only the initial set of tweets. We also propose to use sophisticated deep learning techniques like Convolutional Neural Networks (CNN) that can capture the combination of the words using an n-gram feature representation. Such sophisticated feature representation could account for the instances when the words occur together. We divide our case studies into two phases: preliminary and final case studies. The preliminary case studies focus on selecting the best feature representation and classification methodology out of the AR and the Word2Vec based Logistic Regression classification techniques. The final case studies focus on developing the augmented semi-supervised training methodology and the framework to develop a large collection learning optimizer to generate a highly performant classifier. For our preliminary case studies, we are able to achieve an F1 score of 0.96 that is based on Word2Vec and Logistic Regression. The AR classifier achieved an F1 score of 0.90 on the same data. For our final case studies, we are able to show improvements of F1 score from 0.58 to 0.94 in certain cases based on our augmented training methodology. Overall, we see improvement in using the augmented training methodology on all datasets. / Master of Science / Content is generated on social media at a very fast pace. Social media content in the form of tweets that is generated by the microblog site Twitter is quite popular for understanding the events and trends that are prevalent at a given point of time across various geographies. Categorizing these tweets into their real-world event categories would be useful for researchers, students, academics and the government. Categorizing tweets to their real-world categories is a challenging task. Our framework involves building a classification system that can learn how to categorize tweets for a given category if it is provided with a few samples of the relevant and non-relevant tweets. The system retrieves additional tweets from an auxiliary data source to further learn what is relevant and irrelevant based on how similar a tweet is to a positive example. Categorizing the tweets in an automated way would be useful in analyzing and studying the events and trends for past and future real-world events.
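A compact sketch of the two central ideas follows: averaged Word2Vec features feeding a logistic regression classifier, plus self-training-style augmentation from incoming tweets. The toy tokens, labels, and the 0.9 confidence threshold are assumptions for illustration, not values from the thesis; the gensim 4.x API is assumed.

```python
# Hypothetical sketch: tweet = average of its word vectors, classified
# with logistic regression; confident predictions on incoming tweets
# are fed back into the training pool (self-training-style augmentation).
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

tweets = [["flood", "water", "rising", "help"],
          ["concert", "tonight", "tickets"],
          ["earthquake", "felt", "downtown"],
          ["new", "phone", "unboxing"]]
labels = [1, 0, 1, 0]  # 1 = disaster-related (toy labels)

w2v = Word2Vec(tweets, vector_size=50, min_count=1, seed=0)

def embed(tokens):
    """Average the vectors of in-vocabulary tokens."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.stack([embed(t) for t in tweets])
clf = LogisticRegression().fit(X, labels)

incoming = [["water", "rising", "fast"], ["tickets", "sold", "out"]]
probs = clf.predict_proba(np.stack([embed(t) for t in incoming]))
for toks, p in zip(incoming, probs):
    if p.max() > 0.9:  # confidence threshold is an assumption
        tweets.append(toks)
        labels.append(int(p.argmax()))
```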
4

Analyse wissenschaftlicher Konferenz-Tweets mittels Codebook und der Software Tweet Classifier

Lemke, Steffen, Mazarakis, Athanasios 26 March 2018
With its focused mode of operation, the microblogging service Twitter has, over the past decade, achieved a considerable presence as a communication medium in diverse areas of life. One particular way in which Twitter's increased visibility in everyday communication frequently manifests itself is the deliberate use of hashtags. Companies use hashtags to bundle the discussions about their products taking place on Twitter, while organizers of major events and television shows, by announcing their own official hashtags, encourage viewers to use the service as a discussion platform alongside the actual event. [... from the introduction]
5

Tweet-interaktion med Beliebers : En textanalys om hur Justin Bieber konstruerar gemenskap med en tilltänkt publik genom tweets på Twitter

Söderström, Mimmi January 2013
The purpose of this essay is to examine, through a textual analysis of tweets on the microblog Twitter, how interaction is created and maintained between an idol and fans. The example used is the pop star Justin Bieber and how tweets are constructed on his Twitter page to address questions of pseudo-interaction, community, and presence with his followers, who are often called "Beliebers". I want to find out which communication codes are used and how theories of interaction can be connected to the tweets I examine more closely. The primary method is a qualitative textual analysis, used to see whether clear indications can be found in language use, forms of address, and content that can be linked to theories of how interaction with the audience is presented, and whether the audience is regarded as unknown or observable. The results of the study show that the central communication model used in the star's Twitter feed focuses on community and belonging in the message being conveyed, rather than on the transfer of information itself between sender, Bieber, and receivers: fans, "Beliebers", and followers.
6

@therealDonaldTrump EFFECT: DONALD TRUMP’S SOCIAL INFLUENCE THROUGH THE USE OF TWITTER

Schuhmeier, Phoenisha 01 June 2019
There has been a recent rise in the use of social media as a platform for political communication. President Donald Trump, who is very influential due in part to his celebrity status as well as his presidential position, has had the power to influence his millions of followers on Twitter. For this research, I used a content analysis and comparative analysis approach on eight tweets made by President Donald Trump, which targeted Mexican immigration, Maxine Waters, LeBron James, Don Lemon, the National Football League (NFL) national anthem protesters, and Elizabeth Warren, and on three tweets made by Senator Ted Cruz, which targeted Mexican immigration. I found that for Mexican immigration, Twitter commenters on Trump’s tweets were more prone to agree with him, as opposed to Cruz’s tweets, where his commenters disagreed with him.
7

Monitoring Tweets for Depression to Detect At-Risk Users

Jamil, Zunaira January 2017
According to the World Health Organization, mental health is an integral part of health and well-being. Mental illness can affect anyone, rich or poor, male or female. One such example of mental illness is depression. In Canada, 5.3% of the population had presented a depressive episode in the past 12 months. Depression is difficult to diagnose, resulting in high under-diagnosis. Diagnosing depression is often based on self-reported experiences, behaviors reported by relatives, and a mental status examination. Currently, authorities use surveys and questionnaires to identify individuals who may be at risk of depression. This process is time-consuming and costly. We propose an automated system that can identify at-risk users from their public social media activity. More specifically, we identify at-risk users from Twitter. To achieve this goal we trained a user-level classifier using Support Vector Machine (SVM) that can detect at-risk users with a recall of 0.8750 and a precision of 0.7778. We also trained a tweet-level classifier that predicts if a tweet indicates distress. This task was much more difficult due to the imbalanced data. In the dataset that we labeled, we came across 5% distress tweets and 95% non-distress tweets. To handle this class imbalance, we used undersampling methods. The resulting classifier uses SVM and performs with a recall of 0.8020 and a precision of 0.1237. Our system can be used by authorities to find a focused group of at-risk users. It is not a platform for labeling an individual as a patient with depression, but only a platform for raising an alarm so that the relevant authorities could take necessary interventions to further analyze the predicted user to confirm his/her state of mental health. We respect the ethical boundaries relating to the use of social media data and therefore do not use any user identification information in our research.
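The recipe described above, balancing a roughly 95/5 split by undersampling and then fitting an SVM, looks roughly like the following sketch. Synthetic features stand in for the thesis's real tweet features, and random undersampling is an assumed choice, since the abstract only says "undersampling methods".

```python
# Hypothetical sketch: undersample the majority (non-distress) class,
# train an SVM, and report recall/precision on a held-out test set.
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))              # synthetic tweet features
y = (rng.random(2000) < 0.05).astype(int)    # ~5% "distress" class
X[y == 1] += 0.75                            # weak signal for the toy data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X_tr, y_tr)

clf = SVC().fit(X_bal, y_bal)                # SVM, as in the abstract
pred = clf.predict(X_te)
print("recall:", recall_score(y_te, pred),
      "precision:", precision_score(y_te, pred))
```

Undersampling typically trades precision for recall in exactly the way the reported numbers (recall 0.8020, precision 0.1237) suggest: the balanced classifier misses few distress tweets but flags many false positives.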
8

When Tweets Are Embedded, Who Gains the Upper Hand? : The Discursive Power Struggle on Finnish Digital News Articles before the 2019 Parliamentary Election

Lehtinen, Don January 2021
This Master’s thesis focuses on the discursive power struggle between politicians and journalists in Finnish digital news articles where politicians’ tweets are embedded or quoted, using multimodal discourse analysis. Embedded and quoted tweets are one of the premier links between Twitter and digital news platforms, but have for the most part been left out of the field of discourse analysis. This research tries to fill that gap, focusing on the one-month period before the 2019 parliamentary election in Finland. The research material consists of 18 articles from two of the biggest digital news platforms in Finland, Iltalehti and Ilta-Sanomat. They are analyzed using Machin and Mayr’s seven-part scheme for critical discourse analysis, focusing on the embedded and quoted tweets in relation to the text’s discourse, as well as the intertwined textual and visual sides of the articles. The analysis shows that in most articles, the discourse portrayed in the tweets is not challenged by the journalist, meaning that the politicians most often come out on top in the discursive power struggle. The analysis also shows that there are multiple ways of challenging the discourse, but they are seldom used in the power struggle. In conclusion, as the tweets’ discourses often go unchallenged, both the politicians and Twitter as a platform arguably have disproportionate power to influence the discourse on digital news platforms, as well as the readers of those platforms.
9

Improving Text Classification Using Graph-based Methods

Karajeh, Ola Abdel-Raheem Mohammed 05 June 2024
Text classification is a fundamental natural language processing task. However, in real-world applications, class distributions are usually skewed, e.g., due to inherent class imbalance. In addition, the task difficulty changes based on the underlying language. When rich morphological structure and high ambiguity are exhibited, natural language understanding can become challenging. For example, Arabic, ranked the fifth most widely used language, has a rich morphological structure and high ambiguity that result from Arabic orthography. Thus, Arabic natural language processing is challenging. Several studies employ Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), but Graph Convolutional Networks (GCNs) have not yet been investigated for the task. Sequence-based models can successfully capture semantics in local consecutive text sequences. On the other hand, graph-based models can preserve global co-occurrences that capture non-consecutive and long-distance semantics. A text representation approach that combines local and global information can enhance performance in practical class imbalance text classification scenarios. Yet, multi-view graph-based text representations have received limited attention. In this research, first we introduce Multi-view Minority Class Text Graph Convolutional Network (MMCT-GCN), a transductive multi-view text classification model that captures textual graph representations for the minority class alongside sequence-based text representations. Experimental results show that MMCT-GCN obtains consistent improvements over baselines. Second, we develop an Arabic Bidirectional Encoder Representations from Transformers (BERT) Graph Convolutional Network (AraBERT-GCN), a hybrid model that combines the large-scale pre-trained models that encode the local context and semantics alongside graph-based features that are capable of extracting the global word co-occurrences in non-consecutive extended semantics by only one or two hops. Experimental results show that AraBERT-GCN outperforms the state-of-the-art (SOTA) on our Arabic text datasets. Finally, we propose an Arabic Multidimensional Edge Graph Convolutional Network (AraMEGraph) designed for text classification that encapsulates richer and context-aware representations of word and phrase relationships, thus mitigating the impact of the complexity and ambiguity of the Arabic language. / Doctor of Philosophy / The text classification task is an important step in understanding natural language. However, this task has many challenges, such as uneven data distributions and language difficulty. For example, Arabic is the fifth most spoken language. It has many different word forms and meanings, which can make things harder to understand. Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) are widely utilized for text classification. However, another kind of network called graph convolutional network (GCN) has yet to be explored for this task. Graph-based models keep track of how words are connected, even if they are not right next to each other in a sentence. This helps with better understanding the meaning of words. On the other hand, sequence-based models do well in understanding the meaning of words that are right next to each other. Mixing both types of information in text understanding can work better, especially when dealing with unevenly distributed data.
In this research, we introduce a new text classification method called Multi-view Minority Class Text Graph Convolutional Network (MMCT-GCN). This model looks at text from different angles and combines information from graphs and sequence-based models. Our experiments show that this model performs better than other ones proposed in the literature. Additionally, we propose an Arabic BERT Graph Convolutional Network (AraBERT-GCN). It combines pre-trained models that understand words in context and graph features that look at how words relate to each other globally. This helps AraBERT-GCN do better than other models when working with Arabic text. Finally, we develop a special network called Arabic Multidimensional Edge Graph Convolutional Network (AraMEGraph) for Arabic text. It is designed to better understand Arabic and classify text more accurately. We do this by adding special edge features with multiple dimensions to help the network learn the relationships between words and phrases.
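The shared building block of the models above is the graph convolution itself: multiply a normalized adjacency matrix, node features, and a weight matrix, then apply a nonlinearity. Below is a generic single-layer sketch in NumPy on a tiny hypothetical word graph; it illustrates the mechanism only, not MMCT-GCN, AraBERT-GCN, or AraMEGraph themselves.

```python
# One Kipf & Welling-style GCN propagation step: H' = ReLU(Â H W),
# where Â is the symmetrically normalized adjacency with self-loops.
import numpy as np

A = np.array([[0, 1, 1, 0],    # adjacency of 4 "word" nodes
              [1, 0, 1, 0],    # (edges = assumed co-occurrences)
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                 # node features (e.g., embeddings)
W = rng.normal(size=(8, 4))                 # learnable layer weights

H_next = np.maximum(A_norm @ H @ W, 0)      # ReLU: one hop of smoothing
print(H_next.shape)                         # (4, 4)
```

Stacking two such layers gives each node information from nodes up to two hops away, which is the "one or two hops" of global co-occurrence the abstract refers to.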
10

HYPERLINKS IN THE TWITTERVERSE: ANALYZING THE URL USAGE IN SOCIAL MEDIA POSTS

Aljebreen, Abdullah, 0009-0008-1925-818X 05 1900
An important means for disseminating information on social media platforms is by including URLs that point to external sources in user posts. In X, formerly known as Twitter, we estimate that about 21% of the daily stream of English-language posts contain URLs. Given this prevalence, we assert that studying URLs in social media holds significant importance, as they play a pivotal part in shaping the flow of information and influencing user behavior. Examining hyperlinked posts can help us gain valuable insights into online discourse and detect emerging trends. The first aspect of our analysis is the study of users' intentions behind including URLs in social media posts. We argue that gaining insights about the users' motivations for posting with URLs has multiple applications, including the appropriate treatment and processing of these posts in other tasks. Hence, we build a comprehensive taxonomy containing the various intentions behind sharing URLs on social media, and we explore the labeling of intentions via the use of crowdsourcing. Beyond the intentions behind hyperlinked posts, we analyze their structure relative to the content of the web documents pointed to by the URLs. Hence, we define and analyze the segmentation problem of hyperlinked posts and develop an effective algorithm to solve it. We show that our solution can benefit sentiment analysis on social media. In the final aspect of our analysis, we investigate the emergence of news outlets posing as local sources, known as "pink slime", and their spread on social media. We conduct a comprehensive study investigating hyperlinked posts featuring pink slime websites. Through our analysis of the patterns and origins of posts, we discover and extract syntactical features and utilize them for developing a classification approach to detect such posts. Our approach has achieved an accuracy rate of 92.5%. / Computer and Information Science
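A rough sketch of the kind of URL extraction that underlies statistics like the 21% estimate above follows; the regex and sample posts are simplifications and assumptions, not the dissertation's actual measurement pipeline.

```python
# Hypothetical sketch: count the share of posts containing URLs and
# extract the URLs themselves (e.g., for later redirect resolution).
import re

URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (assumption)

posts = [
    "Breaking: flooding downtown https://t.co/abc123",
    "good morning everyone",
    "read this https://example.com/story then this https://t.co/xyz",
]

with_urls = sum(1 for p in posts if URL_RE.search(p))
print(f"{with_urls}/{len(posts)} posts contain URLs "
      f"({100 * with_urls / len(posts):.0f}%)")

urls = [u for p in posts for u in URL_RE.findall(p)]
print(urls)
```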
