Global ETD Search

1	Sarcasm Detection on Twitter: A Behavioral Modeling Approach January 2014 abstract: Sarcasm is a nuanced form of language in which the speaker usually states explicitly the opposite of what is implied. Because it is imbued with intentional ambiguity and subtlety, sarcasm is difficult to detect, even for humans. Current work approaches this challenging problem primarily from a linguistic perspective, focusing on the lexical and syntactic aspects of sarcasm. In this thesis, I explore the possibility of using behavioral traits intrinsic to users of sarcasm to detect sarcastic tweets. First, I theorize the core forms of sarcasm using findings from the psychological and behavioral sciences and some observations of Twitter users. Then, I develop computational features to model the manifestations of these forms of sarcasm using the user's profile information and tweets. Finally, I combine these features to train a supervised learning model to detect sarcastic tweets. I perform experiments to extensively evaluate the proposed behavior modeling approach and compare it with the state of the art. / Dissertation/Thesis / Masters Thesis Computer Science 2014 Computer science behavior modeling data mining sarcasm detection social computing social media mining
2	Visualization Tool for Islamic Radical and Counter Radical Movements and their online followers in South East Asia January 2015 abstract: With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, and ideologies, and even in imposing them on others. Users have become a source for the production and dissemination of real-time information. The content users post can be used to understand them and track their behavior. Data analysis can be performed on this content to understand users' social ideology and their affinity towards Radical and Counter-Radical Movements. When expressing their opinions on Twitter, people use hashtags in their messages. These hashtags are a rich source of information for understanding the content-based relationships between online users, in addition to the existing context-based follower and friend relationships. An intelligent visual dashboard system is needed that can track the activities of users and the diffusion of online social movements, identify the hot spots in the users' network, show the geographic footprint of the users, and help explain the socio-cultural, economic, and political drivers of the relationships among different groups of users. / Dissertation/Thesis / Masters Thesis Computer Science 2015 Computer science Counter Radical Radical Social Media Mining Twitter Visualization Tool
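The content-based relationship that hashtags provide between users can be illustrated with a short sketch. This is not the thesis's method, which the abstract does not specify; it assumes, purely for illustration, that two users are related when the Jaccard similarity of their hashtag sets reaches a threshold. The function names, the regular expression, and the `threshold` value are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(tweets):
    """Return the lowercased set of hashtags used across a user's tweets."""
    return {tag.lower() for text in tweets for tag in HASHTAG.findall(text)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hashtag_edges(user_tweets, threshold=0.2):
    """Build weighted user-user edges from shared hashtags.

    `threshold` is an assumed cut-off, not a value from the thesis.
    """
    tags = {user: extract_hashtags(tweets) for user, tweets in user_tweets.items()}
    edges = {}
    for u, v in combinations(sorted(tags), 2):
        score = jaccard(tags[u], tags[v])
        if score >= threshold:
            edges[(u, v)] = score
    return edges
```

The resulting edge dictionary could feed a graph layout in a dashboard alongside the follower and friend graph; that integration is left out here.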
3	Protecting User Privacy with Social Media Data and Mining January 2020 abstract: The pervasive use of the Web has connected billions of people around the globe and put information at their fingertips. This results in tremendous amounts of user-generated data, which makes users traceable and vulnerable to privacy leakage attacks. In general, there are two types of privacy leakage attacks on user-generated data: identity disclosure and private-attribute disclosure attacks. These attacks expose users to risks ranging from persecution by governments to targeted fraud. Users therefore need to be able to safeguard their privacy without leaving unnecessary traces of their online activities. However, privacy protection comes at the cost of utility loss, defined as the loss in quality of the personalized services users receive. The reason is that this trace information is crucial for online vendors to provide personalized services, and a lack of it degrades utility. This leads to a dilemma between privacy and utility. Protecting users' privacy while preserving utility for user-generated data is a challenging task because users generate different types of data, such as Web browsing histories, user-item interactions, and textual information. This data is heterogeneous, unstructured, noisy, and inherently different from relational and tabular data, and thus requires quantifying users' privacy and utility in each context separately. In this dissertation, I investigate four aspects of protecting user privacy for user-generated data. First, a novel adversarial technique is introduced to assay privacy risks in heterogeneous user-generated data. Second, a novel framework is proposed to boost users' privacy while retaining high utility for Web browsing histories. Third, a privacy-aware recommendation system is developed to protect privacy w.r.t. the rich user-item interaction data by recommending relevant and privacy-preserving items. Fourth, a privacy-preserving framework for text representation learning is presented to safeguard user-generated textual data, as it can reveal private information. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2020 Artificial intelligence Computer science Machine Learning Privacy Protection Social Media Mining User Behavioral Modeling
4	A Study of User Behaviors and Activities on Online Mental Health Communities January 2019 abstract: Social media is a medium containing rich information that many users share every second of every day. This information can be used for various purposes, such as understanding user behaviors, learning the effect of social media on a community, and developing a decision-making system based on the information available. With the growing popularity of social networking sites, people can freely express their opinions and feelings, which results in a tremendous amount of user-generated data. This wealth of social media data has opened a path for researchers to study and understand users’ behaviors and mental health conditions. Several studies have shown that social media provides a means to capture an individual's state of mind. Given the social media data and related work in this field, this work studies the scope of users’ discussions in online mental health communities. In the first part of this dissertation, this work focuses on the role of social media in mental health within the sexual abuse community. It employs natural language processing techniques to extract the topics of responses and examines how diverse these topics are in order to answer research questions such as: whether responses are limited to emotional support; if not, what the other topics are; what the diversity of topics reveals; and how online responses differ from the traditional responses found in the physical world. To answer these questions, this work extracts Reddit posts on rape to understand the nature of user responses to this stigmatized topic. In the second part of this dissertation, this work expands to a broader range of online communities. 
In particular, it investigates the potential roles of social media in mental health among five major communities: the trauma and abuse community, the psychosis and anxiety community, the compulsive disorders community, the coping and therapy community, and the mood disorders community. This work studies how people interact with each other in each of these communities and what resources these online forums provide to users who seek help. To understand users’ behaviors, this work extracts Reddit posts from 52 related subcommunities and analyzes the linguistic behavior of each community. Experiments in this dissertation show that Reddit is a good medium for users with mental health issues to find helpful, relevant resources. Another interesting observation is a topic cluster in users’ posts showing that discussion and communication among users help individuals find proper resources for their problems. Moreover, results show that the anonymity of Reddit users allows them to discuss topics beyond social support, such as financial and religious support. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2019 Computer science linguistic behavior mental health social media mining topic modeling
5	Intelligent Event Focused Crawling Farag, Mohamed Magdy Gharib 23 September 2016 There is a need for an integrated event focused crawling system to collect Web data about key events. When an event occurs, many users try to locate the most up-to-date information about that event. Yet there is little systematic collecting and archiving of information about events anywhere. We propose intelligent event focused crawling for automatic event tracking and archiving, as well as effective access. We extend the traditional focused (topical) crawling techniques in two directions: modeling and representing events, and modeling and representing webpage source importance. We developed an event model that can capture key event information (topical, spatial, and temporal). We incorporated that model into the focused crawler algorithm. For the focused crawler to leverage the event model in predicting a webpage's relevance, we developed a function that measures the similarity between two event representations, based on textual content. Although the textual content provides a rich set of features, we proposed an additional source of evidence that allows the focused crawler to better estimate the importance of a webpage by considering its website. We estimated webpage source importance by the ratio of the number of relevant webpages to the number of non-relevant webpages found while crawling a website. We combined the textual content information and source importance into a single relevance score. For the focused crawler to work well, it needs a diverse set of high-quality seed URLs (URLs of relevant webpages that link to other relevant webpages). Although manual curation of seed URLs guarantees quality, it requires exhaustive manual labor. We proposed an automated approach for curating seed URLs using social media content. We leveraged the richness of social media content about events to extract URLs that can be used as seed URLs for further focused crawling. 
We evaluated our system through four series of experiments, using recent events: Orlando shooting, Ecuador earthquake, Panama papers, California shooting, Brussels attack, Paris attack, and Oregon shooting. In the first experiment series our proposed event model representation, used to predict webpage relevance, outperformed the topic-only approach, showing better results in precision, recall, and F1-score. In the second series, using harvest ratio to measure ability to collect relevant webpages, our event model-based focused crawler outperformed the state-of-the-art focused crawler (best-first search). The third series evaluated the effectiveness of our proposed webpage source importance for collecting more relevant webpages. The focused crawler with webpage source importance managed to collect roughly the same number of relevant webpages as the focused crawler without webpage source importance, but from a smaller set of sources. The fourth series provides guidance to archivists regarding the effectiveness of curating seed URLs from social media content (tweets) using different methods of selection. / Ph. D. Focused Crawling Event Modeling Web Archiving Digital Libraries Web Mining Social Media Mining Seed URLs Selection
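The source-importance ratio, the combined relevance score, and the harvest ratio described in this abstract can be sketched as follows. The abstract does not give the actual formulas, so everything beyond the ratio itself is an assumption made for illustration: the additive smoothing, the squashing of the ratio into [0, 1), the linear weighting `alpha`, and all names.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Counts of pages crawled from one website so far."""
    relevant: int = 0
    non_relevant: int = 0

    def importance(self, smoothing: float = 1.0) -> float:
        """Ratio of relevant to non-relevant pages.

        The additive smoothing is an assumption; it avoids division
        by zero for sites with no non-relevant pages yet.
        """
        return (self.relevant + smoothing) / (self.non_relevant + smoothing)


def combined_score(text_similarity: float, source_importance: float,
                   alpha: float = 0.7) -> float:
    """Blend text similarity (assumed in [0, 1]) with source importance.

    The ratio r is squashed to r / (1 + r) so both terms share a scale;
    the linear blend and alpha are hypothetical choices.
    """
    squashed = source_importance / (1.0 + source_importance)
    return alpha * text_similarity + (1.0 - alpha) * squashed


def harvest_ratio(relevant_collected: int, total_collected: int) -> float:
    """Fraction of collected pages that are relevant."""
    return relevant_collected / total_collected if total_collected else 0.0
```

A crawler frontier could order candidate URLs by `combined_score`, updating each site's `SourceStats` as pages are classified; that loop is omitted here.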
6	Health Information Extraction from Social Media January 2016 abstract: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be used for public health monitoring tasks such as pharmacovigilance by applying Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, particularly deep learning, in addressing the challenges associated with extracting health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. However, deep learning techniques have so far been relatively unexplored for biomedical text mining, and this is the first attempt at applying deep learning to health information extraction from social media. This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger to extract complex health-related concepts. It uses a large volume of unlabeled user posts to automatically learn embedding cluster features, a novel application of deep learning to modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems. 
This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step when processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions that were not observed in the training data or in the standard medical lexicons, outperforming the state-of-the-art baseline techniques. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2016 Bioinformatics Artificial intelligence Public health Deep Learning Information Extraction Machine Learning Natural Language Processing Pharmacovigilance Social Media Mining
7	Comparing Communities & User Clusters in Twitter Network Data Bhowmik, Kowshik January 2019 No description available. Computer Science Network Data Analysis Text Data Analysis Social Media Mining Community Detection Document Clustering Machine Learning
8	Web mining for social network analysis Elhaddad, Mohamed Kamel Abdelsalam 09 August 2021 The rapid development of information systems and the widespread use of electronic media and social networks have played a significant role in accelerating the pace of events worldwide, for example in the 2012 Gaza conflict (the 8-day war), the pro-secessionist rebellion during the 2013-2014 conflict in Eastern Ukraine, the 2016 US presidential election, and the COVID-19 pandemic since the beginning of 2020. As the volume of data shared daily on social networking platforms grows quickly and in many languages, techniques are needed to classify this huge amount of data automatically, promptly, and correctly. Of the many social networking platforms, Twitter is one of the most used by netizens. It allows its users to communicate, share their opinions, and express their emotions (sentiments) in short posts, easily and at no cost. Moreover, unlike other social networking platforms, Twitter allows research institutions to access its public and historical data, upon request and subject to controls. Therefore, many organizations at different levels (e.g., governmental, commercial) seek to benefit from the analysis and classification of shared tweets in many application domains, for example sentiment analysis, which determines users’ polarity from the content of their shared text, and misleading information detection, which checks the legitimacy and credibility of the shared information. To attain this objective, one can apply numerous data representation, preprocessing, and natural language processing techniques, as well as machine learning and deep learning algorithms. 
There are several challenges and limitations with existing approaches, including the handling of tweets in multiple languages, the choice of features to include in the feature vector, and the assignment of representative and descriptive weights to these features for different mining tasks. In addition, existing performance evaluation metrics are limited in their ability to fully assess the developed classification systems. In this dissertation, two novel frameworks are introduced: the first efficiently analyzes and classifies bilingual (Arabic and English) textual content from social networks, while the second evaluates the performance of binary classification algorithms. The first framework is designed with: (1) an approach for handling tweets written in Arabic and English, which can be extended to cover data written in more languages and from other social networking platforms; (2) effective data preparation and preprocessing techniques; (3) a novel feature selection technique that allows different types of features to be used (content-dependent, context-dependent, and domain-dependent); and (4) a novel feature extraction technique that assigns weights to the linguistic features based on how representative they are of the classes they belong to. The proposed framework is employed in performing sentiment analysis and misleading information detection. The performance of this framework is compared to state-of-the-art classification approaches on 11 benchmark datasets comprising both Arabic and English textual content, demonstrating considerable improvement across all performance evaluation metrics. This framework is then used in a real-life case study to detect misleading information surrounding the spread of COVID-19. In the second framework, a new multidimensional classification assessment score (MCAS) is introduced. 
MCAS can determine how good a classification algorithm is when dealing with binary classification problems. It takes into consideration the effect of misclassification errors on the probability of correctly detecting instances from both classes. Moreover, it is intended to be valid regardless of the size of the dataset and of whether the dataset's instances are distributed evenly or unevenly over the classes. An empirical and practical analysis is conducted on both synthetic and real-life datasets to compare the behavior of the proposed metric against commonly used metrics. The analysis reveals that the new measure can distinguish the performance of different classification techniques. Furthermore, it allows a class-based assessment of classification algorithms, which measures the ability of the algorithm to deal with data from each class separately. This is useful when correctly classifying instances from one class is more important than correctly classifying instances from the other, such as in COVID-19 testing, where detecting positive patients is much more important than detecting negative ones. / Graduate Text Classification Text Mining Web Mining Data Mining Machine Learning Fake News Misleading Information Detection Sentiment Analysis Fake News Detection Infodemic Social Media Mining Social Network Analysis Coronavirus COVID-19 SARS-CoV-2 Data Analysis
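The abstract does not define MCAS, so the following sketch does not reproduce it. It only illustrates, with standard measures, why a class-based assessment matters on an unbalanced binary dataset: plain accuracy can look high while one class is never detected. The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_recall(y_true, y_pred, labels=(0, 1)):
    """Recall computed separately for each class.

    Returns NaN for a class that has no instances in y_true.
    """
    recalls = {}
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total if total else float("nan")
    return recalls


# Toy unbalanced data: 5 positive cases, 95 negative cases, and a
# classifier that always predicts the negative class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

print(accuracy(y_true, y_pred))          # 0.95
print(per_class_recall(y_true, y_pred))  # {0: 1.0, 1: 0.0}
```

Here the always-negative classifier reaches 95% accuracy while detecting none of the positive cases, which is the kind of failure a class-based score is meant to expose.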
9	Social media mining as an opportunistic citizen science model in ecological monitoring: a case study using invasive alien species in forest ecosystems. Daume, Stefan 27 August 2015 Dramatic ecological, economic, and social changes threaten the stability of ecosystems worldwide and, together with new demands on the diverse ecosystem services of forests, pose new challenges for forest management and monitoring. New risks and hazards, such as invasive alien species (neobiota), raise fundamental questions about established forest management strategies, because these strategies rest on the assumption of stable ecosystems. Adaptive management and monitoring strategies are therefore needed to detect these new threats and changes early. This, however, requires large-scale and comprehensive monitoring, which is possible only to a limited extent given limited resources. In view of these challenges, forestry practitioners and scientists have begun to draw on the support of volunteers through citizen science projects in order to gather additional information and respond flexibly to specific questions. With the general availability of the Internet and mobile devices, a new digital source of information has also emerged in the form of social media. Through these technologies, users in principle take on the role of environmental sensors and indirectly generate an enormous volume of publicly accessible information about their surroundings and the environment. The automated analysis of social media such as Facebook, Twitter, wikis, and blogs now makes important contributions to areas such as infectious disease monitoring, disaster management, and earthquake detection. 
Applications with an ecological focus, however, exist only sporadically, and this application area has not yet been treated methodically. Using the microblogging service Twitter and the example of invasive alien species in forest ecosystems, this thesis pursues such a methodical treatment and assessment of social media in forest monitoring. The automated analysis of social media is regarded as an opportunistic citizen science model, and the available data, activities, and participants are compared with existing, deliberately planned citizen science projects in environmental monitoring. The results show that Twitter is a valuable source of information on invasive species and that social media in general could complement traditional environmental information. Twitter is a rich source of primary biodiversity observations, including observations of invasive alien species. In addition, the analyzed Twitter content exhibits distinctive topic and information profiles for the species studied, which can make important contributions to invasive species management. More generally, the study shows that, on the one hand, the potential of citizen science in forest monitoring is currently not fully exploited, while on the other hand, the users who share biodiversity observations on Twitter form a large pool of individuals with an interest in environmental observation who, on the basis of that documented interest, might be mobilized for deliberately planned citizen science projects. In summary, this study documents that social media are a valuable source of environmental information in general and deserve further investigation, ultimately with the aim of developing operational systems that support real-time risk assessment. 
634 forest ecosystems forest monitoring forest threats citizen science invasive alien species neobiota social media social media mining Twitter ecological monitoring biodiversity monitoring Forestry (PPN621305413)
10	Analyse temporelle et sémantique des réseaux sociaux typés à partir du contenu de sites généré par des utilisateurs sur le Web / Temporal and semantic analysis of richly typed social networks from user-generated content sites on the web Meng, Zide 07 November 2016 We propose an approach to detect topics, overlapping communities of interest, expertise, trends, and activities in user-generated content sites, and in particular in question-answering forums such as StackOverflow. We first describe QASM (Question & Answer Social Media), a system based on social network analysis to manage the two main resources of question-answering sites: users and content. We also introduce the QASM vocabulary, used to formalize both the level of interest and the expertise of users on topics. We then propose an efficient approach to detect communities of interest. It relies on another method that enriches questions with a more general tag when needed. We compared three detection methods on a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection. We then propose an additional method to automatically generate a label for a detected topic by analyzing the meaning and links of its bag of words. We conduct a user study to compare different algorithms for choosing the label. Finally, we extend our probabilistic graphical model to jointly model topics, expertise, activities, and trends. We performed experiments with real-world data to confirm the effectiveness of our joint model, studying users’ behaviors and topic dynamics. Social semantic web Social media mining Probabilistic graphical model Question answer sites User-generated content Topic modeling Expertise detection Overlapping community detection
