Detecting Dissimilarity in Discourse on Social Media

A lot of interaction between humans takes place on social media. Groups and communities are sometimes formed, both with and without intention. These interactions generate a large quantity of text data. This project aims to detect dissimilarity in discourse between communities on social media using a distributed approach. A data set of tweets was used to test and evaluate the method. Tweets produced by two communities were extracted from the data set. Two Natural Language Processing techniques were used to vectorise the tweets of each community: LIWC, a dictionary based on knowledge acquired from professionals in linguistics and psychology, and BERT, an embedding model which uses machine learning to represent words and sentences as vectors of decimal numbers. These vectors were then used as representations of the text to measure the similarity of discourse between the communities. Both distance and similarity were measured. It was concluded that none of the combinations of measure and vectorisation method that were tried could detect a dissimilarity in discourse on social media for the sample data set.
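As an illustration of the general pipeline the abstract describes (vectorise each community's tweets, then compare the communities with a similarity measure and a distance measure), the sketch below uses a dictionary-based, LIWC-style vectoriser together with cosine similarity and Euclidean distance. It is a minimal sketch under stated assumptions, not the thesis's implementation: `TOY_DICTIONARY` and its three categories are hypothetical stand-ins (the real LIWC lexicon is licensed and much larger), averaging tweet vectors into one centroid per community is an assumed aggregation step, and BERT is omitted to keep the example self-contained. The two comparison functions apply unchanged to BERT sentence embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical toy category dictionary standing in for a LIWC-style lexicon.
# The real LIWC dictionaries are licensed and far larger; these words are illustrative only.
TOY_DICTIONARY = {
    "posemo": {"good", "great", "happy", "love"},
    "negemo": {"bad", "sad", "hate", "angry"},
    "social": {"we", "friend", "friends", "they", "us"},
}
CATEGORIES = sorted(TOY_DICTIONARY)  # fixed order of vector dimensions


def liwc_style_vector(text):
    """Percentage of a text's tokens that fall into each dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0] * len(CATEGORIES)
    counts = Counter(tokens)
    return [
        100.0 * sum(counts[w] for w in TOY_DICTIONARY[cat]) / len(tokens)
        for cat in CATEGORIES
    ]


def centroid(vectors):
    """Element-wise mean of equally long vectors (assumed aggregation per community)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical example tweets for two communities.
community_a = ["We love our friends", "Such a great happy day"]
community_b = ["I hate this bad news", "So sad and angry"]

ca = centroid([liwc_style_vector(t) for t in community_a])
cb = centroid([liwc_style_vector(t) for t in community_b])
print("cosine similarity:", cosine_similarity(ca, cb))
print("euclidean distance:", euclidean_distance(ca, cb))
```

Similarity and distance capture different things here: cosine similarity ignores the overall magnitude of the category percentages and compares only their proportions, while Euclidean distance is sensitive to magnitude. That difference is one reason it can be useful to report both.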

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-484416

Natural Language Processing

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-484416
Date	January 2022
Creators	Mineur, Mattias
Publisher	Uppsala universitet, Matematiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC IT, 1401-5749 ; 22025
