Return to search

A Hybrid Approach to Semantic Hashtag Clustering in Social Media

The uncontrolled usage of hashtags in social media makes them vary a lot in the quality of semantics and the frequency of usage. Such variations pose a challenge to the current approaches which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with a hashtag. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that has the ability to differentiate among distinct senses of a hashtag as opposed to the hashtag word itself. The gold standard test demonstrates that sense-level clusters are significantly more accurate than word-level clusters. The second phase is a hybrid semantic clustering algorithm using a consensus clustering approach which finds the consensus between metadata-based sense-level semantic clusters and text-based semantic clusters. The gold standard test shows that the hybrid algorithm outperforms both the text-based algorithm and the metadata-based algorithm for a majority of ground truths tested and that it never underperforms both baseline algorithms. In addition, a larger-scale performance study, conducted with a focus on disagreements in cluster assignments between algorithms, shows that the hybrid algorithm makes the correct cluster assignment in a majority of disagreement cases.

Identiferoai:union.ndltd.org:uvm.edu/oai:scholarworks.uvm.edu:graddis-1622
Date01 January 2016
CreatorsJaved, Ali
PublisherScholarWorks @ UVM
Source SetsUniversity of Vermont
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceGraduate College Dissertations and Theses

Page generated in 0.0023 seconds