
Comparing PSO-Based Clustering Over Contextual Vector Embeddings to Modern Topic Modeling

Indiana University-Purdue University Indianapolis (IUPUI) / Efficient topic modeling is needed to support applications that aim to identify the main themes in a collection of documents. In this thesis, a reduced vector embedding representation and particle swarm optimization (PSO) are combined into a topic modeling strategy that identifies representative themes in a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on three datasets across different domains. The first dataset consists of posts from the online health forum r/Cancer. The second dataset is a collection of NY Times abstracts and is used to compare

Identifier: oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/29167
Date: 05 1900
Creators: Miles, Samuel
Contributors: Ben Miled, Zina; Salama, Paul; El-Sharkawy, Mohamed
Source Sets: Indiana University-Purdue University Indianapolis
Language: en_US
Detected Language: English
Type: Thesis
