

Comparing PSO-Based Clustering Over Contextual Vector Embeddings to Modern Topic Modeling

Indiana University-Purdue University Indianapolis (IUPUI)

Efficient topic modeling is needed to support applications that aim at identifying main themes from a collection of documents. In this thesis, a reduced vector embedding representation and particle swarm optimization (PSO) are combined to develop a topic modeling strategy that is able to identify representative themes from a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on three datasets across different domains. The first dataset consists of posts from the online health forum r/Cancer. The second dataset is a collection of NY Times abstracts and is used to compare
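The abstract describes clustering document embeddings with a particle swarm. The sketch below illustrates the general idea under stated assumptions.

- **Encoding.** Each particle encodes k candidate centroids flattened into one vector.
- **Fitness.** Fitness is the sum of squared distances from each document vector to its nearest centroid.
- **Update rule.** The swarm uses the standard global-best PSO velocity update.

This is plain PSO clustering, not the thesis's pPSO. The dimension-by-dimension fitness tracking described above is not reproduced, because the abstract does not specify it. The toy 2-D points are a hypothetical stand-in for reduced sBERT embeddings. The function names (`make_toy_embeddings`, `pso_cluster`, etc.) are illustrative only. The inertia and acceleration values (w=0.72, c1=c2=1.49) are commonly used PSO defaults, not parameters taken from the thesis.

```python
import random


def make_toy_embeddings(seed=0):
    """Hypothetical stand-in for reduced sBERT document embeddings:
    three well-separated groups of 2-D points, 20 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    points = []
    for cx, cy in centers:
        for _ in range(20):
            points.append((cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)))
    return points


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(position, k, dim):
    # A particle's position is k centroids concatenated into one flat vector.
    return [position[i * dim:(i + 1) * dim] for i in range(k)]


def fitness(position, points, k, dim):
    # Sum of squared distances from each point to its nearest centroid (lower is better).
    centroids = decode(position, k, dim)
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def pso_cluster(points, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Plain global-best PSO clustering (NOT the thesis's pPSO variant)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise each particle with k distinct data points as its centroids.
    positions = [[v for c in rng.sample(points, k) for v in c] for _ in range(n_particles)]
    velocities = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [list(p) for p in positions]
    pbest_fit = [fitness(p, points, k, dim) for p in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Standard velocity update: inertia + pull toward personal and global bests.
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            f = fitness(positions[i], points, k, dim)
            # Fitness is tracked for the whole particle here, not per dimension.
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(positions[i]), f
                if f < gbest_fit:
                    gbest, gbest_fit = list(positions[i]), f

    centroids = decode(gbest, k, dim)
    # Assign each document to its nearest centroid to form the clusters.
    labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
    return centroids, labels, gbest_fit
```

A possible usage: `centroids, labels, sse = pso_cluster(make_toy_embeddings(), k=3)`. Each label then groups a document with its nearest centroid. In a topic-modeling setting, each cluster would be read as a candidate theme.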

Particle Swarm Optimization

Topic Modeling

Vector Embedding

Natural Language Processing

Identifier	oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/29167
Date	05 1900
Creators	Miles, Samuel
Contributors	Ben Miled, Zina; Salama, Paul; El-Sharkawy, Mohamed
Source Sets	Indiana University-Purdue University Indianapolis
Language	en_US
Detected Language	English
Type	Thesis
