Return to search

An application of topic modeling algorithms to text analytics in business intelligence

Master of Science / Department of Computing and Information Sciences / Doina Caragea / William H. Hsu / In this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses and overcome the challenges facing current approaches such as: data noise, low number of clustered businesses, and lack of evaluation approach. We propose an LSA-based approach to analyze the businesses’ data and cluster those businesses by using Bisecting K-Means algorithm. In this approach, we analyze the businesses’ data by using LSA and produce businesses’ representations in a reduced space. We then use the businesses’ representations to cluster the businesses by applying the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare the results with our proposed LSA-based approach at the end. In this work, we evaluate the results by using a human-expert-based evaluation procedure. At the end, we visualize the clusters produced in this work by using Google Earth and Tableau.
According to our evaluation procedure, the LDA-based approach performed slightly bet- ter then the LSA-based approach. However, with the LDA-based approach, there were some limitations which are: low number of clustered businesses, and not being able to produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters.

Identiferoai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/17580
Date January 1900
CreatorsAlsadhan, Majed
PublisherKansas State University
Source SetsK-State Research Exchange
Languageen_US
Detected LanguageEnglish
TypeThesis

Page generated in 0.0022 seconds