Return to search

Targeted Topic Modeling for Levantine Arabic

Topic models for focused analysis aim to capture topics within the limiting scope of a targeted aspect (which could be thought of as some inner topic within a certain domain). To serve their analytic purposes, topics are expected to be semantically-coherent and closely aligned with human intuition – this in itself poses a major challenge for the more common topic modeling algorithms which, in a broader sense, perform a full analysis that covers all aspects and themes within a collection of texts. The paper attempts to construct a viable focused-analysis topic model which learns topics from Twitter data written in a closely related group of non-standardized varieties of Arabic widely spoken in the Levant region (i.e Levantine Arabic). Results are compared to a baseline model as well as another targeted topic model designed precisely to serve the purpose of focused analysis. The model is capable of adequately capturing topics containing terms which fall within the scope of the targeted aspect when judged overall. Nevertheless, it fails to produce human-friendly and semantically-coherent topics as several topics contained a number of intruding terms while others contained terms, while still relevant to the targeted aspect, thrown together seemingly at random.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-412975
Date January 2020
CreatorsZahra, Shorouq
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.002 seconds