Global ETD Search

Return to search

Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters’ effect on theme composition and identification of writing style

A topic modeling method, smoothed Latent Dirichlet Allocation (LDA) is applied on a text corpus data of classical Swedish prose fiction. The thesis consists of two parts. In the first part, a smoothed LDA model is applied to the corpus, investigating how changes in hyperparameter values affect the topics in terms of distribution of words within topics and topics within novels. In the second part, two smoothed LDA models are applied to a reduced corpus, only consisting of adjectives. The generated topics are examined to see if they are more likely to occur in a text of a particular author and if the model could be used for identification of writing style. With this new approach, the ability of the smoothed LDA model as a writing style identifier is explored. While the texts analyzed in this thesis is unusally long - as they are not seg- mented prose fiction - the effect of the hyperparameters on model performance was found to be similar to those found in previous research. For the adjectives corpus, the models did succeed in generating topics with a higher probability of occurring in novels by the same author. The smoothed LDA was shown to be a good model for identification of writing style. Keywords: Topic modeling, Smoothed Latent Dirichlet Allocation, Gibbs sam- pling, MCMC, Bayesian statistics, Swedish prose fiction.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-441653

Topic modeling

Smoothed Latent Dirichlet Allocation

Gibbs sam- pling

MCMC

Bayesian statistics

Swedish prose fiction.

Probability Theory and Statistics

Sannolikhetsteori och statistik

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-441653
Date	January 2021
Creators	Apelthun, Catharina
Publisher	Uppsala universitet, Statistiska institutionen
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds

Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters’ effect on theme composition and identification of writing style

Description

Links & Downloads

Tags

Additional Fields