Global ETD Search

Word Classes in Language Modelling

This thesis concerns word classes and their application to language modelling. Using a purely statistical Markov model trained on sequences of word classes in the Swedish language, it examines several problems in language engineering: part-of-speech tagging, evaluating text modifiers such as translators with the help of probability measurements and matrix norms, and detecting different types of text using the Fourier transform of cross entropy sequences of word classes. The results show that the word class language model is quite weak by itself but that it is able to improve part-of-speech tagging for 1- and 2-letter models. There are indications that a stronger word class model could aid 3-letter and potentially even stronger models. For evaluating modifiers, the model is often able to identify shuffled text, and sometimes translated text, as well as to assign a score for how much a text has been modified. Future work on this should, however, take better care to ensure sufficiently large test data. The results from the Fourier approach indicate that a Fourier analysis of the cross entropy sequence between word classes may allow the model to distinguish both A.I.-generated text and translated text from human-written text. Future work on machine learning word class models could give further insight into the role of word class models in modern applications. The results could also offer interesting insights for linguistic research on word classes.
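As a rough illustration of the pipeline the abstract describes, the sketch below trains a first-order Markov model on word class sequences, scores a sequence transition by transition, and takes a discrete Fourier transform of the resulting scores. It is not the authors' implementation. The tag names (`DT`, `NN`, `VB`), the add-one smoothing, the toy training data, and the reading of "cross entropy sequence" as per-transition surprisal in bits are all assumptions made for this example, and every function name is hypothetical.

```python
import cmath
import math
from collections import Counter


def train_bigram(tag_sequences, smoothing=1.0):
    """Estimate P(next_tag | tag) from tag sequences with additive smoothing."""
    tags = sorted({t for seq in tag_sequences for t in seq})
    pair_counts = Counter()
    context_counts = Counter()
    for seq in tag_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    vocab = len(tags)

    def prob(prev, nxt):
        # Smoothing keeps unseen transitions at a small non-zero probability.
        return (pair_counts[(prev, nxt)] + smoothing) / (
            context_counts[prev] + smoothing * vocab
        )

    return prob


def surprisal_sequence(prob, seq):
    """Per-transition surprisal -log2 P(tag_i | tag_{i-1}), in bits.

    Assumption: this stands in for the thesis's "cross entropy sequence".
    """
    return [-math.log2(prob(p, n)) for p, n in zip(seq, seq[1:])]


def dft_magnitudes(values):
    """Magnitudes of a naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(values)))
        for k in range(n)
    ]


# Toy data with hypothetical word class tags (not taken from the thesis).
training = [["DT", "NN", "VB"], ["DT", "NN", "VB", "DT", "NN"]]
prob = train_bigram(training)
s = surprisal_sequence(prob, ["DT", "NN", "VB", "DT", "NN"])
spectrum = dft_magnitudes(s)
```

The mean of `s` is the empirical cross entropy of the sequence under the model, and `spectrum[0]` equals the sum of `s`. The remaining bins describe how the surprisal fluctuates along the text, which is the kind of signal the abstract suggests may differ between text types.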

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-349074

Discrete Fourier Transform

Mathematics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-349074
Date	January 2024
Creators	Erikson, Emrik; Åström, Marcus
Publisher	KTH, Skolan för teknikvetenskap (SCI)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-SCI-GRU ; 2024:152
