Return to search

An Entropy Estimate of Written Language and Twitter Language : A Comparison between English and Swedish

The purpose of this study is to estimate and compare the entropy and redundancy of written English and Swedish. We also investigate and compare the entropy and redundancy of Twitter language. This is done by extracting n consecutive characters called n-grams and calculating their frequencies. No precise values are obtained, due to the amount of text being finite, while the entropy is estimated for text length tending towards infinity. However we do obtain results for n = 1,...,6  and the results show that written Swedish has higher entropy than written English and that the redundancy is lower for Swedish language. When comparing Twitter with the standard languages we find that for Twitter, the entropy is higher and the redundancy is lower.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-64952
Date January 2017
CreatorsJuhlin, Sanna
PublisherLinnéuniversitetet, Institutionen för matematik (MA)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0025 seconds