Return to search

Pragmatický lematizátor českých slov / Pragmatic lemmatizer of Czech language

This thesis is focused on lemmatizing of nouns and adjectives. It is based on morphology of Czech language. The goal is to create a lemmatizer which can stem words with success rate 90% (at least). At the same time the lemmatizer should be very easy, it should consist as little rules as possible. Lemmatizer will be created to work with real estate adverts, especially houses for sale. In this thesis there will be made an analysis of specific characters of this area. Lemmatizer will be created according to results of this analysis. Lemmatizer was written in Java. Only three types of rules were used and generally the lemmatizer created correct stems in 96.4% of all words.

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:192440
Date January 2014
CreatorsVacek, Matěj
ContributorsStrossa, Petr, Kliegr, Tomáš
PublisherVysoká škola ekonomická v Praze
Source SetsCzech ETDs
LanguageCzech
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0019 seconds