Return to search

Určení základního tvaru slova / Determination of basic form of words

Lemmatization is an important preprocessing step for many applications of text mining. Lemmatization process is similar to the stemming process, with the difference that determines not only the word stem, but it´s trying to determines the basic form of the word using the methods Brute Force and Suffix Stripping. The main aim of this paper is to present methods for algorithmic improvements Czech lemmatization. The created training set of data are content of this paper and can be freely used for student and academic works dealing with similar problematics.

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:219288
Date January 2011
CreatorsŠanda, Pavel
ContributorsBurget, Radim, Karásek, Jan
PublisherVysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Source SetsCzech ETDs
LanguageCzech
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.002 seconds