Určení základního tvaru slova / Determination of basic form of words

Lemmatization is an important preprocessing step for many text mining applications. The lemmatization process is similar to stemming, with the difference that it determines not only the word stem but also tries to determine the basic form of the word, using the Brute Force and Suffix Stripping methods. The main aim of this work is to present methods for algorithmic improvement of Czech lemmatization. The training data set created for this work is part of it and can be freely used in student and academic work dealing with similar problems.
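The combination of the two methods named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the lexicon entries, the suffix rules, and the function name `lemmatize` are hypothetical toy values chosen for illustration. It assumes that "Brute Force" means a direct lookup of the word form in a table of known forms and that "Suffix Stripping" means rewriting a word ending by rule when the lookup fails.

```python
# Hypothetical toy lexicon (word form -> lemma); not data from the thesis.
LEXICON = {
    "ženy": "žena",
    "ženou": "žena",
    "městě": "město",
}

# Hypothetical (suffix, replacement) rules, checked in this order,
# so longer suffixes are listed before shorter ones.
SUFFIX_RULES = [
    ("ami", "a"),
    ("ům", ""),
    ("y", "a"),
]


def lemmatize(word):
    """Return a guessed basic form of `word`."""
    token = word.lower()

    # Brute force: exact lookup in the table of known word forms.
    if token in LEXICON:
        return LEXICON[token]

    # Suffix stripping: apply the first matching rule, but only if
    # at least two characters of the word remain before the suffix.
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)] + replacement

    # No lexicon entry and no rule matched: return the word unchanged.
    return token
```

In this sketch the lookup runs first because a table entry is exact, while a suffix rule is only a guess that can produce a wrong lemma for irregular words.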

http://www.nusl.cz/ntk/nusl-219288

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:219288
Date	January 2011
Creators	Šanda, Pavel
Contributors	Burget, Radim; Karásek, Jan
Publisher	Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Source Sets	Czech ETDs
Language	Czech
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
