Return to search

Vícejazyčná databáze kolokací / Vícejazyčná databáze kolokací

Collocations are groups of words which are co-occurring more often than appearing separately. They also include phrases that give a new meaning to a group of unrelated words. This thesis is aimed to find collocations in large data and to create a database that allows their retrieval. The Pointwise Mutual Information, a value based on word frequency, is computed for finding the collocations. Words with the highest value of PMI are considered candidates for good collocations. Chosen collocations are stored in a database in a format that allows searching with Apache Lucene. A part of the thesis is to create a Web user interface as a quick and easy way to search collocations. If this service is fast enough and the collocations are good, translators will be able to use it for finding proper equivalents in the target language. Students of a foreign language will also be able to use it to extend their vocabulary. Such database will be created independently in several languages including Czech and English. Powered by TCPDF (www.tcpdf.org)

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:341207
Date January 2014
CreatorsHelcl, Jindřich
ContributorsHajič, Jan, Mareček, David
Source SetsCzech ETDs
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0017 seconds