Global ETD Search

Multilingual Collocation Database (Vícejazyčná databáze kolokací)

Collocations are groups of words that co-occur more often than would be expected from how often each word appears on its own. They also include phrases that give a new meaning to a group of otherwise unrelated words. This thesis aims to find collocations in large data sets and to create a database that allows their retrieval. Pointwise Mutual Information (PMI), a measure based on word frequency, is computed to find the collocations. Word groups with the highest PMI values are considered candidates for good collocations. The chosen collocations are stored in a database in a format that allows searching with Apache Lucene. Part of the thesis is a Web user interface that provides a quick and easy way to search for collocations. If this service is fast enough and the collocations are good, translators will be able to use it to find proper equivalents in the target language. Students of a foreign language will also be able to use it to extend their vocabulary. Such a database will be created independently for several languages, including Czech and English.
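As an illustration of the PMI scoring mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It uses the standard definition PMI(x, y) = log2(P(x, y) / (P(x) P(y))) with probabilities estimated from raw counts, and it considers only adjacent word pairs. The function names `pmi_scores` and `top_collocations` and the `min_count` cutoff are assumptions introduced here; the abstract does not say how the thesis estimates probabilities or filters rare pairs.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (base 2).

    min_count is a hypothetical cutoff: PMI is known to overrate pairs
    that occur only once or twice, so rare pairs are skipped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi          # estimated probability of the pair
        p_x = unigrams[x] / n_uni    # estimated probability of each word
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def top_collocations(tokens, k=10, min_count=2):
    """Return the k highest-scoring pairs as ((x, y), pmi) tuples."""
    scores = pmi_scores(tokens, min_count)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    text = "new york is big new york is old the city is big"
    for pair, score in top_collocations(text.split(), k=3):
        print(pair, round(score, 3))
```

On the toy sentence in the usage example, the pair ("new", "york") ranks first because both words appear only inside that pair, while "is" also occurs in other contexts.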

http://www.nusl.cz/ntk/nusl-341207

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:341207
Date	January 2014
Creators	Helcl, Jindřich
Contributors	Hajič, Jan, Mareček, David
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
