Global ETD Search

Automatické propojování lexikografických zdrojů a korpusových dat. / Automatic linking of lexicographic sources and corpus data

As language resources (new lexicons, lexical databases, corpora, and treebanks) continue to be developed, the need to interlink them efficiently grows. Once resources are linked, one can draw on the properties and information of all of them at once. With resources converging, universal lexicographic formats are frequently discussed. In the present thesis, we investigate and analyse methods for interlinking language resources automatically. We introduce a system for interlinking lexicons (such as VALLEX, PDT-Vallex, FrameNet, or SemLex) that provide information on the syntactic properties of their entries. The system is automated, so it can be rerun on newer versions of lexicons that are still under development. We also design a method for identifying multiword expressions in parsed text based on syntactic information from the SemLex lexicon. Among the outputs that verify the feasibility of these methods is a mapping between the VALLEX and PDT-Vallex lexicons, which added tens of thousands of annotated sentences from the PDT and PCEDT treebanks to VALLEX.

http://www.nusl.cz/ntk/nusl-351016

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:351016
Date	January 2015
Creators	Bejček, Eduard
Contributors	Lopatková, Markéta, Horák, Aleš, Žabokrtský, Zdeněk
Source Sets	Czech ETDs
Language	Czech
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis
Rights	info:eu-repo/semantics/restrictedAccess