Global ETD Search

1	Unknown word sequences in HPSG Mielens, Jason David 06 October 2014 This work investigates the properties of unknown words in HPSG, and in particular the phenomenon of multi-word unknown expressions, in which several unknown words occur in sequence. It first presents a study determining the relative frequency of multi-word unknown expressions, and then a survey of the efficacy of a variety of techniques for handling them. The techniques include modified versions of techniques from the existing unknown-word prediction literature as well as novel ones, and they are evaluated with specific attention to how they fare on sentences with many unknown words and long unknown sequences. Keywords: Parsing, HPSG, Unknowns, CRF, Multi-word expressions
2	Víceslovné lexikální jednotky v Calvinově Il sentiero dei nidi di ragno a jejich protějšky v českém překladu / Multi-Word Expressions in Calvino's Il sentiero dei nidi di ragno and Their Equivalents in Czech Translation Ebrová, Agáta January 2021 This diploma thesis is part of the wider phraseological project CREAMY (Calvino REpertoire for the Analysis of Multilingual PhraseologY), carried out at the University of Rome La Sapienza. The aim of the thesis was to compare the Italian multi-word expressions from the novel Il sentiero dei nidi di ragno, written by Italo Calvino, with their counterparts in the Czech translation by Libor Piruchta, using a data set obtained through the phraseological web database CREAMY. Processing part of the Czech entries into the database was integral to writing the thesis. The work is divided into a theoretical and a practical part. The first chapter of the theoretical part provides basic information about the CREAMY project and the web application of the same name, which is the main tool used in research within the project. The second chapter deals with the basic typological properties of the studied languages, with emphasis on morphosyntax and word formation. The third chapter is devoted to multi-word expressions and how they are conceived in the Italian and Czech linguistic traditions. The introductory chapter of the practical part describes the procedure for processing entries in the CREAMY application. In this chapter, we present two specific examples of processed entries but we also point out the...
3	Zpracování turkických jazyků / Processing of Turkic Languages Ciddi, Sibel January 2014 Department: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague. Supervisor: RNDr. Daniel Zeman, Ph.D. Abstract: This thesis presents several methods for the morphological processing of Turkic languages, such as Turkish, which pose a specific set of challenges for natural language processing. In order to alleviate the lack of large language resources, it makes the data sets used for morphological processing and expansion of lexicons publicly available for further use by researchers. Data sparsity, caused by the highly productive and agglutinative morphology of Turkish, makes processing Turkish text difficult, especially for methods using purely statistical natural language processing. Therefore, we evaluated a publicly available rule-based morphological analyzer, TRmorph, based on finite state methods and technologies. In order to enhance the efficiency of this analyzer, we worked on expansion of lexicons, employing heuristics-based methods for the extraction of named entities and multi-word expressions. Furthermore, as a preprocessing step, we introduced a dictionary-based recognition method for tokenization of multi-word expressions. This method complements...
