51 |
Construction automatique d'outils et de ressources linguistiques à partir de corpus parallèles / Automatic creation of linguistic tools and resources from parallel corpora / Zennaki, Othman. 11 March 2019
This thesis addresses the automatic construction of tools and resources for the linguistic analysis of texts in low-resource languages. We propose an approach based on recurrent neural networks (RNNs) that requires only a parallel or multi-parallel corpus between a well-resourced source language and one or more low-resource target languages. This corpus is used to build a multilingual representation of the words of the source and target languages. We used this representation to train our neural models and investigated two architectures: simple (unidirectional) RNNs and bidirectional RNNs. We also proposed several RNN variants that incorporate low-level linguistic information (for instance, part-of-speech tags) when building higher-level annotators (for instance, SuperSense taggers and syntactic dependency parsers). We demonstrated the genericity of the approach on several languages and on several annotation tasks, building three kinds of multilingual annotators: part-of-speech taggers, SuperSense taggers, and dependency parsers. The results obtained are very satisfactory. Our approach has the following advantages: (a) it uses no word alignment information; (b) it requires no prior knowledge of the target languages (the only assumption is that the source and target languages are not too syntactically divergent), which makes it applicable to a wide range of low-resource languages; (c) it yields genuinely multilingual taggers (one tagger for N languages).
|
52 |
Topic and link detection from multilingual news. January 2003
Huang Ruizhang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 110-114). / Abstracts in English and Chinese.
Contents:
Abstract
Acknowledgement
Chapter 1. Introduction
  1.1 The Definition of Topic and Event
  1.2 Event and Topic Discovery
    1.2.1 Problem Definition
    1.2.2 Characteristics of the Discovery Problems
    1.2.3 Our Contributions
  1.3 Story Link Detection
    1.3.1 Problem Definition
    1.3.2 Our Contributions
  1.4 Thesis Organization
Chapter 2. Literature Review
  2.1 University of Massachusetts (UMass)
    2.1.1 Topic Detection Approach
    2.1.2 Story Link Detection Approach
  2.2 BBN Technologies
  2.3 IBM Research Center
  2.4 Carnegie Mellon University (CMU)
    2.4.1 Topic Detection Approach
    2.4.2 Story Link Detection Approach
  2.5 National Taiwan University (NTU)
    2.5.1 Topic Detection Approach
    2.5.2 Story Link Detection Approach
Chapter 3. System Overview
  3.1 News Sources
  3.2 Story Preprocessing
  3.3 Information Extraction
  3.4 Gloss Translation
  3.5 Term Weight Calculation
  3.6 Event And Topic Discovery
  3.7 Story Link Detection
Chapter 4. Event And Topic Discovery
  4.1 Overview of Event and Topic discovery
  4.2 Event Discovery Component
    4.2.1 Overview of Event Discovery Algorithm
    4.2.2 Similarity Calculation
    4.2.3 Story and Event Combination
    4.2.4 Event Discovery Output
  4.3 Topic Discovery Component
    4.3.1 Overview of Topic Discovery Algorithm
    4.3.2 Relevance Model
    4.3.3 Event and Topic Combination
    4.3.4 Topic Discovery Output
Chapter 5. Event And Topic Discovery Experimental Results
  5.1 Testing Corpus
  5.2 Evaluation Methodology
  5.3 Experimental Results on Event Discovery
    5.3.1 Parameter Tuning
    5.3.2 Event Discovery Result
  5.4 Experimental Results on Topic Discovery
    5.4.1 Parameter Tuning
    5.4.2 Topic Discovery Results
Chapter 6. Story Link Detection
  6.1 Topic Types
  6.2 Overview of Link Detection Component
  6.3 Automatic Topic Type Categorization
    6.3.1 Training Data Preparation
    6.3.2 Feature Selection
    6.3.3 Training and Tuning Categorization Model
  6.4 Link Detection Algorithm
    6.4.1 Story Component Weight
    6.4.2 Story Link Similarity Calculation
  6.5 Story Link Detection Output
Chapter 7. Link Detection Experimental Results
  7.1 Testing Corpus
  7.2 Topic Type Categorization Result
  7.3 Link Detection Evaluation Methodology
  7.4 Experimental Results on Link Detection
    7.4.1 Language Normalization Factor Tuning
    7.4.2 Link Detection Performance
    7.4.3 Link Detection Performance Breakdown
Chapter 8. Conclusions and Future Work
  8.1 Conclusions
  8.2 Future Work
Chapter A. List of Topic Title Annotated for TDT3 corpus by LDC
Chapter B. List of Manually Annotated Events for TDT3 Corpus
Bibliography
|
53 |
Portable language technology: a resource-light approach to morpho-syntactic tagging / Feldman, Anna. January 2006
Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 258-273).
|
54 |
Entwurf und Implementierung eines Frameworks zur Analyse und Evaluation von Verfahren im Information Retrieval [Design and implementation of a framework for the analysis and evaluation of information retrieval methods] / Wilhelm, Thomas. 13 August 2008
This diploma thesis gives a brief introduction to information retrieval, with a focus on evaluation and evaluation campaigns. Starting from the drawbacks of an existing retrieval system, it then designs and implements a new retrieval framework for the experimental evaluation of information retrieval approaches. The framework's components are kept abstract enough that various existing retrieval systems, such as Apache Lucene or Terrier, can be integrated. A reference implementation for the ImageCLEF Photographic Retrieval Task of the ImageCLEF track of the Cross Language Evaluation Forum is used to test and confirm that the framework works.
|
55 |
Exploring the health experiences of Korean immigrant women in retirement / Choi, Jaeyoung. Unknown Date
No description available.
|
56 |
The processing of German Sign Language sentences / Three event-related potential studies on phonological, morpho-syntactic, and semantic aspects / Hosemann, Jana Alexandra. 10 April 2015
No description available.
|
57 |
Peer to peer English/Chinese cross-language information retrieval / Lu, Chengye. January 2008
Peer-to-peer systems are widely used on the internet. However, most peer-to-peer information systems still lack important features such as cross-language information retrieval (IR) and collection selection and fusion. Cross-language IR is a state-of-the-art research area in the IR community. It has not yet been used in any real-world IR system. Cross-language IR lets a user issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries and their collections are in multiple languages, so cross-language IR can help them find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue one query in Chinese and get documents in both languages.
The out-of-vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to this problem. However, extracting multiword lexical units (MLUs) from web content and selecting the correct translations from the extracted candidate MLUs remain two difficult problems in web-mining-based automated translation.
Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and normalized-score-based merging are well-known approaches to these problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine.
This thesis presents research on building a peer-to-peer IR system with cross-language IR and an advanced collection profiling technique for fusion. In particular, it first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora, together with an approach for selecting MLUs more accurately. It then proposes a collection profiling strategy that discovers not only collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer-to-peer IR environment. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system that cooperates through X.509 and the email system.
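The abstract names normalized-score-based merging as a well-known baseline for fusing results from uncooperative peers. As a rough illustration only, the sketch below uses min-max normalization, which is one common choice and not necessarily the variant the thesis discusses. The function names, the sample scores, and the optional `peer_weights` argument (a hypothetical stand-in for some estimate of each peer's retrieval quality) are all assumptions and do not come from the thesis.

```python
def min_max_normalize(results):
    """Scale one peer's (doc_id, score) list into the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so every document gets the same normalized score.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge(result_lists, peer_weights=None):
    """Merge per-peer result lists into one ranking by normalized score.

    result_lists: dict mapping peer name -> list of (doc_id, raw_score).
    peer_weights: hypothetical per-peer multipliers (default 1.0 each).
    """
    merged = []
    for peer, results in result_lists.items():
        weight = 1.0 if peer_weights is None else peer_weights.get(peer, 1.0)
        for doc, score in min_max_normalize(results):
            merged.append((doc, weight * score, peer))
    # Highest score first; ties keep their original order.
    return sorted(merged, key=lambda row: row[1], reverse=True)


# Invented example: the two peers score on incompatible scales.
sample = {
    "peer_a": [("a1", 12.0), ("a2", 3.0)],
    "peer_b": [("b1", 0.9), ("b2", 0.5), ("b3", 0.1)],
}
ranking = merge(sample)
downweighted = merge(sample, peer_weights={"peer_b": 0.5})
```

Because raw scores from different engines are not comparable, normalizing within each peer before sorting keeps one peer's scale from dominating the merged list. The weighting hook only hints at how retrieval performance could enter the ranking.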
|
58 |
Word embeddings for monolingual and cross-language domain-specific information retrieval / Ordinbäddningar för enspråkig och tvärspråklig domänspecifik informationssökning / Wigder, Chaya. January 2018
Various studies have shown that word embedding models are useful for a wide variety of natural language processing tasks. This thesis examines how word embeddings can be incorporated into domain-specific search engines for both monolingual and cross-language search. It tests various embedding-model hyperparameters, as well as methods for weighting the relative importance of words to a document or query. It also examines and tests methods for generating domain-specific bilingual embeddings. The system was compared to a baseline that used cosine similarity without word embeddings; for both the monolingual and bilingual search engines, monolingual embedding models improved performance over the baseline. However, bilingual embeddings, especially for domain-specific terms, tended to be of too poor quality to be used directly in the search engines.
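To make the comparison in this abstract concrete, the sketch below contrasts a term-overlap cosine baseline with cosine similarity over weighted averages of word vectors. It is a minimal illustration under stated assumptions: the two-dimensional `TOY_VECTORS`, the function names, and the optional `weights` mapping are invented here, and the thesis's actual models, hyperparameters, and weighting schemes are not described in this record.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bow_cosine(text_a, text_b):
    """Baseline: cosine similarity over raw term counts, with no embeddings."""
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    vocab = sorted(set(counts_a) | set(counts_b))
    return cosine([counts_a[t] for t in vocab], [counts_b[t] for t in vocab])


def embed(text, vectors, weights=None):
    """Weighted average of the word vectors in text; unknown words are skipped."""
    dim = len(next(iter(vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for token in text.split():
        if token not in vectors:
            continue
        w = 1.0 if weights is None else weights.get(token, 1.0)
        for i, x in enumerate(vectors[token]):
            total[i] += w * x
        weight_sum += w
    return total if weight_sum == 0 else [x / weight_sum for x in total]


def rank(query, docs, vectors, weights=None):
    """Rank documents by embedding cosine similarity to the query, best first."""
    q = embed(query, vectors, weights)
    scored = [(cosine(q, embed(d, vectors, weights)), d) for d in docs]
    return sorted(scored, reverse=True)


# Invented toy vectors: "heart" and "cardiac" are close, "tax" is unrelated.
TOY_VECTORS = {"heart": [1.0, 0.0], "cardiac": [0.9, 0.1], "tax": [0.0, 1.0]}
results = rank("heart", ["cardiac", "tax"], TOY_VECTORS)
```

In this toy setup the query "heart" shares no terms with the document "cardiac", so the term-overlap baseline scores the pair at zero, while the embedding-based ranking still places "cardiac" ahead of "tax".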
|
59 |
Portable language technology: a resource-light approach to morpho-syntactic tagging / Feldman, Anna. 19 September 2006
No description available.
|
60 |
Automatic construction of English/Chinese parallel corpus. January 2001
Li Kar Wing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 88-96). / Abstracts in English and Chinese.
Contents:
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTERS
Chapter 1. INTRODUCTION
  1.1 Application of corpus-based techniques
    1.1.1 Machine Translation (MT)
      1.1.1.1 Linguistic
      1.1.1.2 Statistical
      1.1.1.3 Lexicon construction
    1.1.2 Cross-lingual Information Retrieval (CLIR)
      1.1.2.1 Controlled vocabulary
      1.1.2.2 Free text
      1.1.2.3 Application corpus-based approach in CLIR
  1.2 Overview of linguistic resources
  1.3 Written language corpora
    1.3.1 Types of corpora
    1.3.2 Limitation of comparable corpora
  1.4 Outline of the dissertation
Chapter 2. LITERATURE REVIEW
  2.1 Research in automatic corpus construction
  2.2 Research in translation alignment
    2.2.1 Sentence alignment
    2.2.2 Word alignment
  2.3 Research in alignment of sequences
Chapter 3. ALIGNMENT AT WORD LEVEL AND CHARACTER LEVEL
  3.1 Title alignment
    3.1.1 Lexical features
    3.1.2 Grammatical features
    3.1.3 The English/Chinese alignment model
  3.2 Alignment at word level and character level
    3.2.1 Alignment at word level
    3.2.2 Alignment at character level: Longest matching
    3.2.3 Longest common subsequence (LCS)
    3.2.4 Applying LCS in the English/Chinese alignment model
  3.3 Reduce overlapping ambiguity
    3.3.1 Edit distance
    3.3.2 Overlapping in the algorithm model
Chapter 4. ALIGNMENT AT TITLE LEVEL
  4.1 Review of score functions
  4.2 The Score function
    4.2.1 (C matches E) and (E matches C)
    4.2.2 Length similarity
Chapter 5. EXPERIMENTAL RESULTS
  5.1 Hong Kong government press release articles
  5.2 Hang Seng Bank economic monthly reports
  5.3 Hang Seng Bank press release articles
  5.4 Hang Seng Bank speech articles
  5.5 Quality of the collections and future work
Chapter 6. CONCLUSION
Bibliography
|