1 |
A Study of the Effects of Two Reading Environments on L2 Readers’ Strategic Behaviors Toward Unknown WordsLee, Sang Kyo 09 September 2010 (has links)
No description available.
|
2 |
Text Harmonization Strategies for Phrase-Based Statistical Machine TranslationStymne, Sara January 2012 (has links)
In this thesis I aim to improve phrase-based statistical machine translation (PBSMT) in a number of ways by the use of text harmonization strategies. PBSMT systems are built by training statistical models on large corpora of human translations. This architecture generally performs well for languages with similar structure. If the languages are different for example with respect to word order or morphological complexity, however, the standard methods do not tend to work well. I address this problem through text harmonization, by making texts more similar before training and applying a PBSMT system. I investigate how text harmonization can be used to improve PBSMT with a focus on four areas: compounding, definiteness, word order, and unknown words. For the first three areas, the focus is on linguistic differences between languages, which I address by applying transformation rules, using either rule-based or machine learning-based techniques, to the source or target data. For the last area, unknown words, I harmonize the translation input to the training data by replacing unknown words with known alternatives. I show that translation into languages with closed compounds can be improved by splitting and merging compounds. I develop new merging algorithms that outperform previously suggested algorithms and show how part-of-speech tags can be used to improve the order of compound parts. Scandinavian definite noun phrases are identified as a problem forPBSMT in translation into Scandinavian languages and I propose a preprocessing approach that addresses this problem and gives large improvements over a baseline. Several previous proposals for how to handle differences in reordering exist; I propose two types of extensions, iterating reordering and word alignment and using automatically induced word classes, which allow these methods to be used for less-resourced languages. Finally I identify several ways of replacing unknown words in the translation input, most notably a spell checking-inspired algorithm, which can be trained using character-based PBSMT techniques. Overall I present several approaches for extending PBSMT by the use of pre- and postprocessing techniques for text harmonization, and show experimentally that these methods work. Text harmonization methods are an efficient way to improve statistical machine translation within the phrase-based approach, without resorting to more complex models.
|
3 |
Hybrid models for Chinese unknown word resolutionLu, Xiaofei 12 September 2006 (has links)
No description available.
|
4 |
中文動詞自動分類研究 / Automatic Classification of Chinese Unknown Verbs曾慧馨, Tseng, Hui-Hsin Unknown Date (has links)
本文提出以規則法與相似法將未知動詞自動分類至中研院詞庫小組(1993)的動詞分類標記上。規則法中的規則從訓練語料中訓練出,並加上未知動詞重疊的規律,包含率約二成五,正確率約86.86%∼91.32%。規則法的優點在於正確率高,但缺點在於可以處理的未知動詞數量太少。相似法利用與未知動詞的相似例子猜測未知動詞的可能分類,利用詞彙內部的訊息---詞基的詞類、語意類與詞彙結構來計算相似度。相似法的可以全面性的處理未知動詞,缺點容易受到訓練語料中標記錯誤的例子誤導與訓練語料的大小所影響。我們結合規則法與相似法預測未知動詞分類的正確率為72%。 / We present two methods to classify the Chinese unknown verbs. First, we summarize some linguistic rules and morphological patterns from corpus. The accuracy of the rule-based method is 86.86%~91.32%. Second, we use the instance-based categorization to classify the Chinese unknown words. The accuracy of the instance-based method is 67.86%~70.92% and the accuracy of the integrated classifier is about 72%.
|
Page generated in 0.0501 seconds