Global ETD Search

101	Cohesion and Comprehensibility in Polish-English Machine Translated Texts Weiss, Sandra January 2011
This paper is a study of Polish-English machine translation that investigates the impact of various types of errors on the cohesion and comprehensibility of the translations. The study focused on the following phenomena: 1. The most common errors produced by current state-of-the-art MT systems for Polish-English MT. 2. The effect of various types of errors on text cohesion. 3. The effect of various types of errors on readers’ understanding of the translation. Machine Translation Evaluation Cohesion Polish MT English MT Specific Languages Studier av enskilda språk
102	Evaluation of the user interface of the BLAST annotation tool Kondapalli, Vamshi Prakash January 2012
Annotations are, in general, notes made on a text while reading, for example by highlighting or underlining. In a machine translation system, such marking of text is treated as error annotation, which gives information about the classification of translation errors. The main focus of this thesis was to evaluate the graphical user interface of an annotation tool called BLAST, which can be used to perform human error analysis for any language and any machine translation system. The primary intended use of BLAST is the annotation of translation errors. The evaluation focuses on identifying usability and understandability issues and on proposing a redesign to overcome them. The subjects were allowed to explore BLAST, and the usage and performance of the tool were observed and later explained. Five participants took part in this usability study and were asked to perform user tasks designed to evaluate the usability of the tool. The required data were collected from these user tasks through interviews, observation and a questionnaire, and were analyzed using both quantitative and qualitative approaches. The participants' technical knowledge and their interest in experimenting with a new interface influenced the evaluation of the tool. The problems individuals faced during the evaluation were identified, and solutions to those problems were worked out. Finally, a redesign of BLAST was proposed as a way to overcome these problems. I proposed a few designs addressing the issues found in the design of the interface. The designs can be adapted to the existing system or implemented from scratch. The proposed interface designs could also be the subject of a further evaluation study. Usability Evaluation Annotation Error analysis Machine translation Computer Sciences Datavetenskap (datalogi)
103	Lexical selection for machine translation Sabtan, Yasser Muhammad Naguib Mahmoud January 2011
Current research in Natural Language Processing (NLP) tends to exploit corpus resources as a way of overcoming the problem of knowledge acquisition. Statistical analysis of corpora can reveal trends and probabilities of occurrence, which have proved to be helpful in various ways. Machine Translation (MT) is no exception to this trend. Many MT researchers have attempted to extract knowledge from parallel bilingual corpora. The MT problem is generally decomposed into two sub-problems: lexical selection and reordering of the selected words. This research addresses the problem of lexical selection of open-class lexical items in the framework of MT. The work reported in this thesis investigates different methodologies to handle this problem, using a corpus-based approach. The current framework can be applied to any language pair, but we focus on Arabic and English. This is because Arabic words are hugely ambiguous and thus pose a challenge for the current task of lexical selection. We use a challenging Arabic-English parallel corpus, containing many long passages with no punctuation marks to denote sentence boundaries. This points to the robustness of the adopted approach. In our attempt to extract lexical equivalents from the parallel corpus we focus on the co-occurrence relations between words. The current framework adopts a lexicon-free approach towards the selection of lexical equivalents. This has the double advantage of investigating the effectiveness of different techniques without being distracted by the properties of the lexicon and at the same time saving much time and effort, since constructing a lexicon is time-consuming and labour-intensive. Thus, we use as little, if any, hand-coded information as possible. The accuracy score could be improved by adding hand-coded information.
The point of the work reported here is to see how well one can do without any such manual intervention. With this goal in mind, we carry out a number of preprocessing steps in our framework. First, we build a lexicon-free Part-of-Speech (POS) tagger for Arabic. This POS tagger uses a combination of rule-based, transformation-based learning (TBL) and probabilistic techniques. Similarly, we use a lexicon-free POS tagger for English. We use the two POS taggers to tag the bi-texts. Second, we develop lexicon-free shallow parsers for Arabic and English. The two parsers are then used to label the parallel corpus with dependency relations (DRs) for some critical constructions. Third, we develop stemmers for Arabic and English, adopting the same knowledge-free approach. These preprocessing steps pave the way for the main system (or proposer) whose task is to extract translational equivalents from the parallel corpus. The framework starts with automatically extracting a bilingual lexicon using unsupervised statistical techniques which exploit the notion of co-occurrence patterns in the parallel corpus. We then choose the target word that has the highest frequency of occurrence from among a number of translational candidates in the extracted lexicon in order to aid the selection of the contextually correct translational equivalent. These experiments are carried out on either raw or POS-tagged texts. Having labelled the bi-texts with DRs, we use them to extract a number of translation seeds to start a number of bootstrapping techniques to improve the proposer. These seeds are used as anchor points to resegment the parallel corpus and start the selection process once again. The final F-score for the selection process is 0.701. We have also written an algorithm for detecting ambiguous words in a translation lexicon and obtained a precision score of 0.89. 006.3
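Illustrative sketch (not taken from the thesis): the abstract above describes extracting a bilingual lexicon from co-occurrence patterns in a parallel corpus and then choosing the most frequent candidate. A minimal version of that idea might look like the following. The Dice coefficient used to rank candidates, the reading of "frequency" as a co-occurrence count, the function names, the top_k cutoff and the toy transliterated corpus are all assumptions made for illustration; the thesis's own statistics and data are not given in the abstract.

```python
from collections import Counter, defaultdict


def extract_lexicon(bitext, top_k=2):
    """Build a candidate lexicon from sentence-aligned (source, target) token lists.

    Candidates are ranked with the Dice coefficient, an assumed stand-in for the
    unspecified "unsupervised statistical techniques" in the abstract.
    """
    src_freq, tgt_freq = Counter(), Counter()
    cooc = defaultdict(Counter)
    for src_tokens, tgt_tokens in bitext:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_freq.update(src_set)   # number of sentence pairs containing each source word
        tgt_freq.update(tgt_set)   # number of sentence pairs containing each target word
        for s in src_set:
            cooc[s].update(tgt_set)  # number of pairs in which s and t co-occur

    lexicon = {}
    for s, counts in cooc.items():
        ranked = sorted(
            counts,
            key=lambda t: (-2 * counts[t] / (src_freq[s] + tgt_freq[t]), t),
        )
        lexicon[s] = ranked[:top_k]  # keep only the best-scoring candidates
    return lexicon, cooc


def select_equivalent(word, lexicon, cooc):
    """Pick the candidate that co-occurs most often with `word` (None if unseen)."""
    candidates = lexicon.get(word, [])
    if not candidates:
        return None
    return max(candidates, key=lambda t: (cooc[word][t], t))


# Hypothetical toy corpus: transliterated Arabic source, English target.
bitext = [
    (["kitab", "jadid"], ["new", "book"]),
    (["kitab", "qadim"], ["old", "book"]),
    (["bayt", "jadid"], ["new", "house"]),
    (["bayt", "qadim"], ["old", "house"]),
]
lexicon, cooc = extract_lexicon(bitext)
for word in ["kitab", "jadid", "bayt", "qadim"]:
    print(word, "->", select_equivalent(word, lexicon, cooc))
```

On this toy corpus each source word is paired with the target word it co-occurs with in every sentence pair where it appears. A real corpus would additionally need the tagging, parsing and stemming steps that the abstract lists before selection.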
104	Refinements in hierarchical phrase-based translation systems Pino, Juan Miguel January 2015
The relatively recently proposed hierarchical phrase-based translation model for statistical machine translation (SMT) has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses in various modules for hierarchical phrase-based systems. We took the opportunity of increasing amounts of available training data for machine translation as well as existing frameworks for distributed computing in order to build better infrastructure for extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar using the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models in extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation and demonstrate translation improvements in Chinese to English translation. This thesis also proposes refinements in grammar and language modelling both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for grammar and language model cross-domain adaptation.
We also study interactions between first-pass and second-pass language model in terms of size and n-gram order. Finally, we analyse two smoothing methods for large 5-gram language model rescoring. The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we consider as a means to study the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system. 410.285
105	L’utilisation de la traduction automatique en contexte professionnel : étude de cas concernant les perceptions de la traduction automatique ainsi que son utilisation en contexte professionnel / The use of machine translation in a professional context: a case study of perceptions of machine translation and of its use in a professional context Lavigne, Pierre-Étienne January 2017
Abstract: The growing demand in the translation market is not easy to meet, which is pushing a number of translators and translation service providers to look for translation aids that can complement or replace those based on translation memories (TM) (Lewis 1997: 256; O'Brien 2002: 99, 105-106; Saint-André 2015: 1-8). Some have turned to machine translation (MT), which appears to allow certain productivity gains when used in combination with a TM tool (Guerberof 2009; Lewis 1997: 256; O'Brien 2002: 99, 105-106). However, the use of MT by translators in a real organizational setting seems to have received little study. The present study, conducted in collaboration with the translation team of the Canadian Institutes of Health Research (CIHR), aimed to assess whether combining TM tools with MT tools can genuinely raise translators' productivity in this type of setting. To that end, the study comprised an analysis of translators' perceptions of TM and MT tools and an experiment in which texts were translated using a TM alone or using a TM together with an MT system. The data from both parts of the research were then analyzed to evaluate the participants' perceptions of TM and MT tools, to determine whether using TM and MT tools together made it possible to reach higher productivity levels than using TM tools alone, and to check whether the participants' perceptions of the tools used had influenced the productivity levels reached.
The study thus contributes to a deeper understanding of the usefulness of MT in a real organizational setting and of translators' perceptions of TM and MT as translation aids. translation memories machine translation perceptions productivity
106	The enhancement of machine translation for low-density languages using Web-gathered parallel texts. Mohler, Michael Augustine Gaylord 12 1900
The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for their speakers to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts (texts in two languages that are translations of one another), which most languages lack. This thesis describes the Babylon project, which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web, specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve the translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle. Machine translation text alignment biblical texts Nahuatl Quechua parallel texts low-density languages Machine translating.
107	Towards Communicating Simple Sentence using Pictorial Representations Leong, Chee Wee 05 1900
Language can sometimes be an impediment in communication. Whether we are talking about people who speak different languages, students who are learning a new language, or people with language disorders, the understanding of linguistic representations in a given language requires a certain amount of knowledge that not everybody has. In this thesis, we propose "translation through pictures" as a means for conveying simple pieces of information across language barriers, and describe a system that can automatically generate pictorial representations for simple sentences. Comparative experiments conducted on visual and linguistic representations of information show that a considerable amount of understanding can be achieved through pictorial descriptions, with results within a comparable range of those obtained with current machine translation techniques. Moreover, a user study conducted around the pictorial translation system reveals that users found the system to generally produce correct word/image associations, and rated the system as interactive and intelligent. Visual communication. pictorial translation pictures communication machine translation translation paradigm language translation
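108	Improving Mutual Understanding in Machine Translation Mediated Communication / 機械翻訳を介したコミュニケーションにおける相互理解の改善 Mondheera, Pituxcoosuvarn 23 March 2020
Degree program name to be noted: Collaborative Graduate Program in Design / Kyoto University / 0048 / New system, course doctorate / Doctor of Informatics / 甲第22579号 / 情博第716号 / 新制||情||123 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 緒方広明, Program-Specific Associate Professor LIN Donghui, Professor 河原達也, Professor 石田亨 (Professor Emeritus, Kyoto University) / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Intercultural Collaboration Multilingual Communication Machine Translation Mutual Understanding Cultural Difference 007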
109	Word Reordering for Statistical Machine Translation via Modeling Structural Differences between Languages / 統計的機械翻訳のための言語構造の違いのモデル化による語順推定 Goto, Isao 23 May 2014
Full text replaced on 2015-05-27 / Kyoto University / 0048 / New system, course doctorate / Doctor of Informatics / 甲第18481号 / 情博第532号 / 新制||情||94 (University Library) / 31359 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 黒橋禎夫, Professor 田中克己, Professor 河原達也 / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM reordering statistical machine translation distortion model inversion transduction grammar projection 007
110	Překladač z češtiny do slovenštiny / Czech-Slovak Machine Translation Mydliar, Ján January 2013
This master's thesis deals with machine translation from Czech to Slovak. The first chapter motivates the work, the second discusses various approaches to machine translation and the third details evaluation of the methods. Chapter 4 introduces the design and implementation of my system, paying special attention to a new parallel corpus that has been created. Chapter 5 summarizes testing and evaluation of the developed system.
