Global ETD Search

41	Automatic compilation of bilingual terminologies from comparable corpora Kontonatsios, Georgios Nikolaos January 2015 (has links) Bilingual terminological resources play a pivotal role in human and machine translation of technical text. Owing to the immense volume of newly produced terminology in the biomedical domain, existing resources suffer from low coverage and are available only for a limited number of languages. This creates a need for term alignment methods that accurately identify translations of terms. In this work, we focus on bilingual terminology induction from freely available comparable corpora, i.e. thematically related documents in two or more languages. We investigate different sources of information that determine translation equivalence, including: (a) the internal structure of terms (compositional clue), (b) the surrounding lexical context (contextual clue) and (c) the topic distribution of terms (topical clue). We present four novel compositional alignment methods and introduce several extensions over existing compositional, context-based and topic-based approaches. Furthermore, we combine the three translation clues in a single term alignment model and show substantial improvements over the individual translation signals considered in isolation. We examine the performance of the proposed term alignment methods on closely related language pairs (English-French, English-Spanish), on a more distant, low-resource language pair (English-Greek) and on an unrelated language pair (English-Japanese). As an application, we integrate automatically compiled bilingual terminologies with Statistical Machine Translation (SMT) systems to translate unknown terms more accurately. Results show that an up-to-date bilingual dictionary of terms improves the translation performance of SMT. 006.3
42	Experimentální překladač z češtiny do slovenštiny / Czech-Slovak Machine Translation Zachar, Lukáš January 2012 (has links) This thesis describes the ideas and theories behind machine translation, introduces the reader to the existing machine translation system Moses and, by utilizing it, proposes a system that is able to learn and later translate from Czech into Slovak.
43	Porovnáni metod česko-ruského automatického překladu / Comparison of Czech-Russian Machine Translation Methods Bílek, Karel January 2014 (has links) In this thesis, I present several methods of Czech-to-Russian machine translation, including both historical approaches and more modern ones, and including both phrase-based and rule-based systems. I first briefly describe the linguistic background of Czech and Russian, and their common history and differences. Then, I describe automating, building and improving some of the machine translation systems, together with their comparison, using both an automated metric and a limited human annotation. Meanwhile, I also describe the creation of several corpora of Czech-Russian parallel data and Russian monolingual data.
44	Vícejazyčné vyhledávání informací v oblasti medicíny / Cross-Lingual Information Retrieval in the Medical Domain Saleh, Shadi January 2020 (has links) In recent years, there has been an exponential growth of the digital content available on the Internet, which has correlated with the increasing number of non-English Internet users due to the spread of the Internet across the globe. This raises the importance of unlocking resources for those who want to look up information not limited to the languages they understand. For example, some people want to use the Internet to find medical content related to their health conditions (self-diagnosis) but do not have access to resources in their language. Cross-Lingual Information Retrieval (CLIR) breaks the language barriers by allowing search for documents written in a language different from the query language. This thesis tackles the task of CLIR in the medical domain and investigates the two main approaches: query translation (QT), where queries are machine translated to the language of documents, and document translation (DT), where documents are translated to the language of queries. We proceed with our research by employing Statistical Machine Translation (SMT) systems that are tuned for the QT approach and the DT approach in the medical domain for seven European languages (Czech, German, French, Spanish, Hungarian, Polish and Swedish) and...
45	Machine Translation and Translation Memory Systems: An Ethnographic Study of Translators’ Satisfaction Mohammadi Dehcheshmeh, Maryam January 2017 (has links) The translator’s workplace (TW) has undergone radical changes since microcomputers were introduced on the market and, as a result, digitization increased enormously. Existing translation-related technologies, such as machine translation (MT), were enhanced and others, such as translation memory (TM) systems, were developed. It is a noteworthy fact that implementing new translation-related technologies in the TW is done in various conditions according to specific goals that subsequently define new work conditions for translators. These new work conditions affect translators’ satisfaction with their job, and their satisfaction will influence career development and employee retention in the translation industry over the long term. In the past two decades, Language Service Providers (LSPs) have started integrating MT into TM systems to benefit from MT suggestions when TM is not helpful. Neither TM nor MT is unfamiliar to the translation industry, but the combination, i.e. TM+MT, is fairly new. So far, there have been few studies on translators’ satisfaction with TM+MT. This study consists of an ethnographic research project on seven translators in a Canada-based company where TM+MT is used. Observations, semi-structured interviews, and in-house document analysis have been used as data collection methods. The data obtained has been analyzed and discussed based on Rodríguez-Castro’s task satisfaction model (2011). This model addresses intrinsic and extrinsic sources of translators’ satisfaction with the activities they do in their job. Investigating the factors and variables of her model in the aforementioned company, I concluded that those sources of satisfaction cannot be considered separately from the job-context factors, such as the company’s policies in implementing TM+MT. 
Machine Translation Translation Memory Translators' Satisfaction Ethnographic Study
46	Some Contributions to Interactive Machine Translation and to the Applications of Machine Translation for Historical Documents Domingo Ballester, Miguel 28 February 2022 (has links) Historical documents are an important part of our cultural heritage. However, due to the language barrier inherent in human language and the linguistic properties of these documents, their accessibility is mostly limited to scholars. On the one hand, human language evolves with the passage of time. 
On the other hand, spelling conventions were not created until recently and, thus, orthography changes depending on the time period and author. For these reasons, the work of scholars is needed for non-experts to gain a basic understanding of a given document. In this thesis, we tackle two tasks related to the processing of historical documents. The first task is language modernization which, in order to make historical documents more accessible to non-experts, aims to rewrite a document using the modern version of the document's original language. The second task is spelling normalization. The aforementioned linguistic properties of historical documents pose an additional challenge for the effective natural language processing of these documents. Thus, this task aims to adapt a document's spelling to modern standards in order to achieve orthographic consistency. We approach both tasks from a machine translation perspective, considering a document's original language as the source language, and its modern/normalized counterpart as the target language. We propose several approaches based on statistical and neural machine translation, and carry out extensive experiments that show the potential of our contributions, with the statistical approaches yielding equal or better results than the neural approaches in most cases. For the language modernization task, these experiments include a human evaluation conducted with the help of scholars and a user study that verifies that our proposals are able to help non-experts to gain a basic understanding of a historical document without the intervention of a scholar. As with any machine translation problem, our applications are not error-free. Thus, to obtain perfect modernizations/normalizations, a scholar needs to supervise and correct the errors. This is a common procedure in the translation industry. 
The interactive machine translation framework aims to reduce the effort needed for obtaining high quality translations by embedding the human agent and the translation system into a cooperative correction process. However, most interactive protocols follow a left-to-right strategy. In this thesis, we developed a new interactive protocol that breaks this left-to-right barrier. We evaluated this new protocol in a machine translation environment, obtaining large reductions of the human effort. Finally, since this interactive framework is of general application to any translation problem, we applied it (our new protocol together with one of the classic left-to-right protocols) to language modernization and spelling normalization. As with machine translation, the interactive framework diminished the effort required for correcting the outputs of an automatic system. / The research leading to this thesis has been partially funded by Ministerio de Economía y Competitividad (MINECO) under projects SmartWays (grant agreement RTC-2014-1466-4), CoMUN-HaT (grant agreement TIN2015-70924-C2-1-R) and MISMISFAKEnHATE (grant agreement PGC2018-096212-B-C31); Generalitat Valenciana under projects ALMAMATER (grant agreement PROMETEOII/2014/030) and DeepPattern (grant agreement PROMETEO/2019/121); the European Union through Programa Operativo del Fondo Europeo de Desarrollo Regional (FEDER) from Comunitat Valenciana (2014–2020) under project Sistemas de frabricación inteligentes para la indústria 4.0 (grant agreement ID-IFEDER/2018/025); and the PRHLT research center under the research line Machine Learning Applications. / Domingo Ballester, M. (2022). Some Contributions to Interactive Machine Translation and to the Applications of Machine Translation for Historical Documents [Doctoral thesis]. Universitat Politècnica de València. 
https://doi.org/10.4995/Thesis/10251/181231 / Thesis Machine translation Statistical machine translation Neural machine translation Interactive machine translation Historical documents Spelling normalization Language modernization LENGUAJES Y SISTEMAS INFORMATICOS
47	Taming Translation Technology for L2 Writing: Documenting the Use of Free Online Translation Tools by ESL Students in a Writing Course Farzi, Reza January 2016 (has links) The present study explored the use of translation technology in second language (L2) writing by English as a Second Language (ESL) students at the University level. The appropriate role of translation, and specifically translation technology, in L2 curricula has been the subject of theoretical and practical debate. In order to address knowledge gaps relevant to this debate, the present study sought to document students’ current use of translation technology, specifically free online translation (FOT) tools, and their opinions about these tools. The study’s mixed-methods design included video observations and questionnaires regarding FOT use completed by 19 university students enrolled in a high intermediate-level ESL course. Semi-structured follow-up interviews were conducted with the six participants who were observed using FOT tools extensively on the video recordings. Results showed that high intermediate-level ESL students have a primarily positive attitude toward FOT tools. In addition, the majority of students reported using such tools regularly, even though only about one third of the students were actually observed using the tools significantly in the video recordings. Results are discussed in the context of the ongoing debate over whether and how translation technology should be used in L2 classrooms. Free Online Translation Tools ESL Writing Machine Translation Translation Technology
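48	Improving Statistical Machine Translation with Target-Side Dependency Syntax / 目的言語側の依存構文による統計的機械翻訳の改善 John, Walter Richardson 23 September 2016 (has links) Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / Degree No. Kō 20022 / Informatics Doctorate No. 617 / 新制||情||107 (University Library) / 33118 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Sadao Kurohashi, Prof. Katsumi Tanaka, Prof. Tatsuya Kawahara / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM machine translation dependency syntax MT SMT translation 007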
49	Online Machine Translator System and Result Comparison Syahrina, Alvi January 2011 (has links) Translation from one human language to another increasingly relies on advances in computing. There are many machine translators nowadays, each adopting a different machine translation approach. This thesis presents the distinction between two selected approaches, statistical machine translation (SMT) and hybrid machine translation (HMT). The research focuses on evaluating two machine translators that use these different approaches, through both textual studies and an evaluation experiment. The result of this research is an evaluation of the translator systems and of their translation output. It is hoped that this result will add to the body of knowledge on machine translators. / Program: Bachelor's Programme in Informatics online machine translator computer linguistic manual evaluation statistical machine translation hybrid machine translation Engineering and Technology
50	Improving the Quality of Neural Machine Translation Using Terminology Injection Dougal, Duane K. 01 December 2018 (has links) Most organizations use an increasing number of domain- or organization-specific words and phrases. A translation process, whether human or automated, must also be able to accurately and efficiently use these specific multilingual terminology collections. However, comparatively little has been done to explore the use of vetted terminology as an input to machine translation (MT) for improved results. In fact, no single established process currently exists to integrate terminology into MT as a general practice, and in particular no established process exists for neural machine translation (NMT) to ensure that the translation of individual terms is consistent with an approved terminology collection. The use of tokenization as a method of injecting terminology and of evaluating terminology injection is the focus of this thesis. I use the attention mechanism prevalent in state-of-the-art NMT systems to produce the desired results. Attention vectors play an important part in this method, helping to correctly identify semantic entities and to align the tokens that represent them. The methods presented in this thesis use these attention vectors to align the source tokens in the sentence to be translated with the target tokens in the final translation output. Then, the supplied terminology is injected where these alignments correctly identify semantic entities. My methods demonstrate significant improvement over the state-of-the-art results for NMT using terminology injection. attention-vector-based term injection injection machine translation MT neural machine translation neural network NMT semantics terminology Computer Sciences
