Global ETD Search

251	[en] AUTOMATIC TRANSLATION IN TRANSLATION MEMORY SYSTEMS: A COMPARATIVE STUDY OF TWO WORK METHODS / [pt] TRADUÇÃO AUTOMÁTICA EM AMBIENTES DE MEMÓRIA DE TRADUÇÃO: UM ESTUDO COMPARATIVO DE DOIS MÉTODOS DE TRABALHO JORGE MARIO DAVIDSON 26 October 2021 This dissertation addresses the use of machine translation in translation memory systems (computer-assisted translation, or CAT, tools), an increasingly common way of working in today's specialized translation market. An experimental study was conducted with four professional translators specializing in computing. Each translator translated two texts, one on technology marketing and the other a highly technical document, using different modes of work. The purpose of the study was to identify differences between using machine translation with segment-level post-editing and using machine translation as sub-segment suggestions. The translations were analyzed with computational linguistics methods using the following metrics: lexical diversity, lexical density, edit distance computed over part-of-speech sequences, and productivity. For comparison, the study also included fully human translations and raw machine translations that were not post-edited. The metrics revealed differences attributable to the modes of work and allowed the effects on the two text types to be compared. Finally, the different translations of one of the texts were submitted to readers for evaluation, to determine their preferences. [en] TRANSLATION [en] CAT TOOLS [en] POST-EDITING [en] TRANSLATION ASSESSMENT [en] MACHINE TRANSLATION
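Entry 251 names lexical diversity, lexical density and edit distance over part-of-speech sequences as its metrics. The sketch below shows one common way to compute each of them. The abstract specifies none of the following, so they are assumptions: type-token ratio as the diversity measure, a Universal POS tag set for deciding which words count as content words, and input that has already been tagged. The dissertation's exact definitions may differ.

```python
from typing import List, Sequence, Tuple

# (token, POS tag) pairs; assumed to come from any POS tagger.
Tagged = List[Tuple[str, str]]

# Assumption: these Universal POS tags mark content words.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def lexical_diversity(tagged: Tagged) -> float:
    """Type-token ratio: distinct lower-cased word forms / total tokens."""
    tokens = [token.lower() for token, _ in tagged]
    return len(set(tokens)) / len(tokens)


def lexical_density(tagged: Tagged) -> float:
    """Proportion of tokens whose POS tag marks a content word."""
    content = sum(1 for _, tag in tagged if tag in CONTENT_TAGS)
    return content / len(tagged)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences, e.g. two POS-tag sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Toy Portuguese segments, tagged by hand for illustration only.
    mt = [("o", "DET"), ("sistema", "NOUN"), ("salva", "VERB"),
          ("o", "DET"), ("arquivo", "NOUN")]
    post_edited = [("o", "DET"), ("sistema", "NOUN"), ("grava", "VERB"),
                   ("o", "DET"), ("arquivo", "NOUN"), ("automaticamente", "ADV")]
    print(lexical_diversity(mt), lexical_density(mt))
    print(edit_distance([t for _, t in mt], [t for _, t in post_edited]))
```

On the toy pair, the POS-level distance is 1. Replacing "salva" with "grava" leaves the tag sequence unchanged, so only the added adverb counts. Measuring edits over tags rather than words therefore isolates structural changes from purely lexical ones.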
252	Проблемы эквивалентного машинного перевода фразеологизмов (на материале вьетнамского и русского языков) : магистерская диссертация / Problems of equivalent machine translation of phraseological units (on the material of the Vietnamese and Russian languages): master's thesis Нгуен, Т. Т. Х., Nguyen, T. T. H. January 2019 The thesis examines the problems of achieving equivalence when translating phraseological units in the Vietnamese-Russian language pair. Everyone uses phraseological units in both speech and writing, so machine translation systems need to translate them correctly and clearly, selecting appropriate equivalents wherever possible. The thesis presents tools for the machine translation of phraseological units, analyzes the past, present and future of machine translation, and outlines the prospects for translating phraseological units by machine. MASTER'S THESIS INTELLECTUAL SYSTEMS MACHINE TRANSLATION PHRASEOLOGICAL UNIT PHRASE SENTENCE WORD
253	Head-to-head Transfer Learning Comparisons made Possible: A Comparative Study of Transfer Learning Methods for Neural Machine Translation of the Baltic Languages Stenlund, Mathias January 2023 It is difficult to train adequate machine translation (MT) models for low-resource language pairs with data-hungry neural machine translation (NMT) frameworks. This has created a need to compensate for the scarcity of sufficiently large parallel corpora. Several transfer learning methods have been proposed as solutions to this problem: a new model for a target task is initialized with parameters learned on a different, high-resource task. Many of these methods are claimed to improve the translation quality of NMT systems in some low-resource settings. However, their benefits are usually demonstrated with different parent and child language pairs, data sizes, NMT frameworks and training hyperparameters, which makes direct comparison impossible. In this thesis project, three such transfer learning methods are compared head-to-head in a controlled environment where the target task is to translate from the under-resourced Baltic languages Lithuanian and Latvian into English. The same parent language pairs, data sizes, data domains, transformer framework and training parameters are used throughout to ensure a fair comparison of the three methods. The experiments train and test models on every combination of transfer learning method, parent language pair, and in-domain or out-of-domain data, revealing the strengths and weaknesses of each. The results show that Multi-Round Transfer Learning improves overall translation quality the most but also requires by far the longest training time.
The Parameter Freezing method yields a marginally smaller overall improvement in translation quality but requires only half the training time, while Trivial Transfer Learning improves quality the least. Both Polish and Russian work well as parents for the Baltic languages, and web-crawled data improves out-of-domain translations the most. The results suggest that all three transfer learning methods are effective in a simulated low-resource environment. However, none of them can compete with simply having a larger data set for the target language pair, since none of them surpasses the strong higher-resource baseline. machine translation transfer learning Latvian Lithuanian low-resource languages transformers parent language child language comparative study
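Entry 253 compares Trivial Transfer Learning, Parameter Freezing and Multi-Round Transfer Learning. The toy sketch below is written in plain Python with hypothetical parameter-group names (encoder, decoder) rather than a real NMT toolkit. It illustrates the shared idea: the child model starts from the parent's weights, and parameter freezing simply skips the update for chosen parameter groups. The abstract does not say which groups the thesis freezes, so freezing the encoder here is an assumption.

```python
import copy
from typing import Dict, List, Set

# Hypothetical representation: parameter-group name -> flat list of weights.
Params = Dict[str, List[float]]


def init_child_from_parent(parent: Params) -> Params:
    """Initialize the child (low-resource) model with a copy of the parent's weights."""
    return copy.deepcopy(parent)


def sgd_step(params: Params, grads: Params, lr: float, frozen: Set[str]) -> None:
    """One plain SGD update on the child that leaves frozen groups untouched."""
    for name in list(params):
        if name in frozen:
            continue  # frozen groups keep the values inherited from the parent
        params[name] = [w - lr * g for w, g in zip(params[name], grads[name])]


if __name__ == "__main__":
    parent = {"encoder": [0.5, -0.2], "decoder": [0.1, 0.3]}
    child = init_child_from_parent(parent)
    grads = {"encoder": [1.0, 1.0], "decoder": [1.0, -1.0]}
    sgd_step(child, grads, lr=0.1, frozen={"encoder"})
    print(child)   # encoder unchanged, decoder updated
    print(parent)  # parent weights are not modified by child training
```

With frozen left empty, the same loop corresponds roughly to Trivial Transfer Learning, that is, continued training on the child data. Multi-Round Transfer Learning is not modeled here. Freezing halves the number of updated weights in this toy, which is consistent in spirit with the shorter training time reported, though real savings depend on the toolkit.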
254	Towards Digitization and Machine learning Automation for Cyber-Physical System of Systems Javed, Saleha January 2022 Cyber-physical systems (CPSs) connect the physical and digital domains and are often spatially distributed. CPSs are built on the Internet of Things (IoT) and the Internet of Services, which use cloud architectures to link swarms of devices over decentralized networks. Modern CPSs are undergoing a foundational shift as Industry 4.0 continually pushes the boundaries of digitization. From automating industrial manufacturing processes to interconnecting sensor devices within buildings, Industry 4.0 is about developing solutions for the digitized industry. Considerable engineering effort goes into designing dynamically scalable, robust automation solutions capable of integrating heterogeneous CPSs. Such heterogeneous systems must be able to communicate and exchange information with each other in real time, even if they are based on different underlying technologies, protocols, or semantic definitions in the form of ontologies. This development raises interoperability challenges and knowledge gaps that engineers and researchers are addressing. In particular, machine learning approaches are being considered to automate costly engineering processes. For example, challenges related to predictive maintenance and to the automatic translation of messages exchanged between heterogeneous devices are investigated using supervised and unsupervised machine learning. In this thesis, a machine-learning-based, collaboration- and automation-oriented IIoT framework named Cloud-based Collaborative Learning (CCL) is developed. CCL is based on a service-oriented architecture (SOA) and offers a scalable CPS framework that provides machine-learning-as-a-service (MLaaS). Furthermore, interoperability in the context of the IIoT is investigated.
I consider the ontology of an IoT device to be its language, and the structure of that ontology to be its grammar. In particular, aggregated language and structural encoders are investigated as a way to improve the alignment of entities in heterogeneous ontologies. Existing entity alignment techniques take different approaches to integrating structural information. They overlook the fact that a node pair with similar entity labels may not belong to the same ontological context, and vice versa. To address these challenges, a model is developed by modifying the BERT_INT model to operate on graph triples. The resulting model iteratively aligns heterogeneous IIoT ontologies and can align both nodes and relations. Compared with the state-of-the-art BERT_INT on the DBP15K language dataset, the developed model exceeds the baseline by 2.1% (measured by HR@1, HR@10 and MRR). This result motivated a proof of concept for empirically investigating the developed model on alignment between heterogeneous IIoT ontologies. For this purpose, a dataset was generated from smart building systems and from the SOSA and SSN ontology graphs. Experiments and analysis, including an ablation study on the proposed language and structural encoders, demonstrate the effectiveness of the model. The approach also points to future studies that extend beyond the scope of a single thesis. For instance, to strengthen the ablation study, a generalized IIoT ontology designed for any type of IoT device (not only sensors), such as SAREF, could be tested for ontology alignment. Another direction is to crowdsource a validation dataset for IIoT ontology alignment and annotation.
Lastly, this work can be considered a step towards enabling translation between heterogeneous IoT sensor devices. The proposed model could therefore be extended into a translation module that, given the ontology graph of any device, interprets the messages transmitted from that device. This idea is still at an abstract level and requires extensive effort and empirical study to mature. Digitization Automation Industry 4.0 Machine-to-Machine Translation Machine Learning Unsupervised Learning Condition Monitoring Ontology Alignment Computer Sciences
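Entry 254 reports its alignment results with HR@1, HR@10 (hit rate, also called Hits@k) and MRR (mean reciprocal rank). The sketch below uses the standard definitions. For each source entity, the candidate entities in the other ontology are ranked by similarity, and the metrics summarize the rank at which the correct counterpart appears. The similarity scores, which the thesis obtains from its BERT_INT-based encoders, are taken as given here. Breaking ties in favor of the correct entity is an assumption.

```python
from typing import Sequence


def rank_of_gold(scores: Sequence[float], gold: int) -> int:
    """1-based rank of the correct candidate; ties count in its favor (assumption)."""
    return 1 + sum(1 for i, s in enumerate(scores) if i != gold and s > scores[gold])


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """HR@k / Hits@k: share of source entities whose correct match is in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR: average of 1 / rank of the correct match."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical similarity scores of one source entity against three
    # candidates; candidate 2 is the correct counterpart.
    print(rank_of_gold([0.2, 0.9, 0.5], gold=2))
    ranks = [1, 3, 1, 12]  # hypothetical ranks for four source entities
    print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), mean_reciprocal_rank(ranks))
```

For the four hypothetical ranks, HR@1 = 0.5, HR@10 = 0.75 and MRR ≈ 0.604. HR@k only checks whether the correct match lands in the top k. MRR also rewards moving it from rank 3 to rank 1.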
255	Classroom Translanguaging Practices and Secondary Multilingual Learners in Indiana Woongsik Choi (16624299) 20 July 2023 Many multilingual learners who use a language other than English at home face academic challenges arising from the English monolingualism prevalent in the U.S. school system. English as a New Language (ENL) programs teach English to these learners but also help reinforce English monolingualism. Educational inclusivity and equity for multilingual learners require centering their holistic language repertoires in ENL classrooms. However, doing so can be challenging because of individual and contextual factors. Using translanguaging as a conceptual framework, this qualitative case study explores how high school multilingual learners' languages are used flexibly in ENL classes and how the students think about such classroom translanguaging practices. Over one semester at a high school in Indiana, I used ethnographic methods to observe ENL classroom activities and instructional practices, interview the participants, and collect photos and documents. The participants were an ENL teacher proficient in English and Spanish and four students from Puerto Rico, Mexico, Honduras and the Democratic Republic of the Congo. The students' language repertoires included Spanish, Lingala, French, Arabic and English. The findings describe both the difficulties and the possibilities of incorporating all students' multilingual-multisemiotic repertoires in ENL classes. Classroom language practices consisted primarily of Spanish and drawing. Some instructional activities and practices integrated the students' multilingual-multisemiotic repertoires well, such as the multigenre identity project and the teacher's use of Google Translate. When writing in English, the students frequently used machine translation such as Google Translate, in dynamic processes that involved evaluation. The students generally perceived these classroom translanguaging practices positively, but they variously regarded machine translation as a problem, a resource, or an opportunity. Based on these findings, I argue that multilingual learners' competence in using their own languages and machine translation technology freely and flexibly is a valuable resource for learning. This competence should be encouraged and developed in ENL classrooms. To this end, ENL teachers should use instructional activities and practices that take students' dynamic multilingualism into account. TESOL teacher education should develop this competence in teachers, and more multilingual resources should be provided to teachers. In multilingual classrooms with singleton students (students who are the only speakers of their home language in the class), building mutual understanding, empathy and equity-mindedness among class members should be prioritized. Finally, I recommend that evolving multilingual technologies such as machine translation be actively used as teaching and learning resources for multilingual learners. translanguaging multilingual learners English language learners (ELLs) English as a Second Language (TESOL) machine translation technology Google Translate
256	Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática [Contributions to connectionist language modeling and its application to sequence recognition and machine translation] Zamora Martínez, Francisco Julián 07 December 2012 Natural language processing is an application area of artificial intelligence, and in particular of pattern recognition. Among other things, it studies how to incorporate syntactic information (a language model) about how the words of a given language should be combined. This information allows recognition and translation systems to decide which hypothesis is the best, "common-sense" one. The area is very broad, and this work focuses solely on language modeling and its application to two tasks: sequence recognition with hidden Markov models, and statistical machine translation. Specifically, the central focus of this thesis is connectionist language models, that is, language models based on neural networks. The good results these models have achieved in various areas of natural language processing motivated this study. Connectionist language models suffer from certain computational problems. For this reason, the systems described in the literature are built in two completely decoupled stages. In the first stage, a standard language model is used to find a set of feasible hypotheses, on the assumption that this set is representative of the search space containing the best hypothesis. In the second stage, the connectionist language model is applied to this set and the highest-scoring hypothesis is selected. This procedure is called "rescoring". This scenario motivates the main objectives of this thesis:
• To propose a technique that drastically reduces this computational cost while degrading the quality of the solution found as little as possible.
• To study the effect of integrating connectionist language models into the search process of the proposed tasks.
• To propose modifications of the original model that improve its quality.
/ Zamora Martínez, FJ. (2012). Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18066 Artificial neural networks Language modeling Decoding Neural network language models Handwritten text recognition Spoken language understanding Automatic speech recognition Statistical machine translation LENGUAJES Y SISTEMAS INFORMATICOS (Computer Languages and Systems)
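Entry 256 describes the two-stage "rescoring" setup. First, a standard language model produces a set of candidate hypotheses (an n-best list). Then the connectionist language model rescores only those candidates. The sketch below makes two assumptions: the scores are combined log-linearly with a single weight, and a hypothetical dictionary of fixed log-probabilities stands in for a real neural language model.

```python
from typing import Callable, List, Tuple

# (hypothesis text, first-pass log-score from the standard model)
Hypothesis = Tuple[str, float]


def rescore(nbest: List[Hypothesis],
            nnlm_logprob: Callable[[str], float],
            weight: float) -> str:
    """Return the hypothesis maximizing first-pass score + weight * NNLM log-prob.

    The single-weight log-linear combination is an assumption for illustration.
    Only the n-best candidates are ever scored by the neural model.
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    best_text, _ = max(nbest, key=lambda h: h[1] + weight * nnlm_logprob(h[0]))
    return best_text


if __name__ == "__main__":
    nbest = [("the cat sat", -4.0), ("the cat sit", -3.8), ("a cat sat", -4.5)]
    # Hypothetical stand-in for a connectionist LM: fixed log-probabilities.
    toy_nnlm = {"the cat sat": -2.0, "the cat sit": -5.0, "a cat sat": -2.5}
    print(rescore(nbest, toy_nnlm.__getitem__, weight=0.0))
    print(rescore(nbest, toy_nnlm.__getitem__, weight=1.0))
```

With weight 0 the first pass wins ("the cat sit"). With weight 1 the neural scores flip the choice to "the cat sat". The neural model only sees what the first pass proposes, so the n-best list bounds what rescoring can recover. That limitation is what motivates the thesis goals of reducing the cost of connectionist models and integrating them directly into the search.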
257	Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories Valor Miró, Juan Daniel 06 November 2017 Technology-enhanced learning has grown strongly in recent years. Many new approaches, such as blended learning, flipped teaching, massive open online courses and open educational resources, now complement face-to-face lectures. In particular, video lectures are fast becoming an everyday educational resource in higher education across all of these approaches, and universities around the world are incorporating them into their existing curricula. Transcriptions and translations can increase the usefulness of these audiovisual assets, but they are rarely available because cost-effective ways of producing them are lacking. Transcriptions bring many benefits: they make lectures searchable, improve accessibility for people with impairments, and make lectures translatable for foreign students. They also support plagiarism detection, content recommendation, note-taking and the discovery of related videos. For this reason, the aim of this thesis is to test, in real-life case studies, cost-effective ways of obtaining multilingual captions for video lectures using state-of-the-art automatic speech recognition and machine translation. Automatic subtitles are unfortunately not error-free, so we also explore interaction protocols for reviewing these automatic transcriptions and translations. In addition, we go a step further in multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions. / Valor Miró, JD. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Doctoral thesis]. Universitat Politècnica de València.
https://doi.org/10.4995/Thesis/10251/90496 Machine translation Educational technologies Video lecture repositories Usability study Computer-assisted transcription interface design strategies Automatic speech recognition Online learning Multilingual video lectures LENGUAJES Y SISTEMAS INFORMATICOS (Computer Languages and Systems)
258	Interactivity, Adaptation and Multimodality in Neural Sequence-to-sequence Learning Peris Abril, Álvaro 07 January 2020 The sequence-to-sequence problem consists of transforming an input sequence into an output sequence. A variety of problems can be posed in these terms, including machine translation, speech recognition and multimedia captioning. In recent years, deep neural networks have revolutionized these fields and achieved impressive advances. Despite these improvements, however, the output of automatic systems is still far from perfect. To achieve high-quality predictions, fully automatic systems must be supervised by a human agent who corrects the errors, which is a common procedure in the translation industry. This thesis focuses mainly on machine translation, tackled with fully neural systems. Our main objective is to develop more efficient neural machine translation systems that allow for more productive use and deployment of the technology. To this end, we base our contributions on two cornerstones: how to make better use of the system, and how to better leverage the data generated while it is used. First, we apply the so-called interactive-predictive framework to neural machine translation. This framework embeds the human agent and the system in a cooperative correction process that seeks to reduce the human effort spent on obtaining high-quality translations. We develop interactive protocols for neural machine translation, namely a prefix-based protocol and a segment-based protocol, implemented by modifying the search space of the model.
Moreover, we introduce mechanisms for achieving fine-grained interaction while maintaining the decoding speed of the system. We carried out extensive experiments that show the potential of our contributions. The previous state of the art is outperformed by a large margin, and the current systems react better to human interactions. Next, we study how to improve a neural system using the data generated as a byproduct of this correction process. To this end, we rely on two main learning paradigms: online and active learning. Under the first, the system is updated on the fly, as soon as a sentence is corrected. Hence, the system continuously learns from the corrections, avoiding previous errors and specializing towards a given user or domain. Extensive experiments tested the adaptive systems under different conditions and domains, demonstrating their capabilities. We also carried out a human evaluation of the system involving professional users, who were very pleased with the adaptive system and worked more efficiently with it. The second paradigm, active learning, is devised for translating huge amounts of data that are infeasible to supervise completely. In this scenario, the system selects the samples that are worth supervising and leaves the rest automatically translated. Applying this framework, we reduced the effort required to reach a desired translation quality by approximately a quarter. The neural approach also obtained large improvements over previous translation technologies. Finally, we address another challenging problem: visual captioning, which consists of generating a natural-language description of a visual object, namely an image or a video. We follow the sequence-to-sequence framework from a multimodal perspective. We start by tackling the task of generating captions for videos from a general domain.
Next, we move on to a more specific case: describing events from egocentric images acquired throughout the day. Since these events are consecutive, we aim to extract relationships between events in order to generate more informed captions. The context-aware model improved the generation quality with respect to a regular one. Finally, we apply the interactive-predictive protocol to these multimodal captioning systems, reducing the effort required to correct their outputs. / Section 5.4 describes a user evaluation of an adaptive translation system. This was done in collaboration with Miguel Domingo and the company Pangeanic, with funding from the Spanish Center for Technological and Industrial Development (Centro para el Desarrollo Tecnológico Industrial). [...] Most of Chapter 6 is the result of a collaboration with Marc Bolaños, supervised by Prof. Petia Radeva, from Universitat de Barcelona/CVC. This collaboration was supported by the R-MIPRCV network, under grant TIN2014-54728-REDC. / Peris Abril, Á. (2019). Interactivity, Adaptation and Multimodality in Neural Sequence-to-sequence Learning [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/134058 Traducción automática Reconocimiento de formas Aprendizaje automático Redes neuronales Neural Networks Deep Learning Machine Translation Pattern Recognition Machine Learning LENGUAJES Y SISTEMAS INFORMATICOS
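The prefix-based interactive-predictive protocol summarized in this abstract can be illustrated with a simulated user, a common way to evaluate interactive translation. The sketch below is a toy under stated assumptions, not the thesis's implementation: a hypothetical hard-coded bigram table (`BIGRAMS`) stands in for the neural decoder, greedy search replaces beam search, and the simulated user corrects only the first wrong word. The validated prefix plus the corrected word then constrains the next completion, which is the "modified search space" idea in miniature.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "model": scores for the next target word given the previous
# word. A real NMT system would score continuations with a neural decoder.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"house": 0.5, "green": 0.3, "home": 0.2},
    "green": {"house": 0.9, "</s>": 0.1},
    "house": {"is": 0.7, "</s>": 0.3},
    "is": {"small": 0.6, "big": 0.4},
    "big": {"</s>": 1.0},
    "small": {"</s>": 1.0},
}


def complete(prefix: List[str], max_len: int = 20) -> List[str]:
    """Greedily extend a user-validated prefix; the prefix itself is never changed."""
    hyp = list(prefix)
    last = hyp[-1] if hyp else "<s>"
    while len(hyp) < max_len:
        options = BIGRAMS.get(last, {})
        if not options:
            break
        nxt = max(options, key=options.get)  # highest-scoring next word
        if nxt == "</s>":
            break
        hyp.append(nxt)
        last = nxt
    return hyp


def first_mismatch(hyp: List[str], ref: List[str]) -> int:
    """Index of the first differing word (or the shorter length if one is a prefix)."""
    for i, (h, r) in enumerate(zip(hyp, ref)):
        if h != r:
            return i
    return min(len(hyp), len(ref))


def simulate_user(reference: List[str]) -> Tuple[List[str], int]:
    """Simulated user: accept the correct prefix, type one word, let the system re-complete."""
    prefix: List[str] = []
    corrections = 0
    while True:
        hyp = complete(prefix)
        if hyp == reference:
            return hyp, corrections
        i = first_mismatch(hyp, reference)
        corrections += 1
        if i >= len(reference):
            # Hypothesis is too long: the user marks the end of the sentence.
            return list(reference), corrections
        # The validated prefix grows strictly, so the loop always terminates.
        prefix = reference[: i + 1]


hyp, n = simulate_user(["the", "green", "house", "is", "big"])
print(" ".join(hyp), n)  # prints: the green house is big 2
```

In this toy run, typing "green" also lets the system fix the words that follow, so two corrections suffice for a five-word sentence; counting corrections relative to reference length is the intuition behind word-stroke-style effort measures for interactive translation.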
259	Evaluación de la calidad de la traducción automática de reseñas turísticas en línea desde la perspectiva de la localización [Quality assessment of the machine translation of online tourism reviews from the perspective of localization] Rosa Sorlozano, Maria Carmen 21 June 2024 (has links) [ES] Internet users have become active contributors to Web 2.0. Recent studies reveal that seven out of 10 internet users worldwide trust the opinions and reviews posted online by other users. Likewise, according to recent statistics from Spanish tourism agencies, internet use has grown by more than 29%: almost all users (99.2%) use it to search for information, 76.5% use it to make reservations, and 52.4% use it to pay for services. Despite the great potential and business volume of this sector, review platforms (for both travel and restaurant reservations) rely solely on raw machine translation (MT) of the reviews that consumers leave on these pages, with no processing or revision. In parallel, over the last 30 years translation studies have given a key role to the subdiscipline of localization, which consists of adapting the message to the user's specific linguistic and cultural preferences. For this reason, this research focuses on analyzing a corpus of machine-translated reviews to identify error patterns and their effects on text quality according to parameters studied in localization. Areas such as tourism, finance, and marketing could benefit from improving their translation processes, since a better understanding of the advertised services makes it easier for consumers to interact with those products and services. / [CA] Internet users have become active contributors to Web 2.0. Recent studies reveal that seven out of 10 internet users worldwide trust the opinions and reviews posted online by other users.
Likewise, according to recent statistics from Spanish tourism agencies, internet use has grown by more than 29%: almost all users (99.2%) use it to search for information, 76.5% use it to make reservations, and 52.4% use it to pay for services. Despite the great potential and business volume of this sector, review platforms (for both travel and restaurant reservations) rely solely on raw machine translation (MT) of the reviews that consumers leave on these pages, with no processing or revision. In parallel, over the last 30 years translation studies have given a key role to the subdiscipline of localization, which consists of adapting the message to the user's specific linguistic and cultural preferences. For this reason, this research focuses on analyzing a corpus of machine-translated reviews to identify error patterns and their effects on text quality according to parameters studied in localization. Areas such as tourism, finance, and marketing could benefit from improving their translation processes, since a better understanding of the advertised services makes it easier for consumers to interact with those products and services. / [EN] Internet users have become active contributors to Web 2.0. Recent studies reveal that seven out of 10 internet users worldwide rely on the opinions and reviews posted online by other users. Similarly, according to recent statistics from the main Spanish tourism agencies, internet usage has grown by more than 29%: almost all users (99.2%) use it to search for information, 76.5% use it to make reservations, and 52.4% use it to pay for services. Despite the great potential and turnover of this sector, review platforms (for both travel and restaurant reservations) rely solely on raw machine translation (MT) of the reviews that consumers leave on these pages, with no processing or revision.
In parallel, translation studies have attributed a major role over the last 30 years to the sub-discipline of localization, which consists of adapting the message to the user's specific linguistic and cultural preferences. For this reason, this research focuses on the analysis of a corpus of machine-translated reviews to identify error patterns and their effects on text quality according to parameters studied in localization. Areas such as tourism, finance and marketing could benefit from improving their translation processes, since a better understanding of the advertised services facilitates consumer interaction with those services. / Rosa Sorlozano, MC. (2024). Evaluación de la calidad de la traducción automática de reseñas turísticas en línea desde la perspectiva de la localización [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/205487 Machine Translation User-generated content Localisation User-generated reviews Postediting Traducción automática Contenido generado por el usuario Localización Posedición Reseñas generadas por el usuario FILOLOGIA INGLESA
260	Разработка пользовательского интерфейса для системы устного перевода: особенности и проблемы : магистерская диссертация / Development of user interface for interpreting system: peculiarities and problems Демкин, А. А., Demkin, A. A. January 2024 (has links) In this graduation qualification work, an interpreting system has been developed with an emphasis on improving the user interface. The research identified key aspects affecting the convenience and efficiency of user interaction with the system, such as multilingualism, intuitive comprehensibility, and adaptation to different cultural contexts. An interface combining voice control and graphical elements and supporting English and Russian was developed. Functionality testing showed that the application is stable and reliable. Recommendations are given for improving contextual accuracy, adapting the system for people with disabilities, extending its functionality, and ensuring data privacy. The work makes a significant contribution to the development of methods for creating and evaluating user interfaces, supporting progress in computer science and intercultural communication. / In this graduate qualification work, an interpreting system focused on improving the user interface is developed. The research identifies the key aspects influencing the convenience and efficiency of user interaction with the system, such as multilingualism, intuitive comprehensibility, and adaptation to different cultural contexts. An interface combining voice control and graphical elements and supporting English and Russian was developed. A performance test was conducted, showing the stability and reliability of the application. Recommendations are made for improving contextual accuracy, adapting the system for people with disabilities, extending functionality, and ensuring data privacy.
The work represents a significant contribution to the development of methods for creating and evaluating user interfaces, supporting advances in computer science and intercultural communication. MASTER'S THESIS INTERPRETING SYSTEM USER INTERFACE MULTILINGUALISM VOICE CONTROL MACHINE TRANSLATION МУЛЬТИЯЗЫЧНОСТЬ ГОЛОСОВОЕ УПРАВЛЕНИЕ МАШИННЫЙ ПЕРЕВОД
