11 |
Tradução grafema-fonema para a língua portuguesa baseada em autômatos adaptativos / Grapheme-phoneme translation for Portuguese based on adaptive automata. Shibata, Danilo Picagli, 25 March 2008.
This work presents a study on the use of adaptive devices for text-to-speech translation. It focuses on a grapheme-phoneme translation method for Portuguese based on adaptive automata, and on the use of this method in text-to-speech software. The method mimics human behavior in handling stress rules, syllable separation, and the influence neighboring syllables exert on one another. Because these characteristics are invariant across regional and historical varieties of Portuguese, the method is easily adapted to other variants of the language. The contemporary variety spoken in São Paulo, Brazil was chosen as the target for analysis and testing. For this variety the model performs well, exceeding 95% accuracy in grapheme-phoneme translation of words, reaching 90% accuracy when the resolution of ambiguities raised by words with two possible phonetic representations is taken into account, and producing speech intelligible to native speakers through syllable-based concatenative synthesis. Beyond the adaptive-automata translation model itself, the work contributes a method for choosing the correct phonetic representation in ambiguous cases and two software tools: one for simulating adaptive automata and one for grapheme-phoneme translation of words using the proposed model and disambiguation method. The latter tool was combined with the speech synthesizer developed by Koike et al. (2007) to build a text-to-speech translator for Portuguese. The work demonstrates the feasibility of using adaptive automata as the core of, or as an auxiliary element in, text-to-speech translation for Portuguese.
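As a toy illustration of the kind of grapheme-phoneme mapping the abstract describes (not the adaptive-automata formalism itself), a longest-match rule table can be sketched in Python; all rules and phone symbols below are invented for illustration:

```python
# Illustrative sketch only: a toy longest-match grapheme-to-phoneme mapper.
# The thesis uses adaptive automata; this flat rule table merely illustrates
# the style of mapping such an automaton encodes. Rules are invented.

RULES = {
    "ch": "S",   # digraph "ch" -> a single fricative phone
    "lh": "L",   # digraph "lh"
    "nh": "J",   # digraph "nh"
    "c": "k",
    "a": "a",
    "s": "s",
    "o": "o",
}

def g2p(word: str) -> str:
    phones = []
    i = 0
    while i < len(word):
        # try the longest grapheme (digraph) first, then a single letter
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += size
                break
        else:
            phones.append(word[i])  # pass unknown graphemes through unchanged
            i += 1
    return "".join(phones)

print(g2p("chao"))  # longest match prefers "ch" over "c"
```

A real system also needs the context sensitivity (stress, syllable neighbors) that motivates adaptive automata in the thesis; a static table like this cannot express it.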
|
12 |
De l'utilisation de mesures de confiance en traduction automatique : évaluation, post-édition et application à la traduction de la parole / On the use of confidence measures in machine translation: evaluation, post-editing and application to speech translation. Raybaud, Sylvain, 5 December 2012.
This doctoral thesis addresses confidence estimation for machine translation and statistical machine translation of large-vocabulary spontaneous speech. It formalizes the confidence estimation problem and treats it experimentally under the paradigm of multivariate classification and regression. The performance of the different techniques is evaluated, results obtained during the WMT2012 international evaluation campaign are presented, and an application to expert post-editing of automatically translated documents is described. The thesis then turns to machine translation of speech. After reviewing the specific characteristics of the spoken medium and the particular challenges it raises, it proposes original methods to address them, notably phonetic confusion networks, confidence measures, and speech segmentation techniques. Finally, the prototype is shown to rival state-of-the-art systems of more conventional design.
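The abstract frames confidence estimation as classification/regression over word-level features. A minimal sketch of that framing, with a logistic model whose features and weights are invented here purely for illustration (the thesis evaluates far richer feature sets):

```python
# Minimal sketch of word-level confidence estimation as binary classification.
# Features and weights are hypothetical; a real system would learn them from
# annotated data (e.g. words labeled correct/incorrect against references).
import math

def confidence(features, weights, bias=0.0):
    """Logistic model: estimated P(word is correct | features)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical features: [translation posterior, LM score, word length]
weights = [2.0, 1.0, -0.1]
good = confidence([0.9, 0.5, 4], weights)    # a likely-correct word
bad = confidence([0.1, -1.2, 9], weights)    # a likely-wrong word
print(good > 0.5 > bad)
```

Thresholding such scores is what enables the post-editing application the abstract mentions: flagging low-confidence words for human attention.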
|
14 |
Text and Speech Alignment Methods for Speech Translation Corpora Creation: Augmenting English LibriVox Recordings with Italian Textual Translations. Della Corte, Giuseppe, January 2020.
The recent rise of end-to-end speech translation models calls for a new generation of parallel corpora, composed of large amounts of source-language speech utterances aligned with their target-language textual translations. We present a pipeline and a set of methods to collect hundreds of hours of English audiobook recordings and align them with their Italian textual translations, using exclusively public-domain resources gathered semi-automatically from the web. The pipeline consists of three main stages: text collection, bilingual text alignment, and forced alignment. For text collection, we show how to automatically find e-book titles in a target language using machine translation, web information retrieval, and named-entity recognition and translation techniques. For bilingual text alignment, we investigated three methods: the Gale–Church algorithm with a small hand-crafted bilingual dictionary, the Gale–Church algorithm with a larger bilingual dictionary automatically inferred through statistical machine translation, and alignment by computing the vector similarity of multilingual embeddings of concatenations of consecutive sentences. Our findings indicate that the consecutive-sentence-embedding similarity approach improves the alignment of difficult sentences by indirectly performing sentence re-segmentation. For forced alignment, we give a theoretical overview of the preferred method depending on the properties of the text to be aligned with the audio, and adopt a TTS-DTW (text-to-speech and dynamic time warping) approach in our pipeline. The result of our experiments is a publicly available multi-modal corpus of about 130 hours of English speech aligned with its Italian textual translation, split into 60,561 triplets of English audio, English transcript, and Italian textual translation. We also post-processed the corpus to extract 40-dimensional MFCC features from the audio segments and released them as a dataset.
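The embedding-similarity alignment idea above can be sketched as follows: score each candidate target span (a single sentence, or a concatenation of consecutive sentences) by cosine similarity to the source-sentence embedding, and keep the best. The vectors below are toy stand-ins for real multilingual sentence embeddings:

```python
# Sketch of embedding-based bilingual sentence alignment: pick the target
# span whose (multilingual) embedding is closest to the source embedding.
# The 3-dimensional vectors are invented toys; real embeddings would come
# from a multilingual sentence encoder.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_alignment(src_vec, candidates):
    """candidates: list of (span_name, embedding). Return the best span name."""
    return max(candidates, key=lambda nv: cosine(src_vec, nv[1]))[0]

src = [0.9, 0.1, 0.3]
candidates = [
    ("sentence 5", [0.1, 0.9, 0.0]),
    ("sentence 6", [0.8, 0.2, 0.3]),           # closest to the source
    ("sentences 5+6 merged", [0.5, 0.5, 0.2]), # a 2-to-1 candidate
]
print(best_alignment(src, candidates))
```

Including merged spans among the candidates is what lets this scoring indirectly re-segment sentences, as the abstract observes.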
|
15 |
En undersökning av AI-verktyget Whisper som potentiell ersättare till det manuella arbetssättet inom undertextframtagning / A Study of the AI-tool Whisper as a Potential Substitute to the Manual Process of Subtitling. Kaka, Mailad Waled Kider; Oummadi, Yassin, January 2023.
The manual approach to subtitling is a time-consuming and costly process. This study examines the AI tool Whisper and its potential to replace the process used today, which comprises both transcription and translation. For the tool to accomplish these tasks it must first convert speech to text; this is known as speech recognition and is based on trained language models. Transcription accuracy is measured with Word Error Rate (WER), and translation quality with COMET-22. The results met Microsoft's requirement for maximum allowed WER and are therefore considered good enough for use, and the machine-produced translations also reached satisfactory quality. Subtitle creation, the second step of the process, proved harder for Whisper, both for transcription in the original language and for the English-translated version. The quality of the subtitle formatting, measured with the SubER method, is too low to be considered useful: scores fell in the range of 59 to 96%, indicating what proportion of the automatically produced subtitles must be corrected to match the reference. The overall conclusion is that Whisper could eventually replace the transcription and translation process itself, since it is both faster and cheaper than the manual approach, but at the time of writing it is not sufficient to replace subtitle creation.
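WER, the transcription metric named above, is the word-level Levenshtein distance (substitutions, insertions, and deletions) between hypothesis and reference, divided by the reference length. A minimal implementation of this standard definition:

```python
# Word Error Rate: edit distance over words divided by reference length.
# Standard definition; this is a minimal sketch, not the tooling the study used.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why fixed acceptance thresholds (like the Microsoft requirement mentioned above) are stated as maxima.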
|
16 |
Concept of Operations (CONOPS) for foreign language and speech translation technologies in a coalition military environment. Marshall, Susan LaVonne, 03 1900.
Approved for public release; distribution is unlimited. / This thesis presents a Concept of Operations (CONOPS) for two specific automated language translation (ALT) devices, the P2 Phraselator and the Voice Response Translator (VRT); the CONOPS for each device appear as Appendix A and Appendix B, respectively. The body of the thesis gives a broad introduction to the present state of ALT technology for readers new to the subject. It introduces the human language translation problem, followed by nine characteristic descriptors of ALT devices that provide a basic framework for comparing existing technologies. The premise is that ALT technology is presently in a state where it is tackled incrementally with various approaches. Two tables illustrate six commercially available devices using these descriptors. A scenario is then described in which the author observed the two subject ALT devices (depicted in the CONOPS in the appendices) being employed in an international military exercise, and some notable human observations associated with their use are discussed. A summary is provided of the Department of Defense (DOD) process exploring ALT devices, specifically the Language and Speech Exploitation Resources (LASER) Advanced Concept Technology Demonstration (ACTD). / Lieutenant Commander, United States Navy
|
17 |
Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources. Pérez González de Martos, Alejandro Manuel, 12 July 2022.
In recent years, deep learning has fundamentally changed the landscape of a number of areas in artificial intelligence, including computer vision, natural language processing, robotics, and game theory. In particular, the striking success of deep learning in a wide variety of natural language processing (NLP) applications, including automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS), has brought major accuracy improvements, widening the applicability of these technologies to real-life settings. At this point it is clear that ASR and MT technologies can be used to produce cost-effective, high-quality multilingual subtitles for video content of many kinds. This is particularly true for the transcription and translation of video lectures and other educational materials, where the audio recording conditions are usually favorable for ASR and the speech is grammatically well formed. However, although state-of-the-art neural approaches to TTS have been shown to drastically improve the naturalness and quality of synthetic speech over conventional concatenative and parametric systems, it remains unclear whether the technology is mature enough to improve accessibility and engagement in online learning, particularly in the context of higher education. Furthermore, advanced TTS topics such as cross-lingual voice cloning, incremental TTS, and zero-shot speaker adaptation remain open challenges in the field. This thesis is about enhancing the performance and widening the applicability of modern neural TTS technologies in real-life settings, in both offline and streaming conditions, in the context of improving accessibility and engagement in online learning. Particular emphasis is placed on speaker adaptation and cross-lingual voice cloning, as the input text in this context corresponds to a translated utterance originally spoken in another language. / Pérez González De Martos, AM. (2022). Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/184019 / Premios Extraordinarios de tesis doctorales
|