1 |
Computational approaches to figurative language. Shutova, Ekaterina. January 2011.
No description available.
|
2 |
Tree encoding of speech signals at low bit rates. Chu, Chung Cheung. January 1986.
No description available.
|
3 |
The word segmentation & part-of-speech tagging system for the modern Chinese. Liu Hon-lung. January 1994.
Title also in Chinese characters. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 58-59).
Contents:
1. Introduction (p.1)
2. Word Segmentation and Part-of-Speech Tagging: Techniques, Current Researches and the Embraced Problems (p.6)
   2.1. Various Methods on Word Segmentation and Part-of-Speech Tagging (p.6)
   2.2. Current Researches on Word Segmentation and Part-of-Speech Tagging (p.9)
   2.3. Embraced Problems in Word Segmentation and Part-of-Speech Tagging (p.9)
3. Branch-and-Bound Algorithm for Combinational Optimization of the Probabilistic Scoring Function (p.15)
   3.1. Definition of Word Segmentation and Part-of-Speech Tagging (p.15)
   3.2. Framework (p.17)
   3.3. Weight Assignment, Intermediate Score Computation & Optimization (p.20)
4. Implementation Issues of the Proposed Word Segmentation and Part-of-Speech Tagging System (p.26)
   4.1. Design of System Dictionary and Data Structure (p.30)
   4.2. Training Process (p.33)
   4.3. Tagging Process (p.35)
   4.4. Tagging Samples of the Word Segmentation & Part-of-Speech Tagging System (p.39)
5. Experiments on the Proposed Word Segmentation and Part-of-Speech Tagging System (p.41)
   5.1. Closed Test (p.41)
   5.2. Open Test (p.42)
6. Testing and Statistics (p.43)
7. Conclusions and Discussions (p.47)
References
Appendices:
   A: sysdict.tag Sample
   B: econ.tag Sample
   C: open.tag Sample
   D: 漢語分詞及詞性標注系統 for Windows
   E: Neural Network
|
4 |
Machine Learning Methods for Articulatory Data. Berry, Jeffrey James. January 2012.
Humans make use of more than just the audio signal to perceive speech. Behavioral and neurological research has shown that a person's knowledge of how speech is produced influences what is perceived. With methods for collecting articulatory data becoming more ubiquitous, methods for extracting useful information are needed to make this data useful to speech scientists and for speech technology applications. This dissertation presents feature extraction methods for ultrasound images of the tongue and for data collected with an Electro-Magnetic Articulograph (EMA). The usefulness of these features is tested in several phoneme classification tasks.

Feature extraction methods for ultrasound tongue images presented here consist of automatically tracing the tongue surface contour using a modified Deep Belief Network (DBN) (Hinton et al. 2006), and of methods inspired by research in face recognition which use the entire image. The tongue tracing method consists of training a DBN as an autoencoder on concatenated images and traces, and then retraining the first two layers to accept only the image at runtime. This 'translational' DBN (tDBN) method is shown to produce traces comparable to those made by human experts. An iterative bootstrapping procedure is presented for using the tDBN to assist a human expert in labeling a new data set. Tongue contour traces are compared with the Eigentongues method (Hueber et al. 2007) and a Gabor Jet representation in a 6-class phoneme classification task using Support Vector Classifiers (SVC), with Gabor Jets performing best. These SVC methods are compared to a tDBN classifier, which extracts features from raw images and classifies them with accuracy only slightly lower than the Gabor Jet SVC method.

For EMA data, supervised binary SVC feature detectors are trained for each feature in three versions of Distinctive Feature Theory (DFT): Preliminaries (Jakobson et al. 1954), The Sound Pattern of English (Chomsky and Halle 1968), and Unified Feature Theory (Clements and Hume 1995). Each of these feature sets, together with a fourth, unsupervised feature set learned using Independent Components Analysis (ICA), is compared on its usefulness in a 46-class phoneme recognition task. Phoneme recognition is performed using a linear-chain Conditional Random Field (CRF) (Lafferty et al. 2001), which takes advantage of the temporal nature of speech by looking at observations adjacent in time. Results of the phoneme recognition task show that Unified Feature Theory performs slightly better than the other versions of DFT. Surprisingly, ICA actually performs worse than running the CRF on raw EMA data.
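As an illustration of that last step, the sketch below shows frame-wise phoneme recognition with a linear-chain CRF over articulatory feature vectors, using neighbouring frames as temporal context. It is a minimal example under assumptions of ours (toy data, the sklearn-crfsuite package), not the dissertation's own code or data.

```python
# Hedged sketch: linear-chain CRF over EMA-like frames, with adjacent frames
# included as context features. Data and feature names are illustrative only.
import numpy as np
import sklearn_crfsuite

def frame_features(frames, t, context=1):
    """Turn frame t of one utterance into a CRF feature dict,
    adding neighbouring frames as temporal context."""
    feats = {}
    for offset in range(-context, context + 1):
        i = min(max(t + offset, 0), len(frames) - 1)
        for dim, value in enumerate(frames[i]):
            feats["ema[%+d]_%d" % (offset, dim)] = float(value)
    return feats

def utterance_to_crf_input(frames, context=1):
    return [frame_features(frames, t, context) for t in range(len(frames))]

# Toy stand-in data: two "utterances" of random EMA-like frames with labels.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(30, 12)), rng.normal(size=(25, 12))]
labels = [["ah"] * 15 + ["t"] * 15, ["s"] * 10 + ["ih"] * 15]

X = [utterance_to_crf_input(u) for u in utterances]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X)[0][:5])  # first five predicted phoneme labels
```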
|
5 |
Automatic phonological transcription using forced alignment: FAVE toolkit performance on four non-standard varieties of English. Sella, Valeria. January 2018.
Forced alignment, speech recognition software that performs semi-automatic phonological transcription, constitutes a methodological revolution in the recent history of linguistic research. Its use is progressively becoming the norm in research fields such as sociophonetics, but its general performance and range of applications have been relatively understudied. This thesis investigates the performance and portability of the Forced Alignment and Vowel Extraction program suite (FAVE), an aligner that was trained on, and designed to study, American English. FAVE was tested on four non-American varieties of English (Scottish, Irish, Australian and Indian English) and a control variety (General American). First, the performance of FAVE was compared with that of human annotators, and then the aligner was tested on three potentially problematic variables: /p, t, k/ realization, rhotic consonants and /l/. Although FAVE was found to perform significantly differently from human annotators on identical datasets, further analysis revealed that the aligner performed quite similarly on the non-standard varieties and the control variety, suggesting that the difference in accuracy does not constitute a major drawback to its extended usage. The study discusses the implications of the findings in relation to doubts expressed about the usage of such technology and argues for a wider implementation of forced alignment tools such as FAVE in sociophonetic research.
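Aligner-versus-annotator comparisons of this kind are often quantified by the displacement of phone boundaries between the automatic and the manual alignment. The snippet below is a small hedged sketch of that calculation; the segment lists and the 20 ms tolerance are illustrative choices of ours, not values taken from the thesis.

```python
# Hedged sketch: boundary displacement between two alignments of the same phones.
def boundary_displacements(auto_segments, manual_segments, tolerance=0.020):
    """Both alignments are lists of (phone, start, end) tuples in seconds.
    Returns mean absolute boundary displacement and the proportion of
    boundaries that fall within the tolerance."""
    diffs = []
    for (p_a, s_a, e_a), (p_m, s_m, e_m) in zip(auto_segments, manual_segments):
        assert p_a == p_m, "both alignments must share the same phone sequence"
        diffs.extend([abs(s_a - s_m), abs(e_a - e_m)])
    mean_diff = sum(diffs) / len(diffs)
    within_tol = sum(d <= tolerance for d in diffs) / len(diffs)
    return mean_diff, within_tol

# Hypothetical aligner output vs. a human annotation of the same word.
auto   = [("DH", 0.00, 0.06), ("AH", 0.06, 0.14), ("K", 0.14, 0.21)]
manual = [("DH", 0.00, 0.05), ("AH", 0.05, 0.15), ("K", 0.15, 0.21)]
print(boundary_displacements(auto, manual))
```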
|
6 |
Effective automatic speech recognition data collection for under-resourced languages. De Vries, Nicolaas Johannes. January 2011.
As building transcribed speech corpora for under-resourced languages plays a pivotal role in developing automatic speech recognition (ASR) technologies for such languages, a key step in developing these technologies is the effective collection of ASR data, consisting of transcribed audio and associated metadata.

The problem is that no suitable tool currently exists for effectively collecting ASR data for such languages. The specific context and requirements for effectively collecting ASR data for under-resourced languages render all currently known solutions unsuitable for such a task. Such requirements include portability, Internet independence and an open-source code base.

This work documents the development of such a tool, called Woefzela, from the determination of the requirements necessary for effective data collection in this context, to the verification and validation of its functionality. The study demonstrates the effectiveness of using smartphones without any Internet connectivity for ASR data collection for under-resourced languages. It introduces a semi-real-time quality control philosophy which increases the amount of usable ASR data collected from speakers.

Woefzela was developed for the Android operating system and is freely available for use on Android smartphones, with its source code also being made available. A total of more than 790 hours of ASR data for the eleven official languages of South Africa have been successfully collected with Woefzela.

As part of this study, a benchmark for the performance of a new National Centre for Human Language Technology (NCHLT) English corpus was established.

Thesis (M.Ing. (Electrical Engineering))--North-West University, Potchefstroom Campus, 2012.
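The semi-real-time quality control idea can be pictured with a minimal sketch like the one below. It is an illustration under assumptions of ours (16-bit mono WAV input, arbitrary thresholds), not Woefzela's actual Android code: each freshly recorded prompt is checked for duration, clipping and silence before it is accepted.

```python
# Hedged sketch of per-recording quality control checks for ASR data collection.
# Assumes 16-bit mono WAV files; all thresholds are illustrative examples.
import wave
import numpy as np

def check_recording(path, min_dur=1.0, max_dur=15.0,
                    max_clip_ratio=0.01, max_silence_ratio=0.9):
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    duration = len(samples) / rate
    if not (min_dur <= duration <= max_dur):
        return False, f"duration {duration:.2f} s out of range"
    clipped = np.mean(np.abs(samples) >= 32000)   # near full-scale samples
    if clipped > max_clip_ratio:
        return False, f"{clipped:.1%} of samples clipped"
    silent = np.mean(np.abs(samples) < 500)       # very low-energy samples
    if silent > max_silence_ratio:
        return False, "recording is mostly silence"
    return True, "ok"

# Usage: accept or reject a prompt right after it is recorded.
# ok, reason = check_recording("prompt_0001.wav")
```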
|
7 |
Automatic Speech Recognition System Continually Improving Based on Subtitled Speech Data. Kocour, Martin. January 2019.
Large-vocabulary speech recognition systems nowadays achieve fairly high accuracy. Behind their results, however, often stand tens or even hundreds of hours of manually annotated training data. Such data are often unavailable, or do not exist at all for the required language. A possible solution is to use commonly available but lower-quality audiovisual data. This thesis deals with a technique for processing exactly this kind of data and with using it to train acoustic models. It further discusses the possible use of such data for continually improving the models, since this data source is practically inexhaustible. For this purpose, a new data selection approach was designed as part of this work.
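A common shape for such a data selection step, sketched below under our own assumptions rather than as the thesis' exact method, is to decode each subtitled segment with a seed ASR system and keep only those segments whose hypothesis agrees closely with the subtitle text.

```python
# Hedged sketch: lightly-supervised selection of subtitled speech segments
# for acoustic model training, based on agreement with a seed ASR decode.
def wer(ref_words, hyp_words):
    """Word error rate via edit distance."""
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            sub = d[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref_words), 1)

def select_training_segments(segments, decode, max_wer=0.2):
    """segments: iterable of (audio, subtitle_text); decode: seed ASR function."""
    kept = []
    for audio, subtitle in segments:
        hyp = decode(audio)
        if wer(subtitle.lower().split(), hyp.lower().split()) <= max_wer:
            kept.append((audio, subtitle))
    return kept
```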
|
8 |
Automatic Speech Recognition and Machine Translation with Deep Neural Networks for Open Educational Resources, Parliamentary Contents and Broadcast Media. Garcés Díaz-Munío, Gonzalo Vicente. 25 November 2024.
Finalment, presentem un treball en un àmbit en l'avantguarda tecnològica del RAP i de la TA: la subtitulació de retransmissions audiovisuals en directe, en el context del Conveni de col·laboració R+D+i 2020-2023 entre la radiotelevisió pública valenciana À Punt i la Universitat Politècnica de València per a la subtitulació assistida per ordinador de continguts audiovisuals en temps real. Aquesta investigació ha donat com a resultat la implantació de sistemes de RAP en temps real, amb alta precisió i baixa latència, per a una llengua no majoritària en el món (el català) i una de les llengües més parlades del món (el castellà) en un mitjà audiovisual real. / [EN] In the last decade, automatic speech recognition (ASR) and machine translation (MT) have improved enormously through the use of constantly evolving deep neural network (DNN) models. If at the beginning of the 2010s the then pre-DNN ASR and MT systems were ready to tackle with success some real-life applications such as offline video lecture transcription and translation, now in the 2020s much more challenging applications are within grasp, such as live broadcast media subtitling.
Over the same period, media accessibility for everyone, including deaf and hard-of-hearing people, has been given more and more importance. ASR and MT, in their current state, are powerful tools for increasing the coverage of accessibility measures such as subtitles, transcriptions and translations, and for providing multilingual access to all types of content.
In this PhD thesis, we present research results on automatic speech recognition and machine translation based on deep neural networks in three very active domains: open educational resources, parliamentary contents and broadcast media.
Regarding open educational resources (OER), we first present work on the evaluation and post-editing of ASR and MT with intelligent interaction approaches, as carried out in the framework of the EU project transLectures: Transcription and Translation of Video Lectures. The results obtained confirm that the intelligent interaction approach can make the post-editing of automatic transcriptions and translations even more cost-effective. Then, in the context of the subsequent EU project X5gon, we present research on developing DNN-based neural machine translation (NMT) systems and on making the most of larger MT corpora through automatic data filtering. This work resulted in systems ranked first in an international MT evaluation campaign, and we show how these new NMT systems improved the quality of multilingual subtitles in real OER scenarios.
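To give an idea of what automatic data filtering of large parallel corpora can look like in its simplest form, the sketch below applies rule-based heuristics of our own choosing (not the exact pipeline used in the thesis) to drop empty, overly long, length-mismatched or untranslated sentence pairs before NMT training.

```python
# Hedged sketch: rule-based filtering of a noisy parallel corpus.
# Thresholds and rules are illustrative assumptions.
def keep_pair(src, tgt, max_len=200, max_ratio=3.0):
    src_tok, tgt_tok = src.split(), tgt.split()
    if not src_tok or not tgt_tok:
        return False                      # empty side
    if len(src_tok) > max_len or len(tgt_tok) > max_len:
        return False                      # overly long segment
    ratio = len(src_tok) / len(tgt_tok)
    if ratio > max_ratio or ratio < 1.0 / max_ratio:
        return False                      # implausible length ratio
    if src.strip() == tgt.strip():
        return False                      # untranslated copy-through
    return True

def filter_corpus(pairs):
    """pairs: iterable of (source_sentence, target_sentence)."""
    return [(s, t) for s, t in pairs if keep_pair(s, t)]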
In the also growing domain of language technologies for parliamentary contents, we describe research on speech data curation techniques for streaming ASR in the context of European Parliament debates. This research resulted in the release of Europarl-ASR, a new, large speech corpus for streaming ASR system training and evaluation, as well as for the benchmarking of speech data curation techniques.
Finally, we present work in a domain on the edge of the state of the art for ASR and MT: the live subtitling of broadcast media, in the context of the 2020-2023 R&D collaboration agreement between the Valencian public broadcaster À Punt and the Universitat Politècnica de València for real-time computer assisted subtitling of media contents. This research has resulted in the deployment of high-quality, low-latency, real-time streaming ASR systems for a less-spoken language (Catalan) and a widely spoken language (Spanish) in a real broadcast use case.

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures), Competitiveness and Innovation Framework Programme (CIP) under grant agreement no. 621030 (EMMA), Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5gon) and no. 952215 (TAILOR), and EU4Health Programme 2021–2027 as part of Europe’s Beating Cancer Plan under grant agreements no. 101056995 (INTERACT-EUROPE) and no. 101129375 (INTERACT-EUROPE 100); from the Government of Spain’s research projects iTrans2 (ref. TIN2009-14511, MICINN/ERDF EU), MORE (ref. TIN2015-68326-R, MINECO/ERDF EU), Multisub (ref. RTI2018-094879-B-I00, MCIN/AEI/10.13039/501100011033 ERDF “A way of making Europe”), and XLinDub (ref. PID2021-122443OB-I00, MCIN/AEI/10.13039/501100011033 ERDF “A way of making Europe”); from the Generalitat Valenciana’s “R&D collaboration agreement between the Corporació Valenciana de Mitjans de Comunicació (À Punt Mèdia) and the Universitat Politècnica de València (UPV) for real-time computer assisted subtitling of audiovisual contents based on artificial intelligence”, and research project Classroom Activity Recognition (PROMETEO/2019/111); and from the Universitat Politècnica de València’s PAID-01-17 R&D support programme. This work uses data from the RTVE 2018 and 2020 Databases. This set of data has been provided by RTVE Corporation to help develop Spanish-language speech technologies.

Garcés Díaz-Munío, GV. (2024). Automatic Speech Recognition and Machine Translation with Deep Neural Networks for Open Educational Resources, Parliamentary Contents and Broadcast Media [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/212454
|