1 |
Implementing a distributed approach for speech resource and system development / Molapo, Nkadimeng Raymond. January 2014
The range of applications for high-quality automatic speech recognition (ASR) systems has grown
dramatically with the advent of smart phones, in which speech recognition can greatly enhance the
user experience. Currently, the languages with extensive ASR support on these devices are languages
that have thousands of hours of transcribed speech corpora already collected. Developing a speech
system for such a language is made simpler because extensive resources already exist. However, for
languages that are not as prominent, the process is more difficult. Many obstacles such as reliability
and cost have hampered progress in this regard, and various separate tools for every stage of the
development process have been developed to overcome these difficulties.
Developing a system that combines these partial solutions involves customising
existing tools and developing new ones to interface the overall end-to-end process. This work documents
the integration of several tools to enable the end-to-end development of an Automatic Speech
Recognition system in a typical under-resourced language. Google App Engine is employed as the
core environment for data verification, storage and distribution, and used in conjunction with existing
tools for gathering text data and for speech data recording. We analyse the data acquired by each of
the tools and develop an ASR system in Shona, an important under-resourced language of Southern
Africa. Although unexpected logistical problems complicated the process, we were able to collect
a useable Shona speech corpus, and develop the first Automatic Speech Recognition system in that
language. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2014
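The thesis itself is not reproduced here, so as an illustration only: a data-verification step of the kind the abstract describes might reject contributed recordings that are implausibly short, overly long, or clipped before storing them. The function name, thresholds, and 16-bit PCM format below are assumptions for the sketch, not the thesis's implementation.

```python
import struct

def verify_recording(pcm_bytes, sample_rate, min_dur=1.0, max_dur=30.0,
                     clip_level=32600):
    """Return (ok, reason) for a 16-bit little-endian mono PCM recording.

    Rejects clips that are too short or too long to be a prompt reading,
    or whose peak amplitude suggests microphone clipping.
    """
    n_samples = len(pcm_bytes) // 2
    duration = n_samples / sample_rate
    if not min_dur <= duration <= max_dur:
        return False, f"duration {duration:.2f}s outside [{min_dur}, {max_dur}] s"
    samples = struct.unpack(f"<{n_samples}h", pcm_bytes[:n_samples * 2])
    peak = max(abs(s) for s in samples)
    if peak >= clip_level:
        return False, f"peak {peak} suggests clipping"
    return True, "ok"

# A 2-second silent clip at 16 kHz passes; a saturated one is rejected.
silence = struct.pack("<32000h", *([0] * 32000))
print(verify_recording(silence, 16000))        # (True, 'ok')
clipped = struct.pack("<32000h", *([32700] * 32000))
print(verify_recording(clipped, 16000)[0])     # False
```

Checks like these matter in distributed collection, where recordings arrive from many uncontrolled devices and bad audio is cheaper to reject at upload time than to discover during transcription.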
|
2 |
Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data / Fosteri, Iliana. January 2023
Dealing with low-resource languages is a challenging task because there is not enough data to train machine-learning models to make predictions for these languages. One way to deal with this problem is to use data from higher-resource languages, which enables the transfer of learning from those languages to the low-resource targets. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach involves investigating different types of transfer languages, employing MaChAmp, a state-of-the-art parser and tagger built on contextualized word embeddings, mBERT and XLM-R in particular. The main idea is to explore how genre match, language similarity, neither, or the combination of the two affects model performance in the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that capturing speech-specific dependency relations requires incorporating at least some genre-matched source data, while language-similarity-matched source data are the better choice when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but observe only minor differences in model performance.
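As an illustration of the source-selection idea this abstract describes (not the study's actual method), a toy heuristic could rank candidate source treebanks by genre match versus language-family match, weighting the two differently per task in line with the abstract's finding. All names, fields, and weights below are hypothetical.

```python
def rank_sources(target, candidates, task):
    """Rank candidate source treebanks for a low-resource target.

    Toy heuristic: for dependency parsing, weight genre match (spoken vs
    written) more heavily; for POS tagging, weight language similarity
    more. Weights are illustrative, not tuned values from the study.
    """
    w_genre, w_lang = (2.0, 1.0) if task == "parsing" else (1.0, 2.0)

    def score(c):
        return (w_genre * (c["genre"] == target["genre"])
                + w_lang * (c["family"] == target["family"]))

    return sorted(candidates, key=score, reverse=True)

target = {"name": "Frisian-Dutch", "genre": "spoken", "family": "Germanic"}
candidates = [
    {"name": "Dutch-written", "genre": "written", "family": "Germanic"},
    {"name": "English-spoken", "genre": "spoken", "family": "Germanic"},
    {"name": "Turkish-written", "genre": "written", "family": "Turkic"},
]
print([c["name"] for c in rank_sources(target, candidates, "parsing")])
# ['English-spoken', 'Dutch-written', 'Turkish-written']
```

The point of the sketch is only the ranking criterion: a genre-matched source outranks a merely related one when parsing spoken data, which mirrors the abstract's conclusion.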
|