1 |
Automatisk metod för läs-screening i lågstadiet / Automated screening method for reading difficulties in lower school
Lindmark, Ada; Vos, Christian. January 2020
One of the most central parts of lower primary school education is Swedish, and reading skills in particular. Despite mandatory screening occasions and national assessment support, school staff perceive the mapping of reading skills as time-consuming, complex, and subjective because of its manual format. This study investigates how automated reading screening can be implemented as a complementary tool in lower primary school using forced alignment. The purpose is to determine whether a program is reliable enough to ease the screening process and to identify pupils with reading difficulties earlier. The results were analyzed through interviews with school staff and the development of a prototype. The study concluded that there are advantages to complementing manual reading screening with an automatic tool, but that no conclusions can be drawn about whether the tool can be implemented in all schools with positive effects. An automatic tool may be more beneficial in middle school, but because of the low response rate in the interviews, this cannot be established.
|
2 |
Automatic phonological transcription using forced alignment : FAVE toolkit performance on four non-standard varieties of English
Sella, Valeria. January 2018
Forced alignment, a speech recognition software performing semi-automatic phonological transcription, constitutes a methodological revolution in the recent history of linguistic research. Its use is progressively becoming the norm in research fields such as sociophonetics, but its general performance and range of applications have been relatively understudied. This thesis investigates the performance and portability of the Forced Alignment and Vowel Extraction program suite (FAVE), an aligner that was trained on, and designed to study, American English. It was decided to test FAVE on four non-American varieties of English (Scottish, Irish, Australian and Indian English) and a control variety (General American). First, the performance of FAVE was compared with human annotators, and then it was tested on three potentially problematic variables: /p, t, k/ realization, rhotic consonants and /l/. Although FAVE was found to perform significantly differently from human annotators on identical datasets, further analysis revealed that the aligner performed quite similarly on the non-standard varieties and the control variety, suggesting that the difference in accuracy does not constitute a major drawback to its extended usage. The study discusses the implications of the findings in relation to doubts expressed about the usage of such technology and argues for a wider implementation of forced alignment tools such as FAVE in sociophonetic research.
|
3 |
Forced alignment pomocí neuronových sítí / Forced Alignment via Neural Networks
Beňovič, Marek. January 2020
Watching videos with subtitles in the original language is one of the most effective ways of learning a foreign language. Highlighting words at the moment they are pronounced helps to synchronize visual and auditory perception and increases learning efficiency. The method for aligning orthographic transcriptions to audio recordings is known as forced alignment. This work implements a tool for aligning the transcripts of YouTube videos with the speech in their audio recordings, and provides a web user interface with a video player that presents the results. It integrates two state-of-the-art forced aligners based on Kaldi, the first using the standard HMM approach and the second based on neural networks, and compares their accuracy. The integrated aligners also provide a phone-level alignment, which can be used for training statistical models in further speech recognition research. The work describes the implementation and the architectural concepts the tool is based on, which can be reused in various software projects.
|
4 |
Text and Speech Alignment Methods for Speech Translation Corpora Creation : Augmenting English LibriVox Recordings with Italian Textual Translations
Della Corte, Giuseppe. January 2020
The recent rise of end-to-end speech translation models requires a new generation of parallel corpora, composed of a large number of source-language speech utterances aligned with their target-language textual translations. We present a pipeline and a set of methods to collect hundreds of hours of English audiobook recordings and align them with their Italian textual translations, using exclusively public domain resources gathered semi-automatically from the web. The pipeline consists of three main stages: text collection, bilingual text alignment, and forced alignment. For the text collection task, we show how to automatically find e-book titles in a target language by using machine translation, web information retrieval, and named entity recognition and translation techniques. For the bilingual text alignment task, we investigated three methods: the Gale–Church algorithm in conjunction with a small hand-crafted bilingual dictionary, the Gale–Church algorithm in conjunction with a larger bilingual dictionary automatically inferred through statistical machine translation, and bilingual text alignment by computing the vector similarity of multilingual embeddings of concatenations of consecutive sentences. Our findings seem to indicate that the consecutive-sentence-embedding similarity approach improves the alignment of difficult sentences by indirectly performing sentence re-segmentation. For the forced alignment task, we give a theoretical overview of the preferred method depending on the properties of the text to be aligned with the audio, and we suggest and use a TTS-DTW (text-to-speech and dynamic time warping) based approach in our pipeline. The result of our experiments is a publicly available multi-modal corpus composed of about 130 hours of English speech aligned with its Italian textual translation and split into 60561 triplets of English audio, English transcript, and Italian textual translation. We also post-processed the corpus to extract 40 MFCC features from the audio segments and released them as a dataset.
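The dynamic time warping step named in the abstract above can be illustrated with a minimal sketch. This is not the thesis pipeline: it assumes each frame is a single number rather than an MFCC vector, uses absolute difference as the frame distance, and the function name dtw_path is hypothetical. In a TTS-DTW setup, one sequence would come from synthesized speech (whose text timing is known) and the other from the real recording.

```python
import math


def dtw_path(a, b):
    """Align two non-empty sequences of floats with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs meaning a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])  # assumed frame distance
            cost[i][j] = dist + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    synthesized = [0.0, 1.0, 2.0]      # stand-in for TTS frames
    recorded = [0.0, 1.0, 1.0, 2.0]    # stand-in for real audio frames
    total, path = dtw_path(synthesized, recorded)
    print(total, path)
```

Because the path maps every synthesized frame to one or more recorded frames, known text boundaries in the synthesized audio can be carried over to the recording by following the path.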
|
5 |
Étude de la réduction segmentale en français parlé à travers différents styles : apports des grands corpus et du traitement automatique de la parole à l’étude du schwa, du /ʁ/ et des réductions à segments multiples / Segmental reduction in spoken French through different speech styles : contributions of large speech corpora and automatic speech processing on schwa, /ʁ/ and reduction of multiple segments
Wu, Yaru. 14 September 2018
This study of segmental reduction (i.e. deletion or temporal reduction) in spontaneous French allowed us not only to propose two research methods for linguistic studies on large corpora, but also to examine the influence of different factors of variation on various reduction phenomena and to bring new insights into the propensity of segments to reduce. We applied the top-down method, which uses forced alignment with pronunciation variants, when dealing with specific reduction phenomena. Otherwise, we used the bottom-up method, which examines absent and short segments as indicators. Three reduction phenomena were studied: schwa elision, /ʁ/ deletion, and the propensity for segmental reduction. The top-down method was used for the first two. The factors common to both studies are post-lexical context, speech style, sex, and profession. Schwa elision in the initial syllable of polysyllabic words and post-consonantal /ʁ/ deletion in word-final position are not always influenced by the same factors. Similarly, elision of lexical schwa and of epenthetic schwa are not conditioned by the same factors. The study of the propensity for segmental reduction allowed us to apply the bottom-up method and to investigate segmental reduction in general. The results suggest that liquids and glides are less resistant to reduction than other consonants, and that nasal vowels are more resistant to reduction than oral vowels. Among oral vowels, high rounded vowels tend to be reduced more often than the others.
|
6 |
Automatic Annotation of Speech: Exploring Boundaries within Forced Alignment for Swedish and Norwegian / Automatisk Anteckning av Tal: Utforskning av Gränser inom Forced Alignment för Svenska och Norska
Biczysko, Klaudia. January 2022
In Automatic Speech Recognition, there is an extensive need for time-aligned data. Manual speech segmentation has been shown to be more laborious than manual transcription, especially when dealing with tens of hours of speech. Forced alignment is a technique for matching a signal with its orthographic transcription with respect to the duration of linguistic units. Most forced aligners, however, are language-dependent and trained on English data, whereas under-resourced languages lack both the resources to develop the acoustic model required for an aligner and manually aligned data. An alternative to training new models is cross-language forced alignment, in which an aligner trained on one language is used for aligning data in another language. This thesis aimed to evaluate state-of-the-art forced alignment algorithms available for Swedish and to test whether a Swedish model could be applied for aligning Norwegian. Three approaches to forced alignment were employed: (1) one forced aligner based on Dynamic Time Warping and text-to-speech synthesis, Aeneas; (2) two forced aligners based on Hidden Markov Models, namely the Munich AUtomatic Segmentation System (WebMAUS) and the Montreal Forced Aligner (MFA); and (3) the Connectionist Temporal Classification (CTC) segmentation algorithm with two pre-trained and fine-tuned Swedish Wav2Vec2 models. First, small speech test sets for Norwegian and Swedish, covering different degrees of spontaneity in the speech, were created and manually aligned to produce gold-standard alignments. Second, the performance of the aligners on the Swedish dataset was evaluated with respect to the gold standard. Finally, it was tested whether Swedish forced aligners could be applied for aligning Norwegian data. The performance of the aligners was assessed by measuring the difference between the boundaries set in the gold standard and those of the comparison alignment. The accuracy was estimated by calculating the proportion of alignments below a particular threshold proposed in the literature. The CTC segmentation algorithm with Wav2Vec2 (VoxRex) was found to be superior to the other forced alignment systems. The differences between the alignments of the two Wav2Vec2 models suggest that the training data may have a larger influence on the alignments than the architecture of the algorithm. At lower thresholds, the traditional HMM approach outperformed the deep learning models. Finally, the findings of the thesis demonstrate promising results for cross-language forced alignment using Swedish models to align related languages, such as Norwegian.
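The evaluation described in the abstract above (boundary differences against a gold standard, summarized as the proportion below a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: boundaries are given in seconds and paired one-to-one, the function name boundary_accuracy is hypothetical, and the 20 ms default threshold is only an example value.

```python
def boundary_accuracy(gold, predicted, threshold=0.020):
    """Proportion of predicted boundaries closer than `threshold` seconds
    to the corresponding gold-standard boundary.

    Assumes gold[i] and predicted[i] mark the same linguistic unit.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one boundary is required")
    # Count boundaries whose absolute deviation is below the threshold.
    within = sum(1 for g, p in zip(gold, predicted) if abs(g - p) < threshold)
    return within / len(gold)


if __name__ == "__main__":
    gold = [0.10, 0.50, 1.00]        # hypothetical manual boundaries (s)
    predicted = [0.11, 0.56, 1.00]   # hypothetical aligner output (s)
    for thr in (0.010, 0.020, 0.100):
        print(thr, boundary_accuracy(gold, predicted, thr))
```

Reporting the score at several thresholds, as in the usage loop, makes it possible to see cases like the one the abstract mentions, where one system leads at tight thresholds and another at looser ones.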
|