  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing

Weigel, Carla January 2021
This project discusses the importance of accurate note-taking for D/deaf and hard of hearing students who have accommodation requirements and offers innovative opportunities to improve the student experience in order to encourage more D/deaf and hard of hearing individuals to pursue academia. It also includes a linguistic analysis of speech signals that correspond to transcription output errors produced by speech-to-text programs, which can be utilized to advance and improve speech recognition systems. / In hopes of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, speech-to-text has been suggested to address notetaking issues. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers in academic contexts. Observations regarding functionality and error analysis are detailed in this thesis. This project has several objectives, including: 1) to outline how the DHH students’ experience differs from other note-taking needs; 2) to use linguistic analysis to understand how transcript accuracy converts to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning DHH students with a captioning service. Results from a focus group showed that current notetaking services are problematic, and that automatic captioning may solve some issues, but some errors are detrimental as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs lacked accurate utterance breaks and contained poor punctuation. The captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers. An analysis of errors showed that some errors are less severe than others; in response, we offer an alternative way to view errors: as insignificant, obvious, or critical errors.
Errors are caused by either the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or a blend of various speaker factors including: assimilation, vowel approximation, epenthesis, phoneme reduction, and overall intelligibility. Both programs worked best with intelligible speech, as measured by human perception. Speech rate trends were surprising: Otter seemed to prefer fast speech from native English speakers and Ava preferred, as expected, slow speech, but results differed between scripted and spontaneous speech. Correlations of accuracy and fundamental frequencies showed conflicting results. Some reasons for errors could not be determined without knowing more about how the systems were programmed. / Thesis / Master of Science (MSc) / In hopes of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, automatic captioning has been suggested to address notetaking issues. Captioning programs use speech recognition (SR) technology to caption lectures in real-time and produce a transcript afterwards. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers. Observations regarding functionality and error analysis are detailed in this thesis. The project has several objectives: 1) to outline how the DHH students’ experience differs from other note-taking needs; 2) to use linguistic analysis to understand how transcript accuracy converts to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning DHH students with a captioning service. Results from a focus group showed that current notetaking services are problematic, and that automatic captioning may solve some issues, but some types of errors are detrimental as it is particularly difficult for DHH students to identify and fix errors within transcripts.
Transcripts produced by the programs were difficult to read, as outputs contain poor punctuation and lack breaks between thoughts. Captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers, and an analysis of errors showed that some errors are less severe than others. In response, we offer an alternative way to view errors: as insignificant, obvious, or critical errors. Errors are caused by either the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or a blend of various speaker factors. Both programs worked best with intelligible speech; one seemed to prefer fast speech from native English speakers and the other preferred slow speech; a preference for male or female voices showed conflicting results. Some reasons for errors could not be determined, as one would have to observe how the systems were programmed.
12

Improving accuracy of speech recognition for low resource accents : Testing the performance of fine-tuned Wav2vec2 models on accented Swedish / Förbättrad taligenkänning för lågresurs-brytningar : Testning av prestandan för finjusterade Wav2vec2-modeller på bryten svenska

Dabiri, Arash January 2023
While the field of speech recognition has recently advanced quickly, even the highest performing models struggle with accents. There are several methods of improving performance on accents, but many are hard to implement or need large amounts of data and are therefore costly. Therefore, examining the performance of the Wav2vec2 architecture, which previously has performed well on small amounts of labeled data, becomes relevant. Using a model trained in Swedish, this thesis fine-tunes the model on small datasets of three Swedish accents, to create both accent-dependent specialized models and an accent-independent general model. The specialized models perform better than the original model, and the general model performs approximately as well as each specialized model without sacrificing performance on non-accented Swedish. This means that the Wav2vec2 framework offers a low-cost method of improving speech recognition that can be used to improve private and public services for larger parts of the population. / Trots att området för taligenkänning nyligen har avancerat snabbt, presterar även de bästa modellerna sämre vid språk med utländsk brytning. Det finns flera metoder för att förbättra prestandan på accenter, men många är komplexa eller behöver stora mängder data och är därför dyra att implementera. Därför blir det relevant att undersöka prestandan för Wav2vec2-arkitekturen, som tidigare har presterat väl med små mängder märkt träningsdata. En modell tränad i svenska finjusteras i denna avhandling på tre små datamängder bestående av olika svenska brytningar, för att skapa både brytningsberoende specialiserade modeller såväl som en brytningsoberoende generell modell. De specialiserade modellerna presterar bättre än originalmodellen, och den allmänna modellen presterar ungefär lika bra som varje specialiserad modell utan att ge avkall på prestanda på ickebruten svenska.
Detta innebär att ramverket Wav2vec2 erbjuder en lågkostnadsmetod för att förbättra taligenkänning som kan användas för att förbättra privata och offentliga tjänster för större delar av befolkningen.
13

Universitetsutbildning med tolk : Döva studenters perspektiv på utbildning i samarbete med teckenspråkstolk / University Interpreting : Deaf Students’ Perspectives on an Education in Collaboration with Sign Language Interpreters

Georgieva, Joanna January 2023
I takt med att döva personer i Sverige får ökad tillgång till alla områden i samhället börjar också antalet döva högskolestudenter på olika lärosäten runtom i landet att öka. Denna studie undersöker döva universitetsstudenters upplevelser av en utbildning i samarbete med teckenspråks- eller skrivtolk och har som mål att utöka kunskapen kring detta relativt outforskade område. I detta syfte utfördes sju semistrukturerade intervjuer med döva studenter vid lärosäten i Sverige där de fick svara på frågor om bland annat sin tillgång till tolkning, sin upplevelse av tolkningen och sin känsla av inkludering i klassen. Av resultaten framgår att studenterna utnyttjar möjligheten till tolkning på ett sätt som inte alltid reflekterar deras upplevda behov av det. Studenterna ställer sig övervägande positivt till tolkarnas arbete på högskolenivå. Det framgår även att studenterna i olika grad känner sig inkluderade i sin klass, både vad gäller relationer till lärare och till andra hörande studenter. Faktorer som kan ha påverkat studenternas intervjusvar är deras upplevda hörsel, nivån på deras nuvarande utbildning och dövmedvetenheten hos de lärare de mötts av. Då studien fokuserat på många aspekter av studenternas utbildning kan det inte dras djupgående slutsatser kring varje tema som framkommit. / As Deaf people in Sweden receive increased access to all areas of society, the number of Deaf students enrolled in higher education around the country increases. This study examines the experiences of Deaf university students regarding their education in collaboration with a sign language or speech-to-text interpreter and aims to expand the knowledge around this relatively unexplored field. For this purpose, seven semi-structured interviews were held with Deaf students at universities around Sweden, in which they were asked to answer questions regarding, among other things, their access to interpreters, impression of the interpreting received and feeling of inclusion in their class. 
The results show that the students do not always make use of interpreters in a way that reflects their perceived need for them. The students are generally positive regarding the interpreters’ work in a higher educational setting. It is also shown that the students feel included in their classes to varying degrees, both in regard to their relationships with teachers, and with other hearing students. Factors that may have affected the students’ responses are their perceived hearing, the level of their current education and the Deaf awareness of their lecturers. As the study has focused on many different aspects of the students’ education, no in-depth conclusions can be drawn regarding each of the subject matters raised.
14

A Comparative Analysis of Whisper and VoxRex on Swedish Speech Data

Fredriksson, Max, Ramsay Veljanovska, Elise January 2024
With the constant development of more advanced speech recognition models, the need to determine which models are better in specific areas and for specific purposes becomes increasingly crucial. This is even more true for low-resource languages such as Swedish, which depend on the progress of models for the large international languages. Lagerlöf (2022) conducted a comparative analysis between Google’s speech-to-text model and NLoS’s VoxRex B, concluding that VoxRex was the best for Swedish audio. Since then, OpenAI has released its Automatic Speech Recognition model Whisper, prompting a reassessment of the preferred choice for transcribing Swedish. In this comparative analysis using data from Swedish radio news segments, Whisper performs better than VoxRex in tests on the raw output, a result strongly influenced by its more proficient sentence construction. It is not possible to conclude which model is better regarding pure word prediction. However, the results favor VoxRex, which displays lower variability, meaning that even though Whisper can predict full text better, the decision of which model to use should be determined by the user’s needs.
15

Röststyrning i industriella miljöer : En undersökning av ordfelsfrekvens för olika kombinationer mellan modellarkitekturer, kommandon och brusreduceringstekniker / Voice command in industrial environments : An investigation of Word Error Rate for different combinations of model architectures, commands and noise reduction techniques

Eriksson, Ulrika, Hultström, Vilma January 2024
Röststyrning som användargränssnitt kan erbjuda flera fördelar jämfört med mer traditionella styrmetoder. Det saknas dock färdiga lösningar för specifika industriella miljöer, vilka ställer särskilda krav på att korta kommandon tolkas korrekt i olika grad av buller och med begränsad eller ingen internetuppkoppling. Detta arbete ämnade undersöka potentialen för röststyrning i industriella miljöer. Ett koncepttest genomfördes där ordfelsfrekvens (på engelska Word Error Rate eller kortare WER) användes för att utvärdera träffsäkerheten för olika kombinationer av taligenkänningsarkitekturer, brusreduceringstekniker samt kommandolängder i verkliga bullriga miljöer. Undersökningen tog dessutom hänsyn till Lombard-effekten.  Resultaten visar att det för samtliga testade miljöer finns god potential för röststyrning med avseende på träffsäkerheten. Framför allt visade DeepSpeech, en djupinlärd taligenkänningsmodell med rekurrent lagerstruktur, kompletterad med domänspecifika språkmodeller och en riktad kardioid-mikrofon en ordfelsfrekvens på noll procent i vissa scenarier och sällan över fem procent. Resultaten visar även att utformningen av kommandon påverkar ordfelsfrekvensen.  För en verklig implementation i industriell miljö behövs ytterligare studier om säkerhetslösningar, inkluderande autentisering och hantering av risker med falskt positivt tolkade kommandon. / Voice command as a user interface can offer several advantages over more traditional control methods. However, there is a lack of ready-made solutions for specific industrial environments, which place particular demands on short commands being interpreted correctly in varying degrees of noise and with limited or no internet connection. This work aimed to investigate the potential for voice command in industrial environments. 
A proof of concept was conducted where Word Error Rate (WER) was used to evaluate the accuracy of various combinations of speech recognition architectures, noise reduction techniques, and command lengths in authentic noisy environments. The investigation also took into account the Lombard effect. The results indicate that for all tested environments there is good potential for voice command with regard to accuracy. In particular, DeepSpeech, a deep-learned speech recognition model with recurrent layer structure, complemented with domain-specific language models and a directional cardioid microphone, showed WER values of zero percent in certain scenarios and rarely above five percent. The results also demonstrate that the design of commands influences WER. For a real implementation in an industrial environment, further studies are needed on security solutions, including authentication and management of the risks of commands being interpreted as false positives.
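The abstract above uses Word Error Rate as its accuracy measure. As a rough illustration of how such a score is commonly computed (word-level edit distance divided by the number of reference words), here is a minimal Python sketch. The function name `word_error_rate` and the sample commands are hypothetical and are not taken from the thesis, which may have used different tooling and text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative sketch only: words are split on whitespace, with no
    casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command: one of four reference words is dropped.
print(word_error_rate("stop the pump now", "stop pump now"))
```

Because the denominator is the reference length, a single misrecognized word weighs more heavily in a short command than in a long one, which is one reason command design can influence the score.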
16

Att skriva eller att tala in text? Likheter och skillnader i textkvalitet och textlängd med och utan tal-till-text-teknik / Similarities and differences in students' text quality and text length when typing with keyboard compared to when using speech-to-text technology.

Treml, Felicia, Claesson, Pontus January 2021
Att kunna uttrycka sig skriftligt är en förutsättning för delaktighet i samhället och att kunna utbilda sig inför yrkeslivet. Forskning visar att kompensatoriska hjälpmedel i form av assisterande teknik för individer med läs- och skrivsvårigheter är särskilt viktigt i inlärningssammanhang. Denna studie undersökte likheter och skillnader i elevers textkvalitet och textlängd vid skrivande med tangentbord jämfört med användning av assisterande teknik i form av tal-till-text-program. I studien deltog 41 svenska mellanstadieelever. Resultaten visade att användning av taligenkänningsprogram, varigenom elever får producera text genom att tala istället för att skriva med tangentbord, genererar både längre texter och texter av högre kvalitet. Tal-till-text-program sparade också tid jämfört med skrivande med tangentbord. Utifrån dessa resultat så kan taligenkänningsteknik medföra pedagogiska fördelar. Resultaten diskuteras utifrån tidigare forskning och metodologiska begränsningar. Mer forskning behövs bland annat i syfte att förstå hur långsiktig användning av assisterande teknik kan påverka elevers skrivförmåga. / Being able to express yourself in writing is a prerequisite for academic success and participation in society. Research shows that compensatory aids in the form of assistive technologies for individuals with reading and writing difficulties are particularly important in learning contexts. This study examined similarities and differences in students’ text quality and text length when typing with a keyboard compared to when using a particular type of assistive technology in the form of a speech-to-text program. The study comprised 41 Swedish middle school pupils. The results showed that using speech recognition software, whereby students are allowed to produce text by speaking instead of typing, generates both longer texts and higher-quality texts. Speech-to-text programs were also significantly more time efficient.
Based on these results, speech recognition technology can bring educational benefits. The results are discussed based on previous research and methodological limitations. More research is needed, among other things, in order to understand how long-term use of assistive technology can affect students’ writing ability.
