1

A Swedish wav2vec versus Google speech-to-text

Lagerlöf, Ester January 2022 (has links)
As automatic speech recognition technology becomes more advanced, the range of fields in which it can operate is growing. The best automatic speech recognition technologies today are mainly based on, and made for, the English language. However, the National Library of Sweden recently released open-source wav2vec models built specifically with the Swedish language in mind. To investigate their performance, one of these models was chosen to assess how well it transcribes the Swedish news broadcast ”kvart-i-fem”-ekot, comparing its results with Google speech-to-text. The results show wav2vec to be the stronger model for this type of audio data, achieving an average word error rate 9 percentage points lower than that of Google speech-to-text. Part of this performance can be attributed to the self-supervised method the wav2vec model uses to leverage large amounts of unlabeled data in its training. Despite this, both models had difficulty transcribing poor-quality audio with disturbing background noise and stationary sounds. Abbreviations and names were also difficult for both to transcribe correctly; on this point, however, Google speech-to-text performed better than the wav2vec model.
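The comparison above rests on word error rate (WER), the standard ASR metric. As a point of reference, here is a minimal Python sketch of how WER is typically computed from a reference and a hypothesis transcript; the function and the example sentences are illustrative and not taken from the thesis:

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Averaged over a test set, a gap of 9 percentage points means e.g. 0.23 vs 0.32.
print(wer("det regnar i stockholm idag", "det regnade i stockholm i dag"))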
2

Using cloud services and machine learning to improve customer support : Study the applicability of the method on voice data

Spens, Henrik, Lindgren, Johan January 2018 (has links)
This project investigated how machine learning could be used to classify voice calls in a customer support setting. A set of a few hundred labeled voice calls was recorded and used as data. The calls were transcribed to text using a speech-to-text cloud service. This text was then normalized and used to train models able to classify new voice calls. Different algorithms were used to build the models, including support vector machines and neural networks. The best model, identified through extensive parameter search, was a support vector machine. Using this model, a program was built that can classify live voice calls.
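As an illustration of the text-classification step described above (normalized transcripts fed to a support vector machine), here is a hedged sketch using scikit-learn; the transcripts and labels below are invented placeholders, not the project's data:

# TF-IDF features feeding a linear SVM, roughly mirroring the described pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

transcripts = [
    "jag vill säga upp mitt abonnemang",    # placeholder call transcripts
    "fakturan stämmer inte denna månad",
    "internet slutar fungera hela tiden",
]
labels = ["cancellation", "billing", "technical"]  # placeholder categories

model = make_pipeline(TfidfVectorizer(lowercase=True), SVC(kernel="linear"))
model.fit(transcripts, labels)

print(model.predict(["min faktura ser konstig ut"]))  # expected: ['billing']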
3

Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法

Ueno, Sei 23 March 2022 (has links)
Kyoto University / New curriculum doctoral program / Doctor of Informatics / Kō No. 24027 / Jō-haku No. 783 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Tatsuya Kawahara, Prof. Sadao Kurohashi, Prof. Ko Nishino / Meets Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
4

Evaluating Speech-to-Text Systems and AR-glasses : A study to develop a potential assistive device for people with hearing impairments

Eksvärd, Siri, Falk, Julia January 2021 (has links)
Suffering from a hearing impairment or deafness has major consequences for the individual's social life. Various aids exist today, but they come with challenges, such as availability, reliability and the high cognitive load that arises when the user tries to focus on both the aid and the surrounding context. To overcome these challenges, one potential solution is to combine Augmented Reality (AR) and speech-to-text systems, where speech is converted into text that is then presented in AR glasses. However, in AR, one crucial problem is the legibility and readability of text under different environmental conditions. Moreover, different types of AR glasses have different usage characteristics, which implies that a certain type of glasses might be more suitable for the proposed system than others. For speech-to-text systems, it is necessary to consider factors such as accuracy, latency and robustness when used in different acoustic environments and with different speech audio. In this master thesis, two different AR glasses are evaluated based on their optical, visual and ergonomic characteristics. Moreover, user tests are conducted with 23 normal-hearing individuals to evaluate the legibility and readability of text in different environmental contexts. Due to the pandemic, it was not possible to conduct the tests with hearing-impaired individuals. Finally, a literature review is performed on speech-to-text systems available on the Swedish market. The results indicate that legibility and readability are affected by several factors, such as ambient illuminance, background properties and how the text is presented with respect to polarity, opacity, size and number of lines. Moreover, the characteristics of the glasses affect the user experience, but which glasses are preferable depends on the individual's preferences. For the choice of a speech-to-text system, four speech-to-text APIs available on the Swedish market were identified. Based on our research, the Google Cloud Speech API is recommended for the proposed system; however, a more extensive evaluation of these systems would be required to determine this. / Speech-to-Text System using Augmented Reality for People with Hearing Deficits
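Since the thesis recommends the Google Cloud Speech API for the proposed system, a minimal sketch of a Swedish transcription request with the official google-cloud-speech Python client may help orient the reader; the file name, encoding and sample rate are assumptions, and credential setup is omitted:

from google.cloud import speech

client = speech.SpeechClient()  # assumes GOOGLE_APPLICATION_CREDENTIALS is configured

with open("sample_speech.wav", "rb") as f:   # placeholder audio file
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # assumed format
    sample_rate_hertz=16000,                                   # assumed rate
    language_code="sv-SE",                                     # Swedish, as targeted here
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)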
5

ENHANCING ELECTRONIC HEALTH RECORDS SYSTEMS AND DIAGNOSTIC DECISION SUPPORT SYSTEMS WITH LARGE LANGUAGE MODELS

Furqan Ali Khan (19203916) 26 July 2024 (has links)
Within Electronic Health Record (EHR) systems, physicians face extensive documentation, leading to alarming mental burnout. The disproportionate focus on data entry over direct patient care underscores a critical concern. Integration of Natural Language Processing (NLP) powered EHR systems offers relief by reducing the time and effort spent on record maintenance. Our research introduces the Automated Electronic Health Record System, which not only transcribes dialogues but also employs advanced clinical text classification. With an accuracy exceeding 98.97%, it saves over 90% of the time compared to manual entry, as validated on the MIMIC III and MIMIC IV datasets. In addition to our system's advancements, we explore the integration of a Diagnostic Decision Support System (DDSS) leveraging Large Language Models (LLMs) and transformers, aiming to refine healthcare documentation and improve clinical decision-making. We examine the advantages, such as enhanced accuracy and contextual understanding, as well as the challenges, including computational demands and biases, of using various LLMs.
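To make the clinical text classification idea concrete, here is a hedged sketch of transformer-based classification with the Hugging Face transformers library; the checkpoint, label set and example note are generic placeholders, not the model or pipeline developed in the thesis:

from transformers import pipeline

# Zero-shot classification with a general-purpose NLI checkpoint (an assumption,
# not the thesis model, which was trained and validated on MIMIC III/IV).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

note = "Patient presents with chest pain radiating to the left arm and shortness of breath."
labels = ["cardiovascular", "respiratory", "gastrointestinal", "neurological"]

result = classifier(note, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))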
6

Speech Enabled Navigation in Virtual Environments

Rajashekar, Raksha 09 September 2019 (has links)
No description available.
7

Teknik för dokumentering av möten och konferenser / Technology for documenting meetings and conferences

Stojanovic, Milan January 2019 (has links)
Documentation of meetings and conferences is performed at most companies by one or more people sitting at a computer and typing what has been said during the meeting. This may lead to typing mistakes or incorrect perception by the person who records; the human factor is quite large. This work focuses on developing proposals for new technologies that reduce or eliminate the human factor and thus improve the documentation of meetings and conferences. This is a problem for many companies and institutions, including Seavus Stockholm, where this study is conducted. It is assumed that most companies do not document their meetings and conferences in video or audio format, so this study only concerns text-based documentation. The aim of this study was to investigate how to implement new features and build a modern conference system, using modern technologies and new applications, to improve the documentation of meetings and conferences. Speech to text in combination with speaker recognition is something that has not yet been implemented for such a purpose, and it can facilitate documenting meetings and conferences. To complete the study, several methods were combined to achieve the desired goals. First, the project's scope and objectives were defined. Then, based on analysis of observations made of the company's documentation process, a design proposal was created. Following this, interviews with the stakeholders were conducted where the proposals were presented and a requirement specification was created. The theory was then studied to create an understanding of how different techniques work, in order to design and create a proposal for the architecture. The result of this study contains a proposal for an architecture that shows that it is possible to implement these techniques to improve the documentation process. Furthermore, possible use cases and interaction diagrams are presented that show how the system may work. Although the proof of concept is considered satisfactory, additional work and testing are needed to fully implement and integrate the concept into reality. / Dokumentering av möten och konferenser utförs på de flesta företag av en eller flera personer som sitter vid en dator och antecknar det som har sagts under mötet. Det kan medföra att det som skrivs ner inte stämmer med det som har sagts eller att det uppfattades felaktigt av personen som antecknar. Den mänskliga faktorn är ganska stor. Detta arbete kommer att fokusera på att ta fram förslag på nya tekniker som minskar eller eliminerar den mänskliga faktorn, och därmed förbättrar dokumenteringen av möten och konferenser. Det föreställer ett problem för många företag och institutioner, däribland för Seavus Stockholm, där denna studie utförs. Det antas att de flesta företag inte dokumenterar deras möten och konferenser i video eller ljudformat, och därmed kommer denna studie bara att handla om dokumentering i textformat. Målet med denna studie var att undersöka hur man, med hjälp av moderna tekniker och nya tillämpningar, kan implementera nya funktioner och bygga ett modernt konferenssystem, för att förbättra dokumenteringen av möten och konferenser. Tal till text i kombination med talarigenkänning är något som ännu inte har implementerats för ett sådant ändamål, och det kan underlätta dokumenteringen av möten och konferenser. För att slutföra studien kombinerades flera metoder för att uppnå de önskade målen. Först definierades projektens omfattning och mål.
Därefter, baserat på analys och observationer av företagets dokumenteringsprocess, skapades ett designförslag. Därefter genomfördes intervjuer med intressenterna där förslagen presenterades och en kravspecifikation skapades. Då studerades teorin för att skapa förståelse för hur olika tekniker arbetar, för att sedan designa och skapa ett förslag till arkitekturen. Resultatet av denna studie innehåller ett förslag till arkitektur, som visar att det är möjligt att implementera dessa tekniker för att förbättra dokumentationsprocessen. Dessutom presenteras möjliga användningsfall och interaktionsdiagram som visar hur systemet kan fungera. Även om beviset av konceptet anses vara tillfredsställande, ytterligare arbete och test behövs för att fullt ut implementera och integrera konceptet i verkligheten.
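As a rough illustration of the speech-to-text building block in the proposal above (speaker recognition is left out), here is a hedged sketch using the open-source SpeechRecognition library with Google's free web API for Swedish; the file name is a placeholder:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("meeting_recording.wav") as source:   # placeholder recording
    audio = recognizer.record(source)

try:
    # Transcribe the meeting audio to Swedish text.
    text = recognizer.recognize_google(audio, language="sv-SE")
    print("Transcript:", text)
except sr.UnknownValueError:
    print("Speech could not be understood")

# A full system as sketched in the thesis would additionally attribute each
# utterance to a speaker (speaker recognition), which is not shown here.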
8

Ovládání kooperativních robotů hlasem / Voice control of cooperative robots

Bubla, Lukáš January 2021 (has links)
The aim of this diploma thesis was to create a program that makes it possible to control a collaborative robot by voice. The first chapters contain a review of the current state of the field of collaborative robotics in terms of safety, work efficiency, robot programming and communication with the robot. The machine processing of the human voice is also discussed. In the practical part, an experiment was designed that works with an off-line simulation of a UR3 robot in the PolyScope 3.15.0 software. This simulation was linked to a Python program that uses the SpeechRecognition and urx libraries. Simple voice instructions were designed to move the robot to a defined position.
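Here is a hedged sketch of the voice-to-motion idea, using the SpeechRecognition and urx libraries named in the abstract; the robot address, the spoken command word and the target joint pose are invented placeholders rather than values from the thesis:

import speech_recognition as sr
import urx

robot = urx.Robot("192.168.0.100")   # placeholder address of the UR3 / simulator

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Say a command...")
    audio = recognizer.listen(source)

command = recognizer.recognize_google(audio).lower()

# Map a simple spoken instruction to a predefined joint pose (radians).
if "home" in command:
    robot.movej((0, -1.57, 0, -1.57, 0, 0), acc=0.3, vel=0.3)

robot.close()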
9

Tal till text för relevant metadatataggning av ljudarkiv hos Sveriges Radio / Speech to text for relevant metadata tagging of audio archive at Sveriges Radio

Jansson, Annika January 2015 (has links)
Under åren 2009-2013 har Sveriges Radio digitaliserat sitt programarkiv. Sveriges Radios ambition är att mer material från de 175 000 timmar radio som sänds varje år ska arkiveras. Det är en relativt tidsödande process att göra allt material sökbart och det är långt ifrån säkert att kvaliteten på dessa data är lika hög hos alla objekt. Frågeställningen som har behandlats för detta examensarbete är: Vilka tekniska lösningar finns för att utveckla ett system åt Sveriges Radio för automatisk igenkänning av svenskt tal till text utifrån deras ljudarkiv? System inom tal till text har analyserats och undersökts för att ge Sveriges Radio en aktuell sammanställning inom området. Intervjuer med andra liknande organisationer som arbetar inom området har utförts för att se hur långt de har kommit i sin utveckling av det berörda ämnet. En litteraturstudie har genomförts på de senare forskningsrapporterna inom taligenkänning för att jämföra vilket system som skulle passa Sveriges Radios behov och krav bäst att gå vidare med. Det Sveriges Radio bör koncentrera sig på först för att kunna bygga en ASR, Automatic Speech Recognition, är att transkribera sitt ljudmaterial. Där finns det tre alternativ, antingen transkribera själva genom att välja ut ett antal program med olika inriktning för att få en så stor bredd som möjligt på innehållet, gärna med olika talare för att sedan även kunna utveckla vidare för igenkänning av talare. Enklaste sättet är att låta olika yrkeskategorier som lägger in inslagen/programmen i systemet göra det. Andra alternativet är att starta ett liknande projekt som BBC har gjort och ta hjälp av allmänheten. Tredje alternativet är att köpa tjänsten för transkribering. Mitt råd är att fortsätta utvärdera systemet Kaldi, eftersom det har utvecklats mycket på senaste tiden och verkar vara relativt lätt att utvidga. Även den öppna källkod som Lingsoft använder sig av är intressant att studera vidare. / In the years 2009-2013, Sveriges Radio digitized its program archive. Sveriges Radio's ambition is that more material from the 175 000 hours of radio they broadcast every year should be archived. Making all material searchable is a relatively time-consuming process, and it is far from certain that the quality of the data is equally high for all items. The question addressed in this thesis is: what opportunities exist to develop a system for Sveriges Radio for Swedish speech to text? Systems for speech to text have been analyzed and examined to give Sveriges Radio a current overview of the subject. Interviews with other similar organizations working in the field have been conducted to see how far they have come in their development in this area. A literature study has been conducted on recent research reports in speech recognition to determine which system would best match Sveriges Radio's needs and requirements. What Sveriges Radio should concentrate on first, in order to build an ASR (Automatic Speech Recognition) system, is transcribing their audio material.
There are three alternatives. The first is to transcribe the material themselves, by selecting a number of programs with different orientations to get as wide a spread of content as possible, preferably with different speakers, so that speaker recognition can also be developed further later; the easiest way is to let the professionals who enter the features/programs into the system do it. The second option is to start a project similar to what the BBC has done and enlist the help of the public. The third option is to buy transcription as a service. My advice is to continue evaluating the Kaldi system, because it has developed considerably recently and seems relatively easy to extend. The open-source software that Lingsoft uses is also interesting to study further.
10

The Invention of Access: Speech-to-Text Writing and the Emergent Methodologies of Disability Service Transcription

Iwertz, Chad Everett 02 October 2019 (has links)
No description available.
