Global ETD Search

1	Detection and localization of cough from audio samples for cough-based COVID-19 detection / Detektion och lokalisering av hosta från ljudprover för hostbaserad COVID-19-upptäckt Krishnamurthy, Deepa January 2021 (has links) Since February 2020, the world is in a COVID-19 pandemic [1]. Researchers around the globe are pitching in to develop a fast reliable, non-invasive testing methodology to solve this problem and one of the key directions of research is to utilize coughs and their corresponding vocal biomarkers for diagnosis of COVID-19. In this thesis, we propose a fast, real-time cough detection pipeline that can be used to detect and localize coughs from audio samples. The core of the pipeline utilizes the yolo-v3 model [2] from vision domain to localize coughs in the audio spectrograms by treating them as objects. This outcome is transformed to localize the boundaries of cough utterances in the input signal. The system to detect coughs from CoughVid dataset [3] is then evaluated. Furthermore, the pipeline is compared with other existing algorithms like tinyyolo-v3 to test for better localization and classification. Average precision(AP@0.5) of yolo-v3 and tinyyolo-v3 model are 0.67 and 0.78 respectively. Based on the AP values, tinyyolo-v3 performs better than yolo-v3 by atleast 10% and based on its computational advantage, its inference time was also found to be 2.4 times faster than yolo-v3 model in our experiments. This work is considered to be novel and significant in detection and localization of cough in an audio stream. In the end, the resulting cough events are used to extract MFCC features from it and classifiers were trained to predict whether a cough has COVID-19 or not. The performance of different classifiers were compared and it was observed that random forest outperformed other models with a precision of 83.04%. It can also be inferred from the results that the classifier looks promising, however, in future this model has to be trained using clinically approved dataset and tested for its reliability in using this model in a clinical setup. / Sedan februari 2020 är världen inne i en COVID-19-pandemi [1]. Forskare runt om i världen satsar på att utveckla en snabb tillförlitlig, icke-invasiv testmetodik för att lösa detta problem och en av de viktigaste forskningsriktningarna är att använda hosta och deras motsvarande vokala biomarkörer för diagnos av COVID-19. I denna avhandling föreslår vi en snabb pipeline för hostdetektering i realtid som kan användas för att upptäcka och lokalisera hosta från ljudprover. Kärnan i rörledningen använder yolo-v3-modellen [2] från syndomänen för att lokalisera hosta i ljudspektrogrammen genom att behandla dem som objekt. Detta resultat transformeras för att lokalisera gränserna för hosta yttranden i insignalen. Systemet för att upptäcka hosta från CoughVid dataset [3] utvärderas sedan. Dessutom jämförs rörledningen med andra befintliga algoritmer som tinyyolo-v3 för att testa för bättre lokalisering och klassificering. Genomsnittlig precision (AP@0.5) för modellen yolo-v3 och tinyyolo-v3 är 0,67 respektive 0,78. Baserat på AP-värdena fungerar tinyyolo-v3 bättre än yolo-v3 med minst 10% och baserat på dess beräkningsfördel befanns dess inferenstid också vara 2,4 gånger snabbare än yolo-v3- modellen i våra experiment. Detta arbete anses vara nytt och viktigt för att upptäcka och lokalisera hosta i en ljudström. I slutändan används de resulterande hosthändel-serna för att extrahera MFCC-funktioner från det och klassificerare utbildades för att förutsäga om en hosta har COVID-19 eller inte. Prestanda för olika klassificerare jämfördes och det observerades att slumpmässig skog överträffade andra modeller med en precision på 83.04%. Av resultaten kan man också dra slutsatsen att klassificeraren ser lovande ut, men i framtiden måste denna modell utbildas med hjälp av kliniskt godkänd dataset och testas med avseende på dess tillförlitlighet vid användning av denna modell i ett kliniskt upplägg. Cough detection and Localization Cough based digital biomarkers COVID- 19 You Only Look Once algorithm Hostdetektering och lokalisering Hostbaserade digitala biomarkörer COVID- 19 You Only Look Once algoritm Computer and Information Sciences Data- och informationsvetenskap
2	Automatic Speech Recognition Model for Swedish using Kaldi Wang, Yihan January 2020 (has links) With the development of intelligent era, speech recognition has been a hottopic. Although many automatic speech recognition(ASR) tools have beenput into the market, a considerable number of them do not support Swedishbecause of its small number. In this project, a Swedish ASR model basedon Hidden Markov Model and Gaussian Mixture Models is established usingKaldi which aims to help ICA Banken complete the classification of aftersalesvoice calls. A variety of model patterns have been explored, whichhave different phoneme combination methods and eigenvalue extraction andprocessing methods. Word Error Rate and Real Time Factor are selectedas evaluation criteria to compare the recognition accuracy and speed ofthe models. As far as large vocabulary continuous speech recognition isconcerned, triphone is much better than monophone. Adding feature transformationwill further improve the speed of accuracy. The combination oflinear discriminant analysis, maximum likelihood linear transformand speakeradaptive training obtains the best performance in this implementation. Fordifferent feature extraction methods, mel-frequency cepstral coefficient ismore conducive to obtain higher accuracy, while perceptual linear predictivetends to improve the overall speed. / Det existerar flera lösningar för automatisk transkribering på marknaden, menen stor del av dem stödjer inte svenska på grund utav det relativt få antalettalare. I det här projektet så skapades automatisk transkribering för svenskamed Hidden Markov models och Gaussian mixture models genom att användaKaldi. Detta för att kunna möjliggöra för ICABanken att klassificera samtal tillsin kundtjänst. En mängd av modellvariationer med olika fonemkombinationsmetoder,egenvärdesberäkning och databearbetningsmetoder har utforskats.Word error rate och real time factor är valda som utvärderingskriterier föratt jämföra precisionen och hastigheten mellan modellerna. När det kommertill kontinuerlig transkribering för ett stort ordförråd så resulterar triphonei mycket bättre prestanda än monophone. Med hjälp utav transformationerså förbättras både precisionen och hastigheten. Kombinationen av lineardiscriminatn analysis, maximum likelihood linear transformering och speakeradaptive träning resulterar i den bästa prestandan i denna implementation.För olika egenskapsextraktioner så bidrar mel-frequency cepstral koefficiententill en bättre precision medan perceptual linear predictive tenderar att ökahastigheten. Speech recognition Kaldi Mel-frequency cepstral coefficient Perceptual linear predictive Speaker adaptive training Weight Finite State Transducers Taligenkänning Kaldi Cefstralskoefficient för Mel-Frekvens Perceptuell linjär prediktiv Uppladdning av högtalaren Viktfinitomvandlare Elektroteknik och elektronik
3	Channel Modeling Applied to Robust Automatic Speech Recognition Sklar, Alexander Gabriel 01 January 2007 (has links) In automatic speech recognition systems (ASRs), training is a critical phase to the system?s success. Communication media, either analog (such as analog landline phones) or digital (VoIP) distort the speaker?s speech signal often in very complex ways: linear distortion occurs in all channels, either in the magnitude or phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects which will produce packet losses and delays and repeated packets. Finally, one cannot really assert what path a signal will take, and so having error or distortion in between is almost a certainty. The channel introduces an acoustical mismatch between the speaker's signal and the trained data in the ASR, which results in poor recognition performance. The approach so far, has been to try to undo the havoc produced by the channels, i.e. compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.

1

Page generated in 0.0769 seconds