51

Fault-tolerance and noise and vibration aspects of electrical drives: Application to wind turbines and electrical vehicle traction

Mollet, Yves 06 November 2017
Awareness of human responsibility for global warming has led to various private and public initiatives, up to the international level, to reduce greenhouse-gas emissions. In this context, the development of renewable technologies is targeted in two sectors with a large ecological footprint: electricity production and transportation. In the first sector, wind energy is currently progressing faster than any other renewable energy. But wind turbines still suffer from an overall lack of reliability and accessibility compared with conventional power plants, leading to potentially significant production losses and repair costs. The first part of the present work focuses on improving the reliability of the electrical chain by combining an estimator with a fault-detection algorithm to achieve sensor-fault tolerance, taking advantage of the measurement redundancies already available on doubly-fed induction generator (DFIG) drives. Estimators and sensor-fault detection and isolation (FDI) in DFIGs have been the subject of many research papers. However, most of them consider only one type of measurement, and only a few consider magnetic saturation. A new combination of a closed-loop observer with a cumulative-sum-based FDI technique, which considers magnetic saturation and uses limited computational resources, is proposed here to estimate electromagnetic torque, rotor currents and rotor position for sensor-fault detection and tolerance. This algorithm is then validated in steady state and under moderate transients, unbalanced conditions and misestimation of DFIG parameters. The estimator can also start on the fly during the start-up process of the generator. In the transportation sector, new hybrid and fully electric vehicles are starting to appear on the roads, but they still need significant technological improvements in range and performance, and also in the noise and vibration they produce.
The objectives of the second part of this doctoral thesis relate to this last challenge and consist of the experimental investigation of noise, vibration and harshness (NVH) aspects of an 8/6 switched-reluctance machine (SRM) designed for an electric vehicle (EV). The NVH issues of SRMs, which limit their use in automotive and other domains, have been the subject of various papers. However, most of them focus on modal analysis or detailed phenomena, while a global evaluation of the NVH aspects of SRMs in normal working conditions is rarely made, and reproducible sound metrics are rarely used. A global and relatively fast experimental method to assess the evolution of noise and vibration is proposed. Tests are performed in the transient regime, using continuously varying working conditions when possible, to excite a wide band of frequencies. The resulting current, radial vibration and acoustic noise are presented as spectrograms, so that affected and unaffected frequencies are easy to distinguish, and are compared with the associated loudness and sharpness. Furthermore, the implementation of a new faster-sampled current-hysteresis controller has improved the quality of the control and of the acoustic noise by reducing the current-ripple amplitude and the excitation of resonances. The various tests show that the switching frequency has to be high enough to avoid exciting the ovalization mode of the SRM, but not so high that the noise becomes too sharp. The ripple amplitude also has to be considered to limit the loudness. Therefore, soft chopping, or a reduced DC-bus voltage at low speeds, is to be preferred, with a relatively small hysteresis bandwidth. Finally, the case of an open-phase fault has been investigated, showing amplified even current orders in the vibration and acoustic-noise plots.
/ Awareness of human responsibility for global warming has prompted numerous public and private initiatives, some of them international, to reduce greenhouse-gas emissions. In this context, the development of sustainable technologies is targeted in two sectors with a large ecological footprint: electricity generation and transport. In the first sector, wind power is currently growing faster than any other renewable energy. However, wind turbines suffer from an overall lack of reliability and accessibility compared with conventional power plants, which can lead to significant production losses and repair costs. The first part of this work focuses on improving the electrical chain by making it tolerant to sensor faults through the combination of an estimator and a fault-detection algorithm, taking advantage of the measurement redundancy already present in doubly-fed induction machine (DFIG) drives. Estimators and sensor-fault detection and isolation in DFIGs have been the subject of many scientific publications. However, most of them consider only one type of measurement, and few take magnetic saturation into account. A new combination of an observer and a CUSUM-type fault-detection algorithm, which accounts for magnetic saturation and requires limited computing power, is proposed in this thesis to estimate the electromagnetic torque, the rotor currents and the rotor position with a view to achieving sensor-fault tolerance. The algorithm is validated in steady state and under moderate transients, unbalanced grid voltages and errors in the estimated DFIG parameters. The estimator is also able to start on its own during generator start-up.
In the transport sector, hybrid and electric vehicles are beginning to appear on the roads, although significant technological progress in range, performance, and also noise and vibration is still needed for more intensive use. The objective of the second part of this thesis relates to this last challenge and consists of analysing the acoustic and vibratory behaviour of an 8/6 switched-reluctance machine designed to propel an electric vehicle. These noise and vibration problems, which notably limit the use of such machines in propulsion applications, have been the subject of various scientific articles. However, most of them focus on modal analysis or on specific phenomena, whereas a global evaluation of the noise and vibration problems of switched-reluctance machines under normal operating conditions is rarely offered, and sound-quality criteria are rarely used. A global and relatively fast experimental method for assessing the evolution of noise and vibration is proposed in this work. Tests are carried out in the transient regime to excite a wide band of frequencies, with continuously varying operating conditions where possible. The resulting currents, radial vibrations and acoustic noise are presented as colour maps, so that affected and unaffected frequencies are easy to distinguish, and are compared with the corresponding computed loudness and sharpness levels. In addition, implementing a new current-hysteresis controller with a higher sampling frequency improved the quality of the control and of the associated acoustic noise by reducing the amplitude of the current oscillations and the excitation of resonance frequencies.
The tests show that the switching frequency must be high enough to avoid exciting the ovalization mode of the machine, but not so high that the sound produced becomes too sharp. The amplitude of the oscillations must also be considered in order to limit loudness. Consequently, soft chopping, or a reduced DC-bus voltage at low speed, should be combined with a relatively narrow hysteresis band. Finally, the case of an open-phase fault was studied and showed an amplification of the even current orders in the vibration and acoustic spectra. / Awareness of human responsibility for global warming has led to various private and public initiatives to reduce greenhouse-gas emissions. In this context, the development of renewable technologies is aimed mainly at two sectors with a significant ecological impact: electricity production and transport. In the first sector, wind energy is currently developing faster than all other renewable energies. But wind turbines still suffer from a lack of reliability and accessibility, and hence from potentially higher production losses and repair costs, when compared with conventional power plants. The first part of this doctoral thesis concentrates on improving the electrical chain by combining an estimator with a fault detection and isolation (FDI) algorithm to obtain sensor-fault tolerance, thanks to the measurement redundancy already present in doubly-fed induction machine (DFIG) drives. Estimators and sensor-FDI algorithms have been the subject of many scientific articles. Usually only one sensor type is considered, and magnetic saturation is seldom taken into account.
A new combination of a closed-loop estimator and an FDI technique based on the cumulative-sum principle is proposed. It allows the electromagnetic torque, the rotor currents and the rotor position to be estimated for sensor FDI and fault tolerance at limited computational cost and without neglecting magnetic saturation. The algorithm is validated in steady state, and also under moderate transients, unbalanced grid conditions and misestimated DFIG parameters. It can also start by itself during the start-up procedure of the generator. In the transport sector, hybrid and electric vehicles are beginning to appear on the roads. But more intensive use of such cars still requires technological improvements in range, performance, and noise and vibration (NVH). The second part of the thesis concerns this last challenge and consists of an experimental investigation of noise and vibration on an 8/6 switched-reluctance machine (SRM) developed for electric vehicles. The NVH problems of SRMs limit their use in automotive and other applications, and research on them continues. Many scientific articles nevertheless focus on modal analysis or detailed phenomena, whereas a global evaluation of the NVH aspects of SRMs under ordinary operating conditions is hardly ever made. The same holds for the use of reproducible sound metrics. A global and fairly quick experimental method is proposed here to assess NVH behaviour. Tests are carried out in transient situations to excite a broad frequency band, where possible with continuously varying conditions.
The measured phase current, vibration and noise are plotted as colour maps to make it easier to distinguish affected from unaffected frequencies, and are compared with the computed acoustic loudness and sharpness. Moreover, implementing a faster-sampled current-hysteresis controller has improved the control and acoustic quality by reducing the amplitude of the current ripple and the excitation of resonance frequencies. The test results show that the switching frequency must be high enough to avoid exciting the oval deformation mode, but not too high, in order to limit the sharpness of the noise. The ripple amplitude also influences the loudness and must therefore be taken into account. Consequently, soft-chopping mode, or a lower DC-bus voltage at low speeds, is preferably used with a relatively narrow hysteresis band. Finally, the case of an open-phase fault is studied and reveals amplified even frequency orders in the vibration and noise plots. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
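
The abstract above pairs an estimator with a cumulative-sum (CUSUM) fault-detection test. The following is only a minimal sketch of a generic two-sided CUSUM applied to a residual (measured value minus estimated value), not the thesis's algorithm. The function name, the `drift` and `threshold` parameters and the sample values are all assumptions made for illustration.

```python
def cusum_alarm(residuals, drift, threshold):
    """Two-sided CUSUM test on a residual sequence.

    Returns the index of the first sample at which either cumulative
    sum exceeds `threshold`, or None if no alarm is raised.
    `drift` is the slack subtracted at every step so that small,
    zero-mean residuals do not accumulate.
    """
    g_pos = 0.0  # accumulates positive deviations
    g_neg = 0.0  # accumulates negative deviations
    for k, r in enumerate(residuals):
        g_pos = max(0.0, g_pos + r - drift)
        g_neg = max(0.0, g_neg - r - drift)
        if g_pos > threshold or g_neg > threshold:
            return k
    return None


# Hypothetical residuals: a healthy sensor, then a constant offset fault.
healthy = [0.01, -0.02, 0.015, -0.01, 0.0]
faulty = healthy + [0.5] * 5  # offset appears at sample 5

print(cusum_alarm(healthy, drift=0.05, threshold=1.0))
print(cusum_alarm(faulty, drift=0.05, threshold=1.0))
```

In this sketch the drift sets how large a residual must be before it accumulates at all, and the threshold trades detection delay against false alarms. Both would have to be tuned to the noise level of the actual measurements.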
52

Detektor tempa hudebních nahrávek na bázi neuronové sítě / Tempo detector based on a neural network

Suchánek, Tomáš January 2021
This Master’s thesis deals with beat tracking systems whose functionality is based on neural networks. It describes the structure of these systems and how the signal is processed in their individual blocks. Emphasis is then placed on recurrent and temporal convolutional networks, which by their nature can effectively detect tempo and beats in audio recordings. The selected methods, network architectures and their modifications are then implemented within a comprehensive detection system, which is further tested and evaluated through a cross-validation process on a genre-diverse dataset. The results show that the system, with the proposed temporal convolutional network architecture, produces results comparable with those in foreign publications. For example, on the SMC dataset it proved to be the most successful, whereas on other datasets it was slightly below the accuracy of state-of-the-art systems. In addition, the proposed network retains low computational complexity despite an increased number of internal parameters.
53

Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification

Frenger, Tobias, Häggmark, Johan January 2020
Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well transfer learning of convolutional neural networks, trained in the object classification domain on large datasets (such as CIFAR-10 and ImageNet), can be applied to the audio classification domain when only a small dataset is available. In this work, four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This allows for testing how transfer learning and the architectural complexity of the networks affect the performance. Two of the models, Inception-V3 and Inception-ResNet-V2, were developed by Google. These models are implemented using the Keras API, where they are pre-trained on the ImageNet dataset. This thesis also introduces two new architectures developed by its authors. These are Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of audio from RC boats, which is transformed into mel-spectrogram images. For transfer learning to be possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on the CIFAR-10 dataset. The results show that transfer learning is not able to increase the performance. However, transfer learning does in some cases enable models to obtain higher performance in the earlier stages of training.
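
The abstract above mentions converting audio into mel-spectrogram images. As a hedged illustration of what the mel axis means, the sketch below uses the common 2595·log10(1 + f/700) mapping between hertz and mels. Audio libraries differ in which variant of this formula they use, and the abstract does not say which one the thesis used. The helper names and the band limits are assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in hertz to mels (2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_min, f_max, n_bands):
    """Centre frequencies (Hz) of n_bands bands spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical band layout: 8 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(0.0, 8000.0, 8)
print([round(c) for c in centers])
```

Bands that are evenly spaced in mels are narrow at low frequencies and wide at high frequencies, which is why a mel spectrogram devotes more of its vertical resolution to the lower part of the spectrum.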
54

Wavebender GAN : Deep architecture for high-quality and controllable speech synthesis through interpretable features and exchangeable neural synthesizers / Wavebender GAN : Djup arkitektur för kontrollerbar talsyntes genom tolkningsbara attribut och utbytbara neurala syntessystem

Döhler Beck, Gustavo Teodoro January 2021
Modeling human speech is a challenging task that originally required a coalition between phoneticians and speech engineers. Yet the latter, disengaged from phoneticians, have strived for ever more natural speech synthesis without an awareness of speech modelling, owing to data-driven and ever-growing deep learning models. In light of decades of detachment between phoneticians and speech engineers, this thesis presents a deep learning architecture, called Wavebender GAN, that predicts mel-spectrograms that are processed by a vocoder, HiFi-GAN, to synthesize speech. Wavebender GAN pushes for progress in both speech science and technology, allowing phoneticians to manipulate stimuli and test phonological models supported by high-quality synthesized speech generated through interpretable low-level signal properties. This work sets a new step of cooperation for phoneticians and speech engineers. / Modeling human speech is a challenging task that originally required collaboration between phoneticians and speech engineers. The latter, however, disconnected from the phoneticians, have strived for ever more natural speech synthesis without a deep awareness of speech modelling, owing to data-driven and ever-growing deep learning models. In light of decades of distance between phoneticians and speech engineers, this thesis presents a deep learning architecture, called Wavebender GAN, that predicts mel-spectrograms, which are received by a vocoder, HiFi-GAN, to synthesize speech. Wavebender GAN pushes for progress in both speech science and technology, making it possible for phoneticians to manipulate stimuli and test phonological models supported by high-quality synthesized speech generated through interpretable low-level signal properties. This work begins a new era of cooperation between phoneticians and speech engineers.
55

Kan datorer höra fåglar? / Can Computers Hear Birds?

Movin, Andreas, Jilg, Jonathan January 2019
Sound recognition is made possible by spectral analysis, computed with the fast Fourier transform (FFT), and has in recent years achieved major breakthroughs alongside the growth in computer performance and artificial intelligence. The technology is now widespread, particularly in bioacoustics for identifying animal species, an important part of environmental monitoring. It is still a growing field of science, and recognition of bird song in particular remains a hard challenge. Even the leading algorithms in the field are far from error-free. In this bachelor's thesis, simple algorithms for matching sounds against a sound database were implemented and evaluated. A filtering method was developed to pick out the characteristic frequencies at five time frames, which formed the basis for the comparison and the matching procedure. The sounds used were pre-recorded bird song (blackbird, nightingale, crow and seagull) as well as self-recorded human voices (4 young Swedish men). Our results show that the success rate is normally 50–70%; the lowest was the seagull with 30% for a small database and the highest was the blackbird with 90% for a large database. The voices were harder for the algorithm to tell apart, but overall they had success rates between 50% and 80%. However, increasing the database size did not generally increase the success rate. In summary, this bachelor's thesis demonstrates the proof of concept behind bird-song recognition and illustrates both the strengths and the shortcomings of the simple algorithms developed. The algorithms gave a higher success rate than chance (25%), but there is still room for improvement because the algorithm was misled by sounds of the same frequencies. Further studies are needed to assess the developed algorithm's ability to identify even more birds and voices.
/ Sound recognition is made possible through spectral analysis, computed by the fast Fourier transform (FFT), and has in recent years made major breakthroughs along with the rise of computational power and artificial intelligence. The technology is now used ubiquitously, in particular in the field of bioacoustics for identification of animal species, an important task for wildlife monitoring. It is still a growing field of science, and the recognition of bird song in particular remains a hard challenge. Even state-of-the-art algorithms are far from error-free. In this thesis, simple algorithms to match sounds to a sound database were implemented and assessed. A filtering method was developed to pick out characteristic frequencies at five time frames, which were the basis for comparison and the matching procedure. The sounds used were pre-recorded bird songs (blackbird, nightingale, crow and seagull) as well as human voices (4 young Swedish males) that we recorded. Our findings show success rates typically at 50–70%, the lowest being the seagull at 30% for a small database and the highest being the blackbird at 90% for a large database. The voices were more difficult for the algorithms to distinguish, but they still had an overall success rate between 50% and 80%. Furthermore, increasing the database size did not improve success rates in general. In conclusion, this thesis shows the proof of concept and illustrates both the strengths and the shortcomings of the simple algorithms developed. The algorithms gave better success rates than pure chance (25%), but there is room for improvement since the algorithms were easily misled by sounds of the same frequencies. Further research will be needed to assess the devised algorithms' ability to identify even more birds and voices.
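
The abstract above describes matching a sound against a database using characteristic frequencies taken at five time frames. The sketch below is one possible reading of that idea, not the thesis's filtering method: it takes the strongest DFT bin in each of five frames and picks the database entry with the smallest total frequency difference. The function names, the synthetic test tones and the distance measure are assumptions. A plain DFT is used instead of an FFT to keep the example free of dependencies.

```python
import cmath
import math


def dominant_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest DFT bin in one frame, DC excluded."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def fingerprint(signal, sample_rate, frames=5):
    """Dominant frequency in each of `frames` equal-length segments."""
    size = len(signal) // frames
    return [dominant_frequency(signal[i * size:(i + 1) * size], sample_rate)
            for i in range(frames)]


def best_match(query, database):
    """Name of the database fingerprint closest to `query` (sum of |df|)."""
    return min(database,
               key=lambda name: sum(abs(a - b)
                                    for a, b in zip(query, database[name])))


def tone(freq_hz, n_samples, sample_rate):
    """Synthetic sine tone standing in for a recording."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


SAMPLE_RATE = 8000
database = {
    "tone_low": fingerprint(tone(1000, 320, SAMPLE_RATE), SAMPLE_RATE),
    "tone_high": fingerprint(tone(2000, 320, SAMPLE_RATE), SAMPLE_RATE),
}
query = fingerprint(tone(1125, 320, SAMPLE_RATE), SAMPLE_RATE)
print(best_match(query, database))
```

A fingerprint this coarse cannot separate two sources that share the same dominant frequencies, which is consistent with the limitation the abstract reports.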
56

Exploring State-of-the-Art Machine Learning Methods for Quantifying Exercise-induced Muscle Fatigue / Exploring State-of-the-Art Machine Learning Methods for Quantifying Exercise-induced Muscle Fatigue

Afram, Abboud, Sarab Fard Sabet, Danial January 2023
Muscle fatigue is a severe problem for elite athletes because of the long and variable resting times it entails. Various mechanisms can cause muscle fatigue, which signifies that the specific muscle has reached its maximum force and cannot continue the task. This thesis surveys and explores state-of-the-art methods and tests, systematically, theoretically and practically, the applicability and performance of more recent machine learning methods on an existing EMG-to-muscle-fatigue pipeline. Several challenges exist within the EMG domain, such as inadequate data and finding the most suitable model, and they must be addressed to achieve reliable prediction. Addressing them required combining and comparing various state-of-the-art methodologies, such as data augmentation techniques for upsampling, spectrogram methods for signal processing, and transfer learning with various pre-trained CNN models to obtain a reliable prediction. The approach in this study was to conduct seven experiments consisting of a classification task that aims to predict muscle fatigue in various stages. These stages are divided into 7 classes, numbered 0 to 6, where higher classes represent a more fatigued muscle. In the tabular part of the experiments, a Decision Tree, a Random Forest and a Support Vector Machine (SVM) were trained, and their accuracy was determined. A similar approach was taken for the spectrogram part, where the signals were converted to spectrogram images and the limited dataset was enlarged with a combination of traditional and intelligent data augmentation techniques, such as noise and DCGAN. The performance of the pre-trained CNN models AlexNet, VGG16, DenseNet and InceptionV3 was compared in predicting differences in jump heights. The result was evaluated by implementing baseline classifiers on tabular data and pre-trained CNN model classifiers for CWT and STFT spectrograms, with and without data augmentation.
The evaluation of various state-of-the-art methodologies for a classification problem showed that DenseNet and VGG16 gave a reliable accuracy of 89.8 % on CWT images with intelligent data augmentation. The intelligent data augmentation applied to CWT images allows the pre-trained CNN models to learn features that generalize to unseen data. This shows that a combination of state-of-the-art methods can be introduced to address the challenges within the EMG domain.
57

Machine Learning for Speech Forensics and Hypersonic Vehicle Applications

Emily R Bartusiak (6630773) 06 December 2022
Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed set and open set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we explore machine learning for an objective in the aerospace domain. Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as “words” and HGV trajectories as “sentences.” Next, we learn a “grammar” from the HGV trajectories that describes their flight phase transition patterns. Given “words” from the initial part of an HGV trajectory and the “grammar”, we predict future “words” in the “sentence” (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen before during training.
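
The abstract above treats flight phases as "words" and learns a "grammar" of phase transitions. The abstract does not specify the form of that grammar, so the sketch below substitutes the simplest possible stand-in: a first-order transition table (bigram counts) that predicts the most frequent successor of the current phase. The phase labels and training trajectories are invented for illustration.

```python
from collections import Counter, defaultdict


def learn_transitions(trajectories):
    """Count how often each phase is followed by each other phase."""
    counts = defaultdict(Counter)
    for phases in trajectories:
        for current, following in zip(phases, phases[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, phase):
    """Most frequent successor of `phase`, or None if it was never seen."""
    if phase not in counts:
        return None
    return counts[phase].most_common(1)[0][0]


# Hypothetical phase labels and trajectories, for illustration only.
training = [
    ["boost", "glide", "pull-up", "glide", "dive"],
    ["boost", "glide", "dive"],
    ["boost", "glide", "pull-up", "glide", "dive"],
]
transitions = learn_transitions(training)

print(predict_next(transitions, "boost"))
print(predict_next(transitions, "glide"))
```

A table like this looks only one phase back. Capturing longer patterns would need a richer model, and the abstract leaves open which one the author used.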
58

Towards detection of user-intended tendon motion with pulsed-wave Doppler ultrasound for assistive hand exoskeleton applications

Stegman, Kelly J. 31 August 2009
Current bio-robotic assistive devices have developed into intelligent and dexterous machines. However, the sophistication of these wearable devices remains limited by the inherent difficulty of controlling them by sensing user intention. Even the most commonly used sensing method, which detects the electrical activity of skeletal muscles, offers limited information for multi-function control. An alternative bio-sensing strategy is needed to allow the assistive device to support more complex functionality. In this thesis, a different sensing approach is introduced that uses Pulsed-Wave Doppler ultrasound to non-invasively detect small tendon displacements in the hand. The returning Doppler-shifted signals from the moving tendon are obtained with a new processing technique. This technique involves a unique way to access raw data from a commercial clinical ultrasound machine and to process the signal with Fourier analysis in order to determine the tendon displacements. The feasibility of the proposed sensing method and processing technique is tested with three experiments involving a moving string, a moving biological beef tendon and a moving human hand tendon. Although the proposed signal processing technique will be useful in many clinical applications involving displacement monitoring of biological tendons, its use is demonstrated in this thesis for ultrasound-based user-intention analysis, with the ultimate goal of controlling assistive exoskeletal robotic hands.
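
The abstract above derives tendon displacement from Doppler-shifted ultrasound. As a hedged illustration of the underlying relation, the sketch below applies the standard Doppler equation v = c·f_d / (2·f_0·cos θ) and integrates velocity over time. It does not reproduce the thesis's processing chain. The 1540 m/s sound speed is the conventional soft-tissue value, and the carrier frequency, beam angle and shift values are assumptions.

```python
import math


def doppler_velocity(shift_hz, carrier_hz, angle_deg, sound_speed=1540.0):
    """Target velocity (m/s) along its direction of motion.

    shift_hz    Doppler frequency shift of the echo
    carrier_hz  transmitted ultrasound frequency
    angle_deg   angle between the beam and the direction of motion
    sound_speed assumed speed of sound in the medium (m/s)
    """
    cos_theta = math.cos(math.radians(angle_deg))
    return sound_speed * shift_hz / (2.0 * carrier_hz * cos_theta)


def displacement(shifts_hz, dt, carrier_hz, angle_deg):
    """Integrate velocity over equally spaced shift estimates (metres)."""
    return sum(doppler_velocity(s, carrier_hz, angle_deg) * dt
               for s in shifts_hz)


# Hypothetical numbers: 5 MHz carrier, 60 degree beam angle,
# ten shift estimates of 100 Hz taken 10 ms apart.
velocity = doppler_velocity(100.0, 5e6, 60.0)
moved = displacement([100.0] * 10, 0.01, 5e6, 60.0)
print(f"{velocity * 1000:.1f} mm/s, {moved * 1000:.2f} mm")
```

The cosine term makes the estimate sensitive to the beam angle: as the beam approaches 90 degrees to the motion, the measured shift goes to zero and small angle errors produce large velocity errors.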
59

Automatické tagování hudebních děl pomocí metod strojového učení / Automatic tagging of musical compositions using machine learning methods

Semela, René January 2020
Automatic music tagging is one of the many challenges of machine learning, particularly because of its complexity. Such systems can be used in practice for the content analysis of music or the sorting of music libraries. This thesis deals with the design, training, testing, and evaluation of artificial neural network architectures for automatic tagging of music. It begins by laying the theoretical foundation of the field. In the practical part of this thesis, 8 neural network architectures are designed (4 fully convolutional and 4 convolutional recurrent). These architectures are then trained on the MagnaTagATune Dataset using mel spectrograms. After training, the architectures are tested and evaluated. The best results are achieved by the four-layer convolutional recurrent neural network (CRNN4), with ROC-AUC = 0.9046 ± 0.0016. As the next step of the practical part, a completely new Last.fm Dataset 2020 is created. This dataset uses the Last.fm and Spotify APIs for data acquisition and contains 100 tags and 122877 tracks. The most successful architectures are then trained, tested, and evaluated on this new dataset. The best results on this dataset are achieved by the six-layer fully convolutional neural network (FCNN6), with ROC-AUC = 0.8590 ± 0.0011. Finally, a simple application is introduced for testing individual neural network architectures on a user-supplied audio file. The overall results of this thesis are similar to those of other papers on the same topic, but the thesis brings several new findings and innovations. In terms of innovations, a significant reduction in the complexity of individual neural network architectures is achieved while maintaining similar results.
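
The abstract above reports ROC-AUC scores for multi-tag classifiers. The sketch below shows how ROC-AUC can be computed directly from its pairwise definition (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) and averaged over tags. The abstract does not state how the thesis averaged its scores, so the macro average here is an assumption, and the labels and scores are invented.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag: share of (positive, negative) pairs in which
    the positive example receives the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_rows, score_rows):
    """Unweighted mean of per-tag ROC-AUC; rows are tracks, columns tags."""
    n_tags = len(label_rows[0])
    per_tag = [roc_auc([row[t] for row in label_rows],
                       [row[t] for row in score_rows])
               for t in range(n_tags)]
    return sum(per_tag) / n_tags


# Hypothetical ground truth and predictions: 4 tracks, 2 tags.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]

print(macro_roc_auc(y_true, y_score))
```

The pairwise form is quadratic in the number of examples, so for datasets of the size mentioned in the abstract a rank-based implementation from an established library would be the practical choice.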
60

Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet

Joborn, Ludvig, Beming, Mattias January 2022
Recently, Deep Learning has revolutionized many fields, one of which is Voice Activity Detection (VAD). This is of great interest to sectors of society concerned with detecting speech in sound signals. One such sector is the police, where criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNNs) have recently become the state-of-the-art method of detecting speech in audio. But so far, the impact of noise and sampling rates on such methods remains incompletely understood. Additionally, there are evaluation metrics from neighboring fields that have not yet been integrated into VAD. We trained on four different sampling rates and found that changing the sampling rate could have dramatic effects on the results. As such, we recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed better performance when increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss how careful consideration is necessary when choosing a main evaluation metric, leading us to recommend the Polyphonic Sound Detection Score (PSDS).
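
The abstract above evaluates VAD systems under increasing amounts of white Gaussian noise. The sketch below shows one common way to add such noise at a chosen signal-to-noise ratio: the noise variance is scaled to the measured signal power. The abstract does not describe the thesis's noise procedure, so the function names, the test tone and the 10 dB target are assumptions.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return `signal` plus white Gaussian noise at roughly `snr_db` dB SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical clean signal: one second of a 440 Hz tone at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
noisy = add_white_noise(clean, 10.0, random.Random(0))

print(f"target 10.0 dB, measured {snr_db(clean, noisy):.2f} dB")
```

Passing the random generator in explicitly keeps the augmentation reproducible, which matters when results at different noise levels are compared.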
