Global ETD Search

1	Drill Failure Detection based on Sound using Artificial Intelligence Tran, Thanh January 2021 In industry, it is crucial to be able to detect damage or abnormal behavior in machines. A machine's downtime can be minimized by detecting and repairing its faulty components as early as possible. However, detecting machine fault sounds manually is labor-intensive and economically inefficient. Compared with manual failure detection, automatic failure detection systems can reduce operating and personnel costs. Although prior research has identified many methods for detecting failures in drill machines from vibration or sound signals, many challenges remain in this field. Most previous machine learning research has relied on features extracted manually from the raw sound signals and classified with conventional classifiers (SVM, Gaussian mixture models, etc.). However, manual feature extraction and selection can be tedious for researchers, and their choices may be biased, because it is difficult to identify which features are informative and capture an essential description of the sounds for classification. Recent studies have used LSTM, end-to-end 1D CNN, and 2D CNN classifiers, but these achieve limited accuracy for machine failure detection. In addition, machine failures occur very rarely in the data. Moreover, the sounds in real-world datasets have complex waveforms and usually contain noise and the sound of interest at the same time. Given that drill failure detection is essential for detecting machine failures in industry, I felt compelled to propose a system that detects anomalies in drill machines effectively, especially with a small dataset. This thesis proposes modern artificial intelligence methods for detecting drill failures from drill sounds provided by Valmet AB.
Instead of raw sound signals, image representations of the sound signals (Mel spectrograms and log-Mel spectrograms) were used as input to the proposed models. For feature extraction, I proposed using deep 2-D convolutional neural networks (2D-CNN) on these image representations. To classify the three classes in the Valmet AB dataset (anomalous sounds, normal sounds, and irrelevant sounds), I proposed using either conventional machine learning classifiers (KNN, SVM, and linear discriminant) or a recurrent neural network (long short-term memory). With the conventional classifiers, a pre-trained VGG19 was used to extract features and neighborhood component analysis (NCA) was used for feature selection. With long short-term memory (LSTM), a small 2D-CNN was proposed to extract features, and an attention layer after the LSTM focuses on the anomaly in the sound when the drill changes from the normal to the broken state. My findings should allow readers to detect anomalies in drill machines more reliably and to develop a more cost-effective system that performs well on a small dataset. Sounds always contain background and acoustic noise, which reduces the accuracy of a classification system. My hypothesis was that noise suppression methods would improve the accuracy of sound classification. The result of this research is a sound separation method that uses short-time Fourier transform (STFT) frames with overlapping content. In traditional STFT conversion, each sound is converted into a single image; here, the signal is instead split into many STFT frames, which can improve the accuracy of model prediction by increasing the variability of the data. The frames are separated into clean and noisy ones, saved as images, and then fed into a pre-trained CNN for classification. This makes the classifier robust to noise.
The FSDNoisy18k dataset was chosen to demonstrate the effectiveness of the proposed method. In experiments with the proposed approach, a classification accuracy of 94.14 percent was achieved over 21 classes (20 sound-event classes and one noisy class). / At the time of the doctoral defence the following papers were unpublished: papers 2 and 3 (submitted). / AISound – Acoustic sensor setup for AI monitoring systems / MiLo – the environment in the control loop Convolutional neural network machine failure detection Mel-spectrogram long short-term memory sound signal processing Other Computer and Information Science Computer Sciences
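The drill-failure thesis (result 1) describes splitting a recording into many overlapping STFT segments and turning each into a (log-)Mel spectrogram image, rather than producing one image per sound. Below is a minimal NumPy sketch of that preprocessing step under stated assumptions: a 16 kHz signal, 0.5 s segments with 50% overlap, a 512-point FFT with hop 256, and 40 HTK-style Mel bands. All parameters and function names are hypothetical, not taken from the thesis, and the separation of frames into clean and noisy ones is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_image(x, sr, n_fft=512, hop=256, n_mels=40):
    # STFT with a Hann window, then Mel pooling and log compression
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T  # (n_mels, n_frames): one "image" per segment

def overlapping_segments(x, seg_len, seg_hop):
    # Cut the waveform into overlapping segments; each one becomes its own image
    return [x[s:s + seg_len] for s in range(0, len(x) - seg_len + 1, seg_hop)]

sr = 16000
t = np.arange(2 * sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)  # stand-in for a 2 s drill recording
segments = overlapping_segments(signal, seg_len=sr // 2, seg_hop=sr // 4)
images = [log_mel_image(seg, sr) for seg in segments]
print(len(images), images[0].shape)  # 7 (40, 30)
```

A 2 s recording thus yields seven training images instead of one, which is the increase in data variability the abstract refers to; in practice a library routine such as librosa's Mel spectrogram would usually replace the hand-written filterbank.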
2	Multi-objective optimization for model selection in music classification Ujihara, Rintaro January 2021 With the breakthrough of machine learning techniques, research on music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, how music samples should be preprocessed and which classification algorithm should be used are known to depend on the dataset and the objective of each project. The company collaborating on this thesis, Ichigoichie AB, is currently developing a system that categorizes music data into positive and negative classes. To improve the accuracy of the existing system, this project aims to find the best model by experimenting with six audio features (Mel spectrogram, MFCC, HPSS, Onset, CENS, Tonnetz) and several machine learning models, including deep neural network models, for the classification task. Hyperparameters are tuned for each model, and the models are evaluated by Pareto optimality with respect to accuracy and execution time. The results show that the most promising model achieved 95% correct classification with an execution time of less than 15 seconds. Music emotion recognition Mel spectrogram MFCC CENS Onset Tonnetz HPSS 1D convolutional neural network Attention LSTM 1DCNN BiLSTM Pareto optimality Mathematics
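The music-classification thesis (result 2) evaluates models by Pareto optimality over accuracy and execution time. A minimal sketch of that selection rule follows: a candidate stays on the Pareto front unless some other candidate is at least as accurate and at least as fast while being strictly better on one of the two. The candidate names and scores are hypothetical placeholders, not results from the thesis.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    accuracy: float   # higher is better
    exec_time: float  # seconds, lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b: no worse on both objectives and strictly better on at least one
    no_worse = a.accuracy >= b.accuracy and a.exec_time <= b.exec_time
    strictly_better = a.accuracy > b.accuracy or a.exec_time < b.exec_time
    return no_worse and strictly_better

def pareto_front(candidates):
    # Keep every candidate that no other candidate dominates
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical feature/model combinations with made-up scores
candidates = [
    Candidate("model_a", 0.93, 14.0),
    Candidate("model_b", 0.88, 2.0),
    Candidate("model_c", 0.86, 20.0),  # slower and less accurate than model_b
    Candidate("model_d", 0.80, 1.5),
]
print([c.name for c in pareto_front(candidates)])  # ['model_a', 'model_b', 'model_d']
```

The front itself does not pick a single winner; a further rule, for example "the most accurate model within a time budget", would be needed, and that rule is likewise an assumption here.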
3	Tempo detector for music recordings based on a neural network Suchánek, Tomáš January 2021 This Master's thesis deals with beat tracking systems whose functionality is based on neural networks. It describes the structure of these systems and how the signal is processed in their individual blocks. Emphasis is placed on recurrent and temporal convolutional networks, which by their nature can effectively detect tempo and beats in audio recordings. The selected methods, network architectures, and their modifications are implemented within a comprehensive detection system, which is tested and evaluated by cross-validation on a genre-diverse dataset. The results show that the system with the proposed temporal convolutional network architecture produces results comparable to those in other publications. On the SMC dataset, for example, it was the most successful system, whereas on the other datasets its accuracy was slightly below that of state-of-the-art systems. In addition, the proposed network retains low computational complexity despite an increased number of internal parameters.
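Temporal convolutional networks, as emphasized in the tempo-detection thesis (result 3), typically obtain long temporal context by stacking dilated 1-D convolutions. The sketch below shows one dilated convolution and the receptive field of a stack of them; the kernel size of 3 and the dilations 1, 2, 4, ..., 1024 are assumptions for illustration, not the architecture proposed in the thesis.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    # Non-causal dilated convolution with zero padding so the output keeps len(x);
    # like most deep learning frameworks, this is a cross-correlation (no kernel flip)
    k = len(kernel)
    assert k % 2 == 1, "odd kernel size keeps the padding symmetric"
    pad = dilation * (k - 1) // 2
    xp = np.pad(np.asarray(x, dtype=float), (pad, pad))
    return np.array([
        sum(kernel[j] * xp[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    # Each layer widens the context by (kernel_size - 1) * dilation frames
    return 1 + sum((kernel_size - 1) * d for d in dilations)

impulse = np.zeros(9)
impulse[4] = 1.0
print(dilated_conv1d(impulse, [1.0, 2.0, 3.0], dilation=2).tolist())
# [0.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0]

print(receptive_field(3, [2 ** i for i in range(11)]))  # 4095
```

At an assumed frame rate of 100 frames per second, 4095 frames is roughly 41 seconds of context from only 11 layers, which illustrates how a TCN can cover long musical contexts with few layers.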
4	Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification Frenger, Tobias, Häggmark, Johan January 2020 Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well convolutional neural networks trained for object classification on large datasets (such as CIFAR-10 and ImageNet) can be transferred to the audio classification domain when only a small dataset is available. In this work, four different convolutional neural networks are tested with three transfer learning configurations and compared against a configuration without transfer learning. This makes it possible to test how transfer learning and the architectural complexity of the networks affect performance. Two of the models were developed by Google (Inception-V3 and Inception-ResNet-V2); they are implemented using the Keras API and pre-trained on the ImageNet dataset. The thesis also introduces two new architectures developed by its authors, Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of recordings of RC boats, transformed into mel-spectrogram images. To make transfer learning possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on CIFAR-10. The results show that transfer learning does not increase performance. However, in some cases transfer learning enables models to reach higher performance in the earlier stages of training.
Convolutional neural networks Object classification Audio classification Transfer learning Inception-V3 Inception-ResNet-V2 Keras ImageNet Mini-Inception Mini-Inception-ResNet Mel-spectrogram CIFAR-10 Information Systems, Social aspects
5	Wavebender GAN : Deep architecture for high-quality and controllable speech synthesis through interpretable features and exchangeable neural synthesizers Döhler Beck, Gustavo Teodoro January 2021 Modeling human speech is a challenging task that originally required collaboration between phoneticians and speech engineers. Speech engineers, however, working apart from phoneticians, have pursued ever more natural speech synthesis through data-driven and ever-growing deep learning models, without an awareness of speech modelling. Motivated by decades of separation between phoneticians and speech engineers, this thesis presents a deep learning architecture, called Wavebender GAN, that predicts mel-spectrograms which a vocoder, HiFi-GAN, then processes to synthesize speech. Wavebender GAN pushes for progress in both speech science and technology: it allows phoneticians to manipulate stimuli and test phonological models with high-quality synthesized speech generated from interpretable low-level signal properties. This work marks a new step in cooperation between phoneticians and speech engineers. Mel-spectrogram Speech Synthesis Wavebender GAN HiFi-GAN Controllability Interpretability Low-level Signal Properties Computer and Information Sciences
6	Automatic tagging of musical compositions using machine learning methods Semela, René January 2020 Automatic music tagging systems are one of the many challenges of machine learning, particularly because of the complexity of the problem. Such systems can be used in practice for music content analysis or for sorting music libraries. This thesis deals with the design, training, testing, and evaluation of artificial neural network architectures for automatic music tagging. It first sets out the theoretical foundations of the field. In the practical part of the thesis, 8 neural network architectures are designed (4 fully convolutional and 4 convolutional recurrent). These architectures are trained on the MagnaTagATune dataset using mel spectrograms, and then tested and evaluated. The best results are achieved by the four-layer convolutional recurrent neural network (CRNN4), with ROC-AUC = 0.9046 ± 0.0016. Next, a completely new dataset, Last.fm Dataset 2020, is created. It uses the Last.fm and Spotify APIs for data acquisition and contains 100 tags and 122,877 tracks. The most successful architectures are then trained, tested, and evaluated on this new dataset, where the best results are achieved by the six-layer fully convolutional neural network (FCNN6), with ROC-AUC = 0.8590 ± 0.0011. Finally, a simple application is introduced for testing individual neural network architectures on a user-supplied audio file. Overall, the results are similar to those of other papers on the same topic, but the thesis brings several new findings and innovations.
In terms of innovations, a significant reduction in the complexity of individual neural network architectures is achieved while maintaining similar results.
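The music-tagging thesis (result 6) reports its results as ROC-AUC. For multi-label tagging, a common convention (assumed here; the abstract does not say how the scores were averaged) is to compute the AUC separately for each tag and average over tags. The minimal pure-Python sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive track scores higher than a randomly chosen negative one, with ties counting one half. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    # Probability that a random positive outranks a random negative (ties count 1/2)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each tag needs at least one positive and one negative track")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_roc_auc(score_rows, label_rows):
    # Rows are tracks, columns are tags; average the per-tag AUC (macro averaging)
    n_tags = len(score_rows[0])
    per_tag = [
        roc_auc([row[t] for row in score_rows], [row[t] for row in label_rows])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags

# Hypothetical predictions for 4 tracks and 2 tags
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
labels = [[1, 1], [0, 0], [1, 0], [0, 1]]
print(macro_roc_auc(scores, labels))  # 0.625
```

The pairwise loop is quadratic and meant only for clarity; for datasets with over 100,000 tracks, a sort-based implementation or scikit-learn's `roc_auc_score` with macro averaging would be the practical choice.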
7	Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning Joborn, Ludvig, Beming, Mattias January 2022 Deep learning has recently revolutionized many fields, including Voice Activity Detection (VAD). VAD is of great interest to sectors of society concerned with detecting speech in sound signals, such as the police, whose criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNN) have recently become the state-of-the-art method for detecting speech in audio, but the impact of noise and sampling rate on such methods is still not fully understood. Additionally, evaluation metrics from neighboring fields have not yet been integrated into VAD. We trained on four different sampling rates and found that changing the sampling rate could have dramatic effects on the results. We therefore recommend explicitly evaluating CNN-based VAD systems at the sampling rates relevant to their use. Further, with increasing amounts of white Gaussian noise, we observed better performance when increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss why careful consideration is necessary when choosing the main evaluation metric, which leads us to recommend the Polyphonic Sound Detection Score (PSDS). voice activity detection VAD deep learning machine learning ML artificial intelligence AI convolutional neural network CNN deep neural network DNN sound event detection SED mel spectrogram audio processing polyphonic sound detection score PSDS signal processing signal to noise ratio SNR RCRNN sampling rate Gaussian noise Computer Sciences
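The voice-activity-detection thesis (result 7) studies performance under increasing amounts of white Gaussian noise. One common way to control the amount of noise is to scale it to a target signal-to-noise ratio (SNR), so that the signal-to-noise power ratio equals 10^(SNR/10). The sketch below assumes that parameterisation, which the abstract does not specify; the function name and the sine-wave stand-in for speech are hypothetical.

```python
import numpy as np

def add_white_noise(x, snr_db, rng):
    # Scale zero-mean Gaussian noise so that power(x) / power(noise) matches snr_db
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

rng = np.random.default_rng(0)
sr = 16000
clean = np.sin(2 * np.pi * 200.0 * np.arange(10 * sr) / sr)  # stand-in for 10 s of speech
for snr_db in (20.0, 10.0, 0.0):  # lower SNR means more noise
    noisy = add_white_noise(clean, snr_db, rng)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(snr_db, round(float(measured), 1))
```

Because the noise power is defined relative to the signal power, the same SNR gives comparable difficulty across recordings of different loudness, which makes sweeps such as the one above easy to compare.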
8	Machine Learning for Speech Forensics and Hypersonic Vehicle Applications Emily R Bartusiak (6630773) 06 December 2022 Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed-set and open-set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we apply machine learning to a problem in the aerospace domain.
Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and to hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as “words” and HGV trajectories as “sentences.” Next, we learn a “grammar” from the HGV trajectories that describes their flight phase transition patterns. Given “words” from the initial part of an HGV trajectory and the “grammar”, we predict future “words” in the “sentence” (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen during training. Audio processing Computer vision Digital forensics Deep learning machine learning deep learning speech forensics media forensics convolutional neural network transformer convolutional transformer ensemble spectrogram analysis mel spectrogram analysis synthesized speech detection synthesized speech attribution closed set open set t-stochastic neighbor embedding latent space analysis hypersonics hypersonic glide vehicles vehicle classification flight phase prediction stochastic grammar k-nearest neighbors support vector machine probabilistic context-free grammar automatic distillation of structure generalized earley parser transfer learning
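The hypersonics part of result 8 treats flight phases as “words” and learns a “grammar” of phase transitions. The thesis uses a stochastic grammar (its keywords mention a probabilistic context-free grammar and a generalized Earley parser); the sketch below deliberately swaps in a much simpler technique, a first-order Markov (bigram) model over phase labels, only to illustrate the words-and-sentences framing. The phase labels, trajectories, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(trajectories):
    # Count how often each flight phase ("word") is followed by each other phase
    counts = defaultdict(Counter)
    for phases in trajectories:
        for prev, nxt in zip(phases, phases[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_phases(counts, observed, steps):
    # Greedily extend the observed prefix with the most frequent next phase
    sequence = list(observed)
    for _ in range(steps):
        successors = counts.get(sequence[-1])
        if not successors:
            break  # no transition from this phase was seen in training
        sequence.append(successors.most_common(1)[0][0])
    return sequence[len(observed):]

# Hypothetical phase-labelled trajectories ("sentences")
training = [
    ["ascent", "glide", "skip", "glide", "dive"],
    ["ascent", "glide", "dive"],
    ["ascent", "glide", "skip", "glide", "dive"],
]
model = fit_transitions(training)
print(predict_phases(model, ["ascent"], steps=3))  # ['glide', 'dive']
```

A bigram model looks only one phase back, so unlike a grammar it cannot represent longer-range or nested structure in a trajectory; it is shown here only because it is short.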