41

Estudo da separação entre voz patológica e normal por meio da avaliação da energia global do sinal de voz / Investigation of discrimination between healthy and pathological voice through the analysis of the global energy of the voice signal

Oliveira, Marlice Fernandes de 04 July 2007 (has links)
Voice analysis is an important tool in the diagnosis of laryngeal disorders. Among the signal processing techniques employed for voice analysis, the spectrogram is commonly used, as it visualizes the variation of the signal's energy as a function of both time and frequency. In this context, this study investigates the use of the global energy of the voice signal, estimated from the spectrogram, as a tool for discriminating between signals obtained from healthy and pathological subjects. The research also explored the potential of the global energy to discriminate between distinct laryngeal disorders. In total, 94 subjects took part in the study, of whom 46 were dysphonic and 48 normal. The diagnosis of laryngeal disorders was confirmed by videolaryngoscopic examination. Participants also underwent a clinical acoustic voice examination through the recording of a sustained vowel. The global energy allowed for discrimination between normal and dysphonic voices. Furthermore, the technique could separate the voice signals of patients suffering from left vocal fold paralysis from those of patients with the other investigated disorders. The results suggest the global energy of the signal as an auxiliary and alternative tool for the differential diagnosis between normal and dysphonic voices. / Mestre em Ciências (Master of Science)
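The global-energy measure described above can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline; the frame length, overlap, and sample rate are assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def global_energy(signal, fs, nperseg=512, noverlap=256):
    """Estimate the global energy of a voice signal from its spectrogram.

    Returns the per-frame energy (power summed over frequency) and its
    total, quantities that could then be compared between normal and
    dysphonic voices.
    """
    f, t, Sxx = spectrogram(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    frame_energy = Sxx.sum(axis=0)        # energy of each time frame
    return frame_energy, frame_energy.sum()

# Toy example: a sustained-vowel-like tone at 220 Hz sampled at 16 kHz.
fs = 16000
tt = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 220 * tt)
per_frame, total = global_energy(vowel, fs)
```

In a study like this one, `per_frame` would be computed per recording and its statistics compared across the healthy and pathological groups.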
42

Algoritmo para estimar gravidade de DPOC através de sinais acústicos. / Algorithm to estimate the severity of COPD by acoustic signals.

Rosemeire Cardozo Vidal 11 April 2017 (has links)
The present study aims to determine whether the severity of COPD can be estimated from the area under the sound-intensity curves of respiratory sounds in patients with COPD. The study included 51 patients with mild, moderate, severe, or very severe COPD and 7 healthy non-smokers. The breathing sounds of each participant were collected through a stethoscope fitted with a miniature microphone. The method compares the areas under the sound-intensity-versus-frequency curves of COPD patients and healthy individuals. To this end, a method was proposed and tested based on a combination of filtering techniques and the short-time Fourier transform (TFTC), followed by statistical analysis: calculation of the mean, standard deviation, and interpolation. The results suggest that the area under the curve of the variance of sound intensity as a function of frequency decreases as the severity of COPD increases, except in cases where chronic bronchitis is predominant.
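The area-under-the-curve comparison can be sketched as follows. This is an illustrative reading of the method, with assumed parameters and a synthetic stand-in signal rather than the study's recordings.

```python
import numpy as np
from scipy.signal import spectrogram

def intensity_area(signal, fs, nperseg=1024):
    """Area under the curve of sound-intensity variance vs. frequency.

    Computes the spectrogram, takes the variance of intensity over time
    at each frequency bin, and integrates that curve over frequency
    (trapezoidal rule).
    """
    f, t, Sxx = spectrogram(signal, fs=fs, nperseg=nperseg)
    variance_by_freq = Sxx.var(axis=1)    # variance over time, per frequency bin
    # Trapezoidal integration of the variance curve over frequency.
    return float(np.sum((variance_by_freq[:-1] + variance_by_freq[1:]) / 2
                        * np.diff(f)))

fs = 8000
noise = np.random.default_rng(0).normal(size=fs * 2)  # stand-in for a breath recording
area = intensity_area(noise, fs)
```

Comparing such areas between patient groups is the core of the comparison the abstract describes; the study's own filtering steps are omitted here.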
43

Sparse coding for speech recognition

Smit, Willem Jacobus 11 November 2008 (has links)
The brain is a complex organ that is computationally powerful. Recent research in the field of neurobiology helps scientists to better understand the working of the brain, especially how the brain represents or codes external signals. This research shows that the neural code is sparse: a sparse code is one in which few neurons participate in the representation of a signal. Neurons communicate with each other by sending pulses, or spikes, at certain times. The spikes sent between several neurons over time are called a spike train. A spike train contains all the important information about the signal that it codes. This thesis shows how sparse coding can be used for speech recognition. The recognition process consists of three parts. First, the speech signal is transformed into a spectrogram. Thereafter, a sparse code to represent the spectrogram is found: the spectrogram serves as the input to a linear generative model, and the output of the model is a sparse code that can be interpreted as a spike train. Lastly, a spike-train model recognises the words that are encoded in the spike train. The algorithms that search for sparse codes to represent signals require many computations. We therefore propose an algorithm that is more efficient than current algorithms. The algorithm makes it possible to find sparse codes in reasonable time if the spectrogram is fairly coarse. The system achieves a word error rate of 19% with a coarse spectrogram, while a system based on Hidden Markov Models achieves a word error rate of 15% on the same spectrograms. / Thesis (PhD)--University of Pretoria, 2008. / Electrical, Electronic and Computer Engineering / unrestricted
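The idea of a sparse code over a linear generative model can be illustrated with a simple greedy sparse-coding routine (matching pursuit). This is a generic sketch of the concept, not the thesis's own, more efficient algorithm; the dictionary and signal below are synthetic.

```python
import numpy as np

def matching_pursuit(x, D, n_nonzero=5):
    """Greedy sparse coding: approximate x as D @ a with few nonzero a.

    D has unit-norm columns (the dictionary 'atoms'). Each iteration picks
    the atom most correlated with the residual and removes its projection,
    so only a few coefficients end up nonzero -- a sparse code.
    """
    residual = x.astype(float).copy()
    a = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        corr = D.T @ residual
        k = np.argmax(np.abs(corr))       # best-matching atom
        a[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return a

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 40]        # signal built from two atoms
code = matching_pursuit(x, D, n_nonzero=10)
sparsity = np.count_nonzero(code)
```

In the thesis's setting, `x` would be a spectrogram column and the nonzero coefficients of `code` would play the role of spikes.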
44

Evaluation of CNN in ESM Data Classification by Perspective of Military Utility / Utvärdering av convolutional neural networks för ESM-dataklassifikation genom perspektivet av militär nytta

Johansson, Jimmy January 2020 (has links)
Modern society has seen an increase in automation using AI in a variety of applications. To keep up with recent developments, it is therefore logical to investigate the application of AI programs to military tasks. The great advantage of automation lies in the possible increase in efficiency and the possible relocation of personnel resources to other tasks. This study therefore aims to evaluate the use of Convolutional Neural Networks (CNNs) in the classification of communication and radar emitters based on collected Electronic Support Measures (ESM) data, and to estimate to what extent human analysts could be replaced. The evaluation was performed by applying the concept of Military Utility as a framework, with the addition of Technology Readiness Level (TRL) to survey how far the technology has developed. Data was collected using two methods: firstly, through a literature review of research on the application of CNNs to classifying information such as spectrograms and images; secondly, by interviewing a subject matter expert from SAAB, who mainly helped estimate the TRL of the technology's components. The study found that CNNs appear suitable for the proposed task and that such a program could potentially replace human analysts to a great extent, at least for routine classifications. Full automation seems unlikely, as analysts would still be required for more challenging classifications, especially those outside the range of the training data used to teach the CNN. Finally, challenges involved with deep learning programs' inherent structure, demands, and application to military tasks are discussed, and subjects for future research are proposed.
45

Jamming Detection and Classification via Conventional Machine Learning and Deep Learning with Applications to UAVs

Yuchen Li (11831105) 13 December 2021 (has links)
With the constant advancement of modern radio technology, the safety of radio communication has become a growing concern. Communication has become an essential component, particularly in applications of modern technology such as unmanned aerial vehicles (UAVs). As a result, it is critical to ensure that a drone can fly safely and reliably while completing its duties. Simultaneously, machine learning (ML) has been developing rapidly in the twenty-first century; for example, ML is currently used in social media and digital marketing to predict and address users' varied interests. This also serves as the impetus for this thesis, whose goal is to combine ML and radio communication to identify and classify UAV interference with high accuracy.

In this work, an ML approach is explored for detecting and classifying jamming attacks against orthogonal frequency division multiplexing (OFDM) receivers, with applicability to UAVs. Four types of jamming attacks, including barrage, protocol-aware, single-tone, and successive-pulse jamming, are launched and analyzed using software-defined radio (SDR). The jamming range, launch complexity, and attack severity of each type are evaluated qualitatively. A systematic testing procedure is then established, in which an SDR is placed in the vicinity of a drone to extract radiometric features before and after a jamming attack is launched. Traditional ML methods are used to create classification models from numerical features such as signal-to-noise ratio (SNR), energy threshold, and important OFDM parameters. Furthermore, a deep learning method (convolutional neural networks) is used to develop classification models trained on spectrogram images. Quantitative indicators such as detection and false-alarm rates are used to evaluate the performance of both methods. The spectrogram-based model correctly classifies jamming with a precision of 99.79% and a false-alarm rate of 0.03%, compared to 92.20% and 1.35% for the feature-based counterpart.
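One of the numerical features mentioned above, SNR, can be sketched as follows. The synthetic carrier and jammers here are illustrative stand-ins, not the thesis's SDR captures, and the jammer amplitudes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def snr_db(clean, received):
    """Estimate SNR in dB of a received signal, given the clean reference.

    The residual (received - clean) is treated as the noise/jamming term.
    """
    noise = received - clean
    return 10 * np.log10(np.sum(clean**2) / np.sum(noise**2))

t = np.arange(0, 1, 1 / 1000)
carrier = np.sin(2 * np.pi * 50 * t)

# Barrage-style jamming: broadband noise added across the whole band.
barrage = carrier + 0.8 * rng.normal(size=t.size)
# Single-tone jamming: one strong interfering tone.
single_tone = carrier + 1.5 * np.sin(2 * np.pi * 120 * t)

features = {"barrage": snr_db(carrier, barrage),
            "single_tone": snr_db(carrier, single_tone)}
```

Features like these, computed before and after an attack, would feed the traditional classifiers; the spectrogram images of the same signals would feed the CNN branch.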
46

Drill Failure Detection based on Sound using Artificial Intelligence

Tran, Thanh January 2021 (has links)
In industry, it is crucial to be able to detect damage or abnormal behavior in machines. A machine's downtime can be minimized by detecting and repairing faulty components as early as possible. It is, however, economically inefficient and labor-intensive to detect machine fault sounds manually. In comparison with manual machine failure detection, automatic failure detection systems can reduce operating and personnel costs. Although prior research has identified many methods to detect failures in drill machines using vibration or sound signals, this field still presents many challenges. Most previous research using machine learning techniques has been based on features that are extracted manually from the raw sound signals and classified using conventional classifiers (SVM, Gaussian mixture models, etc.). However, manual extraction and selection of features may be tedious for researchers, and their choices may be biased, because it is difficult to identify which features are good and contain an essential description of the sounds for classification. Recent studies have used LSTMs, end-to-end 1D CNNs, and 2D CNNs as classifiers, but these have limited accuracy for machine failure detection. Besides, machine failures occur very rarely in the data, and the sounds in real-world datasets have complex waveforms, usually a combination of noise and sound present at the same time. Given that drill failure detection is essential in industry, I propose a system that can detect anomalies in the drill machine effectively, especially on a small dataset. This thesis proposes modern artificial intelligence methods for the detection of drill failures using drill sounds provided by Valmet AB. Instead of raw sound signals, image representations of the sound signals (Mel spectrograms and log-Mel spectrograms) were used as the input of the proposed models.

For feature extraction, I propose using deep 2-D convolutional neural networks (2D-CNNs) to extract features from the image representations of the sound signals. To classify the three classes in the dataset from Valmet AB (anomalous sounds, normal sounds, and irrelevant sounds), I propose using either conventional machine learning classifiers (KNN, SVM, and linear discriminant) or a recurrent neural network (long short-term memory, LSTM). For the conventional machine learning classifiers, a pre-trained VGG19 was used to extract features, with neighborhood component analysis (NCA) for feature selection. For the LSTM, a small 2D-CNN was proposed to extract features, with an attention layer after the LSTM to focus on the anomaly in the sound when the drill changes from the normal to the broken state. These findings will allow readers to better detect anomalies in drill machines and to develop a more cost-effective system that performs well on a small dataset. There is always background noise and acoustic noise in the recordings, which affects the accuracy of the classification system. My hypothesis was that noise suppression methods would improve the accuracy of the sound classification application. The result of this research is a sound separation method using short-time Fourier transform (STFT) frames with overlapping content. Unlike traditional STFT conversion, in which every sound is converted into one image, the signal is split into many STFT frames, which can improve the accuracy of model prediction by increasing the variability of the data. Images of these frames, separated into clean and noisy ones, are saved and subsequently fed into a pre-trained CNN for classification, which enables the classifier to become robust to noise. The FSDNoisy18k dataset was chosen to demonstrate the efficiency of the proposed method. In experiments using the proposed approach, 94.14 percent of 21 classes were classified successfully, including 20 classes of sound events and a noisy class. / At the time of the doctoral defence the following papers were unpublished: papers 2 and 3 submitted. / AISound – Akustisk sensoruppsättning för AI-övervakningssystem / MiLo — miljön i kontrolloopen
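The overlapped STFT framing idea can be sketched as follows. The frame length and hop size are illustrative assumptions, not the thesis's exact configuration, and the signal is a synthetic stand-in for a drill recording.

```python
import numpy as np

def stft_frames(signal, frame_len=256, hop=128):
    """Split a signal into overlapped, windowed frames and return one
    magnitude spectrum per frame.

    Each spectrum can be rendered as an image slice; splitting one
    recording into many such frames increases the variability of the
    training data, as described above.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per frame

fs = 8000
t = np.arange(fs) / fs
drill_like = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).normal(size=fs)
spectra = stft_frames(drill_like)
```

With 8000 samples, a 256-sample frame, and a 128-sample hop, this yields 61 overlapping frames, each with 129 frequency bins.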
47

Multi-objective optimization for model selection in music classification / Flermålsoptimering för modellval i musikklassificering

Ujihara, Rintaro January 2021 (has links)
With the breakthrough of machine learning techniques, research concerning music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, how to preprocess music samples and which classification algorithm to choose depend on the data set and the objective of each project. The collaborating company of this thesis, Ichigoichie AB, is currently developing a system to categorize music data into positive/negative classes. To enhance the accuracy of the existing system, this project aims to find the best model through experiments with six audio features (Mel spectrogram, MFCC, HPSS, Onset, CENS, Tonnetz) and several machine learning models, including deep neural networks, for the classification task. For each model, hyperparameter tuning is performed, and model evaluation is carried out according to Pareto optimality with regard to accuracy and execution time. The results show that the most promising model accomplished 95% correct classification with an execution time of less than 15 seconds.
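Model selection by Pareto optimality over accuracy and execution time can be sketched as follows. The candidate accuracies and times below are made-up illustrative values, not the thesis's results.

```python
import numpy as np

def pareto_front(accuracy, exec_time):
    """Return indices of Pareto-optimal models: no other model is at least
    as accurate AND at least as fast, with strict improvement in one.
    """
    keep = []
    for i in range(len(accuracy)):
        dominated = any(
            accuracy[j] >= accuracy[i] and exec_time[j] <= exec_time[i]
            and (accuracy[j] > accuracy[i] or exec_time[j] < exec_time[i])
            for j in range(len(accuracy)))
        if not dominated:
            keep.append(i)
    return keep

acc = [0.95, 0.93, 0.90, 0.95]     # hypothetical model accuracies
time = [14.0, 8.0, 5.0, 20.0]      # hypothetical execution times (s)
front = pareto_front(acc, time)    # model 3 is dominated by model 0
```

The remaining front members represent the accuracy-vs-time trade-off from which a final model would be chosen.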
48

Measures of Voice Onset Time: A Methodological Study

Rae, Rebecca C. 03 May 2018 (has links)
No description available.
49

Long-Term Ambient Noise Statistics in the Gulf of Mexico

Snyder, Mark Alan 15 December 2007 (has links)
Long-term omni-directional ambient noise was collected at several sites in the Gulf of Mexico during 2004 and 2005. The Naval Oceanographic Office deployed bottom-moored Environmental Acoustic Recording System (EARS) buoys approximately 159 nautical miles south of Panama City, Florida, in water depths of 3200 meters. The hydrophone of each buoy was 265 meters above the bottom. The data duration ranged from 10 to 14 months. The buoys were located near a major shipping lane, with an estimated 1.5 to 4.5 ships per day passing nearby. The data were sampled at 2500 Hz and have a bandwidth of 10-1000 Hz. Data are processed in eight 1/3-octave frequency bands, centered from 25 to 950 Hz, and monthly values of the following statistical quantities are computed from the resulting eight time series of noise spectral level: mean, median, standard deviation, skewness, kurtosis, and coherence time. Four hurricanes were recorded during the summer of 2004, and they have a major impact on all of the noise statistics. Noise levels at higher frequencies (400-950 Hz) peak during extremely windy months (summer hurricanes and winter storms). Standard deviation is least in the region 100-200 Hz but increases at higher frequencies, especially during periods of high wind variability (summer hurricanes). Skewness is positive from 25-400 Hz and negative from 630-950 Hz. Skewness and kurtosis are greatest near 100 Hz. Coherence time is low in shipping bands and high in weather bands, and it peaks during hurricanes. The noise coherence is also analyzed. The 14-month time series in each 1/3-octave band is highly correlated with the time series of bands ranging from two octaves below to two octaves above the band's center frequency. Spatial coherence between hydrophones is also analyzed for hydrophone separations of 2.29, 2.56, and 4.84 km over a 10-month period. The noise field is highly coherent out to the maximum distance studied, 4.84 km. Additionally, fluctuations of each time series are analyzed to determine the time scales of greatest variability. The 14-month data show clearly that variability occurs primarily over three time scales: 7-22 hours (shipping-related), 56-282 hours (2-12 days, weather-related), and 8-12 months.
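The per-band statistics listed above can be sketched as follows. The input series here is synthetic (a slow weather-like cycle plus random fluctuations), standing in for an EARS buoy 1/3-octave-band noise-level record.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def band_statistics(levels_db):
    """Summary statistics of a 1/3-octave-band noise-level time series:
    mean, median, standard deviation, skewness, and kurtosis.
    """
    x = np.asarray(levels_db, dtype=float)
    return {"mean": x.mean(),
            "median": np.median(x),
            "std": x.std(),
            "skewness": skew(x),
            "kurtosis": kurtosis(x)}

rng = np.random.default_rng(7)
hours = np.arange(24 * 30)                 # one month of hourly samples
# Synthetic levels: ~85 dB baseline, 5-day weather-like cycle, random noise.
levels = 85 + 3 * np.sin(2 * np.pi * hours / (24 * 5)) + rng.normal(0, 1, hours.size)
stats = band_statistics(levels)
```

In the study these quantities are computed monthly for each of the eight bands; coherence time would additionally require the autocorrelation of each series, which is omitted here.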
50

Suivi de formants par analyse en multirésolution / Formant tracking by Multiresolution Analysis

Jemâa, Imen 19 February 2013 (has links)
Our research work presented in this thesis aims at optimizing the performance of formant tracking algorithms. We began by analyzing the different existing techniques used in automatic formant tracking. This analysis showed that automatic formant estimation remains difficult despite the use of complex techniques. Given the lack of reference databases in Arabic, we developed a phonetically balanced corpus in Arabic, with manual phonetic and formant labeling. We then present our two new automatic formant tracking approaches. The first is based on the estimation of Fourier ridges (local maxima of the spectrogram) or wavelet ridges (local maxima of the scalogram), using as a tracking constraint the center of gravity of the set of candidate frequencies for each formant; the second is based on dynamic programming combined with Kalman filtering. Finally, we conducted an exploratory study using the manually labeled corpus as a reference to evaluate our two new approaches quantitatively against other automatic formant tracking methods. We tested the first approach, based on wavelet ridge detection with the center-of-gravity constraint, on synthetic signals and then on real signals from our database, using three types of complex wavelets (CMOR, SHAN, and FBSP). Following these tests, it appears that the formant tracking and scalogram resolution given by the CMOR and FBSP wavelets are better than those of the SHAN wavelet. To evaluate our two approaches quantitatively, we calculated the mean absolute difference and the normalized standard deviation. We ran several tests with different speakers (male and female) on various long and short vowels and on continuous speech signals from our database, using its labels as the reference. The formant tracking results were then compared to those of the Fourier ridges method with the center-of-gravity constraint, of LPC analysis combined with the filter banks method of Mustafa Kamran, and of the LPC analysis integrated in the Praat software. According to the results for the vowels /a/ and /A/, we found that formant tracking by the CMOR wavelet method is generally better than the other methods, Praat and Fourier. This method therefore provides a relevant tracking of the formants (F1, F2, and F3) that is closer to the reference. The results of the Fourier and wavelet methods are very similar in some cases, since both show fewer errors than the Praat method for the five male speakers; this does not hold for the other vowels, where errors appear sometimes on F2 and sometimes on F3. According to the results obtained on continuous speech, we found that in the case of male speakers the results of the two new approaches are notably better than those of the Mustafa Kamran LPC method and those of Praat, even though they often show some errors on F3. They are also very close to the Fourier ridge method using the center-of-gravity constraint. The results obtained for female speakers confirm the trend observed for the male speakers.
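The center-of-gravity tracking constraint can be sketched as follows: the candidate frequencies found for one formant in one frame are merged into a single estimate by an amplitude-weighted average. The candidate values below are hypothetical, chosen near the first formant of /a/, and the weighting scheme is an illustrative reading of the constraint.

```python
import numpy as np

def center_of_gravity(freqs, amps):
    """Merge candidate frequencies for one formant into a single estimate
    via their amplitude-weighted center of gravity.
    """
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    return float(np.sum(freqs * amps) / np.sum(amps))

# Hypothetical spectral-ridge candidates near F1 of /a/, with ridge amplitudes.
candidates = [650.0, 700.0, 730.0]
amplitudes = [0.4, 1.0, 0.6]
f1_estimate = center_of_gravity(candidates, amplitudes)
```

Applied frame by frame, this yields one frequency per formant per frame, which the tracker then links over time.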
