  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

EMONAS : Evolutionary Multi-objective Neuron Architecture Search of Deep Neural Network / EMONAS : Evolutionär multi-objektiv neuronarkitektursökning av djupa neurala nätverk för inbyggda system

Feng, Jiayi January 2023
Customized Deep Neural Network (DNN) accelerators have become increasingly popular in various applications, from autonomous driving and natural language processing to healthcare and finance. However, deploying them directly on embedded system peripherals within real-time operating systems (RTOS) is not easy, because of the mismatch between the complexity of DNNs and the simplicity of embedded system devices. As a result, DNN implementation on embedded system devices requires customized accelerators with tailored hardware, owing to the DNNs’ numerous computations, latency, power consumption, etc. Moreover, the computational capacity provided by powerful microprocessors or graphics processing units (GPUs) is necessary to unleash the full potential of DNNs, but these computational resources are often not readily available in embedded system devices. In this thesis, we propose an innovative method to evaluate and improve the efficiency of DNN implementation within the constraints of resource-limited embedded system devices. The Evolutionary Multi-Objective Neuron Architecture Search-Binary One Optimization (EMONAS-BOO) optimizes both the image classification accuracy and the innovative Binary One Optimization (BOO) objective, using Multiple Objective Optimization (MOO) methods. EMONAS-BOO automates neural network searching and training, and the diversity of the neural network architectures is maintained by an evolutionary algorithm that consists of tournament selection, polynomial mutation, and point crossover mechanisms. Binary One Optimization (BOO) is used to evaluate the difficulty of implementing DNNs on resource-limited embedded system peripherals, employing a binary format for DNN weights. A deeper implementation of the innovative Binary One Optimization will significantly improve not only computation efficiency but also memory storage, power dissipation, etc.
It is based on the reduction of weights binary 1’s that need to be computed and stored, where the reduction of binary 1 brings reduced arithmetic operations and thus simplified neural network structures. In addition, analyzed from a digital circuit waveform perspective, the embedded system, in interpreting the neural network, will register an increase in zero weights leading to a reduction in voltage transition frequency, which, in turn, benefits power efficiency improvement. The proposed EMONAS employs the MOO method which optimizes two objectives. The first objective is image classification accuracy, and the second objective is Binary One Optimization (BOO). This approach enables EMONAS to outperform manually constructed and randomly searched DNNs. Notably, 12 out of 100 distinct DNNs maintained their image classification accuracy. At the same time, they also exhibit superior BOO performance. Additionally, the proposed EMONAS ensures automated searching and training of DNNs. It achieved significant reductions in key performance metrics: Compared with random search, evolutionary-searched BOO was lowered by up to 85.1%, parameter size by 85.3%, and FLOPs by 83.3%. These improvements were accomplished without sacrificing the image classification accuracy, which saw an increase of 8.0%. These results demonstrate that the EMONAS is an excellent choice for optimizing innovative objects that did not exist before, and greater multi-objective optimization performance can be guaranteed simultaneously if computational resources are adequate. / Customized Deep Neural Network (DNN)-acceleratorer har blivit alltmer populära i olika applikationer, från autonom körning och naturlig språkbehandling till sjukvård och ekonomi, etc. Att distribuera dem direkt på kringutrustning för inbyggda system inom realtidsoperativsystem (RTOS) är dock inte lätt på grund av paradoxen med komplexiteten hos DNN och enkelheten hos inbyggda systemenheter. 
Som ett resultat kräver DNNimplementering på inbäddade systemenheter skräddarsydda acceleratorer med skräddarsydd hårdvara på grund av deras många beräkningar, latens, strömförbrukning, etc. Dessutom är beräkningskapaciteten, som tillhandahålls av potenta mikroprocessorer eller grafikprocessorer (GPU), nödvändig för att frigöra den fulla potentialen hos DNN, men dessa beräkningsresurser är ofta inte lätt tillgängliga i inbyggda systemenheter. I den här avhandlingen föreslår vi en innovativ metod för att utvärdera och förbättra effektiviteten av DNN-implementering inom begränsningarna av resursbegränsade inbäddade systemenheter. Den evolutionära Multi-Objective Neuron Architecture Search-Binary One Optimization (EMONAS-BOO) optimerar både bildklassificeringsnoggrannheten och de innovativa Binary One Optimization (BOO) målen, med Multiple Objective Optimization (MOO) metoder. EMONAS-BOO automatiserar sökning och träning av neurala nätverk, och de neurala nätverksarkitekturernas mångfald garanteras också med hjälp av en evolutionär algoritm som består av turneringsval, polynommutation och punktövergångsmekanismer. Binary One Optimization (BOO) används för att utvärdera svårigheten att implementera DNN på resursbegränsade kringutrustning för inbäddade system, med ett binärt format för DNN-vikter. En djupare implementering av den innovativa Binary One Optimization kommer att avsevärt öka inte bara beräkningseffektiviteten utan också minneslagring, effektförlust, etc. Den är baserad på minskningen av vikter binära 1:or som behöver beräknas och lagras, där minskningen av binär 1 ger minskade aritmetiska operationer och därmed förenklade neurala nätverksstrukturer. Dessutom, analyserat ur ett digitalt kretsvågformsperspektiv, kommer det inbäddade systemet, vid tolkning av det neurala nätverket, att registrera en ökning av nollvikter, vilket leder till en minskning av spänningsövergångsfrekvensen, vilket i sin tur gynnar en förbättring av effekteffektiviteten. 
Den föreslagna EMONAS använder MOO-metoden som optimerar två mål. Det första målet är bildklassificeringsnoggrannhet och det andra målet är Binary One Optimization (BOO). Detta tillvägagångssätt gör det möjligt för EMONAS att överträffa manuellt konstruerade och slumpmässigt genomsökta DNN. Noterbart behöll 12 av 100 distinkta DNN:er sin bildklassificeringsnoggrannhet. Samtidigt uppvisar de också överlägsen BOOprestanda. Dessutom säkerställer den föreslagna EMONAS automatisk sökning och utbildning av DNN. Den uppnådde betydande minskningar av nyckelprestandamått: BOO sänktes med upp till 85,1%, parameterstorleken med 85,3% och FLOP:s med 83,3%. Dessa förbättringar åstadkoms utan att offra bildklassificeringsnoggrannheten, som såg en ökning med 8,0%. Dessa resultat visar att EMONAS är ett utmärkt val för att optimera innovativa objekt som inte existerade tidigare, och större multi-objektiv optimeringsprestanda kan garanteras samtidigt om beräkningsresurserna är tillräckliga.
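The Binary One Optimization objective in the EMONAS abstract is described only as reducing the number of binary 1s in the stored weights. The sketch below illustrates one way such a count could be computed. The 8-bit two's-complement fixed-point format, the `scale` parameter and all function names are assumptions made for illustration; the abstract does not specify the encoding the thesis uses.

```python
def quantize(weight: float, bits: int = 8, scale: float = 1.0) -> int:
    """Round a float weight to a signed fixed-point integer (assumed format)."""
    limit = (1 << (bits - 1)) - 1          # e.g. 127 for 8 bits
    q = round(weight / scale * limit)
    return max(-limit - 1, min(limit, q))  # clip to the representable range


def to_twos_complement(value: int, bits: int = 8) -> int:
    """Map a signed integer onto its unsigned two's-complement bit pattern."""
    return value & ((1 << bits) - 1)


def binary_ones(weights, bits: int = 8, scale: float = 1.0) -> int:
    """Total number of 1 bits across all quantized weights (hypothetical BOO score)."""
    return sum(
        bin(to_twos_complement(quantize(w, bits, scale), bits)).count("1")
        for w in weights
    )


if __name__ == "__main__":
    # 0.0 -> 00000000, 1.0 -> 01111111, -1.0 -> 10000001
    print(binary_ones([0.0, 1.0, -1.0]))
```

Under these assumptions a lower score means fewer set bits to store and process, which is the quantity a second objective could minimize alongside classification accuracy.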
52

Multimodal Deep Learning for Multi-Label Classification and Ranking Problems

Dubey, Abhishek January 2015
In recent years, deep neural network models have been shown to outperform many state-of-the-art algorithms. The reason is that unsupervised pretraining with multi-layered deep neural networks has been shown to learn better features, which in turn improves many supervised tasks. These models not only automate the feature extraction process but also provide robust features for various machine learning tasks. However, unsupervised pretraining and feature extraction using multi-layered networks are restricted to the input features and do not extend to the output. The performance of many supervised learning algorithms (or models) depends on how well they handle the output dependencies [Dembczyński et al., 2012]. Adapting standard neural networks to handle these output dependencies for any specific type of problem has been an active area of research [Zhang and Zhou, 2006, Ribeiro et al., 2012]. On the other hand, inference on multimodal data is considered a difficult problem in machine learning, and recently ‘deep multimodal neural networks’ have shown significant results [Ngiam et al., 2011, Srivastava and Salakhutdinov, 2012]. These models have been shown to perform very well on several problems, such as classification with complete or missing modality data and generation of the missing modality. In this work, we consider three nontrivial supervised learning tasks: (i) multi-class classification (MCC), (ii) multi-label classification (MLC) and (iii) label ranking (LR), listed in order of increasing output complexity. While multi-class classification deals with predicting one class for every instance, multi-label classification deals with predicting more than one class for every instance, and label ranking deals with assigning a rank to each label for every instance. Existing work in this field centers on formulating new error functions that force the network to identify the output dependencies.
The aim of our work is to adapt neural networks to implicitly handle feature extraction (dependencies) for the output within the network structure, removing the need for hand-crafted error functions. We show that multimodal deep architectures can be adapted for these types of problems (or data) by treating labels as one of the modalities. This also brings unsupervised pretraining to the output as well as the input. We show that these models not only outperform standard deep neural networks but also outperform standard adaptations of neural networks for individual domains, under various metrics and over the several data sets we considered. We observe that the advantage of our models over other models grows as the complexity of the output/problem increases.
53

Odhad kanálu v OFDM systémech pomocí deep learning metod / Utilization of deep learning for channel estimation in OFDM systems

Hubík, Daniel January 2019
This paper describes a wireless communication model based on IEEE 802.11n. Typical methods for channel equalisation and estimation are described, such as the least squares method and the minimum mean square error method. Equalisation based on deep learning was used as well. Coded and uncoded bit error rates were used as performance indicators. Experiments with the topology of the neural network were performed. The programming languages MATLAB and Python were used in this work.
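The least squares method named in this abstract can be illustrated with a minimal sketch. It assumes a known pilot symbol on every subcarrier and a noiseless, flat per-subcarrier channel; the function names and the example values are hypothetical and are not taken from the thesis.

```python
def ls_channel_estimate(tx_pilots, rx_pilots):
    """Per-subcarrier least-squares estimate: H_k = Y_k / X_k."""
    return [y / x for x, y in zip(tx_pilots, rx_pilots)]


def equalize(rx_symbols, channel):
    """One-tap zero-forcing equalisation: X_k = Y_k / H_k."""
    return [y / h for y, h in zip(rx_symbols, channel)]


if __name__ == "__main__":
    # Assumed true channel on four subcarriers (illustrative values only).
    true_channel = [0.5 + 0.5j, 1.0 - 0.25j, -0.75 + 0.1j, 0.2 + 0.9j]
    pilots = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]            # known BPSK pilots
    received_pilots = [h * x for h, x in zip(true_channel, pilots)]

    estimate = ls_channel_estimate(pilots, received_pilots)

    data = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]              # QPSK data symbols
    received_data = [h * x for h, x in zip(true_channel, data)]
    print(equalize(received_data, estimate))
```

With noise present, the same division amplifies noise on weak subcarriers, which is the usual motivation for the minimum mean square error alternative the abstract also mentions.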
54

Generátor neuronových sítí pro potřeby měření podobnosti obrazu / Neural network generator for image similarity measurement

Hipča, Tomáš January 2019
This thesis deals with the design of an automatic generator of deep neural networks for image classification. The theoretical part explains what a neural network and a formal neuron are, and presents the types of neural network architectures. The thesis focuses on convolutional neural networks, and several pieces of research from this field are mentioned. The practical part describes the implementation of the neural network generator, along with the frameworks and programming languages that could be used for it. A brief description of the implementation itself and of the implemented layers is also given. The generated neural networks are tested on the Google-Landmarks dataset and the results are discussed.
55

Anomaly Detection and Security Deep Learning Methods Under Adversarial Situation

Miguel Villarreal-Vasquez (9034049) 27 June 2020
Advances in Artificial Intelligence (AI), or more precisely in Neural Networks (NNs), and fast processing technologies (e.g. Graphics Processing Units or GPUs) in recent years have positioned NNs as one of the main machine learning algorithms used to solve a diversity of problems in both academia and industry. While they have proved effective in solving many tasks, the lack of security guarantees and of understanding of their internal processing hinders their wide adoption in general and cybersecurity-related applications. In this dissertation, we present the findings of a comprehensive study aimed at enabling the adoption of state-of-the-art NN algorithms in the development of enterprise solutions. Specifically, this dissertation focuses on (1) the development of defensive mechanisms to protect NNs against adversarial attacks and (2) the application of NN models for anomaly detection in enterprise networks.
This work makes the following contributions. First, we performed a thorough study of the different adversarial attacks against NNs. We concentrate on the attacks referred to as trojan attacks and introduce a novel model hardening method that removes any trojan (i.e. misbehavior) inserted into the NN models at training time. We carefully evaluate our method and establish the correct metrics to test the efficiency of defensive methods against these types of attacks: (1) accuracy with benign data, (2) attack success rate, and (3) accuracy with adversarial data. Prior work evaluates solutions using the first two metrics only, which do not suffice to guarantee robustness against untargeted attacks. We compare our method with the state of the art, and the results show that our method outperforms it. Second, we proposed a novel approach to detect anomalies using LSTM-based models. Our method analyzes at runtime the event sequences generated by the Endpoint Detection and Response (EDR) system of a renowned security company and efficiently detects uncommon patterns. The new detection method is compared with the EDR system, and the results show that our method achieves a higher detection rate. Finally, we present a Moving Target Defense technique that smartly reacts upon the detection of anomalies so as to also mitigate the detected attacks. The technique efficiently replaces the entire stack of virtual nodes, making ongoing attacks in the system ineffective.
56

Investigation of hierarchical deep neural network structure for facial expression recognition

Motembe, Dodi 01 1900
Facial expression recognition (FER) remains a challenging problem, and machines struggle to comprehend effectively the dynamic shifts in facial expressions of human emotions. Existing systems that have proven effective consist of deep network structures that need powerful and expensive hardware. The deeper the network, the longer training and testing take, and many systems use expensive GPUs to speed up the process. To address these challenges while keeping the main goal of improving recognition accuracy, we create a generic hierarchical structure with variable settings. This generic structure has a hierarchy of three convolutional blocks, two dropout blocks and one fully connected block. From this generic structure we derived four different network structures (cases) to be investigated according to their performance. From each case, we in turn derived six network structures by varying the parameters. The variable parameters under analysis are the filter sizes of the convolutional maps and of the max-pooling, as well as the number of convolutional maps. In total, we have 24 network structures to investigate, six per case. After many repeated experiments, the simulation results showed that case 1a was the top performer in the case 1 group, and that case 2a, case 3c and case 4c outperformed the others in their respective groups. Comparing the winners of the four groups indicates that case 2a is the optimal structure with optimal parameters, as it outperformed the other group winners. The criteria for choosing the best network structure were the minimum, average and maximum accuracy over 15 repeated training runs, together with an analysis of the results. All 24 proposed network structures were tested on two of the most widely used FER datasets, CK+ and JAFFE.
After repeated simulations, the results demonstrate that our inexpensive optimal network architecture achieved 98.11 % accuracy on the CK+ dataset. We also tested our optimal network architecture on the JAFFE dataset, where the experimental results show 84.38 % accuracy using just a standard CPU and simpler procedures. We also compared the four group winners with the performance of other existing FER models reported recently in two studies. These FER models used the same two datasets, CK+ and JAFFE. Three of our four group winners (case 1a, case 2a and case 4c) scored only 1.22 % below the accuracy of the top-performing model on the CK+ dataset, and two of our network structures, case 2a and case 3c, came in third on the JAFFE dataset, beating the other models. / Electrical and Mining Engineering
57

Normalization of Deep and Shallow CNNs tasked with Medical 3D PET-scans : Analysis of technique applicability

Pllashniku, Edlir, Stanikzai, Zolal January 2021
In recent years there has been interdisciplinary research on using machine learning to detect and classify neurodegenerative disorders, with the sole goal of outperforming state-of-the-art models on metrics such as accuracy, specificity, and sensitivity. Specifically, these studies have been conducted either by applying existing networks to “novel” methods of pre-processing data or by developing new convolutional neural networks. So far, no work has examined how different normalization techniques affect a deep or shallow convolutional neural network in terms of numerical stability, performance, explainability, and interpretability. This work investigates which normalization technique is most suitable for deep and shallow convolutional neural networks. Two baselines were created, one shallow and one deep, and eight different normalization techniques were applied to these model architectures. Conclusions were drawn from our analysis of numerical stability, performance (metrics), and methods of Explainable Artificial Intelligence. Our findings indicate that normalization techniques affect models differently with respect to these aspects, especially numerical stability and explainability. Moreover, we show that there is indeed reason to prefer one method over another in future studies in this interdisciplinary field.
58

Data Trustworthiness Assessment for Traffic Condition Participatory Sensing Scenario / Uppgifternas tillförlitlighet Bedömning av trafik Villkor Deltagande Scenario för avkänning

Gao, Hairuo January 2022
Participatory Sensing (PS) is a common mode of data collection where valuable data is gathered from many contributors, each providing data from the user’s or the device’s surroundings via a mobile device, such as a smartphone. This has the advantage of cost-efficiency and wide-scale data collection. One of the application areas for PS is the collection of traffic data. The cost of collecting roving sensor data, such as vehicle probe data, is significantly lower than that of traditional stationary sensors such as radar and inductive loops. The collected data could pave the way for providing accurate and high-resolution traffic information that is important to transportation planning. The problem with PS is that it is open, and anyone can register and participate in a sensing task. A malicious user is likely to submit false data without performing the sensing task for personal advantage or, even worse, to attack on a large scale with clear intentions. For example, in real-time traffic monitoring, attackers may report false alerts of traffic jams to divert traffic on the road ahead or directly interfere with the system’s observation and judgment of road conditions, triggering large-scale traffic guidance errors. An efficient method of assessing the trustworthiness of data is therefore required. The trustworthiness problem can be approximated as the problem of anomaly detection in time-series data. Traditional predictive model-based anomaly detection models include univariate models for univariate time series such as Auto Regressive Integrated Moving Average (ARIMA), hypothesis testing, and wavelet analysis, and recurrent neural networks (RNNs) for multiple time series such as Gated Recurrent Unit (GRU) and Long short-term memory (LSTM). 
When talking about traffic scenarios, some prediction models that consider both spatial and temporal dependencies are likely to perform better than those that only consider temporal dependencies, such as Diffusion Convolutional Recurrent Neural Network (DCRNN) and Spatial-Temporal Attention Wavenet (STAWnet). In this project, we built a detailed traffic condition participatory sensing scenario as well as an adversary model. The attacker’s intent is refined into four attack scenarios, namely faking congestion, prolonging congestion, and masking congestion from the beginning or midway through. On the basis, we established a mechanism for assessing the trustworthiness of the data using three traffic prediction models. One model is the time-dependent deep neural network prediction model DCRNN, and the other two are a simplified version of the model DCRNN-NoCov, which ignores spatial dependencies, and ARIMA. The ultimate goal of this evaluation mechanism is to give a list of attackers and to perform data filtering. We use the success rate of distinguishing users as benign or attackers as a metric to evaluate the system’s performance. In all four attack scenarios mentioned above, the system achieves a success rate of more than 80%, obtaining satisfactory results. We also discuss the more desirable attack strategies from the attacker’s point of view. / Participatory Sensing (PS) är ett vanligt sätt att samla in data där värdefulla data samlas in från många bidragsgivare, som alla tillhandahåller data från användarens eller enhetens omgivning via en mobil enhet, t.ex. en smartphone. Detta har fördelen av kostnadseffektivitet och omfattande datainsamling. Ett av tillämpningsområdena för PS är insamling av trafikdata. Kostnaden för att samla in data från mobila sensorer, t.ex. data från fordonssonderingar, är betydligt lägre än kostnaden för traditionella stationära sensorer, t.ex. radar och induktiva slingor. 
De insamlade uppgifterna skulle kunna bana väg för att tillhandahålla exakt och högupplöst trafikinformation som är viktig för transportplaneringen. Problemet med deltagande avkänning är att den är öppen och att vem som helst kan registrera sig och delta i en avkänningsuppgift. En illasinnad användare kommer sannolikt att lämna in falska uppgifter utan att utföra avkänningsuppgiften för personlig vinning eller, ännu värre, för att angripa en stor skala med tydliga avsikter. Vid trafikövervakning i realtid kan t.ex. angripare rapportera falska varningar om trafikstockningar för att avleda trafiken på vägen framåt eller direkt störa systemets observation och bedömning av vägförhållanden, vilket kan utlösa storskaliga fel i trafikstyrningen. Det finns därför ett akut behov av en effektiv metod för att bedöma uppgifternas tillförlitlighet. Problemet med trovärdighet kan approximeras som problemet med upptäckt av anomalier i tidsserier. Traditionella modeller för anomalidetektion som bygger på prediktiva modeller omfattar univariata modeller för univariata tidsserier, t.ex. ARIMA (Autoregressive Integrated Moving Average), hypotesprövning och waveletanalys, och återkommande neurala nätverk (RNN) för flera tidsserier, t.ex. GRU (Gated Recurrent Unit) och LSTM (Long short-term memory). När man talar om trafikscenarier kommer vissa prognosmodeller som tar hänsyn till både rumsliga och tidsmässiga beroenden sannolikt att prestera bättre än de som endast tar hänsyn till tidsmässiga beroenden, till exempel Diffusion Convolutional Recurrent Neural Network (DCRNN) och Spatial-Temporal Attention Wavenet (STAWnet). I det här projektet byggde vi upp ett detaljerat scenario för deltagande av trafikförhållanden och en motståndarmodell. Angriparens avsikt är raffinerad i fyra angreppsscenarier, nämligen att fejka trafikstockning, förlänga trafikstockning och maskera trafikstockning från början eller halvvägs in i processen. 
På grundval av detta har vi inrättat en mekanism för att bedöma uppgifternas tillförlitlighet med hjälp av tre typiska trafikprognosmodeller. Den ena modellen är den tidsberoende djupa neurala nätverksförutsägelsemodellen DCRNN, och de andra två är en förenklad version av modellen DCRNN-NoCov, som ignorerar rumsliga beroenden, och ARIMA. Det slutliga målet med denna utvärderingsmekanism är att ge en lista över angripare och att utföra datafiltrering. Vi använder framgångsfrekvensen när det gäller att särskilja användare som godartade eller angripare som ett mått för att utvärdera systemets prestanda. I alla fyra olika attackscenarier som nämns ovan uppnår systemet en framgångsfrekvens på mer än 80%, vilket ger tillfredsställande resultat. Vi diskuterar också de mer önskvärda angreppsstrategierna ur angriparens synvinkel.
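The trustworthiness assessment in this abstract compares user reports against traffic predictions and scores the system by how often users are correctly labelled benign or attacker. The sketch below illustrates that idea in a simplified form. The mean-absolute-residual rule, the threshold, the user names and the speed values are all assumptions made for illustration, and the prediction list stands in for the output of a model such as DCRNN or ARIMA.

```python
def mean_abs_residual(reported, predicted):
    """Average absolute gap between a user's reports and the model prediction."""
    return sum(abs(r - p) for r, p in zip(reported, predicted)) / len(reported)


def flag_attackers(reports, predicted, threshold):
    """Return the ids of users whose reports deviate beyond the threshold."""
    return {
        user
        for user, values in reports.items()
        if mean_abs_residual(values, predicted) > threshold
    }


def success_rate(flagged, true_attackers, all_users):
    """Fraction of users correctly labelled as benign or attacker."""
    correct = sum((u in flagged) == (u in true_attackers) for u in all_users)
    return correct / len(all_users)


if __name__ == "__main__":
    predicted_speed = [60, 58, 55, 57]       # km/h, stand-in for model output
    reports = {
        "alice": [61, 57, 56, 58],           # benign
        "bob": [20, 18, 15, 17],             # fakes congestion
        "carol": [59, 60, 54, 55],           # benign
    }
    flagged = flag_attackers(reports, predicted_speed, threshold=10)
    print(flagged, success_rate(flagged, {"bob"}, list(reports)))
```

A fixed threshold is the simplest possible decision rule; attacks that mask or prolong real congestion produce smaller residuals than outright faking, so a practical system would likely need a more careful criterion.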
59

Évaluer le potentiel et les défis de la variation intraspécifique pour les réseaux neuronaux profonds de reconnaissance de chants d’oiseaux : l’exemple des bruants des prés (Passerculus sandwichensis) de l’île Kent, Nouveau-Brunswick

Rondeau Saint-Jean, Camille 08 1900
Les réseaux neuronaux profonds sont des outils prometteurs pour l'évaluation de la biodiversité aviaire, en particulier pour la détection des chants et la classification acoustique des espèces. Toutefois, on connaît mal l’étendue de leur capacité de généralisation face à la variation intraspécifique présente dans les chants d’oiseaux, ce qui pourrait mener à des biais. Notre étude porte sur l'évaluation des performances de BirdNET, un réseau neuronal profond, pour le traitement d’un corpus d'enregistrements audio caractérisés par une variation intraspécifique significative, en utilisant l’exemple du chant du bruant des prés (Passerculus sandwichensis). Dans la population de l'île de Kent, au Nouveau-Brunswick, les individus sont suivis et enregistrés grâce à leurs bagues de couleur et la présence de microdialectes est solidement documentée. Nous avons recueilli et annoté 69 606 chants provenant de 52 individus et analysé ces données à l'aide d’une version récente de BirdNET. Nos résultats révèlent que BirdNET démontre une précision globale suffisante, prédisant correctement 81,9 % des chants, ce qui dépasse les résultats rapportés par ses développeurs. Toutefois, nous avons observé une variation considérable dans les scores de confiance et les taux de prédiction exactes entre les individus, ce qui suggère des biais potentiels. Cependant, nos recherches n'ont pas mis en évidence de variation entre les résultats des différents microdialectes, ce qui souligne la relative robustesse de l'algorithme. Nous avançons que la variation observée entre les individus est due au fait que certains d’entre eux chantent systématiquement plus près des microphones, résultant en des chants plus clairs donc plus faciles à identifier. Pour mieux comprendre le processus de prise de décision de BirdNET, nous avons tenté de produire des cartes d'activation de classe, qui constituent un outil précieux pour identifier les éléments d’un chant qui déterminent une prédiction. 
Cependant, il ne nous a pas été possible d’obtenir des cartes d’activation de classe d’après la version actuellement disponible du code de BirdNET sans avoir recours à des connaissances avancées en informatique. L'accès à des outils explicatifs adaptés aux innovations récentes dans les architectures de réseaux neuronaux profonds serait crucial pour mieux interpréter les résultats et renforcer la confiance des utilisateurs. Nos résultats soulignent la nécessité de poursuivre les recherches sur la capacité de généralisation des réseaux neuronaux profonds pour la bioacoustique en utilisant des ensembles de données monospécifiques portant sur de plus longues périodes ou des aires de répartition géographique plus vastes. En outre, l'extension de cette étude à des espèces ayant des répertoires plus importants ou des différences plus subtiles entre le chant des individus pourrait nous informer davantage sur les limites et le potentiel des algorithmes d'apprentissage profond pour la détection et la classification acoustiques des espèces. En conclusion, notre étude démontre les performances prometteuses de BirdNET pour le traitement d'un large corpus de chants de bruants des prés, et confirme son potentiel en tant qu'outil précieux pour l'évaluation de la biodiversité aviaire. Les biais dus aux techniques d’enregistrement et la variation dans les taux de succès observés entre les individus méritent d'être étudiés plus en détail. / Machine learning, particularly deep neural networks, has gained prominence as a valuable tool in ecological studies and wildlife conservation planning. In the field of avian biodiversity assessment, deep neural networks have shown remarkable promise, particularly in acoustic species detection and classification. Despite their success, a critical knowledge gap exists concerning the generalization ability of these algorithms across intraspecific variation in bird song. This raises concerns about potential biases and misinterpretation of results.
This study focuses on evaluating the performance of BirdNET, a deep neural network, in processing audio recordings characterized by significant intraspecific variation in the Savannah Sparrow (Passerculus sandwichensis) song. Savannah Sparrows are an ideal candidate for this investigation, given their well-studied population on Kent Island, New Brunswick, Canada. Each male sings a unique, unchanging song throughout its life, and the population exhibits well-documented geographical microdialects. We collected a large corpus of Savannah Sparrow songs using autonomous and focal recorders on Kent Island, yielding a total of 69,606 manually annotated songs from 52 different sparrows. We analyzed the audio data using BirdNET-Analyzer. The resulting confidence scores were used to assess the algorithm's performance across microdialects and individual birds. Our results revealed that BirdNET exhibited considerable overall accuracy, correctly predicting 81.9% of the songs, which surpassed the results reported by the developers of BirdNET. We observed variations in BirdNET's confidence scores among individual birds, suggesting potential biases in its classifications. However, our investigation indicated no evidence of distinct biases towards specific microdialects, highlighting the algorithm's relative robustness across these groups. We suspect that the variation observed amongst individuals is caused by the fact that some were singing consistently closer to microphones, yielding clearer songs. To gain insights into BirdNET's decision-making process, we sought to employ class activation maps, a valuable tool for identifying essential song elements contributing to species predictions. However, we were unable to produce class activation maps from the current version of BirdNET without advanced computer science skills.
Access to informative tools adapted to recent innovations in deep neural network architectures for bioacoustic applications is crucial for understanding and interpreting results better. Such tools would enhance user confidence and favour accountability for conservation decisions based on these predictions. Our findings underscore the need for further research investigating the generalization capacity of deep neural networks in bioacoustics on single-species datasets with more extensive intraspecific variation and broader geographical ranges. Additionally, expanding this investigation to species with larger song repertoires or more subtle inter-individual song differences could provide valuable insights into the limits and potential of deep learning algorithms for acoustic species detection and classification. In conclusion, our study demonstrates BirdNET's promising performance in processing a large corpus of Savannah Sparrow songs, highlighting its potential as a valuable tool for avian biodiversity assessment. Biases and variations in confidence scores observed across individual birds warrant further investigation.
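As an illustration only, the per-individual comparison described in this abstract could be sketched as follows. This is a minimal sketch and not the author's analysis pipeline or the BirdNET-Analyzer API: the record layout (bird ID, predicted label, confidence score), the `min_confidence` threshold, the function name `detection_rate_by_group`, and the sample values are all hypothetical assumptions.

```python
from collections import defaultdict


def detection_rate_by_group(records, target_label, min_confidence=0.1):
    """Return, per group, the share of annotated songs whose predicted label
    equals target_label with a confidence of at least min_confidence.

    records: iterable of (group, predicted_label, confidence) tuples.
    The layout and the threshold are assumptions made for this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, predicted_label, confidence in records:
        totals[group] += 1
        if predicted_label == target_label and confidence >= min_confidence:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical records: (bird ID, predicted label, confidence score).
records = [
    ("bird_A", "Savannah Sparrow", 0.92),
    ("bird_A", "Savannah Sparrow", 0.85),
    ("bird_B", "Savannah Sparrow", 0.40),
    ("bird_B", "Song Sparrow", 0.55),
]

rates = detection_rate_by_group(records, "Savannah Sparrow")
print(rates)
```

Grouping by a microdialect label instead of a bird ID would give the corresponding comparison across microdialects under the same assumptions.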
60

Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN

Nord, Sofia January 2021
Large datasets are a crucial requirement to achieve high performance, accuracy, and generalisation for any machine learning task, such as prediction or anomaly detection. However, it is not uncommon for datasets to be small or imbalanced, since gathering data can be difficult, time-consuming, and expensive. In the task of collecting vehicle sensor time series data, in particular when the vehicle exhibits abnormal behaviour, these difficulties are present and may hinder the automotive industry in its development. Synthetic data generation has attracted growing interest among researchers in several fields as a way to handle the difficulties of data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data that are similar to vehicle sensor readings of the air pressures in the brake system of vehicles with abnormal behaviour, meaning there is a leakage somewhere in the system. A novel GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results obtained proved that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved, and the work can become a foundation for future work in this field. / When applying a model to perform a machine learning task, such as predicting outcomes or detecting anomalies, large datasets are important for achieving high performance, accuracy, and generalisation. However, it is not uncommon for datasets to be small or imbalanced, since collecting data can be difficult, time-consuming, and expensive. 
When collecting time series from vehicle sensors, these problems are present and can hinder the automotive industry in its development. Synthetic data generation has become a growing interest among researchers in several fields as a way to handle the problems with data collection. Among the methods investigated for generating data, generative adversarial networks (GANs) have become a popular approach in the research community because of their broad application domain and successful results. This thesis focuses on generating multivariate time series data resembling vehicle sensor readings of air pressure in the brake system of vehicles with abnormal behaviour, meaning there is a leak in the system. A new GAN model called TimeGAN was trained to generate such data and was then evaluated both qualitatively and quantitatively. Two versions of this model were tested and compared. The results obtained showed that both models learned the distribution and the underlying information in the various signals of the real data. The goal of this thesis was achieved and can lay the foundation for future work in this area.
