Global ETD Search

11	Redundant and Irrelevant Attribute Elimination using Autoencoders / Redundant och irrelevant attributeliminering med autoencoders Granskog, Tim January 2017
Real-world data can often be high-dimensional and contain redundant or irrelevant attributes. High-dimensional data is problematic for machine learning because the high dimensionality makes learning take more time and, unless the dataset is large enough to provide an ample number of samples for each class, accuracy suffers. Redundant and irrelevant attributes give the data a higher dimensionality than necessary and obscure the important attributes. It is therefore of interest to reduce the dimensionality of the data while preserving the important attributes. Several techniques for reducing the dimensionality of data have been presented in the field of computer science. One of these is the autoencoder, an unsupervised neural network that uses its input as the target output; by limiting the number of neurons in the hidden layer, the autoencoder is forced to learn a lower-dimensional representation of the data. This study focuses on using the autoencoder to reduce the dimensionality, and eliminate irrelevant or redundant attributes, of four datasets from different domains. The results show that the autoencoder can eliminate redundant attributes that are a linear combination of the other attributes, and provide a lower-dimensional representation of the data that is better than the unreduced data. However, for data gathered under controlled and carefully managed conditions, the autoencoder cannot always provide a better lower-dimensional representation than the data with redundant attributes. Lastly, the results show that the autoencoder cannot eliminate irrelevant attributes that have no correlation with the class or with other attributes.
/ Real-world data can often be high-dimensional and contain redundant or irrelevant attributes. High-dimensional data is problematic for machine learning because learning takes longer and, unless the dataset is large enough to provide a sufficient number of instances for each class, accuracy suffers. Redundant and irrelevant attributes give the data a higher dimensionality than necessary and make it harder to determine which attributes are important. It is therefore of interest to be able to reduce the dimensionality of the data while preserving the important attributes. Several techniques have been presented for dimensionality reduction. One of them is the autoencoder, an unsupervised neural network that uses its input as the target output; by limiting the number of neurons in the hidden layer, the autoencoder is forced to learn a representation of the data in a lower dimension. This study focuses on using the autoencoder to reduce dimensionality and eliminate irrelevant or redundant attributes in four datasets from different domains. The results show that the autoencoder can eliminate redundant attributes that are a linear combination of the other attributes and give a better lower-dimensional representation of the data than the unreduced data. In data collected under controlled and carefully managed conditions, however, the autoencoder cannot always give a better lower-dimensional representation than the data with redundant attributes. Finally, the results show that the autoencoder cannot eliminate irrelevant attributes that have no correlation with the class or with other attributes.
Computer Sciences Datavetenskap (datalogi)
12	Credit Card Transaction Fraud Detection Using Neural Network Classifiers / Detektering av bedrägliga korttransaktioner m.h.a neurala nätverk Nazeriha, Ehsan January 2023
With the increasing use of credit card payments, credit card fraud has also been increasing. A fast and accurate fraud detection system is therefore vital for banks. To solve the problem of fraud detection, different machine learning classifiers have been designed and trained on a credit card transaction dataset. However, the dataset is heavily imbalanced, which poses a problem for the performance of the algorithms. To resolve this issue, the generative methods Generative Adversarial Network (GAN), Variational Autoencoder (VAE) and Synthetic Minority Oversampling Technique (SMOTE) have been used to generate synthetic samples for the minority class in order to achieve a more balanced dataset. The main purpose of this study is to evaluate the generative methods and investigate the impact of their generated minority samples on the classifiers. The results indicate that GAN does not outperform the other generative methods, as the samples generated by the VAE were most effective in three out of five classifiers. The validation and histograms of the generated samples also indicate that the VAE samples captured the distribution of the data better than those of SMOTE and GAN. A suggestion for improving on this work is to perform data engineering on the dataset, for instance by using correlation analysis to determine which features have the greatest impact on the classification, dropping the less important features, and training the generative methods and classifiers on the trimmed-down samples.
/ With the growing use of credit cards as a payment method worldwide, credit card fraud has also increased. There is therefore a need for a fast and reliable system for detecting fraudulent transactions. To solve the problem of detecting credit card fraud, various machine learning classifiers have been designed and trained on a dataset of credit card transactions. However, such datasets are heavily imbalanced and contain mostly normal transactions, which is problematic for classification accuracy. The generative methods Generative Adversarial Networks, Variational Autoencoder and Synthetic Minority Oversampling Technique have therefore been used to create synthetic data for the minority class in order to balance the dataset and achieve better accuracy. The central aim of this study was thus to evaluate these generative methods and investigate the effect of the synthetic data points on the classifiers. The results showed that generative adversarial networks did not outperform the other generative methods, as synthetic data from variational autoencoders was most effective in three of the five classifiers tested in this study. In addition, the validation method shows that the variational autoencoder learned the distribution of the original data better than the other generative methods. A suggestion for further development of this study is to preprocess the dataset before it is used to train the algorithms. For example, correlation analysis could be used to determine which features in the dataset have the greatest impact on the classification, after which the least important features are removed and the algorithms are trained on data with fewer features.
GAN Deep Learning Variational Autoencoder Anomaly Detection SMOTE Computer and Information Sciences Data- och informationsvetenskap
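The oversampling idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: it only shows the core interpolation step usually associated with SMOTE, and the function name `smote_like_samples`, the neighbour count `k` and the toy `fraud` points are assumptions made for illustration.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample (made-up values).
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_like_samples(fraud, n_new=6)
```

Because every synthetic point lies on a line segment between two real minority samples, it stays inside the region already occupied by that class, which is the property that distinguishes this kind of oversampling from simple duplication.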
13	Fraud Detection on Unlabeled Data with Unsupervised Machine Learning / Bedrägeridetektering på omärkt data med oövervakad maskininlärning Renström, Martin, Holmsten, Timothy January 2018
A common problem in systems handling user interaction is the risk of fraudulent behaviour. For example, in a system with credit card transactions it could be a person using another user's account for purchases, or in a system with advertisements it could be bots clicking on ads. These malicious attacks are often disguised as normal interactions and can be difficult to detect. Detection is especially challenging when working with datasets that do not contain so-called labels, which show whether a data point is fraudulent or not. This means that no data has previously been classified as fraud, which in turn makes it difficult to develop an algorithm that can distinguish between normal and fraudulent behaviour. In this thesis, the area of anomaly detection was explored with the intent of detecting fraudulent behaviour without labeled data. Three neural-network-based prototypes were developed in this study, all of them variations of autoencoders. The first prototype, which served as a baseline, was a simple three-layer autoencoder; the second was a novel autoencoder called the stacked autoencoder; and the third was a variational autoencoder. The prototypes were then trained and evaluated on two different datasets, both of which contained non-fraudulent and fraudulent data. The study found that the proposed stacked autoencoder architecture achieved better scores in recall, accuracy and NPV in the tests that were designed to simulate a real-world scenario.
/ A common problem with user interactions in a system is the risk of fraud. For a system that handles datasets of credit card transactions, an example could be a person using someone else's identity for card purchases; in systems that handle advertising, it could be automated software simulating interactions. These attacks are often disguised as normal interactions and can therefore be difficult to detect. In datasets without correctly labeled data, it is especially difficult to develop an algorithm that can tell whether an interaction is anomalous or not. This thesis explores the detection of anomalies in datasets without specific data indicating fraud. Three neural network prototypes were used in this study and were trained and evaluated on two datasets containing both fraudulent and non-fraudulent data. The first prototype, which served as a baseline, was a simple three-layer autoencoder; the second was a new autoencoder named the stacked autoencoder; and the third was a variational autoencoder. In this study, the proposed stacked autoencoder gave the best results for recall, accuracy and NPV in the tests designed to resemble a real-world scenario.
Anomaly detection fraud detection unsupervised machine learning autoencoder Computer Sciences Datavetenskap (datalogi)
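For readers unfamiliar with the three scores named in the abstract above, the following sketch shows how recall, accuracy and negative predictive value (NPV) are conventionally derived from a confusion matrix. The counts are invented for illustration and are not results from the thesis.

```python
def recall(tp, fn):
    """Share of actual fraud cases that were flagged."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def negative_predictive_value(tn, fn):
    """Share of 'not fraud' predictions that were truly not fraud."""
    return tn / (tn + fn)


# Hypothetical confusion-matrix counts (fraud is the positive class).
tp, fn, tn, fp = 40, 10, 900, 50

scores = {
    "recall": recall(tp, fn),
    "accuracy": accuracy(tp, tn, fp, fn),
    "npv": negative_predictive_value(tn, fn),
}
```

On heavily imbalanced data, accuracy and NPV tend to look high even for weak detectors, so recall is usually the score to read first.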
14	Noise Robustness of Convolutional Autoencoders and Neural Networks for LPI Radar Classification / Brustålighet hos faltningsbaserade neurala nätverk för klassificering av LPI radar Norén, Gustav January 2020
This study evaluates the noise robustness of convolutional autoencoders and neural networks for classifying the modulation type of Low Probability of Intercept (LPI) radar. Specifically, a number of different neural network architectures are tested in four different synthetic noise environments. Tests in Gaussian noise show that performance decreases with decreasing Signal to Noise Ratio (SNR). Training a network on all SNRs in the dataset achieved a peak performance of 70.8 % at SNR=-6 dB with a denoising autoencoder and convolutional classifier setup. Tests indicate that the models have difficulty generalizing to SNRs lower than those provided in the training data, performing roughly 10-20 % worse than when those SNRs are included in the training data. If intermediate SNRs are removed from the training data, the models can generalize and perform similarly to tests where intermediate noise levels are included in the training data. When testing data is generated with different parameters from the training data, performance is underwhelming, with a peak performance of 22.0 % at SNR=-6 dB. The last tests use telecom signals as additive noise instead of Gaussian noise. These tests are performed with the LPI and telecom signals appearing at different frequencies. The models perform well in such cases, with a peak performance of 80.3 % at an intermediate noise level. This study also contributes a different, and more realistic, way of generating data than what is prevalent in the literature, as well as a network that performs well without the need for signal preprocessing. Without preprocessing, a peak performance of 64.9 % was achieved at SNR=-6 dB. It is customary to generate data such that each sample always includes the start of its signal's period, which increases performance by around 20 % across all tests. In a real application, however, it is not certain that the start of a received signal can be determined.
/ This work studies the noise robustness of neural networks for classifying the modulation type of Low Probability of Intercept (LPI) radar. Specifically, a number of architectures based on convolutional networks are tested and evaluated in four different synthetic noise environments. Tests on data with Gaussian noise show that the classification error increases as the signal-to-noise ratio decreases. When the networks are trained on all noise levels in the data, a peak accuracy of 70.8 % is reached at a signal-to-noise ratio of -6 dB. Further tests indicate that the networks perform worse at low signal-to-noise ratios that are not present in the training data, generally with 10-20 % lower accuracy. If the intermediate noise levels are absent from the training data, the networks perform as well as when they are present. If training data and test data are generated with different parameters, the networks perform poorly; the peak accuracy in these tests is 22.0 % at a signal-to-noise ratio of -6 dB. The last noise environment tested uses telecom signals as noise instead of Gaussian noise. In this case the LPI and telecom signals are well separated in frequency, and the networks perform as well as in the Gaussian-noise tests at high signal-to-noise ratio. The peak accuracy in these tests is 80.3 % at a medium-high noise level. This work also contributes networks that perform well without the data having to be preprocessed before classification, and generates data in a more realistic way than previous literature in the field. Without signal preprocessing, a peak accuracy of 64.9 % was reached at a signal-to-noise ratio of -6 dB. The more realistic data is generated so that its starting point is random. In the literature the starting point is usually included, which gives accuracies roughly 20 % higher overall than in the tests carried out in this work. In real applications, the starting point of a signal can rarely be identified with certainty.
LPI radar CNN autoencoder noise robustness denoising Probability Theory and Statistics Sannolikhetsteori och statistik
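The SNR figures quoted above (for example -6 dB) can be made concrete with a small sketch of how Gaussian noise is commonly scaled to a target SNR. This is an illustration under stated assumptions, not the thesis's data generator: the signal is a real-valued sine wave rather than an LPI waveform, and the helper names `add_gaussian_noise` and `measured_snr_db` are hypothetical.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, from the clean and noisy signals."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Stand-in test signal: 50 cycles of a sine wave over 10000 samples.
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(10000)]
noisy = add_gaussian_noise(clean, snr_db=-6.0)
achieved = measured_snr_db(clean, noisy)
```

At -6 dB the noise power is roughly four times the signal power, which gives a sense of how hard the classification task is at the quoted operating point.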
15	[en] QUALITY ENHANCEMENT OF HIGHLY DEGRADED MUSIC USING DEEP LEARNING-BASED PREDICTION MODELS / [pt] RECONSTRUÇÃO DE MÚSICAS ALTAMENTE DEGRADADAS USANDO MODELOS DE APRENDIZADO PROFUNDO ARTHUR COSTA SERRA 21 October 2022
[pt, in translation] Audio quality degradation can have many causes. For musical applications, this fragmentation can lead to highly unpleasant experiences. Restoration algorithms can be employed to reconstruct parts of the audio in a way similar to image reconstruction, in an approach called Audio Inpainting. Current state-of-the-art methods for Audio Inpainting cover limited scenarios, with well-defined gap windows and little variety of musical genres. In this work, we propose a deep-learning-based method for Audio Inpainting accompanied by a dataset with random fragmentation conditions that approximate real impairment situations. The dataset was collected using tracks from different musical genres, which provides good signal variability. Our best model improved the quality of all musical genres, obtaining an average PSNR of 13.1 dB, although it worked better for genres in which acoustic instruments predominate.
/ [en] Audio quality degradation can have many causes. For musical applications, this fragmentation may lead to highly unpleasant experiences. Restoration algorithms may be employed to reconstruct missing parts of the audio in a similar way as for image reconstruction, in an approach called audio inpainting. Current state-of-the-art methods for audio inpainting cover limited scenarios, with well-defined gap windows and little variety of musical genres. In this work, we propose a Deep-Learning-based (DL-based) method for audio inpainting accompanied by a dataset with random fragmentation conditions that approximate real impairment situations. The dataset was collected using tracks from different music genres to provide good signal variability. Our best model improved the quality of all musical genres, obtaining an average PSNR of 13.1 dB, although it worked better for musical genres in which acoustic instruments are predominant.
[en] DEEP LEARNING [en] MUSIC RECONSTRUCTION [en] AUDIO INPAINTING [en] AUTOENCODER
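The PSNR value reported above follows the usual definition, 10·log10(peak² / MSE). The sketch below assumes audio samples normalised to the range [-1, 1], so that the peak amplitude is 1.0; the abstract does not state which peak value the author used, and the sample values are invented.

```python
import math


def psnr_db(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


# Made-up samples: every reconstructed value is off by 0.1.
reference = [0.0, 0.5, -0.5, 1.0]
estimate = [0.1, 0.4, -0.4, 0.9]
value = psnr_db(reference, estimate)
```

With a constant error of 0.1 the mean squared error is 0.01, so this toy example evaluates to about 20 dB; higher values mean the reconstruction is closer to the reference.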
16	Decoding Neural Signals Associated to Cytokine Activity / Identifiering av Nervsignaler Associerade Till Cytokin Aktivitet Andersson, Gabriel January 2021
The vagus nerve has been shown to play an important role in inflammatory diseases, regulating the production of proteins that mediate inflammation. Two important such proteins are the pro-inflammatory cytokines TNF and IL-1β. This thesis makes use of vagus nerve recordings in mice that are subsequently injected with TNF and IL-1β, with the aim of seeing whether cytokine-specific information can be extracted. To this end, a type of semi-supervised learning approach is applied, in which the observed waveform data are modeled using a conditional probability distribution. The conditioning is based on an estimate of how often each observed waveform occurs, and local maxima of the conditional distribution are interpreted as candidate waveforms that encode cytokine information. The methodology yields varying but promising results. The occurrence of several candidate waveforms is found to increase substantially after exposure to cytokine. Difficulties in obtaining coherent results are discussed, as well as different approaches for future work.
/ The vagus nerve has been shown to play an important role in inflammatory diseases. This nerve regulates the production of inflammatory proteins such as the pro-inflammatory cytokines TNF and IL-1β. This work uses electronic recordings of the vagus nerve in mice that are injected with the two cytokines TNF and IL-1β during the recording. The aim is to investigate whether information about the specific cytokines can be extracted from the vagus nerve recordings. To this end, we design a semi-supervised learning method that models the observed waveforms with a conditional probability function. The conditioning is based on an estimate of how often each individual waveform occurs, and local maxima of the conditional probability function are interpreted as candidate waveforms that may carry cytokine information. The methodology gives varying but promising results. The occurrence of several candidate waveforms increases clearly after the time of cytokine injection. Difficulties in obtaining consistent results across all recordings are also discussed, along with possibilities for future work in the area.
Cytokines Neural Signals Vagus Nerve Variational Inference Variational Autoencoder Mathematics Matematik
17	A study on the similarities of Deep Belief Networks and Stacked Autoencoders de Giorgio, Andrea January 2015
Restricted Boltzmann Machines (RBMs) and autoencoders have been used, in several variants, for similar tasks such as reducing dimensionality or extracting features from signals. Even though their structures are quite similar, they rely on different training theories. Lately, they have been widely used as building blocks in deep learning architectures called deep belief networks (rather than stacked RBMs) and stacked autoencoders. In light of this, the student has worked on this thesis with the aim of understanding the extent of the similarities and the overall pros and cons of using RBMs, autoencoders or denoising autoencoders in deep networks. Important characteristics are tested, such as robustness to noise, the influence of data availability on training, and the tendency to overtrain. Part of the thesis is then dedicated to studying how the three deep networks under examination form their deep internal representations and how similar these can be to each other. As a result, a novel approach for evaluating internal representations is presented under the name F-Mapping. Results are reported and discussed.
Deep Learning Restricted Boltzmann Machine Stacked Autoencoder Autoencoder Deep Belief Network Machine Learning Computer Science F-Mapping Electrical Engineering and Electronics (Elektroteknik och elektronik)
18	EVALUATING PERFORMANCE OF GENERATIVE MODELS FOR TIME SERIES SYNTHESIS Haris, Muhammad Junaid January 2023
Motivated by successes in the image generation domain, this thesis presents a novel Hybrid VQ-VAE (H-VQ-VAE) approach for generating realistic synthetic time series data with categorical features. The primary motivation behind this work is to address the limitations of existing generative models in accurately capturing the underlying structure and patterns of time series data, especially when dealing with categorical features. Our proposed H-VQ-VAE model builds upon the foundation of the VQ-VAE architecture and consists of two separate VQ-VAEs: the whole VQ-VAE and the sliding VQ-VAE. Both models share a ResNet-based architecture with conv1d layers to effectively capture the temporal structure within the time series data. The whole VQ-VAE focuses on entire sequences of data to learn relationships between categorical and numerical features, while the sliding VQ-VAE exclusively processes numerical features using a sliding window approach. We conducted experiments on multiple datasets to evaluate the performance of our H-VQ-VAE model in comparison with the original VQ-VAE and TimeGAN models. Our evaluation used a train-on-real and test-on-synthetic approach, focusing on metrics such as Mean Absolute Error (MAE) and Explained Variance (EV). The H-VQ-VAE model achieved a 25-50% better MAE for numerical features compared to the VQ-VAE and outperformed TimeGAN by 45-75% on the complex dataset, indicating its effectiveness in capturing the underlying structure and patterns of the time series data. In conclusion, the H-VQ-VAE model offers a promising approach for generating realistic synthetic time series data with categorical features, with potential applications in various fields where accurate data generation is crucial.
GAN Generative Adversarial Network VQ-VAE Vector Quantized Variational AutoEncoder AutoEncoder VAE Time Series Synthesizing Data Synthesis Computer Sciences Datavetenskap (datalogi)
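The two evaluation metrics named in the abstract above can be written down in a few lines. The sketch uses the common definitions, MAE as the mean of absolute errors and EV as 1 - Var(y - ŷ) / Var(y); whether the thesis used exactly these formulas is an assumption, and the sample values are illustrative only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); 1.0 means the residuals are constant."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)


# Illustrative targets and predictions (not data from the thesis).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
ev = explained_variance(y_true, y_pred)
```

Note that EV ignores a constant offset in the predictions while MAE does not, which is one reason the two are typically reported together.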
19	Narrow Pretraining of Deep Neural Networks : Exploring Autoencoder Pretraining for Anomaly Detection on Limited Datasets in Non-Natural Image Domains Eriksson, Matilda, Johansson, Astrid January 2022
Anomaly detection is the process of detecting samples in a dataset that are atypical or abnormal. Anomaly detection can, for example, be of great use in an industrial setting, where faults in the manufactured products need to be detected at an early stage. In this setting, the available image data might be from different non-natural domains, such as the depth domain. However, the amount of data available is often limited in these domains. This thesis aims to investigate whether a convolutional neural network (CNN) can be trained to perform anomaly detection well on limited datasets in non-natural image domains. The attempted approach is to train the CNN as an autoencoder, in which the CNN is the encoder network. The encoder is then extracted and used as a feature extractor for the anomaly detection task, which is performed using Semantic Pyramid Anomaly Detection (SPADE). The results are then evaluated and analyzed. Two autoencoder models were used in this approach. As the encoder network, one of the models uses a MobileNetV3-Small network that had been pretrained on ImageNet, while the other uses a more basic network, which is a few layers deep and initialized with random weights. Both these networks were trained as regular convolutional autoencoders, as well as variational autoencoders. The results were compared to a MobileNetV3-Small network that had been pretrained on ImageNet but had not been trained as an autoencoder. The models were tested on six different datasets, all of which contained images from the depth and intensity domains. Three of these datasets additionally contained images from the scatter domain, and for these datasets, the combination of all three domains was tested as well. The main focus was, however, on the performance in the depth domain. The results show that there is generally an improvement when training the more complex autoencoder on the depth domain. Furthermore, the basic network generally obtains a result equivalent to that of the more complex network, suggesting that complexity is not necessarily an advantage for this approach. Looking at the different domains, there is no apparent pattern as to which domain yields the best performance; this seems rather to depend on the dataset. Lastly, it was found that training the networks as variational autoencoders generally did not improve the performance in the depth domain compared to the regular autoencoders. In summary, improved anomaly detection was obtained in the depth domain, but for optimal anomaly detection with regard to domain and network, one must look at the individual datasets.
/ The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
CNN convolutional neural network autoencoder variational autoencoder anomaly detection SPADE limited dataset non-natural image domain depth domain scatter domain intensity domain machine learning Media and Communication Technology Medieteknik
20	VAE-clustering of neural signals and their association to cytokines / VAE-klustring av nervsignaler och dess associationer till cytokiner Eskandari, Aram January 2020
In this thesis we start by reproducing previous experiments by Zanos et al., who showed that it is possible to associate neural signals with specific cytokines. One future aim of this project is to send synthetic neural signals through the efferent arc of the vagus nerve and observe reactions without the corresponding catalyst of the symptoms. We use a variational autoencoder (VAE) in our experiment to create a model able to generate new neural signals, and we introduce a novel clustering technique called VAE-clustering, which will be used to cluster neural signals with their associated cytokines. The focus of this paper is the implementation of this method and its application to the neural signals. Running VAE-clustering on the MNIST dataset shows it to be viable for finding detailed properties of a dataset. We also find that using a VAE as a generative model for neural signals is a good way of recreating detailed waveforms.
/ In this thesis we begin by reproducing earlier experiments by Zanos et al., who showed that it is possible to associate neural signals with specific cytokines. A future goal of this project is to send synthetic neural signals to the body in order to observe reactions without the corresponding catalyst of the symptoms. We use a variational autoencoder (VAE) in our experiments to create a model capable of generating new neural signals, and we introduce a new clustering technique called VAE-clustering, which will be used to cluster neural signals with their associated cytokines. The focus of this work is the implementation of this method and its application to neural signals. After running VAE-clustering on the MNIST dataset, we found it useful for finding detailed properties of a dataset. We have also found that using a VAE as a generative model for neural signals is a good way of recreating detailed waveforms.
Statistics applied mathematics variational autoencoder cytokines VAE-clustering neuron signals Probability Theory and Statistics Sannolikhetsteori och statistik