Global ETD Search

31	EXTRACTING REGIONS OF INTEREST AND DETECTING OUTLIERS FROM IMAGE DATA Ström, Jessica, Backhans, Erik January 2023 Volvo Construction Equipment (CE) is facing the challenge of vibrations in its wheel loaders that generate disruptive noise and impair the driver's experience. These vibrations have been linked to the contact surface between the crown wheel and pinion gear in the vehicles' drive axles. In response, this thesis set out to develop an Artificial Intelligence (AI) system that can identify outliers in a dataset of images of the contact surfaces between the crown wheel and pinion gear. However, the dataset exhibits variations in image sharpness, exposure and centering of the crown wheel, which limit its suitability for machine vision tasks. The varying image quality makes it difficult to accurately extract the features required to analyze the images with machine learning algorithms. This research addresses these challenges through two research questions: (1) What method can be employed to extract the Region of Interest (ROI) in images of crown wheels? (2) Which method is suitable for detecting outliers within the ROI? To answer these questions, a literature study was conducted, leading to the implementation of two architectures: You Only Look Once (YOLO) v5 Oriented Bounding Boxes (OBB) and a Hybrid Autoencoder (BAE). Visual evaluation of the results showed promising outcomes, particularly for the extraction of ROIs, where the relevant areas were accurately identified despite the large variations in image quality. The BAE successfully identified outliers that deviated from the majority; however, the model's results were influenced by differences in image quality rather than by the geometrical shape of the contact patterns. 
These findings suggest that using the same feature extraction method on a higher-quality dataset, or employing a more robust segmentation method, could increase the likelihood of identifying the contact patterns responsible for the vibrations. Artificial Intelligence (AI) Machine Vision Outlier Detection Autoencoders Neural Networks Region Of Interest (ROI) Computer Sciences Datavetenskap (datalogi)
32	Multivariate analysis of the parameters in a handwritten digit recognition LSTM system / Multivariat analys av parametrarna i ett LSTM-system för igenkänning av handskrivna siffror Zervakis, Georgios January 2019 Throughout this project, we perform a multivariate analysis of the parameters of a long short-term memory (LSTM) system for handwritten digit recognition in order to understand the model's behaviour. In particular, we are interested in explaining how this behaviour arises from its parameters, and what in the network is responsible for the model arriving at a certain decision. This problem is often referred to as the interpretability problem, and falls under the scope of Explainable AI (XAI). The motivation is to make AI systems more transparent, so that we can establish trust between humans. For this purpose, we make use of the MNIST dataset, which has been used successfully in the past to tackle the digit recognition problem. Moreover, the balance and simplicity of the data make it an appropriate dataset for carrying out this research. We start by investigating the linear output layer of the LSTM, which is directly associated with the model's predictions. The analysis includes several experiments in which we apply various methods from linear algebra, such as principal component analysis (PCA) and singular value decomposition (SVD), to interpret the parameters of the network. For example, we experiment with different setups of low-rank approximations of the output weight matrix, in order to see the importance of each singular vector for each class of digits. We found that when the fifth left and right singular vectors are cut off, the model practically loses its ability to predict eights. Finally, we present a framework for analysing the parameters of the hidden layer, along with our implementation of an LSTM-based variational autoencoder that serves this purpose. 
/ In this project, we carry out a multivariate analysis of the parameters of a long short-term memory (LSTM) system for handwritten digit recognition in order to understand the model's behaviour. We are particularly interested in explaining how this behaviour arises from the parameters, and what in the network lies behind the model arriving at a certain decision. This problem is often called the interpretability problem and falls under Explainable AI (XAI). The motivation is to make AI systems more open, so that we can create trust between humans. For this purpose we use the MNIST dataset, which has previously been used successfully to tackle the digit recognition problem. In addition, the balance and simplicity of the data make it a suitable dataset for carrying out this research. We begin by examining the linear output layer of the LSTM, which is directly linked to the model's predictions. The analysis comprises several experiments in which we use various methods from linear algebra, such as principal component analysis (PCA) and singular value decomposition (SVD), to interpret the network's parameters. For example, we experiment with different configurations of low-rank approximations of the output weight matrix to see the importance of each singular vector for each class of digits. We found that if the fifth left and right singular vectors are cut off, the model practically loses its ability to predict the digit eight. Finally, we present a framework for analysing the parameters of the hidden layer, together with our implementation of an LSTM-based variational autoencoder that serves this purpose. Deep Learning Interpretability Handwritten Digit Recognition MNIST Recurrent Neural Networks PCA SVD Variational Autoencoders Computer and Information Sciences Data- och informationsvetenskap
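The singular-vector experiment described in this abstract can be illustrated with a minimal sketch. It is not the thesis's code: the weight matrix `W`, the hidden state `h`, their sizes, and the helper name `drop_singular_component` are all hypothetical stand-ins, and random numbers replace a trained LSTM output layer. The sketch only shows the mechanics of removing one singular component from a matrix and comparing the class scores before and after.

```python
import numpy as np

def drop_singular_component(W, k):
    """Return W with its k-th (0-based) singular component set to zero."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_kept = s.copy()
    s_kept[k] = 0.0                      # cut off the k-th left/right singular vector pair
    return (U * s_kept) @ Vt             # rebuild the matrix from the remaining components

# Hypothetical stand-ins: 10 digit classes, 128 hidden units, random values.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 128))           # assumed shape of a linear output layer
h = rng.normal(size=(128,))              # assumed final hidden state for one input

W_ablated = drop_singular_component(W, k=4)   # k=4 is the fifth component

logits_full = W @ h
logits_ablated = W_ablated @ h

# Per-class change in score; in a trained model, a large change concentrated
# on one class would suggest that class depends on the removed component.
change_per_class = np.abs(logits_full - logits_ablated)
print(np.round(change_per_class, 3))
```

With a trained model, the same comparison would be run over a labelled test set and summarized per digit class, rather than on a single random vector.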
33	Deep Synthetic Noise Generation for RGB-D Data Augmentation Hammond, Patrick Douglas 01 June 2019 Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively small dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically, so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. 
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets. RGB-D images depth completion synthetic augmentation deep-generative neural networks variational autoencoders conditional GANs Computer Sciences Physical Sciences and Mathematics
34	Decoding communication of non-human species - Unsupervised machine learning to infer syntactical and temporal patterns in fruit-bats vocalizations. Assom, Luigi January 2023 Decoding non-human species communication offers a unique chance to explore alternative forms of intelligence using machine learning. This master's thesis focuses on discreteness and grammar, two of five linguistic areas that machine learning can support, and tackles the inference of syntax and temporal structures from bioacoustics data annotated with animal behavior. The problem lies in a lack of species-specific linguistic knowledge, time-consuming feature extraction and limited data availability; additionally, unsupervised clustering struggles to discretize vocalizations that sound continuous to human perception, because it is unclear how to tune the parameters used to preprocess audio. This thesis investigates unsupervised learning to generalize the deciphering of syntax and short-range temporal patterns in continuous-type vocalizations, specifically those of fruit bats, and addresses two research questions: (1) How does dimensionality reduction affect unsupervised manifold learning to quantify the size and diversity of the animal repertoire? (2) How do syntax and temporal structure encode contextual information? An experimental strategy is designed to improve the effectiveness of unsupervised clustering for quantifying the repertoire and to investigate linguistic properties with classifiers and sequence mining; acoustic segments are collected from a dataset of fruit-bat vocalizations annotated with behavior. The methodology keeps clustering methods constant while varying dimensionality reduction techniques on spectrograms and their latent representations learnt by autoencoders. Uniform Manifold Approximation and Projection (UMAP) embeds data into a manifold; density-based clusterings are applied to its embeddings and compared with agglomerative-based labels, used as a ground-truth proxy to test the robustness of the models. 
Vocalizations are encoded into label sequences. Syntactic rules and short-range patterns in sequences are investigated with classifiers (Support Vector Machines, Random Forests), graph analytics and prefix-suffix trees. Reducing the temporal dimension of Mel-spectrograms outperformed the previous clustering baseline (Silhouette score > 0.5, 95% assignment accuracy). UMAP embeddings from sequential autoencoders showed potential advantages over convolutional autoencoders. The study revealed a repertoire of between seven and approximately 20 vocal units characterized by combinatorial patterns: context classification achieved an F1-score > 0.9 even with permuted sequences; repetition characterized vocalizations of isolated pups. Vocal-unit distributions were significantly different (p < 0.05) across contexts; a truncated power law (alpha < 2) described the distribution of maximal repetitions. This thesis contributed to unsupervised machine learning in bioacoustics for decoding non-human communication, aiding research in language evolution and animal cognition. animal decision making unsupervised machine learning UMAP autoencoders classifiers bioacoustics combinatory syntax animal communication Information Systems
35	Quantum Algorithms for Feature Selection and Compressed Feature Representation of Data / Kvantalgoritmer för Funktionsval och Datakompression Laius Lundgren, William January 2023 Quantum computing has emerged as a new field that may have the potential to revolutionize the landscape of information processing and computational power, although physically constructing quantum hardware has proven difficult, and quantum computers in the current Noisy Intermediate Scale Quantum (NISQ) era are error prone and limited in the number of qubits they contain. A sub-field within quantum algorithms research which holds potential for the NISQ era, and which has seen increasing activity in recent years, is quantum machine learning, where researchers apply approaches from classical machine learning to quantum computing algorithms and explore the interplay between the two. This master's thesis investigates feature selection and autoencoding algorithms for quantum computers. Our review of the prior art led us to focus on contributing to three sub-problems: A) embedded feature selection on quantum annealers, B) short-depth quantum autoencoder circuits, and C) embedded compressed feature representation for quantum classifier circuits. For problem A, we demonstrate a working example by converting ridge regression to the Quadratic Unconstrained Binary Optimization (QUBO) problem formalism native to quantum annealers, and solving it on a simulated backend. For problem B, we develop a novel quantum convolutional autoencoder architecture and successfully run simulation experiments to study its performance. For problem C, we choose a classifier quantum circuit ansatz based on theoretical considerations from the prior art, and experimentally study it in parallel with a classical benchmark method for the same classification task, then show a method for embedding compressed feature representation onto that quantum circuit. 
/ Quantum computing is an emerging field that could potentially revolutionize information processing and computational power. However, the practical construction of quantum computers is difficult, and current quantum computers in the so-called NISQ era suffer from errors and limitations in the number of qubits they can handle. A promising sub-field within quantum algorithms is quantum machine learning, where researchers apply classical machine learning methods to quantum algorithms and explore the interplay between the two fields. This thesis focuses on quantum algorithms for feature selection and data compression (in the form of so-called autoencoders). We investigate three sub-problems: A) embedded feature selection on a quantum annealer, B) autoencoder quantum circuits for data compression, and C) embedded feature selection for quantum classifier circuits. For problem A, we demonstrate a working example by converting ridge regression to the Quadratic Unconstrained Binary Optimization (QUBO) problem formulation that is native to quantum annealers, and solve it on a simulated backend. For problem B, we develop a new convolutional autoencoder quantum circuit architecture and carry out simulation experiments to study its performance. For problem C, we choose a quantum circuit ansatz for classification based on theoretical considerations from prior research and study it experimentally in parallel with a classical benchmark method for the same classification task, and show a method for embedded feature selection (in the form of data compression) in this quantum circuit. Feature selection autoencoders quantum machine learning quantum circuits quantum annealing Funktionsval datakompression kvantmaskininlärning kvantalgoritmer kvantkretsar Physical Sciences Fysik
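As a rough illustration of what casting a ridge-type objective as a QUBO can look like, here is a small sketch under strong simplifying assumptions that are not taken from the thesis: each coefficient is restricted to a single binary variable z_i in {0, 1}, the data are synthetic, and exhaustive search stands in for an annealer or simulated backend. The function names `ridge_to_qubo` and `brute_force_qubo` are hypothetical. Because z_i^2 = z_i for binary variables, the linear terms of ||y - Xz||^2 + lam * ||z||^2 can be folded into the diagonal of Q.

```python
import itertools
import numpy as np

def ridge_to_qubo(X, y, lam):
    """Build Q so that z^T Q z + y^T y equals ||y - X z||^2 + lam * ||z||^2 for binary z."""
    Q = X.T @ X
    # For binary z, linear terms sum(c_i * z_i) equal z^T diag(c) z.
    return Q + np.diag(lam - 2.0 * (X.T @ y))

def brute_force_qubo(Q):
    """Exhaustively minimize z^T Q z over binary vectors (only viable for tiny n)."""
    n = Q.shape[0]
    best_z, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits, dtype=float)
        e = z @ Q @ z
        if e < best_e:
            best_z, best_e = z, e
    return best_z, best_e

# Synthetic data: three of six features actually contribute to y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
true_z = np.array([1, 0, 1, 0, 0, 1], dtype=float)
y = X @ true_z + 0.01 * rng.normal(size=200)

Q = ridge_to_qubo(X, y, lam=0.1)
z_best, energy = brute_force_qubo(Q)
print("selected features:", np.flatnonzero(z_best))
```

Real-valued coefficients would need several binary variables per weight (for example a fixed-point expansion), which enlarges Q; the one-bit encoding here is chosen only to keep the algebra visible.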
36	MACHINE LEARNING FOR MECHANICAL ANALYSIS Bengtsson, Sebastian January 2019 It is not reliable to depend on a person's inference over dense, high-dimensional data on a daily basis. A person will grow tired or become distracted and make mistakes over time. It is therefore desirable to study the feasibility of replacing a person's inference with that of Machine Learning in order to improve reliability. One-Class Support Vector Machines (SVM) with three different kernels (linear, Gaussian and polynomial) are implemented and tested for Anomaly Detection. Principal Component Analysis is used for dimensionality reduction, and autoencoders are used with the intention of increasing performance. Standard soft-margin SVMs were used for multi-class classification by utilizing the 1vsAll and 1vs1 approaches with the same kernels as for the one-class SVMs. The results for the one-class SVMs and the multi-class SVM methods are compared against each other within their respective applications, and also against the performance of Back-Propagation Neural Networks of varying sizes. One-Class SVMs proved very effective in detecting anomalous samples once both Principal Component Analysis and autoencoders had been applied. Standard SVMs with Principal Component Analysis produced promising classification results. Twin SVMs were researched as an alternative to standard SVMs. AI Machine Learning Mechanical SVM Neural Networks Support Vector Machines Autoencoders PCA Principal Component Analysis Classification Anomaly Detection Robotics Robotteknik och automation
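The combination of dimensionality reduction and a one-class SVM mentioned in this abstract could be wired together roughly as follows. This is a sketch using scikit-learn on synthetic data, not the thesis's pipeline: the feature count, the number of principal components, the `nu` value and the shifted-Gaussian "anomalies" are all assumptions made for illustration, and the autoencoder stage is left out.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in data: "normal" samples around 0, anomalies shifted far away.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))              # assumed: training uses normal samples only
X_normal = rng.normal(size=(50, 20))
X_anomalous = rng.normal(loc=6.0, size=(50, 20))

# Scale, reduce to a few principal components, then fit a Gaussian-kernel one-class SVM.
detector = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
detector.fit(X_train)

# predict() returns +1 for samples judged normal and -1 for samples judged anomalous.
pred_normal = detector.predict(X_normal)
pred_anomalous = detector.predict(X_anomalous)
print("flagged among normal:   ", np.mean(pred_normal == -1))
print("flagged among anomalous:", np.mean(pred_anomalous == -1))
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` gives the other two kernel families named in the abstract; `nu` bounds the fraction of training samples treated as outliers and would normally be tuned on held-out data.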
37	Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data / Architectures neuronales profondes pour l'apprentissage de représentation multimodales de données multimédias Vukotic, Verdran 26 September 2017 This thesis concerns the development of deep neural architectures for analysing textual or visual content, or a combination of the two. In general, the work takes advantage of the ability of neural networks to learn abstract representations. The main contributions of the thesis are the following: 1) Recurrent networks for spoken language understanding: different network architectures are compared for this task on their ability to model the observations as well as the dependencies between the labels to be predicted. 2) Image and motion prediction: we propose an architecture that learns a representation of an image depicting a human action in order to predict how the motion evolves in a video; the originality of the proposed model lies in its ability to predict images at an arbitrary distance in a video. 3) Bidirectional multimodal encoders: the major result of the thesis is the proposal of a bidirectional network that translates one modality into another, thereby making it possible to represent several modalities jointly. The approach was studied mainly for structuring video collections, within international evaluations where the proposed approach established itself as the state of the art. 4) Adversarial networks for multimodal fusion: the thesis proposes using generative adversarial architectures to learn multimodal representations, with the possibility of visualizing the representations in image space. 
/ This dissertation discusses the thesis that deep neural networks are suited to the analysis of visual, textual and fused visual and textual content. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either an unsupervised or a supervised manner and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain. Autoencodeurs Apprentissage de représentations Deep neural networks Embedding Continuous representation Multimedia Multimodal Computer vision Spoken language understanding Crossmodal Generative adversarial networks Autoencoders 006.4
38	Learning Embeddings for Fashion Images Hermansson, Simon January 2023 Today the process of sorting second-hand clothes and textiles is mostly manual. In this master’s thesis, methods for automating this process as well as improving the manual sorting process have been investigated. The methods explored include the automatic prediction of price and intended usage for second-hand clothes, as well as different types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some applications in terms of image retrieval as it returns items that look similar, even if they do not necessarily have the same attributes. In contrast, CLIP is better at accurately retrieving garments with as many matching attributes as possible. For price prediction, the best model was CLIP. When fine-tuned on the dataset used, CLIP achieved an F1-Score of 38.08 using three different price categories in the dataset. For predicting the intended usage (either reusing the garment or exporting it to another country) the best model managed to achieve an F1-Score of 59.04. Computer Vision Machine Learning Image Retrieval CLIP Masked Autoencoders (MAE) Vision Transformers Image Captioning Price Prediction AI for Fashion
39	Prediction of Persistence to Treatment for Patients with Rheumatoid Arthritis using Deep Learning / Prediktion av behandlingspersistens för patienter med Reumatoid Artrit med djupinlärning Arda Yilal, Serkan January 2023 Rheumatoid Arthritis is an inflammatory joint disease and one of the most common autoimmune diseases in the world. Treatment usually starts with a first-line drug, Methotrexate, but this is often insufficient. One of the most common second-line treatments is Tumor Necrosis Factor inhibitors (TNFi). Although some patients respond to TNFi, it carries a risk of side effects, including infections. Hence, the ability to predict patient responses to TNFi is important for choosing the correct treatment. This work presents a new approach to predicting whether patients were still on TNFi one year after starting, using a generative neural network architecture called the Variational Autoencoder (VAE). We combined a VAE and a classifier neural network to create a supervised learning model called Supervised VAE (SVAE), trained on two versions of a tabular dataset containing Swedish register data. The datasets consist of 7341 patient records, and our SVAE achieved an AUROC score of 0.615 on validation data. Compared with machine learning models previously used for the same prediction task, SVAE achieved higher scores than decision trees and elastic net but lower scores than random forest and gradient-boosted decision trees. Despite the regularization effect that VAEs provide during classification training, the scores achieved by the SVAEs tested during this thesis were lower than the acceptable discrimination level. / Rheumatoid arthritis is an inflammatory joint disease and one of the most common autoimmune diseases in the world. Medical treatment often begins with Methotrexate. When the response is insufficient, treatment often continues with Tumor Necrosis Factor inhibitors (TNFi). 
Because of the side effects of TNFi, such as an increased risk of infections, it is important to be able to predict patients' response to the treatment. Here we present a new way to predict whether patients were still on TNFi one year after initiation. We combined a Variational Autoencoder (VAE), a generative neural network, with a classification network to create a supervised learning model called Supervised VAE (SVAE). It was trained on two versions of Swedish register data, which contained information on 7341 patients in tabular form. Our SVAE model achieved 0.615 AUROC on validation data. Compared with machine learning models previously used for the same prediction task, SVAE achieved higher scores than Decision Tree and Elastic Net but lower scores than Random Forest and Gradient-Boosted Decision Tree. Despite the regularization effect that the VAE provides during training, the scores achieved by the tested SVAE models were lower than the acceptable discrimination level. Variational Autoencoders Rheumatoid Arthritis Precision Medicine Treatment Prediction Deep Learning Supervised Learning Rheumatoid Artrit Precisionsmedicin Behandlingsförutsägelse Djupinlärning Övervakat lärande Information Systems
40	Cognitively Guided Modeling of Visual Perception in Intelligent Vehicles Plebe, Alice 20 April 2021 This work proposes a strategy for visual perception in the context of autonomous driving. Despite the growing research aiming to implement self-driving cars, no artificial system can yet claim to have reached the driving performance of a human. Humans, when not distracted or drunk, are still the best drivers you can currently find. Hence, theories about the human mind and its neural organization could reveal valuable insights into how to design a better autonomous driving agent. This dissertation focuses specifically on the perceptual aspect of driving, and it takes inspiration from four key theories on how the human brain achieves the cognitive capabilities required by the activity of driving. The first idea lies at the foundation of current cognitive science, and it argues that thinking nearly always involves some sort of mental simulation, which takes the form of imagery when dealing with visual perception. The second theory explains how the perceptual simulation takes place in neural circuits called convergence-divergence zones, which expand and compress information to extract abstract concepts from visual experience and code them into compact representations. The third theory highlights that perception, when specialized for a complex task such as driving, is refined by experience in a process called perceptual learning. The fourth theory, namely the free-energy principle of predictive brains, corroborates the role of visual imagination as a fundamental mechanism of inference. In order to implement these theoretical principles, it is necessary to identify the most appropriate computational tools currently available. Within the consolidated and successful field of deep learning, I select the artificial architectures and strategies that show a sound resemblance to their cognitive counterparts. 
Specifically, convolutional autoencoders have a strong correspondence with the architecture of convergence-divergence zones and the process of perceptual abstraction. The free-energy principle of predictive brains is related to variational Bayesian inference and the use of recurrent neural networks. In fact, this principle can be translated into a training procedure that learns abstract representations predisposed to predicting how the current road scenario will change in the future. The main contribution of this dissertation is a method to learn conceptual representations of the driving scenario from visual information. This approach forces a semantic internal organization, in the sense that distinct parts of the representation are explicitly associated with specific concepts useful in the context of driving. Specifically, the model uses as few as 16 neurons for each of the two basic concepts considered here: vehicles and lanes. At the same time, the approach biases the internal representations towards the ability to predict the dynamics of objects in the scene. This property of temporal coherence allows the representations to be exploited to predict plausible future scenarios and to perform a simplified form of mental imagery. In addition, this work includes a proposal to tackle the problem of opaqueness affecting deep neural networks. I present a method that aims to mitigate this issue, in the context of longitudinal control for automated vehicles. A further contribution of this dissertation is a set of experiments with higher-level prediction spaces, such as occupancy grids, which could reconcile direct application to motor controls with biological plausibility. Settore INF/01 - Informatica
