About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Dataset Drift in Radar Warning Receivers : Out-of-Distribution Detection for Radar Emitter Classification using an RNN-based Deep Ensemble

Coleman, Kevin January 2023 (has links)
Changes to the signal environment of a radar warning receiver (RWR) over time, known as dataset drift, can negatively affect a machine learning (ML) model deployed for radar emitter classification (REC). The training data comes from a simulator at Saab AB, in the form of pulsed radar time-series. To investigate this phenomenon on a neural network (NN), this study first implements an underlying classifier (UC) in the form of a deep ensemble (DE), where each ensemble member consists of an NN with two independently trained bidirectional LSTM channels for each of the signal features pulse repetition interval (PRI), pulse width (PW) and carrier frequency (CF). In tests, the UC performs best for REC when using all three features. Because dataset drift can be treated as detecting out-of-distribution (OOD) samples over time, the aim is to reduce NN overconfidence on data from unseen radar emitters in order to enable OOD detection. The method estimates uncertainty with predictive entropy and classifies samples whose entropy exceeds a threshold as OOD. In the first set of tests, OOD is defined by holding out one feature modulation from the training dataset and choosing it as the only modulation in the OOD dataset used during testing. With this definition, Stagger and Jitter are the most difficult to detect as OOD. Moreover, using DEs with 6 ensemble members and adding LogitNorm to the architecture improves OOD detection performance. Furthermore, the OOD detection method performs well for up to 300 emitter classes, and predictive entropy outperforms the baseline in almost all tests. Finally, the model performs worse when OOD is simply defined as signals from unseen emitters, because of a decrease in precision. In conclusion, the implemented changes reduced the overconfidence of this particular NN and improved OOD detection for REC.
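The entropy-thresholding step described above can be sketched in a few lines: average the ensemble members' softmax outputs, compute the entropy of the averaged distribution, and flag samples above a threshold. The member outputs, class count, and threshold value below are illustrative assumptions, not values from the thesis:

```python
import math

def predictive_entropy(member_probs):
    """Average the softmax outputs of the ensemble members and
    return the entropy of the averaged predictive distribution."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [sum(m[c] for m in member_probs) / n_members
                  for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean_probs if p > 0)

def is_ood(member_probs, threshold):
    """Flag a sample as out-of-distribution when the ensemble's
    predictive entropy exceeds the chosen threshold."""
    return predictive_entropy(member_probs) > threshold

# Agreeing, confident members give low entropy (in-distribution);
# disagreeing members give high entropy and are flagged as OOD.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
disagreeing = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]]
```

Disagreement between members raises the entropy of the averaged distribution even when each member is individually confident, which is what makes the ensemble useful for OOD detection.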
202

Narrow Pretraining of Deep Neural Networks : Exploring Autoencoder Pretraining for Anomaly Detection on Limited Datasets in Non-Natural Image Domains

Eriksson, Matilda, Johansson, Astrid January 2022 (has links)
Anomaly detection is the process of detecting samples in a dataset that are atypical or abnormal. It can, for example, be of great use in an industrial setting, where faults in the manufactured products need to be detected at an early stage. In this setting, the available image data might come from non-natural domains, such as the depth domain, where the amount of data available is often limited. This thesis investigates whether a convolutional neural network (CNN) can be trained to perform anomaly detection well on limited datasets in non-natural image domains. The approach is to train the CNN as an autoencoder, in which the CNN is the encoder network. The encoder is then extracted and used as a feature extractor for the anomaly detection task, which is performed using Semantic Pyramid Anomaly Detection (SPADE). The results are then evaluated and analyzed. Two autoencoder models were used in this approach. As the encoder network, one of the models uses a MobileNetV3-Small network pretrained on ImageNet, while the other uses a more basic network, a few layers deep and initialized with random weights. Both networks were trained as regular convolutional autoencoders as well as variational autoencoders. The results were compared to a MobileNetV3-Small network that had been pretrained on ImageNet but not trained as an autoencoder. The models were tested on six different datasets, all of which contained images from the depth and intensity domains. Three of these datasets additionally contained images from the scatter domain, and for these, the combination of all three domains was tested as well. The main focus, however, was on performance in the depth domain. The results show that there is generally an improvement when training the more complex autoencoder on the depth domain.
Furthermore, the basic network generally obtains results equivalent to the more complex network, suggesting that complexity is not necessarily an advantage for this approach. Looking at the different domains, there is no apparent pattern as to which domain yields the best performance; this rather seems to depend on the dataset. Lastly, it was found that training the networks as variational autoencoders generally did not improve performance in the depth domain compared to the regular autoencoders. In summary, improved anomaly detection was obtained in the depth domain, but for optimal anomaly detection with regard to domain and network, one must look at the individual datasets. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
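The pipeline above boils down to scoring a test image by the distance of its encoder features to features of defect-free training images. A toy sketch of that kNN scoring stage, as in SPADE, follows; the feature vectors and the value of k are made up for illustration:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anomaly_score(sample_feat, normal_feats, k=2):
    """Score a test sample by the mean distance to its k nearest
    neighbours among features extracted from defect-free training
    images, mirroring the kNN stage of SPADE."""
    dists = sorted(euclidean(sample_feat, f) for f in normal_feats)
    return sum(dists[:k]) / k

# Stand-ins for features produced by the (frozen) encoder network.
normal_features = [[0.00, 0.10], [0.10, 0.00], [0.05, 0.05]]
```

A sample whose features lie far from every normal feature vector receives a high score and can be thresholded as anomalous.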
203

People flow maps for socially conscious robot navigation

Fox O'Loughlin, Rex January 2023 (has links)
With robots becoming increasingly common in human-occupied spaces, there has been a growing body of research into socially conscious robot navigation. A robot must be able to predict and anticipate the movements of people around it in order to navigate in a socially acceptable way, or it may face rejection and therefore failure. Often this motion prediction is achieved using neural networks or other artificial intelligence methods to predict the trajectories or flow of people, which requires large amounts of expensive and time-consuming real-world data collection. Therefore, many recent studies have attempted to find ways to create simulated human trajectory data. A variety of methods have been used to achieve this, the main ones being path planning algorithms and pedestrian simulators, but no study has evaluated these methods against each other and against real-world data. This thesis compares the ability of two path planning algorithms (A* and RRT*) and a pedestrian simulator (PTV Vissim) to produce realistic maps of dynamics. It concludes that A*-based path planners are the best choice when balancing the ability to replicate realistic people flow with the ease of generating large amounts of data.
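As a rough illustration of the kind of path planner compared here, a minimal A* on an occupancy grid might look like the following; the grid encoding, 4-connectivity, and Manhattan heuristic are assumptions of this sketch, not details from the thesis:

```python
import heapq

def a_star(grid, start, goal):
    """Minimal A* on a 4-connected occupancy grid (0 = free, 1 = blocked).
    Returns the path as a list of (row, col) cells, or None if no path."""
    rows, cols = len(grid), len(grid[0])
    # Manhattan distance: admissible and consistent on a unit-cost grid.
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    seen = set()
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        if cur in seen:
            continue
        seen.add(cur)
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(open_set,
                                   (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None
```

Running many such queries between sampled start/goal pairs is one way to accumulate the simulated trajectories from which a map of dynamics can be built.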
204

An empirical study on synthetic image generation techniques for object detectors

Arcidiacono, Claudio Salvatore January 2018 (has links)
Convolutional Neural Networks are a very powerful machine learning tool that has outperformed other techniques in image recognition tasks. The biggest drawback of this method is the massive amount of training data required, since producing training data for image recognition tasks is very labor intensive. To tackle this issue, different techniques have been proposed to generate synthetic training data automatically. These synthetic data generation techniques can be grouped into two categories: the first generates synthetic images using computer graphics software and CAD models of the objects to recognize; the second generates synthetic images by cutting the object from one image and pasting it onto another. Since both techniques have their pros and cons, it would be interesting for industries to investigate the two approaches in more depth. A common use case in industrial scenarios is detecting and classifying objects inside an image. Different objects belonging to classes relevant in industrial scenarios are often indistinguishable (for example, they are all the same component). For these reasons, this thesis work aims to answer the research question “Among the CAD model generation technique, the Cut-paste generation technique and a combination of the two techniques, which technique is more suitable for generating images for training object detectors in industrial scenarios?” To answer the research question, two synthetic image generation techniques, one from each category, are proposed. The proposed techniques are tailored for applications where all the objects belonging to the same class are indistinguishable, but they can also be extended to other applications. The two synthetic image generation techniques are compared by measuring the performance of an object detector trained using synthetic images on a test dataset of real images.
The performance of the two synthetic data generation techniques used for data augmentation has also been measured. The empirical results show that the CAD model generation technique works significantly better than the Cut-Paste generation technique when synthetic images are the only source of training data (61% better), whereas the two generation techniques perform equally well as data augmentation techniques. Moreover, the empirical results show that the models trained using only synthetic images perform almost as well as the model trained using real images (7.4% worse) and that augmenting the dataset of real images with synthetic images improves the performance of the model (9.5% better).
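The cut-paste idea, composing an object crop onto a background so that the bounding-box label comes for free, can be sketched minimally as follows; representing images as 2D lists of pixel values is purely for illustration:

```python
def cut_paste(background, patch, top, left):
    """Paste an object patch onto a background image at (top, left).
    Returns the synthetic image and the bounding-box annotation that
    comes for free: (top, left, height, width)."""
    out = [row[:] for row in background]  # copy so the background is untouched
    for i, patch_row in enumerate(patch):
        for j, pixel in enumerate(patch_row):
            out[top + i][left + j] = pixel
    bbox = (top, left, len(patch), len(patch[0]))
    return out, bbox
```

Because the paste location is chosen by the generator, every synthetic image arrives with an exact bounding box, which is the main appeal of this family of techniques for training object detectors.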
205

OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data

Braunschweig, Katrin, Eberius, Julian, Thiele, Maik, Lehner, Wolfgang 27 January 2023 (has links)
Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. Doing so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects of both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
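In its simplest form, a keyword-query entry point for non-expert users might rank datasets by how many query terms match their metadata. This toy sketch assumes a title-plus-columns metadata shape that is not specified in the article:

```python
def keyword_search(datasets, query):
    """Rank datasets by how many query keywords appear in their title
    or column names: a toy stand-in for a keyword-query interface over
    dataset metadata."""
    terms = set(query.lower().split())

    def score(ds):
        text = (ds["title"] + " " + " ".join(ds["columns"])).lower()
        return sum(1 for t in terms if t in text)

    # Keep only datasets with at least one matching term, best first.
    return sorted((ds for ds in datasets if score(ds) > 0),
                  key=score, reverse=True)

catalog = [
    {"title": "Municipal budget 2020", "columns": ["year", "department", "amount"]},
    {"title": "Air quality measurements", "columns": ["station", "pm10", "date"]},
]
```

A real system would of course add indexing and handle the semantic ambiguity the article discusses, but the keyword-to-metadata matching is the non-expert-facing core.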
206

Automation and Validation of Big Data Generation via Simulation Pipeline for Flexible Assemblies

Adrian, Alexander F. 26 October 2022 (has links)
No description available.
207

Predicting Customer Satisfaction in the Context of Last-Mile Delivery using Supervised and Automatic Machine Learning

Höggren, Carl January 2022 (has links)
The prevalence of online shopping has steadily risen in the last few years. In response to these changes, last-mile delivery services have emerged that enable goods to reach customers within a shorter timeframe than traditional logistics providers. However, with decreased lead times comes greater exposure to risks that directly influence customer satisfaction. More specifically, this report investigates the extent to which Supervised and Automatic Machine Learning can be leveraged to extract the features with the highest explanatory power for customer ratings. The implementation suggests that the Random Forest Classifier outperforms both the Multi-Layer Perceptron and the Support Vector Machine in predicting customer ratings on a highly imbalanced version of the dataset, while AutoML excels when the dataset is subject to undersampling. Using Permutation Feature Importance and Shapley Additive Explanations, it was further concluded that whether the delivery is on time, whether it is executed within the stated time window, and whether it is executed during the morning, afternoon, or evening are paramount drivers of customer ratings.
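Permutation Feature Importance, one of the two explanation methods named above, can be sketched in a few lines: permute one feature's column and measure the resulting drop in accuracy. The model and data below are invented for illustration:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Importance of one feature, estimated as the mean accuracy drop
    after randomly permuting that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)  # break the feature-label association
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, col)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats
```

A feature the model never uses shows zero drop when permuted, while a decisive feature (such as on-time delivery in the study above) shows a large drop.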
208

Failure Inference in Drilling Bits : Leveraging YOLO Detection for Dominant Failure Analysis

Akumalla, Gnana Spandana January 2023 (has links)
Detecting failures in tricone drill bits is crucial in the mining industry because of their potential consequences, including operational losses, safety hazards, and delays in drilling operations. Timely identification of failures allows for proactive maintenance and the measures necessary to ensure smooth drilling processes and minimize associated risks. Accurate failure detection helps mining operations avoid financial losses by preventing unplanned breakdowns, costly repairs, and extended downtime. Moreover, it optimizes operational efficiency by enabling timely maintenance interventions, extending the lifespan of drill bits, and minimizing disruptions. Failure detection also plays a critical role in ensuring the safety of the personnel and equipment involved in drilling operations. Traditionally, failure detection in tricone drill bits relies on manual inspection, which can be time-consuming and labor-intensive; incorporating artificial-intelligence-based approaches can significantly enhance efficiency and accuracy. This thesis uses machine learning methods for failure inference in tricone drill bits. A classic Convolutional Neural Network (CNN) classification method was initially explored, but its performance was insufficient due to the small dataset size and imbalanced data. To overcome these limitations, the problem was reformulated as an object detection task and a post-processing operation was incorporated. Data augmentation techniques enhanced the training and evaluation datasets, improving failure detection accuracy. Experimental results highlighted the need to revise the initial CNN classification method, given the limitations of the small and imbalanced dataset, whereas You Only Look Once (YOLO) models such as YOLOv5 and YOLOv8 exhibited improved performance. The post-processing operation further refined the results obtained from the YOLOv5 and YOLOv8 models.
While YOLO provides bounding box coordinates and class labels, the post-processing step enhanced drill bit failure detection through techniques such as confidence thresholding. By effectively leveraging the YOLO-based models and incorporating post-processing, this research advances failure detection in tricone drill bits. These intelligent methods enable more precise and efficient detection, preventing operational losses and optimizing maintenance processes. The findings underscore the potential of machine learning techniques in the mining industry, particularly in mechanical drilling, driving progress and enhancing overall operational efficiency.
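Confidence thresholding, the post-processing technique named above, is straightforward to sketch. The detection tuple layout and failure class names below are assumptions for illustration, not the thesis's actual labels:

```python
def filter_detections(detections, conf_threshold=0.5):
    """Keep only detections whose confidence reaches the threshold.
    Each detection is (x1, y1, x2, y2, confidence, class_label)."""
    return [d for d in detections if d[4] >= conf_threshold]

def dominant_failure(detections):
    """Report the class label of the highest-confidence detection,
    i.e. the dominant failure, or None if nothing was detected."""
    if not detections:
        return None
    return max(detections, key=lambda d: d[4])[5]
```

Discarding low-confidence boxes before picking the dominant failure class is what turns raw YOLO output into a single actionable maintenance signal.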
209

En komparativ studie av OCR-verktyg för granskning av handlingar : Med prestanda och precision i fokus / A comparative study of OCR tools for reviewing documents : Focusing on performance and precision

Sjöstedt, Niklas January 2023 (has links)
Today's society is characterized by an exponential growth of data, which is expected to increase from today's 33 zettabytes to 175 zettabytes by 2025. This development brings both benefits and challenges for the individuals and organizations that analyze this massive amount of data. To facilitate the review and analysis of data in text or image form, an OCR tool can be used. OCR tools, built on AI technology, can facilitate and automate the review of data. A wide range of OCR tools exists today, performing more or less well. This study was carried out on behalf of Etteplan, which currently experiences high time and resource consumption when reviewing electrical grid drawings. The purpose of this study was to examine and compare the OCR tools PyTesseract, EasyOCR and PaddleOCR against a number of performance criteria. The criteria compared in this study were execution time, precision, Levenshtein distance, characters per millisecond, and CPU, RAM and GPU usage. The study was intended to provide Etteplan with a recommendation of which OCR tool performs best. Three equivalent test applications were created for the different OCR tools using Python. The task of these test applications was to read text data from images containing tables and then compare the result against a list containing the actual text. This functionality made it possible for the author of this study to measure the different performance criteria and weigh them against each other. The results of this study show that PaddleOCR is the tool that performs best in terms of precision, Levenshtein distance and execution time, albeit at the cost of higher resource usage.
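The Levenshtein distance criterion used in the comparison is the classic edit distance between OCR output and ground truth. A short sketch follows; the derived character-accuracy measure is an assumption of this sketch, not necessarily the study's exact precision metric:

```python
def levenshtein(a, b):
    """Edit distance between two strings via the classic
    dynamic-programming recurrence, kept to two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(ocr_text, ground_truth):
    """Character-level accuracy derived from the edit distance."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - levenshtein(ocr_text, ground_truth) / len(ground_truth))
```

Comparing each tool's output against the known table text with a measure like this gives a single number per image, which makes the three tools directly comparable.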
210

Experimental Research on a Continuous Integrating pipeline with a Machine Learning approach : Master Thesis done in collaboration with Electronic Arts

Sigurdardóttir, Sigrún Arna January 2021 (has links)
Time-consuming code builds within the Continuous Integration pipeline are a common problem in today’s software industry. With fast-evolving trends and technologies, Machine Learning has become a popular approach for tackling and solving real problems in the software industry, and Machine Learning models have been trained successfully to classify whether a code change is likely to succeed or fail during a code build. Reducing the time it takes to run code builds within the Continuous Integration pipeline can lead to higher productivity in software development, faster feedback for developers, and a lower cost of the hardware resources used to run the builds. The research question is: how accurately can the success or failure of a code build be predicted by applying Machine Learning techniques to a collection of historical data? The important factors are the historical data available and an understanding of that data. Thorough data analysis was conducted on the historical data, along with a data cleaning process, to create a dataset suitable for feeding the Machine Learning models. The dataset was imbalanced, favouring successful builds, and to balance it the SMOTE method was used to create synthetic samples. A binary classification and supervised learning comparison of four Machine Learning models was performed: Random Forest, Logistic Regression, Support Vector Machine, and Neural Network. The performance metrics used to measure the models were recall, precision, specificity, f1-score, ROC curve, and AUC score. To reduce the dimensionality of the features, the PCA method was used. The outcome of the Machine Learning models revealed that historical data can be used to accurately predict whether a code change will result in a code build success or failure.
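The SMOTE balancing step mentioned above creates synthetic minority samples by interpolating between a real sample and one of its nearest neighbours. A minimal sketch follows; the feature vectors and parameter values are illustrative assumptions:

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class samples by linear
    interpolation between a random sample and one of its k nearest
    neighbours, in the spirit of SMOTE."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # how far along the segment to place the sample
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority data already occupies rather than duplicating exact rows.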
