Global ETD Search

131	Twittersentimentanalys : Jämförelse av klassificeringsmodeller tränade på olika datamängder / Twitter Sentiment Analysis : Comparison of classification models trained on different data sets
Bandgren, Johannes; Selberg, Johan. January 2018
Twitter is one of the most popular microblogs and is used to express thoughts and opinions on different topics. An area that has attracted much interest in recent years is Twitter sentiment analysis, which assesses what sentiment a Twitter post expresses: whether it expresses something positive or negative. Different methods can be used to perform Twitter sentiment analysis, and some are better suited than others. The most common methods use machine learning.
The purpose of this study is to evaluate three machine learning classification algorithms and how the labeling of a data set affects a classification model's ability to classify a Twitter post correctly. Naive Bayes, Support Vector Machine and Convolutional Neural Network are the classification algorithms that have been evaluated. For each classification algorithm, two classification models have been trained and tested on two separate data sets: Stanford Twitter Sentiment and SemEval. What separates the two data sets, in addition to the content of the Twitter posts, is the labeling method and the number of Twitter posts. The models have been evaluated on the performance they achieve on the respective data sets, on their training time and on how complicated they were to implement.
The results show that all models trained and tested on SemEval achieved a higher performance than those trained and tested on Stanford Twitter Sentiment. The Convolutional Neural Network models achieved the best results on both data sets. However, a Convolutional Neural Network is more complicated to implement, and its training time is significantly longer than that of Naive Bayes and Support Vector Machine.
Keywords: Twitter sentiment analysis; machine learning; Naive Bayes; Support Vector Machine; Convolutional Neural Network; SemEval; Stanford Twitter Sentiment; pre-processing
Subject: Engineering and Technology
132	Produktmatchning EfficientNet vs. ResNet : En jämförelse / Product matching EfficientNet vs. ResNet
Malmgren, Emil; Järdemar, Elin. January 2021
E-commerce is steadily increasing: between 2010 and 2014, the proportion of consumers shopping online rose from 28.9% to 34.2%. Insufficient information about the price of a product forces buyers to search among several different retailers for the best price. There are different ways to produce the information required to compare prices. One of them is automated product matching. This method uses image recognition algorithms, whose purpose is to detect, locate and recognize objects in images. Image recognition algorithms often have problems finding objects in images due to external factors such as lighting, viewing angle and whether the image contains a lot of unnecessary information. In the past, algorithms such as ANN (artificial neural network), random forest classifier and support vector machine have been used, but recent studies have shown that CNNs (convolutional neural networks) are better at finding important properties of objects, which makes them less sensitive to these external factors.
Two CNN architectures that have emerged are EfficientNet and ResNet. Both have shown good results in previous studies, but there is little research to help one choose the CNN architecture that leads to the best possible result. Our question is therefore: Which of the EfficientNet and ResNet architectures gives the highest result on product matching with the evaluation measures F1-score, precision and recall?
The results of the study show that EfficientNet is the best architecture overall for product matching on the study's dataset. The results also show that ResNet was better than EfficientNet at proposing the right matches for the images: the matches ResNet makes are more accurate than those EfficientNet suggests, since ResNet achieved a higher precision. However, EfficientNet achieves a better recall, which shows that it is better than ResNet at finding more or all of the correct matches among its potential matches. The difference in recall is greater than the difference in precision between the models, which gives EfficientNet a higher F1-score and makes it better overall. Which measure matters most is open to discussion: is it more important that the suggested matches are correct, or that all the correct matches are found? If the most important thing is that the proposed matches are correct, ResNet has an advantage; if it is more important to find all correct matches, EfficientNet has an advantage. Which architecture gives the best results therefore depends on what is considered most important.
Keywords: EfficientNet; ResNet; CNN; Convolutional Neural Network; image classification; product matching; price matching; object recognition
Subject: Computer and Information Sciences
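The precision/recall trade-off discussed in this abstract follows from the F1-score being the harmonic mean of precision and recall. The following is a minimal sketch of that relationship; the function name and the match counts are hypothetical illustrations and are not taken from the thesis.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (assumption, not thesis data):
# a model that proposes few but mostly correct matches ...
precise_model = precision_recall_f1(tp=60, fp=10, fn=40)
# ... and a model that finds most correct matches but proposes more wrong ones.
thorough_model = precision_recall_f1(tp=85, fp=25, fn=15)

for name, (p, r, f1) in [("precise", precise_model), ("thorough", thorough_model)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With these made-up counts the first model has the higher precision, while the second has the higher recall and, because its recall advantage is larger than its precision deficit, also the higher F1-score.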
133	UAV geolocalization in Swedish fields and forests using Deep Learning / Geolokalisering av UAVs över svenska fält och skogar med hjälp av djupinlärning
Rohlén, Andreas. January 2021
The ability of unmanned autonomous aerial vehicles (UAVs) to localize themselves in an environment is fundamental for them to function, even if they do not have access to a global positioning system. Recently, with the success of deep learning in vision-based tasks, several methods have been proposed for absolute geolocalization using vision-based deep learning with satellite and UAV images. Most of these have only been tested in urban environments, which raises the question: How well do they work in non-urban areas like forests and fields? One drawback of deep learning is that models are often regarded as black boxes, as it is hard to know why the models make the predictions they do, i.e. what information is important and is used for the prediction. To address this, several neural network interpretation methods have been developed. These methods provide explanations so that we may understand these models better.
This thesis investigates the localization accuracy of one geolocalization method in both urban and non-urban environments, and applies neural network interpretation to see whether it can explain the potential difference in the method's localization accuracy between these environments. The results show that the method performs best in urban environments, with a mean absolute horizontal error of 38.30 m and a mean absolute vertical error of 16.77 m, while it performed significantly worse in non-urban environments, with a mean absolute horizontal error of 68.11 m and a mean absolute vertical error of 22.83 m. Further, the results show that if the satellite images and the UAV images are collected during different seasons of the year, the localization accuracy is even worse, with a mean absolute horizontal error of 86.91 m and a mean absolute vertical error of 23.05 m. The neural network interpretation did not help explain why the method performs worse in non-urban environments and is not suitable for this kind of problem.
Keywords: Unmanned Aerial Vehicle; Absolute Localization; Deep Learning; Convolutional Neural Network; Cross-View; Neural Network Interpretation
Subject: Computer Sciences
134	Investigation of hierarchical deep neural network structure for facial expression recognition
Motembe, Dodi. 01 1900
Facial expression recognition (FER) is still a challenging problem, and machines struggle to comprehend effectively the dynamic shifts in facial expressions of human emotions. The existing systems that have proven effective consist of deeper network structures that need powerful and expensive hardware. The deeper the network, the longer the training and testing take, and many systems use expensive GPUs to make the process faster. To address these challenges while keeping the main goal of improving recognition accuracy, we create a generic hierarchical structure with variable settings. This generic structure has a hierarchy of three convolutional blocks, two dropout blocks and one fully connected block. From this generic structure we derived four different network structures (cases) to be investigated according to their performance. From each case, we in turn derived six network structures by varying the parameters under analysis: the filter size of the convolutional maps and of the max-pooling, and the number of convolutional maps. In total, we have 24 network structures to investigate, six per case.
After many repeated experiments, case 1a emerged as the top performer of the case 1 group, and case 2a, case 3c and case 4c outperformed the others in their respective groups. Comparing the winners of the four groups indicates that case 2a is the optimal structure with optimal parameters, as it outperformed the other group winners. The criteria for choosing the best network structure were the minimum, average and maximum accuracy over 15 repeated training runs.
All 24 proposed network structures were tested on two of the most used FER datasets, CK+ and JAFFE. After repeated simulations, the results demonstrate that our inexpensive optimal network architecture achieved 98.11% accuracy on the CK+ dataset. On the JAFFE dataset, the same architecture achieved 84.38%, using just a standard CPU and simpler procedures. We also compared the four group winners with the performance of other existing FER models reported recently in two studies that used the same two datasets. Three of our four group winners (case 1a, case 2a and case 4c) recorded only 1.22% less than the accuracy of the top-performing model on the CK+ dataset, and two of our network structures, case 2a and case 3c, came in third on the JAFFE dataset, beating other models.
Department: Electrical and Mining Engineering
Keywords: Facial Expression Recognition (FER); Deep Learning; Convolutional Neural Network (CNN); Deep Convolutional Neural Network (DCNN); Artificial Intelligence; Face Detection; Facial Feature Extraction; Central Processing Unit (CPU); Graphics Processing Unit (GPU)
135	Towards a 3D building reconstruction using spatial multisource data and computational intelligence techniques / Vers une reconstruction de batiment en 3D utilisant des données spatiales multisources et des techniques d'intelligence informatique
Papadopoulos, Georgios. 27 November 2019
Building reconstruction from aerial photographs and other multi-source urban spatial data is a task approached with a plethora of automated and semi-automated methods, ranging from point processes to classic image processing and laser scanning. In this thesis, an iterative relaxation system is developed based on the examination of the local context of each edge according to multiple spatial input sources (optical, elevation, shadow and foliage masks as well as other pre-processed data, as elaborated in Chapter 6). All these multisource and multiresolution data are fused so that probable line segments or edges are extracted that correspond to prominent building boundaries.
Two novel sub-systems have also been developed in this thesis. They were designed to provide additional, more reliable information regarding building contours in a future version of the proposed relaxation system. The first is a deep convolutional neural network (CNN) method for the detection of building borders. In particular, the network is based on the state-of-the-art super-resolution model SRCNN (Dong C. L., 2015). It accepts aerial photographs depicting densely populated urban areas as well as their corresponding digital elevation maps (DEM). Training is performed using three variations of this urban data set and aims at detecting building contours through a novel super-resolved heteroassociative mapping. Another innovation of this approach is the design of a modified custom loss layer named Top-N. In this variation, the mean square error (MSE) between the reconstructed output image and the provided ground truth (GT) image of building contours is computed on the 2N image pixels with the highest values. Assuming that most of the N contour pixels of the GT image are also among the top 2N pixels of the reconstruction, this modification balances the two pixel categories and improves the generalization behavior of the CNN model. The experiments show that the Top-N cost function offers performance gains in comparison to standard MSE. Further improvement in the generalization ability of the network is achieved by using dropout.
The second sub-system is a super-resolution deep convolutional network, which performs an enhanced-input associative mapping between input low-resolution and high-resolution images. This network has been trained with low-resolution elevation data and the corresponding high-resolution optical urban photographs. Such a resolution discrepancy between optical aerial/satellite images and elevation data is often the case in real-world applications. More specifically, low-resolution elevation data augmented by high-resolution optical aerial photographs are used with the aim of increasing the resolution of the elevation data. This is a unique super-resolution problem, for which many of the proposed general-image SR approaches were found not to perform as well. The network, named building super resolution CNN (BSRCNN), is trained using patches extracted from the aforementioned data. Results show that, in comparison with a classic bicubic upscaling of the elevation data, the proposed implementation offers an important improvement, as attested by modified PSNR and SSIM metrics. In comparison, other proposed general-image SR methods performed worse than a standard bicubic upscaler.
Finally, the relaxation system fuses all these multisource data, comprising pre-processed optical data, elevation data, foliage masks, shadow masks and other pre-processed data, in an attempt to assign confidence values to each pixel belonging to a building contour. Confidence is increased or decreased iteratively until the MSE falls below a specified threshold or a maximum number of iterations has been executed. The confidence matrix can then be used to extract the true building contours via thresholding.
Keywords: Building reconstruction; Convolutional neural network; Elevation data super-resolution; Iterative relaxation system
Classification: 006.31
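The Top-N loss described in this abstract can be illustrated with a small sketch. This is one reading of the abstract's description, not the thesis implementation: images are treated as flat lists of floats, N is taken to be the number of ground-truth pixels above a hypothetical `contour_threshold`, and the error is averaged over the 2N pixels with the highest values in the reconstructed output.

```python
def plain_mse(output, ground_truth):
    """Standard mean squared error over all pixels."""
    return sum((o - g) ** 2 for o, g in zip(output, ground_truth)) / len(output)


def top_n_mse(output, ground_truth, contour_threshold=0.5):
    """MSE restricted to the 2N highest-valued output pixels.

    N is the number of ground-truth contour pixels; which pixels count as
    contour (value > contour_threshold) is an assumption of this sketch.
    """
    if len(output) != len(ground_truth):
        raise ValueError("output and ground_truth must have the same length")
    n = sum(1 for g in ground_truth if g > contour_threshold)
    k = min(2 * n, len(output))
    if k == 0:
        return 0.0
    # Indices of the k pixels with the highest reconstructed values.
    top = sorted(range(len(output)), key=lambda i: output[i], reverse=True)[:k]
    return sum((output[i] - ground_truth[i]) ** 2 for i in top) / k


# Toy 8-pixel image with 2 contour pixels (made-up values).
gt = [1, 0, 0, 0, 0, 0, 1, 0]
out = [0.9, 0.1, 0.2, 0.0, 0.1, 0.3, 0.4, 0.0]
print(plain_mse(out, gt), top_n_mse(out, gt))
```

In the toy example the many near-zero background pixels dilute the plain MSE, whereas the restricted loss averages over only four pixels, two of which are contour pixels, so the weakly reconstructed contour pixel contributes more to the loss.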
136	An evaluation of image preprocessing for classification of Malaria parasitization using convolutional neural networks / En utvärdering av bildförbehandlingsmetoder för klassificering av malariaparasiter med hjälp av Convolutional Neural Networks
Engelhardt, Erik; Jäger, Simon. January 2019
In this study, the impact of multiple image preprocessing methods on Convolutional Neural Networks (CNN) was studied. Metrics such as accuracy, precision, recall and F1-score (Hossin et al. 2011) were evaluated. Specifically, this study is geared towards malaria classification using the data set made available by the U.S. National Library of Medicine (Malaria Datasets n.d.). This data set contains images of thin blood smears, where uninfected and parasitized blood cells have been segmented. In the study, 3 CNN models were proposed for the parasitization classification task. Each model was trained on the original data set and on 4 preprocessed data sets. The preprocessing methods used to create the 4 data sets were grayscale, normalization, histogram equalization and contrast limited adaptive histogram equalization (CLAHE). CLAHE preprocessing yielded a 1.46% (model 1) and 0.61% (model 2) improvement over the original data set in terms of F1-score. One model (model 3) provided inconclusive results. The results show that CNNs can be used for parasitization classification, but the impact of preprocessing is limited.
Keywords: Deep Learning; Convolutional Neural Network; Malaria; Image Recognition; Preprocessing; Computer Aided Diagnosis; Grayscale; Normalization; Histogram Equalization; CLAHE
Subject: Computer and Information Sciences
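Three of the four preprocessing methods named in this abstract can be sketched in a few lines of plain Python. The sketch below is illustrative only and is not the thesis code: images are nested lists, the grayscale conversion uses the common ITU-R BT.601 luma weights (an assumption, the abstract does not state which conversion was used), and CLAHE is omitted because it additionally requires tiling and clip limits.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples (0..255) to rows of gray ints,
    using BT.601 luma weights (assumed conversion)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def normalize(gray):
    """Scale 0..255 intensities to the range 0.0..1.0."""
    return [[p / 255.0 for p in row] for row in gray]


def equalize_histogram(gray, levels=256):
    """Global histogram equalization via the cumulative distribution."""
    pixels = [p for row in gray for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Single-intensity image: nothing to spread out.
        return [row[:] for row in gray]
    # Lookup table mapping each old intensity to its equalized value.
    lut = [round(max(c - cdf_min, 0) / (total - cdf_min) * (levels - 1))
           for c in cdf]
    return [[lut[p] for p in row] for row in gray]


low_contrast = [[50, 50, 50], [100, 150, 150]]
print(equalize_histogram(low_contrast))
```

On the toy image the intensities 50, 100 and 150 are stretched over the full 0 to 255 range, which is the contrast-spreading effect that equalization is meant to provide before the images are passed to a classifier.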
137	Violin Artist Identification by Analyzing Raga-vistaram Audio
Ramlal, Nandakishor. January 2023
With the inception of music streaming and media content delivery platforms, there has been a tremendous increase in the music available on the internet and in the metadata associated with it. In this study, we address the problem of violin artist identification, which tries to classify the performing artist based on learned features. Although numerous previous works have studied the problem in detail and developed usable features and deep learning models, most studies focused on artist identification in western popular music and fewer on Indian classical music. For the same reason, there was no standardized dataset for this purpose. Hence, we curated a new dataset consisting of audio recordings from 6 renowned South Indian Carnatic violin artists. In this study, we explore the use of the log-Mel-spectrogram feature and the embeddings generated by a pre-trained VGGish network with a Convolutional Neural Network and a Convolutional Recurrent Neural Network model. From the experiments, we observe that the Convolutional Recurrent Neural Network model trained using the log-Mel-spectrogram feature gave the best performance, with a classification accuracy of 71.70%.
Keywords: Artist identification; Music information retrieval; Deep Learning; Convolutional Neural Network; Convolutional Recurrent Neural Network; Embeddings; log-Mel-spectrogram
Subject: Computer and Information Sciences
138	Evaluation of Attention Mechanisms for Just-In-Time Software Defect Prediction / En Utvärdering av Attention Mechanisms för Just-In-Time Software Defect Prediction Isunza Navarro, Abgeiba Yaroslava January 2020 (has links) Just-In-Time Software Defect Prediction (JIT-DP) focuses on predicting errors in software at change-level with the objective of helping developers identify defects while the development process is still ongoing, and improving the quality of software applications. This work studies deep learning techniques by applying attention mechanisms that have been successful in, among others, Natural Language Processing (NLP) tasks. We introduce two networks named Convolutional Neural Network with Bidirectional Attention (BACNN) and Bidirectional Attention Code Network (BACoN) that employ a bi-directional attention mechanism between the code and message of a software change. Furthermore, we examine BERT [17] and RoBERTa [57] attention architectures for JIT-DP. More specifically, we study the effectiveness of the aforementioned attention-based models to predict defective commits compared to the current state of the art, DeepJIT [37] and TLEL [101]. Our experiments evaluate the models by using software changes from the OpenStack open source project. The results showed that attention-based networks outperformed the baseline models in terms of accuracy in the different evaluation settings. The attention-based models, particularly BERT and RoBERTa architectures, demonstrated promising results in identifying defective software changes and proved to be effective in predicting defects in changes of new software releases. / Just-In-Time Defect Prediction (JIT-DP) fokuserar på att förutspå fel i mjukvara vid ändringar i koden, med målet att hjälpa utvecklare att identifiera defekter medan utvecklingsprocessen fortfarande är pågående, och att förbättra kvaliteten hos applikationsprogramvara. 
Detta arbete studerar djupinlärningstekniker genom att tillämpa attentionmekanismer som har varit framgångsrika inom, bland annat, språkteknologi (NLP). Vi introducerar två nätverk vid namn Convolutional Neural Network with Bidirectional Attention (BACNN), och Bidirectional Attention Code Network (BACoN), som använder en tvåriktad attentionmekanism mellan koden och meddelandet om en mjukvaruändring. Dessutom undersöker vi attentionarkitekturerna BERT [17] och RoBERTa [57] för JIT-DP. Mer specifikt studerar vi hur effektivt dessa attentionbaserade modeller kan förutspå defekta ändringar, och jämför dem med de bästa tillgängliga arkitekturerna DeepJIT [37] och TLEL [101]. Våra experiment utvärderar modellerna genom att använda mjukvaruändringar från det öppna källkodsprojektet OpenStack. Våra resultat visar att attentionbaserade nätverk överträffar referensmodellerna sett till träffsäkerheten i de olika scenarierna. De attentionbaserade modellerna, framför allt BERT och RoBERTa, demonstrerade lovande resultat när det kommer till att identifiera defekta mjukvaruändringar och visade sig vara effektiva på att förutspå defekter i ändringar av nya mjukvaruversioner. Just-in-Time Software Defect Prediction Attention Mechanism Convolutional Neural Network Feature Extraction Just-in-Time Software Defect Prediction Attention Mechanism Convolutional Neural Network Feature Extraction Computer and Information Sciences Data- och informationsvetenskap
139	Convolutional neural network based object detection in a fish ladder : Positional and class imbalance problems using YOLOv3 / Objektdetektering i en fisktrappa baserat på convolutional neural networks : Positionell och kategorisk obalans vid användning av YOLOv3 Ekman, Patrik January 2021 (has links) Hydropower plants create blockages in fish migration routes. Fish ladders can serve as alternative routes but are complex to install and follow up to help adapt and develop them further. In this study, computer vision tools are considered in this regard. More specifically, object detection is applied to images collected in a hydropower plant fish ladder to localise and classify wild, farmed and unknown fish labelled according to the presence, absence or uncertainty of an adipose fin. Fish migration patterns are not deterministic, making it a challenge to collect representative and balanced data to train a model that is resilient to changing conditions. In this study, two data imbalances are addressed by modifying a YOLOv3 baseline model: foreground-foreground class imbalance is targeted using hard and soft resampling and positional imbalance using translation augmentation. YOLOv3 is a convolutional neural network predicting bounding box coordinates, class probabilities and confidence scores simultaneously. It divides images into grids and makes predictions based on grid cell locations and anchor box offsets. Performance is estimated across 10 random data splits and different bounding box overlap thresholds, using (mean) average precision as well as recall, precision and F1 score estimated at optimal validation set confidence thresholds. The Wilcoxon signed-ranks test is used for determining statistical significance. In experiments, the best performance was observed on wild and farmed fish, with F1 scores reaching 94.8 and 89.0 percent respectively. 
The inconsistent appearance of unknown fish appears harder to generalise to, with a corresponding F1 score of 65.7 percent. Soft sampling but especially translation augmentation contributed to enhanced performance and reduced variance, implying that the baseline model is particularly sensitive to positional imbalance. Spatial dependencies introduced by YOLOv3’s grid cell strategy likely produce local bias or overfitting. An experimental evaluation highlights the importance of not relying on a single data split when evaluating performance on a moderately large or custom dataset. A key challenge observed in experiments is the choice of a suitable confidence threshold, influencing the dynamics of the results. / Vattenkraftverk blockerar fiskars vandringsvägar. Fisktrappor kan skapa alternativa vägar men är komplexa att installera och följa upp för vidare anpassning och utveckling. I denna studie betraktas datorseende i detta avseende. Mer specifikt appliceras objektdetektering på bilder samlade i en fisktrappa i anslutning till ett vattenkraftverk, med målet att lokalisera och klassificera vilda, odlade och okända fiskar baserat på förekomsten, avsaknaden eller osäkerheten av en fettfena. Fiskars migrationsmönster är inte deterministiska vilket gör det svårt att samla representativ och balanserad data för att träna en modell som kan hantera förändrade förutsättningar. I denna studie adresseras två obalanser i datan genom modifikation av en YOLOv3 baslinjemodell: klass-obalans genom hård och mjuk återanvändning av data och positionell obalans genom translation av bilder innan träning. YOLOv3 är ett convolutional neural network som simultant förutsäger avgränsnings-lådor, klass-sannolikheter och prediktions-säkerhet. Bilder delas upp i rutnätceller och prediktioner görs baserat på cellers position samt modifikation av fördefinierade avgränsningslådor. 
Resultat beräknas på 10 slumpmässiga uppdelningar av datan och för olika tröskelvärden för avgränsningslådors överlappning. På detta beräknas (mean) average precision, liksom recall, precision och F1 score med tröskelvärden för prediktions-säkerhet beräknat på valideringsdata. Wilcoxon signed-ranks test används för att avgöra statistisk signifikans. Bäst resultat observeras på vilda och odlade fiskar, med F1 scores som når 94.8 respektive 89.0 procent. Okända fiskars inkonsekventa utseenden verkar svårare att generalisera till, med en motsvarande F1 score på 65.7 procent. Mjuk återanvändning av data men speciellt translation bidrar till förbättrad prestanda och minskad varians, vilket pekar på att baslinjemodellen är särskilt känslig för positionell obalans. Spatiala beroenden skapade av YOLOv3s rutnäts-strategi producerar troligen lokal partiskhet eller överträning. I en experimentell utvärdering understryks vikten av multipel uppdelning av datan vid evaluering på ett måttligt stort eller egenskapat dataset. Att välja tröskelvärdet för prediktions-säkerhet anses utmanande och påverkar resultatens dynamik. Object detection Computer vision Fish ladder Imbalance problems Imbalanced data YOLO Convolutional Neural Network Deep learning Objektdetektering Datorseende Fisktrappa Obalanser Obalanserad data YOLO Convolutional Neural Network Djupinlärning Computer and Information Sciences Data- och informationsvetenskap
140	Non-Bayesian Out-of-Distribution Detection Applied to CNN Architectures for Human Activity Recognition Socolovschi, Serghei January 2022 (has links) The field of Human Activity Recognition (HAR) studies the application of artificial intelligence methods for the identification of activities performed by people. Many applications of HAR in healthcare and sports require the safety-critical performance of the predictive models. The predictions produced by these models should be not only correct but also trustworthy. However, in recent years it has been shown that modern neural networks sometimes tend to produce wrong and overconfident predictions when processing unusual inputs. This issue puts the credibility of the predictions at risk and calls for solutions that might help estimate the uncertainty of the model’s predictions. In the following work, we started the investigation of the applicability of Non-Bayesian Uncertainty Estimation methods to Deep Learning classification models in HAR. We trained a Convolutional Neural Network (CNN) model with public datasets, such as UCI HAR and WISDM, which collect sensor-based time-series data about activities of daily life. Through a series of four experiments, we evaluated the performance of two Non-Bayesian uncertainty estimation methods, ODIN and Deep Ensemble, on out-of-distribution detection. We found that the ODIN method is able to separate out-of-distribution samples from the in-distribution data. However, we also observed unexpected behavior when the out-of-distribution data contained exclusively dynamic activities. The Deep Ensemble method did not provide satisfactory results for our research question. / Inom området Human Activity Recognition (HAR) studeras tillämpningen av metoder för artificiell intelligens för identifiering av aktiviteter som utförs av människor. Många av tillämpningarna av HAR inom hälso- och sjukvård och idrott kräver att de prediktiva modellerna har en säkerhetskritisk prestanda. 
De förutsägelser som dessa modeller ger upphov till ska inte bara vara korrekta utan också trovärdiga. Under de senaste åren har det dock visat sig att moderna neurala nätverk tenderar att ibland ge felaktiga och överdrivet säkra förutsägelser när de behandlar ovanliga indata. Detta problem äventyrar förutsägelsernas trovärdighet och kräver lösningar som kan hjälpa till att uppskatta osäkerheten i modellens förutsägelser. I följande arbete inledde vi undersökningen av tillämpligheten av icke-Bayesianska metoder för uppskattning av osäkerheten på Deep Learning-klassificeringsmodellerna i HAR. Vi tränade en CNN-modell med offentliga dataset, såsom UCI HAR och WISDM, som samlar in sensorbaserade tidsseriedata om aktiviteter i det dagliga livet. Genom en serie av fyra experiment utvärderade vi prestandan hos två icke-Bayesianska metoder för osäkerhetsuppskattning, ODIN och Deep Ensemble, för upptäckt av out-of-distribution. Vi upptäckte att ODIN-metoden kan skilja out-of-distribution-prover från in-distribution-data. Vi fick dock också ett oväntat beteende när out-of-distribution-data uteslutande innehöll dynamiska aktiviteter. Deep Ensemble-metoden gav inga tillfredsställande resultat för vår forskningsfråga. Human Activity Recognition Deep Learning Time Series Uncertainty Estimation Out-of-distribution Detection Convolutional Neural Network Human Activity Recognition Deep Learning Tidsserie Uppskattning av Osäkerheten Out-of-distribution Detection Convolutional Neural Network Computer and Information Sciences Data- och informationsvetenskap
