Recognition of Facial Expressions with Autoencoders and Convolutional-Nets (Almousli, Hani)
Humans communicate through different types of channels: words, voice, body gestures, emotions, and so on. For this reason, a computer must perceive these channels in order to interact intelligently with humans, for example by making use of a webcam and a microphone to work out what we want to convey from our voice, our gestures, and our facial expressions of emotion.
In this thesis we are interested in recognizing human emotions from images or video of faces, in order to use that information later in different applications. The thesis starts with an introduction to machine learning and some of the models and algorithms we used, such as the multilayer perceptron, convolutional neural networks, and autoencoders, and then reports the results of applying these models to several facial emotion expression datasets.
We concentrate on studying different kinds of autoencoders (the denoising autoencoder, the contractive autoencoder, etc.) and identify some of their limitations, such as the possibility of co-adaptation between filters or of an undesirably smooth spectral curve, and we investigate new ideas to address these problems. We also address a limitation of autoencoders trained in a purely unsupervised manner, i.e. without using any knowledge of the task we ultimately want to solve (such as predicting class labels): we develop a new semi-supervised training criterion that exploits the few available labeled examples, together with a large amount of unlabeled data, to learn a representation better suited to the classification task and to obtain better classification performance. Finally, we describe the general pipeline of our emotion detection system and suggest new ideas for future work.
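To make these ideas concrete, the sketch below shows one plausible form of such a semi-supervised autoencoder criterion in PyTorch: a denoising autoencoder whose reconstruction loss over all examples is combined with a classification loss on the few labeled ones, plus an optional contractive penalty on the encoder Jacobian. The architecture, the assumed input size (48x48 grayscale faces), the seven emotion classes, and the loss weighting are illustrative assumptions, not the exact criterion from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedDAE(nn.Module):
    """Denoising autoencoder with an auxiliary classifier on the code.

    Assumed setup: 48x48 grayscale face images (2304 inputs) and
    7 emotion classes; both are illustrative, not from the thesis.
    """
    def __init__(self, n_in=48 * 48, n_hidden=512, n_classes=7, noise_std=0.3):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)
        self.classifier = nn.Linear(n_hidden, n_classes)
        self.noise_std = noise_std

    def encode(self, x):
        return torch.sigmoid(self.encoder(x))

    def forward(self, x):
        # Denoising criterion: corrupt the input, reconstruct the clean version.
        x_noisy = x + self.noise_std * torch.randn_like(x)
        h = self.encode(x_noisy)
        return self.decoder(h), self.classifier(h)

def contractive_penalty(model, x):
    # Frobenius norm of the encoder Jacobian for a sigmoid code:
    # J_ij = h_i (1 - h_i) W_ij, so ||J||_F^2 = sum_i (h_i(1-h_i))^2 * sum_j W_ij^2.
    h = model.encode(x)
    dh = (h * (1 - h)) ** 2                      # (batch, n_hidden)
    w2 = (model.encoder.weight ** 2).sum(dim=1)  # (n_hidden,)
    return (dh * w2).sum(dim=1).mean()

def semi_supervised_loss(model, x_unlab, x_lab, y_lab, lam=1.0, gamma=0.1):
    # Reconstruction uses every example, labeled or not.
    recon_u, _ = model(x_unlab)
    recon_l, logits = model(x_lab)
    loss = F.mse_loss(recon_u, x_unlab) + F.mse_loss(recon_l, x_lab)
    # The supervised term only sees the few labeled examples.
    loss = loss + lam * F.cross_entropy(logits, y_lab)
    # Optional contractive term regularizes the learned representation.
    return loss + gamma * contractive_penalty(model, x_unlab)
```

With lam = 0 this reduces to purely unsupervised training; increasing lam pulls the learned representation toward the classification task, which is the intent of a semi-supervised criterion of this kind.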
Delineation of vegetated water through pre-trained convolutional networks (Hansen, Johanna, January 2024)
In a world under the constant impact of global warming, wetlands are shrinking all across the globe. Since wetlands are a vital part of mitigating global warming, the ability to counter their shrinkage through restorative measures is critical. Satellites continuously orbiting the Earth can be used to monitor wetlands by collecting images of them over time. To determine the size of a wetland, and to register whether it is shrinking, deep learning models can be used; convolutional neural networks (CNNs) are especially well suited to this task. This project uses one type of CNN, a U-Net, to segment vegetated water in satellite data. The task, however, requires labeled data, which is expensive to generate and difficult to acquire, so the model needs to produce reliable results even on small datasets. The network is therefore pre-trained on a large-scale natural image segmentation dataset, Common Objects in Context (COCO). To map the satellite data to the RGB images the pre-trained network expects as input, three different methods are tried. First, the commonly used linear transformation method, which simply moves the values of the radar data into the RGB feature space. Second, two convolutional layers placed before the U-Net, which gradually change the number of channels of the input data, with weights trained through backpropagation during the fine-tuning of the segmentation model. Last, a convolutional autoencoder trained in the same way as the convolutional layers. The results show that the autoencoder does not perform very well, but that the linear transformation and the convolutional layers can each outperform the other depending on the dataset; no statistically significant difference between the two could be shown. Experiments with including different combinations of polarizations from Sentinel-1 and bands from Sentinel-2 showed that using only radar data gave the best results. Whether one or both polarizations should be included to achieve the best result remains to be determined.
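As an illustration of the second method, the sketch below shows what such a learned channel adapter could look like in PyTorch: two convolutional layers that map the satellite input channels to the three channels a COCO-pre-trained backbone expects. The channel counts, kernel sizes, and the `PretrainedUNet` name are assumptions for illustration, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class ChannelAdapter(nn.Module):
    """Two convolutional layers that gradually map N satellite input
    channels (e.g. two Sentinel-1 polarizations) down to the 3 channels
    a COCO-pre-trained backbone expects. Channel counts are assumptions."""
    def __init__(self, in_channels=2, mid_channels=8):
        super().__init__()
        self.adapt = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.adapt(x)

# The adapter is trained jointly while the pre-trained U-Net is fine-tuned;
# `PretrainedUNet` is a stand-in name, not an actual library class:
# model = nn.Sequential(ChannelAdapter(in_channels=2), PretrainedUNet())
# The first method described above would instead apply a fixed (non-learned)
# linear mapping from the radar values into the RGB range.
```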
Deep Learning for Semantic Segmentation of 3D Point Clouds from an Airborne LiDAR (Serra, Sabina, January 2020)
Light Detection and Ranging (LiDAR) sensors have many application areas, from revealing archaeological structures to aiding the navigation of vehicles. However, it is challenging to interpret and fully use the vast amount of unstructured data that LiDARs collect. Automatic classification of LiDAR data would ease its utilization, whether for examining structures or aiding vehicles. In recent years there have been many advances in deep learning for semantic segmentation of automotive LiDAR data, but there is less research on aerial LiDAR data. This thesis investigates current state-of-the-art deep learning architectures and how well they perform on LiDAR data acquired by an Unmanned Aerial Vehicle (UAV). It also investigates different training techniques for class-imbalanced and limited datasets, which are common challenges for semantic segmentation networks. Lastly, it investigates whether pre-training can improve the performance of the models. The LiDAR scans were first projected to range images, and then a fully convolutional semantic segmentation network was used. Three training techniques were evaluated: weighted sampling, data augmentation, and grouping of classes. No improvement was observed from weighted sampling, nor did grouping of classes have a substantial effect on performance. Pre-training on the large public dataset SemanticKITTI resulted in a small performance improvement, but data augmentation seemed to have the largest positive impact. The mIoU of the best model, trained with data augmentation, was 63.7%, and it performed very well on the classes Ground, Vegetation, and Vehicle. The other classes in the UAV dataset, Person and Structure, had very little data and were challenging for most models to classify correctly. In general, the models trained on UAV data performed similarly to state-of-the-art models trained on automotive data.
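The range-image projection this approach relies on can be sketched as a spherical projection of each 3D point onto an image grid. Below is a minimal NumPy version; the vertical field of view and image resolution are sensor-dependent assumptions, not the values used in the thesis.

```python
import numpy as np

def project_to_range_image(points, h=64, w=1024, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Spherically project an (N, 3) point cloud to an (h, w) range image.

    The field of view and image size are illustrative assumptions; they
    depend on the actual LiDAR sensor and scan pattern.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # horizontal angle, [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # vertical angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize the angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w             # column from yaw
    v = (1.0 - (pitch - fov_down) / fov) * h      # row from pitch
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Keep the closest point per pixel: write far points first so that
    # nearer points overwrite them.
    order = np.argsort(-r)
    image = np.full((h, w), -1.0, dtype=np.float32)
    image[v[order], u[order]] = r[order]
    return image
```

Each pixel then stores the range (and, in practice, often extra channels such as intensity or the x, y, z coordinates), forming a 2D input that an ordinary fully convolutional segmentation network can process.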