311 |
[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] UM ESTUDO DE TRANSFER LEARNING EM DEEP REINFORCEMENT LEARNING EM AMBIENTES ROBÓTICOS SIMULADOS
EVELYN CONCEICAO SANTOS BATISTA 05 August 2020
This master's thesis is an advanced study of visual deep reinforcement learning for autonomous robots using transfer learning techniques. The simulation environments tested in this study are complex, highly realistic environments in which the robot's challenge was to learn and transfer knowledge across different contexts, so that experience from previous environments could be exploited in future ones. Besides adding knowledge to the autonomous robot, this approach reduces the number of training epochs the algorithm needs, even in complex environments, which justifies the use of transfer learning techniques.
312 |
Hierarchical Control of Simulated Aircraft / Hierarkisk kontroll av simulerade flygplan
Mannberg, Noah January 2023
This thesis investigates the effectiveness of pretraining and a discrete "control signal" bottleneck layer in a neural network trained for aircraft navigation through deep reinforcement learning. The study defines two distinct tasks to assess the approach. The first task is used to pretrain specific parts of the network, and the second evaluates the potential benefits of the technique. The experimental findings indicate that during pretraining the network successfully learned three main macro actions (flying straight ahead, turning left, and turning right) and achieved high rewards on the task. However, using the pretrained network on the transfer task yielded poor performance, possibly because of the limited effective action space or deficiencies in the training process. As avenues for future research, the study discusses several potential solutions, such as incorporating multiple pretraining tasks and altering the training process. Overall, this study highlights the challenges and opportunities of combining pretraining with a discrete bottleneck layer in simulated aircraft navigation using reinforcement learning.
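To make the "discrete control signal" bottleneck concrete, here is a minimal sketch. It assumes the lower part of the network is frozen after pretraining and maps a one-hot signal to a control command. The names `one_hot_bottleneck`, `pretrained_low_level`, and `hierarchical_policy`, as well as the bank angles, are hypothetical placeholders, not the thesis's implementation. The sketch also illustrates the "limited effective action space" hypothesis: however rich the high-level network is, only three distinct commands can come out.

```python
from typing import Callable, List, Sequence

# Hypothetical labels for the three macro actions reported in the abstract.
MACRO_ACTIONS = ("straight", "left", "right")


def one_hot_bottleneck(logits: Sequence[float]) -> List[float]:
    """Collapse the high-level network's logits into a one-hot control signal."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def pretrained_low_level(signal: Sequence[float]) -> float:
    """Stand-in for the frozen, pretrained lower network: one-hot signal -> bank angle (deg).

    The angles are illustrative placeholders, not values from the thesis.
    """
    bank_angle = {"straight": 0.0, "left": -30.0, "right": 30.0}
    return bank_angle[MACRO_ACTIONS[list(signal).index(1.0)]]


def hierarchical_policy(
    high_level: Callable[[Sequence[float]], Sequence[float]],
    observation: Sequence[float],
) -> float:
    """Every decision passes through the three-way bottleneck,
    so the whole policy can emit at most three distinct commands."""
    return pretrained_low_level(one_hot_bottleneck(high_level(observation)))


if __name__ == "__main__":
    print(hierarchical_policy(lambda obs: [0.1, 2.0, -1.0], [0.0]))  # -30.0 (turn left)
```

The hard argmax is not differentiable. A gradient-trained version would need something like a straight-through estimator or a Gumbel-softmax relaxation. The abstract does not say which training method was used.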
313 |
Towards maintainable machine learning development through continual and modular learning
Ostapenko, Oleksiy 11 1900
As machine learning models continue to grow in size and complexity, their maintainability has become a critical concern, especially as they are increasingly deployed in dynamic, real-world environments. This thesis addresses the challenges of efficient knowledge retention, integration, and transfer in multitask and continual multitask learning, with a focus on improving the maintainability of machine learning systems. Central to this work is the exploration of modular methods and the strategic use of foundation models (FMs) to facilitate continual learning (CL) and efficient model management. The thesis first investigates how modularity can be leveraged to enable continual learning. The first article, "Continual Learning via Local Module Composition", introduces the Local Modular Components (LMC) approach. LMC uses module-specific local routing to achieve automatic task inference, mitigate forgetting, and allow independently trained LMCs to be merged. This local routing principle has since been extended and refined in subsequent research. The second article, "Continual Learning with Foundation Models: An Empirical Study of Latent Replay", questions whether complicated continual learning methods are still necessary in the era of foundation models. It explores performing CL on the encoded features of pre-trained foundation models. This latent CL approach shows that, depending on the task and data characteristics, latent replay can match the performance of traditional end-to-end CL effectively and efficiently, especially as the alignment between the pre-training and downstream data distributions improves. The third article, "Towards Modular LLMs by Building and Reusing a Library of LoRAs", turns to the practical implementation of a hybrid approach combining modularity and foundation models. It proposes building a library of LoRA adapters so that these experts can be reused and combined across different tasks, facilitated by a novel routing technique called Arrow. The thesis contributes to the field by demonstrating how modularity and foundation models can work in tandem to create adaptive, efficient, and maintainable machine learning systems. It also outlines future directions, emphasizing the need to minimize model retraining through modular architectures and to address open challenges in managing modular systems.
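Latent replay, as described above, stores and replays encoded features from a frozen pre-trained encoder rather than raw inputs. The sketch below shows the idea. Its assumptions: the encoder is any frozen callable (the `encoder` argument is hypothetical), the buffer uses reservoir sampling, and batches are simply concatenated. The thesis's actual buffer policy and training loop may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple

Latent = Tuple[float, ...]
Example = Tuple[Latent, int]


class LatentReplayBuffer:
    """Fixed-capacity store of (latent feature, label) pairs, filled by reservoir
    sampling so every example seen so far has an equal chance of being kept."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Example] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, z: Latent, y: int) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((z, y))
        else:
            # Keep the new example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (z, y)

    def sample(self, k: int) -> List[Example]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def make_training_batch(
    encoder: Callable[[Sequence[float]], Latent],
    new_data: Sequence[Tuple[Sequence[float], int]],
    buffer: LatentReplayBuffer,
    replay_size: int,
) -> List[Example]:
    """Encode the new task's inputs with the frozen encoder, mix in replayed
    latents from earlier tasks, then store the new latents for later tasks.
    Only a small head on top of the latents would be trained on this batch."""
    new_latents = [(encoder(x), y) for x, y in new_data]
    batch = new_latents + buffer.sample(replay_size)
    for z, y in new_latents:
        buffer.add(z, y)
    return batch
```

Because the encoder is frozen, each input is encoded once and the buffer holds compact feature vectors. This is what makes latent replay cheap compared with end-to-end replay.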
314 |
BERTie Bott’s Every Flavor Labels: A Tasty Guide to Developing a Semantic Role Labeling Model for Galician
Bruton, Micaella January 2023
For the vast majority of languages, Natural Language Processing (NLP) tools are either entirely absent or leave much to be desired in their performance. Galician, despite having nearly 4 million speakers, is one such low-resource language. To expand the available NLP resources, this project set out to construct a dataset for Semantic Role Labeling (SRL) and to produce a baseline that future research can compare against. SRL has been shown to improve the final output of various NLP systems, including machine translation and other interactive language models. The project achieved these goals, producing 24 SRL models and two SRL datasets: one Galician and one Spanish. mBERT and XLM-R were chosen as the baseline architectures. To measure the effects of transfer learning, additional models were first pre-trained on the SRL task in a language other than the target language. Scores are reported on a scale of 0.0 to 1.0. The best-performing Galician SRL model achieved an F1 score of 0.74, establishing a baseline for future Galician SRL systems. The best-performing Spanish SRL model achieved an F1 score of 0.83, outperforming the baseline set by the 2009 CoNLL Shared Task by 0.025. The project also introduced a pre-processing method, verbal indexing, which improved SRL parsing performance on highly complex sentences. The effect was amplified when the model was both pre-trained and fine-tuned on datasets using the method, and it was still visible when the method was used only during fine-tuning.
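The F1 scores above combine precision and recall over predicted semantic roles. The following is a simplified sketch of labeled SRL scoring over exact-match (predicate, argument head, role) triples. The triple encoding and the example role labels are illustrative assumptions. The official CoNLL-2009 scorer differs in details; for instance, it also scores predicate senses.

```python
from typing import Iterable, Set, Tuple

# One labeled argument: (predicate token index, argument head index, role label).
Arg = Tuple[int, int, str]


def srl_f1(gold: Iterable[Arg], predicted: Iterable[Arg]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exact-match (predicate, argument, role) triples."""
    g: Set[Arg] = set(gold)
    p: Set[Arg] = set(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(1, 0, "A0"), (1, 2, "A1"), (4, 3, "A0")}
    pred = {(1, 0, "A0"), (1, 2, "A2")}  # one correct role, one wrong role label
    print(srl_f1(gold, pred))  # precision 0.5, recall 1/3, F1 about 0.4
```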
315 |
Diffusion Tensor Imaging Analysis for Subconcussive Trauma in Football and Convolutional Neural Network-Based Image Quality Control That Does Not Require a Big Dataset
Ikbeom Jang (5929832) 14 May 2019
Diffusion Tensor Imaging (DTI) is a magnetic resonance imaging (MRI)-based technique that has frequently been used to identify brain biomarkers of neurodevelopmental and neurodegenerative disorders because of its ability to assess the structural organization of brain tissue. In this work, I present (1) preclinical findings of a longitudinal DTI study of asymptomatic high school football athletes who experienced repetitive head impacts and (2) an automated pipeline for assessing the quality of DTI images that uses a convolutional neural network (CNN) and transfer learning. The first section addresses the effects of repetitive subconcussive head trauma on the white matter of adolescent brains. Subconcussive injury in football raises significant concerns, since many studies have reported that repetitive blows to the head may change the microstructure of white matter. This is especially problematic in youth athletes, whose white matter is still developing. Using DTI and head impact monitoring sensors, regions of significantly altered white matter were identified. Within-season effects of impact exposure were then characterized by measuring, for each individual, the volume of regions showing significant changes. The second section presents a novel pipeline for DTI quality control (QC). The complexity and long acquisition time of DTI make it susceptible to artifacts that often degrade diagnostic image quality. We propose an automated QC algorithm based on a deep convolutional neural network (DCNN). Adopting transfer learning makes it possible to train the DCNN on a relatively small dataset in a short time. The QC algorithm detects not only motion- and gradient-related artifacts but also various erroneous acquisitions, including images with regional signal loss and images that were incorrectly acquired or reconstructed.
316 |
Reconnaissance de postures humaines par fusion de la silhouette et de l'ombre dans l'infrarouge / Human posture recognition by fusing silhouette and shadow in the infrared
Gouiaa, Rafik 01 1900
Human posture recognition (HPR) from video sequences is one of the major active
research areas of computer vision. It is one step in the overall process of human activity
recognition (HAR) for behavior analysis. Many HPR application systems have
been developed, including video surveillance, human-machine interaction, and video
retrieval. HPR applications generally rely on one of two main
approaches: a single camera or multiple cameras. Despite the good performance achieved
by multi-camera systems, their complexity and the large amount of information to be processed
greatly limit their widespread use for HPR.
The main goal of this thesis is to simplify the multi-camera system by replacing a
camera with a light source. A light source can be seen as a virtual camera that
generates a cast shadow image representing the silhouette of the person blocking the
light. Our system consists of a single camera and one or more infrared light sources,
which are invisible to the eye.
Cast shadow segmentation is technically difficult, and walls and furniture can deform
the cast shadow. Even so, our system offers several advantages. Replacing
a camera with a light source avoids the synchronization and calibration problems of multiple
cameras, and it reduces both the cost of the system and the amount of data to process.
We introduce two different approaches to automatically recognizing human
postures. The first approach directly combines the person’s silhouette and cast shadow
information and uses a 2D silhouette descriptor to extract discriminative features
useful for HPR. The second approach is inspired by the shape-from-silhouette technique.
It reconstructs the visual hull of the posture from a set of cast shadow silhouettes
and extracts informative features with a 3D shape descriptor. With these approaches,
our goal is to demonstrate the value of combining the person’s silhouette and cast shadow
information for recognizing elementary human postures (standing, bending, crouching, falling, etc.).
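The shape-from-silhouette idea behind the second approach can be sketched as voxel carving. A voxel belongs to the visual hull only if it projects inside every silhouette, and here the cast shadow acts as the silhouette seen by a "virtual camera". This sketch makes two simplifying assumptions: both views use orthographic projections, and the light is a distant source directly overhead, so its rays are parallel. A real infrared lamp in a room would require a perspective projection calibrated to the light's position. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) with y as the vertical axis
Pixel = Tuple[int, int]
View = Tuple[Callable[[Voxel], Pixel], Set[Pixel]]


def carve_visual_hull(size: int, views: Iterable[View]) -> Set[Voxel]:
    """Shape from silhouette: keep a voxel only if it projects inside every silhouette."""
    views = list(views)
    return {
        v
        for v in product(range(size), repeat=3)
        if all(project(v) in silhouette for project, silhouette in views)
    }


def camera_projection(v: Voxel) -> Pixel:
    """Real camera, orthographic, looking along the z axis (drops depth z)."""
    return (v[0], v[1])


def shadow_projection(v: Voxel) -> Pixel:
    """Virtual camera: a distant overhead light casts the shadow straight down
    onto the floor, which drops the vertical axis y (parallel-ray assumption)."""
    return (v[0], v[2])


if __name__ == "__main__":
    # Toy "standing person": a vertical column at x=1, z=1 in a 3x3x3 grid.
    camera_silhouette = {(1, 0), (1, 1), (1, 2)}
    shadow_silhouette = {(1, 1)}
    camera_only = carve_visual_hull(3, [(camera_projection, camera_silhouette)])
    both = carve_visual_hull(
        3,
        [(camera_projection, camera_silhouette), (shadow_projection, shadow_silhouette)],
    )
    print(len(camera_only), len(both))  # 9 3
```

With the camera alone, depth is ambiguous and nine voxels survive. Adding the shadow view removes that ambiguity, which is the argument for treating the light source as a second camera.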
The proposed system can be used for video surveillance of uncluttered areas such as
a corridor in a seniors' residence (for example, to detect falls) or in a company (for security). Its low cost may allow greater use of video surveillance for the benefit of
society.
317 |
Interaktivní segmentace 3D CT dat s využitím hlubokého učení / Interactive 3D CT Data Segmentation Based on Deep Learning
Trávníčková, Kateřina January 2020
This thesis deals with CT data segmentation using convolutional neural networks and describes the problem of training with limited training sets. User interaction is proposed as a means of improving segmentation quality for models trained on small training sets, and the possibility of using transfer learning is also considered. All of the chosen methods improve segmentation quality compared with the baseline method, an automatic, data-specific segmentation model. When training on very small datasets, the Dice score improved by tens of percent. These methods can be used, for example, to simplify the creation of a new segmentation dataset.
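The Dice score used to measure these improvements is the overlap between a predicted mask A and a reference mask B: Dice = 2|A ∩ B| / (|A| + |B|). Below is a minimal sketch for flattened binary masks. The small `eps` term is an assumption added so that two empty masks score 1.0 instead of dividing by zero.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    # Prediction covers the one true voxel plus one false positive.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))  # about 0.667
```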
318 |
Segmentace lézí roztroušené sklerózy pomocí hlubokých neuronových sítí / Segmentation of multiple sclerosis lesions using deep neural networks
Sasko, Dominik January 2021
The main aim of this master's thesis was the automatic segmentation of multiple sclerosis lesions in MRI scans. The work tested the latest segmentation methods based on deep neural networks and compared two ways of initializing network weights: transfer learning and self-supervised learning. Automatic segmentation of multiple sclerosis lesions is very challenging, primarily because the dataset is highly imbalanced (brain scans usually contain only a small amount of damaged tissue). Manual annotation of these lesions is another challenge, since two different physicians may mark different parts of the brain as damaged; the Dice coefficient between such annotations is approximately 0.86. Simplifying lesion annotation through automation could improve the quantification of lesions, which could in turn lead to better diagnosis of individual patients. Our goal was to design two techniques that use transfer learning to pretrain weights, which could later improve the results of current segmentation models. The theoretical part outlines the categories of artificial intelligence, machine learning, and deep neural networks and their use in image segmentation. It then describes multiple sclerosis: its types, symptoms, diagnosis, and treatment. The practical part begins with data preprocessing. First, the brain scans were resampled to the same resolution and voxel size. This was necessary because we used three different datasets whose scans were acquired with different devices from different manufacturers. One dataset also included the skull, which had to be removed with the FSL tool so that only the patient's brain remained. We used 3D scans (FLAIR, T1, and T2 modalities), which were split into individual 2D slices and fed into a neural network with an encoder-decoder architecture.
The training dataset contained 6720 slices with a resolution of 192 x 192 pixels, after removing slices whose mask was empty. The loss function was Combo loss, a combination of Dice loss with a modified cross-entropy. The first method used ImageNet-pretrained weights for the encoder of a U-Net architecture, with the encoder weights either frozen or trainable, and compared the result with random weight initialization. In this case, we used only the FLAIR modality. Transfer learning raised the monitored metric from approximately 0.4 to 0.6. The difference between frozen and trainable encoder weights was around 0.02. The second proposed technique used a self-supervised context encoder with Generative Adversarial Networks (GAN) to pretrain the weights. This network used all three modalities, including slices with empty masks (23040 images in total). The GAN's task was to complete a brain scan that had been covered by a black checkerboard-shaped mask. The weights learned in this way were then loaded into the encoder and applied to our segmentation problem. This experiment did not yield better results, with DSC values of 0.29 and 0.09 (trainable and frozen encoder weights, respectively). The sharp drop in the metric may have been caused by transferring pretrained weights between distant problems (segmentation versus a self-supervised context encoder), as well as by the difficulty of the task given the imbalanced dataset.
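The Combo loss mentioned above blends a Dice term with a weighted cross-entropy, a common remedy for the lesion/background imbalance described here. This sketch assumes the form L = α·WCE + (1 − α)·(1 − Dice) on predicted probabilities, where β weights positive (lesion) pixels in the cross-entropy. The default α = β = 0.5 and this exact form are assumptions, not settings reported in the thesis. Some formulations subtract the Dice coefficient instead of adding 1 − Dice, and the two differ only by a constant.

```python
import math
from typing import Sequence


def soft_dice(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Differentiable Dice on probabilities: 2*sum(p*t) / (sum(p) + sum(t))."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)


def weighted_bce(probs: Sequence[float], targets: Sequence[int],
                 beta: float = 0.5, eps: float = 1e-7) -> float:
    """Binary cross-entropy where beta weights positives and (1 - beta) negatives."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += beta * t * math.log(p) + (1.0 - beta) * (1.0 - t) * math.log(1.0 - p)
    return -total / len(probs)


def combo_loss(probs: Sequence[float], targets: Sequence[int],
               alpha: float = 0.5, beta: float = 0.5) -> float:
    """alpha * weighted cross-entropy + (1 - alpha) * Dice loss (assumed form)."""
    return alpha * weighted_bce(probs, targets, beta) + (1.0 - alpha) * (1.0 - soft_dice(probs, targets))


if __name__ == "__main__":
    print(combo_loss([0.9, 0.1, 0.2], [1, 0, 0]))  # small: mostly correct prediction
    print(combo_loss([0.1, 0.9, 0.8], [1, 0, 0]))  # large: mostly wrong prediction
```

Setting β above 0.5 penalizes missed lesion pixels more heavily than false alarms. This is one way to counter the scarcity of lesion voxels.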
319 |
Neural networks regularization through representation learning / Régularisation des réseaux de neurones via l'apprentissage des représentations
Belharbi, Soufiane 06 July 2018
Neural networks, and deep models in particular, are among the leading state-of-the-art models in machine learning and have been applied in many different domains. The most successful deep neural models have many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which are not always available. This thesis tackles one of the fundamental issues in neural networks: overfitting, which often occurs when large models are trained on few samples. Many approaches have been proposed to prevent overfitting and improve generalization, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, and batch normalization. In this thesis, we approach neural network overfitting from a representation learning perspective. We consider the situation where few training samples are available, as is the case in many real-world applications. We propose three contributions. The first, presented in chapter 2, deals with structured output problems: multivariate regression in which the output variable y has structural dependencies between its components. Our proposal mainly aims to exploit these dependencies by learning them in an unsupervised way with autoencoders. Validated on a facial landmark detection problem, learning the structure of the output data was shown to improve network generalization and speed up training. The second contribution, described in chapter 3, deals with classification. There we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. This prior is based on the idea that samples of the same class should have the same internal representation. We formulate this prior as a penalty added to the training cost to be minimized. Empirical experiments on MNIST and its variants showed improved network generalization when only few training samples are used. Our last contribution, presented in chapter 4, shows the value of transfer learning in applications where only few samples are available. The idea consists of reusing the filters of convolutional networks pre-trained on large datasets such as ImageNet. These pre-trained filters are plugged into a new convolutional network with new dense layers, and the whole network is then trained on the new task. In this contribution, we provide an automatic system based on this learning scheme, applied to the medical domain: the task consists of localizing the third lumbar vertebra in a 3D CT scan. The proposed system includes pre-processing of the 3D CT scan to obtain a 2D representation and post-processing to refine the decision. This work was done in collaboration with the "Rouen Henri Becquerel Center" cancer clinic, which provided the data. Transfer learning, together with suitable pre- and post-processing, yielded good results, allowing the model to be deployed in clinical routine.
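The chapter 3 prior, that samples of the same class should share the same internal representation, can be written as a penalty added to the task loss. The sketch below assumes one simple form: the mean squared Euclidean distance between the hidden representations of every same-class pair in a batch, scaled by a weight λ. The function names and λ = 0.1 are hypothetical, and the thesis's exact penalty may differ.

```python
from itertools import combinations
from typing import Sequence


def same_class_penalty(hidden: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean squared distance between hidden representations of same-class sample pairs."""
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            total += sum((a - b) ** 2 for a, b in zip(hidden[i], hidden[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


def regularized_loss(task_loss: float, hidden: Sequence[Sequence[float]],
                     labels: Sequence[int], lam: float = 0.1) -> float:
    """Training cost = task loss + lam * representation penalty."""
    return task_loss + lam * same_class_penalty(hidden, labels)


if __name__ == "__main__":
    hidden = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # hidden-layer outputs for a batch
    labels = [0, 0, 1]                             # only samples 0 and 1 share a class
    print(regularized_loss(1.0, hidden, labels))   # about 1.2
```

Only same-class pairs contribute, so the penalty pulls each class's representations together without directly pushing different classes apart.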
|
320 |
Etude et prédiction d'attention visuelle avec les outils d'apprentissage profond en vue d'évaluation des patients atteints des maladies neuro-dégénératives / Study and prediction of visual attention with deep learning networks in view of assessment of patients with neurodegenerative diseases
Chaabouni, Souad 08 December 2017 (has links)
This thesis is motivated by the diagnosis and assessment of neurodegenerative diseases, with the aim of diagnosis based on visual attention. However, large-scale population screening is only possible if sufficiently robust automatic prediction models can be built. In this context, we are interested in the design and development of automatic prediction models for specific visual content to be used in psycho-visual experiments involving patients with neurodegenerative diseases. The difficulty of such a prediction lies in the very small amount of training data. Visual saliency models cannot be based on "bottom-up" features alone, as suggested by feature integration theory. The "top-down" component of human visual attention becomes dominant as the observer explores the visual scene. Visual attention can be predicted on the basis of previously observed scenes. Deep convolutional neural networks (CNNs) have proven to be a powerful tool for predicting salient areas in static images. In order to build an automatic prediction model for salient areas in natural and intentionally degraded videos, we designed a specific deep CNN architecture. To overcome the lack of training data, we designed a transfer learning scheme derived from Bengio's method. We measure its performance in predicting salient regions. The results obtained are interesting regarding the reaction of normal control subjects to the degraded areas in the videos.
Comparing the predicted saliency map of the intentionally degraded videos with gaze fixation density maps and other reference models shows the value of the developed model. / This thesis is motivated by the diagnosis and evaluation of dementia diseases, with the aim of predicting whether a newly recorded gaze shows signs of these diseases. Nevertheless, large-scale population screening is only possible if robust prediction models can be constructed. In this context, we are interested in the design and development of automatic prediction models for specific visual content to be used in psycho-visual experiments involving patients with dementia (PwD). The difficulty of such a prediction lies in the very small amount of training data. Visual saliency models cannot be founded only on bottom-up features, as suggested by feature integration theory. The top-down component of human visual attention becomes prevalent as human observers explore the visual scene. Visual saliency can be predicted on the basis of seen data. Deep Convolutional Neural Networks (CNNs) have proven to be a powerful tool for predicting salient areas in static images. In order to construct an automatic prediction model for the salient areas in natural and intentionally degraded videos, we have designed a specific CNN architecture. To overcome the lack of training data, we designed a transfer learning scheme derived from Bengio's method. We measure its performance when predicting salient regions. The obtained results are interesting regarding the reaction of normal control subjects to degraded areas in videos. The predicted saliency map of intentionally degraded videos gives interesting results compared to gaze fixation density maps and other reference models.
|