271 |
Dynamic Headpose Classification and Video Retargeting with Human Attention
Anoop, K R January 2015
Over the years, extensive research has been devoted to the study of people's head pose due to its relevance in security, human-computer interaction, and advertising, as well as cognitive, neuro- and behavioural psychology. One of the main goals of this thesis is to estimate people's 3D head orientation as they move freely in naturalistic settings such as parties and supermarkets. Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult because the captured faces are low-resolution and blurred. Labelling sufficient training data for head pose estimation in such settings is also difficult due to the motion of targets and the large possible range of head orientations. Domain adaptation approaches are useful for transferring knowledge from the training (source) data to test (target) data with different attributes, minimizing target data labelling effort in the process. This thesis examines the use of transfer learning for efficient multi-view head pose classification. The relationship between head pose and facial appearance is first learned from many labelled examples in the source data. Domain adaptation techniques are then employed to transfer this knowledge to the target data. The following three challenging situations are addressed: (I) the ranges of head poses in the source and target images differ; (II) the source images capture a stationary person while the target images capture a moving person whose facial appearance varies with changing perspective and scale; and (III) a combination of (I) and (II). All proposed transfer learning methods are thoroughly tested and benchmarked on DPOSE, a newly compiled dataset for head pose classification.
This thesis also presents Covariance Profiles (CPs), a novel signature representation for describing object sets with covariance descriptors. CPs are well suited to representing a set of similarly related objects. CPs posit that the covariance matrices pertaining to a specific entity share the same eigen-structure. Such a representation is not only compact but also eliminates the need to store all the training data. Experiments on images and videos, for applications such as object-track clustering and head pose estimation, are shown using CPs.
In the second part, human gaze is explored for interest point detection in video retargeting. Regions in video streams that attract human interest contribute significantly to human understanding of the video. Predicting salient and informative Regions of Interest (ROIs) from a sequence of eye movements is a challenging problem. This thesis proposes an interactive human-in-the-loop framework that models eye movements and predicts visual saliency in yet-unseen frames. Eye-tracking data and video content are used to model visual attention in a manner that accounts for temporal discontinuities due to sudden eye movements, noise and behavioural artefacts. Gaze buffering is proposed for eye-gaze analysis and its fusion with content-based features. The method uses eye-gaze information along with bottom-up and top-down saliency to boost the importance of image pixels. The resulting robust visual saliency prediction is instantiated for content-aware video retargeting.
|
272 |
Knowledge-based support for surgical workflow analysis and recognition / Assistance fondée sur les connaissances pour l'analyse et la reconnaissance du flux de travail chirurgical
Dergachyova, Olga 28 November 2017
Computer assistance has become an indispensable part of modern surgical procedures. The desire to create a new generation of intelligent operating rooms has prompted researchers to explore the problems of automatic perception and understanding of surgical situations. Within this context of situation awareness, a rapidly growing research area addresses the automatic recognition of surgical workflow. Great progress has been achieved in the recognition of surgical phases and gestures. Yet there is still a gap between these two granularity levels in the hierarchy of the surgical process. Very little research focuses on surgical activities, which carry semantic information vital for situation understanding. Two important factors impede progress. First, automatic recognition and prediction of surgical activities is a highly challenging task due to the short duration of activities, their great number, and a very complex workflow with a multitude of possible execution and sequencing paths. Second, the very limited amount of clinical data does not provide enough information for successful learning and accurate recognition. In our opinion, before recognizing surgical activities, a careful analysis of the elements that compose an activity is necessary in order to choose the right signals and sensors that will facilitate recognition. We used a deep learning approach to assess the impact of different semantic elements of an activity on its recognition. Through an in-depth study, we determined a minimal set of elements sufficient for accurate recognition. Information about the operated anatomical structure and the surgical instrument was shown to be the most important.
We also addressed the problem of data deficiency by proposing methods for transferring knowledge from other domains or surgeries. Word embedding and transfer learning methods were proposed. They demonstrated their effectiveness on the task of next-activity prediction, offering a 22% increase in accuracy. In addition, pertinent observations about surgical practice were made during the study. In this work, we also addressed the problem of insufficient and improper validation of recognition methods. We proposed new validation metrics and approaches for assessing performance that connect methods to targeted applications and better characterize the capacities of a method. The work described in this thesis aims at clearing obstacles blocking the progress of the domain and proposes a new perspective on the problem of surgical workflow recognition.
|
273 |
Representation learning in unsupervised domain translation
Lavoie-Marchildon, Samuel 12 1900
This thesis is concerned with the problem of unsupervised domain translation. Unsupervised domain translation is the task of translating one domain, the source domain, to a target domain without supervision. We first study this problem using the formalism of optimal transport. Next, we study the problem of high-level semantic image-to-image translation using advances in representation learning and transfer learning developed in the deep learning community.
The first chapter is devoted to reviewing the background concepts used in this work. We first describe representation learning, including a description of neural networks and of supervised and unsupervised representation learning. We then introduce generative models and optimal transport. We finish with the relevant notions of transfer learning that will be used in chapter 3.
The second chapter presents Neural Wasserstein Flow. In this work, we build on the theory of optimal transport and show that deep neural networks can be used to learn a Wasserstein barycenter of distributions. We further show how a neural network can amortize any barycenter, yielding a continuous interpolation. We also show how this idea can be used in the generative model framework. Finally, we show results on shape interpolation and colour interpolation.
In the third chapter, we tackle the task of high-level semantic image-to-image translation. We show that it can be achieved by simply learning a conditional GAN conditioned on the representation learned by a neural network. We further show that this process can be made unsupervised if the learned representation is a clustering. Finally, we show that our approach works on the task of translating MNIST to SVHN.
We conclude by relating the two contributions and proposing future work in this direction.
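As a hedged illustration of the Wasserstein barycenter mentioned in the second chapter, the sketch below covers only the one-dimensional special case, where the barycenter of two equal-size empirical distributions has a closed form: sort both samples and interpolate them pairwise. This is not the thesis's neural method; the function name `barycenter_1d`, the weight `t` and the sample values are assumptions made up for illustration.

```python
def barycenter_1d(samples_a, samples_b, t=0.5):
    """Wasserstein-2 barycenter of two equal-size 1D empirical distributions.

    In one dimension the optimal transport plan pairs sorted samples, so the
    barycenter with weights (1 - t, t) is a pointwise interpolation of the
    sorted samples (that is, of the two quantile functions).
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equal sample counts")
    a = sorted(samples_a)
    b = sorted(samples_b)
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


# Illustrative samples (hypothetical, not from the thesis).
source = [0.0, 1.0, 2.0]
target = [10.0, 12.0, 11.0]

midpoint = barycenter_1d(source, target, t=0.5)
print(midpoint)  # [5.0, 6.0, 7.0]
```

Varying `t` from 0 to 1 moves the result continuously from the source samples to the target samples, which is the kind of continuous interpolation the abstract describes in a far more general setting.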
|
274 |
Transfer Learning for Multi-surrogate-model Optimization
Gvozdetska, Nataliia 14 January 2021
Surrogate-model-based optimization is widely used to solve black-box optimization problems when evaluating the target system is expensive. However, when the optimization budget is limited to one or a few evaluations, surrogate-model-based optimization may not perform well due to the lack of knowledge about the search space. In this case, transfer learning helps to obtain a good optimization result by reusing experience from previous optimization runs. If the budget is not strictly limited, transfer learning can still improve the final results of black-box optimization.
Recent work in surrogate-model-based optimization showed that using multiple surrogates (i.e., applying multi-surrogate-model optimization) can be extremely efficient in complex search spaces. The main hypothesis of this thesis is that transfer learning can further improve the quality of multi-surrogate-model optimization. However, to the best of our knowledge, no approaches to transfer learning in the multi-surrogate-model context exist yet.
In this thesis, we propose an approach to transfer learning for multi-surrogate-model optimization. It encompasses an improved method of defining the expediency of knowledge transfer, adapted multi-surrogate-model recommendation, multi-task learning parameter tuning, and few-shot learning techniques. We evaluated the proposed approach with a set of algorithm selection and parameter setting problems, comprising mathematical functions optimization and the traveling salesman problem, as well as random forest hyperparameter tuning over OpenML datasets. The evaluation shows that the proposed approach helps to improve the quality delivered by multi-surrogate-model optimization and ensures getting good optimization results even under a strictly limited budget.
1 Introduction
1.1 Motivation
1.2 Research objective
1.3 Solution overview
1.4 Thesis structure
2 Background
2.1 Optimization problems
2.2 From single- to multi-surrogate-model optimization
2.2.1 Classical surrogate-model-based optimization
2.2.2 The purpose of multi-surrogate-model optimization
2.2.3 BRISE 2.5.0: Multi-surrogate-model-based software product line for parameter tuning
2.3 Transfer learning
2.3.1 Definition and purpose of transfer learning
2.4 Summary of the Background
3 Related work
3.1 Questions to transfer learning
3.2 When to transfer: Existing approaches to determining the expediency of knowledge transfer
3.2.1 Meta-features-based approaches
3.2.2 Surrogate-model-based similarity
3.2.3 Relative landmarks-based approaches
3.2.4 Sampling landmarks-based approaches
3.2.5 Similarity threshold problem
3.3 What to transfer: Existing approaches to knowledge transfer
3.3.1 Ensemble learning
3.3.2 Search space pruning
3.3.3 Multi-task learning
3.3.4 Surrogate model recommendation
3.3.5 Few-shot learning
3.3.6 Other approaches to transferring knowledge
3.4 How to transfer (discussion): Peculiarities and required design decisions for the TL implementation in multi-surrogate-model setup
3.4.1 Peculiarities of model recommendation in multi-surrogate-model setup
3.4.2 Required design decisions in multi-task learning
3.4.3 Few-shot learning problem
3.5 Summary of the related work analysis
4 Transfer learning for multi-surrogate-model optimization
4.1 Expediency of knowledge transfer
4.1.1 Experiments’ similarity definition as a variability point
4.1.2 Clustering to filter the most suitable experiments
4.2 Dynamic model recommendation in multi-surrogate-model setup
4.2.1 Variable recommendation granularity
4.2.2 Model recommendation by time and performance criteria
4.3 Multi-task learning
4.4 Implementation of the proposed concept
4.5 Conclusion of the proposed concept
5 Evaluation
5.1 Benchmark suite
5.1.1 APSP for the meta-heuristics
5.1.2 Hyperparameter optimization of the Random Forest algorithm
5.2 Environment setup
5.3 Evaluation plan
5.4 Baseline evaluation
5.5 Meta-tuning for a multi-task learning approach
5.5.1 Revealing the dependencies between the parameters of multi-task learning and its performance
5.5.2 Multi-task learning performance with the best found parameters
5.6 Expediency determination approach
5.6.1 Expediency determination as a variability point
5.6.2 Flexible number of the most similar experiments with the help of clustering
5.6.3 Influence of the number of initial samples on the quality of expediency determination
5.7 Multi-surrogate-model recommendation
5.8 Few-shot learning
5.8.1 Transfer of the built surrogate models’ combination
5.8.2 Transfer of the best configuration
5.8.3 Transfer from different experiment instances
5.9 Summary of the evaluation results
6 Conclusion and Future work
|
275 |
Développement d'outils web de détection d'annotations manuscrites dans les imprimés anciens
M'Begnan Nagnan, Arthur January 2021
No description available.
|
276 |
A Client-Server Solution for Detecting Guns in School Environment using Deep Learning Techniques
Olsson, Johan January 2019
With the progress of deep learning methods over the last couple of years, object-detection-related tasks are improving rapidly. Using object detection to detect guns in schools removes the need for constant human supervision and hopefully reduces police response time. This paper investigates how a gun detection system can be built as a client-server solution, reading frames locally and using a server for detection. The detector is based on a pre-trained SSD model and is taught to recognize guns through transfer learning. The detector obtained an Average Precision of 51.1%, and the server response time for a frame of size 1920 x 1080 was 480 ms, but the frame could be scaled down to 240 x 135 to reach 210 ms without affecting the accuracy. A non-gun class was implemented to reduce the number of false positives, and on a set of 300 images containing 165 guns, the number of false positives dropped from 21 to 11.
|
277 |
A Client-Server Solution for Detecting Guns in School Environment using Deep Learning Techniques
Olsson, Johan January 2019
With the progress of deep learning methods over the last couple of years, object-detection-related tasks are improving rapidly. Using object detection to detect guns in schools removes the need for human supervision and hopefully reduces police response time. This paper investigates how a gun detection system can be built by reading frames locally and using a server for detection. The detector is based on a pre-trained SSD model and is taught to recognize guns through transfer learning. The detector obtained an Average Precision of 51.1%, and the server response time for a frame of size 1920 x 1080 was 480 ms, but the frame could be scaled down to 240 x 135 to reach 210 ms without affecting the accuracy. A non-gun class was implemented to reduce the number of false positives, and on a set of 300 images containing 165 guns, the number of false positives dropped from 21 to 11.
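The figures quoted in this abstract can be put in proportion with a little arithmetic. The sketch below only restates numbers from the abstract (frame sizes, response times, false-positive counts); the variable names are illustrative assumptions, and nothing here reproduces the author's system.

```python
# Frame sizes reported in the abstract (width, height in pixels).
full_frame = (1920, 1080)
small_frame = (240, 135)

# Downscaling factor per side and the resulting reduction in pixel count.
scale_w = full_frame[0] / small_frame[0]
scale_h = full_frame[1] / small_frame[1]
pixel_ratio = (full_frame[0] * full_frame[1]) / (small_frame[0] * small_frame[1])

# Relative savings in server response time (480 ms down to 210 ms)
# and in false positives (21 down to 11).
latency_reduction = 1 - 210 / 480
false_positive_reduction = 1 - 11 / 21

print(scale_w, scale_h)   # 8.0 8.0
print(pixel_ratio)        # 64.0
print(f"{latency_reduction:.1%} faster, {false_positive_reduction:.1%} fewer false positives")
```

Sending 64 times fewer pixels cuts the response time by only a little over half, which suggests that a sizeable part of the 210 ms is a fixed cost that does not depend on frame size.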
|
278 |
Transfer learning between domains: Evaluating the usefulness of transfer learning between object classification and audio classification
Frenger, Tobias, Häggmark, Johan January 2020
Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well transfer learning of convolutional neural networks, trained in the object classification domain on large datasets (such as CIFAR-10 and ImageNet), can be applied to the audio classification domain when only a small dataset is available. In this work, four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This allows for testing how transfer learning and the architectural complexity of the networks affect performance. Two of the models, Inception-V3 and Inception-ResNet-V2, were developed by Google. These models are implemented using the Keras API, where they are pre-trained on the ImageNet dataset. This paper also introduces two new architectures developed by the authors of this thesis, Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of audio from RC boats, which is transformed into mel-spectrogram images. For transfer learning to be possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on the CIFAR-10 dataset. The results show that transfer learning is not able to increase the final performance. However, transfer learning does in some cases enable models to obtain higher performance in the earlier stages of training.
|
279 |
Can Wizards be Polyglots: Towards a Multilingual Knowledge-grounded Dialogue System
Liu, Evelyn Kai Yan January 2022
Research on open-domain, knowledge-grounded dialogue systems has been advancing rapidly due to the paradigm shift introduced by large language models (LLMs). While these strides have improved the performance of dialogue systems, the scope is mostly monolingual and English-centric. The lack of multilingual in-task dialogue data further discourages research in this direction. This thesis explores the use of transfer learning techniques to extend English-centric dialogue systems to multiple languages. In particular, this work focuses on five typologically diverse languages, for which well-performing models could generalize to other languages in the same language families as the target languages, hence widening the accessibility of the systems to speakers of various languages. I propose two approaches: the Multilingual Retrieval-Augmented Dialogue Model (xRAD) and the Multilingual Generative Dialogue Model (xGenD). xRAD is adapted from a pre-trained multilingual question answering (QA) system and comprises a neural retriever and a multilingual generation model. Prior to response generation, the retriever fetches relevant knowledge and passes the retrieved passages to the generator as part of the dialogue context. This approach can incorporate knowledge into conversational agents, thus improving the factual accuracy of a dialogue model. In addition, xRAD has advantages over xGenD because of its modularity, which allows the fusion of QA and dialogue systems so long as appropriate pre-trained models are employed. On the other hand, xGenD takes advantage of an existing English dialogue model and performs zero-shot cross-lingual transfer by training sequentially on English dialogue and multilingual QA datasets. Both automated and human evaluations were carried out to measure the models' performance against the machine translation baseline.
The results showed that xRAD outperformed xGenD significantly and surpassed the baseline in most metrics, particularly in terms of relevance and engagingness. Whilst xRAD's performance was promising to some extent, a detailed analysis revealed that the generated responses were not actually grounded in the retrieved paragraphs. Suggestions were offered to mitigate the issue, which hopefully could lead to significant progress in multilingual knowledge-grounded dialogue systems in the future.
|
280 |
Brain Tumor Grade Classification in MR images using Deep Learning / Klassificering av hjärntumör-grad i MR-bilder genom djupinlärning
Chatzitheodoridou, Eleftheria January 2022
Brain tumors represent a diverse spectrum of cancer types which can induce grave complications and lead to poor life expectancy. Amongst the various brain tumor types, gliomas are primary brain tumors that make up about 30% of adult brain tumors. They are graded according to the World Health Organization into Grades 1 to 4 (G1-G4), where G4 is the highest grade, with the highest malignancy and a poor prognosis. Early diagnosis and classification of brain tumor grade is very important since it can improve the treatment procedure and (potentially) prolong a patient's life, as life expectancy largely depends on the level of malignancy and the tumor's histological characteristics. While clinicians have diagnostic tools they use as a gold standard, such as biopsies, these are either invasive or costly. A widely used example of a non-invasive technique is magnetic resonance imaging, due to its ability to produce images with different soft-tissue contrast and high spatial resolution thanks to multiple imaging sequences. However, the examination of such images can be overwhelming for radiologists due to the large overall amount of data. Deep learning approaches, on the other hand, have shown great potential in brain tumor diagnosis and can assist radiologists in the decision-making process. In this thesis, brain tumor grade classification in MR images is performed using deep learning. Two popular pre-trained CNN models (VGG-19, ResNet50) were employed using single MR modalities and combinations of them to classify gliomas into three grades. All models were trained using data augmentation on 2D images from the TCGA dataset, which consisted of 3D volumes from 142 anonymized patients. The models were evaluated based on accuracy, precision, recall, F1-score and AUC score, as well as the Wilcoxon Signed-Rank test to establish whether one classifier was statistically significantly better than the other.
Since deep learning models are typically 'black box' models and can be difficult for non-experts to interpret, Gradient-weighted Class Activation Mapping (Grad-CAM) was used to address model explainability. For single modalities, VGG-19 displayed the highest performance with a test accuracy of 77.86%, whilst for combinations of two and three modalities, (T1ce, FLAIR) and (T2, T1ce, FLAIR) were the best-performing ones for VGG-19, with test accuracies of 74.48% and 75.78%, respectively. Statistical comparisons indicated that for single MR modalities and combinations of two MR modalities, there was no statistically significant difference between the two classifiers, whilst for the combination of three modalities, one model was better than the other. However, given the small size of the test population, these comparisons have low statistical power. The use of Grad-CAM for model explainability indicated that ResNet50 was able to localize the tumor region better than VGG-19.
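For readers unfamiliar with the evaluation metrics named in this abstract, the following is a minimal sketch of how accuracy and per-class precision, recall and F1-score can be computed for a three-class problem. The function `class_metrics` and the label lists are hypothetical and made up for illustration; they are not the thesis's code, data or results.

```python
def class_metrics(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels for three classes (0, 1, 2); purely illustrative.
y_true = [1, 1, 1, 0, 0, 2, 2, 2]
y_pred = [1, 1, 0, 0, 0, 2, 2, 1]

# Accuracy is the fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = class_metrics(y_true, y_pred, positive=1)

print(accuracy)  # 0.75
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Per-class scores like these are commonly averaged across classes to give a single figure; which averaging scheme the thesis used is not stated in the abstract.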
|