Spelling suggestions: "subject:"zeroshot learning"" "subject:"derashot learning""
11 |
Improving Zero-Shot Learning via Distribution EmbeddingsChalumuri, Vivek January 2020 (has links)
Zero-Shot Learning (ZSL) for image classification aims to recognize images from novel classes for which we have no training examples. A common approach to tackling such a problem is by transferring knowledge from seen to unseen classes using some auxiliary semantic information of class labels in the form of class embeddings. Most of the existing methods represent image features and class embeddings as point vectors, and such vector representation limits the expressivity in terms of modeling the intra-class variability of the image classes. In this thesis, we propose three novel ZSL methods that represent image features and class labels as distributions and learn their corresponding parameters as distribution embeddings. Therefore, the intra-class variability of image classes is better modeled. The first model is a Triplet model, where image features and class embeddings are projected as Gaussian distributions in a common space, and their associations are learned by metric learning. Next, we have a Triplet-VAE model, where two VAEs are trained with triplet based distributional alignment for ZSL. The third model is a simple Probabilistic Classifier for ZSL, which is inspired by energy-based models. When evaluated on the common benchmark ZSL datasets, the proposed methods result in an improvement over the existing state-of-the-art methods for both traditional ZSL and more challenging Generalized-ZSL (GZSL) settings. / Zero-Shot Learning (ZSL) för bildklassificering syftar till att känna igen bilder från nya klasser som vi inte har några utbildningsexempel för. Ett vanligt tillvägagångssätt för att ta itu med ett sådant problem är att överföra kunskap från sett till osynliga klasser med hjälp av någon semantisk information om klassetiketter i form av klassinbäddningar. De flesta av de befintliga metoderna representerar bildfunktioner och klassinbäddningar som punktvektorer, och sådan vektorrepresentation begränsar uttrycksförmågan när det gäller att modellera bildklassernas variation inom klass. I denna avhandling föreslår vi tre nya ZSL-metoder som representerar bildfunktioner och klassetiketter som distributioner och lär sig deras motsvarande parametrar som distributionsinbäddningar. Därför är bildklassernas variation inom klass bättre modellerad. Den första modellen är en Triplet-modell, där bildfunktioner och klassinbäddningar projiceras som Gaussiska fördelningar i ett gemensamt utrymme, och deras föreningar lärs av metrisk inlärning. Därefter har vi en Triplet-VAE-modell, där två VAEs tränas med tripletbaserad fördelningsinriktning för ZSL. Den tredje modellen är en enkel Probabilistic Classifier för ZSL, som är inspirerad av energibaserade modeller. När de utvärderas på de vanliga ZSLdatauppsättningarna, resulterar de föreslagna metoderna i en förbättring jämfört med befintliga toppmoderna metoder för både traditionella ZSL och mer utmanande Generalized-ZSL (GZSL) -inställningar.
|
12 |
Methods for data and user efficient annotation for multi-label topic classification / Effektiva annoteringsmetoder för klassificering med multipla klasserMiszkurka, Agnieszka January 2022 (has links)
Machine Learning models trained using supervised learning can achieve great results when a sufficient amount of labeled data is used. However, the annotation process is a costly and time-consuming task. There are many methods devised to make the annotation pipeline more user and data efficient. This thesis explores techniques from Active Learning, Zero-shot Learning, Data Augmentation domains as well as pre-annotation with revision in the context of multi-label classification. Active ’Learnings goal is to choose the most informative samples for labeling. As an Active Learning state-of-the-art technique Contrastive Active Learning was adapted to a multi-label case. Once there is some labeled data, we can augment samples to make the dataset more diverse. English-German-English Backtranslation was used to perform Data Augmentation. Zero-shot learning is a setup in which a Machine Learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was leveraged in this study and its usefulness for pre-annotation with revision was reported. The results on the Reviews of Electric Vehicle Charging Stations dataset show that it may be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify samples belonging to the rarest classes while Data Augmentation via Backtranslation can improve performance especially when little training data is available. The results for Zero-shot Learning via Textual Entailment experiments show that this technique is not suitable for the production environment. / Klassificeringsmodeller som tränas med övervakad inlärning kan uppnå goda resultat när en tillräcklig mängd annoterad data används. Annoteringsprocessen är dock en kostsam och tidskrävande uppgift. Det finns många metoder utarbetade för att göra annoteringspipelinen mer användar- och dataeffektiv. Detta examensarbete utforskar tekniker från områdena Active Learning, Zero-shot Learning, Data Augmentation, samt pre-annotering, där annoterarens roll är att verifiera eller revidera en klass föreslagen av systemet. Målet med Active Learning är att välja de mest informativa datapunkterna för annotering. Contrastive Active Learning utökades till fallet där en datapunkt kan tillhöra flera klasser. Om det redan finns några annoterade data kan vi utöka datamängden med artificiella datapunkter, med syfte att göra datasetet mer mångsidigt. Engelsk-Tysk-Engelsk översättning användes för att konstruera sådana artificiella datapunkter. Zero-shot-inlärning är en teknik i vilken en maskininlärningsmodell kan göra förutsägelser för klasser som den inte var tränad att förutsäga. Zero-shot via Textual Entailment utnyttjades i denna studie för att utöka datamängden med artificiella datapunkter. Resultat från datamängden “Reviews of Electric Vehicle Charging ”Stations visar att det kan vara fördelaktigt att använda Active Learning och Data Augmentation i annoteringspipelinen. Active Learning-metoder som Contrastive Active Learning kan identifiera datapunkter som tillhör de mest sällsynta klasserna, medan Data Augmentation via Backtranslation kan förbättra klassificerarens prestanda, särskilt när få träningsdata finns tillgänglig. Resultaten för Zero-shot Learning visar att denna teknik inte är lämplig för en produktionsmiljö.
|
13 |
Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical RecordsRios, Anthony 01 January 2018 (has links)
Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders.
The main difficulty with developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10 thousand codes. The label space is large, and the label distribution is extremely unbalanced - most codes occur very infrequently, with a few codes occurring several orders of magnitude more than others. A few codes never occur in training dataset at all.
In this work, we present three methods to handle the large unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations. Many of the indexed articles contain relevant information about diagnosis and procedure codes. Therefore, we present a novel method of incorporating this unstructured data in PubMed using transfer learning. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel neural architecture that better handles infrequent codes. And third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding.
|
14 |
On Transfer Learning Techniques for Machine LearningDebasmit Das (8314707) 30 April 2020 (has links)
<pre><pre><p>
</p><p>Recent progress in machine learning has been mainly due to
the availability of large amounts of annotated data used for training complex
models with deep architectures. Annotating this training data becomes
burdensome and creates a major bottleneck in maintaining machine-learning
databases. Moreover, these trained models fail to generalize to new categories
or new varieties of the same categories. This is because new categories or new
varieties have data distribution different from the training data distribution.
To tackle these problems, this thesis proposes to develop a family of
transfer-learning techniques that can deal with different training (source) and
testing (target) distributions with the assumption that the availability of
annotated data is limited in the testing domain. This is done by using the
auxiliary data-abundant source domain from which useful knowledge is
transferred that can be applied to data-scarce target domain. This transferable
knowledge serves as a prior that biases target-domain predictions and prevents
the target-domain model from overfitting. Specifically, we explore structural
priors that encode relational knowledge between different data entities, which
provides more informative bias than traditional priors. The choice of the
structural prior depends on the information availability and the similarity
between the two domains. Depending on the domain similarity and the information
availability, we divide the transfer learning problem into four major
categories and propose different structural priors to solve each of these
sub-problems.</p><p>
</p><p>This thesis first focuses on the
unsupervised-domain-adaptation problem, where we propose to minimize domain
discrepancy by transforming labeled source-domain data to be close to unlabeled
target-domain data. For this problem,
the categories remain the same across the two domains and hence we assume that
the structural relationship between the source-domain samples is carried over
to the target domain. Thus, graph or hyper-graph is constructed as the
structural prior from both domains and a graph/hyper-graph matching formulation
is used to transform samples in the source domain to be closer to samples in
the target domain. An efficient optimization scheme is then proposed to tackle
the time and memory inefficiencies associated with the matching problem. The
few-shot learning problem is studied next, where we propose to transfer
knowledge from source-domain categories containing abundantly labeled data to
novel categories in the target domain that contains only few labeled data. The
knowledge transfer biases the novel category predictions and prevents the model
from overfitting. The knowledge is encoded using a neural-network-based prior
that transforms a data sample to its corresponding class prototype. This neural
network is trained from the source-domain data and applied to the target-domain
data, where it transforms the few-shot samples to the novel-class prototypes
for better recognition performance. The few-shot learning problem is then
extended to the situation, where we do not have access to the source-domain
data but only have access to the source-domain class prototypes. In this limited
information setting, parametric neural-network-based priors would overfit to
the source-class prototypes and hence we seek a non-parametric-based prior
using manifolds. A piecewise linear manifold is used as a structural prior to
fit the source-domain-class prototypes. This structure is extended to the
target domain, where the novel-class prototypes are found by projecting the
few-shot samples onto the manifold. Finally, the zero-shot learning problem is
addressed, which is an extreme case of the few-shot learning problem where we
do not have any labeled data in the target domain. However, we have high-level
information for both the source and target domain categories in the form of
semantic descriptors. We learn the relation between the sample space and the
semantic space, using a regularized neural network so that classification of
the novel categories can be carried out in a common representation space. This
same neural network is then used in the target domain to relate the two spaces.
In case we want to generate data for the novel categories in the target domain,
we can use a constrained generative adversarial network instead of a
traditional neural network. Thus, we use structural priors like graphs, neural
networks and manifolds to relate various data entities like samples, prototypes
and semantics for these different transfer learning sub-problems. We explore
additional post-processing steps like pseudo-labeling, domain adaptation and
calibration and enforce algorithmic and architectural constraints to further
improve recognition performance. Experimental results on standard transfer
learning image recognition datasets produced competitive results with respect
to previous work. Further experimentation and analyses of these methods
provided better understanding of machine learning as well.</p><p>
</p></pre></pre>
|
15 |
Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images / Learning and exploiting semantic representations for image classification and retrievalBucher, Maxime 27 November 2018 (has links)
Dans cette thèse nous étudions différentes questions relatives à la mise en pratique de modèles d'apprentissage profond. En effet malgré les avancées prometteuses de ces algorithmes en vision par ordinateur, leur emploi dans certains cas d'usage réels reste difficile. Une première difficulté est, pour des tâches de classification d'images, de rassembler pour des milliers de catégories suffisamment de données d'entraînement pour chacune des classes. C'est pourquoi nous proposons deux nouvelles approches adaptées à ce scénario d'apprentissage, appelé <<classification zero-shot>>.L'utilisation d'information sémantique pour modéliser les classes permet de définir les modèles par description, par opposition à une modélisation à partir d'un ensemble d'exemples, et rend possible la modélisation sans donnée de référence. L'idée fondamentale du premier chapitre est d'obtenir une distribution d'attributs optimale grâce à l'apprentissage d'une métrique, capable à la fois de sélectionner et de transformer la distribution des données originales. Dans le chapitre suivant, contrairement aux approches standards de la littérature qui reposent sur l'apprentissage d'un espace d'intégration commun, nous proposons de générer des caractéristiques visuelles à partir d'un générateur conditionnel. Une fois générés ces exemples artificiels peuvent être utilisés conjointement avec des données réelles pour l'apprentissage d'un classifieur discriminant. Dans une seconde partie de ce manuscrit, nous abordons la question de l'intelligibilité des calculs pour les tâches de vision par ordinateur. En raison des nombreuses et complexes transformations des algorithmes profonds, il est difficile pour un utilisateur d'interpréter le résultat retourné. Notre proposition est d'introduire un <<goulot d'étranglement sémantique>> dans le processus de traitement. La représentation de l'image est exprimée entièrement en langage naturel, tout en conservant l'efficacité des représentations numériques. L'intelligibilité de la représentation permet à un utilisateur d'examiner sur quelle base l'inférence a été réalisée et ainsi d'accepter ou de rejeter la décision suivant sa connaissance et son expérience humaine. / In this thesis, we examine some practical difficulties of deep learning models.Indeed, despite the promising results in computer vision, implementing them in some situations raises some questions. For example, in classification tasks where thousands of categories have to be recognised, it is sometimes difficult to gather enough training data for each category.We propose two new approaches for this learning scenario, called <<zero-shot learning>>. We use semantic information to model classes which allows us to define models by description, as opposed to modelling from a set of examples.In the first chapter we propose to optimize a metric in order to transform the distribution of the original data and to obtain an optimal attribute distribution. In the following chapter, unlike the standard approaches of the literature that rely on the learning of a common integration space, we propose to generate visual features from a conditional generator. The artificial examples can be used in addition to real data for learning a discriminant classifier. In the second part of this thesis, we address the question of computational intelligibility for computer vision tasks. Due to the many and complex transformations of deep learning algorithms, it is difficult for a user to interpret the returned prediction. Our proposition is to introduce what we call a <<semantic bottleneck>> in the processing pipeline, which is a crossing point in which the representation of the image is entirely expressed with natural language, while retaining the efficiency of numerical representations. This semantic bottleneck allows to detect failure cases in the prediction process so as to accept or reject the decision.
|
16 |
Apprentissage automatique en ligne pour un dialogue homme-machine situé / Online learning for situated human-machine dialogueFerreira, Emmanuel 14 December 2015 (has links)
Un système de dialogue permet de doter la Machine de la capacité d'interagir de façon naturelle et efficace avec l'Homme. Dans cette thèse nous nous intéressons au développement d'un système de dialogue reposant sur des approches statistiques, et en particulier du cadre formel des Processus Décisionnel de Markov Partiellement Observable, en anglais Partially Observable Markov Decision Process (POMDP), qui à ce jour fait office de référence dans la littérature en ce qui concerne la gestion statistique du dialogue. Ce modèle permet à la fois une prise en compte améliorée de l'incertitude inhérente au traitement des données en provenance de l'utilisateur (notamment la parole) et aussi l'optimisation automatique de la politique d'interaction à partir de données grâce à l'apprentissage par renforcement, en anglais Reinforcement Learning (RL). Cependant, une des problématiques liées aux approches statistiques est qu'elles nécessitent le recours à une grande quantité de données d'apprentissage pour atteindre des niveaux de performances acceptables. Or, la collecte de telles données est un processus long et coûteux qui nécessite généralement, pour le cas du dialogue, la réalisation de prototypes fonctionnels avec l'intervention d'experts et/ou le développement de solution alternative comme le recours à la simulation d'utilisateurs. En effet, très peu de travaux considèrent à ce jour la possibilité d'un apprentissage de la stratégie de la Machine de part sa mise en situation de zéro (sans apprentissage préalable) face à de vrais utilisateurs. Pourtant cette solution présente un grand intérêt, elle permet par exemple d'inscrire le processus d'apprentissage comme une partie intégrante du cycle de vie d'un système lui offrant la capacité de s'adapter à de nouvelles conditions de façon dynamique et continue. Dans cette thèse, nous nous attacherons donc à apporter des solutions visant à rendre possible ce démarrage à froid du système mais aussi, à améliorer sa capacité à s'adapter à de nouvelles conditions (extension de domaine, changement d'utilisateur,...). Pour ce faire, nous envisagerons dans un premier temps l'utilisation de l'expertise du domaine (règles expertes) pour guider l'apprentissage initial de la politique d'interaction du système. De même, nous étudierons l'impact de la prise en compte de jugements subjectifs émis par l'utilisateur au fil de l'interaction dans l'apprentissage, notamment dans un contexte de changement de profil d'utilisateur où la politique préalablement apprise doit alors pouvoir s'adapter à de nouvelles conditions. Les résultats obtenus sur une tâche de référence montrent la possibilité d'apprendre une politique (quasi-)optimale en quelques centaines d'interactions, mais aussi que les informations supplémentaires considérées dans nos propositions sont à même d'accélérer significativement l'apprentissage et d'améliorer la tolérance aux bruits dans la chaîne de traitement. Dans un second temps nous nous intéresserons à réduire les coûts de développement d'un module de compréhension de la parole utilisé dans l'étiquetage sémantique d'un tour de dialogue. Pour cela, nous exploiterons les récentes avancées dans les techniques de projection des mots dans des espaces vectoriels continus conservant les propriétés syntactiques et sémantiques, pour généraliser à partir des connaissances initiales limitées de la tâche pour comprendre l'utilisateur. Nous nous attacherons aussi à proposer des solutions afin d'enrichir dynamiquement cette connaissance et étudier le rapport de cette technique avec les méthodes statistiques état de l'art. Là encore nos résultats expérimentaux montrent qu'il est possible d'atteindre des performances état de l'art avec très peu de données et de raffiner ces modèles ensuite avec des retours utilisateurs dont le coût peut lui-même être optimisé. / A dialogue system should give the machine the ability to interactnaturally and efficiently with humans. In this thesis, we focus on theissue of the development of stochastic dialogue systems. Thus, we especiallyconsider the Partially Observable Markov Decision Process (POMDP)framework which yields state-of-the-art performance on goal-oriented dialoguemanagement tasks. This model enables the system to cope with thecommunication ambiguities due to noisy channel and also to optimize itsdialogue management strategy directly from data with Reinforcement Learning (RL)methods.Considering statistical approaches often requires the availability of alarge amount of training data to reach good performance. However, corpora of interest are seldom readily available and collectingsuch data is both time consuming and expensive. For instance, it mayrequire a working prototype to initiate preliminary experiments with thesupport of expert users or to consider other alternatives such as usersimulation techniques.Very few studies to date have considered learning a dialogue strategyfrom scratch by interacting with real users, yet this solution is ofgreat interest. Indeed, considering the learning process as part of thelife cycle of a system offers a principle framework to dynamically adaptthe system to new conditions in an online and seamless fashion.In this thesis, we endeavour to provide solutions to make possible thisdialogue system cold start (nearly from scratch) but also to improve its ability to adapt to new conditions in operation (domain extension, new user profile, etc.).First, we investigate the conditions under which initial expertknowledge (such as expert rules) can be used to accelerate the policyoptimization of a learning agent. Similarly, we study how polarized userappraisals gathered throughout the course of the interaction can beintegrated into a reinforcement learning-based dialogue manager. Morespecifically, we discuss how this information can be cast intosocially-inspired rewards to speed up the policy optimisation for bothefficient task completion and user adaptation in an online learning setting.The results obtained on a reference task demonstrate that a(quasi-)optimal policy can be learnt in just a few hundred dialogues,but also that the considered additional information is able tosignificantly accelerate the learning as well as improving the noise tolerance.Second, we focus on reducing the development cost of the spoken language understanding module. For this, we exploit recent word embedding models(projection of words in a continuous vector space representing syntacticand semantic properties) to generalize from a limited initial knowledgeabout the dialogue task to enable the machine to instantly understandthe user utterances. We also propose to dynamically enrich thisknowledge with both active learning techniques and state-of-the-artstatistical methods. Our experimental results show that state-of-the-artperformance can be obtained with a very limited amount of in-domain andin-context data. We also show that we are able to refine the proposedmodel by exploiting user returns about the system outputs as well as tooptimize our adaptive learning with an adversarial bandit algorithm tosuccessfully balance the trade-off between user effort and moduleperformance.Finally, we study how the physical embodiment of a dialogue system in a humanoid robot can help the interaction in a dedicated Human-Robotapplication where dialogue system learning and testing are carried outwith real users. Indeed, in this thesis we propose an extension of thepreviously considered decision-making techniques to be able to take intoaccount the robot's awareness of the users' belief (perspective taking)in a RL-based situated dialogue management optimisation procedure.
|
17 |
FAZT: FEW AND ZERO-SHOT FRAMEWORK TO LEARN TEMPO-VISUAL EVENTS FROM LITTLE OR NO DATANaveen Madapana (11613925) 20 December 2021 (has links)
<div>Supervised classification methods based on deep learning have achieved great success in many domains and tasks that are previously unimaginable. Such approaches build on learning paradigms that require hundreds of examples in order to learn to classify objects or events. Thus, their immediate application to the domains with few or no observations is limited. This is because of the lack of ability to rapidly generalize to new categories from a few examples or from high-level descriptions of categories. This can be attributed to the significant gap between the way machines represent knowledge and the way humans represent categories in their minds and learn to recognize them. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes an approach to utilize these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL). This work refers to this paradigm as the problem of general classification (GCP) and proposes a unified framework for GCP referred to as the Few and Zero-Shot Technique (FAZT). FAZT framework is an end-to-end approach that uses trainable 3D convolutional neural networks and recurrent neural networks to simultaneously optimize for both the semantic and the classification tasks. Lastly, the problem of systematically obtaining semantic attributes by utilizing domain-specific ontologies is presented. The proposed framework is validated in the domains of hand gesture and action/activity recognition, however, this research can be applied to other domains such as video understanding, the study of human behavior, emotion recognition, etc. First, an attribute-based dataset for gestures is developed in a systematic manner by relying on literature in gestures and semantics, and crowdsourced platforms such as Amazon Mechanical Turk. To the best of our knowledge, this is the first ZSL dataset for hand gestures (ZSGL dataset). Next, our framework is evaluated in two experimental conditions: 1. Within-category (to test the attribute recognition power) and 2. Across-category (to test the ability to recognize an unknown category). In addition, we conducted experiments in zero-shot, one-shot, few-shot and continuous learning conditions in both open-set and closed-set scenarios. Results showed that our framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101 and HMDB51 action datasets in all the experimental conditions.<br></div><div><br></div>
|
18 |
Zero/Few-Shot Text Classification : A Study of Practical Aspects and Applications / Textklassificering med Zero/Few-Shot Learning : En Studie om Praktiska Aspekter och ApplikationerÅslund, Jacob January 2021 (has links)
SOTA language models have demonstrated remarkable capabilities in tackling NLP tasks they have not been explicitly trained on – given a few demonstrations of the task (few-shot learning), or even none at all (zero-shot learning). The purpose of this Master’s thesis has been to investigate practical aspects and potential applications of zero/few-shot learning in the context of text classification. This includes topics such as combined usage with active learning, automated data labeling, and interpretability. Two different methods for zero/few-shot learning have been investigated, and the results indicate that: • Active learning can be used to marginally improve few-shot performance, but it seems to be mostly beneficial in settings with very few samples (e.g. less than 10). • Zero-shot learning can be used produce reasonable candidate labels for classes in a dataset, given knowledge of the classification task at hand. • It is difficult to trust the predictions of zero-shot text classification without access to a validation dataset, but IML methods such as saliency maps could find usage in debugging zero-shot models. / Ledande språkmodeller har uppvisat anmärkningsvärda förmågor i att lösa NLP-problem de inte blivit explicit tränade på – givet några exempel av problemet (few-shot learning), eller till och med inga alls (zero-shot learning). Syftet med det här examensarbetet har varit att undersöka praktiska aspekter och potentiella tillämpningar av zero/few-shot learning inom kontext av textklassificering. Detta inkluderar kombinerad användning med aktiv inlärning, automatiserad datamärkning, och tolkningsbarhet. Två olika metoder för zero/few-shot learning har undersökts, och resultaten indikerar att: • Aktiv inlärning kan användas för att marginellt förbättra textklassificering med few-shot learning, men detta verkar vara mest fördelaktigt i situationer med väldigt få datapunkter (t.ex. mindre än 10). • Zero-shot learning kan användas för att hitta lämpliga etiketter för klasser i ett dataset, givet kunskap om klassifikationsuppgiften av intresse. • Det är svårt att lita på robustheten i textklassificering med zero-shot learning utan tillgång till valideringsdata, men metoder inom tolkningsbar maskininlärning såsom saliency maps skulle kunna användas för att felsöka zero-shot modeller.
|
19 |
Learning and Recognizing The Hierarchical and Sequential Structure of Human ActivitiesCheng, Heng-Tze 01 December 2013 (has links)
The mission of the research presented in this thesis is to give computers the power to sense and react to human activities. Without the ability to sense the surroundings and understand what humans are doing, computers will not be able to provide active, timely, appropriate, and considerate services to the humans. To accomplish this mission, the work stands on the shoulders of two giants: Machine learning and ubiquitous computing. Because of the ubiquity of sensor-enabled mobile and wearable devices, there has been an emerging opportunity to sense, learn, and infer human activities from the sensor data by leveraging state-of-the-art machine learning algorithms.
While having shown promising results in human activity recognition, most existing approaches using supervised or semi-supervised learning have two fundamental problems. Firstly, most existing approaches require a large set of labeled sensor data for every target class, which requires a costly effort from human annotators. Secondly, an unseen new activity cannot be recognized if no training samples of that activity are available in the dataset. In light of these problems, a new approach in this area is proposed in our research.
This thesis presents our novel approach to address the problem of human activity recognition when few or no training samples of the target activities are available. The main hypothesis is that the problem can be solved by the proposed NuActiv activity recognition framework, which consists of modeling the hierarchical and sequential structure of human activities, as well as bringing humans in the loop of model training. By injecting human knowledge about the hierarchical nature of human activities, a semantic attribute representation and a two-layer attribute-based learning approach are designed. To model the sequential structure, a probabilistic graphical model is further proposed to take into account the temporal dependency of activities and attributes. Finally, an active learning algorithm is developed to reinforce the recognition accuracy using minimal user feedback.
The hypothesis and approaches presented in this thesis are validated by two case studies and real-world experiments on exercise activities and daily life activities. Experimental results show that the NuActiv framework can effectively recognize unseen new activities even without any training data, with up to 70-80% precision and recall rate. It also outperforms supervised learning with limited labeled data for the new classes. The results significantly advance the state of the art in human activity recognition, and represent a promising step towards bridging the gap between computers and humans.
|
20 |
Learning Image Classification and Retrieval Models / Apprentissage de modèles pour la classification et la recherche d'imagesMensink, Thomas 26 October 2012 (has links)
Nous assistons actuellement à une explosion de la quantité des données visuelles. Par exemple, plusieurs millions de photos sont partagées quotidiennement sur les réseaux sociaux. Les méthodes d'interprétation d'images vise à faciliter l'accès à ces données visuelles, d'une manière sémantiquement compréhensible. Dans ce manuscrit, nous définissons certains buts détaillés qui sont intéressants pour les taches d'interprétation d'images, telles que la classification ou la recherche d'images, que nous considérons dans les trois chapitres principaux. Tout d'abord, nous visons l'exploitation de la nature multimodale de nombreuses bases de données, pour lesquelles les documents sont composés d'images et de descriptions textuelles. Dans ce but, nous définissons des similarités entre le contenu visuel d'un document, et la description textuelle d'un autre document. Ces similarités sont calculées en deux étapes, tout d'abord nous trouvons les voisins visuellement similaires dans la base multimodale, puis nous utilisons les descriptions textuelles de ces voisins afin de définir une similarité avec la description textuelle de n'importe quel document. Ensuite, nous présentons une série de modèles structurés pour la classification d'images, qui encodent explicitement les interactions binaires entre les étiquettes (ou labels). Ces modèles sont plus expressifs que des prédicateurs d'étiquette indépendants, et aboutissent à des prédictions plus fiables, en particulier dans un scenario de prédiction interactive, où les utilisateurs fournissent les valeurs de certaines des étiquettes d'images. Un scenario interactif comme celui-ci offre un compromis intéressant entre la précision, et l'effort d'annotation manuelle requis. Nous explorons les modèles structurés pour la classification multi-étiquette d'images, pour la classification d'image basée sur les attributs, et pour l'optimisation de certaines mesures de rang spécifiques. Enfin, nous explorons les classifieurs par k plus proches voisins, et les classifieurs par plus proche moyenne, pour la classification d'images à grande échelle. Nous proposons des méthodes d'apprentissage de métrique efficaces pour améliorer les performances de classification, et appliquons ces méthodes à une base de plus d'un million d'images d'apprentissage, et d'un millier de classes. Comme les deux méthodes de classification permettent d'incorporer des classes non vues pendant l'apprentissage à un coût presque nul, nous avons également étudié leur performance pour la généralisation. Nous montrons que la classification par plus proche moyenne généralise à partir d'un millier de classes, sur dix mille classes à un coût négligeable, et les performances obtenus sont comparables à l'état de l'art. / We are currently experiencing an exceptional growth of visual data, for example, millions of photos are shared daily on social-networks. Image understanding methods aim to facilitate access to this visual data in a semantically meaningful manner. In this dissertation, we define several detailed goals which are of interest for the image understanding tasks of image classification and retrieval, which we address in three main chapters. First, we aim to exploit the multi-modal nature of many databases, wherein documents consists of images with a form of textual description. In order to do so we define similarities between the visual content of one document and the textual description of another document. These similarities are computed in two steps, first we find the visually similar neighbors in the multi-modal database, and then use the textual descriptions of these neighbors to define a similarity to the textual description of any document. Second, we introduce a series of structured image classification models, which explicitly encode pairwise label interactions. These models are more expressive than independent label predictors, and lead to more accurate predictions. Especially in an interactive prediction scenario where a user provides the value of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labeling effort. We explore structured models for multi-label image classification, for attribute-based image classification, and for optimizing for specific ranking measures. Finally, we explore k-nearest neighbors and nearest-class mean classifiers for large-scale image classification. We propose efficient metric learning methods to improve classification performance, and use these methods to learn on a data set of more than one million training images from one thousand classes. Since both classification methods allow for the incorporation of classes not seen during training at near-zero cost, we study their generalization performances. We show that the nearest-class mean classification method can generalize from one thousand to ten thousand classes at negligible cost, and still perform competitively with the state-of-the-art.
|
Page generated in 0.0665 seconds