1 |
Visuo-Haptic recognition of daily-life objects : a contribution to the data scarcity problem / Reconnaissance visio-haptique des objets de la vie quotidienne : à partir de peu de données d'entraînement
Abderrahmane, Zineb 29 November 2018 (has links)
Recognizing surrounding objects is an important skill for the autonomy of robots operating in daily life. Nowadays robots are equipped with sophisticated sensors imitating the human sense of touch. This allows the recognition of an object based on information ensuing from robot-object physical interaction. Such information can include the object's texture, compliance and material. In this thesis, we exploit haptic data to perform haptic recognition of daily-life objects using machine learning techniques. The main challenge faced in our work is the difficulty of collecting a fair amount of haptic training data for all daily-life objects.
This is due to the continuously growing number of objects and to the effort and time needed by the robot to physically interact with each object for data collection. We solve this problem by developing a haptic recognition framework capable of performing Zero-shot, One-shot and Multi-shot Learning. We also extend our framework by integrating vision to enhance the robot’s recognition performance, whenever such sense is available.
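The multi-shot, one-shot and zero-shot regimes described above can be sketched with a nearest-prototype recognizer: a class prototype is the mean of its training samples when any exist, and a hand-specified attribute description otherwise. The feature vectors and attribute descriptions below are toy stand-ins for real haptic data, not the thesis's actual framework.

```python
# Sketch: nearest-class-mean recognition covering multi-shot, one-shot
# and zero-shot classes. All vectors are illustrative toy values.

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_prototypes(train, attributes):
    """train: class -> list of feature vectors (empty list = zero-shot).
    attributes: class -> hand-specified description vector, used as a
    fallback prototype when no training sample exists."""
    protos = {}
    for cls, desc in attributes.items():
        samples = train.get(cls, [])
        protos[cls] = mean(samples) if samples else desc
    return protos

def classify(x, protos):
    """Assign x to the class with the nearest prototype."""
    return min(protos, key=lambda c: sq_dist(x, protos[c]))
```

A class such as "towel" below has no training sample at all, yet is still recognizable through its description vector.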
|
2 |
Beyond Supervised Learning: Applications and Implications of Zero-shot Text Classification
Borst-Graetz, Janos 25 October 2024 (has links)
This dissertation explores the application of zero-shot text classification, a technique for categorizing texts without annotated data in the target domain.
A true zero-shot setting breaks with the conventions of the traditional supervised machine learning paradigm that relies on
quantitative in-domain evaluation for optimization, performance measurement, and model selection.
The dissertation summarizes existing research to build a theoretical foundation for zero-shot methods, emphasizing efficiency and transparency.
It benchmarks selected approaches across various tasks and datasets to understand their general performance, strengths, and weaknesses, mirroring the model selection process.
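One family of approaches covered by such a benchmark is similarity-based zero-shot classification, where a document is assigned the label whose textual description it most resembles. A minimal sketch, with bag-of-words counts standing in for the learned sentence encoders used in practice:

```python
# Sketch of similarity-based zero-shot text classification: score each
# candidate label by the cosine similarity between the document and a
# (hypothetical) label description. Real systems replace these raw
# word counts with learned sentence encoders.
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def zero_shot_classify(doc, label_descriptions):
    """Return the label whose description is most similar to doc."""
    d = bow(doc)
    return max(label_descriptions,
               key=lambda lab: cosine(d, bow(label_descriptions[lab])))
```

Note that no labeled training document is used anywhere: the labels are defined purely by their descriptions, which is exactly what makes in-domain evaluation optional and, as discussed later, problematic.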
On this foundation, two case studies demonstrate the application of zero-shot text classification:
The first engages with historical German stock market reports, utilizing zero-shot methods for aspect-based sentiment classification.
The case study reveals that although there are qualitative differences between fine-tuned and zero-shot approaches,
the aggregated results are not easily distinguishable, sparking a discussion about the practical implications.
The second case study integrates zero-shot text classification into a civil engineering document management system,
showcasing how the flexibility of zero-shot models and the omission of a training process can benefit the development of prototype software,
at the cost of unknown performance.
These findings indicate that, although zero-shot text classification works for the cases studied, the results do not necessarily generalize.
Taking up the findings of these case studies, the dissertation discusses dilemmas and theoretical considerations that arise
when zero-shot text classification is applied without in-domain evaluation.
It concludes by advocating a broader focus beyond traditional quantitative metrics in order to build trust in zero-shot text classification,
highlighting its practical utility as well as the necessity for further exploration as these technologies evolve.
1 Introduction
1.1 Problem Context
1.2 Related Work
1.3 Research Questions & Contribution
1.4 Author’s Publications
1.5 Structure of This Work
2 Research Context
2.1 The Current State of Text Classification
2.2 Efficiency
2.3 Approaches to Addressing Data Scarcity in Machine Learning
2.4 Challenges of Recent Developments
2.5 Model Sizes and Hardware Resources
2.6 Conclusion
3 Zero-shot Text Classification
3.1 Text Classification
3.2 State-of-the-Art in Text Classification
3.3 Neural Network Approaches to Data-Efficient Text Classification
3.4 Zero-shot Text Classification
3.5 Application
3.6 Requirements for Zero-shot Models
3.7 Approaches to Transfer Zero-shot
3.7.1 Terminology
3.7.2 Similarity-based and Siamese Networks
3.7.3 Language Model Token Predictions
3.7.4 Sentence Pair Classification
3.7.5 Instruction-following Models or Dialog-based Systems
3.8 Class Name Encoding in Text Classification
3.9 Approach Selection
3.10 Conclusion
4 Model Performance Survey
4.1 Experiments
4.1.1 Datasets
4.1.2 Model Selection
4.1.3 Hypothesis Templates
4.2 Zero-shot Model Evaluation
4.3 Dataset Complexity
4.4 Conclusion
5 Case Study: Historic German Stock Market Reports
5.1 Project
5.2 Motivation
5.3 Related Work
5.4 The Corpus and Dataset - Berliner Börsenzeitung
5.4.1 Corpus
5.4.2 Sentiment Aspects
5.4.3 Annotations
5.5 Methodology
5.5.1 Evaluation Approach
5.5.2 Trained Pipeline
5.5.3 Zero-shot Pipeline
5.5.4 Dictionary Pipeline
5.5.5 Tradeoffs
5.5.6 Label Space Definitions
5.6 Evaluation - Comparison of the Pipelines on BBZ
5.6.1 Sentence-based Sentiment
5.6.2 Aspect-based Sentiment
5.6.3 Qualitative Evaluation
5.7 Discussion and Conclusion
6 Case Study: Document Management in Civil Engineering
6.1 Project
6.2 Motivation
6.3 Related Work
6.4 The Corpus and Knowledge Graph
6.4.1 Data
6.4.2 BauGraph – The Knowledge Graph
6.5 Methodology
6.5.1 Document Insertion Pipeline
6.5.2 Frontend Integration
6.6 Discussion and Conclusion
7 MLMC
7.1 How it works
7.2 Motivation
7.3 Extensions of the Framework
7.4 Other Projects
7.4.1 Product Classification
7.4.2 Democracy Monitor
7.4.3 Climate Change Adaptation Finance
7.5 Conclusion
8 Discussion: The Five Dilemmas of Zero-shot
8.1 On Evaluation
8.2 The Five Dilemmas of Zero-shot
8.2.1 Dilemma of Evaluation or Are You Working at All?
8.2.2 Dilemma of Comparison or How Do I Get the Best Model?
8.2.3 Dilemma of Annotation and Label Definition or Are We Talking about the Same Thing?
8.2.4 Dilemma of Interpretation or Am I Biased?
8.2.5 Dilemma of Unsupervised Text Classification or Do I Have to Trust You?
8.3 Trust in Zero-shot Capabilities
8.4 Conclusion
9 Conclusion
9.1 Summary
9.1.1 RQ1: Strengths and Weaknesses
9.1.2 RQ2: Application Studies
9.1.3 RQ3: Implications
9.2 Final Thoughts & Future Directions
References
A Appendix for Survey Chapter
A.1 Model Selection
A.2 Task-specific Hypothesis Templates
A.3 Fractions of SotA
B Uncertainty vs. Accuracy
C Declaration of Authorship
D Declaration: Use of AI-Tools
E Bibliographic Data
|
3 |
ZERO-SHOT OBJECT DETECTION METHOD COMPARISON AND ANALYSIS
Che, Peining 30 August 2019 (has links)
No description available.
|
4 |
Improving Zero-Shot Learning via Distribution Embeddings
Chalumuri, Vivek January 2020 (has links)
Zero-Shot Learning (ZSL) for image classification aims to recognize images from novel classes for which we have no training examples. A common approach to tackling such a problem is to transfer knowledge from seen to unseen classes using auxiliary semantic information about class labels in the form of class embeddings. Most existing methods represent image features and class embeddings as point vectors, and such a vector representation limits the expressivity in terms of modeling the intra-class variability of the image classes. In this thesis, we propose three novel ZSL methods that represent image features and class labels as distributions and learn their corresponding parameters as distribution embeddings, so that the intra-class variability of image classes is better modeled. The first model is a Triplet model, where image features and class embeddings are projected as Gaussian distributions in a common space and their associations are learned by metric learning. Next, we have a Triplet-VAE model, where two VAEs are trained with triplet-based distributional alignment for ZSL. The third model is a simple probabilistic classifier for ZSL, inspired by energy-based models. When evaluated on the common benchmark ZSL datasets, the proposed methods improve over the existing state-of-the-art methods in both the traditional ZSL and the more challenging Generalized ZSL (GZSL) settings.
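The Triplet model's idea of comparing distribution embeddings can be sketched with a symmetric KL divergence between diagonal Gaussians and a triplet margin objective. The means and variances below are illustrative, not learned parameters, and the choice of symmetric KL is an assumption standing in for whichever distribution distance the thesis actually optimizes.

```python
# Sketch: distances between diagonal-Gaussian embeddings and a triplet
# margin objective, in the spirit of the Triplet model above.
from math import log

def kl_diag(m0, v0, m1, v1):
    """KL(N(m0, diag v0) || N(m1, diag v1)) for diagonal Gaussians."""
    return 0.5 * sum(
        log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(m0, v0, m1, v1)
    )

def sym_kl(p, q):
    """Symmetrized KL between two (mean, variance) Gaussian embeddings."""
    return kl_diag(*p, *q) + kl_diag(*q, *p)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the matching class embedding is closer to the image
    embedding than the non-matching one, by at least `margin`."""
    return max(0.0, sym_kl(anchor, positive) - sym_kl(anchor, negative) + margin)
```

Because the embeddings carry variances as well as means, intra-class variability is expressed directly in the representation rather than being collapsed into a single point.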
|
5 |
Machine learning for wireless signal learning
Smith, Logan 30 April 2021 (has links)
Wireless networks are vulnerable to adversarial devices that spoof the digital identity of valid wireless devices, allowing unauthorized devices access to the network. Instead of validating devices based on their digital identity, it is possible to use their unique "physical fingerprint" caused by changes in the signal due to deviations in wireless hardware. In this thesis, the physical fingerprint was validated by performing classification with complex-valued neural networks (NNs), achieving a high level of accuracy in the process. Additionally, zero-shot learning (ZSL) was implemented to learn discriminant features that separate legitimate from unauthorized devices using outlier detection, and then further separate every unauthorized device into its own cluster. This approach allows 42% of unauthorized devices to be identified as unauthorized and correctly clustered.
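The two-stage idea, flagging devices whose fingerprint is far from every authorized-device fingerprint and then grouping the flagged outliers by mutual proximity, can be sketched as follows. The centroids, thresholds, and greedy clustering rule are illustrative simplifications, not the thesis's trained models.

```python
# Sketch: outlier-based detection of unauthorized devices, followed by
# greedy clustering of the outliers. All values are illustrative.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def is_unauthorized(x, centroids, threshold):
    """A device is flagged if its fingerprint is farther than
    `threshold` from every authorized-device centroid."""
    return min(sq_dist(x, c) for c in centroids) > threshold

def cluster_outliers(outliers, radius):
    """Greedy clustering: each outlier joins the first cluster whose
    seed lies within `radius`, else starts a new cluster."""
    clusters = []
    for x in outliers:
        for cl in clusters:
            if sq_dist(x, cl[0]) <= radius:
                cl.append(x)
                break
        else:
            clusters.append([x])
    return clusters
```

The second stage is what makes this zero-shot: unauthorized devices were never seen during training, yet each one ends up in its own cluster.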
|
6 |
Few-Shot and Zero-Shot Learning for Information Extraction
Gong, Jiaying 31 May 2024 (has links)
Information extraction aims to automatically extract structured information from unstructured texts.
Supervised information extraction requires large quantities of labeled training data, which is time-consuming and labor-intensive. This dissertation focuses on information extraction, especially relation extraction and attribute-value extraction in e-commerce, with few labeled (few-shot learning) or even no labeled (zero-shot learning) training data. We explore multi-source auxiliary information and novel learning techniques to integrate semantic auxiliary information with the input text to improve few-shot learning and zero-shot learning.
For zero-shot and few-shot relation extraction, the first method explores existing data statistics and leverages auxiliary information, including labels, synonyms of labels, keywords, and hypernyms of named entities, to enable zero-shot learning on unlabeled data. We build an automatic hypernym-extraction framework to acquire hypernyms of different entities directly from the web. The second method explores the relations between seen classes and new classes. We propose a prompt-based model with semantic knowledge augmentation to recognize new relation triplets under the zero-shot setting. In this method, we transform the problem of zero-shot learning into supervised learning on augmented data generated for the new relations. We design the training prompts using auxiliary information from an external knowledge graph to integrate semantic knowledge learned from seen relations. The third method utilizes auxiliary information from images to enhance few-shot learning. We propose a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to learn a multi-modal representation jointly. To supplement the missing contexts in text, this work integrates both local (object-level) and global (pixel-level) features from different modalities through image-guided attention, object-guided attention, and hybrid feature attention to mitigate sparsity and noise.
We then explore the few-shot and zero-shot aspect (attribute-value) extraction in the e-commerce application field. The first work studies the multi-label few-shot learning by leveraging the auxiliary information of anchor (label) and category description based on the prototypical networks, where the hybrid attention helps alleviate ambiguity and capture more informative semantics by calculating both the label-relevant and query-related weights. A dynamic threshold is learned by integrating the semantic information from support and query sets to achieve multi-label inference. The second work explores multi-label zero-shot learning via semi-inductive link prediction of the heterogeneous hypergraph. The heterogeneous hypergraph is built with higher-order relations (generated by the auxiliary information of user behavior data and product inventory data) to capture the complex and interconnected relations between users and the products. / Doctor of Philosophy / Information extraction is the process of automatically extracting structured information from unstructured sources, such as plain text documents, web pages, images, and so on. In this dissertation, we will first focus on general relation extraction, which aims at identifying and classifying semantic relations between entities. For example, given the sentence `Peter was born in Manchester.' in the newspaper, structured information (Peter, place of birth, Manchester) can be extracted. Then, we focus on attribute-value (aspect) extraction in the application field, which aims at extracting attribute-value pairs from product descriptions or images on e-commerce websites. For example, given a product description or image of a handbag, the brand (i.e. brand: Chanel), color (i.e. color: black), and other structured information can be extracted from the product, which provides a better search and recommendation experience for customers.
With the advancement of deep learning techniques, machines (models) trained with large quantities of example input data and the corresponding desired output data can perform automatic information extraction tasks with high accuracy. Such example input data and the corresponding desired output data are also called annotated data. However, with ongoing technological innovation and social change, new data (i.e. articles, products, etc.) is being generated continuously. It is difficult, time-consuming, and costly to annotate large quantities of new data for training. In this dissertation, we explore several methods to help the model achieve good performance with only a few labeled examples (few-shot learning) or even no labeled data (zero-shot learning) for training.
Humans are born with no prior knowledge, but they can still recognize new information based on their existing knowledge by continuously learning. Inspired by how human beings learn new knowledge, we explore different auxiliary information that can benefit few-shot and zero-shot information extraction. We studied the auxiliary information from existing data statistics, knowledge graphs, corresponding images, labels, user behavior data, product inventory data, optical characters, etc. We enable few-shot and zero-shot learning by adding auxiliary information to the training data. For example, we study the data statistics of both labeled and unlabeled data. We use data augmentation and prompts to generate training samples for no labeled data. We utilize graphs to learn general patterns and representations that can potentially transfer to unseen nodes and relations. This dissertation provides the exploration of how utilizing the above different auxiliary information to help improve the performance of information extraction with few annotated or even no annotated training data.
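The prototypical-network method with a threshold for multi-label inference, described in the first aspect-extraction work above, can be sketched as follows. The toy embeddings and the fixed threshold stand in for the learned components (in the dissertation the threshold is learned dynamically from the support and query sets).

```python
# Sketch: few-shot multi-label prediction with class prototypes and a
# score threshold. Embeddings and the threshold are illustrative.

def mean(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def neg_sq_dist(a, b):
    """Higher is better: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def multilabel_predict(query, support, threshold):
    """support: label -> a few example embeddings. Returns every label
    whose prototype scores above `threshold` (multi-label inference,
    rather than a single argmax)."""
    protos = {lab: mean(vs) for lab, vs in support.items()}
    return sorted(lab for lab, p in protos.items()
                  if neg_sq_dist(query, p) >= threshold)
```

Returning every label above a threshold, instead of the single best one, is what allows a product description to yield several attribute-value pairs at once.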
|
7 |
Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images / Learning and exploiting semantic representations for image classification and retrieval
Bucher, Maxime 27 November 2018 (has links)
In this thesis, we examine some practical difficulties of deep learning models. Indeed, despite promising results in computer vision, deploying these algorithms in some real-world situations remains difficult. For example, in classification tasks where thousands of categories have to be recognised, it is sometimes difficult to gather enough training data for each category. We propose two new approaches for this learning scenario, called "zero-shot learning". We use semantic information to model classes, which allows us to define models by description, as opposed to modelling from a set of examples, and makes modelling without reference data possible. In the first chapter, we propose to learn a metric capable of both selecting and transforming the distribution of the original data, so as to obtain an optimal attribute distribution. In the following chapter, unlike the standard approaches in the literature that rely on learning a common embedding space, we propose to generate visual features from a conditional generator. Once generated, these artificial examples can be used alongside real data to train a discriminative classifier. In the second part of this thesis, we address the question of computational intelligibility for computer vision tasks. Because of the many complex transformations in deep learning algorithms, it is difficult for a user to interpret the returned prediction. Our proposal is to introduce a "semantic bottleneck" in the processing pipeline: a crossing point at which the representation of the image is expressed entirely in natural language, while retaining the efficiency of numerical representations. The intelligibility of this representation allows a user to examine the basis on which the inference was made, and thus to accept or reject the decision according to their own knowledge and experience.
|
8 |
Zero-shot Learning for Visual Recognition Problems
Naha, Shujon January 2016 (has links)
In this thesis we discuss different aspects of zero-shot learning and propose solutions for three challenging visual recognition problems: 1) unknown object recognition from images, 2) novel action recognition from videos, and 3) unseen object segmentation. In all three problems, we have two different sets of classes: the "known classes", which are used in the training phase, and the "unknown classes", for which there are no training instances. Our proposed approach exploits the available semantic relationships between known and unknown object classes and uses them to transfer the appearance models from known to unknown object classes in order to recognize unknown objects. We also propose an approach to recognize novel actions from videos by learning a joint model that links videos and text. Finally, we present a ranking-based approach for zero-shot object segmentation. We represent each unknown object class as a semantic ranking of all the known classes and use this semantic relationship to extend the segmentation models of the known classes to segment unknown-class objects. / October 2016
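The ranking-based segmentation idea, scoring an unknown class by combining known-class model outputs according to the unknown class's semantic ranking of the known classes, can be sketched like this. The reciprocal-rank weighting and all scores are illustrative assumptions, not the thesis's actual weighting scheme.

```python
# Sketch: score a region for an unknown class by combining the known
# segmentation models' outputs, weighted by the unknown class's
# semantic ranking of the known classes (rank 1 = most related).

def rank_weights(ranking):
    """ranking: known classes ordered most- to least-related.
    Gives reciprocal-rank weights 1, 1/2, 1/3, ... (an assumption)."""
    return {cls: 1.0 / (i + 1) for i, cls in enumerate(ranking)}

def unknown_score(known_scores, ranking):
    """known_scores: known class -> model output for this region."""
    w = rank_weights(ranking)
    return sum(w[c] * known_scores[c] for c in ranking)
```

A region that strongly activates the "cat" model thus scores high for an unseen class such as "tiger", whose ranking puts "cat" first, without any tiger training mask.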
|
9 |
Zero-shot visual recognition via latent embedding learning
Wang, Qian January 2018 (has links)
Traditional supervised visual recognition methods require a great number of annotated examples for each class of interest. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations where the test instances come from novel classes for which no training examples are available in the training stage. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space while preserving both visual and semantic relatedness, hence the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. We then extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short-term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e. the visual model and the semantic model) to bridge the semantic gap. The two component embedding models are trained alternately to optimize ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework.
Finally, we propose alternative semantic representations for human actions, aiming to narrow the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data is investigated to enhance the semantic representations of human actions. The novel semantic representations are shown to benefit zero-shot human action recognition significantly compared to traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition that narrow and bridge the semantic gap, and achieve state-of-the-art performance in different settings on multiple benchmarks.
|
10 |
Thought Recognition: Predicting and Decoding Brain Activity Using the Zero-Shot Learning Model
Palatucci, Mark M. 25 April 2011 (has links)
Machine learning algorithms have been successfully applied to learning classifiers in many domains such as computer vision, fraud detection, and brain image analysis. Typically, classifiers are trained to predict a class value given a set of labeled training data that includes all possible class values, and sometimes additional unlabeled training data.
Little research has been performed where the possible values for the class variable include values that have been omitted from the training examples. This is an important problem setting, especially in domains where the class value can take on many values, and the cost of obtaining labeled examples for all values is high.
We show that the key to addressing this problem is not predicting the held-out classes directly, but rather by recognizing the semantic properties of the classes such as their physical or functional attributes. We formalize this method as zero-shot learning and show that by utilizing semantic knowledge mined from large text corpora and crowd-sourced humans, we can discriminate classes without explicitly collecting examples of those classes for a training set.
As a case study, we consider this problem in the context of thought recognition, where the goal is to classify the pattern of brain activity observed from a non-invasive neural recording device. Specifically, we train classifiers to predict a specific concrete noun that a person is thinking about based on an observed image of that person’s neural activity.
We show that by predicting the semantic properties of the nouns such as “is it heavy?” and “is it edible?”, we can discriminate concrete nouns that people are thinking about, even without explicitly collecting examples of those nouns for a training set. Further, this allows discrimination of certain nouns that are within the same category with significantly higher accuracies than previous work.
In addition to being an important step forward for neural imaging and brain-computer interfaces, we show that the zero-shot learning model has important implications for the broader machine learning community by providing a means for learning algorithms to extrapolate beyond their explicit training set.
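The attribute-matching step described above, predicting semantic properties and then picking the noun whose attribute signature is closest, can be sketched as follows. The attribute signatures are toy values, not the crowd-sourced semantic knowledge used in the thesis.

```python
# Sketch: decode a held-out concrete noun from predicted answers to
# semantic questions ("is it heavy?", "is it edible?", ...). The noun
# itself need never appear in the classifier's training set.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decode_noun(predicted_attrs, signatures):
    """signatures: noun -> attribute vector. Returns the noun whose
    signature is nearest to the predicted attribute vector."""
    return min(signatures, key=lambda n: sq_dist(predicted_attrs, signatures[n]))
```

Because the classifier is trained on attributes rather than on the nouns themselves, adding a new noun only requires adding its attribute signature; this is the extrapolation beyond the explicit training set that the abstract describes.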
|