1
Visuo-Haptic recognition of daily-life objects: a contribution to the data scarcity problem / Reconnaissance visio-haptique des objets de la vie quotidienne : à partir de peu de données d'entraînement. Abderrahmane, Zineb. 29 November 2018
Recognizing surrounding objects is an important skill for the autonomy of robots operating in daily life. Nowadays, robots are equipped with sophisticated sensors imitating the human sense of touch. This allows an object to be recognized from information arising from robot-object physical interaction, such as the object's texture, compliance, and material. In this thesis, we exploit haptic data to perform haptic recognition of daily-life objects using machine learning techniques. The main challenge faced in our work is the difficulty of collecting a sufficient amount of haptic training data for all daily-life objects. This is due to the continuously growing number of objects and to the effort and time needed by the robot to physically interact with each object for data collection. We address this problem by developing a haptic recognition framework capable of performing zero-shot, one-shot, and multi-shot learning. We also extend our framework by integrating vision to enhance the robot's recognition performance whenever that sense is available.
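As an aside on what the one-shot setting amounts to in practice, here is a minimal nearest-neighbour sketch over haptic feature vectors; the feature names and values are invented for illustration, and this is not the thesis's actual framework.

```python
import numpy as np

def one_shot_classify(query, support_features, support_labels):
    """Assign the query to the label of its nearest stored example.

    query            : (d,) haptic feature vector of the probed object
    support_features : (n, d) array, one stored feature vector per known object
    support_labels   : list of n class names
    """
    dists = np.linalg.norm(support_features - query, axis=1)
    return support_labels[int(np.argmin(dists))]

# Hypothetical example: three objects described by four haptic features
# (say stiffness, roughness, thermal response, slip); all values are made up.
support = np.array([[0.9, 0.2, 0.1, 0.0],   # "mug"
                    [0.1, 0.8, 0.3, 0.4],   # "sponge"
                    [0.7, 0.1, 0.9, 0.1]])  # "spoon"
labels = ["mug", "sponge", "spoon"]
print(one_shot_classify(np.array([0.85, 0.25, 0.15, 0.05]), support, labels))  # -> mug
```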
2
Machine learning for wireless signal learning. Smith, Logan. 30 April 2021
Wireless networks are vulnerable to adversarial devices that spoof the digital identity of valid wireless devices, allowing unauthorized devices to access the network. Instead of validating devices based on their digital identity, it is possible to use their unique "physical fingerprint", caused by changes in the signal due to small deviations in the wireless hardware. In this thesis, the physical fingerprint was validated by performing classification with complex-valued neural networks (NNs), achieving a high level of accuracy in the process. Additionally, zero-shot learning (ZSL) was implemented to learn discriminant features that separate legitimate from unauthorized devices using outlier detection and then further separate each unauthorized device into its own cluster. This approach allows 42% of unauthorized devices to be identified as unauthorized and correctly clustered.
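A rough sketch of the two-stage idea — flag signals that match no authorized fingerprint, then group the flagged signals by device — could look like the following. It is illustrative only: the thesis uses complex-valued neural networks, while this sketch feeds synthetic feature vectors to off-the-shelf scikit-learn components.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical RF fingerprint features: authorized devices cluster tightly,
# two unauthorized (rogue) devices transmit from elsewhere in feature space.
authorized = rng.normal(loc=0.0, scale=0.3, size=(200, 8))
rogue_a = rng.normal(loc=3.0, scale=0.2, size=(30, 8))
rogue_b = rng.normal(loc=-3.0, scale=0.2, size=(30, 8))
test_set = np.vstack([authorized[:50], rogue_a, rogue_b])

# Stage 1: an outlier detector trained only on authorized fingerprints flags
# any signal that does not match a known device.
detector = IsolationForest(contamination=0.01, random_state=0).fit(authorized)
is_rogue = detector.predict(test_set) == -1          # -1 marks outliers

# Stage 2: cluster the flagged signals so each unauthorized device
# ends up in its own group.
clusters = DBSCAN(eps=1.5, min_samples=5).fit_predict(test_set[is_rogue])
print(f"{int(is_rogue.sum())} signals flagged, "
      f"{len(set(clusters) - {-1})} distinct rogue devices found")
```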
3
Few-Shot and Zero-Shot Learning for Information Extraction. Gong, Jiaying. 31 May 2024
Information extraction aims to automatically extract structured information from unstructured texts.
Supervised information extraction requires large quantities of labeled training data, which are time-consuming and labor-intensive to obtain. This dissertation focuses on information extraction, especially relation extraction and attribute-value extraction in e-commerce, with few labeled (few-shot learning) or even no labeled (zero-shot learning) training data. We explore multi-source auxiliary information and novel learning techniques to integrate semantic auxiliary information with the input text and thereby improve few-shot and zero-shot learning.
For zero-shot and few-shot relation extraction, the first method explores the existing data statistics and leverages auxiliary information including labels, synonyms of labels, keywords, and hypernyms of named entities to enable zero-shot learning for the unlabeled data. We build an automatic hypernym extraction framework to help acquire hypernyms of different entities directly from the web. The second method explores the relations between seen classes and new classes. We propose a prompt-based model with semantic knowledge augmentation to recognize new relation triplets under the zero-shot setting. In this method, we transform the problem of zero-shot learning into supervised learning with augmented data generated for new relations. We design the training prompts using auxiliary information from an external knowledge graph to integrate semantic knowledge learned from seen relations. The third work utilizes auxiliary information from images to enhance few-shot learning. We propose a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to learn a multi-modal representation jointly. To supplement the missing context in text, this work integrates both local (object-level) and global (pixel-level) features from different modalities through image-guided attention, object-guided attention, and hybrid feature attention to address sparsity and noise.
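To illustrate how label-side auxiliary information (labels, synonyms, hypernyms) can enable zero-shot relation extraction, here is a toy matching scheme; TF-IDF stands in for whatever sentence encoder is actually used, and the relation names and descriptions are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Unseen relations described only by auxiliary text: label, synonyms, hypernyms.
relation_descriptions = {
    "place_of_birth": "born birthplace native city location place",
    "employer":       "works for employed company organization job",
    "spouse":         "married wife husband partner family person",
}

def zero_shot_relation(sentence, descriptions):
    """Pick the relation whose auxiliary description is most similar to the sentence.
    TF-IDF similarity is a crude stand-in for a learned sentence encoder."""
    names = list(descriptions)
    corpus = [sentence] + [descriptions[n] for n in names]
    vecs = TfidfVectorizer().fit_transform(corpus)
    sims = cosine_similarity(vecs[0], vecs[1:]).ravel()
    return names[int(np.argmax(sims))]

print(zero_shot_relation("Peter was born in Manchester.", relation_descriptions))
# -> "place_of_birth" (matched via the shared token "born")
```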
We then explore few-shot and zero-shot aspect (attribute-value) extraction in the e-commerce application field. The first work studies multi-label few-shot learning by leveraging the auxiliary information of anchors (labels) and category descriptions based on prototypical networks, where hybrid attention helps alleviate ambiguity and capture more informative semantics by calculating both label-relevant and query-related weights. A dynamic threshold is learned by integrating the semantic information from the support and query sets to achieve multi-label inference. The second work explores multi-label zero-shot learning via semi-inductive link prediction on a heterogeneous hypergraph. The heterogeneous hypergraph is built with higher-order relations (generated from the auxiliary information of user behavior data and product inventory data) to capture the complex and interconnected relations between users and products. / Doctor of Philosophy / Information extraction is the process of automatically extracting structured information from unstructured sources, such as plain text documents, web pages, images, and so on. In this dissertation, we will first focus on general relation extraction, which aims at identifying and classifying semantic relations between entities. For example, given the sentence 'Peter was born in Manchester.' in a newspaper, the structured information (Peter, place of birth, Manchester) can be extracted. Then, we focus on attribute-value (aspect) extraction in the application field, which aims at extracting attribute-value pairs from product descriptions or images on e-commerce websites. For example, given a product description or image of a handbag, the brand (e.g., brand: Chanel), color (e.g., color: black), and other structured information can be extracted, which provides a better search and recommendation experience for customers.
With the advancement of deep learning techniques, machines (models) trained with large quantities of example input data and the corresponding desired output data can perform automatic information extraction tasks with high accuracy. Such example inputs paired with desired outputs are also called annotated data. However, as technology and society change, new data (e.g., articles, products) are generated continuously. It is difficult, time-consuming, and costly to annotate large quantities of new data for training. In this dissertation, we explore several methods that help a model achieve good performance with only a few labeled examples (few-shot learning) or even no labeled data (zero-shot learning) for training.
Although humans are born with little prior knowledge, they can recognize new information by relating it to the knowledge they accumulate through continuous learning. Inspired by how human beings learn new knowledge, we explore different kinds of auxiliary information that can benefit few-shot and zero-shot information extraction. We study auxiliary information from existing data statistics, knowledge graphs, corresponding images, labels, user behavior data, product inventory data, optical characters, and so on. We enable few-shot and zero-shot learning by adding auxiliary information to the training data. For example, we study the data statistics of both labeled and unlabeled data. We use data augmentation and prompts to generate training samples when no labeled data are available. We utilize graphs to learn general patterns and representations that can potentially transfer to unseen nodes and relations. This dissertation explores how these different kinds of auxiliary information can be used to improve the performance of information extraction with few or even no annotated training examples.
4
Zero-shot Learning for Visual Recognition Problems. Naha, Shujon. January 2016
In this thesis we discuss different aspects of zero-shot learning and propose solutions for three challenging visual recognition problems: 1) unknown object recognition from images, 2) novel action recognition from videos, and 3) unseen object segmentation. In all three problems, we have two different sets of classes: the "known classes", which are used in the training phase, and the "unknown classes", for which there are no training instances. Our proposed approach exploits the available semantic relationships between known and unknown object classes and uses them to transfer appearance models from known to unknown object classes in order to recognize unknown objects. We also propose an approach to recognize novel actions from videos by learning a joint model that links videos and text. Finally, we present a ranking-based approach for zero-shot object segmentation. We represent each unknown object class as a semantic ranking of all the known classes and use this semantic relationship to extend the segmentation model of known classes to segment unknown-class objects. / October 2016
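One way to picture the "semantic ranking of known classes" idea is the sketch below: score an input with known-class classifiers, then pick the unknown class whose semantic ranking of the known classes best agrees with the ranking induced by those scores. The class names, similarity values, and classifier scores are made up; this is not the thesis's model.

```python
import numpy as np
from scipy.stats import spearmanr

known = ["cat", "dog", "horse", "car"]   # scores[i] is the classifier score for known[i]

# Semantic closeness of each unknown class to the known classes
# (higher = semantically closer); the values are invented for illustration.
unknown_semantics = {
    "zebra": np.array([0.3, 0.3, 0.9, 0.1]),   # closest to "horse"
    "truck": np.array([0.1, 0.1, 0.2, 0.9]),   # closest to "car"
}

def zero_shot_label(known_scores, unknown_semantics):
    """Pick the unknown class whose semantic ranking of the known classes
    best matches the ranking induced by the known-class classifier scores."""
    best, best_corr = None, -np.inf
    for name, sem in unknown_semantics.items():
        corr, _ = spearmanr(known_scores, sem)
        if corr > best_corr:
            best, best_corr = name, corr
    return best

# Hypothetical classifier scores for an image of a zebra: "horse" fires strongest.
scores = np.array([0.2, 0.25, 0.85, 0.05])
print(zero_shot_label(scores, unknown_semantics))  # -> "zebra"
```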
5
Zero-shot visual recognition via latent embedding learning. Wang, Qian. January 2018
Traditional supervised visual recognition methods require a great number of annotated examples for each class of interest. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations where the test instances come from novel classes for which no training examples are available at the training stage. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from the visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space while preserving both visual and semantic relatedness, so the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. Then we extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short-term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e. the visual model and the semantic model) to bridge the semantic gap. The two component embedding models are trained alternately to optimize ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework. Finally, we propose alternative semantic representations for human actions, narrowing the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data is investigated to enhance the semantic representations for human actions. The novel semantic representations are shown to benefit zero-shot human action recognition significantly compared to traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition that narrow and bridge the semantic gap, and we achieve state-of-the-art performance in different settings on multiple benchmarks.
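As a small illustration of a ranking-based objective of the kind mentioned above, the sketch below scores labels by cosine similarity in a shared latent space and applies a pairwise hinge loss so that relevant labels outrank irrelevant ones; the vectors are toy values, not learned embeddings, and the thesis's alternating training of the two component models is not reproduced.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def multilabel_ranking_loss(video_vec, label_vecs, relevant, margin=0.2):
    """Pairwise hinge ranking objective: every relevant label should score
    higher than every irrelevant one by at least `margin`."""
    scores = {name: cosine(video_vec, vec) for name, vec in label_vecs.items()}
    loss = 0.0
    for pos in relevant:
        for neg in set(label_vecs) - set(relevant):
            loss += max(0.0, margin - scores[pos] + scores[neg])
    return loss, scores

# Toy latent vectors (made up): a video showing someone running and jumping.
label_vecs = {"run": np.array([0.9, 0.1]),
              "jump": np.array([0.7, 0.6]),
              "swim": np.array([-0.8, 0.2])}
video_vec = np.array([0.8, 0.4])
loss, scores = multilabel_ranking_loss(video_vec, label_vecs, relevant=["run", "jump"])
print(scores, loss)   # loss is 0 when relevant labels already outrank irrelevant ones
```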
6
Thought Recognition: Predicting and Decoding Brain Activity Using the Zero-Shot Learning Model. Palatucci, Mark M. 25 April 2011
Machine learning algorithms have been successfully applied to learning classifiers in many domains such as computer vision, fraud detection, and brain image analysis. Typically, classifiers are trained to predict a class value given a set of labeled training data that includes all possible class values, and sometimes additional unlabeled training data.
Little research has been performed where the possible values for the class variable include values that have been omitted from the training examples. This is an important problem setting, especially in domains where the class value can take on many values, and the cost of obtaining labeled examples for all values is high.
We show that the key to addressing this problem is not predicting the held-out classes directly, but rather by recognizing the semantic properties of the classes such as their physical or functional attributes. We formalize this method as zero-shot learning and show that by utilizing semantic knowledge mined from large text corpora and crowd-sourced humans, we can discriminate classes without explicitly collecting examples of those classes for a training set.
As a case study, we consider this problem in the context of thought recognition, where the goal is to classify the pattern of brain activity observed from a non-invasive neural recording device. Specifically, we train classifiers to predict a specific concrete noun that a person is thinking about based on an observed image of that person’s neural activity.
We show that by predicting the semantic properties of the nouns such as “is it heavy?” and “is it edible?”, we can discriminate concrete nouns that people are thinking about, even without explicitly collecting examples of those nouns for a training set. Further, this allows discrimination of certain nouns that are within the same category with significantly higher accuracies than previous work.
In addition to being an important step forward for neural imaging and brain-computer interfaces, we show that the zero-shot learning model has important implications for the broader machine learning community by providing a means for learning algorithms to extrapolate beyond their explicit training set.
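The two-stage recipe summarized above — regress from the neural image to a vector of semantic properties, then match that vector against candidate nouns never seen during training — can be sketched with synthetic data; a plain ridge regressor and made-up attribute values stand in for the real fMRI features and the thesis's models.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Candidate nouns described by answers to semantic questions
# ("is it heavy?", "is it edible?", "is it alive?") -- illustrative values only.
semantics = {
    "hammer": [1.0, 0.0, 0.0],
    "apple":  [0.2, 1.0, 0.3],
    "dog":    [0.4, 0.1, 1.0],
    "bear":   [0.9, 0.1, 1.0],   # held out: never appears in training
}
train_nouns = ["hammer", "apple", "dog"]

# Synthetic "brain images": a fixed linear encoding of the semantic vector plus noise.
encode = rng.normal(size=(3, 40))
def brain_image(noun):
    return np.array(semantics[noun]) @ encode + rng.normal(scale=0.1, size=40)

X = np.array([brain_image(n) for n in train_nouns for _ in range(30)])
Y = np.repeat([semantics[n] for n in train_nouns], 30, axis=0)

decoder = Ridge(alpha=1.0).fit(X, Y)             # brain image -> semantic properties

# Zero-shot decoding of a noun absent from training: predict its properties,
# then pick the candidate noun with the closest semantic vector.
pred = decoder.predict(brain_image("bear").reshape(1, -1))[0]
best = min(semantics, key=lambda n: np.linalg.norm(pred - np.array(semantics[n])))
print(best)                                      # expected: "bear"
```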
7
Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations. January 2016
abstract: High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.
Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as "handcrafted" features because they were deliberately designed based on reasonable considerations. However, these algorithms may fail when dealing with high-level tasks or complex-scene videos. Due to the success of using deep convolutional neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video content. Typical techniques first extract spatial features by processing raw images using deep convolutional architectures designed for static image classification. Then simple averaging, concatenation, or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not capture enough representative information, since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual content and thus need to be represented in a manner that considers both semantic and spatio-temporal information.
In this thesis, I propose a novel architecture to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders, which capture the longer-term temporal structure of the CNN features. The resulting spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic (word-to-video) video retrieval, using the UCF101 action recognition dataset. / Dissertation/Thesis / Masters Thesis Computer Science 2016
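A structural sketch of this kind of pipeline — per-frame CNN features for appearance and motion, recurrent encoders over time, and a fully connected map into a word-embedding space — is given below in PyTorch; the layer sizes and shapes are invented, standard GRUs replace the FC-GRUs, and this is not the thesis code.

```python
import torch
import torch.nn as nn

class Video2VecSketch(nn.Module):
    """Two channels of CNN features -> GRU encoders -> MLP into a word-vector space."""
    def __init__(self, feat_dim=512, hidden=256, word_dim=300):
        super().__init__()
        self.appearance_gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.motion_gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.to_word_space = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, word_dim))

    def forward(self, appearance_feats, motion_feats):
        # Inputs: (batch, time, feat_dim) per-frame CNN features, assumed precomputed.
        _, h_app = self.appearance_gru(appearance_feats)
        _, h_mot = self.motion_gru(motion_feats)
        video_vec = torch.cat([h_app[-1], h_mot[-1]], dim=1)
        return self.to_word_space(video_vec)      # compared to word2vec class vectors

model = Video2VecSketch()
app = torch.randn(4, 16, 512)                     # 4 clips, 16 frames of CNN features
mot = torch.randn(4, 16, 512)
print(model(app, mot).shape)                      # torch.Size([4, 300])
```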
8
Zero Shot Learning for Visual Object Recognition with Generative Models. January 2020
abstract: Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, existing recognition models have reached human-level performance on many recognition tasks. However, these models are data-hungry, and their performance is constrained by the amount of training data. Inspired by the human ability to recognize object categories based on textual descriptions of objects and previous visual knowledge, the research community has extensively pursued the area of zero-shot learning. In this area of research, machine vision models are trained to recognize object categories that are not observed during the training process. Zero-shot learning models leverage textual information to transfer visual knowledge from seen object categories in order to recognize unseen object categories.
Generative models have recently gained popularity as they synthesize unseen visual features and convert zero-shot learning into a classical supervised learning problem. These generative models are trained using seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting towards seen classes, which leads to substandard performance in generalized zero-shot learning. To address this concern, this dissertation proposes a novel generative model that leverages the semantic relationship between seen and unseen categories and explicitly performs knowledge transfer from seen categories to unseen categories. Experiments were conducted on several benchmark datasets to demonstrate the efficacy of the proposed model for both zero-shot learning and generalized zero-shot learning. The dissertation also provides a unique Student-Teacher based generative model for zero-shot learning and concludes with future research directions in this area. / Dissertation/Thesis / Masters Thesis Computer Science 2020
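The general generative recipe this work builds on — synthesize visual features for unseen classes from their semantic descriptions, then train an ordinary classifier on the synthetic features — can be mimicked as follows; a fixed random map plays the role of the trained conditional generator, so this mirrors only the workflow, not the proposed model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Semantic descriptions (e.g., attribute vectors) for unseen classes only; values invented.
unseen_attrs = {"zebra": [1.0, 0.0, 1.0], "whale": [0.0, 1.0, 0.2]}

# Stand-in "generator": a fixed map from attributes to feature space plus noise.
# A trained conditional generator (GAN/VAE) would take this role in practice.
G = rng.normal(size=(3, 20))
def synthesize(attrs, n=100):
    return np.array(attrs) @ G + rng.normal(scale=0.2, size=(n, 20))

X = np.vstack([synthesize(a) for a in unseen_attrs.values()])
y = np.repeat(list(unseen_attrs), 100)

# Zero-shot learning reduced to supervised learning on synthetic features.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# A "real" unseen-class feature (simulated the same way here) is classified directly.
test_feature = synthesize(unseen_attrs["zebra"], n=1)
print(clf.predict(test_feature))                  # expected: ['zebra']
```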
9
Domain-Aware Continual Zero-Shot Learning. Yi, Kai. 29 November 2021
We introduce Domain-Aware Continual Zero-Shot Learning (DACZSL), the task of visually recognizing images of unseen categories in unseen domains sequentially. We create DACZSL on top of the DomainNet dataset by dividing it into a sequence of tasks, where classes are incrementally provided on seen domains during training and evaluation is conducted on unseen domains for both seen and unseen classes. We also propose a novel Domain-Invariant CZSL Network (DIN), which outperforms state-of-the-art baseline models that we adapted to the DACZSL setting. We adopt a structure-based approach to alleviate forgetting of knowledge from previous tasks, using a small per-task private network in addition to a global shared network. To encourage the private networks to capture domain- and task-specific representations, we train our model with a novel adversarial knowledge disentanglement scheme that makes the global network task-invariant and domain-invariant over all tasks. Our method also learns a class-wise learnable prompt to obtain a better class-level text representation, which is used as side information to enable zero-shot prediction of future unseen classes. Our code and benchmarks are made available at https://zero-shot-learning.github.io/daczsl.
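A bare-bones illustration of the shared-plus-private structure mentioned above is sketched below; the sizes are invented, and the adversarial disentanglement, learnable prompts, and continual-training loop are omitted entirely.

```python
import torch
import torch.nn as nn

class SharedPrivateEncoder(nn.Module):
    """One global network shared across tasks plus a small private network per task."""
    def __init__(self, in_dim=512, out_dim=128, num_tasks=3):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))
        self.private = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
            for _ in range(num_tasks))

    def forward(self, x, task_id):
        # The shared branch aims at task- and domain-invariant features;
        # the private branch captures what is specific to the current task.
        return self.shared(x) + self.private[task_id](x)

enc = SharedPrivateEncoder()
features = torch.randn(8, 512)         # e.g., backbone features of 8 images
print(enc(features, task_id=1).shape)  # torch.Size([8, 128])
```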
10
VISUAL AND SEMANTIC KNOWLEDGE TRANSFER FOR NOVEL TASKS. Ye, Meng. January 2019
Data is a critical component in a supervised machine learning system. Many successful applications of learning systems on various tasks are based on a large amount of labeled data. For example, deep convolutional neural networks have surpassed human performance on ImageNet classification, which consists of millions of labeled images. However, one challenge in conventional supervised learning systems is their generalization ability. Once a model is trained on a specific dataset, it can only perform the task on those seen classes and cannot be used for novel unseen classes. In order to make the model work on new classes, one has to collect and label new data and then re-train the model. However, collecting and labeling data is labor-intensive and costly; in some cases, it is even impossible. Also, there is an enormous number of different tasks in the real world, and it is not feasible to create a dataset for each of them. These problems raise the need for Transfer Learning, which aims at using data from the source domain to improve the performance of a model on the target domain, where the two domains have different data or different tasks. One specific case of transfer learning is Zero-Shot Learning. It deals with the situation where the source domain and the target domain have the same data distribution but do not share the same set of classes. For example, a model is given animal images of 'cat' and 'dog' for training and will be tested on classifying 'tiger' and 'wolf' images, which it has never seen. Different from conventional supervised learning, Zero-Shot Learning does not require training data in the target domain to perform classification. This property gives ZSL the potential to be broadly applied in various applications where a system is expected to tackle unexpected situations. In this dissertation, we develop algorithms that help a model effectively transfer visual and semantic knowledge learned from a source task to a target task. More specifically, we first develop a model that learns a uniform visual representation of semantic attributes, which helps alleviate the domain shift problem in Zero-Shot Learning. Second, we develop an ensemble network architecture with a progressive training scheme, which transfers source-domain knowledge to the target domain in an end-to-end manner. Lastly, we move a step beyond ZSL and explore Label-less Classification, which transfers knowledge from pre-trained object detectors into scene classification tasks. Our label-less classification takes advantage of word embeddings trained from unorganized online text, thus eliminating the need for expert-defined semantic attributes for each class. Through comprehensive experiments, we show that the proposed methods can effectively transfer visual and semantic knowledge between tasks, and achieve state-of-the-art performance on standard datasets. / Computer and Information Science
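The label-less classification idea at the end of this abstract — describe a scene by the word embeddings of the objects a pre-trained detector reports, then match against embeddings of candidate scene names — can be sketched as follows; the tiny hand-made "word vectors" and the detector output are both stand-ins for real pretrained embeddings and a real detector.

```python
import numpy as np

# Stand-in word embeddings (a real system would use pretrained vectors such as word2vec).
word_vec = {
    "stove": [0.9, 0.1, 0.0], "pan": [0.8, 0.2, 0.1], "fridge": [0.7, 0.0, 0.2],
    "bed": [0.0, 0.9, 0.1], "pillow": [0.1, 0.8, 0.0],
    "kitchen": [0.85, 0.1, 0.1], "bedroom": [0.05, 0.9, 0.05],
}

def classify_scene(detected_objects, scene_names):
    """Average the embeddings of detected objects and pick the closest scene name."""
    obj_vec = np.mean([word_vec[o] for o in detected_objects], axis=0)
    sims = {s: float(np.dot(obj_vec, word_vec[s]) /
                     (np.linalg.norm(obj_vec) * np.linalg.norm(word_vec[s])))
            for s in scene_names}
    return max(sims, key=sims.get), sims

# Hypothetical detector output for one image.
print(classify_scene(["stove", "pan", "fridge"], ["kitchen", "bedroom"]))
# -> ('kitchen', {...cosine similarities...})
```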