Spelling suggestions: "subject:"weakly supervised 1earning"" "subject:"weakly supervised c1earning""
1 |
Robust Approaches for Learning with Noisy LabelsLu, Yangdi January 2022 (has links)
Deep neural networks (DNNs) have achieved remarkable success in data-intense applications, while such success relies heavily on massive and carefully labeled data. In practice, obtaining large-scale datasets with correct labels is often expensive, time-consuming, and sometimes even impossible. Common approaches of constructing datasets involve some degree of error-prone processes, such as automatic labeling or crowdsourcing, which inherently introduce noisy labels. It has been observed that noisy labels severely degrade the generalization performance of classifiers, especially the overparameterized (deep) neural networks. Therefore, studying noisy labels and developing techniques for training accurate classifiers in the presence of noisy labels is of great practical significance. In this thesis, we conduct a thorough study to fully understand LNL and provide a comprehensive error decomposition to reveal the core issue of LNL. We then point out that the core issue in LNL is that the empirical risk minimizer is unreliable, i.e., the DNNs are prone to overfitting noisy labels during training. To reduce the learning errors, we propose five different methods, 1) Co-matching: a framework consists of two networks to prevent the model from memorizing noisy labels; 2) SELC: a simple method to progressively correct noisy labels and refine the model; 3) NAL: a regularization method that automatically distinguishes the mislabeled samples and prevents the model from memorizing them; 4) EM-enhanced loss: a family of robust loss functions that not only mitigates the influence of noisy labels, but also avoids underfitting problem; 5) MixNN: a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels. Our experimental results demonstrate that the proposed approaches achieve comparable or better performance than the state-of-the-art approaches on benchmark datasets with simulated label noise and large-scale datasets with real-world label noise. / Dissertation / Doctor of Philosophy (PhD) / Machine Learning has been highly successful in data-intensive applications but is often hampered when datasets contain noisy labels. Recently, Learning with Noisy Labels (LNL) is proposed to tackle this problem. By using techniques from LNL, the models can still generalize well even when trained on the data containing noisy supervised information. In this thesis, we study this crucial problem and provide a comprehensive analysis to reveal the core issue of LNL. We then propose five different methods to effectively reduce the learning errors in LNL. We show that our approaches achieve comparable or better performance compared to the state-of-the-art approaches on benchmark datasets with simulated label noise and real-world noisy datasets.
|
2 |
Textová klasifikace s limitovanými trénovacími daty / Text classification with limited training dataLaitoch, Petr January 2021 (has links)
The aim of this thesis is to minimize manual work needed to create training data for text classification tasks. Various research areas including weak supervision, interactive learning and transfer learning explore how to minimize training data creation effort. We combine ideas from available literature in order to design a comprehensive text classification framework that employs keyword-based labeling instead of traditional text annotation. Keyword-based labeling aims to label texts based on keywords contained in the texts that are highly correlated with individual classification labels. As noted repeatedly in previous work, coming up with many new keywords is challenging for humans. To accommodate for this issue, we propose an interactive keyword labeler featuring the use of word similarity for guiding a user in keyword labeling. To verify the effectiveness of our novel approach, we implement a minimum viable prototype of the designed framework and use it to perform a user study on a restaurant review multi-label classification problem.
|
3 |
Towards Learning Compact Visual Embeddings using Deep Neural NetworksJanuary 2019 (has links)
abstract: Feature embeddings differ from raw features in the sense that the former obey certain properties like notion of similarity/dissimilarity in it's embedding space. word2vec is a preeminent example in this direction, where the similarity in the embedding space is measured in terms of the cosine similarity. Such language embedding models have seen numerous applications in both language and vision community as they capture the information in the modality (English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks, 1. Image Hashing 2. Zero Shot Learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using several distance metrics like hamming / euclidean / cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task specific embeddings were learnt which can be used for Image Retrieval and Classification separately.
Image Hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications like Image Retrieval and Image Classification. Further, this framework requires no labelled data, leaving it very inexpensive. Results show that the proposed approach surpasses the state-of-art approaches by a significant margin.
Zero-shot classification is the task of classifying the test sample into a new class which was not seen during training. This is possible by establishing a relationship between the training and the testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using the handcrafted attribute vectors and word vectors but doesn’t require the expensive attribute vectors during test time. More specifically, an intermediate space is learnt between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore. / Dissertation/Thesis / Masters Thesis Computer Engineering 2019
|
4 |
Supervision Beyond Manual Annotations for Learning Visual RepresentationsDoersch, Carl 01 April 2016 (has links)
For both humans and machines, understanding the visual world requires relating new percepts with past experience. We argue that a good visual representation for an image should encode what makes it similar to other images, enabling the recall of associated experiences. Current machine implementations of visual representations can capture some aspects of similarity, but fall far short of human ability overall. Even if one explicitly labels objects in millions of images to tell the computer what should be considered similar—a very expensive procedure—the labels still do not capture everything that might be relevant. This thesis shows that one can often train a representation which captures similarity beyond what is labeled in a given dataset. That means we can begin with a dataset that has uninteresting labels, or no labels at all, and still build a useful representation. To do this, we propose to using pretext tasks: tasks that are not useful in and of themselves, but serve as an excuse to learn a more general-purpose representation. The labels for a pretext task can be inexpensive or even free. Furthermore, since this approach assumes training labels differ from the desired outputs, it can handle output spaces where the correct answer is ambiguous, and therefore impossible to annotate by hand. The thesis explores two broad classes of supervision. The first isweak image-level supervision, which is exploited to train mid-level discriminative patch classifiers. For example, given a dataset of street-level imagery labeled only with GPS coordinates, patch classifiers are trained to differentiate one specific geographical region (e.g. the city of Paris) from others. The resulting classifiers each automatically collect and associate a set of patches which all depict the same distinctive architectural element. In this way, we can learn to detect elements like balconies, signs, and lamps without annotations. The second type of supervision requires no information about images other than the pixels themselves. Instead, the algorithm is trained to predict the context around image patches. The context serves as a sort of weak label: to predict well, the algorithm must associate similar-looking patches which also have similar contexts. After training, the feature representation learned using this within-image context indeed captures visual similarity across images, which ultimately makes it useful for real tasks like object detection and geometry estimation.
|
5 |
Weakly supervised methods for learning actions and objectsPrest, Alessandro 04 September 2012 (has links) (PDF)
Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been manually annotated by humans. While this paradigm allowed the field to tremendously progress in the last decade, it has now become one of its major bottlenecks. Teaching a new visual concept requires an expensive human annotation effort, limiting systems to scale to thousands of visual concepts from the few dozens that work today. The exponential growth of visual data available on the net represents an invaluable resource for visual learning algorithms and calls for new methods able to exploit this information to learn visual concepts without the need of major human annotation effort. As a first contribution, we introduce an approach for learning human actions as interac- tions between persons and objects in realistic images. By exploiting the spatial structure of human-object interactions, we are able to learn action models automatically from a set of still images annotated only with the action label (weakly-supervised). Extensive experimental evaluation demonstrates that our weakly-supervised approach achieves the same performance of popular fully-supervised methods despite using substantially less supervision. In the second part of this thesis we extend this reasoning to human-object interactions in realistic video and feature length movies. Popular methods represent actions with low- level features such as image gradients or optical flow. In our approach instead, interactions are modeled as the trajectory of the object wrt to the person position, providing a rich and natural description of actions. Our interaction descriptor is an informative cue on its own and is complimentary to traditional low-level features. Finally, in the third part we propose an approach for learning object detectors from real- world web videos (i.e. YouTube). As opposed to the standard paradigm of learning from still images annotated with bounding-boxes, we propose a technique to learn from videos known only to contain objects of a target class. We demonstrate that learning detec- tors from video alone already delivers good performance requiring much less supervision compared to training from images annotated with bounding boxes. We additionally show that training from a combination of weakly annotated videos and fully annotated still images improves over training from still images alone.
|
6 |
Modeling Structured Data with Invertible Generative ModelsLu, You 01 February 2022 (has links)
Data is complex and has a variety of structures and formats. Modeling datasets is a core problem in modern artificial intelligence. Generative models are machine learning models, which model datasets with probability distributions. Deep generative models combine deep learning with probability theory, so that can model complicated datasets with flexible models. They have become one of the most popular models in machine learning, and have been applied to many problems.
Normalizing flows are a novel class of deep generative models that allow efficient exact likelihood calculation, exact latent variable inference and sampling. They are constructed using functions whose inverse and Jacobian determinant can be efficiently computed. In this paper, we develop normalizing flow based generative models to model complex datasets. In general, data can be categorized to unlabeled data, labeled data, and weakly labeled data. We develop models for these three types of data, respectively.
First, we develop Woodbury transformations, which are flow layers for general unsupervised normalizing flows, and can improve the flexibility and scalability of current flow based models. Woodbury transformations achieve efficient invertibility via Woodbury matrix identity and efficient determinant calculation via Sylvester's determinant identity. In contrast with other operations used in state-of-the-art normalizing flows, Woodbury transformations enable (1) high-dimensional interactions, (2) efficient sampling, and (3) efficient likelihood evaluation. Other similar operations, such as 1x1 convolutions, emerging convolutions, or periodic convolutions allow at most two of these three advantages. In our experiments on multiple image datasets, we find that Woodbury transformations allow learning of higher-likelihood models than other flow architectures while still enjoying their efficiency advantages.
Second, we propose conditional Glow (c-Glow), a conditional generative flow for structured output learning, which is an advanced variant of supervised learning with structured labels. Traditional structured prediction models try to learn a conditional likelihood, i.e., p(y|x), to capture the relationship between the structured output y and the input features x. For many models, computing the likelihood is intractable. These models are therefore hard to train, requiring the use of surrogate objectives or variational inference to approximate likelihood. C-Glow benefits from the ability of flow-based models to compute p(y|x) exactly and efficiently. Learning with c-Glow does not require a surrogate objective or performing inference during training. Once trained, we can directly and efficiently generate conditional samples. We develop a sample-based prediction method, which can use this advantage to do efficient and effective inference. In our experiments, we test c-Glow on five different tasks. C-Glow outperforms the state-of-the-art baselines in some tasks and predicts comparable outputs in the other tasks. The results show that c-Glow is applicable to many different structured prediction problems.
Third, we develop label learning flows (LLF), which is
a general framework for weakly supervised learning problems. Our method is a generative model based on normalizing flows. The main idea of LLF is to optimize the conditional likelihoods of all possible labelings of the data within a constrained space defined by weak signals. We develop a training method for LLF that trains the conditional flow inversely and avoids estimating the labels. Once a model is trained, we can make predictions with a sampling algorithm. We apply LLF to three weakly supervised learning problems. Experiment results show that our method outperforms many state-of-the-art alternatives.
Our research shows the advantages and versatility of normalizing flows. / Doctor of Philosophy / Data is now more affordable and accessible. At the same time, datasets are more and more complicated. Modeling data is a key problem in modern artificial intelligence and data analysis. Deep generative models combine deep learning and probability theory, and are now a major way to model complex datasets. In this dissertation, we focus on a novel class of deep generative model--normalizing flows. They are becoming popular because of their abilities to efficiently compute exact likelihood, infer exact latent variables, and draw samples. We develop flow-based generative models for different types of data, i.e., unlabeled data, labeled data, and weakly labeled data. First, we develop Woodbury transformations for unsupervised normalizing flows, which improve the flexibility and expressiveness of flow based models. Second, we develop conditional generative flows for an advanced supervised learning problem -- structured output learning, which removes the need of approximations, and surrogate objectives in traditional (deep) structured prediction models. Third, we develop label learning flows, which is a general framework for weakly supervised learning problems. Our research improves the performance of normalizing flows, and extend the applications of them to many supervised and weakly supervised problems.
|
7 |
Human Action Localization And Recognition In Unconstrained VideosBoyraz, Hakan 01 January 2013 (has links)
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems. Action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed TwoPoint Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison. Second, we present a probabilistic model learning to identify discriminative regions in videos from weakly-supervised data where each video clip is only assigned a label describing what action is present in the frame or clip. While our first system requires every action to be manually outlined in every frame of the video, this second system only requires that the video be given a single highlevel tag. From this data, the system is able to identify discriminative regions that correspond well iii to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and UCF Sports Dataset show that the localizations produced by this weakly supervised system are comparable in quality to localizations produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e. HMDB and UCF101. Third, we extend our weakly-supervised framework by replacing the recognition stage with a twostage neural network and apply dropout for preventing overfitting of the parameters on the training data. Dropout technique has been recently introduced to prevent overfitting of the parameters in deep neural networks and it has been applied successfully to object recognition problem. To our knowledge, this is the first system using dropout for action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on HMDB and UCF101 datasets.
|
8 |
Anti-Money Laundering with Unreliable LabelsHovstadius, David January 2024 (has links)
This thesis examines the effectiveness of Graph Neural Networks (GNNs) in detecting money laundering activities using transaction data with unreliable labels. It analyses how weakly supervised learning, specifically with GNNs, manages the challenges posed by incomplete and inaccurate labels in anti-money laundering (AML) detection. The thesis utilizes simulated transaction data to compare the performance of GNNs against statistical models. This was done by generating various datasets with the AMLSim tool, and evaluating the node classification performance of different statistical machine learning models and GNNs. The findings indicate that GNNs, due to their ability to find relationships in graph structures, demonstrate superior performance in scenarios with incomplete and inaccurate labels. The findings also indicate that inaccurate positive labels has a great negative effect on the performance, showing the label importance of money launderers in graph data. This research provides possible improvements for anti-money laundering detection by employing GNNs to manage challenges in real-world data.
|
9 |
Learning sensori-motor mappings using little knowledge : application to manipulation robotics / Apprentissage de couplages sensori-moteur en utilisant très peu d'informations : application à la robotique de manipulationDe La Bourdonnaye, François 18 December 2018 (has links)
La thèse consiste en l'apprentissage d'une tâche complexe de robotique de manipulation en utilisant très peu d'aprioris. Plus précisément, la tâche apprise consiste à atteindre un objet avec un robot série. L'objectif est de réaliser cet apprentissage sans paramètres de calibrage des caméras, modèles géométriques directs, descripteurs faits à la main ou des démonstrations d'expert. L'apprentissage par renforcement profond est une classe d'algorithmes particulièrement intéressante dans cette optique. En effet, l'apprentissage par renforcement permet d’apprendre une compétence sensori-motrice en se passant de modèles dynamiques. Par ailleurs, l'apprentissage profond permet de se passer de descripteurs faits à la main pour la représentation d'état. Cependant, spécifier les objectifs sans supervision humaine est un défi important. Certaines solutions consistent à utiliser des signaux de récompense informatifs ou des démonstrations d'experts pour guider le robot vers les solutions. D'autres consistent à décomposer l'apprentissage. Par exemple, l'apprentissage "petit à petit" ou "du simple au compliqué" peut être utilisé. Cependant, cette stratégie nécessite la connaissance de l'objectif en termes d'état. Une autre solution est de décomposer une tâche complexe en plusieurs tâches plus simples. Néanmoins, cela n'implique pas l'absence de supervision pour les sous tâches mentionnées. D'autres approches utilisant plusieurs robots en parallèle peuvent également être utilisés mais nécessite du matériel coûteux. Pour notre approche, nous nous inspirons du comportement des êtres humains. Ces derniers généralement regardent l'objet avant de le manipuler. Ainsi, nous décomposons la tâche d'atteinte en 3 sous tâches. La première tâche consiste à apprendre à fixer un objet avec un système de deux caméras pour le localiser dans l'espace. Cette tâche est apprise avec de l'apprentissage par renforcement profond et un signal de récompense faiblement supervisé. Pour la tâche suivante, deux compétences sont apprises en parallèle : la fixation d'effecteur et une fonction de coordination main-oeil. Comme la précédente tâche, un algorithme d'apprentissage par renforcement profond est utilisé avec un signal de récompense faiblement supervisé. Le but de cette tâche est d'être capable de localiser l'effecteur du robot à partir des coordonnées articulaires. La dernière tâche utilise les compétences apprises lors des deux précédentes étapes pour apprendre au robot à atteindre un objet. Cet apprentissage utilise les mêmes aprioris que pour les tâches précédentes. En plus de la tâche d'atteinte, un predicteur d'atteignabilité d'objet est appris. La principale contribution de ces travaux est l'apprentissage d'une tâche de robotique complexe en n'utilisant que très peu de supervision. / The thesis is focused on learning a complex manipulation robotics task using little knowledge. More precisely, the concerned task consists in reaching an object with a serial arm and the objective is to learn it without camera calibration parameters, forward kinematics, handcrafted features, or expert demonstrations. Deep reinforcement learning algorithms suit well to this objective. Indeed, reinforcement learning allows to learn sensori-motor mappings while dispensing with dynamics. Besides, deep learning allows to dispense with handcrafted features for the state spacerepresentation. However, it is difficult to specify the objectives of the learned task without requiring human supervision. Some solutions imply expert demonstrations or shaping rewards to guiderobots towards its objective. The latter is generally computed using forward kinematics and handcrafted visual modules. Another class of solutions consists in decomposing the complex task. Learning from easy missions can be used, but this requires the knowledge of a goal state. Decomposing the whole complex into simpler sub tasks can also be utilized (hierarchical learning) but does notnecessarily imply a lack of human supervision. Alternate approaches which use several agents in parallel to increase the probability of success can be used but are costly. In our approach,we decompose the whole reaching task into three simpler sub tasks while taking inspiration from the human behavior. Indeed, humans first look at an object before reaching it. The first learned task is an object fixation task which is aimed at localizing the object in the 3D space. This is learned using deep reinforcement learning and a weakly supervised reward function. The second task consists in learning jointly end-effector binocular fixations and a hand-eye coordination function. This is also learned using a similar set-up and is aimed at localizing the end-effector in the 3D space. The third task uses the two prior learned skills to learn to reach an object and uses the same requirements as the two prior tasks: it hardly requires supervision. In addition, without using additional priors, an object reachability predictor is learned in parallel. The main contribution of this thesis is the learning of a complex robotic task with weak supervision.
|
10 |
Does it have to be trees? : Data-driven dependency parsing with incomplete and noisy training dataSpreyer, Kathrin January 2011 (has links)
We present a novel approach to training data-driven dependency parsers on incomplete annotations. Our parsers are simple modifications of two well-known dependency parsers, the transition-based Malt parser and the graph-based MST parser. While previous work on parsing with incomplete data has typically couched the task in frameworks of unsupervised or semi-supervised machine learning, we essentially treat it as a supervised problem. In particular, we propose what we call agnostic parsers which hide all fragmentation in the training data from their supervised components.
We present experimental results with training data that was obtained by means of annotation projection. Annotation projection is a resource-lean technique which allows us to transfer annotations from one language to another within a parallel corpus. However, the output tends to be noisy and incomplete due to cross-lingual non-parallelism and error-prone word alignments. This makes the projected annotations a suitable test bed for our fragment parsers. Our results show that (i) dependency parsers trained on large amounts of projected annotations achieve higher accuracy than the direct projections, and that (ii) our agnostic fragment parsers perform roughly on a par with the original parsers which are trained only on strictly filtered, complete trees. Finally, (iii) when our fragment parsers are trained on artificially fragmented but otherwise gold standard dependencies, the performance loss is moderate even with up to 50% of all edges removed. / Wir präsentieren eine neuartige Herangehensweise an das Trainieren von daten-gesteuerten Dependenzparsern auf unvollständigen Annotationen. Unsere Parser sind einfache Varianten von zwei bekannten Dependenzparsern, nämlich des transitions-basierten Malt-Parsers sowie des graph-basierten MST-Parsers.
Während frühere Arbeiten zum Parsing mit unvollständigen Daten die Aufgabe meist in Frameworks für unüberwachtes oder schwach überwachtes maschinelles Lernen gebettet haben, behandeln wir sie im Wesentlichen mit überwachten Lernverfahren. Insbesondere schlagen wir "agnostische" Parser vor, die jegliche Fragmentierung der Trainingsdaten vor ihren daten-gesteuerten Lernkomponenten verbergen.
Wir stellen Versuchsergebnisse mit Trainingsdaten vor, die mithilfe von Annotationsprojektion gewonnen wurden. Annotationsprojektion ist ein Verfahren, das es uns erlaubt, innerhalb eines Parallelkorpus Annotationen von einer Sprache auf eine andere zu übertragen. Bedingt durch begrenzten crosslingualen Parallelismus und fehleranfällige Wortalinierung ist die Ausgabe des Projektionsschrittes jedoch üblicherweise verrauscht und unvollständig. Gerade dies macht projizierte Annotationen zu einer angemessenen Testumgebung für unsere fragment-fähigen Parser. Unsere Ergebnisse belegen, dass (i) Dependenzparser, die auf großen Mengen von projizierten Annotationen trainiert wurden, größere Genauigkeit erzielen als die zugrundeliegenden direkten Projektionen, und dass (ii) die Genauigkeit unserer agnostischen, fragment-fähigen Parser der Genauigkeit der Originalparser (trainiert auf streng gefilterten, komplett projizierten Bäumen) annähernd gleichgestellt ist. Schließlich zeigen wir mit künstlich fragmentierten Gold-Standard-Daten, dass (iii) der Verlust an Genauigkeit selbst dann bescheiden bleibt, wenn bis zu 50% aller Kanten in den Trainingsdaten fehlen.
|
Page generated in 0.0966 seconds