1 |
PAC-learning with label noise. Jabbari Arfaee, Shahin. 06 1900
One of the main criticisms of previously studied label noise models in the PAC-learning framework is that they cannot represent the noise found in real-world data. In this thesis, we study this problem by introducing a framework for modeling label noise and proposing four new label noise models. We prove positive learnability results for these noise models when learning simple concept classes, and we discuss the difficulty of learning other interesting concept classes under the new models. In addition, we study a previously proposed general learning algorithm, called the minimum pn-disagreement strategy, which is used to prove learnability results in the PAC-learning framework both in the absence and in the presence of noise. Because of the limitations of the minimum pn-disagreement strategy, we propose a new general learning algorithm called the minimum nn-disagreement strategy. Finally, for both the minimum pn-disagreement and minimum nn-disagreement strategies, we investigate properties of label noise models that provide sufficient conditions for the learnability of specific concept classes.
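The abstract does not define the pn and nn variants, so the sketch below shows only the plain minimum-disagreement rule that both names build on: pick the hypothesis that disagrees with the fewest (possibly noisy) labels in the sample. It assumes the simple concept class of one-dimensional thresholds, h_t(x) = 1 iff x >= t; the helper names `predict`, `disagreements` and `min_disagreement_threshold` are hypothetical and not taken from the thesis.

```python
from typing import Sequence, Tuple

Example = Tuple[float, int]  # (instance x, observed and possibly noisy label y in {0, 1})


def predict(t: float, x: float) -> int:
    """Threshold concept h_t: label 1 iff x >= t."""
    return 1 if x >= t else 0


def disagreements(t: float, sample: Sequence[Example]) -> int:
    """Number of sample labels that h_t disagrees with."""
    return sum(1 for x, y in sample if predict(t, x) != y)


def min_disagreement_threshold(sample: Sequence[Example]) -> float:
    """Return a threshold with the fewest disagreements on the sample.

    Only thresholds at sample points (plus +inf, the all-0 hypothesis) need to be
    checked: h_t relabels the sample only when t crosses a sample point.
    """
    candidates = sorted({x for x, _ in sample}) + [float("inf")]
    return min(candidates, key=lambda t: disagreements(t, sample))


# Usage: target threshold 0.5, with one label flipped by noise.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
clean = [predict(0.5, x) for x in xs]
noisy = list(clean)
noisy[1] = 1 - noisy[1]  # the label of x = 0.2 is corrupted
sample = list(zip(xs, noisy))
t_hat = min_disagreement_threshold(sample)
```

Despite the corrupted label, the learned threshold labels every sample point the same way as the target; it still disagrees with the single noisy label, which is exactly the error a noise model has to tolerate.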
|
2 |
PAC-learning with label noise. Jabbari Arfaee, Shahin. Unknown Date
No description available.
|
3 |
Active Cleaning of Label Noise Using Support Vector Machines. Ekambaram, Rajmadhan. 19 June 2017
Large-scale datasets collected using non-expert labelers are prone to labeling errors. Errors in the given labels, or label noise, affect classifier performance, classifier complexity, class proportions, and so on. In some applications, a relatively small but important class needs to have all of its examples identified. Typical solutions to the label noise problem involve creating classifiers that are robust or tolerant to errors in the labels, or removing the suspected examples using machine learning algorithms. Finding label noise examples through manual review is largely unexplored because of the cost and time involved. Nevertheless, we believe it is the only way to create a dataset free of label noise. This dissertation proposes a solution that exploits the characteristics of the Support Vector Machine (SVM) classifier and the sparsity of its solution representation to identify examples with uniform random label noise in a dataset. The method is illustrated on problems involving two real-world large-scale datasets. This dissertation also presents results for datasets that contain adversarial label noise, as well as a simple extension of the method to a semi-supervised learning approach. The results show that the approaches developed in this dissertation identify most mislabeled examples quickly and effectively.
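A minimal sketch of why SVM sparsity helps with manual review, assuming a linear soft-margin SVM: every training example with functional margin y·f(x) below 1 is a support vector, and a mislabeled example sitting among the other class is usually misclassified, so it lands in that small set. The dissertation's solver, kernel and review order are not described here; this sketch swaps in plain full-batch subgradient descent in NumPy, and `train_linear_svm` and `review_candidates` are hypothetical names.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    """Full-batch subgradient descent on the L2-regularized hinge loss (labels in {-1, +1})."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(1, epochs + 1):
        margins = y * (X @ w + b)
        viol = margins < 1.0  # inside the margin or misclassified: hinge loss is active
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        step = lr / np.sqrt(t)  # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def review_candidates(X, y, w, b):
    """Indices with margin < 1 (support vectors), most suspicious (lowest margin) first."""
    margins = y * (X @ w + b)
    order = np.argsort(margins)
    return [int(i) for i in order if margins[i] < 1.0]


# Usage: two well-separated Gaussian blobs, with one label error injected.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)), rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)
y[0] = -1.0  # a positive-class example labeled as negative
w, b = train_linear_svm(X, y)
candidates = review_candidates(X, y, w, b)
```

A human reviewer would then inspect only `candidates`, starting from the front, instead of the whole dataset; the sparser the SVM solution, the shorter that list.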
|
4 |
Classification et apprentissage actif à partir d'un flux de données évolutif en présence d'étiquetage incertain / Classification and active learning from evolving data streams in the presence of uncertain labeling. Bouguelia, Mohamed-Rafik. 25 March 2015
This thesis focuses on machine learning for data classification. To reduce the labelling cost, active learning queries a human labeller for the class labels of only a few instances, selected according to an importance criterion. We propose a new uncertainty measure that characterizes the importance of data and improves the performance of active learning compared to existing uncertainty measures. This measure determines the smallest weight that must be associated with a new instance for the classifier to change its prediction for that instance. We then consider a setting where the data arrive continuously from an infinite-length stream. We propose an adaptive uncertainty threshold that is suitable for active learning in the streaming setting and achieves a compromise between the number of classification errors and the number of requested labels. Existing stream-based active learning methods are initialized with some labelled instances that cover all possible classes. However, in many applications, the evolving nature of the stream implies that new classes can appear at any time. We propose an effective method for the active detection of novel classes in a multi-class data stream. This method incrementally maintains the region of the feature space covered by the known classes, and detects as novel classes those instances that lie outside that region and are close to one another.
Finally, it is often difficult to obtain completely reliable labels, because the human labeller is prone to labelling errors that reduce the performance of the learned classifier. We address this problem by introducing a measure that reflects the degree of disagreement between the manually given class and the predicted class, and a new informativeness measure that expresses the need for a mislabelled instance to be relabelled by an alternative labeller.
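The uncertainty measure above (the smallest weight that makes the classifier change its prediction) has a closed form for some simple classifiers. The abstract does not say which classifier the thesis uses; this sketch assumes a nearest-centroid classifier with unit instance weights, chosen only because the flipping weight can be derived directly. `class_centroids` and `min_flip_weight` are hypothetical names. In an active learner, a label would be requested when this weight is small; the thesis's adaptive streaming threshold is not modelled here.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

Point = Tuple[float, ...]
Centroids = Dict[Hashable, Tuple[Point, float]]  # label -> (mean, total weight)


def class_centroids(data: Dict[Hashable, Sequence[Point]]) -> Centroids:
    """Mean and total weight (here: count, since weights are 1) of each class."""
    out: Centroids = {}
    for label, pts in data.items():
        n = len(pts)
        mean = tuple(sum(coord) / n for coord in zip(*pts))
        out[label] = (mean, float(n))
    return out


def min_flip_weight(x: Point, cents: Centroids) -> Tuple[Hashable, float]:
    """Predicted label of x and the smallest weight that would flip it.

    Adding x to class c with weight omega moves c's centroid to
    (W_c * mu_c + omega * x) / (W_c + omega), whose distance to x is
    W_c / (W_c + omega) * d_c. The prediction changes once this drops below
    d_pred, i.e. for omega > W_c * (d_c / d_pred - 1). A small value means
    a small perturbation changes the prediction, so x is uncertain.
    """
    dists = {label: math.dist(x, mean) for label, (mean, _) in cents.items()}
    pred = min(dists, key=dists.get)
    d_pred = dists[pred]
    if d_pred == 0.0:
        return pred, math.inf  # x sits on its centroid; no finite weight flips it
    best = math.inf
    for label, (_, w_c) in cents.items():
        if label != pred:
            best = min(best, w_c * (dists[label] / d_pred - 1.0))
    return pred, best


# Usage: one point halfway-ish between the classes, one close to class "a".
cents = class_centroids({"a": [(0.0, 0.0), (2.0, 0.0)], "b": [(10.0, 0.0), (12.0, 0.0)]})
near_boundary = min_flip_weight((5.0, 0.0), cents)
near_class_a = min_flip_weight((2.0, 0.0), cents)
```

Both points are predicted as class "a", but the point near the boundary needs a much smaller weight to flip, so it is the one an active learner would query first.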
|
5 |
TOWARDS EFFICIENT AND ROBUST DEEP LEARNING: HANDLING DATA NON-IDEALITY AND LEVERAGING IN-MEMORY COMPUTING. Sangamesh D Kodge (19958580). 05 November 2024
Deep learning has achieved remarkable success across various domains, largely by relying on assumptions of ideal conditions, such as balanced distributions, accurate labeling, and sufficient computational resources, that rarely hold in real-world applications. This thesis addresses the significant challenges posed by data non-idealities, including privacy concerns, label noise, non-IID (not independent and identically distributed) data, and adversarial threats, which can compromise model performance and security. Additionally, we explore the computational limitations inherent in traditional architectures by introducing in-memory computing techniques to mitigate the memory bottleneck in deep neural network implementations.
We propose five novel contributions to tackle these challenges and enhance the efficiency and robustness of deep learning models. First, we introduce a gradient-free machine unlearning algorithm that protects data privacy by forgetting specific classes without retraining. Second, we propose SAP (Scaled Activation Projections), a corrective machine unlearning technique that improves robustness against label noise. Third, we present the Neighborhood Gradient Mean (NGM) method, a decentralized learning approach that improves performance on non-IID data with minimal computational overhead. Fourth, we develop TREND, an ensemble design strategy that leverages transferability metrics to enhance adversarial robustness. Finally, we explore IMAC, an in-memory computing solution that enables energy-efficient, low-latency multiply-and-accumulate operations directly within 6T SRAM arrays.
These contributions collectively advance the state of the art in handling data non-idealities and computational efficiency in deep learning, providing robust, scalable, and privacy-preserving solutions suitable for real-world deployment across diverse environments.
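The abstract does not describe how IMAC computes inside the SRAM array, so the following is only a generic software illustration of a multiply-and-accumulate decomposition often used by bitwise in-memory designs: an unsigned dot product equals the sum over bit positions p and q of 2^(p+q) times the popcount of (bit plane p of a) AND (bit plane q of w). It is not claimed to be IMAC's circuit; `bit_plane` and `bitsliced_dot` are hypothetical names.

```python
from typing import List, Sequence


def bit_plane(values: Sequence[int], k: int) -> List[int]:
    """Bit k of every value (one row of 1-bit cells in a bit-sliced layout)."""
    return [(v >> k) & 1 for v in values]


def bitsliced_dot(a: Sequence[int], w: Sequence[int], a_bits: int, w_bits: int) -> int:
    """Unsigned dot product sum(a[i] * w[i]) built only from 1-bit AND, popcount and shifts."""
    if len(a) != len(w):
        raise ValueError("vectors must have the same length")
    for v, nbits in [(x, a_bits) for x in a] + [(x, w_bits) for x in w]:
        if not 0 <= v < (1 << nbits):
            raise ValueError(f"value {v} does not fit in {nbits} unsigned bits")
    total = 0
    for p in range(a_bits):
        a_p = bit_plane(a, p)
        for q in range(w_bits):
            w_q = bit_plane(w, q)
            # Bitwise AND of two bit planes, then count the ones (popcount).
            ones = sum(x & y for x, y in zip(a_p, w_q))
            # Weight the partial count by the combined significance of both bits.
            total += ones << (p + q)
    return total


# Usage: 2-bit activations and 2-bit weights.
a = [3, 1, 2]
w = [2, 3, 1]
result = bitsliced_dot(a, w, a_bits=2, w_bits=2)
```

The appeal for in-memory hardware is that each AND-and-count step touches whole bit planes at once, so the data never has to be moved to a separate multiplier.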
|
6 |
Action Recognition in Still Images and Inference of Object Affordances. Girish, Deeptha S. 15 October 2020
No description available.
|