  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Distribution-Based Adversarial Multiple-Instance Learning

Chen, Sherry 27 January 2023
No description available.
12

Multiple-Instance Learning from Distributions

Doran, Gary Brian, Jr. 06 February 2015
No description available.
13

Multiple-Instance Feature Ranking

Latham, Andrew C. 26 January 2016
No description available.
14

Mid-level representations for modeling objects / Représentations de niveau intermédiaire pour la modélisation d'objets

Tsogkas, Stavros 15 January 2016
In this thesis we propose the use of mid-level representations, and in particular i) medial axes, ii) object parts, and iii) convolutional features, for modelling objects. The first part of the thesis deals with detecting medial axes in natural RGB images. We adopt a learning approach, utilizing colour, texture and spectral clustering features, to build a classifier that produces a dense probability map for symmetry. Multiple Instance Learning (MIL) allows us to treat scale and orientation as latent variables during training, while a variation based on random forests offers significant gains in terms of running time. In the second part of the thesis we focus on object part modelling using both hand-crafted and learned feature representations. We develop a coarse-to-fine, hierarchical approach that uses probabilistic bounds for part scores to decrease the computational cost of mixture models with a large number of HOG-based templates. These efficiently computed probabilistic bounds allow us to quickly discard large parts of the image and to evaluate the exact convolution scores only at promising locations. Our approach achieves a $4\times$ to $5\times$ speedup over the naive approach with minimal loss in performance. We also employ convolutional features to improve object detection. We use a popular CNN architecture to extract responses from an intermediate convolutional layer. We integrate these responses into the classic deformable part model (DPM) pipeline, replacing hand-crafted HOG features, and observe a significant boost in detection performance (~14.5% increase in mAP). In the last part of the thesis we experiment with fully convolutional neural networks for the segmentation of object parts. We re-purpose a state-of-the-art CNN to perform fine-grained semantic segmentation of object parts and use a fully-connected CRF as a post-processing step to obtain sharp boundaries. We also inject prior shape information into our model through a Restricted Boltzmann Machine (RBM), trained on ground-truth segmentations. Finally, we train a new fully convolutional architecture from a random initialization to segment different parts of the human brain in magnetic resonance image data. Our methods achieve state-of-the-art results on both types of data.
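The bound-then-verify idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis's method: the thesis uses probabilistic bounds on part scores, whereas this toy swaps in a deterministic Cauchy-Schwarz upper bound, and the names `detect`, `template`, and `patches` are hypothetical. A location is scored exactly only when its upper bound can still reach the threshold.

```python
import math


def dot(w, x):
    """Exact filter response: inner product of template and patch."""
    return sum(a * b for a, b in zip(w, x))


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def detect(template, patches, threshold):
    """Return (indices scoring at or above threshold, number of exact evaluations).

    Assumption: a Cauchy-Schwarz bound stands in for the probabilistic
    bounds described in the abstract.
    """
    w_norm = norm(template)
    hits, exact_evals = [], 0
    for i, patch in enumerate(patches):
        # Upper bound: template . patch <= ||template|| * ||patch||.
        if w_norm * norm(patch) < threshold:
            continue  # the exact score cannot reach the threshold
        exact_evals += 1
        if dot(template, patch) >= threshold:
            hits.append(i)
    return hits, exact_evals


template = [1.0, 0.0, 1.0]
patches = [[0.1, 0.1, 0.1], [2.0, 0.0, 2.0], [0.0, 3.0, 0.0], [0.2, 0.0, 0.1]]
hits, exact_evals = detect(template, patches, threshold=3.0)
print(hits, exact_evals)
```

In this toy the bound costs about as much as the exact score, so it only shows the control flow. Because the bound is a true upper bound, pruning never drops a location that would have passed the threshold; a probabilistic bound trades that guarantee for cheaper or tighter estimates.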
15

MULTIPLE-INSTANCE AND ONE-CLASS RULE-BASED ALGORITHMS

Nguyen, Dat 17 April 2013
In this work we developed rule-based algorithms for multiple-instance learning and one-class learning problems, namely the mi-DS and OneClass-DS algorithms. Multiple-Instance Learning (MIL) is a variation of classical supervised learning in which bags (collections) of instances must be classified instead of single instances. A bag is labeled positive if at least one of its instances is positive; otherwise it is negative. The one-class learning problem is also known as the outlier or novelty detection problem. One-class classifiers are trained on data describing only one class and are used in situations where data from other classes are not available, as well as for highly unbalanced data sets. Extensive comparisons and statistical testing of the two algorithms show that they generate models that perform on par with other state-of-the-art algorithms.
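The bag-labeling rule stated in the abstract above (a bag is positive if at least one instance is positive) can be written down directly. This is a minimal sketch of that rule only, not of the mi-DS algorithm; `predict_bag` and the instance rule `is_large` are hypothetical names used for illustration.

```python
def predict_bag(instances, instance_classifier):
    """Standard multiple-instance rule: positive if any instance is positive."""
    return any(instance_classifier(x) for x in instances)


def is_large(x):
    """Hypothetical instance-level rule, used only for this example."""
    return x > 10


bags = {"a": [1, 4, 12], "b": [2, 3], "c": []}
predictions = {name: predict_bag(inst, is_large) for name, inst in bags.items()}
print(predictions)
```

Note that an empty bag comes out negative under this rule, since no instance can make it positive.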
16

Uma abordagem visual para apoio ao aprendizado multi-instâncias / A visual approach for support to multi-instances learning

Quispe, Sonia Castelo 14 August 2015
Multiple-instance learning (MIL) is a machine learning paradigm that aims to classify sets (bags) of objects (instances), assigning labels only to the bags. In MIL, only the labels of the bags are available for training, while the labels of the instances in the bags are unknown. This problem is often addressed by selecting an instance to represent each bag, transforming a MIL problem into a standard supervised learning problem. However, no known approaches support the user in carrying out this process. In this work, we propose a multi-scale tree-based visualization called MILTree that supports users in tasks related to MIL, and also two new instance selection methods, called MILTree-SI and MILTree-Med, to improve MIL models. MILTree is a two-level tree layout, where the first level projects the bags and the second level projects the instances belonging to each bag, allowing the user to explore and analyse multi-instance data in an intuitive way. The instance selection methods define a prototype instance for each bag, a crucial step for achieving high accuracy in multi-instance classification. Both methods use the MILTree layout to visually update the instance prototypes and can handle binary and multi-class datasets. To classify the bags, we use an SVM (Support Vector Machine) classifier. Moreover, with the support of the MILTree layout, one can also update the classification model by changing the training set in order to obtain a better classifier. Experimental results validate the effectiveness of our approach, showing that visual mining through MILTree can help users in multi-instance classification scenarios.
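The reduction described in the abstract above, where one instance is selected to represent each bag so that a standard supervised learner can be trained, can be sketched as follows. The selection rule here is an assumption: it picks the medoid of each bag and is not claimed to match MILTree-SI or MILTree-Med. The names `medoid` and `to_supervised` are hypothetical.

```python
import math


def medoid(bag):
    """Instance with the smallest total distance to the others in its bag.

    Assumption: a plain medoid stands in for the thesis's selection methods.
    """
    return min(bag, key=lambda p: sum(math.dist(p, q) for q in bag))


def to_supervised(bags, labels):
    """Turn labeled bags into (prototype instance, label) training pairs."""
    return [(medoid(bag), label) for bag, label in zip(bags, labels)]


bags = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (9.0, 9.0)],
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
]
labels = [0, 1]
training_set = to_supervised(bags, labels)
print(training_set)
```

The resulting pairs can be passed to any vector-based classifier, such as the SVM mentioned in the abstract. Requires Python 3.8 or later for `math.dist`.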
17

Hierarchical Bayesian Learning Approaches for Different Labeling Cases

Manandhar, Achut January 2015
The goal of a machine learning problem is to learn useful patterns from observations so that appropriate inferences can be made from new observations as they become available. Based on whether labels are available for the training data, the vast majority of machine learning approaches can be broadly categorized as supervised or unsupervised. In the context of supervised learning, when observations are available as labeled feature vectors, the learning process is a well-understood problem. However, for many applications, standard supervised learning becomes complicated because the observations are not available as labeled feature vectors. For example, in a ground penetrating radar (GPR) based landmine detection problem, the alarm locations are only known in 2D coordinates on the earth's surface but unknown for individual target depths. Typically, in order to apply computer vision techniques to the GPR data, it is convenient to represent the GPR data as a 2D image. Since a large portion of the image does not contain useful information pertaining to the target, the image is typically further subdivided into subimages along depth. These subimages at a particular alarm location can be considered as a set of observations, where the label is only available for the entire set but unavailable for individual observations along depth. In the absence of individual observation labels, for the purposes of training standard supervised learning approaches, observations both above and below the target are labeled as targets despite substantial differences in their characteristics. As a result, the label uncertainty with depth complicates parameter inference in standard supervised learning approaches, potentially degrading their performance. In this work, we develop learning algorithms for three such specific scenarios where: (1) labels are only available for sets of independent and identically distributed (i.i.d.) observations, (2) labels are only available for sets of sequential observations, and (3) continuous correlated multiple labels are available for spatio-temporal observations. For each of these scenarios, we propose a modification of a traditional learning approach to improve its predictive accuracy. The first two algorithms are based on a set-based framework called multiple instance learning (MIL), whereas the third algorithm is based on a structured output-associative regression (SOAR) framework. The MIL approaches are motivated by the landmine detection problem using GPR data, where the training data is typically available as labeled sets of observations or sets of sequences. The SOAR learning approach is instead motivated by the multi-dimensional human emotion label prediction problem using audio-visual data, where the training data is available in the form of multiple continuous correlated labels representing complex human emotions. In both of these applications, the unavailability of the training data as labeled feature vectors motivates developing new learning approaches that are more appropriate for modeling the data.

A large majority of the existing MIL approaches require computationally expensive parameter optimization, do not generalize well to time-series data, and are incapable of online learning. To overcome these limitations, for sets of observations, this work develops a nonparametric Bayesian approach to learning in MIL scenarios based on Dirichlet process mixture models. The nonparametric nature of the model and the use of non-informative priors remove the need to perform cross-validation based optimization, while variational Bayesian inference allows for rapid parameter learning. The resulting approach is highly generalizable and also capable of online learning. For sets of sequences, this work integrates hidden Markov models (HMMs) into an MIL framework and develops a new approach called the multiple instance hidden Markov model. The model parameters are inferred using variational Bayes, making the model tractable and computationally efficient. The resulting approach is highly generalizable and also capable of online learning. Similarly, most of the existing approaches developed for modeling multiple continuous correlated emotion labels do not model the spatio-temporal correlation among the emotion labels. The few approaches that do model the correlation fail to predict the multiple emotion labels simultaneously, resulting in latency during testing and potentially compromising the effectiveness of the approach in real-time scenarios. This work integrates the output-associative relevance vector machine (OARVM) approach with the multivariate relevance vector machine (MVRVM) approach to simultaneously predict multiple emotion labels. The resulting approach performs competitively with the existing approaches while reducing the prediction time during testing, and the sparse Bayesian inference allows for rapid parameter learning. Experimental results on several synthetic datasets, benchmark datasets, GPR-based landmine detection datasets, and human emotion recognition datasets show that our proposed approaches perform comparably to or better than the existing approaches. / Dissertation
18

Uma abordagem visual para apoio ao aprendizado multi-instâncias / A visual approach for support to multi-instances learning

Sonia Castelo Quispe 14 August 2015
Multiple-instance learning (MIL) is a machine learning paradigm that aims to classify sets (bags) of objects (instances), assigning labels only to the bags. In MIL, only the labels of the bags are available for training, while the labels of the instances in the bags are unknown. This problem is often addressed by selecting an instance to represent each bag, transforming a MIL problem into a standard supervised learning problem. However, no known approaches support the user in carrying out this process. In this work, we propose a multi-scale tree-based visualization called MILTree that supports users in tasks related to MIL, and also two new instance selection methods, called MILTree-SI and MILTree-Med, to improve MIL models. MILTree is a two-level tree layout, where the first level projects the bags and the second level projects the instances belonging to each bag, allowing the user to explore and analyse multi-instance data in an intuitive way. The instance selection methods define a prototype instance for each bag, a crucial step for achieving high accuracy in multi-instance classification. Both methods use the MILTree layout to visually update the instance prototypes and can handle binary and multi-class datasets. To classify the bags, we use an SVM (Support Vector Machine) classifier. Moreover, with the support of the MILTree layout, one can also update the classification model by changing the training set in order to obtain a better classifier. Experimental results validate the effectiveness of our approach, showing that visual mining through MILTree can help users in multi-instance classification scenarios.
19

Learning Techniques For Information Retrieval And Mining In High-dimensional Databases

Cheng, Hao 01 January 2009
The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. There are two main aspects of retrieval and mining research: accuracy and efficiency. The accuracy problem is how to return results that better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. However, these problems are non-trivial because of the complexity of high-level semantic concepts, the heterogeneous nature of the feature space, the high dimensionality of data representations and the size of the databases. My dissertation is dedicated to addressing these issues. Specifically, my work makes five main contributions. The first contribution is a novel manifold learning algorithm, Local and Global Structures Preserving Projection (LGSPP), which defines salient low-dimensional representations for high-dimensional data. A small number of projection directions are sought in order to properly preserve the local and global structures of the original data. Specifically, two groups of points are extracted for each individual point in the dataset: the first group contains the nearest neighbors of the point, and the second contains a few sampled points far away from the point. These two point sets respectively characterize the local and global structures with regard to the data point. The objective of the embedding is to minimize the distances between the points in each local neighborhood while dispersing each point far away from its respective remote points in the original space. In this way, the relationships between the data in the original space are well preserved with little distortion. The second contribution is a new constrained clustering algorithm.
Conventionally, clustering is an unsupervised learning problem, which systematically partitions a dataset into a small set of clusters such that the data in each cluster appear more similar to each other than to those in other clusters. In the proposed approach, partial human knowledge is exploited to find better clustering results. Two kinds of constraints are integrated into the clustering algorithm. One is the must-link constraint, indicating that the two points involved belong to the same cluster. The other is the cannot-link constraint, which denotes that two points are not within the same cluster. Given the input constraints, data points are arranged into small groups and a graph is constructed to preserve the semantic relations between these groups. The assignment procedure makes a best effort to assign each group to a feasible cluster without violating the constraints. The theoretical analysis reveals that the probability of data points being assigned to the true clusters is much higher under the new proposal than under conventional methods. In general, the new scheme can produce clusters that better match the ground truth and respect the semantic relations between points inferred from the constraints. The third contribution is a unified framework for partition-based dimension reduction techniques, which allows efficient similarity retrieval in the high-dimensional data space. Recent similarity search techniques, such as Piecewise Aggregate Approximation (PAA), Segmented Means (SMEAN) and Mean-Standard deviation (MS), have proved very effective in reducing data dimensionality by partitioning dimensions into subsets and extracting aggregate values from each dimension subset. These partition-based techniques have many advantages, including very efficient multi-phased pruning, while being simple to implement. They are, however, not adaptive to the different characteristics of data in diverse applications.
In this study, a unified framework for these partition-based techniques is proposed and the issue of dimension partitions is examined in this framework. An investigation of the relationship between query selectivity and the dimension partition schemes reveals indicators that can predict the performance of a partitioning setting. Accordingly, a greedy algorithm is designed to effectively determine a good partitioning of data dimensions so that the performance of the reduction technique is robust across different datasets. The fourth contribution is an effective similarity search technique for databases of point sets. In the conventional model, an object corresponds to a single vector. In the proposed study, an object is represented by a set of points. In general, this new representation can be used in many real-world applications and carries much more local information, but the retrieval and learning problems become very challenging. The Hausdorff distance is the common distance function for measuring the similarity between two point sets; however, this metric is sensitive to outliers in the data. To address this issue, a novel similarity function is defined to better capture the proximity of two objects, in which a one-to-one mapping is established between the vectors of the two objects. The optimal mapping minimizes the sum of distances between paired points. The overall distance of the optimal matching is robust and has high retrieval accuracy. The computation of the new distance function is formulated as the classical assignment problem. Lower-bounding techniques and an early-stop mechanism are also proposed to significantly accelerate the expensive similarity search process. The classification problem over point-set data is called Multiple Instance Learning (MIL) in the machine learning community, in which a vector is an instance and an object is a bag of instances.
The fifth contribution is to convert the MIL problem into a standard supervised learning problem in the conventional vector space. Specifically, the feature vectors of bags are grouped into clusters. Each object is then denoted as a bag of cluster labels, and common patterns of each category are discovered, each of which is further reconstructed into a bag of features. Accordingly, a bag is effectively mapped into a feature space defined by the distances from this bag to all the derived patterns. Standard supervised learning algorithms can then be applied to classify objects into pre-defined categories. The results demonstrate that the proposal has better classification accuracy than other state-of-the-art techniques. In the future, I will continue my research on large-scale data analysis algorithms, applications and system development. In particular, I am interested in applications that analyze the massive volume of online data.
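The two point-set distances contrasted in the abstract above can be made concrete with a small sketch. It assumes equal-size point sets and finds the one-to-one mapping by brute force over permutations, which is only feasible for tiny sets; the thesis's lower-bounding and early-stop techniques are not reproduced. The names `hausdorff` and `matching_distance` are hypothetical.

```python
import math
from itertools import permutations


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest neighbour in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def matching_distance(a, b):
    """Minimum total distance over all one-to-one mappings between a and b.

    Assumption: both sets have the same size. Brute force is O(n!).
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size point sets")
    return min(
        sum(math.dist(x, y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )


a = [(0.0, 0.0), (4.0, 0.0)]
b = [(0.0, 3.0), (4.0, 3.0)]
h = hausdorff(a, b)
m = matching_distance(a, b)
print(h, m)
```

The Hausdorff distance reports a single worst-case gap, while the matching distance sums the cost of every pair under the best pairing. For realistic set sizes the assignment problem is normally solved with a polynomial-time method such as the Hungarian algorithm instead of enumerating permutations.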
20

Comparing Weak and Strong Annotation Strategies for Multiple Instance Learning in Digital Pathology / Jämförelse av svaga och starka annoteringsstrategier för flerinstansinlärning i digital patologi

Ciallella, Alice January 2022
Prostate cancer is the second most diagnosed cancer worldwide, and it is diagnosed through visual inspection of biopsy tissue by a pathologist, who assigns a score used by doctors to decide on the treatment. However, the scoring system, the Gleason score, is affected by high inter- and intra-observer variability, lack of standardization, and overestimation. Therefore, there is a need for new solutions that can reduce these issues and provide a more accurate diagnosis. Nowadays, high-resolution digital images of biopsy tissues can be obtained and stored. The availability of such images, called Whole Slide Images (WSI), allows the implementation of machine learning and deep learning models to assist pathologists in diagnosing prostate cancer. Multiple-Instance Learning (MIL) has been shown to reach very promising results in digital pathology and in binary classification of prostate cancer slides. However, such models require large datasets to ensure good performance. This project investigates the use of small sets of strongly annotated images to create new large datasets for training a MIL model. To evaluate the performance of this approach, the standard dataset is used to obtain baselines for both binary and multiclass classification tasks. For multiclass classification, the International Society of Urological Pathology (ISUP) score is used, which is derived from the Gleason score. The dataset used is the publicly available PANDA. In this project, only the slides from Radboud University Medical Center are used, which amount to 5160 images. The MIL model chosen is the Clustering-constrained Attention Multiple instance learning (CLAM) model, which is publicly available. The standard approach reaches a Cohen's kappa (κ) of 0.78 and 0.59 for binary and multiclass classification respectively. To evaluate the new approach, large datasets are created starting from different set sizes. Using 500 images, the model reaches a κ of 0.72 and 0.38 respectively. While the results of the two approaches are comparable for binary classification, the new approach is not beneficial for multiclass classification tasks.
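For readers unfamiliar with the metric reported above, Cohen's kappa compares the observed agreement p_o between predictions and reference labels with the agreement p_e expected by chance: κ = (p_o - p_e) / (1 - p_e). The sketch below implements the unweighted form on made-up labels; whether the thesis used an unweighted or a weighted variant is not stated in the abstract, so that choice is an assumption, as is the function name `cohen_kappa`.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up labels for illustration only; not data from the thesis.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(y_true, y_pred)
print(kappa)
```

Here six of eight labels agree (p_o = 0.75) and both label sets are balanced (p_e = 0.5), giving κ = 0.5. A value of 1 means perfect agreement and 0 means agreement no better than chance.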
