Global ETD Search

11	Incremental semi-supervised learning for anomalous trajectory detection Sillito, Rowland R. January 2010 The acquisition of a scene-specific normal behaviour model underlies many existing approaches to the problem of automated video surveillance. Since it is unrealistic to acquire a comprehensive set of labelled behaviours for every surveyed scenario, modelling normal behaviour typically corresponds to modelling the distribution of a large collection of unlabelled examples. In general, however, it would be desirable to be able to filter an unlabelled dataset to remove potentially anomalous examples. This thesis proposes a simple semi-supervised learning framework that could allow a human operator to efficiently filter the examples used to construct a normal behaviour model by providing occasional feedback. Specifically, the classification output of the model under construction is used to filter the incoming sequence of unlabelled examples so that human approval is requested before incorporating any example classified as anomalous, while all other examples are automatically used for training. A key component of the proposed framework is an incremental one-class learning algorithm which can be trained on a sequence of normal examples while allowing new examples to be classified at any stage during training. The proposed algorithm represents an initial set of training examples with a kernel density estimate, before using merging operations to incrementally construct a Gaussian mixture model while minimising an information-theoretic cost function. This algorithm is shown to outperform an existing state-of-the-art approach without requiring off-line model selection. Throughout this thesis behaviours are considered in terms of whole motion trajectories: in order to apply the proposed algorithm, trajectories must be encoded with fixed-length vectors.
To determine an appropriate encoding strategy, an empirical comparison is conducted to determine the relative class-separability afforded by several different trajectory representations for a range of datasets. The results obtained suggest that the choice of representation makes a small but consistent difference to class separability, indicating that cubic B-Spline control points (fitted using least-squares regression) provide a good choice for use in subsequent experiments. The proposed semi-supervised learning framework is tested on three different real trajectory datasets. In all cases the rate of human intervention requests drops steadily, reaching a usefully low level of 1% in one case. A further experiment indicates that once a sufficient number of interventions has been provided, a high level of classification performance can be achieved even if subsequent requests are ignored. The automatic incorporation of unlabelled data is shown to improve classification performance in all cases, while a high level of classification performance is maintained even when unlabelled data containing a high proportion of anomalous examples is presented. 004.33
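The filtering loop described in entry 11 can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: the one-class model here is a single running 1-D Gaussian (a deliberately simple stand-in for the kernel-density and Gaussian-mixture model the abstract describes), and the names `RunningGaussian`, `filter_stream`, `ask_operator`, and the threshold `k` are all hypothetical.

```python
import math
from typing import Callable, Iterable


class RunningGaussian:
    """Stand-in one-class model: one 1-D Gaussian, updated incrementally
    with Welford's method. Assumption: NOT the thesis's KDE/GMM model."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Incorporate one example accepted as normal."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomalous(self, x: float, k: float = 3.0) -> bool:
        """Flag x if it lies more than k sample standard deviations from the mean."""
        if self.n < 2:
            return False  # too little data to judge; treat as normal
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) > k * max(std, 1e-9)


def filter_stream(
    examples: Iterable[float],
    ask_operator: Callable[[float], bool],
    model: RunningGaussian,
    k: float = 3.0,
) -> int:
    """Train on a stream; request approval only for examples flagged anomalous.

    Returns the number of human intervention requests made.
    """
    requests = 0
    for x in examples:
        if model.is_anomalous(x, k):
            requests += 1
            if not ask_operator(x):
                continue  # operator rejected the example: do not train on it
        model.update(x)  # unflagged or approved examples are used for training
    return requests


# Seed the model with a few trusted normal examples, then process a stream.
model = RunningGaussian()
for seed_value in [1.0, 1.1, 0.9, 1.0, 1.05]:
    model.update(seed_value)

# Hypothetical operator who rejects anything larger than 10.
requests = filter_stream([1.02, 50.0, 0.98], lambda x: x <= 10.0, model)
```

In this toy run only the value 50.0 triggers a request; the other two stream values are incorporated automatically, which mirrors the behaviour the abstract describes at a much smaller scale.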
12	Stable Mixing of Complete and Incomplete Information Corduneanu, Adrian, Jaakkola, Tommi 08 November 2001 An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems. AI semi-supervised learning incomplete data EM stable estimation
13	Validating Co-Training Models for Web Image Classification Zhang, Dell, Lee, Wee Sun 01 1900 Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditionally independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in a form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, conducted within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of redundancy when it is present. / Singapore-MIT Alliance (SMA) Co-Training Machine Learning Multimedia Data Mining Semi-Supervised Learning
14	On surrogate supervision multi-view learning Jin, Gaole 03 December 2012 Data can be represented in multiple views. Traditional multi-view learning methods (e.g., co-training, multi-task learning) focus on improving learning performance using information from the auxiliary view, although information from the target view is sufficient for the learning task. However, this work addresses a semi-supervised case of multi-view learning, surrogate supervision multi-view learning, where labels are available on limited views and a classifier is obtained on the target view where labels are missing. In surrogate supervision multi-view learning, one cannot obtain a classifier without information from the auxiliary view. To solve this challenging problem, we propose discriminative and generative approaches. / Graduation date: 2013 multi-view learning semi-supervised learning Supervised learning (Machine learning)
15	Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm Caponnetto, Andrea, Rosasco, Lorenzo, Vito, Ernesto De, Verri, Alessandro 27 May 2005 This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotically optimal performance in a minimax sense. The main goal of the present paper is to show how the effective dimension can be replaced by an empirical counterpart while preserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting. AI optimal rates effective dimension semi-supervised learning
16	Deep Domain Fusion for Adaptive Image Classification January 2019 Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data in order to train task-specific visual recognition models. Despite significant advances made over the past decade, the fact remains that collecting and annotating the data needed to successfully train a model is a prohibitively expensive endeavor. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent works in the development of deep adaptation networks seek to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has illustrated unprecedented potential for utilizing unlabeled data to train classification models in defiance of discouragingly meager sets of annotated data. In this thesis, a novel domain adaptation algorithm, Domain Adaptive Fusion (DAF), is proposed, which encourages a domain-invariant linear relationship between the pixel-space of different domains and the prediction-space while being trained under a domain adversarial signal. The thoughtful combination of key components in unsupervised domain adaptation and semi-supervised learning enables DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation endorse the efficacy of this hybrid approach, which outperforms all of the baseline architectures on most of the transfer tasks. / Dissertation/Thesis / Masters Thesis Computer Science 2019 Computer science Machine Learning Semi-Supervised Learning Unsupervised Domain Adaptation
17	New Directions in Gaussian Mixture Learning and Semi-supervised Learning Sinha, Kaushik 01 November 2010 No description available. Computer Science Gaussian Mixture Learning Semi-supervised Learning
18	Semi-supervised Information Fusion for Clustering, Classification and Detection Applications Li, Huaying January 2017 Information fusion techniques have been widely applied in many applications including clustering, classification, detection, etc. The major objective is to improve the performance using information derived from multiple sources as compared to using information obtained from any of the sources individually. In our previous work, we demonstrated the performance improvement of electroencephalography (EEG)-based seizure detection using information fusion. In the detection problem, the optimal fusion rule is usually derived under the assumption that local decisions are conditionally independent given the hypotheses. However, because local detectors observe the same phenomenon, it is highly possible that local decisions are correlated. To address the issue of correlation, we implement the fusion rule sub-optimally by first estimating the unknown parameters under one of the hypotheses and then using them as known parameters to estimate the rest of the unknown parameters. In the aforementioned scenario, the hypotheses are uniquely defined, i.e., all local detectors follow the same labeling convention. However, in certain applications, the regions of interest (decisions, hypotheses, clusters, etc.) are not unique, i.e., may vary locally (from source to source). In this case, information fusion becomes more complicated. Historically, this problem was first observed in classification and clustering. In classification applications, the category information is pre-defined and training data is required. Therefore, a classification problem can be viewed as a detection problem by considering the pre-defined classes as the hypotheses in detection. However, information fusion in clustering applications is more difficult due to the lack of prior information and the correspondence problem caused by symbolic cluster labels.
In the literature, information fusion in clustering problems is usually referred to as the clustering ensemble problem. Most of the existing clustering ensemble methods are unsupervised. In this thesis, we propose two semi-supervised clustering ensemble algorithms (SEA). Similar to existing ensemble methods, SEA consists of two major steps: the generation and fusion of base clusterings. Analogous to distributed detection, we propose a distributed clustering system which consists of a base clustering generator and a decision fusion center. The role of the base clustering generator is to generate multiple base clusterings for the given data set. The role of the decision fusion center is to combine all base clusterings into a single consensus clustering. Although training data is not required by conventional clustering algorithms (usually unsupervised), in many applications expert opinions are always available to label a small portion of data observations. These labels can be utilized as the guidance information in the fusion process. Therefore, we design two operational modes for the fusion center according to the absence or presence of the training data. In the unsupervised mode, any existing unsupervised clustering ensemble method can be implemented as the fusion rule. In the semi-supervised mode, the proposed semi-supervised clustering ensemble methods can be implemented. In addition, a parallel distributed clustering system is also proposed to reduce the computation time of clustering high-volume data sets. Moreover, we also propose a new cluster detection algorithm based on SEA. It is implemented in the system to provide feedback information. When data observations from a new class (other than existing training classes) are detected, a signal is sent out to request new training data or to switch from the semi-supervised mode to the unsupervised mode. / Thesis / Doctor of Philosophy (PhD)
19	Semi-Supervised Gait Recognition Mitra, Sirshapan 01 January 2024 In this work, we examine semi-supervised learning for gait recognition with a limited number of labeled samples. Our research focuses on two distinct aspects of limited labels: 1) closed-set, with limited labeled samples per individual, and 2) open-set, with limited labeled individuals. We find that the open-set setting poses a greater challenge than the closed-set setting; thus, having more labeled ids is more important for performance than having more labeled samples per id. Moreover, obtaining labeled samples for a large number of individuals is usually more challenging, so the limited-id setup (open-set), where most of the training samples belong to unknown ids, is more important to study. We further find that existing semi-supervised learning approaches are not well suited to scenarios where unlabeled samples belong to novel ids. We propose a simple prototypical self-training approach to solve this problem, in which we integrate semi-supervised learning for the closed-set setting with self-training that can effectively utilize unlabeled samples from unknown ids. To further alleviate the challenges of limited labeled samples, we explore the role of synthetic data, using a diffusion model to generate samples from both known and unknown ids. We perform our experiments on two different gait recognition benchmarks, CASIA-B and OUMVLP, and provide a comprehensive evaluation of the proposed method. The proposed approach is effective and generalizable for both closed and open-set settings. With merely 20% of labeled samples, we were able to achieve performance competitive with supervised methods utilizing 100% labeled samples while outperforming existing semi-supervised methods. Deep Learning Semi-Supervised Learning Gait Recognition Computer Sciences
20	A Semi-Supervised Predictive Model to Link Regulatory Regions to Their Target Genes Hafez, Dina Mohamed January 2015 Next generation sequencing technologies have provided us with a wealth of data profiling a diverse range of biological processes. In an effort to better understand the process of gene regulation, two predictive machine learning models specifically tailored for analyzing gene transcription and polyadenylation are presented. Transcriptional enhancers are specific DNA sequences that act as "information integration hubs" to confer regulatory requirements on a given cell. These non-coding DNA sequences can regulate genes from long distances, or across chromosomes, and their relationships with their target genes are not limited to one-to-one. With thousands of putative enhancers and fewer than 14,000 protein-coding genes, detecting enhancer-gene pairs becomes a very complex machine learning and data analysis challenge. In order to predict these specific sequences and link them to the genes they regulate, we developed McEnhancer. Using DNAseI sensitivity data and annotated in-situ hybridization gene expression clusters, McEnhancer builds interpolated Markov models to learn enriched sequence content of known enhancer-gene pairs and predicts unknown interactions in a semi-supervised learning algorithm. Classification of predicted relationships was 73-98% accurate for gene sets with varying levels of initial known examples. Predicted interactions showed a great overlap when compared to Hi-C identified interactions. Enrichment of known functionally related TF binding motifs, enhancer-associated histone modification marks, along with corresponding developmental time point was highly evident. On the other hand, pre-mRNA cleavage and polyadenylation is an essential step for 3'-end maturation and subsequent stability and degradation of mRNAs.
This process is highly controlled by cis-regulatory elements surrounding the cleavage site (polyA site), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3'-UTRs, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered by the lack of appropriate tests for determining APAs with significant differences across multiple libraries. We specified a linear effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive kernel-based SVM models successfully classified constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. The main cis-regulatory elements described for polyadenylation were found to be a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical PAS signal being nearly absent at brain-specific sites. We applied this model on SRp20 data, an RNA binding protein that might be involved in oncogene activation and obtained interesting insights.
Together, these two models contribute to the understanding of enhancers and the key role they play in regulating tissue-specific expression patterns during development, as well as provide a better understanding of the diversity of post-transcriptional gene regulation in multiple tissue types. / Dissertation Computer science Bioinformatics Gene regulation Interpolated Markov model Machine learning Semi-supervised learning SVM Transcriptional enhancers
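Entry 20 mentions assessing the significance of between-tissue differences with a permutation test. The abstract does not specify the test's design, so the following is only a generic two-sample permutation test on a difference of means, offered as an illustration of the idea; the function name `permutation_p_value`, its parameters, and the example data are all hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 999,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for the difference in group means.

    Assumption: a generic textbook test, not the design used in the thesis.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements from two clearly separated groups.
group_a = [0.0, 0.1, 0.2, 0.1, 0.05, 0.15]
group_b = [10.0, 10.1, 10.2, 10.1, 10.05, 10.15]
p_separated = permutation_p_value(group_a, group_b)

# Identical groups: every shuffle is at least as extreme as the observed zero.
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

The add-one correction keeps the p-value strictly positive, a common choice when only a random subset of all possible relabellings is sampled.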
