Global ETD Search

31	Incremental semi-supervised learning for anomalous trajectory detection Sillito, Rowland R. January 2010 (has links) The acquisition of a scene-specific normal behaviour model underlies many existing approaches to the problem of automated video surveillance. Since it is unrealistic to acquire a comprehensive set of labelled behaviours for every surveyed scenario, modelling normal behaviour typically corresponds to modelling the distribution of a large collection of unlabelled examples. In general, however, it would be desirable to be able to filter an unlabelled dataset to remove potentially anomalous examples. This thesis proposes a simple semi-supervised learning framework that could allow a human operator to efficiently filter the examples used to construct a normal behaviour model by providing occasional feedback: Specifically, the classification output of the model under construction is used to filter the incoming sequence of unlabelled examples so that human approval is requested before incorporating any example classified as anomalous, while all other examples are automatically used for training. A key component of the proposed framework is an incremental one-class learning algorithm which can be trained on a sequence of normal examples while allowing new examples to be classified at any stage during training. The proposed algorithm represents an initial set of training examples with a kernel density estimate, before using merging operations to incrementally construct a Gaussian mixture model while minimising an information-theoretic cost function. This algorithm is shown to outperform an existing state-of-the-art approach without requiring off-line model selection. Throughout this thesis behaviours are considered in terms of whole motion trajectories: in order to apply the proposed algorithm, trajectories must be encoded with fixed length vectors. To determine an appropriate encoding strategy, an empirical comparison is conducted to determine the relative class-separability afforded by several different trajectory representations for a range of datasets. The results obtained suggest that the choice of representation makes a small but consistent difference to class separability, indicating that cubic B-Spline control points (fitted using least-squares regression) provide a good choice for use in subsequent experiments. The proposed semi-supervised learning framework is tested on three different real trajectory datasets. In all cases the rate of human intervention requests drops steadily, reaching a usefully low level of 1% in one case. A further experiment indicates that once a sufficient number of interventions has been provided, a high level of classification performance can be achieved even if subsequent requests are ignored. The automatic incorporation of unlabelled data is shown to improve classification performance in all cases, while a high level of classification performance is maintained even when unlabelled data containing a high proportion of anomalous examples is presented. 004.33
32	Bayesian optimization for selecting training and validation data for supervised machine learning : using Gaussian processes both to learn the relationship between sets of training data and model performance, and to estimate model performance over the entire problem domain / Bayesiansk optimering för val av träning- och valideringsdata för övervakad maskininlärning Bergström, David January 2019 (has links) Validation and verification in machine learning is an open problem which becomes increasingly important as its applications becomes more critical. Amongst the applications are autonomous vehicles and medical diagnostics. These systems all needs to be validated before being put into use or else the consequences might be fatal. This master’s thesis focuses on improving both learning and validating machine learning models in cases where data can either be generated or collected based on a chosen position. This can for example be taking and labeling photos at the position or running some simulation which generates data from the chosen positions. The approach is twofold. The first part concerns modeling the relationship between any fixed-size set of positions and some real valued performance measure. The second part involves calculating such a performance measure by estimating the performance over a region of positions. The result is two different algorithms, both variations of Bayesian optimization. The first algorithm models the relationship between a set of points and some performance measure while also optimizing the function and thus finding the set of points which yields the highest performance. The second algorithm uses Bayesian optimization to approximate the integral of performance over the region of interest. The resulting algorithms are validated in two different simulated environments. The resulting algorithms are applicable not only to machine learning but can also be used to optimize any function which takes a set of positions and returns a value, but are more suitable when the function is expensive to evaluate. Bayesian optimization AutoML supervised learning Computer Sciences Datavetenskap (datalogi)
33	Stable Mixing of Complete and Incomplete Information Corduneanu, Adrian, Jaakkola, Tommi 08 November 2001 (has links) An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems. AI semi-supervised learning incomplete data EM stable estimation
34	Validating Co-Training Models for Web Image Classification Zhang, Dell, Lee, Wee Sun 01 1900 (has links) Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditional independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in the form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of the redundancy when it is present. / Singapore-MIT Alliance (SMA) Co-Training Machine Learning Multimedia Data Mining Semi-Supervised Learning
35	Graph based semi-supervised learning in computer vision Huang, Ning, January 2009 (has links) Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Biomedical Engineering." Includes bibliographical references (p. 54-55).
36	Kernel methods in supervised and unsupervised learning / Tsang, Wai-Hung. January 2003 (has links) Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 46-49). Also available in electronic version. Access restricted to campus users.
37	Parameter incremental learning algorithm for neural networks Wan, Sheng, January 1900 (has links) Thesis (Ph. D.)--West Virginia University, 2005. / Title from document title page. Document formatted into pages; contains x, 97 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 81-83).
38	Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm Caponnetto, Andrea, Rosasco, Lorenzo, Vito, Ernesto De, Verri, Alessandro 27 May 2005 (has links) This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotic optimal performances in a minimax sense. The main goal of the present paper is showing how the effective dimension can be replaced by an empirical counterpart while conserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting. AI optimal rates effective dimension semi-supervised learning
39	Automated alarm and root-cause analysis based on real time high-dimensional process data : Part of a joint research project between UmU, Volvo AB & Volvo Cars Harbs, Justin, Svensson, Jack January 2018 (has links) Today, a large amount of raw data are available within manufacturing industries. Unfortunately, most of it is not further analyzed in search of valuable information regarding the optimization of processes. In the painting process at the Volvo plant in Umeå, adjusted settings on the process equipments (e.g. robots, machines etc.) are mostly based on the experience of the personnel rather than actual facts (i.e. analyzed data). Consequently, time- and cost waste caused by defects is obtained when painting the commercial heavy-duty truck bodies (cabs). Hence, the aim of this masters thesis is to model the quality as a function of available background- and process data. This should be presented in an automated alarm and root-cause system. A variety of supervised learning algorithms were trained in order to estimate the probability of having at least one defect per cab. Even with a small amount of data, results have shown that such algorithms can provide valuable information. Later in this thesis work, one of the algorithms was chosen and used as the underlying model in the prototype of an automated alarm system. When this probability was considered as too high, an intuitive root-cause analysis was presented. Ultimately, this research has demonstrated the importance and possibility of analyzing data with statistical tools in the search of limiting costs- and time waste. Machine Learning classification analysis supervised learning Mathematics Matematik
40	Active Learning : an unbiased approach / L’apprentissage actif : une approche non biaisée Ribeiro de Mello, Carlos Eduardo 04 June 2013 (has links) L'apprentissage actif apparaît comme un problème important dans différents contextes de l'apprentissage supervisé pour lesquels obtenir des données est une tâche aisée mais les étiqueter est coûteux. En règle générale, c’est une stratégie de requête, une heuristique gloutonne basée sur un critère de sélection qui recherche les données non étiquetées potentiellement les plus intéressantes pour former ainsi un ensemble d'apprentissage. Une stratégie de requête est donc une procédure d'échantillonnage biaisée puisqu'elle favorise systématiquement certaines observations s'écartant ainsi des modèles d'échantillonnages indépendants et identiquement distribués. L'hypothèse principale de cette thèse s'inscrit dans la réduction du biais introduit par le critère de sélection. La proposition générale consiste à réduire le biais en sélectionnant le sous-ensemble minimal d'apprentissage pour lequel l'estimation de la loi de probabilité est aussi proche que possible de la loi sous-jacente prenant en compte l’intégralité des observations. Pour ce faire, une nouvelle stratégie générale de requête pour l'apprentissage actif a été mise au point utilisant la théorie de l'Information. Les performances de la stratégie de requête proposée ont été évaluées sur des données réelles et simulées. Les résultats obtenus confirment l'hypothèse sur le biais et montrent que l'approche envisagée améliore l'état de l'art sur différents jeux de données. / Active Learning arises as an important issue in several supervised learning scenarios where obtaining data is cheap, but labeling is costly. In general, this consists in a query strategy, a greedy heuristic based on some selection criterion, which searches for the potentially most informative observations to be labeled in order to form a training set. A query strategy is therefore a biased sampling procedure since it systematically favors some observations by generating biased training sets, instead of making independent and identically distributed draws. The main hypothesis of this thesis lies in the reduction of the bias inherited from the selection criterion. The general proposal consists in reducing the bias by selecting the minimal training set from which the estimated probability distribution is as close as possible to the underlying distribution of overall observations. For that, a novel general active learning query strategy has been developed using an Information-Theoretic framework. Several experiments have been performed in order to evaluate the performance of the proposed strategy. The obtained results confirm the hypothesis about the bias, showing that the proposal outperforms the baselines in different datasets. Apprentissage actif Apprentissage supervisé Échantillonnage Active learning Supervised learning Sampling

Search results