11 |
Unsupervised Learning for Plant Recognition
Jelacic, Mersad January 2006
Six methods are used to cluster data containing two kinds of objects: sugar-beet plants and weeds. The objects are described by 19 features, namely shape and color features. Information about the distance between sugar-beet plants is also available and is used for labeling clusters. The methods evaluated are k-means, k-medoids, hierarchical clustering, competitive learning, self-organizing maps and fuzzy c-means. Applying the methods to the plant data forms clusters, which are then labeled with one of three proposed methods: the expert, database and context methods. The expert method uses a human to provide labeled initial cluster centers. The database method uses a database as the expert that provides initial cluster centers. The context method uses information about the environment, namely the distance between sugar-beet plants, to label the clusters.
The algorithms tested, with the lowest error achieved by each, are: k-means (3.3%), k-medoids (3.8%), hierarchical clustering (5.3%), competitive learning (6.8%), self-organizing maps (4.9%) and fuzzy c-means (7.9%). Three datasets were used. The lowest error on dataset0 is 3.3%, compared with 3% for supervised learning methods. For dataset1 the error is 18.7% and for dataset2 it is 5.8%; with supervised methods, the errors are 11% on dataset1 and 5.1% on dataset2. The high error rate on dataset1 arises because its samples are not well separated into different clusters. The features in dataset1 are extracted from lower-resolution images than those in the other datasets, and the datasets also differ in the growth stages of the sugar-beet plants.
The performance of the three cluster-labeling methods is: expert method (6.8% as the lowest error achieved), database method (3.7%) and context method (6.8%). These results are for the clustering produced by competitive learning, where the real error is 6.8%.
Unsupervised learning methods for clustering can be used effectively for plant identification. Because the samples are not classified, an automatic labeling technique must be used if plants are to be identified. The three proposed techniques can be used for automatic labeling of plants.
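The idea of seeding a clustering run with labeled initial centers, as the expert and database methods do, might be sketched as follows. This is a minimal sketch and not the thesis's implementation: the function names, the two-feature toy samples (the thesis uses 19 features) and the seed centers are all hypothetical.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans_with_labeled_seeds(points, seeds, iterations=20):
    """Run k-means starting from labeled seed centers.

    seeds is a list of (label, center) pairs; each cluster keeps the label
    of the seed it grew from, so every sample ends up with a class label.
    """
    labels = [label for label, _ in seeds]
    centers = [list(center) for _, center in seeds]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every sample to its nearest center.
        assignment = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [labels[a] for a in assignment], centers


# Hypothetical samples and seed centers, for illustration only.
samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
seeds = [("sugar beet", (1.0, 1.0)), ("weed", (4.0, 4.0))]
predicted, centers = kmeans_with_labeled_seeds(samples, seeds)
```

Because the labels travel with the seeds, no separate labeling pass is needed after clustering; the quality of the result then depends on how representative the seed centers are.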
|
12 |
Learning Linear, Sparse, Factorial Codes
Olshausen, Bruno A. 01 December 1996
In previous work (Olshausen & Field 1996), an algorithm was described for learning linear sparse codes which, when trained on natural images, produces a set of basis functions that are spatially localized, oriented, and bandpass (i.e., wavelet-like). This note shows how the algorithm may be interpreted within a maximum-likelihood framework. Several useful insights emerge from this connection: it makes explicit the relation to statistical independence (i.e., factorial coding), it shows a formal relationship to the algorithm of Bell and Sejnowski (1995), and it suggests how to adapt parameters that were previously fixed.
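One common way to write down a maximum-likelihood reading of linear sparse coding is sketched here. The notation is an assumption made for illustration and is not quoted from the note: I is an image, the φ_i are basis functions, the a_i are coefficients, ν is Gaussian noise with variance σ_N², and S is a sparseness-inducing function scaled by β.

```latex
% Generative model with a factorial (independent) prior on the coefficients
I(\mathbf{x}) = \sum_i a_i\,\phi_i(\mathbf{x}) + \nu(\mathbf{x}), \qquad
P(\mathbf{a}) = \prod_i P(a_i), \qquad
P(a_i) \propto e^{-\beta S(a_i)}

% Negative log-posterior of the coefficients, up to an additive constant
E(\mathbf{a}, \Phi) = -\log\bigl[P(I \mid \mathbf{a}, \Phi)\,P(\mathbf{a})\bigr]
  = \frac{1}{2\sigma_N^2}\,\Bigl\lVert I - \sum_i a_i \phi_i \Bigr\rVert^2
  + \beta \sum_i S(a_i)
```

Under these assumptions, minimizing E over the coefficients trades reconstruction error against sparseness, and the product form of P(a) is where statistical independence of the code enters explicitly.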
|
13 |
Knowledge Extraction from Logged Truck Data using Unsupervised Learning Methods
Grubinger, Thomas January 2008
The goal was to extract knowledge from data that is logged by the electronic system of every Volvo truck. This allowed the evaluation of large populations of trucks without requiring additional measuring devices and facilities.
An evaluation cycle, similar to the knowledge discovery from databases model, was developed and applied to extract knowledge from data. The focus was on extracting information in the logged data that is related to the class labels of different populations, but knowledge extraction inherent in the given classes was also supported. The methods used come from the field of unsupervised learning, a sub-field of machine learning, and include self-organizing maps, multi-dimensional scaling and fuzzy c-means clustering.
The developed evaluation cycle was exemplified by the evaluation of three data-sets. Two data-sets were arranged from populations of trucks differing in their operating environment with regard to road condition or gross combination weight. The results showed that there is relevant information in the logged data that describes these differences in the operating environment. A third data-set consisted of populations with different engine configurations, making the two groups of trucks unequally powerful. Using the knowledge extracted in this task, engines that were sold in one of the two configurations and were modified later could be detected.
Information in the logged data that describes the vehicle's operating environment makes it possible to detect trucks that are operated differently from their intended use. Initial experiments to find such vehicles were conducted and recommendations for an automated application were given.
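Of the methods named above, fuzzy c-means is the easiest to show compactly: each sample receives a graded membership in every cluster instead of a single hard assignment. The sketch below shows the two textbook update steps. The function names, the fuzzifier value m = 2 and the one-feature toy data are assumptions for illustration, not details taken from the thesis.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that cluster.
            hit = dists.index(0.0)
            row = [1.0 if k == hit else 0.0 for k in range(len(centers))]
        else:
            # u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1))
            row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                   for d_k in dists]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, m=2.0):
    """New centers as means of the points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dims)])
    return centers


# Hypothetical one-feature samples (padded to 2-D) and two starting centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships(points, centers)
new_centers = update_centers(points, u)
```

Alternating the two functions until the centers stop moving gives the usual fuzzy c-means loop; the graded memberships are what make borderline samples visible.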
|
15 |
Semantic interpretation with distributional analysis
Glass, Michael Robert 05 July 2012
Unstructured text contains a wealth of knowledge; however, it is in a form unsuitable for reasoning. Semantic interpretation is the task of processing natural language text to create or extend a coherent, formal knowledge base able to reason and support question answering. This task involves entity, event and relation extraction, co-reference resolution, and inference. Many domains, from intelligence data to bioinformatics, would benefit from semantic interpretation. But traditional approaches to the subtasks typically require a large annotated corpus specific to a single domain and ontology. This dissertation describes an approach to rapidly train a semantic interpreter using a set of seed annotations and a large, unlabeled corpus. Our approach adapts methods from paraphrase acquisition and automatic thesaurus construction to extend seed syntactic-to-semantic mappings using an automatically gathered, domain-specific, parallel corpus. During interpretation, the system uses joint probabilistic inference to select the most probable interpretation consistent with the background knowledge. We evaluate both the quality of the extended mappings and the performance of the semantic interpreter.
|
17 |
Assessing and quantifying clusteredness: The OPTICS Cordillera
Rusch, Thomas, Hornik, Kurt, Mair, Patrick 01 1900
Data representations in low dimensions, such as results from unsupervised dimensionality reduction methods, are often visually interpreted to find clusters of observations. To identify clusters, the result must be appreciably clustered. This property of a result may be called "clusteredness". When judged visually, the appreciation of clusteredness is highly subjective. In this paper we suggest an objective way to assess clusteredness in data representations. We provide a definition of clusteredness that captures important aspects of a clustered appearance. We characterize these aspects and define the extremes rigorously. For this characterization of clusteredness we suggest an index to assess the degree of clusteredness, coined the OPTICS Cordillera. It makes only weak assumptions and is a property of the result, invariant for different partitionings or cluster assignments. We provide bounds and a normalization for the index, and prove that it represents the aspects of clusteredness. Our index is parsimonious with respect to mandatory parameters but also flexible by allowing optional parameters to be tuned. The index can be used as a descriptive goodness-of-clusteredness statistic or to compare different results. For illustration we use a data set of handwritten digits which are very differently represented in two dimensions by various popular dimensionality reduction results. Empirically, observers had a hard time visually judging the clusteredness in these representations, but our index provides a clear and easy characterisation of the clusteredness of each result. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
|
18 |
Integrated supervised and unsupervised learning method to predict the outcome of tuberculosis treatment course
Rostamniakankalhori, Sharareh January 2011
Tuberculosis (TB) is an infectious disease and a global public health problem, with over 9 million new cases annually. Tuberculosis treatment with patient supervision and support is an element of the global plan to stop TB designed by the World Health Organization in 2006. The plan requires predicting the outcome of a patient's treatment course. The predicted outcome can be used to determine how intensive the services and support provided within the framework of DOTS therapy should be. No predictive model for the outcome has been developed yet, and only limited reports of factors influencing the outcome are available. To fill this gap, this thesis develops a machine learning approach to predict the outcome of a tuberculosis treatment course. Firstly, data on 6,450 Iranian TB patients under DOTS (directly observed treatment, short course) therapy were analysed to identify the significant predictors by correlation analysis. Secondly, these significant features were used to find the best classification approach among six examined algorithms: decision tree, Bayesian network, logistic regression, multilayer perceptron, radial basis function, and support vector machine. Thirdly, the prediction accuracy of these existing techniques was improved by proposing and developing a new integrated method of k-means clustering and classification algorithms. Finally, a cluster-based simplified decision tree (CSDT) was developed through an innovative hierarchical clustering and classification algorithm. CSDT was built by k-means partitioning and decision tree learning. This method not only improves the prediction accuracy significantly but also leads to a much simpler and more interpretable decision tree.
The main results of this study were as follows. Firstly, seventeen significantly correlated features were found: age, sex, weight, nationality, area of residency, current stay in prison, low body weight, TB type, treatment category, length of disease, TB case type, recent TB infection, diabetic or HIV positive, and social risk factors like history of imprisonment, IV drug usage, and unprotected sex. Secondly, comparing six supervised machine learning tools on the testing set revealed that decision trees gave the best prediction accuracy (74.21%). Thirdly, on the testing set, the new integrated approach combining clustering and classification improved the prediction accuracy of all applied classifiers; the largest and smallest improvements were shown by logistic regression (10%) and support vector machine (4%) respectively. Finally, applying the proposed CSDT yielded cluster-based simplified decision trees, which reduced the size of the resulting decision tree and further improved the prediction accuracy. The data type and the normal distribution of the data gave the decision tree an opportunity to outperform the other algorithms. Pre-learning by k-means clustering, which relocates the objects and puts similar cases in the same group, can improve the classification accuracy. The shared tendency of k-means partitioning and decision trees to generate pure local regions can simplify the decision trees and make them more precise by creating smaller sub-trees with fewer misclassified cases. The rules extracted from these trees can serve as a knowledge base for a decision support system in further studies.
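The general pattern of partitioning with k-means first and then training a small classifier inside each partition might look like the sketch below. It is not the thesis's CSDT algorithm: a one-level decision stump stands in for a full decision tree, and the function names, the feature values and the "cured"/"failed" labels are hypothetical.

```python
import math
from collections import Counter


def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def fit_stump(rows):
    """One-level tree: the (feature, threshold) split with fewest training errors."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    best = (sum(y != majority for _, y in rows), None, None, majority, majority)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def fit_clustered(X, y, k):
    """Cluster the samples, then fit one stump per cluster.

    Assumes every cluster receives at least one training sample.
    """
    centers = kmeans(X, k)
    stumps = []
    for i in range(k):
        rows = [(x, lab) for x, lab in zip(X, y) if nearest(x, centers) == i]
        stumps.append(fit_stump(rows))
    return centers, stumps


def predict(model, x):
    """Route x to its nearest cluster and apply that cluster's stump."""
    centers, stumps = model
    f, t, l_lab, r_lab = stumps[nearest(x, centers)]
    if f is None:  # the cluster was pure, so no split was needed
        return l_lab
    return l_lab if x[f] <= t else r_lab


# Hypothetical data: the informative feature differs between the two groups.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
y = ["cured", "cured", "failed", "failed", "failed", "cured", "failed", "cured"]
model = fit_clustered(X, y, k=2)
```

In this toy data no single split classifies all eight samples correctly, while the two per-cluster stumps do, which illustrates why local regions can allow smaller trees.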
|
19 |
A climatology of tornado outbreak environments derived from unsupervised learning methods
Bowles, Justin Alan 30 April 2021
Tornado outbreaks (TOs) occur every year across the continental United States and are a result of various synoptic-scale, mesoscale, and climatological patterns. This study looks to find what patterns exist among the various scales and how they relate to the climatology of the TOs. To find these patterns, principal component analysis (PCA) and a cluster analysis were conducted to differentiate the patterns in the data. Four distinct clusters of TOs were found, with varying synoptic and mesoscale patterns as well as distinct climatological patterns. An interesting result from this study is the shift of TO characteristics over time toward a more synoptically forced pattern that has become stronger and shifted eastward from the Great Plains.
|
20 |
Vehicle detection and tracking in highway surveillance videos
Tamersoy, Birgi 2009 August 1900
We present a novel approach for vehicle detection and tracking in highway surveillance videos. This method incorporates well-studied computer vision and machine learning techniques to form an unsupervised system, where vehicles are automatically "learned" from video sequences. First, an enhanced adaptive background mixture model is used to identify positive and negative examples. Then a video-specific classifier is trained with these examples. Both the background model and the trained classifier are used in conjunction to detect vehicles in a frame. Tracking is achieved by a simplified multi-hypotheses approach. An over-complete set of tracks is created considering every observation within a time interval. As needed, hypothesized detections are generated to force continuous tracks. Finally, a scoring function is used to separate the valid tracks in the over-complete set. The proposed detection and tracking algorithm is tested in a challenging application: vehicle counting. Our method achieved very accurate results in three traffic surveillance videos that are significantly different in terms of viewpoint, quality and clutter.
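The first stage of such a pipeline, separating moving objects from a slowly changing background, can be illustrated with a much simpler model than the one the abstract names. The sketch below keeps a single running mean per pixel instead of an adaptive mixture model. The function names, learning rate, threshold and tiny grayscale frames are assumptions for illustration.

```python
def update_background(background, frame, rate=0.05):
    """Blend the new frame into the per-pixel running mean."""
    return [[(1 - rate) * b + rate * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """True where a pixel differs from the background by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# Hypothetical 2x3 grayscale frames: an empty road, then a bright object.
empty_road = [[100, 100, 100], [100, 100, 100]]
with_vehicle = [[100, 200, 100], [100, 210, 100]]

background = [[float(v) for v in row] for row in empty_road]
mask = foreground_mask(background, with_vehicle)
background = update_background(background, with_vehicle)
```

Pixels flagged in the mask are candidate positive examples and the rest candidate negatives, which is the role the background model plays before the video-specific classifier is trained.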
|