Global ETD Search

191	Kernelized Supervised Dictionary Learning. Jabbarzadeh Gangeh, Mehrdad. 24 April 2013.
Representing a signal with a learned dictionary instead of predefined operators, such as wavelets, has led to state-of-the-art results in applications such as denoising, texture analysis, and face recognition. Dictionary learning is closely associated with sparse representation, in which a signal is represented using only a few atoms of the dictionary. Fast algorithms such as K-SVD, online learning, and cyclic coordinate descent now make it computationally feasible to learn a dictionary from millions of data samples, yet the dictionary is still mainly computed with unsupervised approaches such as k-means. These approaches learn the dictionary by minimizing the reconstruction error without taking category information into account, which is not optimal for classification tasks. In this thesis, we propose a supervised dictionary learning (SDL) approach that incorporates class-label information into the learning of the dictionary. To this end, we learn the dictionary in a space where the dependency between the signals and their corresponding labels is maximized, measuring this dependency with the recently introduced Hilbert-Schmidt independence criterion (HSIC). The learned dictionary is compact and has a closed-form solution, so the proposed approach is fast. We show that it outperforms other unsupervised and supervised dictionary learning approaches from the literature on real-world data. Moreover, the main advantage of the proposed SDL approach is that it can be easily kernelized, in particular by incorporating a data-driven kernel, such as a compression-based kernel, into the formulation. In this thesis, we also propose a novel compression-based (dis)similarity measure.
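HSIC, the dependence measure at the center of the SDL approach described above, can be illustrated with its standard biased empirical estimate, HSIC = tr(KHLH) / (n - 1)^2, where K and L are kernel matrices on the signals and labels and H = I - (1/n)11^T is the centering matrix. This is a minimal sketch, not the thesis's implementation: the linear kernel on a 1-D feature, the delta kernel on labels, the toy data, and all function names are illustrative assumptions, and the closed-form dictionary computation is not reproduced.

```python
def center(K):
    """Return H K H, where H = I - (1/n) 11^T is the centering matrix."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    # tr(K H L H) = tr(H K H L) = sum_ij (HKH)_ij * L_ji
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


def linear_kernel(xs):
    return [[a * b for b in xs] for a in xs]


def delta_kernel(labels):
    # 1 if two samples share a class label, else 0 (a common choice of label kernel).
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.0, 1.0, 1.1, 0.9]    # feature that tracks the labels
uninformative = [0.1, 1.0, 0.5, 0.0, 1.1, 0.5]  # both classes have the same feature sum
L = delta_kernel(labels)
print(hsic(linear_kernel(informative), L))
print(hsic(linear_kernel(uninformative), L))
```

The informative feature yields a clearly positive HSIC (about 0.1458) while the uninformative one yields essentially zero, which is the kind of signal/label dependence that a supervised dictionary is meant to preserve.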
The proposed measure uses a 2D MPEG-1 encoder, which takes into account the spatial locality and connectivity of pixels in the images. Its formulation is carefully designed around MPEG encoder functionality and, by design, uses only P-frame coding to find the (dis)similarity among patches or images. We show that the proposed measure works well on textures for both small and large patch sizes. Experimental results show that incorporating the proposed measure as a kernel into our SDL significantly improves the performance of supervised pixel-based texture classification on Brodatz and outdoor images, compared both with other compression-based dissimilarity measures and with state-of-the-art SDL methods. It also improves computation speed by about 40% compared with its closest rival. Finally, we extend the proposed SDL to multiview learning, where more than one representation of a dataset is available. We propose two multiview approaches: one fuses the feature sets in the original space and then learns the dictionary and sparse coefficients on the fused set; the other learns a dictionary and the corresponding coefficients in each view separately and then fuses the representations in the space of the learned dictionaries. We show that the proposed multiview approaches benefit from the complementary information in multiple views, and we investigate their relative performance in the application of emotion recognition.
Keywords: dictionary learning; sparse representation; supervised learning; HSIC; classification. Discipline: Electrical and Computer Engineering.
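The MPEG-1 P-frame measure itself is not reproduced here. As a hedged illustration of the general idea of a compression-based dissimilarity, the sketch below substitutes the widely used normalized compression distance (NCD), NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with Python's standard `zlib` compressor standing in for C. The compressor choice, the function names, and the byte-string "patches" are assumptions; unlike the thesis's measure, zlib treats a patch as a flat byte string and ignores 2D spatial locality.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compressing the concatenation exploits shared structure
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
stripes = bytes([0, 255] * 512)                         # hypothetical periodic "texture" patch
noise = bytes(rng.getrandbits(8) for _ in range(1024))  # hypothetical unstructured patch
print(ncd(stripes, stripes), ncd(stripes, noise))
```

Turning such a dissimilarity into a kernel for SDL (for example via exp(-gamma * d)) does not in general guarantee a positive semidefinite matrix; how the thesis handles this is not stated in the abstract.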
192	Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting. Huang, Jiayuan. January 2007.
This thesis focuses on two fundamental machine learning problems: unsupervised learning, where no label information is available, and semi-supervised learning, where a small number of labels are given in addition to unlabeled data. These problems arise in many real-world applications, such as Web analysis and bioinformatics, where a large amount of data is available but little or no labeled data exists. Obtaining classification labels in these domains is usually difficult because it involves either manual labeling or physical experimentation. This thesis approaches these problems from two perspectives: graph-based and distribution-based. First, I investigate a series of graph-based learning algorithms that exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes in the graph, ultimately communicating information globally to yield effective unsupervised and semi-supervised learning. In particular, I extend existing graph-based learning algorithms, which are currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs, and complex networks. These richer graph representations capture more naturally the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics, and social networks. For each of these generalized graph structures, I show how information propagation can be characterized by a distinct random walk model, and then use this characterization to develop new unsupervised and semi-supervised learning algorithms. Second, I investigate a more statistically oriented approach that explicitly models a learning scenario in which the training and test examples come from different distributions.
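The random-walk view of label sharing described above can be illustrated with a generic label-propagation sketch on an ordinary undirected graph; the thesis's directed-graph, hypergraph, and complex-network models are not reproduced. The sketch assumes the random-walk transition matrix P = D^-1 W, a damping factor alpha, and the iteration F <- alpha * P F + (1 - alpha) * Y, where Y holds one-hot rows for labeled nodes. The function name, alpha = 0.9, and the example graph are illustrative choices, and every node is assumed to have at least one edge.

```python
def propagate_labels(W, seeds, n_classes, alpha=0.9, iters=200):
    """W: symmetric adjacency matrix (list of lists); seeds: {node: class index}."""
    n = len(W)
    # Random-walk transition matrix P = D^-1 W (each row sums to 1).
    P = [[w / sum(row) for w in row] for row in W]
    Y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [r[:] for r in Y]
    for _ in range(iters):
        # F <- alpha * P F + (1 - alpha) * Y: each node pulls class scores from its neighbors.
        F = [
            [alpha * sum(P[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
             for c in range(n_classes)]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# A 6-node path graph 0-1-2-3-4-5; node 0 is labeled class 0 and node 5 class 1.
n = 6
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
print(propagate_labels(W, {0: 0, 5: 1}, n_classes=2))
```

Because alpha < 1 discounts longer walks, an unlabeled node tends to take the class of the labeled nodes that a short random walk from it is most likely to reach.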
This is a difficult situation for standard statistical learning approaches, since they typically assume that the training and test distributions are similar, if not identical. To achieve good performance in this scenario, I use unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space, bypassing the problem of explicit density estimation. The technique can easily be applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shift between training and test data.
Keywords: unsupervised learning; semi-supervised learning; graph-based learning; distribution shifting. Discipline: Computer Science.
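Computing resampling weights directly in a feature space, without density estimation, is the idea behind kernel mean matching (KMM), a well-known technique of this kind: it chooses weights beta so that the beta-weighted mean of the training points in an RBF feature space matches the mean of the test points. The sketch below is a simplified version and is not presented as the thesis's exact formulation. It runs projected gradient descent on 0.5 * beta^T K beta - kappa^T beta under the box constraint 0 <= beta <= B, then rescales the weights to average 1 instead of enforcing a sum constraint exactly. The 1-D data, kernel width, bound B, iteration count, and function names are assumptions.

```python
import math
import random


def rbf(a, b, sigma=1.0):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def kmm_weights(train, test, sigma=1.0, upper=10.0, iters=500):
    """Approximate kernel-mean-matching weights for 1-D training points."""
    n_tr, n_te = len(train), len(test)
    K = [[rbf(a, b, sigma) for b in train] for a in train]
    # kappa_i = (n_tr / n_te) * sum_j k(x_i, x'_j): similarity of each training point to the test set.
    kappa = [n_tr / n_te * sum(rbf(a, b, sigma) for b in test) for a in train]
    # Step 1/L, where the max row sum L bounds the largest eigenvalue of K (entries are >= 0).
    step = 1.0 / max(sum(row) for row in K)
    beta = [1.0] * n_tr
    for _ in range(iters):
        grad = [sum(K[i][j] * beta[j] for j in range(n_tr)) - kappa[i] for i in range(n_tr)]
        # Gradient step on 0.5 * beta^T K beta - kappa^T beta, then clip to [0, upper].
        beta = [min(upper, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    total = sum(beta)
    return [b * n_tr / total for b in beta]  # rescale so the weights average 1


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(40)]
test = [rng.gauss(1.0, 0.5) for _ in range(40)]  # hypothetical shifted test distribution
weights = kmm_weights(train, test)
near = [w for x, w in zip(train, weights) if x > 0.5]
far = [w for x, w in zip(train, weights) if x < -0.5]
print(sum(near) / len(near), sum(far) / len(far))
```

Training points in regions where the test distribution has more mass (here, around x = 1) receive larger weights; such weights can then be passed as per-example weights to a supervised learner.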
193	Detecting Land Cover Change over a 20 Year Time Period in the Niagara Escarpment Plan Using Satellite Remote Sensing. Waite, Holly. January 2009.
The Niagara Escarpment is one of Southern Ontario’s most important landscapes. Because of the nature of the landform and its location, the Escarpment is subject to various development pressures, including urban expansion, mineral resource extraction, agricultural practices, and recreation. In 1985, Canada’s first large-scale, environmentally based land use plan, the Niagara Escarpment Plan (NEP), was put in place to ensure that only development compatible with the Escarpment occurs within the Plan area. This study focuses on the southern extent of the NEP, since a portion of the Plan is located within the rapidly expanding Greater Toronto Area (GTA). The Plan areas located in the Regional Municipalities of Hamilton and Halton represent urban and rural geographical areas, respectively, and both are experiencing development pressures and consequent changes in land cover. Monitoring initiatives for the NEP have been established but have done little to identify consistent techniques for monitoring land cover on the Niagara Escarpment. Land cover information is an important part of planning and environmental monitoring initiatives, and remote sensing has the potential to provide frequent and accurate land cover information at various spatial scales. The goal of this research was to examine land cover change in the Hamilton and Halton portions of the NEP. This was achieved by creating land cover maps for each region from Landsat 5 Thematic Mapper (TM) data. These maps were used to determine the qualitative and quantitative changes that occurred in the Plan area over the 20-year period from 1986 to 2006. Change was also examined by NEP land use designation, to determine whether the Plan policy has been effective in protecting the Escarpment.
To obtain the land cover maps, five supervised classification methods were explored: Minimum Distance, Mahalanobis Distance, Maximum Likelihood, Object-oriented, and Support Vector Machine (SVM). Seven land cover classes were mapped at a regional scale: forest, water, recreation, bare agricultural fields, vegetated agricultural fields, urban, and mineral resource extraction areas. SVM proved the most successful at mapping land cover on the Escarpment, producing classification maps with an average accuracy of 86.7%. The land cover change analysis showed promising results, with an increase in the forest class and only slight increases in the urban and mineral resource extraction classes. On the negative side, agricultural land decreased overall. An examination of land cover change by NEP land use designation showed little change other than change regulated under Plan policies, demonstrating the success of the NEP in protecting vital Escarpment lands insofar as this can be revealed through remote sensing. Land cover in the NEP should be monitored consistently over time to ensure that changes in the Plan area are compatible with the Niagara Escarpment. Remote sensing can provide this information to the Niagara Escarpment Commission (NEC) in a timely, comprehensive, and cost-effective way, and the information gained from remotely sensed data can support environmental monitoring and policy planning in the future.
Keywords: Niagara Escarpment; Land Cover Change; Remote Sensing; Support Vector Machine; Mapping; Supervised Classification; Landsat. Discipline: Geography.
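Of the five classification methods listed for entry 193, Minimum Distance is the simplest to show compactly: each class is represented by the mean of its training pixels, and each pixel is assigned to the class whose mean is nearest in spectral (band) space. The sketch below is not the thesis's SVM workflow. It also computes overall accuracy (the fraction of reference pixels classified correctly), one common accuracy summary; the three-band values and the class subset are hypothetical, not real Landsat TM measurements, and the function names are illustrative.

```python
import math


def class_means(training):
    """training: {class_name: [band_vector, ...]} -> {class_name: mean band vector}."""
    return {
        label: [sum(band) / len(pixels) for band in zip(*pixels)]
        for label, pixels in training.items()
    }


def min_distance_classify(pixel, means):
    # Assign the class whose mean spectral vector is nearest in Euclidean distance.
    return min(means, key=lambda label: math.dist(pixel, means[label]))


def overall_accuracy(predicted, reference):
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical 3-band training pixels (not real Landsat TM values).
training = {
    "forest": [[30, 60, 20], [32, 58, 22]],
    "water": [[10, 15, 5], [12, 14, 6]],
    "urban": [[80, 75, 70], [78, 77, 72]],
}
means = class_means(training)
print(min_distance_classify([31, 59, 21], means))  # forest
print(overall_accuracy(["forest", "water", "urban", "forest"],
                       ["forest", "water", "urban", "water"]))  # 0.75
```

Methods such as Mahalanobis Distance and Maximum Likelihood refine this idea by also modeling each class's spectral covariance.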
194	Fundamental Limitations of Semi-Supervised Learning. Lu, Tyler (Tian). 30 April 2009.
The emergence of a new paradigm in machine learning, known as semi-supervised learning (SSL), has benefited many applications where labeled data are expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semi-supervised learning, which uses additional unlabeled data for training, remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this thesis is to take a first step towards bridging this gap between theory and practice. We focus on the inherent limitations of the benefits SSL can provide over SL. We develop a framework under which one can analyze the potential benefits of SSL, as measured by its sample complexity. Our framework is utopian in the sense that an SSL algorithm trains on a labeled sample and the unlabeled distribution itself, as opposed to an unlabeled sample as in the usual SSL model. Thus, any lower bound on the sample complexity of SSL in this model implies lower bounds in the usual model. Roughly, our conclusion is that unless the learner is absolutely certain that there is some non-trivial relationship between the labels and the unlabeled distribution (an "SSL-type assumption"), SSL cannot provide significant advantages over SL. Technically speaking, we show that in a no-prior-knowledge setting (i.e., without SSL-type assumptions), the sample complexity of SSL is at most a constant factor better than that of SL for any unlabeled distribution. We prove that for the class of thresholds in the realizable setting, the sample complexity of SL is at most twice that of SSL. We also prove that in the agnostic setting, for the classes of thresholds and unions of intervals, the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture that this is a general phenomenon that applies to any hypothesis class.
We also discuss issues regarding SSL-type assumptions, in particular the popular cluster assumption. We give examples showing that, even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and can lead to prediction performance much worse than simply ignoring the unlabeled data and doing supervised learning. We conclude with a look at future research directions that build on our investigation.
Keywords: artificial intelligence; machine learning; semi-supervised learning; statistical learning theory. Discipline: Computer Science.
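For readers unfamiliar with the hypothesis class in the entry 194 results: a threshold classifier is h_t(x) = 1 if x >= t and 0 otherwise, and in the realizable setting the labels are generated by some target threshold t*. A minimal supervised learner, sketched below under that assumption, returns a threshold consistent with the labeled sample (here, the midpoint of the gap between the largest negative and the smallest positive example). When x is uniform on [0, 1], its error is the length of the disagreement region, |t_hat - t*| after clipping to [0, 1]. The function names and the uniform distribution are illustrative assumptions; the sketch does not reproduce the thesis's sample-complexity proofs.

```python
import math


def learn_threshold(sample):
    """Return a threshold t consistent with a realizable sample of (x, label) pairs.

    Hypothesis: h_t(x) = 1 if x >= t else 0.
    """
    negatives = [x for x, y in sample if y == 0]
    positives = [x for x, y in sample if y == 1]
    if not positives:
        return math.inf  # no positive example seen: predict 0 everywhere
    if not negatives:
        return min(positives)
    lo, hi = max(negatives), min(positives)
    if lo >= hi:
        raise ValueError("sample is not consistent with any threshold")
    return (lo + hi) / 2  # any t in (lo, hi] is consistent; take the midpoint


def uniform_error(t_hat, t_star):
    """Probability that h_t_hat and h_t_star disagree when x ~ Uniform[0, 1]."""
    return abs(min(1.0, max(0.0, t_hat)) - min(1.0, max(0.0, t_star)))


sample = [(0.125, 0), (0.25, 0), (0.5, 1), (0.75, 1)]  # labels consistent with, e.g., t* = 0.3
t_hat = learn_threshold(sample)
print(t_hat)  # 0.375
print(uniform_error(t_hat, 0.3))
```

More labeled examples shrink the gap between the largest negative and the smallest positive example, which is what drives the sample complexity of SL for this class.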
195	Contributions to Unsupervised and Semi-Supervised Learning. Pal, David. 21 May 2009.
This thesis studies two problems in theoretical machine learning. The first part investigates the statistical stability of clustering algorithms; the second studies the relative advantage of having unlabeled data in classification problems. Clustering stability has been proposed and used as a model selection method in clustering tasks. The main idea of the method is that two independent samples are taken from a given data set, and each sample is clustered separately with the same clustering algorithm and the same parameter settings. If the two resulting clusterings turn out to be close in some metric, it is concluded that the clustering algorithm and its parameter settings match the data set, and that the clusterings obtained are meaningful. We study the asymptotic properties of this method for certain types of cost-minimizing clustering algorithms and relate their asymptotic stability to the number of optimal solutions of the underlying optimization problem. In classification problems, labeled data are often expensive to obtain, whereas unlabeled data are often plentiful and cheap. We study how access to unlabeled data can decrease the amount of labeled data needed, in the worst-case sense. We propose an extension of the probably approximately correct (PAC) model in which this question can be studied naturally, and we show that for certain basic tasks, access to unlabeled data might, at best, halve the amount of labeled data needed.
Keywords: machine learning; statistics; unsupervised learning; semi-supervised learning; learning theory. Discipline: Computer Science.
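The clustering-stability procedure described in entry 195 can be sketched as follows. The sketch assumes 1-D data, Lloyd's k-means with a deterministic initialization as the clustering algorithm, and, as the closeness metric, the fraction of point pairs on which the two clusterings (each extended to the full data set by nearest-center assignment) disagree about whether the pair shares a cluster. All of these choices, the toy data, and the function names are illustrative assumptions; the thesis studies the asymptotic behavior of such methods rather than prescribing this code.

```python
import random
from itertools import combinations


def nearest(x, centers):
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def kmeans_1d(points, k, iters=50):
    """Lloyd's k-means on 1-D data, initialized at evenly spaced order statistics."""
    pts = sorted(points)
    if k == 1:
        centers = [pts[0]]
    else:
        centers = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            groups[nearest(x, centers)].append(x)
        new = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def instability(data, k, sample_size, seed=0):
    """Fraction of point pairs on which two clusterings disagree about co-membership."""
    rng = random.Random(seed)
    # Cluster two independent samples with the same algorithm and the same k.
    c1 = kmeans_1d(rng.sample(data, sample_size), k)
    c2 = kmeans_1d(rng.sample(data, sample_size), k)
    # Extend both clusterings to the full data set by nearest-center assignment.
    l1 = [nearest(x, c1) for x in data]
    l2 = [nearest(x, c2) for x in data]
    pairs = list(combinations(range(len(data)), 2))
    disagree = sum((l1[i] == l1[j]) != (l2[i] == l2[j]) for i, j in pairs)
    return disagree / len(pairs)


# Two well-separated hypothetical groups of 10 points each.
data = [i * 0.1 for i in range(10)] + [10 + i * 0.1 for i in range(10)]
for k in (2, 3):
    print(k, instability(data, k, sample_size=10))
```

A score near 0 means the two clusterings agree for that k; comparing such scores across candidate values of k is how stability is used for model selection.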
196	Assessing Student Knowledge and Perceptions of Factors Influencing Participation in Supervised Agricultural Experience Programs. Lewis, Lauren Joanna. May 2012.
The purpose of this study was to assess student knowledge and perceptions of factors influencing participation in Supervised Agricultural Experience (SAE) programs. This descriptive study was conducted in 120 randomly selected agricultural education programs in four purposively selected states representative of the National FFA regions. Within each state, the participating programs were randomly selected from FFA divisions characterized as having urban city centers with outlying rural or suburban areas. Students in Florida, Indiana, Missouri, and Utah completed a researcher-designed questionnaire assessing their knowledge and perceptions of factors influencing SAE participation. A response rate of 43.3% (N = 120 programs, n = 52) was achieved, with questionnaires completed by 1,038 students. According to the findings, 45.6% (n = 473) of the students participated in SAE programs; most of these were categorized as entrepreneurship SAEs and classified as livestock projects. Students could identify at most three of the five SAE categories, and those without an SAE program were either unfamiliar or only somewhat familiar with the five categories. Students surveyed in Missouri and Utah appeared to have the strongest SAE knowledge. Each state appeared to have three main types of school resources available for student SAE programs. Student perceptions indicated that teachers encouraged all students to have an SAE program and to apply for awards and recognition; however, most students did not receive awards or recognition for their SAE program. Students most frequently reported receiving SAE help from their teacher on a monthly basis. Most students used a paper-based SAE record book, which they updated weekly or monthly.
On average, students received a total of nine to 34 days of classroom SAE instruction and eight to 33 days of classroom recordkeeping instruction during their enrollment in agricultural education courses. Factors such as enjoyment of agricultural education courses, parental and teacher support and encouragement, resources (money and facilities), and opportunities for awards and recognition did not seem to influence student SAE participation. Contrary to previous research, involvement in community and school activities did not seem to negatively influence student SAE participation. Students did not believe they needed more SAE or recordkeeping instruction.
Keywords: Supervised agricultural experience; SAE; agricultural education; student SAE projects; student SAE programs; experiential learning.
197	Image Annotation With Semi-supervised Clustering. Sayar, Ahmet. 01 December 2009.
Image annotation is the task of generating a set of textual words for a given image by learning from available training data consisting of visual image content and annotation words. Methods developed for image annotation usually use region clustering algorithms to quantize the visual information: visual codebooks are generated from region clusters of low-level visual features and are then matched, in various ways, with the words of the text document related to the image. In this thesis, we propose a new image annotation technique that improves the representation and quantization of the visual information by exploiting available but unused information, called side information, that is hidden in the system. This side information is used to semi-supervise the clustering process that creates the visterms. The selection of side information depends on the visual image content, the annotation words, and the relationship between them. Although side information could be defined and selected in many different ways, this thesis proposes three types. The first is hidden topic probability information obtained automatically from the text document associated with the image. The second is orientation information, and the third is color information, around interest points that correspond to critical locations in the image. The side information provides a set of constraints in a semi-supervised K-means region clustering algorithm. Consequently, when the visual terms (visterms) are generated from the regions, not only are low-level features clustered, but side information is also used to complement the visual information. This complementary information is expected to close the semantic gap between the low-level features extracted from each region and the high-level textual information, so that a better match between the visual codebook and the annotation words is obtained. Moreover, the constraints introduced by the side information speed up the modified K-means algorithm. The proposed algorithm is implemented in a high-performance parallel computation environment.
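The entry 197 abstract does not give its constrained K-means formulation. As a hedged illustration of how side information can act as constraints in K-means, the sketch below implements the well-known COP-KMeans scheme, in which must-link and cannot-link pairs restrict which cluster each point may join during the assignment step. Translating topic, orientation, or color side information into such pairs is an assumption, as are the function name, the 2-D toy points, and the initialization (the first k points as centers).

```python
import math


def cop_kmeans(points, k, must_link=(), cannot_link=(), max_iter=100):
    """COP-KMeans: K-means whose assignment step never violates the given pair constraints."""
    n = len(points)
    must = {i: set() for i in range(n)}
    cannot = {i: set() for i in range(n)}
    for a, b in must_link:
        must[a].add(b)
        must[b].add(a)
    for a, b in cannot_link:
        cannot[a].add(b)
        cannot[b].add(a)

    def violates(i, c, labels):
        # labels[j] is None for points not yet assigned in the current pass.
        return any(labels[j] is not None and labels[j] != c for j in must[i]) or any(
            labels[j] == c for j in cannot[i]
        )

    centers = [list(p) for p in points[:k]]  # simple deterministic initialization
    labels = [None] * n
    for _ in range(max_iter):
        new_labels = [None] * n
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first one satisfying all constraints.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            choice = next((c for c in order if not violates(i, c, new_labels)), None)
            if choice is None:
                raise ValueError(f"no feasible cluster for point {i}")
            new_labels[i] = choice
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


pts = [(0, 0), (10, 0), (0, 1), (10, 1)]
print(cop_kmeans(pts, 2))                         # unconstrained
print(cop_kmeans(pts, 2, cannot_link=[(0, 2)]))   # points 0 and 2 must be separated
```

COP-KMeans fails rather than violate a constraint, which is why the sketch raises `ValueError` when no feasible cluster exists for a point.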
198	Personalized Document Clustering: Technique Development and Empirical Evaluation. Wu, Chia-Chen. 14 August 2003.
With the growth of electronic commerce and the knowledge economy, both organizations and individuals generate and consume a large amount of online information, typically available as textual documents. To manage the ever-increasing volume of documents, organizations and individuals typically organize their documents into categories to facilitate document management and subsequent information access and browsing. However, document grouping is an intentional act that reflects an individual’s (or organization’s) preferential perspective on semantic coherency or on relevant groupings of subjects. Thus, an effective document clustering technique needs to address this preferential perspective and support personalized document clustering. In this thesis, we designed and implemented a personalized document clustering approach that incorporates an individual’s partial clustering into the document clustering process. By combining two document representation methods (feature refinement and feature weighting) with two clustering processes (pre-cluster-based and atomic-based), we propose four personalized document clustering techniques. Using the clustering effectiveness achieved by a traditional content-based document clustering technique as a performance benchmark, our evaluation results suggest that the use of partial clusters improves document clustering effectiveness. Moreover, the pre-cluster-based technique outperforms the atomic-based one, and the feature weighting method for document representation achieves higher clustering effectiveness than the feature refinement method.
Keywords: Supervised Document Clustering; Personalized Document Clustering; Document Clustering; Hierarchical Agglomerative Clustering.
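The keywords for entry 198 name hierarchical agglomerative clustering (HAC). One plausible reading of the pre-cluster-based process, sketched below as an assumption rather than as the thesis's algorithm, is to start HAC from the user's partial clusters (kept intact) plus one singleton per unassigned document, and then repeatedly merge the pair of clusters with the highest average cosine similarity between their documents' term-frequency vectors until a target number of clusters remains. The bag-of-words representation, average linkage, toy documents, and function names are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pre_cluster_hac(docs, partial_clusters, target):
    vectors = [Counter(doc.split()) for doc in docs]
    assigned = {i for group in partial_clusters for i in group}
    # Start from the user's partial clusters plus one singleton per remaining document.
    clusters = [sorted(group) for group in partial_clusters]
    clusters += [[i] for i in range(len(docs)) if i not in assigned]

    def avg_link(c1, c2):
        return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > target:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x stays valid
    return clusters


docs = [
    "cat pet fur",
    "cat kitten pet",
    "stock market price",
    "market stock trade",
    "kitten fur pet",
]
print(pre_cluster_hac(docs, partial_clusters=[[0, 1]], target=2))  # [[0, 1, 4], [2, 3]]
```

Seeding HAC with the user's groups means those documents are never split, so the user's preferred grouping shapes every later merge.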
199	AIRS: a resource limited artificial immune classifier. Watkins, Andrew B. January 2001.
Thesis (M.S.), Mississippi State University, Department of Computer Science. Title from title screen. Includes bibliographical references.
200	Barns delaktighet i frågor om umgängesstöd: en studie av elva tingsrättsdomar [Children's participation in matters of supervised visitation: a study of eleven district court judgments]. Gustafsson, Michelle; Olsson, Sebastian. January 2015.
The purpose of this study was to examine children's participation in court proceedings on supervised visitation and to analyse how children are described in court verdicts. Eleven verdicts concerning supervised visitation, decided in 2014, were collected from two district courts in Stockholm County and studied using qualitative text analysis. The material was analysed using participation levels based on Roger Hart's ladder of children's participation, and using the sociology of childhood as a theoretical perspective. Our findings showed that the children's opinions were mentioned in eight of the verdicts. In four verdicts, the children's wishes influenced the courts' decisions, but in none of the verdicts were they decisive for the outcome. The children's level of participation was not correlated with their age. The children were often described as having universal rather than individual needs. In some verdicts, the court discounted the children's wishes because of their age and their perceived inability to understand what is best for them in the future. We conclude that the court rarely described the children as independent actors or took their wishes into account.
Keywords: supervised visitation; children’s participation; contact dispute; children’s rights.
