201

Detecting Wetland Change through Supervised Classification of Landsat Satellite Imagery within the Tunkwa Watershed of British Columbia, Canada

Lee, Steven January 2011 (has links)
Wetlands are considered to be among the most valuable naturally occurring forms of land cover in the world. Hydrologic regulation, carbon sequestration, and habitat provision for a wide assortment of flora and fauna are just a few of the benefits associated with wetlands. Satellite remote sensing has been demonstrated to be a reliable approach to monitoring wetlands over time. Unfortunately, a national wetland inventory does not exist for Canada at this time. This study employs a supervised classification method on Landsat satellite imagery acquired between 1976 and 2008 within the Tunkwa watershed, southwest of Kamloops, British Columbia, Canada. Images from 2005 and 2008 were repaired using a gap-filling technique due to the failure of the scan-line corrector on the Landsat 7 satellite in 2003. Percentage pixel counts for wetlands were compared, and a diminishing trend was identified: a loss of approximately 4.8% of wetland coverage was recognized. The expansion of Highland Valley Copper and the forestry industry in the area may be the leading causes of wetland desiccation. This study demonstrates the feasibility of wetland monitoring using remote sensing and emphasizes the need for future work to compile a Canadian wetland inventory.
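The percentage-pixel-count comparison described here reduces to counting class labels in two classified rasters. A minimal sketch with invented rasters and an assumed wetland class code of 3 (the thesis's actual class scheme is not given):

```python
import numpy as np

# Hypothetical classified rasters for two dates (integer class codes; 3 = wetland).
rng = np.random.default_rng(0)
map_1976 = rng.integers(0, 5, size=(100, 100))
map_2008 = rng.integers(0, 5, size=(100, 100))

def class_share(classified, label):
    """Fraction of pixels assigned to one class in a classified raster."""
    return float((classified == label).mean())

change = class_share(map_2008, 3) - class_share(map_1976, 3)
print(f"Wetland coverage change: {100 * change:+.2f} percentage points")
```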
202

Combining classifier and cluster ensembles for semi-supervised and transfer learning

Acharya, Ayan 09 July 2012 (has links)
Unsupervised models can provide supplementary soft constraints to help classify new "target" data, since similar instances in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place, as in transfer learning settings. This contribution describes two general frameworks that take as input class membership estimates from existing classifiers learnt on previously encountered "source" data, as well as a set of cluster labels from a cluster ensemble operating solely on the target data to be classified, and yield a consensus labeling of the target data. One of the proposed frameworks admits a wide range of loss functions and classification/clustering methods and exploits properties of Bregman divergences in conjunction with Legendre duality to yield a principled and scalable approach. The other approach is built on probabilistic mixture models and provides the additional flexibility of distributed computation, which is useful when the target data cannot be gathered in a single place for privacy or security reasons. A variety of experiments show that the proposed frameworks can yield results substantially superior to those provided by popular transductive learning techniques or by naively applying classifiers learnt on the original task to the target data.
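As a rough illustration of the consensus idea (not the paper's Bregman-divergence or mixture-model formulations), the toy sketch below blends classifier posteriors with a co-association matrix from a single hard clustering; a true cluster ensemble would average several clusterings:

```python
import numpy as np

def consensus_labels(posteriors, clusters, alpha=0.5, iters=10):
    """Toy consensus: blend classifier posteriors (n x k) with a cluster
    co-association matrix so co-clustered points pull their label
    estimates toward one another."""
    S = (clusters[:, None] == clusters[None, :]).astype(float)
    S /= S.sum(axis=1, keepdims=True)        # row-normalize co-association
    y = posteriors.copy()
    for _ in range(iters):
        y = (1 - alpha) * posteriors + alpha * (S @ y)
    return y.argmax(axis=1)

# Tiny demo: 4 target points, 2 classes, one hard clustering.
posteriors = np.array([[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]])
clusters = np.array([0, 0, 1, 1])
print(consensus_labels(posteriors, clusters))   # -> [0 0 1 1]
```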
203

New insights on the power of active learning

Berlind, Christopher 21 September 2015 (has links)
Traditional supervised machine learning algorithms are expected to have access to a large corpus of labeled examples, but the massive amount of data available in the modern world has made unlabeled data much easier to acquire than accompanying labels. Active learning is an extension of the classical paradigm intended to lessen the expense of the labeling process by allowing the learning algorithm to intelligently choose which examples should be labeled. In this dissertation, we demonstrate that the power to make adaptive label queries has benefits beyond reducing labeling effort over passive learning. We develop and explore several novel methods for active learning that exemplify these new capabilities. Some of these methods use active learning for non-standard purposes, such as computational speedup, structure discovery, and domain adaptation. Others successfully apply active learning in situations where prior results have given evidence of its ineffectiveness. Specifically, we first give an active algorithm for learning disjunctions that is able to overcome a computational intractability present in the semi-supervised version of the same problem. This is the first known example of the computational advantages of active learning. Next, we investigate using active learning to determine structural properties (margins) of the data-generating distribution that can further improve learning rates. This is in contrast to most active learning algorithms, which either assume or ignore structure rather than seeking to identify and exploit it. We then give an active nearest neighbors algorithm for domain adaptation, the task of learning a predictor for some target domain using mostly examples from a different source domain. This is the first formal analysis of the generalization and query behavior of an active domain adaptation algorithm. Finally, we show a situation where active learning can outperform passive learning on very noisy data, circumventing prior results that active learning cannot have a significant advantage over passive learning in high-noise regimes.
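For readers unfamiliar with the paradigm, a generic uncertainty-sampling loop — unrelated to the dissertation's specific algorithms — illustrates how the learner adaptively chooses which labels to query; the data, model, and query budget here are all invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # hidden labeling rule (the "oracle")

# Seed with one example from each class, then query adaptively.
labeled = [int(np.flatnonzero(y == 0)[0]), int(np.flatnonzero(y == 1)[0])]
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression()
for _ in range(20):                               # budget: 20 adaptive label queries
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain candidate
    labeled.append(query)                         # "oracle" reveals its label
    pool.remove(query)

print(f"accuracy after 22 labels: {model.score(X, y):.2f}")
```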
204

Transfer learning for classification of spatially varying data

Jun, Goo 13 December 2010 (has links)
Many real-world datasets have spatial components that provide valuable information about characteristics of the data. In this dissertation, a novel framework for adaptive models that exploit spatial information in data is proposed. The proposed framework is mainly based on the development and application of Gaussian processes. First, a supervised learning method is proposed for the classification of hyperspectral data with spatially adaptive model parameters. The proposed algorithm models the spatially varying mean of each spectral band of a given class using a Gaussian process regression model. For a given location, the predictive distribution of a given class is modeled by a multivariate Gaussian distribution with spatially adjusted parameters obtained from the proposed algorithm. The Gaussian process model is generally regarded as a good tool for interpolation, but not for extrapolation. Moreover, the uncertainty of the predictive distribution increases as the distance from the training instances increases. To overcome this problem, a semi-supervised learning algorithm is presented for the classification of hyperspectral data with spatially adaptive model parameters. This algorithm fits the test data with a spatially adaptive mixture-of-Gaussians model, where the spatially varying parameters of each component are obtained by Gaussian process regressions with soft memberships using the mixture-of-Gaussian-processes model. The proposed semi-supervised algorithm assumes a transductive setting, where the unlabeled data is considered to be similar to the training data. This is not true in general, however, since one may not know how many classes may exist in the unexplored regions. A spatially adaptive nonparametric Bayesian framework is therefore proposed by applying spatially adaptive mechanisms to the mixture model with infinitely many components. In this method, each component in the mixture has spatially adapted parameters estimated by Gaussian process regressions, and spatial correlations between indicator variables are also considered. In addition to land cover and land use classification applications based on hyperspectral imagery, the Gaussian process-based spatio-temporal model is also applied to predict ground-based aerosol optical depth measurements from satellite multispectral images, and to select the most informative ground-based sites by active learning. In this application, heterogeneous features with spatial and temporal information are incorporated together by employing a set of covariance functions, and it is shown that the spatio-temporal information exploited in this manner substantially improves the regression model. The conventional meaning of spatial information usually refers to actual spatio-temporal locations in the physical world. In the final chapter of this dissertation, the meaning of spatial information is generalized to the parametrized low-dimensional representation of data in feature space, and a corresponding spatial modeling technique is exploited to develop a nearest-manifold classification algorithm.
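The interpolation-versus-extrapolation limitation noted above is easy to see with off-the-shelf Gaussian process regression; this sketch models a spatially varying band mean on synthetic coordinates (the dissertation's own models and data differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(80, 2))                # training pixel locations
band = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=80)  # one band's values for one class

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0) + WhiteKernel(1e-2),
                              normalize_y=True)
gp.fit(coords, band)

test = np.array([[5.0, 5.0], [25.0, 25.0]])              # the second point extrapolates
mean, std = gp.predict(test, return_std=True)
print(std)   # predictive uncertainty grows far from the training data
```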
205

Active Machine Learning for Computational Design and Analysis under Uncertainties

Lacaze, Sylvain January 2015 (has links)
Computational design has become a predominant element of various engineering tasks. However, the ever-increasing complexity of numerical models creates the need for efficient methodologies. Specifically, computational design under uncertainties remains sparsely used in engineering settings due to its computational cost. This dissertation proposes a coherent framework for various branches of computational design under uncertainties, including model update, reliability assessment, and reliability-based design optimization. Through the use of machine learning techniques, computationally inexpensive approximations of the constraints, limit states, and objective functions are constructed. Specifically, a novel adaptive sampling strategy, referred to as generalized max-min, has been developed that allows for the refinement of any approximation only in relevant regions. This technique presents various computational advantages, such as ease of parallelization and applicability to any metamodel. Three approaches tailored for computational design under uncertainties are derived from this approximation technique. An algorithm for reliability assessment is proposed, and its efficiency is demonstrated for different probabilistic settings, including dependent variables modeled using copulas. Additionally, the notion of a fidelity map is introduced for model update settings with a large number of dependent responses to be matched. Finally, a new reliability-based design optimization method with local refinement has been developed. A derivation of sampling-based probability-of-failure derivatives is also provided, along with a discussion of numerical estimates. This derivation brings additional flexibility to the field of computational design. The knowledge acquired and techniques developed during this Ph.D. have been synthesized in an object-oriented MATLAB toolbox. The help and ergonomics of the toolbox have been designed to be accessible to a large audience.
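One plausible reading of a max-min refinement criterion, sketched on an invented circular limit state; this is schematic and not Lacaze's generalized max-min formulation:

```python
import numpy as np

def next_sample(candidates, existing, surrogate, eps=0.1):
    """Max-min refinement sketch: among candidates the current surrogate places
    near the limit state (|g| < eps), pick the one farthest from all existing
    samples (maximizing the minimum distance)."""
    g = surrogate(candidates)
    near = candidates[np.abs(g) < eps]
    if len(near) == 0:
        near = candidates                     # fall back to space-filling max-min
    d = np.linalg.norm(near[:, None, :] - existing[None, :, :], axis=2)
    return near[int(np.argmax(d.min(axis=1)))]

# Demo with a circular limit state g(x) = ||x|| - 1.
rng = np.random.default_rng(3)
existing = rng.uniform(-2, 2, size=(5, 2))
candidates = rng.uniform(-2, 2, size=(500, 2))
print(next_sample(candidates, existing, lambda X: np.linalg.norm(X, axis=1) - 1.0))
```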
206

Mapping land-use in north-western Nigeria (Case study of Dutse)

Anavberokhai, Isah January 2007 (has links)
This project analyzes satellite images from 1976, 1985 and 2000 of Dutse, Jigawa State, in north-western Nigeria. The analyzed satellite images were used to determine the land-use and vegetation changes that occurred between 1976 and 2000, which will help in recommending possible planning measures to protect the vegetation from further deterioration. Studying land-use change in north-western Nigeria is essential for analyzing various ecological and developmental consequences over time. The north-western region of Nigeria is of great environmental and economic importance, having land cover rich in agricultural production and livestock grazing. Population growth over time has affected land-use and hence agricultural and livestock production. On completion of this project, the possible land-use changes that have taken place in Dutse will be analyzed for future recommendations. The use of supervised classification and change detection of satellite images provides an economical way to quantify different types of land-use and the changes that have occurred over time, as the sketch after this paragraph illustrates. The percentage difference in land-use between 1976 and 2000 was 37%, which is considered a high degree of land-use change within the period of study. The results of this project are being used to propose planning strategies that could help in planning sustainable land-use and diversity in Dutse.
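Change detection between two classified maps amounts to cross-tabulating per-pixel class transitions; a minimal sketch with hypothetical class codes (the project's actual classes and rasters are not given):

```python
import numpy as np

def change_matrix(before, after, n_classes):
    """Cross-tabulate per-pixel class transitions between two classified maps."""
    idx = before.ravel() * n_classes + after.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

rng = np.random.default_rng(4)
lu_1976 = rng.integers(0, 3, size=(50, 50))   # hypothetical class codes
lu_2000 = rng.integers(0, 3, size=(50, 50))

cm = change_matrix(lu_1976, lu_2000, 3)
changed = 1.0 - np.trace(cm) / cm.sum()       # fraction of pixels that changed class
print(f"{100 * changed:.0f}% of pixels changed class")
```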
207

Classification models for high-dimensional data with sparsity patterns

Tillander, Annika January 2013 (has links)
Today's high-throughput data collection devices, e.g. spectrometers and gene chips, create information in abundance. However, this poses serious statistical challenges, as the number of features is usually much larger than the number of observed units. Further, in this high-dimensional setting, only a small fraction of the features are likely to be informative for any specific project. In this thesis, three different approaches to two-class supervised classification in this high-dimensional, low-sample setting are considered. There are classifiers that are known to mitigate the issues of high dimensionality, e.g. distance-based classifiers such as Naive Bayes. However, these classifiers are often computationally intensive, and their computations are considerably faster for discrete data. Hence, continuous features are often transformed into discrete features. In the first paper, a discretization algorithm suitable for high-dimensional data is suggested and compared with other discretization approaches. Further, the effect of discretization on misclassification probability in the high-dimensional setting is evaluated. Linear classifiers are more stable, which motivates adjusting the linear discriminant procedure to the high-dimensional setting. In the second paper, a two-stage estimation procedure for the inverse covariance matrix, applying Lasso-based regularization and Cuthill-McKee ordering, is suggested. The estimation gives a block-diagonal approximation of the covariance matrix, which in turn leads to an additive classifier. In the third paper, an asymptotic framework that represents sparse and weak block models is derived, and a technique for block-wise feature selection is proposed. Probabilistic classifiers have the advantage of providing the probability of membership in each class for new observations, rather than simply assigning them to a class. In the fourth paper, a method is developed for constructing a Bayesian predictive classifier. Given the block-diagonal covariance matrix, the resulting Bayesian predictive and marginal classifier provides an efficient solution to the high-dimensional problem by splitting it into smaller tractable problems. The relevance and benefits of the proposed methods are illustrated using both simulated and real data. / With today's technology, for example spectrometers and gene chips, data are generated in large quantities. This abundance of data is not only an advantage but also causes certain problems: typically, the number of variables (p) is considerably larger than the number of observations (n). This yields so-called high-dimensional data, which demands new statistical methods, since the traditional ones were developed for the opposite situation (p<n). Moreover, usually very few of all these variables are relevant for any given project, and the strength of the information in the relevant variables is often weak. This type of data is therefore often described as sparse and weak, and identifying the relevant variables is commonly likened to finding a needle in a haystack. This thesis takes up three different ways of classifying this type of high-dimensional data, where classifying means using a dataset containing both explanatory variables and an outcome variable to teach a function or algorithm to predict the outcome variable from the explanatory variables alone. The real data used in the thesis are microarrays: cell samples that show the activity of the genes in the cell.
The goal of the classification is to use the variation in activity across thousands of genes (the explanatory variables) to determine whether a cell sample comes from cancer tissue or normal tissue (the outcome variable). There are classification methods that can handle high-dimensional data, but these are often computationally intensive and therefore work better for discrete data. By transforming continuous variables into discrete ones (discretization), the computation time can be reduced and the classification made more efficient. The thesis studies how discretization affects the classification's prediction accuracy, and a very efficient discretization method for high-dimensional data is proposed. Linear classification methods have the advantage of being stable; the drawback is that they require an invertible covariance matrix, which the covariance matrix of high-dimensional data is not. The thesis proposes a way to estimate the inverse of sparse covariance matrices with a block-diagonal matrix. This matrix has the further advantage of leading to additive classification, which makes it possible to select whole blocks of relevant variables; a method for identifying and selecting the blocks is also presented. There are also probabilistic classification methods, which have the advantage of giving the probability of belonging to each of the possible outcomes for an observation, rather than merely predicting the outcome as most other classification methods do. The thesis proposes such a Bayesian method, given the block-diagonal matrix and normally distributed outcome classes. The relevance and advantages of the proposed methods are demonstrated by applying them to simulated and real high-dimensional data.
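To see why a block-diagonal covariance yields an additive classifier, note that the discriminant score decomposes across blocks; a small sketch under assumed Gaussian classes with known block means and covariances (not the thesis's estimation procedure):

```python
import numpy as np

def block_lda_scores(X, mu0, mu1, blocks, block_covs):
    """Additive discriminant under a block-diagonal covariance: the total score
    decomposes into a sum of independent per-block LDA scores."""
    score = np.zeros(len(X))
    for idx, S in zip(blocks, block_covs):
        w = np.linalg.solve(S, mu1[idx] - mu0[idx])
        b = -0.5 * (mu1[idx] + mu0[idx]) @ w
        score += X[:, idx] @ w + b            # each block contributes additively
    return score                              # assign class 1 where score > 0

# Tiny demo: 4 features in two independent 2-feature blocks.
blocks = [np.array([0, 1]), np.array([2, 3])]
block_covs = [np.eye(2), np.eye(2)]
mu0, mu1 = np.zeros(4), np.array([1.0, 1.0, 0.0, 0.0])   # only block 1 informative
X = np.array([[1.0, 1.0, 0.0, 0.0], [-1.0, -1.0, 0.0, 0.0]])
print(block_lda_scores(X, mu0, mu1, blocks, block_covs))  # -> [ 1. -3.]
```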
208

Towards a Spectral Theory for Simplicial Complexes

Steenbergen, John Joseph January 2013 (has links)
In this dissertation we study combinatorial Hodge Laplacians on simplicial complexes using tools generalized from spectral graph theory. Specifically, we consider generalizations of graph Cheeger numbers and graph random walks. The results in this dissertation can be thought of as the beginnings of a new spectral theory for simplicial complexes and a new theory of high-dimensional expansion.
We first consider new high-dimensional isoperimetric constants. A new Cheeger-type inequality is proved, under certain conditions, between an isoperimetric constant and the smallest eigenvalue of the Laplacian in codimension 0. The proof is similar to the proof of the Cheeger inequality for graphs. Furthermore, a negative result is proved, using the new Cheeger-type inequality and special examples, showing that certain Cheeger-type inequalities cannot hold in codimension 1.
Second, we consider new random walks with killing on the set of oriented simplexes of a certain dimension. We show that there is a systematic way of relating these walks to combinatorial Laplacians such that a certain notion of mixing time is bounded by a spectral gap and such that distributions that are stationary in a certain sense relate to the harmonics of the Laplacian. In addition, we consider the possibility of using these new random walks for semi-supervised learning. An algorithm is devised which generalizes a classic label-propagation algorithm on graphs to simplicial complexes. This new algorithm applies to a new semi-supervised learning problem, one in which the underlying structure to be learned is flow-like.
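A combinatorial Hodge Laplacian can be computed directly from boundary matrices; a minimal numeric example (a hollow square, not one of the dissertation's constructions) shows the kernel of the 1-Laplacian counting the complex's one-dimensional hole:

```python
import numpy as np

# A hollow square: vertices 0-3, four boundary edges, no 2-simplexes.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
B1 = np.zeros((4, len(edges)))            # vertex-edge boundary matrix
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0
B2 = np.zeros((len(edges), 0))            # no triangles: the "up" part vanishes

L1 = B1.T @ B1 + B2 @ B2.T                # combinatorial Hodge 1-Laplacian
print(np.round(np.linalg.eigvalsh(L1), 6))  # one zero eigenvalue: the square's 1-cycle
```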
209

A Dual Pathway Approach for Solving the Spatial Credit Assignment Problem in a Biological Way

Connor, Patrick 01 November 2013 (has links)
To survive, many biological organisms need to accurately infer which features of their environment predict future rewards and punishments. In machine learning terms, this is the problem of spatial credit assignment, for which many supervised learning algorithms have been developed. In this thesis, I mainly propose that a dual-pathway, regression-like strategy and associated biological implementations may be used to solve this problem. Using David Marr's (1982) three-level philosophy of computational neuroscience, the thesis and its contributions are organized as follows:
- Computational Level: Here, the spatial credit assignment problem is formally defined and modeled using probability density functions. The specific challenges of the problem faced by organisms and machine learning algorithms alike are also identified.
- Algorithmic Level: I present and evaluate the novel hypothesis that the general strategy used by animals is to perform a regression over past experiences. I also introduce an extension of a probabilistic model for regression that substantially improves generalization without resorting to regularization. This approach subdues residual associations to irrelevant features, as does regularization.
- Physical Level: Here, the neuroscience of classical conditioning and of the basal ganglia is briefly reviewed. Then, two novel models of the basal ganglia are put forward: 1) an online-learning model that supports the regression hypothesis and 2) a biological implementation of the probabilistic model previously introduced. Finally, we compare these models to others in the literature.
In short, this thesis establishes a theoretical framework for studying the spatial credit assignment problem, offers a simple hypothesis for how biological systems solve it, and implements basal ganglia-based algorithms in support. The thesis brings to light novel approaches for machine learning and several explanations for biological structures and classical conditioning phenomena. / Note: While the thesis contains content from two articles (one journal, one conference), their publishers do not require special permission for their use in dissertations (information confirming this is in an appendix of the thesis itself).
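As a bare-bones illustration of the regression-over-past-experiences hypothesis — using plain ridge regression, whereas the thesis's own extension avoids explicit regularization — a penalized fit subdues the credit assigned to irrelevant cues:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 10
X = rng.normal(size=(n, p))                   # past experiences: candidate cue features
w_true = np.zeros(p)
w_true[:2] = [1.0, -0.5]                      # only two cues actually predict reward
r = X @ w_true + 0.1 * rng.normal(size=n)     # observed rewards

lam = 1.0                                     # ridge penalty subdues irrelevant-cue weights
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ r)
print(np.round(w_hat, 2))                     # near-zero credit for irrelevant cues
```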
210

Urban Vegetation Mapping Using Remote Sensing Techniques : A Comparison of Methods

Palm, Fredrik January 2015 (has links)
The aim of this study is to compare remote sensing methods in the context of vegetation mapping of an urban environment. The methods used were (1) a traditional per-pixel method, maximum likelihood supervised classification (ENVI); (2) a standard object-based method, example-based feature extraction (ENVI); and (3) a newly developed method, Window Independent Contextual Segmentation (WICS) (Choros Cognition). A four-band SPOT5 image with a pixel size of 10x10 m was used for the classifications. A validation dataset was created using an orthocorrected aerial image with a pixel size of 1x1 m. Error matrices were created by cross-tabulating the classified images with the validation dataset, and from the error matrices the overall accuracy and kappa coefficient were calculated. The object-based method performed best, with an overall accuracy of 80% and a kappa value of 0.6, followed by the WICS method with an overall accuracy of 77% and a kappa value of 0.53, placing the supervised classification last with an overall accuracy of 71% and a kappa value of 0.38. The results of this study suggest that the object-based method and WICS perform better than per-pixel supervised classification in an urban environment.
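The reported figures follow directly from the error matrix; a short sketch computing overall accuracy and Cohen's kappa from a hypothetical cross-tabulation (the study's actual matrices are not given):

```python
import numpy as np

def accuracy_and_kappa(error_matrix):
    """Overall accuracy and Cohen's kappa from a cross-tabulated error matrix
    (rows: classified map, columns: validation data)."""
    n = error_matrix.sum()
    p_o = np.trace(error_matrix) / n                                  # observed agreement
    p_e = (error_matrix.sum(axis=0) @ error_matrix.sum(axis=1)) / n ** 2  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)

m = np.array([[50, 10], [5, 35]])             # hypothetical two-class error matrix
oa, kappa = accuracy_and_kappa(m)
print(f"overall accuracy {oa:.2f}, kappa {kappa:.2f}")
```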
