Global ETD Search

31	Random Forests Applied as a Soil Spatial Predictive Model in Arid Utah Stum, Alexander Knell 01 May 2010 (has links) Initial soil surveys are incomplete for large tracts of public land in the western USA. Digital soil mapping offers a quantitative approach as an alternative to traditional soil mapping. I sought to predict soil classes across an arid to semiarid watershed of western Utah by applying random forests (RF) and using environmental covariates derived from Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and digital elevation models (DEM). Random forests are similar to classification and regression trees (CART). However, RF is doubly random. Many (e.g., 500) weak trees are grown (trained) independently because each tree is trained with a new randomly selected bootstrap sample, and a random subset of variables is used to split each node. To train and validate the RF trees, 561 soil descriptions were made in the field. An additional 111 points were added by case-based reasoning using aerial photo interpretation. As RF makes classification decisions from the mode of many independently grown trees, model uncertainty can be derived. The overall out of the bag (OOB) error was lower without weighting of classes; weighting increased the overall OOB error and the resulting output did not reflect soil-landscape relationships observed in the field. The final RF model had an OOB error of 55.2% and predicted soils on landforms consistent with soil-landscape relationships. The OOB error for individual classes typically decreased with increasing class size. In addition to the final classification, I determined the second and third most likely classification, model confidence, and the hypothetical extent of individual classes. 
Pixels that had high possibility of belonging to multiple soil classes were aggregated using a minimum confidence value based on limiting soil features, which is an effective and objective method of determining membership in soil map unit associations and complexes mapped at the 1:24,000 scale. Variables derived from both DEM and Landsat 7 ETM+ sources were important for predicting soil classes based on Gini and standard measures of variable importance and OOB errors from groves grown with exclusively DEM- or Landsat-derived data. Random forests was a powerful predictor of soil classes and produced outputs that facilitated further understanding of soil-landscape relationships. Beaver County Landsat 7 model confidence random forests soil survey agriculture Geographic Information Sciences Soil Science
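The vote-based outputs mentioned in this abstract (the most likely class, the second and third most likely classes, and model confidence) can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `rank_classes`, the soil class labels, and the use of the per-class vote share as the confidence measure are all hypothetical.

```python
from collections import Counter


def rank_classes(tree_votes, top_k=3):
    """Return the top_k (class, vote_share) pairs for one pixel.

    tree_votes holds one predicted class label per tree in the forest.
    The vote share is used here as a simple stand-in for model confidence
    (an assumption, not necessarily the measure used in the thesis).
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    # most_common() orders classes by descending vote count
    return [(label, n / total) for label, n in counts.most_common(top_k)]


# Hypothetical votes from a 10-tree forest for a single pixel
votes = ["soil_A"] * 6 + ["soil_B"] * 3 + ["soil_C"] * 1
ranking = rank_classes(votes)
print(ranking)
```

A pixel whose top two vote shares are close is a candidate for the kind of multi-class aggregation the abstract describes for map unit associations and complexes.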
32	Fire Environment Analysis at Army Garrison Camp Williams in Relation to Fire Behavior Potential for Gauging Fuel Modification Needs Frost, Scott M. 01 May 2015 (has links) Large fires (400 ha +) occur about every seven to ten years in the vegetation types located at US Army Garrison Camp Williams (AGCW) practice range located near South Jordan, Utah. In 2010 and 2012, wildfires burned beyond the Camp’s boundaries into the wildland-urban interface. The political and public reaction to these fire escapes was intense. Researchers at Utah State University were asked to organize a system of fuel treatments that could be developed to prevent future escapes. The first step of evaluation was to spatially predict fuel model types derived from a random forests classification approach. Fuel types were mapped according to fire behavior fuel models with an overall validation of 72.3% at 0.5 m resolution. Next, using a combination of empirical and semi-empirical based methods, potential fire behavior was analyzed for the dominant vegetation types at AGCW on a climatological basis. Results suggest the need for removal of woody vegetation within 20 m of firebreaks and a minimum firebreak width of 8 m in grassland fuels. In Utah juniper (Juniperus osteosperma (Torr.) Little), results suggest canopy coverage of 25% or less while in Gambel oak (Quercus gambelii Nutt.) stands along the northern boundary of the installation, a fuelbreak width of 60 m for secondary breaks and 90 m for primary breaks is recommended. firebreak fire environment fuelbreak fuel model random forests Sage-Steppe Forest Sciences
33	Ensembles for Distributed Data Shoemaker, Larry 21 October 2005 (has links) Many simulation data sets are so massive that they must be distributed among disk farms attached to different computing nodes. The data is partitioned into spatially disjoint sets that are not easily transferable among nodes due to bandwidth limitations. Conventional machine learning methods are not designed for this type of data distribution. Experts mark a training data set with different levels of saliency emphasizing speed rather than accuracy due to the size of the task. The challenge is to develop machine learning methods that learn how the expert has marked the training data so that similar test data sets can be marked more efficiently. Ensembles of machine learning classifiers are typically more accurate than individual classifiers. An ensemble of machine learning classifiers requires substantially less memory than the corresponding partition of the data set. This allows the transfer of ensembles among partitions. If all the ensembles are sent to each partition, they can vote for a level of saliency for each example in the partition. Different partitions of the data set may not have any salient points, especially if the data set has a time step dimension. This means the learned classifier for such partitions can not vote for saliency since they have not been trained to recognize it. In this work, we investigate the performance of different ensembles of classifiers on spatially partitioned data sets. Success is measured by the correct recognition of unknown and salient regions of data points. Random forests Nearest centroid Exodus ParaView Region labeling American Studies Arts and Humanities
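The voting scheme described in this abstract, and the problem of partitions that contain no salient points, can be illustrated with a small sketch. It assumes one nearest-centroid classifier per partition (nearest centroid is listed among the keywords, but the exact setup here is hypothetical), and the names `nearest_centroid`, `vote`, and the centroid values are invented for illustration.

```python
from collections import Counter


def nearest_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )


def vote(models, x):
    """Majority vote over the classifiers learned on each partition."""
    ballots = Counter(nearest_centroid(m, x) for m in models)
    return ballots.most_common(1)[0][0]


# One hypothetical model per partition: label -> centroid
models = [
    {"salient": (1.0, 1.0), "normal": (0.0, 0.0)},
    {"salient": (0.9, 1.1), "normal": (0.1, 0.0)},
    {"normal": (0.0, 0.1)},  # partition with no salient training points
]

label_near_salient = vote(models, (0.95, 1.0))
label_near_normal = vote(models, (0.05, 0.0))
print(label_near_salient, label_near_normal)
```

The third model can only ever answer "normal", which is the limitation the abstract points out: a classifier trained on a partition without salient examples cannot vote for saliency.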
34	Breeding Ecology of the Egyptian Vulture (Neophron percnopterus) Population in Beypazari Sen, Bilgecan 01 December 2012 (has links) (PDF) The aim of this study was to determine the habitat features affecting nest site selection and breeding success of the endangered Egyptian Vultures (Neophron percnopterus) breeding around the town of Beypazari. We searched for and monitored nest sites in the study area (750 km²) in 2010 and 2011. Differences in habitat features between nest sites and random points distributed along cliffs, and between successful and failed nest sites, were investigated using both parametric approaches and machine learning methods with 21 habitat variables. The size of the Beypazari population of Egyptian Vultures was estimated to be 45 pairs. Seventeen nests in 2010 and 37 nests in 2011 were found and monitored. The breeding success of the population was estimated to be 100% in 2010 and 70% in 2011. Random Forests was the modeling technique with the highest accuracy, and the modeling process chose 6 and 4 variables affecting nest site selection and breeding success of the species, respectively. Results showed that human impact was a potential factor governing the distribution of nest sites in the area and increased the probability of breeding failure: vultures clearly preferred to nest away from nearby villages, towns and roads, and nests on lower cliffs and nests close to the dump site (and therefore the town center) were prone to failure. Use of the elevation gradient and aspect showed trends similar to other populations of the species, with the probability of nesting increasing at lower altitudes and on south-facing cliffs. The overall results emphasize the potential conflict between human presence and the population of Egyptian Vultures in the area. Continuous monitoring of the nest sites and conservation activities aimed at raising public awareness are advised. QH General and Animal Ecology 540-549.5
35	Texplore : temporal difference reinforcement learning for robots and time-constrained domains / Temporal difference reinforcement learning for robots and time-constrained domains Hester, Todd 30 January 2013 (has links) Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This dissertation identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous state features; 3) it must handle sensor and/or actuator delays; and 4) it should continually select actions in real time. This dissertation focuses on addressing all four of these challenges. In particular, this dissertation is focused on time-constrained domains where the first challenge is critically important. In these domains, the agent's lifetime is not long enough for it to explore the domain thoroughly, and it must learn in very few samples. Although existing RL algorithms successfully address one or more of the RL for Robotics Challenges, no prior algorithm addresses all four of them. To fill this gap, this dissertation introduces TEXPLORE, the first algorithm to address all four challenges. TEXPLORE is a model-based RL method that learns a random forest model of the domain which generalizes dynamics to unseen states. 
Each tree in the random forest model represents a hypothesis of the domain's true dynamics, and the agent uses these hypotheses to explore states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate each component of TEXPLORE in comparison with other state-of-the-art approaches. In addition, we present modifications of TEXPLORE's exploration mechanism for different types of domains. The key result of this dissertation is a demonstration of TEXPLORE learning to control the velocity of an autonomous vehicle on-line, in real time, while running on-board the robot. After controlling the vehicle for only two minutes, TEXPLORE is able to learn to move the pedals of the vehicle to drive at the desired velocities. The work presented in this dissertation represents an important step towards applying RL to robotics and enabling robots to perform more tasks in society. By enabling robots to learn in few actions while acting on-line in real time on robots with continuous state and actuator delays, TEXPLORE significantly broadens the applicability of RL to robots. / text Reinforcement learning Robotics Machine learning Artificial intelligence Markov Decision Processes Random forests
36	On pruning and feature engineering in Random Forests Fawagreh, Khaled January 2016 (has links) Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for optimizing RF further by improving its accuracy. This explains why there have been many extensions of RF, each employing a variety of techniques and strategies to improve certain aspects of RF. The main focus of this dissertation is to develop new extensions of RF using optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques to RF produced four extensions, which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, demonstrating the potential of the proposed methods. Performance-wise, CLUB-DRF is ranked first in terms of accuracy and classification speed, making it ideal for real-time applications and for machines/devices with limited memory and processing power. 004
37	MAPPING RIPARIAN BUFFER ZONES IN CYPRESS CREEK REFUGE, ILLINOIS: LAND USE CHANGE IMPACT ON HABITAT USAGE FROM 1984-2014: PASSERINE PRESENCE AND CLASSIFICATION COMPARISONS Burck, Michael Theodore 01 December 2017 (has links) In response to recent declines, forested riparian wetland areas have become an area of increased conservation and management concern, with a focus on increasing biodiversity and promoting healthy ecosystem services. Additionally, passerine birds have also experienced a sharp global decline in that associated habitat. To mitigate further declines of both habitat and species numbers, government programs and agencies have intensified conservation efforts. However, the practices employed are often assumed to be beneficial without conducting dedicated surveys to measure the efficacy and practicality of current approaches. As such, visual evidence and statistics are often needed to promote or validate further support and funding for continuing with current policies or creating new focal areas and practices. This study strives to provide an inexpensive, efficient way to assess conservation areas based on a target species through a generalized and adaptive methodology. The Cypress Creek National Wildlife Refuge in southern Illinois provides an opportunity to do just that with a focus on songbirds. The methodology outlined in this study implements multiple remote sensing land use and land cover classification techniques utilizing Landsat imagery from 1984 to 2014 to create a temporal analysis of the region from the pre-refuge era to the current refuge-designated era. Field surveys from the 2015 songbird summer breeding and fall migration seasons, as well as vegetation surveys for field-truthing, supplement the remote sensing results. The classification methodology includes a combination of pan-sharpening Landsat images to a 15 m x 15 m spatial resolution, texture analysis, object based image analysis, and Random Forests to produce land use and land cover maps. 
For the sake of comparison, the same classification process is performed with the untransformed source images at 30 m x 30 m spatial resolution. Landscape metrics such as the interspersion and juxtaposition index and the contiguity index also provide further insight into temporal landscape patterns. At the completion of the study it was found that there was a minimal difference between the overall classification accuracy of transformed and untransformed images, and that the lowest overall accuracy in the study was 91% while the highest was 98%. The key survey statistics showed that during the summer and fall observation periods songbirds in forested wetland areas had a propensity to utilize areas closest to the wetland edge as opposed to inland areas. Furthermore, during fall migration it was concluded that the mixed forest habitat type had a direct effect on observation numbers. Overall, with the aid of multiple landscape metrics, it was shown that the region was increasing in forested area, patch density, and contiguity; in response, the passerines were using the area at a high rate, especially near wetland edges, creating a sustainable focal area for conservation and management. The methodology and results in this study contribute to an ongoing effort to provide visual and statistical evidence that is reliable and accessible for policy making. The potential to manipulate the generalized methods used in this study to enhance any land use and land class classifications and apply to any targeted species certainly exists. Future studies will want to investigate the use of higher spatial resolution images or actively take reflectance recordings in the field and supplement the temporal maps with a multi-year dedicated species dataset for maximum benefit. object image analysis passerine random forests riparian temporal scale factors texture analysis
38	Spatio-Temporal Data Mining to Detect Changes and Clusters in Trajectories January 2012 (has links) abstract: With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. 
Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories. / Dissertation/Thesis / Ph.D. Industrial Engineering 2012 Industrial engineering Computer science Change Detection Clustering Hidden Markov Models Outlier Detection Random Forests Trajectories
39	New Insights into Decision Trees Ensembles / Nouveaux apports dans l'apprentissage par ensembles d'arbres Pisetta, Vincent 28 March 2012 (has links) Decision tree ensembles are among the most popular tools in machine learning. 
Nevertheless, their theoretical properties and their empirical performance remain under active investigation. In this thesis, we propose to shed light on these methods. More precisely, after describing the current theoretical understanding of three main ensemble schemes, Random Forests, Boosting and Stochastic Discrimination (chapter 1), we give an analysis supporting the existence of a common reason for the success of these three principles (chapter 2). This analysis treats the first two moments of the margin as an essential ingredient for obtaining strong learning abilities. From there, we derive a new ensemble algorithm called OSS (Oriented Sub-Sampling) whose steps follow directly from the point of view we introduce. The empirical performance of OSS is superior to that of currently popular algorithms such as Random Forests and AdaBoost. In a third part (chapter 3), we analyze Random Forests from a “kernel” point of view. This view allows us to understand and observe the underlying regularization mechanism of these kinds of methods. Adopting the kernel point of view also enables us to improve the predictive performance of Random Forests using popular post-processing techniques such as SVM and multiple kernel learning. In conjunction with Random Forests, they show greatly improved performance and are able to prune the ensemble by keeping only a small fraction of the initial base learners. Ensemble methods Boosting Random Forests Stochastic Discrimination
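The "first two moments of the margin" referred to in this abstract can be made concrete with a short sketch. It assumes the common voting-margin definition (share of votes for the true class minus the largest share for any other class), which may differ in detail from the definition used in the thesis; the function name `margin` and the example votes are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def margin(votes, true_label):
    """Voting margin of an ensemble on one example.

    Assumed definition: share of votes for the true class minus the
    largest share received by any other class. Positive means the
    ensemble classifies the example correctly.
    """
    counts = Counter(votes)
    total = len(votes)
    correct = counts[true_label] / total
    wrong = max(
        (n for label, n in counts.items() if label != true_label), default=0
    ) / total
    return correct - wrong


# Hypothetical votes of a 4-member ensemble on three examples of class "a"
examples = [
    (["a", "a", "a", "b"], "a"),
    (["a", "b", "b", "b"], "a"),
    (["a", "a", "a", "a"], "a"),
]
margins = [margin(v, y) for v, y in examples]

first_moment = mean(margins)        # average margin over the examples
second_moment = pvariance(margins)  # spread of the margin
print(margins, first_moment, second_moment)
```

Under this reading, an ensemble with a high average margin and a low margin variance is the kind the abstract associates with good performance.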
40	Random Forests for CUDA GPUs Lapajne, Mikael Hellborg, Slat, Daniel January 2010 (has links) Context. Machine learning is a complex and resource-consuming process that requires a lot of computing power. With the constant growth of information, the need for efficient, high-performance algorithms is increasing. Today's commodity graphics cards are parallel multiprocessors with high computing capacity at an attractive price and are usually pre-installed in new PCs. The graphics cards provide an additional resource to be used in machine learning applications. The Random Forest learning algorithm, which has been shown to be competitive within machine learning, has good potential for a performance increase through parallelization. Objectives. In this study we implement and review a revised Random Forest algorithm for GPU execution using CUDA. Methods. A review of previous work in the area has been done by studying articles from several sources, including Compendex, Inspec, IEEE Xplore, ACM Digital Library and Springer Link. Additional information regarding GPU architecture and implementation-specific details has been obtained mainly from documentation available from Nvidia and the Nvidia developer forums. The implemented algorithm has been benchmarked and compared with two state-of-the-art CPU implementations of the Random Forest algorithm, with regard to both time consumed for training and classification and classification accuracy. Results. Measurements from benchmarks of the three different algorithms are gathered, showing the performance of the algorithms on two publicly available data sets. Conclusion. We conclude that our implementation, under the right conditions, is able to outperform its competitors. We also conclude that this is only true for certain data sets, depending on the size of the data sets. 
Moreover, we conclude that there is potential for further improvement of the algorithm, both in performance and in adaptation to a wider range of real-world applications. CUDA Random forests Parallel computing Graphics processing units Software Engineering Programvaruteknik
