Global ETD Search

21	REMOTE SENSING BASED DETECTION OF FORESTED WETLANDS: AN EVALUATION OF LIDAR, AERIAL IMAGERY, AND THEIR DATA FUSION Suiter, Ashley E. 01 May 2015 Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However, in forested wetlands, hydrology is obscured by tree canopy, making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetland types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology. However, few studies have addressed the use of LiDAR intensity data for detecting hydrology in forested wetlands. Due to the tendency of the LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of LiDAR-derived data with multi-spectral imagery increased the accuracy of forested wetland classification compared with a classification performed with only multi-spectral imagery. Four classifications were performed: Classification A - All Imagery, Classification B - All LiDAR, Classification C - LiDAR without Intensity, and Classification D - Fusion of All Data. These classifications were performed using random forest and each resulted in a 3-foot resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using the Kappa Coefficient of Agreement. Importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. 
Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. It was found that Classification B, performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index), was the most accurate classification with Kappa: 78.04%, indicating moderate to strong agreement. However, Classification C, performed with LiDAR derivatives without intensity data, had less agreement than would be expected by chance, indicating that LiDAR intensity contributed significantly to the accuracy of Classification B. Accuracy Assessment Aerial Imagery Data Fusion LiDAR Random Forest Wetlands
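The Kappa Coefficient of Agreement mentioned in this abstract can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from the thesis; the function name `cohen_kappa` is an assumption made for illustration. A negative result corresponds to "less agreement than would be expected by chance".

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical upland/wetland counts, for illustration only.
strong = [[45, 5], [6, 44]]
worse_than_chance = [[20, 30], [30, 20]]
print(round(cohen_kappa(strong), 4))
print(round(cohen_kappa(worse_than_chance), 4))
```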
22	Novel Methods of Biomarker Discovery and Predictive Modeling using Random Forest January 2017 abstract: Random forest (RF) is a popular and powerful technique. It can be used for classification, regression and unsupervised clustering. In its original form introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent research has proposed several methods based on RF for feature selection and for generating prediction intervals. However, they are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset, and used as the basis for two novel methods for biomarker discovery and generating prediction intervals. Firstly, a biodosimetry is developed using RF to determine absorbed radiation dose from gene expression measured from blood samples of potentially exposed individuals. To improve the prediction accuracy of the biodosimetry, day-specific models were built to deal with the day interaction effect and a technique of nested modeling was proposed. The nested models can fit this complex data of large variability and non-linear relationships. Secondly, a panel of biomarkers was selected using a data-driven feature selection method as well as hand-picking, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method can incorporate domain knowledge as a penalty term to regulate selection of candidate features in RF. It adds more flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide selection of biomarkers. The method can also compete with existing methods using intrinsic data characteristics as an alternative to domain knowledge in simulated datasets. 
Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is widely applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as benchmark and simulated datasets. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2017 Biostatistics feature selection prediction interval predictive modeling random forest
23	Can methods of machine learning be used to better predict lactation curves for bovines? Östling, Andreas January 2017 A random forest is compared to an OLS model for predicting lactation curves for cows. Both of the methods have been estimated and tested using data from the period 2015-01 to 2015-09. Random forests outperform OLS in testing for modeling lactation curves with a decrease in MSE by approximately 26%. Data is provided by Sveriges Lantbruksuniversitet and includes 75 558 milking events from 320 cows. The date of the milking, the time of day when the milking occurred as well as which cow was milked were found to be important variables for accurate predictions. random forest lactation curves Probability Theory and Statistics Sannolikhetsteori och statistik
24	Detección de anomalías en componentes mecánicos en base a Deep Learning y Random Cut Forests / Anomaly Detection in Mechanical Components Based on Deep Learning and Random Cut Forests Aichele Figueroa, Diego Andrés January 2019 Thesis submitted for the degree of Ingeniero Civil Mecánico (Mechanical Civil Engineer) / In maintenance, monitoring a piece of equipment can be very useful because it flags any anomaly in the equipment's internal operation, so that a defect can be corrected before a more serious failure occurs. In data mining, anomaly detection is the task of identifying anomalous elements, that is, elements that differ from what is common within a dataset. Anomaly detection is applied in many domains; for example, banks use it today to detect fraudulent purchases and possible scams from a user's behavior pattern. Because this requires handling large volumes of data, its development within probabilistic machine learning is essential. A variety of algorithms have been developed to find anomalies; one of the best known is Isolation Forest, which is based on decision trees. Several works derived from the Isolation Forest algorithm propose improvements to it, such as Robust Random Cut Forest, which improves accuracy in finding anomalies and also offers the advantage of supporting dynamic analysis of data and anomaly detection in real time. On the other hand, it has the drawback that the more attributes the datasets contain, the more computation time it needs to detect an anomaly. Therefore, an attribute reduction method, also known as dimensionality reduction, is used, and its effect on effectiveness and efficiency is studied relative to the algorithm run without reducing the dimension of the data. This thesis analyzes the Robust Random Cut Forest algorithm in order to propose a possible improvement to it. 
To test the algorithm, an experiment is carried out on steel bars, yielding their vibrations when excited by white noise. These data are processed in three different scenarios: no dimensionality reduction, principal component analysis, and autoencoder. The first scenario (no dimensionality reduction) serves as a baseline for seeing how scenarios two and three differ in anomaly detection, in both effectiveness and efficiency. The study is then carried out across the three scenarios to detect anomalous points. The results show that reducing the dimensions improves both computation time (efficiency) and accuracy (effectiveness) in finding an anomaly; the best results are obtained with principal component analysis. Fault location (Engineering) Machinery - Parts - Failures Random Forest Deep learning
25	Fall Risk Classification for People with Lower Extremity Amputations Using Machine Learning and Smartphone Sensor Features from a 6-Minute Walk Test Daines, Kyle 04 September 2020 Falls are a leading cause of injury and accidental injury death worldwide. Fall-risk prevention techniques exist but fall-risk identification can be difficult. While clinical assessment tools are the standard for identifying fall risk, wearable sensors and machine learning could improve outcomes with automated and efficient techniques. Machine learning research has focused on older adults. Since people with lower limb amputations have greater falling and injury risk than the elderly, research is needed to evaluate these approaches with the amputee population. In this thesis, random forest and fully connected feedforward artificial neural network (ANN) machine learning models were developed and optimized for fall-risk identification in amputee populations, using smartphone sensor data (phone at posterior pelvis) from 89 people with various levels of lower-limb amputation who completed a 6-minute walk test (6MWT). The best model was a random forest with 500 trees, using turn data and a feature set selected using correlation-based feature selection (81.3% accuracy, 57.2% sensitivity, 94.9% specificity, 0.59 Matthews correlation coefficient, 0.83 F1 score). After extensive ANN optimization with the best ranked 50 features from an Extra Trees Classifier, the best ANN model achieved 69.7% accuracy, 53.1% sensitivity, 78.9% specificity, 0.33 Matthews correlation coefficient, and 0.62 F1 score. Features from a single smartphone during a 6MWT can be used with random forest machine learning for fall-risk classification in lower limb amputees. Model performance was similarly effective or better than the Timed Up and Go and Four Square Step Test. 
This model could be used clinically to identify fall-risk individuals during a 6MWT, thereby finding people who had not been targeted for fall screening. Since model specificity was very high, the risk of misclassifying people who are not at risk of falling is quite low, and few people would incorrectly be entered into fall mitigation programs based on the test outcomes. Amputee Artificial Neural Network Random Forest Fall risk
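The evaluation measures reported in this abstract (accuracy, sensitivity, specificity, Matthews correlation coefficient, F1 score) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the counts are hypothetical and are not the thesis data, and the thesis may have averaged some measures differently.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
for name, value in binary_metrics(tp=8, fp=2, tn=8, fn=2).items():
    print(f"{name}: {value:.3f}")
```

High specificity with modest sensitivity, as reported above, means few false positives but a larger share of missed fall-risk cases.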
26	Incorporating Sliding Window-Based Aggregation for Evaluating Topographic Variables in Geographic Information Systems Gomes, Rahul January 2019 The resolution of spatial data has increased over the past decade, making them more accurate in depicting landform features. From 60 m resolution Landsat imagery to the near-meter resolution provided by data from Unmanned Aerial Systems, the number of pixels per area has increased drastically. Topographic features derived from high resolution remote sensing are relevant to measuring agricultural yield. However, conventional algorithms in Geographic Information Systems (GIS) used for processing digital elevation models (DEM) have severe limitations. Typically, 3-by-3 window sizes are used for evaluating the slope, aspect and curvature. Since this window size is very small compared to the resolution of the DEM, DEMs are mostly resampled to a lower resolution to match the size of typical topographic features and decrease processing overheads. This results in low accuracy and limits the predictive ability of any model using such DEM data. In this dissertation, the landform attributes were derived over multiple scales using the concept of sliding window-based aggregation. Using aggregates from the previous iteration improves the efficiency from linear to logarithmic, thereby addressing scalability issues. The usefulness of DEM-derived topographic features within Random Forest models that predict agricultural yield was examined. The model utilized these derived topographic features and achieved the highest accuracy of 95.31% in predicting Normalized Difference Vegetation Index (NDVI) compared to 51.89% for window size 3-by-3 in the conventional method. The efficacy of partial dependence plots (PDP) in terms of interpretability was also assessed. 
This aggregation methodology could serve as a suitable replacement for conventional landform evaluation techniques which mostly rely on reducing the DEM data to a lower resolution prior to data processing. / National Science Foundation (Award OIA-1355466) DEM GIS NDVI partial dependence plots random forest sliding window
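The idea of reusing aggregates from the previous iteration can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the dissertation's algorithm: it assumes power-of-two window sizes, uses the window mean as the aggregate, and the function names `double_window` and `window_means` are hypothetical. Each doubling step combines four already-computed sums, so reaching a window of width w takes about log2(w) passes over the grid rather than summing w × w cells per position.

```python
def double_window(sums, half):
    """Combine sums over half-by-half windows into sums over windows twice as wide.

    sums[r][c] holds the sum of the window whose top-left cell is (r, c).
    """
    rows = len(sums) - half
    cols = len(sums[0]) - half
    return [
        [
            sums[r][c] + sums[r][c + half]
            + sums[r + half][c] + sums[r + half][c + half]
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def window_means(grid, levels):
    """Return {window_size: grid of window means} for sizes 1, 2, 4, ..."""
    sums = [[float(v) for v in row] for row in grid]
    size = 1
    result = {1: [row[:] for row in sums]}
    for _ in range(levels):
        sums = double_window(sums, size)  # reuse the previous level's sums
        size *= 2
        result[size] = [[v / (size * size) for v in row] for row in sums]
    return result


# Hypothetical 4x4 elevation grid, for illustration only.
dem = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
means = window_means(dem, levels=2)
print(means[2])
print(means[4])
```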
27	Detekce objektů na GPU / Object Detection on GPU Jurák, Martin January 2015 This thesis is focused on the acceleration of Random Forest object detection in an image. A Random Forest detector is an ensemble of independently evaluated random decision trees. This property can be used for acceleration on a graphics processing unit. The development and increasing performance of graphics processing units allow the use of GPUs for general-purpose computing (GPGPU). The goal of this thesis is to describe how to implement the Random Forest method on a GPU with the OpenCL standard.
28	Hledání anomálií v DNS provozu / Anomaly Detection in DNS Traffic Vraštiak, Pavel January 2012 This master's thesis was written in collaboration with the NIC.CZ company. It describes the basic principles of the DNS system and the properties of DNS traffic. Its goal is the implementation of a DNS anomaly classifier and its evaluation in practice.
29	Métodos paramétricos e não paramétricos para a predição de valores genéticos genômicos de características de importância econômica em suínos / Parametric and Nonparametric Methods for Predicting Genomic Breeding Values of Economically Important Traits in Pigs Joaquim, Letícia Borges. January 2019 Advisor: Danísio Prado Munari / Summary: Genomic selection has been used in several plant and animal breeding programs, providing gains in selection accuracy compared with traditional selection. However, it is important to note that the superiority of genomic selection depends on several factors, such as the methodology used to predict genomic breeding values. Besides the challenge of finding the best methodology for applying genomic selection, the high cost of implementation also hinders its widespread use in animal breeding programs, mainly in pigs and poultry. The objectives of this work were: (i) to evaluate the predictive ability of four different genomic selection methods for reproductive and productive traits; (ii) to use the “Random Forest” method to select the sets of SNP markers most relevant to explaining the phenotypes for the traits studied and to verify the impact of using these SNP sets on the accuracy of predicted genomic values for two economically important traits in pig production. The study used a pedigree file with 879,965 animals distributed across 13 generations and a phenotype file composed of 73,439 observations of backfat thickness (EG) and 69,505 records of number of piglets born alive per litter (NV). In addition, information from 969 animals of a Landrace-type female line genotyped with a custom panel of 57,692 markers distributed throughout ... (Full summary: click electronic access below) / Abstract: Genomic selection has been used in several plant and animal breeding programs providing selection accuracy gains when compared to traditional genetic analysis. 
However, it is important to emphasize that the superiority of genomic selection depends on several factors, such as the methodology for predicting genomic values. In addition to the challenge of finding the best model for application of genomic selection, the high cost of implementation also makes it difficult to use genomic selection in breeding programs of all animal species. Therefore, the aims of this study were: (i) to evaluate the prediction ability of four different genomic selection methods using reproductive and productive traits; (ii) to use Random Forest analysis to select the most important markers for the studied traits and to verify the impact of using these subsets of SNPs on the accuracy of predicted genomic values for two economically important traits in pig production. The pedigree files contained 879,965 individuals, which spanned up to 13 generations, and the phenotype file was composed of 73,439 backfat thickness observations and 69,505 litter records. A total of 969 animals of a Landrace female line were genotyped using a custom Illumina chip consisting of 57,692 SNPs distributed throughout the genome. Genomic selection was performed by applying the linear methods GBLUP (Genomic Best Linear Unbiased Prediction) and Single-step GBLUP and the nonlinear methods brnn (Bayesian Regularized Neural Network) and snnR (“Sparse Neural N... (Complete abstract click electronic access below) / Doctorate Brnn Backfat thickness Number of piglets born alive “Random Forest” SnnR
30	A Study on How Data Quality Influences Machine Learning Predictability and Interpretability for Tabular Data Ahsan, Humra 05 May 2022 No description available. Computer Science Machine Learning Categorical Data Random Forest Imputation
