Global ETD Search

21	Fall Risk Classification for People with Lower Extremity Amputations Using Machine Learning and Smartphone Sensor Features from a 6-Minute Walk Test Daines, Kyle 04 September 2020 Falls are a leading cause of injury and accidental injury death worldwide. Fall-risk prevention techniques exist, but fall-risk identification can be difficult. While clinical assessment tools are the standard for identifying fall risk, wearable sensors and machine learning could improve outcomes with automated and efficient techniques. Machine learning research has focused on older adults. Since people with lower limb amputations have greater falling and injury risk than the elderly, research is needed to evaluate these approaches with the amputee population. In this thesis, random forest and fully connected feedforward artificial neural network (ANN) machine learning models were developed and optimized for fall-risk identification in amputee populations, using smartphone sensor data (phone at posterior pelvis) from 89 people with various levels of lower-limb amputation who completed a 6-minute walk test (6MWT). The best model was a random forest with 500 trees, using turn data and a feature set selected using correlation-based feature selection (81.3% accuracy, 57.2% sensitivity, 94.9% specificity, 0.59 Matthews correlation coefficient, 0.83 F1 score). After extensive ANN optimization with the 50 best-ranked features from an Extra Trees Classifier, the best ANN model achieved 69.7% accuracy, 53.1% sensitivity, 78.9% specificity, 0.33 Matthews correlation coefficient, and 0.62 F1 score. Features from a single smartphone during a 6MWT can be used with random forest machine learning for fall-risk classification in lower limb amputees. Model performance was similar to or better than that of the Timed Up and Go and Four Square Step Test.
This model could be used clinically to identify individuals at risk of falling during a 6MWT, thereby finding people who had not been referred for fall screening. Because model specificity was very high, people without fall risk are unlikely to be misclassified, so few would be incorrectly entered into fall mitigation programs based on the test outcomes. Amputee Artificial Neural Network Random Forest Fall risk
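The metrics reported in this abstract (accuracy, sensitivity, specificity, Matthews correlation coefficient, F1) can all be derived from a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn: fall-risk cases classified correctly/incorrectly.
    tn/fp: non-fall-risk cases classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the thesis).
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=4)
```

High specificity with moderate sensitivity, as in the abstract, corresponds to few false positives (`fp`) but a noticeable number of missed fall-risk cases (`fn`).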
22	Incorporating Sliding Window-Based Aggregation for Evaluating Topographic Variables in Geographic Information Systems Gomes, Rahul January 2019 The resolution of spatial data has increased over the past decade, making the data more accurate in depicting landform features. From 60 m resolution Landsat imagery to the near-meter resolution provided by Unmanned Aerial Systems, the number of pixels per area has increased drastically. Topographic features derived from high-resolution remote sensing are relevant to measuring agricultural yield. However, conventional algorithms in Geographic Information Systems (GIS) used for processing digital elevation models (DEMs) have severe limitations. Typically, 3-by-3 windows are used for evaluating slope, aspect and curvature. Since this window size is very small compared to the resolution of the DEM, DEMs are mostly resampled to a lower resolution to match the size of typical topographic features and decrease processing overhead. This results in low accuracy and limits the predictive ability of any model using such DEM data. In this dissertation, the landform attributes were derived over multiple scales using the concept of sliding window-based aggregation. Reusing aggregates from the previous iteration improves the computational complexity from linear to logarithmic, thereby addressing scalability issues. The usefulness of DEM-derived topographic features within Random Forest models that predict agricultural yield was examined. The model utilized these derived topographic features and achieved its highest accuracy of 95.31% in predicting Normalized Difference Vegetation Index (NDVI), compared to 51.89% for the 3-by-3 window of the conventional method. The efficacy of partial dependence plots (PDP) in terms of interpretability was also assessed. 
This aggregation methodology could serve as a suitable replacement for conventional landform evaluation techniques which mostly rely on reducing the DEM data to a lower resolution prior to data processing. / National Science Foundation (Award OIA-1355466) DEM GIS NDVI partial dependence plots random forest sliding window
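One way to picture how reusing aggregates from the previous iteration yields logarithmic cost is window doubling: the mean over a window of side 2s is the average of four already-computed means over windows of side s. The sketch below applies this to mean elevation only. It is an illustration under that assumption, not the dissertation's algorithm, and the function name `aggregate_doubling` is hypothetical.

```python
def aggregate_doubling(grid, iterations):
    """Mean over sliding windows of side 2**iterations, built by repeated doubling.

    grid: list of equal-length rows of elevation values.
    Each iteration combines four aggregates from the previous iteration,
    so a window of side w needs about log2(w) passes instead of summing
    every cell in the window directly.
    """
    current = [list(row) for row in grid]
    size = 1  # side length of the window that `current` represents
    for _ in range(iterations):
        rows = len(current) - size
        cols = len(current[0]) - size
        if rows <= 0 or cols <= 0:
            raise ValueError("window is larger than the grid")
        current = [
            [
                (
                    current[r][c]                  # top-left sub-window
                    + current[r][c + size]         # top-right sub-window
                    + current[r + size][c]         # bottom-left sub-window
                    + current[r + size][c + size]  # bottom-right sub-window
                ) / 4.0
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        size *= 2
    return current


# Toy 4x4 "DEM" with values 0..15 (illustrative data only).
dem = [[4 * r + c for c in range(4)] for r in range(4)]
means_2x2 = aggregate_doubling(dem, 1)  # 3x3 grid of 2-by-2 window means
means_4x4 = aggregate_doubling(dem, 2)  # single 4-by-4 window mean
```

Slope, aspect, and curvature would need additional aggregated quantities beyond the mean; the doubling structure is the part this sketch is meant to show.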
23	Detekce objektů na GPU / Object Detection on GPU Jurák, Martin January 2015 This thesis focuses on accelerating Random Forest object detection in images. A Random Forest detector is an ensemble of independently evaluated random decision trees, a property that can be exploited for acceleration on a graphics processing unit (GPU). The development and increasing performance of GPUs allow their use for general-purpose computing (GPGPU). The goal of this thesis is to describe how to implement the Random Forest method on a GPU using the OpenCL standard.
24	Hledání anomálií v DNS provozu / Anomaly Detection in DNS Traffic Vraštiak, Pavel January 2012 This master's thesis was written in collaboration with the company NIC.CZ. It describes the basic principles of the DNS system and the properties of DNS traffic. Its goal is to implement a DNS anomaly classifier and evaluate it in practice.
25	Métodos paramétricos e não paramétricos para a predição de valores genéticos genômicos de características de importância econômica em suínos / Parametric and nonparametric methods for predicting genomic breeding values of economically important traits in pigs Joaquim, Letícia Borges. January 2019 Advisor: Danísio Prado Munari / Summary: Genomic selection has been used in several plant and animal breeding programs, providing gains in selection accuracy compared to traditional selection. However, it is important to note that the superiority of genomic selection depends on several factors, such as the methodology used to predict genomic breeding values. Besides the challenge of finding the best methodology for applying genomic selection, the high cost of implementation also hinders its wide use in animal breeding programs, mainly in pigs and poultry. The objectives of this work were: (i) to evaluate the predictive ability of four different genomic selection methods for reproductive and productive traits; (ii) to use the “Random Forest” method to select the sets of SNP markers most relevant to explaining the phenotypes for the studied traits, and to verify the impact of using these SNP sets on the accuracy of predicted genomic values for two economically important traits in pig production. The study used a pedigree file with 879,965 animals distributed over 13 generations and a phenotype file composed of 73,439 observations of backfat thickness (EG) and 69,505 records of number of piglets born alive per litter (NV). In addition, information on 969 animals of a Landrace-type female line genotyped with a custom panel of 57,692 markers distributed throughout ... (Full summary: click electronic access below) / Abstract: Genomic selection has been used in several plant and animal breeding programs providing selection accuracy gains when compared to traditional genetic analysis. 
However, it is important to emphasize that the superiority of genomic selection depends on several factors, such as the methodology for predicting genomic values. In addition to the challenge of finding the best model for applying genomic selection, the high cost of implementation also makes it difficult to use genomic selection in breeding programs of all animal species. Therefore, the aims of this study were: (i) to evaluate the prediction ability of four different genomic selection methods using reproductive and productive traits; (ii) to use Random Forest analysis to select the most important markers for the studied traits and to verify the impact of using these subsets of SNPs on the accuracy of predicted genomic values for two economically important traits in pig production. The pedigree file contained 879,965 individuals spanning up to 13 generations, and the phenotype file comprised 73,439 backfat thickness observations and 69,505 litter records. A total of 969 animals of a Landrace female line were genotyped using a custom Illumina chip consisting of 57,692 SNPs distributed throughout the genome. Genomic selection was performed by applying the linear methods GBLUP (Genomic Best Linear Unbiased Prediction) and Single-step GBLUP and the nonlinear methods brnn (Bayesian Regularized Neural Network) and snnR (“Sparse Neural N... (Complete abstract click electronic access below) / Doctorate Brnn Backfat thickness Number of piglets born alive “Random Forest” SnnR
26	A Study on How Data Quality Influences Machine Learning Predictability and Interpretability for Tabular Data Ahsan, Humra 05 May 2022 No description available. Computer Science Machine Learning Categorical Data Random Forest Imputation
27	Learning Decision Trees and Random Forests from Histogram Data : An application to component failure prediction for heavy duty trucks Gurung, Ram Bahadur January 2017 Large volumes of data have become commonplace in many domains. Machine learning algorithms can be trained to look for useful hidden patterns in such data. Sometimes these big data need to be summarized to a manageable size, for example by using histograms. Traditionally, machine learning algorithms are trained on data expressed as real numbers and/or categories, but not on complex structures such as histograms. Since machine learning algorithms that can learn from histogram data have not been explored to a major extent, this thesis intends to explore this domain further. The thesis is limited to classification algorithms, in particular tree-based classifiers such as decision trees and random forests. Decision trees are among the simplest and most intuitive algorithms to train. A single decision tree might not be the best algorithm in terms of predictive performance, but performance can be greatly enhanced by combining an ensemble of many diverse trees into a random forest. This is why both algorithms were considered. The objective of this thesis is to investigate how these algorithms can be adapted to learn better from histogram data. The proposed approach considers multiple bins of a histogram simultaneously to split a node during the tree induction process. Treating bins simultaneously is expected to capture dependencies among them, which could be useful. The proposed approaches were evaluated experimentally by comparing them with the standard approach of growing a tree, in which a single bin is used to split a node. 
Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC), along with the average time taken to train a model, were used for comparison. For experimental purposes, real-world data from a large fleet of heavy duty trucks were used to build a component-failure prediction model. These data contain information about the operation of trucks over the years, where most operational features are summarized as histograms. Further experiments were performed on a synthetically generated dataset. The experiments showed that the proposed approach outperforms the standard approach in predictive performance and model compactness but lags behind in training time. This thesis was motivated by a real-life problem encountered in the operation of heavy duty trucks in the automotive industry while building a data-driven failure-prediction model. The collection and cleansing of the data, and the challenges encountered in preparing the data for training, are therefore presented in detail. histogram decision trees histogram random forest prognostics Computer Systems Datorsystem
28	The Factors Affecting Wind Erosion in Southern Utah Ozturk, Mehmet 01 August 2019 Wind erosion is a global issue affecting millions of people in drylands by causing environmental issues (acceleration of snow melting), public health concerns (respiratory diseases), and socioeconomic problems (costs of damage and of cleaning public property after dust storms). Disturbances in drylands can be irreversible, leading to natural disasters such as the 1930s Dust Bowl. With increasing attention on aeolian research, many studies have been conducted using ground-based measurements or wind tunnels. Ground-based measurements are important for validating model predictions and testing the effects and interactions of different factors known to affect wind erosion. Here, a machine-learning model (random forest) was used to describe sediment flux as a function of wind speed, soil moisture, precipitation, soil roughness, soil crusts, and soil texture. Model performance was compared to previous results before analyzing four new years of sediment flux data and adding estimates of soil moisture to the model. The random forest model provided a better result than a regression tree, with a higher variance explained (7.5% improvement). With additional soil moisture data, the model performance increased by 13.13%. With the full dataset, the model provided an increase of 30.50% in total performance compared to the previous study. This research is one of the rare studies that use a large-scale network of BSNEs and a long time series of data to quantify seasonal sediment flux under different soil covers in southern Utah. The results will also help managers control the effects of wind erosion, scientists choose variables for further modeling, and local people increase public awareness of the effects of wind erosion. wind erosion sediment flux bsne random forest model Environmental Sciences
29	Transfer learning for medication adherence prediction from social forums self-reported data Haas, Kyle D. 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Medication non-adherence and non-compliance left unaddressed can compound into severe medical problems for patients. Identifying patients who are likely to become non-adherent can help reduce these problems. Despite these benefits, monitoring adherence at scale is cost-prohibitive. Social forums offer an easily accessible, affordable, and timely alternative to the traditional methods based on claims data. This study investigates the potential of medication adherence prediction based on social forum data for diabetes and fibromyalgia therapies by using transfer learning from the Medical Expenditure Panel Survey (MEPS). Predictive adherence models are developed using both survey and social forum data and different random forest (RF) techniques. The first of these implementations uses binned inputs from k-means clustering. The second technique is based on ternary trees instead of the widely used binary decision trees. These techniques are able to handle missing data, a prevalent characteristic of social forum data. The results of this study show that transfer learning between survey models and social forum models is possible. Using MEPS survey data and the techniques listed above to derive RF models, a difference in accuracy of less than 5% was observed between the MEPS test dataset and the social forum test dataset. Along with these RF techniques, another RF implementation with imputed means for the missing values was developed and shown to predict adherence for social forum patients with an accuracy >70%. This thesis shows that a model trained with verified survey data can be used to complement traditional medication adherence models by predicting adherence from unverified, self-reported data in a dynamic and timely manner. 
Furthermore, this model provides a method for discovering objective insights from subjective social reports. Additional investigation is needed to improve the prediction accuracy of the proposed model and to assess biases that may be inherent to self-reported adherence measures in social health networks. MEPS Medication adherence Social forum Random forest Transfer learning
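The mean-imputation variant mentioned in this abstract can be illustrated with a minimal sketch. It assumes missing values are encoded as `None` and that each gap is filled with the mean of the observed values in the same column. The function name `impute_column_means` and the toy data are hypothetical, and this is not the thesis's implementation.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in each column.

    rows: list of equal-length feature rows; None marks a missing value.
    A column with no observed values falls back to 0.0 (an arbitrary choice).
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    # Build a new table so the input rows are left unchanged.
    return [
        [means[c] if value is None else value for c, value in enumerate(row)]
        for row in rows
    ]


# Toy feature table with gaps (illustrative values only).
features = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_column_means(features)
```

When imputation precedes model training, the column means are usually computed on the training split only and then reused for test data, so that no test information leaks into the model.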
30	Evaluace geografickeho Random Forest algoritmu v posouzení sucha / Geographical Random Forest model evaluation in agricultural drought assessment Bicák, Daniel January 2021 Drought is a natural disaster that negatively affects millions of people and causes huge economic losses. This thesis investigates agricultural drought in Czechia using machine learning algorithms. The statistical models used were Random Forest (RF), Geographical Random Forest (GRF) and Locally Tuned Geographical Random Forest (LT GRF). GRF consists of several RF models, each trained on a subset of the original data. The final prediction is a weighted sum of the predictions of a local and a global model. The size of the subset is determined by a tunable parameter. LT GRF addresses spatial variability of subset size and local weight. During the tuning process, optimal parameters are found for every location and then interpolated for unknown regions. The thesis aims to evaluate the performance of each model and compare the GRF feature importance output with that of the global model. The meteorological feature importances of the best model are used to create a drought vulnerability map of Czechia. The resulting assessment is compared to existing drought vulnerability projects.
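The weighted sum of local and global predictions described in this abstract can be sketched as follows. The sketch assumes the two weights sum to one (a convex combination), which the abstract does not state, and the function name `blend_predictions` and the example values are hypothetical.

```python
def blend_predictions(local_pred, global_pred, local_weight):
    """Combine a local-model and a global-model prediction as a weighted sum.

    Assumption: the global weight is 1 - local_weight, so local_weight = 1
    uses only the local model and local_weight = 0 only the global model.
    """
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    return local_weight * local_pred + (1.0 - local_weight) * global_pred


# Hypothetical predictions for a single location (illustrative values only).
blended = blend_predictions(local_pred=0.2, global_pred=0.6, local_weight=0.25)
```

In the locally tuned variant the abstract describes, `local_weight` would differ from location to location rather than being a single global setting.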
