Global ETD Search

31	Learning Decision Trees and Random Forests from Histogram Data : An application to component failure prediction for heavy duty trucks Gurung, Ram Bahadur January 2017 Large volumes of data have become commonplace in many domains. Machine learning algorithms can be trained to find useful hidden patterns in such data. Sometimes such big data must be summarized, for example as histograms, to bring it down to a manageable size. Traditionally, machine learning algorithms are trained on data expressed as real numbers and/or categories, not on complex structures such as histograms. Since machine learning algorithms that can learn from histogram data have not been explored to a major extent, this thesis intends to further explore this domain. The thesis is limited to classification algorithms, specifically tree-based classifiers: decision trees and random forests. Decision trees are among the simplest and most intuitive algorithms to train. A single decision tree might not be the best algorithm in terms of predictive performance, but performance can be greatly improved by combining many diverse trees into a random forest. This is why both algorithms were considered. The objective of this thesis is therefore to investigate how these algorithms can be adapted to learn better from histogram data. The proposed approach uses multiple bins of a histogram simultaneously to split a node during tree induction. Treating bins simultaneously is expected to capture dependencies among them, which could be useful. The proposed approaches were evaluated experimentally by comparing them with the standard approach of growing a tree, in which a single bin is used to split a node. 
Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC), along with the average time taken to train a model, were used for comparison. For the experiments, real-world data from a large fleet of heavy duty trucks were used to build a component-failure prediction model. These data describe the operation of trucks over the years, with most operational features summarized as histograms. Further experiments were performed on a synthetically generated dataset. The results show that the proposed approach outperforms the standard approach in predictive performance and model compactness but lags behind in training time. This thesis was motivated by a real-life problem encountered in the automotive industry while building a data-driven failure-prediction model for heavy duty trucks. Accordingly, the collection and cleansing of the data, and the challenges encountered in preparing the data for training, are presented in detail. histogram decision trees histogram random forest prognostics Computer Systems Datorsystem
32	Incorporating Climate Sensitivity for Southern Pine Species into the Forest Vegetation Simulator Shockey, Melissa Dawn 08 May 2013 Growing concerns over the possible effects of greenhouse-gas-related global warming on North American forests have led to increasing calls to address climate change effects on forest vegetation in management and planning applications. The objectives of this project are to model contemporary soil and climate conditions associated with the presence or absence and the abundance of five southern pine species: shortleaf pine (Pinus echinata Mill.), slash pine (P. elliottii Engelm.), longleaf pine (P. palustris Mill.), pond pine (P. serotina Michx.), and loblolly pine (P. taeda L.). Classification and regression Random Forest models were developed for the presence-absence and abundance data, respectively. Model diagnostics such as receiver operating characteristic (ROC) curves and variable importance plots were examined to assess model performance. Presence-absence classification models had out-of-bag error rates ranging from 6.32% to 16.06% and areas under the ROC curve ranging from 0.92 to 0.98. Regression models explained between 13.76% and 43.31% of the variation in abundance values. Using the models based on contemporary data, predictions were made for the years 2030, 2060, and 2090 under four different greenhouse gas emissions scenarios and three different general circulation models. Maps of future climate scenarios showed a range of potential changes in the geographic extent of the conditions consistent with current presence observations. Results of this work will be incorporated into eastern U.S. variants of the Forest Vegetation Simulator (FVS) model, similar to work that has been done for FVS variants in the West. / Master of Science Random Forest Classification Abundance Climate-Soils-Vegetation modeling Climate Change
33	The Factors Affecting Wind Erosion in Southern Utah Ozturk, Mehmet 01 August 2019 Wind erosion is a global issue affecting millions of people in drylands by causing environmental issues (acceleration of snow melting), public health concerns (respiratory diseases), and socioeconomic problems (costs of damages and of cleaning public properties after dust storms). Disturbances in drylands can be irreversible, leading to natural disasters such as the 1930s Dust Bowl. With increasing attention on aeolian research, many studies have been conducted using ground-based measurements or wind tunnels. Ground-based measurements are important for validating model predictions and for testing the effects and interactions of different factors known to affect wind erosion. Here, a machine-learning model (random forest) was used to describe sediment flux as a function of wind speed, soil moisture, precipitation, soil roughness, soil crusts, and soil texture. Model performance was compared to previous results before analyzing four new years of sediment flux data and adding estimates of soil moisture to the model. The random forest model gave a better result than a regression tree, with a higher variance explained (7.5% improvement). With additional soil moisture data, model performance increased by 13.13%. With the full dataset, the model provided an increase of 30.50% in total performance compared to the previous study. This research is one of the rare studies to use a large-scale network of BSNEs and a long time series of data to quantify seasonal sediment flux under different soil covers in southern Utah. The results will help managers control the effects of wind erosion, help scientists choose variables for further modeling, and help raise public awareness among local people about the effects of wind erosion. wind erosion sediment flux bsne random forest model Environmental Sciences
34	Transfer learning for medication adherence prediction from social forums self-reported data Haas, Kyle D. 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Medication non-adherence and non-compliance left unaddressed can compound into severe medical problems for patients. Identifying patients who are likely to become non-adherent can help reduce these problems. Despite these benefits, monitoring adherence at scale is cost-prohibitive. Social forums offer an easily accessible, affordable, and timely alternative to the traditional methods based on claims data. This study investigates the potential of medication adherence prediction based on social forum data for diabetes and fibromyalgia therapies by using transfer learning from the Medical Expenditure Panel Survey (MEPS). Predictive adherence models are developed using both survey and social forum data and different random forest (RF) techniques. The first of these implementations uses binned inputs from k-means clustering. The second technique is based on ternary trees instead of the widely used binary decision trees. These techniques are able to handle missing data, a prevalent characteristic of social forum data. The results of this study show that transfer learning between survey models and social forum models is possible. Using MEPS survey data and the techniques listed above to derive RF models, less than 5% difference in accuracy was observed between the MEPS test dataset and the social forum test dataset. Along with these RF techniques, another RF implementation with imputed means for the missing values was developed and shown to predict adherence for social forum patients with an accuracy >70%. This thesis shows that a model trained with verified survey data can be used to complement traditional medical adherence models by predicting adherence from unverified, self-reported data in a dynamic and timely manner. 
Furthermore, this model provides a method for discovering objective insights from subjective social reports. Additional investigation is needed to improve the prediction accuracy of the proposed model and to assess biases that may be inherent to self-reported adherence measures in social health networks. MEPS Medication adherence Social forum Random forest Transfer learning
35	Evaluace geografickeho Random Forest algoritmu v posouzení sucha / Geographical Random Forest model evaluation in agricultural drought assessment Bicák, Daniel January 2021 Drought is a natural disaster that negatively affects millions of people and causes huge economic losses. This thesis investigates agricultural drought in Czechia using machine learning algorithms. The statistical models used were Random Forest (RF), Geographical Random Forest (GRF) and Locally Tuned Geographical Random Forest (LT GRF). GRF consists of several RF models, each trained on a subset of the original data. The final prediction is a weighted sum of the predictions of a local and a global model. The size of the subset is determined by a tunable parameter. LT GRF addresses the spatial variability of subset size and local weight. During the tuning process, optimal parameters are found for every location and then interpolated for unknown regions. The thesis aims to evaluate the performance of each model and to compare the GRF feature importance output with that of the global model. The meteorological feature importances of the best model are used to create a drought vulnerability map of Czechia. The resulting assessment is compared to existing drought vulnerability projects.
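The weighted combination of local and global predictions described in the abstract above can be sketched as follows. This is only an illustration, assuming a single local weight in [0, 1] with the global model receiving the remainder; the function name `grf_predict` and the example values are hypothetical and not taken from the thesis.

```python
def grf_predict(local_pred: float, global_pred: float, local_weight: float) -> float:
    """Blend a local and a global model prediction.

    Assumption: the weights form a convex combination, i.e. the global
    model gets (1 - local_weight). The abstract only states that the
    final prediction is a weighted sum of the two.
    """
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    return local_weight * local_pred + (1.0 - local_weight) * global_pred


# Hypothetical example: equal trust in the local and the global model.
blended = grf_predict(local_pred=2.0, global_pred=4.0, local_weight=0.5)
```

In a locally tuned variant, `local_weight` would differ per location rather than being one global constant.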
36	Do Economic Factors Help Forecast Political Turnover? Comparing Parametric and Nonparametric Approaches Burghart, Ryan A. 22 April 2021 No description available. Economics political turnover turnover economics forecasting random forest nonparametric regressions
37	Predicting base conservation scores in RNA 3D structures Bulbul, Gul Bahar 11 August 2023 No description available. Statistics RNA 3D analysis Random forest Neural network
38	Stock market estimation : Using Linear Regression and Random Forest Kastberg, Daniel January 2022 Stock market speculation is captivating to many people. Millions of people worldwide buy and sell stocks in the hope of turning a profit. Using machine learning, can Random Forest or Linear Regression estimate which direction the stock market trend is heading, and does Random Forest outperform Linear Regression since it involves more complex methods? To explore the subject, several stocks from Nasdaq and the Swedish OMX index are studied and used to evaluate the machine learning models. The data was transformed into percentage changes to accommodate Random Forest's inability to extrapolate. The return on investment in percent was chosen as the dependent variable. Without technical analysis both models performed poorly, but when RSI 14, EMA 10 and SMA 10 were added, both models proved significant, with Random Forest the better of the two. Hyperparameter optimization was applied to Random Forest to see whether its advantage over Linear Regression could be increased further, but it gave an improvement in only half of the datasets, which made the result inconclusive. This thesis adds to the existing literature on predicting stock prices, focusing on the difference between Random Forest and Linear Regression to see whether there are any obvious differences in their ability to estimate the direction of a stock's price in the near future. Machine Learning Random Forest Linear Regression Computer Sciences Datavetenskap (datalogi)
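The kind of feature preparation mentioned in the abstract above (percentage changes plus moving-average indicators) can be sketched in plain Python. This is a rough illustration, not the thesis code: the helper names are hypothetical, the EMA uses the common smoothing factor 2 / (span + 1) seeded with the first value, which is an assumption, and RSI is left out because several variants exist and the abstract does not say which was used.

```python
def pct_change(prices):
    """Percentage change between consecutive prices (one fewer value than input)."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(prices, prices[1:])]


def sma(values, window):
    """Simple moving average over a sliding window of the given length."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def ema(values, span):
    """Exponential moving average.

    Assumption: smoothing factor alpha = 2 / (span + 1), seeded with the
    first observation. Other conventions exist.
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


# Hypothetical closing prices, for illustration only.
closes = [100.0, 110.0, 99.0, 99.0]
returns = pct_change(closes)    # relative moves instead of raw price levels
sma_2 = sma(closes, window=2)   # the abstract uses a window of 10
ema_10 = ema(closes, span=10)
```

Working with percentage changes keeps the inputs in a range the trees have seen during training, which is the reason the abstract gives for the transformation.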
39	Interpreting Random Forest Classification Models Using a Feature Contribution Method Palczewska, Anna Maria, Palczewski, J., Marchese-Robinson, R.M., Neagu, Daniel 18 February 2014 No / Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For "black box" models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution "patterns", can be discovered. These patterns represent a standard behaviour of the model and allow for an additional assessment of the model reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
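One way to picture per-instance feature contributions is to follow an instance down a single tree and credit each split feature with the change in class proportion between the parent node and the child node. The sketch below assumes this path-based formulation; the abstract itself does not spell out the computation, and the toy tree, its node values, and the function name are hypothetical.

```python
def feature_contributions(tree, x, n_features):
    """Return (root class-1 proportion, per-feature contributions) for instance x.

    Each internal node is a dict with keys "p" (class-1 proportion among the
    training instances in the node), "feature", "threshold", "left", "right".
    A leaf only has "p".
    """
    contrib = [0.0] * n_features
    node = tree
    while "feature" in node:
        goes_left = x[node["feature"]] <= node["threshold"]
        child = node["left"] if goes_left else node["right"]
        # Credit the split feature with the change in class proportion.
        contrib[node["feature"]] += child["p"] - node["p"]
        node = child
    return tree["p"], contrib


# Hypothetical two-feature tree, for illustration only.
TOY_TREE = {
    "p": 0.5, "feature": 0, "threshold": 1.0,
    "left": {"p": 0.25},
    "right": {
        "p": 0.75, "feature": 1, "threshold": 3.0,
        "left": {"p": 0.5},
        "right": {"p": 1.0},
    },
}

bias, contrib = feature_contributions(TOY_TREE, [2.0, 5.0], n_features=2)
prediction = bias + sum(contrib)  # equals the class proportion in the reached leaf
```

For a forest, the same quantities would be averaged over all trees, so the averaged root proportion plus the averaged contributions equals the forest's predicted class probability for that instance.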
40	Maskininlärningsklassificering av fordonsstatus för minskade reparationskostnader och avbrott inom kollektivtrafiken : Applicering av Random Forest-klassificering på fordonssignaler / Machine Learning Classification of Vehicle Status for Reduction of Cost and Downtime in Public Transport Stopner, Julia, Willberg, Carl-Åke January 2022 As the modern, data-driven world continues to evolve, many institutions and corporations are eager to capitalize on their own data streams to optimize their operations. 
In tandem with this, an increasingly globalized world is searching for ways to reconcile a growing population and its greater need to move flexibly through modern cities with the pressing need to mitigate the climate impact that this mobility brings. The future of public transport sits at the intersection of these two trends, and many advantages can be gained from letting machine learning uncover previously unseen patterns and obstacles in daily operations. This study explores whether a Random Forest classifier trained on historical data can be used to predict and prevent public transport downtime caused by vehicle repairs. The implementation of the classifier resulted in an accuracy of 63.1% and a recall of 59.9%. The study concludes that the method has potential, although the quality and range of the signal data need to be improved to raise the effectiveness of the model. This implies that, given further research and improved internal data handling and technical infrastructure, a Random Forest model could yield commercially measurable benefits with regard to downtime and cost of repair. Public Transport Random Forest classification Computer and Information Sciences Data- och informationsvetenskap
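The two metrics reported in the abstract above, accuracy and recall, can be computed from true and predicted labels as in the sketch below. The label encoding (1 meaning a vehicle needs repair) and the example labels are assumptions for illustration and do not come from the thesis.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels.

    accuracy = correct predictions / all predictions
    recall   = true positives / actual positives
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    if actual_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return correct / len(y_true), true_pos / actual_pos


# Hypothetical labels: 1 = vehicle needed repair, 0 = no repair needed.
acc, rec = accuracy_and_recall([1, 1, 0, 0], [1, 0, 0, 0])
```

Recall is the more telling figure when the goal is to catch upcoming repairs, since a missed positive corresponds to an unplanned stop.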
