1 |
Assessing Ponderosa Pine (Pinus ponderosa) Suitable Habitat throughout Arizona in Response to Future Climate Models
January 2011
abstract: The species distribution model DISTRIB was used to model and map potential suitable habitat of ponderosa pine throughout Arizona under current and six future climate scenarios. Importance Values for each climate scenario were estimated from 24 predictor variables consisting of climate, elevation, soil, and vegetation data within a 4 km grid cell. Two emission scenarios, A2 (high concentration) and B1 (low concentration), and three climate models (the Parallel Climate Model, the Geophysical Fluid Dynamics Laboratory, and the HadleyCM3) were used to capture the potential variability among future climates and provide a range of responses from ponderosa pine. Summary tables for federal and state managed lands show the potential change in suitable habitat under the different climate scenarios, while an analysis of three elevational regions explores the potential upslope shift of habitat. According to the climate scenarios, mean annual temperature in Arizona could increase by 3.5% while annual precipitation could decrease by 36% over this century. Results of the DISTRIB model indicate that, in response to the projected changes in climate, suitable habitat for ponderosa pine could increase by 13% throughout the state under the HadleyCM3 high scenario or decrease by 1.1% under the average of the three low scenarios. However, the spatial variability of climate changes will result in gains and losses among the ecoregions and federally and state managed lands. Therefore, alternative practices may need to be considered to limit the loss of suitable habitat in areas identified by the models. / Dissertation/Thesis / M.S. Applied Biological Sciences 2011
|
2 |
Comparing the Uses and Classification Accuracy of Logistic and Random Forest Models on an Adolescent Tobacco Use Dataset
Maginnity, Joseph D. 02 October 2020
No description available.
|
3 |
Comparative Analysis of Surrogate Models for the Dissolution of Spent Nuclear Fuel
Awe, Dayo 01 May 2024
This thesis presents a comparative analysis of surrogate models for the dissolution of spent nuclear fuel, with a focus on the use of deep learning techniques. The study explores the accuracy and efficiency of different machine learning methods in predicting the dissolution behavior of nuclear waste, and compares them to traditional modeling approaches. The results show that deep learning models can achieve high accuracy in predicting the dissolution rate, while also being computationally efficient. The study also discusses the potential applications of surrogate modeling in the field of nuclear waste management, including the optimization of waste disposal strategies and the design of more effective containment systems. Overall, this research highlights the importance of surrogate modeling in improving our understanding of nuclear waste behavior and developing more sustainable waste management practices.
|
4 |
An Investigation of How Well Random Forest Regression Can Predict Demand : Is Random Forest Regression better at predicting the sell-through of close to date products at different discount levels than a basic linear model?
Jonsson, Estrid, Fredrikson, Sara January 2021
As the climate crisis progresses, many companies are focusing on becoming more sustainable. With greenhouse gases highlighted as one of the biggest problems, food waste has received a great deal of attention since being named the third largest contributor to global emissions. One way retailers have attempted to improve is by offering close-to-date products at a discount, thereby reducing the amount of food thrown away. To minimize waste the level of discount must be optimized, and as the products can be seen as flawed, the known price-to-demand relation of the products may be insufficient.
Price optimization has historically involved generalized linear models; however, demand is a complex concept influenced by many factors, and machine learning methods have begun to challenge the traditional models. This report investigates whether one such method, Random Forest Regression, is better at estimating the demand for close-to-date products at different discount levels than a basic linear regression model. The discussion also includes an analysis of whether a clear linear relation exists between discount level and demand, and whether this depends on product type. The results show that Random Forest accounts to a greater extent for the many factors influencing demand and is the superior predictor in this case. Furthermore, it was concluded that there is generally no clear linear relation, although this depends on product type, as certain categories showed weak linearity.
|
5 |
Prisestimering på bostadsrätter : Implementering av OCR-metoder och Random Forest regression för datadriven värdering / Price estimation in the housing cooperative market : Implementation of OCR methods and Random Forest regression for data-driven valuation
Lövgren, Sofia, Löthman, Marcus January 2023
This thesis explores the implementation of Optical Character Recognition (OCR)-based text extraction and random forest regression analysis for housing market valuation, specifically focusing on the impact of value factors derived from OCR-extracted economic values in housing cooperatives’ annual reports. The objective is to perform price estimations using the Random Forest model to identify the key value factors that influence the estimation process and examine how the economic values from annual reports affect the sales price. The thesis aims to highlight the often-overlooked aspect that when purchasing an apartment, one also assumes the liabilities of the housing cooperative. The motivation for utilizing OCR techniques stems from the difficulties associated with manual data collection, as there is a lack of readily accessible structured data on the subject, emphasizing the importance of automation for effective data extraction. The findings indicate that OCR can effectively extract data from annual reports, but with limitations due to variation in report structures. The regression analysis reveals the Random Forest model’s effectiveness in estimating prices, with location and construction year emerging as the most influential factors. Furthermore, incorporating the economic values from the annual reports enhances the accuracy of price estimation compared to the model that excluded such factors. However, definitive conclusions regarding the precise impact of these economic factors could not be drawn due to the limited geographical spread of data points and potential hidden value factors. The study concludes that the machine learning model can be used to make a credible price estimate for cooperative apartments and that OCR methods prove valuable in automating data extraction from annual reports, although standardising the report format would enhance their efficiency.
The thesis highlights the significance of considering the housing cooperatives’ economic values when making property purchases.
|
6 |
A Machine Learning Estimation of the Occupancy of Padel Facilities in Sweden : An application of Random Forest algorithm on a padel booking dataset / Uppskattning av svenska padelanläggningars beläggningsgrad genom maskininlärning
Johansson, Michael, Gonzálvez Läth, Nadia January 2022
Padel is one of the fastest growing sports in Sweden. Its popularity rose significantly during the Covid-19 pandemic in 2020, as many other types of sport facilities closed and people had more flexible work schedules due to remote work. This paper is an analysis of the monthly occupancy of indoor padel facilities in Sweden between January 2018 and April 2022. It aims to determine to what degree a machine learning algorithm can predict the occupancy of a given padel facility and which key features have the largest impact on the occupancy. With these findings, it is possible to estimate the revenue of a given padel facility and thereby identify which types of padel facilities have the greatest opportunity to succeed from an economic perspective. This article reviews the literature on different machine learning methods, in this case applied to booking systems and occupancy estimations. The reviewed literature also presents the most common evaluation metrics used for comparing different machine learning models. This study analyses the relationship between the occupancy level of a given padel facility and 12 input features related to the facility in question, using a random forest regression model. This work results in a model that achieved an R² score of 49% and a mean absolute error of 11%. The input features ranked according to the largest impact on the model’s estimation are (with the mean of all absolute SHAP values written in parentheses): Year (7.71), Month (5.23), Average Income in municipality (4.13), Driving Time from municipality Centre (2.35), Population of municipality (1.97), Padel Slots in municipality (1.27), Padel Slots in facility (1.27), Average Court Price (1.12), Tennis Slots in municipality (0.73), Badminton Slots in municipality (0.55), Squash Slots in municipality (0.44) and Golf Slots in municipality (0.26). Padel facilities had the highest average occupancy in 2020.
The Covid-19 pandemic is likely a significant contributor to this, due to the shutdown of offices and many types of training venues. Therefore, Year has the largest impact on the model’s estimation. Occupancy of indoor facilities follows a seasonal trend, where it tends to be highest in December and January and lowest in June and July. This trend can partly be explained by a larger demand for indoor sport activities during winter and increased competition from outdoor padel facilities and other activities during summer. Because of this, Month had the second largest impact on the model’s estimation.
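The feature ranking reported in this abstract is based on the mean of the absolute SHAP values per input feature. As a rough illustration of that aggregation step only, the sketch below ranks features from a small table of SHAP values. The numbers and the helper name `rank_by_mean_abs_shap` are hypothetical and are not taken from the thesis; computing the SHAP values themselves is outside the scope of the sketch.

```python
from statistics import mean


def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean of their absolute SHAP values, largest first."""
    columns = zip(*shap_rows)  # one column of SHAP values per feature
    scores = {
        name: mean(abs(value) for value in column)
        for name, column in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per prediction, one column per feature.
features = ["Year", "Month", "Average Court Price"]
shap_rows = [
    [8.0, -5.0, 1.0],
    [-7.0, 6.0, -1.5],
    [9.0, -4.0, 0.5],
]

ranking = rank_by_mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Taking the absolute value before averaging matters here: a feature that pushes predictions strongly up for some rows and strongly down for others would otherwise average out to roughly zero.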
|
7 |
Retrieval of Cloud Top Pressure
Adok, Claudia January 2016
In this thesis, two predictive models, the multilayer perceptron and random forest, are evaluated for predicting cloud top pressure. The dataset used contains brightness temperatures, reflectances and other variables useful for determining cloud top pressure, from the Advanced Very High Resolution Radiometer (AVHRR) instrument on the two satellites NOAA-17 and NOAA-18 during the period 2006-2009. The dataset also contains numerical weather prediction (NWP) variables calculated using mathematical models, as well as observed cloud top pressure and cloud top height estimates from the more accurate instrument on the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite. The predicted cloud top pressure is converted into an interpolated cloud top height. The predicted pressure and interpolated height are then evaluated against the more accurate observed cloud top pressure and cloud top height from the CALIPSO instrument. The predictive models have been applied to the data using different sampling strategies to take into account the performance on individual cloud classes prevalent in the data. The multilayer perceptron is fitted using both the original response, cloud top pressure, and a log-transformed response, which avoids the negative output values that are prevalent when using the original response. Results show that overall the random forest model performs better than the multilayer perceptron in terms of root mean squared error and mean absolute error.
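Two technical points in this abstract can be sketched briefly: the error metrics used for the comparison, and the reason a log-transformed response cannot produce negative pressures. The code below is a minimal illustration under stated assumptions. The pressure values are made up, and the function names are hypothetical rather than taken from the thesis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def to_log(values):
    """Transform a strictly positive response to log space before training."""
    return [math.log(v) for v in values]


def from_log(values):
    """Map log-space predictions back; exp() is positive for any real input."""
    return [math.exp(v) for v in values]


# Hypothetical observed cloud top pressures (hPa).
observed = [250.0, 480.0, 900.0]

# A model trained on to_log(observed) outputs unconstrained real numbers.
# Here the hypothetical log-space predictions are simply slightly perturbed targets.
log_predictions = [z + 0.05 for z in to_log(observed)]
predicted = from_log(log_predictions)

print("all positive:", all(p > 0 for p in predicted))
print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
```

Because the back-transform is an exponential, even a strongly negative log-space output maps to a small positive pressure. Errors are computed after back-transforming so that both variants of the model are compared on the original scale.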
|
8 |
Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches
Ji-young Shin (11560495) 14 October 2021
The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are an important feature for the validity and reliability of L2 EI tests, but little is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examining the contribution of prompt linguistic features. The current study addressed these research needs using item response theory (IRT) and random forest modeling.
Data consisted of 9,348 oral responses to forty-eight items, including EI prompts, item scores, and rater comments, which were collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales / scoring methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase Ⅰ). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales / scoring methods and proficiency, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase Ⅱ).
The main findings included, but were not limited to, the following: collapsing exact repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model / item / person fit) (RQ1); there were fewer misfitting persons with lower proficiency and higher frequency of unexpected responses in the extreme categories (RQ2); the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor error category contributed to misfit (RQ3); a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); sentence length significantly explained item difficulty only, with small variance explained (RQ5); and corpus-based lexical measures and phrase-level syntactic complexity were important to predicting item difficulty, particularly for the higher ability level. The findings have implications for EI scale / item development in human and automatic scoring settings and L2 English proficiency development.
|
9 |
What Matters the Most? Understanding Individual Tornado Preparedness Using Machine Learning
Choi, Junghwa, Robinson, Scott, Maulik, Romit, Wehde, Wesley 01 August 2020
Scholars from various disciplines have long attempted to identify the variables most closely associated with individual preparedness. Therefore, we now have much more knowledge regarding these factors and their association with individual preparedness behaviors. However, it has not been sufficiently discussed how decisive many of these factors are in encouraging preparedness. In this article, we seek to examine what factors, among the many examined in previous studies, are most central to engendering emergency preparedness in individuals particularly for tornadoes by utilizing a relatively uncommon machine learning technique in disaster management literature. Using unique survey data, we find that in the case of tornado preparedness the most decisive variables are related to personal experiences and economic circumstances rather than basic demographics. Our findings contribute to scholarly endeavors to understand and promote individual tornado preparedness behaviors by highlighting the variables most likely to shape tornado preparedness at an individual level.
|
10 |
House Price Prediction
Aghi, Nawar, Abdulal, Ahmad January 2020
This study proposes a performance comparison between machine learning regression algorithms and an Artificial Neural Network (ANN). The regression algorithms used in this study are multiple linear regression, Least Absolute Shrinkage and Selection Operator (Lasso), Ridge, and Random Forest. Moreover, this study attempts to analyse the correlation between variables to determine the most important factors that affect house prices in Malmö, Sweden. Two datasets are used in this study, called public and local. They contain house prices from Ames, Iowa, United States and Malmö, Sweden, respectively. The accuracy of the prediction is evaluated by checking the R-squared and root mean square error scores of the trained model. The test is performed after applying the required pre-processing methods and splitting the data into two parts: one part is used in the training phase and the other in the test phase. We have also presented a binning strategy that improved the accuracy of the models. This thesis attempts to show that Lasso gives the best score among the algorithms when using the public dataset in training. The correlation graphs show the variables' level of dependency. In addition, the empirical results show that crime, deposit, lending, and repo rates influence house prices negatively, whereas inflation, year, and unemployment rate impact house prices positively.
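The evaluation procedure described in this abstract, splitting the data into a training part and a test part and then scoring the fitted model, can be sketched as follows. This is only an illustration under assumptions: the data are invented (living area, price) pairs, a one-variable least-squares line stands in for the thesis's regression algorithms, and the score shown is the coefficient of determination. All function names are hypothetical.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: price = 2 * area + 10, with no noise.
data = [(float(area), 2.0 * area + 10.0) for area in range(40, 120, 10)]

train, test = split_train_test(data)
slope, intercept = fit_line(train)
predictions = [slope * x + intercept for x, _ in test]
score = r_squared([y for _, y in test], predictions)
print("R-squared on the held-out part:", score)
```

Scoring on the held-out part rather than on the training part is what makes the number a measure of how well the model generalizes. A model that always predicts the mean of the test targets would score 0 on the same measure.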
|