Global ETD Search

1	Assessing Ponderosa Pine (Pinus ponderosa) Suitable Habitat throughout Arizona in Response to Future Climate Models January 2011 abstract: The species distribution model DISTRIB was used to model and map potential suitable habitat of ponderosa pine throughout Arizona under current and six future climate scenarios. Importance Values for each climate scenario were estimated from 24 predictor variables consisting of climate, elevation, soil, and vegetation data within a 4 km grid cell. Two emission scenarios, A2 (high concentration) and B1 (low concentration), and three climate models (the Parallel Climate Model, the Geophysical Fluid Dynamics Laboratory, and the HadleyCM3) were used to capture the potential variability among future climates and provide a range of responses from ponderosa pine. Summary tables for federal and state managed lands show the potential change in suitable habitat under the different climate scenarios, while an analysis of three elevational regions explores the potential shift of habitat upslope. According to the climate scenarios, mean annual temperature in Arizona could increase by 3.5% while annual precipitation could decrease by 36% over this century. Results of the DISTRIB model indicate that in response to the projected changes in climate, suitable habitat for ponderosa pine could increase by 13% throughout the state under the HadleyCM3 high scenario or decrease by 1.1% under the average of the three low scenarios. However, the spatial variability of climate changes will result in gains and losses among the ecoregions and federal and state managed lands. Therefore, alternative practices may need to be considered to limit the loss of suitable habitat in areas identified by the models. / Dissertation/Thesis / M.S. Applied Biological Sciences 2011 Ecology Forestry Natural Resource Management Climate change Random Forest regression Range shifts Species Distribution Modeling
2	Comparing the Uses and Classification Accuracy of Logistic and Random Forest Models on an Adolescent Tobacco Use Dataset Maginnity, Joseph D. 02 October 2020 No description available. Biostatistics Public Health
3	Comparative Analysis of Surrogate Models for the Dissolution of Spent Nuclear Fuel Awe, Dayo 01 May 2024 This thesis presents a comparative analysis of surrogate models for the dissolution of spent nuclear fuel, with a focus on the use of deep learning techniques. The study explores the accuracy and efficiency of different machine learning methods in predicting the dissolution behavior of nuclear waste, and compares them to traditional modeling approaches. The results show that deep learning models can achieve high accuracy in predicting the dissolution rate, while also being computationally efficient. The study also discusses the potential applications of surrogate modeling in the field of nuclear waste management, including the optimization of waste disposal strategies and the design of more effective containment systems. Overall, this research highlights the importance of surrogate modeling in improving our understanding of nuclear waste behavior and developing more sustainable waste management practices. spent nuclear fuel random forest regression boosting methods surrogate model machine learning Physical Sciences and Mathematics
4	An Investigation of How Well Random Forest Regression Can Predict Demand : Is Random Forest Regression better at predicting the sell-through of close to date products at different discount levels than a basic linear model? Jonsson, Estrid, Fredrikson, Sara January 2021 Allt eftersom klimatkrisen fortskrider ökar engagemanget kring hållbarhet inom företag. Växthusgaser är ett av de största problemen och matsvinn har därför fått mycket uppmärksamhet sedan det utnämndes till den tredje största bidragaren till de globala utsläppen. För att minska sitt bidrag rabatterar många matbutiker produkter med kort bästföredatum, vilket kommit att kräva en förståelse för hur priskänslig efterfrågan på denna typ av produkt är. Prisoptimering görs vanligtvis med så kallade Generalized Linear Models men då efterfrågan är ett komplext koncept har maskininlärningsmetoder börjat utmana de traditionella modellerna. En sådan metod är Random Forest Regression, och syftet med uppsatsen är att utreda ifall modellen är bättre på att estimera efterfrågan baserat på rabattnivå än en klassisk linjär modell. Vidare utreds det ifall ett tydligt linjärt samband existerar mellan rabattnivå och efterfrågan, samt ifall detta beror av produkttyp. Resultaten visar på att Random Forest tar bättre hänsyn till det komplexa samband som visade sig finnas, och i detta specifika fall presterar bättre. Vidare visade resultaten att det sammantaget inte finns något linjärt samband, men att vissa produktkategorier uppvisar svag linjäritet. / As the climate crisis continues to evolve, many companies focus their development on becoming more sustainable. With greenhouse gases being highlighted as the main problem, food waste has obtained a great deal of attention after being named the third largest contributor to global emissions. One way retailers have attempted to improve is through offering close-to-date produce at discount, hence decreasing levels of food being thrown away.
To minimize waste, the level of discount must be optimized, and as the products can be seen as flawed, the known price-to-demand relation of the products may be insufficient. The optimization process historically involves generalized linear regression models; however, demand is a complex concept influenced by many factors. This report investigates whether a Machine Learning model, Random Forest Regression, is better at estimating the demand of close-to-date products at different discount levels than a basic linear regression model. The discussion also includes an analysis of whether discounts always increase the willingness to buy and whether this depends on product type. The results show that Random Forest to a greater extent considers the many factors influencing demand and is superior as a predictor in this case. Furthermore, it was concluded that there is generally not a clear linear relation; however, this does depend on product type, as certain categories showed some linearity. Random Forest Regression Linear Regression Food Waste Demand Prediction Computer and Information Sciences Data- och informationsvetenskap
5	Post-Processing National Water Model Long-Range Forecasts with Random Forest Regression in the Cloud to Improve Forecast Accuracy for Decision-Makers and Water Managers Anderson, Jacob Matthew 19 December 2024 Post-processing bias correction of streamflow forecasts can be useful in the hydrologic modeling workflow to fine-tune forecasts for operations, water management, and decision-making. Hydrologic model runoff simulations include errors, uncertainties, and biases, leading to less accuracy and precision for applications in real-world scenarios. We used random forest regression to correct biases and errors in streamflow predictions from the U.S. National Water Model (NWM) long-range streamflow forecasts, considering U.S. Geological Survey (USGS) gauge station measurements as a proxy for true streamflow. We used other features in model training, including watershed characteristics, time fraction of year, and lagged streamflow values, to help the model perform better in gauged and ungauged areas. We assessed the effectiveness of the bias correction technique by comparing the difference between forecast and actual streamflow before and after the bias correction model was employed. We also explored advances in hydroinformatics and cloud computing by creating and testing this bias correction capability within the Google Cloud Console environment to avoid slow and unnecessary data downloads to local devices, thereby streamlining the data processing and storage within the cloud. This demonstrates the possibility of integrating our method into the NWM real-time forecasting workflow. Results indicate reasonable bias correction is possible using the random forest regression machine learning technique. After the forecasts are run through the random forest model, the differences between USGS discharge and NWM forecasts are smaller than the differences originally observed.
The main issue concerning the forecasts from the NWM is that the error increases with distance from the reference time, that is, the start of the forecast period. The model we created shows significant improvement in streamflow forecasts the further the time steps get from the reference time. The error is reduced and more uniform throughout all the time steps of the 30-day long-range forecasts. streamflow forecasts bias correction machine learning random forest regression Google BigQuery cloud Engineering
6	Prisestimering på bostadsrätter : Implementering av OCR-metoder och Random Forest regression för datadriven värdering / Price estimation in the housing cooperative market : Implementation of OCR methods and Random Forest regression for data-driven valuation Lövgren, Sofia, Löthman, Marcus January 2023 This thesis explores the implementation of Optical Character Recognition (OCR)-based text extraction and random forest regression analysis for housing market valuation, specifically focusing on the impact of value factors, derived from OCR-extracted economic values from housing cooperatives’ annual reports. The objective is to perform price estimations using the Random Forest model to identify the key value factors that influence the estimation process and examine how the economic values from annual reports affect the sales price. The thesis aims to highlight the often-overlooked aspect that when purchasing an apartment, one also assumes the liabilities of the housing cooperative. The motivation for utilizing OCR techniques stems from the difficulties associated with manual data collection, as there is a lack of readily accessible structured data on the subject, emphasizing the importance of automation for effective data extraction. The findings indicate that OCR can effectively extract data from annual reports, but with limitations due to variation in report structures. The regression analysis reveals the Random Forest model’s effectiveness in estimating prices, with location and construction year emerging as the most influential factors. Furthermore, incorporating the economic values from the annual reports enhances the accuracy of price estimation compared to the model that excluded such factors. However, definitive conclusions regarding the precise impact of these economic factors could not be drawn due to limited geographical spread of data points and potential hidden value factors.
The study concludes that the machine learning model can be used to make a credible price estimate on cooperative apartments and that OCR methods prove valuable in automating data extraction from annual reports, although standardising report format would enhance their efficiency. The thesis highlights the significance of considering the housing cooperatives’ economic values when making property purchases. OCR Optical Character recognition Random Forest regression price estimation housing cooperatives machine learning OCR Optisk teckenigenkänning Random Forest regression Prisestimering Bostadsrätter Maskininlärning Other Computer and Information Science Annan data- och informationsvetenskap
7	A Machine Learning Estimation of the Occupancy of Padel Facilities in Sweden : An application of Random Forest algorithm on a padel booking dataset / Uppskattning av svenska padelanläggningars beläggningsgrad genom maskininlärning Johansson, Michael, Gonzálvez Läth, Nadia January 2022 Padel is one of the fastest growing sports in Sweden. Its popularity rose significantly during the Covid-19 pandemic in 2020, as many other types of sport facilities closed, and people had more flexible work schedules due to remote work. This paper is an analysis of the monthly occupancy of indoor padel facilities in Sweden between January 2018 and April 2022. It aims to answer to what degree a machine learning algorithm can predict the occupancy for a given padel facility and which key features have the largest impact on the occupancy. With these findings, it is possible to estimate the revenue for a given padel facility, and they can therefore be used to identify which types of padel facilities have the biggest opportunity to succeed from an economic perspective. This article reviews the literature regarding different methods of machine learning, in this case, applied to booking systems and occupancy estimations. The reviewed literature also presents the most common evaluation metrics used for comparing different machine learning models. This study analyses the relationship between the occupancy level of a given padel facility and 12 input features, related to the padel facility in question, with a random forest regression model. This work results in a model that achieved an R2 score of 49% and a mean absolute error of 11%.
The input features ranked according to the largest impact on the model’s estimation are (with the mean of all absolute SHAP values written in parentheses): Year (7.71), Month (5.23), Average Income in municipality (4.13), Driving Time from municipality Centre (2.35), Population of municipality (1.97), Padel Slots in municipality (1.27), Padel Slots in facility (1.27), Average Court Price (1.12), Tennis Slots in municipality (0.73), Badminton Slots in municipality (0.55), Squash Slots in municipality (0.44) and Golf Slots in municipality (0.26). Padel facilities had the highest average occupancy in 2020. The Covid-19 pandemic is likely a significant contributor to this, due to the shutdown of offices and many types of training venues. Therefore, Year has the largest impact on the model’s estimation. Occupancy of indoor facilities follows a seasonal trend, where it tends to be highest in December and January and lowest in June and July. This trend can partly be explained by a larger demand for indoor sport activities during winter and increased competition from outside padel facilities and other activities during summer. Because of this, Month had the second largest impact on the model’s estimation. / Padel är en av de snabbast växande sporterna i Sverige. Dess popularitet ökade avsevärt under Covid-19-pandemin i 2020, främst på grund av att många andra typer av sportanläggningar stängdes ner och människor hade mer flexibla arbetsscheman på grund av distansarbete. Den här uppsatsen är en analys av den månatliga beläggningen av inomhuspadelanläggningar i Sverige mellan januari 2018 och april 2022. Studien syftar till att svara på i vilken grad en maskininlärningsalgoritm kan förutsäga beläggningen för en given padelanläggning och vilka nyckelfunktioner som har störst inverkan på beläggningen. 
Med dessa insikter är det möjligt att uppskatta intäkterna för en given padelanläggning och kan därför användas vilka typer av padelanläggningar som har störst möjlighet att vara framgångsrika ur ett ekonomiskt perspektiv. Den granskade litteraturen studerar olika maskininlärningsmetoder tillämpad i områden som bokningssystemsanalys och beläggningsgradsstudier, samt presenterar de vanligaste utvärderingsmåtten som används för att jämföra metoderna. Denna studie analyserar sambandet mellan beläggningsgraden för en given padelanläggning och 12 inputparametrar, relaterade till padelanläggningen i fråga med hjälp av en random forest regressionsalgoritm. Detta arbete resulterar i en modell som uppnådde ett R2 värde på 49% och en genomsnittlig absolut avvikelse på 11 %. Inputparametrarna rangordnade enligt den största påverkan på modellens uppskattning är (med medelvärdet av alla absoluta SHAP-värden skrivna inom parentes): År (7.71), Månad (5.23), Genomsnittlig Inkomst i kommunen (4.13), Körtid mellan anläggning och kommunens centrum (2.35), Kommunens befolkningsmängd (1.97), Antal padeltider i kommunen (1.27), Padeltider i anläggningen(1.27), Genomsnittlig pris för bana(1.12), Tennistider i kommunen (0.73), Badmintontider i kommunen (0.55), Squashtider i kommunen (0.44) och Golftider i kommunen (0.26). Padelanläggningar hade högsta genomsnittliga beläggningsgraden under 2020. Covid-19-pandemin är sannolikt en betydande bidragande orsak till detta på grund av nedläggningen av kontor och andra sportanläggningar. Därför har inputparametern År den största inverkan på modellens uppskattning. Beläggningen av inomhusanläggningar följer en säsongsmässig trend, där den tenderar att vara högst i januari och lägst i juli. Denna trend kan delvis förklaras av en större efterfrågan på inomhussportaktiviteter under vintern och ökad konkurrens från utomstående padelanläggningar och andra aktiviteter under sommaren. 
På grund av detta hade Månad den näst största påverkan på modellens uppskattning. Machine Learning Random Forest Regression Occupancy Estimation Padel Bookings Sweden Evaluation Metrics Features SHAP Maskininlärning Random Forest Regression Beläggning Uppskattning Padel Bokningar Sverige Utvärderingsmått Inputvärden SHAP Computer and Information Sciences Data- och informationsvetenskap
8	Retrieval of Cloud Top Pressure Adok, Claudia January 2016 In this thesis, two predictive models, the multilayer perceptron and random forest, are evaluated for predicting cloud top pressure. The dataset used in this thesis contains brightness temperatures, reflectances and other useful variables to determine the cloud top pressure from the Advanced Very High Resolution Radiometer (AVHRR) instrument on the two satellites NOAA-17 and NOAA-18 during the time period 2006-2009. The dataset also contains numerical weather prediction (NWP) variables calculated using mathematical models. In the dataset there are also observed cloud top pressure and cloud top height estimates from the more accurate instrument on the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite. The predicted cloud top pressure is converted into an interpolated cloud top height. The predicted pressure and interpolated height are then evaluated against the more accurate observed cloud top pressure and cloud top height from the instrument on the CALIPSO satellite. The predictive models were fitted to the data using different sampling strategies to take into account the performance of individual cloud classes prevalent in the data. The multilayer perceptron was fitted using both the original response, cloud top pressure, and a log-transformed response to avoid negative output values, which are prevalent when using the original response. Results show that overall the random forest model performs better than the multilayer perceptron in terms of root mean squared error and mean absolute error. neural networks multilayer perceptron random forest regression cloud top pressure cloud top height Computer and Information Sciences Data- och informationsvetenskap
9	Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches Ji-young Shin (11560495) 14 October 2021 The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are an important feature for the validity and reliability of L2 EI tests, but less is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination or corpus-based, fine-grained measures have rarely been incorporated into examining the contribution of prompt linguistic features. The current study addressed these research needs, using item response theory (IRT) and random forest modeling.
Data consisted of 9,348 oral responses to forty-eight items, including EI prompts, item scores, and rater comments, which were collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales / scoring methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase Ⅰ). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales / scoring methods and proficiency, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase Ⅱ).
The main findings were (although not limited to): 1) collapsing exact repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model / item / person fit) (RQ1); 2) there were fewer misfitting persons with lower proficiency and higher frequency of unexpected responses in the extreme categories (RQ2); 3) the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor error category contributed to misfit (RQ3); 4) a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); 5) sentence length significantly explained item difficulty only, with small variance explained (RQ5); 6) corpus-based lexical measures and phrase-level syntactic complexity were important to predicting item difficulty, particularly for the higher ability level. The findings have implications for EI scale / item development in human and automatic scoring settings and L2 English proficiency development. Elicited imitation scales scoring methods prompt linguistic features item response theory random forest regression misfit analysis
10	What Matters the Most? Understanding Individual Tornado Preparedness Using Machine Learning Choi, Junghwa, Robinson, Scott, Maulik, Romit, Wehde, Wesley 01 August 2020 Scholars from various disciplines have long attempted to identify the variables most closely associated with individual preparedness. Therefore, we now have much more knowledge regarding these factors and their association with individual preparedness behaviors. However, it has not been sufficiently discussed how decisive many of these factors are in encouraging preparedness. In this article, we seek to examine what factors, among the many examined in previous studies, are most central to engendering emergency preparedness in individuals particularly for tornadoes by utilizing a relatively uncommon machine learning technique in disaster management literature. Using unique survey data, we find that in the case of tornado preparedness the most decisive variables are related to personal experiences and economic circumstances rather than basic demographics. Our findings contribute to scholarly endeavors to understand and promote individual tornado preparedness behaviors by highlighting the variables most likely to shape tornado preparedness at an individual level. disaster management emergency preparedness machine learning random forest regression tornado preparedness
