Global ETD Search

281	Comparison of Logistic Regression and an Explained Random Forest in the Domain of Creditworthiness Assessment Ankaräng, Marcus, Kristiansson, Jakob January 2021 As the use of AI in society grows, so does the demand for explainable algorithms. A challenge with many modern machine learning algorithms is that their often complex structures prevent them from producing human-interpretable explanations. Research within explainable AI has resulted in methods that can be applied on top of non-interpretable models to explain the basis of their decisions. The aim of this thesis is to compare a non-interpretable machine learning model used in combination with an explanatory method against a model that is explainable through its inherent structure. The non-interpretable model was random forest, and the explanatory method was SHAP. The explainable model was logistic regression, which is explained by its feature weights. The comparison was conducted in the domain of creditworthiness and was based on predictive performance and explainability. Furthermore, the thesis uses these models to investigate what characterizes loan applicants who are likely to default. The comparison showed that neither model performed significantly better than the other in terms of predictive performance. The characteristics of bad loan applicants differed between the two algorithms. Three important aspects were the applicant's age, where they lived, and whether they had a residential phone. Regarding explainability, several advantages of SHAP were observed. SHAP can produce explanations at both a local and a global level. It also offers a way to take advantage of the high performance of many modern machine learning algorithms while meeting today's increased demand for transparency. Classification Creditworthiness Explainable Artificial Intelligence Logistic Regression Machine Learning Random Forest SHAP XAI Computer and Information Sciences
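The abstract above says logistic regression is explainable through its feature weights. A minimal sketch of what that means in practice follows; the feature names, weights, and bias are hypothetical values chosen for illustration and are not taken from the thesis. The point it shows is that exp(weight) is the factor by which the odds of default change when a feature increases by one unit.

```python
import math

# Hypothetical coefficients for illustration only; not results from the thesis.
WEIGHTS = {"age": -0.03, "has_residential_phone": -0.40}
BIAS = -1.0


def default_probability(features):
    """Logistic model: sigmoid of the bias plus the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds(probability):
    """Convert a probability into odds p / (1 - p)."""
    return probability / (1.0 - probability)


def odds_ratios():
    """exp(weight): multiplicative change in odds per unit increase of a feature."""
    return {name: math.exp(weight) for name, weight in WEIGHTS.items()}
```

Under these assumed weights, an applicant with a residential phone has odds of default that are exp(-0.40) times those of an otherwise identical applicant without one, which is the kind of global, per-feature reading that the weights make available.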
282	Housing Price Prediction over Countrywide Data: A comparison of XGBoost and Random Forest regressor models Henriksson, Erik, Werlinder, Kristopher January 2021 The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in predicting housing prices, using two data sets. The comparison considers training time, inference time, and the three evaluation metrics R², RMSE, and MAPE. The data sets are described in detail, together with background on the regressor models used. The method involves substantial cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation to obtain good performance estimates. The finding is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior on larger data sets. Random Forest XGBoost predicting housing prices feature engineering ensemble learning boosting data cleansing 5-fold cross-validation Computer Sciences
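The evaluation in the abstract above rests on R², RMSE, MAPE, and 5-fold cross-validation. The sketch below shows the standard definitions of these metrics and one simple way to split sample indices into five folds. The function names and the contiguous, unshuffled split are assumptions made for illustration, not the authors' implementation.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; targets must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In a cross-validated comparison, each model would be fitted on the training indices of every fold and scored on the held-out indices, with the per-fold metrics averaged to give the performance estimate.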
283	Prediktion av optimal tidpunkt för köp av flygbiljetter med hjälp av maskininlärning / Prediction of optimal purchase time of airline tickets using machine learning Jacobsson, Marcus, Inkapööl, Viktor January 2020 This study is motivated by the goal of reducing consumer costs for airfare purchases. Specifically, it investigates whether optimal purchase decisions for specific flight routes can be classified with high accuracy by machine learning models trained on basic data containing only the price and search date for a given departure date. The models were based on Random Forest Classifier, trained on search data from up to 90 days before each departure date in July 2016-2018, and tested on the same kind of data for 2019. After data preparation and hyperparameter tuning, the final models correctly classified the optimal purchase with an accuracy of 88% for the Stockholm-Mallorca route and 84% for the Stockholm-Bangkok route. Assuming that the number of searches correlates with demand and, in turn, with actual purchases, the study calculated the average expected savings per ticket from using the model on these routes to be 21% and 17%, respectively. The study also examined how a business model for price comparison could be reshaped to incorporate these findings. The analysis used the Business Model Canvas framework and resulted in a recommendation to implement a premium service that tells users whether to buy or wait based on a search. Machine Learning Classification Random Forest Purchase Decision Airfare Tickets Business Model Canvas Computer and Information Sciences
284	Development of Data-Driven Models for Membrane Fouling Prediction at Wastewater Treatment Plants Kovacs, David January 2022 Membrane bioreactors (MBRs) have proven to be an extremely effective wastewater treatment process, combining ultrafiltration with biological processes to produce high-quality effluent. However, one of the major drawbacks of this technology is membrane fouling, an inevitable process that reduces permeate production and increases operating costs. Predicting membrane fouling in MBRs is important because it can provide decision support to wastewater treatment plant (WWTP) operators. Currently, mechanistic models are often used to estimate transmembrane pressure (TMP), which is an indicator of membrane fouling, but their performance is not always satisfactory. In this research, existing mechanistic and data-driven models of membrane fouling are investigated. Data-driven machine learning techniques, namely random forest (RF), artificial neural network (ANN), and long short-term memory network (LSTM), are used to build models that predict TMP at various stages of the MBR production cycle. The models are built with 4 years of high-resolution data from a confidential full-scale municipal WWTP. Model performance is examined using statistical measures: coefficient of determination (R²), root mean squared error, mean absolute percentage error, and mean squared error. The results show that all models provide reliable predictions, with the RF models achieving the best predictive accuracy compared with the ANN and LSTM models. The corresponding R² values for RF when predicting TMP before, during, and after back pulse are 0.996, 0.927, and 0.996, respectively. Model uncertainty (including hyperparameter and algorithm uncertainty) is quantified to determine the impact of hyperparameter tuning and the variance of extreme predictions caused by algorithm choice. The ANN models are most affected by hyperparameter tuning and have the highest variability when predicting extreme values within each model's respective hyperparameter range. The proposed models can be useful tools for providing decision support to WWTP operators employing fouling mitigation strategies, which can potentially lead to better operation of WWTPs and reduced costs. / Thesis / Master of Applied Science (MASc)
285	OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION Assareh, Amin 27 November 2012 No description available. Bioinformatics Computer Science GWAS Epistasis Interaction Detection Variable Selection Decision Trees Ensemble Learning AdaBoost LogitBoost Bagging Random Forest
286	Predictive Analysis for Trauma Patient Readmission Database Jiao, Weiwei 24 August 2017 No description available. Biostatistics Readmission Rate Pediatric Trauma Patients Healthcare Cost and Utilization Project National Readmission Database logistic regression random forest support vector machine
287	Regression Model to Project and Mitigate Vehicular Emissions in Cochabamba, Bolivia Wagner, Christopher 28 August 2017 No description available. Engineering Environmental Engineering Mechanical Engineering Random Forest Model Vehicular Fleet Cochabamba, Bolivia Vehicle Emissions Predictive Ensemble Model
288	VEHICLE RESPONSE PREDICTION USING PHYSICAL AND MACHINE LEARNING MODELS Lanka, Venkata Raghava Ravi Teja, Lanka January 2017 No description available. Transportation Mechanical Engineering Engineering
289	GULF OF MAINE LAND COVER AND LAND USE CHANGE ANALYSIS UTILIZING RANDOM FOREST CLASSIFICATION: TO BE USED IN HYDROLOGICAL AND ECOLOGICAL MODELING OF TERRESTRIAL CARBON EXPORT TO THE GULF OF MAINE VIA RIVERINE SYSTEMS Mordini, Michael B. 14 August 2013 No description available. Geography Cartography
290	Application of machine learning for soil survey updates: A case study in southeastern Ohio Subburayalu, Sakthi Kumaran 18 March 2008 No description available. Agriculture, Soil Science machine learning data mining soil survey SSURGO updates soil-landscape modeling predictive soil modeling Random Forest
