Global ETD Search

441	Developing a selection of credit scoring models based on customer data / Utveckling av ett urval av kreditvärdighetsmodeller baserat på kunddata Eriksson, Thomas, Petkov, Tomas January 2019 Consumer credit is becoming increasingly popular and widespread in Sweden, with many actors trying to establish themselves in the market. In this thesis, we develop a selection of quantitative models for credit scoring, based on logistic regression and decision trees. These models may be used to reduce the number of credits approved for customers who are likely to default, and are mainly intended for, e.g., newly started credit institutions that lack a statistically rigorous credit approval process and rely instead on qualitative, subjective judgements. Probability Theory and Statistics
442	Assessing mixed-mode survey effects: A methodological comparison Kroon, William January 2023 No description available. Probability Theory and Statistics
443	Supervised classification for human movement data : A comparative study of functional and traditional methods / En jämförelse av övervakade klassificeringsmetoder för mänskligt rörelsedata : Funktionella mot diskreta tillvägagångssätt Lindvall, Markus January 2023 Functional data analysis (FDA) is of growing importance in statistics, especially in disciplines like biomechanics, where it is common to observe data over time. The objective of this thesis is to employ FDA techniques and compare the classification performance of supervised classification models that use functional data with that of the same models using a discrete summary measurement (max-values) as input. The classification task is to predict whether individuals who underwent anterior cruciate ligament (ACL) reconstruction surgery have a limb symmetry index higher or lower than 90%, based on their observed movements during squat and step-down exercises executed, on average, nine months prior. Expanding the knowledge of the rehabilitation process after ACL injuries is not only of interest to affected individuals but can also improve the utilization of medical resources and reduce societal costs. The data come from two occasions where 15 individuals who underwent ACL surgery performed the specified exercises. A total of 72 functional variables related to joint angles and moments, along with five additional univariate variables (e.g., the time elapsed between surgery and the first test occasion), were considered as potential predictors. Following an initial variable selection process using permutation tests, 14 variables were used separately in three different classification models: support vector machine (SVM), k-nearest neighbors (KNN), and naïve Bayes (NB). Classification performance was evaluated by the correct classification rate under leave-one-out cross-validation.
The results showed that, when considering the variable that yielded the highest accuracy for each specific method, models utilizing functional data generally outperformed their counterparts using max-values. With functional data, SVM achieved an accuracy of 100%, KNN 93%, and NB 80%. The accuracy using max-values as input was 87%, 87%, and 80% for SVM, KNN, and NB, respectively. Probability Theory and Statistics
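The leave-one-out evaluation described in entry 443 can be illustrated with a minimal sketch. It is not the thesis's code: the tiny k-nearest-neighbour classifier, the function names `knn_predict` and `loocv_accuracy`, and the toy data are all assumptions made for illustration. Each observation is held out once, predicted from the remaining ones, and the share of correct predictions is the correct classification rate.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    Illustrative helper (hypothetical name); uses squared Euclidean distance.
    """
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k=3):
    """Correct classification rate under leave-one-out cross-validation."""
    correct = 0
    for i in range(len(xs)):
        # Hold out observation i and train on all the others.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy, well-separated data (made up for this sketch).
toy_x = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(toy_x, toy_y, k=1))
```

With n held-out predictions the rate can only take the values k/n. For n = 15, for example, 14/15 is about 93% and 13/15 is about 87%, which would be consistent with the reported figures if 15 observations were evaluated.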
444	Explainable modeling in machine learning : A comparative study / Förklarande modellering inom maskininlärning : en jämförande studie Stålberg, Simon, Isaksson, Olivia January 2023 As the use of advanced machine learning models has increased, so has the need for explainability of their predictions, which these models lack. The aim of this thesis is to compare different functions available in the program R regarding their ability to provide explainability for these advanced machine learning models, commonly referred to as black-box models. In this thesis we compare eight functions. Four well-known black-box models are implemented on four different datasets in order to compare the functions' ability to provide explanations in different settings. In our comparative analysis, we evaluate various aspects to assess and contrast each function, including explainability, flexibility, and functionality. Regardless of which model or dataset the explainability functions are applied to, they are all capable of producing explainability plots, a result that showcases the high level of flexibility of every function. The results also show that no single function is optimal for all of the models and datasets. Instead, each function has advantages and disadvantages depending on the complexity of the model and the type of data being used. It is also evident that the number of included features and the level of independence between the features affect the functions differently. In conclusion, the functions in this thesis displayed a combination of significant flexibility and explainability, providing straightforward approaches to addressing the challenge of explainability in black-box models. Probability Theory and Statistics
445	Utvärdering av rule of ten för logistisk regression / Evaluating Rule of Ten for Logistic Regression Boman, Tobias, Vinberg, Jakob January 2023 Previous studies have shown that coefficient estimates for logistic regression are not reliable when EPV (events per variable) is low. Based on these studies, a rule of thumb of at least 10 EPV has been proposed. The rule of thumb is called the 'rule of ten' and is what is investigated in this study. To evaluate the rule of thumb, a simulation study was performed in the programming language R, where new datasets were generated based on an original, real-world dataset. 500 datasets were generated for each sample size and EPV. For each dataset, new models were estimated, and their coefficients were used for the evaluation by comparison with the original model. A total of 10 different EPVs and 18 sample sizes were analyzed.
The results confirm previous studies that have shown that several problems can occur at low EPV and that the sample size has a lesser effect on the results. The problems are also strongly related to the relationship between the independent and the dependent variable. Probability Theory and Statistics
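As a rough illustration of the quantity discussed in entry 445, the sketch below computes events per variable under one common convention: the count of the less frequent outcome divided by the number of candidate predictors. The function names, the convention chosen, and the example numbers are assumptions, not taken from the thesis.

```python
def events_per_variable(outcomes, n_predictors):
    """EPV for a binary outcome coded 0/1 (illustrative convention).

    Uses the size of the smaller outcome group as the number of events.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_predictors


def satisfies_rule_of_ten(outcomes, n_predictors):
    """True when the EPV is at least 10, the threshold named by the rule."""
    return events_per_variable(outcomes, n_predictors) >= 10


# Made-up example: 30 events, 70 non-events, 5 candidate predictors.
example_outcomes = [1] * 30 + [0] * 70
print(events_per_variable(example_outcomes, 5))    # 6.0
print(satisfies_rule_of_ten(example_outcomes, 5))  # False
```

Under this convention the example would need at least 50 events in the smaller group before five predictors met the threshold.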
446	Characteristics of the Swedish people that would be in favour of Sweden leaving the European Union Andersson, Felix, Tsai, Andy January 2023 The European Union has been a debated topic, particularly since the United Kingdom voted to leave the union in 2016. This thesis investigates the underlying reasons why a Swedish individual would want Sweden to leave the European Union, and the characteristics of these individuals. Using data from the European Social Survey and forward selection, a combination of parameters that constitutes the final logistic regression model is found. It is found that a Swedish individual who is hesitant towards the European Union is more likely to be hesitant towards immigration into the country and to vote for the Sweden Democrats. Furthermore, such an individual is less likely to strongly believe that the political system allows them to have an influence on politics, and less likely to feel emotionally attached to Europe. Probability Theory and Statistics
447	Predicting periodontitis : An in depth study aiming to make accurate predictions of periodontitis using regularized gradient boost - XGBoost Lillrank, Erik, Hoffstedt, Jacob January 2023 No description available. Probability Theory and Statistics
448	Exploring the Importance of Women's Educational Attainment in HIV Risk Prediction : A Comparative Study of Logistic Regression, Random Forest and XGBoost Wallkulle, Towe January 2023 Due to extensive HIV testing and treatment programs, the rate of new HIV infections has declined in recent years. However, young women south of the Sahara continue to be disproportionately burdened by the epidemic. The aim of this thesis is to explore the complex association between women's educational attainment and HIV prevalence. To this end, data from the most recent demographic and health survey in Zambia are used. Recent literature has highlighted the potential use of statistical machine learning algorithms in HIV risk prediction. This thesis investigates how a classical statistical method, logistic regression, compares to tree-based ensemble prediction methods. The results suggest that the latter methods outperform logistic regression in terms of classification accuracy. In line with previous results, the logistic regression analysis shows that higher education is negatively associated with HIV prevalence when an interaction term is included in the model specification. In contrast, results from the machine learning models do not provide sufficient evidence that women's education is a relatively important predictor of HIV prevalence in Zambia. Results from feature selection suggest that future research could be conducted with less extensive data collection, as the tree-based methods are found to perform well on a smaller subset of variables. Probability Theory and Statistics
449	Assessment of the uncertainty in small and large dimensional portfolio allocation Thorsén, Erik January 2019 Portfolio theory is a large subject with many branches. In this thesis we concern ourselves with one of these: the presence of uncertainty in the portfolio allocation problem and, in turn, what it leads to. There are many forms of uncertainty; we consider two of these. The first concerns the optimization problem itself and the risk of optimizing what might be the wrong objective. In the classical mean-variance portfolio problem we aim to provide a portfolio with the smallest risk while we constrain the mean. However, in practice we might not assign a fixed portfolio goal but instead assign probabilities to the amount of return a portfolio might give and its relation to benchmarks. That is, we assign quantiles of the portfolio return distribution. In this scenario, the use of the portfolio mean as a return measure could be misleading. It does not take any quantile into account! In the first paper, we replace the portfolio moments with quantile-based measures in the portfolio selection problem. The properties of the quantile-based portfolio selection problem are thereafter investigated with two different (quantile-based) measures of risk. We also present a closed-form solution under the assumption that the returns follow an elliptical distribution. In this specific case the portfolio is shown to be mean-variance efficient. The second paper takes on a different type of uncertainty which is classic to statistics, the problem of estimation uncertainty. We consider the sample estimators of the mean vector and of the covariance matrix of the asset returns and integrate the uncertainty these provide into a large class of optimal portfolios. We derive the sampling distribution of the estimated optimal portfolio weights in both small and large dimensions.
This consists of deriving the joint distribution of several quantities and thereafter specifying their high-dimensional asymptotic distribution. Probability Theory and Statistics
450	FACTORS DRIVING RESIDENTIAL PRICES IN BOSTON IN THE 1980’S. / DRIVANDE FAKTORER FÖR BOSTADSPRISER I BOSTON PÅ 1980-TALET. Malmgren, Sebastian, Hammaréus, Martin January 2022 This thesis report analyzes how different variables affected housing prices in Boston in the 1980s. The goal was to form a deeper understanding of what could affect property prices, both then and now. The method of analysis is multiple linear regression. The theory behind the work is based on linear regression and macroeconomics. The model is based on data collected by Harrison, D. and Rubinfeld, D.L. and obtained from Carnegie Mellon University, Pittsburgh, Pennsylvania. The data consist of 506 observations with 20 variables each. The median value of the properties in a specific area, measured in thousands of dollars, is used as the response variable. The results of the report show that the number of rooms per dwelling has the largest effect on dwelling price, accounting for almost 40% of the influence in the final model, among the variables that were used. In second place comes the number of teachers per student, and in third place the tax rate in the area. The model from which the results were taken showed good ability to approximately predict housing prices, with an adjusted R² value of 0.8065. Probability Theory and Statistics
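Entry 450 reports an adjusted R² value. As a reminder of how the adjustment works, the sketch below applies the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors. The function name and the input values are hypothetical; the abstract does not state how many predictors the final model uses, so no attempt is made to reproduce 0.8065.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 for a linear model with an intercept.

    r2           : ordinary coefficient of determination
    n_obs        : number of observations (n)
    n_predictors : number of predictors, excluding the intercept (p)
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical inputs, chosen only to show the penalty for extra predictors.
print(adjusted_r2(0.5, 11, 5))  # the adjustment removes all of the apparent fit
print(adjusted_r2(0.5, 11, 1))  # fewer predictors, smaller penalty
```

The adjusted value never exceeds the ordinary R² for p >= 1, and the gap shrinks as the number of observations grows relative to the number of predictors.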
