  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
441

Developing a selection of credit scoring models based on customer data / Utveckling av ett urval av kreditvärdighetsmodeller baserat på kunddata

Eriksson, Thomas, Petkov, Tomas January 2019
Consumer credit is becoming increasingly popular and widespread in Sweden, with many actors trying to establish themselves in the market. In this thesis, we develop a selection of quantitative credit scoring models based on logistic regression and decision trees. These models may be used to reduce the number of credits approved for customers who are likely to default. They are mainly intended for lenders, such as newly started credit institutions, that lack a statistically rigorous credit approval process and rely instead on qualitative, subjective judgements.
442

Assessing mixed-mode survey effects: A methodological comparison

Kroon, William January 2023
No description available.
443

Supervised classification for human movement data : A comparative study of functional and traditional methods / En jämförelse av övervakade klassificeringsmetoder för mänskligt rörelsedata : Funktionella mot diskreta tillvägagångssätt

Lindvall, Markus January 2023
Functional data analysis (FDA) is growing in importance in statistics, especially in disciplines like biomechanics, where it is common to observe data over time. The objective of this thesis is to employ FDA techniques and compare the classification performance of supervised classification models that use functional data with that of the same models using a discrete summary measurement (max-values) as input. The classification task is to predict whether individuals who underwent anterior cruciate ligament (ACL) reconstruction surgery have a limb symmetry index higher or lower than 90%, based on their observed movements during squat and step-down exercises performed, on average, nine months earlier. Expanding the knowledge of the rehabilitation process after ACL injuries is not only of interest to affected individuals but can also improve the utilization of medical resources and reduce societal costs.  The data come from two occasions on which 15 individuals who had undergone ACL surgery performed the specified exercises. A total of 72 functional variables related to joint angles and moments, along with five additional univariate variables (e.g., the time elapsed between surgery and the first test occasion), were considered as potential predictors. Following an initial variable selection process using permutation tests, 14 variables were used separately in three different classification models: support vector machine (SVM), k-nearest neighbors (KNN), and naïve Bayes (NB). Classification performance was measured as the correct classification rate under leave-one-out cross-validation.  The results showed that, when considering the variable that yielded the highest accuracy for each specific method, models using functional data generally outperformed their counterparts using max-values. With functional data, SVM achieved an accuracy of 100%, KNN 93%, and NB 80%. With max-values as input, the accuracy was 87%, 87%, and 80% for SVM, KNN, and NB, respectively.
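The leave-one-out evaluation described in the abstract above can be sketched as follows. This is a minimal illustration only, not the thesis's code: it uses a plain 1-nearest-neighbor classifier on scalar features, and the function names and the toy data are hypothetical.

```python
from math import dist


def nearest_neighbor_label(train_x, train_y, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def loocv_accuracy(xs, ys):
    """Leave-one-out cross-validation: hold out each observation once,
    train on the rest, and report the share of correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_label(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: six observations with one feature each.
xs = [(0.1,), (0.2,), (0.3,), (0.9,), (1.0,), (1.1,)]
ys = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(xs, ys)
```

If each of the 15 individuals is held out exactly once, the accuracy can only take values that are multiples of 1/15. The reported 93%, 87%, and 80% are consistent with 14, 13, and 12 correct classifications out of 15.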
444

Explainable modeling in machine learning : A comparative study / Förklarande modellering inom maskininlärning : en jämförande studie

Stålberg, Simon, Isaksson, Olivia January 2023
As the use of advanced machine learning models has increased, so has the need to explain their predictions, something these models do not provide on their own. The aim of this thesis is to compare different functions available in R with regard to their ability to provide explainability for these advanced machine learning models, commonly referred to as black-box models. We compare eight functions. Four well-known black-box models are fitted to four different datasets in order to compare the functions' ability to provide explanations in different settings. In our comparative analysis, we assess and contrast each function on several aspects, including explainability, flexibility, and functionality.  Regardless of which model or dataset the explainability functions are applied to, they are all capable of producing explainability plots, a result that showcases the high level of flexibility of every function. The results also show that no single function is optimal for all of the models and datasets. Instead, each function has advantages and disadvantages that depend on the complexity of the model and the type of data used. It is also evident that the number of included features and the degree of dependence between them affect the functions differently. In conclusion, the functions in this thesis displayed a combination of considerable flexibility and explainability, providing straightforward approaches to the challenge of explainability in black-box models.
445

Utvärdering av rule of ten för logistisk regression / Evaluating Rule of Ten for Logistic Regression

Boman, Tobias, Vinberg, Jakob January 2023
Previous studies have shown that coefficient estimates in logistic regression are unreliable when the number of events per variable (EPV) is low. Based on these studies, a rule of thumb of at least 10 EPV has been proposed. This rule of thumb, called the 'rule of ten', is what this study investigates. To evaluate it, a simulation study was performed in the programming language R, in which new datasets were generated from a real dataset. 500 datasets were generated for each sample size and EPV. A new model was estimated for each dataset, and the models' coefficients were used for the evaluation. In total, 10 different EPVs and 18 sample sizes were analyzed. The results confirm previous studies showing that several problems can occur at low EPV and that the sample size has less influence on the results. The problems are also strongly related to the relationship between the explanatory variable and the response variable.
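The quantity behind the rule of ten is simple to compute. The sketch below assumes the common convention that "events" means the less frequent of the two outcome classes; the abstract does not state which convention the thesis uses. The function names and the example counts are hypothetical.

```python
def events_per_variable(outcomes, n_predictors):
    """EPV: size of the rarer outcome class divided by the number of
    predictors in the logistic regression model (assumed convention)."""
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_predictors


def satisfies_rule_of_ten(outcomes, n_predictors, threshold=10):
    """True when the model has at least `threshold` events per variable."""
    return events_per_variable(outcomes, n_predictors) >= threshold


# Hypothetical sample: 40 events among 200 observations.
outcomes = [1] * 40 + [0] * 160
epv = events_per_variable(outcomes, n_predictors=5)
```

With 40 events, a model with five predictors falls short of the rule (EPV of 8), while a model with four predictors just meets it (EPV of 10).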
446

Characteristics of the Swedish people that would be in favour of Sweden leaving the European Union

Andersson, Felix, Tsai, Andy January 2023
The European Union has been a debated topic, particularly since the United Kingdom voted to leave the union in 2016. This thesis investigates why a Swedish individual would want Sweden to leave the European Union, and the characteristics of such individuals. Using data from the European Social Survey, forward selection is applied to find the combination of variables that constitutes the final logistic regression model. The results show that a Swedish individual who is hesitant towards the European Union is more likely to be hesitant towards immigration into the country and to vote for the Sweden Democrats. Furthermore, such an individual is less likely to strongly believe that the political system allows them to have an influence on politics, and less likely to feel emotionally attached to Europe.
447

Predicting periodontitis : An in depth study aiming to make accurate predictions of periodontitis using regularized gradient boost - XGBoost

Lillrank, Erik, Hoffstedt, Jacob January 2023
No description available.
448

Exploring the Importance of Women's Educational Attainment in HIV Risk Prediction : A Comparative Study of Logistic Regression, Random Forest and XGBoost

Wallkulle, Towe January 2023
Due to extensive HIV testing and treatment programs, the rate of new HIV infections has declined in recent years. However, young women south of the Sahara continue to be disproportionately burdened by the epidemic. The aim of this thesis is to explore the complex association between women's educational attainment and HIV prevalence, using data from the most recent demographic and health survey in Zambia. Recent literature has highlighted the potential use of statistical machine learning algorithms in HIV risk prediction. This thesis investigates how a classical statistical method, logistic regression, compares to tree-based ensemble prediction methods. The results suggest that the latter methods outperform logistic regression in terms of classification accuracy. In line with previous results, the logistic regression analysis shows that higher education is negatively associated with HIV prevalence when an interaction term is included in the model specification. In contrast, results from the machine learning models do not provide sufficient evidence that women's education is a relatively important predictor of HIV prevalence in Zambia. Results from feature selection suggest that future research could be conducted with less extensive data collection, as the tree-based methods are found to perform well on a smaller subset of variables.
449

Assessment of the uncertainty in small and large dimensional portfolio allocation

Thorsén, Erik January 2019
Portfolio theory is a large subject with many branches. In this thesis we concern ourselves with one of these: the presence of uncertainty in the portfolio allocation problem and what it leads to. There are many forms of uncertainty; we consider two. The first concerns the optimization problem itself and the risk of optimizing the wrong objective. In the classical mean-variance portfolio problem, we aim to find the portfolio with the smallest risk subject to a constraint on the mean. In practice, however, we might not set a fixed portfolio goal but instead assign probabilities to the amount of return a portfolio might give and to its relation to benchmarks. That is, we specify quantiles of the portfolio return distribution. In this scenario, using the portfolio mean as a return measure could be misleading, since it does not take any quantile into account. In the first paper, we replace the portfolio moments with quantile-based measures in the portfolio selection problem. The properties of the quantile-based portfolio selection problem are then investigated with two different quantile-based measures of risk. We also present a closed-form solution under the assumption that the returns follow an elliptical distribution. In this specific case the portfolio is shown to be mean-variance efficient. The second paper takes on a different type of uncertainty that is classic in statistics: estimation uncertainty. We consider the sample estimators of the mean vector and of the covariance matrix of the asset returns and integrate the uncertainty they introduce into a large class of optimal portfolios. We derive the sampling distribution of the estimated optimal portfolio weights in both small and large dimensions. This consists of deriving the joint distribution of several quantities and then specifying their high-dimensional asymptotic distribution.
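For orientation, the classical problem mentioned in the abstract above can be written in its textbook form. The notation is an assumption, not taken from the thesis: $w$ denotes the portfolio weights, $\mu$ and $\Sigma$ the mean vector and covariance matrix of the asset returns, and $\mu_0$ the target mean.

```latex
\min_{w} \; w^{\top} \Sigma w
\quad \text{subject to} \quad
w^{\top} \mu = \mu_0, \qquad w^{\top} \mathbf{1} = 1 .
```

A standard property of elliptical distributions helps explain why a quantile-based solution can be mean-variance efficient in that case. If the returns are elliptical with location $\mu$ and dispersion matrix $\Sigma$, the portfolio return is a location-scale transformation of a single standardized variable, so its $\alpha$-quantile depends on $w$ only through the portfolio location and scale. Here $q_\alpha$ is the $\alpha$-quantile of the standardized distribution. The specific quantile-based measures used in the thesis are not given in the abstract.

```latex
Q_{\alpha}\!\left(w^{\top} X\right) = w^{\top} \mu + q_{\alpha} \sqrt{w^{\top} \Sigma w} .
```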
450

FACTORS DRIVING RESIDENTIAL PRICES IN BOSTON IN THE 1980’S. / DRIVANDE FAKTORER FÖR BOSTADSPRISER I BOSTON PÅ 1980-TALET.

Malmgren, Sebastian, Hammaréus, Martin January 2022
This thesis analyzes how different variables affected housing prices in Boston in the 1980s. The goal was to form a deeper understanding of what could affect property prices, both then and now. The method of analysis is multiple linear regression, and the theory behind the work is based on linear regression and macroeconomics. The model is based on data collected by Harrison, D. and Rubinfeld, D.L. and obtained from Carnegie Mellon University, Pittsburgh, Pennsylvania. The data consist of 506 observations with 20 variables each. The median value of the properties in a specific area, measured in thousands of dollars, is used as the response variable. The results show that, among the variables used, the number of rooms per dwelling has the largest effect on dwelling price, accounting for almost 40% of the influence in the final model. In second place comes the number of teachers per student, and in third place the tax rate in the area. The model from which the results were taken showed good ability to approximately predict housing prices, with an adjusted R² value of 0.8065.
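The adjusted R² quoted in the abstract above penalizes ordinary R² for the number of predictors in the model. A minimal sketch of the standard formula follows. The function name is hypothetical, and because the abstract does not say how many predictors the final model contains, the example inputs are purely illustrative.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the number of observations and p the number of predictors."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Illustrative values only: n = 506 matches the abstract, but R^2 = 0.81
# and p = 10 are assumptions made for this example.
example = adjusted_r_squared(0.81, n_observations=506, n_predictors=10)
```

With several hundred observations and a handful of predictors, the penalty factor is close to 1, so the adjusted value sits only slightly below the unadjusted R².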
