  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Imputation techniques for non-ordered categorical missing data

Karangwa, Innocent January 2016 (has links)
Philosophiae Doctor - PhD / Missing data are common in survey data sets: enrolled subjects often do not have data recorded for all variables of interest. Inappropriate handling of missing data may lead to biased estimates and incorrect inferences, so special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as leading techniques to impute, or fill in, missing data. The former assumes a normal distribution for the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of each variable to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or missing completely at random (MCAR) on unordered (nominal) categorical variables treated as predictors or response variables in regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data came from the 2007 Demographic and Health Survey (DHS) of the Democratic Republic of Congo. The analysis model of interest was a logistic regression of the woman’s contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Starting from the baseline data set, data sets with MAR and MCAR observations on either the covariates or the response variables measured on a nominal scale were first simulated and then used for imputation. Under the MVNI method, each unordered categorical variable was first dichotomised: K − 1 indicator variables (where K is the number of levels of the categorical variable of interest) were included in the imputation model, leaving the remaining category as the reference. 
These indicator variables were imputed as continuous variables using a linear regression model. Imputation with MICE took into account the distributional form of each variable to be imputed: imputations were drawn using binary and multinomial logistic regression for dichotomous and polytomous variables, respectively. The performance of these methods was evaluated in terms of bias and standard errors of the regression coefficients estimating the association between the woman’s contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming that the sample was unweighted; the sample weights were then taken into account to assess whether the sample design would affect the performance of the two multiple imputation methods of interest, MVNI and MICE. As expected, for all models MVNI and MICE produced less biased estimates and smaller standard errors than the case deletion (CD) method, which discards observations with missing values from the analysis. Moreover, when data were missing (MCAR or MAR) on nominal variables treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, MICE outperformed MVNI when data were missing on the response variable, whether binary or polytomous. Furthermore, the sample design (sample weights), the rate of missingness and the missing data mechanism (MCAR or MAR) did not affect the behaviour of the multiple imputation methods considered in this study. Based on these results, when missing values are present on outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account. 
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.
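The abstract turns on how a nominal variable enters each imputation model. The following is a minimal, stdlib-only Python sketch of those ingredients under stated assumptions, not the thesis's implementation: `dummy_code` shows the K − 1 indicator coding used before MVNI, `draw_category` shows one MICE-style draw from multinomial-logit probabilities, and `case_deletion` shows the CD baseline. All function names, the marital-status level labels and the linear-predictor values are hypothetical; fitting the regression models, drawing their parameters, and repeating the imputations to produce multiple data sets are deliberately left out.

```python
import math
import random
from typing import Optional, Sequence


def non_reference_levels(levels: Sequence[str], reference: str) -> list[str]:
    """The K - 1 levels that receive their own indicator column."""
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of the levels")
    return [lvl for lvl in levels if lvl != reference]


def dummy_code(value: Optional[str], levels: Sequence[str], reference: str) -> list[Optional[float]]:
    """MVNI-style preparation: K - 1 indicators, reference level coded as all zeros.

    A missing value (None) becomes K - 1 missing indicators, which an MVNI
    model would then impute as continuous variables.
    """
    cols = non_reference_levels(levels, reference)
    if value is None:
        return [None] * len(cols)
    if value not in levels:
        raise ValueError(f"unknown level {value!r}")
    return [1.0 if value == lvl else 0.0 for lvl in cols]


def multinomial_probs(linear_predictors: Sequence[float]) -> list[float]:
    """Multinomial-logit probabilities for K levels from K - 1 linear predictors.

    The reference level's linear predictor is fixed at 0; the result is ordered
    reference first, then the non-reference levels.
    """
    etas = [0.0, *linear_predictors]
    top = max(etas)  # subtract the max for numerical stability
    weights = [math.exp(eta - top) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


def draw_category(levels: Sequence[str], reference: str,
                  linear_predictors: Sequence[float], rng: random.Random) -> str:
    """MICE-style step for one missing nominal value: sample a level from the
    predicted multinomial probabilities instead of taking the most likely one."""
    ordered = [reference, *non_reference_levels(levels, reference)]
    if len(linear_predictors) != len(ordered) - 1:
        raise ValueError("need exactly K - 1 linear predictors")
    probs = multinomial_probs(linear_predictors)
    return rng.choices(ordered, weights=probs, k=1)[0]


def case_deletion(rows: Sequence[dict]) -> list[dict]:
    """Complete-case analysis: keep only rows with no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row.values())]


if __name__ == "__main__":
    levels = ["never married", "married", "formerly married"]  # hypothetical levels
    print(dummy_code("married", levels, reference="never married"))  # [1.0, 0.0]
    print(dummy_code(None, levels, reference="never married"))       # [None, None]
    # Hypothetical fitted linear predictors for "married" and "formerly married".
    print(draw_category(levels, "never married", [0.4, -1.2], random.Random(2016)))
```

Sampling from the predicted probabilities, rather than always choosing the most probable level, is what keeps a MICE-style imputation from understating the uncertainty in the missing values.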
2

Prognostic Stratification in Patients with Left Heart Disease : A Machine Learning Approach / Prognostisk stratifiering hos patienter med vänstersidig hjärtsvikt : En maskininlärningsmetod

Saleh, Mariam January 2024 (has links)
Left heart disease often results in left heart failure and right ventricular dysfunction, both of which are challenging to diagnose with traditional diagnostic approaches. To address this, a novel empirical 4-point right ventricular dysfunction score was created at Sahlgrenska University Hospital to overcome the limitations of single variables for diagnosing right ventricular dysfunction. In this study, we used machine learning, specifically XGBoost coupled with interactive machine learning, to develop four different models for predicting death or receipt of a left ventricular assist device in patients with left heart disease (n=486). Features were selected from the dataset using recursive feature elimination with the default number of features. The initial model with 29 features, called the baseline model, served as the foundation for the three additional models, each adjusted based on feedback from a clinician. In the first feedback step, highly correlated features were removed, creating a modified model with 12 features. In the second step, 12 well-known characteristics of left and right ventricular dysfunction were used to create an empirical model, and the prediction threshold was adjusted from 50% to 60%. In the third step, the number of features was reduced to 5 on empirical grounds. The models were compared with the right ventricular dysfunction score using the metrics area under the curve, F1 score, positive likelihood ratio, and negative likelihood ratio. The machine learning models had better predictive performance than the right ventricular dysfunction score. The results also indicated that the models neither improved nor deteriorated when the number of features was reduced. However, none of the machine learning models was accurate enough to be clinically viable. 
These results show the potential of machine learning to enhance prognostic stratification in patients with left heart disease, although further refinement is necessary for clinical use. / Left-sided heart disease often results in left-sided heart failure and right ventricular failure, which are challenging to diagnose with traditional diagnostic methods. To get around the limitation of using single variables to diagnose right ventricular failure, a 4-point right ventricular failure score was created at Sahlgrenska University Hospital. In this study, an XGBoost algorithm combined with interactive machine learning was used to develop four different prediction models to predict mortality or the risk of receiving a mechanical heart pump for the left ventricle in patients with left heart failure (n=486). Variables were selected from the dataset using recursive feature elimination with a default number of variables. The initial model with 29 variables was called the baseline model and served as the foundation for the three additional models, which were adjusted based on the clinician's feedback. The first step involved removing mutually highly correlated variables, and we created a modified model with 12 variables. In the second step, the empirical model, we used 12 known characteristics of left and right ventricular failure, and for both models the prediction threshold was adjusted from 50% to 60%. In a third step, we created a simplified model with 5 variables on clinical grounds. The models were compared with the 4-point right heart failure score using the metrics area under the curve, F1 score, positive likelihood ratio and negative likelihood ratio. This revealed that the machine learning models had better predictive ability than the 4-point right ventricular failure score. Furthermore, the results showed that the models neither deteriorated nor improved when variables were removed or when new models were created on clinical grounds. 
However, the machine learning models had insufficient accuracy for clinical use.
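The comparison metrics and the 50% to 60% threshold change can be made concrete with a small, stdlib-only Python sketch. It computes the F1 score, the positive and negative likelihood ratios, and the area under the ROC curve (via its rank interpretation: the probability that a random positive case scores higher than a random negative one) from predicted probabilities. The XGBoost models and recursive feature elimination are not reproduced; the function names and the toy labels and probabilities are hypothetical, and counting a probability exactly equal to the threshold as a positive prediction is an assumption.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5):
    """Return (tp, fp, tn, fn); a probability >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    """num / den, returning inf (or nan for 0/0) when the denominator is zero."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def likelihood_ratios(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = _ratio(sensitivity, fp / (tn + fp))   # fp / (tn + fp) = 1 - specificity
    lr_neg = _ratio(fn / (tp + fn), specificity)   # fn / (tp + fn) = 1 - sensitivity
    return lr_pos, lr_neg


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels (1 = death or LVAD) and model probabilities.
    y_true = [1, 1, 0, 0, 1, 0]
    y_prob = [0.90, 0.55, 0.58, 0.20, 0.65, 0.40]
    print("AUC:", roc_auc(y_true, y_prob))  # does not depend on the threshold
    for threshold in (0.5, 0.6):
        tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
        print(threshold, f1_score(tp, fp, fn), likelihood_ratios(tp, fp, tn, fn))
```

Raising the threshold trades sensitivity for specificity: in this toy data, moving from 0.5 to 0.6 removes the single false positive but introduces a false negative, while the AUC, which is computed over all thresholds, stays the same.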
3

Stressful Events and Religious Identities: Investigating the Risk of Radical Accommodation

Uzdavines, Alex 30 August 2017 (has links)
No description available.
