141 |
Essays in Political Methodology
Blackwell, Matthew 24 July 2012
This dissertation contributes three novel methodologies to the field of political science. In the first chapter, I describe how to make causal inferences in the face of dynamic strategies. Traditional causal inference methods assume that these dynamic decisions are made all at once, an assumption that forces a choice between omitted variable bias and post-treatment bias. I resolve this dilemma by adapting methods from biostatistics and use these methods to estimate the effectiveness of an inherently dynamic process: a candidate's decision to "go negative." Drawing on U.S. statewide elections (2000-2006), I find, in contrast to the previous literature, that negative advertising is an effective strategy for non-incumbents. In the second chapter, I develop a method for handling measurement error. Social scientists devote considerable effort to mitigating measurement error during data collection but then ignore the issue during analysis. Although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. This chapter develops an easy-to-use alternative without these problems; it treats missing data as a special case of extreme measurement error and corrects for both. In the final chapter, I introduce a model for detecting changepoints in the distribution of contributions data. The model, called the game-changers model, allows for overdispersion, a key feature of contributions data. While many extant changepoint models force researchers to choose the number of changepoints ex ante, the game-changers model incorporates a Dirichlet process prior in order to estimate the number of changepoints along with their location. I demonstrate the usefulness of the model in data from the 2012 Republican primary and the 2008 U.S. Senate elections. / Government
|
142 |
Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes / Multiple imputation using principal component methods: A new methodology to deal with missing values
Audigier, Vincent 25 November 2015
This thesis proposes new multiple imputation methods that are based on principal component methods, which were initially used for exploratory analysis and visualisation of continuous, categorical and mixed multidimensional data. The study of principal component methods for imputation, never previously attempted, makes it possible to deal with many types and sizes of data. This is because dimensionality reduction limits the number of estimated parameters.
First, we describe a single imputation method based on factor analysis of mixed data. We study its properties and focus on its ability to handle complex relationships between variables, as well as infrequent categories. Its prediction quality is assessed by comparing it with the state-of-the-art single imputation method based on random forests.
Next, a multiple imputation method for continuous data using principal component analysis (PCA) is presented. This is based on a Bayesian treatment of the PCA model. Unlike standard methods based on Gaussian models, it can still be used when the number of variables is larger than the number of individuals and when correlations between variables are strong.
Finally, a multiple imputation method for categorical data using multiple correspondence analysis (MCA) is proposed. The variability of prediction of missing values is introduced via a non-parametric bootstrap approach. This helps to tackle the combinatorial issues which arise from the large number of categories and variables. We show that multiple imputation using MCA outperforms the best current methods.
|
143 |
Analys av bortfallets påverkan i Riksstrokes kvalitetsregister / Analysis of the impact due to missing data in Riksstrokes quality register
Andersson, Tore, Borgström, Jonas January 2020
Acute stroke is a serious and life-threatening disease that often leads to physical and cognitive impairments. Riksstroke is a quality register that collects and provides information on stroke care in Sweden. During 2019–2020, extensive validation work is under way, in which the missing data within the register are analysed. The aim of this thesis was, as part of that work, to analyse the extent of missing data in several factors and whether it differed between groups defined by sex, age and hospital. Two methods for handling missing data were then tested: complete case analysis and multiple imputation by chained equations (MICE). These were evaluated by comparing the estimated odds ratios for death within 90 days of hospital admission. The results showed large differences in missing data between men and women, between age groups and between hospitals. A large part of these differences can probably be explained by the age of the patients. The two evaluated methods produced comparable results.
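As a rough illustration of the simpler of the two approaches named in this abstract, the sketch below runs a complete case analysis and then computes an odds ratio for death within 90 days from a 2x2 table. The record layout, the binary exposure variable and the counts are hypothetical and are not taken from the Riksstroke register or from the thesis.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (exposure, died_within_90_days); None marks a missing value.
Record = Tuple[Optional[int], Optional[int]]


def complete_cases(records: List[Record]) -> List[Record]:
    """Complete case analysis: keep only records with no missing field."""
    return [r for r in records if None not in r]


def odds_ratio(records: List[Record]) -> float:
    """Odds ratio of death for exposed vs. unexposed from fully observed records."""
    a = sum(1 for e, d in records if e == 1 and d == 1)  # exposed, died
    b = sum(1 for e, d in records if e == 1 and d == 0)  # exposed, survived
    c = sum(1 for e, d in records if e == 0 and d == 1)  # unexposed, died
    d_ = sum(1 for e, d in records if e == 0 and d == 0)  # unexposed, survived
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: a cell in the denominator is empty")
    return (a * d_) / (b * c)


# Made-up data: 20 complete records plus 3 records with a missing field.
records: List[Record] = (
    [(1, 1)] * 4 + [(1, 0)] * 6 + [(0, 1)] * 2 + [(0, 0)] * 8
    + [(None, 1), (1, None), (None, None)]
)

kept = complete_cases(records)
print(len(kept), odds_ratio(kept))
```

Dropping incomplete records is only unbiased under fairly strong assumptions about why values are missing, which is one reason a comparison against an imputation method such as MICE is informative.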
|
144 |
Neural networks for imputation of missing genotype data: An alternative to the classical statistical methods in bioinformatics
Andersson, Alfred January 2020
In this project, two different machine learning models were tested in an attempt to impute missing genotype data from patients on two different panels. As the privacy of the patients had to be protected, initial training was done on data simulated from the 1000 Genomes Project. The first model consisted of two convolutional variational autoencoders, and the latent representations of the networks were shuffled to force the networks to find the same patterns in the two datasets. This model was unfortunately unsuccessful at imputing the missing data. The second model was based on a UNet structure and was more successful at the task of imputation. This model had one encoder for each dataset, making each encoder specialized at finding patterns in its own data. Further improvements are required before the model is fully capable of imputing the missing data.
|
145 |
Changes in the sexual function of male patients with rectal cancer over a 2-year period from diagnosis to 24-month follow-up: A prospective, multicenter, cohort study / Changes in sexual function after laparoscopic radical surgery for rectal cancer in men: a multicenter prospective observational study
Sakamoto, Takashi 23 March 2021
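Kyoto University / New-system course doctorate / Doctor of Medical Science / Kō No. 23075 / Medical Doctorate No. 4702 / 新制||医||1049 (University Library) / Graduate School of Medicine, Kyoto University, Major in Medicine / (Chief examiner) Professor Koji Kawakami, Professor Naoki Kondo, Professor Osamu Ogawa / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM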
|
146 |
Human leukocyte antigen (HLA) genetic diversity in South African populations
Tshabalala, Mqondisi January 2018
There is documented evidence of high genetic diversity amongst African populations, but there is limited data on human leukocyte antigen (HLA) diversity in these populations. HLA genes are highly polymorphic and encode proteins that are part of the host defence mechanism mediated through antigen presentation to immune system effector cells. The highly polymorphic nature of HLA genes facilitates the presentation of a wide range of antigenic peptides to the immune system, leading to an immune response. With the high disease burden in Africa, it is important to fully understand HLA diversity in these populations, to establish HLA-disease associations, to potentially use this data for the informed design of population-specific vaccines against the many diseases, and to improve donor-recipient matching. The aim of this thesis is to understand HLA diversity in South African populations to support transplantation programs, add knowledge on human diversity and build a potential future resource for disease association and population studies.
There is generally limited HLA data from southern African populations (Chapter 2) to support disease association studies or to guide vaccine design and donor recruitment for transplantation programs. Although the South African Bone Marrow Registry (SABMR) is the only active bone marrow donor registry in Africa supporting transplantation programs, HLA diversity in its registered volunteer donors is largely undocumented. This study documents HLA -A, -B, -C, -DRB1 and -DQB1 allele and haplotype frequencies from a subset of 237 SABMR registered donors with the objective of highlighting HLA diversity in South Africans (Chapter 3). Additionally, mixed resolution HLA data from the National Health Laboratory Services (NHLS) and the South African National Blood Transfusion Service (SANBS) are reported (Chapter 4). A comparison of South African HLA data (NHLS and SANBS) with other global populations, including sub-Saharan Africans, confirms the genetic diversity of South Africans. To counter the paucity of HLA data, in silico HLA imputation tools may be used to determine HLA alleles from existing whole genome sequencing (WGS) data. HLA imputation is an economically feasible typing option for resource-limited settings. To support the feasibility of HLA imputation, this study describes high resolution (up to 8 digit typing) HLA alleles determined by in silico HLA imputation tools from 24 WGS of South African individuals (Chapter 5). Overall, HLA diversity of South African populations is described in detail through literature meta-analysis, documentation of previously typed individuals (SANBS, NHLS and SABMR) and HLA imputation from existing next generation sequencing (NGS) data.
Although the results reported here are from a small subset of 237 SABMR registered donors (Chapter 3), 24 WGS (Chapter 5) and mixed resolution typing data from NHLS and SANBS (Chapter 4), the allele and haplotype frequencies generated could be a useful resource for future anthropological and population genetics studies. Furthermore, these findings may better inform donor recruitment strategies for the SABMR, and disease association studies. Recommendations for future study include development of an HLA diversity resource for African populations, a comparison of the large SABMR dataset with other global registries, and the use of more robust assembly-based computational tools to fully understand the HLA diversity in South Africans. / Thesis (PhD)--University of Pretoria, 2018. / South African Medical Research Council (SAMRC) in terms of the MRC’s Flagships Awards Project (SAMRC-RFA-UFSP-01-2013/STEM CELLS), the SAMRC Extramural Unit for stem cell Research and Therapy, the Institute for Cellular and Molecular Medicine of the University of Pretoria, and the National Research Foundation of South Africa. / Immunology / PhD Medical Immunology / Unrestricted
|
147 |
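Evaluation verschiedener Imputationsverfahren zur Aufbereitung großer Datenbestände am Beispiel der SrV-Studie von 2013 / Evaluation of different imputation methods for preparing large data sets, using the 2013 SrV study as an example
Meister, Romy 09 March 2016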
Missing values are a serious problem in surveys. The literature suggests replacing them with realistic values using imputation methods. This master's thesis examines four imputation techniques with regard to their ability to handle missing data: mean imputation, conditional mean imputation, the Expectation-Maximization algorithm and the Markov chain Monte Carlo method. The first three methods were also evaluated in a simulation using a large real data set. To analyse the quality of these techniques, a metric variable of the original data set was chosen and missing values were generated in it under different percentages of missingness and common missing data mechanisms. After the simulated missing values had been replaced, several statistical parameters, such as quantiles, arithmetic mean and variance, were calculated for all completed data sets and compared with the parameters of the original data set. The results of this empirical analysis show that the Expectation-Maximization algorithm estimates all considered statistical parameters of the complete data set far better than the other imputation methods analysed, although the assumption of a multivariate normal distribution was not met. Mean imputation and conditional mean imputation both produce statistically significant estimators for the arithmetic mean under the assumption of missing completely at random, whereas other parameters, such as the variance, do not show the estimated effects. In general, the accuracy of all estimators from the three imputation methods decreases as the percentage of missingness increases. The results lead to the conclusion that the Expectation-Maximization algorithm should be preferred over mean imputation and conditional mean imputation.
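The kind of simulation described in this abstract can be sketched on a small scale for the simplest of the methods, mean imputation. The example below is an illustration under stated assumptions, not the thesis's procedure: the data are synthetic normal draws rather than SrV survey data, the helper names are hypothetical, and missingness is generated completely at random (MCAR) at a fixed 30 percent.

```python
import random
import statistics
from typing import List, Optional


def make_mcar(values: List[float], missing_fraction: float,
              rng: random.Random) -> List[Optional[float]]:
    """Return a copy of values with a random subset set to None (MCAR)."""
    n_missing = int(round(missing_fraction * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


rng = random.Random(2013)  # fixed seed so the run is reproducible
original = [rng.gauss(50.0, 10.0) for _ in range(1000)]  # synthetic metric variable
with_gaps = make_mcar(original, 0.30, rng)
observed = [v for v in with_gaps if v is not None]
completed = mean_impute(with_gaps)

# Compare parameters of the original, observed and completed data.
for name, data in (("original", original), ("observed", observed),
                   ("completed", completed)):
    print(name, round(statistics.fmean(data), 3),
          round(statistics.variance(data), 3))
```

Because every imputed value sits exactly at the observed mean, the completed data reproduce that mean while the sample variance shrinks by the factor (n_obs - 1) / (n - 1), which is consistent with the abstract's remark that the variance behaves differently from the mean.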
|
148 |
Genome-wide Genotype Imputation - Aspects of Quality, Performance and Practical Implementation
Roshyara, Nab Raj 06 August 2020
Finding a relation between a particular phenotype and genotype is one of the central themes in medical genetics. Single-nucleotide polymorphisms (SNPs) are easily assessable markers allowing genome-wide association (GWA) studies and meta-analyses. Hundreds of such analyses were performed in the last decades. Even though several tools for such analyses are available, an efficient SNP-data transformation tool was necessary. We developed a data management tool, fcGENE, which allows easy transformation of genetic data into the different formats required by different GWA tools.
Genotype imputation, a common technique in GWA studies, allows us to study the relationship between a phenotype and markers that are missing or even completely untyped. Moreover, this technique helps us to infer both common and rare variants that are not directly typed. We studied different aspects of the imputation process, focussing especially on its accuracy. More specifically, our focus lay on the impact of pre-imputation filtering on the accuracy of imputation results. To measure imputation accuracy, we defined two new statistical scores, which allowed us to compare imputed and true genotypes directly. This direct comparison showed that strict quality filtering of SNPs prior to imputation may be detrimental.
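To make the idea of comparing imputed and true genotypes directly more concrete, here is a minimal sketch of two generic accuracy measures. These are not the two scores defined in the thesis, which this abstract does not specify; the coding of genotypes as alternate-allele counts (0, 1 or 2), the function names and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def genotype_concordance(true_genotypes: Sequence[int],
                         imputed_genotypes: Sequence[int]) -> float:
    """Fraction of markers where the imputed genotype equals the true one."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("at least one marker is required")
    matches = sum(1 for t, g in zip(true_genotypes, imputed_genotypes) if t == g)
    return matches / len(true_genotypes)


def mean_dosage_error(true_genotypes: Sequence[int],
                      imputed_genotypes: Sequence[int]) -> float:
    """Mean absolute difference in alternate-allele count per marker."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("at least one marker is required")
    total = sum(abs(t - g) for t, g in zip(true_genotypes, imputed_genotypes))
    return total / len(true_genotypes)


# Toy example: eight markers for one individual, genotypes masked and re-imputed.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 0, 1]

print(genotype_concordance(true_calls, imputed_calls))  # 6 of 8 markers agree
print(mean_dosage_error(true_calls, imputed_calls))
```

Measures of this kind need the true genotypes, so in practice they are computed on markers that were typed and then deliberately masked before imputation.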
We further studied the impact of differently selected reference panels from publicly available projects such as HapMap and the 1000 Genomes Project on imputation quality. More specifically, we analysed the relationship between the genetic distance of the reference panel and the resulting imputation quality. For this purpose, we considered different summary statistics of population differentiation (e.g. Reich’s, Nei’s and other modified scores) between the study data set and the reference panel used in the imputation process.
In the third analysis, we compared two basic approaches to using reference panels in the imputation process: (1) use of a genetically best-matched reference panel, and (2) use of an admixed reference panel, which combines individual reference panels from all possible types of populations and lets the software itself select the optimal references, either piece-wise or as complete sequences of SNPs, for each individual separately. We analysed in detail the performance of different imputation software and the accuracy of the imputation process in both cases. We found that the current trend of using software with an admixed reference panel in all cases is not always the best strategy. Phasing of study data sets with an external reference panel prior to imputation is also a common trend, especially for the imputation of large data sets. We studied the performance of different imputation frameworks with and without pre-phasing. It turned out that pre-phasing clearly reduces the imputation quality for medium-sized data sets.
Table of Contents
List of Tables
List of Figures
1 Overview of the Thesis
1.1 Abstract
1.2 Outlines
2 Introduction
2.1 Basics of genetics
2.1.1 Phenotype, genotype and haplotype
2.1.2 Hardy-Weinberg law
2.1.3 Linkage disequilibrium
2.1.4 Genome-wide association analysis
2.2 Phasing of Genotypes
2.3 Genotype imputation
2.3.1 Tools for Imputing genotype data
2.3.2 Reference panels
3 Results
3.1 Detailed Abstracts
3.1.1 First Research Paper
3.1.2 Second Research Paper
3.1.3 Third Research Paper
3.1.4 Fourth Research Paper
3.2 Discussion and Conclusion
4 Published Articles
4.1 First Research Paper
4.1.1 Supplementary Information
4.2 Second Research Paper
4.2.1 Supplementary Information
4.3 Third Research Paper
4.3.1 Supplementary Information
4.4 Fourth Research Paper
4.4.1 Supplementary Information
5 Summary of the Thesis
6 Bibliography
7 Own Publications
8 Statement of Own Contribution
8.1 First Research Paper
8.2 Second Research Paper
8.3 Third Research Paper
8.4 Fourth Research Paper
9 Declaration of Independent Authorship
10 Acknowledgements
11 Curriculum Vitae
|
149 |
(Re)-Examining the Influence of Program Placement on the Academic Achievement of Students with Learning Disabilities
McKibbin, Steven 17 July 2020
This study explored the relationship between several variables known to influence achievement in Canadian Grade 6 students with Learning Disabilities (LD) who received instruction in either a regular class or specialized program placement. The main independent variable was program placement while the influence of four other independent variables was explored (i.e., level of academic need; prior achievement; socioeconomic status and sex). The dependent variable was a standardized, large-scale assessment of achievement. Hierarchical multiple regression was conducted on a secondary data file in order to address the following research questions: i) Does placement in a regular or specialized program influence the educational outcomes for Grade 6 students with LD, after controlling for the influence of prior achievement in Grade 3?; ii) Is there a relationship between the sociodemographic variables of sex and/or socioeconomic status and achievement for students with LD placed in either a regular or specialized program?; and iii) What influence does the student’s level of academic need have on achievement, beyond program placement, and after controlling for the influence of the other variables in the model? Results revealed that the variables of program placement and prior achievement were significant predictors of scholastic success only when the level of academic need variable was not taken into account. When the follow-up analysis focused on a relatively matched group of students with similar academic need, none of the predictors in the regression model significantly influenced achievement -- including program placement. These results provide important insight into the nuanced relationship of the ecological variables known to affect learning in students with LD placed in regular or specialized programs for instruction. 
Implications are discussed for stakeholders in Ontario’s public education system in terms of the optimum service delivery model for students with LD, and the inclusive education debate in Canada and abroad.
|
150 |
Model-based Multiple Imputation by Chained-equations for Multilevel Data below the Limit of Detection
Xu, Peixin 24 May 2022
No description available.
|