61 |
Métodos de imputação de dados aplicados na área da saúde / Data imputation methods applied in the health field. Nunes, Luciana Neves, January 2007
Missing data are a very common problem in health research. In this situation, researchers often choose to exclude subjects with non-response on one or more variables, probably because most traditional statistical techniques were developed for complete data sets. However, this exclusion may yield invalid inferences, especially when the subjects retained in the analysis differ from those who were excluded. In the last two decades, imputation methods have been developed to address this problem; their underlying idea is to fill in the missing data with plausible values. Multiple imputation is the most complex of these methods. The objective of this dissertation is to disseminate the multiple imputation method through two papers. The first describes two multiple imputation techniques and applies them to a real data set. The second compares multiple imputation with two single imputation techniques in an application to a risk model for surgical mortality. The applications used secondary data previously analyzed by Klück (2004).
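As background for the multiple imputation idea described above (not part of the original abstract): each missing value is imputed m times, each completed data set is analyzed separately, and the results are pooled with Rubin's standard combining rules, sketched below.

```latex
% Rubin's combining rules (standard multiple-imputation theory, added here as
% background). \hat{Q}_l is the point estimate and U_l its estimated variance
% from the l-th of the m completed (imputed) data sets.
\bar{Q} = \frac{1}{m} \sum_{l=1}^{m} \hat{Q}_l, \qquad
\bar{U} = \frac{1}{m} \sum_{l=1}^{m} U_l, \qquad
B = \frac{1}{m-1} \sum_{l=1}^{m} \left( \hat{Q}_l - \bar{Q} \right)^2,
\qquad
T = \bar{U} + \left( 1 + \frac{1}{m} \right) B.
```

The between-imputation component B is precisely what single-imputation approaches omit, which is why they tend to understate uncertainty.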
|
63 |
Analyses bioinformatiques dans le cadre de la génomique du SIDA / Bioinformatics analyses in the context of AIDS genomics. Coulonges, Cédric, 16 December 2011
Current technologies make it possible to explore the entire genome to uncover genetic variants associated with disease. This requires bioinformatics tools suited to working at the interface of computing, statistics, and biology. My thesis focused on the bioinformatics analysis of genomic data from the GRIV AIDS cohort and from the International HIV Acquisition Consortium (IHAC) project. Laying the groundwork for imputation, I first developed the SUBHAP software. Our team showed that the HLA region is essential in non-progression and in the control of viral load, which led me to study the non-elite non-progressor phenotype. I thereby identified a variant of the CXCR6 gene which, outside HLA, is so far the only replicated result identified by a genome-wide approach. The imputation of the IHAC data (10,000 infected patients and 15,000 controls) has been performed, and the first associations are currently being explored.
|
64 |
The impact of missing data imputation on HCC survival prediction: Exploring the combination of missing data imputation with data-level methods such as clustering and oversampling. Dalla Torre, Kevin; Abdul Jalil, Walid, January 2018
The area of data imputation, the process of replacing missing data with substituted values, has been covered quite extensively in recent years. The literature on the practical impact of data imputation, however, remains scarce. This thesis explores the impact of some state-of-the-art data imputation methods on HCC survival prediction and classification in combination with data-level methods such as oversampling. More specifically, it explores imputation methods for mixed-type datasets and their impact on a particular HCC dataset. Previous research has shown that the newer, more sophisticated imputation methods outperform simpler ones when evaluated with the normalized root mean square error (NRMSE). Contrary to intuition, however, the results of this study show that when combined with other data-level methods such as clustering and oversampling, the differences in imputation performance do not always affect classification in any meaningful way. This might be explained by the noise introduced when generating synthetic data points in the oversampling process. The results also show that one of the more sophisticated imputation methods, namely MICE, is highly dependent on prior assumptions about the underlying distributions of the dataset. When those assumptions are incorrect, the imputation method performs poorly and has a considerable negative impact on classification.
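To make the NRMSE evaluation mentioned above concrete, the hypothetical Python sketch below (not taken from the thesis, which works on an HCC dataset) masks a fraction of known values, imputes them, and computes NRMSE against the true values. The toy data, masking rate, and the choice of scikit-learn's KNNImputer as the stand-in imputer are all illustrative assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Toy numeric data standing in for a real clinical dataset (assumption).
X_true = rng.normal(size=(200, 5))
X_true[:, 1] = 0.6 * X_true[:, 0] + 0.4 * X_true[:, 1]  # add some correlation

# Mask 20% of the entries completely at random to create "missing" values.
mask = rng.random(X_true.shape) < 0.20
X_miss = X_true.copy()
X_miss[mask] = np.nan

# Impute with a stand-in imputer (KNN); the thesis compares several methods.
X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss)

# NRMSE: RMSE over the masked entries, normalised by the spread of the truth.
rmse = np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2))
nrmse = rmse / np.std(X_true[mask])
print(f"NRMSE on masked entries: {nrmse:.3f}")
```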
|
65 |
L’imputation, mécanisme fondamental de la responsabilité des personnes publiques / The concept of imputation, a key mechanism of public liability. Oki, Jean-Louis, 27 November 2017
Although it is an essential element of any liability regime, the notion of imputation has been the object of only a few studies in the field of public liability. This research seeks to show both the importance of the role played by imputation and the relevance of an approach that examines liability through the prism of the imputation process. Far from being a mere technical device for determining which patrimony bears the liability, the question of imputation opens onto a more general reflection on liability itself. Because it designates the person who owes the debt of liability, the imputation process always implies a position on the function of liability; whether it designates the author of the triggering event or some other person is never inconsequential. By answering the question of why a person is held liable, the study of imputation also uncovers the foundation of liability. Moreover, the choice of a particular mode of imputation is never neutral and always has perceptible consequences for the shape of the legal regimes governing the various hypotheses of liability. Through the prism of imputation, it therefore seems possible to grasp the function of liability, to explain the existence of a wide diversity of legal regimes and, above all, to propose a classification of the hypotheses of liability based on the internal logic that drives them.
|
66 |
Impact of pre-imputation SNP-filtering on genotype imputation results. Roshyara, Nab Raj; Kirsten, Holger; Horn, Katrin; Ahnert, Peter; Scholz, Markus, January 2014
Background: Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research on and understanding of the impact of initial SNP data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE2. Results: We considered three scenarios: imputation of partially missing genotypes with use of an external reference panel, without use of an external reference panel, and imputation of completely untyped SNPs using an external reference panel. We first created various datasets by applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios using established and newly proposed measures of imputation quality. While the established measures assess the certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP filtering might be detrimental to imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves the imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Conclusion: Even moderate filtering has a detrimental effect on imputation quality. Therefore, little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios, at the cost of increased computing time.
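MaCH and IMPUTE2 are standalone tools, so the hypothetical Python sketch below only illustrates the evaluation loop the abstract describes: mask a percentage of randomly selected high-quality genotype calls, impute them with a stand-in imputer (here a simple per-SNP most-frequent-genotype rule), and measure agreement with the true genotypes. The simulated genotype matrix and masking rate are assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(1)

# Simulated genotype matrix: 500 samples x 50 SNPs, coded 0/1/2 (assumption).
freqs = rng.uniform(0.05, 0.5, size=50)            # allele frequency per SNP
genotypes = rng.binomial(2, freqs, size=(500, 50)).astype(float)

# Mask 10% of randomly selected genotype calls, as in the masking experiments.
mask = rng.random(genotypes.shape) < 0.10
observed = genotypes.copy()
observed[mask] = np.nan

# Stand-in imputer: per-SNP most frequent genotype (MaCH/IMPUTE2 exploit
# haplotype information and would normally do much better than this baseline).
imputed = SimpleImputer(strategy="most_frequent").fit_transform(observed)

# Agreement with the true genotypes on the masked entries (concordance rate).
concordance = np.mean(imputed[mask] == genotypes[mask])
print(f"Genotype concordance on masked calls: {concordance:.3f}")
```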
|
67 |
Estimation de la variance en présence de données imputées pour des plans de sondage à grande entropie / Variance estimation in the presence of imputed data for high-entropy sampling designs. Vallée, Audrey-Anne, 07 1900
Variance estimation in the case of item nonresponse treated by imputation is the main topic of this work. Treating the imputed values as if they were observed may lead to substantial underestimation of the variance of point estimators. Classical variance estimators rely on the availability of the second-order inclusion probabilities, which may be difficult (or even impossible) to calculate. We propose to study the properties of variance estimators obtained by approximating the second-order inclusion probabilities. These approximations are expressed in terms of first-order inclusion probabilities and are generally valid for high-entropy sampling designs. The results of a simulation study evaluating the properties of the proposed variance estimators in terms of bias and mean squared error are presented.
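As illustrative background (the abstract does not name the specific approximations studied), one classical approximation of this type, attributed to Hájek, expresses the second-order inclusion probabilities of a high-entropy fixed-size design using only the first-order probabilities, which can then be plugged into the Sen-Yates-Grundy form of the variance estimator:

```latex
% Hajek-type approximation of the joint (second-order) inclusion probabilities
% for high-entropy fixed-size designs, using only first-order probabilities.
% Shown as background; the thesis may study other approximations of this kind.
\pi_{kl} \;\approx\; \pi_k \pi_l \left[ 1 - \frac{(1-\pi_k)(1-\pi_l)}{d} \right],
\qquad d = \sum_{j \in U} \pi_j (1 - \pi_j).

% Plugged into the Sen-Yates-Grundy variance estimator of the
% Horvitz-Thompson total:
\widehat{V}\!\left(\hat{Y}_{\mathrm{HT}}\right)
  = \frac{1}{2} \sum_{k \in S} \sum_{\substack{l \in S \\ l \neq k}}
    \frac{\pi_k \pi_l - \pi_{kl}}{\pi_{kl}}
    \left( \frac{y_k}{\pi_k} - \frac{y_l}{\pi_l} \right)^{2}.
```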
|
68 |
Traitement des données manquantes en épidémiologie : application de l’imputation multiple à des données de surveillance et d’enquêtes / Missing data management in epidemiology: Application of multiple imputation to data from surveillance systems and surveys. Héraud Bousquet, Vanina, 6 April 2012
The management of missing values is a common and widespread problem in epidemiology. The most common technique restricts the analysis to subjects with complete information on the variables of interest, which can substantially reduce statistical power and precision and may also result in biased estimates. This thesis investigates the application of multiple imputation methods to manage missing values in epidemiological studies and in surveillance systems for infectious diseases. The study designs to which multiple imputation was applied were diverse: a risk analysis of HIV transmission through blood transfusion, a case-control study on risk factors for Campylobacter infection, and a capture-recapture study estimating the number of new HIV diagnoses among children. We then performed a multiple imputation analysis on data from a surveillance system for chronic hepatitis C (HCV) to assess risk factors for severe liver disease among HCV-infected patients who reported drug use. Within this study, we proposed guidelines for applying a sensitivity analysis in order to test the hypotheses underlying multiple imputation. Finally, we describe the elaboration of an ongoing multiple imputation process applied to the French national HIV surveillance database, its evolution over time, and the evaluation and validation procedures. Based on these practical applications, we worked out a strategy for handling missing data in surveillance databases, including the thorough examination of the incomplete database, the building of the imputation model, and the steps taken to validate the imputation models and examine the underlying multiple imputation hypotheses.
|
69 |
Performance of Imputation Algorithms on Artificially Produced Missing at Random Data. Oketch, Tobias O., 1 May 2017
Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of data samples, so population estimates and model parameters estimated from such data are likely to be biased.
However, the missing data problem is an active area of study, and better alternative statistical procedures have been developed to mitigate its shortcomings. In this paper, we review the causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the multivariate imputation by chained equations (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit on the imputed data sets and on the complete data set, and the regression coefficients of the models using the imputed data were statistically compared with those of the model using the complete data.
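The abstract works with R's mice package; as a hypothetical illustration of the same workflow in Python, the sketch below uses scikit-learn's IterativeImputer (a chained-equations-style imputer), fits the same linear regression on the complete data and on the imputed data, and compares the coefficients. The simulated data, missingness rate, and estimator choices are assumptions, not details from the thesis.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Simulated complete data: y depends linearly on three correlated predictors.
n = 1000
cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), cov, size=n)
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Introduce 25% missing values completely at random in the predictors.
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.25] = np.nan

# Chained-equations-style imputation (single imputation here for brevity;
# proper MI would repeat this m times and pool estimates with Rubin's rules).
X_imp = IterativeImputer(random_state=0, max_iter=10).fit_transform(X_miss)

coef_complete = LinearRegression().fit(X, y).coef_
coef_imputed = LinearRegression().fit(X_imp, y).coef_
print("complete-data coefficients:", np.round(coef_complete, 3))
print("imputed-data coefficients: ", np.round(coef_imputed, 3))
```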
|
70 |
Performance Comparison of Imputation Algorithms on Missing at Random Data. Addo, Evans Dapaa, 1 May 2018
Missing data continues to be an issue not only in the field of statistics but in any field that deals with data. This is because almost all widely accepted, standard statistical software and methods assume complete data for all variables included in the analysis. As a result, in most studies, statistical power is weakened and parameter estimates are biased, leading to weak conclusions and generalizations.
Many studies have established that multiple imputation methods are effective ways of handling missing data. This paper examines three imputation methods in the MICE package of the statistical software R (predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression) to ascertain which of the three imputes data that yields parameter estimates closest to those of the complete data at different percentages of missingness. The parameter estimates from the models fitted to the complete data and to the imputed data were evaluated and compared. The paper extends the analysis by generating a pseudo dataset from the original data to establish how the imputation methods perform under varying conditions.
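Since both this abstract and the preceding one study data that are missing at random (MAR) at different missingness percentages, the hypothetical sketch below shows one simple way such MAR patterns are often generated for simulation studies: the probability that x2 is missing depends on the observed value of x1. The logistic form, variable names, and target rates are illustrative assumptions, not the procedure used in either thesis.

```python
import numpy as np

def make_mar(x1, x2, target_rate, rng):
    """Delete entries of x2 with probability depending on the observed x1 (MAR).

    The intercept of a logistic missingness model is tuned by bisection so the
    realised missingness rate is close to ``target_rate``.
    """
    lo, hi = -10.0, 10.0
    for _ in range(50):                      # bisection on the intercept
        a = (lo + hi) / 2.0
        p = 1.0 / (1.0 + np.exp(-(a + 1.5 * x1)))
        if p.mean() > target_rate:
            hi = a
        else:
            lo = a
    x2_miss = x2.copy()
    x2_miss[rng.random(x2.size) < p] = np.nan
    return x2_miss

rng = np.random.default_rng(7)
x1 = rng.normal(size=5000)
x2 = 0.8 * x1 + rng.normal(size=5000)

for rate in (0.10, 0.25, 0.50):              # varying percentages of missingness
    x2_m = make_mar(x1, x2, rate, rng)
    print(f"target {rate:.0%}: realised {np.isnan(x2_m).mean():.1%} missing in x2")
```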
|