101 |
Imputação múltipla: comparação e eficiência em experimentos multiambientais / Multiple imputation: comparison and efficiency in multi-environment trials
Maria Joseane Cruz da Silva, 19 July 2012 (has links)
In genotype-by-environment trials, missing values are common, for example owing to an insufficient quantity of genotypes for planting. This complicates, among other things, the recommendation of the most productive genotypes, since most multivariate statistical techniques require a complete data matrix. Methods that estimate the missing values from the available data, known as (single or multiple) data imputation, are therefore applied, taking the pattern and mechanism of missingness into account. The goal of this work is to evaluate the efficiency of distribution-free multiple imputation (IMLD) (BERGAMO et al., 2008; BERGAMO, 2007), comparing it with multiple imputation via Markov chain Monte Carlo (IMMCMC), for imputing the missing units in a genotype (25) by environment (7) interaction experiment. The data come from a randomized block experiment with Eucalyptus grandis (LAVORANTI, 2003), from which percentages of observations (10%, 20%, 30%) were deleted at random and subsequently imputed by the methods under study. The results obtained by each method showed that the relative efficiency remained above 90% at all deletion percentages, being lowest for environment 4 when imputed with IMLD. The overall accuracy measure increased with the amount of missing data when values were imputed with IMMCMC, whereas for IMLD it varied, being lowest at 20% random deletion. Among the findings, it is important to note that IMMCMC relies on an assumption of normality, whereas IMLD has the advantage of imposing no restriction on the distribution of the data or on the missingness mechanisms and patterns.
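The evaluation design described above — delete a random percentage of cells from a complete genotype-by-environment matrix, impute them, and score the imputations against the known truth — can be sketched as follows. This is only an illustration: environment-mean imputation stands in for the IMLD/IMMCMC methods, and the toy yield matrix and accuracy measure are assumptions, not the thesis's actual data or metric.

```python
import random
from statistics import mean

def delete_at_random(matrix, frac, rng):
    """Return a copy of the matrix with `frac` of its cells set to None,
    plus the list of deleted positions."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    missing = rng.sample(cells, int(frac * len(cells)))
    out = [row[:] for row in matrix]
    for i, j in missing:
        out[i][j] = None
    return out, missing

def impute_env_means(matrix):
    """Fill each missing cell with the mean of its environment (column)."""
    out = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        col_mean = mean(row[j] for row in matrix if row[j] is not None)
        for row in out:
            if row[j] is None:
                row[j] = col_mean
    return out

def accuracy(truth, imputed, missing):
    """Toy overall accuracy: 1 - mean absolute relative error on imputed cells."""
    errs = [abs(imputed[i][j] - truth[i][j]) / abs(truth[i][j]) for i, j in missing]
    return 1.0 - mean(errs)

rng = random.Random(42)
# Hypothetical complete 25-genotype x 7-environment yield matrix.
truth = [[100 + 5 * g + 10 * e + rng.gauss(0, 5) for e in range(7)] for g in range(25)]

for frac in (0.10, 0.20, 0.30):
    with_missing, missing = delete_at_random(truth, frac, rng)
    filled = impute_env_means(with_missing)
    print(f"{int(frac * 100)}% missing: accuracy = {accuracy(truth, filled, missing):.3f}")
```

A real comparison would replace `impute_env_means` with each candidate imputation method and average the scores over many random deletions.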
|
102 |
Identification des profils de changement sur données longitudinales, illustrée par deux exemples : étude des trajectoires hospitalières de prise en charge d'un cancer. Construction des profils évolutifs de qualité de vie lors d'un essai thérapeutique pour un cancer avancé / Identification of patterns of change in longitudinal data, illustrated by two examples: study of hospital pathways in the management of cancer. Construction of quality of life change patterns in a clinical trial for advanced cancer
Nuemi Tchathouang, Gilles Eric, 21 October 2014 (has links)
Context: In the healthcare domain, mining data for knowledge discovery is a rapidly growing concern. Questions about the organisation of care, or the association between treatment and perceived quality of life (QoL), can be addressed from this angle. Advances in technology provide powerful data-mining tools and statistical packages enriched with advanced methods that are usable by non-experts. We illustrate this approach through two topical questions: 1/ How is care organised for the management of cancers? 2/ In patients suffering from metastatic cancer, what is the relationship between perceived health-related QoL and the treatments received as part of a clinical trial?
Materials and methods: Large databases are available today. Some trace patients' hospital pathways, as is the case for the hospital activity data collected under the PMSI, the French national DRG-type medico-administrative database. Others store the QoL information reported by patients, now routinely collected in clinical trials. The analysis of these data was carried out in three main steps. First, a data-preparation step whose objective was compatibility with a specified analysis concept: for example, transforming a classical patient-centred database into a new database whose unit of record is an entity other than the patient (e.g. the care trajectory). Second, the application of data-mining methods for knowledge discovery: formal concept analysis and unsupervised clustering techniques. Finally, the results were presented in graphical form.
Results: For the question on the organisation of care, we built a typology of hospital care trajectories that provides an overview of current practice in the management of the cancers studied, from the first cancer-related surgery up to one year of follow-up. In the case of breast cancer, we described a typology of care based on hospitalisation costs over a one-year follow-up. For the second question, we likewise built a typology of QoL change profiles, comprising three classes: improvement, stability and degradation.
Conclusion: The main interest of this work was to highlight avenues of reflection that advance understanding and contribute to building solutions suited to these problems.
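A simple way to picture the three-class QoL typology described above (improvement / stability / degradation) is to fit a least-squares slope to each patient's QoL series and threshold it. This is only an illustrative sketch, not the unsupervised clustering method used in the thesis; the threshold and the toy series are invented.

```python
def slope(ys):
    """Least-squares slope of a QoL series measured at visits 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def qol_class(ys, tol=2.0):
    """Assign a QoL trajectory to one of three change profiles."""
    s = slope(ys)
    if s > tol:
        return "improvement"
    if s < -tol:
        return "degradation"
    return "stability"

patients = {
    "p1": [40, 48, 55, 63],   # steadily rising QoL score
    "p2": [70, 69, 71, 70],   # essentially flat
    "p3": [65, 55, 48, 40],   # declining
}
for pid, series in patients.items():
    print(pid, qol_class(series))
```

A clustering approach would instead let the data choose the class boundaries (e.g. k-means on per-patient slopes), but the three resulting profiles are of the same kind.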
|
103 |
Méthodes d’analyse de survie, valeurs manquantes et fractions attribuables temps dépendantes : application aux décès par cancer de la prostate / Survival analysis methods, missing values and time-dependent attributable fractions: application to death from prostate cancer
Morisot, Adeline, 02 December 2015 (has links)
The term survival analysis refers to methods used to model the time to occurrence of one or more events while taking censoring into account. The event of interest may be the onset or recurrence of a disease, or death. The causes of death may have missing values, a situation that can be handled by imputation methods.
In the first part of this thesis we review the methods used to deal with missing data, then detail the procedures that enable multiple imputation of causes of death. We developed these methods in a cohort from the ERSPC (European Randomized Study of Screening for Prostate Cancer), which studied screening and mortality for prostate cancer. We propose a theoretical formulation of Rubin's rules after a complementary log-log transformation in order to combine survival estimates, and we provide the corresponding R code.
In the second part, we present survival analysis methods, proposing a unified formulation based on the definitions of crude and net survival, whether interest lies in death from all causes or from a single cause. This entails accounting for censoring, which may then be informative. We considered the so-called classical methods (Kaplan-Meier, Nelson-Aalen, Cox and parametric models), competing-risks methods (via a multistate model or a latent failure time model), cause-specific methods corrected by IPCW (Inverse Probability of Censoring Weighting), and relative survival methods. The classical methods rest on the assumption of non-informative censoring. When death from all causes is of interest, this assumption is often valid; for death from a particular cause, however, deaths from other causes are treated as censoring, and this censoring is generally informative. We introduce an approach based on the IPCW method to correct for this informative censoring, and we provide an R function that applies it directly. All the methods presented in this part are applied to the datasets completed by multiple imputation.
Finally, in the last part, we sought to determine the percentage of deaths explained by one or more variables using attributable fractions. We present the theoretical formulations of attributable fractions, first time-independent and then time-dependent, expressed in terms of survival. We illustrate these concepts using all the survival methods of the previous part and compare the results: the estimates obtained with the different methods are very close.
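The pooling step described in the first part — combining survival estimates across imputed datasets with Rubin's rules on the complementary log-log scale — can be sketched as follows. This is a minimal illustration, not the thesis's R code; the example estimates and standard errors are made up.

```python
import math

def pool_survival_cloglog(surv, se_surv, z=1.96):
    """Combine m survival estimates with Rubin's rules on the
    complementary log-log scale, then back-transform.

    surv    : survival estimates S_i from the m imputed datasets
    se_surv : their standard errors on the survival scale
    Returns (pooled survival, CI lower bound, CI upper bound).
    """
    m = len(surv)
    # Transform theta = log(-log(S)); delta method gives its variance.
    theta = [math.log(-math.log(s)) for s in surv]
    var_theta = [(se / (s * math.log(s))) ** 2 for s, se in zip(surv, se_surv)]

    theta_bar = sum(theta) / m                              # pooled estimate
    w = sum(var_theta) / m                                  # within-imputation variance
    b = sum((t - theta_bar) ** 2 for t in theta) / (m - 1)  # between-imputation variance
    t_var = w + (1 + 1 / m) * b                             # Rubin's total variance

    lo = theta_bar - z * math.sqrt(t_var)
    hi = theta_bar + z * math.sqrt(t_var)
    # Back-transform S = exp(-exp(theta)); theta decreases as S increases,
    # so the confidence bounds swap.
    return (math.exp(-math.exp(theta_bar)),
            math.exp(-math.exp(hi)),
            math.exp(-math.exp(lo)))

# Survival at a fixed time point from m = 5 imputed datasets (invented values).
s_hat, lower, upper = pool_survival_cloglog(
    surv=[0.82, 0.80, 0.83, 0.81, 0.79],
    se_surv=[0.03, 0.03, 0.04, 0.03, 0.03])
print(f"pooled S = {s_hat:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

The transformation keeps the pooled estimate and its confidence bounds inside (0, 1), which is the point of combining on the complementary log-log scale rather than on the survival scale directly.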
|
104 |
Pharmacogénétique de l'Imatinib dans la Leucémie Myéloïde Chronique et Données Censurées par Intervalles en présence de Compétition / Pharmacogenetics of Imatinib in Chronic Myeloid Leukemia and Interval-Censored Competing Risks Data
Delord, Marc, 05 November 2015 (has links)
The treatment of chronic myeloid leukemia (CML) with imatinib is a success of targeted therapy in oncology. The principle of this therapy is to block the biochemical processes driving the development of the disease, allowing a majority of patients to reduce their risk of progression and to avoid burdensome and risky treatments such as hematopoietic stem cell transplantation. However, even though the efficacy of imatinib has been demonstrated in the clinical setting, a non-negligible proportion of patients does not achieve molecular response levels judged optimal. The aim of this thesis is to test the hypothesis of a link between polymorphisms of genes involved in drug absorption and metabolism and the molecular response in chronic-phase chronic myeloid leukemia treated with imatinib. To evaluate patients' molecular response, blood samples are drawn every 3 months for assay of a biomarker. This particular type of follow-up produces interval-censored data. Since patients also remain at risk of progression, or may discontinue treatment because of intolerance, the response of interest may cease to be observable under the treatment studied: the resulting data are interval-censored in a competing-risks setting. To account for the particular nature of the data collected, a method based on multiple imputation is proposed. The idea is to convert the interval-censored data into multiple potentially right-censored datasets, analyse each with the methods available for such data, and finally combine the results following the rules of multiple imputation.
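The conversion step described above — turning interval-censored observations into multiple right-censored datasets — can be sketched as follows. This is only an illustration: drawing event times uniformly within their interval is a simplifying assumption (the thesis's method may sample differently), and the toy records are invented.

```python
import random

def impute_interval_censored(records, m, seed=0):
    """Turn interval-censored records into m right-censored datasets.

    Each record is (left, right, status):
      status "event"    : the event occurred in (left, right]
      status "censored" : the subject was still event-free at `left`
    For every imputed dataset an exact event time is drawn uniformly
    within its interval; censored records keep their last follow-up time.
    """
    datasets = []
    for k in range(m):
        rng = random.Random(seed + k)
        complete = []
        for left, right, status in records:
            if status == "event":
                t = rng.uniform(left, right)   # exact time inside the interval
                complete.append((t, 1))        # 1 = event observed
            else:
                complete.append((left, 0))     # 0 = right-censored
        datasets.append(complete)
    return datasets

# Visits every 3 months: an event detected at month 9 occurred in (6, 9].
records = [(6, 9, "event"), (9, 12, "event"), (12, 12, "censored")]
for ds in impute_interval_censored(records, m=3):
    print(ds)
```

Each completed dataset can then be analysed with standard right-censored survival tools, and the m results combined with Rubin's rules, as the abstract describes.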
|
105 |
Missing Data - A Gentle Introduction
Österlund, Vilgot, January 2020 (has links)
This thesis provides an introduction to methods for handling missing data. A thorough review of earlier methods and of the development of the field is provided. The thesis presents the methods suggested in today’s literature: multiple imputation and maximum likelihood estimation. A simulation study is performed to see whether there are circumstances in small samples in which either of the two methods is to be preferred. To show the importance of handling missing data, multiple imputation and maximum likelihood are also compared with listwise deletion. The results of the simulation study do not show any crucial differences between multiple imputation and maximum likelihood in terms of point estimates. Some differences are seen in the estimation of confidence intervals, speaking in favour of multiple imputation; the difference decreases with increasing sample size, and more studies are needed to draw definite conclusions. Further, the results show that listwise deletion leads to biased estimates under a missing-at-random mechanism. The methods are also applied to a real dataset, the Swedish enrollment registry, to show how they work in a practical application.
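The headline result above — listwise deletion is biased under a missing-at-random (MAR) mechanism — is easy to reproduce in a toy setting. This sketch uses invented parameters, not the thesis's simulation design, and single regression imputation stands in for the multiple imputation and maximum likelihood methods it studies.

```python
import random
from statistics import mean

def simulate_mar(n=20000, seed=1):
    """y has true mean 0; y goes missing more often when the fully
    observed covariate x is large -- a MAR mechanism."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)                  # E[y] = 0, E[y|x] = x
        miss = rng.random() < (0.8 if x > 0 else 0.1)
        data.append((x, None if miss else y))
    return data

def listwise_mean(data):
    """Drop incomplete cases: high-x (hence high-y) cases vanish, so
    the estimate is pulled downward."""
    return mean(y for _, y in data if y is not None)

def regression_imputed_mean(data):
    """Fill each missing y with a least-squares prediction from x,
    then average everything (single imputation, for illustration)."""
    pairs = [(x, y) for x, y in data if y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    b = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    a = my - b * mx
    return mean(y if y is not None else a + b * x for x, y in data)

data = simulate_mar()
print(f"listwise deletion : {listwise_mean(data):+.3f}")
print(f"regression imputed: {regression_imputed_mean(data):+.3f}")
```

The listwise estimate lands well below the true mean of 0, while the imputation-based estimate, which uses the observed x of the incomplete cases, recovers it; proper multiple imputation would additionally propagate the imputation uncertainty into the confidence intervals.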
|
106 |
Data analysis and multiple imputation for two-level nested designs
Bailey, Brittney E., 25 October 2018 (has links)
No description available.
|
107 |
Need for Cognition in Resident Assistants
Austin, Bryan, 04 June 2021 (has links)
No description available.
|
108 |
Temporally-Embedded Deep Learning Model for Health Outcome Prediction
Boursalie, Omar, January 2021 (has links)
Deep learning models are increasingly used to analyze health records to model disease progression. Two characteristics of health records present challenges to developers of deep learning-based medical systems. First, the veracity of the estimation of missing health data must be evaluated to optimize the performance of deep learning models. Second, the currently most successful deep learning diagnostic models, called transformers, lack a mechanism to analyze the temporal characteristics of health records.
In this thesis, these two challenges are investigated using MIIDD (McMaster Imaging Information and Diagnostic Dataset), a real-world medical dataset of longitudinal health records from 340,143 patients over ten years. To address missing data, the performance of imputation models (mean, regression, and deep learning) was evaluated on this dataset. Next, techniques from adversarial machine learning were used to demonstrate how imputation can have a cascading negative impact on a deep learning model. Then, the strengths and limitations of evaluation metrics from the statistical literature (qualitative, predictive accuracy, and statistical distance) for assessing deep learning-based imputation models were investigated. This research can serve as a reference for researchers evaluating the impact of imputation on their deep learning models.
To analyze the temporal characteristics of health records, a new model was developed and evaluated called DTTHRE: Decoder Transformer for Temporally-Embedded Health Records Encoding. DTTHRE predicts patients' primary diagnoses by analyzing their medical histories, including the elapsed time between visits. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature. DTTHRE also increased the training examples available from limited medical datasets by predicting the primary diagnosis for each visit (79.53 +/- 0.25%) with no additional training time. This research contributes towards the goal of disease predictive modeling for clinical decision support. / Dissertation / Doctor of Philosophy (PhD) / In this thesis, two challenges using deep learning models to analyze health records are investigated using a real-world medical dataset. First, an important step in analyzing health records is to estimate missing data. We investigated how imputation can have a cascading negative impact on a deep learning model's performance. A comparative analysis was then conducted to investigate the strengths and limitations of evaluation metrics from the statistical literature to assess deep learning-based imputation models. Second, the most successful deep learning diagnostic models to date, called transformers, lack a mechanism to analyze the temporal characteristics of health records. To address this gap, we developed a new temporally-embedded transformer to analyze patients' medical histories, including the elapsed time between visits, to predict their primary diagnoses. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature.
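The abstract does not spell out DTTHRE's embedding mechanism, but a common way to encode the elapsed time between visits in a transformer, in the spirit of the temporally-embedded model described above, is a sinusoidal embedding of the time gap. The sketch below rests on that assumption; the dimension size and sample gaps are invented.

```python
import math

def time_gap_embedding(delta_days, dim=8, max_scale=10000.0):
    """Map the elapsed time between two visits (in days) to a
    `dim`-dimensional vector of sines and cosines at geometrically
    spaced frequencies, as in standard positional encodings."""
    vec = []
    for i in range(dim // 2):
        freq = 1.0 / max_scale ** (2 * i / dim)
        vec.append(math.sin(delta_days * freq))
        vec.append(math.cos(delta_days * freq))
    return vec

# Embeddings for visits 1, 30 and 365 days apart: nearby gaps give
# similar vectors, distant gaps dissimilar ones, letting the model
# distinguish a follow-up next week from one next year.
for gap in (1, 30, 365):
    emb = time_gap_embedding(gap)
    print(gap, [round(v, 3) for v in emb])
```

In a full model this vector would be added to (or concatenated with) each visit's token embedding before the transformer layers.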
|
109 |
A Longitudinal Study of School Practices and Students’ Characteristics that Influence Students’ Mathematics and Reading Performance of Arizona Charter Middle Schools
Giovannone, Carrie Lynn, January 2010 (has links)
No description available.
|
110 |
Missing Data Treatments in Multilevel Latent Growth Model: A Monte Carlo Simulation Study
Jiang, Hui, 25 September 2014 (has links)
No description available.
|