111 |
Statistical Modeling and Analysis of Bivariate Spatial-Temporal Data with the Application to Stream Temperature Study
Li, Han 04 November 2014
Water temperature is a critical factor for the quality and biological condition of streams. Among the various factors affecting stream water temperature, air temperature is one of the most important. To appropriately quantify the relationship between water and air temperatures over a large geographic region, it is important to accommodate the spatial and temporal information of the stream temperature. In this dissertation, I devote effort to several statistical modeling techniques for analyzing bivariate spatial-temporal data in a stream temperature study.
In the first part, I focus the analysis on the individual stream. A time-varying coefficient model (VCM) is used to study the relationship between air temperature and water temperature for each stream. The time-varying coefficient model enables dynamic modeling of the relationship, and therefore can be used to enhance the understanding of water and air temperature relationships. The proposed model is applied to 10 streams in Maryland, West Virginia, Virginia, North Carolina and Georgia using daily maximum temperatures. The VCM approach increases the prediction accuracy by more than 50% compared to the simple linear regression model and the nonlinear logistic model.
The VCM that describes the relationship between water and air temperatures for each stream is represented by slope and intercept curves from the fitted model. In the second part, I consider water and air temperatures for different streams that are spatially correlated. I focus on clustering multiple streams by using intercept and slope curves estimated from the VCM. Spatial information is incorporated to make clustering results geographically meaningful. I further propose a weighted distance as a dissimilarity measure for streams, which provides a flexible framework to interpret the clustering results under different weights. Real data analysis shows that streams in the same cluster share similar geographic features such as solar radiation, percent forest and elevation.
In the third part, I develop a spatial-temporal VCM (STVCM) to deal with missing data. The STVCM takes both spatial and temporal variation of water temperature into account. I develop a novel estimation method that emphasizes the time effect and treats the space effect as a varying coefficient for the time effect. A simulation study shows that the performance of the STVCM on missing data imputation is better than several existing methods such as the neural network and the Gaussian process. The STVCM is also applied to all 156 streams in this study to obtain a complete data record. / Ph. D.
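One way to picture the time-varying coefficient model from the first part is a kernel-weighted least-squares fit, in which the intercept and slope at a given day are estimated mostly from nearby days. The sketch below is only an illustration under stated assumptions: the abstract does not say how the VCM is estimated, and the Gaussian kernel, the `bandwidth` parameter, and the function name `varying_coefficients_at` are hypothetical choices, not the dissertation's method.

```python
import math


def varying_coefficients_at(times, air, water, t0, bandwidth):
    """Estimate intercept a(t0) and slope b(t0) in water = a(t) + b(t) * air.

    Observations are weighted by a Gaussian kernel in time (an assumed
    choice), so days close to t0 dominate the weighted least-squares fit.
    """
    weights = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in times]
    total = sum(weights)
    # Kernel-weighted means of the predictor and the response.
    air_mean = sum(w * a for w, a in zip(weights, air)) / total
    water_mean = sum(w * y for w, y in zip(weights, water)) / total
    # Kernel-weighted sums of squares and cross-products.
    sxx = sum(w * (a - air_mean) ** 2 for w, a in zip(weights, air))
    sxy = sum(w * (a - air_mean) * (y - water_mean)
              for w, a, y in zip(weights, air, water))
    slope = sxy / sxx
    intercept = water_mean - slope * air_mean
    return intercept, slope
```

Evaluating the function over a grid of `t0` values traces out intercept and slope curves of the kind the second part uses for clustering. A smaller `bandwidth` follows seasonal change more closely at the cost of noisier curves.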
|
112 |
Kontexteffekte in Large-Scale Assessments / Context Effects in Large-Scale Assessments
Weirich, Sebastian 13 August 2015
Within the framework of item response theory (IRT), this cumulative dissertation evaluates various methods and models for identifying context effects in large-scale assessments. Such effects can occur, for example, in quantitative empirical studies of school achievement and can lead to biased item and person parameters. To assess in individual cases whether context effects occur and thereby create a risk of biased parameters (and, if so, in what way), IRT models must be developed that parametrize context effects in addition to item and person effects. Such a parametrization is possible within the framework of generalized linear models. The dissertation examines position effects as an example of context effects and evaluates the statistical properties of this measurement model in a simulation study. The results underline above all the importance of the test design: obtaining unbiased parameters requires not only an adequate measurement model but also an adequate, that is, balanced test design. The third contribution of the dissertation addresses the problem of missing values on background variables in large-scale assessments. In this example, the context effect is understood as the effect that systematically influences the probability of a missing value on a particular variable. The principle of multiple imputation was transferred to the problem of missing values on background variables. Unlike the approaches practiced so far (dummy coding of missing values), this yielded unbiased person-side parameters in a simulation study for almost all simulation conditions. / The present doctoral thesis evaluates various methods and models of item response theory to parametrize context effects in large-scale assessments.
Such effects may occur in quantitative educational assessments and may cause biased item and person parameter estimates. To decide whether context effects occur in individual cases and lead to biased parameters, specific IRT models have to be developed which parametrize context effects in addition to item and person effects. The present doctoral thesis consists of three single contributions. In the first contribution, a model for the estimation of context effects in an IRT framework is introduced. Item position effects are examined as an example of context effects in the framework of generalized linear mixed models. Using simulation studies, the statistical properties of the model are investigated, which emphasizes the relevance of an appropriate test design. A balanced incomplete test design is necessary not only to obtain valid item parameters in the Rasch model, but also to guarantee unbiased estimation of position effects in more complex IRT models. The third contribution deals with the problem of missing background data in large-scale assessments. The effect which predicts the probability of a missing value on a certain variable is considered a context effect. Statistical methods of multiple imputation were applied to the problem of missing background data in large-scale assessments. In contrast to other approaches used so far in practice (dummy coding of missing values), unbiased population and subpopulation estimates were obtained in a simulation study for most conditions.
|
113 |
Analýza chybějících hodnot: porovnání vhodnosti tradičních metod napříč mechanismy / Analysis of Missing Data: Comparing Performance of Traditional Methods across Mechanisms
Petrúšek, Ivan January 2014
The objective of this thesis is to evaluate different methods of dealing with missing values in data analysis. The thesis is divided into three major chapters. The first chapter summarizes the theoretical literature on missing data and focuses on missing data mechanisms in particular. The second chapter introduces traditional methods for addressing missing data in sociological research. The third chapter assesses the performance of these methods by analyzing simulated data sets for two variables (income, IQ). For practical analysis (chapter 3), we simulated missing data according to three different mechanisms (MCAR, MAR, NMAR) and varied the proportion of missing values under these mechanisms (10%, 20%, 30%). Then, we applied each of the following four methods of addressing missing values: complete-case analysis, arithmetic mean imputation, regression imputation, and stochastic regression imputation. In order to evaluate the performance of each of these methods we performed correlation and regression analyses for each experimental condition. The results of these simulations are largely in agreement with existing theoretical literature on the subject of missing data. In the case of NMAR, all solution methods provided biased parameter estimates. In the case of MCAR, only complete-case analysis and...
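The simulation design described above (two variables, three missingness mechanisms, several imputation methods) can be sketched in a few lines. This is a minimal illustration, not the thesis's code: the distribution parameters, the deterministic cutoff used to decide which incomes go missing, and all function names (`simulate`, `make_missing`, `mean_impute`, `regression_impute`) are assumptions made for the example.

```python
import random
import statistics


def simulate(n, rng):
    """Draw (iq, income) pairs; all distribution parameters are made up."""
    rows = []
    for _ in range(n):
        iq = rng.gauss(100, 15)
        income = 30000 + 400 * (iq - 100) + rng.gauss(0, 8000)
        rows.append((iq, income))
    return rows


def make_missing(rows, mechanism, rate, rng):
    """Set income to None for a share `rate` of rows under one mechanism."""
    if mechanism == "MCAR":    # missingness unrelated to any variable
        scores = [rng.random() for _ in rows]
    elif mechanism == "MAR":   # depends only on the fully observed iq
        scores = [iq for iq, _ in rows]
    elif mechanism == "NMAR":  # depends on the unobserved income itself
        scores = [income for _, income in rows]
    else:
        raise ValueError(mechanism)
    # Simplification: the rows with the highest scores lose their income.
    cutoff = sorted(scores)[int((1 - rate) * len(scores))]
    return [(iq, None if s >= cutoff else income)
            for (iq, income), s in zip(rows, scores)]


def complete_cases(rows):
    """Complete-case analysis: keep only rows with an observed income."""
    return [(iq, inc) for iq, inc in rows if inc is not None]


def mean_impute(rows):
    """Replace each missing income with the mean of the observed incomes."""
    fill = statistics.fmean([inc for _, inc in complete_cases(rows)])
    return [(iq, fill if inc is None else inc) for iq, inc in rows]


def regression_impute(rows, rng=None):
    """Impute income from iq; pass rng to add residual noise (stochastic)."""
    observed = complete_cases(rows)
    xs = [iq for iq, _ in observed]
    ys = [inc for _, inc in observed]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    resid_sd = statistics.pstdev([y - intercept - slope * x
                                  for x, y in zip(xs, ys)])
    filled = []
    for iq, inc in rows:
        if inc is None:
            inc = intercept + slope * iq
            if rng is not None:
                inc += rng.gauss(0, resid_sd)
        filled.append((iq, inc))
    return filled
```

Comparing the mean or the correlation of the imputed data against the full simulated data, for each mechanism and missing rate, gives the kind of experimental grid the abstract describes.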
|
114 |
Attrition in Studies of Cognitive Aging / Bortfall i studier av kognitivt åldrande
Josefsson, Maria January 2013
Longitudinal studies of cognition are preferred to cross-sectional studies, since they offer a direct assessment of age-related cognitive change (within-person change). Statistical methods for analyzing age-related change are widely available. There are, however, a number of challenges accompanying such analyses, including cohort differences, ceiling and floor effects, and attrition. These difficulties challenge the analyst and put stringent requirements on the statistical method being used. The objective of Paper I is to develop a classifying method to study discrepancies in age-related cognitive change. The method needs to take into account the complex issues accompanying studies of cognitive aging, and specifically work out issues related to attrition. In a second step, we aim to identify predictors explaining stability or decline in cognitive performance in relation to demographic, life-style, health-related, and genetic factors. In the second paper, which is a continuation of Paper I, we investigate brain characteristics, structural and functional, that differ between successfully aging elderly and elderly with an average cognitive performance over 15-20 years. In Paper III we develop a Bayesian model to estimate the causal effect of living arrangement (living alone versus living with someone) on cognitive decline. The model must balance confounding variables between the two living arrangement groups as well as account for non-ignorable attrition. This is achieved by combining propensity score matching with a pattern mixture model for longitudinal data. In Paper IV, the objective is to adapt and implement available imputation methods for longitudinal fMRI data, where some subjects are lost to follow-up. We apply these missing data methods to a real dataset, and evaluate these methods in a simulation study.
|
115 |
When work is more than a job : employment among people who inject drugs
Richardson, Lindsey A. January 2012
This thesis explores employment among people who inject drugs (IDU). It seeks to identify what differentiates IDU who work from those who do not, barriers to labour market participation, and how employment is perceived and experienced by IDU. Using longitudinal data from the Vancouver Injection Drug User Study (VIDUS), it conducts this research through a detailed examination of the implications of missing data, quantitative analyses of transitions into employment and qualitative, in-depth interviews. Missing data analyses identified differences between those who do and do not have missing data, as well as predictors of observation gaps and how individuals end their study participation (either right-hand censorship, attrition, or death). Differences were observed along individual, behavioural and contextual dimensions. Analytical approaches to the relationship between data structure and content gleaned useful information for longitudinal studies with marginalized populations. Discrete time event history analyses of work transitions revealed complex relationships between drug use, drug-related activities, situational risk factors, and transitions into employment. While most IDU did not make transitions into employment, some did, and while some statistical relationships were expected, others were surprising. Novel findings included mode-specific addiction treatment impacts on employment (methadone vs. non-methadone) and the importance of the broader risk environment over and above even high-intensity substance use. Finally, qualitative interviews identified heterogeneity in individual motivations toward and experiences of work. Those who maintained concurrent drug use and formal labour market involvement utilized strategies to spatially and temporally separate the two activities.
Individual capacities to employ these strategies were facilitated by material, vocational and temporal motivations, and interfered with by health conditions, catastrophic events and institutional relationships that operated as barriers to employment. This study provides insight into how employment, a known social determinant of health in the general population, operates among people who inject drugs.
|
116 |
THE APPLICATION OF LAST OBSERVATION CARRIED FORWARD (LOCF) IN THE PERSISTENT BINARY CASE
He, Jun 01 January 2014
The main purpose of this research was to evaluate use of Last Observation Carried Forward (LOCF) as an imputation method when persistent binary outcomes are missing in a Randomized Controlled Trial. A simulation study was performed to see the effect of dropout rate and type of dropout (random or associated with treatment arm) on Type I error and power. Properties of estimated event rates, treatment effect, and bias were also assessed. LOCF was also compared to two versions of complete case analysis - Complete1 (excluding all observations with missing data), and Complete2 (only carrying forward observations if the event is observed to occur). LOCF was not recommended because of the bias. Type I error was increased, and power was decreased. The other two analyses also had poor properties. LOCF analysis was applied to a mammogram dataset, with results similar to the simulation study.
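For readers unfamiliar with the technique, LOCF on a persistent binary outcome (0 = no event yet, 1 = event has occurred and stays occurred) can be sketched as follows. The function names are hypothetical, visits are assumed to be in time order with `None` marking a missed visit, and `carry_forward_events_only` is only one reading of the abstract's "Complete2" rule, not the author's implementation.

```python
def locf(visits):
    """Fill each missing visit with the last observed value.

    Visits before the first observation stay None, since there is
    nothing to carry forward yet.
    """
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def carry_forward_events_only(visits):
    """Fill a missing visit only when the last observed value is an event (1).

    Because the outcome is persistent, an observed event is known to
    hold at every later visit; an observed 0 tells us nothing certain
    about later visits, so those gaps are left as None.
    """
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(1 if value is None and last == 1 else value)
    return filled
```

The difference matters for dropouts who were event-free at their last visit: `locf([0, 0, None, None])` records them as event-free through the end of the trial, which is the assumption the simulation study examines, while `carry_forward_events_only` leaves those visits missing.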
|
117 |
Estimation of Regression Coefficients under a Truncated Covariate with Missing Values
Reinhammar, Ragna January 2019
By means of a Monte Carlo study, this paper investigates the relative performance of Listwise Deletion, the EM-algorithm and the default algorithm in the MICE-package for R (PMM) in estimating regression coefficients under a left truncated covariate with missing values. The intention is to investigate whether the three frequently used missing data techniques are robust against left truncation when missing values are MCAR or MAR. The results suggest that no technique is superior overall in all combinations of factors studied. The EM-algorithm is unaffected by left truncation under MCAR but negatively affected by strong left truncation under MAR. Compared to the default MICE-algorithm, the performance of EM is more stable across distributions and combinations of sample size and missing rate. The default MICE-algorithm is improved by left truncation but is sensitive to missingness pattern and missing rate. Compared to Listwise Deletion, the EM-algorithm is less robust against left truncation when missing values are MAR. However, the decline in performance of the EM-algorithm is not large enough for the algorithm to be completely outperformed by Listwise Deletion, especially not when the missing rate is moderate. Listwise Deletion might be robust against left truncation but is inefficient.
|
118 |
Estimation des données manquantes par la métrologie virtuelle pour l'amélioration du régulateur Run-To-Run dans le domaine des semi-conducteurs / Estimation of missing data by virtual metrology for the improvement of the Run-To-Run controller in the field of semiconductors
Jebri, Mohamed Ali 26 January 2018
This work addresses virtual metrology (VM) for estimating missing data during semiconductor manufacturing processes. Virtual metrology also provides software measurements (estimates) of the outputs to feed the run-to-run (R2R) controllers put in place to control the quality of the manufactured products. To remedy the problems of measurement delay caused by the static sampling imposed by the strategy and equipment in place, our contribution in this thesis is to introduce the notion of intelligent dynamic sampling. This strategy is based on an algorithm that takes a neighborhood condition into account, making it possible to skip the actual measurement even when static sampling requires it. This reduces the number of actual measurements, the cycle time, and the production cost. The approach relies on a virtual metrology (VM) module that we developed and that can be integrated into an R2R control loop. The results were validated on academic examples and on real data provided by our partner STMicroelectronics in Rousset for a chemical mechanical planarization (CMP) process. These real data also made it possible to validate the virtual metrology results before supplying them to the R2R controllers, which need estimates of these data. / The work presented here concerns virtual metrology (VM) for estimating missing data during semiconductor manufacturing processes.
The use of a virtual metrology tool also makes it possible to provide software measurements (estimates) of the outputs to feed the run-to-run (R2R) controllers set up for the quality control of the manufactured products. To address the issues related to the delay of measurements caused by the static sampling imposed by the strategy and the equipment put in place, our contribution in this thesis is to introduce the notion of intelligent dynamic sampling. This strategy is based on an algorithm that considers the neighborhood condition to avoid the actual measurement even if the static sampling requires it. This reduces the number of actual measurements, the cycle time and the cost of production. This approach is provided by a virtual metrology (VM) module that we have developed and which can be integrated into an R2R control loop. The obtained results were validated on academic examples and on real data provided by our partner STMicroelectronics of Rousset from a chemical mechanical planarization (CMP) process. These real data also enabled the results obtained from the virtual metrology to be validated and then supplied to the R2R controllers (which need estimates of these data).
|
119 |
Practical considerations for genotype imputation and multi-trait multi-environment genomic prediction in a tropical maize breeding program / Considerações práticas para a imputação de genótipos e predição genômica aplicada a múltiplos caracteres e ambientes em um programa de melhoramento de milho tropical
Oliveira, Amanda Avelar de 17 June 2019
The availability of molecular markers covering the entire genome, such as single nucleotide polymorphism (SNP) markers, together with the computational resources for processing large amounts of data, enabled the development of an approach for marker-assisted selection for quantitative traits, known as genomic selection. In the last decade, genomic selection has been successfully implemented in a wide variety of animal and plant species, showing its benefits over traditional marker-assisted selection and selection based only on pedigree information. However, some practical challenges may still limit the wide implementation of this method in a plant breeding program. Examples include the cost of high-density genotyping of a large number of individuals and the application of more complex models that take into account multiple traits and environments. Thus, this study aimed to i) investigate SNP calling and imputation strategies that allow cost-effective high-density genotyping, and ii) evaluate the application of multivariate genomic selection models to data from multiple traits and environments. This work was divided into two chapters. In the first chapter, we compared the accuracy of four imputation methods (NPUTE, Beagle, KNNI and FILLIN) using genotyping-by-sequencing (GBS) data from 1060 maize inbred lines, which were genotyped using different depths of coverage. In addition, two SNP calling and imputation strategies were evaluated. Our results indicated that combining SNP-calling and imputation strategies can enhance cost-effective genotyping, resulting in higher imputation accuracies. In the second chapter, multivariate genomic selection models, for multiple traits and environments, were compared with their univariate versions. We used data from 415 hybrids evaluated in the second season in four years (2006-2009) for grain yield, number of ears and grain moisture.
Hybrid genotypes were inferred in silico based on their parental inbred lines using SNP markers obtained via GBS. However, genotypic information was available only for 257 hybrids, motivating the use of the H matrix, which combines genetic information based on pedigree and molecular markers. Our results demonstrated that the use of multi-trait multi-environment models can improve predictive abilities, especially to predict the performance of hybrids that have not yet been evaluated in any environment. / The availability of molecular markers covering the entire genome, such as single nucleotide polymorphisms (SNPs), together with the computational resources for processing large volumes of data, made possible the development of an assisted breeding approach for quantitatively inherited traits, known as genomic selection. In the last decade, genomic selection has been successfully implemented in a huge variety of animal and plant species, demonstrating its advantages over traditional marker-assisted selection and selection based only on pedigree information. However, some practical challenges can still limit the implementation of this method in a plant breeding program. Examples include the cost of high-density genotyping of a large number of individuals and the application of more complex models that consider multiple traits and environments. Accordingly, this study aimed to: i) investigate SNP calling and imputation strategies that enable economically viable high-density genotyping; and ii) evaluate the application of multivariate genomic selection models for multiple traits and environments. This work was divided into two chapters.
In the first chapter, the accuracy of four imputation methods (NPUTE, Beagle, KNNI and FILLIN) was compared using genotyping-by-sequencing (GBS) data from 1,060 maize inbred lines, which were genotyped at different depths of coverage. In addition, two SNP calling and imputation strategies were evaluated. The results indicated that combining polymorphism detection and imputation strategies can enable economically viable genotyping, resulting in higher imputation accuracies. In the second chapter, multivariate genomic selection models for multiple traits and environments were compared with their univariate versions. Data were used from 415 hybrids evaluated in the second growing season in four years (2006-2009) for grain yield, number of ears and grain moisture. The hybrid genotypes were inferred in silico from the genotypes of the parental inbred lines using SNP markers obtained via GBS. However, genotypic information was available for only 257 hybrids, so it was necessary to use the H matrix, which combines genetic relationship information based on pedigree and markers. The results showed that genomic selection models for multiple traits and environments can increase predictive ability, especially for predicting the performance of hybrids never evaluated in any environment.
|
120 |
Physical and Mental Health Status of Adults with Serious Mental Illness Participating in a Jail Diversion Intervention
Telford, Robin 01 May 2014
Adults with mental illnesses are at an increased risk of being diagnosed with one or more comorbid physical illnesses compared to the general population. Many of the disparities faced by adults with serious mental illnesses (SMI) can be attributed to medication side effects, increased risk for metabolic diseases, inability to communicate about severity and monitor physical health symptoms, poor health behaviors, high rates of smoking, and poor quality health care. The rate of physical illnesses for adults with mental illnesses is even higher among those who have been involved with the criminal justice system. In order to understand the relationship between physical and mental illnesses, longitudinal study designs are needed. Longitudinal studies can provide greater understanding of the temporal relationship of physical and mental illnesses. Despite the benefits of longitudinal studies, there also are challenges, including missing data.
The first manuscript of this dissertation explores the physical and mental health status of adults with mental illnesses. Secondary data were used from three different studies: a sample of adults with SMI enrolled in a mental health court jail diversion program (n=91); a sample of Medicaid enrollees with SMI in Florida (n=688) who were part of a larger Substance Abuse and Mental Health Services Administration (SAMHSA) study; and a sample of inpatient and outpatient adults with SMI from five different study sites (n=969). The samples were combined into two data sets, consisting of the jail diversion sample and the SAMHSA sample, and the jail diversion sample and the 5-site sample. Participants in these samples answered questions on the Short-Form Health Survey (SF-12), recent arrests, drug and alcohol use, socio-demographic information, and mental illness symptom severity (measured only in the criminal justice and 5-site samples).
Overall, the mental and physical health status scores were significantly lower for all of the participants compared to the general population mean scores. The participants reporting a recent arrest had a higher physical health score compared to those who did not have a recent arrest, and in the jail diversion and 5-site sample, had a lower mental health status score than those without a recent arrest. After taking age, drug and alcohol use, and psychiatric symptom severity into account, arrest was no longer associated with the physical health status score in either of the data sets. In the jail diversion and 5-site data set, arrest was still significantly associated with mental health status score after controlling for age, drug and alcohol use, and psychiatric symptom severity.
The second manuscript of this dissertation explores the analysis of missing data in a longitudinal study to determine the missing data mechanisms and missing data patterns, and subsequently, how to prepare the data for analysis by using multiple imputation or maximum likelihood estimation. Secondary data were drawn from the same jail diversion sample as in the first manuscript. Data were collected at baseline, three months, six months, and nine months. Only participants with the potential to have data collected at these time points were included (n=50).
Analysis revealed missing data due to missing item-level information, missing participant data at one time point but complete data at a subsequent time point, and missing participant data for those who dropped out of the study completely. The missing data mechanism for the missing item-level data was missing completely at random, whereas the participant-level missing data were missing at random. Multiple imputation was used for the item-level data and for the participant-level missing data. Maximum likelihood estimation was also used for the participant-level missing data and compared to the multiple imputation results. Findings suggest that multiple imputation produced more accurate parameter estimates, possibly due to the small sample size.
The findings from this study indicate that more research needs to be done to fully understand the physical illnesses experienced by adults with mental illnesses who are involved with the criminal justice system. Understanding mental and physical illness comorbidity is important in public health as it dictates appropriate treatments and training for behavioral health practitioners and staff. In addition, missing data in longitudinal studies cannot be ignored, as they can bias the results, and appropriate techniques for exploring the missing data must be used. When missing data are ignored in analyses, the subsequent results can be incorrect and unable to detect treatment effects, thereby preventing effective programs from receiving necessary funding. In addition, ignoring missing data can impact funding for behavioral health services by underestimating the prevalence and severity of mental illnesses. Future research should focus on exploring how mental and physical health are related in adults with a recent arrest compared to the general population, and ways to integrate services to address both mental and physical health.
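Multiple imputation, as used in the second manuscript, produces one parameter estimate per imputed data set, and these are commonly combined with Rubin's rules. The abstract does not state how the estimates were pooled, so the sketch below only illustrates the standard rule; the function name `pool_rubin` and its inputs are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates using Rubin's rules.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error of each of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)     # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what a single imputation omits: it carries the extra uncertainty that comes from not knowing the missing values.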
|