61 |
On Cluster Robust Models
Santiago Calderón, José Bayoán (01 January 2019)
Cluster robust models are a kind of statistical model that attempts to estimate parameters while accounting for potential heterogeneity in treatment effects. Absent heterogeneity in treatment effects, the partial and average treatment effects are the same. When heterogeneity in treatment effects occurs, the average treatment effect is a function of the various partial treatment effects and the composition of the population of interest. The first chapter explores the performance of common estimators as a function of the presence of heterogeneity in treatment effects and other characteristics that may influence their performance for estimating average treatment effects. The second chapter examines various approaches to evaluating and improving cluster structures as a way to obtain cluster-robust models. Both chapters are intended to be useful to practitioners as a how-to guide to examine and think about their applications and relevant factors. Empirical examples are provided to illustrate theoretical results, showcase potential tools, and communicate a suggested thought process.
The third chapter relates to an open-source statistical software package for the Julia language. The content includes a description of the software's functionality and technical elements. In addition, it features a critique and suggestions for statistical software development and the Julia ecosystem. These comments come from my experience throughout the development process of the package and related activities as an open-source and professional software developer. One goal of the paper is to make econometrics more accessible, not only through access to functionality but also through understanding of the code and mathematics and through transparency in implementations.
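The statement that the average treatment effect depends on both the partial effects and the population composition can be illustrated with a small sketch. It assumes the simplest possible case, in which the average effect is a share-weighted mean of group-level effects; the function name and the numbers are hypothetical and are not taken from the dissertation.

```python
def average_treatment_effect(partial_effects, shares):
    """Share-weighted mean of group-level (partial) treatment effects.

    Assumption for illustration: each group has one constant effect and
    `shares` gives each group's weight in the population of interest.
    """
    if len(partial_effects) != len(shares):
        raise ValueError("one share is required per partial effect")
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(e * s for e, s in zip(partial_effects, shares)) / total


# Homogeneous effects: the composition of the population is irrelevant.
homogeneous = average_treatment_effect([2.0, 2.0], [0.25, 0.75])

# Heterogeneous effects: the same partial effects give different averages
# once the composition of the population changes.
balanced = average_treatment_effect([1.0, 3.0], [0.5, 0.5])
skewed = average_treatment_effect([1.0, 3.0], [0.25, 0.75])

print(homogeneous, balanced, skewed)  # prints 2.0 2.0 2.5
```

With identical partial effects the average equals the common effect whatever the shares are; with different partial effects, shifting the shares moves the average.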
|
62 |
Penalized mixed-effects ordinal response models for high-dimensional genomic data in twins and families
Gentry, Amanda E. (01 January 2018)
The Brisbane Longitudinal Twin Study (BLTS) was conducted in Australia and was funded by the US National Institute on Drug Abuse (NIDA). Adolescent twins were sampled as a part of this study and surveyed about their substance use as part of the Pathways to Cannabis Use, Abuse and Dependence project. The methods developed in this dissertation were designed to analyze a subset of the Pathways data that includes demographics, cannabis use metrics, personality measures, and imputed genotypes (SNPs) for 493 complete twin pairs (986 subjects). The primary goal was to determine what combination of SNPs and additional covariates may predict cannabis use, measured on an ordinal scale as “never tried,” “used moderately,” or “used frequently”. To conduct this analysis, we extended the ordinal Generalized Monotone Incremental Forward Stagewise (GMIFS) method to mixed models. This extension allows an unpenalized set of covariates to be coerced into the model and gives flexibility for user-specified correlation patterns between twins in a family. The proposed methods are applicable to high-dimensional (genomic or otherwise) data with an ordinal response and a specific, known covariance structure within clusters.
|
63 |
Latent Growth Model Approach to Characterize Maternal Prenatal DNA Methylation Trajectories
Lapato, Dana (01 January 2019)
Background. DNA methylation (DNAm) is a removable chemical modification to the DNA sequence intimately associated with genomic stability, cellular identity, and gene expression. DNAm patterning reflects joint contributions from genetic, environmental, and behavioral factors. As such, differences in DNAm patterns may explain interindividual variability in risk liability for complex traits like major depression (MD). Hundreds of significant DNAm loci have been identified using cross-sectional association studies. This dissertation builds on that foundational work to explore novel statistical approaches for longitudinal DNAm analyses. Methods. Repeated measures of genome-wide DNAm and social and environmental determinants of health were collected up to six times across pregnancy and the first year postpartum as part of the Pregnancy, Race, Environment, Genes (PREG) Study. Statistical analyses were completed using a combination of the R statistical environment, Bioconductor packages, MplusAutomate, and Mplus software. Prenatal maternal DNAm was measured using the Infinium HumanMethylation450 Beadchip. Latent growth curve models were used to analyze repeated measures of maternal DNAm and to quantify site-level DNAm latent trajectories over the course of pregnancy. The purpose was to characterize the location and nature of prenatal DNAm changes and to test the influence of clinical and demographic factors on prenatal DNAm remodeling. Results. Over 1300 sites had DNAm trajectories significantly associated with either maternal age or lifetime MD. Many of the genomic regions overlapping significant results replicated previous age and MD-related genetic and DNAm findings. Discussion. Future work should capitalize on the progress made here integrating structural equation modeling (SEM) with longitudinal omics-level measures.
|
64 |
Bayesian Latent Variable Models for Biostatistical Applications
Ridall, Peter Gareth (January 2004)
In this thesis we develop several kinds of latent variable models in order to address three types of biostatistical problem. The three problems are the treatment effect of carcinogens on tumour development, spatial interactions between plant species, and motor unit number estimation (MUNE). The three types of data looked at are highly heterogeneous longitudinal count data, quadrat counts of species on a rectangular lattice and, lastly, electrophysiological data consisting of measurements of compound muscle action potential (CMAP) area and amplitude. Chapter 1 sets out the structure and the development of ideas presented in this thesis from the point of view of model structure, model selection, and efficiency of estimation. Chapter 2 is an introduction to the relevant literature that has influenced the development of this thesis. In Chapter 3 we use the EM algorithm for an application of an autoregressive hidden Markov model to describe longitudinal counts. The data are collected from experiments to test the effect of carcinogens on tumour growth in mice. Here we develop forward and backward recursions for calculating the likelihood and for estimation. Chapter 4 is the analysis of a similar kind of data using a more sophisticated model, incorporating random effects, but estimation this time is conducted from the Bayesian perspective. Bayesian model selection is also explored. In Chapter 5 we move to the two-dimensional lattice and construct a model for describing the spatial interaction of tree types. We also compare the merits of directed and undirected graphical models for describing the hidden lattice. Chapter 6 is the application of a Bayesian hierarchical model (MUNE), where the latent variable this time is multivariate Gaussian and dependent on a covariate, the stimulus. Model selection is carried out using the Bayesian Information Criterion (BIC).
In Chapter 7 we approach the same problem by using the reversible jump methodology (Green, 1995), where this time we use a dual Gaussian-Binary representation of the latent data. We conclude in Chapter 8 with suggestions for the direction of new work. In this thesis, all of the estimation carried out on real data has only been performed once we have been satisfied that estimation is able to retrieve the parameters from simulated data. Keywords: Amyotrophic lateral sclerosis (ALS), carcinogens, hidden Markov models (HMM), latent variable models, longitudinal data analysis, motor neurone disease (MND), partially ordered Markov models (POMMs), the pseudo autologistic model, reversible jump, spatial interactions.
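The forward recursion mentioned for Chapter 3 can be sketched in a few lines. The version below is a simplification and not the thesis's model: it assumes a plain (non-autoregressive) hidden Markov model with Poisson-distributed counts, and every name and parameter value is hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing count k under a Poisson(rate) distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def forward_log_likelihood(counts, initial, transition, rates):
    """Log-likelihood of a count series under a Poisson hidden Markov model.

    Scaled forward recursion: alpha holds the filtered state probabilities,
    and the log of each normalising constant is accumulated.
    """
    if not counts:
        raise ValueError("counts must contain at least one observation")
    n_states = len(initial)
    alpha = [initial[s] * poisson_pmf(counts[0], rates[s]) for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(counts)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[r] * transition[r][s] for r in range(n_states))
                * poisson_pmf(counts[t], rates[s])
                for s in range(n_states)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


# Hypothetical two-state example: a low-count state and a high-count state.
example = forward_log_likelihood(
    counts=[1, 0, 4, 6],
    initial=[0.6, 0.4],
    transition=[[0.9, 0.1], [0.2, 0.8]],
    rates=[1.0, 5.0],
)
print(example)
```

Rescaling at each step keeps the recursion stable for long series, which matters when the likelihood is evaluated repeatedly inside an EM or sampling loop.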
|
65 |
Utilisation des antifongiques chez le patient non neutropénique en réanimation / Antifungal use on non neutropenic patients in Intensive Care Unit
Bailly, Sébastien (15 October 2015)
Candida species are among the main pathogens isolated from patients in intensive care units (ICUs) and are responsible for a serious systemic infection: invasive candidiasis. A late and unreliable diagnosis of invasive candidiasis aggravates the patient's status and increases the risk of short-term death. The current guidelines recommend early treatment of patients at high risk of invasive candidiasis, even in the absence of documented fungal infection. However, increased antifungal drug consumption is correlated with increased costs and the emergence of drug resistance, whereas there is as yet no consensus about the benefits of probabilistic antifungal treatment. The present work used modern statistical methods on longitudinal observational data. It investigated the impact of systemic antifungal treatment (SAT) on the distribution of the four Candida species most frequently isolated from ICU patients, their susceptibilities to SATs, the diagnosis of candidemia, and the prognosis of ICU patients. The use of autoregressive integrated moving average (ARIMA) models for time series confirmed the negative impact of SAT use on the susceptibilities of the four Candida species and on their relative distribution over a ten-year period. Hierarchical models for repeated measures showed that SAT has a negative impact on the diagnosis of candidemia: it decreases the rate of positive blood cultures and increases the time to positivity of these cultures. Finally, the use of causal inference models showed that early SAT has no impact on the prognosis of non-neutropenic, non-transplanted patients and that SAT de-escalation within 5 days after its initiation in critically ill patients is safe and does not influence the prognosis.
|
66 |
Modelos lineares mistos em estudos toxicológicos longitudinais / Linear mixed models in longitudinal toxicological studies
Luzia Pedroso de Oliveira (14 January 2015)
Mixed models are appropriate for the analysis of longitudinal, grouped, and hierarchical data. They make it possible to describe and compare average response profiles while taking into account the variability and correlation among the experimental units of the same group and among the values observed over time in the same experimental unit, as well as the heterogeneity of variances. These models allow the analysis of data that are unbalanced, incomplete, or irregular with respect to time. This work aimed to show the flexibility of linear mixed models and their importance in the analysis of longitudinal toxicological data. Linear mixed models were used to evaluate the effects of dose on the body weight gain of adult male and female Wistar rats in a repeated-dose toxicity test, and also the effects of gestation period and dose on the weight profiles of the pups of treated dams. Linear mixed models with third-degree polynomial regression, spline regression, and piecewise regression were compared, the latter two with a single change point at the average age of eye opening in the pups, in search of the most appropriate model to describe pup growth over the lactation period. The SAS/STAT code used for exploratory data analysis and for fitting, comparing, and validating the models is presented. It is hoped that the detailed account of the theory and applications will contribute to the understanding of, interest in, and use of this methodology by statisticians and researchers in the area.
|
67 |
Modelagem simultânea de média e dispersão e aplicações na pesquisa agronômica / Joint modeling of mean and dispersion and applications to agricultural research
Afrânio Márcio Corrêa Vieira (10 February 2009)
Several experimental designs in current use originated in agricultural experiments. Such experimental data are usually analysed with statistical models that take a constant (homogeneous) residual variance as a basic assumption. However, this assumption is hard to sustain when environmental or external factors exert a strong influence on the measurements. In this work, we study joint modelling of the mean and the variance, with the variance structured in two ways: (i) through a linear predictor, which allows the incorporation of external variables and/or noise factors, and (ii) through random effects, which jointly accommodate possible overdispersion and the dependence of longitudinal data in the case of binary measurements taken over time. The class of double generalized linear models (DGLM) was applied to an observational study in which poultry mortality was measured during preslaughter operations. In this setting, there is strong evidence that some environmental factors influence the observed variability and consequently reduce the precision of the inferential analysis. Another relevant agricultural problem, related to horticulture, is the tissue culture experiment, in which the number of regenerated explants is counted. This kind of experiment usually has a large number of parameters to estimate relative to the sample size. The current frequentist models rely on large samples for statistical inference and, under this experimental condition, can generate unreliable estimates or even lead to erroneous conclusions. A double generalized linear model was proposed to analyse proportion data from a Bayesian perspective; it can be applied to small samples and can incorporate expert knowledge into the parameter estimation process. A clinical study that measured binary data repeatedly over time is presented, and two models using random effects are proposed to accommodate overdispersion and the dependence of longitudinal measurements. Satisfactory results were obtained in the three problems studied. The DGLM identified factors associated with poultry mortality, which will make it possible to minimize losses and improve the process, from catching to lairage at the slaughterhouse, in line with animal welfare criteria and European Community rules. The Bayesian DGLM identified the genotype associated with the overdispersion effect, increasing the precision of inference about variety selection. Two combined models were proposed, a logit-normal-Bernoulli-beta and a probit-normal-Bernoulli-beta, both of which satisfactorily accommodated the overdispersion effect and the longitudinal dependence of the binary measurements. These results reinforce the importance of modelling mean and dispersion jointly as a way to increase the precision of agricultural research, in both experimental and observational studies.
|
68 |
The Interactions of Relationships, Interest, and Self-Efficacy in Undergraduate Physics
Dou, Remy (07 March 2017)
This collected papers dissertation explores students’ academic interactions in active-learning introductory physics settings as they relate to the development of physics self-efficacy and interest. The motivation for this work extends from the national call to increase participation of students in the pursuit of science, technology, engineering, and mathematics (STEM) careers. Self-efficacy and interest are factors that play prominent roles in popular, evidence-based career theories, including social cognitive career theory (SCCT) and the identity framework. Understanding how these constructs develop in light of the most pervasive characteristic of the active-learning introductory physics classroom (i.e., peer-to-peer interactions) has implications for how students learn in a variety of introductory STEM classrooms and settings structured after constructivist and sociocultural learning theories.
I collected data related to students’ in-class interactions using the tools of social network analysis (SNA). Social network analysis has recently been shown to be an effective and useful way to examine the structure of student relationships that develop in and out of STEM classrooms. This set of studies furthers the implementation of SNA as a tool to examine self-efficacy and interest formation in the active-learning physics classroom. Here I present a variety of statistical applications of SNA, including bootstrapped linear regression (Chapter 2), structural equation modeling (Chapter 3), and hierarchical linear modeling for longitudinal analyses (Chapter 4).
Self-efficacy data were collected using the Sources of Self-Efficacy for Science Courses – Physics survey (SOSESC-P), and interest data were collected using the physics identity survey. Data for these studies came from the Modeling Instruction sections of Introductory Physics with Calculus offered at Florida International University in the fall of 2014 and 2015. Analyses support the idea that students’ perceptions of one another affect the development of their social network centrality, which in turn affects their self-efficacy-building experiences and their overall self-efficacy. Unlike career theories that emphasize causal relationships running from the development of self-efficacy to the subsequent growth of student interest, the analyses showed that in this context student interest precedes the development of student self-efficacy. This outcome also has various implications for career theories.
|
69 |
Single-index regression models
Wu, Jingwei (05 1900)
Indiana University-Purdue University Indianapolis (IUPUI) / Useful medical indices play important roles in predicting medical outcomes. Medical indices, such as the well-known Body Mass Index (BMI) and the Charlson Comorbidity Index, have been used extensively in research and clinical practice to quantify risks in individual patients. However, the development of these indices is challenging and rests primarily on heuristic arguments. Statistically, most medical indices can be expressed as a function of a linear combination of individual variables and fitted by a single-index model. The single-index model represents a way to retain latent nonlinear features of the data without the usual complications that come with increased dimensionality. In my dissertation, I propose a single-index model approach to analytically derive indices from observed data; the resulting index inherently correlates with specific health outcomes of interest. The first part of this dissertation discusses the derivation of an index function for the prediction of one outcome using longitudinal data. A cubic-spline estimation scheme for a partially linear single-index mixed-effect model is proposed to incorporate the within-subject correlations among outcome measures contributed by the same subject. A recursive algorithm based on the optimization of a penalized least-squares estimating equation is derived and is shown to work well both in simulated data and in the derivation of a new body mass measure for the assessment of hypertension risk in children. The second part of this dissertation extends the single-index model to a multivariate setting. Specifically, a multivariate version of the single-index model for longitudinal data is presented. An important feature of the proposed model is that it accommodates correlations both among multivariate outcomes and among the repeated measurements from the same subject, via random effects that link the outcomes in a unified modeling structure.
A new body mass index measure that simultaneously predicts systolic and diastolic blood pressure in children is illustrated. The final part of this dissertation shows existence, root-n strong consistency, and asymptotic normality of the estimators in the multivariate single-index model under suitable conditions. These asymptotic results are assessed in finite-sample simulations and permit joint inference for all parameters.
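The claim that a medical index can be written as a function of a linear combination of variables can be checked on BMI itself: since BMI = weight / height², log BMI = 1·log(weight) − 2·log(height). The sketch below fixes the coefficients and the link function by hand purely for illustration, whereas the dissertation estimates them from data; the function names are hypothetical.

```python
import math


def single_index(covariates, coefficients, link):
    """Evaluate link(beta' x): a function of one linear combination of covariates."""
    linear_index = sum(b * x for b, x in zip(coefficients, covariates))
    return link(linear_index)


def bmi_as_single_index(weight_kg, height_m):
    """BMI written in single-index form.

    Covariates are log-weight and log-height, the coefficient vector is
    (1, -2), and the link function is the exponential.
    """
    covariates = (math.log(weight_kg), math.log(height_m))
    return single_index(covariates, (1.0, -2.0), math.exp)


direct = 70.0 / 1.75 ** 2
via_index = bmi_as_single_index(70.0, 1.75)
print(direct, via_index)  # the two values agree up to floating-point rounding
```

In a fitted single-index model the coefficient vector and the link would both be unknown, with the link typically estimated nonparametrically (for example by splines, as the abstract describes).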
|
70 |
Impact of climate oscillations/indices on hydrological variables in the Mississippi River Valley Alluvial Aquifer.
Raju, Meena (13 May 2022)
The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between hydrological variables, and to evaluate the influence of global climate indices on hydrological variables. Two non-parametric tests, the modified Mann-Kendall (MMK) test and Pettitt’s test, were used to analyze trends and change points. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and multiple linear regression (MLR) analysis were used to evaluate the relationship between climate indices and hydrological variables and the combined effect of climate indices on hydrological variables. The results of the trend analysis indicated spatial variability within the aquifer: streamflow and rainfall increased in the northern region of the aquifer and decreased in the southern region. Change point analysis of annual maximum streamflow, annual mean streamflow, and annual precipitation revealed that statistically significant decreasing shifts occurred in 2001, 1998, and 1995, respectively. Results of the PCC analysis indicated that streamflow and rainfall have a strong positive relationship, with PCC values above 0.6 at most locations within the basin. Streamflow elasticity ranged from 0.987 to 2.33 across locations in the basin. Results of the PCC analysis for monthly maximum and mean streamflow showed the largest significant positive correlation coefficient for Nino 3.4. Monthly maximum rainfall showed the largest significant positive correlation coefficients for PNA and Nino 3.4, and monthly mean rainfall showed a largest significant positive correlation coefficient of 0.18 for Nino 3.4.
Results of the MLR analysis showed a maximum significant positive correlation coefficient of 0.31 for monthly maximum and mean streamflow, and of 0.21 and 0.23 for monthly maximum and mean rainfall, respectively. Overall, results from this research will help in understanding the impacts of global climate indices on rainfall and subsequently on streamflow discharge, so as to mitigate and manage water resource availability in the MRVAA underlying the LMRB.
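The trend and change-point tests named in this abstract can be sketched as follows. This is only an illustration: it computes the plain Mann-Kendall S statistic rather than the modified (autocorrelation-adjusted) version the thesis uses, it uses the standard approximate p-value for Pettitt's test, and the data series is invented.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def mann_kendall_s(series):
    """Plain Mann-Kendall S statistic: positive for upward trends, negative for downward."""
    n = len(series)
    return sum(sign(series[j] - series[i]) for i in range(n - 1) for j in range(i + 1, n))


def pettitt_test(series):
    """Pettitt change-point test.

    Returns (t, K, p): the number of observations before the most likely
    change point, the test statistic K = max |U_t|, and the approximate
    p-value 2 * exp(-6 K^2 / (n^3 + n^2)), capped at 1.
    """
    n = len(series)
    best_t, best_u = 0, 0
    for t in range(1, n):
        # U_t compares every observation up to t with every observation after it.
        u = sum(sign(series[i] - series[j]) for i in range(t) for j in range(t, n))
        if abs(u) > abs(best_u):
            best_t, best_u = t, u
    k = abs(best_u)
    p_value = min(1.0, 2.0 * math.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2)))
    return best_t, k, p_value


# Invented annual series with a downward shift after the fifth value.
flows = [10, 11, 9, 10, 11, 5, 4, 6, 5, 4]
print(mann_kendall_s(flows))
print(pettitt_test(flows))
```

Both statistics depend only on the ordering of the observations, which is why these tests are described as non-parametric.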
|