Global ETD Search

31	Multiple Calibrations in Integrative Data Analysis: A Simulation Study and Application to Multidimensional Family Therapy Hall, Kristin Wynn 01 January 2013 A recent advancement in statistical methodology, Integrative Data Analysis (IDA; Curran & Hussong, 2009), has led researchers to employ a calibration technique so as not to violate an independence assumption. In a preliminary stage of analysis, this technique uses a calibration: a randomly selected subset of the whole data set with a simplified correlational structure. However, a single-calibration estimator suffers from instability, low precision, and loss of power. To overcome this limitation, a multiple calibration (MC; Greenbaum et al., 2013; Wang et al., 2013) approach has been developed to produce better estimators while still removing a level of dependency in the data, so that the independence assumption is not violated. The MC method is conceptually similar to multiple imputation (MI; Rubin, 1987; Schafer, 1997), so MI estimators were borrowed for comparison. A simulation study was conducted to compare the MC and MI estimators and to evaluate the operating characteristics of the methods in a cross-classified design of data characteristics. The estimators were tested in the context of assessing change over time in a longitudinal data set. Multiple calibrations, each consisting of a single measurement occasion per subject, were drawn from a repeated measures data set, analyzed separately, and then combined by the rules of each method to produce the final results. The data characteristics investigated were effect size, sample size, and the number of repeated measures per subject. Additionally, an MC approach in an IDA framework was applied to real data from three completed randomized controlled trials studying the effects of Multidimensional Family Therapy (MDFT; Liddle et al., 2002) on adolescents' substance use trajectories at a one-year follow-up. 
The simulation study provided empirical evidence of how the MC method performs and how it compares with the MI method across 27 hypothetical scenarios. As the number of calibrations approached 100, the bias, standard error, mean square error, and relative efficiency of the MC estimator showed a strong asymptotic tendency toward those of the whole-data-set estimators. The MI combination rules proved inappropriate to borrow for the MC case because their standard error formulas were too conservative and their power was not robust. As a general suggestion, 5 calibrations suffice to produce an estimator with about half the bias of a single-calibration estimator and at least some indication of significance, while 20 calibrations are ideal; beyond 20 calibrations, the contribution of each additional calibration to the combined estimator diminished greatly. The MDFT application demonstrated a successful implementation of a 5-calibration approach in an IDA on real data, as well as the risk of missing treatment effects when analysis is limited to a single calibration's results. Results from the application also provided evidence that MDFT interventions reduced substance use involvement trajectories at a 1-year follow-up to a greater extent than any of the active control treatments, overall and across all gender and ethnicity subgroups. This paper will aid researchers interested in employing an MC approach in an IDA framework, or whenever a level of dependency in a data set must be removed for an independence assumption to hold. adolescent substance use commensurate measures latent variables longitudinal data analysis moderated nonlinear factor analysis pooled data analysis Biostatistics
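The MI combination rules that the study borrowed for comparison can be sketched with Rubin's (1987) pooling formulas: average the point estimates, then add the mean within-analysis variance W to an inflated between-analysis variance B. This is the MI rule set the abstract reports as too conservative for MC, not the MC rules developed in the work; the function name `rubin_pool` and the five estimates are hypothetical.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool m analyses with Rubin's (1987) rules (hypothetical helper).

    estimates: point estimates, one per imputed (or calibration) data set
    variances: squared standard errors, one per data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two paired estimates and variances")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-analysis variance
    b = variance(estimates)       # between-analysis variance (denominator m - 1)
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, math.sqrt(t)

# hypothetical slope estimates and squared SEs from five calibrations
est = [0.42, 0.38, 0.47, 0.40, 0.44]
var = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled, se = rubin_pool(est, var)
```

Here the pooled estimate is the plain mean (0.422), and the standard error combines W = 0.0104 with (1 + 1/5) times B.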
32	Bayesian model estimation and comparison for longitudinal categorical data Tran, Thu Trung January 2008 In this thesis, we address model estimation for longitudinal categorical data and model selection for such data with missing covariates. Longitudinal survey data capture each subject's responses repeatedly through time, allowing the variation in a measured variable across time within one subject to be separated from the variation in that variable among subjects. Questions concerning persistence, patterns of structure, interaction of events, and stability of multivariate relationships can be answered through longitudinal data analysis. Longitudinal data require special statistical methods because they must account for the correlation between observations recorded on the same subject. A further complication in analysing longitudinal data is accounting for the nonresponse or drop-out process: the missing values may be correlated with the variables under study and hence cannot simply be excluded. First, we investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from the Longitudinal Survey of Immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey, and observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response; subsequent terms are then introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which, given appropriate prior distributions, allows missing data in both the response and explanatory variables to be imputed at each iteration of the algorithm. 
After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia. Second, we examine two Bayesian model selection techniques, the Bayes factor and the Deviance Information Criterion, for our regression models with missing covariates. Computing Bayes factors involves computing the often complex marginal likelihood p(y | model), and various authors have presented methods to estimate this quantity. Here, we take the approach of path sampling via power posteriors (Friel and Pettitt, 2006). The appeal of this method is that, for hierarchical regression models with missing covariates (a common occurrence in longitudinal data analysis), it is straightforward to calculate and interpret, since integration over all parameters, including the imputed missing covariates and the random effects, is carried out automatically with minimal added complexity in modelling or computation. We apply this technique to compare models for the employment status of immigrants to Australia. Finally, we develop a model choice criterion based on the Deviance Information Criterion (DIC), similar to Celeux et al. (2006), but suitable for generalized linear models (GLMs) when covariates are missing at random. We define three DICs: the marginal, where the missing data are averaged out of the likelihood; the complete, where the joint likelihood for the response and covariates is considered; and the naive, where the likelihood is found by treating the missing values as parameters. These three versions differ in computational complexity. Through simulation, we investigate the performance of the three DICs for GLMs with normally, binomially, and multinomially distributed responses and normally distributed missing covariates. 
We find that the marginal DIC and the estimate of the effective number of parameters, pD, have desirable properties, appropriately indicating the true model for the response under differing amounts of covariate missingness. The complete DIC is generally inappropriate in this context because it is extremely sensitive to the degree of missingness in the covariate model. Our new methodology is illustrated by analysing the results of a community survey.
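Both quantities named above can be illustrated on a toy conjugate model where everything is available in closed form: y_i ~ N(θ, σ²) with σ known and θ ~ N(m0, s0²). This is an assumption for illustration, not the thesis's hierarchical model. Path sampling uses the identity log p(y) = ∫₀¹ E_{θ ~ p_t}[log p(y | θ)] dt over power posteriors p_t(θ | y) ∝ p(y | θ)^t p(θ), here on a temperature ladder t_j = (j/J)^5 with the trapezoidal rule; DIC uses D(θ) = -2 log p(y | θ), pD = mean(D) - D(mean θ), and DIC = mean(D) + pD. All variable names and settings are hypothetical.

```python
import numpy as np

# Toy model (assumption): y_i | theta ~ N(theta, sigma^2), theta ~ N(m0, s0^2)
rng = np.random.default_rng(1)
sigma, m0, s0 = 1.0, 0.0, 2.0
y = rng.normal(0.8, sigma, size=20)
n = y.size

def power_posterior(t):
    """Mean and precision of p_t(theta | y), proportional to p(y|theta)^t p(theta)."""
    prec = 1 / s0**2 + t * n / sigma**2
    mean = (m0 / s0**2 + t * y.sum() / sigma**2) / prec
    return mean, prec

def expected_loglik(t):
    """E[log p(y | theta)] under the power posterior at temperature t."""
    mu, prec = power_posterior(t)
    sq = np.sum((y - mu) ** 2) + n / prec   # E[sum (y_i - theta)^2]
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - sq / (2 * sigma**2)

# Path sampling with the trapezoidal rule on t_j = (j/200)^5
temps = np.linspace(0.0, 1.0, 201) ** 5
vals = np.array([expected_loglik(t) for t in temps])
log_ml_path = np.sum(np.diff(temps) * (vals[1:] + vals[:-1]) / 2)

# Exact log marginal likelihood: y ~ N(m0 * 1, sigma^2 I + s0^2 1 1')
r = y - m0
logdet = (n - 1) * np.log(sigma**2) + np.log(sigma**2 + n * s0**2)
quad = (r @ r - s0**2 * r.sum() ** 2 / (sigma**2 + n * s0**2)) / sigma**2
log_ml_exact = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# DIC from draws of the ordinary posterior (t = 1)
mu1, prec1 = power_posterior(1.0)
draws = rng.normal(mu1, 1 / np.sqrt(prec1), size=20000)

def deviance(theta):
    """D(theta) = -2 log p(y | theta), vectorised over theta."""
    theta = np.atleast_1d(theta)
    sq = np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)
    return n * np.log(2 * np.pi * sigma**2) + sq / sigma**2

d_bar = deviance(draws).mean()
p_d = d_bar - deviance(draws.mean())[0]   # effective number of parameters
dic = d_bar + p_d
```

The power ladder concentrates temperatures near t = 0, where the expected log-likelihood changes fastest; in this one-parameter model pD should come out close to 1.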
33	Time Trends and Predictors of Initiation for Cigarette and Waterpipe Smoking Among Jordanian School Children: Irbid, 2008-2011 McKelvey, Karma L, PhD 23 June 2014 Smoking prevalence among adolescents in the Middle East remains high, while rates of smoking have been declining among adolescents elsewhere. The aims of this research were to (1) describe patterns of cigarette and waterpipe (WP) smoking, (2) identify determinants of WP smoking initiation, and (3) identify determinants of cigarette smoking initiation in a cohort of Jordanian school children in Irbid, Jordan (age ≈ 12.6 at baseline). The first aim (N=1,781) described time trends in smoking behavior, age at initiation, and changes in smoking frequency from 2008 to 2011 (grades 7–10). The second aim (N=1,243) identified determinants of WP initiation among WP-naïve students, and the third aim (N=1,454) identified determinants of cigarette smoking initiation among cigarette-naïve participants. Determinants of initiation were assessed with generalized mixed models, and all analyses were stratified by gender. Baseline prevalence of current smoking (cigarettes or WP) was 22.9% for boys and 8.7% for girls. Prevalence of ever and current any smoking, cigarette smoking, WP smoking, and dual cigarette/WP smoking was higher in boys than girls each year (p These studies reveal intensive smoking patterns at early ages among Jordanian youth in Irbid, characterized by a predominance of WP smoking. WP may be a vehicle for tobacco dependence and subsequent cigarette uptake. The sizeable incidence of WP and cigarette initiation among students of both sexes points to a need for culturally relevant smoking prevention interventions. Gender-specific factors, refusal skills, and cessation of both WP and cigarette smoking among youth and their parents/teachers would be important components of such initiatives. 
cigarette cohort initiation Jordan longitudinal school children smoking time trends waterpipe Epidemiology Public Health
34	Multiple Testing Correction with Repeated Correlated Outcomes: Applications to Epigenetics Leap, Katie 27 October 2017 Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers, which are fixed over an individual's lifetime, methylation can change. Because there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we compared different methods of controlling the false discovery rate (FDR) under several underlying associations between an exposure and methylation over time. We found that testing each site with a linear mixed effects model and then controlling the FDR had the highest positive predictive value (PPV) and a low number of false positives, and was able to distinguish differential methylation present at only one time point from a persistent relationship. In contrast, methods that controlled the FDR at a single time point, as well as ad hoc methods, tended to have lower PPV, more false positives, and/or an inability to distinguish these conditions. Validation in data from Project Viva found a difference between fitting longitudinal models only to sites significant at one time point and fitting all sites longitudinally. multiple testing epigenetics methylation mixed models longitudinal false discovery rate Bioinformatics Biostatistics Computational Biology
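The second stage of the per-site strategy described above (fit a model per methylation site, collect one p-value per site, then control the FDR) can be sketched as follows. The abstract does not say which FDR procedure was used; this sketch assumes the Benjamini-Hochberg step-up procedure, one common choice, and the p-values would in practice come from the per-site linear mixed models, which are not shown.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # largest rank k with p_(k) <= k * alpha / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:  # step-up: reject every hypothesis up to rank k_max
        reject[i] = True
    return reject

# hypothetical per-site p-values from longitudinal models
site_p = [0.045, 0.001, 0.5, 0.025, 0.012]
decisions = benjamini_hochberg(site_p)
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking passes.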
35	A Comparison of Techniques for Handling Missing Data in Longitudinal Studies Bogdan, Alexander R 07 November 2016 Missing data are a common problem in virtually all epidemiological research, especially in longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not follow parametric distributions and may be censored by limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under missing completely at random, missing at random, and missing not at random conditions. Generalized estimating equations were used to obtain estimates from complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations with predictive mean matching on Day 2, Day 13, and Day 14 of a standardized 28-day menstrual cycle. Estimates were compared against those obtained from the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations with predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the missing variable. 
As biomarker distributions become closer to Normal, the amount of missing data that can be tolerated while still obtaining accurate estimates may also increase when data are missing at random. Future studies are needed to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models. Missing Data Longitudinal Study Multiple Imputation Biostatistics Epidemiology Other Statistics and Probability Women's Health
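Predictive mean matching, one of the techniques compared above, can be sketched as follows: regress the incomplete variable on predictors among observed cases, compute predicted means for every case, and fill each missing value with the observed value of a randomly chosen donor among the k cases whose predicted means are closest. This is a simplified single-imputation sketch with hypothetical names; proper multiple imputation would also draw the regression coefficients from their posterior for each imputed data set, and the study's own implementation is not described in the abstract.

```python
import numpy as np

def pmm_impute(y, X, k=5, rng=None):
    """Simplified predictive mean matching for NaN entries of y (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float).copy()
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    yhat = X @ beta                                  # predicted means for every case
    donors_y, donors_hat = y[obs], yhat[obs]
    for i in np.flatnonzero(~obs):
        # k observed cases whose predicted means are closest to case i's
        nearest = np.argsort(np.abs(donors_hat - yhat[i]))[:k]
        y[i] = donors_y[rng.choice(nearest)]         # copy an observed value
    return y

# hypothetical bimodal biomarker with about 30% of values missing at random
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_full = np.where(rng.random(200) < 0.5, -2.0, 2.0) + 0.5 * x + rng.normal(scale=0.3, size=200)
y_miss = y_full.copy()
y_miss[rng.random(200) < 0.3] = np.nan
y_imp = pmm_impute(y_miss, x, k=5, rng=1)
```

Every imputed value is copied from an observed donor, so imputations always lie within the observed values; which donors are eligible depends entirely on the regression used for the predicted means.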
36	Novel statistical models for ecological momentary assessment studies of sexually transmitted infections He, Fei 18 July 2016 Indiana University-Purdue University Indianapolis (IUPUI) / The research ideas in this dissertation are motivated by a large sexually transmitted infections (STIs) study, the IU Phone study, an ecological momentary assessment (EMA) study conducted by Indiana University from 2008 to 2013. EMA, a group of methods used to collect subjects' up-to-date behaviors and status, can increase the accuracy of this information by allowing participants to self-administer a survey or diary entry in their own environment, as close to the occurrence of the behavior as possible. The IU Phone study's high reporting level illustrates one of the benefits of introducing EMA into STI studies. In this 84-day prospective study, participants undergo STI testing and complete EMA forms on project-furnished cellular telephones according to predetermined schedules. At pre-selected eight-hour intervals, participants answer a series of questions identifying sexual and non-sexual interactions with specific partners, including partner name, relationship and sexual satisfaction with that partner, the time of each coital event, and condom use for each event. STI laboratory results for all participants are also collected weekly. We are interested in several variables related to the risk of infection and to sexual or non-sexual behaviors, especially the relationships among the longitudinal processes of those variables. New statistical models and applications are developed to handle data with complex dependence and sampling structures. 
The methodology covers various statistical areas: generalized mixed models, multivariate models, and autoregressive and cross-lagged models in longitudinal data analysis; misclassification adjustment for imperfect diagnostic tests; and variable-domain functional regression in functional data analysis. Our contribution is to bridge methods from different areas with the EMA data in the IU Phone study and to build a novel understanding of the associations among all the variables of interest from different perspectives, based on the characteristics of the data. In addition to the statistical analyses in this dissertation, a variety of data visualization techniques provide informative support for presenting the complex EMA data structure. Ecological momentary assessment Functional data analysis Generalized mixed models Longitudinal data analysis Misclassification adjustment Sexually transmitted infections
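The simplest instance of misclassification adjustment for an imperfect diagnostic test is the Rogan-Gladen correction of an apparent (test-positive) prevalence using the test's sensitivity (Se) and specificity (Sp): p = (apparent + Sp - 1) / (Se + Sp - 1). It is shown only to illustrate the idea; the dissertation's model-based adjustment is not described in the abstract and is likely more elaborate. The function names and test characteristics below are hypothetical.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion testing positive: true positives plus false positives."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)

def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an apparent prevalence for an imperfect test (Rogan-Gladen)."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must be better than chance (Se + Sp > 1)")
    p = (apparent + specificity - 1) / denom
    return min(max(p, 0.0), 1.0)  # truncate to the valid range [0, 1]

# hypothetical test: Se = 0.85, Sp = 0.97, true prevalence 0.30
obs = apparent_prevalence(0.30, 0.85, 0.97)      # 0.276 of tests come back positive
corrected = rogan_gladen(obs, 0.85, 0.97)        # recovers 0.30
```

The correction simply inverts the expected-positive formula, which is why it returns the true prevalence when the apparent prevalence is exactly its expected value.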
37	Comparison of Time Series and Functional Data Analysis for the Study of Seasonality. Allen, Jake 17 August 2011 Classical time series analysis has well-known methods for studying seasonality. Functional data analysis, a more recent approach, has proposed phase-plane plots for representing each year of a time series; however, seasonality has not been studied extensively within functional data analysis. Time series analysis is introduced first, followed by phase-plane plot analysis, and the two are then compared in terms of the insight each offers, particularly into the seasonal behavior of a variable. The possible combination of both approaches is also explored, specifically in the analysis of the phase-plane plots. The methods are applied to monthly measurements of water flow (in cubic feet per second) in the French Broad River at Newport, TN. Simulated data corresponding to typical time series cases are then used for comparison and further exploration. moving averages French Broad River X-11 phase-plane plots Physical Sciences and Mathematics Statistics and Probability
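On the classical side, a basic moving-average approach to monthly seasonality can be sketched as an additive decomposition: estimate the trend with a centered 2×12 moving average, subtract it, and average the detrended values by calendar month. This is a simple illustration, not the X-11 procedure (which applies repeated, more elaborate moving-average filters), and the helper name and simulated series are hypothetical.

```python
import numpy as np

def seasonal_indices(x, period=12):
    """Additive seasonal indices via a centered 2 x period moving average (period even)."""
    x = np.asarray(x, dtype=float)
    # weights 1/(2p), 1/p, ..., 1/p, 1/(2p): a centered average over one full cycle
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.convolve(x, w, mode="valid")            # length len(x) - period
    half = period // 2
    detrended = x[half:len(x) - half] - trend           # align with the trend's centers
    seasons = np.arange(half, len(x) - half) % period   # calendar month of each value
    idx = np.array([detrended[seasons == s].mean() for s in range(period)])
    return idx - idx.mean()                             # normalise to sum to zero

# hypothetical ten years of monthly data: linear trend plus a fixed seasonal pattern
months = np.arange(120)
true_season = 3 * np.sin(2 * np.pi * np.arange(12) / 12)
x = 10 + 0.05 * months + true_season[months % 12]
est = seasonal_indices(x)
```

Because the symmetric 2×12 filter passes a linear trend unchanged and averages any zero-mean 12-month pattern to zero, the estimated indices match the simulated seasonal pattern here.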
38	MULTI-STATE MODELS WITH MISSING COVARIATES Lou, Wenjie 01 January 2016 Multi-state models have been widely used to analyze longitudinal event history data obtained in medical studies. The tools and methods developed recently in this area require completely observed datasets. In many applications, however, measurements of certain components of the covariate vector are missing for some study subjects. In this dissertation, several likelihood-based methodologies were proposed to deal efficiently with datasets containing different types of missing covariates when applying multi-state models. First, a maximum observed-data likelihood method was proposed for data with a univariate missing pattern in which the missing covariate is categorical. The observed-data likelihood function is constructed from a model of the joint distribution of the longitudinal event history response and the discrete covariate with missing values. Second, a maximum simulated likelihood method was proposed to deal with a missing continuous covariate; the observed-data likelihood function was approximated by Monte Carlo simulation. Finally, an EM algorithm was used to deal with multiple missing covariates when estimating the parameters of a multi-state model; it can efficiently handle multiple missing discrete covariates with a general missing pattern. All the proposed methods are supported by simulation studies and applications to datasets from the SMART project, a consortium of 11 high-quality longitudinal studies of aging and cognition. Longitudinal event history data multi-state model missing covariate data EM algorithm maximum simulated likelihood SMART project Applied Statistics Statistical Models
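The EM idea for a missing discrete covariate can be illustrated on a model far simpler than a multi-state model: X ~ Bernoulli(π) and a binary outcome Y with P(Y = 1 | X = x) = p[x], where X is missing for some subjects. The E-step computes each missing subject's probability that X = 1 given its outcome; the M-step re-estimates the parameters as weighted proportions. The function, model, and data are hypothetical and are not the dissertation's estimator.

```python
import math

def em_missing_binary_covariate(data, n_iter=50):
    """EM for X ~ Bernoulli(pi), Y | X = x ~ Bernoulli(p[x]); X may be None.

    data: list of (x, y) with x in {0, 1, None} and y in {0, 1}.
    Returns (pi, p, history of observed-data log-likelihoods).
    """
    pi, p = 0.5, [0.3, 0.7]  # starting values

    def f(x, y):
        """Joint probability P(X = x, Y = y) at the current parameters."""
        px = pi if x == 1 else 1 - pi
        py = p[x] if y == 1 else 1 - p[x]
        return px * py

    history = []
    for _ in range(n_iter):
        # observed-data log-likelihood: sum over x when the covariate is missing
        ll = sum(math.log(f(x, y)) if x is not None else math.log(f(0, y) + f(1, y))
                 for x, y in data)
        history.append(ll)
        # E-step: w_i = P(X_i = 1 | y_i) for missing x, the observed x otherwise
        w = [float(x) if x is not None else f(1, y) / (f(0, y) + f(1, y))
             for x, y in data]
        # M-step: weighted proportions
        pi = sum(w) / len(w)
        p = [sum((1 - wi) * y for wi, (_, y) in zip(w, data)) / sum(1 - wi for wi in w),
             sum(wi * y for wi, (_, y) in zip(w, data)) / sum(w)]
    return pi, p, history

# hypothetical data: 100 subjects with X observed, 30 with X missing
complete = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 15 + [(0, 0)] * 45
data = complete + [(None, 1)] * 12 + [(None, 0)] * 18
pi_hat, p_hat, hist = em_missing_binary_covariate(data)
```

As EM guarantees, the observed-data log-likelihood in `hist` never decreases; with no missing values a single iteration returns the ordinary sample proportions.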
39	Linear Mixed Models with Longitudinal Data Using the ASReml-R Package Alcarde, Renata 10 April 2012 Most experiments conducted today are designed so that observations are taken over time or at different depths; such experiments therefore generally contain a longitudinal factor. One way to analyze this type of data set is with mixed models: by including random-effect factors and using restricted maximum likelihood (REML), the variance components associated with those factors can be estimated with less bias. The ASReml-R statistical package is very efficient for fitting linear mixed models because it already implements a wide variety of variance-covariance structures, but it has the drawback of not providing as objects the design matrices X and Z or the variance-covariance matrices of the random effects (D) and of the residuals, which are essential for checking the model's assumptions. This work brings together tools that facilitate, and provide steps for, building randomization-based models, such as the Hasse diagram, the randomization diagram, and the construction of mixed models that include longitudinal factors. Because the vector of conditional residuals and the vector of random-effect parameters are confounded (that is, not independent), residuals known in the literature as least confounded residuals were obtained and, as a proposal of this work, the least confounded EBLUP was calculated. To this end, functions were implemented that use the objects of a model fitted with ASReml-R to make the matrices of interest available and to calculate the least confounded residuals and the least confounded EBLUP. To illustrate the techniques presented and to highlight the importance of checking the assumptions of the adopted model, two examples containing longitudinal factors were considered: the first, a simple experiment comparing the efficiency of different roof coverings in poultry houses; the second, an experiment conducted in three phases, containing completely confounded factors, aimed at evaluating characteristics of the paper produced from different eucalyptus species at different ages. Agricultural experiments Applied statistics Aviaries Eucalyptus Likelihood Linear mixed models Longitudinal data analysis
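Why access to X, Z, and the covariance matrices matters can be seen from Henderson's mixed model equations, which give the fixed-effect estimates and the random-effect predictions (and hence marginal and conditional residuals) directly from those matrices. The sketch below treats the variance components as known, whereas in practice they would be REML estimates; R is a name introduced here for the residual covariance matrix, the random-intercept design is hypothetical, and the least confounded residuals and EBLUP proposed in the work are not shown.

```python
import numpy as np

def mixed_model_equations(y, X, Z, D, R):
    """Solve Henderson's mixed model equations for known D (random effects)
    and R (residuals); returns beta_hat (BLUE) and u_hat (BLUP)."""
    Ri = np.linalg.inv(R)
    lhs = np.block([[X.T @ Ri @ X, X.T @ Ri @ Z],
                    [Z.T @ Ri @ X, Z.T @ Ri @ Z + np.linalg.inv(D)]])
    rhs = np.concatenate([X.T @ Ri @ y, Z.T @ Ri @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# hypothetical longitudinal design: 4 subjects measured at 3 times, random intercepts
rng = np.random.default_rng(2)
n_subj, n_time = 4, 3
time = np.tile(np.arange(n_time, dtype=float), n_subj)
subj = np.repeat(np.arange(n_subj), n_time)
X = np.column_stack([np.ones(time.size), time])          # intercept and time slope
Z = (subj[:, None] == np.arange(n_subj)).astype(float)   # subject indicator columns
D = 2.0 * np.eye(n_subj)                                 # random-intercept covariance
R = np.eye(time.size)                                    # residual covariance
y = X @ np.array([1.0, 0.5]) + Z @ rng.normal(0, np.sqrt(2.0), n_subj) + rng.normal(size=time.size)

beta_hat, u_hat = mixed_model_equations(y, X, Z, D, R)
marginal_resid = y - X @ beta_hat
conditional_resid = y - X @ beta_hat - Z @ u_hat
```

The solution agrees with generalized least squares for β using V = ZDZ' + R and with the BLUP û = DZ'V⁻¹(y - Xβ̂).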
40	A machine learning perspective on repeated measures Karch, Julian 09 November 2016 Repeated measures obtained from multiple individuals are of crucial importance for developmental research. Examples include longitudinal panel data and electroencephalography (EEG) data. In this thesis, I develop a novel analysis approach based on machine learning methods for each of these two data modalities. For longitudinal panel data, I develop Gaussian process panel modeling (GPPM), which is based on the flexible Bayesian approach of Gaussian process regression. Comparing GPPM with longitudinal structural equation modeling (SEM), which contains most conventional panel modeling approaches as special cases, reveals that GPPM in turn encompasses longitudinal SEM as a special case. In contrast to longitudinal SEM, GPPM is well suited for continuous-time modeling, can express a larger set of models, and includes a straightforward approach to obtaining person-specific predictions. A comparison between the developed GPPM toolbox and existing SEM software shows that the GPPM representation of popular longitudinal SEMs reduces the time needed for parameter estimation by up to a factor of nine. For EEG data, I develop an approach that derives person-specific models to identify and quantify between-person differences in EEG responses that conventional EEG analysis methods ignore. The approach relies on a framework that uses a model selection method from machine learning to select the best model for each person from a large set of hypothesized candidate models. I show how the obtained models can be interpreted at both the individual and the group level, and I validate the approach on a working memory data set. The results demonstrate that the person-specific models describe the link between behavior and EEG data more accurately than the conventional, nonspecific EEG analysis approach. Longitudinal data analysis EEG analysis Machine learning Inter- and intraindividual variation 150 Psychology 11 Psychology CM 3200 ddc:150
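GPPM builds on Gaussian process regression, whose basic prediction step can be sketched for a single person's trajectory measured at irregular times; irregular spacing poses no problem because the kernel is defined on continuous time. This is generic zero-mean GP regression with a squared-exponential kernel, not the GPPM toolbox or its API; the kernel choice, noise level, and data are assumptions.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between 1-D arrays of time points a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(t_train, y_train, t_new, noise=0.1, **kern):
    """Posterior mean and variance of a zero-mean GP at t_new (Cholesky-based)."""
    K = sq_exp_kernel(t_train, t_train, **kern) + noise**2 * np.eye(t_train.size)
    Ks = sq_exp_kernel(t_train, t_new, **kern)
    Kss = sq_exp_kernel(t_new, t_new, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, var

# hypothetical person observed at irregular times; predict on a continuous grid
t_obs = np.array([0.0, 0.7, 1.5, 3.2, 4.0])
y_obs = np.sin(t_obs)
t_grid = np.linspace(0, 5, 51)
mu, var = gp_predict(t_obs, y_obs, t_grid, noise=0.1, length=1.0)
```

Predictive variance is small near observed time points and grows back toward the prior variance away from the data, which is what makes person-specific predictions come with an uncertainty statement.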
