Global ETD Search

11	Treatment heterogeneity and potential outcomes in linear mixed effects models Richardson, Troy E. January 1900 Doctor of Philosophy / Department of Statistics / Gary L. Gadbury / Studies commonly focus on estimating a mean treatment effect in a population. However, in some applications the variability of treatment effects across individual units may help to characterize the overall effect of a treatment across the population. Consider a set of treatments, {T, C}, where T denotes some treatment that might be applied to an experimental unit and C denotes a control. For each of N experimental units, the duplet {r_Ti, r_Ci}, i = 1, 2, …, N, represents the potential response of the i-th experimental unit if treatment were applied and the response of the experimental unit if control were applied, respectively. The causal effect of T compared to C is the difference between the two potential responses, r_Ti - r_Ci. Much work has been done to elucidate the statistical properties of a causal effect, given a set of particular assumptions. Gadbury and others have reported on this for some simple designs and primarily focused on finite population randomization based inference. When designs become more complicated, the randomization based approach becomes increasingly difficult. Since linear mixed effects models are particularly useful for modeling data from complex designs, their role in modeling treatment heterogeneity is investigated. It is shown that an individual treatment effect can be conceptualized as a linear combination of fixed treatment effects and random effects. The random effects are assumed to have variance components specified in a mixed effects “potential outcomes” model when both potential outcomes, r_T and r_C, are variables in the model. The variance of the individual causal effect is used to quantify treatment heterogeneity. 
After treatment assignment, however, only one of the two potential outcomes is observable for a unit. It is then shown that the variance component for treatment heterogeneity becomes non-estimable in an analysis of observed data. Furthermore, estimable variance components in the observed data model are demonstrated to arise from linear combinations of the non-estimable variance components in the potential outcomes model. Mixed effects models are considered in the context of a particular design in an effort to illuminate the loss of information incurred when moving from a potential outcomes framework to an observed data analysis. Causal inference Counterfactual Generalized linear mixed models Subject-treatment interaction What would Fisher do Statistics (0463)
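A hedged illustration of the non-estimability described in entry 11, for the simplest setting only: assume (this is not the thesis's mixed model) that the two potential outcomes have variances \sigma_T^2 and \sigma_C^2 and correlation \rho across units. The symbols D_i, \sigma_T, \sigma_C and \rho are introduced here for illustration.

```latex
\begin{aligned}
D_i &= r_{Ti} - r_{Ci}, \\
\operatorname{Var}(D_i) &= \sigma_T^2 + \sigma_C^2 - 2\rho\,\sigma_T\sigma_C, \\
(\sigma_T - \sigma_C)^2 &\le \operatorname{Var}(D_i) \le (\sigma_T + \sigma_C)^2 .
\end{aligned}
```

Because each unit reveals only one of r_Ti or r_Ci, observed data carry information about \sigma_T^2 and \sigma_C^2 but none about \rho, so under these assumptions the heterogeneity variance can only be bounded, not estimated.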
12	Robust mixtures of regression models Bai, Xiuqin January 1900 Doctor of Philosophy / Department of Statistics / Kun Chen and Weixin Yao / This proposal contains two projects that are related to robust mixture models. In the first project, we propose a new robust mixture of regression models (Bai et al., 2012). The existing methods for fitting mixture regression models assume a normal distribution for the error and then estimate the regression parameters by the maximum likelihood estimate (MLE). In this project, we demonstrate that the MLE, like the least squares estimate, is sensitive to outliers and heavy-tailed error distributions. We propose a robust estimation procedure and an EM-type algorithm to estimate the mixture regression models. Using a Monte Carlo simulation study, we demonstrate that the proposed new estimation method is robust and works much better than the MLE when there are outliers or the error distribution has heavy tails. In addition, the proposed robust method works comparably to the MLE when there are no outliers and the error is normal. In the second project, we propose a new robust mixture of linear mixed-effects models. The traditional mixture model with multiple linear mixed effects, assuming Gaussian distribution for random and error parts, is sensitive to outliers. We will propose a mixture of multiple linear mixed t-distributions to robustify the estimation procedure. An EM algorithm is provided to find the MLE under the assumption of t-distributions for error terms and random mixed effects. Furthermore, we propose to adaptively choose the degrees of freedom for the t-distribution using profile likelihood. 
In the simulation study, we demonstrate that our proposed model works comparably to the traditional estimation method when there are no outliers and the errors and random mixed effects are normally distributed, but works much better if there are outliers or the distributions of the errors and random mixed effects have heavy tails. Least square estimation EM algorithm Linear mixed models Mixture models Multivariate distribution Robust estimation Statistics (0463)
13	A case study in applying generalized linear mixed models to proportion data from poultry feeding experiments Shannon, Carlie January 1900 Master of Science / Department of Statistics / Leigh Murray / This case study was motivated by the need for effective statistical analysis for a series of poultry feeding experiments conducted in 2006 by Kansas State University researchers in the department of Animal Science. Some of these experiments involved an automated auger feed line system commonly used in commercial broiler houses and continuous, proportion response data. Two of the feed line experiments are considered in this case study to determine if a statistical model using a non-normal response offers a better fit for this data than a model utilizing a normal approximation. The two experiments involve fixed as well as multiple random effects. In this case study, the data from these experiments is analyzed using a linear mixed model and Generalized Linear Mixed Models (GLMMs) with the SAS Glimmix procedure. Comparisons are made between a linear mixed model and GLMMs using the beta and binomial responses. Since the response data is not count data, a quasi-binomial approximation to the binomial is used to convert continuous proportions to the ratio of successes over total number of trials, N, for a variety of possible N values. Results from these analyses are compared on the basis of point estimates, confidence intervals and confidence interval widths, as well as p-values for tests of fixed effects. The investigation concludes that a GLMM may offer a better fit than models using a normal approximation for this data when sample sizes are small or response values are close to zero. It also finds that these same conditions can cause GLMMs utilizing the beta response to behave poorly in the Glimmix procedure, because convergence failures prevent valid results from being obtained. 
In such a case, a GLMM using a quasi-binomial response distribution with a high value of N can offer a reasonable and well behaved alternative to the beta distribution. Generalized linear mixed models GLIMMIX Quasi-binomial distribution Beta distribution Statistics (0463)
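The conversion of a continuous proportion to "successes over N trials" described in entry 13 can be sketched as follows. This is a minimal illustration, not the thesis's SAS code: the function name `to_quasi_binomial` is hypothetical, and rounding to the nearest integer is an assumption, since the abstract does not say how fractional successes are handled.

```python
def to_quasi_binomial(proportion: float, n_trials: int) -> tuple[int, int]:
    """Convert a continuous proportion in [0, 1] to (successes, trials).

    Assumption: successes are rounded to the nearest integer
    (Python's round() sends exact .5 ties to the even integer).
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    if n_trials <= 0:
        raise ValueError("n_trials must be a positive integer")
    successes = round(proportion * n_trials)
    return successes, n_trials


# The same observed proportion under several candidate values of N.
for n in (10, 100, 1000):
    print(n, to_quasi_binomial(0.237, n))
```

A larger N preserves more of the original proportion's precision, which is consistent with the abstract's remark that a high value of N gives a well-behaved alternative to the beta response.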
14	Statistical models for genomic selection in Panicum maximum considering allelic dosage / Modelos genéticos-estatísticos para seleção genômica em Panicum maximum com informação de dosagem alélica Lara, Letícia Aparecida de Castro 19 September 2017 Several species of economic interest are autotetraploid, such as the forage Panicum maximum, which is responsible for high productivity and quality of tropical pastures. The main accessions in nature are autotetraploid apomictic plants, but diploid sexual plants may also be found. Although apomixis is advantageous because it fixes hybrid vigor, sexual reproduction is fundamental to allow genetic recombination by crossing among superior genotypes. Thus, genetic breeding consists of crossing apomictic plants with tetraploidized sexual plants. In these crosses, the use of superior sexual parents makes it possible to increase the frequency of favorable alleles in the progeny. Therefore, recurrent selection programs in tetraploid sexual populations are fundamental to P. maximum breeding programs, and strategies such as genomic selection can increase the accuracy of selection, allowing shorter breeding cycles and a faster release of cultivars to the market than conventional programs. As P. maximum is a perennial crop, genotypes are evaluated in successive harvests. Thus, the study goals are to evaluate nutritional, structural, and yield traits in a sexual tetraploid population of P. maximum, investigating different classes of linear mixed models applied to longitudinal data, as well as to develop genomic selection models which consider tetraploid allelic dosage. This work was split into two chapters. 
In the first chapter, three classes of models were analyzed: i) Class A consists of modeling the interaction of genotypes and harvests with homogeneous correlations, with genotypes assumed uncorrelated and residual effects assumed homoscedastic and uncorrelated; ii) Class B consists of groups of models in which genetic and residual effects were fitted with different variance and covariance (VCOV) structures and genotypes were uncorrelated; and iii) Class C is similar to Class B, except that genotypes were correlated through an additive relationship matrix based on pedigree. For all traits, Class C models performed better based on goodness of fit. Therefore, we recommend incorporating an additive relationship matrix as well as modeling harvests with different levels of correlation over time. In the second chapter, SNP markers, obtained by the genotyping-by-sequencing (GBS) technique, were used to develop Bayesian and GBLUP models that consider tetraploid allelic dosage. The accuracies of the Bayesian models did not differ from the accuracy of the GBLUP model, and we recommend the latter because it requires less computational time. The accuracy of genomic selection models reinforces the advantage of implementing this strategy in P. maximum breeding programs. / Diversas espécies de interesse econômico são autotetraploides, como a forrageira Panicum maximum, a qual proporciona alta produtividade e qualidade para pastagens tropicais. Os principais acessos na natureza são plantas apomíticas tetraploides, no entanto pode-se encontrar também plantas sexuais diploides. Embora a apomixia seja vantajosa pela facilidade em fixar o vigor híbrido, a reprodução sexual é fundamental por permitir recombinação genética a partir de cruzamentos entre genótipos superiores. Desta forma, o melhoramento nesta espécie consiste em cruzar plantas apomíticas com plantas sexuais tetraploidizadas. 
A utilização de parentais sexuais superiores nestes cruzamentos permite aumentar a frequência de alelos favoráveis na progênie. Portanto, programas de seleção recorrente intrapopulacional em populações sexuais tetraploides são fundamentais para programas de melhoramento em P. maximum. Além disto, a utilização de estratégias como seleção genômica são promissoras para aumentar os ganhos de seleção, permitindo avançar ciclos de seleção recorrente e lançar cultivares no mercado em menor prazo, quando comparados a programas convencionais. Como P. maximum é uma cultura perene, os genótipos são avaliados em sucessivos cortes. Assim, este estudo tem como finalidade avaliar caracteres de produtividade, estruturais e nutricionais em uma população sexual tetraploide de P. maximum, investigando diferentes classes de modelos lineares mistos aplicados a dados longitudinais, além de desenvolver modelos de seleção genômica que considerem a natureza tetraploide da população. Este trabalho foi dividido em dois capítulos. No primeiro capítulo, três classes de modelos foram analisados: i) Classe A consiste em modelar a interação genótipos por cortes com correlações homogêneas, genótipos não correlacionados entre si e os efeitos residuais são ajustados com homocedasticidade e ausência de correlação; ii) Classe B consiste em grupos de modelos com diferentes estruturas de variância e covariância (VCOV) para efeitos genéticos e residuais e genótipos não correlacionados; iii) Classe C é similar à Classe B, no entanto os genótipos são correlacionados por uma matriz de parentesco aditivo calculado por pedigree. Para todos os caracteres, os modelos da Classe C tiveram melhor ajuste. Portanto, recomenda-se testar matrizes de VCOV que permitam modelar cortes com diferentes níveis de correlações ao longo do tempo bem como incluir informação de parentesco aditivo e, se disponível, matriz de parentesco genômico. 
No segundo capítulo, marcadores SNPs, obtidos via genotipagem por sequenciamento, foram aplicados em modelos Bayesianos e GBLUP os quais foram desenvolvidos para incorporar informação de dosagem alélica tetraploide. Uma vez que as acurácias dos modelos Bayesianos não diferiram das acurácias do modelo GBLUP com dosagem alélica, recomenda-se o uso do segundo por requerer menos tempo computacional. A acurácia dos modelos preditivos reforça a vantagem em implementar seleção genômica em programas de melhoramento de P. maximum. Autotetraploides Autotetraploids Forage Forrageira Linear mixed models Melhoramento de plantas Modelos lineares mistos Plant breeding Predição Prediction
15	Métodos de diagnóstico para modelos lineares mistos / Diagnostic methods for linear mixed models. Nobre, Juvencio Santos 04 March 2004 Muitos fenômenos podem ser representados por meio de modelos estatísticos de forma satisfatória. Para validar tais modelos é necessário verificar se as suposições envolvidas estão satisfeitas e se o modelo é sensível a pequenas perturbações; este é o objetivo da análise de diagnóstico. Neste trabalho apresentamos, discutimos e propomos técnicas de diagnóstico em modelos lineares mistos e as ilustramos com um exemplo prático. / Many phenomena can be represented through statistical models in a satisfactory way. To validate such models it is necessary to verify whether the assumptions are satisfied and whether the model is sensitive to small deviations; this constitutes the objective of diagnostic analysis. In this work we present, discuss and propose diagnostic techniques for mixed linear models and illustrate them with a practical example. Alavancagem e influência local. Modelos lineares mistos resíduos Linear mixed models residual and local influence.
17	Predicting risk of cyberbullying victimization using lasso regression Olaya Bucaro, Orlando January 2017 The increased online presence and use of technology by today’s adolescents has created new places where bullying can occur. The aim of this thesis is to specify a prediction model that can accurately predict the risk of cyberbullying victimization. The data used is from a survey conducted at five secondary schools in Pereira, Colombia. A logistic regression model with random effects is used to predict cyberbullying exposure. Predictors are selected by lasso, tuned by cross-validation. Covariates included in the study include demographic variables, dietary habit variables, parental mediation variables, school performance variables, physical health variables, mental health variables and health risk variables such as alcohol and drug consumption. Variables included in the final model are demographic variables, mental health variables and parental mediation variables. Variables excluded from the final model include dietary habit variables, school performance variables, physical health variables and health risk variables. The final model has an overall prediction accuracy of 88%. Multiple imputation Generalized linear mixed models Variable selection Probability Theory and Statistics Sannolikhetsteori och statistik
18	Generalized linear mixed modeling of signal detection theory Rabe, Maximilian Michael 10 April 2018 Signal Detection Theory (SDT; Green & Swets, 1966) is a well-established technique to analyze accuracy data in a number of experimental paradigms in psychology, most notably memory and perception, by separating a response bias/criterion from the theoretically bias-free discriminability/sensitivity. As SDT has traditionally been applied, the researcher may be confronted with loss in statistical power and erroneous inferences. A generalized linear mixed-effects modeling (GLMM) approach is presented and advantages with regard to power and precision are demonstrated with an example analysis. Using this approach, a correlation of response bias and sensitivity was detected in the dataset, especially prevalent at the item level, although such a correlation is not usually reported in the memory literature. Directions for future extensions of the method as well as a brief discussion of the correlation between response bias and sensitivity are included. / Graduate / 2019-03-22 methodology cognitive psychology recognition memory signal detection theory generalized linear mixed models statistics
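For context on entry 18, the traditional aggregate SDT calculation that the abstract contrasts with the GLMM approach can be sketched as below. This is the textbook equal-variance Gaussian formula, not the thesis's method; the function name `sdt_measures` is hypothetical, and rates of exactly 0 or 1 are not handled (the inverse normal CDF is undefined there).

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    d' = z(H) - z(F)            (sensitivity)
    c  = -0.5 * (z(H) + z(F))   (response bias / criterion)
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


d_prime, criterion = sdt_measures(0.8, 0.2)
print(f"d' = {d_prime:.3f}, c = {criterion:.3f}")
```

Computing these two numbers once per participant from pooled trials discards trial-level and item-level variation, which is the kind of information a mixed-effects formulation is meant to retain.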
19	Assessing Relationships between Psychological and Biological Markers in Coronary Heart Disease Patients using Bivariate Linear Mixed Models Lally, Kristine January 2017 The Secondary Prevention in Uppsala Primary Health Care Project (SUPRIM) is a randomized controlled trial evaluating the effects of cognitive behavioral therapy on coronary heart disease patients. Various outcomes of psychological and physical health are recorded approximately every six months over the two years after entry to the trial. In this thesis, relationships between the psychological outcome variables, Stress, Anxiety, Depression and Exhaustion, and five physical health biomarkers, are assessed using bivariate linear mixed models. Significant associations are found between one of the biomarkers and both Depression and Exhaustion, and also between one of the other biomarkers and Exhaustion. bivariate linear mixed models longitudinal outcomes cognitive behavioral therapy stress management Probability Theory and Statistics Sannolikhetsteori och statistik
20	Variable selection in joint modelling of mean and variance for multilevel data Charalambous, Christiana January 2011 We propose to extend the use of penalized likelihood based variable selection methods to hierarchical generalized linear models (HGLMs) for jointly modelling both the mean and variance structures. We are interested in applying these new methods on multilevel structured data, hence we assume a two-level hierarchical structure, with subjects nested within groups. We consider a generalized linear mixed model (GLMM) for the mean, with a structured dispersion in the form of a generalized linear model (GLM). In the first instance, we model the variance of the random effects which are present in the mean model, or in other words the variation between groups (between-level variation). In the second scenario, we model the dispersion parameter associated with the conditional variance of the response, which could also be thought of as the variation between subjects (within-level variation). To do variable selection, we use the smoothly clipped absolute deviation (SCAD) penalty, a penalized likelihood variable selection method, which shrinks the coefficients of redundant variables to 0 and at the same time estimates the coefficients of the remaining important covariates. Our methods are likelihood based and so in order to estimate the fixed effects in our models, we apply iterative procedures such as the Newton-Raphson method, in the form of the LQA algorithm proposed by Fan and Li (2001). We carry out simulation studies for both the joint models for the mean and variance of the random effects, as well as the joint models for the mean and dispersion of the response, to assess the performance of our new procedures against a similar process which excludes variable selection. 
The results show that our method increases both the accuracy and efficiency of the resulting penalized MLEs and has a 100% success rate in identifying the zero and non-zero components over 100 simulations. For the main real data analysis, we use the Health Survey for England (HSE) 2004 dataset. We investigate how obesity is linked to several factors such as smoking, drinking, exercise, long-standing illness, to name a few. We also discover whether there is variation in obesity between individuals and between households of individuals, as well as test whether that variation depends on some of the factors affecting obesity itself. 005.73
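The SCAD penalty named in entry 20 can be written down directly from Fan and Li's (2001) piecewise definition. The sketch below evaluates the penalty for a single coefficient only; it is not the thesis's estimation procedure (no LQA iterations, no likelihood), and the function name `scad_penalty` is hypothetical. The default a = 3.7 is the value Fan and Li suggest.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2).

    |theta| <= lam          : lam * |theta|            (lasso-like near zero)
    lam < |theta| <= a*lam  : quadratic transition
    |theta| > a*lam         : constant lam^2 (a+1)/2   (no extra shrinkage)
    """
    if lam <= 0 or a <= 2:
        raise ValueError("need lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Small coefficients are penalized linearly; large ones hit a flat ceiling.
for theta in (0.5, 2.0, 10.0):
    print(theta, scad_penalty(theta, lam=1.0))
```

The flat region for large coefficients is what lets SCAD shrink redundant coefficients to zero while leaving large, important coefficients nearly unbiased, in line with the behaviour the abstract describes.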
