Global ETD Search

21	Multilevel Model Selection: A Regularization Approach Incorporating Heredity Constraints Stone, Elizabeth Anne January 2013 (has links) This dissertation focuses on estimation and selection methods for a simple linear model with two levels of variation. This model provides a foundation for extensions to more levels. We propose new regularization criteria for model selection, subset selection, and variable selection in this context. Regularization is a penalized-estimation approach that shrinks the estimate and selects variables for structured data. This dissertation introduces a procedure (HM-ALASSO) that extends regularized multilevel-model estimation and selection to enforce principles of fixed heredity (e.g., including main effects when their interactions are included) and random heredity (e.g., including fixed effects when their random terms are included). The goals in developing this method were to create a procedure that provided reasonable estimates of all parameters, adhered to fixed and random heredity principles, resulted in a parsimonious model, was theoretically justifiable, and was able to be implemented and used in available software. The HM-ALASSO incorporates heredity-constrained selection directly into the estimation process. HM-ALASSO is shown to enjoy the properties of consistency, sparsity, and asymptotic normality. The ability of HM-ALASSO to produce quality estimates of the underlying parameters while adhering to heredity principles is demonstrated using simulated data. The performance of HM-ALASSO is illustrated using a subset of the High School and Beyond (HS&B) data set that includes math-achievement outcomes modeled via student- and school-level predictors. The HM-ALASSO framework is flexible enough that it can be adapted for various rule sets and parameterizations. / Statistics Statistics Hierarchical Linear Model Mixed-effects Model Variable Selection
22	MonsterLM: A method to estimate the variance explained by genome-wide interactions with environmental factors Khan, Mohammad January 2020 (has links) Estimations of heritability and variance explained due to environmental exposures and interaction effects help in understanding complex diseases. Current methods to detect such interactions rely on variance component methods. These methods have been necessary due to the m ≫ n problem, where the number of predictors (m) vastly outnumbers the number of observations (n). These methods are all computationally intensive, which is further exacerbated when considering gene-environment interactions, as the number of predictors increases from m to 2m+1 in the case of a single environmental exposure. Novel methods are thus needed to enable fast and unbiased calculations of the variance explained (R²) for gene-environment interactions in very large samples on multiple traits. Taking advantage of the large number of participants in contemporary genetic studies, we herein propose a novel method for estimating R² for continuous traits that is up to 20 times faster than current methods. The method, monsterlm, enables multiple linear regression on large regions encompassing tens of thousands of variants in hundreds of thousands of participants. We tested monsterlm with simulations using real genotypes from the UK Biobank. During simulations we verified the ability of monsterlm to estimate the variance explained by interaction terms. Our preliminary results showcase potential interactions between blood biochemistry biomarkers such as HbA1c, triglycerides and ApoB and an obesity-related lifestyle factor, waist-hip ratio (WHR). We further investigate these results to reveal that more than 50% of the interaction variance calculated can be attributed to ∼5% of the single-nucleotide polymorphisms (SNPs) interacting with the environmental trait.
Lastly, we showcase the impact of interactions on improving polygenic risk scores. / Thesis / Master of Science (MSc)
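The growth from m to 2m+1 predictors mentioned in the abstract above can be made concrete with a small sketch. The function name `gxe_design_row` and the column order (m genotype columns, then the single exposure, then m genotype-by-exposure products) are assumptions made for illustration only; this is not the monsterlm implementation.

```python
def gxe_design_row(genotypes, exposure):
    """Build one participant's predictor row for a single environmental exposure.

    genotypes: list of m genotype codes (e.g. 0, 1, 2 allele counts).
    exposure:  one numeric environmental measurement.
    Returns a list of length 2m + 1 (hypothetical column layout).
    """
    # One interaction column per variant: genotype multiplied by the exposure.
    interactions = [g * exposure for g in genotypes]
    return list(genotypes) + [exposure] + interactions


# Three variants (m = 3) give 2 * 3 + 1 = 7 predictors.
row = gxe_design_row([0, 1, 2], 0.5)
```

With tens of thousands of variants per region, the same doubling is what makes the regression expensive.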
23	MATLODE: A MATLAB ODE Solver and Sensitivity Analysis Toolbox D'Augustine, Anthony Frank 04 May 2018 (has links) Sensitivity analysis quantifies the effect that perturbations of the model inputs have on the model's outputs. Some of the key insights gained using sensitivity analysis are to understand the robustness of the model with respect to perturbations, and to select the most important parameters for the model. MATLODE is a tool for sensitivity analysis of models described by ordinary differential equations (ODEs). MATLODE implements two distinct approaches for sensitivity analysis: direct (via the tangent linear model) and adjoint. Within each approach, four families of numerical methods are implemented, namely explicit Runge-Kutta, implicit Runge-Kutta, Rosenbrock, and singly diagonally implicit Runge-Kutta. Each approach and family has its own strengths and weaknesses when applied to real world problems. MATLODE has a multitude of options that allow users to find the best approach for a wide range of initial value problems. In spite of the great importance of sensitivity analysis for models governed by differential equations, until this work there was no MATLAB ordinary differential equation sensitivity analysis toolbox publicly available. The two most popular sensitivity analysis packages, CVODES [8] and FATODE [10], are geared toward the high performance modeling space; however, no native MATLAB toolbox was available. MATLODE fills this need and offers sensitivity analysis capabilities in MATLAB, one of the most popular programming languages within scientific communities such as chemistry, biology, ecology, and oceanography. We expect that MATLODE will prove to be a useful tool for these communities to help facilitate their research and fill the gap between theory and practice. / Master of Science / Sensitivity analysis is the study of how small changes in a model's input affect the model's output.
Sensitivity analysis provides tools to quantify the impact that small, discrete changes in input values have on the output. The objective of this research is to develop a MATLAB sensitivity analysis toolbox called MATLODE. This research is critical to a wide range of communities who need to optimize system behavior or predict outcomes based on a variety of initial conditions. For example, an analyst could build a model that reflects the performance of an automobile engine, where each part in the engine has a set of initial characteristics. The analyst can use sensitivity analysis to determine which part affects the engine's overall performance the most (or the least), without physically building the engine and running a series of empirical tests. By employing sensitivity analysis, the analyst saves time and money, and since multiple tests can usually be run through the model in the time needed to run just one empirical test, the analyst is likely to gain deeper insight and design a better product. Prior to MATLODE, employing sensitivity analysis without significant knowledge of computational science was too cumbersome and essentially impractical for many of the communities who could benefit from its use. MATLODE bridges the gap between computational science and a variety of communities faced with understanding how small changes in a system's input values affect the system's output; and by bridging that gap, MATLODE enables more large scale research initiatives than ever before. ODE Solver Tangent Linear Model Adjoint Model Sensitivity Analysis Software
24	A Paired Comparison Approach for the Analysis of Sets of Likert Scale Responses Dittrich, Regina, Francis, Brian, Hatzinger, Reinhold, Katzenbeisser, Walter January 2005 (has links) (PDF) This paper provides an alternative methodology for the analysis of a set of Likert responses measured on a common attitudinal scale when the primary focus of interest is on the relative importance of items in the set. The method makes fewer assumptions about the distribution of the responses than the more usual approaches such as comparisons of means, MANOVA or ordinal data methods. The approach transforms the Likert responses into paired comparison responses between the items. The complete multivariate pattern of responses thus produced can be analysed by an appropriately reformulated paired comparison model. The dependency structure between item responses can also be modelled flexibly. The advantage of this approach is that sets of Likert responses can be analysed simultaneously within the Generalized Linear Model framework, providing standard likelihood based inference for model selection. This method is applied to a recent international survey on the importance of environmental problems. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
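As a rough illustration of the transformation described in the abstract above, the sketch below derives paired comparison responses from one respondent's Likert ratings. The function name `likert_to_paired`, the item names, and the coding (+1 if the first item is rated higher, -1 if lower, 0 for a tie) are assumptions made for illustration, as is the convention that a higher rating means greater importance; this is not the authors' implementation.

```python
from itertools import combinations


def likert_to_paired(ratings):
    """Turn one respondent's Likert ratings into paired comparison outcomes.

    ratings: dict mapping item name -> Likert score
             (higher = more important, an assumption of this sketch).
    Returns a dict mapping (item_a, item_b) -> +1, 0, or -1.
    """
    paired = {}
    # Items are sorted so every pair appears once, in a fixed order.
    for a, b in combinations(sorted(ratings), 2):
        diff = ratings[a] - ratings[b]
        # Only the sign of the difference is kept, not its size.
        paired[(a, b)] = (diff > 0) - (diff < 0)
    return paired


# Hypothetical respondent rating three environmental problems.
example = likert_to_paired({"air": 4, "water": 4, "waste": 2})
```

Each respondent with J rated items yields J(J-1)/2 comparisons, which is the multivariate response pattern a paired comparison model would then be fitted to.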
25	Tests pour la dépendance entre les sections dans un modèle de Poisson Roussel, Arnaud 05 1900 (has links) Les simulations et figures ont été réalisées avec le logiciel R. / Pour des données de panel, les mesures répétées dans le temps peuvent remettre en cause l’hypothèse d’indépendance entre les individus. Des tests ont été développés pour pouvoir vérifier s’il reste de la dépendance entre les résidus d’un modèle. Les trois tests que nous présentons dans ce mémoire sont ceux de Pesaran (2004), Friedman (1937) et Frees (1995). Ces trois tests se basent sur les résidus (et leurs corrélations) et ont été construits pour des modèles linéaires. Nous voulons étudier dans ce mémoire les performances de ces trois tests dans le cadre d’un modèle linéaire généralisé de Poisson. Dans ce but, on compare tout d’abord leurs performances (niveaux et puissances) pour deux modèles linéaires, l’un ayant un terme autorégressif et l’autre non. Par la suite, nous nous intéressons à leurs performances pour un modèle linéaire généralisé de Poisson en s’inspirant de Hsiao, Pesaran et Pick (2007) qui adaptent le test de Pesaran (2004) pour un modèle linéaire généralisé. Toutes nos comparaisons de performances se feront à l’aide de simulations dans lesquelles nous ferons varier un certain nombre de paramètres (nombre d’observations, force de la dépendance, etc.). Nous verrons que lorsque les corrélations sont toutes du même signe, le test de Pesaran donne en général de meilleurs résultats, à la fois dans les cas linéaires et pour le modèle linéaire généralisé. Le test de Frees présentera de bonnes propriétés dans le cas où le signe des corrélations entre les résidus alterne. / For panel data, repeated measures over time can challenge the hypothesis of independence between subjects. Tests have been developed to assess whether some dependence remains among the residuals of a model. The three tests we present in this master's thesis are those of Pesaran (2004), Friedman (1937) and Frees (1995).
These three tests, constructed for linear models, are based on the model residuals (and their correlations). In this master's thesis we study the performance of these three tests in the case of a generalized linear Poisson model. To that end, we first compare their performance (level and power) for two linear models, one with an autoregressive term and the other without. Next, inspired by Hsiao, Pesaran and Pick (2007), who adapt the test of Pesaran (2004) to a generalized linear model, we study their performance in a generalized linear Poisson model. All of our comparisons are done with simulations in which we vary a number of parameters (number of observations, strength of the dependence, etc.). We will observe that when the correlations are all of the same sign, Pesaran’s test generally gives the best results, both for the linear models and for the generalized linear model. Frees’ test shows good properties when the sign of the correlations between residuals alternates. Poisson Résidus Régression Modèle linéaire Modèle linéaire généralisé Données de panel Dépendance Test Residuals Regression Linear model Generalized linear model Panel data Dependence
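To make the residual-based idea in the abstract above concrete, here is a minimal sketch of the CD statistic of Pesaran (2004) in its usual balanced-panel form, CD = sqrt(2T / (N(N-1))) multiplied by the sum of the pairwise residual correlations over all pairs i < j, which is compared with a standard normal distribution under the null of no cross-sectional dependence. The function names `pearson` and `pesaran_cd` are hypothetical, the sketch uses the ordinary sample (Pearson) correlation, and it is not the code used in the thesis (whose simulations were run in R).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pesaran_cd(residuals):
    """CD statistic for a balanced panel.

    residuals: list of N residual series (one per unit), each of length T.
    """
    n_units = len(residuals)
    t_len = len(residuals[0])
    # Sum the correlations over every pair of units i < j.
    total = sum(
        pearson(residuals[i], residuals[j])
        for i, j in combinations(range(n_units), 2)
    )
    return math.sqrt(2 * t_len / (n_units * (n_units - 1))) * total


# Two units whose residuals are exact mirror images: correlation -1.
cd_mirror = pesaran_cd([[1.0, -2.0, 0.5, 0.5], [-1.0, 2.0, -0.5, -0.5]])
```

Because the statistic sums signed correlations, positive and negative correlations can cancel, which is consistent with the abstract's remark that this test does best when all correlations share the same sign.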
26	Comparação de métodos no estudo da estabilidade fenotípica / Comparasion of methods for the study of phenotypic stability Elton Rafael Mauricio da Silva Pereira 04 February 2010 (has links) É comum o estudo da estabilidade fenotípica em genótipos de cana-de-açúcar. Várias são as metodologias para estudar a interação genótipos x ambientes (G x E). O desafio para os melhoristas é encontrar variedades de cana-de-açúcar com desempenho superior em diversos ambientes, ou seja, que sejam altamente produtivas e também responsivas com a melhoria do ambiente. O estudo das metodologias permite verificar se determinada técnica biométrica é eficiente para expressar o comportamento de genótipos em vários ambientes e também permite aprimorá-las para que as conclusões sejam mais confiáveis. Este trabalho teve como objetivo comparar dois métodos de regressão utilizados para avaliar a estabilidade fenotípica em variedades de cana-de-açúcar: o linear, de Cruz, Torres e Vencovsky (1989) e o não-linear, de Toler e Burrows (1998). Foram utilizados dados da variável tonelada de cana por hectare - TCH, fornecidos pelo Programa de Melhoramento Genético da Cana-de-Açúcar da UFSCar, compreendendo sete locais e 18 genótipos de cana-de-açúcar. Quando se realizou o enquadramento dos genótipos nos diferentes grupos, 17 genótipos dos 18 avaliados enquadraram-se nos mesmos grupos em ambos os métodos. Os coeficientes de determinação foram similares, sendo que 11 genótipos apresentaram melhor ajuste ao modelo de Cruz et al., enquanto que este número foi de sete para o modelo de Toler e Burrows. As análises indicaram que ambas metodologias produziram resultados similares. / It is common to study the phenotypic stability of sugarcane genotypes. There are several methods to study the genotype by environment (G x E) interaction.
The challenge for breeders is to find varieties of sugarcane with superior performance in different environments, i.e., varieties that are highly productive and responsive to environmental improvement. Studying the methodologies makes it possible to verify whether a given biometric technique is effective at expressing the behavior of genotypes in several environments, and also to improve the methods so that the conclusions are more reliable. This study aimed to compare two regression methods used to evaluate the phenotypic stability of sugarcane varieties: the linear method of Cruz, Torres and Vencovsky (1989) and the non-linear method of Toler and Burrows (1998). We used data on the variable tons of cane per hectare (TCH), provided by the Sugarcane Genetic Improvement Program of UFSCar, covering seven locations and 18 sugarcane genotypes. When the genotypes were assigned to the different groups, 17 of the 18 genotypes evaluated were classified in the same groups by both methods. The coefficients of determination were similar: 11 genotypes showed a better fit to the model of Cruz et al., while seven showed a better fit to the Toler and Burrows model. The analyses indicated that both methodologies produced similar results. Cana-de-açúcar Interação genótipo-ambiente Melhoramento genético vegetal Modelos lineares Modelos não lineares. Genotype environment interaction Linear model Non-linear model Plant breeding Sugarcane
28	An Application of an In-Depth Advanced Statistical Analysis in Exploring the Dynamics of Depression, Sleep Deprivation, and Self-Esteem Gaffari, Muslihat 01 August 2024 (has links) (PDF) Depression, intertwined with sleep deprivation and self-esteem, presents a significant challenge to mental health worldwide. The research shown in this paper employs advanced statistical methodologies to unravel the complex interactions among these factors. Through log-linear homogeneous association, multinomial logistic regression, and generalized linear models, the study scrutinizes large datasets to uncover nuanced patterns and relationships. By elucidating how depression, sleep disturbances, and self-esteem intersect, the research aims to deepen understanding of mental health phenomena. The study clarifies the relationship between these variables and explores reasons for prioritizing depression research. It evaluates how statistical models, such as log-linear, multinomial logistic regression, and generalized linear models, shed light on their intricate dynamics. Findings offer insights into risk and protective factors associated with these variables, guiding tailored interventions for individuals in psychological distress. Additionally, policymakers can utilize these insights to develop comprehensive strategies promoting mental health and well-being at a societal level. Depression Sleep Deprivation Self-Esteem Mental Health Log-Linear Model Multinomial Logistic Regression Model Generalized Linear Model. Biostatistics Categorical Data Analysis Vital and Health Statistics
29	Spatial and temporal population dynamics of yellow perch (Perca flavescens) in Lake Erie Yu, Hao 19 August 2010 (has links) Yellow perch (Perca flavescens) in Lake Erie support valuable commercial and recreational fisheries critical to the local economy and society. The study of yellow perch's temporal and spatial population dynamics is important for both stock assessment and fisheries management. I explore the spatial and temporal variation of the yellow perch population by analyzing the fishery-independent surveys in Lake Erie. Model-based approaches were developed to estimate the relative abundance index, which reflected the temporal variation of the population. I also used design-based approaches to deal with the situation in which population density varied both spatially and temporally. I first used model-based approaches to explore the spatial and temporal variation of the yellow perch population and to develop the relative abundance index needed. Generalized linear models (GLM), spatial generalized linear models (s-GLM), and generalized additive models (GAM) were compared by examining the goodness-of-fit, reduction of spatial autocorrelation, and prediction errors from cross-validation. The relationship between yellow perch density distribution and spatial and environmental factors was also studied. I found that GAM showed the best goodness-of-fit, as measured by AIC, and the lowest prediction errors, whereas s-GLM resulted in the best reduction of spatial autocorrelation. Both performed better than GLM for yellow perch relative abundance index estimation. I then applied design-based approaches to study the spatial and temporal population dynamics of yellow perch through both practical data analysis and simulation. The currently used approach in Lake Erie is stratified random sampling (StRS).
Traditional sampling designs (simple random sampling (SRS) and StRS) and adaptive sampling designs (adaptive two-phase sampling (ATS), adaptive cluster sampling (ACS), and adaptive two-stage sequential sampling (ATSS)) for fishery-independent surveys were compared. In terms of accuracy and precision, ATS performed better than SRS, StRS, ACS and ATSS for yellow perch fishery-independent survey data in Lake Erie. Model-based approaches were further studied by including geostatistical models. The performance of the GLM and GAM models and of geostatistical models (spatial interpolation) was compared, through a simulation study, when they are used to analyze the temporal and spatial variation of the yellow perch population. This is the first time that these two types of model-based approaches have been compared in fisheries. I found that the arithmetic mean (AM) method was preferred only when neither environmental factors nor spatial information on sampling locations were available. If the survey cannot cover the distribution area of the population due to biased design or lack of sampling locations, GLMs and GAMs are preferable to spatial interpolation (SI). Otherwise, SI is a good alternative model to estimate the relative abundance index. SI has rarely been applied in fisheries. Different models may be recommended for different species or fisheries when estimating their spatial-temporal dynamics, and the most appropriate survey designs may also differ among species. However, the criteria and approaches used here to compare model-based and design-based approaches can be applied to other species or fisheries. / Ph. D. fishery-independent survey spatial generalized linear model generalized additive model generalized linear model catch rate Lake Erie Yellow perch spatial interpolation sampling design
30	Modeling and computations of multivariate datasets in space and time Demel, Samuel Seth January 1900 (has links) Doctor of Philosophy / Department of Statistics / Juan Du / Spatio-temporal and/or multivariate dependence naturally occurs in datasets obtained in various disciplines, such as atmospheric sciences, meteorology, engineering and agriculture. There is a great need to effectively model the complex dependence and correlation structure exhibited in these datasets. For this purpose, this dissertation studies methods and applications of spatio-temporal modeling and multivariate computation. First, a collection of spatio-temporal functions is proposed to model spatio-temporal processes which are continuous in space and discrete over time. Theoretically, we derive the necessary and sufficient conditions to ensure the model validity. In practice, the possibility of taking advantage of well-established time series and spatial statistics tools makes it relatively easy to identify and fit the proposed model. The spatio-temporal models with some ARMA discrete temporal margin are fitted to Kansas precipitation and Irish wind datasets for estimation or prediction, and compared with some general existing parametric models in terms of likelihood and mean squared prediction error. Second, to deal with the immense computational burden of statistical inference for multiple attributes recorded at a large number of locations, we develop Wendland-type compactly supported covariance matrix function models and propose a multivariate covariance tapering technique with those functions for computation reduction. Simulation studies and US temperature data are used to illustrate applications of the proposed multivariate tapering and the computational gain in spatial cokriging.
Finally, to study the impact of weather change on corn yield in Kansas, we develop a spatial functional linear regression model accounting for the fact that weather data were recorded daily or hourly as opposed to the yearly crop yield data and the underlying spatial autocorrelation. The parameter function is estimated under the functional data analysis framework and its characteristics are investigated to show the influential factor and critical period of weather change dictating crop yield during the growing season. Spatio-temporal covariance modeling Multivariate tapering Spatial functional linear model Statistics (0463)
