31.
Detection of Latent Heteroscedasticity and Group-Based Regression Effects in Linear Models via Bayesian Model Selection
Metzger, Thomas Anthony, 22 August 2019
Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori to the researcher, we use fractional Bayes factor methods and mixture g-priors to overcome the lack of prior information. We provide an R package, slgf, that implements our methodology, and demonstrate its usage in practice. / Doctor of Philosophy / Statistical models are a powerful tool for describing a broad range of phenomena in our world. However, many common statistical models may make assumptions that are overly simplistic and fail to account for key trends and patterns in data. Specifically, we search for hidden structures formed by partitioning a dataset into two groups. These two groups may have distinct variability, statistical effects, or other hidden effects that are missed by conventional approaches. We illustrate the ability of our method to detect these patterns across a variety of disciplines and data layouts, and provide software for researchers to implement this approach in practice.
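The slgf package implements the full fractional-Bayes-factor and mixture-g-prior machinery; purely as a hypothetical sketch of the underlying idea (not slgf's API, and with BIC as a crude stand-in for the Bayes factor), one can screen candidate two-group partitions of a factor's levels for latent heteroscedasticity:

```python
# Hypothetical sketch: screen two-group partitions of a factor's levels for
# latent heteroscedasticity, with BIC as a crude stand-in for the fractional
# Bayes factor used by slgf. All names and data here are illustrative.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(0)
levels = np.repeat(np.arange(6), 10)                  # 6 factor levels, 10 obs each
y = rng.normal(0.0, np.where(levels < 2, 3.0, 1.0))  # levels {0, 1} are noisier

def bic_two_group(y, levels, group):
    """ML fit with a common mean (approximated by the overall mean) and a
    separate error variance for each of the two groups of levels."""
    in_group = np.isin(levels, group)
    loglik = 0.0
    for mask in (in_group, ~in_group):
        r = y[mask] - y.mean()
        loglik += stats.norm.logpdf(r, 0, r.std()).sum()
    return 3 * np.log(len(y)) - 2 * loglik            # 3 params: mean + 2 variances

# Homoscedastic baseline (2 params: mean + 1 variance)
r0 = y - y.mean()
bic0 = 2 * np.log(len(y)) - 2 * stats.norm.logpdf(r0, 0, r0.std()).sum()

# Lowest-BIC grouping among all two-group partitions of the 6 levels
best_bic, best_group = min((bic_two_group(y, levels, g), g)
                           for s in (1, 2, 3)
                           for g in combinations(range(6), s))
print(f"homoscedastic BIC = {bic0:.1f}; best grouping {best_group}, BIC = {best_bic:.1f}")
```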
32.
Lp norm estimation procedures and an L1 norm algorithm for unconstrained and constrained estimation for linear models
Kim, Buyong, January 1986
When the distribution of the errors in a linear regression model departs from normality, the method of least squares seems to yield relatively poor estimates of the coefficients. One alternative approach to least squares which has received a great deal of attention of late is minimum Lₚ norm estimation. However, the statistical efficiency of an Lₚ estimator depends greatly on the underlying distribution of the errors and on the value of p. Thus, the choice of an appropriate value of p is crucial to the effectiveness of Lₚ estimation.
Previous work has shown that L₁ estimation is a robust procedure in the sense that it leads to an estimator with greater statistical efficiency than the least squares estimator in the presence of outliers, and that L₁ estimators have some desirable asymptotic statistical properties. This dissertation is mainly concerned with the development of a new algorithm for L₁ estimation and constrained L₁ estimation. The mainstream of computational procedures for L₁ estimation has been simplex-type algorithms via the linear programming formulation. Other procedures include the reweighted least squares method and nonlinear programming techniques using the penalty function approach or descent methods.
A new computational algorithm is proposed which combines the reweighted least squares method and the linear programming approach. We employ a modified Karmarkar algorithm to solve the linear programming problem instead of the simplex method, and we prove that the proposed algorithm converges in a finite number of iterations. Our simulation study demonstrates that the algorithm requires fewer iterations to solve standard problems than simplex-type methods, although the amount of computation per iteration is greater. The proposed algorithm for unconstrained L₁ estimation is extended to the case where the L₁ estimates of the parameters of a linear model must satisfy certain linear equality and/or inequality constraints. Both procedures are computationally simple to implement since a weighted least squares scheme is adopted at each iteration. Our results indicate that the proposed L₁ estimation procedure yields very accurate and stable estimates and is efficient even when the problem size is large. / Ph. D.
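The dissertation's algorithm pairs reweighted least squares with a Karmarkar-style interior-point LP step; the sketch below is an assumption-laden simplification, not the actual algorithm, iterating only the weighted-least-squares half to obtain L₁ estimates:

```python
# A minimal IRLS sketch for L1 (least absolute deviations) regression. The
# Karmarkar-style LP step of the proposed algorithm is omitted; plain
# reweighted least squares is iterated instead.
import numpy as np

def l1_irls(X, y, iters=100, eps=1e-8):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # least squares start
    for _ in range(iters):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)       # L1 weights, guarded near zero
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(beta_new - beta)) < 1e-10:
            break
        beta = beta_new
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(2, size=100)  # heavy-tailed errors
print(l1_irls(X, y))                                        # close to (1, 2)
```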
33.
penalized: A MATLAB toolbox for fitting generalized linear models with penalties
McIlhagga, William H., 07 August 2015
penalized is a flexible, extensible, and efficient MATLAB toolbox for penalized maximum likelihood. penalized allows you to fit a generalized linear model (Gaussian, logistic, Poisson, or multinomial) using any of ten provided penalties, or none. The toolbox can be extended by creating new maximum likelihood models or new penalties. The toolbox also includes routines for cross-validation and plotting.
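The toolbox's own interface is not reproduced here; as a language-agnostic illustration of what penalized maximum likelihood involves (function name, data, and tuning values are assumptions), the sketch below fits one of the model/penalty combinations mentioned, an L1-penalized logistic regression, by proximal gradient descent:

```python
# Not the toolbox's API: a generic sketch of penalized maximum likelihood,
# here L1-penalized logistic regression fitted by proximal gradient descent
# (ISTA). Data and tuning values are illustrative assumptions.
import numpy as np

def lasso_logistic(X, y, lam=0.1, step=0.01, iters=2000):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # logistic mean response
        grad = X.T @ (mu - y) / n              # gradient of mean negative log-likelihood
        z = beta - step * grad                 # gradient step ...
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # ... then soft-threshold
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)  # only feature 0 matters
print(np.round(lasso_logistic(X, y), 3))       # mostly zeros, nonzero first coefficient
```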
34.
On the Efficiency of Designs for Linear Models in Non-regular Regions and the Use of Standard Designs for Generalized Linear Models
Zahran, Alyaa R., 16 July 2002
The design of an experiment involves the selection of levels of one or more factors in order to optimize one or more criteria, such as prediction variance or parameter variance criteria. Good experimental designs have several desirable properties, but typically one cannot achieve all the ideal properties in a single design. Therefore, there are frequently several good designs, and choosing among them involves tradeoffs.
This dissertation contains three components centered on optimal design: developing a new graphical evaluation technique, discussing designs for non-regular regions for first-order models with interaction in the two- and three-factor cases, and using standard designs in the case of generalized linear models (GLMs).
The Fraction of Design Space (FDS) technique is proposed as a new graphical evaluation technique that addresses prediction performance. The technique comprises two tools that give the researcher more detailed information by quantifying the fraction of the design space where the scaled prediction variance is less than or equal to any pre-specified value. The FDS technique complements Variance Dispersion Graphs (VDGs), giving the researcher more insight into a design's prediction capability. Several standard designs are studied with both methods.
Many standard designs are constructed for a factor space that is either a p-dimensional hypercube or hypersphere, with any point inside or on the boundary of the shape a candidate design point. However, economic or practical constraints may restrict factor settings and result in an irregular experimental region. For the two- and three-factor cases with one corner of the cuboidal design space excluded, three sensible alternative designs are proposed and compared, and their properties and relative tradeoffs are discussed.
Optimal experimental designs for GLMs depend on the values of the unknown parameters. Several solutions to this parameter dependence of the optimality criterion have been suggested in the literature, but they are often unrealistic in practice. The behavior of factorial designs, the well-known standard designs of the linear case, is studied for the GLM case, and conditions under which these designs have high G-efficiency are formulated. / Ph. D.
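As a hedged sketch of the FDS computation (the function and the 2² factorial example are illustrative, not the dissertation's code), the fraction of a cuboidal design space meeting a scaled-prediction-variance cutoff can be estimated by Monte Carlo:

```python
# Illustrative sketch (not the dissertation's code): Monte Carlo estimate of
# the Fraction of Design Space where the scaled prediction variance (SPV) of
# a design is at or below a cutoff, for a first-order model on [-1, 1]^k.
import numpy as np

def fds(design, v_max, n_mc=100_000, seed=3):
    rng = np.random.default_rng(seed)
    N, k = design.shape
    X = np.column_stack([np.ones(N), design])          # first-order model matrix
    XtX_inv = np.linalg.inv(X.T @ X)
    pts = rng.uniform(-1, 1, size=(n_mc, k))           # uniform points in the region
    F = np.column_stack([np.ones(n_mc), pts])
    spv = N * np.einsum('ij,jk,ik->i', F, XtX_inv, F)  # N * f(x)' (X'X)^-1 f(x)
    return np.mean(spv <= v_max)

two_sq_factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])  # 2^2 factorial
print(fds(two_sq_factorial, v_max=2.5))  # fraction of the square with SPV <= 2.5
```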
35.
Data analysis for quantitative determinations of polar lipid molecular species
Song, Tingting, January 1900
Master of Science / Department of Statistics / Gary L. Gadbury / This report presents an analysis of data from a lipidomics experiment. The experiment sought to determine the changes in the lipidome of big bluestem prairie grass when exposed to stressors. The two stressors were drought (versus a watered condition) and a rust infection (versus no infection); these were whole-plot treatments arranged in a 2 by 2 factorial. A split-plot treatment factor was the position on a sampled leaf (top half versus bottom half). In addition, samples were analyzed at different times, representing a blocking factor. A total of 110 samples were used and, for each sample, concentrations of 137 lipids were obtained. Many lipids were not detected for certain samples and, in some cases, a lipid was not detected in most samples. Thus, each lipid was analyzed separately using a modeling strategy that combined mixed-effects linear models with a categorical analysis technique, the latter used for certain lipids to determine whether a pattern of observed zeros was associated with the treatment condition(s). In addition, p-values from tests of fixed effects in a mixed-effects model were computed in three different ways and compared. Results show that the drought condition has the greatest effect on the concentrations of certain lipids, followed by the position on the leaf; the rust condition had the least effect on lipid concentrations.
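A hypothetical sketch of the per-lipid strategy on simulated data follows; it treats the analysis batch as a simple random intercept and omits the whole-plot error term that a full split-plot analysis would include:

```python
# Hypothetical per-lipid sketch on simulated data. The analysis batch enters
# as a simple random intercept; the whole-plot error term of a full split-plot
# analysis is omitted for brevity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 110
df = pd.DataFrame({
    "drought":  rng.integers(0, 2, n),   # drought vs. watered (whole plot)
    "rust":     rng.integers(0, 2, n),   # infected vs. not (whole plot)
    "position": rng.integers(0, 2, n),   # top vs. bottom half of leaf (split plot)
    "batch":    rng.integers(0, 5, n),   # analysis time, the blocking factor
})
df["conc"] = (1.0 + 0.8 * df["drought"] + 0.2 * df["position"]
              + 0.1 * df["batch"] + rng.normal(0, 0.3, n))

model = smf.mixedlm("conc ~ drought * rust + position", df, groups=df["batch"])
print(model.fit().summary())             # fixed-effect tests for one lipid
```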
36.
An approach to estimating the variance components to unbalanced cluster sampled survey data and simulated data
Ramroop, Shaun, 30 November 2002
Statistics / M. Sc. (Statistics)
37.
Novel algorithms in wireless CDMA systems for estimation and kernel based equalization
Vlachos, Dimitrios, January 2012
A powerful technique is presented for joint blind channel estimation and carrier-offset recovery in code-division multiple access (CDMA) communication systems. The new technique combines singular value decomposition (SVD) analysis with the carrier-offset parameter. Current blind methods incur high computational complexity because they require computing a large SVD twice, and they are sensitive to accurate knowledge of the rank of the noise subspace. The proposed method overcomes both problems by computing the SVD only once. Extensive MATLAB simulations demonstrate the robustness of the proposed scheme; its performance is comparable to that of existing SVD techniques, with computational cost lower by as much as 70% because it does not require knowledge of the rank of the noise subspace. A kernel-based equalization method for CDMA communication systems is also proposed, designed, and simulated in MATLAB; in these simulations it outperforms the other methods considered.
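A stylized sketch of the single-SVD idea on synthetic data is given below; it ignores the spreading codes and carrier offset that the actual method handles, showing only how one SVD of the received-sample matrix yields the signal subspace without a second SVD of the noise subspace:

```python
# Stylized sketch of the single-SVD idea on synthetic data; spreading codes
# and the carrier offset handled by the actual method are ignored here.
import numpy as np

rng = np.random.default_rng(5)
L, K, users = 32, 400, 3                      # snapshot length, snapshots, users
H = rng.normal(size=(L, users))               # unknown effective signatures
S = rng.choice([-1.0, 1.0], size=(users, K))  # transmitted symbols
Y = H @ S + 0.1 * rng.normal(size=(L, K))     # received data, mild noise

U, s, _ = np.linalg.svd(Y, full_matrices=False)  # the single SVD
signal_subspace = U[:, :users]                # dominant directions span range(H)

# Sanity check: the true signatures lie almost entirely in the estimate
proj = signal_subspace @ signal_subspace.T
print(np.linalg.norm(proj @ H - H) / np.linalg.norm(H))  # near zero
```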
38.
LOG-LINEAR MODELS FOR EVALUATING HUNTING DEMAND.
O'Neil, Patricia Marie, January 1983
No description available.
39.
Models for target detection times.
Bae, Deok Hwan, January 1989
Approved for public release; distribution is unlimited. / Some battlefield models include a component that models the time it takes an observer to detect a target. Different observers may have different mean detection times due to factors such as the type of sensor used, environmental conditions, and fatigue of the observer. Two parametric models for the distribution of time to target detection are considered which can incorporate these factors. Maximum likelihood estimation procedures for the parameters are described, and results of simulation experiments studying the small-sample behavior of the estimators are presented. / http://archive.org/details/modelsfortargetd00baed / Major, Korean Air Force
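As a hypothetical illustration of the kind of estimation described (the log-linear rate model and covariate are assumptions, not the thesis's models), maximum likelihood for an exponential detection-time model can be carried out numerically:

```python
# Hypothetical illustration: maximum likelihood for an exponential detection-
# time model whose rate depends on an observer covariate (e.g. sensor type)
# through a log-linear link. Data and true values are simulated assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.integers(0, 2, 300)                  # sensor type, 0 or 1
rate = np.exp(0.5 + 0.9 * x)                 # true detection rates
t = rng.exponential(1.0 / rate)              # observed detection times

def neg_log_lik(theta):
    lam = np.exp(theta[0] + theta[1] * x)    # log-linear rate model
    return -(np.log(lam) - lam * t).sum()    # exponential log-likelihood, negated

fit = minimize(neg_log_lik, x0=[0.0, 0.0])
print(fit.x)                                 # should be near (0.5, 0.9)
```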
40.
Modelos paramétricos para séries temporais de contagem / Parametric models for count time series
Milhorança, Igor André, 14 May 2014
Many practical situations require the analysis of time series of counts, which may present trend, seasonality, and effects of explanatory variables. The motivation for this work is the analysis of daily hospital admissions for respiratory diseases among people over 65 living in the city of São Paulo. The effects of climatic variables and pollutant concentrations were included in the models, and sine and cosine functions with an annual period were included to explain the seasonal pattern and to obtain the effects of pollutants and climatic variables while controlling for this seasonality. Another aspect considered is the inclusion of the population in the analyses, so that the effects are interpreted in terms of daily hospitalization rates. Different parametric models were proposed for the hospitalizations. The simplest is the linear regression model for the logarithm of the hospitalization rate. Generalized linear models (GLMs) were fitted to the daily admissions with a logarithmic link function and the population as an offset, allowing the Poisson and Negative Binomial distributions commonly used for count data. Due to extra heteroscedasticity, GAMLSS models were proposed, including variables to explain the standard deviation. ARMA and GARMA models were also fitted to include a serial correlation structure. The aim of this work is to compare the estimates, standard errors, coverage of confidence intervals, and mean squared error of the predicted values across the various models, and to choose the most appropriate model, which depends on a complete analysis of residuals, usually omitted in the literature. The GARMA model with a Negative Binomial distribution gave the best fit, since its errors appear to follow the proposed distribution with low autocorrelation, and it achieved good confidence-interval coverage and a low mean squared error. The effect of autocorrelation in the data on the estimates of the various models was also analyzed using simulated data.
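A hedged sketch of the simplest GLM in the comparison, on simulated data: Poisson regression for daily counts with annual sine/cosine seasonal terms, an illustrative pollutant series, and log(population) as an offset, so that coefficients act on the hospitalization rate (the variable names and values are assumptions, not the thesis's data).

```python
# Simulated sketch of the simplest GLM compared: Poisson regression with
# annual sine/cosine seasonal terms, an illustrative pollutant series, and
# log(population) as offset so coefficients act on daily hospitalization rates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
days = np.arange(730)
pop = np.full(days.size, 1_200_000.0)            # population at risk (assumed constant)
season = np.column_stack([np.sin(2 * np.pi * days / 365.25),
                          np.cos(2 * np.pi * days / 365.25)])
pm10 = rng.gamma(4, 10, days.size)               # illustrative pollutant series
eta = np.log(50 / pop[0]) + 0.3 * season[:, 0] + 0.002 * pm10  # log rate per person
y = rng.poisson(pop * np.exp(eta))               # daily admission counts

X = sm.add_constant(np.column_stack([season, pm10]))
glm = sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(pop))
print(glm.fit().params)                          # pollutant effect near 0.002
```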