71 |
Modelos paramétricos para séries temporais de contagem / Parametric models for count time series. Milhorança, Igor André, 14 May 2014
Many practical situations require the analysis of count time series, which may exhibit trend, seasonality and covariate effects. This work is motivated by the analysis of daily hospital admissions for respiratory diseases among people over 65 living in the city of São Paulo. The effects of climatic variables and pollutant concentrations were included in the models, and sine and cosine functions with a one-year period were used to explain the seasonal pattern, so that the climatic and pollutant effects are estimated while controlling for this seasonality. Another aspect considered is the inclusion of the population in the analyses, so that the effects are interpreted in terms of daily hospitalization rates. Different parametric models were proposed for the admissions. The simplest is a linear regression model for the logarithm of the rates. Generalized linear models (GLM) were fitted to the daily admissions with a logarithmic link function and the population as an offset, since this model allows the Poisson and Negative Binomial distributions, which are used for count data. Because of extra heteroscedasticity, GAMLSS models were proposed that include variables to explain the standard deviation. ARMA and GARMA models were also fitted because they include a serial correlation structure. The aim of this work is to compare the estimates, standard errors, confidence interval coverage and mean squared error of the predicted value across the various models, and to choose the most appropriate model, which depends on a complete residual analysis, usually omitted in the literature. The GARMA model with Negative Binomial distribution gave the best fit: its errors appear to follow the proposed distribution and show low autocorrelation, and it also had good confidence interval coverage and a low mean squared error.
The effect of autocorrelation in the data on the estimates under the various models was also analyzed using simulated data.
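The mean structure described in this abstract (log link, population as offset, annual sine and cosine terms) can be sketched roughly as follows. The function name, the coefficient names, the 365.25-day period and all numeric values are illustrative assumptions, not quantities taken from the thesis.

```python
import math

def expected_admissions(day, population, beta0, beta_sin, beta_cos, period=365.25):
    """Expected daily count under a log-link model with log(population) as offset.

    log E[Y_t] = log(population_t) + beta0
                 + beta_sin * sin(2*pi*t/period) + beta_cos * cos(2*pi*t/period)
    All coefficients are hypothetical inputs, not estimates from the thesis.
    """
    angle = 2.0 * math.pi * day / period
    log_rate = beta0 + beta_sin * math.sin(angle) + beta_cos * math.cos(angle)
    # The offset has its coefficient fixed at 1, so exp(log_rate) is a rate per person.
    return population * math.exp(log_rate)

# Hypothetical inputs: baseline rate exp(-10) per person per day, mild seasonality.
mu = expected_admissions(day=90, population=1_000_000,
                         beta0=-10.0, beta_sin=0.2, beta_cos=0.1)
```

Because the offset multiplies the mean by the population, the remaining coefficients act on the admission rate rather than on the raw count, which is the interpretation the abstract asks for.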
|
72 |
Analysis of multivariate probit model in several populations. / CUHK electronic theses & dissertations collection. January 2007
Keywords: MCEM algorithm; Gibbs sampler; Multivariate probit model; Multi-group; BIC. / The main purpose of this paper is to develop maximum likelihood and Bayesian approaches for the multivariate probit model in several populations. A Monte Carlo EM algorithm is proposed for obtaining the maximum likelihood estimates, and the Gibbs sampler is used to produce the joint Bayesian estimates. To test hypotheses involving constraints among the structural parameters of the MP model across groups, we use the Bayesian Information Criterion (BIC). A simulation study is given to verify the accuracy of our algorithm. / Yu, Yin. / "March 2007." / Adviser: Sik Yum Lee. / Source: Dissertation Abstracts International, Volume: 68-09, Section: B, page: 6054. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 135-137). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
|
73 |
Analyzing Taguchi's experiments using GLIM with inverse Gaussian distribution. January 1994
by Wong Kwok Keung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 50-52). / Chapter 1. --- Introduction / Chapter 2. --- Taguchi's methodology in design of experiments / Chapter 2.1 --- System design / Chapter 2.2 --- Parameter design / Chapter 2.3 --- Tolerance design / Chapter 3. --- Inverse Gaussian distribution / Chapter 3.1 --- Genesis / Chapter 3.2 --- Probability density function / Chapter 3.3 --- Estimation of parameters / Chapter 3.4 --- Applications / Chapter 4. --- Iterative procedures and Derivation of the GLIM 4 macros / Chapter 4.1 --- Generalized linear models with varying dispersion / Chapter 4.2 --- Mean and dispersion models for inverse Gaussian distribution / Chapter 4.3 --- Devising the GLIM 4 macro / Chapter 4.4 --- Model fitting / Chapter 5. --- Simulation Study / Chapter 5.1 --- Generating random variates from the inverse Gaussian distribution / Chapter 5.2 --- Simulation model / Chapter 5.3 --- Results / Chapter 5.4 --- Discussion / Appendix / References
|
74 |
Effects of message polarity, communication orientation and hierarchy on organizational media choice. / Organizational media choice. January 2001
Au Kin-Chung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 49-54). / Abstracts in English and Chinese. / ACKNOWLEDGEMENTS / TABLE OF CONTENTS / ABSTRACT / INTRODUCTION / Media Choice Theories / Performance Feedback and Media Choice / Research Approach / METHOD / Participants / Design / Manipulations / Dependent Measures / Survey Measures / Procedure / Data Analysis / RESULTS / HLM Analysis / DISCUSSION / Media Preference And Bias / Using HLM in Survey Yielding Two-Level Data Set / LIMITATIONS AND FUTURE DIRECTIONS / CONCLUSION / REFERENCES / FOOTNOTES AND APPENDIX / TABLES AND FIGURES
|
75 |
Numerical methods for the recursive estimation of large-scale linear econometric models. Hadjiantoni, Stella, January 2015
Recursive estimation is an essential procedure in econometrics which appears in many applications when the underlying dataset or model is modified. Data arrive consecutively, and thus already estimated models have to be updated with newly available information. Moreover, in many cases data have to be deleted from a model in order to remove their effect, either because they are old (obsolete) or because they have been detected to be outliers or extreme values and further investigation is required. The aim of this thesis is to develop numerically stable and computationally efficient methods for the recursive estimation of large-scale linear econometric models. Estimation of multivariate linear models is a computationally costly procedure even for moderate-sized models, and it becomes even more demanding when the model needs to be estimated recursively. Moreover, conventional methods often yield misleading results. The aim is to derive new methods which effectively utilise previous computations, in order to reduce the high computational cost, and which provide accurate results as well. Novel numerical methods are developed for the recursive estimation of the general linear model, the seemingly unrelated regressions model, the simultaneous equations model, and the univariate and multivariate time-varying parameters models. The proposed methods are based on numerically stable strategies which provide accurate and precise results. Moreover, the new methods estimate the unknown parameters of the modified model even when the variance-covariance matrix is singular.
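To make the idea of updating an estimated model with a new observation concrete, here is a minimal sketch of the textbook recursive least squares update for an ordinary linear regression. It is not the method developed in the thesis, which targets numerically stable alternatives to conventional updates of this kind. The function name `rls_update`, the diffuse starting value for `P` and the toy data are assumptions made for illustration.

```python
def rls_update(beta, P, x, y):
    """One recursive least squares step for a new observation (x, y).

    beta: current coefficients (list); P: current inverse of X'X (list of lists).
    Returns the updated (beta, P) without refitting on all past data.
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    residual = y - sum(x[i] * beta[i] for i in range(n))
    new_beta = [beta[i] + gain[i] * residual for i in range(n)]
    # Rank-one update P - (P x)(P x)' / denom, valid because P is symmetric.
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_beta, new_P

# Toy data generated from y = 2 + 3 t, arriving one observation at a time.
beta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P: an assumed diffuse start
for t in range(5):
    beta, P = rls_update(beta, P, [1.0, float(t)], 2.0 + 3.0 * t)
```

Each step costs a number of operations quadratic in the number of coefficients, instead of a full refit, which is the saving that recursive estimation aims for.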
|
76 |
Advances in Model Selection Techniques with Applications to Statistical Network Analysis and Recommender Systems. Franco Saldana, Diego, January 2016
This dissertation focuses on developing novel model selection techniques, the process by which a statistician selects one of a number of competing models of varying dimensions, under an array of different statistical assumptions on observed data. Traditionally, two main reasons have been advocated by researchers for performing model selection strategies over classical maximum likelihood estimates (MLEs). The first reason is prediction accuracy, where by shrinking or setting to zero some model parameters, one sacrifices the unbiasedness of MLEs for a reduced variance, which in turn leads to an overall improvement in predictive performance. The second reason relates to interpretability of the selected models in the presence of a large number of predictors, where in order to obtain a parsimonious representation exhibiting the relationship between the response and covariates, we are willing to sacrifice some of the smaller details brought in by spurious predictors.
In the first part of this work, we revisit the family of variable selection techniques known as sure independence screening procedures for generalized linear models and the Cox proportional hazards model. After clever combination of some of its most powerful variants, we propose new extensions based on the idea of sample splitting, data-driven thresholding, and combinations thereof. A publicly available package developed in the R statistical software demonstrates considerable improvements in terms of model selection and competitive computational time between our enhanced variable selection procedures and traditional penalized likelihood methods applied directly to the full set of covariates.
Next, we develop model selection techniques within the framework of statistical network analysis for two frequent problems arising in the context of stochastic blockmodels: community number selection and change-point detection. In the second part of this work, we propose a composite likelihood based approach for selecting the number of communities in stochastic blockmodels and its variants, with robustness consideration against possible misspecifications in the underlying conditional independence assumptions of the stochastic blockmodel. Several simulation studies, as well as two real data examples, demonstrate the superiority of our composite likelihood approach when compared to the traditional Bayesian Information Criterion or variational Bayes solutions. In the third part of this thesis, we extend our analysis on static network data to the case of dynamic stochastic blockmodels, where our model selection task is the segmentation of a time-varying network into temporal and spatial components by means of a change-point detection hypothesis testing problem. We propose a corresponding test statistic based on the idea of data aggregation across the different temporal layers through kernel-weighted adjacency matrices computed before and after each candidate change-point, and illustrate our approach on synthetic data and the Enron email corpus.
The matrix completion problem consists in the recovery of a low-rank data matrix based on a small sampling of its entries. In the final part of this dissertation, we extend prior work on nuclear norm regularization methods for matrix completion by incorporating a continuum of penalty functions between the convex nuclear norm and nonconvex rank functions. We propose an algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm-starts, and present a systematic study of the resulting spectral thresholding operators. We demonstrate that our proposed nonconvex regularization framework leads to improved model selection properties in terms of finding low-rank solutions with better predictive performance on a wide range of synthetic data and the famous Netflix data recommender system.
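As a hedged illustration of what a spectral thresholding operator does, the sketch below applies three scalar rules to a list of singular values: soft thresholding (associated with the nuclear norm), hard thresholding (associated with a rank-type penalty), and firm thresholding (the rule associated with the MC+ penalty), which moves between the two as `gamma` varies. The abstract does not name the penalty family used, so the choice of firm thresholding, the function names and the numbers are assumptions.

```python
def soft_threshold(s, lam):
    """Shrink a singular value by lam and clip at zero."""
    return max(s - lam, 0.0)

def hard_threshold(s, lam):
    """Keep a singular value unchanged if it exceeds lam, else set it to zero."""
    return s if s > lam else 0.0

def firm_threshold(s, lam, gamma):
    """Firm thresholding for gamma > 1.

    Tends to soft thresholding as gamma grows and to hard thresholding
    as gamma approaches 1 from above.
    """
    if s <= lam:
        return 0.0
    if s <= gamma * lam:
        return (s - lam) / (1.0 - 1.0 / gamma)
    return s

# Hypothetical singular values of a partially observed matrix.
singular_values = [3.0, 1.0, 0.2]
soft = [soft_threshold(s, 0.5) for s in singular_values]
hard = [hard_threshold(s, 0.5) for s in singular_values]
firm = [firm_threshold(s, 0.5, 4.0) for s in singular_values]
```

In a full matrix completion routine these rules would be applied to the singular values of the current iterate, and the matrix rebuilt from the thresholded values; that outer loop is omitted here.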
|
77 |
Weighted quantile regression and oracle model selection. / CUHK electronic theses & dissertations collection. January 2009
In this dissertation I suggest a new (regularized) weighted quantile regression estimation approach for nonlinear regression models and double threshold ARCH (DTARCH) models. I allow the number of parameters in the nonlinear regression models to be fixed or to diverge. The proposed estimation method is robust and efficient and is applicable to other models. I use the adaptive-LASSO and SCAD regularization to select parameters in the nonlinear regression models. I simultaneously estimate the AR and ARCH parameters in the DTARCH model using the proposed weighted quantile regression. The merits of the proposed methodology are demonstrated. / Keywords: Weighted quantile regression, Adaptive-LASSO, High dimensionality, Model selection, Oracle property, SCAD, DTARCH models. / Under regularity conditions, I establish asymptotic distributions of the proposed estimators, which show that the model selection methods perform as well as if the correct submodels were known in advance. I also suggest an algorithm for fast implementation of the proposed methodology. Simulations are conducted to compare different estimators, and a real example is used to illustrate their performance. / Jiang, Xuejun. / Adviser: Xinyuan Song. / Source: Dissertation Abstracts International, Volume: 73-01, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 86-92). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
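For readers unfamiliar with quantile regression, the building block is the check loss, which penalizes positive and negative residuals asymmetrically. The sketch below shows that loss and a weighted sum of it. How the dissertation defines its weights is not stated in the abstract, so the per-observation `weights` argument and the function names are assumptions.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_quantile_objective(residuals, weights, tau):
    """Weighted sum of check losses (weights here are per observation, by assumption)."""
    return sum(w * check_loss(r, tau) for r, w in zip(residuals, weights))

# With tau = 0.25, an under-prediction of 2 costs 0.5 and an over-prediction of 2 costs 1.5.
objective = weighted_quantile_objective([2.0, -2.0], [1.0, 1.0], 0.25)
```

Minimizing such an objective over the regression parameters gives a quantile estimate; a penalty such as adaptive-LASSO or SCAD would be added to this sum for variable selection.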
|
78 |
A comparison of tests of heterogeneity in meta-analysis. January 2001
Lee Shun-yi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 57-61). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Introduction / Chapter 1.2 --- Tests of Hypotheses / Chapter 1.2.1 --- Likelihood Ratio Statistic / Chapter 1.2.2 --- Rao's Score Statistic / Chapter 1.2.3 --- Wald's Statistic / Chapter 1.3 --- Notation / Chapter 2 --- Fixed Effects Model / Chapter 2.1 --- Introduction / Chapter 2.2 --- Pearson Chi-square Statistic / Chapter 2.3 --- Logistic Regression Model / Chapter 2.3.1 --- Testing Linear Hypotheses about the Regression Coefficients / Chapter 2.4 --- Combining Proportions / Chapter 2.4.1 --- Classical Estimators / Chapter 2.4.2 --- Jackknife Estimator / Chapter 2.4.3 --- Cross-validatory estimators / Chapter 3 --- Random Effects Model / Chapter 3.1 --- Introduction / Chapter 3.2 --- DerSimonian and Laird Method / Chapter 3.3 --- Generalized linear model with random effect / Chapter 3.3.1 --- Quasi-Likelihood / Chapter 3.3.2 --- Testing Linear Hypotheses about the Regression Coefficients / Chapter 3.3.3 --- MINQUE / Chapter 3.3.4 --- Score Test / Chapter 4 --- Overdispersion and Intraclass Correlation / Chapter 4.1 --- Introduction / Chapter 4.2 --- C(α) Test / Chapter 4.2.1 --- Correlated Binomial model and Beta-Binomial model / Chapter 4.2.2 --- C(α) Statistic Based On Quasi-likelihood / Chapter 4.3 --- Donner Statistic / Chapter 4.4 --- Rao and Scott Statistic / Chapter 5 --- Example and Discussion / Bibliography
|
80 |
Padrões epidemiológicos e distribuição espacial da hanseníase no município de Fortaleza, 2001 a 2012 / Epidemiological patterns and spatial distribution of leprosy in the municipality of Fortaleza, 2001 to 2012. Aline Lima Brito, 26 February 2015
The municipality of Fortaleza, capital of the state of Ceará, is a priority municipality for leprosy control in Brazil. This study aimed to characterize the epidemiological and clinical-operational patterns of leprosy, as well as the temporal trend and the spatial distribution, in time slices, of its main indicators in the municipality of Fortaleza from 2001 to 2012. Fortaleza is subdivided into 114 neighborhoods (IBGE, 2000) and six Regional Executive Secretariats (SER). The analysis characterized the epidemiological and operational indicators of leprosy, their trend (using the inflection-point method) and the estimated hidden prevalence. Three spatial analysis techniques (descriptive approach, local Bayesian and spatial scan statistic) were applied to three indicators: general detection, detection in children under 15 years, and detection of cases with grade 2 physical disability (visible disabilities), in order to find clusters of neighborhoods at high risk for the presence, transmission and late diagnosis of the endemic disease. During the study period, 9,658 new cases of the disease were recorded, 677 (7.0%) of them in children under 15. An estimated 197.7 hidden cases of leprosy per 100,000 inhabitants occurred in the municipality over the last five years (an average of 39.5 cases per 100,000 per year). The detection coefficient decreased over the period, from 40.07 (2001) to 23.39 (2012) cases per 100,000 inhabitants (Average Annual Percent Change, AAPC: -4.0; 95% CI: -5.6 to -2.3). Although the values of the other two coefficients studied also fell, their trends remained stable, as the confidence intervals that follow include zero. The detection coefficient in children under 15 fell from 8.56 per 100,000 inhabitants in 2001 to 5.49 per 100,000 in 2012 (AAPC: -1.4; 95% CI: -5.4 to 2.8), and the grade 2 coefficient from 2.28 per 100,000 in 2001 to 1.95 per 100,000 in 2012 (AAPC: -0.8; 95% CI: -4.5 to 3.1).
The space-time analysis identified spatial clusters at high risk for disease transmission, mainly in neighborhoods of SER 3 and 5, in the west of the city, with the main cluster comprising 22 neighborhoods. In addition, active transmission was evidenced by the high values of the detection coefficient in children under 15, mainly in SER 3 and 5. For this indicator, the space-time analysis identified a main cluster of three neighborhoods, all located in SER 5. Late diagnosis was also found in these same SERs (3 and 5), and there were signs of it in SERs that had not shown significant risk for detection, such as some neighborhoods of SER 4 and 6, which lie further east in the municipality. The SERs that stood out most as at risk for leprosy are marked by great social inequality, high levels of poverty and population crowding. These characteristics reaffirm the close relationship between leprosy and poverty, as well as its unequal distribution in the municipality of Fortaleza.
|