Global ETD Search

41	Comparing Performance of ANOVA to Poisson and Negative Binomial Regression When Applied to Count Data. Soumare, Ibrahim. January 2020. Analysis of Variance (ANOVA) is among the simplest and most widely used models in statistics today. However, ANOVA requires a set of assumptions for the model to be a valid choice and for the inferences to be accurate. Among others, ANOVA assumes that the data are normally distributed with homogeneous variance. Data from most disciplines, however, do not meet the assumption of normality and/or equal variance. Regrettably, researchers do not always check whether the assumptions are met, and if these assumptions are violated, inferences may well be wrong. We conducted a simulation study to compare the performance of standard ANOVA to Poisson and Negative Binomial models when applied to count data. We considered different combinations of sample sizes and underlying distributions. In this simulation study, we first assessed the Type I error of each model involved. We then compared power as well as the quality of the estimated parameters across the models. Keywords: anova; count data; negative binomial; poisson
42	Bayesian Regression Trees for Count Data: Models and Methods. Geels, Vincent M. 27 September 2022. No description available. Keywords: Statistics; Discrete state spaces; Markov chain Monte Carlo; regression trees; count data; data augmentation; Bayesian statistics; response variable transformation
43	Essays on Environmentally Friendly Practices. Chang, Ching-Hsing. 28 July 2011. No description available. Keywords: Agricultural Economics; Environmental Economics; Organic milk; Environmental policy; Pollution reduction; Almost Ideal Demand System; Count data; Pollution prevention; Environmental innovation
44	Model trees with topic model preprocessing: an approach for data journalism illustrated with the WikiLeaks Afghanistan war logs. Rusch, Thomas; Hofmarcher, Paul; Hatzinger, Reinhold; Hornik, Kurt. 06 1900. The WikiLeaks Afghanistan war logs contain nearly 77,000 reports of incidents in the US-led Afghanistan war, covering the period from January 2004 to December 2009. The recent growth of data on complex social systems and the potential to derive stories from them has shifted the focus of journalistic and scientific attention increasingly toward data-driven journalism and computational social science. In this paper we advocate the use of modern statistical methods for problems of data journalism and beyond, which may help journalistic and scientific work and lead to additional insight. Using the WikiLeaks Afghanistan war logs for illustration, we present an approach that builds intelligible statistical models for interpretable segments in the data, in this case to explore the fatality rates associated with different circumstances in the Afghanistan war. Our approach combines preprocessing by Latent Dirichlet Allocation (LDA) with model trees. LDA is used to process the natural language information contained in each report summary by estimating latent topics and assigning each report to one of them. Together with other variables, these topic assignments serve as splitting variables for finding segments in the data to which local statistical models for the reported number of fatalities are fitted. Segmentation and fitting are carried out with recursive partitioning of negative binomial distributions. We identify segments with different fatality rates that correspond to a small number of topics and other variables as well as their interactions. Furthermore, we carve out the similarities between segments and connect them to stories that have been covered in the media. 
This gives an unprecedented description of the war in Afghanistan and serves as an example of how data journalism, computational social science and other areas with interest in database data can benefit from modern statistical techniques. (authors' abstract)
45	Vliv institucionálního kapitálu vědeckých institucí na jejich patentovou aktivitu - příklad České republiky / The Impact of Institutional Capital of Research Institutions on Patent Activity - Example of the Czech Republic. Linka, Milan. January 2010. The thesis examines the impact of institutional capital, which captures the ability to produce quality patents, on the number of patent applications filed after the adoption of a new governmental system for evaluating research and development. To determine institutional capital, I use an approach similar to that of Turnovec (2005) for publications. The estimate of institutional capital is based on an evaluation of patents received by publicly financed research institutions, which in turn draws on patent family and patent citation data. The data are analyzed using methods of multi-criteria decision making. The estimates further indicate that institutional capital does not influence the number of patent applications filed.
46	Modelos série de potência zero-modificado para séries temporais com dados de contagem / Zero-modified power series models for time series with counting data. Shirozono, Aimée. 10 May 2019. The goal of this work is to propose Zero-Modified Power Series (ZMPS) models for time series of count data. The ZMPS family offers a broad portfolio of count data distributions for which, given an appropriate link function, regression models can be written in much the same way as generalized linear models. We then use the idea behind Generalized Autoregressive Moving Average (GARMA) models to propose the Zero-Modified Power Series models for time series of count data. Keywords: Count data; Generalized ARMA model; Power series distribution; Zero deflation; Zero inflation
47	Modelos semiparamétricos com resposta binomial negativa / Semiparametric models with negative binomial response. Oki, Fabio Hideto. 14 May 2015. The aim of this work is to discuss estimation and diagnostics in semiparametric models with negative binomial response, more specifically negative binomial regression models in which one continuous explanatory variable is modeled nonparametrically. First, an illustrative example is presented and parametric negative binomial regression models are briefly reviewed. The semiparametric models are then introduced, and some aspects of estimation, inference and model selection are discussed. A chapter is devoted to diagnostic procedures, including leverage measures, influence measures based on case deletion (Cook distances) and local influence, and residual analysis. The illustrative example is then reanalyzed from the semiparametric viewpoint and some conclusions are given. Keywords: Cook distance; Count data; Cubic splines; Local influence; Negative binomial distribution; Nonparametric methods
48	Contributions to the analysis of dispersed count data / Contribuições à análise de dados de contagem. Ribeiro Junior, Eduardo Elias. 18 February 2019. In many agricultural and biological contexts, the response variable is a nonnegative integer that we wish to explain or analyze in terms of a set of covariates. Unlike in the Gaussian linear model, the response variable is discrete, with a distribution that places probability mass on the natural numbers only. Poisson regression is the standard model for count data. However, the assumptions of this model force the mean and variance to be equal, which may be implausible in many applications. Motivated by experimental data sets, this work aimed to develop more realistic methods for the analysis of count data. We proposed a novel parametrization of the COM-Poisson distribution and explored regression models based on it. We extended the model to allow the dispersion, as well as the mean, to depend on covariates. A set of statistical models for counts, namely COM-Poisson, Gamma-count, discrete Weibull, generalized Poisson, double Poisson and Poisson-Tweedie, was reviewed and compared in terms of dispersion, zero-inflation and heavy-tail indexes, together with the results of data analyses. The computational routines developed in this dissertation were organized into two R packages available on GitHub. Keywords: Count data; Discrete probability models; Likelihood-based inference; Overdispersion; Underdispersion; Varying dispersion
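The mean-variance equality that the abstract of entry 48 attributes to the Poisson model can be checked numerically. The following is a minimal sketch, not code from the dissertation (whose routines are R packages): it uses only the Python standard library, and the names `sample_poisson`, `sample_negbin` and `dispersion_index`, as well as the values `mu = 4` and `size = 2`, are illustrative assumptions. The negative binomial appears here only as a familiar overdispersed alternative; it is not one of the models compared in the abstract.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_negbin(rng, mu, size):
    """Draw one negative binomial variate as a gamma-Poisson mixture.

    The Poisson rate is Gamma(shape=size, scale=mu/size), so the count has
    mean mu and variance mu + mu**2 / size (overdispersed relative to Poisson).
    """
    rate = rng.gammavariate(size, mu / size)
    return sample_poisson(rng, rate)


def dispersion_index(counts):
    """Sample variance divided by sample mean; equals 1 in theory for Poisson."""
    return variance(counts) / mean(counts)


rng = random.Random(2019)          # fixed seed so the run is reproducible
mu, size, n = 4.0, 2.0, 20000      # illustrative values, not from the source

poisson_counts = [sample_poisson(rng, mu) for _ in range(n)]
negbin_counts = [sample_negbin(rng, mu, size) for _ in range(n)]

di_poisson = dispersion_index(poisson_counts)
di_negbin = dispersion_index(negbin_counts)

print(f"Poisson dispersion index:           {di_poisson:.2f}")
print(f"Negative binomial dispersion index: {di_negbin:.2f}")
```

In theory the first index is 1 and the second is 1 + mu/size = 3, so the simulated values should land close to those figures. A sample index well above 1 in real count data is the kind of evidence that would argue against a plain Poisson model.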
49	Modelos não lineares para dados de contagem longitudinais / Non linear models for count longitudinal data. Araujo, Ana Maria Souza de. 16 February 2007. Experiments in which measurements are taken repeatedly on the same experimental unit are common in agricultural research. The statistical techniques used to analyze data from such experiments are called repeated measures analyses; a particular case is the longitudinal study, in which the same response variable is observed on several occasions over time. The longitudinal behavior can be nonlinear, as frequently occurs in growth studies. Experiments in which the response variable is a count are also common. This work addresses the modeling of count data obtained from experiments with repeated measurements over time, in which the longitudinal behavior of the response variable is nonlinear. The multivariate Poisson distribution, with equal covariances between measurements, was used to account for the dependence between the components of the vector of repeated measurements on each experimental unit. The model proposed by Karlis and Meligkotsidou (2005) was extended to longitudinal data from completely randomized experiments. Models for randomized block experiments, assuming fixed or random effects for blocks, were also proposed. The occurrence of overdispersion was considered and modeled through the mixed multivariate Poisson distribution. The parameters were estimated by maximum likelihood, via the EM algorithm. The methodology was applied to simulated data for each of the cases studied and to a data set from a randomized block experiment in which the number of bromeliad leaves was observed at six time points. The method estimated the parameters efficiently for the completely randomized design, including under overdispersion, and for the randomized block design with fixed block effects without overdispersion and with random block effects. However, estimation for the model with fixed block effects in the presence of overdispersion, and for the variance parameter of the random block effect, needs to be improved. Keywords: Count data; Longitudinal data analysis; Maximum likelihood method; Nonlinear models; Poisson distribution
50	Modelos semiparamétricos com resposta binomial negativa / Semiparametric models with negative binomial response. Fabio Hideto Oki. 14 May 2015. The aim of this work is to discuss estimation and diagnostics in semiparametric models with negative binomial response, more specifically negative binomial regression models in which one continuous explanatory variable is modeled nonparametrically. First, an illustrative example is presented and parametric negative binomial regression models are briefly reviewed. The semiparametric models are then introduced, and some aspects of estimation, inference and model selection are discussed. A chapter is devoted to diagnostic procedures, including leverage measures, influence measures based on case deletion (Cook distances) and local influence, and residual analysis. The illustrative example is then reanalyzed from the semiparametric viewpoint and some conclusions are given. Keywords: Cook distance; Count data; Cubic splines; Local influence; Negative binomial distribution; Nonparametric methods
