51 |
Modelo destrutivo com variável terminal em experimentos quimiopreventivos de tumores em animais
Zavaleta, Katherine Elizabeth Coaguila 12 April 2012
Financiadora de Estudos e Projetos / Chemical induction of carcinogens in chemopreventive animal experiments is increasingly common in biological research. The purpose of these experiments is to evaluate the effect of a particular treatment on the rate of tumor incidence in animals. In this work, the number of promoted tumors per animal is modeled parametrically, following the suggestions of Kokoska (1987) and Freedman et al. (1993). These chemopreventive experiments are studied in the context of the destructive model proposed by Rodrigues et al. (2010), with a terminal variable that conditions or censors the experiment at the time of the animal's death. Since data in this field are subject to an excess of zeros (Freedman et al., 1993), we propose three distributions for the number of promoted tumors: the negative binomial (NB), the zero-inflated Poisson (ZIP), and the zero-inflated negative binomial (ZINB). Model selection is carried out with the likelihood ratio test and the AIC and BIC criteria. Parameters are estimated by maximum likelihood, and simulation studies are also carried out. As future work, Bayesian methodology is suggested as an alternative to maximum likelihood estimation via the EM algorithm.
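The model comparison described in this abstract can be illustrated with a minimal sketch. It is not the thesis's destructive model: it only fits a plain Poisson and an intercept-only ZIP to a small, made-up vector of counts (`counts` is hypothetical, not thesis data), using a simple EM iteration for the ZIP, and then compares the two fits by likelihood ratio, AIC and BIC. All function names are illustrative.

```python
import math

def poisson_logpmf(k, lam):
    # log of exp(-lam) * lam**k / k!
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def zip_logpmf(k, pi, lam):
    # Zero-inflated Poisson: a structural zero with probability pi,
    # otherwise an ordinary Poisson(lam) count.
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) + poisson_logpmf(k, lam)

def fit_zip_em(data, n_iter=200):
    # EM for an intercept-only ZIP model.
    n = len(data)
    n_zero = sum(1 for k in data if k == 0)
    total = sum(data)
    pi, lam = 0.5, max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam

def aic(ll, n_params):
    return 2 * n_params - 2 * ll

def bic(ll, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * ll

# Hypothetical tumor counts per animal (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
n = len(counts)

lam_pois = sum(counts) / n  # Poisson MLE is the sample mean
ll_pois = sum(poisson_logpmf(k, lam_pois) for k in counts)

pi_hat, lam_hat = fit_zip_em(counts)
ll_zip = sum(zip_logpmf(k, pi_hat, lam_hat) for k in counts)

lr_stat = 2.0 * (ll_zip - ll_pois)  # likelihood ratio statistic for pi = 0
print("LR statistic:", lr_stat)
print("AIC Poisson / ZIP:", aic(ll_pois, 1), aic(ll_zip, 2))
print("BIC Poisson / ZIP:", bic(ll_pois, 1, n), bic(ll_zip, 2, n))
```

Because the hypothesis pi = 0 lies on the boundary of the parameter space, the usual chi-square reference distribution for the likelihood ratio statistic does not apply directly, which is one reason information criteria are often reported alongside it.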
|
52 |
Bayesian modelling of ultra high-frequency financial data
Shahtahmassebi, Golnaz January 2011
The availability of ultra high-frequency (UHF) data on transactions has revolutionised data processing and statistical modelling techniques in finance. The unique characteristics of such data, e.g. the discrete structure of price changes, unequally spaced time intervals and multiple transactions, have introduced new theoretical and computational challenges. In this study, we develop a Bayesian framework for modelling integer-valued variables to capture the fundamental properties of price change. We propose the zero-inflated Poisson difference (ZPD) distribution for modelling UHF data and assess the effect of covariates on the behaviour of price change. For this purpose, we present two modelling schemes. The first is based on analysing the data after the market closes for the day and is referred to as off-line data processing; in this case, Bayesian inference is carried out using Markov chain Monte Carlo methods. The second introduces the dynamic ZPD model, which is implemented through Sequential Monte Carlo methods (also known as particle filters). This procedure allows inference to be updated as new transactions take place and is known as online data processing. We apply our models to a set of FTSE100 index changes. Based on the probability integral transform, modified for integer-valued random variables, we show that our models explain the observed distribution of price change well. We then apply the deviance information criterion, and introduce its sequential version, for model comparison in off-line and online modelling, respectively. Moreover, to add flexibility to the tails of the ZPD distribution, we introduce the zero-inflated generalised Poisson difference distribution and outline its possible application to UHF data.
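As a rough illustration of the distribution named in this abstract, the sketch below assumes one common construction: the difference of two independent Poisson counts (a Skellam variable), mixed with an extra point mass at zero. The exact parameterisation used in the thesis may differ, and the function names and parameter values are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def skellam_pmf(z, lam1, lam2, terms=200):
    # P(X - Y = z) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2),
    # computed as a truncated sum over Y = k and X = k + z.
    start = max(0, -z)
    return sum(poisson_pmf(k + z, lam1) * poisson_pmf(k, lam2)
               for k in range(start, start + terms))

def zpd_pmf(z, pi, lam1, lam2):
    # Assumed zero-inflated form: extra mass pi at zero,
    # remaining mass (1 - pi) spread as a Poisson difference.
    extra = pi if z == 0 else 0.0
    return extra + (1.0 - pi) * skellam_pmf(z, lam1, lam2)

# Illustrative parameters: price changes in ticks, with many zero changes.
pi, lam1, lam2 = 0.3, 1.5, 1.0
total = sum(zpd_pmf(z, pi, lam1, lam2) for z in range(-40, 41))
p_zero = zpd_pmf(0, pi, lam1, lam2)
print("total probability on [-40, 40]:", total)
print("P(price change = 0):", p_zero)
```

The support covers negative and positive integers, which is what makes a Poisson difference a natural candidate for tick-level price changes.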
|
53 |
La régression de Poisson multiniveau généralisée au sein d’un devis longitudinal : un exemple de modélisation du nombre d’arrestations de membres de gangs de rue à Montréal entre 2005 et 2007
Rivest, Amélie 12 1900
Count data have distributions with particular characteristics, such as non-normality, heterogeneity of variances, and a large number of zeros. Appropriate models are therefore needed to obtain unbiased results. This thesis compares four models that can be used for count data: the Poisson model, the negative binomial model, the zero-inflated Poisson model, and the zero-inflated negative binomial model. The adequacy of the models was assessed by comparing the predicted proportion of zeros, the confirmation or refutation of the various hypotheses, and the predicted mean number of arrests. To this end, the number of arrests of street gang members in Montreal during the period 2005 to 2007 was used. The sample consisted of 470 men aged 18 to 59. The analyses show that the most suitable model is the negative binomial model, since it produces significant results, fits the observed data well, and yields a proportion of zeros very similar to the observed one.
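One of the comparison criteria mentioned above, the predicted proportion of zeros, can be written down directly for the Poisson and negative binomial models. The sketch below assumes the mean/size parameterisation of the negative binomial (mean `mu`, size `r`, variance `mu + mu**2 / r`); the numeric values are made up and are not taken from the thesis.

```python
import math

def poisson_zero_prob(mu):
    # P(Y = 0) under Poisson(mu)
    return math.exp(-mu)

def negbin_zero_prob(mu, r):
    # P(Y = 0) under a negative binomial with mean mu and size r
    return (r / (r + mu)) ** r

mu = 1.2   # hypothetical mean number of arrests
r = 0.5    # hypothetical size parameter (small r means strong overdispersion)

p0_poisson = poisson_zero_prob(mu)
p0_negbin = negbin_zero_prob(mu, r)
print("Poisson predicted zero proportion:", p0_poisson)
print("Negative binomial predicted zero proportion:", p0_negbin)
```

For the same mean, the negative binomial predicts more zeros than the Poisson, so overdispersion alone can sometimes account for an apparent excess of zeros without a separate zero-inflation component.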
|
55 |
The impact of agri-environmental policy and infrastructure on wildlife and land prices
Koemle, Dieter 30 October 2018
No description available.
|
56 |
Modelos série de potência com excesso de zeros observáveis e latentes
Coaguila Zavaleta, Katherine Elizabeth 28 September 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The main objective of this work is to study the significance of zeros in observable and latent data. Observable data sets with an excess of zeros commonly exhibit overdispersion, and zero-inflated power series (ZISP) models were proposed to accommodate this excess. For the analysis of observable data, we study the gradient statistic proposed by Terrell (2002) for testing hypotheses about the inflation parameter of ZISP models, evaluating its performance against the classical likelihood ratio (Wilks, 1938), score (Rao, 1948) and Wald (Wald, 1943) statistics. In addition, frailty has recently been modeled by discrete distributions on the non-negative integers, which allow zero frailty, that is, individuals who do not experience the event of interest (the zero-risk fraction). For this type of latent data, we propose a new survival model induced by discrete frailty with a ZISP distribution. This proposal gives a more realistic description of individuals without risk: individuals cured due to genetic factors (immune) are modeled by the deterministic zero-risk fraction, while those cured by treatment are modeled by the random zero-risk fraction. In this context, we also develop the gradient statistic to test the significance of the zero-risk parameter for data modeled by the deterministic zero-risk fraction. To illustrate our proposals, we present the results of simulation studies and applications to real data.
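The four test statistics compared in this abstract can be illustrated in a much simpler setting than the ZISP model. The sketch below assumes an ordinary Poisson sample and the null hypothesis that the mean equals `lam0`; the gradient statistic is taken in its usual form, the score at the null value multiplied by the difference between the estimate and the null value. The data and names are illustrative assumptions.

```python
import math

def poisson_tests(data, lam0):
    # Test H0: lambda = lam0 for an i.i.d. Poisson sample.
    n = len(data)
    lam_hat = sum(data) / n                 # MLE of the Poisson mean
    score_u = n * (lam_hat - lam0) / lam0   # score function evaluated at lam0
    info0 = n / lam0                        # Fisher information at lam0
    info_hat = n / lam_hat                  # Fisher information at the MLE
    return {
        "lr": 2 * n * (lam_hat * math.log(lam_hat / lam0) - (lam_hat - lam0)),
        "score": score_u ** 2 / info0,
        "wald": (lam_hat - lam0) ** 2 * info_hat,
        "gradient": score_u * (lam_hat - lam0),
    }

# Hypothetical counts (illustrative only).
stats = poisson_tests([2, 3, 1, 4, 2, 5, 3, 2], lam0=2.0)
for name, value in stats.items():
    print(name, value)
```

In this one-parameter Poisson case the gradient and score statistics coincide algebraically; in richer models such as zero-inflated ones they generally differ. The gradient statistic needs neither the observed nor the expected information matrix, which is part of its appeal.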
|
57 |
Inférence de réseaux pour modèles inflatés en zéro / Network inference for zero-inflated models
Karmann, Clémence 25 November 2019
Network inference, or graph inference, has a growing number of applications, particularly in human health and the environment, for the study of microbiological and genomic data. Networks are an appropriate tool for representing, and even studying, relationships between entities. Many mathematical estimation techniques have been developed, notably for Gaussian graphical models but also for binary or mixed data. Abundance data (for microorganisms such as bacteria, for example) are particular for two reasons: on the one hand, they do not directly reflect reality, because a sequencing process duplicates the species and this process introduces variability; on the other hand, a species may be absent from some samples. This is the setting of zero-inflated data. Many network inference methods exist for Gaussian, binary and mixed data, but zero-inflated models have received very little attention, even though they capture the structure of many data sets well. The objective of this thesis is network inference for zero-inflated models, restricted to conditional dependency networks. The work is divided into two main parts. The first concerns network inference methods based on neighbourhood estimation, through a procedure that couples ordinal regression with variable selection. The second focuses on network inference in a model where the variables are Gaussians zero-inflated by double truncation (on the right and on the left).
|
58 |
[en] INTERMITTENT DEMAND FORECASTING IN RETAIL: APPLICATIONS OF THE GAS FRAMEWORK / [pt] PREVISÃO DE DEMANDA INTERMITENTE NO VAREJO: APLICAÇÕES DO FRAMEWORK GAS
RODRIGO SARLO ANTONIO FILHO 29 September 2021
Intermittent demand is characterized by periods of zero sales interleaved with positive sales of highly variable quantity. Most stock keeping units (SKUs) at the store level exhibit demand of this type, so accurate models for forecasting intermittent demand series have a major impact on inventory management. In this dissertation we propose using the GAS framework with distributions suited to count data, together with their versions with excess zeros, and apply the resulting models to real data from a large Brazilian retail chain. We show that the proposed models with excess zeros are consistently estimated by maximum likelihood and that the estimators are asymptotically normal. The performance of the proposed models is compared with suitable benchmarks from the literature on count time series and on intermittent demand forecasting. Forecasts are evaluated on the accuracy of both the full predictive distribution and the point forecasts. Our results show that the proposed models, especially the one derived from the hurdle Poisson distribution, outperform the benchmarks considered.
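The hurdle Poisson distribution singled out in these results can be sketched in its static form. This is not the dissertation's model, whose parameters evolve over time under the GAS framework; the sketch only shows the distribution itself, with a fixed zero probability `p_zero` and a zero-truncated Poisson for positive sales. Names and parameter values are illustrative assumptions.

```python
import math

def hurdle_poisson_pmf(k, p_zero, lam):
    # Zeros occur with probability p_zero; positive counts follow a
    # zero-truncated Poisson(lam), scaled by (1 - p_zero).
    if k == 0:
        return p_zero
    log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
    truncated = math.exp(log_pois) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

def hurdle_poisson_mean(p_zero, lam):
    # Mean of the zero-truncated Poisson is lam / (1 - exp(-lam)).
    return (1.0 - p_zero) * lam / (1.0 - math.exp(-lam))

# Hypothetical SKU: no sales on 70% of days, small positive sales otherwise.
p_zero, lam = 0.7, 2.0
probs = [hurdle_poisson_pmf(k, p_zero, lam) for k in range(60)]
total = sum(probs)
mean_from_pmf = sum(k * p for k, p in enumerate(probs))
print("total probability:", total)
print("mean demand:", mean_from_pmf)
```

Unlike a zero-inflated model, the hurdle form treats every zero as coming from a single source, so the zero probability and the size of positive sales are governed by separate parameters.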
|
59 |
Flying in the Academic Environment : An Exploratory Panel Data Analysis of CO2 Emission at KTH
Artman, Arvid January 2024
In this study, a panel data set of flights made by employees at the Royal Institute of Technology (KTH) in Sweden is analyzed using generalized linear modeling approaches, with the aim of building a model with high predictive capability for the quarterly CO2 emission and the number of flights in a year not included in the model estimation. A zero-inflated Gamma regression model is fitted to the CO2 emission variable, and a zero-inflated negative binomial regression model is used for the number of flights. To build the models, cross-validation is performed with the observations from 2018 as the training set and the observations from the following year, 2019, as the test set. Variables are added one at a time: at each step, the variable that most improves the prediction of the test set data (whether included in the count model or in the zero-inflation model) is selected, until an additional variable turns out to be non-significant at the 5% level in the estimated model. In addition to the variables in the data, three lags of the dependent variables (CO2 emission and flights) were included, as well as transformed versions of the continuous variables and a random intercept for each of the categorical variables indicating quarter and department at KTH. Neither model selected through the cross-validation process turned out to be particularly good at predicting the values for the following year, but a number of variables were shown to have a statistically significant association with the respective dependent variable.
|
60 |
Evaluación en el modelado de las respuestas de recuento
Llorens Aleixandre, Noelia 10 June 2005
This work presents two lines of research developed in recent years around the evaluation stage in count data modelling. The areas of study have been count data, specifically the Poisson regression model and its extensions, and the evaluation stage as a turning point in the statistical modelling process. The results demonstrate the importance of applying a model suited to the characteristics of the data and of evaluating its fit. In addition, the comparisons of tests, indices, estimators and models aim to indicate the suitability or preference of some over others in particular circumstances and according to the researcher's objectives.
|