21 |
Inference in a regression model with overdispersed binary response and measurement errors
Tieppo, Sandra Maria, 15 February 2007
Regression models with binary response are used to solve problems in many different areas. In this work we focus on two problems that are common in certain data sets and that require appropriate techniques to achieve satisfactory inference. First, in some applications the same sampling unit is used more than once, producing positively correlated responses; these make the variance of the response variable larger than the binomial distribution allows, a phenomenon known as overdispersion. Second, there are situations in which the explanatory variable is measured with error. It is known that techniques which ignore these measurement errors give inadequate results (e.g., biased and inconsistent estimators). For a model with binary response, we use the beta-binomial distribution to model the overdispersion. The model parameters were estimated by maximum likelihood, SIMEX, regression calibration and maximum pseudo-likelihood, and these methods are compared in a simulation study. The simulation study suggests that maximum likelihood and regression calibration perform better at correcting bias, especially for the larger sample sizes (50 and 100). We also study asymptotic hypothesis tests (likelihood ratio, Wald and score) for testing hypotheses of interest. Finally, we illustrate the techniques with an application to a real data set.
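To make the overdispersion mechanism concrete, here is a minimal sketch (not the thesis's code) of the beta-binomial moments. It assumes the common parameterization with shape parameters `alpha` and `beta`, in which the mean success probability is p = alpha/(alpha + beta) and the correlation between responses on the same unit is rho = 1/(alpha + beta + 1); the abstract does not state which parameterization the thesis uses, and the function names are hypothetical.

```python
import math


def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1 - p)


def beta_binomial_moments(n, alpha, beta):
    """Mean, variance and intra-unit correlation of a beta-binomial(n, alpha, beta) count."""
    p = alpha / (alpha + beta)          # mean success probability
    rho = 1.0 / (alpha + beta + 1.0)    # correlation between responses on the same unit
    mean = n * p
    # The binomial variance is inflated by the factor 1 + (n - 1) * rho
    var = n * p * (1 - p) * (1 + (n - 1) * rho)
    return mean, var, rho


def _log_beta(a, b):
    """log B(a, b), computed with lgamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(Y = k) for Y ~ beta-binomial(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))
```

With n = 10 and alpha = beta = 2 (so p = 0.5 and rho = 0.2), the variance is 10 × 0.25 × 2.8 = 7.0, compared with 2.5 for Binomial(10, 0.5). As rho shrinks toward 0, the beta-binomial collapses back to the binomial.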
|
22 |
Statistical validation of limiting similarity and negative co-occurrence null models: Extending the models to gain insights into sub-community patterns of community assembly
2014 September 1900
Competition between species is believed to produce patterns of either competitive exclusion or limiting similarity within ecological communities; to date, however, support for either outcome has been relatively weak. The two classes of null model commonly used to assess co-occurrence and limiting similarity have both been well studied for statistical performance; however, the methods used to evaluate their performance, particularly with respect to type II errors, may have led to both patterns being underreported in the communities tested. The overall purpose of this study was to evaluate how well the negative co-occurrence and limiting similarity null models detect patterns believed to result from competition between species, and to develop an improved method for detecting those patterns. The null models were tested on synthetic but biologically realistic presence-absence matrices to estimate both type I and type II error rates. Their effectiveness was evaluated with respect to community dimension (number of species × number of plots) and the amount of pattern within the community. A novel method of subsetting species was developed to assess communities for patterns of co-occurrence and limiting similarity, and four methods were assessed for their ability to isolate the species contributing signal to the pattern. Both classes of null model gave acceptable type I and type II error rates for matrices with more than 5 species and more than 5 plots. When patterns of negative co-occurrence or limiting similarity were added to all species, both null models detected significant pattern (β > 0.95); however, when pattern was added to only a proportion of species, their ability to detect it deteriorated rapidly once that proportion was 80% or less.
Species subsetting detected significant patterns of both co-occurrence and limiting similarity when fewer than 80% of species were contributing signal, although for the limiting similarity null model this depended on the metric used. Frequent pattern mining shows promise for isolating the species contributing signal; however, a more thorough evaluation is required to confirm or refute its utility.
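The abstract does not say which co-occurrence metric or randomization algorithm was used. As an assumption, the sketch below uses the widely used C-score (mean number of "checkerboard units" per species pair) and a fixed-row, fixed-column null model built by random 2×2 checkerboard swaps; all function names are hypothetical. Rows are species, columns are plots, and entries are 0/1.

```python
import random
from itertools import combinations


def c_score(matrix):
    """Mean checkerboard score over all species pairs (rows = species, columns = plots)."""
    totals = [sum(row) for row in matrix]
    pairs = list(combinations(range(len(matrix)), 2))
    score = 0
    for i, j in pairs:
        shared = sum(a & b for a, b in zip(matrix[i], matrix[j]))
        score += (totals[i] - shared) * (totals[j] - shared)
    return score / len(pairs)


def swap_randomize(matrix, n_swaps, rng):
    """Null matrix with the same row and column totals, via random 2x2 checkerboard swaps."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Only a checkerboard submatrix ([[1,0],[0,1]] or [[0,1],[1,0]]) can be
        # flipped without changing any row or column total.
        if m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1] and m[r1][c1] != m[r1][c2]:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


def null_model_test(matrix, n_null=999, n_swaps=1000, seed=0):
    """One-sided test for negative co-occurrence: is the observed C-score unusually high?"""
    rng = random.Random(seed)
    observed = c_score(matrix)
    exceed = sum(
        c_score(swap_randomize(matrix, n_swaps, rng)) >= observed for _ in range(n_null)
    )
    p_value = (exceed + 1) / (n_null + 1)
    return observed, p_value
```

Type I and type II error rates of the kind described above could be estimated by running `null_model_test` on many synthetic matrices without and with planted segregation and recording how often p falls below the chosen significance level.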
|
24 |
Statistical methods for modelling the distribution and abundance of populations: application to raptors breeding in France
Le Rest, Kévin, 19 December 2013
In the context of global biodiversity loss, many surveys of animal and plant populations are carried out over large geographic areas and long time periods in order to understand the factors driving the distribution, abundance and trends of populations. These broad-scale surveys make it possible to assess the status of populations quantitatively and to establish conservation and management plans at the relevant biological scales. However, the statistical analysis of such data raises a number of problems. Generalized linear models (GLMs) are classically used to link the variable of interest (often species presence/absence or counts) to external variables suspected to influence it (e.g., climatic and habitat variables). A major unresolved issue is how to select these variables when the data are spatial. This thesis explores several options and proposes an easily applicable method based on a cross-validation procedure that accounts for spatial dependence. The robustness of the method is evaluated through simulations and several case studies, including count data with higher-than-expected variability (overdispersion). Particular attention is also given to modelling methods for data with more zeros than expected (zero-inflation). The last part of the thesis applies these methodological developments to model the distribution, abundance and trends of diurnal raptors breeding in France.
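The key idea in spatially aware cross-validation is to hold out whole spatial blocks rather than individual sites, so that test points are not near-duplicates of training points. Below is a minimal sketch assuming square grid blocks of side `block_size` and squared prediction error as a placeholder loss; the thesis's actual blocking scheme and selection criterion are not given in the abstract, and the helper names are hypothetical.

```python
import math
import random
from statistics import mean


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign points to folds so that every square grid block is held out as a whole."""
    blocks = {}
    for idx, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks.setdefault(key, []).append(idx)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # random but reproducible block-to-fold mapping
    folds = [0] * len(coords)
    for position, key in enumerate(keys):
        for idx in blocks[key]:
            folds[idx] = position % n_folds
    return folds


def blocked_cv_error(coords, y, fit, predict, block_size, n_folds, seed=0):
    """Mean squared prediction error under leave-blocks-out cross-validation.

    fit(train_indices) returns a fitted model; predict(model, index) returns a prediction.
    """
    folds = spatial_block_folds(coords, block_size, n_folds, seed)
    total, n_scored = 0.0, 0
    for fold in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != fold]
        test = [i for i, f in enumerate(folds) if f == fold]
        if not train or not test:
            continue
        model = fit(train)
        total += sum((y[i] - predict(model, i)) ** 2 for i in test)
        n_scored += len(test)
    if n_scored == 0:
        raise ValueError("need at least two non-empty folds")
    return total / n_scored


if __name__ == "__main__":
    rng = random.Random(42)
    coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    x = [cx for cx, _ in coords]  # a spatially structured covariate
    y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

    def fit_intercept(train):
        return mean(y[i] for i in train)

    def predict_intercept(model, i):
        return model

    def fit_line(train):
        mx, my = mean(x[i] for i in train), mean(y[i] for i in train)
        sxy = sum((x[i] - mx) * (y[i] - my) for i in train)
        sxx = sum((x[i] - mx) ** 2 for i in train)
        slope = sxy / sxx
        return my - slope * mx, slope

    def predict_line(model, i):
        return model[0] + model[1] * x[i]

    # Variable selection: keep the candidate with the lower blocked CV error
    for name, fit, predict in [("intercept only", fit_intercept, predict_intercept),
                               ("with covariate x", fit_line, predict_line)]:
        print(name, blocked_cv_error(coords, y, fit, predict, block_size=2.5, n_folds=5))
```

For GLMs on presence/absence or count data, the squared error would naturally be replaced by a deviance or log-likelihood loss on the held-out blocks.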
|
25 |
Statistical models for an MTPL portfolio
Pirozhkova, Daria, January 2017
In this thesis, we consider several statistical techniques applicable to claim frequency models for an MTPL (motor third-party liability) portfolio, with a focus on overdispersion. The practical part of the work applies the models to real data from an MTPL portfolio and compares them using goodness-of-fit measures. In addition, the predictive power of selected models is tested on the given dataset using simulation. The thesis thus combines an analysis of goodness-of-fit results with an assessment of the models' predictive power.
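The abstract does not list the models compared. As an assumption, the sketch below contrasts the two most common claim-frequency baselines, an intercept-only Poisson model and a negative binomial (NB) model, on a vector of claim counts. The NB size parameter `r` is estimated by the method of moments (not maximum likelihood), so its log-likelihood is only indicative; function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, r):
    """Log-likelihood of i.i.d. negative binomial counts with mean mu and size r.

    Var(Y) = mu + mu**2 / r, so letting r grow without bound recovers the Poisson model.
    """
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in counts
    )


def compare_poisson_negbin(counts):
    """Dispersion ratio and log-likelihoods of intercept-only Poisson and NB fits."""
    mu = mean(counts)
    if mu <= 0:
        raise ValueError("need a positive mean count")
    var = pvariance(counts, mu)
    result = {
        "dispersion_ratio": var / mu,  # values above 1 suggest overdispersion
        "loglik_poisson": poisson_loglik(counts, mu),
        "loglik_negbin": None,
    }
    if var > mu:
        r = mu * mu / (var - mu)  # method-of-moments size parameter
        result["loglik_negbin"] = negbin_loglik(counts, mu, r)
    return result
```

When the sample variance does not exceed the mean, the moment estimate of `r` is undefined and only the Poisson fit is reported.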
|
26 |
Properties of Hurdle Negative Binomial Models for Zero-Inflated and Overdispersed Count Data
Bhaktha, Nivedita, January 2018
No description available.
|
27 |
A Study on Modelling Overdispersion in Categorical Data
陳麗如, Unknown Date
Overdispersion is a common phenomenon in practice when modelling categorical data: because the variance of such data is tied to the mean by a fixed functional relationship, the observed variability is often larger than the model allows. It is usually measured with the scaled Pearson chi-square statistic, and the usual analysis first fits a generalized linear model and then adjusts the standard errors of the estimates. In this study, we examine two other methods for handling overdispersion, quasi-likelihood estimation and random-effects models, and compare them with the usual approach using two examples: data from a study of textile raw materials and data from a toxicology study.
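The "fit, then adjust the standard errors" approach can be shown on the simplest possible case. This sketch assumes grouped binomial data (successes out of trials per group) and an intercept-only model; it estimates the dispersion as the Pearson chi-square divided by its residual degrees of freedom and inflates the naive standard error by its square root. The function name is hypothetical and this is not the thesis's code.

```python
import math


def pearson_dispersion(successes, trials):
    """Intercept-only binomial fit, Pearson chi-square and scaled standard error."""
    p_hat = sum(successes) / sum(trials)  # pooled maximum likelihood estimate
    chi2 = sum(
        (y - m * p_hat) ** 2 / (m * p_hat * (1 - p_hat))
        for y, m in zip(successes, trials)
    )
    df = len(successes) - 1  # one estimated parameter
    phi = chi2 / df  # dispersion estimate; about 1 if the binomial model holds
    se_naive = math.sqrt(p_hat * (1 - p_hat) / sum(trials))
    se_adjusted = se_naive * math.sqrt(phi)  # quasi-likelihood style inflation
    return p_hat, chi2, phi, se_naive, se_adjusted
```

For example, with 2, 8, 1 and 9 successes out of 10 trials each, the pooled estimate is 0.5, the Pearson chi-square is 20 on 3 degrees of freedom (phi ≈ 6.67), and the standard error grows from about 0.079 to about 0.204.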
|
28 |
Statistical properties of parasite density estimators in malaria and field applications
Hammami, Imen, 24 June 2013
(No French abstract available.) Malaria is a devastating global health problem that affected 219 million people and caused 660,000 deaths in 2010. Inaccurate estimation of the level of infection may have adverse clinical and therapeutic implications for patients and for epidemiological endpoint measurements. The level of infection, expressed as the parasite density (PD), is classically defined as the number of asexual parasites per microliter of blood. Microscopy of Giemsa-stained thick blood smears (TBSs) is the gold standard for parasite enumeration. Parasites are counted in a predetermined number of high-power fields (HPFs) or against a fixed number of leukocytes. PD estimation methods therefore usually involve threshold values: either the number of leukocytes counted or the number of HPFs read. Most of these methods assume that (1) the thickness of the TBS, and hence the distribution of parasites and leukocytes within it, is homogeneous; and (2) parasites and leukocytes are evenly distributed in TBSs and can therefore be modeled with a Poisson distribution. Violations of these assumptions commonly result in overdispersion. First, we studied the statistical properties (mean error, coefficient of variation, false negative rates) of the PD estimators of commonly used threshold-based counting techniques and assessed how the thresholds influence the cost-effectiveness of these methods. Second, we compiled and published the first dataset of parasite and leukocyte counts per HPF. Two sources of overdispersion in the data were investigated: latent heterogeneity and spatial dependence. We accounted for unobserved heterogeneity by considering more flexible models that allow for overdispersion, in particular the negative binomial (NB) model and mixture models. The dependence structure in the data was modeled with hidden Markov models (HMMs).
We found evidence that assumptions (1) and (2) are inconsistent with the observed parasite and leukocyte distributions. The NB-HMM was the model closest to the unknown distribution generating the data. Finally, we devised a reduced reading procedure for PD that aims at better operational optimization and at a practical assessment of the heterogeneity in the distribution of parasites and leukocytes in TBSs. A patent application process has been launched, and a prototype of the counter is being developed.
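A threshold-based estimator "against a fixed number of leukocytes" can be sketched as follows: fields are read in order until the cumulative leukocyte count reaches a threshold, and the parasite-to-leukocyte ratio is scaled by an assumed leukocyte concentration. The default of 8,000 leukocytes per microliter is a commonly used conventional value, not a figure taken from the thesis, and the threshold and function name are hypothetical parameters for illustration.

```python
def parasite_density_by_leukocyte_threshold(fields, leukocyte_threshold, wbc_per_ul=8000):
    """Estimate parasite density (parasites per microliter) from per-field counts.

    fields: sequence of (parasites, leukocytes) counted in successive high-power fields.
    Reading stops once the cumulative leukocyte count reaches leukocyte_threshold
    (or when the fields run out).
    """
    parasites = leukocytes = fields_read = 0
    for p, w in fields:
        parasites += p
        leukocytes += w
        fields_read += 1
        if leukocytes >= leukocyte_threshold:
            break
    if leukocytes == 0:
        raise ValueError("no leukocytes counted; density is undefined")
    density = parasites / leukocytes * wbc_per_ul
    return density, parasites, leukocytes, fields_read
```

Because the stopping point depends on where leukocytes happen to fall in the smear, uneven spreading of cells (the violation of assumptions (1) and (2) above) directly changes which fields are read and therefore the estimate.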
|
29 |
A comparative study of male and female suicide mortality in Taiwan
柯亭安, Unknown Date
To understand the differences in suicide mortality between men and women in Taiwan, this study uses the multivariate age-period-cohort model proposed by Held and Riebler (2010) to examine sex differences in the age, period and cohort effects on suicide mortality, each adjusted for the other two. We fit the model to Taiwanese suicide death data using both the unconditional likelihood method (a log-linear model) and the conditional likelihood method (a multinomial logit model). Assuming that the cohort effect does not depend on sex, female suicide mortality is significantly higher than male mortality between ages 10 and 24, with the largest difference at ages 15 to 19. The difference narrows after age 20, and between ages 25 and 34 the sex difference is no longer significant. From age 35, male suicide mortality is significantly higher than female mortality, and the difference widens with age until age 60; after 60 it begins to shrink, and by age 70 there is no significant sex difference. By period, there is no significant sex difference in suicide mortality between 1959 and 1973. From 1974 to 1988, female suicide mortality is significantly higher than male mortality, with the largest difference in 1979 to 1983. The difference then narrows, and between 1989 and 1993 it is no longer significant. From 1994 onward, male suicide mortality is significantly higher than female mortality, and the difference grows over time, reaching its maximum in 2004 to 2008.
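Why the unconditional (log-linear) and conditional (logit) approaches can target the same sex contrast is easiest to see for a single age-period cell. If female and male deaths are independent Poisson counts, then conditional on the cell's total deaths the female count is binomial with logit equal to log(pop_f / pop_m) plus the log rate ratio, so subtracting that offset recovers the same contrast. This single-cell sketch is illustrative only; it is not the full Held and Riebler model, and the function names are hypothetical. The last helper shows the cohort indexing for equal-width age and period groups, where cohort = period − age, which is the source of the well-known age-period-cohort identification problem.

```python
import math


def log_rate_ratio_unconditional(deaths_f, pop_f, deaths_m, pop_m):
    """log(female rate / male rate) from separate Poisson rate estimates."""
    return math.log(deaths_f / pop_f) - math.log(deaths_m / pop_m)


def log_rate_ratio_conditional(deaths_f, pop_f, deaths_m, pop_m):
    """Same contrast from the binomial split of the cell's total deaths.

    Given deaths_f + deaths_m, deaths_f is binomial with
    logit(pi) = log(pop_f / pop_m) + log rate ratio, so the offset is subtracted.
    """
    pi_hat = deaths_f / (deaths_f + deaths_m)
    return math.log(pi_hat / (1 - pi_hat)) - math.log(pop_f / pop_m)


def cohort_index(age_group, period_group, n_age_groups):
    """Cohort index (period - age, shifted so the oldest cohort is 0) for equal-width groups."""
    return period_group - age_group + (n_age_groups - 1)
```

For instance, 30 female deaths in a population of 100,000 and 60 male deaths in 120,000 give a rate ratio of 0.6 under either function.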
|
30 |
Evaluation in the modelling of count responses
Llorens Aleixandre, Noelia, 10 June 2005
This work presents two lines of research developed in recent years on the evaluation stage in the modelling of count data. The areas of study have been count data, specifically the Poisson regression model and its extensions, and the evaluation stage as a turning point in the statistical modelling process. The results show the importance of applying a model suited to the characteristics of the data and of evaluating its fit. In addition, the comparisons of tests, indices, estimators and models aim to indicate which are suitable or preferable in given circumstances and depending on the researcher's objectives.
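Two standard fit-evaluation statistics for Poisson regression are the deviance and the Pearson chi-square, each compared with the residual degrees of freedom. The sketch below computes both from observed counts and fitted means produced by any fitting routine; it is a generic illustration rather than the specific indices compared in this work, and the function names are hypothetical.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance D = 2 * sum[y * log(y / mu) - (y - mu)], taking y * log(y / mu) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


def pearson_statistic(observed, fitted):
    """Pearson chi-square X^2 = sum (y - mu)^2 / mu for Poisson counts."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))


def fit_summary(observed, fitted, n_params):
    """Deviance and Pearson statistic divided by residual df.

    Ratios well above 1 point to lack of fit or overdispersion.
    """
    df = len(observed) - n_params
    return poisson_deviance(observed, fitted) / df, pearson_statistic(observed, fitted) / df
```

For observed counts 0 and 4 with fitted means 2 and 2, the deviance is 8 ln 2 ≈ 5.55 and the Pearson statistic is 4.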
|