Global ETD Search

1	A robust test of homogeneity in zero-inflated models for count data
Mawella, Nadeesha R. January 1900
Doctor of Philosophy / Department of Statistics / Wei-Wen Hsu
Evaluating heterogeneity in the class of zero-inflated models has attracted considerable attention in the literature, where heterogeneity refers to zero counts being generated from two different sources. The mixture probability, or so-called mixing weight, in the zero-inflated model measures the extent of such heterogeneity in the population. Typically, homogeneity tests are employed to test whether the mixing weight is zero. Various testing procedures for homogeneity in zero-inflated models, such as the score test and the Wald test, have been well discussed and established in the literature. However, it is well known that these classical tests require correct model specification in order to provide valid statistical inferences. In practice, the testing procedure may be performed under model misspecification, which can result in biased and invalid inferences. Two misspecifications are common in zero-inflated models: an incorrectly specified baseline distribution and a misspecified mean function of the baseline distribution. As empirical evidence, intensive simulation studies revealed that the empirical sizes of the homogeneity tests for zero-inflated models can be extremely liberal and unstable under these misspecifications, for both cross-sectional and correlated count data. We propose a robust score statistic to evaluate heterogeneity in cross-sectional zero-inflated data. Technically, the test is developed based on the Poisson-Gamma mixture model, which provides a more general framework to incorporate various baseline distributions without specifying their associated mean function. The testing procedure is further extended to correlated count data. We develop a robust Wald test statistic for correlated count data using a working independence model assumption coupled with a sandwich estimator to adjust for any misspecification of the covariance structure in the data. The empirical performance of the proposed robust score test and Wald test is evaluated in simulation studies. It is worth mentioning that the proposed Wald test can be implemented easily, with minimal programming effort, in routine statistical software such as SAS. Dental caries data from the Detroit Dental Health Project (DDHP) and Girl Scout data from the Scouting Nutrition and Activity Program (SNAP) are used to illustrate the proposed methodologies.
Zero-inflated models; Homogeneity tests; Model misspecification; Huber sandwich estimator
2	Statistical developments for understanding anthropogenic impacts on marine ecosystems
Marshall, Laura January 2012
Over the past decades, technological developments have both changed and increased human influence on the marine environment. We now have greater potential than ever before to introduce disturbance and deplete marine resources. Two of the issues currently under public scrutiny are the exploitation of fish stocks worldwide and levels of anthropogenic noise in the marine environment. The aim of this thesis is to investigate and develop novel analyses and simulations that provide additional insight into some of the challenges facing the marine ecosystem today. These methodologies will improve the management of these risks to marine ecosystems. The thesis first addresses the issue of competition between humans and grey seals (Halichoerus grypus) for marine resources, providing compelling evidence that a substantial proportion of the sandeels consumed by grey seals in the North Sea are in fact H. lanceolatus, which is not commercially exploited, rather than the commercially important A. marinus. In addition, we present quantitative results regarding sources of bias when estimating the total biomass of sandeels consumed by grey seals. Secondly, we investigate spatially adaptive 2-dimensional smoothing to improve the prediction of both the presence and density of marine species, information that is often key in the management of marine ecosystems. In particular, we demonstrate the benefits of such methods in the prediction of sandeel occurrence. Lastly, the thesis provides a quantitative assessment of the protocols for real-time monitoring of marine mammal presence, which require that acoustic operations cease when an animal is detected within a certain distance (i.e. the "monitoring zone") of the sound source. We assess monitoring zones of different sizes with regard to their effectiveness in reducing the risks of temporary and permanent damage to the animals' hearing, and demonstrate that a monitoring zone of 2 km can generally be recommended.
577.7
3	Engraulis anchoita (Clupeiformes: Engraulidae) eggs and larvae in the Southeastern Brazilian Bight: new perspectives from a historical data set (1974 - 2010) / Engraulis anchoita (Clupeiformes: Engraulidae) ovos e larvas na Plataforma Continental Sudeste do Brasil: novas perspectivas a partir de um conjunto de dados históricos (1974 - 2010)
Favero, Jana Menegassi Del 23 August 2016
The main objective of this dissertation was to evaluate long-term fluctuations in the distribution and abundance of Engraulis anchoita eggs and larvae in the Southeastern Brazilian Bight (SBB). Engraulis anchoita is an ecologically and economically important fish species. We analyzed samples and abiotic data from eighteen oceanographic cruises conducted during austral late spring and early summer from 1974 to 2010. Two different stocks were detected in the SBB based on egg size, with the predominant stock in the area having smaller eggs than the stock in the region further south. Using indicator kriging, we identified occasional spawning sites (e.g. Florianópolis - 27°S and off Santos Bay) and avoided spawning sites (e.g. off São Sebastião Island and off the Cananéia-Iguape Coastal System). Through zero-inflated models, spatial factors (different areas and the local depth) were related to the probability of sampling false zeros, while temporal and oceanographic conditions (different years and temperature) were related to egg and larval abundance. We also describe a faster and more accurate methodology to identify E. anchoita eggs, compare the efficiency of two mesh sizes for sampling eggs, and analyze how egg size varied seasonally. Our results may support future studies and may assist future fishery management of E. anchoita, a species not yet commercially exploited in the SBB.
fish stocks; ichthyoplankton; long-term fluctuations; spawning sites; zero-inflated models (ZI)
5	La régression de Poisson multiniveau généralisée au sein d’un devis longitudinal : un exemple de modélisation du nombre d’arrestations de membres de gangs de rue à Montréal entre 2005 et 2007
Rivest, Amélie 12 1900
Count data have distributions with specific characteristics such as non-normality, heterogeneity of variances and a large number of zeros. Appropriate models are therefore needed to obtain unbiased results. This thesis compares four models that can be used for count data: the Poisson model, the negative binomial model, the zero-inflated Poisson model and the zero-inflated negative binomial model. To assess the adequacy of the different models, they were compared on the predicted proportion of zeros, the confirmation or refutation of the various hypotheses, and the predicted mean number of arrests. To do this, the number of arrests of street gang members in the Montreal area over the period 2005 to 2007 was used. The sample consisted of 470 men, aged 18 to 59 years. The analyses indicate that the most suitable model is the negative binomial model, since it produces significant results, fits the observed data well and yields a proportion of zeros very similar to that observed.
Count data; Longitudinal multilevel analysis; Street gang; Poisson distribution; Negative binomial distribution; Zero-inflated models
6	Modelos de regressão beta inflacionados truncados / The truncated inflated beta regression
Pereira, Gustavo Henrique de Araujo 24 May 2012
The beta regression model and the inflated beta regression model are reasonable choices for fitting a proportion in most situations. However, these models are not suitable when the response variable cannot take values in the open interval (0, c), 0 < c < 1, and takes the value c with positive probability. Variables related to a payment amount bounded between two values, when studied as a proportion of the maximum payment amount, have this feature. For these variables, we introduce the truncated inflated beta distribution (TBEINF; BIZUT in the original Portuguese). The proposed distribution is a mixture of a beta distribution with support on the open interval (c, 1) and a trinomial distribution that takes the values zero, one and c. This work also proposes a regression model for a TBEINF-distributed response variable. The model allows all the unknown parameters of the conditional distribution of the response variable to be modeled as functions of explanatory variables. Moreover, the model allows the known parameter c to vary across population units. For this model, several inferential aspects are developed, results are obtained for the case where c is not constant, and Monte Carlo simulation studies are performed. In addition, residual analysis and local influence analysis are discussed, and an application to real credit card data is presented.
beta regression; credit card; inflated models; maximum likelihood estimator; proportions; truncated inflated beta regression
9	Modely pro data s nadbytečnými nulami / Models for zero-inflated data
Matula, Dominik January 2016
The aim of this thesis is to provide a comprehensive overview of the main approaches to modeling data with excess zeros. Three main subclasses of zero-modified models (ZMM) are described: zero-inflated models (ZIM, the main focus of the thesis), zero-truncated models and hurdle models. Models of each subclass are defined, and the construction of maximum likelihood estimates (MLE) of the regression coefficients is then described. ZMM are mostly based on the Poisson or the negative binomial type 2 (NB2) distribution. In this work, the author extends the theory to ZIM based on any discrete distribution of exponential type, and describes the construction of the MLE of the regression coefficients of these models. Only a few existing works consider ZIM based on the negative binomial type 1 (NB1) distribution. This distribution is not of exponential type, so the common method of MLE construction in ZIM cannot be used. This work provides a modification of that method using the quasi-likelihood method. Two simulation studies conclude the work.
10	Inférence de réseaux pour modèles inflatés en zéro / Network inference for zero-inflated models
Karmann, Clémence 25 November 2019
Network inference, or graph inference, has a growing number of applications, particularly in human health and the environment, for the study of micro-biological and genomic data. Networks are indeed an appropriate tool to represent, or even study, relationships between entities. Many mathematical estimation techniques have been developed, particularly in the context of Gaussian graphical models, but also for binary or mixed data. The processing of abundance data (of microorganisms such as bacteria, for example) is particular for two reasons: on the one hand, the data do not directly reflect reality, because a sequencing process takes place to duplicate species and this process introduces variability; on the other hand, a species may be absent in some samples. We are then in the context of zero-inflated data. Many network inference methods exist for Gaussian, binary and mixed data, but zero-inflated models are rarely studied, although they reflect the structure of many data sets in a relevant way. The objective of this thesis is network inference for zero-inflated models. We restrict ourselves to conditional dependency graphs. The work presented is divided into two main parts. The first concerns graph inference methods based on the estimation of neighbourhoods by a procedure combining ordinal regression models and variable selection methods. The second focuses on graph inference in a model where the variables are zero-inflated Gaussians obtained by double truncation (right and left).
Graph inference; Networks; Zero-inflated models; Regression; Lasso penalisation; Variable selection; Doubly truncated Gaussian data; Conditional dependency
519.54
