  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Three Essays In Applied Microeconomics

Carrion-Flores, Carmen Eugenia January 2007 (has links)
This dissertation applies economic theory and econometric methods to analyze the interactions between government policies and economic agents in two important current topics: protection of the environment and illegal migration. Following the introduction, the second chapter studies the empirical strength of bi-directional linkages between environmental standards and performance, on the one hand, and environmental innovation, on the other. Our empirical results reveal that environmental R&D both spurs the tightening of government environmental standards and is spurred by the anticipation of such tightening, suggesting that U.S. environmental policy (at least in the context of the manufacturing industries that we study) has been responsive to innovation and effective in inducing it. The third chapter studies whether a voluntary pollution reduction program (VRP) can prompt firms to develop new environmental technologies that yield future emission reduction benefits. Conversely, a VRP may induce a participating firm to divert resources from environmental research to environmental monitoring and compliance activities that yield short-term benefits in reduced emissions. We find evidence that higher rates of program participation are associated with significant reductions in the number of successful environmental patent applications four to six years after the program ended. The fourth chapter examines the migration duration of Mexican immigrants in the U.S. using data from the Mexican Migration Project (MMP). In the past, temporary migrations were frequent, often the rule rather than the exception among Mexican immigrants. This pattern may be changing with the tightening of the border between Mexico and the United States. The chapter also examines whether migration experience, demographic characteristics, economic conditions, or social networks drive the length of time Mexican immigrants reside illegally in the United States.
The empirical analysis shows that migration duration increases as the expected U.S. real wage increases. Tighter U.S. migration policies have an ambiguous effect on migration duration, while longer distances decrease the hazard of return to the state of origin. The final chapter summarizes the general findings and discusses avenues for future research.
22

Bayesian analysis for time series of count data

2014 July 1900 (has links)
Time series of count data arise in a wide variety of applications. In many of them, the observed counts are small and dependent; failing to take these facts into account can lead to misleading inferences and to the detection of spurious relationships. To tackle such issues, a Poisson parameter-driven model is assumed for the time series at hand. This model accounts for the time dependence between observations by introducing an autoregressive latent process. In this thesis, we consider Bayesian approaches for estimating the Poisson parameter-driven model. The main challenge is that the likelihood function for the observed counts involves a high-dimensional integral once the latent variables are integrated out. The main contributions of this thesis are threefold. First, I develop a new single-move (SM) Markov chain Monte Carlo (MCMC) method that samples the latent variables one at a time. Second, I adapt the particle Gibbs sampler (PGS) method of Andrieu et al. to our model setting and compare its performance with that of the SM method. Third, I consider Bayesian composite likelihood methods and compare three different adjustment methods with the unadjusted method and the SM method. The comparisons provide a practical guide to which method to use. We conduct simulation studies to compare the latter two methods with the SM method. We conclude that the SM method outperforms the PGS method for small sample sizes, while the two perform almost the same for large sample sizes; the SM method is also much faster than the PGS method. The adjusted Bayesian composite likelihood methods give results closer to those of the SM method than the unadjusted method does. The PGS method and the adjustment method selected in the simulation studies are then compared with the SM method on a real data example, with similar results: first, the PGS method gives results very close to those of the SM method.
Second, the adjusted composite likelihood methods give results closer to those of the SM method than the unadjusted method does.
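The Poisson parameter-driven model described in this abstract can be illustrated with a minimal simulation sketch. The parameter values below are illustrative only (not taken from the thesis), and the helper names are hypothetical; the point is simply that the latent AR(1) process induces serial dependence and extra variability in the observed counts.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate via Knuth's multiplication method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_parameter_driven(n, beta, phi, sigma, seed=1):
    """Simulate y_t ~ Poisson(exp(beta + a_t)), where the latent
    process follows a_t = phi * a_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # start the latent process from its stationary distribution
    a = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))
    ys = []
    for _ in range(n):
        a = phi * a + rng.gauss(0.0, sigma)
        ys.append(rpois(math.exp(beta + a), rng))
    return ys

y = simulate_parameter_driven(n=500, beta=0.5, phi=0.7, sigma=0.4)
```

Integrating the latent path out of the likelihood of such data is exactly the high-dimensional integral the thesis tackles with SM-MCMC, particle Gibbs, and composite likelihood methods.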
23

Optimal (Adaptive) Design and Estimation Performance in Pharmacometric Modelling

Maloney, Alan January 2012 (has links)
The pharmaceutical industry now recognises the importance of the newly defined discipline of pharmacometrics. Pharmacometrics uses mathematical models to describe and then predict the performance of new drugs in clinical development. To ensure these models are useful, the clinical studies need to be designed such that the data generated allow the model predictions to be sufficiently accurate and precise. The capability of the available software to reliably estimate the model parameters must also be well understood. This thesis investigated two important areas in pharmacometrics: optimal design and software estimation performance. The three optimal design papers advanced significant areas of optimal design research, especially relevant to phase II dose-response designs. The use of exposure, rather than dose, was investigated within an optimal design framework. In addition to using both optimal design and clinical trial simulation, this work employed a wide range of metrics for assessing design performance, and illustrated how optimal designs for exposure-response models may yield dose selections quite different from those based on standard dose-response models. The investigation of optimal designs for Poisson dose-response models demonstrated a novel mathematical approach to the necessary matrix calculations for non-linear mixed effects models. Finally, the enormous potential of optimal adaptive designs over fixed optimal designs was demonstrated. The results showed how the adaptive designs were robust to initial parameter misspecification, with the capability to "learn" the true dose response from the accruing subject data. The two estimation performance papers investigated the relative performance of a number of different algorithms and software programs for two complex pharmacometric models.
In combination, these papers cover a wide spectrum of study designs for non-linear dose/exposure-response models: normal/non-normal data, fixed/mixed effects models, single/multiple design criteria, optimal design/clinical trial simulation, and adaptive/fixed designs.
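The core idea behind optimal design for a Poisson dose-response model can be sketched with the standard D-optimality criterion (maximize the determinant of the Fisher information matrix). This is a generic fixed-effects sketch, not the mixed-effects machinery of the thesis, and the parameter guesses are illustrative.

```python
import math

def fim_poisson(doses, weights, b0, b1):
    """2x2 Fisher information for a Poisson dose-response model with
    log mu = b0 + b1 * dose, for a design given by dose levels and weights."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for d, w in zip(doses, weights):
        mu = math.exp(b0 + b1 * d)  # Poisson information weight is mu
        x = (1.0, d)
        for i in range(2):
            for j in range(2):
                m[i][j] += w * mu * x[i] * x[j]
    return m

def d_criterion(m):
    """Determinant of a 2x2 information matrix (D-optimality criterion)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# illustrative prior guesses for the dose-response parameters
b0, b1 = 0.0, -0.5
# a design spreading subjects over two dose levels is informative;
# a design putting everyone at one dose cannot identify both
# intercept and slope (its information matrix is singular)
spread = d_criterion(fim_poisson([0.0, 4.0], [0.5, 0.5], b0, b1))
single = d_criterion(fim_poisson([2.0, 2.0], [0.5, 0.5], b0, b1))
```

Because the information depends on the unknown parameters, such designs are "locally" optimal at the prior guesses, which is precisely the misspecification risk that motivates the adaptive designs studied in the thesis.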
24

Inferencia e diagnostico em modelos para dados de contagem com excesso de zeros / Inference and diagnostic in zero-inflated count data models

Monzón Montoya, Alejandro Guillermo 13 August 2018 (has links)
Advisor: Victor Hugo Lachos Davila / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica / Abstract: When analyzing count data, a higher frequency of zeros than expected under a given distribution is sometimes observed, so that the usual regression models cannot be applied; the excess zeros may also induce over-dispersion in the data. In this work, four types of models for zero-inflated count data are presented: the zero-inflated Binomial (ZIB), the zero-inflated Poisson (ZIP), the zero-inflated Negative Binomial (ZINB), and the zero-inflated Beta-Binomial (ZIBB) regression models.
We use the EM algorithm to obtain maximum likelihood estimates of the parameters of the proposed models, and, using the complete-data likelihood function, we develop local influence measures following the approach of Zhu and Lee (2001) and Lee and Xu (2004). We also discuss the calculation of residuals for the ZIB and ZIP regression models, with the aim of identifying atypical observations and/or model misspecification. Finally, results obtained for two real data sets are reported, illustrating the usefulness of the proposed methodology.
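The EM scheme for the simplest of these models, the intercept-only ZIP, has closed-form updates and can be sketched as follows. This is a minimal illustration (no covariates, no influence diagnostics), and the simulated parameter values are arbitrary.

```python
import math
import random

def zip_em(y, n_iter=200):
    """EM for the zero-inflated Poisson: with probability pi the count
    is a structural zero, otherwise it is Poisson(lam).  Returns (pi, lam)."""
    n = len(y)
    pi, lam = 0.5, sum(y) / n + 0.1  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is structural
        p0 = pi + (1.0 - pi) * math.exp(-lam)
        z = [pi / p0 if yi == 0 else 0.0 for yi in y]
        # M-step: closed-form updates of the mixing weight and Poisson mean
        pi = sum(z) / n
        w = [1.0 - zi for zi in z]
        lam = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return pi, lam

# simulate ZIP data with known parameters, then recover them
rng = random.Random(7)
def rpois(lam):
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

y = [0 if rng.random() < 0.3 else rpois(2.5) for _ in range(2000)]
pi_hat, lam_hat = zip_em(y)
```

The same E-step weights (the probabilities that zeros are structural) are what make the complete-data likelihood tractable for the local influence measures developed in the dissertation.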
25

Equações de estimação generalizadas com resposta binomial negativa: modelando dados correlacionados de contagem com sobredispersão / Generalized estimating equations with negative binomial responses: modeling correlated count data with overdispersion

Clarissa Cardoso Oesselmann 12 December 2016 (has links)
An assumption common in the analysis of regression models is that of independent responses. However, when working with longitudinal or grouped data this assumption may not make sense. Several methodologies address this problem; perhaps the best known, in the non-Gaussian context, is that of Generalized Estimating Equations (GEE), which has similarities with Generalized Linear Models (GLM). These similarities involve building the model around distributions of the exponential family and specifying a variance function.
The only difference is that this function also incorporates a working correlation matrix that parameterizes the correlations within the experimental units. The main objective of this dissertation is to study how these models behave in a specific situation: count data with overdispersion. In GLM this kind of problem is solved by fitting a model with a negative binomial (NB) response, and the idea is the same for the GEE methodology. This dissertation reviews the GEE methodology, both in general and for the specific case where the marginal responses follow negative binomial distributions, and shows how the methodology is applied in practice, with three different examples of correlated count data.
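The working covariance that combines the NB variance function with a working correlation matrix can be sketched for one cluster as follows. The parameter values are illustrative, and this shows only the covariance construction, not the full iterative solution of the estimating equations.

```python
import math

def nb_variance(mu, kappa):
    """Negative binomial variance function v(mu) = mu + kappa * mu^2,
    where kappa > 0 captures the overdispersion relative to Poisson."""
    return mu + kappa * mu * mu

def working_covariance(mus, kappa, alpha):
    """GEE working covariance V = A^{1/2} R(alpha) A^{1/2} for one
    cluster, with A = diag(v(mu_i)) and an exchangeable working
    correlation R (all off-diagonal entries equal to alpha)."""
    s = [math.sqrt(nb_variance(mu, kappa)) for mu in mus]
    n = len(mus)
    return [[s[i] * s[j] * (1.0 if i == j else alpha)
             for j in range(n)] for i in range(n)]

# a cluster of three repeated counts with illustrative parameters
V = working_covariance([2.0, 3.0, 5.0], kappa=0.4, alpha=0.3)
```

The diagonal carries the NB variances, so marginal overdispersion is respected even if the working correlation alpha is misspecified, which is the robustness property that makes GEE attractive here.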
26

Analyse statistique de données biologiques à haut débit / Statistical analysis of high-throughput biological data

Aubert, Julie 07 February 2017 (has links)
The technological progress of the last twenty years has enabled the emergence of a high-throughput biology based on the automatic acquisition of large-scale data. Statisticians have an important role to play in the modelling and analysis of these data, which are numerous, noisy, sometimes heterogeneous, and collected at different scales. This role can take several forms: the statistician may propose new concepts or methods inspired by the questions this biology raises; may propose a fine modelling of the phenomena observed with these technologies; and, when methods exist and need only adaptation, may act as an expert who knows the methods, their limits, and their advantages. The work presented in this thesis sits at the interface between applied mathematics and biology, and belongs mostly to the second and third of these roles.
In the first part, I introduce different methods developed with my co-authors for the analysis of high-throughput biological data, based on latent variable models. These models explain an observed phenomenon with the help of hidden variables. The simplest latent variable model is the mixture model, and the first two methods presented are examples of it: the first in a context of multiple testing and the second in the framework of defining a hybridization threshold for microarray data. I also present a coupled hidden Markov chain model for the detection of copy number variations in genomics that accounts for the dependence between individuals, due for example to genetic proximity. For this model we propose an approximate inference based on a variational approximation, since exact inference is no longer feasible as the number of individuals grows. We also define a latent block model describing an underlying row-and-column block structure, adapted to count data from microbial ecology. Metabarcoding and metagenomic data record the abundance of each unit of interest (for example, each microorganism) of a microbial community within an environment (plant rhizosphere, human digestive tract, or ocean, for example). These data typically exhibit stronger dispersion than expected under the most classical models (over-dispersion). Biclustering is a way of studying the interactions between the structure of microbial communities and the biological samples from which they are drawn. We proposed to model this phenomenon with a Poisson-Gamma distribution, and developed another variational approximation for this particular latent block model, as well as a model selection criterion. The flexibility and performance of the model are illustrated on three real data sets.
A second part is devoted to the analysis of transcriptomic data from DNA microarray and RNA sequencing technologies. The first section concerns the normalization of data (the detection and correction of technical biases) and presents two new methods that I proposed with my co-authors, together with a comparison of methods to which I contributed.
The second section, devoted to experimental design, presents a method for analyzing so-called dye-switch designs. In the last part, I show through two collaboration examples, an analysis of differentially expressed genes from microarray data and an analysis of the sea urchin translatome from RNA-sequencing data, how statistical skills are mobilized and the added value that statistics bring to genomics projects.
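The over-dispersion that motivates the Poisson-Gamma modelling above is easy to demonstrate: mixing the Poisson rate over a Gamma distribution yields (marginally) a negative binomial, whose variance exceeds its mean. A minimal sketch, with arbitrary illustrative parameters:

```python
import math
import random

rng = random.Random(42)

def rpois(lam):
    """Knuth's multiplication method for a Poisson draw."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_poisson_gamma(n, shape, scale):
    """Counts from a Poisson-Gamma mixture: lambda_i ~ Gamma(shape, scale),
    then y_i ~ Poisson(lambda_i) -- marginally negative binomial."""
    return [rpois(rng.gammavariate(shape, scale)) for _ in range(n)]

y = sample_poisson_gamma(5000, shape=2.0, scale=1.5)
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / (len(y) - 1)
# theory: E[y] = shape*scale = 3.0 and Var[y] = E[y] + shape*scale^2 = 7.5,
# so the variance is well above the mean (over-dispersion)
```

Under a plain Poisson model the variance would equal the mean, which is exactly what count data from microbial communities tend to violate.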
27

Modely s Touchardovým rozdělením / Models with Touchard Distribution

Ibukun, Michael Abimbola January 2021 (has links)
In 2018, Raul Matsushita, Donald Pianto, Bernardo B. De Andrade, André Cançado & Sergio Da Silva published a paper titled "Touchard distribution", which presented a model that is a two-parameter extension of the Poisson distribution. This model has its normalizing constant related to the Touchard polynomials, hence its name. This diploma thesis is concerned with the properties of the Touchard distribution for which delta is known. Two asymptotic tests based on two different statistics were carried out for comparing two independent samples from Touchard models, supported by simulations in R.
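A minimal numerical sketch of the two-parameter family described above, assuming the pmf form P(X=k) proportional to lam^k * (k+1)^delta / k!, with the normalizing constant computed by truncating the series (the truncation bound kmax is an assumption adequate only for moderate lam and delta):

```python
def touchard_pmf(k, lam, delta, kmax=200):
    """Touchard pmf: P(X=k) = lam^k * (k+1)^delta / (k! * tau(lam, delta)),
    with tau approximated by a truncated series.  Terms are accumulated
    recursively to avoid computing huge factorials."""
    terms = []
    t = 1.0  # j = 0 term: lam^0 * 1^delta / 0! = 1
    for j in range(kmax):
        terms.append(t)
        # t_{j+1} / t_j = lam / (j+1) * ((j+2)/(j+1))^delta
        t *= lam / (j + 1) * ((j + 2) / (j + 1)) ** delta
    return terms[k] / sum(terms)

import math
# delta = 0 collapses the model back to an ordinary Poisson(lam)
p_touchard = touchard_pmf(3, lam=2.0, delta=0.0)
p_poisson = math.exp(-2.0) * 2.0 ** 3 / math.factorial(3)
```

The extra factor (k+1)^delta tilts probability mass toward larger (delta > 0) or smaller (delta < 0) counts, which is what gives the family its flexibility over the Poisson.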
28

The Multivariate Generalized Linear Mixed Model for a Joint Modeling Approach for Analysis of Tumor Multiplicity Data: Development and Comparison of Methods

SALISBURY, SHEILIA 23 April 2008 (has links)
No description available.
29

Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters

Landgraf, Andrew J. 15 October 2015 (has links)
No description available.
30

Stochastic models for MRI lesion count sequences from patients with relapsing remitting multiple sclerosis

Li, Xiaobai 14 July 2006 (has links)
No description available.
