11

Making Sense of the Noise: Statistical Analysis of Environmental DNA Sampling for Invasive Asian Carp Monitoring Near the Great Lakes

Song, Jeffery W. 01 May 2017 (has links)
Sensitive and accurate detection methods are critical for monitoring and managing the spread of aquatic invasive species, such as invasive Silver Carp (SC; Hypophthalmichthys molitrix) and Bighead Carp (BH; Hypophthalmichthys nobilis) near the Great Lakes. A new detection tool called environmental DNA (eDNA) sampling, the collection and screening of water samples for the presence of the target species’ DNA, promises improved detection sensitivity compared to conventional surveillance methods. However, the application of eDNA sampling to invasive species management has been challenging due to the potential for false positives, i.e., detecting a species’ eDNA in the absence of live organisms. In this dissertation, I study the sources of error and uncertainty in eDNA sampling and develop statistical tools to show how eDNA sampling should be utilized for monitoring and managing invasive SC and BH in the United States. In chapter 2, I investigate the environmental and hydrologic variables, e.g., reverse flow, that may be contributing to positive eDNA sampling results upstream of the electric fish dispersal barrier in the Chicago Area Waterway System (CAWS), where live SC are not expected to be present. I used a beta-binomial regression model, which showed that reverse flow volume across the barrier has a statistically significant positive relationship with the probability of SC eDNA detection upstream of the barrier from 2009 to 2012, while other covariates, such as water temperature, season, and chlorophyll concentration, do not. This is a potential alternative explanation for why SC eDNA has been detected upstream of the barrier but intact SC have not. In chapter 3, I develop and parameterize a statistical model to evaluate how changes made to the US Fish and Wildlife Service (USFWS)’s eDNA sampling protocols for invasive BH and SC monitoring from 2013 to 2015 have influenced their sensitivity. The model shows that changes to the protocol have caused the sensitivity to fluctuate. Overall, when assuming that eDNA is randomly distributed, the sensitivity of the current protocol is higher for BH eDNA detection and similar for SC eDNA detection compared to the original protocol used from 2009 to 2012. When assuming that eDNA is clumped, the sensitivity of the current protocol is slightly higher for BH eDNA detection but worse for SC eDNA detection. In chapter 4, I apply the model developed in chapter 3 to estimate the BH and SC eDNA concentration distributions in two pools of the Illinois River where BH and SC are considered to be present, one pool where they are absent, and upstream of the electric barrier in the CAWS, given eDNA sampling data and knowledge of the eDNA sampling protocol used in 2014. The results show that the estimated mean eDNA concentrations in the Illinois River are highest in the invaded pools (La Grange; Marseilles) and lower in the uninvaded pool (Brandon Road). The estimated eDNA concentrations in the CAWS are much lower than the concentrations in the Marseilles pool, which indicates that the few eDNA detections in the CAWS (3% of samples positive for SC and 0.4% of samples positive for BH) do not signal the presence of live BH or SC. The model shows that >50% of samples would need to be positive for BH or SC eDNA to infer Asian carp (AC) presence in the CAWS, i.e., for the estimated concentrations to be similar to what is found in the Marseilles pool.
Finally, in chapter 5, I develop a decision tree model to evaluate the value of information that monitoring provides for making decisions about BH and SC prevention strategies near the Great Lakes. The optimal prevention strategy depends on prior beliefs about the expected damage of AC invasion, the probability of invasion, and whether or not BH and SC have already invaded the Great Lakes (which is informed by monitoring). Given no monitoring, the optimal strategy is to stay with the status quo of operating electric barriers in the CAWS for low probabilities of invasion and low expected invasion costs. However, if the probability of invasion is greater than 30% and the cost of invasion is greater than $100 million a year, the optimal strategy changes to installing an additional barrier in the Brandon Road pool. Greater risk aversion (i.e., aversion to monetary losses) causes less prevention (e.g., the status quo instead of additional barriers) to be preferred. Given monitoring, the model shows that monitoring provides value for making this decision only if the monitoring tool has perfect specificity (false positive rate = 0%).
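
A minimal sketch of the kind of beta-binomial regression described in chapter 2, fit by maximum likelihood on synthetic data. The covariate name (`reverse_flow`), sample sizes, and true parameter values are illustrative assumptions, not the thesis data or its exact model.

```python
# Sketch of a beta-binomial regression relating the number of positive eDNA
# samples per sampling event to reverse-flow volume, fit by maximum likelihood.
# Synthetic data; covariate values and parameters are illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln, expit

rng = np.random.default_rng(0)
n_events = 120
reverse_flow = rng.gamma(shape=2.0, scale=1.5, size=n_events)   # hypothetical covariate
n_samples = rng.integers(20, 60, size=n_events)                 # water samples screened per event

# Simulate detections from a "true" beta-binomial model for illustration.
true_p = expit(-3.0 + 0.6 * reverse_flow)
phi_true = 15.0                                                 # precision (overdispersion) parameter
k_pos = rng.binomial(n_samples, rng.beta(true_p * phi_true, (1 - true_p) * phi_true))

def neg_loglik(theta):
    b0, b1, log_phi = theta
    p = expit(b0 + b1 * reverse_flow)
    phi = np.exp(log_phi)
    a, b = p * phi, (1 - p) * phi
    # Beta-binomial log pmf: log C(n,k) + log B(k+a, n-k+b) - log B(a,b)
    ll = (gammaln(n_samples + 1) - gammaln(k_pos + 1) - gammaln(n_samples - k_pos + 1)
          + betaln(k_pos + a, n_samples - k_pos + b) - betaln(a, b))
    return -ll.sum()

fit = minimize(neg_loglik, x0=np.array([0.0, 0.0, 0.0]), method="Nelder-Mead")
b0_hat, b1_hat, log_phi_hat = fit.x
print(f"intercept={b0_hat:.2f}, reverse-flow slope={b1_hat:.2f}, phi={np.exp(log_phi_hat):.1f}")
```

A positive estimated slope on the flow covariate would correspond to the association reported in the abstract, though the thesis's actual covariates, data, and inference details differ.
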
12

High-precision radiocarbon dating of political collapse and dynastic origins at the Maya site of Ceibal, Guatemala

Inomata, Takeshi, Triadan, Daniela, MacLellan, Jessica, Burham, Melissa, Aoyama, Kazuo, Palomo, Juan Manuel, Yonenobu, Hitoshi, Pinzón, Flory, Nasu, Hiroo 07 February 2017 (has links)
The lowland Maya site of Ceibal, Guatemala, had a long history of occupation, spanning from the Middle Preclassic Period through the Terminal Classic (1000 BC to AD 950). The Ceibal-Petexbatun Archaeological Project has been conducting archaeological investigations at this site since 2005 and has obtained 154 radiocarbon dates, which represent the largest collection of radiocarbon assays from a single Maya site. The Bayesian analysis of these dates, combined with a detailed study of ceramics, allowed us to develop a high-precision chronology for Ceibal. Through this chronology, we traced the trajectories of the Preclassic collapse around AD 150–300 and the Classic collapse around AD 800–950, revealing similar patterns in the two cases. In both, social instability started with the intensification of warfare, around 75 BC and AD 735, respectively, followed by the fall of multiple centers across the Maya lowlands around AD 150 and 810. The population of Ceibal persisted for some time in both cases, but the center eventually experienced major decline, around AD 300 and 900. Despite these similarities in their diachronic trajectories, the outcomes of these collapses were different: the former was associated with the development of dynasties centered on divine rulership, whereas the latter led to their downfall. The Ceibal dynasty emerged during the period of low population after the Preclassic collapse, suggesting that this dynasty was established under the influence of, or through the direct intervention of, an external power.
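
As a very small illustration of the Bayesian calibration step underlying such chronologies, the sketch below calibrates a single radiocarbon determination on a grid of calendar dates. The linear "calibration curve", the measurement, and the date range are toy placeholders; real analyses, as at Ceibal, use the IntCal curve and joint models over many dates (e.g., in OxCal), not this simplification.

```python
# Sketch: Bayesian calibration of one radiocarbon date on a calendar-year grid,
# using a toy linear calibration curve (placeholder for IntCal).
import numpy as np

cal_years = np.arange(0, 501)                 # candidate calendar dates, AD 0-500 (toy range)
curve_mean = 1950 - 1.03 * cal_years          # toy 14C age (BP) for each calendar year
curve_err = 15.0                              # toy curve uncertainty (14C yr)

measured_age, lab_err = 1810.0, 25.0          # hypothetical lab measurement (14C yr BP)

sigma = np.sqrt(lab_err**2 + curve_err**2)
likelihood = np.exp(-0.5 * ((measured_age - curve_mean) / sigma) ** 2)
posterior = likelihood / likelihood.sum()     # flat prior over the grid

mean_date = np.sum(cal_years * posterior)
cum = np.cumsum(posterior)
lo, hi = cal_years[np.searchsorted(cum, 0.025)], cal_years[np.searchsorted(cum, 0.975)]
print(f"posterior mean ~ AD {mean_date:.0f}, 95% interval AD {lo}-{hi}")
```
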
13

Incorporating high-dimensional exposure modelling into studies of air pollution and health

Liu, Yi January 2015 (has links)
Air pollution is an important determinant of health. There is convincing, and growing, evidence linking the risk of disease, and premature death, with exposure to various pollutants including fine particulate matter and ozone. Knowledge about health and environmental risks and their trends is an important stimulus for developing environmental and public health policy. In order to study the risks that environmental hazards pose to human health, accurate estimates are required of the exposures that might be experienced by the populations at risk. In this thesis we develop spatio-temporal models within a Bayesian framework to obtain accurate estimates of such exposures. These models are hierarchical, with different levels describing dependencies over space and time. The complexity of hierarchical models and the large amounts of data that can arise from environmental networks mean that inference using Markov chain Monte Carlo (MCMC) may be computationally challenging in this setting. We use both MCMC and Integrated Nested Laplace Approximations (INLA) to implement spatio-temporal exposure models when dealing with high-dimensional data. We also propose an approach for utilising the results from exposure models in health models, allowing them to enhance studies of the health effects of air pollution. Moreover, we investigate the possible effects of preferential sampling, where monitoring sites in environmental networks are preferentially located by the designers in order to assess whether guidelines and policies are being adhered to. This means the data arising from such networks may not accurately characterise the spatio-temporal field they are intended to monitor, and as such will not provide accurate estimates of the exposures that are potentially experienced by populations. This has the potential to introduce bias into estimates of risk associated with exposure to air pollution and into subsequent health impact analyses. Throughout the thesis, the methods developed are assessed using simulation studies and applied to real-life case studies assessing the effects of particulate matter on health in Greater London and throughout the UK.
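
A minimal sketch, under stated assumptions, of the general idea of feeding exposure-model output into a health model while carrying uncertainty forward: posterior draws of an exposure surface are propagated into a Poisson health regression rather than plugging in a single point estimate. The data are entirely synthetic and the library choice (statsmodels) is an assumption; the thesis's MCMC/INLA exposure models and linkage approach are more sophisticated than this two-stage toy.

```python
# Sketch: propagate posterior draws of an exposure field into a Poisson
# health model, instead of conditioning on one point estimate of exposure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_areas, n_draws = 200, 500

true_exposure = rng.normal(10, 2, size=n_areas)          # e.g. PM2.5 by small area (synthetic)
population = rng.integers(5_000, 50_000, size=n_areas)
true_beta = 0.02                                          # log relative risk per unit exposure
counts = rng.poisson(np.exp(np.log(population) - 7 + true_beta * true_exposure))

# Pretend these are posterior draws from a spatio-temporal exposure model.
exposure_draws = true_exposure + rng.normal(0, 1.0, size=(n_draws, n_areas))

betas = []
for draw in exposure_draws:
    X = sm.add_constant(draw)
    model = sm.GLM(counts, X, family=sm.families.Poisson(),
                   offset=np.log(population)).fit()
    betas.append(model.params[1])

betas = np.array(betas)
print(f"risk coefficient: mean={betas.mean():.4f}, "
      f"95% interval=({np.quantile(betas, 0.025):.4f}, {np.quantile(betas, 0.975):.4f})")
```
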
14

Parcimonie dans les modèles Markoviens et application à l'analyse des séquences biologiques / Parsimonious Markov models and application to biological sequence analysis

Bourguignon, Pierre Yves Vincent 15 December 2008 (has links)
Markov chains, as a universal model for finite-memory, discrete-valued processes, are ubiquitous in applied statistics, with applications ranging from text compression to the analysis of biological sequences. Their practical use with finite samples, however, systematically requires a compromise between the memory length (order) of the model, which conditions the complexity of the interactions the model may capture, and the amount of information carried by the data, whose limitation negatively impacts the quality of estimation. Context trees, as an extension of the class of Markov chain models, provide the modeller with a finer granularity in this model selection process by allowing the memory length to vary across contexts. Several popular modelling methods are based on this class of models, in fields such as text indexation and text compression (Context Tree Maximization, CTM, and Context Tree Weighting, CTW). We propose an extension of the class of context-tree models, parsimonious context trees, which further allow the fusion of sibling nodes in the context tree. These fusions give the modeller a yet finer granularity for the model selection task, enabling better compromises between model complexity and estimation quality, at the cost of a large increase in the number of candidate models. Thanks to a Bayesian approach very similar to that employed in CTM and CTW, we designed a model selection method that exactly optimizes the Bayesian model selection criterion while benefiting from dynamic programming. The resulting algorithm attains the lower bound on the complexity of the optimization problem and is practically tractable for alphabets of up to 10 symbols; it has been applied to diverse datasets to demonstrate the performance achieved by this approach.
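
The order-versus-data compromise described above can be illustrated with a much cruder baseline: scoring fixed-order Markov chains on a symbol sequence with a BIC-style penalty. This is only a sketch on a synthetic sequence; it is not the parsimonious context-tree algorithm, which lets memory length vary per context, merges sibling contexts, and optimizes an exact Bayesian criterion rather than BIC.

```python
# Sketch: choose the memory length of a fixed-order Markov chain for a symbol
# sequence by penalised likelihood (BIC), illustrating the order-versus-data
# compromise that context trees and parsimonious context trees refine per context.
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
alphabet = "ACGT"
seq = "".join(rng.choice(list(alphabet), size=5000))   # toy sequence; real data would be a genome

def bic_markov(seq, order, alphabet):
    """BIC score of a fixed-order Markov chain fitted by maximum likelihood."""
    counts = Counter()
    for i in range(order, len(seq)):
        counts[(seq[i - order:i], seq[i])] += 1
    context_totals = Counter()
    for (ctx, _), c in counts.items():
        context_totals[ctx] += c
    loglik = sum(c * np.log(c / context_totals[ctx]) for (ctx, _), c in counts.items())
    n_params = (len(alphabet) ** order) * (len(alphabet) - 1)
    return loglik - 0.5 * n_params * np.log(len(seq) - order)

for order in range(0, 5):
    print(f"order {order}: BIC score = {bic_markov(seq, order, alphabet):.1f}")
```
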
15

Dynamic Bayesian statistical models for the estimation of the origin-destination matrix

Anselmo Ramalho Pitombeira Neto 29 June 2015 (has links)
In transportation planning, one of the first steps is to estimate the travel demand. A product of the estimation process is the so-called origin-destination matrix (OD matrix), whose entries correspond to the number of trips between pairs of zones in a geographic region in a reference time period. Traditionally, the OD matrix has been estimated through direct methods, such as home-based surveys, road-side interviews and automatic license plate recognition. These direct methods require large samples to achieve a target statistical error, which may be technically or economically infeasible. Alternatively, one can use a statistical model to indirectly estimate the OD matrix from observed traffic volumes on links of the transportation network. The first estimation models proposed in the literature assume that traffic volumes in a sequence of days are independent and identically distributed samples of a static probability distribution. Moreover, static estimation models do not allow for variations in mean OD flows or non-constant variability over time. In contrast, day-to-day dynamic models are in theory more capable of capturing underlying changes of system parameters which are only indirectly observed through variations in traffic volumes. Even so, there is still a dearth of statistical models in the literature which account for the day-to-day dynamic evolution of transportation systems. In this thesis, our objective is to assess the potential gains and limitations of day-to-day dynamic models for the estimation of the OD matrix based on link volumes. First, we review the main static and dynamic models available in the literature. We then describe our proposed day-to-day dynamic Bayesian model based on the theory of linear dynamic models. The proposed model is tested by means of computational experiments and compared with a static estimation model and with the generalized least squares (GLS) model. The results show some advantage in favor of dynamic models in informative scenarios, while in non-informative scenarios the performance of the models was equivalent. The experiments also indicate a significant dependence of the estimation errors on the assignment matrices.
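
A minimal sketch of the day-to-day linear dynamic model idea: mean OD flows evolve as a random walk and are observed only indirectly through link volumes via an assignment matrix, so a Kalman filter can track them. All dimensions, matrices, and noise levels below are invented for illustration and do not correspond to the thesis's experiments.

```python
# Sketch: Kalman filtering of day-to-day OD flows observed through link volumes.
# State x_t = mean OD flows (random walk); observation y_t = A x_t + noise,
# where A is the assignment matrix mapping OD flows onto network links.
import numpy as np

rng = np.random.default_rng(3)
n_od, n_links, n_days = 4, 6, 60

A = rng.uniform(0, 1, size=(n_links, n_od))      # toy assignment (route-choice) matrix
A /= A.sum(axis=0, keepdims=True)

true_x = np.full(n_od, 100.0)
W = 4.0 * np.eye(n_od)                           # state evolution covariance
V = 25.0 * np.eye(n_links)                       # observation covariance

# Filter initialisation (diffuse prior on OD flows).
m = np.full(n_od, 50.0)
C = 1e4 * np.eye(n_od)

for t in range(n_days):
    true_x = true_x + rng.multivariate_normal(np.zeros(n_od), W)    # day-to-day drift
    y = A @ true_x + rng.multivariate_normal(np.zeros(n_links), V)  # observed link volumes

    # Predict, then update (standard Kalman recursions for a random-walk state).
    m_pred, C_pred = m, C + W
    S = A @ C_pred @ A.T + V
    K = C_pred @ A.T @ np.linalg.inv(S)
    m = m_pred + K @ (y - A @ m_pred)
    C = C_pred - K @ A @ C_pred

print("estimated mean OD flows:", np.round(m, 1))
print("true mean OD flows:     ", np.round(true_x, 1))
```

In real networks there are usually far more OD pairs than observed links, so the problem is underdetermined and the prior and day-to-day dynamics carry much of the inferential weight; the toy example above is deliberately over-observed so the filter converges quickly.
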
16

Bayesian Mixture Modeling Approaches for Intermediate Variables and Causal Inference

Schwartz, Scott Lee January 2010 (has links)
This thesis examines topics in causal inference involving intermediate variables, and uses Bayesian methodologies to advance analysis capabilities in these areas. First, joint modeling of outcome variables with intermediate variables is considered in the context of birthweight and censored gestational age analyses. The proposed methodology provides improved inference capabilities for birthweight and gestational age, avoids the post-treatment selection bias problems associated with analyses that condition on gestational age, and appropriately assesses the uncertainty associated with censored gestational age. Second, principal stratification methodology for settings where causal inference analysis requires appropriate adjustment of intermediate variables is extended to observational settings with binary treatments and binary intermediate variables. This is done by uncovering the structural pathways of unmeasured confounding affecting principal stratification analysis and directly incorporating them into a model-based sensitivity analysis methodology. Demonstration focuses on a study of the efficacy of influenza vaccination in elderly populations. Third, the flexibility, interpretability, and capability of principal stratification analyses for continuous intermediate variables are improved by replacing the current fully parametric methodologies with semiparametric Bayesian alternatives. This is one of the first uses of nonparametric techniques in causal inference analysis, and opens a connection between these two fields. Demonstration focuses on two studies, one involving a cholesterol reduction drug, and one examining the effect of physical activity on cardiovascular disease as it relates to body mass index. / Dissertation
17

Learning in integrated optimization models of climate change and economy

Shayegh, Soheil 21 September 2015 (has links)
Integrated assessment models are powerful tools for providing insight into the interaction between the economy and climate change over a long time horizon. However, knowledge of climate parameters and their behavior under extreme circumstances of global warming is still an active area of research. In this thesis we incorporated the uncertainty in one of the key parameters of climate change, climate sensitivity, into an integrated assessment model and showed how this affects the choice of optimal policies and actions. We constructed a new, multi-step-ahead approximate dynamic programming (ADP) algorithm to study the effects of the stochastic nature of climate parameters. We considered the effect of stochastic extreme events in climate change (tipping points) with large economic losses. The risk of an extreme event drives tougher greenhouse gas (GHG) reduction actions in the near term. On the other hand, the optimal policies in post-tipping-point stages are similar to or below the deterministic optimal policies. Once the tipping point occurs, the ensuing optimal actions tend toward more moderate policies. Previous studies have shown the impacts of economic and climate shocks on the optimal abatement policies but did not address the correlation among uncertain parameters. With uncertain climate sensitivity, the risk of extreme events is linked to the variations in the climate sensitivity distribution. We developed a novel Bayesian framework to endogenously interrelate the two stochastic parameters. The results in this case are clustered around the pre-tipping-point optimal policies of the deterministic climate sensitivity model. Tougher actions are more frequent as there is more uncertainty in the likelihood of extreme events in the near future. This affects the optimal policies in post-tipping-point states as well, which tend toward more conservative actions. As we proceed in time toward the future, the (binary) status of the climate will be observed and the prior distribution of the climate sensitivity parameter will be updated. The cost and climate tradeoffs of new technologies are key to decisions in climate policy. Here we focus on the electricity generation industry and contrast the extremes in electricity generation choices: making choices on new generation facilities based on cost only and in the absence of any climate policy, versus making choices based on climate impacts only, regardless of generation costs. When the expected drop in cost as experience grows is taken into account in selecting the generation portfolio, renewable technologies displace coal and natural gas within two decades on a pure cost-minimization basis, even when climate damage is not considered in the choice of technologies. This is the natural-gas-as-a-bridge-fuel scenario, and the technology advancement needed to bring down the cost of renewables requires some commitment to renewable generation in the near term. Adopting the objective of minimizing climate damage, essentially moving immediately to low-greenhouse-gas generation technologies, results in faster cost reduction of new technologies and may result in different technologies becoming dominant in global electricity generation. Thus today’s choices for new electricity generation by individual countries and utilities have implications not only for their direct costs and the global climate, but also for the future costs and availability of emerging electricity generation options.
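
A minimal sketch of the Bayesian updating step mentioned near the end of the abstract: a prior over climate sensitivity is updated on a grid after observing the binary climate status (tipping event or not). The prior shape and the tipping-probability curve are made-up placeholders, not the thesis's calibrated quantities.

```python
# Sketch: grid-based Bayesian update of a climate-sensitivity parameter after
# observing the binary climate status (tipping event observed or not).
import numpy as np

grid = np.linspace(1.0, 6.0, 501)                 # climate sensitivity (deg C per CO2 doubling)
prior = np.exp(-0.5 * ((np.log(grid) - np.log(3.0)) / 0.4) ** 2)   # rough log-normal shape
prior /= prior.sum()

def p_tip(sensitivity, decade_warming=0.3):
    """Assumed (placeholder) probability of observing a tipping event in one period."""
    return 1 - np.exp(-0.05 * sensitivity * decade_warming)

observed_tipping = False                           # the binary observation
likelihood = p_tip(grid) if observed_tipping else 1 - p_tip(grid)

posterior = prior * likelihood
posterior /= posterior.sum()

print("prior mean sensitivity:    ", round(float(np.sum(grid * prior)), 2))
print("posterior mean sensitivity:", round(float(np.sum(grid * posterior)), 2))
```
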
18

Statistical Characterization of Protein Ensembles

Fisher, Charles January 2012 (has links)
Conformational ensembles are models of proteins that capture variations in conformation resulting from thermal fluctuations. Ensemble-based models are important tools for studying Intrinsically Disordered Proteins (IDPs), which adopt a heterogeneous set of conformations in solution. In order to construct an ensemble that provides an accurate model for a protein, one must identify a set of conformations, and their relative stabilities, that agree with experimental data. Inferring the characteristics of an ensemble for an IDP is a problem plagued by degeneracy; that is, one can typically construct many different ensembles that agree with any given set of experimental measurements. In light of this problem, this thesis introduces three tools for characterizing ensembles: (1) an algorithm for modeling ensembles that provides estimates for the uncertainty in the resulting model, (2) a fast algorithm for constructing ensembles for large or complex IDPs, and (3) a measure of the degree of disorder in an ensemble. Our hypothesis is that a protein can be accurately modeled as an ensemble only when the degeneracy of the model is appropriately accounted for. We demonstrate these methods by constructing ensembles for K18 tau protein, \(\alpha\)-synuclein and amyloid beta, IDPs that are implicated in the pathogenesis of Alzheimer's and Parkinson's diseases.
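
A small sketch of the degeneracy problem the abstract describes: with many conformations and only a few measured ensemble averages, many different weight vectors reproduce the data equally well. Everything here is synthetic (random "predicted observables" and weights), and the fitting procedure is a toy least-squares reweighting, not the thesis's algorithms.

```python
# Sketch: fit population weights for a conformational ensemble so that
# ensemble-averaged observables match "experimental" values, and show that
# different starting points give different weights with near-identical fits.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n_conf, n_obs = 50, 5                                # many conformations, few measurements
predicted = rng.normal(size=(n_conf, n_obs))         # per-conformation observables (synthetic)
true_w = rng.dirichlet(np.ones(n_conf))
experimental = true_w @ predicted + rng.normal(0, 0.01, size=n_obs)

def misfit(z):
    w = np.exp(z - z.max()); w /= w.sum()            # softmax keeps weights on the simplex
    return np.sum((w @ predicted - experimental) ** 2)

solutions = []
for start in range(3):
    res = minimize(misfit, rng.normal(size=n_conf), method="L-BFGS-B")
    w = np.exp(res.x - res.x.max()); w /= w.sum()
    solutions.append(w)
    print(f"start {start}: residual={res.fun:.2e}, largest weight={w.max():.3f}")

# Very different ensembles, essentially the same agreement with the data.
print("total weight difference between two fits:", np.abs(solutions[0] - solutions[1]).sum())
```
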
19

Accurate Surveillance of Diabetes Mellitus in Nova Scotia within the General Population and the Five First Nations of Cape Breton

Clark, Roderick 03 October 2011 (has links)
Administrative data are one of the most commonly used data sources for diagnosed diabetes surveillance within Canada. Despite their widespread use, administrative case definitions have not been validated in many of the minority populations to which they are commonly applied. Additionally, previous validation work has not evaluated the effect of conditional covariance between data sources, which has been widely shown to significantly bias parameter (sensitivity, specificity, and prevalence) estimation. Using administrative data and data sources which contained gold-standard cases of diabetes, this thesis examined (1) the validity of commonly used administrative case definitions for identifying cases of diagnosed diabetes within an Aboriginal population at the sub-provincial level, and (2) the effect of conditional covariance on parameter estimates of an administrative case definition used to identify cases of diagnosed diabetes within the general population of Nova Scotia. We found significant differences in the sensitivity and specificity of a commonly used administrative case definition when applied to an Aboriginal population at the sub-provincial level. For the general population of Nova Scotia, we found that including a parameter to estimate conditional covariance between data sources resulted in significant variation in sensitivity, specificity, and prevalence estimates compared to a study which did not consider this parameter. We conclude that work must continue to validate administrative case definitions, both within minority populations and for the general population, to enhance diabetes surveillance systems in Canada. / Validation study for administrative case definitions to identify cases of diagnosed diabetes in Canada
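
A minimal sketch of the basic validation calculation: cross-tabulating an administrative case definition against gold-standard diabetes status to estimate sensitivity, specificity, and prevalence. The data and error rates below are synthetic assumptions; the conditional-dependence issue the thesis addresses arises when two imperfect sources are combined without a gold standard, and is only noted in a comment here.

```python
# Sketch: validate an administrative case definition against gold-standard
# diabetes status (synthetic data). When no gold standard exists and two
# imperfect sources are combined, ignoring conditional covariance between them
# biases these estimates, which is the issue the thesis models explicitly.
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
prevalence = 0.07
gold = rng.random(n) < prevalence                       # gold-standard diabetes status

# Administrative definition with assumed (not real) error rates.
sens_true, spec_true = 0.85, 0.97
admin = np.where(gold, rng.random(n) < sens_true, rng.random(n) > spec_true)

tp = np.sum(admin & gold)
fn = np.sum(~admin & gold)
tn = np.sum(~admin & ~gold)
fp = np.sum(admin & ~gold)

sens_hat = tp / (tp + fn)
spec_hat = tn / (tn + fp)
prev_hat = (tp + fn) / n
print(f"sensitivity={sens_hat:.3f}, specificity={spec_hat:.3f}, prevalence={prev_hat:.3f}")
```
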
20

A Bayesian/MCMC Approach to Galaxy Modelling: NGC 6503

PUGLIELLI, DAVID 11 January 2010 (has links)
We use Bayesian statistics and Markov chain Monte Carlo (MCMC) techniques to construct dynamical models for the spiral galaxy NGC 6503. The constraints include surface brightness profiles which display a Freeman Type II structure; HI and ionized gas rotation curves; the stellar rotation, which is nearly coincident with the ionized gas curve; and the line-of-sight stellar dispersion, which displays a $\sigma$-drop at the centre. The galaxy models consist of a Sérsic bulge, an exponential disc with an optional inner truncation and a cosmologically motivated dark halo. The Bayesian/MCMC technique yields the joint posterior probability distribution function for the input parameters, allowing constraints on model parameters such as the halo cusp strength, structural parameters for the disc and bulge, and mass-to-light ratios. We examine several interpretations of the data: the Type II surface brightness profile may be due to dust extinction, to an inner truncated disc or to a ring of bright stars; and we test separate fits to the gas and stellar rotation curves to determine if the gas traces the gravitational potential. We test each of these scenarios for bar stability, ruling out dust extinction. We also find that the gas cannot trace the gravitational potential, as the asymmetric drift is then too large to reproduce the stellar rotation. The disc is well fit by an inner-truncated profile, but the possibility of ring formation by a bar to reproduce the Type II profile is also a realistic model. We further find that the halo must have a cuspy profile with $\gamma \gtrsim 1$; the bulge has a lower $M/L$ than the disc, suggesting a star-forming component in the centre of the galaxy; and the bulge, as expected for this late-type galaxy, has a low Sérsic index with $n_b \sim 1-2$, suggesting a formation history dominated by secular evolution. / Thesis (Ph.D, Physics, Engineering Physics and Astronomy) -- Queen's University, 2010-01-10 00:11:41.946
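
A compact sketch of the Bayesian/MCMC machinery in this setting: a random-walk Metropolis sampler fitting a two-parameter toy rotation-curve model to synthetic velocities. The model, priors, data, and proposal scales are placeholders; the thesis's models combine a Sérsic bulge, exponential disc, and dark halo with many more parameters and constraints.

```python
# Sketch: Metropolis MCMC fit of a toy rotation curve v(r) = v_max * r / (r + r_s)
# to synthetic velocity data, illustrating posterior sampling for galaxy models.
import numpy as np

rng = np.random.default_rng(6)
r = np.linspace(0.5, 15, 30)                          # radius (kpc), synthetic
v_obs = 120 * r / (r + 2.0) + rng.normal(0, 5, r.size)
sigma = 5.0                                           # assumed velocity uncertainty (km/s)

def log_post(theta):
    v_max, r_s = theta
    if not (0 < v_max < 500 and 0 < r_s < 20):        # flat priors on a bounded box
        return -np.inf
    model = v_max * r / (r + r_s)
    return -0.5 * np.sum(((v_obs - model) / sigma) ** 2)

theta = np.array([100.0, 1.0])
lp = log_post(theta)
chain = []
for step in range(20_000):
    proposal = theta + rng.normal(0, [2.0, 0.1])      # random-walk proposal
    lp_new = log_post(proposal)
    if np.log(rng.random()) < lp_new - lp:            # Metropolis accept/reject
        theta, lp = proposal, lp_new
    chain.append(theta)

chain = np.array(chain[5000:])                        # discard burn-in
print("posterior means (v_max, r_s):", np.round(chain.mean(axis=0), 2))
print("posterior std  (v_max, r_s):", np.round(chain.std(axis=0), 2))
```
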
