About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Bayesian inference for models with infinite-dimensionally generated intractable components

Villalobos, Isadora Antoniano January 2012 (has links)
No description available.
12

Bayesian matrix factorisation : inference, priors, and data integration

Brouwer, Thomas Alexander January 2017 (has links)
In recent years the amount of biological data has increased exponentially. Most of these data can be represented as matrices relating two different entity types, such as drug-target interactions (relating drugs to protein targets), gene expression profiles (relating drugs or cell lines to genes), and drug sensitivity values (relating drugs to cell lines). Not only is the size of these datasets increasing, but so is the number of different entity types that they relate. Furthermore, not all values in these datasets are typically observed, and some datasets are very sparse. Matrix factorisation is a popular group of methods that can be used to analyse these matrices. The idea is that each matrix can be decomposed into two or more smaller matrices, such that their product approximates the original one. This factorisation of the data reveals patterns in the matrix, and gives us a lower-dimensional representation. Not only can we use this technique to identify clusters and other biological signals, we can also predict the unobserved entries, allowing us to prune biological experiments. In this thesis we introduce and explore several Bayesian matrix factorisation models, focusing on how best to use them for predicting missing values in biological datasets. Our main hypothesis is that matrix factorisation methods, and in particular Bayesian variants, are an extremely powerful paradigm for predicting values in biological datasets, as well as in other applications, especially for sparse and noisy data. We demonstrate the competitiveness of these approaches compared to other state-of-the-art methods, and explore the conditions under which they perform best. We consider several aspects of the Bayesian approach to matrix factorisation. Firstly, we examine the effect of the inference approach used to find the factorisation on predictive performance. 
Secondly, we identify different likelihood and Bayesian prior choices that we can use for these models, and explore when they are most appropriate. Finally, we introduce a Bayesian matrix factorisation model that can be used to integrate multiple biological datasets, and hence improve predictions. This model combines different matrix factorisation models and Bayesian priors in a hybrid fashion. Through these models and experiments we support our hypothesis and provide novel insights into the best ways to use Bayesian matrix factorisation methods for predictive purposes.
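The decomposition described in this abstract can be sketched in a few lines. The following is not code from the thesis: it is a generic illustration in which the Gaussian priors on the factors reduce to a ridge penalty, giving a MAP (regularised alternating least squares) factorisation of a toy matrix with missing entries. All names, sizes, and values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "drugs x cell lines" matrix generated from a rank-2 model, with ~20%
# of entries hidden. Sizes and values are invented for illustration.
K = 2
U_true = rng.normal(size=(8, K))
V_true = rng.normal(size=(6, K))
R = U_true @ V_true.T
mask = rng.random(R.shape) < 0.8            # True where an entry is observed

# MAP estimate of the factors under Gaussian priors: ridge-regularised
# alternating least squares, a point-estimate stand-in for full Bayesian
# inference over U and V.
lam = 0.1                                    # plays the role of the prior precision
U = rng.normal(scale=0.1, size=(R.shape[0], K))
V = rng.normal(scale=0.1, size=(R.shape[1], K))
for _ in range(50):
    for i in range(R.shape[0]):              # update each row of U in turn
        obs = mask[i]
        A = V[obs].T @ V[obs] + lam * np.eye(K)
        U[i] = np.linalg.solve(A, V[obs].T @ R[i, obs])
    for j in range(R.shape[1]):              # then each row of V
        obs = mask[:, j]
        A = U[obs].T @ U[obs] + lam * np.eye(K)
        V[j] = np.linalg.solve(A, U[obs].T @ R[obs, j])

pred = U @ V.T                               # predictions for every entry
rmse_obs = np.sqrt(np.mean((pred[mask] - R[mask]) ** 2))
rmse_missing = np.sqrt(np.mean((pred[~mask] - R[~mask]) ** 2))
```

The thesis's models replace this point estimate with posterior distributions over the factors, obtained by inference schemes such as Gibbs sampling or variational methods; the prediction step for unobserved entries is the same product of factors.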
13

Methods for determining the genetic causes of rare diseases

Greene, Daniel John January 2018 (has links)
Thanks to the affordability of DNA sequencing, hundreds of thousands of individuals with rare disorders are undergoing whole-genome sequencing in an effort to reveal novel disease aetiologies, increase our understanding of biological processes and improve patient care. However, the power to discover the genetic causes of many unexplained rare diseases is hindered by a paucity of cases with a shared molecular aetiology. This thesis presents research into statistical and computational methods for determining the genetic causes of rare diseases. Methods described herein treat important aspects of the nature of rare diseases, including genetic and phenotypic heterogeneity, phenotypes involving multiple organ systems, Mendelian modes of inheritance and the incorporation of complex prior information such as model organism phenotypes and evolutionary conservation. The complex nature of rare disease phenotypes and the need to aggregate patient data across many centres has led to the adoption of the Human Phenotype Ontology (HPO) as a means of coding patient phenotypes. The HPO provides a standardised vocabulary and captures relationships between disease features. I developed a suite of software packages dubbed 'ontologyX' in order to simplify analysis and visualisation of such ontologically encoded data, and enable them to be incorporated into complex analysis methods. An important aspect of the analysis of ontological data is quantifying the semantic similarity between ontologically annotated entities, which is implemented in the ontologyX software. We employed this functionality in a phenotypic similarity regression framework, 'SimReg', which models the relationship between ontologically encoded patient phenotypes of individuals and rare variation in a given genomic locus. 
It does so by evaluating support for a model under which the probability that a person carries rare alleles in a locus depends on the similarity between the person's ontologically encoded phenotype and a latent characteristic phenotype which can be inferred from data. A probability of association is computed by comparison of the two models, allowing prioritisation of candidate loci for involvement in disease with respect to a heterogeneous collection of disease phenotypes. SimReg includes a sophisticated treatment of HPO-coded phenotypic data but dichotomises the genetic data at a locus. Therefore, we developed an additional method, 'BeviMed', standing for Bayesian Evaluation of Variant Involvement in Mendelian Disease, which evaluates the evidence of association between allele configurations across rare variants within a genomic locus and a case/control label. It is capable of inferring the probability of association, and conditional on association, the probability of each mode of inheritance and probability of involvement of each variant. Inference is performed through a Bayesian comparison of multiple models: under a baseline model disease risk is independent of allele configuration at the given rare variant sites and under an alternate model disease risk depends on the configuration of alleles, a latent partition of variants into pathogenic and non-pathogenic groups and a mode of inheritance. The method can be used to analyse a dataset comprising thousands of individuals genotyped at hundreds of rare variant sites in a fraction of a second, making it much faster than competing methods and facilitating genome-wide application.
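The model-comparison step described above can be illustrated schematically. This is not BeviMed's code, and the Bayes factor and prior values are invented: it only shows how a posterior probability of association follows from the Bayes factor between an association model and a baseline model together with a prior probability.

```python
import math

def posterior_prob_association(log_bf, prior=0.01):
    """Posterior probability of association given the log Bayes factor of the
    association model over the baseline model and a prior probability."""
    bf = math.exp(log_bf)
    return bf * prior / (bf * prior + (1.0 - prior))

p_weak = posterior_prob_association(0.0)     # no evidence: stays at the prior
p_strong = posterior_prob_association(10.0)  # strong evidence: near certainty
```

The same arithmetic extends to averaging over modes of inheritance and variant partitions by summing the marginal likelihoods of the corresponding sub-models before forming the Bayes factor.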
14

HaMMLeT: An Infinite Hidden Markov Model with Local Transitions

Dawson, Colin Reimer January 2017 (has links)
In classical mixture modeling, each data point is modeled as arising i.i.d. (typically) from a weighted sum of probability distributions. When data arises from different sources that may not give rise to the same mixture distribution, a hierarchical model can allow the source contexts (e.g., documents, sub-populations) to share components while assigning different weights across them (while perhaps coupling the weights to "borrow strength" across contexts). The Dirichlet Process (DP) Mixture Model (e.g., Rasmussen (2000)) is a Bayesian approach to mixture modeling which models the data as arising from a countably infinite number of components: the Dirichlet Process provides a prior on the mixture weights that guards against overfitting. The Hierarchical Dirichlet Process (HDP) Mixture Model (Teh et al., 2006) employs a separate DP Mixture Model for each context, but couples the weights across contexts. This coupling is critical to ensure that mixture components are reused across contexts. An important application of HDPs is to time series models, in particular Hidden Markov Models (HMMs), where the HDP can be used as a prior on a doubly infinite transition matrix for the latent Markov chain, giving rise to the HDP-HMM (first developed, as the "Infinite HMM", by Beal et al. (2001), and subsequently shown to be a case of an HDP by Teh et al. (2006)). There, the hierarchy is over rows of the transition matrix, and the distributions across rows are coupled through a top-level Dirichlet Process. In the first part of the dissertation, I present a formal overview of Mixture Models and Hidden Markov Models. I then turn to a discussion of Dirichlet Processes and their various representations, as well as associated schemes for tackling the problem of doing approximate inference over an infinitely flexible model with finite computational resources. 
I will then turn to the Hierarchical Dirichlet Process (HDP) and its application to an infinite state Hidden Markov Model, the HDP-HMM. These models have been widely adopted in Bayesian statistics and machine learning. However, a limitation of the vanilla HDP is that it offers no mechanism to model correlations between mixture components across contexts. This is limiting in many applications, including topic modeling, where we expect certain components to occur or not occur together. In the HMM setting, we might expect certain states to exhibit similar incoming and outgoing transition probabilities; that is, for certain rows and columns of the transition matrix to be correlated. In particular, we might expect pairs of states that are "similar" in some way to transition frequently to each other. The HDP-HMM offers no mechanism to model this similarity structure. The central contribution of the dissertation is a novel generalization of the HDP-HMM which I call the Hierarchical Dirichlet Process Hidden Markov Model With Local Transitions (HDP-HMM-LT, or HaMMLeT for short), which allows for correlations between rows and columns of the transition matrix by assigning each state a location in a latent similarity space and promoting transitions between states that are near each other. I present a Gibbs sampling scheme for inference in this model, employing auxiliary variables to simplify the relevant conditional distributions, which have a natural interpretation after re-casting the discrete time Markov chain as a continuous time Markov Jump Process where holding times are integrated out, and where some jump attempts "fail". I refer to this novel representation as the Markov Process With Failed Jumps. I test this model on several synthetic and real data sets, showing that for data where transitions between similar states are more common, the HaMMLeT model more effectively finds the latent time series structure underlying the observations.
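One of the standard representations of the Dirichlet Process mentioned in the abstract is the stick-breaking construction of the mixture weights. The sketch below is a generic illustration with numpy, not code from the dissertation:

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """First n_atoms mixture weights of a DP(alpha) via stick-breaking:
    w_i = beta_i * prod_{j<i} (1 - beta_j), with beta_i ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(1)
w = stick_breaking(alpha=2.0, n_atoms=1000, rng=rng)
# The weights are positive, decay quickly on average, and sum to ~1, which
# is what lets a finite truncation approximate the infinite mixture.
```

In the HDP-HMM, a construction of this kind supplies the top-level weights that each row of the doubly infinite transition matrix is then coupled to.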
15

Uncertainty in inverse elasticity problems

Gendin, Daniel I. 27 September 2021 (has links)
The non-invasive differential diagnosis of breast masses through ultrasound imaging motivates the following class of elastic inverse problems: Given one or more measurements of the displacement field within an elastic material, determine the material property distribution within the material. This thesis is focused on uncertainty quantification in inverse problem solutions, with application to inverse problems in linear and nonlinear elasticity. We consider the inverse nonlinear elasticity problem in the context of Bayesian statistics. We show the well-known result that computing the Maximum A Posteriori (MAP) estimate is consistent with previous optimization formulations of the inverse elasticity problem. We show further that certainty in this estimate may be quantified using concepts from information theory, specifically, information gain as measured by the Kullback-Leibler (K-L) divergence and mutual information. A particular challenge in this context is the computational expense associated with computing these quantities. A key contribution of this work is a novel approach that exploits the mathematical structure of the inverse problem and properties of the conjugate gradient method to make these calculations feasible. A focus of this work is estimating the spatial distribution of the elastic nonlinearity of a material. Measurement sensitivity to the nonlinearity is much higher for large (finite) strains than for smaller strains, and so large strains tend to be used for such measurements. Measurements of larger deformations, however, tend to show greater levels of noise. A key finding of this work is that, when identifying nonlinear elastic properties, information gain can be used to characterize a trade-off between larger strains with higher noise levels and smaller strains with lower noise levels. These results can be used to inform experimental design. 
An approach often used to estimate both linear and nonlinear elastic property distributions is to do so sequentially: Use a small strain deformation to estimate the linear properties, and a large strain deformation to estimate the nonlinearity. A key finding of this work is that accurate characterization of the joint posterior probability distribution over both linear and nonlinear elastic parameters requires that the estimates be performed jointly rather than sequentially. All the methods described above are demonstrated in applications to problems in elasticity for both simulated data as well as clinically measured data (obtained in vivo). In the context of the clinical data, we evaluate repeatability of measurements and parameter reconstructions in a clinical setting.
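The information-gain calculations described above reduce, in the simplest one-dimensional Gaussian case, to a closed-form K-L divergence between prior and posterior. A generic sketch, not the thesis's implementation, with invented means and standard deviations:

```python
import math

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) in nats: the
    information gained when a prior N(mu_p, sigma_p^2) is updated to a
    posterior N(mu_q, sigma_q^2)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

# Hypothetical stiffness parameter: a broad prior sharpened into a narrow,
# shifted posterior yields a few nats of information gain.
gain = kl_gaussian(mu_q=1.2, sigma_q=0.05, mu_p=1.0, sigma_p=1.0)
```

An identical prior and posterior give zero gain; the trade-off the thesis describes appears here as the competition between a smaller posterior standard deviation (more gain) and the noisier data that produced it.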
16

Describing Healthcare Service Delivery in a Ryan White Funded HIV Clinic: A Bayesian Mixed Method Case Study

Beane, Stephanie 13 May 2016 (has links)
This dissertation describes health care delivery in a Ryan White Program (RWP) HIV clinic, with a focus on medical home care, using the Bayesian Case Study Method (BCSM). The RWP funds medical care for uninsured HIV patients, and Pappas and colleagues (2014) suggested enhanced HIV care build upon medical home models of care rooted in the RWP. However, little research describes how RWP clinics operate as medical homes. This study developed the BCSM to describe medical home care at a RWP clinic. The BCSM combines a case study framework with Bayesian statistics for a novel approach to mixed-method, descriptive studies. Roberts (2002) and Voils (2009) used mixed-method Bayesian approaches, and this dissertation contributes to this work. For this study, clinic staff and patients participated in interviews and surveys. I used Bayes’ Theorem to combine interview data, by use of subjective priors, with survey data to produce Bayesian posterior means that indicate the extent to which medical home care was provided. Subjective priors facilitate the inclusion of valuable stakeholder belief in posteriors. Using the BCSM, posterior means succinctly describe qualitative and quantitative data in a way other methods of mixing data do not, which is useful for decision makers. Posterior means indicated that coordinated, comprehensive, and ongoing care was provided at the clinic; however, accessible care means were lower, reflecting an area in need of improvement. Interview data collected for subjective priors captured detailed service delivery descriptions. For example, interview data described how medical and support services were coordinated and highlighted the role of social determinants of health (SDH). Namely, coordinated and comprehensive services that addressed SDH, such as access to housing, food, and transportation, were necessary for patients to focus on their HIV and utilize healthcare. 
This case study addressed a gap in the literature regarding descriptions of how RWP clinics provide medical home care. For domains with high posterior means, the associated interview data can be used to plan HIV care in non-RWP settings. Future research should describe other RWP HIV medical homes so this information can be used to plan enhanced HIV care across the healthcare system.
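The Bayesian update at the heart of the BCSM can be illustrated with the standard conjugate Beta-Binomial model, in which an interview-derived subjective prior combines with survey counts to give a posterior mean. The numbers below are hypothetical, not taken from the study:

```python
def posterior_mean(prior_a, prior_b, successes, n):
    """Posterior mean of a Beta(prior_a, prior_b) prior updated with
    `successes` out of `n` Binomial survey responses (conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + n)

# Hypothetical numbers: interviews suggest care is usually coordinated,
# encoded as a Beta(8, 2) prior (mean 0.8); 40 of 50 survey responses agree.
m = posterior_mean(8, 2, 40, 50)   # posterior mean 0.8
```

The prior pseudo-counts (here 8 and 2) are how stakeholder belief enters the posterior; a larger survey progressively outweighs them.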
17

Making Sense of the Noise: Statistical Analysis of Environmental DNA Sampling for Invasive Asian Carp Monitoring Near the Great Lakes

Song, Jeffery W. 01 May 2017 (has links)
Sensitive and accurate detection methods are critical for monitoring and managing the spread of aquatic invasive species, such as invasive Silver Carp (SC; Hypophthalmichthys molitrix) and Bighead Carp (BH; Hypophthalmichthys nobilis) near the Great Lakes. A new detection tool called environmental DNA (eDNA) sampling, the collection and screening of water samples for the presence of the target species’ DNA, promises improved detection sensitivity compared to conventional surveillance methods. However, the application of eDNA sampling for invasive species management has been challenging due to the potential for false positives, i.e., detections of species’ eDNA in the absence of live organisms. In this dissertation, I study the sources of error and uncertainty in eDNA sampling and develop statistical tools to show how eDNA sampling should be utilized for monitoring and managing invasive SC and BH in the United States. In chapter 2, I investigate the environmental and hydrologic variables, e.g. reverse flow, that may be contributing to positive eDNA sampling results upstream of the electric fish dispersal barrier in the Chicago Area Waterway System (CAWS), where live SC are not expected to be present. I used a beta-binomial regression model, which showed that reverse flow volume across the barrier has a statistically significant positive relationship with the probability of SC eDNA detection upstream of the barrier from 2009 to 2012, while other covariates, such as water temperature, season, and chlorophyll concentration, do not. This is a potential alternative explanation for why SC eDNA has been detected upstream of the barrier but intact SC have not. In chapter 3, I develop and parameterize a statistical model to evaluate how changes made to the US Fish and Wildlife Service (USFWS)’s eDNA sampling protocols for invasive BH and SC monitoring from 2013 to 2015 have influenced their sensitivity. 
The model shows that changes to the protocol have caused the sensitivity to fluctuate. Overall, when assuming that eDNA is randomly distributed, the sensitivity of the current protocol is higher for BH eDNA detection and similar for SC eDNA detection compared to the original protocol used from 2009-2012. When assuming that eDNA is clumped, the sensitivity of the current protocol is slightly higher for BH eDNA detection but worse for SC eDNA detection. In chapter 4, I apply the model developed in chapter 3 to estimate the BH and SC eDNA concentration distributions in two pools of the Illinois River where BH and SC are considered to be present, one pool where they are absent, and upstream of the electric barrier in the CAWS given eDNA sampling data and knowledge of the eDNA sampling protocol used in 2014. The results show that the estimated mean eDNA concentrations in the Illinois River are highest in the invaded pools (La Grange; Marseilles) and are lower in the uninvaded pool (Brandon Road). The estimated eDNA concentrations in the CAWS are much lower compared to the concentrations in the Marseilles pool, which indicates that the few eDNA detections in the CAWS (3% of samples positive for SC and 0.4% samples positive for BH) do not signal the presence of live BH or SC. The model shows that >50% samples positive for BH or SC eDNA are needed to infer AC presence in the CAWS, i.e., that the estimated concentrations are similar to what is found in the Marseilles pool. Finally, in chapter 5, I develop a decision tree model to evaluate the value of information that monitoring provides for making decisions about BH and SC prevention strategies near the Great Lakes. The optimal prevention strategy is dependent on prior beliefs about the expected damage of AC invasion, the probability of invasion, and whether or not BH and SC have already invaded the Great Lakes (which is informed by monitoring). 
Given no monitoring, the optimal strategy is to stay with the status quo of operating electric barriers in the CAWS for low probabilities of invasion and low expected invasion costs. However, if the probability of invasion is greater than 30% and the cost of invasion is greater than $100 million a year, the optimal strategy changes to installing an additional barrier in the Brandon Road pool. Greater risk-aversion (i.e., aversion to monetary losses) causes less prevention (e.g., status quo instead of additional barriers) to be preferred. Given monitoring, the model shows that monitoring provides value for making this decision, only if the monitoring tool has perfect specificity (false positive rate = 0%).
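Under the random (non-clumped) eDNA assumption discussed above, a protocol's sensitivity has a simple closed form: the probability that at least one of n independent water samples tests positive. A generic sketch, with an invented per-sample detection probability rather than an estimate from the dissertation:

```python
def protocol_sensitivity(p_per_sample, n_samples):
    """P(at least one positive result) when each of n_samples water samples
    independently detects the target eDNA with probability p_per_sample:
    the 'randomly distributed eDNA' assumption."""
    return 1.0 - (1.0 - p_per_sample) ** n_samples

# Invented per-sample detection probability, purely for illustration:
s = protocol_sensitivity(p_per_sample=0.03, n_samples=50)
```

Clumped eDNA violates the independence assumption behind this formula, which is why the chapter's random and clumped cases give different sensitivities for the same protocol.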
18

High-precision radiocarbon dating of political collapse and dynastic origins at the Maya site of Ceibal, Guatemala

Inomata, Takeshi, Triadan, Daniela, MacLellan, Jessica, Burham, Melissa, Aoyama, Kazuo, Palomo, Juan Manuel, Yonenobu, Hitoshi, Pinzón, Flory, Nasu, Hiroo 07 February 2017 (has links)
The lowland Maya site of Ceibal, Guatemala, had a long history of occupation, spanning from the Middle Preclassic Period through the Terminal Classic (1000 BC to AD 950). The Ceibal-Petexbatun Archaeological Project has been conducting archaeological investigations at this site since 2005 and has obtained 154 radiocarbon dates, which represent the largest collection of radiocarbon assays from a single Maya site. The Bayesian analysis of these dates, combined with a detailed study of ceramics, allowed us to develop a high-precision chronology for Ceibal. Through this chronology, we traced the trajectories of the Preclassic collapse around AD 150–300 and the Classic collapse around AD 800–950, revealing similar patterns in the two cases. Social instability started with the intensification of warfare around 75 BC and AD 735, respectively, followed by the fall of multiple centers across the Maya lowlands around AD 150 and 810. The population of Ceibal persisted for some time in both cases, but the center eventually experienced major decline around AD 300 and 900. Despite these similarities in their diachronic trajectories, the outcomes of these collapses were different, with the former associated with the development of dynasties centered on divine rulership and the latter leading to their downfalls. The Ceibal dynasty emerged during the period of low population after the Preclassic collapse, suggesting that this dynasty was established under the influence of, or by the direct intervention of, an external power.
19

Incorporating high-dimensional exposure modelling into studies of air pollution and health

Liu, Yi January 2015 (has links)
Air pollution is an important determinant of health. There is convincing, and growing, evidence linking the risk of disease, and premature death, with exposure to various pollutants including fine particulate matter and ozone. Knowledge about the health and environmental risks and their trends is an important stimulus for developing environmental and public health policy. In order to perform studies into the risks of environmental hazards on human health, there is a requirement for accurate estimates of the exposures that might be experienced by the populations at risk. In this thesis we develop spatio-temporal models within a Bayesian framework to obtain accurate estimates of such exposures. These models are set within a hierarchical framework in a Bayesian setting, with different levels describing dependencies over space and time. The complexity of hierarchical models and the large amounts of data that can arise from environmental networks mean that inference using Markov chain Monte Carlo (MCMC) may be computationally challenging in this setting. We use both MCMC and Integrated Nested Laplace Approximations (INLA) to implement spatio-temporal exposure models when dealing with high-dimensional data. We also propose an approach for utilising the results from exposure models in health models, which allows them to enhance studies of the health effects of air pollution. Moreover, we investigate the possible effects of preferential sampling, where monitoring sites in environmental networks are preferentially located by the designers in order to assess whether guidelines and policies are being adhered to. This means the data arising from such networks may not accurately characterise the spatio-temporal field they intend to monitor, and as such will not provide accurate estimates of the exposures that are potentially experienced by populations. 
This has the potential to introduce bias into estimates of risk associated with exposure to air pollution and subsequent health impact analyses. Throughout the thesis, the methods developed are assessed using simulation studies and applied to real-life case studies assessing the effects of particulate matter on health in Greater London and throughout the UK.
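The preferential-sampling effect described above can be demonstrated in a few lines: monitors placed where concentrations are highest overestimate the mean of the field, while randomly placed monitors do not. This is a toy one-dimensional simulation, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)

# A one-dimensional "pollution field": a smooth spatial trend plus noise.
x = np.linspace(0.0, 1.0, 500)
field = 10.0 + 5.0 * np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.5, x.size)
true_mean = field.mean()

# Preferential design: 20 monitors placed where concentrations are highest,
# as when a network is sited to check compliance near suspected hot spots.
pref_mean = field[np.argsort(field)[-20:]].mean()

# Non-preferential design: 20 monitors placed uniformly at random.
unif_mean = field[rng.choice(x.size, size=20, replace=False)].mean()
# pref_mean systematically overstates true_mean; unif_mean does not.
```

Correcting for this bias requires modelling the site-selection process jointly with the field, which is the direction the thesis pursues.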
20

Parcimonie dans les modèles Markoviens et application à l'analyse des séquences biologiques / Parsimonious Markov models and application to biological sequence analysis

Bourguignon, Pierre Yves Vincent 15 December 2008 (has links)
Les chaînes de Markov constituent une famille de modèle statistique incontournable dans de nombreuses applications, dont le spectre s'étend de la compression de texte à l'analyse des séquences biologiques. Un problème récurrent dans leur mise en oeuvre face à des données réelles est la nécessité de compromettre l'ordre du modèle, qui conditionne la complexité des interactions modélisées, avec la quantité d'information fournies par les données, dont la limitation impacte négativement la qualité des estimations menées. Les arbres de contexte permettent une granularité fine dans l'établissement de ce compromis, en permettant de recourir à des longueurs de mémoire variables selon le contexte rencontré dans la séquence. Ils ont donné lieu à des outils populaires tant pour l'indexation des textes que pour leur compression (Context Tree Maximisation – CTM - et Context Tree Weighting - CTW). Nous proposons une extension de cette classe de modèles, en introduisant les arbres de contexte parcimonieux, obtenus par fusion de noeuds issus du même parent dans l'arbre. Ces fusions permettent une augmentation radicale de la granularité de la sélection de modèle, permettant ainsi de meilleurs compromis entre complexité du modèle et qualité de l'estimation, au prix d'une extension importante de la quantité de modèles mise en concurrence. Cependant, grâce à une approche bayésienne très similaire à celle employée dans CTM et CTW, nous avons pu concevoir une méthode de sélection de modèles optimisant de manière exacte le critère bayésien de sélection de modèles tout en bénéficiant d'une programmation dynamique. Il en résulte un algorithme atteignant la borne inférieure de la complexité du problème d'optimisation, et pratiquement tractable pour des alphabets de taille inférieure à 10 symboles. Diverses démonstrations de la performance atteinte par cette procédure sont fournies en dernière partie. 
/ Markov chains, as a universal model accounting for finite memory, discrete valued processes, are omnipresent in applied statistics. Their applications range from text compression to the analysis of biological sequences. Their practical use with finite samples, however, systematically requires a compromise between the memory length of the model used, which conditions the complexity of the interactions the model may capture, and the amount of information carried by the data, whose limitation negatively impacts the quality of estimation. Context trees, as an extension of the model class of Markov chains, provide the modeller with a finer granularity in this model selection process, by allowing the memory length to vary across contexts. Several popular modelling methods are based on this class of models, in fields such as text indexation and text compression (Context Tree Maximization and Context Tree Weighting). We propose an extension of the class of context tree models, the parsimonious context trees, which further allow the fusion of sibling nodes in the context tree. They provide the modeller with a yet finer granularity for performing the model selection task, at the cost of an increased computational burden. Thanks to a Bayesian approach to this problem borrowed from compression techniques, we succeeded in designing an algorithm that exactly optimizes the Bayesian criterion, while benefiting from a dynamic programming scheme that ensures the minimisation of the computational complexity of the model selection task. This algorithm is able to perform in reasonable space and time on alphabets up to size 10, and has been applied to diverse datasets to demonstrate the good performance achieved by this approach.
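The context-tree idea can be sketched as a lookup from the longest matching suffix of the history to a next-symbol distribution. This is a toy variable-length Markov model with invented contexts and probabilities, not the thesis's algorithm; parsimonious context trees additionally allow sibling contexts to be fused into one node, which this sketch omits.

```python
# A toy context tree: map each stored context (a suffix of the history) to a
# next-symbol distribution. Contexts and probabilities are invented.
tree = {
    "": {"a": 0.5, "b": 0.5},     # root: fallback when nothing longer matches
    "a": {"a": 0.1, "b": 0.9},    # after "a", predict "b" with high probability
    "ba": {"a": 0.8, "b": 0.2},   # a longer context overrides its suffix "a"
}

def next_dist(history, tree):
    """Return the distribution attached to the longest matching context."""
    for k in range(len(history), -1, -1):     # try suffixes, longest first
        ctx = history[len(history) - k:]
        if ctx in tree:
            return tree[ctx]
    return tree[""]

d = next_dist("ba", tree)    # matches the length-2 context "ba"
```

Varying which contexts are stored in the tree is exactly the model-selection problem the thesis optimises over with its exact Bayesian dynamic programming scheme.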
