1051

Bayesian variable selection for linear mixed models when p is much larger than n with applications in genome wide association studies

Williams, Jacob Robert Michael 05 June 2023 (has links)
Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) causing phenotypic responses in individuals. Commonly, GWAS analyses are performed using single marker association testing (SMA), which investigates the effect of a single SNP at a time and selects a candidate set of SNPs under a strict multiple-testing correction penalty. Because SNPs are not independent but strongly correlated, SMA methods lead to false discovery rates (FDR) so high that the results are difficult for wet-lab scientists to use. To address this problem, the dissertation proposes three novel Bayesian methods: BICOSS, BGWAS, and IEB. From a Bayesian modeling point of view, SNP search can be seen as a variable selection problem in linear mixed models (LMMs) where $p$ is much larger than $n$. To deal with the $p \gg n$ issue, our three proposed methods use novel Bayesian approaches based on two steps: a screening step and a model selection step. To control false discoveries, we link the screening and model selection steps through a common probability of a null SNP. For model selection, we propose novel priors that extend to LMMs the nonlocal priors, the Zellner g-prior, the unit information prior, and the Zellner-Siow prior. For each method, extensive simulation studies and case studies show improved recall of true causal SNPs and, more importantly, a drastically decreased FDR. Because our Bayesian methods provide more focused and precise results, they may speed up the discovery of important SNPs and significantly contribute to scientific progress in biology, agricultural productivity, and human health. / Doctor of Philosophy / Genome-wide association studies (GWAS) seek to identify locations in DNA, known as single nucleotide polymorphisms (SNPs), that are the underlying cause of observable traits such as height or breast cancer. Commonly, GWAS analyses are performed by investigating each SNP individually and seeing which SNPs are highly correlated with the response. However, as the SNPs themselves are highly correlated, investigating each one individually leads to a high number of false positives. To address this problem, the dissertation proposes three advanced statistical methods: BICOSS, BGWAS, and IEB. Through extensive simulations, our methods are shown not only to drastically reduce the number of falsely detected SNPs but also to increase the detection rate of true causal SNPs. Because our novel methods provide more focused and precise results, they may speed up the discovery of important SNPs and significantly contribute to scientific progress in biology, agricultural productivity, and human health.
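As a rough illustration of the two-step idea (screen, then select), the following Python sketch screens SNPs by marginal association and then scores small candidate models with BIC. It is a toy stand-in on invented data, not the BICOSS/BGWAS/IEB implementation, which uses LMMs and the priors named above.

# Toy sketch of a screening step followed by a model selection step.
# Invented data; BIC stands in for the dissertation's LMM-based priors.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p = 200, 2000                                # n subjects, p >> n SNPs
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[[5, 40, 300]] = 0.8                        # three causal SNPs
y = X @ beta + rng.normal(size=n)

# Step 1: screening -- marginal association of each SNP with the phenotype.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
candidates = np.argsort(-np.abs(r))[:10]        # keep the 10 strongest SNPs

# Step 2: model selection -- exhaustive BIC search over small candidate models.
def bic(cols):
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + Z.shape[1] * np.log(n)

best = min((c for k in range(1, 5) for c in combinations(candidates, k)), key=bic)
print("selected SNPs:", sorted(best))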
1052

Phylogenetic Niche Modeling

McHugh, Sean W. 01 September 2021 (has links)
Projecting environmental niche models through time is a common goal when studying species' responses to climatic change. Species distribution models (SDMs) are commonly used to estimate a species' niche from observed patterns of occurrence and environmental predictors. However, a species' niche is also shaped by non-environmental factors, including biotic interactions and dispersal barriers, which truncate SDM estimates. Though truncated SDMs may accurately predict the present-day niche, projections through time are often biased when environmental conditions change. Modeling niche in a phylogenetic framework leverages a clade's shared evolutionary history to pull species estimates closer to phylogenetically conserved values and away from species-specific biases. We propose a new Bayesian model of phylogenetic niche implemented in R. Under our model, species' SDM parameters are transformed into biologically interpretable continuous parameters of environmental niche optimum, breadth, and tolerance evolving under a multivariate Brownian motion random walk. Through simulation analyses, we demonstrated model accuracy and precision that improved as phylogeny size increased. We also demonstrated our model on a clade of eastern United States plethodontid salamanders, accurately estimating species' niches even for species with no occurrence data. Our model provides a novel framework in which niche changes can be studied forwards and backwards through time to understand ancestral ranges, patterns of environmental specialization, and the niches of data-deficient species. / Master of Science / As many species face increasing pressure in a changing climate, it is crucial to understand the set of environmental conditions that shape species' ranges, known as the environmental niche, to guide conservation and land management practices. Species distribution models (SDMs) are common tools used to model species' environmental niches. These models treat a species' probability of occurrence as a function of environmental conditions. SDM niche estimates can predict a species' range given current climate data, paleoclimate, or projections of future climate change, allowing range shifts to be estimated from the past into the future. However, SDM estimates are often biased by non-environmental factors shaping a species' range, including competitive divergence or dispersal barriers. Biased SDM estimates can result in range predictions that get worse as we extrapolate beyond the observed climatic conditions. One way to overcome these biases is to leverage the shared evolutionary history among related species to "fill in the gaps". Species that are more closely related often have more similar, or "conserved", environmental niches. By estimating environmental niche jointly over all species in a clade, we can leverage niche conservatism to produce more biologically realistic estimates. However, a methodological gap currently exists between SDM estimates and macroevolutionary models, preventing them from being estimated jointly. We propose a novel model of niche evolution called PhyNE (Phylogenetic Niche Evolution), in which biologically realistic environmental niches are fit across a set of species with occurrence data while simultaneously fitting and leveraging a model of evolution across a portion of the tree of life. We evaluated model accuracy, bias, and precision through simulation analyses; accuracy and precision increased with phylogeny size, and the model effectively recovered its parameters.
We then applied PhyNE to plethodontid salamanders from eastern North America. This ecologically important and diverse group of lungless salamanders requires cold, wet conditions, and its species' distributions are strongly affected by climate. Species within the family vary greatly in distribution: some are wide-ranging generalists, while others are hyper-endemics that inhabit single mountains in the Southern Appalachians with restricted thermal and hydric conditions. We fit PhyNE to occurrence data for these species and their associated average annual precipitation and temperature data. We identified no correlation between species' environmental preference and specialization. Patterns of preference and specialization varied among plethodontid species groups, with more aquatic species possessing broader environmental niches, likely because aquatic microclimates facilitate occurrence across a wider range of conditions. We demonstrated the effectiveness of PhyNE's evolutionarily informed estimates of environmental niche, even when species' occurrence data are limited or absent. PhyNE establishes a proof-of-concept framework for a new class of approaches for studying niche evolution, including improved methods for estimating niche in data-deficient species, historical reconstructions, future predictions under climate change, and evaluation of niche evolutionary processes across the tree of life. Our approach establishes a framework for leveraging the rapidly growing availability of biodiversity data and molecular phylogenies to make robust eco-evolutionary predictions and assessments of species' niches and distributions in a rapidly changing world.
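To make the evolutionary assumption concrete, here is a small Python sketch (not PhyNE itself) that simulates niche optimum and breadth evolving under multivariate Brownian motion along a hand-coded toy phylogeny; the tree, rate matrix, and root state are all invented.

# Toy simulation of (optimum, log-breadth) evolving by Brownian motion
# down a phylogeny; each branch adds a Gaussian step scaled by its length.
import numpy as np

rng = np.random.default_rng(1)
# tree as (parent_index, branch_length); node 0 is the root
tree = [(None, 0.0), (0, 1.0), (0, 1.0), (1, 0.5), (1, 0.5)]
Sigma = np.array([[1.0, 0.2],        # BM rate matrix for (optimum, log-breadth)
                  [0.2, 0.3]])

states = {0: np.array([15.0, np.log(4.0)])}   # root niche: 15 C optimum, breadth 4
for node, (parent, bl) in enumerate(tree):
    if parent is None:
        continue
    step = rng.multivariate_normal(np.zeros(2), Sigma * bl)
    states[node] = states[parent] + step

for node in (2, 3, 4):                # the tip nodes of this toy tree
    opt, logb = states[node]
    print(f"tip {node}: optimum={opt:.1f}, breadth={np.exp(logb):.1f}")

Because closely related tips share most of their path from the root, their simulated niches are correlated, which is exactly the conservatism the model exploits when occurrence data are sparse.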
1053

SCALABLE BAYESIAN METHODS FOR PROBABILISTIC GRAPHICAL MODELS

Chuan Zuo (18429759) 25 April 2024 (has links)
<p dir="ltr">In recent years, probabilistic graphical models have emerged as a powerful framework for understanding complex dependencies in multivariate data, offering a structured approach to tackle uncertainty and model complexity. These models have revolutionized the way we interpret the interplay between variables in various domains, from genetics to social network analysis. Inspired by the potential of probabilistic graphical models to provide insightful data analysis while addressing the challenges of high-dimensionality and computational efficiency, this dissertation introduces two novel methodologies that leverage the strengths of graphical models in high-dimensional settings. By integrating advanced inference techniques and exploiting the structural advantages of graphical models, we demonstrate how these approaches can efficiently decode complex data patterns, offering significant improvements over traditional methods. This work not only contributes to the theoretical advancements in the field of statistical data analysis but also provides practical solutions to real-world problems characterized by large-scale, complex datasets.</p><p dir="ltr">Firstly, we introduce a novel Bayesian hybrid method for learning the structure of Gaus- sian Bayesian Networks (GBNs), addressing the critical challenge of order determination in constraint-based and score-based methodologies. By integrating a permutation matrix within the likelihood function, we propose a technique that remains invariant to data shuffling, thereby overcoming the limitations of traditional approaches. Utilizing Cholesky decompo- sition, we reparameterize the log-likelihood function to facilitate the identification of the parent-child relationship among nodes without relying on the faithfulness assumption. This method efficiently manages the permutation matrix to optimize for the sparsest Cholesky factor, leveraging the Bayesian Information Criterion (BIC) for model selection. Theoretical analysis and extensive simulations demonstrate the superiority of our method in terms of precision, recall, and F1-score across various network complexities and sample sizes. Specifically, our approach shows significant advantages in small-n-large-p scenarios, outperforming existing methods in detecting complex network structures with limited data. Real-world applications on datasets such as ECOLI70, ARTH150, MAGIC-IRRI, and MAGIC-NIAB further validate the effectiveness and robustness of our proposed method. Our findings contribute to the field of Bayesian network structure learning by providing a scalable, efficient, and reliable tool for modeling high-dimensional data structures.</p><p dir="ltr">Secondly, we introduce a Bayesian methodology tailored for Gaussian Graphical Models (GGMs) that bridges the gap between GBNs and GGMs. Utilizing the Cholesky decomposition, we establish a novel connection that leverages estimated GBN structures to accurately recover and estimate GGMs. This innovative approach benefits from a theoretical foundation provided by a theorem that connects sparse priors on Cholesky factors with the sparsity of the precision matrix, facilitating effective structure recovery in GGMs. To assess the efficacy of our proposed method, we conduct comprehensive simulations on AR2 and circle graph models, comparing its performance with renowned algorithms such as GLASSO, CLIME, and SPACE across various dimensions. 
Our evaluation, based on metrics like estimation ac- curacy and selection correctness, unequivocally demonstrates the superiority of our approach in accurately identifying the intrinsic graph structure. The empirical results underscore the robustness and scalability of our method, underscoring its potential as an indispensable tool for statistical data analysis, especially in the context of complex datasets.</p>
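The Cholesky link the abstract relies on can be seen in a few lines of Python. This toy example is not the dissertation's method: it assumes a known variable ordering rather than searching over permutations, and it recovers a simple chain DAG from the Cholesky factor of the estimated precision matrix.

# For a chain x0 -> x1 -> ... -> x4 with this variable ordering, the
# precision matrix is tridiagonal, so its Cholesky factor is sparse and
# its off-diagonal pattern mirrors the parent-child edges.
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
X = np.empty((n, d))
X[:, 0] = rng.normal(size=n)
for j in range(1, d):
    X[:, j] = 0.8 * X[:, j - 1] + rng.normal(size=n)

prec = np.linalg.inv(np.cov(X, rowvar=False))     # estimated precision matrix
L = np.linalg.cholesky(prec)                      # lower-triangular factor
edges = [(i, j) for i in range(d) for j in range(i)
         if abs(L[i, j]) > 0.3]                   # threshold small entries
print("recovered (child, parent) pairs:", edges)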
1054

A Bayesian statistics approach to updating finite element models with frequency response data

Lindholm, Brian Eric 06 June 2008 (has links)
This dissertation addresses the task of updating finite element models with frequency response data acquired in a structural dynamics test. Standard statistical techniques are used to generate statistically qualified data, which are then used in a Bayesian regression formulation to update the finite element model. The Bayesian formulation allows the analyst to incorporate engineering judgment (in the form of prior knowledge) into the analysis and helps ensure that reasonable and realistic answers are obtained. The formulation includes true statistical weights derived from experimental data as well as a new formulation of the Bayesian regression problem that reduces the effects of numerical ill-conditioning. Model updates are performed on a simulated free-free beam, a simple steel frame, and a cantilever beam. Improved finite element models of the structures are obtained, and several statistical tests are used to verify that the models are indeed improved. / Ph. D.
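For readers unfamiliar with the setup, the following generic linear-Gaussian sketch shows how a prior on model parameters combines with statistically weighted test data. It is illustrative only, with invented numbers, and does not reproduce the dissertation's frequency-response formulation.

# Bayesian update for a linear model y = A theta + noise: the posterior
# blends the engineering-judgment prior with data weighted by measurement
# variance, which also regularizes ill-conditioned A.
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([2.0, -1.0])
A = rng.normal(size=(30, 2))                  # sensitivity matrix (illustrative)
noise_var = 0.05 * np.ones(30)                # measurement variances (the "weights")
y = A @ theta_true + rng.normal(scale=np.sqrt(noise_var))

theta0 = np.array([1.5, -0.5])                # prior mean: engineering judgment
P0 = np.diag([0.25, 0.25])                    # prior covariance

W = np.diag(1.0 / noise_var)
post_cov = np.linalg.inv(np.linalg.inv(P0) + A.T @ W @ A)
post_mean = post_cov @ (np.linalg.inv(P0) @ theta0 + A.T @ W @ y)
print("posterior mean:", post_mean, "true:", theta_true)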
1055

Comparison of two drugs by multiple stage sampling using Bayesian decision theory

Smith, Armand V. 02 February 2010 (has links)
The general problem considered in this thesis is to determine an optimum strategy for deciding how to allocate the observations in each stage of a multi-stage experimental procedure between two binomial populations (e.g., the numbers of successes for two drugs) on the basis of the results of previous stages. After all of the stages of the experiment have been performed, one must make the terminal decision of which of the two populations has the higher probability of success. The strategy is to be optimum relative to a given loss function, and a prior distribution, or weighting function, for the probabilities of success for the two populations is assumed. Two general classes of loss functions are considered, and it is assumed that the total number of observations in each stage is fixed prior to the experiment. To find the optimum strategy, a method of analysis called extensive-form analysis is used. This is essentially a method for enumerating all the possible outcomes and corresponding strategies and choosing the optimum strategy for a given outcome. However, this method of analysis is much too long for all but small examples, even when a digital computer is used. Because of this difficulty, two alternative procedures, which are approximations to extensive-form analysis, are proposed. In the stage-by-stage procedure, one assumes at each stage that it is the last stage of the multi-stage procedure and allocates the observations to the two populations accordingly; it is shown that this is equivalent to assuming at each stage that one has a one-stage procedure. In the approximate procedure, one (approximately) minimizes the posterior variance of the difference between the probabilities of success for the two populations at each stage. The computations for this procedure are quite simple to perform. The stage-by-stage procedure is also considered for the case in which the two populations are normal with known variance rather than binomial. It is then shown that the approximate procedure can be derived as an approximation to the stage-by-stage procedure when normal approximations to binomial distributions are used. The three procedures are compared with each other, and with equal division of the observations, in several examples by computing the probability of making the correct terminal decision for various values of the population parameters (the probabilities of success). These computations assume that the prior distributions of the population parameters are rectangular distributions and that the loss functions are symmetric; i.e., the losses are as great for one wrong terminal decision as for the other. The computations show that, for the examples studied, there is relatively little loss in using the stage-by-stage procedure rather than extensive-form analysis and relatively little gain in using the approximate procedure instead of equal division of the observations. However, there is a relatively large loss in using the approximate procedure rather than the stage-by-stage procedure when the population parameters are close to 0 or 1. At first it is assumed that there are a fixed number of stages in the experiment, but later in the thesis this restriction is weakened so that only the maximum possible number of stages is fixed and the experiment can be stopped at any stage before the last possible stage is reached. Stopping rules for the stage-by-stage and approximate procedures are then derived. / Ph. D.
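A compact Python sketch of the approximate procedure follows, under rectangular (Beta(1,1)) priors and with losses and stopping rules simplified away. The allocation rule used here, giving each observation to the arm whose posterior variance dominates, is one simple way to approximately minimize the posterior variance of the difference.

# Two-arm allocation with Beta posteriors; invented success rates.
import numpy as np

rng = np.random.default_rng(4)
p = (0.65, 0.55)                       # unknown true success rates
a = [1.0, 1.0]
b = [1.0, 1.0]                         # Beta(1,1) priors ~ rectangular

def beta_var(a_i, b_i):
    s = a_i + b_i
    return a_i * b_i / (s * s * (s + 1.0))

for stage in range(5):                 # 5 stages of 20 observations each
    for _ in range(20):
        # give the next observation to the arm with the larger posterior variance
        i = 0 if beta_var(a[0], b[0]) >= beta_var(a[1], b[1]) else 1
        success = rng.random() < p[i]
        a[i] += success
        b[i] += 1 - success

means = [a[i] / (a[i] + b[i]) for i in range(2)]
print("posterior means:", means, "-> choose drug", 1 + int(means[1] > means[0]))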
1056

Influence of the Estimator Selection in Scalloped Hammerhead Shark Stock Assessment

Ballesta Artero, Irene Maria 13 January 2014 (has links)
In the natural sciences, the frequentist paradigm has long dominated statistical practice; however, the Bayesian approach has been gaining strength in recent decades. Our study assessed the scalloped hammerhead shark population of the western North Atlantic Ocean using Bayesian methods. This approach allowed us to incorporate diverse types of errors in the surplus production model and to compare the influence of different statistical estimators on the values of the key parameters (r, growth rate; K, carrying capacity; depletion; FMSY, the fishing level that would sustain maximum yield; and NMSY, abundance at maximum sustainable yield). Furthermore, we considered multi-level priors owing to the variety of published results on the population growth rate of this species. Our research showed that estimator selection influences the results of the surplus production model and, therefore, the values of the target management points. Based on key parameter estimates with uncertainty and the Deviance Information Criterion, we suggest that state-space Bayesian models be used for assessing scalloped hammerhead shark and other fish stocks with poor data availability. This study found the population to be overfished and experiencing overfishing. Therefore, based on our research and on the very low evidence of recovery in the most recent data available, we suggest a prohibition on fishing for this species because: (1) it is highly depleted (at 14% of its initial population); (2) the fishery status is very unstable over time; (3) its low reproductive rate contributes to a higher risk of overexploitation; and (4) it is easily misidentified among the different hammerhead sharks (smooth, great, scalloped, and cryptic species). / Master of Science
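The surplus production dynamics at the core of such an assessment can be written in a few lines. This sketch uses the Schaefer form with invented parameter values and omits the Bayesian state-space fitting entirely.

# Schaefer surplus production: B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t].
# MSY = r*K/4 and B_MSY = K/2 follow from this form; numbers are illustrative.
import numpy as np

r, K = 0.08, 1.0e5            # hypothetical growth rate and carrying capacity
catches = np.full(30, 3000.0) # hypothetical annual catch series

B = np.empty(len(catches) + 1)
B[0] = K                      # start at unfished biomass
for t, C in enumerate(catches):
    B[t + 1] = max(B[t] + r * B[t] * (1.0 - B[t] / K) - C, 1.0)

print(f"MSY       = {r * K / 4:.0f}")      # maximum sustainable yield
print(f"B_MSY     = {K / 2:.0f}")          # biomass at MSY
print(f"depletion = {B[-1] / K:.2f}")      # final biomass relative to K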
1057

Extensions of Weighted Multidimensional Scaling with Statistics for Data Visualization and Process Monitoring

Kodali, Lata 04 September 2020 (has links)
This dissertation is the compilation of two major innovations that rely on a common technique known as multidimensional scaling (MDS). MDS is a dimension-reduction method that takes high-dimensional data and creates low-dimensional versions. Project 1: Visualizations are useful when learning from high-dimensional data. However, visualizations, just as any data summary, can be misleading when they do not incorporate measures of uncertainty, e.g., uncertainty from the data or from the dimension-reduction algorithm used to create the visual display. We incorporate uncertainty into visualizations created by a weighted version of MDS called WMDS. Uncertainty exists in these visualizations in the variable weights, the coordinates of the display, and the fit of WMDS. We quantify these uncertainties using Bayesian models in a method we call Informative Probabilistic WMDS (IP-WMDS). Visually, we display estimated uncertainty in the form of color and ellipses; practically, these uncertainties reflect trust in WMDS. Our results show that these displays of uncertainty highlight different aspects of the visualization, which can help inform analysts. Project 2: Analysis of network data has emerged as an active research area in statistics. Much of the ongoing research has focused on static networks that represent a single snapshot or aggregated historical data unchanging over time. However, most networks result from temporally evolving systems that exhibit intrinsic dynamic behavior. Monitoring such temporally varying networks to detect anomalous changes has applications in both the social and physical sciences. In this work, we simulate data from models that rely on MDS, and we perform an evaluation study of the use of summary statistics for anomaly detection by incorporating principles from statistical process monitoring. In contrast to most previous studies, we deliberately incorporate temporal autocorrelation in our study. Other considerations in our comprehensive assessment include the type and duration of the anomaly, the model type, and sparsity in temporally evolving networks. We conclude that summary statistics can be valuable tools for network monitoring and often perform better than more involved techniques. / Doctor of Philosophy / In this work, two main ideas in data visualization and anomaly detection in dynamic networks are further explored. For both ideas, the connecting theme is extensions of a method called Multidimensional Scaling (MDS). MDS is a dimension-reduction method that takes high-dimensional data (all $p$ dimensions) and creates a low-dimensional projection of the data. That is, relationships in a dataset with presumably a large number of dimensions or variables can be summarized into a smaller number of dimensions, e.g., two. For a given dataset, an analyst could initially use a scatterplot to observe the relationship between two variables. Then, by coloring points, changing their size, or using different shapes, perhaps another three or four variables (around seven in total) may be shown in the scatterplot. An advantage of MDS (or any dimension-reduction technique) is that relationships among the data can be viewed easily in a scatterplot regardless of the number of variables in the data. The interpretation of any MDS plot is that observations that are close together are relatively more similar than observations that are farther apart, i.e., proximity in the scatterplot indicates relative similarity.
In the first project, we use a weighted version of MDS called Weighted Multidimensional Scaling (WMDS), in which weights indicating a sense of importance are placed on the variables of the data. The problem with any WMDS plot is that inaccuracies of the method are not included in the plot. For example, is an observation that appears to be an outlier really an outlier? An analyst cannot confirm this without further context. Thus, we created a model to calculate, visualize, and interpret such inaccuracy or uncertainty in WMDS plots. Such modeling efforts help analysts facilitate exploratory data analysis. In the second project, the theme of MDS is extended to an application with dynamic networks. Dynamic networks are multiple snapshots of pairwise interactions (represented as edges) among a set of nodes (observations). Over time, changes may appear in some of the snapshots. We aim to detect such changes using a process-monitoring approach on dynamic networks. Statistical monitoring approaches determine thresholds for in-control or expected behavior, calculated from data with no signal, and then use the in-control thresholds to monitor newly collected data. We applied this approach to dynamic network data, and we used a detailed simulation study to better understand the performance of such monitoring. For the simulation study, data are generated from dynamic network models that use MDS. We found that monitoring summary statistics of the network was quite effective on data generated from these models. Thus, simple tools may be used as a first step to anomaly detection in dynamic networks.
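As a toy illustration of Project 2's conclusion, the following sketch monitors one summary statistic (edge density) of a sequence of simulated networks with a 3-sigma control chart. The network model and change point are invented and far simpler than those in the study.

# Phase-I limits are estimated from early in-control snapshots, then each
# later snapshot's density is compared against them.
import numpy as np

rng = np.random.default_rng(5)
n_nodes, t_total, t_anomaly = 50, 100, 80

def density(p_edge):
    A = np.triu(rng.random((n_nodes, n_nodes)) < p_edge, 1)
    return A.sum() / (n_nodes * (n_nodes - 1) / 2)

# in-control density 0.10; the anomaly raises it to 0.14 from t = 80 on
stats = np.array([density(0.10 if t < t_anomaly else 0.14) for t in range(t_total)])

mu, sigma = stats[:50].mean(), stats[:50].std(ddof=1)   # phase-I estimates
alarms = np.where(np.abs(stats - mu) > 3 * sigma)[0]    # 3-sigma chart
print("alarms at time steps:", alarms)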
1058

NOISE AWARE BAYESIAN PARAMETER ESTIMATION IN BIOPROCESSES: USING NEURAL NETWORK SURROGATE MODELS WITH NON-UNIFORM DATA SAMPLING

Weir, Lauren January 2024 (has links)
This thesis demonstrates a parameter estimation technique for bioprocesses that uses the measurement noise in experimental data to determine credible intervals on parameter estimates, information that is of potential use in prediction, robust control, and optimization. To determine these estimates, the work implements Bayesian inference using nested sampling and presents an approach for developing neural network (NN) based surrogate models. To address the challenges associated with non-uniform sampling of experimental measurements, an NN structure is proposed. The resultant surrogate model is used within a nested sampling algorithm that draws candidate parameter values from the parameter space and uses the NN to compute model outputs for a likelihood function based on the joint probability distribution of the measurement noise of the output variables. The method is illustrated first on simulated data and then on experimental data from a Sartorius fed-batch bioprocess. Results demonstrate the feasibility of the proposed technique to enable rapid parameter estimation for bioprocesses. / Thesis / Master of Applied Science (MASc) / Bioprocesses require models that can be developed quickly for rapid production of desired pharmaceuticals. Parameter estimation is necessary for these models, especially first-principles models, and generating parameter estimates with credible intervals is important for model-based control. Challenges with parameter estimation that must be addressed include non-uniform sampling and measurement noise in experimental data. This thesis demonstrates a method of parameter estimation that generates parameter estimates with credible intervals by incorporating the measurement noise in experimental data, while also employing a dynamic neural network surrogate model that can process non-uniformly sampled data. The proposed technique implements Bayesian inference using nested sampling and was tested against both simulated and real experimental fed-batch data.
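The following condensed Python sketch shows the shape of the approach: a cheap analytical function stands in for the trained NN surrogate, a Gaussian likelihood encodes the measurement noise at non-uniform sample times, and a toy nested-sampling loop replaces the worst live point with a higher-likelihood prior draw. All models, priors, and data here are invented; a real application would use a trained surrogate and a production sampler.

# Toy nested-sampling loop with a surrogate-based, noise-aware likelihood.
import numpy as np

rng = np.random.default_rng(6)
t_obs = np.array([0.0, 0.5, 1.3, 2.0, 3.1])        # non-uniform sample times

def surrogate(theta, t):                           # stand-in for the NN surrogate
    mu_max, x0 = theta
    return x0 * np.exp(mu_max * t)                 # simple exponential growth

theta_true = np.array([0.4, 1.0])
noise_sd = 0.1
y_obs = surrogate(theta_true, t_obs) + rng.normal(scale=noise_sd, size=t_obs.size)

def log_like(theta):                               # Gaussian measurement-noise likelihood
    resid = y_obs - surrogate(theta, t_obs)
    return -0.5 * np.sum(resid**2) / noise_sd**2

def prior_draw(size):                              # uniform priors on both parameters
    return rng.uniform([0.0, 0.5], [1.0, 1.5], size=(size, 2))

live = prior_draw(100)
for _ in range(600):                               # toy nested-sampling iterations
    ll = np.array([log_like(th) for th in live])
    worst = np.argmin(ll)
    while True:                                    # rejection-sample above the threshold
        cand = prior_draw(1)[0]
        if log_like(cand) > ll[worst]:
            live[worst] = cand
            break

print("mean of live points:", live.mean(axis=0), "true:", theta_true)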
1059

Incremental Learning approaches to Biomedical decision problems

Tortajada Velert, Salvador 21 September 2012 (has links)
During the last decade, a new trend in medicine has been transforming the nature of healthcare from reactive to proactive. This new paradigm, known as P4 medicine, is a shift toward personalized medicine, in which the prevention, diagnosis, and treatment of disease are focused on individual patients. Among other key benefits, P4 medicine aspires to detect diseases at an early stage and to stratify patients and diseases so that the optimal therapy can be selected on the basis of individual observations, taking patient outcomes into account to empower the physician, the patient, and their communication. This transformation relies on the availability of complex multi-level biomedical data that are increasingly accurate, since it is possible to find exactly the information needed, but also increasingly noisy, since access to that information is ever more challenging. To take advantage of this information, a major effort has been made in recent decades to digitalize medical records and to develop new mathematical and computational methods for extracting maximum knowledge from patient records, building dynamic and disease-predictive models from massive amounts of integrated clinical and biomedical data. These developments enable the use of computer-assisted Clinical Decision Support Systems for the management of individual patients. Clinical Decision Support Systems (CDSS) are computational systems that provide precise and specific knowledge for the medical decisions to be adopted in the diagnosis, prognosis, treatment, and management of patients. CDSS are closely related to the concept of evidence-based medicine, since they infer medical knowledge from the biomedical databases and acquisition protocols used to develop the systems, give evidence-based computational support for clinical practice, and evaluate the performance and added value of the solution for each specific medical problem. / Tortajada Velert, S. (2012). Incremental Learning approaches to Biomedical decision problems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17195
1060

A Model-Based Reliability Analysis Method Using Bayesian Network

Kabir, Sohag, Campean, Felician 10 December 2021 (has links)
Bayesian Network (BN)-based methods are increasingly used in system reliability analysis. While BNs enable multiple analyses to be performed on a single model, the construction of robust BN models relies either on conversion from other intermediate system model structures or on direct analyst-led development based on expert input, both of which require significant human effort. This article proposes an architecture-model-based approach for the direct generation of a BN model. Given the architectural model of a system, a systematic bottom-up approach is suggested, underpinned by failure behaviour models of components composed according to interaction models to create a system-level failure behaviour model. Interoperability and reusability of models are supported by a library of component failure models. The approach is illustrated with a case study of a steam boiler system.
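As a minimal illustration of the bottom-up composition idea (not the article's method or models), this sketch composes invented component failure probabilities through a hypothetical interaction model for a two-pump boiler system and computes the system failure probability by enumeration.

# Enumerate all component states, weight each by its probability, and sum
# the probability of states in which the interaction model fails the system.
import itertools

p_fail = {"pump1": 0.05, "pump2": 0.05, "sensor": 0.02}

def system_fails(pump1, pump2, sensor):
    # hypothetical interaction model: both pumps down, or sensor down
    return (pump1 and pump2) or sensor

total = 0.0
for states in itertools.product([0, 1], repeat=3):
    prob = 1.0
    for comp, s in zip(p_fail, states):
        prob *= p_fail[comp] if s else 1.0 - p_fail[comp]
    if system_fails(*states):
        total += prob

print(f"P(system failure) = {total:.4f}")   # approx. 1 - (1 - 0.0025)*(1 - 0.02)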
