231

Computational Modeling for Differential Analysis of RNA-seq and Methylation Data

Wang, Xiao 16 August 2016
Computational systems biology is an interdisciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution across multiple disciplines. However, extracting biologically meaningful information from the overwhelming amount of data generated by biological systems remains a major challenge, and effective computational approaches are urgently needed to reveal the functional components. In this dissertation, we develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancer. We first develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso jointly models the variability of RNA-seq data and the differential states of isoforms: it not only accounts for the variability of the data but also treats the differential states of isoforms as hidden variables in the differential analysis. The differential states are estimated jointly with the other model parameters through a sampling process, improving performance in detecting isoforms that are only weakly differentially expressed. We then develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. DM-BLD features a hierarchical model, built upon Markov random field models, that captures both the local dependency of measured loci and the dependency among methylation changes. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. The differential methylation score of a gene is then calculated from the estimated methylation changes of the involved CpG sites, and the significance of genes is assessed by permutation-based statistical tests. We demonstrate the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq and methylation data: joint estimation of the posterior distributions of the variables and model parameters via sampling is particularly effective at detecting isoforms and methylated genes with weak differential signals. Applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence and point toward new molecular targets for breast cancer treatment. / Ph. D.
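For readers who want the flavor of the hidden differential-state idea, the sketch below is a deliberately minimal stand-in for BayesIso, not the published implementation: each isoform's differential indicator is sampled from its posterior odds under a conjugate normal model of log-expression. The likelihood, priors, and data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_marglik(x, mu0=0.0, tau2=10.0, s2=1.0):
    """Log marginal likelihood of x under x_i ~ N(mu, s2), mu ~ N(mu0, tau2)."""
    n, xbar = len(x), x.mean()
    var = s2 / n + tau2                      # variance of xbar after integrating out mu
    ll = -0.5 * n * np.log(2 * np.pi * s2) - 0.5 * np.sum((x - xbar) ** 2) / s2
    return ll - 0.5 * np.log(var / (s2 / n)) - 0.5 * (xbar - mu0) ** 2 / var

def sample_diff_state(g1, g2, prior_diff=0.1):
    """Draw the hidden indicator d ~ P(differential | data) for one isoform."""
    l_null = log_marglik(np.concatenate([g1, g2]))   # shared mean across conditions
    l_diff = log_marglik(g1) + log_marglik(g2)       # condition-specific means
    log_odds = l_diff - l_null + np.log(prior_diff / (1 - prior_diff))
    p = 1.0 / (1.0 + np.exp(-log_odds))
    return rng.random() < p, p

# Hypothetical log-expression of one isoform in two conditions (5 replicates each).
g1 = rng.normal(2.0, 1.0, 5)
g2 = rng.normal(3.0, 1.0, 5)
d, p = sample_diff_state(g1, g2)
print(f"P(differential | data) = {p:.3f}; sampled state = {bool(d)}")
```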
232

Bayesian Alignment Model for Analysis of LC-MS-based Omic Data

Tsai, Tsung-Heng 22 May 2014
Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used in various omic studies for biomarker discovery. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time alignment is one of the most important yet challenging preprocessing steps, needed to ensure that ion intensity measurements are comparable across multiple LC-MS runs. In this dissertation, we propose a Bayesian alignment model (BAM) for analysis of LC-MS data. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and provides estimates of the retention time variability along with uncertainty measures, offering a natural framework for integrating information from various sources. From methodology development to practical application, we investigate the alignment problem through three research topics: 1) development of a single-profile Bayesian alignment model, 2) development of a multi-profile Bayesian alignment model, and 3) application to biomarker discovery research. Chapter 2 introduces profile-based Bayesian alignment using a single chromatogram, e.g., the base peak chromatogram, from each LC-MS run. The single-profile alignment model improves on existing MCMC-based alignment methods through 1) an efficient MCMC sampler based on a block Metropolis-Hastings algorithm, and 2) an adaptive mechanism for knot specification using stochastic search variable selection (SSVS). Chapter 3 extends the model to integrate complementary information that better captures the variability in chromatographic separation. We use Gaussian process regression on the internal standards to derive a prior distribution for the mapping functions. In addition, a clustering approach is proposed to identify multiple representative chromatograms for each LC-MS run. With the Gaussian process prior, these chromatograms are simultaneously considered in the profile-based alignment, which greatly improves the model estimation and facilitates the subsequent peak matching process. Chapter 4 demonstrates the applicability of the proposed Bayesian alignment model to biomarker discovery research. We integrate the model into a rigorous preprocessing pipeline for LC-MS data analysis, through which candidate biomarkers for hepatocellular carcinoma (HCC) are identified and confirmed on a complementary platform. / Ph. D.
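The following sketch illustrates the core of MCMC-based retention time alignment. It is a simplification, assuming a Gaussian noise model and a single piecewise-linear warp sampled by plain random-walk Metropolis-Hastings, whereas the dissertation's model uses a block Metropolis-Hastings sampler with SSVS-driven knot selection; the chromatograms below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
ref = np.exp(-0.5 * ((t - 4.0) / 0.3) ** 2)    # reference base peak chromatogram
obs = np.exp(-0.5 * ((t - 4.6) / 0.3) ** 2)    # run with retention-time drift
knots = np.linspace(0.0, 10.0, 5)              # knots of the piecewise-linear warp

def log_post(shifts, sigma=0.05, tau=1.0):
    warped = t + np.interp(t, knots, shifts)   # mapping function: t -> t + shift(t)
    aligned = np.interp(warped, t, obs)        # observed run read off on the warped grid
    log_lik = -0.5 * np.sum((aligned - ref) ** 2) / sigma**2
    log_prior = -0.5 * np.sum(shifts**2) / tau**2
    return log_lik + log_prior

shifts = np.zeros(knots.size)
lp = log_post(shifts)
for _ in range(3000):                          # random-walk Metropolis-Hastings
    prop = shifts + rng.normal(0.0, 0.05, knots.size)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        shifts, lp = prop, lp_prop
print("posterior knot shifts:", np.round(shifts, 2))
```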
233

Maintenance Data Augmentation using Markov Chain Monte Carlo Simulation (Hamiltonian MCMC using NUTS)

Roohani, Muhammad Ammar January 2024
Reliable and efficient operation of any engineering asset requires carefully designed maintenance planning, and maintenance-related data, in the form of failure times, repair times, Mean Time Between Failures (MTBF), and condition-monitoring data, play a pivotal role in maintenance decision support. With advances in data analytics and industrial artificial intelligence, such data are increasingly used in maintenance prognostics models that predict future maintenance requirements, forming the basis of maintenance design and planning in maintenance-conscious industries such as railways. A lack of available data, however, creates a number of problems for data-driven prognostics modeling, and researchers have employed several methods to counter these data-scarcity problems. The methodology proposed here applies a data augmentation technique based on Markov chain Monte Carlo (MCMC) simulation to enrich maintenance data for use in prognostics modeling, providing a basis for better maintenance decision support and planning.
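As a hedged illustration of the augmentation idea: the thesis uses Hamiltonian MCMC with NUTS (typically via a probabilistic programming tool such as PyMC or Stan), but the same posterior-predictive logic can be shown with a plain Metropolis sampler and an assumed Weibull failure model. The failure times below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
tbf = np.array([120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 520.0, 60.0])  # hours, hypothetical

def log_post(theta):
    """Posterior of log Weibull (shape, scale) with a weak normal prior."""
    k, lam = np.exp(theta)
    log_lik = np.sum(np.log(k / lam) + (k - 1) * np.log(tbf / lam) - (tbf / lam) ** k)
    return log_lik - 0.5 * np.sum(theta**2) / 10.0

theta, lp, draws = np.zeros(2), log_post(np.zeros(2)), []
for it in range(20000):                     # plain Metropolis in place of NUTS
    prop = theta + rng.normal(0.0, 0.1, 2)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if it > 5000 and it % 10 == 0:          # thin after burn-in
        draws.append(np.exp(theta))
draws = np.array(draws)

# Posterior predictive augmentation: every synthetic failure time uses a fresh
# (shape, scale) draw, so the augmented data carries parameter uncertainty
# rather than reflecting a single point estimate.
k, lam = draws[rng.integers(len(draws), size=1000)].T
synthetic_tbf = lam * rng.weibull(k)
print(f"augmented MTBF estimate: {synthetic_tbf.mean():.1f} h")
```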
234

Bayesian Hierarchical Modeling and Markov Chain Simulation for Chronic Wasting Disease

Mehl, Christopher 05 1900
In this thesis, a dynamic spatial model for the spread of Chronic Wasting Disease in Colorado mule deer is derived from a system of differential equations that captures the qualitative spatial and temporal behaviour of the disease. These differential equations are incorporated into an empirical Bayesian hierarchical model through the unusual step of deterministic autoregressive updates. Spatial effects in the model are described directly in the differential equations rather than through the use of correlations in the data. The use of deterministic updates is a simplification that reduces the number of parameters that must be estimated, yet still provides a flexible model that gives reasonable predictions for the disease. The posterior distribution generated by the data model hierarchy possesses characteristics that are atypical for many Markov chain Monte Carlo simulation techniques. To address these difficulties, a new MCMC technique is developed that has qualities similar to recently introduced tempered Langevin type algorithms. The methodology is used to fit the CWD model, and posterior parameter estimates are then used to obtain predictions about Chronic Wasting Disease.
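A minimal sketch of the Langevin-type samplers referenced above (Metropolis-adjusted Langevin, MALA) is given below. The bimodal two-dimensional target is a stand-in for the atypical CWD posterior, not the actual hierarchical model, and the closing comment notes the mode-hopping difficulty that tempered variants are designed to address.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_pi(x):
    """Two-mode Gaussian mixture: a surrogate for an atypical posterior."""
    return np.logaddexp(-0.5 * np.sum((x - 2.0) ** 2), -0.5 * np.sum((x + 2.0) ** 2))

def grad_log_pi(x, h=1e-5):
    """Finite-difference gradient keeps the sketch model-agnostic."""
    return np.array([(log_pi(x + h * e) - log_pi(x - h * e)) / (2 * h)
                     for e in np.eye(x.size)])

def mala_step(x, eps=0.5):
    mean_fwd = x + 0.5 * eps**2 * grad_log_pi(x)
    prop = mean_fwd + eps * rng.normal(size=x.size)
    mean_rev = prop + 0.5 * eps**2 * grad_log_pi(prop)
    # Langevin proposals are asymmetric, so both transition densities enter the ratio.
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * eps**2)
    log_q_rev = -np.sum((x - mean_rev) ** 2) / (2 * eps**2)
    log_alpha = log_pi(prop) - log_pi(x) + log_q_rev - log_q_fwd
    return prop if np.log(rng.random()) < log_alpha else x

x, chain = np.zeros(2), []
for _ in range(5000):
    x = mala_step(x)
    chain.append(x)
chain = np.array(chain)
# Plain MALA tends to settle into one mode of this target, which is exactly the
# failure that tempered Langevin variants are designed to overcome.
print("fraction of samples near mode +2:", np.mean(chain[:, 0] > 0))
```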
235

Towards Data-Driven I/O Load Balancing in Extreme-Scale Storage Systems

Banavathi Srinivasa, Sangeetha 15 June 2017
Storage systems used for supercomputers and high performance computing (HPC) centers exhibit load imbalance and resource contention. This is mainly due to two factors: the bursty nature of scientific applications' I/O, and the complex, distributed I/O path that lacks centralized arbitration and control. For example, the extant Lustre parallel storage system, which forms the backend storage for many HPC centers, comprises numerous components connected in custom network topologies and serves the varying demands of a large number of users and applications. Consequently, some storage servers can be more loaded than others, creating bottlenecks and reducing overall application I/O performance. Existing solutions focus on per-application load balancing and thus are not effective, as they lack a global view of the system. In this thesis, we adopt a data-driven quantitative approach to load balance the I/O servers at extreme scale. To this end, we design a global mapper on the Lustre Metadata Server (MDS), which gathers runtime statistics collected from key storage components on the I/O path and applies Markov chain modeling and a dynamic maximum flow algorithm to decide where data should be placed in a load-balanced fashion. Evaluation using a realistic system simulator shows that our approach yields better load balancing, which in turn can help yield higher end-to-end performance. / Master of Science / Critical jobs such as meteorological prediction are run at exascale supercomputing facilities like the Oak Ridge Leadership Computing Facility (OLCF). It is necessary for these centers to provide an optimally running infrastructure to support such critical workloads. The amount of data being produced and processed is increasing rapidly, requiring High Performance Computing (HPC) centers to design systems that can support this growing volume. Lustre is a parallel filesystem deployed in HPC centers. As a hierarchical filesystem, Lustre comprises a distributed layer of Object Storage Servers (OSSs) that are responsible for I/O on the Object Storage Targets (OSTs). Lustre employs a traditional capacity-based round-robin approach for file placement on the OSTs, which results in the usage of only a small fraction of OSTs. The round-robin approach also concentrates load on the same set of OSSs, which decreases performance. It is therefore imperative to have a better load-balanced file placement algorithm that can evenly distribute load across all OSSs and OSTs to meet future demands of data storage and processing. We approach the problem of load imbalance by splitting the whole system into two views: the filesystem and the applications. We first collect the current usage statistics of the filesystem by means of a distributed monitoring tool. We then predict the applications' I/O request patterns by employing a Markov chain model. Finally, we use both components to design a load balancing algorithm that evens out the load on both the OSSs and OSTs. We evaluate our algorithm on a custom-built simulator that simulates the behavior of the actual filesystem.
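A simplified sketch of the two modeling ingredients, under stated assumptions: a first-order Markov chain fitted to an application's observed I/O-intensity states predicts its next request level, and a greedy least-loaded placement step stands in for the dynamic maximum-flow algorithm used in the thesis. The states, per-state rates, and OST loads are hypothetical.

```python
import numpy as np

states = ["low", "med", "high"]
idx = {s: i for i, s in enumerate(states)}
# Observed I/O-intensity states of one application (hypothetical monitoring data).
history = ["low", "low", "med", "high", "high", "med", "low", "med", "high", "high"]

# Fit first-order Markov transition probabilities with Laplace smoothing.
counts = np.ones((3, 3))
for a, b in zip(history, history[1:]):
    counts[idx[a], idx[b]] += 1
P = counts / counts.sum(axis=1, keepdims=True)

predicted = states[int(np.argmax(P[idx[history[-1]]]))]
load_of = {"low": 1, "med": 4, "high": 16}     # hypothetical MB/s per state

# Place the predicted load on the least-loaded OST, a greedy stand-in for the
# dynamic maximum-flow placement described in the thesis.
ost_load = {"OST0": 12, "OST1": 3, "OST2": 7}
target = min(ost_load, key=ost_load.get)
ost_load[target] += load_of[predicted]
print(f"predicted next state: {predicted}; placed on {target}; loads: {ost_load}")
```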
236

Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly

Shi, Xu 24 October 2017
The rapid development of biotechnology has enabled researchers to collect high-throughput data for studying various biological processes at the genomic, transcriptomic, and proteomic levels. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. These challenges call for more effort in developing efficient and effective computational methods to analyze the data at different levels, so as to understand biological systems in different aspects. In this dissertation research, we have developed novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. Specifically, we focus on two research topics: isoform identification and phenotype-specific transcript assembly. For isoform identification, we develop a computational approach, SparseIso, to jointly model the existence and abundance of isoforms in a Bayesian framework. A spike-and-slab prior is incorporated into the model to enforce the sparsity of expressed isoforms. A Gibbs sampler is developed to iteratively sample the existence and abundance of isoforms. For transcript assembly, we develop a Bayesian approach, IntAPT, to assemble phenotype-specific transcripts from multiple RNA sequencing profiles. A two-layer Bayesian framework is used to model the existence of phenotype-specific transcripts and the transcript abundance in individual samples. Based on this hierarchical Bayesian model, a Gibbs sampling algorithm is developed to estimate the joint posterior distribution for phenotype-specific transcript assembly. The performance of our proposed methods is evaluated with simulation data, compared with existing methods, and benchmarked with real cell line data. We then apply our methods to breast cancer data to identify biologically meaningful splicing mechanisms associated with breast cancer. In future work, we will extend our methods to de novo transcript assembly to identify novel isoforms in biological systems, and we will incorporate isoform-specific networks to better understand splicing mechanisms. / Ph. D. / The next-generation sequencing technology has significantly improved the resolution of biomedical research at the genomic and transcriptomic levels. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. In this dissertation, we have developed two novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. We have demonstrated the advantages of our proposed approaches over existing methods on both simulation data and real cell line data. Furthermore, the application of our methods to real breast cancer data and glioblastoma tissue data has further shown their efficacy in real biological applications.
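The sketch below shows the spike-and-slab Gibbs idea at the heart of the isoform-identification model, on a toy linear model rather than actual read data. The design matrix, priors, and abundances are fabricated for illustration; this is not the SparseIso code.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 6                                  # coverage bins, candidate isoforms
s2, tau2, pi = 0.25, 4.0, 0.3                 # noise var, slab var, prior inclusion
X = rng.random((n, p))                        # toy isoform-to-coverage design matrix
w_true = np.array([3.0, 0.0, 0.0, 2.0, 0.0, 0.0])   # only two isoforms expressed
y = X @ w_true + rng.normal(0.0, np.sqrt(s2), n)

z, w = np.zeros(p, dtype=bool), np.zeros(p)   # existence indicators and abundances
incl = np.zeros(p)
for it in range(2000):
    for j in range(p):                        # one Gibbs sweep over isoforms
        r = y - X @ w + X[:, j] * w[j]        # residual with isoform j removed
        prec = X[:, j] @ X[:, j] / s2 + 1.0 / tau2
        m = (X[:, j] @ r / s2) / prec
        # Bayes factor for the slab (isoform exists) versus the spike at zero.
        log_odds = np.log(pi / (1 - pi)) + 0.5 * (m * m * prec - np.log(tau2 * prec))
        z[j] = rng.random() < 1.0 / (1.0 + np.exp(-np.clip(log_odds, -500, 500)))
        w[j] = rng.normal(m, 1.0 / np.sqrt(prec)) if z[j] else 0.0
    if it >= 1000:                            # accumulate after burn-in
        incl += z
print("posterior inclusion probabilities:", np.round(incl / 1000, 2))
```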
237

Development of novel computational techniques for phase identification and thermodynamic modeling, and a case study of contact metamorphism in the Triassic Culpeper Basin of Virginia

Prouty, Jonathan Michael 12 August 2024
This dissertation develops computational techniques to aid in efficiently studying petrologic systems that would otherwise be challenging. It then focuses on a case study in which the transition from diagenesis to syn-magmatic heating led to recrystallization and sulfur mobilization. A Markov chain Monte Carlo-based methodology is developed to allow assessment of uncertainty in calculated phase assemblage diagrams. Such phase equilibrium diagrams are ubiquitous in modern petrology, but their uncertainties are rarely considered. Methods are discussed for visualizing and quantifying emergent patterns as phase diagrams are re-calculated with input data modified within permitted uncertainty bounds, and these are implemented in a new code. Results show that uncertainty varies significantly across pressure-temperature space and that under some conditions, estimates of the stable mineral assemblage are known with very little confidence. A machine learning (ML) based methodology is developed for automatically identifying unknown phases using energy-dispersive spectra (EDS) in concert with a Random Forest classification algorithm. This methodology allows for phase identification that is insensitive to overfitting and noisy spectra. However, the tool is limited by the amount of reference spectra available in the dataset on which the ML algorithm is trained. The approximately 250 EDS spectra in the current training database must be supplemented to make the tool more widely useful, though it currently has an excellent success rate for correctly identifying various sulfide and oxide minerals. An analysis of paragenesis associated with Central Atlantic Magmatic Province (CAMP) intrusions helps to better constrain the dynamics of magma emplacement, while also providing a method for estimating the amount of sedimentary, sulfide-sequestered sulfur mobilized as a result of magnetite formation associated with igneous activity. This method demonstrates that dike emplacement can trigger liberation of sedimentary sulfur with no direct cooling impact on climate. / Doctor of Philosophy / Determining how rocks and minerals form is fundamental to the geosciences. Here I present two computer-based techniques that can help address this essential problem. One method involves carefully determining uncertainty in thermodynamic modeling. Knowing the amount of uncertainty ultimately allows us to know the degree of confidence we can have in our model-based conclusions. The second technique uses machine learning to automate the identification of minerals from energy-dispersive spectra (EDS) measured using a scanning electron microscope (SEM). In theory, computers are much better than humans at quickly and repeatedly processing large sets of data such as EDS. This technique works well when the computer is successfully 'trained' on a large set of data, but it is somewhat limited here because the available training data are not yet diverse enough. We therefore need better training data so that we can more fully benefit from this mineral identification tool. A third project assesses the impact of magma intruding into sedimentary rocks of the Culpeper Basin in northern Virginia, which occurred roughly 200 million years ago during the rifting of Pangea. The sedimentary rock around the magma heated up so much that water in the rock boiled and fractured the rock. A hydrothermal system was then established that helped convert pyrite to magnetite, removing sulfur from the rocks in the process.
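To make the ML-based phase identification concrete, here is a hedged sketch using scikit-learn's RandomForestClassifier on synthetic EDS-like spectra. The peak energies, mineral labels, and training set are stand-ins for the roughly 250 reference spectra mentioned above, not the dissertation's database.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
channels = np.linspace(0.5, 10.0, 256)        # energy axis, keV

def toy_spectrum(peaks):
    """Noisy Gaussian peaks on a flat background: a synthetic EDS spectrum."""
    s = 0.05 + sum(np.exp(-0.5 * ((channels - p) / 0.08) ** 2) for p in peaks)
    return s + rng.normal(0.0, 0.02, channels.size)

# Approximate characteristic line energies (keV); labels and peaks are illustrative.
minerals = {"pyrite": [2.31, 6.40], "pyrrhotite": [2.31, 5.90], "magnetite": [0.71, 6.40]}
X = np.array([toy_spectrum(pk) for pk in minerals.values() for _ in range(80)])
y = np.array([name for name in minerals for _ in range(80)])

# Each spectrum becomes a fixed-length feature vector; the forest votes on the label.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```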
238

Time-to-Event Modeling with Bayesian Perspectives and Applications in Reliability of Artificial Intelligence Systems

Min, Jie 02 July 2024
Doctor of Philosophy / With the rapid development of artificial intelligence (AI) technology, the reliability of AI systems must be investigated before AI products can be used confidently in daily life. This dissertation comprises three projects introducing statistical models and model estimation methods that can be used in the reliability analysis of AI systems. The first project analyzes recurrent events data from autonomous vehicles (AVs). A nonparametric model is proposed to study the reliability of the AI systems in AVs, and a statistical framework is introduced to evaluate the adequacy of using traditional parametric models in the analysis. The proposed model and framework are then applied to AV data from four manufacturers that participated in an AV driving testing program overseen by the California Department of Motor Vehicles. The second project develops a survival model to investigate the failure times of graphics processing units (GPUs) used in supercomputers. The model considers several covariates, spatial correlation, and the correlation among multiple types of failures. In addition, unique spatial correlation functions and a special distance function are introduced to quantify the spatial correlation inside supercomputers. The model is applied to explore GPU failure times in the Titan supercomputer. The third project proposes a new Markov chain Monte Carlo sampler for the estimation and inference of spatial survival models. The sampler can generate a reasonable number of samples in less computing time than existing popular samplers. Important factors that influence the performance of the proposed sampler are explored, and the sampler is applied to the Titan GPU failures to illustrate its usefulness in solving real-world problems.
239

A Bayesian Approach to Estimating Background Flows from a Passive Scalar

Krometis, Justin 26 June 2018
We consider the statistical inverse problem of estimating a background flow field (e.g., of air or water) from the partial and noisy observation of a passive scalar (e.g., the concentration of a pollutant). Here the unknown is a vector field specified by a large or infinite number of degrees of freedom. We show that the inverse problem is ill-posed, i.e., there may be many or no background flows that match a given set of observations. We therefore adopt a Bayesian approach, incorporating prior knowledge of background flows and models of the observation error to develop probabilistic estimates of the fluid flow. In doing so, we leverage frameworks developed in recent years for infinite-dimensional Bayesian inference. We provide conditions under which the inference is consistent, i.e., the posterior measure converges to a Dirac measure on the true background flow as the number of observations of the solute concentration grows large. We also define several computationally efficient algorithms adapted to the problem. One is an adjoint method for computing the gradient of the log likelihood, a key ingredient in many numerical methods. A second is a particle method that allows direct computation of point observations of the solute concentration, leveraging the structure of the inverse problem to avoid approximation of the full infinite-dimensional scalar field. Finally, we identify two interesting example problems with very different posterior structures, which we use to conduct a large-scale benchmark of the convergence of several Markov chain Monte Carlo methods that have been developed in recent years for infinite-dimensional settings. / Ph. D. / We consider the problem of estimating a fluid flow (e.g., of air or water) from partial and noisy observations of the concentration of a solute (e.g., a pollutant) dissolved in the fluid. Because of observational noise, and because there are cases where the fluid flow will not affect the movement of the pollutant, the fluid flow cannot be uniquely determined from the observations. We therefore adopt a statistical (Bayesian) approach, developing probabilistic estimates of the fluid flow using models of observation error and our understanding of the flow before measurements are taken. We provide conditions under which, as the number of observations grows large, the approach is able to identify the fluid flow that generated the observations. We define several efficient algorithms for computing statistics of the fluid flow, one of which approximates the movement of individual solute particles to estimate concentrations only where required by the inverse problem. We identify two interesting example problems for which the statistics of the fluid flow are very different: the first produces an approximately normal distribution, while the second exhibits highly non-Gaussian structure, where several different classes of fluid flows match the data very well. We use these examples to test the functionality and efficiency of several numerical (Markov chain Monte Carlo) methods developed in recent years to compute the solution to similar problems.
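Among the MCMC methods developed for such infinite-dimensional settings, the preconditioned Crank-Nicolson (pCN) proposal is the canonical example. The sketch below applies it to a toy field-recovery problem; the target field, prior, and observation setup are illustrative, not the dissertation's advection-diffusion problem.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = np.linspace(0.0, 1.0, n)
truth = np.sin(2 * np.pi * x)                  # unknown field (e.g., a flow component)
obs_idx = np.arange(0, n, 10)
data = truth[obs_idx] + rng.normal(0.0, 0.1, obs_idx.size)

# Gaussian prior N(0, C) with squared-exponential covariance, drawn via Cholesky.
C = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2**2) + 1e-8 * np.eye(n)
L = np.linalg.cholesky(C)
prior_draw = lambda: L @ rng.normal(size=n)

def log_lik(u):
    return -0.5 * np.sum((u[obs_idx] - data) ** 2) / 0.1**2

beta = 0.2                                     # pCN step size
u = prior_draw()
ll, acc = log_lik(u), 0
for _ in range(5000):
    # The pCN proposal preserves the prior, so the acceptance ratio involves only
    # the likelihood and remains well-defined as the discretization n grows.
    prop = np.sqrt(1 - beta**2) * u + beta * prior_draw()
    ll_prop = log_lik(prop)
    if np.log(rng.random()) < ll_prop - ll:
        u, ll, acc = prop, ll_prop, acc + 1
print(f"pCN acceptance rate: {acc / 5000:.2f}")
```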
240

Statistical potentials for evolutionary studies

Kleinman, Claudia L. 06 1900
Protein sequences are the net result of the interplay of mutation, natural selection and stochastic variation over evolutionary time. Probabilistic models of molecular evolution accounting for these processes have been substantially improved in recent years. In particular, models that explicitly incorporate protein structure and site interdependencies have been developed, as well as statistical tools for assessing their performance. Despite major advances in this direction, only simple representations of protein structure have been used so far. In this context, the main theme of this dissertation is the modeling of three-dimensional protein structure for evolutionary studies, taking into account the limitations imposed by computationally demanding phylogenetic methods. First, a general statistical framework for optimizing the parameters of a statistical potential (an energy-like scoring system for sequence-structure compatibility) is presented. The functional form of the potential is then refined, increasing the detail of the structural description without inflating computational costs. Working at the residue level throughout, several structural elements are investigated: pairwise distance interactions, solvent accessibility, backbone conformation and flexibility of the residues. The potentials are then incorporated into an evolutionary model and their performance is assessed in terms of model fit, compared to standard evolutionary models. Finally, this new structurally constrained phylogenetic model is used to better understand the selective forces behind the differences in conservation found in genes of very different expression levels.
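A minimal sketch of a pairwise statistical potential of the kind optimized in this work: a log-odds pseudo-energy computed from observed versus expected residue-residue contact frequencies. The contact counts below are fabricated for illustration; real potentials are estimated from databases of solved structures, and in this thesis the parameters are optimized jointly within the evolutionary model.

```python
import numpy as np

rng = np.random.default_rng(7)
AA = list("ACDEFGHIKLMNPQRSTVWY")             # the 20 amino acids
# Fabricated symmetric contact counts standing in for a structural database.
raw = rng.integers(1, 500, size=(20, 20))
counts = raw + raw.T

freq = counts / counts.sum()                  # observed pair frequencies p(a, b)
marg = freq.sum(axis=1)                       # marginal frequencies p(a)
energy = -np.log(freq / np.outer(marg, marg)) # E(a, b) = -log[p(a, b) / (p(a) p(b))]

def score(sequence, contacts):
    """Pseudo-energy of a sequence threaded onto a set of structural contacts."""
    idx = {a: i for i, a in enumerate(AA)}
    return sum(energy[idx[sequence[i]], idx[sequence[j]]] for i, j in contacts)

# Lower pseudo-energy indicates better sequence-structure compatibility.
print("E =", round(score("ACDKLM", [(0, 3), (1, 4), (2, 5)]), 3))
```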
