  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Bayesian Parameter Estimation on Three Models of Influenza

Torrence, Robert Billington 11 May 2017
Mathematical models of viral infections have informed virology research for years. Estimating parameter values for these models can yield estimates of biologically meaningful quantities; in HIV modeling, for example, this approach has produced estimates of values such as the lifetime of infected CD4+ T cells. However, estimating these values is notoriously difficult, especially for highly complex models. We use Bayesian inference and Markov chain Monte Carlo (MCMC) methods to estimate the underlying densities of the parameters (assumed to be continuous random variables) for three models of influenza, and we discuss the advantages and limitations of parameter estimation with these methods. The data and influenza models used for this project are from the lab of Dr. Amber Smith in Memphis, Tennessee. / Master of Science
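The three influenza models and Dr. Smith's data are not reproduced in this record, but the kind of inference the abstract describes can be sketched on a toy target-cell-limited model with synthetic data. Everything below — the model form, parameter values, prior range, and noise level — is an illustrative assumption, not the thesis's actual setup:

```python
import numpy as np

def viral_load(log10_beta, days=8, dt=0.05, p=1e-2, c=3.0, delta=1.0,
               T0=4e8, V0=1.0):
    """Euler integration of a toy target-cell-limited model:
    dT/dt = -b*T*V, dI/dt = b*T*V - delta*I, dV/dt = p*I - c*V."""
    b = 10.0 ** log10_beta
    T, I, V = T0, 0.0, V0
    log10_titres = []
    for _ in range(days):
        for _ in range(int(1 / dt)):
            dT, dI, dV = -b * T * V, b * T * V - delta * I, p * I - c * V
            T = max(T + dt * dT, 0.0)
            I = max(I + dt * dI, 0.0)
            V = max(V + dt * dV, 1e-12)
        log10_titres.append(np.log10(V))
    return np.array(log10_titres)

def log_posterior(log10_beta, data, sigma=0.5):
    # Flat prior on log10(beta) over a plausible range; Gaussian
    # likelihood on daily log10 viral titres.
    if not -7.0 < log10_beta < -4.0:
        return -np.inf
    return -0.5 * np.sum((data - viral_load(log10_beta)) ** 2) / sigma**2

rng = np.random.default_rng(0)
data = viral_load(-5.3) + rng.normal(0.0, 0.5, 8)  # synthetic daily titres

# Random-walk Metropolis-Hastings on the continuous parameter density.
cur = -5.8
cur_lp = log_posterior(cur, data)
chain = []
for _ in range(1000):
    prop = cur + rng.normal(0.0, 0.15)
    prop_lp = log_posterior(prop, data)
    if np.log(rng.uniform()) < prop_lp - cur_lp:
        cur, cur_lp = prop, prop_lp
    chain.append(cur)
posterior = np.array(chain[300:])
```

A real analysis would sample all model parameters jointly and check convergence across multiple chains; a single random-walk chain over one parameter is enough here to show how MCMC turns noisy titre data into a posterior density.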
232

Bayesian Alignment Model for Analysis of LC-MS-based Omic Data

Tsai, Tsung-Heng 22 May 2014
Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used in various omic studies for biomarker discovery. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time alignment is one of the most important yet challenging preprocessing steps, in order to ensure that ion intensity measurements among multiple LC-MS runs are comparable. In this dissertation, we propose a Bayesian alignment model (BAM) for analysis of LC-MS data. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and provides estimates of the retention time variability along with uncertainty measures, enabling a natural framework to integrate information of various sources. From methodology development to practical application, we investigate the alignment problem through three research topics: 1) development of single-profile Bayesian alignment model, 2) development of multi-profile Bayesian alignment model, and 3) application to biomarker discovery research. Chapter 2 introduces the profile-based Bayesian alignment using a single chromatogram, e.g., base peak chromatogram from each LC-MS run. The single-profile alignment model improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler using a block Metropolis-Hastings algorithm, and 2) an adaptive mechanism for knot specification using stochastic search variable selection (SSVS). Chapter 3 extends the model to integrate complementary information that better captures the variability in chromatographic separation. We use Gaussian process regression on the internal standards to derive a prior distribution for the mapping functions. In addition, a clustering approach is proposed to identify multiple representative chromatograms for each LC-MS run. 
With the Gaussian process prior, these chromatograms are simultaneously considered in the profile-based alignment, which greatly improves the model estimation and facilitates the subsequent peak matching process. Chapter 4 demonstrates the applicability of the proposed Bayesian alignment model to biomarker discovery research. We integrate the proposed Bayesian alignment model into a rigorous preprocessing pipeline for LC-MS data analysis. Through the developed analysis pipeline, candidate biomarkers for hepatocellular carcinoma (HCC) are identified and confirmed on a complementary platform. / Ph. D.
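BAM itself models the mapping functions with splines and selects knots via SSVS; as a hedged illustration of the underlying idea, the toy sketch below reduces alignment to a single constant retention-time shift between a reference chromatogram and one run, and computes its posterior on a grid. Peak locations, widths, and the noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 60.0, 601)

def chrom(centers):
    # A toy chromatogram: a sum of unit-height Gaussian peaks.
    return sum(np.exp(-0.5 * ((t - c) / 1.5) ** 2) for c in centers)

reference = chrom([15.0, 30.0, 42.0])
true_shift = 2.4
run = chrom([c + true_shift for c in [15.0, 30.0, 42.0]])
run = run + rng.normal(0.0, 0.05, t.size)

# Posterior over a constant retention-time shift on a grid:
# flat prior, Gaussian residual likelihood with known noise sd.
shifts = np.linspace(-5.0, 5.0, 201)
log_lik = np.array([
    -0.5 * np.sum((np.interp(t, t + s, reference) - run) ** 2) / 0.05**2
    for s in shifts
])
post = np.exp(log_lik - log_lik.max())
post /= post.sum()
map_shift = shifts[np.argmax(post)]
```

The MAP shift lands on the true value; BAM replaces the constant shift with smooth spline mapping functions and samples them with MCMC rather than enumerating a grid.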
233

Predictive Turbulence Modeling with Bayesian Inference and Physics-Informed Machine Learning

Wu, Jinlong 25 September 2018
Reynolds-Averaged Navier-Stokes (RANS) simulations are widely used for engineering design and analysis involving turbulent flows. In RANS simulations, the Reynolds stress needs closure models and the existing models have large model-form uncertainties. Therefore, the RANS simulations are known to be unreliable in many flows of engineering relevance, including flows with three-dimensional structures, swirl, pressure gradients, or curvature. This lack of accuracy in complex flows has diminished the utility of RANS simulations as a predictive tool for engineering design, analysis, optimization, and reliability assessments. Recently, data-driven methods have emerged as a promising alternative to develop the model of Reynolds stress for RANS simulations. In this dissertation I explore two physics-informed, data-driven frameworks to improve RANS modeled Reynolds stresses. First, a Bayesian inference framework is proposed to quantify and reduce the model-form uncertainty of RANS modeled Reynolds stress by leveraging online sparse measurement data with empirical prior knowledge. Second, a machine-learning-assisted framework is proposed to utilize offline high-fidelity simulation databases. Numerical results show that the data-driven RANS models have better prediction of Reynolds stress and other quantities of interest for several canonical flows. Two metrics are also presented for an a priori assessment of the prediction confidence for the machine-learning-assisted RANS model. The proposed data-driven methods are also applicable to the computational study of other physical systems whose governing equations have some unresolved physics to be modeled. / Ph. D.
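The Bayesian framework in the dissertation operates on full Reynolds-stress fields with physics-based priors; the sketch below shrinks that idea to a hypothetical scalar discrepancy coefficient observed through a few sparse linear "measurements", where a conjugate Gaussian update shows how sparse data reduce model-form uncertainty. All values are invented:

```python
import numpy as np

def posterior(mu0, s0, h, y, r):
    """Conjugate update for theta ~ N(mu0, s0^2) observed through
    y_i = h_i * theta + eps_i, with eps_i ~ N(0, r^2)."""
    prec = 1.0 / s0**2 + np.sum(h**2) / r**2
    mean = (mu0 / s0**2 + np.dot(h, y) / r**2) / prec
    return mean, 1.0 / np.sqrt(prec)

rng = np.random.default_rng(2)
theta_true = 1.8                        # hypothetical discrepancy factor
h = rng.uniform(0.5, 1.5, 5)            # five sparse "sensor" responses
y = h * theta_true + rng.normal(0.0, 0.1, 5)

mean, sd = posterior(mu0=1.0, s0=1.0, h=h, y=y, r=0.1)
```

Five observations collapse the prior standard deviation of 1.0 to a posterior one of a few percent; the dissertation applies the same logic, via iterative ensemble methods, to spatially varying Reynolds-stress discrepancies.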
234

Continuous Continuous Probabilistic Genotyping: A differentiable model and modern Bayesian inference techniques for forensic DNA mixtures

Susik, Mateusz 19 June 2024
DNA samples are part of the physical evidence collected during contemporary crime scene investigation. After processing the samples, a laboratory obtains short tandem repeat electropherograms. In the case of mixed DNA profiles, i.e., profiles that contain DNA material from more than one contributor, the laboratory needs to estimate a test statistic (the likelihood ratio) that could provide evidence, either inculpatory or exculpatory, against the person of interest. This is automated with probabilistic genotyping (PG) software built on (fully-)continuous models: those that consider the heights of the observed peaks. In this thesis, we provide an account of modern PG methods. We then show how to improve measurable indicators of algorithm performance, such as precision and inference runtime, that directly correspond to the efficiency and efficacy of work performed in a lab. With quicker algorithms, forensic laboratories can process more samples and provide more comprehensive results by reanalysing mixtures under different hypotheses and hyperparameterisations. With more precise algorithms, there will be greater confidence in their results; this precision would improve the admissibility of the provided evidence and the reliability of the results. We achieve improvements over the state of the art by utilising probabilistic programming and modern Bayesian inference methods. We describe a differentiable (and hence continuous) continuous model that can be used with different estimators from both the sampling and variational families of techniques. Finally, as different PG products output different likelihood ratios, we explain some of the factors causing this behaviour. This matters because if two solutions are used in the same criminal case, the difference between their outputs must be understood; otherwise, for lack of consensus, the results would cause confusion or, in the worst case, would not be admitted by the court.
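The likelihood ratio at the centre of this work can be illustrated with a deliberately tiny example: one locus, a single contributor, and a Gaussian peak-height model. This is not the thesis's differentiable mixture model — the allele frequencies, peak-height parameters, and observed heights below are all invented:

```python
import numpy as np

def lik(heights, genotype, mu=1000.0, sd=150.0):
    # Continuous model: each allele's peak height is Gaussian around
    # (allele dose) * mu. Normalizing constants cancel in the ratio.
    return np.prod([
        np.exp(-0.5 * ((heights[a] - genotype.count(a) * mu) / sd) ** 2)
        for a in "AB"
    ])

freq = {"A": 0.3, "B": 0.7}

def hw_prior(g):
    # Hardy-Weinberg genotype probability for an unordered pair.
    a, b = g
    return freq[a] * freq[b] * (1.0 if a == b else 2.0)

heights = {"A": 980.0, "B": 1050.0}          # observed peak heights (RFU)
poi = ("A", "B")                             # person of interest

num = lik(heights, poi)                      # Hp: POI is the contributor
genotypes = [("A", "A"), ("A", "B"), ("B", "B")]
den = sum(lik(heights, g) * hw_prior(g) for g in genotypes)  # Hd: unknown
LR = num / den
```

For true mixtures the denominator marginalises over genotype *combinations* of several contributors, which is what makes the computation expensive and motivates the thesis's differentiable model and modern inference techniques.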
235

Hierarchical Initial Condition Generator for Cosmic Structure Using Normalizing Flows / Hierarkisk generator av begynnelsetillstånd till kosmisk struktur med användning av normaliserade flöden

Holma, Pontus January 2024
In this report, we present a novel Bayesian inference framework to reconstruct the three-dimensional initial conditions of cosmic structure formation from data. To achieve this goal, we leverage deep learning technologies to create a generative model of cosmic initial conditions paired with a fast machine learning surrogate model emulating the complex gravitational structure formation. According to the cosmological paradigm, all observable structures were formed from tiny primordial quantum fluctuations generated during the early stages of the Universe. As time passed, these seed fluctuations grew via gravitational aggregation to form the presently observed cosmic web traced by galaxies. For this reason, the specific shape of a configuration of the observed galaxy distribution retains a memory of its initial conditions and the physical processes that shaped it. To recover this information, we develop a novel machine learning approach that leverages the hierarchical nature of structure formation. We demonstrate our method in a mock analysis and find that we can recover the initial conditions with high accuracy, showing the potential of our model.
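The generative model in this thesis is a normalizing flow, which rests on the change-of-variables formula; a minimal one-dimensional sketch (a single affine layer with assumed parameters, no training) shows the bookkeeping that stacked flow layers repeat:

```python
import numpy as np

def base_logpdf(z):
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)   # standard normal

# One invertible layer, x = z * exp(s) + t: a 1-D affine "flow".
s, t = 0.7, 1.3

def forward(z):
    return z * np.exp(s) + t

def flow_logpdf(x):
    # Change of variables: log p_x(x) = log p_z(z) + log|dz/dx|,
    # where z = (x - t) * exp(-s) and dz/dx = exp(-s).
    return base_logpdf((x - t) * np.exp(-s)) - s

rng = np.random.default_rng(3)
x = forward(rng.normal(size=10_000))   # push base samples through the flow
```

A trained flow composes many such invertible layers with learned parameters, so that samples from a simple base distribution are transformed into plausible cosmic initial conditions while the exact log-density remains computable.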
236

ANÁLISIS GENÉTICO DE LA GRASA INTRAMUSCULAR EN CONEJO - GENETIC ANALYSIS OF INTRAMUSCULAR FAT IN RABBITS

Zomeño, Cristina 15 July 2013
This thesis addresses the study of intramuscular fat as a trait that determines meat quality, for use in genetic programmes. The rabbit is considered not only for its interest as a livestock species but also as a model for other species. The study is divided into three experiments, each corresponding to one of the three specific objectives of this thesis: 1. To study the genetic variability between rabbit lines in factors directly linked to fat deposition in both muscle and adipose tissue, namely the enzymes involved in fat synthesis and degradation and the fatty acid composition. 2. To develop a NIRS calibration for estimating intramuscular fat and to evaluate its use in rabbit selection programmes. 3. Divergent selection for intramuscular fat: to study the prospects of success of selection for intramuscular fat and to determine the genetic relationships between traits by examining the correlated responses. This work opens a new line of research into lipid metabolism, which can serve both rabbit meat production and the use of the rabbit as an experimental model. It is the first time that a divergent selection experiment for intramuscular fat has been proposed. This experiment will give a better understanding of the genetic relationships between intramuscular fat and carcass fat, as well as the relationships with other production traits. Knowledge of these genetic relationships will be fundamental for future genetic improvement programmes in all livestock species. / Zomeño Segado, C. (2013). ANÁLISIS GENÉTICO DE LA GRASA INTRAMUSCULAR EN CONEJO - GENETIC ANALYSIS OF INTRAMUSCULAR FAT IN RABBITS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31121
237

Topic Model-based Mass Spectrometric Data Analysis in Cancer Biomarker Discovery Studies

Wang, Minkun 14 June 2017
Identification of disease-related alterations in molecular and cellular mechanisms may reveal useful biomarkers for human diseases including cancers. High-throughput omic technologies for identifying and quantifying multi-level biological molecules (e.g., proteins, glycans, and metabolites) have facilitated the advances in biological research in recent years. Liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS) has become an essential tool in such large-scale omic studies. Appropriate LC/GC-MS data preprocessing pipelines are needed to detect true differences between biological groups. Challenges exist in several aspects of MS data analysis. Specifically for biomarker discovery, one fundamental challenge in the quantitation of biomolecules arises from the heterogeneous nature of human biospecimens. Although this issue has been a subject of discussion in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry based omic studies. Purification of mass spectrometric data is highly desirable prior to subsequent differential analysis. In this research dissertation, we mainly address the purification problem through probabilistic modeling. We propose an intensity-level purification model (IPM) to computationally purify LC/GC-MS based cancerous data in biomarker discovery studies. We further extend IPM to a scan-level purification model (SPM) by considering information from extracted ion chromatograms (EICs, scan-level features). Both IPM and SPM belong to the category of topic modeling approaches, which aim to identify the underlying "topics" (sources) and their mixture proportions in composing the heterogeneous data. Additionally, a denoising deconvolution model (DDM) is proposed to capture the noise signals in samples based on purified profiles. Variational expectation-maximization (VEM) and Markov chain Monte Carlo (MCMC) methods are used to draw inference on the latent variables and estimate the model parameters.
Before we come to purification, other research topics related to mass spectrometric data analysis for cancer biomarker discovery are also investigated in this dissertation. Chapter 3 discusses the developed methods for the differential analysis of LC/GC-MS based omic data, specifically the preprocessing of LC-MS profiled glycan data. Chapter 4 presents the assumptions and inference details of IPM, SPM, and DDM. A latent Dirichlet allocation (LDA) core is used to model the heterogeneous cancerous data as mixtures of topics consisting of a sample-specific pure cancerous source and non-cancerous contaminants. We evaluated the capability of the proposed models in capturing mixture proportions of contaminants and cancer profiles on LC-MS based serum and tissue proteomic and GC-MS based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis. Chapter 5 elaborates on these applications in cancer biomarker discovery, including typical single-omic analyses and integrative analysis of multi-omic studies. / Ph. D. / This dissertation documents the methodology and outputs for computational deconvolution of heterogeneous omics data generated from biospecimens of interest. These omics data convey qualitative and quantitative information about biomolecules (e.g., glycans, proteins, and metabolites) which are profiled by liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS). In biomarker discovery, we aim to find significant differences in the intensities of biomolecules between two specific phenotype groups, so that the biomarkers can be used as clinical indicators for early-stage diagnosis. However, the purity of collected samples constitutes a fundamental challenge to the process of differential analysis.
Instead of experimental methods that are costly and time-consuming, we treat the purification task as a topic modeling procedure, where we assume each observed biomolecular profile is a mixture of a hidden pure source together with unwanted contaminants. The developed models output the estimated mixture proportions as well as the underlying "topics". With purification applied at different levels, improved discrimination power of candidate biomarkers and more biologically meaningful pathways were discovered in LC/GC-MS based multi-omic studies of liver cancer. This research work originates from the broader scope of probabilistic generative modeling, where rational assumptions are made to characterize the generation process of the observations. Therefore, the developed models in this dissertation have great potential in applications beyond the heterogeneous data purification discussed here. A good example is uncovering the relationship of the human gut microbiome with a host's phenotypes of interest (e.g., diseases such as type-II diabetes), where similar challenges exist in inferring the underlying intestinal flora distribution and estimating the mixture proportions. This dissertation also covers topics of related data preprocessing and integration, with the consistent goal of improving the performance of biomarker discovery. In summary, this research helps address the sample heterogeneity issue observed in LC/GC-MS based cancer biomarker discovery studies and sheds light on computational deconvolution of mixtures, which can be generalized to other domains of interest.
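IPM and SPM are richer LDA-style models, but the quantity they must recover — the mixture proportion of a pure cancerous source in a heterogeneous sample — can be illustrated with a two-component multinomial mixture and plain EM. The profiles and true proportion below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
p_cancer = rng.dirichlet(np.ones(50))     # hypothetical pure cancer profile
p_contam = rng.dirichlet(np.ones(50))     # hypothetical contaminant profile
alpha_true = 0.65
counts = rng.multinomial(20_000,
                         alpha_true * p_cancer + (1 - alpha_true) * p_contam)

# EM for the mixing proportion of a two-component multinomial mixture.
alpha = 0.5
for _ in range(200):
    # E-step: responsibility of the cancer source for each feature.
    resp = alpha * p_cancer / (alpha * p_cancer + (1 - alpha) * p_contam)
    # M-step: responsibility-weighted fraction of all counts.
    alpha = np.sum(counts * resp) / counts.sum()
```

In the dissertation the pure profiles are themselves unknown and are inferred jointly with the proportions (via VEM or MCMC), which is what turns this simple estimate into a full topic model.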
238

Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly

Shi, Xu 24 October 2017
The rapid development of biotechnology has enabled researchers to collect high-throughput data for studying various biological processes at the genomic level, transcriptomic level, and proteomic level. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. The challenges call for more efforts in developing efficient and effective computational methods to analyze the data at different levels so as to understand the biological systems in different aspects. In this dissertation research, we have developed novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. Specifically, we focus on two research topics in this dissertation: isoform identification and phenotype-specific transcript assembly. For isoform identification, we develop a computational approach, SparseIso, to jointly model the existence and abundance of isoforms in a Bayesian framework. A spike-and-slab prior is incorporated into the model to enforce the sparsity of expressed isoforms. A Gibbs sampler is developed to sample the existence and abundance of isoforms iteratively. For transcript assembly, we develop a Bayesian approach, IntAPT, to assemble phenotype-specific transcripts from multiple RNA sequencing profiles. A two-layer Bayesian framework is used to model the existence of phenotype-specific transcripts and the transcript abundance in individual samples. Based on the hierarchical Bayesian model, a Gibbs sampling algorithm is developed to estimate the joint posterior distribution for phenotype-specific transcript assembly. The performances of our proposed methods are evaluated with simulation data, compared with existing methods and benchmarked with real cell line data. 
We then apply our methods to breast cancer data to identify biologically meaningful splicing mechanisms associated with breast cancer. As future work, we will extend our methods to de novo transcript assembly to identify novel isoforms in biological systems, and we will incorporate isoform-specific networks into our methods to better understand splicing mechanisms in biological systems. / Ph. D. / Next-generation sequencing technology has significantly improved the resolution of biomedical research at the genomic and transcriptomic levels. Due to the large noise in the data and the high complexity of diseases (such as cancer), it is a challenging task for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. In this dissertation, we have developed two novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. We have demonstrated the advantages of our proposed approaches over existing methods on both simulation data and real cell line data. Furthermore, the application of our methods to real breast cancer data and glioblastoma tissue data has further shown the efficacy of our methods in real biological applications.
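The spike-and-slab prior with Gibbs sampling used in SparseIso can be sketched in its textbook form on a sparse linear regression (not on isoform abundances); the design matrix, hyperparameters, and true coefficients below are invented:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[[1, 4]] = [2.0, -1.5]          # two truly "expressed" coefficients
y = X @ beta_true + rng.normal(0.0, 1.0, n)

sigma2, tau2, q = 1.0, 4.0, 0.2          # noise var, slab var, prior inclusion
beta = np.zeros(p)
incl = np.zeros((500, p))
for it in range(500):
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]     # residual excluding j
        v = 1.0 / (X[:, j] @ X[:, j] / sigma2 + 1.0 / tau2)
        m = v * (X[:, j] @ r) / sigma2
        # Marginal log-odds of the slab versus the spike at zero.
        log_odds = (np.log(q / (1 - q))
                    + 0.5 * np.log(v / tau2) + 0.5 * m**2 / v)
        if rng.uniform() < 1.0 / (1.0 + np.exp(-log_odds)):
            beta[j] = rng.normal(m, np.sqrt(v))
        else:
            beta[j] = 0.0
    incl[it] = beta != 0
pip = incl[100:].mean(axis=0)            # posterior inclusion probabilities
```

The posterior inclusion probabilities single out the two nonzero coefficients, mirroring how a spike-and-slab prior lets a Gibbs sampler decide jointly which isoforms exist and how abundant they are.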
239

Essays on Bayesian Inference for Social Networks

Koskinen, Johan January 2004
This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time.

A social network is conceived as a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph, with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation.

The first paper deals with a family of statistical models for social networks called exponential random graphs that takes various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step.

In the second paper, a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions. An intrinsic problem is that the model is not identified, meaning that there are combinations of values of the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions to achieve identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications.

Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network in between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood-based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
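The exponential random graph family discussed in the first paper has a likelihood known only up to a normalizing constant, which is why simulation is central to inference; a minimal edge-toggle Metropolis sampler for a graph with edge and triangle statistics (toy parameter values, not from the thesis) looks like this:

```python
import numpy as np

def stats(A):
    edges = A.sum() / 2.0
    triangles = np.trace(A @ A @ A) / 6.0
    return edges, triangles

def sample_ergm(n, theta_edge, theta_tri, iters, rng):
    """Edge-toggle Metropolis sampler for p(A) proportional to
    exp(theta_edge * edges(A) + theta_tri * triangles(A))."""
    A = np.zeros((n, n))
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        old_e, old_t = stats(A)
        A[i, j] = A[j, i] = 1.0 - A[i, j]          # propose toggling a dyad
        new_e, new_t = stats(A)
        dlp = theta_edge * (new_e - old_e) + theta_tri * (new_t - old_t)
        if np.log(rng.uniform()) >= dlp:           # reject: undo the toggle
            A[i, j] = A[j, i] = 1.0 - A[i, j]
    return A

rng = np.random.default_rng(6)
A = sample_ergm(12, theta_edge=-1.0, theta_tri=0.2, iters=3000, rng=rng)
```

A production sampler would compute the change statistics of a single toggle locally rather than recomputing global statistics, but the acceptance rule is the same; such draws supply the importance sampling estimates the paper's Bayesian algorithm needs.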
240

Robust spatio-temporal latent variable models

Christmas, Jacqueline January 2011
Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are widely used mathematical models for decomposing multivariate data. They capture spatial relationships between variables, but ignore any temporal relationships that might exist between observations. Probabilistic PCA (PPCA) and Probabilistic CCA (ProbCCA) are versions of these two models that explain the statistical properties of the observed variables as linear mixtures of an alternative, hypothetical set of hidden, or latent, variables and explicitly model noise. Both the noise and the latent variables are assumed to be Gaussian distributed. This thesis introduces two new models, named PPCA-AR and ProbCCA-AR, that augment PPCA and ProbCCA respectively with autoregressive processes over the latent variables to additionally capture temporal relationships between the observations. To make PPCA-AR and ProbCCA-AR robust to outliers and able to model leptokurtic data, the Gaussian assumptions are replaced with infinite scale mixtures of Gaussians, using the Student-t distribution. Bayesian inference calculates posterior probability distributions for each of the parameter variables, from which we obtain a measure of confidence in the inference. It avoids the pitfalls associated with the maximum likelihood method: integrating over all possible values of the parameter variables guards against overfitting. For these new models the integrals required for exact Bayesian inference are intractable; instead a method of approximation, the variational Bayesian approach, is used. This enables the use of automatic relevance determination to estimate the model orders. PPCA-AR and ProbCCA-AR can be viewed as linear dynamical systems, so the forward-backward algorithm, which also underlies the Baum-Welch algorithm, is used as an efficient method for inferring the posterior distributions of the latent variables.
The exact algorithm is tractable because Gaussian assumptions are made regarding the distribution of the latent variables. This thesis introduces a variational Bayesian forward-backward algorithm based on Student-t assumptions. The new models are demonstrated on synthetic datasets and on real remote sensing and EEG data.
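The replacement of Gaussian assumptions by infinite scale mixtures of Gaussians can be checked numerically: drawing a Gamma-distributed precision and then a Gaussian given that precision yields Student-t draws with the expected heavier tails. The degrees of freedom below are an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
nu = 6.0                                  # degrees of freedom (arbitrary)

# Student-t as an infinite scale mixture of Gaussians:
# u ~ Gamma(nu/2, rate nu/2), then x | u ~ N(0, 1/u)  =>  x ~ t_nu.
u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=200_000)
x = rng.normal(0.0, 1.0 / np.sqrt(u))
g = rng.normal(0.0, 1.0, 200_000)         # Gaussian baseline for comparison
heavy_tail_excess = (np.abs(x) > 3).mean() / (np.abs(g) > 3).mean()
```

It is exactly this conditional-Gaussian structure that keeps the variational Bayesian updates tractable in the robust models: given the latent scales, everything reduces to the Gaussian case.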
