  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
121

Discovering contextual connections between biological processes using high-throughput data

Lasher, Christopher Donald 21 October 2011
Responding to calls from life scientists for help in interpreting rapidly growing repositories of data, the fields of bioinformatics and computational systems biology continue to produce increasingly sophisticated methods for summarizing and distilling the phenomena captured by high-throughput experiments. Techniques for analyzing genome-wide gene expression (e.g., microarray) data, for example, have moved beyond detecting individual genes perturbed in treatment-control experiments to reporting the collective perturbation of biologically related collections of genes, or "processes". Recent expression analysis methods have focused on improving the comprehensibility of results by using statistical modeling techniques such as Bayesian networks to report concise, non-redundant sets of processes. Simultaneously, integrating gene expression measurements with gene interaction networks has led to the computation of response networks: subgraphs of interaction networks in which genes exhibit strong collective perturbation or co-expression. Methods that integrate process annotations of genes with interaction networks identify high-level connections between the biological processes themselves. Identifying context-specific changes in these inter-process connections, however, requires more than any of these techniques offers alone: process-based expression analysis reports only perturbed processes and not their relationships; response networks are composed of interactions between genes rather than processes; and existing techniques for detecting process connections do not incorporate specific biological context. We present two novel methods that take inspiration from the latest techniques in process-based gene expression analysis, computation of response networks, and computation of inter-process connections.
We motivate the need for detecting inter-process connections by identifying a collection of processes exhibiting significant differences in collective expression in two liver tissue culture systems widely used in toxicological and pharmaceutical assays. Next, we identify perturbed connections between these processes via a novel method that integrates gene expression, interaction, and annotation data. Finally, we present another novel method that computes non-redundant sets of perturbed inter-process connections, and apply it to several additional liver-related data sets. These applications demonstrate the ability of our methods to capture and report biologically relevant high-level trends. / Ph. D.
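As a rough illustration of what "collective perturbation" of a process can mean, the sketch below scores one process with a hypergeometric over-representation test, a common baseline in expression analysis. It is not the method of the thesis above, and the function name, argument names, and the numbers in the demo are all assumptions made for the example.

```python
from math import comb


def enrichment_pvalue(n_genes, n_in_process, n_perturbed, n_overlap):
    """Hypergeometric tail probability for process over-representation.

    Returns P(overlap >= n_overlap) when n_perturbed genes are drawn at
    random, without replacement, from n_genes genes of which n_in_process
    are annotated with the process. All argument names are illustrative.
    """
    total = comb(n_genes, n_perturbed)
    upper = min(n_in_process, n_perturbed)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_process, k) * comb(n_genes - n_in_process, n_perturbed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 genes, 100 in the process,
    # 500 perturbed genes, 10 of which fall in the process.
    print(enrichment_pvalue(20000, 100, 500, 10))
```

A test of this kind reports each process in isolation, which is exactly the limitation the abstract points to: it says nothing about connections between processes.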
122

Computational Modeling for Differential Analysis of RNA-seq and Methylation data

Wang, Xiao 16 August 2016
Computational systems biology is an interdisciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution in multiple disciplines. However, extracting biologically meaningful information from the overwhelming amount of data generated from biological systems remains a major challenge, and effective computational approaches are urgently needed to reveal the functional components. In this dissertation, we therefore develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancers. We develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso features a joint model of the variability of RNA-seq data and the differential state of isoforms: it not only accounts for the variability of RNA-seq data but also incorporates the differential states of isoforms as hidden variables for differential analysis. The differential states of isoforms are estimated jointly with the other model parameters through a sampling process, which improves performance in detecting isoforms that are less differentially expressed. We also develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. The DM-BLD approach features a hierarchical model, built upon Markov random field models, to capture both the local dependency of measured loci and the dependency of methylation change. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. The differential methylation score of a gene is then calculated from the estimated methylation changes of the involved CpG sites, and the significance of genes is assessed by permutation-based statistical tests.
We have demonstrated the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq data and methylation data. Jointly estimating the posterior distributions of the variables and model parameters with a sampling procedure proves advantageous in detecting isoforms or methylated genes with smaller differential changes. The applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence, with the aim of identifying new molecular targets for breast cancer treatment. / Ph. D.
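The permutation-based significance step mentioned in this abstract can be illustrated with a generic two-group permutation test. This is a minimal sketch, not DM-BLD itself: the test statistic (absolute difference in group means), the function name, and the toy methylation values in the demo are assumptions.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one permutation,
    # so the reported p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up methylation levels for one gene in two sample groups.
    tumour = [0.90, 0.85, 0.95, 0.88, 0.92, 0.91]
    normal = [0.10, 0.15, 0.12, 0.20, 0.18, 0.11]
    print(permutation_pvalue(tumour, normal))
```

In a gene-level setting the same idea would be applied to whatever score the model produces, with sample labels permuted instead of raw values.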
123

Bayesian Alignment Model for Analysis of LC-MS-based Omic Data

Tsai, Tsung-Heng 22 May 2014
Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used in various omic studies for biomarker discovery. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time alignment, which ensures that ion intensity measurements among multiple LC-MS runs are comparable, is one of the most important yet challenging preprocessing steps. In this dissertation, we propose a Bayesian alignment model (BAM) for analysis of LC-MS data. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and provides estimates of the retention time variability along with uncertainty measures, enabling a natural framework to integrate information from various sources. From methodology development to practical application, we investigate the alignment problem through three research topics: 1) development of a single-profile Bayesian alignment model, 2) development of a multi-profile Bayesian alignment model, and 3) application to biomarker discovery research. Chapter 2 introduces the profile-based Bayesian alignment using a single chromatogram, e.g., the base peak chromatogram from each LC-MS run. The single-profile alignment model improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler using a block Metropolis-Hastings algorithm, and 2) an adaptive mechanism for knot specification using stochastic search variable selection (SSVS). Chapter 3 extends the model to integrate complementary information that better captures the variability in chromatographic separation. We use Gaussian process regression on the internal standards to derive a prior distribution for the mapping functions. In addition, a clustering approach is proposed to identify multiple representative chromatograms for each LC-MS run.
With the Gaussian process prior, these chromatograms are simultaneously considered in the profile-based alignment, which greatly improves the model estimation and facilitates the subsequent peak matching process. Chapter 4 demonstrates the applicability of the proposed Bayesian alignment model to biomarker discovery research. We integrate the proposed Bayesian alignment model into a rigorous preprocessing pipeline for LC-MS data analysis. Through the developed analysis pipeline, candidate biomarkers for hepatocellular carcinoma (HCC) are identified and confirmed on a complementary platform. / Ph. D.
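To make the idea of MCMC-based alignment with uncertainty estimates concrete, here is a heavily simplified sketch: a single-parameter random-walk Metropolis-Hastings sampler that estimates one constant retention-time shift between matched peaks of two runs. It is not BAM, which uses a block sampler over spline mapping functions; the Gaussian noise model, the flat prior, the function names, and the peak times in the demo are assumptions.

```python
import math
import random


def log_likelihood(shift, ref_times, obs_times, sigma):
    """Gaussian log-likelihood (up to a constant) of obs = ref + shift + noise."""
    return -sum(
        (o - r - shift) ** 2 for r, o in zip(ref_times, obs_times)
    ) / (2 * sigma ** 2)


def sample_shift(ref_times, obs_times, sigma=0.1, step=0.1,
                 n_iter=5000, burn_in=1000, seed=0):
    """Draw posterior samples of the shift under a flat prior."""
    rng = random.Random(seed)
    shift = 0.0
    current = log_likelihood(shift, ref_times, obs_times, sigma)
    samples = []
    for i in range(n_iter):
        proposal = shift + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, ref_times, obs_times, sigma)
        # Symmetric proposal and flat prior: the acceptance probability
        # reduces to the likelihood ratio, capped at 1.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            shift, current = proposal, candidate
        if i >= burn_in:
            samples.append(shift)
    return samples


if __name__ == "__main__":
    # Made-up peak retention times (minutes) in a reference and a second run.
    ref = [1.0, 2.0, 3.0, 4.0, 5.0]
    obs = [1.52, 2.47, 3.51, 4.49, 5.50]
    draws = sample_shift(ref, obs)
    print(sum(draws) / len(draws))
```

The spread of the returned samples plays the role of the "uncertainty measures" mentioned in the abstract, in this toy case for a single number instead of a full mapping function.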
124

A Bayesian Approach to Estimating Background Flows from a Passive Scalar

Krometis, Justin 26 June 2018
We consider the statistical inverse problem of estimating a background flow field (e.g., of air or water) from the partial and noisy observation of a passive scalar (e.g., the concentration of a pollutant). Here the unknown is a vector field that is specified by a large or infinite number of degrees of freedom. We show that the inverse problem is ill-posed, i.e., there may be many or no background flows that match a given set of observations. We therefore adopt a Bayesian approach, incorporating prior knowledge of background flows and models of the observation error to develop probabilistic estimates of the fluid flow. In doing so, we leverage frameworks developed in recent years for infinite-dimensional Bayesian inference. We provide conditions under which the inference is consistent, i.e., the posterior measure converges to a Dirac measure on the true background flow as the number of observations of the solute concentration grows large. We also define several computationally efficient algorithms adapted to the problem. One is an adjoint method for computation of the gradient of the log likelihood, a key ingredient in many numerical methods. A second is a particle method that allows direct computation of point observations of the solute concentration, leveraging the structure of the inverse problem to avoid approximation of the full infinite-dimensional scalar field. Finally, we identify two interesting example problems with very different posterior structures, which we use to conduct a large-scale benchmark of the convergence of several Markov chain Monte Carlo methods that have been developed in recent years for infinite-dimensional settings. / Ph. D.
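One well-known MCMC method designed for function-space settings is the preconditioned Crank-Nicolson (pCN) sampler; the abstract does not say which methods are benchmarked, so treating pCN as an example is an assumption. The sketch below runs pCN on a tiny finite-dimensional stand-in: a standard normal prior on a two-component unknown and a direct noisy observation of each component. The forward model, function names, and data are illustrative only.

```python
import math
import random


def pcn_sample(neg_log_likelihood, dim, beta=0.5,
               n_iter=20000, burn_in=2000, seed=0):
    """pCN sampler for a posterior with prior N(0, I) on R^dim."""
    rng = random.Random(seed)
    u = [0.0] * dim
    phi_u = neg_log_likelihood(u)
    scale = math.sqrt(1.0 - beta ** 2)
    samples = []
    for i in range(n_iter):
        # The proposal preserves the prior: shrink the current state
        # and add a fresh prior draw scaled by beta.
        v = [scale * ui + beta * rng.gauss(0.0, 1.0) for ui in u]
        phi_v = neg_log_likelihood(v)
        # The acceptance probability involves only the likelihood.
        if rng.random() < math.exp(min(0.0, phi_u - phi_v)):
            u, phi_u = v, phi_v
        if i >= burn_in:
            samples.append(u)
    return samples


def make_potential(data, noise_sd=1.0):
    """Negative log-likelihood for observing each component plus noise."""
    def potential(u):
        return sum((y - ui) ** 2 for y, ui in zip(data, u)) / (2 * noise_sd ** 2)
    return potential


if __name__ == "__main__":
    draws = pcn_sample(make_potential([2.0, -2.0]), dim=2)
    print([sum(d[j] for d in draws) / len(draws) for j in range(2)])
```

Because the proposal is built from the prior, the acceptance rule does not degrade as the discretization of the unknown is refined, which is the property that makes this family of samplers attractive for infinite-dimensional problems.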
125

Bayesian Hierarchical Modeling and Markov Chain Simulation for Chronic Wasting Disease

Mehl, Christopher 05 1900
In this thesis, a dynamic spatial model for the spread of Chronic Wasting Disease in Colorado mule deer is derived from a system of differential equations that captures the qualitative spatial and temporal behaviour of the disease. These differential equations are incorporated into an empirical Bayesian hierarchical model through the unusual step of deterministic autoregressive updates. Spatial effects in the model are described directly in the differential equations rather than through the use of correlations in the data. The use of deterministic updates is a simplification that reduces the number of parameters that must be estimated, yet still provides a flexible model that gives reasonable predictions for the disease. The posterior distribution generated by the data model hierarchy possesses characteristics that are atypical for many Markov chain Monte Carlo simulation techniques. To address these difficulties, a new MCMC technique is developed that has qualities similar to recently introduced tempered Langevin type algorithms. The methodology is used to fit the CWD model, and posterior parameter estimates are then used to obtain predictions about Chronic Wasting Disease.
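For readers unfamiliar with Langevin-type samplers, the following is a minimal sketch of the plain Metropolis-adjusted Langevin algorithm (MALA) on a one-dimensional target. It is neither tempered nor the new technique developed in the thesis; the target density, step size, and function names are assumptions chosen for illustration.

```python
import math
import random


def mala(log_density, grad_log_density, x0, step=1.0,
         n_iter=20000, burn_in=2000, seed=0):
    """Metropolis-adjusted Langevin sampler for a 1-D target."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        centre = frm + 0.5 * step ** 2 * grad_log_density(frm)
        return -((to - centre) ** 2) / (2 * step ** 2)

    x = x0
    samples = []
    for i in range(n_iter):
        # Langevin proposal: drift along the gradient, then add noise.
        y = x + 0.5 * step ** 2 * grad_log_density(x) + step * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_density(y) + log_q(x, y)
                     - log_density(x) - log_q(y, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
        if i >= burn_in:
            samples.append(x)
    return samples


def log_target(x):
    """Unnormalized log density of N(3, 1), used as a toy target."""
    return -0.5 * (x - 3.0) ** 2


def grad_log_target(x):
    return -(x - 3.0)


if __name__ == "__main__":
    draws = mala(log_target, grad_log_target, x0=0.0)
    print(sum(draws) / len(draws))
```

Using gradient information lets the chain drift towards high-probability regions, which is the feature tempered variants build on when the posterior has awkward geometry.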
126

Statistical potentials for evolutionary studies

Kleinman, Claudia L. 06 1900
Protein sequences are the net result of the interplay of mutation, natural selection and stochastic variation over evolutionary time. Probabilistic models of molecular evolution accounting for these processes have been substantially improved in recent years. In particular, models that explicitly incorporate protein structure and site interdependencies have recently been developed, as well as statistical tools for assessing their performance. Despite major advances in this direction, only very simplified representations of protein structure have been used so far. In this context, the main theme of this dissertation is the modeling of three-dimensional protein structure for evolutionary studies, taking into account the practical limitations imposed by computationally demanding phylogenetic methods. First, a general statistical framework for optimizing the parameters of a statistical potential (an energy-like scoring system for sequence-structure compatibility) is presented. The functional form of the potential is then refined, increasing the detail of the structural description without inflating computational costs. Several structural elements are investigated, always at the residue level: pairwise distance interactions, solvent accessibility, backbone conformation and flexibility of the residues. The potentials are then incorporated into an evolutionary model, and their performance is assessed in terms of statistical fit to real data and compared with standard evolutionary models. Finally, this new structurally constrained phylogenetic model is used to better understand the selective forces behind the differences in conservation found in genes of very different expression levels.
127

Reversible Jump Markov Chain Monte Carlo

Neuhoff, Daniel 15 March 2016
The four studies in this thesis are concerned predominantly with the dynamics of macroeconomic time series, both in the context of a simple DSGE model and from a pure time series modeling perspective.
128

Online stochastic algorithms / Algorithmes stochastiques en ligne

Li, Le 27 November 2018
This thesis addresses three main subjects. The first is online clustering, for which we introduce a new adaptive stochastic algorithm to cluster online datasets. It relies on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove a regret bound for this algorithm and show that the bound is asymptotically minimax under the constraint on the number of clusters. An RJMCMC-flavored implementation is also proposed. The second subject is the sequential learning of principal curves, which seeks to represent a sequence of data by a continuous polygonal curve. To this end, we introduce a procedure based on the MAP of the Gibbs quasi-posterior that yields polygonal lines whose number of segments can be chosen automatically. We also show that the procedure and its adaptive version are supported by regret bounds whose remainder terms are sublinear in the time horizon T. In addition, a greedy local search implementation that incorporates both sleeping experts and multi-armed bandit ingredients is presented. The third subject is the practical work carried out within iAdvize, the company that supports this thesis. It includes sentiment analysis of textual messages using classical methods from text mining and statistics, and the implementation of a chatbot based on natural language processing and artificial neural networks.
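The notions of a Gibbs quasi-posterior and of regret can be illustrated with the classical exponentially weighted average forecaster over a finite set of experts. This sketch is not the clustering or principal-curve algorithm of the thesis; the function name, the learning rate, and the loss sequence in the demo are assumptions.

```python
import math


def hedge(loss_rounds, eta):
    """Exponentially weighted forecaster over a finite set of experts.

    loss_rounds is a list of per-round loss vectors with entries in [0, 1].
    Returns the forecaster's cumulative expected loss and the cumulative
    loss of every expert.
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts
    total = 0.0
    for losses in loss_rounds:
        # Gibbs weights: proportional to exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials well scaled.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        total += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return total, cumulative


if __name__ == "__main__":
    horizon = 100
    # Made-up loss sequence: two experts that are alternately right and wrong.
    rounds = [[0.0, 1.0] if t % 2 == 0 else [1.0, 0.0] for t in range(horizon)]
    eta = math.sqrt(8 * math.log(2) / horizon)
    total, cumulative = hedge(rounds, eta)
    print("regret:", total - min(cumulative))
```

With losses in [0, 1] and this choice of eta, the regret against the best expert is at most sqrt(T ln N / 2), which grows sublinearly in the horizon T.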
129

MCMC adaptatifs à essais multiples

Fontaine, Simon 09 1900
No description available.
130

Elförbrukningen i svenska hushåll : En analys inom projektet ”Förbättrad energistatistik i bebyggelsen” för Energimyndigheten / Electricity consumption in Swedish households : An analysis in the project “Improved energy statistics for settlements” for the Swedish Energy Agency

Nilsson, Josefine, Xie, Jing January 2012
The Swedish Energy Agency conducted a project called “Improved energy statistics for settlements” to gain more knowledge about energy use in buildings. This report focuses on one part of the project: “households’ electricity use on device level”. Various regression models are used to analyze the relationship between electricity usage and different explanatory variables, for instance background variables for the household, type of household, geographical setting, usage of different electrical devices and number of electrical devices used. The data cover 389 households, most of which are spread around the Mälardalen region; a few measurements were made on households in Kiruna and Malmö. We conclude that a household's background variables, its type, its geographical setting, and the number and type of devices it contains all contribute to its electricity usage. / Förbättrad energistatistik i bebyggelsen
