391 |
Non-parametric Bayesian prediction of landmark times for analysis of failure-time data
Lustgarten, Stephanie (24 June 2024)
In clinical trials with failure-time primary outcomes, also known as "event-driven" designs, the statistical information is determined by the total number of observed events. Examples of failure-time endpoints in clinical trials include time to death and time to disease progression. In trials with event-driven designs, the interim and final analyses are performed after a pre-specified number of events has been observed, based on a priori design considerations, rather than after patients have been followed for a pre-specified period of time.
The timing of these analyses represents important milestones in the conduct of the study. In particular, if a trial requires review of interim analyses by a Data Monitoring Committee (DMC), convening the DMC members requires considerable advance planning and effort. In addition, advance knowledge of when these milestones will occur allows trial sponsors to make informed decisions about resources and financial planning. It is therefore of interest to predict, from accumulating data, when a pre-specified number of events will be observed.
Parametric and semi-parametric methods have been proposed for event prediction when data are right-censored. When the underlying failure-time distribution is unknown or accumulated events are relatively sparse, these methods may not provide accurate or efficient predictions. We propose a fully Bayesian non-parametric approach to modeling the survival probabilities for event prediction; it is more flexible and generalizes to interval-censored data. We use a Gibbs sampler to sample from the posterior of the survival distribution and obtain point and interval estimates of the time at which the specified number of events will be reached.
We compare the accuracy and precision of this approach with those of previously proposed parametric and semi-parametric methods under a variety of data-generating mechanisms, beginning with right-censored data. We then extend the study to interval-censored data, comparing the methods on data generated with varying assessment intervals. Finally, we consider the scenario in which treatment assignment is blinded, incorporating a Bayesian approach to determine the probability that a patient belongs to a particular treatment group. We demonstrate that the proposed method offers greater flexibility and can match or outperform existing methods in multiple clinical trial scenarios.
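The prediction step described above (sample the survival posterior, then ask when the target event count is reached) can be sketched as follows. This is not the thesis's method: as a hedged stand-in for its Gibbs-sampled non-parametric posterior, the sketch assumes a piecewise-constant hazard with independent conjugate Gamma priors, which can be sampled directly. The cut points, prior values `a` and `b`, and all function names are hypothetical. It further assumes right-censored data, that every censored subject is still on study at the data cutoff, and no new enrollment or dropout.

```python
import numpy as np

# Stand-in model (an assumption, not the thesis's non-parametric prior):
# piecewise-constant hazards with independent Gamma(a, rate=b) priors.


def posterior_hazards(times, events, cuts, a=0.1, b=0.1, n_draws=1000, rng=None):
    """Draw one hazard per interval from the conjugate Gamma posterior.

    cuts: increasing interval start points beginning at 0 (hypothetical);
    the last interval is open-ended.
    """
    rng = np.random.default_rng(rng)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    edges = np.append(np.asarray(cuts, dtype=float), np.inf)
    d = np.zeros(len(cuts))         # events observed in each interval
    exposure = np.zeros(len(cuts))  # total time at risk in each interval
    for k in range(len(cuts)):
        lo, hi = edges[k], edges[k + 1]
        exposure[k] = np.clip(times - lo, 0.0, hi - lo).sum()
        d[k] = np.sum(events & (times >= lo) & (times < hi))
    # Posterior is Gamma(a + d_k, rate = b + exposure_k); NumPy takes scale = 1/rate.
    return rng.gamma(a + d, 1.0 / (b + exposure), size=(n_draws, len(cuts)))


def sample_event_time(t0, lam, cuts, rng):
    """Sample an event time after t0 by inverting the cumulative hazard."""
    edges = np.append(np.asarray(cuts, dtype=float), np.inf)
    remaining = rng.exponential()  # Exp(1) amount of cumulative hazard to spend
    t = t0
    for k in range(len(cuts)):
        hi = edges[k + 1]
        if hi <= t:
            continue  # this interval lies entirely before the current time
        mass = lam[k] * (hi - t)  # hazard accumulated if we cross this interval
        if mass >= remaining:
            return t + remaining / lam[k]
        remaining -= mass
        t = hi
    return np.inf


def predict_landmark(times, events, target_events, cuts, n_draws=1000, seed=0):
    """Posterior median and 95% interval of the extra time until target_events."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    needed = target_events - int(events.sum())
    at_risk = times[~events]  # follow-up times of subjects still event-free
    if not 0 < needed <= len(at_risk):
        raise ValueError("target must exceed observed events and be reachable")
    lam_draws = posterior_hazards(times, events, cuts, n_draws=n_draws, rng=rng)
    landmarks = np.empty(n_draws)
    for s, lam in enumerate(lam_draws):
        # Extra time past the cutoff that each at-risk subject waits for its event.
        waits = [sample_event_time(t0, lam, cuts, rng) - t0 for t0 in at_risk]
        landmarks[s] = np.sort(waits)[needed - 1]  # time of the needed-th new event
    return np.median(landmarks), np.percentile(landmarks, [2.5, 97.5])


# Usage on simulated right-censored data (hypothetical numbers).
rng = np.random.default_rng(1)
true_t = rng.exponential(10.0, 300)
follow = rng.uniform(2.0, 12.0, 300)
times, events = np.minimum(true_t, follow), true_t <= follow
med, (lo, hi) = predict_landmark(times, events, events.sum() + 20, [0, 5, 10], n_draws=200)
print(f"20 more events expected after {med:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

Conditioning on survival to `t0` works by spending the Exp(1) draw only on hazard accumulated after `t0`; taking the `needed`-th smallest waiting time per posterior draw turns posterior uncertainty about the hazard into a predictive distribution for the landmark time.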
|
392 |
Network-based methods to identify mechanisms of action in disease and drug perturbation profiles using high-throughput genomic data
Pham, Lisa M. (24 June 2024)
In the past decade it has become increasingly clear that a biological response is rarely caused by a single gene or protein. Rather, it is the result of a myriad of biological factors that form a systematic network of variables spanning multiple levels of biology, from gene transcription to cell metabolism. It has therefore become a significant challenge in bioinformatics to integrate different levels of biology and to think about biological problems from a network perspective. In my thesis, I will discuss three projects that address this challenge.
First, I will introduce two novel methods that integrate quantitative and qualitative biological data in a network approach. My aim in chapters two and three is to combine high-throughput data with biological databases to identify the causal mechanisms of action (MoA), in the form of canonical biological pathways, underlying the data for a given phenotype. In the second chapter, I will introduce an algorithm called Latent Pathway Identification Analysis (LPIA). This algorithm looks for statistically significant evidence of dysregulation in a network of pathways constructed in a manner that explicitly links pathways through their common function in the cell.
In chapter three, I will introduce a new method that focuses on the identification of perturbed pathways from high-throughput gene expression data, which we approach as a task in statistical modeling and inference. We develop a two-level statistical model, where (i) the first level captures the relationship between high-throughput gene expression and biological pathways, and (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation.
In the fourth chapter, I will focus on integrating high-throughput data from two distinct levels of biology to elucidate associations and causal relationships among genotype, gene expression, and glycemic traits relevant to type 2 diabetes. I use the Framingham Heart Study, as well as its extension, the SABRe initiative, to identify genes whose expression may be causally linked to fasting glucose.
|
393 |
Safety of Flight Prediction for Small Unmanned Aerial Vehicles Using Dynamic Bayesian Networks
Burns, Meghan Colleen (23 May 2018)
This thesis compares three variations of the Bayesian network as an aid for decision-making using uncertain information. After reviewing the basic theory underlying probabilistic graphical models and Bayesian estimation, the thesis presents a user-defined static Bayesian network, a static Bayesian network in which the parameter values are learned from data, and a dynamic Bayesian network with learning. As a basis for the comparison, these models are used to provide a prior assessment of the safety of flight of a small unmanned aircraft, taking into consideration the state of the aircraft and the weather. The results of the analysis indicate that the dynamic Bayesian network is more effective than the static networks at predicting safety of flight. / Master of Science / This thesis uses probabilities to aid decision-making with uncertain information. It presents three network models that use probabilities to help assess the flight safety of a small unmanned aircraft. All three are forms of Bayesian networks: graphs that map causal relationships between random variables. Each network models the flight conditions and the state of the aircraft; two of the networks are static and one varies with time. The results of the analysis indicate that the dynamic Bayesian network is more effective than the static networks at predicting safety of flight.
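The advantage of a dynamic network comes from carrying a belief about a hidden state forward through time. A minimal sketch of that idea, assuming the simplest two-slice dynamic Bayesian network (a hidden-Markov-style chain): a hidden safe/unsafe state evolves between time steps and emits an observed weather category, and forward filtering updates P(state | observations so far). The state names, weather categories, and every probability below are hypothetical placeholders, not values from the thesis.

```python
import numpy as np

WEATHER = ("calm", "gusty", "storm")

# Hypothetical parameters (illustration only); state order is (safe, unsafe).
PRIOR = np.array([0.9, 0.1])              # belief before the first time step
TRANSITION = np.array([[0.95, 0.05],      # P(state_t | state_{t-1} = safe)
                       [0.30, 0.70]])     # P(state_t | state_{t-1} = unsafe)
EMISSION = np.array([[0.70, 0.25, 0.05],  # P(weather | safe)
                     [0.10, 0.40, 0.50]]) # P(weather | unsafe)


def filter_safety(observations):
    """Forward algorithm: return P(state_t | weather_1..t) for every step t."""
    belief = PRIOR.copy()
    history = []
    for obs in observations:
        belief = belief @ TRANSITION                       # predict one step ahead
        belief = belief * EMISSION[:, WEATHER.index(obs)]  # weight by the evidence
        belief /= belief.sum()                             # renormalize
        history.append(belief.copy())
    return np.array(history)


posterior = filter_safety(["calm", "gusty", "storm", "storm"])
for t, p in enumerate(posterior, start=1):
    print(f"t={t}: P(safe) = {p[0]:.3f}")
```

A static network would score each time step independently; the filter instead lets consecutive storm observations compound, which is the kind of temporal dependence a dynamic network can exploit.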
|
394 |
Empirical Analysis of User Passwords across Online Services
Wang, Chun (05 June 2018)
Leaked passwords from data breaches can pose a serious threat if users reuse or slightly modify the passwords for other services. Although more and more online services are being breached, there is still little large-scale quantitative understanding of the risks of password reuse and modification. In this project, we perform the first large-scale empirical analysis of password reuse and modification patterns using a ground-truth dataset of 28.8 million users and their 61.5 million passwords in 107 services over 8 years. We find that password reuse and modification are very common (observed for 52% of the users). More surprisingly, sensitive online services such as shopping websites and email services received the most reused and modified passwords. We also observe that users continue to reuse already-leaked passwords on other online services for years after the initial data breach. Finally, to quantify the security risks, we develop a new training-based guessing algorithm. Extensive evaluations show that more than 16 million password pairs (30% of the modified passwords and all the reused passwords) can be cracked within just 10 guesses. We argue that more proactive mechanisms are needed to protect user accounts after major data breaches. / Master of Science / Since most internet services use text-based passwords for user authentication, passwords leaked in data breaches pose a serious threat, especially if users reuse or slightly modify them for other services. An attacker can leverage a known password from one site to guess the same user's passwords at other sites more easily. In this project, we perform the first large-scale study of password usage based on the largest ever leaked password dataset. The dataset consists of 28.8 million users and their 61.5 million passwords from 107 internet services over 8 years. We find that password reuse and modification are very common (observed for 52% of the users).
More surprisingly, we find that sensitive online services such as shopping websites and email services received the most reused and modified passwords. In addition, users continue to reuse already-leaked passwords on other online services for years after the initial data breach. Finally, we develop a cross-site password-guessing algorithm that guesses a user's modified passwords from one of that user's leaked passwords. Our password-guessing experiments show that 30% of the modified passwords can be cracked within only 10 guesses. We therefore argue that more proactive mechanisms are needed to protect user accounts after major data breaches.
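To illustrate what "training-based" cross-site guessing can mean, here is a hedged toy sketch: it counts how often each of a few hypothetical transformation rules turns an old password into a new one in training pairs, then guesses by applying the most frequent rules first, up to a fixed budget. The rule set, function names, and training pairs are assumptions for illustration; the study's actual algorithm is not reproduced here.

```python
from collections import Counter

# Hypothetical transformation rules mapping an old password to a candidate.
RULES = {
    "identity": lambda p: p,
    "capitalize": lambda p: p[:1].upper() + p[1:],
    "append_1": lambda p: p + "1",
    "append_!": lambda p: p + "!",
    "drop_last": lambda p: p[:-1],
    "leet_a": lambda p: p.replace("a", "@"),
}


def train(pairs):
    """Rank rules by how many (old, new) training pairs each one explains."""
    counts = Counter()
    for old, new in pairs:
        for name, rule in RULES.items():
            if rule(old) == new:
                counts[name] += 1
    # sorted() is stable, so ties keep the dictionary order of RULES.
    return sorted(RULES, key=lambda name: -counts[name])


def guesses(leaked, ranked_rules, budget=10):
    """Return up to `budget` distinct candidates, most frequent rule first."""
    out = []
    for name in ranked_rules:
        candidate = RULES[name](leaked)
        if candidate not in out:
            out.append(candidate)
        if len(out) == budget:
            break
    return out


training = [("sunshine", "sunshine1"), ("dragon", "dragon1"),
            ("monkey", "Monkey"), ("letmein", "letmein")]
ranked = train(training)
print(guesses("password", ranked, budget=3))
```

Note that a rule such as `leet_a` also "explains" pairs where it changes nothing (here `letmein`), so a realistic guesser would need richer, composable edits and a way to resolve such ties.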
|
395 |
Μηχανική μάθηση: Bayesian δίκτυα και εφαρμογές / Machine learning: Bayesian networks and applications
Χριστακοπούλου, Κωνσταντίνα (13 October 2013)
In this diploma thesis we address the use of Bayesian networks, and more generally of probabilistic graphical models, in machine learning. In the first chapters we briefly present the theoretical foundations of these structured probabilistic models, which comprise the basic phases of representation, inference, decision-making, and learning from the available data. In the following chapters we examine a wide range of applications of probabilistic graphical models and present the results of the simulations we implemented.
Specifically, Bayesian networks, Markov networks, and factor graphs are first defined in terms of graphs. Next, we present the inference algorithms that allow probability distributions to be computed directly from the graphs. Decision-making under uncertainty is supported by decision trees and influence diagrams. We then study learning the structure and parameters of probabilistic graphical models from complete or partial datasets. Finally, we present in detail scenarios that demonstrate the expressive power, flexibility, and usefulness of probabilistic graphical models in real-world applications. / The main subject of this diploma thesis is how probabilistic graphical models can be used in a wide range of real-world scenarios. In the first chapters, we concisely present the theoretical foundations of graphical models, which consist of the closely related phases of representation, inference, decision theory, and learning from data. In the following chapters, we work on many applications, from Optical Character Recognition to Recognizing Actions, and present the results of the simulations.
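Learning the parameters of a Bayesian network from a complete dataset reduces to counting: each entry of a conditional probability table is the relative frequency of a child value given its parents' values. A minimal sketch with optional Laplace (add-alpha) smoothing; the variables `Fault` and `Alarm`, the records, and the function name are hypothetical.

```python
from collections import Counter
from itertools import product


def learn_cpt(records, child, parents, values, alpha=1.0):
    """Estimate P(child | parents) from complete records by (smoothed) counting.

    records: list of dicts mapping variable name -> observed value.
    values: dict mapping each variable to its list of possible values.
    alpha: pseudo-count added to every cell; alpha=0 gives the maximum-likelihood
    estimate (and fails for parent configurations never seen in the data).
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    cpt = {}
    # product() over zero parents yields a single empty tuple, so roots work too.
    for config in product(*(values[p] for p in parents)):
        total = sum(joint[(config, v)] for v in values[child]) + alpha * len(values[child])
        cpt[config] = {v: (joint[(config, v)] + alpha) / total for v in values[child]}
    return cpt


values = {"Fault": [True, False], "Alarm": [True, False]}
data = ([{"Fault": True, "Alarm": True}] * 8 + [{"Fault": True, "Alarm": False}] * 2
        + [{"Fault": False, "Alarm": True}] * 1 + [{"Fault": False, "Alarm": False}] * 9)
cpt = learn_cpt(data, "Alarm", ["Fault"], values, alpha=0.0)
print(cpt[(True,)][True])  # 0.8
```

With partial data the counts are no longer observed directly, which is why methods such as expectation-maximization are needed in that case.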
|
396 |
An Introduction to the Theory and Applications of Bayesian Networks
Jaitha, Anant (01 January 2017)
Bayesian networks are a means of studying data. A Bayesian network gives structure to data by modeling its variables as the nodes of a graph and then developing probability distributions over those variables. It explores the variables in the problem space, examines the probability distributions associated with them, and conducts statistical inference over those distributions to draw meaning from them. Bayesian networks are an efficient way to explore large datasets and make inferences. A number of real-world applications already exist, and more are being actively researched. This paper discusses the theory and applications of Bayesian networks.
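The steps of defining distributions over the graph's variables and then conducting inference can be made concrete with a small network. A minimal sketch, assuming a hypothetical three-node network (Rain → Sprinkler, and Rain and Sprinkler → WetGrass) with made-up probabilities: the joint distribution factorizes into one conditional table per node, and a query such as P(Rain | WetGrass) is answered by summing the joint over the unobserved variable and normalizing (inference by enumeration).

```python
# Hypothetical conditional probability tables (all variables are booleans).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},  # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}   # P(Sprinkler | Rain=False)
P_WET = {(True, True): 0.99, (True, False): 0.8,  # P(Wet=True | Rain, Sprinkler)
         (False, True): 0.9, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """Chain-rule factorization implied by the graph structure."""
    p_wet = P_WET[(rain, sprinkler)]
    return (P_RAIN[rain] * P_SPRINKLER[rain][sprinkler]
            * (p_wet if wet else 1.0 - p_wet))


def p_rain_given_wet():
    """P(Rain=True | Wet=True): sum out Sprinkler, then normalize over Rain."""
    scores = {r: sum(joint(r, s, True) for s in (True, False)) for r in (True, False)}
    return scores[True] / (scores[True] + scores[False])


print(round(p_rain_given_wet(), 4))  # 0.3577
```

Enumeration is exponential in the number of hidden variables, so larger networks rely on algorithms such as variable elimination or belief propagation that exploit the same factorization more efficiently.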
|
397 |
Divergência populacional e expansão demográfica de Dendrocolaptes platyrostris (Aves: Dendrocolaptidae) no final do Quaternário / Population divergence and demographic expansion of Dendrocolaptes platyrostris (Aves: Dendrocolaptidae) in the late Quaternary
Campos Junior, Ricardo Fernandes (29 October 2012)
Dendrocolaptes platyrostris is a forest-specialist bird associated with the gallery forests of the open vegetation corridor of South America (D. p. intermedius) and with the Atlantic Forest (D. p. platyrostris). A previous study showed population genetic structure associated with the subspecies, two clades within the Atlantic Forest, and evidence of population expansion in the south, which is compatible with the Carnaval-Moritz model.
The present study used approximate Bayesian computation to evaluate the genetic diversity of two nuclear markers and one mitochondrial marker of this species, in order to compare the previous results with those of a multi-locus strategy that accounts for coalescent variation. The results suggest a polytomic relationship among populations that split during the last interglacial period and expanded after the Last Glacial Maximum. This result is consistent with the Carnaval-Moritz model, which suggests that populations underwent demographic changes due to the climatic changes of these periods. Future studies with additional markers, and with models in which some populations remain stable while others expand, are needed to evaluate the present result.
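The core of approximate Bayesian computation (ABC) is rejection sampling against simulated data. A toy sketch, assuming a single constant-size population under the standard neutral coalescent (not the divergence and expansion models compared in the study): θ = 4Nμ is drawn from a uniform prior, an independent genealogy is simulated for each locus, segregating sites fall as Poisson mutations along the total branch length, and θ is kept when the simulated total of segregating sites is close to the observed one. The prior range, tolerance, sample size, and function names are hypothetical.

```python
import numpy as np


def simulate_segregating_sites(theta, n, rng):
    """Segregating sites at one locus for n sampled gene copies.

    Time is in units of 2N generations, so while k lineages remain the waiting
    time is Exponential with rate k(k-1)/2, and mutations fall on the tree as a
    Poisson process with rate theta/2 per unit of branch length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        wait = rng.exponential(1.0 / (k * (k - 1) / 2.0))  # scale = 1 / rate
        total_length += k * wait  # k branches persist during this interval
    return rng.poisson(theta / 2.0 * total_length)


def abc_rejection(observed, n, n_sims=20000, tol=2, prior=(0.1, 10.0), seed=0):
    """Keep prior draws of theta whose simulated multi-locus data match observed."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)
        # A fresh, independent genealogy for every locus (coalescent variation).
        sim = np.array([simulate_segregating_sites(theta, n, rng) for _ in observed])
        # Summary statistic: total segregating sites across loci.
        if abs(sim.sum() - observed.sum()) <= tol:
            accepted.append(theta)
    return np.array(accepted)


rng = np.random.default_rng(42)
observed = [simulate_segregating_sites(3.0, 10, rng) for _ in range(5)]
posterior = abc_rejection(observed, n=10, n_sims=5000, tol=3)
print(len(posterior), posterior.mean())
```

Model comparison with ABC follows the same pattern: simulate under each candidate demographic model and compare how often each one produces data close to the observations.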
|
398 |
Aplicações do approximate Bayesian computation a controle de qualidade / Applications of approximate Bayesian computation in quality control
Campos, Thiago Feitosa (11 June 2015)
In this work we present two problems in statistical quality control, on-line quality monitoring and environmental stress screening, analyzed from a Bayesian perspective. We present problems of the Bayesian models that arise in their application, and we reanalyze both problems with approximate Bayesian computation (ABC) methods, which provide results faster and thus enable different kinds of analyses and the forecasting of new observations.
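A hedged sketch of rejection ABC in a simple monitoring setting, followed by forecasting of new observations: defect counts per inspected lot are assumed Poisson with an unknown rate and a Gamma prior. This toy model is conjugate, so the exact posterior mean is available to check the ABC answer; the accepted draws are then reused to simulate the next lot's defect count. The model, prior, tolerance, counts, and function names are hypothetical, not taken from the thesis.

```python
import numpy as np


def abc_poisson_rate(counts, prior_shape=2.0, prior_rate=1.0, n_sims=50000, tol=1, seed=0):
    """Rejection ABC for a Poisson defect rate, vectorized over prior draws."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    rates = rng.gamma(prior_shape, 1.0 / prior_rate, size=n_sims)  # scale = 1/rate
    # Simulate a full dataset per draw; compare totals (a sufficient statistic here).
    sims = rng.poisson(rates[:, None], size=(n_sims, counts.size)).sum(axis=1)
    return rates[np.abs(sims - counts.sum()) <= tol]


def predict_next_lot(accepted, seed=1):
    """Posterior predictive draws for the defect count of the next lot."""
    rng = np.random.default_rng(seed)
    return rng.poisson(accepted)


counts = [3, 1, 4, 2, 2, 5, 3, 2]  # hypothetical defects per lot
accepted = abc_poisson_rate(counts)
exact_mean = (2.0 + sum(counts)) / (1.0 + len(counts))  # conjugate Gamma posterior mean
print(accepted.mean(), exact_mean)
print(np.percentile(predict_next_lot(accepted), [5, 50, 95]))
```

ABC only requires the ability to simulate from the model, which is what makes it useful when the likelihood is awkward to evaluate; the conjugate case is used here only because it provides an exact answer to compare against.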
|
399 |
A comparison of Bayesian model selection based on MCMC with an application to GARCH-type models
Miazhynskaia, Tatiana; Frühwirth-Schnatter, Sylvia; Dorffner, Georg (January 2003)
This paper presents a comprehensive review and comparison of five computational methods for Bayesian model selection, based on MCMC simulations from posterior model parameter distributions. We apply these methods to a well-known and important class of models in financial time series analysis, namely GARCH and GARCH-t models for conditional return distributions (assuming normal and t-distributions). We compare their performance vis-à-vis the more common maximum likelihood-based model selection on both simulated and real market data. All five MCMC methods proved feasible in both cases, although differing in their computational demands. Results on simulated data show that for large degrees of freedom (where the t-distribution becomes more similar to a normal one), Bayesian model selection results in better decisions in favour of the true model than maximum likelihood. Results on market data show the feasibility of all model selection methods, mainly because the distributions appear to be decisively non-Gaussian. / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
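Every selection method compared here, Bayesian or maximum-likelihood, evaluates the likelihood of the competing models. A minimal sketch of that likelihood for GARCH(1,1) with normal or Student-t innovations, with the t rescaled to unit variance (which requires ν > 2). Initializing the variance recursion at the sample variance is an assumption, as are the parameter values and function names. The usage lines illustrate the remark above that a t with many degrees of freedom behaves like a normal.

```python
import math

import numpy as np


def garch_variances(returns, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha*r_{t-1}^2 + beta*sigma2_{t-1}."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # initialization choice (an assumption)
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def loglik_normal(returns, omega, alpha, beta):
    """Gaussian GARCH(1,1) log-likelihood."""
    r = np.asarray(returns, dtype=float)
    s2 = garch_variances(r, omega, alpha, beta)
    return float(-0.5 * np.sum(np.log(2 * np.pi * s2) + r ** 2 / s2))


def loglik_t(returns, omega, alpha, beta, nu):
    """GARCH(1,1) log-likelihood with unit-variance Student-t innovations (nu > 2)."""
    r = np.asarray(returns, dtype=float)
    s2 = garch_variances(r, omega, alpha, beta)
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    z2 = r ** 2 / s2
    return float(np.sum(const - 0.5 * np.log(s2)
                        - (nu + 1) / 2 * np.log1p(z2 / (nu - 2))))


# Simulate Gaussian GARCH(1,1) returns with hypothetical parameters.
rng = np.random.default_rng(0)
omega, alpha, beta = 0.05, 0.1, 0.85
r = np.empty(500)
s2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(500):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2
ll_normal = loglik_normal(r, omega, alpha, beta)
ll_t_large = loglik_t(r, omega, alpha, beta, nu=1e6)
print(ll_normal, ll_t_large)  # nearly identical for very large nu
```

Because the two likelihoods nearly coincide for large ν, the data carry little information to separate the models in that regime, which is where the choice of selection criterion matters most.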
|
400 |
Structured Bayesian methods for splicing analysis in RNA-seq data
Huang, Yuanhua (January 2018)
In most eukaryotes, alternative splicing is an important regulatory mechanism of gene expression that allows a single gene to code for multiple protein isoforms, greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide quantification of splicing isoforms, and several effective and powerful methods have been developed for splicing analysis with RNA-seq data. However, analysis remains problematic for genes with low coverage or a large number of isoforms. These difficulties may in principle be ameliorated by exploiting correlations encoded in structured data sources. This thesis contributes to the development of Bayesian methods for splicing analysis by leveraging additional information in multiple datasets through structured prior distributions. First, we developed DICEseq, the first isoform quantification method tailored to time-series RNA-seq experiments. DICEseq explicitly models the correlations between experiments at different time points to aid the quantification of isoforms across experiments. Numerical experiments on both simulated and real datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage. Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and sequencing depth, frequently an important choice when planning experiments. Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian hierarchical model that addresses the difficulties of splicing analysis in single-cell RNA-seq (scRNA-seq) data by learning an informative prior distribution from sequence features. This method combines quantification and imputation for splicing analysis in a Bayesian framework, which is particularly useful for scRNA-seq data given their extremely low coverage and high technical noise.
We validated BRIE on several scRNA-seq datasets, showing that BRIE yields reproducible estimates of exon inclusion ratios in single cells. Third, we provide an effective tool that uses Bayes factors to sensitively detect differential splicing between single cells. Applying BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing events across cell populations, for example in alternative exons of DNMT3B. In summary, this thesis proposes structured Bayesian methods that integrate multiple datasets to improve splicing analysis and study its biological functions.
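A minimal sketch of a Bayes-factor test for differential splicing between two cells, assuming each read can be assigned unambiguously to the inclusion or exclusion isoform and that the exon inclusion ratio (PSI) has a Beta(a, b) prior. This conjugate Beta-binomial comparison is a simplification, not BRIE's model, which among other things learns its prior from sequence features; the prior parameters and function names here are hypothetical, and per-exon values of `a` and `b` would be the natural place for such an informative prior.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(inc1, exc1, inc2, exc2, a=1.0, b=1.0):
    """log BF of 'each cell has its own PSI' versus 'both cells share one PSI'.

    inc/exc: reads supporting inclusion/exclusion of the exon in each cell.
    Under a Beta(a, b) prior the binomial coefficients cancel in the ratio.
    """
    separate = (log_beta(a + inc1, b + exc1) + log_beta(a + inc2, b + exc2)
                - 2 * log_beta(a, b))
    shared = log_beta(a + inc1 + inc2, b + exc1 + exc2) - log_beta(a, b)
    return separate - shared


print(log_bayes_factor(90, 10, 10, 90))  # large and positive: differential splicing
print(log_bayes_factor(50, 50, 50, 50))  # negative: favors a shared PSI
```

With very few reads per cell, both marginal likelihoods are dominated by the prior, which is why an informative, feature-based prior can sharpen such tests in low-coverage single-cell data.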
|