1

Logic sampling, likelihood weighting and AIS-BN: an exploration of importance sampling

Wang, Haiou 21 June 2001 (has links)
Logic Sampling, Likelihood Weighting and AIS-BN are three variants of stochastic sampling, one class of approximate inference for Bayesian networks. We summarize the ideas underlying each algorithm and the relationships among them, and present results from a set of empirical experiments comparing Logic Sampling, Likelihood Weighting and AIS-BN. We also test the impact of each of the proposed heuristics and the learning method, separately and in combination, to provide a deeper look into AIS-BN and to show how the heuristics and learning method contribute to the power of the algorithm. Key words: belief network, probabilistic inference, Logic Sampling, Likelihood Weighting, Importance Sampling, Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks (AIS-BN), Mean Percentage Error (MPE), Mean Square Error (MSE), convergence rate, heuristic, learning method. / Graduation date: 2002
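The contrast between the samplers can be made concrete with likelihood weighting, the middle member of the family. Below is a minimal sketch on a hypothetical two-node network; the network structure, the conditional probability values, and the function names are illustrative assumptions, not taken from the thesis.

```python
import random

# Toy two-node network Rain -> WetGrass, with hypothetical CPT values
# chosen only to illustrate the method.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

def likelihood_weighting(evidence_wet, n_samples=100_000, seed=0):
    """Estimate P(Rain | WetGrass = evidence_wet) by likelihood weighting.

    Unlike logic sampling, evidence variables are never sampled: each
    sample fixes them and is weighted by the likelihood of the evidence
    given the sampled values of its parents.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        rain = rng.random() < P_RAIN                 # sample the free variable
        p_wet = P_WET_GIVEN_RAIN[rain]
        w = p_wet if evidence_wet else 1.0 - p_wet   # likelihood weight
        den += w
        if rain:
            num += w
    return num / den

# Exact posterior for comparison: 0.2*0.9 / (0.2*0.9 + 0.8*0.1) ≈ 0.692
print(round(likelihood_weighting(True), 3))
```

Logic sampling would instead sample WetGrass too and discard samples disagreeing with the evidence, wasting most samples when the evidence is unlikely; AIS-BN goes further by adaptively learning the importance distribution.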
2

Mining uncertain data with probabilistic guarantees

Sun, Liwen, 孙理文 January 2010 (has links)
Published or final version / Computer Science / Master of Philosophy
3

Probabilistic skylines on uncertain data

Jiang, Bin, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Skyline analysis is important for multi-criteria decision making applications. The data in some of these applications are inherently uncertain due to various factors. Although a considerable amount of research has been dedicated separately to efficient skyline computation, as well as to modeling uncertain data and answering some types of queries on uncertain data, how to conduct skyline analysis on uncertain data remains largely an open problem. In this thesis, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model in which an uncertain object takes a probability of being in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. An uncertain object is conceptually described by a probability density function (PDF) in the continuous case, or, in the discrete case, by a set of instances (points), each with a probability of appearing. We develop two efficient algorithms, bottom-up and top-down, for computing the p-skyline of a set of uncertain objects in the discrete case, and discuss how our techniques can be applied to the continuous case. The bottom-up algorithm computes the skyline probabilities of selected instances of uncertain objects and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets and prunes subsets and objects aggressively. Our experimental results on both a real NBA player data set and benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
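The skyline-probability computation in the discrete case can be sketched in brute-force form. This is only an illustration of the model's definition, under assumed independence between objects; the thesis's bottom-up and top-down algorithms add the pruning that makes large data sets feasible, and the data below are made up.

```python
def dominates(a, b):
    """a dominates b (smaller is better) if a <= b on every dimension
    and a < b on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(obj_idx, objects):
    """Skyline probability of one uncertain object, brute force.

    `objects[i]` is a list of (point, prob) instances whose probs sum to 1.
    An instance is in the skyline iff no appearing instance of another
    object dominates it; probabilities multiply across independent objects.
    """
    total = 0.0
    for u, p_u in objects[obj_idx]:
        prob_not_dominated = 1.0
        for j, other in enumerate(objects):
            if j == obj_idx:
                continue
            p_dom = sum(p_v for v, p_v in other if dominates(v, u))
            prob_not_dominated *= 1.0 - p_dom
        total += p_u * prob_not_dominated
    return total

# Hypothetical data: object A has two equally likely instances, B has one.
objects = [
    [((1, 1), 0.5), ((5, 5), 0.5)],   # object A
    [((2, 2), 1.0)],                   # object B
]
print(skyline_probability(0, objects), skyline_probability(1, objects))  # 0.5 0.5
```

A's instance (5, 5) is always dominated by B's (2, 2), and B is dominated whenever A's (1, 1) appears, so each object ends up with skyline probability 0.5, and both belong to the 0.5-skyline.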
6

Multivariate semiparametric regression models for longitudinal data

Li, Zhuokai January 2014 (has links)
Multiple-outcome longitudinal data are abundant in clinical investigations. For example, infections with different pathogenic organisms are often tested concurrently, and assessments are usually taken repeatedly over time. It is therefore natural to consider a multivariate modeling approach to accommodate the underlying interrelationship among the multiple longitudinally measured outcomes. This dissertation proposes a multivariate semiparametric modeling framework for such data. Relevant estimation and inference procedures as well as model selection tools are discussed within this modeling framework. The first part of this research focuses on the analytical issues concerning binary data. The second part extends the binary model to a more general situation for data from the exponential family of distributions. The proposed model accounts for the correlations across the outcomes as well as the temporal dependency among the repeated measures of each outcome within an individual. An important feature of the proposed model is the addition of a bivariate smooth function for the depiction of concurrent nonlinear and possibly interacting influences of two independent variables on each outcome. For model implementation, a general approach for parameter estimation is developed by using the maximum penalized likelihood method. For statistical inference, a likelihood-based resampling procedure is proposed to compare the bivariate nonlinear effect surfaces across the outcomes. The final part of the dissertation presents a variable selection tool to facilitate model development in practical data analysis. Using the adaptive least absolute shrinkage and selection operator (LASSO) penalty, the variable selection tool simultaneously identifies important fixed effects and random effects, determines the correlation structure of the outcomes, and selects the interaction effects in the bivariate smooth functions. Model selection and estimation are performed through a two-stage procedure based on an expectation-maximization (EM) algorithm. Simulation studies are conducted to evaluate the performance of the proposed methods. The utility of the methods is demonstrated through several clinical applications.
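The adaptive-LASSO idea can be illustrated on a plain linear model. This is a sketch only: the dissertation embeds the penalty in a penalized-likelihood mixed-model fit via a two-stage EM procedure, not the simple coordinate descent below, and all names and tuning values here are assumptions.

```python
import numpy as np

def adaptive_lasso(X, y, lam=0.01, n_iter=200, eps=1e-8):
    """Adaptive LASSO for a linear model via coordinate descent (sketch).

    The adaptive weights 1/|beta_pilot| penalize coefficients with small
    pilot estimates more heavily, which is what gives the adaptive LASSO
    its variable-selection consistency.
    """
    n, p = X.shape
    beta_pilot = np.linalg.lstsq(X, y, rcond=None)[0]   # pilot (OLS) estimate
    w = 1.0 / (np.abs(beta_pilot) + eps)                # adaptive weights
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ r
            thr = lam * n * w[j]                        # weighted L1 threshold
            beta[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
    return beta
```

With an informative pilot fit, noise coefficients receive huge weights and are soft-thresholded exactly to zero, while strong signals are barely shrunk.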
7

Statistical analysis of clinical trial data using Monte Carlo methods

Han, Baoguang 11 July 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In medical research, data analysis often requires complex statistical methods for which no closed-form solutions are available. Under such circumstances, Monte Carlo (MC) methods have found many applications. In this dissertation, we propose several novel statistical models in which MC methods are utilized. In the first part, we focus on semicompeting risks data, in which a non-terminal event is subject to dependent censoring by a terminal event. Based on an illness-death multistate survival model, we propose flexible random effects models. We further extend our model to the setting of joint modeling, where both semicompeting risks data and repeated marker data are analyzed simultaneously. Since the proposed methods involve high-dimensional integrations, Bayesian Markov chain Monte Carlo (MCMC) methods are utilized for estimation. The use of Bayesian methods also facilitates the prediction of individual patient outcomes. The proposed methods are demonstrated in both simulation and case studies. In the second part, we focus on the re-randomization test, a nonparametric method that makes inferences solely based on the randomization procedure used in the clinical trial. For this type of inference, a Monte Carlo method is often used to generate the null distribution of the treatment difference. However, an issue was recently discovered when subjects in a clinical trial were randomized with unbalanced treatment allocation to two treatments according to the minimization algorithm, a randomization procedure frequently used in practice: the null distribution of the re-randomization test statistic is not centered at zero, which compromises the power of the test. In this dissertation, we investigate the properties of the re-randomization test and propose a weighted re-randomization method to overcome this issue. The proposed method is demonstrated through extensive simulation studies.
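The Monte Carlo null-distribution idea behind the re-randomization test can be sketched for the simple case of 1:1 complete randomization. This is an illustrative assumption: the dissertation's setting, minimization with unbalanced allocation, is precisely where this naive null distribution fails to center at zero and the proposed weighted method is needed.

```python
import random

def rerandomization_test(outcomes, assignment, n_rerand=10_000, seed=0):
    """Two-sided Monte Carlo re-randomization test for a two-arm trial.

    Re-runs the randomization procedure (here, simple 1:1 allocation with
    the observed group sizes) to build the null distribution of the mean
    treatment difference, then returns the Monte Carlo p-value.
    """
    rng = random.Random(seed)

    def mean_diff(assign):
        t = [y for y, a in zip(outcomes, assign) if a == 1]
        c = [y for y, a in zip(outcomes, assign) if a == 0]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = mean_diff(assignment)
    n1 = sum(assignment)
    count = 0
    for _ in range(n_rerand):
        reassign = [1] * n1 + [0] * (len(outcomes) - n1)
        rng.shuffle(reassign)                 # one re-run of the randomization
        if abs(mean_diff(reassign)) >= abs(observed):
            count += 1
    return count / n_rerand
```

Replacing `rng.shuffle` with a minimization rule would reproduce the problematic setting the dissertation studies, since the re-run allocations then track the covariate imbalance rather than being exchangeable.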
