241

Probabilistic skylines on uncertain data

Jiang, Bin, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Skyline analysis is important for multi-criteria decision-making applications. The data in some of these applications are inherently uncertain due to various factors. Although a considerable amount of research has been dedicated separately to efficient skyline computation, to modeling uncertain data, and to answering some types of queries on uncertain data, how to conduct skyline analysis on uncertain data remains largely an open problem. In this thesis, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model in which an uncertain object has a probability of being in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. An uncertain object is conceptually described by a probability density function (PDF) in the continuous case, or, in the discrete case, by a set of instances (points), each of which has a probability of appearing. We develop two efficient algorithms, bottom-up and top-down, for computing the p-skyline of a set of uncertain objects in the discrete case, and we discuss how our techniques can be applied in the continuous case as well. The bottom-up algorithm computes the skyline probabilities of selected instances of uncertain objects and uses those instances to prune other instances and objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets and prunes subsets and objects aggressively. Our experimental results on both a real NBA player data set and benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
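As a concrete illustration of the discrete-case model described above, the following is a minimal Python sketch that computes skyline probabilities by brute force, assuming smaller coordinate values are preferred. It implements only the naive definition; the thesis's bottom-up and top-down algorithms add the pruning needed for large data sets, which this sketch omits.

```python
import numpy as np

def dominates(v, u):
    """v dominates u (smaller is better): v <= u in every dimension, < in at least one."""
    v, u = np.asarray(v), np.asarray(u)
    return bool(np.all(v <= u) and np.any(v < u))

def skyline_probability(obj_idx, objects):
    """Skyline probability of one uncertain object in the discrete model.

    objects: list of objects, each a list of (point, prob) instance pairs
    whose probabilities sum to 1.  An instance survives with the probability
    that no instance of any other object dominates it.
    """
    total = 0.0
    for u, p_u in objects[obj_idx]:
        pr_survive = 1.0
        for j, other in enumerate(objects):
            if j == obj_idx:
                continue
            pr_dominated = sum(p_v for v, p_v in other if dominates(v, u))
            pr_survive *= 1.0 - pr_dominated
        total += p_u * pr_survive
    return total

def p_skyline(objects, p):
    """Indices of all objects whose skyline probability is at least p."""
    return [i for i in range(len(objects)) if skyline_probability(i, objects) >= p]

# two uncertain objects: A has two equally likely instances, B is a certain point
objects = [[((2, 4), 0.5), ((6, 6), 0.5)], [((3, 3), 1.0)]]
print(p_skyline(objects, 0.5))   # both objects qualify at p = 0.5
```

This brute-force version is quadratic in the total number of instances, which is exactly why the thesis's pruning strategies matter at scale.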
242

Statistical inference for banding data

Liu, Fei, 劉飛 January 2008 (has links)
Published or final version. / Statistics and Actuarial Science / Master of Philosophy.
243

Stochastic nonlinear models of DNA breathing at a defect

Duduială, Ciprian Ionut January 2010 (has links)
Deoxyribonucleic acid (DNA) is a long polymer consisting of two chains of bases, in which the genetic information is stored. Each base on one chain pairs with a corresponding base on the other chain to form a so-called base-pair. Molecular-dynamics simulations of a normal DNA duplex show that breathing events (the temporary opening of one or more base-pairs) typically occur on the microsecond time-scale. Using the molecular-dynamics package AMBER, we analyse, for twist angles in the range 30-40 degrees, a 12 base-pair DNA duplex solvated in a water box, in which the 'rogue' base difluorotoluene (F) replaces a thymine base (T). This replacement makes breathing occur on the nanosecond time-scale. The cost of simulating such large systems, together with the variation of breathing length and frequency with helical twist, led us to create a simplified model capable of accurately predicting DNA behaviour. Starting from a nonlinear Klein-Gordon lattice model and adding noise and damping to the system, we obtain a new mesoscopic model of the DNA duplex whose behaviour is close to that observed in experiments and all-atom MD simulations. Defects are considered in the inter-chain interactions as well as in the along-chain interactions. The system parameters are fitted to AMBER data using the maximum likelihood method. This model enables us to discuss the role of fluctuation-dissipation relations in the derivation of reduced (mesoscopic) models, the differences between the potential of mean force and the potential energies used in Klein-Gordon lattices, and how breathing can be viewed as a competition between the along-chain elastic energy, the inter-chain binding energy and the entropy term of the system's free energy. Using traditional analysis methods, such as principal component analysis, data autocorrelation, normal modes and the Fourier transform, we compare the AMBER and SDE simulations to emphasize the strength of the proposed model. In addition, the Fourier transform of the trajectory of the A-F base-pair suggests that DNA is a self-organised system, and our SDE model is capable of preserving this behaviour. However, we conclude that the critical DNA behaviour needs further investigation, since it might offer information about bubble nucleation and growth and even about DNA transcription and replication.
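To make the mesoscopic model concrete, here is a rough Euler-Maruyama sketch of a damped, noisy nonlinear Klein-Gordon lattice. The Morse-type on-site potential, the periodic boundary conditions, and all parameter values (including the weakened site standing in for the A-F base-pair) are illustrative assumptions for the sketch, not the AMBER-fitted parameters of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

N, steps, dt = 12, 200_000, 1e-3     # 12 base-pairs; step count and size are illustrative
k, gamma, kBT, a = 0.1, 0.5, 0.1, 1.0
D = np.full(N, 1.0)                  # Morse well depth per base-pair (inter-chain bond)
D[5] = 0.3                           # hypothetical weakened site standing in for A-F

u = np.zeros(N)                      # base-pair opening coordinates
v = np.zeros(N)                      # velocities
sigma = np.sqrt(2.0 * gamma * kBT)   # noise strength from the fluctuation-dissipation relation
traj = np.empty((steps, N))

for t in range(steps):
    lap = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)                 # along-chain coupling (periodic ring, a simplification)
    bond = 2.0 * a * D * np.exp(-a * u) * (np.exp(-a * u) - 1.0)   # -V'(u) for the Morse potential V = D(e^{-au} - 1)^2
    v += dt * (k * lap + bond - gamma * v) + sigma * np.sqrt(dt) * rng.standard_normal(N)
    u += dt * v
    traj[t] = u

print("max opening at the defect:", traj[:, 5].max())
```

Large excursions of the defect coordinate relative to its neighbours play the role of the breathing events discussed in the abstract: the weakened well lets that site open far more readily than the rest of the chain.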
244

Stochastic epidemic models for emerging diseases

Spencer, Simon January 2008 (has links)
In this thesis several problems concerning the stochastic modelling of emerging infections are considered. Mathematical modelling is often the only available method of predicting the extent of an emerging disease and assessing proposed control measures, as there may be little or no available data on previous outbreaks. Only stochastic models capture the inherent randomness in disease transmission observed in real-life outbreaks, which can strongly influence the outcome of an emerging epidemic because case numbers will initially be small compared with the population size. Chapter 2 considers a model for diseases in which some of the cases exhibit no symptoms and are therefore difficult to observe. Examples of such diseases include influenza, mumps and polio. This chapter investigates the problem of determining whether or not the epidemic has died out if a period containing no symptomatic individuals is observed. When modelling interventions, it is realistic to include a delay between observing the presence of infection and the implementation of control measures. Chapter 3 quantifies the effect that the length of such a delay has on an epidemic amongst a population divided into households. As well as a constant delay, an exponentially distributed delay is also considered. Chapter 4 develops a model for the spread of an emerging strain of influenza in humans. By considering the probability that an outbreak will be contained within a region in which an intervention strategy is active, it becomes possible to quantify and therefore compare the effectiveness of intervention strategies.
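The following toy Gillespie-style simulation illustrates the kind of question Chapter 3 asks: how a delay between detecting an infection and applying a control measure affects the final size of a stochastic outbreak. The detection rule, the parameter values, and the unstructured (non-household) population are simplifying assumptions for the sketch.

```python
import random

def final_size(n=1000, beta=1.5, gamma_rec=1.0, delay=2.0, control=0.5, seed=None):
    """Gillespie SIR in which, one fixed delay after the first observed
    recovery (a crude stand-in for detecting the outbreak), the
    transmission rate is cut by the factor `control`.  The rate switch is
    applied at event times, a standard simplification in a sketch."""
    rng = random.Random(seed)
    S, I, R, t = n - 1, 1, 0, 0.0
    detected = None
    while I > 0:
        b = beta * control if detected is not None and t >= detected + delay else beta
        rate_inf, rate_rec = b * S * I / n, gamma_rec * I
        t += rng.expovariate(rate_inf + rate_rec)
        if rng.random() < rate_inf / (rate_inf + rate_rec):
            S, I = S - 1, I + 1
        else:
            I, R = I - 1, R + 1
            if detected is None:
                detected = t
    return R

# compare a short and a long delay by the mean final size over repeated outbreaks
for d in (0.5, 5.0):
    sizes = [final_size(delay=d, seed=s) for s in range(200)]
    print(d, sum(sizes) / len(sizes))
```

Averaging over many runs is essential here: as the abstract notes, early case numbers are small, so individual outbreaks vary widely and some die out before the intervention matters at all.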
245

Asymmetric particle systems and last-passage percolation in one and two dimensions

Schmidt, Philipp January 2011 (has links)
This thesis studies three models: multi-type TASEP in discrete time, long-range last-passage percolation on the line and convoy formation in a travelling servers model. All three models are relatively easy to state but they show a very rich and interesting behaviour. The TASEP is a basic model for a one-dimensional interacting particle system with non-reversible dynamics. We study some aspects of the TASEP in discrete time and compare the results to recently obtained results for the TASEP in continuous time. In particular we focus on stationary distributions for multi-type models, speeds of second-class particles, collision probabilities and the speed process. We consider various natural update rules.
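As a minimal illustration of the discrete-time dynamics, here is a sketch of single-type TASEP on a ring under a parallel update rule; the multi-type models, second-class particles, and alternative update rules studied in the thesis are not implemented here.

```python
import numpy as np

def tasep_parallel_step(occ, p, rng):
    """One parallel-update step of single-type discrete-time TASEP on a ring.

    occ: boolean occupation array.  Every particle whose right neighbour
    site is currently empty jumps there independently with probability p;
    simultaneous updates cannot conflict because a site can only receive
    a particle from its left neighbour.
    """
    jumps = occ & ~np.roll(occ, -1) & (rng.random(len(occ)) < p)
    new = occ.copy()
    new[jumps] = False
    new[np.roll(jumps, 1)] = True
    return new, int(jumps.sum())

rng = np.random.default_rng(42)
occ = rng.random(100) < 0.3          # ~30% density on a 100-site ring
flux = 0
for _ in range(10_000):
    occ, moved = tasep_parallel_step(occ, 0.5, rng)
    flux += moved
print("average current per site:", flux / (10_000 * occ.size))
```

The choice of update rule is not cosmetic: parallel, sequential and sublattice updates lead to different stationary distributions in discrete time, which is one reason the thesis treats various natural update rules separately.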
246

New results in probabilistic modeling

January 2000 (has links)
Chan Ho-leung. / "December 2000." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (p. 154-[160]). / Abstracts in English and Chinese.
247

Some limit theorems and inequalities for weighted and non-identically distributed empirical processes

Alexander, Kenneth S January 1982 (has links)
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1982. / Microfiche copy available in Archives and Science. / Vita. / Bibliography: leaves 135-137.
248

Semiparametric inference with shape constraints

Patra, Rohit Kumar January 2016 (has links)
This thesis deals with estimation and inference in two semiparametric problems: a two-component mixture model and a single index regression model. For the two-component mixture model, we assume that the distribution of one component is known and develop methods for estimating the mixing proportion and the unknown distribution using ideas from shape-restricted function estimation. We establish the consistency of our estimators, and we find the rate of convergence and the asymptotic limit of our estimator of the mixing proportion. Furthermore, we develop a completely automated, distribution-free, honest finite-sample lower confidence bound for the mixing proportion. We compare the proposed estimators, which are easily implementable, with some existing procedures through simulation studies, and we analyse two data sets, one arising from an application in astronomy and the other from a microarray experiment. For the single index model, we consider estimation of the unknown link function and the finite-dimensional index parameter. We study the problem when the true link function is assumed to be (1) smooth or (2) convex. When the link function is only assumed to be smooth, in contrast to standard kernel-based methods, we use smoothing splines to estimate it. We prove the consistency and find the rates of convergence of the proposed estimators, and we establish the root-n rate of convergence and the semiparametric efficiency of the parametric component under mild assumptions. When the link function is assumed to be convex, we propose a shape-constrained penalized least squares estimator and a Lipschitz-constrained least squares estimator for the unknown quantities. We prove the consistency and find the rates of convergence of both estimators. For the shape-constrained penalized least squares estimator, we establish the root-n rate of convergence and the semiparametric efficiency of the parametric component under mild assumptions, and we conjecture that the parametric component of the Lipschitz-constrained least squares estimator is also semiparametrically efficient. We develop the R package "simest", which can be used to compute the proposed estimators even for moderately large dimensions.
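For the two-component mixture problem, a rough sketch in the spirit of the shape-restricted approach is given below: for a candidate mixing proportion, form the naive estimate of the unknown component's CDF implied by the known background, and measure how far it is from being a genuine CDF via an isotonic projection. The specific distance and threshold used here are illustrative tuning choices, not the thesis's calibrated procedure.

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

def criterion(x, F_b, gamma):
    """gamma times the L2 distance of the naive signal-CDF estimate
    (F_n - (1 - gamma) F_b) / gamma, evaluated at the order statistics,
    from its isotonic projection onto [0, 1] (i.e. from the set of CDFs)."""
    x = np.sort(x)
    n = len(x)
    naive = (np.arange(1, n + 1) / n - (1.0 - gamma) * F_b(x)) / gamma
    iso = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(np.arange(n), naive)
    return gamma * np.sqrt(np.mean((naive - iso) ** 2))

# toy data: 30% signal N(2, 1) mixed into a known N(0, 1) background
rng = np.random.default_rng(0)
m = 2000
x = np.where(rng.random(m) < 0.3, rng.normal(2.0, 1.0, m), rng.normal(0.0, 1.0, m))

gammas = np.linspace(0.01, 0.99, 99)
crit = np.array([criterion(x, norm.cdf, g) for g in gammas])
c_n = 0.1 * np.log(np.log(m)) / np.sqrt(m)       # threshold: purely a tuning choice here
alpha_hat = gammas[np.argmax(crit <= c_n)]       # smallest gamma meeting the criterion
print(alpha_hat)
```

The intuition is the one the abstract relies on: when the candidate proportion is at least the true one, the implied signal CDF is (approximately) monotone, so the criterion collapses toward zero.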
249

Distributionally Robust Performance Analysis: Data, Dependence and Extremes

He, Fei January 2018 (has links)
This dissertation focuses on distributionally robust performance analysis, an area of applied probability whose aim is to quantify the impact of model errors. Stochastic models are built to describe phenomena of interest with the intent of gaining insights or making informed decisions. Typically, however, the fidelity of these models (i.e. how closely they describe the underlying reality) may be compromised, due either to the lack of available information or to tractability considerations. The goal of distributionally robust performance analysis is then to quantify, and potentially mitigate, the impact of errors or model misspecifications. As such, it affects virtually any area in which stochastic modelling is used for analysis or decision making. This dissertation studies several aspects of the problem: quantifying the impact of model error in tail estimation using extreme value theory, assessing the impact of the dependence structure in risk analysis when the marginal distributions of the risk factors are known, and exploring recently discovered connections to machine learning and other statistical estimators based on distributionally robust optimization.

The first problem we consider is the impact of model specification in the context of extreme quantiles and tail probabilities. There is a rich statistical theory that allows one to extrapolate tail behavior from limited information. This body of theory, known as extreme value theory, has been successfully applied in a wide range of settings, including building physical infrastructure to withstand extreme environmental events and guiding the capital requirements of insurance companies to ensure their financial solvency. Not surprisingly, extrapolating into the tail of a distribution from limited observations requires imposing assumptions that are impossible to verify. The assumptions imposed in extreme value theory imply that a parametric family of models (the generalized extreme value distributions) can be used to perform tail estimation. Because such assumptions are so difficult (or impossible) to verify, we use distributionally robust optimization to enhance extreme value statistical analysis. Our approach results in a procedure that can easily be applied in conjunction with standard extreme value analysis, and we show that our estimators enjoy correct coverage even in settings in which the assumptions imposed by extreme value theory fail to hold.

In addition to extreme value estimation, which is associated with risk analysis via extreme events, another feature that often plays a role in risk analysis is the dependence structure among the risk factors. In the second chapter we study the question of evaluating the worst-case expected cost involving two sources of uncertainty, each with a specified marginal probability distribution. The worst-case expectation is optimized over all joint probability distributions consistent with the specified marginals, so the formulation captures the impact of the dependence structure of the risk factors. This formulation is equivalent to the so-called Monge-Kantorovich problem studied in optimal transport theory, whose theoretical properties have been studied substantially in the literature. Rates of convergence of computational algorithms for this problem, however, have been studied only recently. We show that if one of the random variables takes finitely many values, a direct Monte Carlo approach evaluates the worst-case expectation at an $O(n^{-1/2})$ convergence rate as the number of Monte Carlo samples, $n$, increases to infinity.

Next, we continue our investigation of worst-case expectations in the context of multiple risk factors, not only two, assuming that their marginal probability distributions are fixed. This problem does not fit the mold of standard optimal transport (or Monge-Kantorovich) problems. We consider, however, cost functions which are separable in the sense of being a sum of functions that depend on adjacent pairs of risk factors (think of the factors as indexed by time). In this setting, we are able to reduce the problem to the study of several separate Monge-Kantorovich problems. Moreover, we explain how to include martingale constraints, which are often natural in settings such as financial applications.

While the earlier chapters focus on the impact of tail modeling or dependence, in the later parts of the dissertation we take a broader view by studying decisions made on the basis of empirical observations. We therefore focus on so-called distributionally robust optimization formulations, using optimal transport theory to model the degree of distributional uncertainty or model misspecification. Distributionally robust optimization based on optimal transport has been a very active research topic in recent years; our contribution is to study how to specify the optimal transport metric in a data-driven way. We explain our procedure in the context of classification, which is of substantial importance in machine learning applications.
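The two-marginal worst-case expectation with one finitely supported marginal can be written directly as a small transport linear program. The sketch below pairs n Monte Carlo samples of X (the source of the $O(n^{-1/2})$ rate mentioned above) with a discrete Y; the cost function and the distributions are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_expectation(x_samples, y_atoms, q, cost):
    """Worst-case E[c(X, Y)] over all couplings of the empirical law of
    X (each sample gets mass 1/n) with a finitely supported Y.

    This is a transport linear program with maximization instead of the
    usual minimization; variables pi_ij are stored row-major."""
    n, m = len(x_samples), len(y_atoms)
    C = np.array([[cost(x, y) for y in y_atoms] for x in x_samples])
    A_eq, b_eq = [], []
    for i in range(n):                          # row marginals: sum_j pi_ij = 1/n
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(1.0 / n)
    for j in range(m):                          # column marginals: sum_i pi_ij = q_j
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(q[j])
    res = linprog(-C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")   # maximize <C, pi>
    return -res.fun

rng = np.random.default_rng(1)
x = rng.normal(size=200)                                        # Monte Carlo samples of X
y, q = np.array([-1.0, 0.0, 1.0]), np.array([0.25, 0.5, 0.25])  # finitely supported Y
print(worst_case_expectation(x, y, q, lambda a, b: (a - b) ** 2))
```

Replacing the maximization by a minimization recovers the usual optimal transport value, which makes the dependence-structure interpretation concrete: the same marginals can produce very different expected costs depending on the coupling.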
250

Properties of the maximum likelihood and Bayesian estimators of availability

Kuo, Way January 2011 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries.
