Anderson, Randy J. (Randy Jay)
This study compared the results of the chi-square test of independence and the corrected chi-square statistic against Fisher's exact probability test (the hypergeometric distribution) in connection with sampling from a finite population. Data were collected by advancing the minimum cell size from zero to a maximum which resulted in a tail-area probability of 20 percent, for sample sizes from 10 to 100 in varying increments. Analysis of the data supported rejection of the null hypotheses regarding the general rule-of-thumb guidelines concerning sample size, minimum expected cell frequency, and the continuity correction factor. It was discovered that computation using Yates' correction factor produced values so overly conservative (i.e., tail-area probabilities 20 to 50 percent higher than Fisher's exact test) that conclusions drawn from this calculation might prove inaccurate. Accordingly, a new correction factor was proposed which eliminated much of this discrepancy; its performance was as consistent as that of the uncorrected chi-square statistic and, at times, even better.
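The quantities compared in this abstract can be sketched in a few lines of Python; the 2x2 table and the function names below are illustrative assumptions, not data or code from the thesis:

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the upper-tail
    probability used by a one-sided Fisher exact test on a 2x2 table."""
    denom = comb(N, n)
    hi = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, hi + 1)) / denom

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
    optionally with Yates' continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table: 8 of 10 successes in group 1, 4 of 10 in group 2.
a, b, c, d = 8, 2, 4, 6
print(hypergeom_tail(a, a + b, a + c, a + b + c + d))  # one-sided exact p
print(chi_square_2x2(a, b, c, d))                      # uncorrected statistic
print(chi_square_2x2(a, b, c, d, yates=True))          # Yates-corrected (smaller)
```

The Yates-corrected statistic is always smaller than the uncorrected one, which is exactly the conservatism (larger implied p-value) the study measures against the exact hypergeometric tail.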
Willman, Edward N. (Edward Nicholas)
The classical normal-curve approximation to cumulative hypergeometric probabilities requires that the standard deviation of the hypergeometric distribution be larger than three, which limits the usefulness of the approximation for small populations. The purposes of this study are to develop clearly defined rules specifying when the normal-curve approximation to the cumulative hypergeometric probability distribution may be successfully utilized, and to determine where maximum absolute differences of 0.01 and 0.05 between the cumulative hypergeometric distribution and its normal-curve approximation occur in relation to the proportion of the population sampled.
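The approximation under study can be checked directly; this is a minimal sketch, with an arbitrary illustrative population (N, K, n chosen so the standard deviation just exceeds three), not the thesis' own rules:

```python
from math import comb, erf, sqrt

def hypergeom_cdf(k, K, n, N):
    """Exact P(X <= k) for X ~ Hypergeometric(N, K, n)."""
    lo = max(0, n - (N - K))
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(lo, k + 1)) / comb(N, n)

def normal_cdf_approx(k, K, n, N):
    """Classical normal-curve approximation with continuity correction."""
    mean = n * K / N
    var = n * (K / N) * (1 - K / N) * (N - n) / (N - 1)
    z = (k + 0.5 - mean) / sqrt(var)
    return 0.5 * (1 + erf(z / sqrt(2)))

# N = 200 items, K = 80 successes, sample n = 60: the standard deviation
# is just above three, so the classical rule of thumb admits the approximation.
N, K, n = 200, 80, 60
sd = sqrt(n * (K / N) * (1 - K / N) * (N - n) / (N - 1))
max_diff = max(abs(hypergeom_cdf(k, K, n, N) - normal_cdf_approx(k, K, n, N))
               for k in range(n + 1))
print(sd, max_diff)  # sd > 3; the maximum absolute difference stays small
```

Scanning (N, K, n) triples this way is one route to locating where the 0.01 and 0.05 maximum-difference contours mentioned in the abstract fall.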
01 January 2011
The objective of this thesis is to examine one of the most fundamental and important methodologies in statistical practice: interval estimation of the probability of success in a binomial distribution. The textbook confidence interval for this problem is known as the Wald interval, as it comes from the Wald large-sample test for the binomial case. It is generally acknowledged that the actual coverage probability of this standard interval is poor for values of p near 0 or 1. Moreover, it has recently been documented that the coverage properties of the standard interval can be inconsistent even when p is not near the boundaries. For this reason, one would like to study a variety of methods for constructing confidence intervals for the unknown probability p in the binomial case. The present thesis accomplishes this task by presenting several methods for constructing confidence intervals for the unknown binomial probability p. It is well known that the hypergeometric distribution is related to the binomial distribution. In particular, if the size of the population, N, is large and the number of items of interest, k, is such that k/N tends to p as N grows, then the hypergeometric distribution can be approximated by the binomial distribution. Therefore, in this case, one can use the confidence intervals constructed for p in the binomial case as a basis for constructing confidence intervals for the unknown value k = pN. The goal of this thesis is to study this approximation and to point out several confidence intervals designed specifically for the hypergeometric distribution. In particular, this thesis considers several confidence intervals based on estimation of a binomial proportion, as well as Bayesian credible sets based on various priors.
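The poor coverage of the Wald interval near the boundary is easy to exhibit numerically. The sketch below computes exact coverage at a single (n, p) for the Wald interval and, for contrast, the Wilson score interval (one of the standard alternatives in this literature); the particular n and p are illustrative assumptions:

```python
from math import comb, sqrt

Z = 1.959963984540054  # ~97.5th percentile of the standard normal

def wald_interval(x, n, z=Z):
    """Textbook Wald interval: p-hat +/- z * sqrt(p-hat(1-p-hat)/n)."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_interval(x, n, z=Z):
    """Wilson score interval, obtained by inverting the score test."""
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def coverage(interval, n, p):
    """Exact coverage probability of a nominal 95% interval at this (n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1)
               if interval(x, n)[0] <= p <= interval(x, n)[1])

n, p = 30, 0.05
print(coverage(wald_interval, n, p))    # far below the nominal 0.95
print(coverage(wilson_interval, n, p))  # much closer to 0.95
```

The same coverage computation carries over to the hypergeometric setting by replacing the binomial probabilities with hypergeometric ones.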
Chenette, Nathan Lee
05 July 2012
Large-scale data management systems rely more and more on cloud storage, where the need for efficient search capabilities clashes with the need for data confidentiality. Encryption and efficient accessibility are naturally at odds, as for instance strong encryption necessitates that ciphertexts reveal nothing about underlying data. Searchable encryption is an active field in cryptography studying encryption schemes that provide varying levels of efficiency, functionality, and security, and efficient searchable encryption focuses on schemes enabling sub-linear (in the size of the database) search time. I present the first cryptographic study of efficient searchable symmetric encryption schemes supporting two types of search queries, range queries and error-tolerant queries. The natural solution to accommodate efficient range queries on ciphertexts is to use order-preserving encryption (OPE). I propose a security definition for OPE schemes, construct the first OPE scheme with provable security, and further analyze security by characterizing one-wayness of the scheme. Efficient error-tolerant queries are enabled by efficient fuzzy-searchable encryption (EFSE). For EFSE, I introduce relevant primitives, an optimal security definition and a (somewhat space-inefficient, but in a sense as efficient as possible) scheme achieving it, and more efficient schemes that achieve a weaker, but practical, security notion. In all cases, I introduce new appropriate security definitions, construct novel schemes, and prove those schemes secure under standard assumptions. The goal of this line of research is to provide constructions and provable security analysis that should help practitioners decide whether OPE or FSE provides a suitable efficiency-security-functionality tradeoff for a given application.
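The order-preserving property itself can be illustrated with a toy construction: a keyed PRNG samples a random strictly increasing map from the plaintext domain into a larger ciphertext range. This is only an illustration of the property that enables range queries, not the provably secure scheme of the thesis (which, fittingly for this collection, samples such a map lazily using the hypergeometric distribution):

```python
import random

def toy_ope_table(key, domain_size, range_size):
    """Toy order-preserving 'encryption': a keyed PRNG picks a random
    strictly increasing map {0..domain_size-1} -> {0..range_size-1}.
    Illustrative only -- offers none of the analyzed security guarantees."""
    rng = random.Random(key)                       # key seeds the PRNG
    cts = sorted(rng.sample(range(range_size), domain_size))
    return dict(enumerate(cts))                    # plaintext -> ciphertext

table = toy_ope_table(key=2024, domain_size=16, range_size=1000)
enc = [table[m] for m in range(16)]
print(enc)  # ciphertexts appear in the same order as the plaintexts
```

Because m1 < m2 implies Enc(m1) < Enc(m2), a server can answer a range query on ciphertexts with an ordinary index lookup, which is precisely the efficiency/leakage tradeoff the security definitions quantify.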
<p>In this thesis we study several inverse sampling procedures to test for homogeneity in a multivariate hypergeometric distribution. The procedures are finite population analogues of the procedures introduced in Panchapakesan et al. (1998) for the multinomial distribution. In order to develop some exact calculations for critical values not considered in Panchapakesan et al. we introduce some terminologies for target probabilities, transfer probabilities, potential target points, right intersection, and left union. Under the null and the alternative hypotheses, we give theorems to calculate the target and transfer probabilities; we then use these results to develop exact calculations for the critical values and powers of one of the procedures. We also propose a new approximate calculation. In order to speed up some of the calculations, we propose several fast algorithms for multiple summation.</p> <p>The computing results showed that the simulations agree closely with the exact results. For small population sizes the critical values and powers of the procedures are different from the corresponding multinomial procedures, but when N >= 1680000, all the results are the same as those in the multinomial distribution.</p> / Master of Science (MSc)
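An inverse sampling stopping rule of the kind studied here can be simulated directly: draw items one at a time without replacement from a finite population and stop as soon as some category's sample count reaches a threshold. This is a simulation sketch under assumed parameters, not the exact procedures of the thesis:

```python
import random

def inverse_sample(counts, threshold, rng):
    """Draw without replacement from a finite population (`counts` items per
    category) until one category's sample count reaches `threshold`.
    Returns the per-category sample counts and the stopping category."""
    pop = list(counts)
    seen = [0] * len(counts)
    while True:
        r = rng.randrange(sum(pop))        # pick one remaining item uniformly
        for i, c in enumerate(pop):
            if r < c:
                pop[i] -= 1
                seen[i] += 1
                if seen[i] == threshold:   # stopping rule triggered
                    return seen, i
                break
            r -= c

rng = random.Random(7)
sample, winner = inverse_sample([50, 50, 50], threshold=10, rng=rng)
print(sample, winner)  # exactly one cell has reached the threshold
```

Repeating such runs and comparing stopping statistics against the exact target/transfer-probability calculations is how the agreement reported in the abstract would be checked.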
Surovcik, Katharina
26 June 2008
A Biased Urn Model for Taxonomic Identification / Ein gewichtetes Urnenmodell zur taxonomischen Identifikation
No description available.
25 May 2010
In this paper, we first present the basic principles of set theory and combinatorial analysis, which are the most useful tools in computing probabilities. We then show some important properties derived from the axioms of probability. Conditional probabilities come into play not only when some partial information is available, but also as a tool to compute probabilities more easily even when partial information is unavailable. Next, the concept of a random variable and some of its related properties are introduced. For univariate random variables, we introduce the basic properties of some common discrete and continuous distributions. The important properties of jointly distributed random variables are also considered. Some inequalities, the law of large numbers, and the central limit theorem are discussed. Finally, we introduce an additional topic, the Poisson process.
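The two limit theorems named in this abstract are easy to demonstrate by simulation; the sample sizes and tolerances below are illustrative choices:

```python
import random
from math import erf, sqrt

rng = random.Random(0)

# Law of large numbers: the sample mean of fair-coin indicators -> 1/2.
n = 100_000
mean = sum(rng.random() < 0.5 for _ in range(n)) / n

# Central limit theorem: the standardized sum of m Uniform(0,1) draws
# is approximately standard normal, so P(Z <= 1) should be near Phi(1).
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

trials, m = 20_000, 50
hits = 0
for _ in range(trials):
    s = sum(rng.random() for _ in range(m))
    z = (s - m * 0.5) / sqrt(m / 12)   # Uniform(0,1): mean 1/2, variance 1/12
    if z <= 1.0:
        hits += 1

print(mean)            # close to 0.5
print(hits / trials)   # close to Phi(1) ~= 0.8413
```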