Multivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically
learning multivariate Gaussians under the constraint of approximate differential
privacy. These are the first finite sample upper bounds for general Gaussians
which do not impose restrictions on the parameters of the distribution. Our bounds
are near-optimal in the case when the covariance is known to be the identity, and
conjectured to be near-optimal in the general case. From a technical standpoint, we
provide analytic tools for arguing the existence of global "locally small" covers from
local covers of the space. These are exploited using modifications of recent techniques
for for differentially private hypothesis selection.
Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians
under the constraint of approximate differential privacy. We provide the first
sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned
(or even unbounded univariate) Gaussians. To prove our results, we design
a new technique for privately learning mixture distributions. A class of distributions
F is said to be list-decodable if there is an algorithm that, given "heavily corrupted"
samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then
we can privately learn mixtures of distributions in F. Finally, we show axis-aligned
Gaussian distributions are privately list-decodable, thereby proving mixtures of such
distributions are privately learnable. / Thesis / Master of Science (MSc) / Is it possible to estimate an unknown probability distribution given random samples from it? This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals and well-known solutions may inadvertently result in the leakage of private information.
In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis.
Identifer | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/26804 |
Date | January 2021 |
Creators | Aden-Ali, Ishaq |
Contributors | Ashtiani, Hassan, Computing and Software |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |
Page generated in 0.0022 seconds