11 |
Privacy Preserving Kin Genomic Data Publishing
Shang, Hui 16 July 2020 (has links)
No description available.
|
12 |
On the Sample Complexity of Privately Learning Gaussians and their Mixtures / Privately Learning Gaussians and their Mixtures
Aden-Ali, Ishaq January 2021 (has links)
Multivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically
learning multivariate Gaussians under the constraint of approximate differential
privacy. These are the first finite sample upper bounds for general Gaussians
which do not impose restrictions on the parameters of the distribution. Our bounds
are near-optimal in the case when the covariance is known to be the identity, and
conjectured to be near-optimal in the general case. From a technical standpoint, we
provide analytic tools for arguing the existence of global "locally small" covers from
local covers of the space. These are exploited using modifications of recent techniques
for differentially private hypothesis selection.
Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians
under the constraint of approximate differential privacy. We provide the first
sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned
(or even unbounded univariate) Gaussians. To prove our results, we design
a new technique for privately learning mixture distributions. A class of distributions
F is said to be list-decodable if there is an algorithm that, given "heavily corrupted"
samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then
we can privately learn mixtures of distributions in F. Finally, we show axis-aligned
Gaussian distributions are privately list-decodable, thereby proving mixtures of such
distributions are privately learnable. / Thesis / Master of Science (MSc) / Is it possible to estimate an unknown probability distribution given random samples from it? This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well-understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals, and well-known solutions may inadvertently result in the leakage of private information.
In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis.
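The mechanisms behind such guarantees can be illustrated with the standard Gaussian mechanism for releasing a clamped mean under (epsilon, delta)-differential privacy. This is a generic textbook sketch, not the thesis's estimator; the clamping range and all parameter names are illustrative.

```python
import numpy as np

def private_mean(samples, clamp_range, epsilon, delta, rng=None):
    """Release the mean of 1-D data under (epsilon, delta)-DP.

    Samples are clamped to [-R, R], so replacing one individual changes
    the empirical mean by at most 2R/n (the L2 sensitivity)."""
    rng = np.random.default_rng() if rng is None else rng
    R, n = clamp_range, len(samples)
    sensitivity = 2.0 * R / n
    # Standard Gaussian-mechanism calibration for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return np.clip(samples, -R, R).mean() + rng.normal(0.0, sigma)
```

With large n the added noise is far smaller than the sampling error, which is why finite sample complexity bounds are attainable at all; the difficulty the thesis addresses is achieving this without a priori bounds on the unknown parameters.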
|
13 |
DIFFERENTIAL PRIVACY IN DISTRIBUTED SETTINGS
Zitao Li (14135316) 18 November 2022 (has links)
<p>Data is considered the "new oil" of the information society and digital economy. While many commercial activities and government decisions are based on data, the public has raised growing concerns about privacy leakage when private data are collected and used. In this dissertation, we investigate the privacy risks in settings where the data are distributed across multiple data holders and there is only an untrusted central server. We provide solutions for several problems under this setting with a security notion called differential privacy (DP). Our solutions guarantee that there is only limited and controllable privacy leakage from the data holders, while the utility of the final results, such as model prediction accuracy, can still be comparable to that of the non-private algorithms.</p>
<p><br></p>
<p>First, we investigate the problem of estimating the distribution over a numerical domain while satisfying local differential privacy (LDP). Our protocol prevents privacy leakage in the data collection phase, in which an untrusted data aggregator (or a server) wants to learn the distribution of private numerical data among all users. The protocol consists of 1) a new reporting mechanism called the square wave (SW) mechanism, which randomizes the user inputs before sharing them with the aggregator; 2) an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions.</p>
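A minimal sketch of the two components follows, assuming a bucketized domain, the SW densities p and q with p = e^eps * q, and a simple three-tap smoothing kernel. The wave width b is left as a tunable parameter here, whereas the actual mechanism derives an optimal value for it.

```python
import numpy as np

def sw_report(x, eps, b, rng):
    """One Square Wave (SW) report for x in [0, 1]; output lies in [-b, 1+b].

    The output density is p on [x-b, x+b] and q = p / e^eps elsewhere,
    so values near the true input are e^eps times more likely."""
    q = 1.0 / (2.0 * b * np.exp(eps) + 1.0)
    p = np.exp(eps) * q
    if rng.random() < 2.0 * b * p:                 # mass of the close band
        return x + rng.uniform(-b, b)
    u = rng.uniform(0.0, 1.0)                      # far region has total length 1
    return (u - b) if u < x else (x + b + (u - x))

def ems_estimate(reports, eps, b, n_bins=32, iters=200):
    """Expectation Maximization with Smoothing (EMS) on aggregated reports."""
    q = 1.0 / (2.0 * b * np.exp(eps) + 1.0)
    p = np.exp(eps) * q
    in_centers = (np.arange(n_bins) + 0.5) / n_bins
    out_edges = np.linspace(-b, 1.0 + b, n_bins + 1)
    out_centers = (out_edges[:-1] + out_edges[1:]) / 2.0
    h, _ = np.histogram(reports, bins=out_edges)
    # M[j, i] ~ P(report lands in output bin j | true value at input center i)
    M = np.where(np.abs(out_centers[:, None] - in_centers[None, :]) <= b, p, q)
    M = M / M.sum(axis=0, keepdims=True)
    f = np.full(n_bins, 1.0 / n_bins)
    for _ in range(iters):
        joint = M * f                              # shape (out, in)
        cond = joint / joint.sum(axis=1, keepdims=True)
        f = (h[:, None] * cond).sum(axis=0) / h.sum()
        f = np.convolve(f, [0.25, 0.5, 0.25], mode="same")  # smoothing step
        f = f / f.sum()
    return f
```

sw_report is what each user runs locally on its private value; ems_estimate is what the untrusted aggregator runs on the collected reports to recover the distribution.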
<p><br></p>
<p>Second, we study the matrix factorization problem in three federated learning settings with an untrusted server, i.e., vertical, horizontal, and local federated learning settings. We propose a generic algorithmic framework for solving the problem in all three settings. We introduce how to adapt the algorithm into differentially private versions to prevent privacy leakage in the training and publishing stages.</p>
<p><br></p>
<p>Finally, we propose an algorithm for solving the k-means clustering problem in vertical federated learning (VFL). A big challenge in VFL is the lack of a global view of each data point. To overcome this challenge, we propose a lightweight and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin (FM) sketch to convey the weight information of the synopsis points. We provide theoretical utility analysis for the cardinality estimation algorithm and further refine it for better empirical performance.</p>
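The building block can be illustrated with a plain (non-private) Flajolet-Martin style sketch; the DP noise and the VFL-specific refinements of the dissertation are omitted, and the hash construction is illustrative.

```python
import hashlib

def _hash64(item, seed=0):
    """Deterministic 64-bit hash of an item (illustrative, via SHA-256)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class FMSketch:
    """Simplified Flajolet-Martin (PCSA) distinct-count sketch.

    Two sketches OR-merge into a sketch of the set union, so
    |A ∩ B| = |A| + |B| - |A ∪ B| follows by inclusion-exclusion."""

    PHI = 0.77351  # Flajolet-Martin bias-correction constant

    def __init__(self, n_buckets=64):
        self.m = n_buckets
        self.bitmaps = [0] * n_buckets

    def add(self, item):
        h = _hash64(item)
        bucket, rest = h % self.m, h // self.m
        # rank = index of the lowest set bit of the remaining hash bits
        rank = (rest & -rest).bit_length() - 1 if rest else 57
        self.bitmaps[bucket] |= 1 << rank

    def merge(self, other):
        merged = FMSketch(self.m)
        merged.bitmaps = [x | y for x, y in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        ranks = 0
        for bm in self.bitmaps:
            r = 0
            while bm & (1 << r):  # lowest unset bit of the bucket's bitmap
                r += 1
            ranks += r
        return self.m / self.PHI * 2.0 ** (ranks / self.m)
```

Because the sketches merge without access to raw identifiers, each party can sketch its own ID set and the server estimates only cardinalities, never membership.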
|
14 |
Secure and efficient query processing in outsourced databases
Bogatov, Dmytro 16 September 2022 (has links)
As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. Various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing for efficient querying. This thesis proposes a definition and components of a new secure and efficient outsourced database system that answers various types of queries with different privacy guarantees in different security models.
This work starts with the survey of five order-preserving and order-revealing encryption schemes that can be used directly in many database indices, such as the B+ tree, and five range query protocols with various tradeoffs in terms of security and efficiency. The survey systematizes the state-of-the-art range query solutions in a snapshot adversary setting and offers some non-obvious observations regarding the efficiency of the constructions.
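As a toy illustration of why such schemes plug into standard indices (a sketch, not one of the five surveyed constructions): if ciphertexts are cumulative sums of keyed pseudorandom gaps, plaintext order is preserved exactly, so a B+ tree can index the ciphertexts unmodified. The scheme below is linear-time in the domain and leaks order by design.

```python
import hashlib

def _gap(key, i):
    """Keyed pseudorandom gap in [1, 2^32] for domain point i (illustrative)."""
    digest = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") + 1

def ope_encrypt(key, x, domain_size):
    """Toy order-preserving encryption for integers in [0, domain_size):
    the ciphertext is the cumulative sum of keyed pseudorandom gaps,
    so x < y implies Enc(x) < Enc(y). Real schemes achieve the same
    interface without walking the whole domain."""
    assert 0 <= x < domain_size
    return sum(_gap(key, i) for i in range(x + 1))
```

Since comparisons on ciphertexts equal comparisons on plaintexts, range queries run on the encrypted index exactly as they would on plaintext keys.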
The thesis then proceeds with Epsolute - an efficient range query engine in a persistent adversary model. In Epsolute, security is achieved in a setting with a much stronger adversary who can continuously observe everything on the server, where leaking even the result size can enable a reconstruction attack. The work proposes a definition, construction, analysis, and experimental evaluation of a system that provably hides both the access pattern and the communication volume while remaining efficient.
The dissertation concludes with k-anon - a secure similarity search engine in a snapshot adversary model. The work presents a construction in which the security of kNN queries is achieved similarly to OPE / ORE solutions - encrypting the input with an approximate Distance Comparison Preserving Encryption scheme so that the inputs, the points in a hyperspace, are perturbed, but the query algorithm still produces accurate results. Analyzing the solution, we run a series of experiments to observe the tradeoff between search accuracy and attack effectiveness. We use TREC datasets and queries for the search, and track rank-quality metrics such as MRR and nDCG. For the attacks, we build an LSTM model that trains on the correlation between a sentence and its embedding and then predicts words from the embedding. We conclude on the viability and practicality of the solution.
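The perturbation idea can be sketched as follows; this illustrates the distance-comparison-preserving principle rather than the exact scheme evaluated in the thesis. A secret rotation plus noise of norm at most beta preserves every pairwise distance up to an additive 2*beta, so any comparison whose margin exceeds 2*beta is answered identically on ciphertexts.

```python
import numpy as np

def random_rotation(dim, rng):
    """Sample a random orthogonal matrix (the secret key) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))

def dcpe_encrypt(point, rotation, beta, rng):
    """Illustrative approximate distance-comparison-preserving encryption:
    rotate by the secret key and add noise of norm at most beta. Distances
    are preserved up to 2*beta, so comparisons with a larger margin survive."""
    noise = rng.normal(size=point.shape)
    noise *= rng.uniform(0.0, beta) / np.linalg.norm(noise)
    return rotation @ point + noise
```

The nearest-neighbor algorithm then runs unchanged on ciphertexts, at the cost of possible errors only on comparisons closer than the perturbation bound.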
|
15 |
Practical Privacy-Preserving Federated Learning with Secure Multi-Party Computation
Akhtar, Benjamin Asad 12 August 2024 (has links)
Master of Science / In a world with an ever greater need for machine learning and artificial intelligence, it has become increasingly important to offload computation-intensive tasks to companies with the compute resources to perform training on potentially sensitive data. In applications such as finance or healthcare, the data providers may need to train on large quantities of data but cannot reveal the data to outside parties for legal or other reasons. Originally, a decentralized training method known as Federated Learning (FL) was proposed to ensure data did not leave the client's device. This method was still susceptible to attacks, and further security was needed. Multi-Party Computation (MPC) was proposed in conjunction with FL, as it provides a way to compute securely with no leakage of data values. This was utilized in a framework called SAFEFL; however, it was extremely slow.
Reducing the computation overhead of this framework using the programming tools at our disposal turns it from an impractical design into a useful one. The design can now be used in industry with some overhead compared to non-MPC computing, but it has been greatly improved.
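The MPC primitive underlying such frameworks can be sketched with additive secret sharing over a prime field (illustrative only; SAFEFL's actual protocols are more involved): each client splits its value into random-looking shares, the servers add shares locally, and only the aggregate is ever reconstructed.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime field for the arithmetic shares

def share(value, n_parties, rng):
    """Split an integer into n additive shares: they sum to value mod PRIME,
    and any n-1 of them are uniformly random, revealing nothing."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def aggregate(client_values, n_parties, rng):
    """Secure aggregation: each party sums the shares it holds, so only
    the total over all clients is ever reconstructed."""
    per_party = [0] * n_parties
    for v in client_values:
        for i, s in enumerate(share(v, n_parties, rng)):
            per_party[i] = (per_party[i] + s) % PRIME
    return reconstruct(per_party)
```

In the FL setting the client values would be (quantized) model updates; the server learns the aggregated update but never an individual client's contribution.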
|
16 |
Enhancing Data Utilization through Advanced Differential Privacy Mechanisms / 有用性を向上させる高度な差分プライバシ機構
Takagi, Shun 25 March 2024 (has links)
京都大学 / 新制・課程博士 / 博士(情報学) / 甲第25428号 / 情博第866号 / 新制||情||145(附属図書館) / 京都大学大学院情報学研究科社会情報学専攻 / (主査)教授 伊藤 孝行, 教授 鹿島 久嗣, 教授 岡部 寿男, 吉川 正俊(京都大学 名誉教授) / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
17 |
Differentially Private Random Forests for Network Intrusion Detection in a Federated Learning Setting
Frid, Alexander January 2023 (has links)
För varje dag som går möter stora industrier en ökad mängd intrång i sina IT-system. De flesta befintliga verktyg som använder sig av maskininlärning är starkt beroende av stora mängder data, vilket innebär risker under dataöverföringen. Därför har syftet med denna studie varit att undersöka om en decentraliserad integritetsbevarande strategi kan vara ett bra alternativ för att minska effektiviteten av dessa attacker. Mer specifikt, skulle användningen av Random Forests, en av de mest populära algoritmerna för maskininlärning, kunna utökas med decentraliseringstekniken Federated Learning tillsammans med Differential Privacy, för att skapa en ideal metod för att upptäcka nätverksintrång? Med hjälp av befintliga kodbibliotek för maskininlärning och verklighetsbaserad data har detta projekt konstruerat olika modeller för att simulera hur väl olika decentraliserade och integritetsbevarande modeller kan jämföras med traditionella alternativ. De skapade modellerna innehåller antingen Federated Learning, Differential Privacy eller en kombination av båda. Huvuduppgiften för dessa modeller är att förbättra integriteten och samtidigt minimera minskningen av precision. Resultaten indikerar att båda teknikerna kommer med en liten minskning i noggrannhet jämfört med traditionella alternativ. Huruvida precisionsförlusten är acceptabel eller inte beror på det specifika användningsområdet. Det utvecklade kombinerade alternativet lyckades dock inte nå acceptabel precision, vilket hindrar oss från att dra några slutsatser. / With each passing day, large industries face an increasing amount of intrusions into their IT environments. Most existing machine learning countermeasures heavily rely on large amounts of data, which introduces risk during the data transmission. Therefore, the objective of this study has been to investigate whether a decentralized privacy-preserving approach could be a sensible alternative to decrease the effectiveness of these attacks.
More specifically, could the use of Random Forests, one of the most popular machine learning algorithms, be extended using the decentralization technique Federated Learning in cooperation with Differential Privacy, in order to create an ideal approach for network intrusion detection? With the assistance of existing machine learning code libraries and real-life data, this thesis has constructed various experimental models to simulate how well different decentralized and privacy-preserving approaches compare to traditional ones. The models created incorporate either Federated Learning, Differential Privacy, or a combination of both. The main task of these models is to enhance privacy while minimizing the decrease in accuracy. The results indicate that both techniques come with a small decrease in accuracy compared to traditional alternatives. Whether the accuracy loss is acceptable may depend on the specific scenario. The developed combined approach, however, failed to reach acceptable accuracy, which prevents us from drawing any conclusions.
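One common way to combine these ingredients, sketched here for illustration rather than as the thesis's exact construction: each client votes with its locally trained forest, and the server picks the winning class from Laplace-noised counts (report-noisy-max).

```python
import numpy as np

def private_leaf_label(class_counts, epsilon, rng):
    """Pick a majority label from Laplace-noised class counts. One record
    changes a single count by at most 1, so Laplace noise of scale
    1/epsilon makes the released label epsilon-differentially private
    (the classic report-noisy-max mechanism)."""
    counts = np.asarray(class_counts, dtype=float)
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    return int(np.argmax(noisy))

def federated_vote(client_votes, epsilon, rng):
    """Server-side combination: sum the per-class vote counts reported by
    the clients, then choose the label via the noisy mechanism above.
    A sketch of the combined FL + DP idea, not the thesis's pipeline."""
    totals = np.sum(np.asarray(client_votes, dtype=float), axis=0)
    return private_leaf_label(totals, epsilon, rng)
```

The accuracy cost discussed in the abstract shows up here directly: smaller epsilon means larger noise and a higher chance the noisy argmax flips away from the true majority.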
|
18 |
Cloud security mechanisms
January 2014 (has links)
Cloud computing has brought great benefits in cost and flexibility for provisioning services. The greatest challenge of cloud computing, however, remains the question of security. The current standard tools in access control mechanisms and cryptography can only partly solve the security challenges of cloud infrastructures. In recent years of research in security and cryptography, novel mechanisms, protocols and algorithms have emerged that offer new ways to create secure services atop cloud infrastructures. This report provides introductions to a selection of security mechanisms that were part of the "Cloud Security Mechanisms" seminar in summer term 2013 at HPI. / Cloud Computing hat deutliche Kostenersparnisse und verbesserte Flexibilität bei der Bereitstellung von Computer-Diensten ermöglicht. Allerdings bleiben Sicherheitsbedenken die größte Herausforderung bei der Nutzung von Cloud-Diensten. Die etablierten Mechanismen für Zugriffskontrolle und Verschlüsselungstechnik können die Herausforderungen und Probleme der Sicherheit von Cloud-Infrastrukturen nur teilweise lösen. In den letzten Jahren hat die Forschung jedoch neue Mechanismen, Protokolle und Algorithmen hervorgebracht, welche neue Möglichkeiten eröffnen die Sicherheit von Cloud-Anwendungen zu erhöhen. Dieser technische Bericht bietet Einführungen zu einigen dieser Mechanismen, welche im Seminar "Cloud Security Mechanisms" im Sommersemester 2013 am HPI behandelt wurden.
|
19 |
GENERAL-PURPOSE STATISTICAL INFERENCE WITH DIFFERENTIAL PRIVACY GUARANTEES
Zhanyu Wang (13893375) 06 December 2023 (has links)
<p dir="ltr">Differential privacy (DP) uses a probabilistic framework to measure the level of privacy protection of a mechanism that releases data analysis results to the public. Although DP is widely used by both government and industry, there is still a lack of research on statistical inference under DP guarantees. On the one hand, existing DP mechanisms mainly aim to extract dataset-level information instead of population-level information. On the other hand, DP mechanisms introduce calibrated noise into the released statistics, which often results in sampling distributions more complex and intractable than the non-private ones. This dissertation aims to provide general-purpose methods for statistical inference, such as confidence intervals (CIs) and hypothesis tests (HTs), that satisfy the DP guarantees. </p><p dir="ltr">In the first part of the dissertation, we examine a DP bootstrap procedure that releases multiple private bootstrap estimates to construct DP CIs. We present new DP guarantees for this procedure and propose to use deconvolution with DP bootstrap estimates to derive CIs for inference tasks such as population mean, logistic regression, and quantile regression. Our method achieves the nominal coverage level in both simulations and real-world experiments and offers the first approach to private inference for quantile regression.</p><p dir="ltr">In the second part of the dissertation, we propose to use the simulation-based "repro sample" approach to produce CIs and HTs based on DP statistics. Our methodology has finite-sample guarantees and can be applied to a wide variety of private inference problems. It appropriately accounts for biases introduced by DP mechanisms (such as by clamping) and improves over other state-of-the-art inference methods in terms of the coverage and type I error of the private inference. 
</p><p dir="ltr">In the third part of the dissertation, we design a debiased parametric bootstrap framework for DP statistical inference. We propose the adaptive indirect estimator, a novel simulation-based estimator that is consistent and corrects the clamping bias in the DP mechanisms. We also prove that our estimator has the optimal asymptotic variance among all well-behaved consistent estimators, and the parametric bootstrap results based on our estimator are consistent. Simulation studies show that our framework produces valid DP CIs and HTs in finite sample settings, and it is more efficient than other state-of-the-art methods.</p>
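A naive version of the parametric-bootstrap idea can be sketched as follows. It simulates the entire private pipeline (sampling, clamping, privacy noise) around the released estimate, but unlike the dissertation's debiased framework it makes no correction for clamping bias; all parameter names, and the assumption of a known noise scale sigma_guess, are illustrative.

```python
import numpy as np

def dp_mean(x, lo, hi, epsilon, rng):
    """Release a clamped mean with Laplace noise (sensitivity (hi-lo)/n)."""
    n = len(x)
    return np.clip(x, lo, hi).mean() + rng.laplace(0.0, (hi - lo) / (n * epsilon))

def parametric_bootstrap_ci(dp_est, n, lo, hi, epsilon, sigma_guess,
                            level=0.95, B=500, rng=None):
    """Naive parametric-bootstrap CI for the mean: re-run the whole private
    pipeline on synthetic datasets drawn around the DP estimate, then
    recenter the resulting percentile interval at the released value."""
    rng = np.random.default_rng() if rng is None else rng
    reps = []
    for _ in range(B):
        fake = rng.normal(dp_est, sigma_guess, size=n)   # simulated sampling
        reps.append(dp_mean(fake, lo, hi, epsilon, rng)) # simulated release
    alpha = (1.0 - level) / 2.0
    lo_q, hi_q = np.quantile(reps, [alpha, 1.0 - alpha])
    return 2.0 * dp_est - hi_q, 2.0 * dp_est - lo_q
```

The interval automatically widens to account for both the sampling noise and the privacy noise, since both are present in every simulated replicate.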
|
20 |
Privacy-Enhancing Techniques for Data Analytics
Fang-Yu Rao (6565679) 10 June 2019 (has links)
<div>
<div>
<div>
<p>Organizations today collect and aggregate huge amounts of data from individuals
under various scenarios and for different purposes. Such aggregation of individuals’
data when combined with techniques of data analytics allows organizations to make
informed decisions and predictions. But in many situations, different portions of the
data associated with individuals are collected and curated by different organizations.
To derive more accurate conclusions and predictions, those organizations may want to
conduct the analysis based on their joint data, which cannot be simply accomplished
by each organization exchanging its own data with other organizations due to the
sensitive nature of data. Developing approaches for collaborative privacy-preserving
data analytics, however, is a nontrivial task. At least two major challenges have to be
addressed. The first challenge is that the data possessed by each organization should always be properly protected during and after the collaborative analysis
process, whereas the second challenge is the high computational complexity that usually
accompanies the cryptographic primitives used to build such privacy-preserving protocols.
</p><p><br></p><p>
</p><div>
<div>
<div>
<p>In this dissertation, based on widely adopted primitives in cryptography, we address the aforementioned challenges by developing techniques for data analytics that
not only allow multiple mutually distrustful parties to perform data analysis on their
joint data in a privacy-preserving manner, but also reduce the time required to complete the analysis. More specifically, using three common data analytics tasks as
concrete examples, we show how to construct the respective privacy-preserving protocols under two different scenarios: (1) the protocols are executed by a collaborative process only involving the participating parties; (2) the protocols are outsourced to
some service providers in the cloud. Two types of optimization for improving the
efficiency of those protocols are also investigated. The first type allows each participating party access to a statistically controlled leakage so as to reduce the amount
of required computation, while the second type utilizes the parallelism that could
be incorporated into the task and pushes some computation to the offline phase to
reduce the time needed for each participating party without any additional leakage.
Extensive experiments are also conducted on real-world datasets to demonstrate the
effectiveness of our proposed techniques.<br></p>
<p> </p>
</div>
</div>
</div>
</div>
</div>
</div>
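The offline/online split mentioned above can be illustrated with Beaver-triple multiplication, a standard MPC technique (a sketch under a trusted-dealer assumption, not the dissertation's full protocols): random triples c = a*b are secret-shared in the offline phase, and each online multiplication then needs only cheap openings of masked values.

```python
import random

P = 2**61 - 1  # prime field for the arithmetic shares

def share2(value, rng):
    """Two-party additive sharing: the two shares sum to value mod P."""
    s0 = rng.randrange(P)
    return (s0, (value - s0) % P)

def beaver_triple(rng):
    """Offline phase (trusted dealer, for illustration): share a random
    multiplication triple c = a * b."""
    a, b = rng.randrange(P), rng.randrange(P)
    return share2(a, rng), share2(b, rng), share2(a * b % P, rng)

def beaver_multiply(x_sh, y_sh, triple):
    """Online phase: shares of x*y from shares of x, y and one triple.
    Only d = x - a and e = y - b are opened; both are uniformly random,
    so the openings leak nothing about x or y."""
    a_sh, b_sh, c_sh = triple
    d = (x_sh[0] - a_sh[0] + x_sh[1] - a_sh[1]) % P  # opened value x - a
    e = (y_sh[0] - b_sh[0] + y_sh[1] - b_sh[1]) % P  # opened value y - b
    z = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(2)]
    z[0] = (z[0] + d * e) % P  # the public d*e term is added exactly once
    return z
```

Because triples are input-independent, all the expensive randomness generation moves to the offline phase, which is exactly the kind of precomputation the dissertation exploits to reduce each party's online time without additional leakage.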
|