Global ETD Search

11	Privacy Preserving Kin Genomic Data Publishing Shang, Hui 16 July 2020 (has links) No description available. Computer Science Kin genomic data Differential privacy Factor graph
12	On the Sample Complexity of Privately Learning Gaussians and their Mixtures / Privately Learning Gaussians and their Mixtures Aden-Ali, Ishaq January 2021 (has links) Multivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be near-optimal in the general case. From a technical standpoint, we provide analytic tools for arguing the existence of global "locally small" covers from local covers of the space. These are exploited using modifications of recent techniques for for differentially private hypothesis selection. Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We provide the first sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then we can privately learn mixtures of distributions in F. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable. / Thesis / Master of Science (MSc) / Is it possible to estimate an unknown probability distribution given random samples from it? This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals and well-known solutions may inadvertently result in the leakage of private information. In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis.
13	DIFFERENTIAL PRIVACY IN DISTRIBUTED SETTINGS Zitao Li (14135316) 18 November 2022 (has links) <p>Data is considered the "new oil" in the information society and digital economy. While many commercial activities and government decisions are based on data, the public raises more concerns about privacy leakage when their private data are collected and used. In this dissertation, we investigate the privacy risks in settings where the data are distributed across multiple data holders, and there is only an untrusted central server. We provide solutions for several problems under this setting with a security notion called differential privacy (DP). Our solutions can guarantee that there is only limited and controllable privacy leakage from the data holder, while the utility of the final results, such as model prediction accuracy, can be still comparable to the ones of the non-private algorithms.</p> <p><br></p> <p>First, we investigate the problem of estimating the distribution over a numerical domain while satisfying local differential privacy (LDP). Our protocol prevents privacy leakage in the data collection phase, in which an untrusted data aggregator (or a server) wants to learn the distribution of private numerical data among all users. The protocol consists of 1) a new reporting mechanism called the square wave (SW) mechanism, which randomizes the user inputs before sharing them with the aggregator; 2) an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions.</p> <p><br></p> <p>Second, we study the matrix factorization problem in three federated learning settings with an untrusted server, i.e., vertical, horizontal, and local federated learning settings. We propose a generic algorithmic framework for solving the problem in all three settings. We introduce how to adapt the algorithm into differentially private versions to prevent privacy leakage in the training and publishing stages.</p> <p><br></p> <p>Finally, we propose an algorithm for solving the k-means clustering problem in vertical federated learning (VFL). A big challenge in VFL is the lack of a global view of each data point. To overcome this challenge, we propose a lightweight and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin (FM) sketch to convey the weight information of the synopsis points. We provide theoretical utility analysis for the cardinality estimation algorithm and further refine it for better empirical performance.</p> Data and information privacy Differential Privacy federated learning applications
14	Secure and efficient query processing in outsourced databases Bogatov, Dmytro 16 September 2022 (has links) As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. Various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing for efficient querying. This thesis proposes a definition and components of a new secure and efficient outsourced database system, which answers various types of queries, with different privacy guarantees in different security models. This work starts with the survey of five order-preserving and order-revealing encryption schemes that can be used directly in many database indices, such as the B+ tree, and five range query protocols with various tradeoffs in terms of security and efficiency. The survey systematizes the state-of-the-art range query solutions in a snapshot adversary setting and offers some non-obvious observations regarding the efficiency of the constructions. The thesis then proceeds with Epsolute - an efficient range query engine in a persistent adversary model. In Epsolute, security is achieved in a setting with a much stronger adversary where she can continuously observe everything on the server, and leaking even the result size can enable a reconstruction attack. Epsolute proposes a definition, construction, analysis, and experimental evaluation of a system that provably hides both access pattern and communication volume while remaining efficient. The dissertation concludes with k-anon - a secure similarity search engine in a snapshot adversary model. The work presents a construction in which the security of kNN queries is achieved similarly to OPE / ORE solutions - encrypting the input with an approximate Distance Comparison Preserving Encryption scheme so that the inputs, the points in a hyperspace, are perturbed, but the query algorithm still produces accurate results. Analyzing the solution, we run a series of experiments to observe the tradeoff between search accuracy and attack effectiveness. We use TREC datasets and queries for the search, and track the rank quality metrics such as MRR and nDCG. For the attacks, we build an LSTM model that trains on the correlation between a sentence and its embedding and then predicts words from the embedding. We conclude on viability and practicality of the solution. Computer science Differential privacy ORAM ORE Secure range queries
15	Differentially Private Random Forests for Network Intrusion Detection in a Federated Learning Setting Frid, Alexander January 2023 (has links) För varje dag som går möter stora industrier en ökad mängd intrång i sina IT-system. De flesta befintliga verktyg som använder sig utav maskininlärning är starkt beroende av stora mängder data, vilket innebär risker under dataöverföringen. Därför har syftet med denna studie varit att undersöka om en decentraliserad integritetsbevarande strategi kan vara ett bra alternativ för att minska effektiviteten av dessa attacker. Mer specifikt skulle användningen av Random Forests, en av de mest populära algoritmerna för maskininlärning, kunna utökas med decentraliseringstekniken Federated Learning tisammans med Differential Privacy, för att skapa en ideal metod för att upptäcka nätverksintrång? Med hjälp av befintliga kodbibliotek för maskininlärnings och verklighetsbaserad data har detta projekt konstruerat olika modeller för att simulera hur väl olika decentraliserade och integritetsbevarande modeller kan jämföras med traditionella alternativ. De skapade modellerna innehåller antingen Federated Learning, Differential Privacy eller en kombination av båda. Huvuduppgiften för dessa modeller är att förbättra integriteten och samtidigt minimera minskningen av precision. Resultaten indikerar att båda teknikerna kommer med en liten minskning i noggrannhet jämfört med traditionella alternativ. Huruvida precisionsförlusten är acceptabel eller beror på det specifika användningsområdet. Det utvecklade kombinerade alternativet lyckades dock inte nå acceptabel precision vilket hindrar oss från att dra några slutsatser. / With each passing day, large industries face an increasing amount of intrusions into their IT environments. Most existing machine learning countermeasures heavily rely on large amounts of data which introduces risk during the data-transmission. Therefore, the objective of this study has been to investigate whether a decentralized privacy-preserving approach could be a sensible alternative to decrease the effectiveness of these attacks. More specifically could the use of Random Forests, one of the most popular machine learning algorithms, be extended using the decentralization technique Federated Learning in cooperation with Differential Privacy, in order to create an ideal approach for network intrusion detection? With the assistance of existing machine learning code-libraries and real-life data, this thesis has constructed various experimental models to simulates how well different decentralized and privacy-preserving approaches compare to traditional ones. The models created incorporate either Federated Learning, Differential Privacy or a combination of both. The main task of these models is to enhance privacy while minimizing the decrease in accuracy. The results indicate that both techniques comes with a small decrease in accuracy compared to traditional alternatives. whether the accuracy loss is acceptable or not may depend on the specific scenario. The developed combined approach however, failed to reach acceptable accuracy which prevents us from drawing any conclusions. Machine Learning Random Forest Federated Learning Differential Privacy Maskininlärning Random Forest Federated Learning Differential Privacy Software Engineering Programvaruteknik
16	Cloud security mechanisms January 2014 (has links) Cloud computing has brought great benefits in cost and flexibility for provisioning services. The greatest challenge of cloud computing remains however the question of security. The current standard tools in access control mechanisms and cryptography can only partly solve the security challenges of cloud infrastructures. In the recent years of research in security and cryptography, novel mechanisms, protocols and algorithms have emerged that offer new ways to create secure services atop cloud infrastructures. This report provides introductions to a selection of security mechanisms that were part of the "Cloud Security Mechanisms" seminar in summer term 2013 at HPI. / Cloud Computing hat deutliche Kostenersparnisse und verbesserte Flexibilität bei der Bereitstellung von Computer-Diensten ermöglicht. Allerdings bleiben Sicherheitsbedenken die größte Herausforderung bei der Nutzung von Cloud-Diensten. Die etablierten Mechanismen für Zugriffskontrolle und Verschlüsselungstechnik können die Herausforderungen und Probleme der Sicherheit von Cloud-Infrastrukturen nur teilweise lösen. In den letzten Jahren hat die Forschung jedoch neue Mechanismen, Protokolle und Algorithmen hervorgebracht, welche neue Möglichkeiten eröffnen die Sicherheit von Cloud-Anwendungen zu erhöhen. Dieser technische Bericht bietet Einführungen zu einigen dieser Mechanismen, welche im Seminar "Cloud Security Mechanisms" im Sommersemester 2013 am HPI behandelt wurden. Cloud Sicherheit Privacy Bitcoin Homomorphe Verschlüsselung Differential Privacy cloud security privacy confidentiality threshold cryptography bitcoin homomorphic encryption differential privacy Data processing Computer science
17	GENERAL-PURPOSE STATISTICAL INFERENCE WITH DIFFERENTIAL PRIVACY GUARANTEES Zhanyu Wang (13893375) 06 December 2023 (has links) <p dir="ltr">Differential privacy (DP) uses a probabilistic framework to measure the level of privacy protection of a mechanism that releases data analysis results to the public. Although DP is widely used by both government and industry, there is still a lack of research on statistical inference under DP guarantees. On the one hand, existing DP mechanisms mainly aim to extract dataset-level information instead of population-level information. On the other hand, DP mechanisms introduce calibrated noises into the released statistics, which often results in sampling distributions more complex and intractable than the non-private ones. This dissertation aims to provide general-purpose methods for statistical inference, such as confidence intervals (CIs) and hypothesis tests (HTs), that satisfy the DP guarantees. </p><p dir="ltr">In the first part of the dissertation, we examine a DP bootstrap procedure that releases multiple private bootstrap estimates to construct DP CIs. We present new DP guarantees for this procedure and propose to use deconvolution with DP bootstrap estimates to derive CIs for inference tasks such as population mean, logistic regression, and quantile regression. Our method achieves the nominal coverage level in both simulations and real-world experiments and offers the first approach to private inference for quantile regression.</p><p dir="ltr">In the second part of the dissertation, we propose to use the simulation-based ``repro sample'' approach to produce CIs and HTs based on DP statistics. Our methodology has finite-sample guarantees and can be applied to a wide variety of private inference problems. It appropriately accounts for biases introduced by DP mechanisms (such as by clamping) and improves over other state-of-the-art inference methods in terms of the coverage and type I error of the private inference. </p><p dir="ltr">In the third part of the dissertation, we design a debiased parametric bootstrap framework for DP statistical inference. We propose the adaptive indirect estimator, a novel simulation-based estimator that is consistent and corrects the clamping bias in the DP mechanisms. We also prove that our estimator has the optimal asymptotic variance among all well-behaved consistent estimators, and the parametric bootstrap results based on our estimator are consistent. Simulation studies show that our framework produces valid DP CIs and HTs in finite sample settings, and it is more efficient than other state-of-the-art methods.</p> Data and information privacy Applied statistics Statistical theory differential privacy confidence intervals hypothesis tests simulation-based inference asymptotic statistics Gaussian differential privacy resampling distribution-free inference indirect inference
18	Privacy-Enhancing Techniques for Data Analytics Fang-Yu Rao (6565679) 10 June 2019 (has links) <div> <div> <div> <p>Organizations today collect and aggregate huge amounts of data from individuals under various scenarios and for different purposes. Such aggregation of individuals’ data when combined with techniques of data analytics allows organizations to make informed decisions and predictions. But in many situations, different portions of the data associated with individuals are collected and curated by different organizations. To derive more accurate conclusions and predictions, those organization may want to conduct the analysis based on their joint data, which cannot be simply accomplished by each organization exchanging its own data with other organizations due to the sensitive nature of data. Developing approaches for collaborative privacy-preserving data analytics, however, is a nontrivial task. At least two major challenges have to be addressed. The first challenge is that the security of the data possessed by each organization should always be properly protected during and after the collaborative analysis process, whereas the second challenge is the high computational complexity usually accompanied by cryptographic primitives used to build such privacy-preserving protocols. </p><p><br></p><p> </p><div> <div> <div> <p>In this dissertation, based on widely adopted primitives in cryptography, we address the aforementioned challenges by developing techniques for data analytics that not only allow multiple mutually distrustful parties to perform data analysis on their joint data in a privacy-preserving manner, but also reduce the time required to complete the analysis. More specifically, using three common data analytics tasks as concrete examples, we show how to construct the respective privacy-preserving protocols under two different scenarios: (1) the protocols are executed by a collaborative process only involving the participating parties; (2) the protocols are outsourced to some service providers in the cloud. Two types of optimization for improving the efficiency of those protocols are also investigated. The first type allows each participating party access to a statistically controlled leakage so as to reduce the amount of required computation, while the second type utilizes the parallelism that could be incorporated into the task and pushes some computation to the offline phase to reduce the time needed for each participating party without any additional leakage. Extensive experiments are also conducted on real-world datasets to demonstrate the effectiveness of our proposed techniques.<br></p> <p> </p> </div> </div> </div> </div> </div> </div> Computer System Security Data Encryption Differential Privacy Secure Multiparty Computation Record Linkage K-Means Clustering
19	Privacy Preserving in Online Social Network Data Sharing and Publication Tianchong Gao (7428566) 17 October 2019 (has links) <p>Following the trend of online data sharing and publishing, researchers raise their concerns about the privacy problem. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. </p> <p><br></p> <p>To defenders, preserving the private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original one, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to the third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation draws on the very fundamental problem, the definition of utility and privacy. It draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. </p> <p><br></p> <p>To attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers’ prior knowledge and published data to catch the targets. This dissertation focuses on the problems that the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of seed mapping method and the innovation of generating-based attack method. To conclude, this dissertation studies the online data privacy problem from both defenders’ and attackers’ point of view and introduces privacy and utility enhancement mechanisms in different novel angles.</p> Computer Engineering privacy protection online social networks differential privacy anonymization de-anonymization
20	Protecting sensitive information from untrusted code Roy, Indrajit 13 December 2010 (has links) As computer systems support more aspects of modern life, from finance to health care, security is becoming increasingly important. However, building secure systems remains a challenge. Software continues to have security vulnerabilities due to reasons ranging from programmer errors to inadequate programming tools. Because of these vulnerabilities we need mechanisms that protect sensitive data even when the software is untrusted. This dissertation shows that secure and practical frameworks can be built for protecting users' data from untrusted applications in both desktop and cloud computing environment. Laminar is a new framework that secures desktop applications by enforcing policies written as information flow rules. Information flow control, a form of mandatory access control, enables programmers to write powerful, end-to-end security guarantees while reducing the amount of trusted code. Current programming abstractions and implementations of this model either compromise end-to-end security guarantees or require substantial modifications to applications, thus deterring adoption. Laminar addresses these shortcomings by exporting a single set of abstractions to control information flows through operating system resources and heap-allocated objects. Programmers express security policies by labeling data and represent access restrictions on code using a new abstraction called a security region. The Laminar programming model eases incremental deployment, limits dynamic security checks, and supports multithreaded programs that can access heterogeneously labeled data. In large scale, distributed computations safeguarding information requires solutions beyond mandatory access control. An important challenge is to ensure that the computation, including its output, does not leak sensitive information about the inputs. For untrusted code, access control cannot guarantee that the output does not leak information. This dissertation proposes Airavat, a MapReduce-based system which augments mandatory access control with differential privacy to guarantee security and privacy for distributed computations. Data providers control the security policy for their sensitive data, including a mathematical bound on potential privacy violations. Users without security expertise can perform computations on the data; Airavat prevents information leakage beyond the data provider's policy. Our prototype implementation of Airavat demonstrates that several data mining tasks can be performed in a privacy preserving fashion with modest performance overheads. / text Security Operating systems Java virtual machine Cloud computing Information flow control Access control Differential privacy

Search results