21

Programmation sûre en précision finie : Contrôler les erreurs et les fuites d'informations / Safe programming in finite precision: Controlling errors and information leaks

Gazeau, Ivan 14 October 2013 (has links) (PDF)
In this thesis, we analyze the problems arising from the finite representation of real numbers and we control the deviation induced by this approximation. We focus on two problems in particular. The first is the study of the influence of finite representation on differential privacy protocols. We present a method for studying the perturbations of a probability distribution caused by the finite representation of numbers. We show that a direct implementation of the theoretical protocols guaranteeing differential privacy is not reliable, whereas, once corrective measures are added, the property is preserved in finite precision with only a small loss of privacy. Our second contribution is a method for analyzing programs that cannot be analyzed compositionally because of conditional branches with overly erratic behavior. This method, based on rewriting-system theory, makes it possible to start from the proof of the exact-precision algorithm and derive a proof that the finite-precision program will not deviate too much.
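To make the finite-precision pitfall concrete, here is a small illustrative sketch (not the construction from the thesis): a textbook inverse-CDF Laplace mechanism sampled directly in double precision, next to a clamped-and-rounded variant in the spirit of the corrective measures mentioned above. The clamping bound and output granularity are illustrative assumptions.

```python
import math
import random

def naive_laplace(true_value, epsilon, sensitivity=1.0):
    """Textbook inverse-CDF Laplace mechanism, evaluated directly in
    double precision. The set of floating-point outputs it can produce
    depends on true_value, which is the kind of artifact the abstract
    above warns about."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5                  # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

def snapped_laplace(true_value, epsilon, sensitivity=1.0,
                    clamp=1000.0, granularity=2.0 ** -20):
    """Illustrative correction: clamp the input and snap the noisy output
    to a fixed grid, so the released value no longer exposes low-order
    bits correlated with the secret input. The clamp bound and grid
    granularity here are assumptions for the sketch."""
    x = max(-clamp, min(clamp, true_value))
    y = naive_laplace(x, epsilon, sensitivity)
    y = granularity * round(y / granularity)   # snap to the fixed grid
    return max(-clamp, min(clamp, y))

if __name__ == "__main__":
    print(naive_laplace(42.0, epsilon=0.5))
    print(snapped_laplace(42.0, epsilon=0.5))
```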
22

Klassificeringsalgoritmer vs differential privacy : Effekt på klassificeringsalgoritmer vid användande av numerisk differential privacy / Classification algorithms vs differential privacy : Effect of using numerical differential privacy on classification algorithms

Olsson, Mattias January 2018 (has links)
Data mining is a collective term for a number of techniques used to analyze data sets and find patterns, for example through classification. Anonymization comprises a range of techniques for protecting personal privacy. This study examines how strongly anonymization with differential privacy affects the ability to classify a data set. Through an experiment, several magnitudes of anonymization are examined, along with their effect on the ability to classify the data. Classification of the anonymized data set is compared against classification of the raw data set. Similar studies have been carried out with k-anonymity as the anonymization technique, where the ability to classify improved thanks to generalization. The result of this study, on the other hand, shows that the ability to classify drops somewhat, because differential privacy spreads the information in the data set over a wider spectrum. This generally makes it harder for the classification algorithms to find characteristic patterns in the data set, and they therefore do not reach the same level of classification accuracy.
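A rough sketch of the kind of experiment described above (the synthetic data set, noise calibration, and decision-tree classifier are assumptions for illustration, not the study's actual setup): numeric features are perturbed with Laplace noise at several privacy budgets, and classification accuracy on the noisy data is compared with accuracy on the raw data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def laplace_perturb(X, epsilon):
    """Add Laplace noise per feature, with scale = feature range / epsilon
    (a simple per-attribute sensitivity assumption for this sketch)."""
    sensitivity = X.max(axis=0) - X.min(axis=0)
    return X + rng.laplace(0.0, sensitivity / epsilon, size=X.shape)

baseline = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("raw data accuracy:", accuracy_score(y_te, baseline.predict(X_te)))

for epsilon in (0.1, 1.0, 10.0):               # smaller epsilon = more noise
    noisy = DecisionTreeClassifier(random_state=0).fit(
        laplace_perturb(X_tr, epsilon), y_tr)
    print(f"epsilon={epsilon}: accuracy:",
          accuracy_score(y_te, noisy.predict(X_te)))
```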
23

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation

Wang, Yu-Xiang 01 December 2017 (has links)
Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services, and data science problems across fields. Within the field of machine learning itself, there have been quite a number of paradigm shifts caused by the explosion of data size, computing power, modeling tools, and the new ways people collect, share, and make use of data sets. Data privacy, for instance, was much less of a problem before the availability of personal information online that could be used to identify users in anonymized data sets. Images, videos, and observations generated over social networks often have highly localized structures that cannot be captured by standard nonparametric models. Moreover, the "common task framework" adopted by many sub-disciplines of AI has made it possible for many people to collaboratively and repeatedly work on the same data set, leading to implicit overfitting on public benchmarks. In addition, data collected in many internet services, e.g., web search and targeted ads, are not i.i.d., but rather feedback specific to the deployed algorithm. This thesis presents technical contributions under a number of new mathematical frameworks designed to partially address these new paradigms.
• Firstly, we consider the problem of statistical learning with privacy constraints. Under Vapnik's general learning setting and the formalism of differential privacy (DP), we establish simple conditions that characterize private learnability, revealing a mixture of positive and negative insights. We then identify generic methods that reuse existing randomness to effectively solve private learning in practice, and discuss weaker notions of privacy that allow for a more favorable privacy-utility tradeoff.
• Secondly, we develop several generalizations of trend filtering, a locally adaptive nonparametric regression technique that is minimax optimal in 1D, to the multivariate setting and to graphs. We also study specific instances of the problem more closely, e.g., total variation denoising on d-dimensional grids, and the results reveal interesting statistical-computational trade-offs.
• Thirdly, we investigate two problems in sequential interactive learning: (a) off-policy evaluation in contextual bandits, which aims to use data collected from one algorithm to evaluate the performance of a different algorithm; (b) the problem of adaptive data analysis, which uses randomization to prevent adversarial data analysts from a form of "p-hacking" through multiple steps of sequential data access.
In the above problems, we provide not only performance guarantees for the algorithms but also certain notions of optimality. Wherever applicable, careful empirical studies on synthetic and real data are also included.
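To illustrate the off-policy evaluation problem from the third bullet (a generic textbook estimator, not the method developed in the thesis), the inverse-propensity-scoring (IPS) estimate reweights logged rewards by the ratio of target-policy to logging-policy action probabilities:

```python
import numpy as np

def ips_estimate(rewards, logged_probs, target_probs):
    """Inverse-propensity-scoring estimate of the target policy's value.

    rewards[i]      : reward observed for the logged action on context i
    logged_probs[i] : probability the logging policy gave that action
    target_probs[i] : probability the target policy would give that action
    """
    weights = target_probs / logged_probs      # importance weights
    return float(np.mean(weights * rewards))

# Toy example with hypothetical logged bandit data.
rewards      = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
logged_probs = np.array([0.5, 0.5, 0.25, 0.25, 0.5])
target_probs = np.array([0.9, 0.1, 0.8, 0.8, 0.1])
print(ips_estimate(rewards, logged_probs, target_probs))
```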
24

Quantifying Information Leakage via Adversarial Loss Functions: Theory and Practice

January 2020 (has links)
Modern digital applications have significantly increased the leakage of private and sensitive personal data. While worst-case measures of leakage such as Differential Privacy (DP) provide the strongest guarantees, when utility matters, average-case information-theoretic measures can be more relevant. However, most such information-theoretic measures do not have clear operational meanings. This dissertation addresses this challenge. This work introduces a tunable leakage measure called maximal $\alpha$-leakage, which quantifies the maximal gain of an adversary in inferring any function of a data set. The inferential capability of the adversary is modeled by a class of loss functions, namely, $\alpha$-loss. The choice of $\alpha$ determines specific adversarial actions, ranging from refining a belief for $\alpha = 1$ to guessing the best posterior for $\alpha = \infty$; for these two values maximal $\alpha$-leakage simplifies to mutual information and maximal leakage, respectively. Maximal $\alpha$-leakage is proved to have a composition property and to be robust to side information. There is a fundamental disconnect between theoretical measures of information leakage and their application in practice. This issue is addressed in the second part of this dissertation by proposing a data-driven framework for learning Censored and Fair Universal Representations (CFUR) of data. The framework is formulated as a constrained minimax optimization of the expected $\alpha$-loss, where the constraint ensures a measure of the usefulness of the representation. The performance of the CFUR framework with $\alpha=1$ is evaluated on publicly accessible data sets; it is shown that multiple sensitive features can be effectively censored to achieve group fairness via demographic parity while ensuring accuracy for several a priori unknown downstream tasks. Finally, focusing on worst-case measures, novel information-theoretic tools are used to refine the existing relationship between two such measures, $(\epsilon,\delta)$-DP and Rényi-DP. Applying these tools to the moments accountant framework, one can track the privacy guarantee achieved by adding Gaussian noise to Stochastic Gradient Descent (SGD) algorithms. Relative to the state of the art, for the same privacy budget, this method allows about 100 more SGD rounds for training deep learning models. / Doctoral Dissertation, Electrical Engineering, 2020
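For orientation, one common formulation of the $\alpha$-loss (stated here as background from the surrounding literature, not quoted from the dissertation) for a soft predictor assigning probability $\hat P(y)$ to the true outcome $y$ is

$$\ell_\alpha\big(y,\hat P\big) = \frac{\alpha}{\alpha-1}\Big(1-\hat P(y)^{\frac{\alpha-1}{\alpha}}\Big), \qquad \alpha \in (1,\infty),$$

which recovers the log-loss $-\log \hat P(y)$ (belief refinement) in the limit $\alpha \to 1$ and the probability of error $1-\hat P(y)$ (guessing the best posterior) as $\alpha \to \infty$.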
25

A Study on Federated Learning Systems in Healthcare

Smith, Arthur, M.D. 18 August 2021 (has links)
No description available.
26

Side-channel Threats on Modern Platforms: Attacks and Countermeasures

Zhang, Xiaokuan January 2021 (has links)
No description available.
27

RISK INTERPRETATION OF DIFFERENTIAL PRIVACY

Jiajun Liang (13190613) 31 July 2023 (has links)
How to set privacy parameters is a crucial problem for the consistent application of DP in practice. The current privacy parameters do not provide direct guidance for this problem. On the other hand, different databases may have varying degrees of information leakage, allowing attackers to enhance their attacks with the available information. This dissertation provides an additional interpretation of the current DP notions by introducing a framework that directly considers the worst-case average failure probability of attackers under different levels of knowledge.

To achieve this, we introduce a novel measure of attacker knowledge and establish a dual relationship between (type I error, type II error) and (prior, average failure probability). By leveraging this framework, we propose an interpretable paradigm for consistently setting privacy parameters on different databases with varying levels of leaked information.

Furthermore, we characterize the minimax limit of private parameter estimation, driven by $1/(n(1-2p))^2+1/n$, where $p$ represents the worst-case probability risk and $n$ is the number of data points. This characterization is more interpretable than the current lower bound $\min\{1/(n\epsilon^2),\,1/(n\delta^2)\}+1/n$ for $(\epsilon,\delta)$-DP. Additionally, we identify the phase transition of private parameter estimation based on this limit and provide suggestions for protocol designs that achieve optimal private estimation.

Last, we consider a federated learning setting where the data are stored in a distributed manner and privacy-preserving interactions are required. We extend the proposed interpretation to federated learning, considering two scenarios: protecting against privacy breaches at local nodes and protecting against privacy breaches at the center. Specifically, we consider a non-convex sparse federated parameter estimation problem and apply it to generalized linear models. We tackle two challenges in this setting. First, we address the issue of initialization caused by the privacy requirements that limit the number of queries to the database. Second, we overcome the heterogeneity in the distribution among local nodes to identify low-dimensional structures.
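As background for the dual relationship mentioned above (the standard hypothesis-testing view of DP, not the dissertation's own derivation), an attacker distinguishing two neighboring databases with type I/II error rates $(\alpha_{\mathrm{I}}, \alpha_{\mathrm{II}})$ under $(\epsilon,\delta)$-DP satisfies

$$\alpha_{\mathrm{I}} + e^{\epsilon}\,\alpha_{\mathrm{II}} \ge 1-\delta, \qquad e^{\epsilon}\,\alpha_{\mathrm{I}} + \alpha_{\mathrm{II}} \ge 1-\delta,$$

and with a prior $p$ on the first database the attacker's average failure probability is the Bayes mixture

$$P_{\mathrm{fail}}(p) = p\,\alpha_{\mathrm{I}} + (1-p)\,\alpha_{\mathrm{II}},$$

so bounds on the error pair translate into bounds on the average failure probability for each prior.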
28

DIFFERENTIALLY PRIVATE SUBLINEAR ALGORITHMS

Tamalika Mukherjee (16050815) 07 June 2023 (has links)
Collecting user data is crucial for advancing machine learning, social science, and government policies, but the privacy of the users whose data is being collected is a growing concern. Differential Privacy (DP) has emerged as the most standard notion for privacy protection with robust mathematical guarantees. Analyzing such massive amounts of data in a privacy-preserving manner motivates the need to study differentially-private algorithms that are also super-efficient.

This thesis initiates a systematic study of differentially-private sublinear-time and sublinear-space algorithms. The contributions of this thesis are two-fold. First, we design some of the first differentially private sublinear algorithms for many fundamental problems. Second, we develop general DP techniques for designing differentially-private sublinear algorithms.

We give the first DP sublinear algorithm for clustering by generalizing a subsampling framework from the non-DP sublinear-time literature. We give the first DP sublinear algorithm for estimating the maximum matching size. Our DP sublinear algorithm for estimating the average degree of a graph achieves a better approximation than previous works. We give the first DP algorithm for releasing $L_2$-heavy hitters in the sliding window model and a pure $L_1$-heavy hitter algorithm in the same model, which improves upon previous works.

We develop general techniques that address the challenges of designing sublinear DP algorithms. First, we introduce the concept of Coupled Global Sensitivity (CGS). Intuitively, the CGS of a randomized algorithm generalizes the classical notion of global sensitivity of a function by considering a coupling of the random coins of the algorithm when run on neighboring inputs. We show that one can achieve pure DP by adding Laplace noise proportional to the CGS of an algorithm. Second, we give a black-box DP transformation for a specific class of approximation algorithms. We show that such algorithms can be made differentially private without sacrificing accuracy, as long as the function has small global sensitivity. In particular, this transformation gives rise to sublinear DP algorithms for many problems, including triangle counting, the weight of the minimum spanning tree, and norm estimation.
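The black-box transformation described in the last paragraph can be pictured with a generic sketch (the standard Laplace mechanism calibrated to an assumed global sensitivity, not the thesis's CGS-based analysis): a numeric approximation algorithm with small global sensitivity is released with calibrated noise to obtain pure DP.

```python
import numpy as np

def laplace_dp_wrapper(approx_algorithm, global_sensitivity, epsilon):
    """Generic sketch: wrap a numeric algorithm so its output is released
    with Laplace noise scaled to an (assumed) global sensitivity, giving
    pure epsilon-DP for that single release."""
    rng = np.random.default_rng()
    def private_algorithm(dataset):
        value = approx_algorithm(dataset)
        return value + rng.laplace(0.0, global_sensitivity / epsilon)
    return private_algorithm

# Toy example: a counting query has global sensitivity 1.
count_over_threshold = lambda data: float(np.sum(np.asarray(data) > 10))
private_count = laplace_dp_wrapper(count_over_threshold,
                                   global_sensitivity=1.0, epsilon=0.5)
print(private_count([3, 12, 25, 7, 18]))
```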
29

Data Security and Privacy under the Binary Cloak

Ji, Tianxi 26 August 2022 (has links)
No description available.
30

Optimizing Linear Queries Under Differential Privacy

Li, Chao 01 September 2013 (has links)
Private data analysis on statistical data has been addressed in much recent literature. The goal of such analysis is to measure statistical properties of a database without revealing information about the individuals who participate in it. Differential privacy is a rigorous privacy definition that protects individual information using output perturbation: a differentially private algorithm produces statistically indistinguishable outputs whether or not the database contains a tuple corresponding to any particular individual. It is straightforward to construct differentially private algorithms for many common tasks, and there are published algorithms to support various tasks under differential privacy. However, methods to design error-optimal algorithms for most non-trivial tasks are still unknown. In particular, we are interested in error-optimal algorithms for sets of linear queries. A linear query is a sum of counts of tuples that satisfy a certain condition, which covers the scope of many aggregation tasks including count, sum, and histogram. We present the matrix mechanism, a novel mechanism for answering sets of linear queries under differential privacy. The matrix mechanism makes a clear distinction between the set of queries submitted by users, called the query workload, and an alternative set of queries to be answered under differential privacy, called the query strategy. The answer to the query workload can then be computed from the answer to the query strategy. Given a query workload, the query strategy determines the distribution of the output noise, and the power of the matrix mechanism comes from adaptively choosing a query strategy that minimizes the output noise. Our analyses also provide a theoretical measure of the quality of different strategies for a given workload. This measure is then used in accurate and approximate formulations of the optimization problem that outputs the error-optimal strategy. We present a lower bound on the error of answering each workload under the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. In addition, we design an approximation algorithm whose generated strategies outperform state-of-the-art mechanisms under $(\epsilon, \delta)$-differential privacy. Those strategies lead to more accurate data analysis while preserving a rigorous privacy guarantee. Moreover, we also combine the matrix mechanism with a novel data-dependent algorithm, which achieves differential privacy by adding noise that is adapted to the input data and to the given query workload.
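A compact sketch of the matrix-mechanism idea described above (an illustrative instance with a hypothetical prefix-sum workload and two simple strategy choices; the actual mechanism optimizes the strategy): noisy answers to the strategy queries are combined via least squares to answer the workload.

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_mechanism(W, A, x, epsilon):
    """Answer workload W over histogram x using strategy A under epsilon-DP.

    Laplace noise is scaled to the L1 sensitivity of A (max column L1 norm),
    and workload answers are reconstructed as W @ x_hat, where x_hat is the
    least-squares solution to A x_hat = (A x + noise)."""
    sensitivity = np.abs(A).sum(axis=0).max()
    noisy = A @ x + rng.laplace(0.0, sensitivity / epsilon, size=A.shape[0])
    x_hat, *_ = np.linalg.lstsq(A, noisy, rcond=None)
    return W @ x_hat

# Hypothetical workload: all prefix sums over a length-4 histogram.
W = np.tril(np.ones((4, 4)))
x = np.array([10.0, 3.0, 7.0, 5.0])

print("identity strategy:", matrix_mechanism(W, np.eye(4), x, epsilon=1.0))
print("workload as strategy:", matrix_mechanism(W, W, x, epsilon=1.0))
```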
