31 |
Addressing Fundamental Limitations in Differentially Private Machine Learning
Nandi, Anupama January 2021 (has links)
No description available.
|
32 |
Differentially Private Federated Learning Algorithms for Sparse Basis Recovery
Ajinkya K Mulay (18823252) 14 June 2024 (has links)
Sparse basis recovery is an important learning problem when the number of model dimensions (p) is much larger than the number of samples (n). However, little work has studied sparse basis recovery in the Federated Learning (FL) setting, where the Differential Privacy (DP) of the client data must simultaneously be protected. Notably, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) degrade significantly when the system is under-determined (i.e., p >> n), so they fail to accurately learn the true underlying sparse model. The goal of my thesis is therefore to develop DP-FL sparse basis recovery algorithms that can provably recover the true underlying sparse basis even when p >> n, while still guaranteeing the differential privacy of the client data.

During my PhD studies, we developed three DP-FL sparse basis recovery algorithms for this purpose. Our first algorithm, SPriFed-OMP, based on the Orthogonal Matching Pursuit (OMP) algorithm, achieves high accuracy even when n = O(√p) under the stronger Restricted Isometry Property (RIP) assumption for least-squares problems. Our second algorithm, Humming-Bird, based on a carefully modified variant of the Forward-Backward Algorithm (FoBA), achieves differentially private sparse recovery in the same setting while requiring only the much weaker Restricted Strong Convexity (RSC) condition. We further extend Humming-Bird to support loss functions beyond least squares that satisfy the RSC condition. To the best of our knowledge, these are the first DP-FL results guaranteeing sparse basis recovery in the p >> n setting.
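Since SPriFed-OMP builds on Orthogonal Matching Pursuit, a minimal sketch of the plain (non-private, non-federated) OMP baseline may help fix ideas. This is the textbook algorithm only; the function and variable names are illustrative and not taken from the thesis:

```python
import numpy as np

def omp(X, y, k):
    """Plain Orthogonal Matching Pursuit: greedily select k columns of X
    most correlated with the residual, re-fitting by least squares on the
    selected support after each addition."""
    n, p = X.shape
    support = []
    residual = y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(X.T @ residual)
        correlations[support] = -np.inf  # never re-select a column
        support.append(int(np.argmax(correlations)))
        # Least-squares re-fit on the current support.
        beta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s
    return beta, sorted(support)

# Toy example: recover a 3-sparse vector even though n < p.
rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[5, 20, 77]] = [2.0, -3.0, 1.5]
y = X @ true_beta
beta_hat, support = omp(X, y, k=3)  # recovers support [5, 20, 77]
```

The DP-FL versions studied in the thesis additionally perturb the quantities exchanged between clients and server, which is where the RIP/RSC conditions come into play.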
|
33 |
A Study on Private and Secure Federated Learning / プライベートで安全な連合学習
Kato, Fumiyuki 25 March 2024 (has links)
京都大学 / 新制・課程博士 / 博士(情報学) / 甲第25427号 / 情博第865号 / 新制||情||145(附属図書館) / 京都大学大学院情報学研究科社会情報学専攻 / (主査)教授 伊藤 孝行, 教授 黒田 知宏, 教授 岡部 寿男, 吉川 正俊(京都大学 名誉教授) / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
34 |
Anonymization of directory-structured sensitive data / Anonymisering av katalogstrukturerad känslig data
Folkesson, Carl January 2019 (has links)
Data anonymization is a relevant and important field within data privacy, which tries to find a good balance between utility and privacy in data. The field is especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on anonymization of directory-structured data, i.e. data organized into a tree of directories. In the thesis, four of the most common models for anonymization of tabular data (k-anonymity, ℓ-diversity, t-closeness and differential privacy) are adapted for anonymization of directory-structured data. This adaptation is done by creating three different approaches for anonymizing directory-structured data: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. In particular, it was concluded that the differential privacy model combined with the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. Conversely, the k-anonymity model with the SingleTable approach, or the t-closeness model with the DirectoryWise approach, gives the lowest information loss, but also the lowest privacy. The differential privacy model and the RecursiveDirectoryWise approach were also shown to give the best protection against the chosen attacks. Finally, it was concluded that the differential privacy model combined with the RecursiveDirectoryWise approach was the most suitable combination for complying with the GDPR when anonymizing directory-structured data.
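As a concrete illustration of the differential privacy model referenced above (the standard textbook mechanism, not the thesis's own implementation), a counting query can be privatized with the Laplace mechanism; the record layout and parameter values below are hypothetical:

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Standard Laplace mechanism for a counting query. A count has
    L1-sensitivity 1 (adding or removing one record changes it by at
    most 1), so Laplace noise of scale 1/epsilon yields epsilon-DP."""
    true_count = sum(1 for row in data if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical micro-dataset: "how many records have age >= 40?"
rng = np.random.default_rng(42)
records = [{"age": a} for a in [23, 35, 41, 52, 29, 61, 47]]
noisy = laplace_count(records, lambda r: r["age"] >= 40, epsilon=1.0, rng=rng)
```

The smaller epsilon is, the stronger the privacy guarantee and the larger the expected error, which is exactly the utility/privacy trade-off the thesis measures.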
|
35 |
Privacy Preserving Survival Prediction With Graph Neural Networks / Förutsägelse av överlevnad med integritetsskydd med Graph Neural Networks
Fedeli, Stefano January 2021 (has links)
In the development process of novel cancer drugs, one important aspect is to identify patient populations with a high risk of early death so that resources can be focused on patients with the highest unmet medical need. Many cancer types are heterogeneous, and there is a need to distinguish patients with aggressive diseases, meaning a high risk of early death, from patients with indolent diseases, meaning a low risk of early death. Predictive modeling can be a useful tool for risk stratification in clinical practice, enabling healthcare providers to treat high-risk patients early and progressively, while applying a less aggressive watch-and-wait strategy for patients with a lower risk of death. This is important from a clinical perspective, but also from a health-economic perspective, since society has limited resources and costly drugs should be given to the patients who can benefit the most from a specific treatment. Thus, the goal of predictive modeling is to ensure that the right patient will have access to the right drug at the right time. In the era of personalized medicine, Artificial Intelligence (AI) applied to high-quality data will most likely play an important role, and many techniques have been developed. In particular, the Graph Neural Network (GNN) is a promising tool since it captures the complexity of high-dimensional data modeled as a graph. In this work, we have applied Network Representation Learning (NRL) techniques to predict survival, using pseudonymized patient-level data from national health registries in Sweden. Over the last decade, more health data of increased complexity has become available for research, and precision medicine could take advantage of this trend to bring better healthcare to patients. However, it is important to develop reliable prediction models that not only show high performance but also take privacy into consideration, avoiding any leakage of personal information.
The present study contributes novel insights into GNN performance on different survival prediction tasks, using population-based, unique nationwide data. Furthermore, we also explored how privacy methods impact the performance of the models when applied to the same dataset. We conducted a set of experiments across six datasets using eight models, measuring AUC, precision and recall. Our evaluation results show that Graph Neural Networks were able to reach accuracy close to the models used in clinical practice and consistently outperformed the traditional machine learning methods by at least 4.5%. Furthermore, the study demonstrated that graph modeling, when guided by knowledge from clinical experts, performed well and showed high resiliency to the noise introduced for privacy preservation.
|
36 |
Formal Approaches to Information Hiding: An Analysis of Interactive Systems, Statistical Disclosure Control, and Refinement of Specifications / Des approches formelles pour le cachement d'information : une analyse des systèmes interactifs, contrôle de divulgation statistique, et le raffinement des spécifications
Alvim, Mário 12 October 2011 (has links) (PDF)
This thesis concerns the measurement of information flow in computing systems. We exploit the similarities between different scenarios in which security is a concern, and we apply concepts from information theory to assess the level of protection offered. In the first scenario, we consider the problem of defining information leakage in interactive systems, where secrets and observables may alternate during the computation. We show that the classical information-theoretic approach, which interprets such systems as (simple) noisy channels, is no longer valid. The principle can be recovered, however, if we consider channels of a more complicated kind, known in information theory as channels with memory and feedback. We show that there is a perfect correspondence between interactive systems and this kind of channel. In the second scenario, we consider the problem of privacy in statistical databases. In the database community, differential privacy has become a very popular notion. Roughly, the idea is that a randomized query mechanism provides sufficient protection if the ratio between the probabilities that two adjacent datasets give the same answer is bounded by a constant. We observe the similarity of this goal to the main concern in the field of information flow: limiting the possibility of inferring the secrets from the observables. We show how to model the query system as an information-theoretic channel, and we compare the notion of differential privacy with the mutual-information-based concept following the work of Smith. We show that differential privacy implies a bound on the mutual information, but not vice versa.
We also consider the utility of the randomization mechanism, which represents how close, on average, the randomized answers are to the true ones. We show that differential privacy implies a tight bound on utility, and we propose a method that, under certain conditions, constructs an optimal randomization mechanism. Moving away from quantitative approaches, we then address the problem of using process equivalences to characterize information-protection properties. In the literature, some works have used this approach, based on the principle that a protocol P with a variable x satisfies such a property if and only if, for every pair of secrets s1 and s2, P[s1/x] is equivalent to P[s2/x]. We show that, in the presence of nondeterminism, this principle relies on the assumption that the scheduler "works for the benefit of the protocol", which is generally not a valid assumption. Equivalences that are unsafe in this sense include complete-trace equivalence and bisimulation. We present a formalism in which one can specify valid schedulers and, consequently, safe versions of these equivalences. We show that our version of bisimulation is still a congruence. Finally, we show that our safe equivalences can be used to establish information-protection properties.
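The bounded-ratio definition of differential privacy discussed above can be made concrete with the classic binary randomized-response mechanism; this is a standard textbook example, not taken from the thesis:

```python
import math

def randomized_response_epsilon(p_truth):
    """Binary randomized response reports the true bit with probability
    p_truth (> 0.5) and the flipped bit otherwise. The worst-case ratio
    between output probabilities under two neighbouring inputs is
    p_truth / (1 - p_truth), so the mechanism satisfies epsilon-DP with
    epsilon = ln(p_truth / (1 - p_truth))."""
    return math.log(p_truth / (1.0 - p_truth))

# Answering truthfully 75% of the time gives epsilon = ln(3) ≈ 1.10,
# i.e. P[output | secret=1] / P[output | secret=0] <= 3 for either output.
eps = randomized_response_epsilon(0.75)
```

Viewed as a channel from the secret bit to the reported bit, this mechanism also has bounded mutual information, consistent with the thesis's result that differential privacy implies a mutual-information bound.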
|
37 |
Towards secure computation for people
Issa, Rawane 23 June 2023 (has links)
My research investigates three questions: how do we customize protocols and implementations to account for the unique requirements of each setting and its target community; what steps are necessary to transition secure computation tools into practice; and how can we promote their adoption by users at large? In this dissertation I present several of my works that address these three questions, with a particular focus on one of them.
First, my work on "Hecate: Abuse Reporting in Secure Messengers with Sealed Sender" designs a customized protocol to protect people from abuse and surveillance in online end-to-end encrypted messaging. Our key insight is to add pre-processing to asymmetric message franking, where the moderating entity can generate batches of tokens per user during off-peak hours that can later be deposited when reporting abuse.
This thesis then demonstrates that by carefully tailoring our cryptographic protocols to real-world use cases, we can achieve orders-of-magnitude improvements over prior works with minimal assumptions about the resources available to people.
Second, my work on "Batched Differentially Private Information Retrieval" contributes a novel Private Information Retrieval (PIR) protocol called DP-PIR that is designed to provide high throughput at high query rates. It does so by pushing all public-key operations into an offline stage, batching queries from multiple clients via techniques similar to mixnets, and maintaining differential privacy guarantees over the access patterns of the database.
Finally, I provide three case studies showing that we cannot hope to further the adoption of cryptographic tools in practice without collaborating with the very people we are trying to protect. I discuss a pilot deployment of secure multi-party computation (MPC) that I have done with the Department of Education, deployments of MPC I have done for the Boston Women’s Workforce Council and the Greater Boston Chamber of Commerce, and ongoing work in developing toolchain support for MPC via an automated resource estimation tool called Carousels.
|
38 |
Image Classification using Federated Learning with Differential Privacy : A Comparison of Different Aggregation Algorithms
Nygård, Moa January 2024 (has links)
The objective of this thesis was to investigate how adding a privacy-preserving mechanism to a federated learning model affects the model's performance on an image classification task. A further aim was to assess the outlook for using federated learning in the biotech industry and to identify threats and attacks that could obstruct its use among competitors. The project examined four aggregation algorithms for federated learning: weighted FedAvg, unweighted FedAvg, weighted FedProx and unweighted FedProx. The experiments used TensorFlow Federated to simulate the different methods, which were evaluated using accuracy, loss, recall, precision and F1 score. The results of this study show that the performance of the deep neural network model decreases as differential privacy is introduced. Of the four aggregation algorithms, weighted FedProx performed best despite the added noise. It was also concluded that federated learning has the potential to be used in the biotechnology industry among competitors, but that there are still security threats and attacks to guard against.
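The difference between the weighted and unweighted FedAvg variants compared above can be sketched in a few lines (an illustrative NumPy sketch of the aggregation step only, not the TensorFlow Federated implementation used in the experiments):

```python
import numpy as np

def fedavg(client_weights, client_sizes=None):
    """FedAvg aggregation: average the clients' model parameters.
    With client_sizes given, each client is weighted by its sample
    count (weighted FedAvg); otherwise all clients count equally
    (unweighted FedAvg)."""
    stacked = np.stack(client_weights)
    if client_sizes is None:
        return stacked.mean(axis=0)
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()  # normalize to a convex combination
    return np.tensordot(w, stacked, axes=1)

# Three hypothetical clients with different amounts of local data.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 30, 60]
unweighted = fedavg(params)        # -> [3.0, 4.0]
weighted = fedavg(params, sizes)   # -> [4.0, 5.0]
```

FedProx differs on the client side (a proximal term in each client's local objective), but its server-side aggregation can be weighted or unweighted in the same way.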
|
39 |
Multi-armed bandits with unconventional feedback / Bandits multi-armés avec rétroaction partielle
Gajane, Pratik 14 November 2017 (has links)
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off inherent to reinforcement learning, in which the learner chooses an action (symbolized by an arm) from a set of available actions in a sequence of trials in order to maximize their reward. In the classical MAB problem, the learner receives absolute bandit feedback, i.e. it receives as feedback the reward of the arm it selects. In many practical situations, however, different kinds of feedback are more readily available.
In this thesis, we study two such kinds of feedback, namely relative feedback and corrupt feedback. The main practical motivation behind relative feedback arises from the task of online ranker evaluation. This task involves choosing the optimal ranker from a finite set of rankers using only pairwise comparisons, while minimizing the comparisons between sub-optimal rankers. This is formalized by the MAB problem with relative feedback, in which the learner selects two arms instead of one and receives preference feedback. We consider the adversarial formulation of this problem, which circumvents the stationarity assumption over the mean rewards of the arms. We provide a lower bound on the performance measure for any algorithm for this problem. We also provide an algorithm called "Relative Exponential-weight algorithm for Exploration and Exploitation" with performance guarantees. We present a thorough empirical study on several information retrieval datasets that confirms the validity of these theoretical results. The motivating theme behind corrupt feedback is that the feedback the learner receives is a corrupted form of the corresponding reward of the selected arm. In practice, such feedback arises in tasks such as online advertising and recommender systems. We consider two goals for the MAB problem with corrupt feedback: best arm identification and exploration-exploitation. For both goals, we provide lower bounds on the performance measures for any algorithm, and we provide various algorithms for these settings. The main contribution of this part is the algorithms "KLUCB-CF" and "Thompson Sampling-CF", which asymptotically attain the best possible performance. We present experimental results to demonstrate the performance of these algorithms. We also show how this problem setting can be used for the practical application of enforcing differential privacy.
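To illustrate the corrupt-feedback setting and its link to differential privacy via randomized response, here is a small sketch of how a known corruption rate can be inverted to obtain unbiased mean estimates, the basic ingredient behind estimators of the KLUCB-CF flavour; the function names are illustrative, not the thesis's code:

```python
import numpy as np

def corrupt(reward_bit, p_flip, rng):
    """Randomized-response corruption: flip the binary reward with
    probability p_flip before the learner observes it."""
    return reward_bit ^ (rng.random() < p_flip)

def debias(corrupt_mean, p_flip):
    """Unbiased estimate of the true mean from corrupted observations:
    E[corrupted] = mu*(1-p) + (1-mu)*p, hence mu = (E[corrupted]-p)/(1-2p)
    whenever p != 1/2."""
    return (corrupt_mean - p_flip) / (1.0 - 2.0 * p_flip)

# Simulate one Bernoulli arm with true mean 0.7 observed through a
# 20% flip channel, then recover the mean from the corrupted stream.
rng = np.random.default_rng(1)
mu, p = 0.7, 0.2
obs = [corrupt(rng.random() < mu, p, rng) for _ in range(200_000)]
mu_hat = debias(np.mean(obs), p)  # close to 0.7
```

The same flip channel is exactly local randomized response, which is why corrupt-feedback bandit algorithms can be repurposed to enforce differential privacy over the rewards.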
|
40 |
Energy-Efficient Private Forecasting on Health Data using SNNs / Energieffektiv privat prognos om hälsodata med hjälp av SNNs
Di Matteo, Davide January 2022 (has links)
Health monitoring devices, such as Fitbit, are gaining popularity both as wellness tools and as a source of information for healthcare decisions. Predicting such wellness goals accurately is critical for users to make informed lifestyle choices. The core objective of this thesis is to design and implement such a system that takes energy consumption and privacy into account. This research is modelled as a time-series forecasting problem that makes use of Spiking Neural Networks (SNNs) due to their proven energy-saving capabilities. Thanks to a design that closely mimics natural neural networks (such as the brain), SNNs have the potential to significantly outperform classic Artificial Neural Networks in terms of energy consumption and robustness. To test our hypotheses, previous research by Sonia et al. [1], in the same domain and with the same dataset, is used as our starting point; there, a private forecasting system using Long Short-Term Memory (LSTM) is designed and implemented. Their study also implements and evaluates a clustering federated learning approach, which fits the highly distributed data well. The results obtained in their research act as a baseline against which we compare our results in terms of accuracy, training time, model size and estimated energy consumed. Our experiments show that Spiking Neural Networks trade off accuracy (2.19x, 1.19x, 4.13x and 1.16x greater Root Mean Square Error (RMSE) for macronutrients, calories burned, resting heart rate and active minutes, respectively) in exchange for a smaller model (19% fewer parameters and 77% lighter in memory) and 43% faster training. Our model is estimated to consume 3.36 μJ per inference, which is much less than traditional Artificial Neural Networks (ANNs) [2]. The data recorded by health monitoring devices is vastly distributed in the real world. Moreover, with such sensitive recorded information, there are many possible implications to consider.
For these reasons, we apply the clustering federated learning implementation of [1] to our use case. However, it can be challenging to adopt such techniques, since it can be difficult to learn from data sequences that are irregular. We use a two-step streaming clustering approach to classify customers based on their eating and exercise habits. Training different models for each group of users was shown to be useful, particularly in terms of training time; however, this is strongly dependent on the cluster size. Our experiments conclude that there is a decrease in error and training time if the clusters contain enough data to train the models. Finally, this study addresses the issue of data privacy by using state-of-the-art differential privacy. We apply ε-differential privacy to both our baseline model (trained on the whole dataset) and our federated learning based approach. With a privacy budget of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%: specifically, +23.13%, +25.71%, +29.87% and +21.57% for macronutrients (grams), calories burned (kCal), resting heart rate (bpm) and active minutes (minutes), respectively.
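The energy argument for SNNs rests on sparse, event-driven activity. A minimal leaky integrate-and-fire (LIF) neuron, the building block typically used in such networks, can be sketched as follows (illustrative constants, not the model used in the thesis):

```python
def lif_simulate(input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Minimal leaky integrate-and-fire neuron: the membrane potential
    leaks toward zero, integrates the input current, and emits a spike
    (then resets) whenever it crosses the threshold. Spikes are sparse
    binary events, which is the root of SNNs' energy efficiency on
    neuromorphic hardware."""
    v = 0.0
    spikes = []
    for i_t in input_current:
        v += dt * (-v / tau + i_t)  # leaky integration
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

# A constant drive produces a regular, sparse spike train.
spikes = lif_simulate([0.15] * 50)
```

Because downstream computation happens only at spike times, a mostly-zero spike train like this one translates into far fewer operations per inference than a dense ANN activation.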
|