31

Differentially Private Sublinear Algorithms

Tamalika Mukherjee (16050815) 07 June 2023 (has links)
Collecting user data is crucial for advancing machine learning, social science, and government policy, but the privacy of the users whose data is being collected is a growing concern. Differential Privacy (DP) has emerged as the standard notion of privacy protection, with robust mathematical guarantees. Analyzing such massive amounts of data in a privacy-preserving manner motivates the study of differentially private algorithms that are also super-efficient.

This thesis initiates a systematic study of differentially private sublinear-time and sublinear-space algorithms. Its contributions are two-fold. First, we design some of the first differentially private sublinear algorithms for many fundamental problems. Second, we develop general DP techniques for designing differentially private sublinear algorithms.

We give the first DP sublinear algorithm for clustering by generalizing a subsampling framework from the non-DP sublinear-time literature. We give the first DP sublinear algorithm for estimating the maximum matching size. Our DP sublinear algorithm for estimating the average degree of a graph achieves a better approximation than previous works. We give the first DP algorithm for releasing $L_2$-heavy hitters in the sliding window model, and a pure $L_1$-heavy hitter algorithm in the same model that improves upon previous works.

We develop general techniques that address the challenges of designing sublinear DP algorithms. First, we introduce the concept of Coupled Global Sensitivity (CGS). Intuitively, the CGS of a randomized algorithm generalizes the classical notion of global sensitivity of a function by considering a coupling of the random coins of the algorithm when it is run on neighboring inputs. We show that one can achieve pure DP by adding Laplace noise proportional to the CGS of an algorithm. Second, we give a black-box DP transformation for a specific class of approximation algorithms. We show that such algorithms can be made differentially private without sacrificing accuracy, as long as the function has small global sensitivity. In particular, this transformation gives rise to sublinear DP algorithms for many problems, including triangle counting, the weight of the minimum spanning tree, and norm estimation.
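The CGS result above builds on the classical Laplace mechanism: pure ε-DP follows from adding Laplace noise scaled to the (here, coupled) global sensitivity of the released quantity. A minimal sketch of that baseline calibration in Python; the function name and parameters are illustrative, not taken from the thesis.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with pure epsilon-DP by adding Laplace noise
    whose scale is sensitivity / epsilon (the classical calibration)."""
    if rng is None:
        rng = np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has global sensitivity 1 (adding or removing one
# person changes the count by at most 1), released here with epsilon = 0.5.
private_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5)
```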
32

Data Security and Privacy under the Binary Cloak

Ji, Tianxi 26 August 2022 (has links)
No description available.
33

Differential privacy and machine learning: Calculating sensitivity with generated data sets

Lundmark, Magnus, Dahlman, Carl-Johan January 2017 (has links)
Privacy has never been more important to maintain than in today's information society. Companies and organizations collect large amounts of data about their users. This information is valuable for the statistical insight it provides into areas such as medicine, economics, and behavioural patterns among individuals. A technique called differential privacy has been developed to ensure that the privacy of individuals is maintained: it makes it possible to produce useful statistics while protecting each individual. Its drawback is the magnitude of the randomized noise that must be applied to the data in order to hide the individual. This research examined whether the usability of the privatized result can be improved by using machine learning to generate a data set on which the noise can be based. The purpose of the generated data set is to provide a local representation of the underlying data set that is safe to use when calculating the magnitude of the randomized noise. The results show that this approach is currently not a feasible solution: the calculated noise was not large enough and therefore led to an unacceptable amount of leaked information. The work nevertheless points out directions for further research aimed at improving the usability of differential privacy. In particular, it indicates that limiting the noise to a lower bound calculated from the underlying data set might be enough to meet all privacy requirements. Furthermore, the accuracy of the machine learning algorithm and its impact on the usability of the noise was not fully investigated and could be of interest in future studies.
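As a rough illustration of the general idea described above (not the authors' implementation), one could estimate sensitivity empirically from an ML-generated stand-in dataset by measuring the largest leave-one-out change of the query, and scale the Laplace noise with that estimate. As the thesis found, such an estimate can undershoot the true global sensitivity; all names below are hypothetical.

```python
import numpy as np

def empirical_sensitivity(generated_data: np.ndarray, query) -> float:
    """Estimate the sensitivity of `query` on a generated dataset as the
    largest change observed when any single record is removed.
    Note: this is only a lower bound on the true global sensitivity."""
    full = query(generated_data)
    diffs = [abs(full - query(np.delete(generated_data, i, axis=0)))
             for i in range(len(generated_data))]
    return max(diffs)

# Hypothetical usage: scale Laplace noise with the estimate (epsilon = 0.5).
data = np.random.default_rng(0).normal(size=(100, 1))
sens = empirical_sensitivity(data, lambda d: d.mean())
noisy_mean = data.mean() + np.random.default_rng(1).laplace(scale=sens / 0.5)
```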
34

Optimizing Linear Queries Under Differential Privacy

Li, Chao 01 September 2013 (has links)
Private analysis of statistical data has been addressed in much recent literature. The goal of such analysis is to measure statistical properties of a database without revealing information about the individuals who participate in it. Differential privacy is a rigorous privacy definition that protects individual information using output perturbation: a differentially private algorithm produces statistically indistinguishable outputs whether or not the database contains a tuple corresponding to a given individual. It is straightforward to construct differentially private algorithms for many common tasks, and published algorithms exist to support various tasks under differential privacy. However, methods to design error-optimal algorithms for most non-trivial tasks are still unknown. In particular, we are interested in error-optimal algorithms for sets of linear queries. A linear query is a sum of counts of tuples that satisfy a certain condition, which covers many aggregation tasks including count, sum, and histogram. We present the matrix mechanism, a novel mechanism for answering sets of linear queries under differential privacy. The matrix mechanism makes a clear distinction between the set of queries submitted by users, called the query workload, and an alternative set of queries to be answered under differential privacy, called the query strategy. The answer to the query workload can then be computed from the answer to the query strategy. Given a query workload, the query strategy determines the distribution of the output noise, and the power of the matrix mechanism comes from adaptively choosing a strategy that minimizes that noise. Our analysis also provides a theoretical measure of the quality of different strategies for a given workload. This measure is then used in exact and approximate formulations of the optimization problem that outputs the error-optimal strategy. We present a lower bound on the error of answering each workload under the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. In addition, we design an approximation algorithm whose generated strategies outperform state-of-the-art mechanisms under (epsilon, delta)-differential privacy. These strategies lead to more accurate data analysis while preserving a rigorous privacy guarantee. Moreover, we combine the matrix mechanism with a novel data-dependent algorithm, which achieves differential privacy by adding noise that is adapted to the input data and to the given query workload.
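A compact sketch of the matrix mechanism idea described above: the strategy queries are answered with Laplace noise calibrated to the strategy's L1 sensitivity, and the workload answers are reconstructed from the noisy strategy answers. The strategy choice and all names here are illustrative, not taken from the thesis.

```python
import numpy as np

def matrix_mechanism(W, A, x, epsilon, rng=None):
    """Answer workload W on data vector x via strategy A under epsilon-DP.
    Noise is scaled to the L1 sensitivity of A (max column L1 norm), and
    workload answers are reconstructed as W @ pinv(A) @ (A x + noise)."""
    if rng is None:
        rng = np.random.default_rng()
    sensitivity = np.abs(A).sum(axis=0).max()
    noisy_strategy = A @ x + rng.laplace(scale=sensitivity / epsilon, size=A.shape[0])
    return W @ np.linalg.pinv(A) @ noisy_strategy

# Toy example: the workload is all prefix sums over a histogram of length 4,
# answered through the identity strategy.
x = np.array([3.0, 1.0, 4.0, 1.0])
W = np.tril(np.ones((4, 4)))   # prefix-sum queries
A = np.eye(4)                  # identity strategy
answers = matrix_mechanism(W, A, x, epsilon=1.0)
```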
35

Balancing Privacy and Accuracy in IoT using Domain-Specific Features for Time Series Classification

Lakhanpal, Pranshul 01 June 2023 (has links) (PDF)
ε-Differential Privacy (DP) is widely used to anonymize data in order to protect sensitive information, including for machine learning (ML) tasks. However, there is a trade-off between privacy and accuracy, since ε-DP reduces a model's accuracy on classification tasks. Moreover, few studies have applied DP to time series from sensors and Internet-of-Things (IoT) devices. In this work, we aim to make the accuracy of ML models trained on ε-DP data as close as possible to that of ML models trained on non-anonymized data, for two different physiological time series. We propose to transform the time series into domain-specific 2D (image) representations, such as scalograms, recurrence plots (RP), and their joint representation, as inputs for training classifiers. These image transformations are irreversible, which prevents data leaks and makes the proposed approach more secure. The images also allow us to apply state-of-the-art image classifiers, which exploit additional information such as textured patterns to obtain accuracy comparable to classifiers trained on non-anonymized data. To bring classifier performance on anonymized data close to that on non-anonymized data, it is important to identify the value of ε and the input feature. Experimental results demonstrate that the performance of ML models trained on scalograms and RPs was comparable to that of ML models trained on their non-anonymized versions. Motivated by these promising results, we design an end-to-end IoT ML edge-cloud architecture, capable of detecting input drift, that employs our technique to train ML models on ε-DP physiological data. Our classification approach ensures the privacy of individuals while processing and analyzing the data securely and efficiently at the edge.
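Of the image representations mentioned, the recurrence plot is the simplest to state: it marks which pairs of time points are close to each other. A minimal sketch with an arbitrary threshold; the preprocessing and parameters used in the thesis may differ.

```python
import numpy as np

def recurrence_plot(series: np.ndarray, threshold: float) -> np.ndarray:
    """Binary recurrence plot: R[i, j] = 1 when |x_i - x_j| < threshold.
    The resulting image can be fed to a standard image classifier."""
    dist = np.abs(series[:, None] - series[None, :])   # pairwise distances
    return (dist < threshold).astype(np.uint8)

# Example on a short synthetic signal standing in for physiological data.
t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
rp = recurrence_plot(signal, threshold=0.2)            # 200 x 200 image
```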
36

Addressing Fundamental Limitations in Differentially Private Machine Learning

Nandi, Anupama January 2021 (has links)
No description available.
37

Anonymization of directory-structured sensitive data

Folkesson, Carl January 2019 (has links)
Data anonymization is a relevant and important field within data privacy, which tries to find a good balance between utility and privacy in data. The field has become especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on anonymization of directory-structured data, that is, data organized into a tree of directories. Four of the most common models for anonymization of tabular data (k-anonymity, ℓ-diversity, t-closeness and differential privacy) are adapted for anonymization of directory-structured data. This adaptation is done by creating three different approaches for anonymizing directory-structured data: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. In particular, the differential privacy model with the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. Conversely, the k-anonymity model with the SingleTable approach and the t-closeness model with the DirectoryWise approach give the lowest information loss, but also the lowest privacy. The differential privacy model with the RecursiveDirectoryWise approach was also shown to give the best protection against the chosen attacks. Finally, it was concluded that the differential privacy model with the RecursiveDirectoryWise approach was the most suitable combination for following the GDPR when anonymizing directory-structured data.
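Of the four tabular models the thesis adapts, k-anonymity is the easiest to state: every combination of quasi-identifier values must occur at least k times. A minimal check of that property, with hypothetical column names and records.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears >= k times.
    `rows` is a list of dicts; `quasi_identifiers` names the QI columns."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())

# Hypothetical example with generalized ZIP code and age range as QIs.
records = [
    {"zip": "124**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "124**", "age": "20-30", "diagnosis": "cold"},
    {"zip": "124**", "age": "30-40", "diagnosis": "flu"},
]
print(is_k_anonymous(records, ["zip", "age"], k=2))   # False: one group has size 1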
38

Privacy Preserving Survival Prediction With Graph Neural Networks

Fedeli, Stefano January 2021 (has links)
In the development of novel cancer drugs, one important aspect is to identify patient populations with a high risk of early death, so that resources can be focused on patients with the greatest unmet medical need. Many cancer types are heterogeneous, and there is a need to distinguish patients with aggressive disease, meaning a high risk of early death, from patients with indolent disease, meaning a low risk of early death. Predictive modeling can be a useful tool for risk stratification in clinical practice, enabling healthcare providers to treat high-risk patients early and progressively, while applying a less aggressive watch-and-wait strategy for patients with a lower risk of death. This is important from a clinical perspective, but also from a health-economic perspective, since society has limited resources and costly drugs should be given to the patients who can benefit most from a specific treatment. Thus, the goal of predictive modeling is to ensure that the right patient has access to the right drug at the right time. In the era of personalized medicine, Artificial Intelligence (AI) applied to high-quality data will most likely play an important role, and many techniques have been developed. In particular, Graph Neural Networks (GNNs) are a promising tool, since they capture the complexity of high-dimensional data modeled as a graph. In this work, we applied Network Representation Learning (NRL) techniques to predict survival, using pseudonymized patient-level data from national health registries in Sweden. Over the last decade, more health data of increasing complexity has become available for research, and precision medicine can take advantage of this trend to bring better healthcare to patients. However, it is important to develop reliable prediction models that not only show high performance but also take privacy into consideration, avoiding any leakage of personal information. The present study contributes novel insights into GNN performance on different survival prediction tasks, using population-based, unique nationwide data. Furthermore, we explored how privacy methods impact the performance of the models when applied to the same dataset. We conducted experiments across six datasets using eight models, measuring AUC, precision, and recall. Our evaluation results show that Graph Neural Networks reached accuracy close to the models used in clinical practice and consistently outperformed traditional machine learning methods by at least 4.5%. Furthermore, the study demonstrates that graph modeling, when informed by knowledge from clinical experts, performs well and is highly resilient to the noise introduced for privacy preservation.
39

Formal approaches to information hiding: An analysis of interactive systems, statistical disclosure control, and refinement of specifications

Alvim, Mário 12 October 2011 (has links) (PDF)
This thesis deals with measures of information flow in computing systems. We exploit the similarities between different scenarios in which security is a concern, and apply concepts from information theory to assess the level of protection offered. In the first scenario, we consider the problem of defining information leakage in interactive systems, where secrets and observables can alternate during the computation. We show that the classical information-theoretic approach, which interprets such systems as (simple) noisy channels, is no longer valid. However, the principle can be recovered if we consider channels of a more complicated kind, known in information theory as channels with memory and feedback. We show that there is a perfect correspondence between interactive systems and this kind of channel. In the second scenario, we consider the problem of privacy in statistical databases. In the database community, differential privacy has become a very popular notion. Roughly, the idea is that a randomized query mechanism provides sufficient protection if the ratio between the probabilities that two adjacent datasets give the same answer is bounded by a constant. We observe the similarity of this goal to the main concern in the field of information flow: limiting the possibility of inferring the secrets from the observables. We show how to model the query system as an information-theoretic channel, and we compare the notion of differential privacy with the mutual-information-based concept following the work of Smith. We show that differential privacy implies a bound on the mutual information, but not vice versa. We also consider the utility of the randomization mechanism, which represents how close the randomized answers are, on average, to the true ones. We show that differential privacy implies a tight bound on utility, and we propose a method that, under certain conditions, constructs an optimal randomization mechanism. Moving away from quantitative approaches, we then address the problem of using process equivalences to characterize information-protection properties. In the literature, some works have used this approach, based on the principle that a protocol P with a variable x satisfies such a property if and only if, for every pair of secrets s1 and s2, P[s1/x] is equivalent to P[s2/x]. We show that, in the presence of nondeterminism, this principle relies on the assumption that the scheduler "works for the benefit of the protocol", which is generally not a valid assumption. Among the equivalences that are unsafe in this sense are complete-trace equivalence and bisimulation. We present a formalism in which one can specify valid schedulers and, consequently, safe versions of these equivalences. We show that our version of bisimulation is still a congruence. Finally, we show that our safe equivalences can be used to establish information-protection properties.
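A small, self-contained illustration of the comparison sketched above, using binary randomized response as the mechanism; this is a textbook example, not the thesis's formalism. The channel matrix yields both the differential-privacy parameter ε and the mutual information between secret and observation, which stays well below the one bit of the secret.

```python
import numpy as np

def randomized_response_channel(p: float) -> np.ndarray:
    """Channel matrix C[s, o] = P(observe o | secret s) for binary
    randomized response: answer truthfully with probability p."""
    return np.array([[p, 1 - p],
                     [1 - p, p]])

def epsilon_of(channel: np.ndarray) -> float:
    """Differential-privacy parameter: max log-ratio of the probabilities
    that two different secrets produce the same observation."""
    ratios = channel[None, :, :] / channel[:, None, :]
    return float(np.log(ratios.max()))

def mutual_information(channel: np.ndarray, prior: np.ndarray) -> float:
    """I(secret; observation) in bits, for a given prior on the secret."""
    joint = prior[:, None] * channel              # P(s, o)
    marginal_o = joint.sum(axis=0)                # P(o)
    outer = prior[:, None] * marginal_o[None, :]  # P(s) * P(o)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))

C = randomized_response_channel(p=0.75)
print(epsilon_of(C))                                   # ln(3) ~ 1.10
print(mutual_information(C, np.array([0.5, 0.5])))     # ~ 0.19 bits
```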
40

Towards secure computation for people

Issa, Rawane 23 June 2023 (has links)
My research investigates three questions: how do we customize protocols and implementations to account for the unique requirements of each setting and its target community, what steps are necessary to transition secure computation tools into practice, and how can we promote their adoption by users at large? In this dissertation I present several of my works that address these three questions, with a particular focus on one of them. First, my work on "Hecate: Abuse Reporting in Secure Messengers with Sealed Sender" designs a customized protocol to protect people from abuse and surveillance in online end-to-end encrypted messaging. Our key insight is to add pre-processing to asymmetric message franking, where the moderating entity can generate batches of tokens per user during off-peak hours that can later be deposited when reporting abuse. This thesis demonstrates that by carefully tailoring our cryptographic protocols to real-world use cases, we can achieve orders-of-magnitude improvements over prior work with minimal assumptions about the resources available to people. Second, my work on "Batched Differentially Private Information Retrieval" contributes a novel Private Information Retrieval (PIR) protocol called DP-PIR that is designed to provide high throughput at high query rates. It does so by pushing all public-key operations into an offline stage, batching queries from multiple clients via techniques similar to mixnets, and maintaining differential privacy guarantees over the access patterns of the database. Finally, I provide three case studies showing that we cannot hope to further the adoption of cryptographic tools in practice without collaborating with the very people we are trying to protect. I discuss a pilot deployment of secure multi-party computation (MPC) with the Department of Education, deployments of MPC for the Boston Women's Workforce Council and the Greater Boston Chamber of Commerce, and ongoing work on toolchain support for MPC via an automated resource estimation tool called Carousels.
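Deployments like the aggregate workforce statistics mentioned above typically rest on building blocks such as additive secret sharing, in which each contributor splits its value into random shares so that only the final sum is ever reconstructed. A minimal sketch of that generic building block; the dissertation's own protocols are more involved, and all names here are illustrative.

```python
import secrets

MODULUS = 2**61 - 1   # a large prime; shares live in Z_MODULUS

def share(value: int, num_parties: int):
    """Split `value` into additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

# Three contributors each share a salary; servers only ever add shares,
# so only the aggregate sum is revealed.
salaries = [52000, 61000, 47000]
all_shares = [share(s, num_parties=3) for s in salaries]
per_server_sums = [sum(col) % MODULUS for col in zip(*all_shares)]
print(reconstruct(per_server_sums))   # 160000
```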
