1 |
Defending against inference attack in online social networksChen, Jiayi 19 July 2017 (has links)
The privacy issues in online social networks (OSNs) have been increasingly arousing the public awareness since it is possible for attackers to launch several kinds of attacks to obtain users' sensitive and private information by exploiting the massive data obtained from the networks. Even if users conceal their sensitive information, attackers can infer their secrets by studying the correlations between private and public information with background knowledge. To address these issues, the thesis focuses on the inference attack and its countermeasures.
First, we study how to launch the inference attack to profile OSN users via relationships and network characteristics. Due to both user privacy concerns and unformatted textual information, it is quite difficult to build a completely labeled social network directly. However, both social relations and network characteristics can help attribute inference to profile OSN users. We propose several attribute inference models based on these two factors and implement them with Naive Bayes, Decision Tree, and Logistic Regression. Also, to study network characteristics and evaluate the performance of our proposed models, we use a well-labeled Google employee social network extracted from Google+ for inferring the social roles of Google employees. The experiment results demonstrate that the proposed models are effective in social role inference with Dyadic Label Model performing the best.
Second, we model the general inference attack and formulate the privacy-preserving data sharing problem to defend against the attack. The optimization problem is to maximize the users' self-disclosure utility while preserving their privacy. We propose two privacy-preserving social network data sharing methods to counter the inference attack. One is the efficient privacy-preserving disclosure algorithm (EPPD) targeting the high utility, and the other is to convert the original problem into a multi-dimensional knapsack problem (d-KP) which can be solved with a low computational complexity. We use real-world social network datasets to evaluate the performance. From the results, the proposed methods achieve a better performance when compared with the existing ones.
Finally, we design a privacy protection authorization framework based on the OAuth 2.0 protocol. Many third-party services and applications have integrated the login services of popular social networking sites, such as Facebook and Google+, and acquired user information to enrich their services by requesting user's permission. However, due to the inference attack, it is still possible to infer users' secrets. Therefore, we embed our privacy-preserving data sharing algorithms in the implementation of OAuth 2.0 framework and propose RANPriv-OAuth2 to protect users' privacy from the inference attack. / Graduate
|
2 |
MEMBERSHIP INFERENCE ATTACKS AND DEFENSES IN CLASSIFICATION MODELSJiacheng Li (17775408) 12 January 2024 (has links)
<p dir="ltr">Neural network-based machine learning models are now prevalent in our daily lives, from voice assistants~\cite{lopez2018alexa}, to image generation~\cite{ramesh2021zero} and chatbots (e.g., ChatGPT-4~\cite{openai2023gpt4}). These large neural networks are powerful but also raise serious security and privacy concerns, such as whether personal data used to train these models are leaked by these models. One way to understand and address this privacy concern is to study membership inference (MI) attacks and defenses~\cite{shokri2017membership,nasr2019comprehensive}. In MI attacks, an adversary seeks to infer if a given instance was part of the training data. We study the membership inference (MI) attack against classifiers, where the attacker's goal is to determine whether a data instance was used for training the classifier. Through systematic cataloging of existing MI attacks and extensive experimental evaluations of them, we find that a model's vulnerability to MI attacks is tightly related to the generalization gap---the difference between training accuracy and test accuracy. We then propose a defense against MI attacks that aims to close the gap by intentionally reduces the training accuracy. More specifically, the training process attempts to match the training and validation accuracies, by means of a new {\em set regularizer} using the Maximum Mean Discrepancy between the softmax output empirical distributions of the training and validation sets. Our experimental results show that combining this approach with another simple defense (mix-up training) significantly improves state-of-the-art defense against MI attacks, with minimal impact on testing accuracy. </p><p dir="ltr"><br></p><p dir="ltr">Furthermore, we considers the challenge of performing membership inference attacks in a federated learning setting ---for image classification--- where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). Passive attacks are one of the hardest-to-detect attacks, since they can be performed without modifying how the behavior of the central server or its clients, and assumes {\em no access to private data instances}. The key insight of our method is empirically observing that, near parameters that generalize well in test, the gradient of large overparameterized neural network models statistically behave like high-dimensional independent isotropic random vectors. Using this insight, we devise two attacks that are often little impacted by existing and proposed defenses. Finally, we validated the hypothesis that our attack depends on the overparametrization by showing that increasing the level of overparametrization (without changing the neural network architecture) positively correlates with our attack effectiveness.</p><p dir="ltr">Finally, we observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoids overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy. </p><p dir="ltr"><br></p>
|
3 |
Incorporating Obfuscation Techniques in Privacy Preserving Database-Driven Dynamic Spectrum Access SystemsZabransky, Douglas Milton 11 September 2018 (has links)
Modern innovation is a driving force behind increased spectrum crowding. Several studies performed by the National Telecommunications and Information Administration (NTIA), Federal Communications Commission (FCC), and other groups have proposed Dynamic Spectrum Access (DSA) as a promising solution to alleviate spectrum crowding. The spectrum assignment decisions in DSA will be made by a centralized entity referred to as as spectrum access system (SAS); however, maintaining spectrum utilization information in SAS presents privacy risks, as sensitive Incumbent User (IU) operation parameters are required to be stored by SAS in order to perform spectrum assignments properly. These sensitive operation parameters may potentially be compromised if SAS is the target of a cyber attack or an inference attack executed by a secondary user (SU).
In this thesis, we explore the operational security of IUs in SAS-based DSA systems and propose a novel privacy-preserving SAS-based DSA framework, Suspicion Zone SAS (SZ-SAS), the first such framework which protects against both the scenario of inference attacks in an area with sparsely distributed IUs and the scenario of untrusted or compromised SAS. We then define modifications to the SU inference attack algorithm, which demonstrate the necessity of applying obfuscation to SU query responses. Finally, we evaluate obfuscation schemes which are compatible with SZ-SAS, verifying the effectiveness of such schemes in preventing an SU inference attack. Our results show SZ-SAS is capable of utilizing compatible obfuscation schemes to prevent the SU inference attack, while operating using only homomorphically encrypted IU operation parameters. / Master of Science / Dynamic Spectrum Access (DSA) allows users to opportunistically access spectrum resources which were previously reserved for use by specified parties. This spectrum sharing protocol has been identified as a potential solution to the issue of spectrum crowding. This sharing will be accomplished through the use of a centralized server, known as a spectrum access system (SAS). However, current SAS-based DSA proposals require users to submit information such as location and transmission properties to SAS. The privacy of these users is of the utmost importance, as many existing users in these spectrum bands are military radars and other users for which operational security is pivotal. Storing the information for these users in a central database can be an major privacy issue, as this information could be leaked if SAS is compromised by a malicious party. Additionally, malicious secondary users (SUs) may perform an inference attack, which could also reveal the location of these military radars. In this thesis, we demonstrate a SAS-framework, SZ-SAS, which allows SAS to function without direct knowledge of user information. We also propose techniques for mitigating the inference attack which are compatible with SZ-SAS
|
4 |
Inference attacks on geolocated data / Attaques d'inférence sur des bases de données géolocaliséesNuñez del Prado Cortez, Miguel 12 December 2013 (has links)
Au cours des dernières années, nous avons observé le développement de dispositifs connectéset nomades tels que les téléphones mobiles, tablettes ou même les ordinateurs portablespermettant aux gens d’utiliser dans leur quotidien des services géolocalisés qui sont personnalisésd’après leur position. Néanmoins, les services géolocalisés présentent des risques enterme de vie privée qui ne sont pas forcément perçus par les utilisateurs. Dans cette thèse,nous nous intéressons à comprendre les risques en terme de vie privée liés à la disséminationet collection de données de localisation. Dans ce but, les attaques par inférence que nousavons développé sont l’extraction des points d’intérêts, la prédiction de la prochaine localisationainsi que la désanonymisation de traces de mobilité, grâce à un modèle de mobilité quenous avons appelé les chaînes de Markov de mobilité. Ensuite, nous avons établi un classementdes attaques d’inférence dans le contexte de la géolocalisation se basant sur les objectifsde l’adversaire. De plus, nous avons évalué l’impact de certaines mesures d’assainissement àprémunir l’efficacité de certaines attaques par inférence. En fin nous avons élaboré une plateformeappelé GEoPrivacy Enhanced TOolkit (GEPETO) qui permet de tester les attaques parinférences développées. / In recent years, we have observed the development of connected and nomad devices suchas smartphones, tablets or even laptops allowing individuals to use location-based services(LBSs), which personalize the service they offer according to the positions of users, on a dailybasis. Nonetheless, LBSs raise serious privacy issues, which are often not perceived by the endusers. In this thesis, we are interested in the understanding of the privacy risks related to thedissemination and collection of location data. To address this issue, we developed inferenceattacks such as the extraction of points of interest (POI) and their semantics, the predictionof the next location as well as the de-anonymization of mobility traces, based on a mobilitymodel that we have coined as mobility Markov chain. Afterwards, we proposed a classificationof inference attacks in the context of location data based on the objectives of the adversary.In addition, we evaluated the effectiveness of some sanitization measures in limiting the efficiencyof inference attacks. Finally, we have developed a generic platform called GEPETO (forGEoPrivacy Enhancing Toolkit) that can be used to test the developed inference attacks
|
5 |
Privacy-preserving spectrum sharing / Un partage de spectre préservant la confidentialitéBen-Mosbah, Azza 24 May 2017 (has links)
Les bandes des fréquences, telles qu'elles sont aménagées aujourd'hui, sont statiquement allouées. Afin d'améliorer la productivité et l'efficacité de l'utilisation du spectre, une nouvelle approche a été proposée : le "partage dynamique du spectre". Les régulateurs, les industriels et les scientifiques ont examiné le partage des bandes fédérales entre les détenteurs de licences (utilisateurs primaires) et les nouveaux entrants (utilisateurs secondaires). La nature d'un tel partage peut faciliter les attaques d'inférence et mettre en péril les paramètres opérationnels des utilisateurs primaires. Par conséquent, le but de cette thèse est d'améliorer la confidentialité des utilisateurs primaires tout en permettant un accès secondaire au spectre. Premièrement, nous présentons une brève description des règles de partage et des exigences en termes de confidentialité dans les bandes fédérales. Nous étudions également les techniques de conservation de confidentialité (obscurcissement) proposées dans les domaines d'exploration et d'édition de données pour contrecarrer les attaques d'inférence. Ensuite, nous proposons et mettons en œuvre notre approche pour protéger la fréquence et la localisation opérationnelles contre les attaques d'inférence. La première partie étudie la protection de la fréquence opérationnelle en utilisant un obscurcissement inhérent et explicite pour préserver la confidentialité. La deuxième partie traite la protection de la localisation opérationnelle en utilisant la confiance comme principale contre-mesure pour identifier et atténuer un risque d'inférence. Enfin, nous présentons un cadre axé sur les risques qui résume notre travail et s'adapte à d'autres approches de protection de la confidentialité. Ce travail est soutenu par des modèles, des simulations et des résultats qui focalisent sur l'importance de quantifier les techniques de préservation de la confidentialité et d'analyser le compromis entre la protection de la confidentialité et l'efficacité du partage du spectre / Radio frequencies, as currently allocated, are statically managed. Spectrum sharing between commercial users and incumbent users in the Federal bands has been considered by regulators, industry, and academia as a great way to enhance productivity and effectiveness in spectrum use. However, allowing secondary users to share frequency bands with sensitive government incumbent users creates new privacy threats in the form of inference attacks. Therefore, the aim of this thesis is to enhance the privacy of the incumbent while allowing secondary access to the spectrum. First, we present a brief description of different sharing regulations and privacy requirements in Federal bands. We also survey the privacy-preserving techniques (i.e., obfuscation) proposed in data mining and publishing to thwart inference attacks. Next, we propose and implement our approach to protect the operational frequency and location of the incumbent operations from inferences. We follow with research on frequency protection using inherent and explicit obfuscation to preserve the incumbent's privacy. Then, we address location protection using trust as the main countermeasure to identify and mitigate an inference risk. Finally, we present a risk-based framework that integrates our work and accommodates other privacy-preserving approaches. This work is supported with models, simulations and results that showcase our work and quantify the importance of evaluating privacy-preserving techniques and analyzing the trade-off between privacy protection and spectrum efficiency
|
6 |
Preventing Health Data from Leaking in a Machine Learning System : Implementing code analysis with LLM and model privacy evaluation testing / Förhindra att Hälsodata Läcker ut i ett Maskininlärnings System : Implementering av kod analys med stor språk-modell och modell integritets testningJanryd, Balder, Johansson, Tim January 2024 (has links)
Sensitive data leaking from a system can have tremendous negative consequences, such as discrimination, social stigma, and fraudulent economic consequences for those whose data has been leaked. Therefore, it’s of utmost importance that sensitive data is not leaked from a system. This thesis investigated different methods to prevent sensitive patient data from leaking in a machine learning system. Various methods have been investigated and evaluated based on previous research; the methods used in this thesis are a large language model (LLM) for code analysis and a membership inference attack on models to test their privacy level. The LLM code analysis results show that the Llama 3 (an LLM) model had an accuracy of 90% in identifying malicious code that attempts to steal sensitive patient data. The model analysis can evaluate and determine membership inference of sensitive patient data used for training in machine learning models, which is essential for determining data leakage a machine learning model can pose in machine learning systems. Further studies in increasing the deterministic and formatting of the LLM‘s responses must be investigated to ensure the robustness of the security system that utilizes LLMs before it can be deployed in a production environment. Further studies of the model analysis can apply a wider variety of evaluations, such as increased size of machine learning model types and increased range of attack testing types of machine learning models, which can be implemented into machine learning systems. / Känsliga data som läcker från ett system kan ha enorma negativa konsekvenser, såsom diskriminering, social stigmatisering och negativa ekonomiska konsekvenser för dem vars data har läckt ut. Därför är det av yttersta vikt att känsliga data inte läcker från ett system. Denna avhandling undersökte olika metoder för att förhindra att känsliga patientdata läcker ut ur ett maskininlärningssystem. Olika metoder har undersökts och utvärderats baserat på tidigare forskning; metoderna som användes i denna avhandling är en stor språkmodell (LLM) för kodanalys och en medlemskapsinfiltrationsattack på maskininlärnings (ML) modeller för att testa modellernas integritetsnivå. Kodanalysresultaten från LLM visar att modellen Llama 3 hade en noggrannhet på 90% i att identifiera skadlig kod som försöker stjäla känsliga patientdata. Modellanalysen kan utvärdera och bestämma medlemskap av känsliga patientdata som används för träning i maskininlärningsmodeller, vilket är avgörande för att bestämma den dataläckage som en maskininlärningsmodell kan exponera. Ytterligare studier för att öka determinismen och formateringen av LLM:s svar måste undersökas för att säkerställa robustheten i säkerhetssystemet som använder LLM:er innan det kan driftsättas i en produktionsmiljö. Vidare studier av modellanalysen kan tillämpa ytterligare bredd av utvärderingar, såsom ökad storlek på maskininlärningsmodelltyper och ökat utbud av attacktesttyper av maskininlärningsmodeller som kan implementeras i maskininlärningssystem.
|
Page generated in 0.0749 seconds