1 |
Conception de mécanismes d'accréditations anonymes et d'anonymisation de données / Design of anonymous credentials systems and data anonymization techniques
Brunet, Solenn 27 November 2017
The emergence of personal mobile devices with communication and positioning features is leading to new use cases and personalized services. However, these services involve the collection of large amounts of personal data and therefore require appropriate security solutions. Users are not always aware of the personal and sensitive information that can be inferred from their usage. The main objective of this thesis is to show how cryptographic mechanisms and data anonymization techniques can reconcile privacy, security requirements and the utility of the service provided. In the first part, we study keyed-verification anonymous credentials, which guarantee the anonymity of users with respect to a given service provider: a user proves that she is entitled to access its services without revealing any additional information. We introduce new primitives of this kind that offer different properties and are of independent interest. We use these constructions to design three privacy-preserving systems: a keyed-verification anonymous credentials system, a coercion-resistant electronic voting scheme and an electronic payment system. Each of these solutions is proven secure and efficient enough for practical use. In particular, two of these contributions have been implemented on SIM cards. Nevertheless, some kinds of services still require using or storing personal data, either to provide the service or to comply with a legal obligation. In the second part, we study how to preserve users' privacy in the data generated by such services. To this end, we propose an anonymization process for stored mobility traces based on differential privacy. It produces anonymous databases while limiting the added noise. Such databases can then be exploited for scientific, economic or societal purposes, for instance.
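To make the idea of differential privacy with limited added noise concrete, here is a minimal sketch of the classic Laplace mechanism applied to a histogram of visits per location cell. It is not the anonymization process proposed in the thesis; the function names, the cell identifiers and the assumption that each user contributes exactly one visit (so the histogram has sensitivity 1) are all illustrative.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cell_counts(visits, cells, epsilon, rng):
    """Return an epsilon-differentially-private histogram of visits.

    visits  -- one cell id per user (assumption: one contribution per user,
               so adding or removing a user changes one count by 1)
    cells   -- the fixed, public list of cells; empty cells are released too
    epsilon -- privacy budget; smaller values mean more noise
    """
    counts = Counter(visits)
    scale = 1.0 / epsilon  # sensitivity 1 divided by epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in cells}

# Hypothetical data: three users, three cells.
released = noisy_cell_counts(["a", "a", "b"], ["a", "b", "c"],
                             epsilon=0.5, rng=random.Random(42))
```

The noise scale grows as epsilon shrinks, which is the utility/privacy trade-off the abstract refers to.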
|
2 |
Anonymization of directory-structured sensitive data / Anonymisering av katalogstrukturerad känslig data
Folkesson, Carl January 2019
Data anonymization is an important field within data privacy that seeks a good balance between utility and privacy in data. The field has become especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on the anonymization of directory-structured data, that is, data organized into a tree of directories. In the thesis, four of the most common models for anonymizing tabular data (k-anonymity, ℓ-diversity, t-closeness and differential privacy) are adapted to directory-structured data. The adaptation is done through three different approaches: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. In particular, the differential privacy model combined with the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. In contrast, the k-anonymity model with the SingleTable approach and the t-closeness model with the DirectoryWise approach give the lowest information loss, but also the lowest privacy. The differential privacy model and the RecursiveDirectoryWise approach were also shown to give the best protection against the chosen attacks. Finally, it was concluded that the differential privacy model with the RecursiveDirectoryWise approach is the most suitable combination for following the GDPR when anonymizing directory-structured data.
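As a reminder of what the tabular k-anonymity model measures, the following sketch computes k for a small table and shows how generalizing one attribute raises it. It illustrates the plain tabular model only, not the SingleTable, DirectoryWise or RecursiveDirectoryWise approaches of the thesis; the column names and rows are invented for the example.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share the
    same values on all quasi-identifier columns (0 for an empty table)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def generalize_age(row, width=10):
    """Replace an exact age by a range such as '20-29' (loses information)."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}

# Hypothetical table: age and zip are quasi-identifiers, diagnosis is sensitive.
rows = [
    {"age": 23, "zip": "581", "diagnosis": "flu"},
    {"age": 27, "zip": "581", "diagnosis": "asthma"},
    {"age": 31, "zip": "582", "diagnosis": "flu"},
    {"age": 38, "zip": "582", "diagnosis": "diabetes"},
]

k_before = k_anonymity(rows, ["age", "zip"])
k_after = k_anonymity([generalize_age(r) for r in rows], ["age", "zip"])
```

Here `k_before` is 1 (every row is unique) and `k_after` is 2, at the cost of coarser ages: a small instance of the utility/privacy trade-off reported in the abstract.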
|
3 |
Synthetic Graph Generation at Scale : A novel framework for generating large graphs using clustering, generative models and node embeddings / Storskalig generering av syntetiska grafer : En ny arkitektur för att tillverka stora grafer med hjälp av klustring, generativa modeller och nodinbäddningar
Hammarstedt, Johan January 2022
The field of generative graph models has grown in popularity in recent years because such models capture the underlying distribution of a network and can thus recreate it. From anonymizing sensitive information in social networks to augmenting data on rare brain diseases, the ability to generate synthetic data has applications in many domains. However, most current methods face the bottleneck of generating the entire adjacency matrix and are thus limited to graphs with fewer than tens of thousands of nodes. In contrast, large real-world graphs such as social networks or transaction graphs can extend significantly beyond these boundaries. Furthermore, the current scalable approaches are predominantly based on stochasticity and do not capture local structures and communities. In this paper, we propose Graphwave Edge-Linking CELL, or GELCELL, a novel three-step architecture for generating graphs at scale. First, instead of constructing the entire network at once, GELCELL partitions the data and generates each cluster separately, allowing for efficient and parallelizable training. Then, by encoding the nodes, it trains a classifier to predict the edges between the partitions and patch them together, creating a synthetic version of the original large graph. The method requires constraints on the maximum cluster size, which impose some limitations, but it is flexible and can incorporate domain knowledge about a specific graph through informed parameter settings. The results showed that GELCELL, given optimized parameters, can produce graphs with reasonable accuracy on all data tested, with the largest having 400 000 nodes and 1 000 000 edges.
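The partition-then-link idea can be illustrated by the data split it relies on: once nodes are assigned to clusters, every edge is either internal to one cluster (to be modeled per cluster) or crosses two clusters (to be predicted by the linking classifier). The sketch below shows only that split. The clustering algorithm, the generative model and the classifier used by GELCELL are not reproduced here, and the function name and toy graph are assumptions.

```python
def split_edges(edges, cluster_of):
    """Split an edge list by a given node-to-cluster assignment.

    Returns (intra, inter): intra maps each cluster id to the edges that
    lie entirely inside it, inter lists the edges that cross clusters.
    """
    intra = {}
    inter = []
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra.setdefault(cu, []).append((u, v))
        else:
            inter.append((u, v))
    return intra, inter

# Hypothetical toy graph: two triangles' worth of nodes joined by one edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 3)]
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
intra, inter = split_edges(edges, cluster_of)
```

Each entry of `intra` can be handled independently, which is what makes per-cluster training parallelizable; `inter` is the part a link-prediction step would have to reconstruct.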
|
4 |
Kodanonymisering vid integration med ChatGPT : Säkrare ChatGPT-användning med en kodanonymiseringsapplikation / Code anonymization when integrating with ChatGPT : Safer ChatGPT usage with a code anonymization application
Azizi, Faruk January 2023
This thesis addresses code anonymization in software development, with a focus on protecting sensitive source code in an increasingly digitized and AI-integrated world. The main problems it addresses are the technical and security challenges that arise when source code needs to be protected while remaining accessible to AI-based analysis tools such as ChatGPT. The thesis presents the development of an application that anonymizes source code in order to protect sensitive information while enabling safe interaction with AI. To solve these challenges, the Roslyn API has been used in combination with customized identification algorithms to analyze and process C# source code, ensuring a balance between anonymization and preservation of the code's functionality. The Roslyn API is part of Microsoft's .NET compiler platform and provides rich code analysis and transformation capabilities, enabling C# source code to be transformed into a detailed syntax tree for inspecting and manipulating code structures. The results of the project show that the developed application successfully anonymizes variable, class, and method names while maintaining the logical structure of the source code. Its integration with ChatGPT enhances the user experience by providing interactive dialogues for analysis and assistance, making it a valuable resource for developers. Future work includes extending the application to support more programming languages and developing user-defined configurations to further improve ease of use and efficiency.
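The syntax-tree approach to renaming identifiers can be sketched in a few lines. The thesis works on C# through the Roslyn API; the example below swaps in Python's standard `ast` module on Python source purely as an analogy, and it is not the application's actual algorithm. The class name, the `id_N` placeholder scheme and the sample function are assumptions. The sketch deliberately handles only plain names, function and class names and parameters; attributes, imports and keyword arguments are left untouched, so it is only safe on self-contained snippets. `ast.unparse` requires Python 3.9 or later.

```python
import ast
import builtins

class IdentifierAnonymizer(ast.NodeTransformer):
    """Replace user-defined identifiers with neutral placeholders."""

    def __init__(self):
        self.mapping = {}  # original name -> placeholder

    def _alias(self, name):
        if hasattr(builtins, name):  # keep print, len, range, ...
            return name
        return self.mapping.setdefault(name, f"id_{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)  # rename parameters and body
        return node

    def visit_ClassDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

def anonymize(source):
    """Return (anonymized_source, mapping) for a self-contained snippet."""
    transformer = IdentifierAnonymizer()
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(tree), transformer.mapping

code, mapping = anonymize(
    "def compute_salary(base_pay, bonus):\n"
    "    total = base_pay + bonus\n"
    "    return total\n"
)
```

Because every occurrence of a name maps to the same placeholder, the logical structure of the code is preserved, and keeping `mapping` allows an answer from an external tool to be translated back to the original names.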
|
5 |
GARBLED COMPUTATION: HIDING SOFTWARE, DATA AND COMPUTED VALUES
Shoaib Amjad Khan (19199497) 27 July 2024
This thesis presents an in-depth study and evaluation of a class of secure multiparty protocols that enable execution of a confidential software program $\mathcal{P}$ owned by Alice on confidential data $\mathcal{D}$ owned by Bob, without revealing anything about $\mathcal{P}$ or $\mathcal{D}$ in the process. Our initial adversarial model is an honest-but-curious adversary, which we later extend to a malicious adversarial setting. Depending on the requirements, our protocols can be set up such that the output $\mathcal{P}(\mathcal{D})$ may only be learned by Alice, Bob, both, or neither (in which case an agreed-upon third party would learn it). Most of our protocols are run by only two online parties, which can be Alice and Bob or, alternatively, two commodity cloud servers (in which case neither Alice nor Bob participates in the protocols' execution: they merely initialize the two cloud servers, then go offline). We implemented and evaluated some of these protocols as prototypes that we made available to the open-source community via GitHub. We report experimental findings that compare the viability of our various approaches with one another and with existing ones. All our protocols achieve these goals without revealing anything other than upper bounds on the sizes of the program and data.
|
6 |
Porovnání přístupů ke generování umělých dat / Comparison of Approaches to Synthetic Data Generation
Šejvlová, Ludmila January 2017
This diploma thesis deals with synthetic data and selected approaches to generating them, together with a practical data generation task. The goal of the thesis is to describe the selected approaches, capture their key advantages and disadvantages, and compare them with each other. The practical part describes the generation of synthetic data for teaching knowledge discovery in databases. The thesis includes a basic description of synthetic data and thoroughly explains the process of generating them. The approaches selected for closer examination are random data generation, the statistical approach, data generation languages and the ReverseMiner tool. The thesis also describes the practical use of synthetic data and the suitability of each approach for particular purposes. As part of the thesis, the educational dataset Hotel SD was created using the ReverseMiner tool. The data contain relations discoverable with SD (set-difference) GUHA procedures.
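The difference between the first two approaches named in the abstract can be shown on a single numeric column. The sketch below is one possible reading, not the thesis's definitions: "random generation" draws values uniformly from a range with no reference to real data, while the "statistical approach" is interpreted here as fitting a normal distribution to a real sample and drawing from it. The function names and the sample of nightly room prices are invented.

```python
import random
import statistics

def random_generation(n, low, high, rng):
    """Draw n values uniformly from [low, high], ignoring any real data."""
    return [rng.uniform(low, high) for _ in range(n)]

def statistical_generation(sample, n, rng):
    """Fit a normal distribution to a real sample and draw n values from it."""
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical real sample: nightly room prices.
real_prices = [50, 60, 70, 80, 90]
rng = random.Random(7)
uniform_prices = random_generation(1000, 0, 200, rng)
fitted_prices = statistical_generation(real_prices, 1000, rng)
```

The fitted column reproduces the mean and spread of the real sample, whereas the uniform column only respects the chosen bounds; neither captures relations between columns, which is where pattern-driven tools become relevant.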
|