Global ETD Search

1	Privacy preserving data anonymisation: an experimental examination of customer data for POPI compliance in South Africa Chetty, Nirvashnee January 2020 (has links) Data has become an essential commodity in this day and age. Organisations want to share the massive amounts of data that they collect as a way to leverage and grow their businesses. On the other hand, the need to maintain privacy is critical in order to avoid the release of sensitive information. This has been shown to be a constant challenge, namely the trade-off between preserving privacy and data utility [1]. This study performs an evaluation of privacy models together with their relevant tools and techniques to ascertain whether data can be anonymised in such a way that it can be in compliance with the Protection of Personal Information (POPI) Act and preserve the privacy of individuals. The results of this research should provide a practical solution for organisations in South Africa to adequately anonymise customer data to ensure POPI Act compliance with the use of a software tool. An experimental environment was setup with the ARX de-identification tool as the tool of choice to implement the privacy models. Two privacy models, namely k-anonymity and ldiversity, were tested on a publicly available data set. Data quality models as well as privacy risk measures were implemented. The results of the study showed that when taking both data utility and privacy risks into consideration, neither privacy model was the clear winner. The K-anonymity privacy model was a better choice for data utility, whereas the l-diversity privacy model was a better choice for privacy preservation by reducing re-identification risks. Therefore, in relation to the aim of the study which is to compare the results of data anonymisation to ensure that data privacy needs are met more than data utility, the result showed that the l-diversity privacy model was the preferred model. Finally, considering that the POPI Act is still awaiting the final step to be promulgated, there is time to conduct further experiments in the various ways to practically implement and apply data anonymisation techniques in the day-to-day processing of data and information in South Africa. Computer Science Data Anonymisation
2	Anonymisation de documents RDF / Towards RDF Anonymization Dongo Escalante, Irvin Franco Benito 20 December 2017 (has links) Avec l'avancée du Web Sémantique et des initiatives Open Linked Data, une grande quantité de documents RDF sont disponibles sur Internet. L'objectif est de rendre ces données lisibles pour les humains et les machines, en adoptant des formats spéciaux et en les connectant à l'aide des IRIs (International Resource Identifier), qui sont des abstractions de ressources réelles du monde. L’augmentation du nombre de données publiées et partagées augmente également le nombre d’informations sensibles diffusées. En conséquence, la confidentialité des entités d'intérêts (personnes, entreprises, etc.) est un véritable défi, nécessitant des techniques spéciales pour assurer la confidentialité et la sécurité adéquate des données disponibles dans un environnement où chaque utilisateur a accès à l'information sans aucune restriction (Web).Ensuite, trois aspects principaux sont considérés pour assurer la protection de l'entité: (i) Préserver la confidentialité, en identifiant les données qui peuvent compromettre la confidentialité des entités (par exemple, les identifiants, les quasi-identifiants); (ii) Identifier l'utilité des données publiques pour diverses applications (par exemple, statistiques, tests, recherche); et (iii) Les connaissances antérieures du modèle qui peuvent être utilisées par les pirates informatiques (par exemple, le nombre de relations, une relation spécifique, l'information d'un nœud).L'anonymisation est une technique de protection de la confidentialité qui a été appliquée avec succès dans les bases de données et les graphes. Cependant, les études sur l'anonymisation dans le contexte des documents RDF sont très limitées. Ces études sont les travaux initiaux de protection des individus sur des documents RDF, puisqu'ils montrent les approches pratiques d'anonymisation pour des scénarios simples comme l'utilisation d'opérations de généralisation et d'opérations de suppression basées sur des hiérarchies. Cependant, pour des scénarios complexes, où une diversité de données est présentée, les approches d'anonymisations existantes n'assurent pas une confidentialité suffisante.Ainsi, dans ce contexte, nous proposons une approche d'anonymisation, qui analyse les voisins en fonction des connaissances antérieures, centrée sur la confidentialité des entités représentées comme des nœuds dans les documents RDF. Notre approche de l'anonymisation est capable de fournir une meilleure confidentialité, car elle prend en compte la condition de la diversité de l'environnement ainsi que les voisins (nœuds et arêtes) des entités d'intérêts. En outre, un processus d'anonymisation automatique est assuré par l'utilisation d'opérations d'anonymisations associées aux types de données. / With the advance of the Semantic Web and the Open Linked Data initiatives, a huge quantity of RDF data is available on Internet. The goal is to make this data readable for humans and machines, adopting special formats and connecting them by using International Resource Identifiers (IRIs), which are abstractions of real resources of the world. As more data is published and shared, sensitive information is also provided. In consequence, the privacy of entities of interest (e.g., people, companies) is a real challenge, requiring special techniques to ensure privacy and adequate security over data available in an environment in which every user has access to the information without any restriction (Web). Then, three main aspects are considered to ensure entity protection: (i) Preserve privacy, by identifying and treating the data that can compromise the privacy of the entities (e.g., identifiers, quasi-identifiers); (ii) Identify utility of the public data for diverse applications (e.g., statistics, testing, research); and (iii) Model background knowledge that can be used for adversaries (e.g., number of relationships, a specific relationship, information of a node). Anonymization is one technique for privacy protection that has been successfully applied in practice for databases and graph structures. However, studies about anonymization in the context of RDF data, are really limited. These studies are initial works for protecting individuals on RDF data, since they show a practical anonymization approach for simple scenarios as the use of generalization and suppression operations based on hierarchies. However, for complex scenarios, where a diversity of data is presented, the existing anonymization approaches does not ensure an enough privacy. Thus, in this context, we propose an anonymization framework, which analyzes the neighbors according to the background knowledge, focused on the privacy of entities represented as nodes in the RDF data. Our anonymization approach is able to provide better privacy, since it takes into account the l-diversity condition as well as the neighbors (nodes and edges) of entities of interest. Also, an automatic anonymization process is provided by the use of anonymization operations associated to the datatypes. RDF Web Sémantique Anonymisation Datatypes RDF Semantic Web Anonymization Datatypes
3	Conception de mécanismes d'accréditations anonymes et d'anonymisation de données / Design of anonymous credentials systems and data anonymization techniques Brunet, Solenn 27 November 2017 (has links) L'émergence de terminaux mobiles personnels, capables à la fois de communiquer et de se positionner, entraîne de nouveaux usages et services personnalisés. Néanmoins, ils impliquent une collecte importante de données à caractère personnel et nécessitent des solutions adaptées en termes de sécurité. Les utilisateurs n'ont pas toujours conscience des informations personnelles et sensibles qui peuvent être déduites de leurs utilisations. L'objectif principal de cette thèse est de montrer comment des mécanismes cryptographiques et des techniques d'anonymisation de données peuvent permettre de concilier à la fois le respect de la vie privée, les exigences de sécurité et l'utilité du service fourni. Dans une première partie, nous étudions les accréditations anonymes avec vérification par clé. Elles permettent de garantir l'anonymat des utilisateurs vis-à-vis du fournisseur de service : un utilisateur prouve son droit d'accès, sans révéler d'information superflue. Nous introduisons des nouvelles primitives qui offrent des propriétés distinctes et ont un intérêt à elles-seules. Nous utilisons ces constructions pour concevoir trois systèmes respectueux de la vie privée : un premier système d'accréditations anonymes avec vérification par clé, un deuxième appliqué au vote électronique et un dernier pour le paiement électronique. Chaque solution est validée par des preuves de sécurité et offre une efficacité adaptée aux utilisations pratiques. En particulier, pour deux de ces contributions, des implémentations sur carte SIM ont été réalisées. Néanmoins, certains types de services nécessitent tout de même l'utilisation ou le stockage de données à caractère personnel, par nécessité de service ou encore par obligation légale. Dans une seconde partie, nous étudions comment rendre respectueuses de la vie privée les données liées à l'usage de ces services. Nous proposons un procédé d'anonymisation pour des données de mobilité stockées, basé sur la confidentialité différentielle. Il permet de fournir des bases de données anonymes, en limitant le bruit ajouté. De telles bases de données peuvent alors être exploitées à des fins d'études scientifiques, économiques ou sociétales, par exemple. / The emergence of personal mobile devices, with communication and positioning features, is leading to new use cases and personalized services. However, they imply a significant collection of personal data and therefore require appropriate security solutions. Indeed, users are not always aware of the personal and sensitive information that can be inferred from their use. The main objective of this thesis is to show how cryptographic mechanisms and data anonymization techniques can reconcile privacy, security requirements and utility of the service provided. In the first part, we study keyed-verification anonymous credentials which guarantee the anonymity of users with respect to a given service provider: a user proves that she is granted access to its services without revealing any additional information. We introduce new such primitives that offer different properties and are of independent interest. We use these constructions to design three privacy-preserving systems: a keyed-verification anonymous credentials system, a coercion-resistant electronic voting scheme and an electronic payment system. Each of these solutions is practical and proven secure. Indeed, for two of these contributions, implementations on SIM cards have been carried out. Nevertheless, some kinds of services still require using or storing personal data for compliance with a legal obligation or for the provision of the service. In the second part, we study how to preserve users' privacy in such services. To this end, we propose an anonymization process for mobility traces based on differential privacy. It allows us to provide anonymous databases by limiting the added noise. Such databases can then be exploited for scientific, economic or societal purposes, for instance. Cryptographie Vie privée Sécurité Anonymisation de données Cryptography Privacy Security Data anonymization
4	Apprentissage automatique de fonctions d'anonymisation pour les graphes et les graphes dynamiques / Automatic Learning of Anonymization for Graphs and Dynamic Graphs Maag, Maria Coralia Laura 08 April 2015 (has links) La confidentialité des données est un problème majeur qui doit être considéré avant de rendre publiques les données ou avant de les transmettre à des partenaires tiers avec comme but d'analyser ou de calculer des statistiques sur ces données. Leur confidentialité est principalement préservée en utilisant des techniques d'anonymisation. Dans ce contexte, un nombre important de techniques d'anonymisation a été proposé dans la littérature. Cependant, des méthodes génériques capables de s'adapter à des situations variées sont souhaitables. Nous adressons le problème de la confidentialité des données représentées sous forme de graphe, données qui nécessitent, pour différentes raisons, d'être rendues publiques. Nous considérons que l'anonymiseur n'a pas accès aux méthodes utilisées pour analyser les données. Une méthodologie générique est proposée basée sur des techniques d'apprentissage artificiel afin d'obtenir directement une fonction d'anonymisation et d'optimiser la balance entre le risque pour la confidentialité et la perte dans l'utilité des données. La méthodologie permet d'obtenir une bonne procédure d'anonymisation pour une large catégorie d'attaques et des caractéristiques à préserver dans un ensemble de données. La méthodologie est instanciée pour des graphes simples et des graphes dynamiques avec une composante temporelle. La méthodologie a été expérimentée avec succès sur des ensembles de données provenant de Twitter, Enron ou Amazon. Les résultats sont comparés avec des méthodes de référence et il est montré que la méthodologie proposée est générique et peut s'adapter automatiquement à différents contextes d'anonymisation. / Data privacy is a major problem that has to be considered before releasing datasets to the public or even to a partner company that would compute statistics or make a deep analysis of these data. Privacy is insured by performing data anonymization as required by legislation. In this context, many different anonymization techniques have been proposed in the literature. These techniques are difficult to use in a general context where attacks can be of different types, and where measures are not known to the anonymizer. Generic methods able to adapt to different situations become desirable. We are addressing the problem of privacy related to graph data which needs, for different reasons, to be publicly made available. This corresponds to the anonymized graph data publishing problem. We are placing from the perspective of an anonymizer not having access to the methods used to analyze the data. A generic methodology is proposed based on machine learning to obtain directly an anonymization function from a set of training data so as to optimize a tradeoff between privacy risk and utility loss. The method thus allows one to get a good anonymization procedure for any kind of attacks, and any characteristic in a given set. The methodology is instantiated for simple graphs and complex timestamped graphs. A tool has been developed implementing the method and has been experimented with success on real anonymized datasets coming from Twitter, Enron or Amazon. Results are compared with baseline and it is showed that the proposed method is generic and can automatically adapt itself to different anonymization contexts. Anonymisation Apprentissage artificiel Confidentialité Graphes dynamiques Utilité des données Graphes temporels Anonymization Privacy 004
5	An anonymizable entity finder in judicial decisions Kazemi, Farzaneh January 2008 (has links) Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal. Anonymisation Reconnaissance des entités nommées Désidentification Décisions de justice Entropie maximum Anonymization Named entity recognition De-identification Judicial decisions Maximum entropy
6	Applying Simulation to the Problem of Detecting Financial Fraud Lopez-Rojas, Edgar Alonso January 2016 (has links) This thesis introduces a financial simulation model covering two related financial domains: Mobile Payments and Retail Stores systems. The problem we address in these domains is different types of fraud. We limit ourselves to isolated cases of relatively straightforward fraud. However, in this thesis the ultimate aim is to introduce our approach towards the use of computer simulation for fraud detection and its applications in financial domains. Fraud is an important problem that impact the whole economy. Currently, there is a lack of public research into the detection of fraud. One important reason is the lack of transaction data which is often sensitive. To address this problem we present a mobile money Payment Simulator (PaySim) and Retail Store Simulator (RetSim), which allow us to generate synthetic transactional data that contains both: normal customer behaviour and fraudulent behaviour. These simulations are Multi Agent-Based Simulations (MABS) and were calibrated using real data from financial transactions. We developed agents that represent the clients and merchants in PaySim and customers and salesmen in RetSim. The normal behaviour was based on behaviour observed in data from the field, and is codified in the agents as rules of transactions and interaction between clients and merchants, or customers and salesmen. Some of these agents were intentionally designed to act fraudulently, based on observed patterns of real fraud. We introduced known signatures of fraud in our model and simulations to test and evaluate our fraud detection methods. The resulting behaviour of the agents generate a synthetic log of all transactions as a result of the simulation. This synthetic data can be used to further advance fraud detection research, without leaking sensitive information about the underlying data or breaking any non-disclose agreements. Using statistics and social network analysis (SNA) on real data we calibrated the relations between our agents and generate realistic synthetic data sets that were verified against the domain and validated statistically against the original source. We then used the simulation tools to model common fraud scenarios to ascertain exactly how effective are fraud techniques such as the simplest form of statistical threshold detection, which is perhaps the most common in use. The preliminary results show that threshold detection is effective enough at keeping fraud losses at a set level. This means that there seems to be little economic room for improved fraud detection techniques. We also implemented other applications for the simulator tools such as the set up of a triage model and the measure of cost of fraud. This showed to be an important help for managers that aim to prioritise the fraud detection and want to know how much they should invest in fraud to keep the loses below a desired limit according to different experimented and expected scenarios of fraud. security privacy anonymisation multi-agent-based simulation MABS ABS retail store fraud detection synthetic data mobile money Computer Systems Datorsystem
7	An anonymizable entity finder in judicial decisions Kazemi, Farzaneh January 2008 (has links) Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal Anonymisation Reconnaissance des entités nommées Désidentification Décisions de justice Entropie maximum Anonymization Named entity recognition De-identification Judicial decisions Maximum entropy
8	Méthode et outil d’anonymisation des données sensibles / Method and tool for anonymization sensitive data Ben Fredj, Feten 03 July 2017 (has links) L’anonymisation des données personnelles requiert l’utilisation d’algorithmes complexes permettant de minimiser le risque de ré-identification tout en préservant l’utilité des données. Dans cette thèse, nous décrivons une approche fondée sur les modèles qui guide le propriétaire des données dans son processus d’anonymisation. Le guidage peut être informatif ou suggestif. Il permet de choisir l’algorithme le plus pertinent en fonction des caractéristiques des données mais aussi de l’usage ultérieur des données anonymisées. Le guidage a aussi pour but de définir les bons paramètres à appliquer à l’algorithme retenu. Dans cette thèse, nous nous focalisons sur les algorithmes de généralisation de micro-données. Les connaissances liées à l’anonymisation tant théoriques qu’expérimentales sont stockées dans une ontologie. / Personal data anonymization requires complex algorithms aiming at avoiding disclosure risk without losing data utility. In this thesis, we describe a model-driven approach guiding the data owner during the anonymization process. The guidance may be informative or suggestive. It helps the data owner in choosing the most relevant algorithm given the data characteristics and the future usage of anonymized data. The guidance process also helps in defining the best input values for the algorithms. In this thesis, we focus on generalization algorithms for micro-data. The knowledge about anonymization is composed of both theoretical aspects and experimental results. It is managed thanks to an ontology. Protection de la vie privée Anonymisation Guidage Approche guidée par les modèles Ontologie Privacy Anonymization Guidelines Model-driven approach Ontology 005.8 302.231
9	La protection des données personnelles contenues dans les documents publics accessibles sur Internet : le cas des données judiciaires Duaso Calés, Rosario 12 1900 (has links) "Mémoire présenté à la faculté des études supérieures en vue de l'obtention du grade de maître en droit (LL.M.)" / Les bouleversements engendrés par les nouveaux moyens de communication des données publiques de même que les multiples possibilités offertes par le réseau Internet, telles que le stockage des informations, la mémoire sans faille et l'utilisation des moteurs de recherche, présentent des enjeux majeurs liés à la protection de la vie privée. La diffusion des données publiques en support numérique suscite un changement d'échelle dans le temps et dans l'espace et elle modifie le concept classique de publicité qui existait dans l'univers papier. Nous étudierons les moyens de respecter le droit à la vie privée et les conditions d'accès et d'utilisation des données personnelles, parfois à caractère sensible, contenues dans les documents publics diffusés sur Internet. Le cas particulier des données accessibles dans les banques de données judiciaires exige des solutions particulières : il s'agit de trouver l'équilibre nécessaire entre le principe de transparence judiciaire et le droit à la vie privée. / The upheavals generated by the new means of disseminating public data, together with the multiple possibilities offered by the Internet, such as information storage, comprehensive memory tools and the use of search engines, give rise to major issues related to privacy protection. The dissemination of public data in digital format causes a shift in our scales of time and space, and changes the traditional concept ofpublic nature previously associated with the "paper" universe. We will study the means of protecting privacy, and the conditions for accessing and using the personal information, sometimes of a "sensitive" nature, which is contained in the public documents posted on the Internet. The characteristics of the information available through judicial data banks require special protection solutions, so that the necessary balance can be found between the principle of judicial transparency and the right to privacy. Diffusion Banques de données jurisprudentielles Droit à l'oubli Anonymisation Vie privée Publicité de la justice Dissemination Personal information Internet Public documents Jurisprudential data banks Social forget fullness Anonymization Privacy Public nature of justice
10	Anonymisation de documents cliniques : performances et limites des méthodes symboliques et par apprentissage statistique Grouin, Cyril 26 June 2013 (has links) (PDF) Ce travail porte sur l'anonymisation automatique de comptes rendus cliniques. L'anonymisation consiste à masquer les informations personnelles présentes dans les documents tout en préservant les informations cliniques. Cette étape est obligatoire pour utiliser des documents cliniques en dehors du parcours de soins, qu'il s'agisse de publication de cas d'étude ou en recherche scientifique (mise au point d'outils informatiques de traitement du contenu des dossiers, recherche de cas similaire, etc.). Nous avons défini douze catégories d'informations à traiter : nominatives (noms, prénoms, etc.) et numériques (âges, dates, codes postaux, etc.). Deux approches ont été utilisées pour anonymiser les documents, l'une dite " symbolique ", à base de connaissances d'expert formalisées par des expressions régulières et la projection de lexiques, l'autre par apprentissage statistique au moyen de CRF de chaîne linéaire. Plusieurs expériences ont été menées parmi lesquelles l'utilisation simple ou enchaînée de chacune des deux approches. Nous obtenons nos meilleurs résultats (F-mesure globale=0,922) en enchaînant les deux méthodes avec rassemblement des noms et prénoms en une seule catégorie (pour cette catégorie : rappel=0,953 et F-mesure=0,931). Ce travail de thèse s'accompagne de la production de plusieurs ressources : un guide d'annotation, un corpus de référence de 562 documents dont 100 annotés en double avec adjudication et calculs de taux d'accord inter-annotateurs (K=0,807 avant fusion) et un corpus anonymisé de 17000 comptes rendus cliniques. Anonymisation comptes rendus médicaux guide d'annotation méthodes symboliques apprentissage statistique traitement automatique des langues

Search results