Global ETD Search

1	Anonymisation de documents RDF / Towards RDF Anonymization
Dongo Escalante, Irvin Franco Benito
20 December 2017
With the advance of the Semantic Web and the Open Linked Data initiatives, a large quantity of RDF data is available on the Internet. The goal is to make this data readable by both humans and machines, adopting special formats and connecting them through Internationalized Resource Identifiers (IRIs), which are abstractions of real-world resources. As more data is published and shared, more sensitive information is also exposed. Consequently, protecting the privacy of entities of interest (e.g., people, companies) is a real challenge, requiring special techniques to ensure privacy and adequate security for data available in an environment in which every user has unrestricted access to the information (the Web).
Three main aspects are considered to ensure entity protection: (i) preserving privacy, by identifying and treating the data that can compromise the privacy of the entities (e.g., identifiers, quasi-identifiers); (ii) identifying the utility of the public data for diverse applications (e.g., statistics, testing, research); and (iii) modeling the background knowledge that can be used by adversaries (e.g., the number of relationships, a specific relationship, information about a node). Anonymization is a privacy protection technique that has been successfully applied in practice to databases and graph structures. However, studies of anonymization in the context of RDF data are very limited. These studies are initial works on protecting individuals in RDF data: they show practical anonymization approaches for simple scenarios, such as the use of generalization and suppression operations based on hierarchies. For complex scenarios, where a diversity of data is present, the existing anonymization approaches do not ensure sufficient privacy.
Thus, in this context, we propose an anonymization framework that analyzes the neighbors according to the background knowledge and focuses on the privacy of entities represented as nodes in the RDF data. Our anonymization approach is able to provide better privacy, since it takes into account the l-diversity condition as well as the neighbors (nodes and edges) of the entities of interest. In addition, an automatic anonymization process is provided through anonymization operations associated with the datatypes.
Keywords: RDF, Semantic Web, Anonymization, Datatypes
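As a rough illustration of the hierarchy-based generalization and the l-diversity condition mentioned in the abstract above, the following Python sketch generalizes a quasi-identifier and then checks the simplest ("distinct") form of l-diversity. It is not the thesis's framework: the hierarchy, the toy records, and the names `generalize` and `is_l_diverse` are hypothetical, and flat dictionaries stand in for entity nodes and their properties in an RDF graph.

```python
from collections import defaultdict

# Hypothetical generalization hierarchy: each value maps to its parent.
HIERARCHY = {
    "Paris": "France", "Lyon": "France",
    "Lima": "Peru", "Cusco": "Peru",
}

def generalize(value, hierarchy=HIERARCHY):
    """Replace a value with its parent in the hierarchy (unchanged if it has none)."""
    return hierarchy.get(value, value)

def is_l_diverse(records, quasi_ids, sensitive, l):
    """Return True if every group of records sharing the same quasi-identifier
    values contains at least l distinct sensitive values."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        groups[key].add(rec[sensitive])
    return all(len(values) >= l for values in groups.values())

# Toy records standing in for entity nodes and their literal properties.
records = [
    {"city": "Paris", "disease": "flu"},
    {"city": "Lyon", "disease": "asthma"},
    {"city": "Lima", "disease": "flu"},
    {"city": "Cusco", "disease": "diabetes"},
]

# Generalize the quasi-identifier "city" one level up the hierarchy.
generalized = [{**r, "city": generalize(r["city"])} for r in records]

print(is_l_diverse(records, ["city"], "disease", 2))
print(is_l_diverse(generalized, ["city"], "disease", 2))
```

In this toy data every city is unique, so the raw records fail the check for l = 2, while generalizing cities to countries yields groups with two distinct sensitive values each. A graph-aware approach such as the one the abstract describes would also have to consider neighboring nodes and edges, which this sketch ignores.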
2	Efficient Methods for Arbitrary Data Redistribution
Bai, Sheng-Wen
21 July 2005
In many parallel programs, run-time data redistribution is usually required to enhance data locality and reduce remote memory access on distributed-memory multicomputers. In heterogeneous computing environments, irregular data redistribution can be used to adjust the data assignment. Since data redistribution is performed at run time, there is a performance trade-off between the efficiency of the new data distribution for a subsequent phase of an algorithm and the cost of redistributing the array among processors. Thus, efficient methods for performing data redistribution are of great importance for the development of distributed-memory compilers for data-parallel programming languages.
For regular data redistribution, two approaches are presented in this dissertation: an indexing approach and a packing/unpacking approach. In the indexing approach, we propose a generalized basic-cycle calculation (GBCC) technique to efficiently generate the communication sets for a redistribution from BLOCK-CYCLIC(s) over P processors to BLOCK-CYCLIC(t) over Q processors. In the packing/unpacking approach, we present a User-Defined Types (UDT) method to perform BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution using MPI user-defined datatypes. This method reduces the required memory buffers and avoids unnecessary data movement. For irregular data redistribution, an Essential Cycle Calculation (ECC) method is presented.
These methods were originally developed for one-dimensional arrays. However, multi-dimensional arrays can also be redistributed by applying these methods dimension by dimension, starting from the first (last) dimension if the array is stored in column-major (row-major) order.
Keywords: GBCC, ECC, Data Redistribution, MPI User-Defined Datatypes, Data Distribution
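To make the notion of communication sets concrete, here is a brute-force Python sketch for a BLOCK-CYCLIC(s) over P to BLOCK-CYCLIC(t) over Q redistribution of a one-dimensional array. It is not the GBCC technique from the dissertation; it simply visits every element, assuming the usual block-cyclic ownership rule in which global index i belongs to processor (i // block) mod nprocs. The function names and the example sizes are hypothetical.

```python
from collections import defaultdict

def owner(i, block, nprocs):
    """Processor owning global index i under BLOCK-CYCLIC(block) over nprocs processors."""
    return (i // block) % nprocs

def communication_sets(n, s, p, t, q):
    """Map each (source, destination) processor pair to the global indices
    that the source must send to the destination, for an n-element array."""
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, s, p), owner(i, t, q))].append(i)
    return dict(sets)

# Example: 12 elements, BLOCK-CYCLIC(2) over 2 processors
# redistributed to BLOCK-CYCLIC(3) over 2 processors.
sets = communication_sets(n=12, s=2, p=2, t=3, q=2)
for pair in sorted(sets):
    print(pair, sets[pair])
```

The (source, destination) pattern repeats every lcm(s*P, t*Q) elements (12 in this example), so the element-by-element scan does redundant work on larger arrays; avoiding that cost is the motivation for more efficient methods of generating communication sets.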
