Global ETD Search

1	Privacy Preserving in Online Social Network Data Sharing and Publication Gao, Tianchong 12 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Following the trend of online data sharing and publishing, researchers raise their concerns about the privacy problem. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. To defenders, preserving the private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original one, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to the third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation draws on the very fundamental problem, the definition of utility and privacy. It draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. To attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers’ prior knowledge and published data to catch the targets. This dissertation focuses on the problems that the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of seed mapping method and the innovation of generating-based attack method. To conclude, this dissertation studies the online data privacy problem from both defenders’ and attackers’ point of view and introduces privacy and utility enhancement mechanisms in different novel angles. Privacy protection Online social networks Differential privacy Anonymization De-anonymization
2	Anonymizace PCAP souborů / Anonymization of PCAP Files Navrátil, Petr January 2020 (has links) This diploma thesis deals with the design and implementation of an application suitable for the anonymization of PCAP files. The thesis presents TCP/IP model and for each layer highlights attributes that can be used to identify real people or organizations. Some of the anonymization methods suitable to modify highlighted attributes and sensitive data are described. The implemented application uses TShark tool to parse byte data of PCAP format to JSON format that is used in the application. TShark supports lots of network protocols which allows the application to anonymize various attributes. Anonymization process is controlled by anonymization politics that can be customized by adding new attributes or anonymization methods.
3	Algorithms in Privacy & Security for Data Analytics and Machine Learning Liang, Yuting January 2020 (has links) Applications employing very large datasets are increasingly common in this age of Big Data. While these applications provide great benefits in various domains, their usage can be hampered by real-world privacy and security risks. In this work we propose algorithms which aim to provide privacy and security protection in different aspects of these applications. First, we address the problem of data privacy. When the datasets used contain personal information, they must be properly anonymized in order to protect the privacy of the subjects to which the records pertain. A popular privacy preservation technique is the k-anonymity model which guarantees that any record in the dataset must be indistinguishable from at least k-1 other records in terms of quasi-identifiers (i.e. the subset of attributes that can be used to deduce the identity of an individual). Achieving k-anonymity while considering the competing goal of data utility can be a challenge, especially for datasets containing large numbers of records. We formulate k-anonymization as an optimization problem with the objective to maximize data utility, and propose two practical algorithms for solving this problem. Secondly, we address the problem of application security; specifically, for predictive models using Deep Learning, where adversaries can use minimally perturbed inputs (a.k.a. adversarial examples) to cause a neural network to produce incorrect outputs. We propose an approach which protects against adversarial examples in image classification-type networks. The approach relies on two mechanisms: 1) a mechanism that increases robustness at the expense of accuracy; and, 2) a mechanism that improves accuracy. We show that an approach combining the two mechanisms can provide protection against adversarial examples while retaining accuracy. We provide experimental results to demonstrate the effectiveness of our algorithms for both problems. / Thesis / Master of Science (MSc) / Applications employing very large datasets are increasingly common in this age of Big Data. While these applications provide great benefits in various domains, their usage can be hampered by real-world privacy and security risks. In this work we propose algorithms which aim to provide privacy and security protection in different aspects of these applications. We address the problem of data privacy; when the datasets used contain personal information, they must be properly anonymized in order to protect the privacy of the subjects to which the records pertain. We propose two practical algorithms for anonymization which are also utility-centric. We address the problem of application security, specifically for Deep Learning applications where adversaries can use minimally perturbed inputs to cause a neural network to produce incorrect outputs. We propose an approach which protects against these attacks. We provide experimental results to demonstrate the effectiveness of our algorithms for both problems. Privacy Security Anonymization Machine Learning Algorithms
4	Evaluating the security of anonymized big graph/structural data Ji, Shouling 27 May 2016 (has links) We studied the security of anonymized big graph data. Our main contributions include: new De-Anonymization (DA) attacks, comprehensive anonymity, utility, and de-anonymizability quantifications, and a secure graph data publishing/sharing system SecGraph. New DA Attacks. We present two novel graph DA frameworks: cold start single-phase Optimization-based DA (ODA) and De-anonymizing Social-Attribute Graphs (De-SAG). Unlike existing seed-based DA attacks, ODA does not priori knowledge. In addition, ODA’s DA results can facilitate existing DA attacks by providing more seed information. De-SAG is the first attack that takes into account both graph structure and attribute information. Through extensive evaluations leveraging real world graph data, we validated the performance of both ODA and De-SAG. Graph Anonymity, Utility, and De-anonymizability Quantifications. We developed new techniques that enable comprehensive graph data anonymity, utility, and de-anonymizability evaluation. First, we proposed the first seed-free graph de-anonymizability quantification framework under a general data model which provides the theoretical foundation for seed-free SDA attacks. Second, we conducted the first seed-based quantification on the perfect and partial de-anonymizability of graph data. Our quantification closes the gap between seed-based DA practice and theory. Third, we conducted the first attribute-based anonymity analysis for Social-Attribute Graph (SAG) data. Our attribute-based anonymity analysis together with existing structure-based de-anonymizability quantifications provide data owners and researchers a more complete understanding of the privacy of graph data. Fourth, we conducted the first graph Anonymity-Utility-De-anonymity (AUD) correlation quantification and provided close-forms to explicitly demonstrate such correlation. Finally, based on our quantifications, we conducted large-scale evaluations leveraging 100+ real world graph datasets generated by various computer systems and services. Using the evaluations, we demonstrated the datasets’ anonymity, utility, and de-anonymizability, as well as the significance and validity of our quantifications. SecGraph. We designed, implemented, and evaluated the first uniform and open-source Secure Graph data publishing/sharing (SecGraph) system. SecGraph enables data owners and researchers to conduct accurate comparative studies of anonymization/DA techniques, and to comprehensively understand the resistance/vulnerability of existing or newly developed anonymization techniques, the effectiveness of existing or newly developed DA attacks, and graph and application utilities of anonymized data. Security and privacy Graph data Structural data De-anonymization Anonymization Quantification Evaluation Utility Anonymity
5	Privacy Preserving in Online Social Network Data Sharing and Publication Tianchong Gao (7428566) 17 October 2019 (has links) <p>Following the trend of online data sharing and publishing, researchers raise their concerns about the privacy problem. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. </p> <p><br></p> <p>To defenders, preserving the private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original one, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to the third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation draws on the very fundamental problem, the definition of utility and privacy. It draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. </p> <p><br></p> <p>To attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers’ prior knowledge and published data to catch the targets. This dissertation focuses on the problems that the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of seed mapping method and the innovation of generating-based attack method. To conclude, this dissertation studies the online data privacy problem from both defenders’ and attackers’ point of view and introduces privacy and utility enhancement mechanisms in different novel angles.</p> Computer Engineering privacy protection online social networks differential privacy anonymization de-anonymization
6	Anonymizing subsets of social networks Gaertner, Jared Glen 23 August 2012 (has links) In recent years, concerns of privacy have become more prominent for social networks. Anonymizing a graph meaningfully is a challenging problem, as the original graph properties must be preserved as well as possible. We introduce a generalization of the degree anonymization problem posed by Liu and Terzi. In this problem, our goal is to anonymize a given subset of vertices in a graph while adding the fewest possible number of edges. We examine different approaches to solving the problem, one of which finds a degree-constrained subgraph to determine which edges to add within the given subset and another that uses a greedy approach that is not optimal, but is more efficient in space and time. The main contribution of this thesis is an efficient algorithm for this problem by exploring its connection with the degree-constrained subgraph problem. Our experimental results show that our algorithms perform very well on many instances of social network data. / Graduate privacy social network degree-constrained subgraphs k-degree anonymization subset graph anonymization
7	Application of anonymization techniques (k-anonymity, generalization, and suppression) on an employee database: Use case – Swedish municipality Oyedele, Babatunde January 2023 (has links) This thesis explores data anonymization techniques within the context of a Swedish municipality with a focus on safeguarding data privacy, enhancement of decision-making, and assessing re-identification risks. The investigation, grounded in a literature review and an experimental study, employed the ARX anonymization tool on a sample municipality employee database. Three distinct human resource management (HRM) datasets, analogous to the employee database, were created and anonymized using the ARX tool to ascertain the efficacy and re-identification risks of the employed techniques. A key finding indicates an inverse relationship between dataset size and re-identification risk, enhancing data utility with larger datasets. This suggests that larger datasets are more conducive to anonymization, motivating organizations to engage in anonymization efforts for internal analytics and open data publishing. The study contributes to Information Security discourse, emphasizing the criticality of data anonymization in preserving privacy and ensuring data utility in the era of big data. The research faced constraints due to privacy considerations, necessitating the use of similar, rather than actual, datasets, potentially affecting the results and limiting full representation for future techniques. The thesis primarily addresses HRM applications, indicating the scope for future research into other municipal or organizational governance areas. In conclusion, it underscores the necessity of data anonymization in the face of tightening regulations and sophisticated privacy breaches. This positions the organization strategically for compliance, minimizes data breach risks and upholds anonymization as a fundamental principle of Information Security. anonymization anonymization techniques k-anonymity generalization suppression ARX tool Information Systems
8	Anonymisation de documents RDF / Towards RDF Anonymization Dongo Escalante, Irvin Franco Benito 20 December 2017 (has links) Avec l'avancée du Web Sémantique et des initiatives Open Linked Data, une grande quantité de documents RDF sont disponibles sur Internet. L'objectif est de rendre ces données lisibles pour les humains et les machines, en adoptant des formats spéciaux et en les connectant à l'aide des IRIs (International Resource Identifier), qui sont des abstractions de ressources réelles du monde. L’augmentation du nombre de données publiées et partagées augmente également le nombre d’informations sensibles diffusées. En conséquence, la confidentialité des entités d'intérêts (personnes, entreprises, etc.) est un véritable défi, nécessitant des techniques spéciales pour assurer la confidentialité et la sécurité adéquate des données disponibles dans un environnement où chaque utilisateur a accès à l'information sans aucune restriction (Web).Ensuite, trois aspects principaux sont considérés pour assurer la protection de l'entité: (i) Préserver la confidentialité, en identifiant les données qui peuvent compromettre la confidentialité des entités (par exemple, les identifiants, les quasi-identifiants); (ii) Identifier l'utilité des données publiques pour diverses applications (par exemple, statistiques, tests, recherche); et (iii) Les connaissances antérieures du modèle qui peuvent être utilisées par les pirates informatiques (par exemple, le nombre de relations, une relation spécifique, l'information d'un nœud).L'anonymisation est une technique de protection de la confidentialité qui a été appliquée avec succès dans les bases de données et les graphes. Cependant, les études sur l'anonymisation dans le contexte des documents RDF sont très limitées. Ces études sont les travaux initiaux de protection des individus sur des documents RDF, puisqu'ils montrent les approches pratiques d'anonymisation pour des scénarios simples comme l'utilisation d'opérations de généralisation et d'opérations de suppression basées sur des hiérarchies. Cependant, pour des scénarios complexes, où une diversité de données est présentée, les approches d'anonymisations existantes n'assurent pas une confidentialité suffisante.Ainsi, dans ce contexte, nous proposons une approche d'anonymisation, qui analyse les voisins en fonction des connaissances antérieures, centrée sur la confidentialité des entités représentées comme des nœuds dans les documents RDF. Notre approche de l'anonymisation est capable de fournir une meilleure confidentialité, car elle prend en compte la condition de la diversité de l'environnement ainsi que les voisins (nœuds et arêtes) des entités d'intérêts. En outre, un processus d'anonymisation automatique est assuré par l'utilisation d'opérations d'anonymisations associées aux types de données. / With the advance of the Semantic Web and the Open Linked Data initiatives, a huge quantity of RDF data is available on Internet. The goal is to make this data readable for humans and machines, adopting special formats and connecting them by using International Resource Identifiers (IRIs), which are abstractions of real resources of the world. As more data is published and shared, sensitive information is also provided. In consequence, the privacy of entities of interest (e.g., people, companies) is a real challenge, requiring special techniques to ensure privacy and adequate security over data available in an environment in which every user has access to the information without any restriction (Web). Then, three main aspects are considered to ensure entity protection: (i) Preserve privacy, by identifying and treating the data that can compromise the privacy of the entities (e.g., identifiers, quasi-identifiers); (ii) Identify utility of the public data for diverse applications (e.g., statistics, testing, research); and (iii) Model background knowledge that can be used for adversaries (e.g., number of relationships, a specific relationship, information of a node). Anonymization is one technique for privacy protection that has been successfully applied in practice for databases and graph structures. However, studies about anonymization in the context of RDF data, are really limited. These studies are initial works for protecting individuals on RDF data, since they show a practical anonymization approach for simple scenarios as the use of generalization and suppression operations based on hierarchies. However, for complex scenarios, where a diversity of data is presented, the existing anonymization approaches does not ensure an enough privacy. Thus, in this context, we propose an anonymization framework, which analyzes the neighbors according to the background knowledge, focused on the privacy of entities represented as nodes in the RDF data. Our anonymization approach is able to provide better privacy, since it takes into account the l-diversity condition as well as the neighbors (nodes and edges) of entities of interest. Also, an automatic anonymization process is provided by the use of anonymization operations associated to the datatypes. RDF Web Sémantique Anonymisation Datatypes RDF Semantic Web Anonymization Datatypes
9	A Hybrid Privacy-Preserving Mechanism for Participatory Sensing Systems Vergara, Idalides Jose 18 September 2014 (has links) Participatory Sensing (PS) is a new data collection paradigm in which people use their cellular phone resources to sense and transmit data of interest to address a collective problem that would have been very difficult to assess otherwise. Although many PS-based applications can be foreseen to solve interesting and useful problems, many of them have not been fully implemented due to privacy concerns. As a result, several privacy-preserving mechanisms have been proposed. This dissertation presents the state-of-the-art of privacy-preserving mechanisms for PS systems. It includes a new taxonomy and describes the most important issues in the design, implementation, and evaluation of privacy-preserving mechanisms. Then, the most important mechanisms available in the literature are described, classified and qualitatively evaluated based on design issues. The dissertation also presents a model to study the interactions between privacy-preserving, incentive and inference mechanisms and the effects that they may have on one another, and more importantly, on the quality of information that the system provides to the final user. Then, a new hybrid privacy-preserving mechanism is proposed. This algorithm dynamically divides the area of interest into cells of different sizes according to the variability of the variable of interest being measured and chooses between two privacy-preserving mechanisms depending on the size of the cell. In small cells, where participants can be identified easier, the algorithm uses a double-encryption technique to protect the privacy of the participants and increase the quality of the information. In bigger cells, where the variability of the variable of interest is low, data anonymization and obfuscation techniques are used to protect the actual location (privacy) of the participant. The proposed mechanism is evaluated along with other privacy-preserving mechanisms using a real PS system for air pollution monitoring. The systems are evaluated considering the quality of information provided to the final user, energy consumption, and the level of privacy protection. This last criterion is evaluated when the adversary is able to compromise one or several records in the system. The experiments show the superior performance of proposed mechanism and the existing trade-offs in terms of privacy, quality of information, and energy consumption. Anonymization Encryption Energy Consumption Obfuscation Quality of Information Computer Engineering
10	Spatio-Temporal Data Mining for Location-Based Services Gidofalvi, Gyözö January 2008 (has links) Largely driven by advances in communication and information technology, such as the increasing availability and accuracy of GPS technology and the miniaturization of wireless communication devices, Location–Based Services (LBS) are continuously gaining popularity. Innovative LBSes integrate knowledge about the users into the service. Such knowledge can be derived by analyzing the location data of users. Such data contain two unique dimensions, space and time, which need to be analyzed. The objectives of this thesis are three–fold. First, to extend popular data mining methods to the spatio–temporal domain. Second, to demonstrate the usefulness of the extended methods and the derived knowledge in two promising LBS examples. Finally, to eliminate privacy concerns in connection with spatio–temporal data mining by devising systems for privacy–preserving location data collection and mining. To this extent, Chapter 2 presents a general methodology, pivoting, to extend a popular data mining method, namely rule mining, to the spatio–temporal domain. By considering the characteristics of a number of real–world data sources, Chapter 2 also derives a taxonomy of spatio–temporal data, and demonstrates the usefulness of the rules that the extended spatio–temporal rule mining method can discover. In Chapter 4 the proposed spatio–temporal extension is applied to find long, sharable patterns in trajectories of moving objects. Empirical evaluations show that the extended method and its variants, using high–level SQL implementations, are effective tools for analyzing trajectories of moving objects. Real–world trajectory data about a large population of objects moving over extended periods within a limited geographical space is difficult to obtain. To aid the development in spatio–temporal data management and data mining, Chapter 3 develops a Spatio–Temporal ACTivity Simulator (ST–ACTS). ST–ACTS uses a number of real–world geo–statistical data sources and intuitive principles to effectively generate realistic spatio–temporal activities of mobile users. Chapter 5 proposes an LBS in the transportation domain, namely cab–sharing. To deliver an effective service, a unique spatio–temporal grouping algorithm is presented and implemented as a sequence of SQL statements. Chapter 6 identifies ascalability bottleneck in the grouping algorithm. To eliminate the bottleneck, the chapter expresses the grouping algorithm as a continuous stream query in a data stream management system, and then devises simple but effective spatio–temporal partitioning methods for streams to parallelize the computation. Experimental results show that parallelization through adaptive partitioning methods leads to speed–ups of orders of magnitude without significantly effecting the quality of the grouping. Spatio–temporal stream partitioning is expected to be an effective method to scale computation–intensive spatial queries and spatial analysis methods for streams. Location–Based Advertising (LBA), the delivery of relevant commercial information to mobile consumers, is considered to be one of the most promising business opportunities amongst LBSes. To this extent, Chapter 7 describes an LBA framework and an LBA database that can be used for the management of mobile ads. Using a simulated but realistic mobile consumer population and a set of mobile ads, the LBA database is used to estimate the capacity of the mobile advertising channel. The estimates show that the channel capacity is extremely large, which is evidence for a strong business case, but it also necessitates adequate user controls. When data about users is collected and analyzed, privacy naturally becomes a concern. To eliminate the concerns, Chapter 8 first presents a grid–based framework in which location data is anonymized through spatio–temporal generalization, and then proposes a system for collecting and mining anonymous location data. Experimental results show that the privacy–preserving data mining component discovers patterns that, while probabilistic, are accurate enough to be useful for many LBSes. To eliminate any uncertainty in the mining results, Chapter 9 proposes a system for collecting exact trajectories of moving objects in a privacy–preserving manner. In the proposed system there are no trusted components and anonymization is performed by the clients in a P2P network via data cloaking and data swapping. Realistic simulations show that under reasonable conditions and privacy/anonymity settings the proposed system is effective. / QC 20120215

Search results