  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Graph anonymization through edge and vertex addition

Srivastava, Gautam 20 December 2011 (has links)
With an abundance of social network data being released, the need to protect sensitive information within these networks has become an important concern for data publishers. In this thesis we focus on the popular notion of k-anonymization as applied to social network graphs. Given such a network N, the problem we study is to transform N into N' such that some property P of each node in N' is attained by at least k-1 other nodes in N'. We study edge-labeled, vertex-labeled and unlabeled graphs, since instances of each occur in real-world social networks. Our main contributions are as follows. (1) When looking at edge additions, we show that k-label sequence anonymity of arbitrary edge-labeled graphs is NP-complete, and use this fact to prove hardness results for many other recently introduced notions of anonymity. We also present interesting hardness results and algorithms for labeled and unlabeled bipartite graphs. (2) When looking at node additions, we show that on vertex-labeled graphs the problem is NP-complete. For unlabeled graphs, we give an efficient (near-linear) algorithm and show that it gives solutions that are optimal modulo k, a guarantee that is novel in the literature. We examine anonymization both from its theoretical foundations and empirically, showing that our proposed algorithms maintain structural properties shown to be necessary for graph analysis.
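The degree-based flavor of this family of problems can be illustrated with a small check (an illustrative sketch, not the thesis's algorithm; the function name is mine): a graph is k-degree anonymous when every degree value in its degree sequence is shared by at least k nodes.

```python
from collections import Counter

def is_k_degree_anonymous(degrees, k):
    """True if every degree value in the sequence is shared by >= k nodes."""
    return all(count >= k for count in Counter(degrees).values())

# Star with 4 leaves: degrees [4, 1, 1, 1, 1] -- the hub's degree is unique
print(is_k_degree_anonymous([4, 1, 1, 1, 1], 2))  # False
print(is_k_degree_anonymous([4, 1, 1, 1, 1], 1))  # True
```

Edge- or vertex-addition anonymization then amounts to modifying the graph until a check like this passes, which is where the hardness results above apply.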
12

Lost In The Crowd: Are Large Social Graphs Inherently Indistinguishable?

Vadamalai, Subramanian Viswanathan 19 June 2017 (has links)
Real social graph datasets are fundamental to understanding a variety of phenomena, such as epidemics, crowd management and political uprisings, yet releasing digital recordings of such datasets exposes the participants to privacy violations. A safer approach to making real social network topologies available is to anonymize them by modifying the graph structure enough to decouple a node's identity from its social ties, while preserving the graph's characteristics in aggregate. At scale, this approach comes with a significant challenge in computational complexity. This thesis questions the need to structurally anonymize very large graphs. Intuitively, the larger the graph, the easier it is for an individual to be "lost in the crowd". On the other hand, at scale new topological structures may emerge, and those can expose individual nodes in ways that smaller structures do not. To answer this question, this work introduces a set of metrics for measuring the indistinguishability of nodes in large-scale social networks independent of attack models, and shows that different graphs have different levels of inherent node indistinguishability. Moreover, we show that when varying the size of a graph, the inherent node indistinguishability increases with the size of the graph. In other words, the larger a graph of a given structure, the higher the indistinguishability of its nodes.
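One simple, attack-model-independent proxy for node indistinguishability (a hypothetical illustration, not the thesis's actual metric) is the average size of each node's anonymity set under degree equality:

```python
from collections import Counter

def mean_anonymity_set_size(degrees):
    """Average, over all nodes, of the number of nodes sharing that node's degree.

    A crude stand-in for structural indistinguishability: the larger the
    average anonymity set, the harder it is to single out a node by degree.
    """
    counts = Counter(degrees)
    return sum(counts[d] for d in degrees) / len(degrees)

# Two toy degree sequences: a small irregular graph vs. a larger regular one
print(mean_anonymity_set_size([1, 1, 2, 2, 2]))  # 2.6
print(mean_anonymity_set_size([3] * 10))         # 10.0
```

Under a metric of this shape, growing a graph while keeping its structure regular drives the average anonymity set up, matching the "lost in the crowd" intuition.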
13

The Effect of 5-anonymity on a classifier based on neural network that is applied to the adult dataset

Paulson, Jörgen January 2019 (has links)
Privacy issues related to data being made public have become relevant with the introduction of the GDPR. To limit problems related to data becoming public, whether intentionally or through an event such as a security breach, anonymization of datasets can be employed. In this report, the impact of applying 5-anonymity to the adult dataset on a neural-network classifier predicting whether people had an income exceeding $50,000 was investigated using precision, recall and accuracy. The classifier was trained on the non-anonymized data, the anonymized data, and the non-anonymized data with the attributes that were suppressed in the anonymized data removed. The result was that average accuracy dropped from 0.82 to 0.76 and precision from 0.58 to 0.50, while recall increased from 0.82 to 0.87. The average values and distributions suggest that the majority of the performance impact of anonymization in this case comes from the suppression of attributes.
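The three reported metrics all derive from a confusion matrix; a small helper (illustrative, with made-up counts rather than the report's actual confusion matrix) shows how precision, recall and accuracy relate:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts for illustration only
p, r, a = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(p, r, a)  # 0.8 0.8 0.8
```

The pattern observed above (recall up, precision down) is what one would expect if anonymization pushes the classifier toward predicting the positive class more liberally.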
14

Kodanonymisering vid integration med ChatGPT : Säkrare ChatGPT-användning med en kodanonymiseringsapplikation / Code anonymization when integrating with ChatGPT : Safer ChatGPT usage with a code anonymization application

Azizi, Faruk January 2023 (has links)
This thesis addresses the area of code anonymization in software development, with a focus on protecting sensitive source code in an increasingly digitized and AI-integrated world. The main problems the thesis addresses are the technical and security challenges that arise when source code needs to be protected while remaining accessible to AI-based analysis tools such as ChatGPT. The thesis presents the development of an application whose goal is to anonymize source code, protecting sensitive information while enabling safe interaction with AI. To solve these challenges, the Roslyn API has been used in combination with customized identification algorithms to analyze and process C# source code, ensuring a balance between anonymization and preservation of the code's functionality. The Roslyn API is part of Microsoft's .NET compiler platform and provides rich code analysis and transformation capabilities, enabling the transformation of C# source code into a detailed syntax tree for inspection and manipulation of code structures. The results of the project show that the developed application successfully anonymizes variable, class, and method names while maintaining the logical structure of the source code. Its integration with ChatGPT enhances the user experience by providing interactive dialogues for analysis and assistance, making it a valuable resource for developers. Future work includes extending the application to support more programming languages and developing user-customizable configurations to further improve ease of use and efficiency.
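The thesis operates on C# via Roslyn syntax trees; an analogous sketch for Python source, using the standard-library ast module, shows the same core idea of renaming identifiers while preserving code structure (the class design, naming scheme and example are mine, not the thesis's implementation):

```python
import ast

class Anonymizer(ast.NodeTransformer):
    """Rename function, argument and variable names to neutral placeholders,
    keeping the code's structure intact (a toy analogue of the Roslyn-based
    approach; the thesis operates on C# syntax trees, not Python)."""

    def __init__(self):
        self.mapping = {}  # original name -> placeholder, kept for de-anonymization

    def _alias(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"id_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)  # also rewrite arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

src = "def calculate_salary(base, bonus):\n    return base + bonus"
print(ast.unparse(Anonymizer().visit(ast.parse(src))))
# The output uses neutral names consistently, e.g. id_0(id_1, id_2)
```

Keeping the name-to-placeholder mapping allows answers coming back from the AI tool to be translated back to the original identifiers.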
15

A Cloud-based Surveillance and Performance Management Architecture for Community Healthcare

Eze, Benjamin 03 June 2019 (has links)
Governments and healthcare providers are under increasing pressure to streamline their processes to reduce operational costs while improving service delivery and quality of care. Systematic performance management of healthcare processes is important to ensure that quality of care goals are being met at all levels of the healthcare ecosystem. The challenge is that measuring these goals requires the aggregation and analysis of large amounts of data from various stakeholders in the healthcare industry. With the lack of interoperability between stakeholders in current healthcare compute and storage infrastructure, as well as the volume of data involved, our ability to measure quality of care across the healthcare system is limited. Cloud computing is an emerging technology that can help provide the needed interoperability and management of large volumes of data across the entire healthcare system. Cloud computing could be leveraged to integrate heterogeneous healthcare data silos if a regional health authority provided data hosting with appropriate patient identity management and privacy compliance. This thesis proposes a cloud-based architecture for surveillance and performance management of community healthcare. Our contributions address five critical roadblocks to interoperability in a cloud computing context: infrastructure for surveillance and performance management services, a common data model, a patient identity matching service, an anonymization service, and a privacy compliance model. Our results are validated through a pilot project, and two experimental case studies done in collaboration with a regional health authority for community care.
16

Enabling and supporting the debugging of software failures

Clause, James Alexander 21 March 2011 (has links)
This dissertation evaluates the following thesis statement: Program analysis techniques can enable and support the debugging of failures in widely-used applications by (1) capturing, replaying, and, as much as possible, anonymizing failing executions and (2) highlighting subsets of failure-inducing inputs that are likely to be helpful for debugging such failures. To investigate this thesis, I developed techniques for recording, minimizing, and replaying executions captured from users' machines, anonymizing execution recordings, and automatically identifying failure-relevant inputs. I then performed experiments to evaluate the techniques in realistic scenarios using real applications and real failures. The results of these experiments demonstrate that the techniques can reduce the cost and difficulty of debugging.
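One of the ingredients named above, minimizing a failing input while the failure still reproduces, can be sketched as a greedy chunk-removal reducer (an illustrative simplification in the spirit of delta debugging; the dissertation's minimization of recorded executions is considerably more sophisticated):

```python
def minimize_input(failing_input, triggers_failure):
    """Greedily shrink a failing input while the failure still reproduces.

    Simplified, illustrative sketch: repeatedly try dropping chunks of
    halving size, keeping any smaller input that still fails.
    """
    current = failing_input
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]  # drop one chunk
            if candidate and triggers_failure(candidate):
                current = candidate  # keep the smaller failing input
            else:
                i += chunk           # this chunk was needed; move on
        chunk //= 2
    return current

# Hypothetical oracle: the "failure" occurs whenever the input contains "BUG"
print(minimize_input("xxxxBUGyyyy", lambda s: "BUG" in s))  # BUG
```

In the dissertation's setting, the oracle is "replaying the recorded execution still reproduces the failure", and the reduced input is what gets anonymized and shipped to developers.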
17

Conception de mécanismes d'accréditations anonymes et d'anonymisation de données / Design of anonymous credentials systems and data anonymization techniques

Brunet, Solenn 27 November 2017 (has links)
The emergence of personal mobile devices with communication and positioning features is leading to new use cases and personalized services. However, they imply a significant collection of personal data and therefore require appropriate security solutions. Indeed, users are not always aware of the personal and sensitive information that can be inferred from their use. The main objective of this thesis is to show how cryptographic mechanisms and data anonymization techniques can reconcile privacy, security requirements and the utility of the service provided. In the first part, we study keyed-verification anonymous credentials, which guarantee the anonymity of users with respect to a given service provider: a user proves that she is granted access to its services without revealing any additional information. We introduce new such primitives that offer different properties and are of independent interest. We use these constructions to design three privacy-preserving systems: a keyed-verification anonymous credentials system, a coercion-resistant electronic voting scheme and an electronic payment system. Each of these solutions is practical and proven secure; for two of these contributions, implementations on SIM cards have been carried out. Nevertheless, some kinds of services still require using or storing personal data, for compliance with a legal obligation or for the provision of the service. In the second part, we study how to preserve users' privacy in such services. To this end, we propose an anonymization process for mobility traces based on differential privacy. It allows us to provide anonymous databases while limiting the added noise. Such databases can then be exploited for scientific, economic or societal purposes, for instance.
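The Laplace mechanism underlying many differential-privacy schemes can be sketched for a single count query (an illustrative sketch; the thesis's mobility-trace procedure is more elaborate and focuses on limiting exactly this added noise):

```python
import math
import random

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise calibrated for epsilon-differential privacy.

    Illustrative only: noise scale = sensitivity / epsilon, so smaller epsilon
    (stronger privacy) means more noise and less utility.
    """
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of Laplace(0, scale) using only the stdlib
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(0)
print(dp_count(120, epsilon=1.0))  # close to 120; exact value depends on the seed
```

The noise is zero-mean, so aggregate statistics over many released counts remain close to the truth, which is what makes the anonymized databases usable for the studies mentioned above.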
18

A insciência do usuário na fase de coleta de dados: privacidade em foco / The lack of awareness of the user in the data collect phase: privacy in focus

Affonso, Elaine Parra [UNESP] 05 July 2018 (has links)
Data collection has become a predominant activity across digital media, in which computer networks, especially the Internet, are essential to this phase. To minimize the complexity involved in the use of applications and communication media, the relationship between user and technology has been supported by ever friendlier interfaces, which contributes to data collection often occurring imperceptibly. This leaves the user unaware of the collection performed by data holders, a situation that may harm the right to privacy of both the user and the individuals referenced. To provide users with awareness of data collection, digital environments publish privacy policies with information on this phase, seeking compliance with the laws and regulations that protect personal data, widely represented in the academic literature through models and techniques for anonymization. The lack of awareness of data collection can shape how concerned an individual is about threats to their privacy and what actions they should take to extend the protection of their data, a situation that can also be reinforced by the lack of action and research in several areas of knowledge. In view of the above, the objective of this thesis is to characterize the context that favors the user's lack of awareness as the target of data collection phases in digital environments, considering privacy implications. To this end, exploratory-descriptive research with a qualitative approach was adopted, using methodological triangulation based on a theoretical framework covering anonymization in the data collection phase, the legislation that protects personal data, and the data collection performed by technologies. Regarding research on data protection by anonymization, there is a shortage of work on the data collection phase: much research has concentrated on measures for sharing anonymized data, and when anonymization is performed at collection time the emphasis has been on location data. Legislation, when addressing elements involved in the collection phase, often presents these concepts in a generalized way, mainly with respect to consent to collection; even the mention of the collection activity itself emerges in most laws through the term processing. Most laws do not have a specific topic for data collection, a factor that can deepen the user's unawareness of the collection of their data. Technical terms such as anonymization, cookies and traffic data appear in the laws only sparsely and are often not specifically linked to the collection phase. Quasi-identifier data stand out in the data collected by digital environments, a scenario that can further extend the threats to privacy due to the possibility of correlating such data and thereby constructing profiles of individuals. The opacity promoted by abstraction in data collection by technological devices goes beyond the user's lack of awareness, causing incalculable threats to privacy and undoubtedly widening the informational asymmetry between data holders and users. It is concluded that the user's lack of awareness of their interaction with digital environments can reduce their autonomy to control their data and accentuate privacy breaches. However, privacy in data collection is strengthened when the user is aware of the actions linked to their data, which should be determined by privacy policies, laws and academic research: three elements highlighted in this work as participants in the scenario that produces the user's lack of awareness.
19

Ähnlichkeitsmessung von ausgewählten Datentypen in Datenbanksystemen zur Berechnung des Grades der Anonymisierung

Heinrich, Jan-Philipp, Neise, Carsten, Müller, Andreas 21 February 2018 (has links) (PDF)
A mathematical model for computing deviations between selected data types in relational database systems is introduced and tested. The model is based on similarity measures for different data types. We first examine the data types relevant to this work, then define an algebra for them, which forms the basis for computing the degree of anonymization θ. The model is intended to measure the degree of anonymization, especially of personal data, between test and production data. Such a measurement is useful in the context of the EU GDPR taking effect in May 2018, and is intended to help identify personal data with a high degree of similarity.
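A minimal version of such a similarity-based anonymization degree for string data might look as follows (an assumed, simplified definition of θ using normalized edit distance; the paper's algebra covers multiple data types and is not reproduced here):

```python
def levenshtein(a, b):
    """Classic edit distance, row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def anonymization_degree(originals, anonymized):
    """Assumed definition of theta: 0 = identical data, 1 = fully dissimilar."""
    sims = [1 - levenshtein(o, a) / max(len(o), len(a), 1)
            for o, a in zip(originals, anonymized)]
    return 1 - sum(sims) / len(sims)

print(anonymization_degree(["alice", "bob"], ["alice", "bob"]))  # 0.0
print(anonymization_degree(["abc"], ["xyz"]))                    # 1.0
```

A low θ between production and test data would flag personal data that was copied with too little change, which is the risk the model is meant to surface.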
20

Towards a Privacy Preserving Framework for Publishing Longitudinal Data

Sehatkar, Morvarid January 2014 (has links)
Recent advances in information technology have enabled public organizations and corporations to collect and store huge amounts of individuals' data in data repositories. Such data are powerful sources of information about an individual's life such as interests, activities, and finances. Corporations can employ data mining and knowledge discovery techniques to extract useful knowledge and interesting patterns from large repositories of individuals' data. The extracted knowledge can be exploited to improve strategic decision making, enhance business performance, and improve services. However, person-specific data often contain sensitive information about individuals and publishing such data poses potential privacy risks. To deal with these privacy issues, data must be anonymized so that no sensitive information about individuals can be disclosed from published data while distortion is minimized to ensure usefulness of data in practice. In this thesis, we address privacy concerns in publishing longitudinal data. A data set is longitudinal if it contains information of the same observation or event about individuals collected at several points in time. For instance, the data set of multiple visits of patients of a hospital over a period of time is longitudinal. Due to temporal correlations among the events of each record, potential background knowledge of adversaries about an individual in the context of longitudinal data has specific characteristics. None of the previous anonymization techniques can effectively protect longitudinal data against an adversary with such knowledge. In this thesis we identify the potential privacy threats on longitudinal data and propose a novel framework of anonymization algorithms in a way that protects individuals' privacy against both identity disclosure and attribute disclosure, and preserves data utility. 
Specifically, we propose two privacy models, (K,C)^P-privacy and (K,C)-privacy, and for each of these models we propose efficient algorithms for anonymizing longitudinal data. An extensive experimental study demonstrates that our proposed framework can effectively and efficiently anonymize longitudinal data.
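A k-anonymity check adapted to longitudinal records can treat each individual's whole sequence of per-visit quasi-identifiers as the unit to be grouped (an illustrative sketch with hypothetical record structure; the thesis's (K,C)-privacy models additionally bound attribute disclosure):

```python
from collections import Counter

def is_k_anonymous_longitudinal(records, k):
    """records: one tuple per individual, each a sequence of per-visit
    quasi-identifier tuples. The entire visit sequence must be shared
    by at least k individuals, since temporal correlations across visits
    can otherwise re-identify a record."""
    group_sizes = Counter(records)
    return all(size >= k for size in group_sizes.values())

visits = [
    (("M", "40s"), ("M", "40s")),  # patient 1: two visits
    (("M", "40s"), ("M", "40s")),  # patient 2: same sequence
    (("F", "30s"),),               # patient 3: unique sequence
]
print(is_k_anonymous_longitudinal(visits, 2))  # False: patient 3's sequence is unique
```

Grouping on whole sequences rather than individual visits is what distinguishes the longitudinal setting: a patient whose single visits are each common can still be unique in combination.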
