441 |
Varför beter jag mig såhär? : En studie om hur attityd och kunskap kring datainsamling kan påverka online-beteende [Why do I behave like this? : A study of how attitude and knowledge about data collection can affect online behavior]. Rezai, Farhad; Ånimmer, Pontus. January 2019.
During the 21st century, the use of digital services has exploded. Companies are getting better at collecting and using data for their own gain. This increased collection of data has contributed to growing privacy concerns among many. This paper examines how a person's attitude toward, and knowledge about, data collection can affect their online behavior. To examine this, a questionnaire survey was conducted. The study's results indicate a paradoxical behavior among the respondents: most have a negative view of data collection, yet do not take measures that reflect this attitude. Furthermore, the results suggest that knowledge about data collection acts as the primary motivator for taking measures to protect one's data. Keywords: data collection, online behaviour, privacy paradox, GDPR, attitude, knowledge, privacy concerns.
|
442 |
Nymbler: Privacy-enhanced Protection from Abuses of Anonymity. Henry, Ryan. January 2010.
Anonymous communications networks help to solve the real and important problem of enabling users to communicate privately over the Internet. However, by doing so, they also introduce an entirely new problem: How can service providers on the Internet---such as websites, IRC networks and mail servers---allow anonymous access while protecting themselves against abuse by misbehaving anonymous users?
Recent research efforts have focused on using anonymous blacklisting systems (also known as anonymous revocation systems) to solve this problem. As opposed to revocable anonymity systems, which enable some trusted third party to deanonymize users, anonymous blacklisting systems provide a way for users to authenticate anonymously with a service provider, while enabling the service provider to revoke access from individual misbehaving anonymous users without revealing their identities. The literature contains several anonymous blacklisting systems, many of which are impractical for real-world deployment. In 2006, however, Tsang et al. proposed Nymble, which solves the anonymous blacklisting problem very efficiently using trusted third parties. Nymble has inspired a number of subsequent anonymous blacklisting systems. Some of these use fundamentally different approaches to accomplish what Nymble does without using third parties at all; so far, these proposals have all suffered from serious performance and scalability problems. Other systems build on the Nymble framework to reduce Nymble's trust assumptions while maintaining its highly efficient design.
The primary contribution of this thesis is a new anonymous blacklisting system built on the Nymble framework---a nimbler version of Nymble---called Nymbler. We propose several enhancements to the Nymble framework that facilitate the construction of a scheme that minimizes trust in third parties. We then propose a new set of security and privacy properties that anonymous blacklisting systems should possess to protect: 1) users' privacy against malicious service providers and third parties (including other malicious users), and 2) service providers against abuse by malicious users. We also propose a set of performance requirements that anonymous blacklisting systems should meet to maximize their potential for real-world adoption, and formally define some optional features in the anonymous blacklisting systems literature.
We then present Nymbler, which improves on existing Nymble-like systems by reducing the level of trust placed in third parties, while simultaneously providing stronger privacy guarantees and some new functionality. It avoids dependence on trusted hardware and unreasonable assumptions about non-collusion between trusted third parties. We have implemented all key components of Nymbler, and our measurements indicate that the system is highly practical. Our system solves several open problems in the anonymous blacklisting systems literature, and makes use of some new cryptographic constructions that are likely to be of independent theoretical interest.
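To give a feel for the mechanics, the hash-chain idea underlying Nymble-style systems can be sketched in a few lines of Python. This is an illustrative toy, not the thesis's construction: the function names and the use of SHA-256 as the one-way functions f and g are assumptions made here. In period i a user presents nymble_i = g(f^i(seed)); a server that obtains the seed for the period of a complaint can recompute all later nymbles and block them, while the one-wayness of f keeps earlier activity unlinkable.

```python
import hashlib

def _h(tag: bytes, data: bytes) -> bytes:
    """Domain-separated SHA-256, standing in for the one-way functions f and g."""
    return hashlib.sha256(tag + data).digest()

def evolve_seed(seed: bytes) -> bytes:
    """f: advance the seed by one time period."""
    return _h(b"f", seed)

def nymble_from_seed(seed: bytes) -> bytes:
    """g: derive the unlinkable pseudonym (nymble) shown in one period."""
    return _h(b"g", seed)

def nymble_chain(seed0: bytes, periods: int) -> list[bytes]:
    """The pseudonyms a user would present in consecutive time periods."""
    nymbles, seed = [], seed0
    for _ in range(periods):
        nymbles.append(nymble_from_seed(seed))
        seed = evolve_seed(seed)
    return nymbles

user_nymbles = nymble_chain(b"user-initial-seed", periods=5)

# After a complaint about period 2, a server given that period's seed can
# recompute every later nymble and blacklist the user going forward...
seed_2 = evolve_seed(evolve_seed(b"user-initial-seed"))
blacklist = set(nymble_chain(seed_2, periods=3))
assert user_nymbles[2] in blacklist and user_nymbles[4] in blacklist
# ...but inverting f to link the user's earlier activity is infeasible.
assert user_nymbles[0] not in blacklist
```

The engineering difficulty Nymbler tackles is realizing this linking capability while minimizing what the third parties who manage seeds must be trusted with; none of that machinery appears in the toy above.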
|
444 |
Compliance issues within Europe's General Data Protection Regulation in the context of information security and privacy governance in Swedish corporations : A mixed methods study of compliance practices towards GDPR readiness. Stauber, Sebastian. January 2018.
The European Union has introduced a new General Data Protection Regulation that regulates all aspects of privacy and data protection for the data of European citizens. To transition to the new rules, companies and public institutions were given two years to adapt their systems and controls. Because of the broad scope of changes the GDPR requires, many companies face severe problems adapting the rules in time for enforcement. The purpose of this study is therefore to examine compliance practices in the implementation of GDPR requirements, including identifying compliance mechanisms that may remain insufficiently addressed when the regulation comes into force on May 25, 2018. The study is conducted in Sweden and investigates the situation in corporations, not in public institutions. Mixed methods were applied: Swedish GDPR experts and consultants were surveyed and interviewed to gain an understanding of their views, using capability maturity scales to assess a variety of security processes and controls. The analysis shows a low level of implementation of GDPR requirements, despite improvements over the past two years of transition. It points out that a holistic strategy towards compliance is mostly missing and that many companies face obstacles that are difficult to overcome in a short period. This may result in non-compliance in many Swedish corporations after the regulation comes into force on May 25.
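As a rough illustration of how capability-maturity ratings of the kind gathered in such a study can be aggregated into a readiness picture, consider the sketch below. The scale interpretation, the processes, the ratings and the threshold are all invented for illustration; the study's actual instrument is not reproduced here.

```python
import statistics

# Hypothetical expert ratings on a 0-5 capability maturity scale
# (0 = non-existent ... 5 = optimized) for GDPR-related processes.
ratings = {
    "data inventory and mapping":  [2, 3, 2, 2],
    "consent management":          [1, 2, 2, 1],
    "breach notification process": [3, 3, 4, 3],
    "privacy impact assessments":  [1, 1, 2, 2],
}

TARGET = 3.0  # assumed maturity level regarded as "ready for enforcement"

for process, scores in ratings.items():
    mean = statistics.mean(scores)
    flag = "ready" if mean >= TARGET else "gap"
    print(f"{process:30s} mean maturity {mean:.2f}  [{flag}]")
```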
|
445 |
Attitudes toward, and awareness of, online privacy and security: a quantitative comparison of East Africa and U.S. internet users. Ruhwanya, Zainab Said. January 1900.
Master of Science / Computing and Information Sciences / Eugene Vasserman / The increasing penetration of Internet technology throughout the world is bringing an increasing volume of user information online, and developing countries such as those of East Africa are included as contributors and consumers of this voluminous information. While we have seen concerns from other parts of the world regarding user privacy and security, very little is known of East African Internet users' concern with their online information exposure. The aim of this study is to compare Internet user awareness and concerns regarding online privacy and security between East Africa (EA) and the United States (U.S.) and to determine any common attitudes and differences. The study followed a quantitative research approach, with the EA population sampled from the Open University of Tanzania, an open and distance-learning university in East Africa, and the U.S. population sampled from Kansas State University, a public university in the U.S. Online questionnaires were used as survey instruments. The results show no significant difference in awareness of online privacy between Internet users from East Africa and the U.S. There is, however, a significant difference in concerns about online privacy, which varies with the type of information shared. Moreover, the results show that U.S. Internet users are more aware of online privacy concerns, and more likely to have taken measures to protect their online privacy and conceal their online presence, than East African Internet users. This study also shows that East African Internet users are more likely to be victims of online identity theft, security issues and reputation damage.
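A comparison of this kind typically reduces to a two-sample significance test on ordinal questionnaire scores. Below is a minimal sketch with simulated data; the study's actual instrument and analysis may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 5-point Likert responses to a concern item such as
# "How concerned are you about sharing location data online?"
# (1 = not at all concerned, 5 = extremely concerned).
ea_concern = rng.integers(1, 6, size=120)   # simulated East African sample
us_concern = rng.integers(2, 6, size=150)   # simulated U.S. sample

# Likert responses are ordinal, so a rank-based test is a common choice
# for "is there a significant difference in concern between the groups?"
u_stat, p_value = stats.mannwhitneyu(ea_concern, us_concern,
                                     alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")
```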
|
446 |
Geração de rótulo de privacidade por palavras-chaves e casamento de padrões [Generating a privacy label through keywords and pattern matching]. Pontes, Diego Roberto Gonçalves de. 13 July 2016.
Users do not usually read privacy policies from online services. Among the main reasons for that is the fact that such policies are long and often hard to understand, which makes users lose interest in reading them carefully. In this scenario, users are prone to agree to the policy terms without knowing what kind of data is being collected and why. This dissertation discusses how a policy's content may be presented in a friendlier way, showing information about data collection and usage in a table herein called the Privacy Label. The Privacy Label is a table whose rows are named after data-collection terms and whose columns are named after expressions that reveal how the data is used by the service. Each cell shows whether the policy collects a particular kind of data for a particular use. To generate the Privacy Label, a set of privacy policies was studied to identify which terms repeat most often across the texts. To do so, we used keyword-extraction techniques, and from these keywords we created privacy categories. The categories define which kinds of data are being collected and why, and they are represented by the cells of the Privacy Label. Using word-comparison techniques, a privacy policy can be analyzed and important information extracted by comparing its terms with the terms of the privacy categories; each category found is shown in the Privacy Label. To assess the proposed approach we developed an application prototype, herein called PPMark, that analyzes a particular privacy policy, extracts its keywords and generates the Privacy Label automatically. The quality of the extracted information was analyzed using three metrics: precision, recall and f-measure. The results show that the approach is a viable, functional alternative for generating the Privacy Label and presenting privacy policies in a friendly manner. There is evidence of time savings from using our approach, which facilitates decision making.
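The matching step can be pictured with a small sketch. Everything below is hypothetical: the category terms, the sentence-level co-occurrence rule and the helper names are one possible illustration of keyword matching, not PPMark's actual algorithm; the metric definitions, however, are the standard ones the dissertation names.

```python
import re

# Hypothetical privacy categories: data types (label rows) and purposes
# (label columns). PPMark derives its categories from keyword extraction
# over real policies; these tiny term sets are stand-ins.
DATA_TERMS = {"email": {"email"}, "location": {"location", "gps", "geolocation"}}
PURPOSE_TERMS = {"advertising": {"advertising", "ads", "marketing"},
                 "analytics": {"analytics", "statistics", "measurement"}}

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def privacy_label(policy: str) -> dict[tuple[str, str], bool]:
    """Mark a (data type, purpose) cell when terms from both categories
    co-occur in the same sentence of the policy."""
    label = {(d, p): False for d in DATA_TERMS for p in PURPOSE_TERMS}
    for sentence in re.split(r"[.!?]", policy):
        tokens = tokenize(sentence)
        for d, d_terms in DATA_TERMS.items():
            for p, p_terms in PURPOSE_TERMS.items():
                if tokens & d_terms and tokens & p_terms:
                    label[(d, p)] = True
    return label

def precision_recall_f1(predicted: set, actual: set) -> tuple[float, float, float]:
    """The three standard metrics the dissertation uses for evaluation."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

policy = ("We collect your email address and GPS location for marketing purposes. "
          "Aggregate statistics help us improve the service.")
predicted = {cell for cell, filled in privacy_label(policy).items() if filled}
actual = {("email", "advertising"), ("location", "advertising")}
print(predicted)
print(precision_recall_f1(predicted, actual))
```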
|
447 |
On digital forensic readiness for information privacy incidents. Reddy, Kamil. 26 September 2012.
The right to information privacy is considered a basic human right in countries that recognise the right to privacy. South Africa, and other countries that recognise this right, offer individuals legal protections for their information privacy. Individuals, organisations and even governments in these countries often have an obligation under such laws to protect information privacy. Large organisations, for example multinational companies and government departments, are of special concern when it comes to protecting information privacy, as they often hold substantial amounts of information about many individuals. The protection of information privacy has therefore become ever more significant as technological advances enable information privacy to be breached with increasing ease. There is, however, little research on holistic approaches to protecting information privacy in large organisations. Holistic approaches take account of both technical and non-technical factors that affect information privacy. Non-technical factors may include the management of information privacy protection measures and other factors such as manual business processes and organisational policies.

Amongst the protections that can be used by large organisations to protect information privacy is the ability to investigate incidents involving information privacy. Since large organisations typically make extensive use of information technology to store or process information, such investigations are likely to involve digital forensics. Digital forensic investigations require a certain amount of preparedness or readiness for investigations to be executed in an optimal fashion. The available literature on digital forensics and digital forensic readiness (DFR), unfortunately, does not specifically deal with the protection of information privacy, which has requirements over and above typical digital forensic investigations that are more concerned with information security breaches.

The aim of this thesis, therefore, is to address the lack of research into DFR with regard to information privacy incidents. It adopts a holistic approach to DFR, since many of the necessary measures are non-technical; there is thus an increased focus on management as opposed to specific technical issues. In addressing the lack of research into information privacy-specific DFR, the thesis provides large organisations with knowledge to better conduct digital forensic investigations into information privacy incidents. Hence, it allows for increased information privacy protection in large organisations, because investigations may reveal the causes of information privacy breaches, and such breaches may then be prevented in future. The ability to conduct effective investigations also has a deterrent effect that may dissuade attempts at breaching information privacy. This thesis addresses the lack of research into information privacy-specific DFR by presenting a framework that allows large organisations to develop a digital forensic readiness capability for information privacy incidents. The framework is an idealistic representation of measures that can be taken to develop such a capability. In reality, large organisations operate within cost constraints. We therefore also contribute by showing how a cost management methodology known as time-driven activity-based costing can be used to determine the cost of DFR measures. Organisations are then able to make cost versus risk decisions when deciding which measures in the framework they wish to implement.
Lastly, we introduce the concept of a digital forensics management system. The management of DFR in a large organisation can be a difficult task prone to error as it involves coordinating resources across multiple departments and organisational functions. The concept of the digital forensics management system proposed here allows management to better manage DFR by providing a central system from which information is available and control is possible. We develop an architecture for such a system and validate the architecture through a proof-of-concept prototype. / Thesis (PhD)--University of Pretoria, 2012. / Computer Science / unrestricted
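The time-driven activity-based costing approach mentioned above needs only two estimates per resource pool: the cost of capacity supplied and its practical capacity, from which a per-minute rate prices each DFR measure. Below is a hypothetical sketch of how the cost of DFR measures could be estimated; all figures and measure names are invented.

```python
def capacity_cost_rate(cost_of_capacity_supplied: float,
                       practical_capacity_minutes: float) -> float:
    """TDABC's first parameter: cost per minute of the resource pool."""
    return cost_of_capacity_supplied / practical_capacity_minutes

# A forensics/IT team costing 600,000 per quarter with 500,000 practical
# minutes of capacity (numbers invented for illustration).
rate = capacity_cost_rate(600_000.0, 500_000.0)

# TDABC's second parameter: the time each DFR measure consumes per quarter.
dfr_measures_minutes = {
    "centralised log collection and retention": 40_000,
    "forensic tool and licence maintenance":    15_000,
    "staff forensic-readiness training":        25_000,
}

for measure, minutes in dfr_measures_minutes.items():
    print(f"{measure:42s} {rate * minutes:>12,.0f} per quarter")

# Management can now weigh each measure's cost against the risk it mitigates.
```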
|
448 |
Nya Dataskyddsförordningens påverkan på en organisation : En fallstudie med fokus på privacy by design [The new Data Protection Regulation's impact on an organization : A case study focusing on privacy by design]. Rännare, Angelica. January 2017.
The purpose of this work is to study the General Data Protection Regulation (GDPR) and the challenges and impact this regulation can have on both organizations and systems. The focus of the work is on the specific requirement of privacy by design, which is one part of the GDPR. The GDPR comes into force on May 25, 2018. Since the GDPR is a new regulation, there has been little research on the subject yet, and the research that has taken place has mostly been in the field of law. This makes the subject highly relevant for further study, since this work will bring out new knowledge. The work investigates how the GDPR, through its requirements, affects an organization and how the specific requirement of privacy by design is taken into account. It also investigates which demands are placed on technology and functions. By doing this, knowledge emerges about whether and how an organization prepares, and what it takes to meet the requirements of the GDPR.

Privacy by design is a philosophy of how built-in privacy can be used to protect and embed personal integrity in systems. It is based on seven principles that are used to understand how integrity can be protected. But like all solutions, there are challenges. These are the challenges that the work investigates in order to give recommendations, based on the results, that can be used to get an overview of how far an organization has come with privacy by design, which is part of the GDPR. Based on the organization's responses, recommendations are given for how the organization could further improve its work.

The method used to support this work is a qualitative case study comprising interviews with people from an organization in the security industry that develops methods and software for information security work. The organization investigated is in the starting blocks of ensuring GDPR compliance and has conducted an initial analysis of the situation. The study is based on four interviews, on which a content analysis was performed. Through this analysis, a clear picture emerges of what the work with the coming challenges can look like, given the legal change, from the privacy by design perspective. As part of the study, a question guide and a summary of the principles relevant to privacy by design were developed. The conclusion was that the scrutinized organization does, for the most part, work with privacy by design, but still has challenges to face. The analysis and discussion of the interviews resulted in recommendations for the organization on how to further strengthen its information security work. Furthermore, a question guide, which can be found in the appendix, was developed and can be used by other organizations wishing to examine their progress in meeting the GDPR's privacy by design requirement.
|
449 |
The Privacy Paradox: Factors influencing information disclosure in the use of the Internet of Things (IoT) in South Africa. Davids, Natheer. 21 January 2021.
The Internet of Things (IoT) has been acknowledged as one of the most innovative forms of technology since the computer, because of the influence it can have on multiple sectors of physical and virtual environments. The growth of IoT is expected to continue; by 2020 the number of connected devices was estimated to reach 50 billion. Recent developments in IoT provide an unprecedented opportunity for personalised services and other benefits. To exploit these potential benefits as well as possible, individuals are willing to provide their personal information despite potential privacy breaches. This paper therefore examines factors that influence the willingness to disclose personal information in the use of IoT in South Africa (SA), with the privacy calculus as the theoretical underpinning of the research. The privacy calculus accentuates that a risk-benefit trade-off occurs when an individual decides to disclose their personal information; however, it is assumed that more factors than perceived risks and perceived benefits influence information disclosure. After analysing previous literature, this study identified the following factors as possible key tenets in relation to willingness to disclose personal information: information sensitivity, privacy concerns, social influence, perceived benefits, (perceived) privacy risks and privacy knowledge. This research took an objective ontological view, with a positivistic underlying epistemological stance. It incorporated a deductive approach, employing a conceptual model constructed from a combination of studies oriented around privacy, the privacy calculus and the privacy paradox. Data were collected using a quantitative research approach, through an anonymous online questionnaire, with the targeted population narrowed down to the general public residing within SA who make use of IoT devices and/or services. Data were collected using Qualtrics and analysed using SmartPLS 3, which was used to test for correlations between the factors that influence information disclosure in the use of IoT by utilising the complete bootstrapping method. A key finding was that the privacy paradox is apparent within SA: individuals pursue enjoyment and predominantly use IoT for leisure purposes, while they are more likely to adopt self-withdrawal tendencies when faced with heightened privacy concerns or potential risks.
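For intuition, the bootstrapping idea behind such significance tests can be sketched outside SmartPLS: resample respondents with replacement and re-estimate the coefficient of interest to obtain a confidence interval. The data below are simulated and the model is reduced to a single correlation, so this is far simpler than the study's structural model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated standardized construct scores for 200 hypothetical respondents.
privacy_concerns = rng.normal(size=200)
willingness_to_disclose = -0.3 * privacy_concerns + rng.normal(scale=0.9, size=200)

def bootstrap_corr(x: np.ndarray, y: np.ndarray, n_boot: int = 5000) -> np.ndarray:
    """Resample respondent indices with replacement and recompute the
    Pearson correlation, mimicking bootstrapped coefficient tests."""
    n = len(x)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return out

boot = bootstrap_corr(privacy_concerns, willingness_to_disclose)
lo, hi = np.percentile(boot, [2.5, 97.5])
r = np.corrcoef(privacy_concerns, willingness_to_disclose)[0, 1]
print(f"r = {r:.3f}, bootstrap 95% CI [{lo:.3f}, {hi:.3f}]")
```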
|
450 |
Local differentially private mechanisms for text privacy protection. Mo, Fengran. 08 1900.
In Natural Language Processing (NLP) applications, training an effective model often requires a massive amount of data. However, text data in the real world are scattered across different institutions or user devices, and directly sharing them with an NLP service provider brings huge privacy risks, as text data often contain sensitive information, leading to potential privacy leakage. A typical way to protect privacy is to privatize the raw text directly and leverage Differential Privacy (DP) to protect the text at a quantifiable privacy-protection level. Protecting the intermediate computation results via a randomized text privatization mechanism is another available solution.

However, existing text privatization mechanisms fail to achieve a good privacy-utility trade-off due to the intrinsic difficulty of text privacy protection. Their main limitations include the following aspects: (1) mechanisms that privatize text by applying the dχ-privacy notion are not applicable to all similarity metrics because of its strict requirements; (2) they privatize each token in the text equally by providing the same, excessively large output set, which results in over-protection; (3) current methods can only guarantee privacy for either the training or the inference step, but not both, because of the lack of DP composition and DP amplification techniques.
This poor utility-privacy trade-off impedes the adoption of current text privatization mechanisms in real-world applications. In this thesis, we propose two methods, from different perspectives, for both the training and inference stages, while requiring no security trust in the server. The first approach is a Customized differentially private Text privatization mechanism (CusText) that assigns each input token a customized output set, providing more advanced, adaptive privacy protection at the token level. It also overcomes the limitation on similarity metrics caused by the dχ-privacy notion, by adapting the mechanism to satisfy ϵ-DP. Furthermore, we provide two new text privatization strategies to boost the utility of privatized text without compromising privacy. The second approach is a Gaussian-based local Differentially Private (GauDP) model that significantly reduces the calibrated noise power added to the intermediate text representations, based on an advanced privacy-accounting framework, and thus improves model accuracy by incorporating several components. The model consists of an LDP layer, sub-sampling and up-sampling DP amplification algorithms for training and inference, and DP composition algorithms for noise calibration. This novel solution guarantees privacy for both training and inference data.
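To make the two approaches concrete, here is a minimal Python sketch of the underlying mechanisms: an exponential mechanism sampling a replacement token from a per-token output set (the CusText idea) and a Gaussian mechanism adding calibrated noise to a clipped intermediate representation (the GauDP idea). The vocabulary, embeddings, output sets and parameters are invented, and none of the thesis's output-set construction, utility strategies, amplification or accounting is reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings; real systems would use pretrained word vectors.
vocab = ["doctor", "nurse", "hospital", "teacher", "school"]
emb = {w: rng.normal(size=8) for w in vocab}

# Hypothetical customized output sets: each sensitive token may only be
# replaced by a few semantically close candidates, instead of one huge
# output set shared by every token.
output_sets = {"doctor": ["doctor", "nurse", "hospital"],
               "teacher": ["teacher", "school"]}

def privatize_token(x: str, epsilon: float) -> str:
    """Exponential mechanism over the token's customized output set.
    With utilities normalized to sensitivity 1, sampling y with
    probability proportional to exp(epsilon * u(x, y) / 2) satisfies
    epsilon-DP among the tokens sharing this output set."""
    candidates = output_sets.get(x, [x])
    u = np.array([-np.linalg.norm(emb[x] - emb[y]) for y in candidates])
    u = (u - u.min()) / (np.ptp(u) or 1.0)   # normalize utilities to [0, 1]
    p = np.exp(epsilon * u / 2.0)
    return str(rng.choice(candidates, p=p / p.sum()))

def gaussian_ldp(rep: np.ndarray, epsilon: float, delta: float,
                 c: float = 1.0) -> np.ndarray:
    """Gaussian mechanism on a clipped representation, with the standard
    calibration sigma = c * sqrt(2 ln(1.25/delta)) / epsilon for
    L2-sensitivity c (no composition or amplification accounting here)."""
    clipped = rep * min(1.0, c / float(np.linalg.norm(rep)))
    sigma = c * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(scale=sigma, size=rep.shape)

print([privatize_token(t, epsilon=2.0) for t in ["doctor", "teacher", "doctor"]])
noisy = gaussian_ldp(emb["doctor"], epsilon=1.0, delta=1e-5)
```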
To evaluate our proposed text privatization mechanisms, we conduct extensive experiments on several datasets of different types. The experimental results demonstrate that our proposed mechanisms achieve a better privacy-utility trade-off and better practical application value than existing methods. In addition, we carry out a series of analyses to explore the crucial factors of each component, which can provide more insight into text protection and generalize to further explorations in privacy-preserving NLP.
|