  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Local differentially private mechanisms for text privacy protection

Mo, Fengran 08 1900 (has links)
In Natural Language Processing (NLP) applications, training an effective model often requires a massive amount of data. However, real-world text data is scattered across different institutions and user devices. Sharing it directly with the NLP service provider brings huge privacy risks, because text data often contains sensitive information, leading to potential privacy leakage. A typical way to protect privacy is to privatize raw text directly and leverage Differential Privacy (DP) to protect the text at a quantifiable privacy protection level. Alternatively, the intermediate computation results can be protected via a randomized text privatization mechanism. However, existing text privatization mechanisms fail to achieve a good privacy-utility trade-off because of the intrinsic difficulty of text privacy protection. Their limitations mainly include the following: (1) mechanisms that privatize text by applying the dχ-privacy notion are not applicable to all similarity metrics because of its strict requirements; (2) they privatize every token in the text equally by providing the same, excessively large output set, which results in over-protection; (3) current methods can only guarantee privacy for either the training or the inference step, but not both, because they lack DP composition and DP amplification techniques. This poor privacy-utility trade-off impedes the adoption of current text privatization mechanisms in real-world applications. In this thesis, we propose two methods, from different perspectives, for both the training and inference stages, neither of which requires trusting the server's security.
The first approach is a Customized differentially private Text privatization mechanism (CusText) that assigns each input token a customized output set, providing more advanced, adaptive privacy protection at the token level. It also overcomes the restriction on similarity metrics imposed by the dχ-privacy notion by adapting the mechanism to satisfy ϵ-DP. Furthermore, we provide two new text privatization strategies that boost the utility of privatized text without compromising privacy. The second approach is a Gaussian-based local Differentially Private (GauDP) model that incorporates several components to significantly reduce the power of the calibrated noise added to intermediate text representations, based on an advanced privacy accounting framework, and thus improves model accuracy. The model consists of an LDP layer, sub-sampling and up-sampling DP amplification algorithms for training and inference, and DP composition algorithms for noise calibration. This novel solution guarantees privacy for both training and inference data. To evaluate the proposed text privatization mechanisms, we conduct extensive experiments on several datasets of different types. The experimental results demonstrate that our mechanisms achieve a better privacy-utility trade-off and greater practical value than existing methods. In addition, we carry out a series of analyses to explore the crucial factors of each component, which provide further insight into text protection and can inform further exploration of privacy-preserving NLP.
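The token-level privatization described for CusText samples each replacement token from an output set so that an ϵ-DP guarantee holds. A minimal sketch of that general idea, using the standard exponential mechanism, is shown here. It is not the CusText implementation: the toy embedding table, the similarity function and all function names are hypothetical, and the ϵ-DP bound only holds between input tokens that share the same output set.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical 1-D "embeddings" for a tiny vocabulary (illustration only).
TOY_EMBEDDING = {"good": 0.0, "great": 0.1, "fine": 0.3, "bad": 1.0}


def toy_similarity(x: str, y: str) -> float:
    """Similarity in [0, 1]; its sensitivity is therefore at most 1."""
    return 1.0 - abs(TOY_EMBEDDING[x] - TOY_EMBEDDING[y])


def output_probabilities(token: str, output_set: Sequence[str],
                         similarity: Callable[[str, str], float],
                         epsilon: float) -> List[float]:
    """Exponential mechanism: Pr[y | token] is proportional to exp(epsilon * u(token, y) / 2)."""
    scores = [epsilon * similarity(token, y) / 2.0 for y in output_set]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def privatize_token(token, output_set, similarity, epsilon, rng=random):
    """Replace `token` with a sample drawn from its output set."""
    probs = output_probabilities(token, output_set, similarity, epsilon)
    return rng.choices(list(output_set), weights=probs, k=1)[0]


def privatize_text(tokens, output_set, similarity, epsilon, rng=random):
    """Privatize tokens independently; by sequential composition,
    n tokens consume n * epsilon of privacy budget in total."""
    return [privatize_token(t, output_set, similarity, epsilon, rng) for t in tokens]


if __name__ == "__main__":
    vocab = list(TOY_EMBEDDING)
    print(privatize_text(["good", "bad"], vocab, toy_similarity, epsilon=2.0,
                         rng=random.Random(0)))
```

A larger ϵ shifts probability mass toward the original token and its close neighbours, which is the privacy-utility knob the abstract refers to.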
62

Privacy-Preserving Ontology Publishing: The Case of Quantified ABoxes w.r.t. a Static Cycle-Restricted EL TBox: Extended Version

Baader, Franz, Koopmann, Patrick, Kriegel, Francesco, Nuradiansyah, Adrian, Peñaloza, Rafael 20 June 2022 (has links)
We review our recent work on how to compute optimal repairs, optimal compliant anonymizations, and optimal safe anonymizations of ABoxes containing possibly anonymized individuals. The results can be used both to remove erroneous consequences from a knowledge base and to hide secret information before publication of the knowledge base, while keeping as much as possible of the original information. / Updated on August 27, 2021. This is an extended version of an article accepted at DL 2021.
63

A comparative analysis of database sanitization techniques for privacy-preserving association rule mining / En jämförande analys av tekniker för databasanonymisering inom sekretessbevarande associationsregelutvinning

Mårtensson, Charlie January 2023 (has links)
Association rule hiding (ARH) is the process of modifying a transaction database to prevent sensitive patterns (association rules) from being discovered by data miners. An optimal ARH technique successfully hides all sensitive patterns while leaving all nonsensitive patterns public. In practice, however, many ARH algorithms cause undesirable side effects, such as failing to hide sensitive rules or mistakenly hiding nonsensitive ones. Evaluating the utility of ARH algorithms therefore involves measuring the side effects they cause. A wide array of ARH techniques is in use, and evolutionary algorithms in particular have gained popularity in recent years. However, previous research in the area has focused on incremental improvement of existing algorithms, and no work was found that compares the performance of ARH algorithms without the incentive of promoting a newly proposed algorithm as superior. To fill this research gap, this project compares three ARH algorithms developed between 2019 and 2022 (ABC4ARH, VIDPSO, and SA-MDP) using identical and unbiased parameters. The algorithms were run on three real databases and three synthetic ones of various sizes, in each case given four different sets of sensitive rules to hide. Their performance was measured in terms of side effects, runtime, and scalability (i.e., performance as the database size increases). The performance of the algorithms varied considerably depending on the characteristics of the input data, and no algorithm consistently outperformed the others at mitigating side effects. VIDPSO was the most efficient in terms of runtime, while ABC4ARH maintained the most robust performance as the database size increased. However, results matching the quality of those reported in the papers that originally described each algorithm could not be reproduced, which shows a clear need to validate the reproducibility of research before its results can be trusted.
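The side effects mentioned above are usually quantified by comparing which rules a miner would find before and after sanitization. A minimal sketch, assuming the common support/confidence thresholds and three frequently used metrics (hiding failure, lost rules, ghost rules), is given here; the exact metric definitions used in the thesis may differ, and the function names are hypothetical. It only evaluates a given sanitized database and does not implement ABC4ARH, VIDPSO or SA-MDP.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Transaction = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) encodes the rule X -> Y


def support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db: List[Transaction], rule: Rule) -> float:
    """conf(X -> Y) = supp(X | Y) / supp(X)."""
    x, y = rule
    sx = support(db, x)
    return support(db, x | y) / sx if sx > 0 else 0.0


def visible_rules(db, rules, min_sup, min_conf) -> Set[Rule]:
    """Rules a data miner would discover with the given thresholds."""
    return {r for r in rules
            if support(db, r[0] | r[1]) >= min_sup and confidence(db, r) >= min_conf}


def side_effects(original, sanitized, rules, sensitive, min_sup, min_conf) -> Dict[str, float]:
    before = visible_rules(original, rules, min_sup, min_conf)
    after = visible_rules(sanitized, rules, min_sup, min_conf)
    sens = before & sensitive
    nonsens = before - sensitive
    return {
        # share of sensitive rules that are still discoverable
        "hiding_failure": len(after & sens) / len(sens) if sens else 0.0,
        # share of non-sensitive rules that were hidden by mistake
        "lost_rules": len(nonsens - after) / len(nonsens) if nonsens else 0.0,
        # number of rules that appear only after sanitization
        "ghost_rules": float(len(after - before)),
    }


if __name__ == "__main__":
    f = frozenset
    original = [f("abc"), f("ab"), f("ac"), f("bc")]
    sanitized = [f("ac"), f("ab"), f("ac"), f("bc")]  # item "b" removed from the first transaction
    rules = [(f("a"), f("b")), (f("a"), f("c")), (f("b"), f("c"))]
    print(side_effects(original, sanitized, rules, {(f("a"), f("b"))}, 0.5, 0.6))
```

In this toy run, removing one item hides the sensitive rule a -> b but also hides the non-sensitive rule b -> c, which is exactly the kind of side effect the comparison measures.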
64

Anonymous Opt-Out and Secure Computation in Data Mining

Shepard, Samuel Steven 09 November 2007 (has links)
No description available.
65

Privacy-Preserving Ontology Publishing for EL Instance Stores: Extended Version

Baader, Franz, Kriegel, Francesco, Nuradiansyah, Adrian 20 June 2022 (has links)
We take a first step towards adapting an existing approach for privacy-preserving publishing of linked data to Description Logic (DL) ontologies. We consider the case where both the knowledge about individuals and the privacy policies are expressed using concepts of the DL EL, which corresponds to the setting where the ontology is an EL instance store. We introduce the notions of compliance of a concept with a policy and of safety of a concept for a policy, and show how optimal compliant (safe) generalizations of a given EL concept can be computed. In addition, we investigate the complexity of the optimality problem.
66

Un modèle rétroactif de réconciliation utilité-confidentialité sur les données d’assurance / A retroactive utility-privacy reconciliation model for insurance data

Rioux, Jonathan 04 1900 (has links)
Privacy-preserving data sharing is a challenge for almost every enterprise today, whatever its field. Research is evolving rapidly, but the lack of solutions adapted to the reality of businesses hinders the adoption of good business practices for managing and sharing privacy-aware datasets. To address this problem, we offer PEPS, a modular, upgradeable, end-to-end system tailored to the needs of insurance companies and researchers. We cover the entire data-sharing cycle, from data management to publication, including the handling of external constraints and policies and the anonymization step. Our system distinguishes itself by exploiting domain-specific and problem-specific knowledge to adapt to the situation at hand and maximize the utility of the anonymized dataset. To this end, we also present a strongly contextualized anonymization algorithm and performance measures adapted to experience analyses.
67

Energy efficient secure and privacy preserving data aggregation in Wireless Sensor Networks

Memon, Irfana 12 November 2013 (has links)
Wireless sensor networks (WSNs) consist of sensor nodes that can sense their environment, process the sensed information, and communicate via radio without any prior backbone infrastructure. In WSNs, communicating with other nodes is the most energy-consuming task, so the primary objective in designing WSN protocols is to minimize communication overhead. This is often achieved through in-network data aggregation. Because WSNs are often deployed in open environments, they are vulnerable to security attacks. This thesis contributes to the design of energy-efficient, secure and privacy-preserving data aggregation protocols for WSNs. First, we classify the main secure and privacy-preserving data aggregation protocols for WSNs in the literature. We then propose an energy-efficient secure and privacy-preserving data aggregation (ESPPA) scheme for WSNs. The ESPPA scheme is tree-based and achieves confidentiality and privacy using a shuffling technique. We propose a secure tree construction (ST) and tree-reconstruction scheme that takes possible sensor-node failures into account. Simulation results show that the ESPPA scheme effectively preserves privacy and confidentiality and incurs less communication overhead than SMART. Finally, we propose an extension of the ST scheme, called the secure coverage tree (SCT) construction scheme, which applies sleep scheduling: it identifies nodes that are redundant in terms of sensing coverage and puts them to sleep. Through simulations, we show the efficacy and efficiency of the SCT scheme. Besides this work on secure and privacy-preserving data aggregation, we also worked on composite event detection for WSNs during the research period; Appendix B presents this complementary work.
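SMART, the baseline named above, hides individual readings by slicing each reading into random shares and mixing them among neighbours before tree aggregation. The sketch here illustrates only that slice-and-mix idea, as a point of reference; it is not ESPPA's shuffling technique, which the abstract does not detail. It assumes integer readings and a hypothetical neighbour list, omits the link encryption SMART applies to the shares, and reduces the aggregation tree to a plain sum.

```python
import random
from typing import Dict, List


def slice_and_mix(readings: Dict[int, int], neighbors: Dict[int, List[int]],
                  num_slices: int, rng: random.Random) -> Dict[int, int]:
    """Each node keeps one random share of its reading and sends the other
    num_slices - 1 shares to distinct neighbours; every node then holds the
    sum of the shares it kept and received."""
    mixed = {node: 0 for node in readings}
    for node, value in readings.items():
        targets = rng.sample(neighbors[node], num_slices - 1)
        shares = [rng.randint(-1000, 1000) for _ in targets]
        mixed[node] += value - sum(shares)  # the kept share makes the total exact
        for target, share in zip(targets, shares):
            mixed[target] += share  # in SMART this link would be encrypted
    return mixed


def tree_aggregate(mixed: Dict[int, int]) -> int:
    """Simplified aggregation: the tree only sees mixed values, whose sum
    equals the sum of the original readings."""
    return sum(mixed.values())


if __name__ == "__main__":
    readings = {0: 20, 1: 22, 2: 19, 3: 25}
    neighbors = {n: [m for m in readings if m != n] for n in readings}
    mixed = slice_and_mix(readings, neighbors, num_slices=3, rng=random.Random(1))
    print(mixed, tree_aggregate(mixed))  # the total is 86 for any seed
```

The extra share transmissions are what make such schemes costlier in messages, which is the overhead ESPPA is reported to reduce relative to SMART.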
68

Secure and Efficient Comparisons between Untrusted Parties

Beck, Martin 11 September 2018 (has links)
A vast number of online services rely on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases, user privacy depends on collective trust in all involved intermediaries, such as service providers, operators, administrators or even help-desk staff. A single adversarial party anywhere in the chain of trust voids user privacy, and the number of intermediaries is ever growing. Thus, user privacy must be preserved at every time and stage, independent of the intrinsic goals of any involved party. Furthermore, alongside these new services, traditional offline analytic systems are being replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information, and lower costs. In these scenarios privacy is of utmost concern because of the large amount of personal information contained in the centralized data. We focus on the challenge of privacy-preserving processing of genomic data, specifically the comparison of genomic sequences. The problem is how to efficiently compare the private sequences of two parties while preserving the confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible may be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison: the secured inputs given to the comparing party might leak information about the original input, or the output might leak information about the inputs. In the latter case, the results of several comparisons can be combined to infer information about the confidential input of the party under observation.
Genomic sequences serve as a use case, but the proposed solutions are more general and apply to privacy-preserving sequence comparison as a whole. The solution should be efficient: a comparison should run in time linear in the length of the input sequences, yielding acceptable costs for a typical use case. To tackle the problem of efficient, privacy-preserving sequence comparison, we propose a framework consisting of three main parts. a) The basic protocol is an efficient sequence comparison algorithm that transforms a sequence into a set representation, so that distance measures over input sequences can be approximated by distance measures over sets. The sets are then represented by an efficient data structure, the Bloom filter, which allows certain set operations to be evaluated without storing the actual elements of the possibly large set. This representation yields low distortion when comparing similar sequences. Operations on the set representation are carried out using efficient, partially homomorphic cryptographic systems to keep the inputs confidential. The output can be adjusted to return either the approximated distance itself or the result of an in-range check on it. b) Building on this efficient basic protocol, we introduce the first mechanism that reduces the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. The generalization treats inputs as messages received over a noisy channel and applies error correction from coding theory to them. Similar inputs are thus defined as inputs whose generalized forms have a Hamming distance below a predefined threshold. We present a zero-knowledge proof protocol for checking that the generalized input is indeed a generalization of the actual input.
Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by semantically secure homomorphic encryption schemes, as well as the encryption latency. Compression is achieved by constructing an asymmetric stream cipher whose ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Ciphertext compression for fully homomorphic encryption schemes is several orders of magnitude slower at converting the transmission ciphertext into the homomorphically encrypted ciphertext, whereas our compression scheme achieves optimal conversion performance. It also allows keystreams to be generated offline and thus supports offloading to trusted devices, improving transmission, storage and power efficiency. We give security proofs for all relevant parts of the proposed protocols and algorithms. A performance evaluation of the core components demonstrates the practicability of the proposed solutions, including a theoretical analysis and practical experiments that show the accuracy and efficiency of the approximations and probabilistic algorithms. Several variations and configurations for detecting similar inputs are studied in an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation, to compare genomic sequences and detect similar inputs as described in the use case.
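The set-based comparison in part a) can be illustrated in plaintext: split each sequence into overlapping q-grams, insert them into a Bloom filter, and estimate set similarity from the bit vectors. The sketch assumes q = 3, a 256-bit filter with 4 SHA-256-derived hash functions, and the Dice coefficient as the set measure; these are illustrative choices, not parameters from the thesis, and the homomorphic encryption layer is omitted.

```python
import hashlib
from typing import Iterable, List, Set


def qgrams(seq: str, q: int = 3) -> Set[str]:
    """Set of overlapping substrings of length q."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def bloom_encode(items: Iterable[str], m: int = 256, k: int = 4) -> List[int]:
    """Bloom filter as a bit list; hash i is SHA-256 over 'i:item' (an illustrative choice)."""
    bits = [0] * m
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def dice_exact(s: Set[str], t: Set[str]) -> float:
    """Dice coefficient on the q-gram sets themselves."""
    return 2 * len(s & t) / (len(s) + len(t))


def dice_estimate(a: List[int], b: List[int]) -> float:
    """Dice coefficient computed on the bit vectors instead of the sets."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 1.0


if __name__ == "__main__":
    x, y, z = "ACGTACGTTA", "ACGTACGTTC", "TTTTGGGGCC"
    print(dice_exact(qgrams(x), qgrams(y)))  # 10/12, about 0.833
    print(dice_estimate(bloom_encode(qgrams(x)), bloom_encode(qgrams(y))))
    print(dice_estimate(bloom_encode(qgrams(x)), bloom_encode(qgrams(z))))
```

Hash collisions make the bit-level estimate deviate slightly from the exact set measure; larger filters reduce this distortion at the cost of more bits to encrypt.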
In summary, we show that it is indeed possible to construct an efficient, privacy-preserving (genomic) sequence comparison while controlling the amount of information that leaves the comparison. To the best of our knowledge, we also contribute the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems.
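The framework carries out set operations under partially homomorphic encryption. The abstract does not name a scheme; Paillier is one well-known additively homomorphic example and is used here as an assumption. The toy sketch shows how a server holding a plaintext bit vector could compute, under encryption, the inner product with the client's encrypted vector (for Bloom filters, the number of common set bits). Its primes are far too small to be secure, and it performs no ciphertext re-randomization or input validation.

```python
import math
import random
from typing import List, Tuple

PublicKey = Tuple[int, int]  # (n, g)
SecretKey = Tuple[int, int]  # (lambda, mu)


def keygen(p: int = 1789, q: int = 1861) -> Tuple[PublicKey, SecretKey]:
    """Toy Paillier keys with g = n + 1 (insecurely small primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)


def encrypt(pk: PublicKey, m: int, rng: random.Random) -> int:
    n, g = pk
    n2 = n * n
    while True:  # draw r coprime to n
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk: PublicKey, sk: SecretKey, c: int) -> int:
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n  # L(x) * mu mod n


def add(pk: PublicKey, c1: int, c2: int) -> int:
    """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n."""
    n, _ = pk
    return c1 * c2 % (n * n)


def encrypted_inner_product(pk: PublicKey, enc_a: List[int], b: List[int]) -> int:
    """Server side: combine Enc(a_i) wherever b_i = 1, yielding Enc(sum a_i * b_i)."""
    acc = 1  # a trivial encryption of 0; a real protocol would re-randomize
    for c, bit in zip(enc_a, b):
        if bit:
            acc = add(pk, acc, c)
    return acc


if __name__ == "__main__":
    rng = random.Random(7)
    pk, sk = keygen()
    a = [1, 0, 1, 1, 0]  # client's hypothetical Bloom filter bits
    b = [1, 1, 0, 1, 0]  # server's bits
    enc_a = [encrypt(pk, bit, rng) for bit in a]
    print(decrypt(pk, sk, encrypted_inner_product(pk, enc_a, b)))  # 2
```

Only the client can decrypt the result, so the server learns nothing about the client's bits beyond what the protocol's output reveals.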
69

Privacy-preserving Building Occupancy Estimation via Low-Resolution Infrared Thermal Cameras

Zhu, Shuai January 2021 (has links)
Building occupancy estimation has become an important topic for sustainable buildings and has attracted more attention during the pandemic. Estimating building occupancy is a considerable problem in computer vision, a field that has achieved breakthroughs in recent years. However, machine learning algorithms for computer vision demand large datasets, which may contain users’ private information, in order to train reliable models. As privacy issues pose a severe challenge in machine learning, this work aims to develop a privacy-preserving, machine-learning-based method for people counting using a low-resolution thermal camera with 32 × 24 pixels. The method applies to two scenarios: counting people in spaces smaller than the field of view (FoV) of the camera, and counting people in large spaces that extend beyond it. In the first scenario, we directly count the people within the camera's FoV using Multiple Object Detection (MOD) techniques; our MOD method achieves up to 56.8% mean average precision (mAP). In the second scenario, we use Multiple Object Tracking (MOT) techniques to track people entering and exiting the space, record how many entered and exited, and then compute the occupancy from these tracking results. The MOT method reaches 47.4% multiple object tracking accuracy (MOTA), 78.2% multiple object tracking precision (MOTP), and 59.6% identification F-score (IDF1). In addition to the method, we create a new dataset of 1770 properly annotated thermal images.
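For the second scenario, occupancy follows from counting tracked people who cross the entrance, and MOTA is conventionally defined as 1 − (FN + FP + IDSW) / GT, with each count summed over all frames (the CLEAR MOT metrics). The sketch assumes each track is a list of positions along the axis perpendicular to a virtual door line, with positions above the line counted as inside; the track format, the door-line rule and the function names are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Tuple


def count_crossings(tracks: Dict[int, List[float]], door_line: float) -> Tuple[int, int]:
    """Count inward and outward crossings of a virtual door line.
    A position greater than `door_line` is treated as inside the space."""
    entered = exited = 0
    for positions in tracks.values():
        for prev, cur in zip(positions, positions[1:]):
            if prev <= door_line < cur:
                entered += 1
            elif cur <= door_line < prev:
                exited += 1
    return entered, exited


def occupancy(tracks: Dict[int, List[float]], door_line: float, initial: int = 0) -> int:
    """People inside = initial count + entries - exits."""
    entered, exited = count_crossings(tracks, door_line)
    return initial + entered - exited


def mota(false_negatives: int, false_positives: int, id_switches: int,
         ground_truth: int) -> float:
    """CLEAR MOT accuracy, with each count summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth


if __name__ == "__main__":
    tracks = {1: [2, 5, 9, 12], 2: [12, 10, 6, 3], 3: [3, 4, 5], 4: [3, 9, 12]}
    print(count_crossings(tracks, door_line=8))  # (2, 1)
    print(occupancy(tracks, door_line=8))        # 1
```

Because identity switches enter the MOTA penalty directly, a tracker that swaps IDs at the door can lower MOTA even when the entry and exit counts stay correct.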
70

Vers une plateforme holistique de protection de la vie privée dans les services géodépendants / Towards a holistic privacy-protection platform for location-based services

Sahnoune, Zakaria 04 1900 (has links)
No description available.
