  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Secure and Reliable Data Outsourcing in Cloud Computing

Cao, Ning 31 July 2012 (has links)
"The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. 
To protect data privacy, sensitive data has to be encrypted before outsourcing, which renders traditional data utilization based on plaintext keyword search obsolete. We define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including support for more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure for describing the relations between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as personal social networks, relational databases, XML documents and chemical compounds. When such data contain sensitive information and must be encrypted before outsourcing to the cloud, it is very challenging to utilize the graph-structured data effectively after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure."
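The "secure inner product computation" mentioned in this abstract can be illustrated with a toy sketch. It assumes the commonly described matrix-masking idea: the data owner multiplies each keyword vector by the transpose of a secret invertible matrix, the user multiplies the query vector by the inverse, and the server can then compute the plaintext inner product without seeing either vector. This is not the dissertation's actual scheme, which by its own account adds further protections; `SECRET_M`, the three-keyword vocabulary, and all function names are hypothetical.

```python
from fractions import Fraction

def mat_vec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse(m):
    """Gauss-Jordan inverse over exact fractions (fails on a singular matrix)."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical secret key shared by data owner and authorized users.
SECRET_M = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]

def encrypt_index(p):
    """Owner side: mask a document's keyword vector as M^T p."""
    return mat_vec(transpose(SECRET_M), p)

def make_trapdoor(q):
    """User side: mask a query vector as M^-1 q."""
    return mat_vec(inverse(SECRET_M), q)

def score(enc_index, trapdoor):
    """Server side: (M^T p) . (M^-1 q) equals the plaintext p . q."""
    return sum(a * b for a, b in zip(enc_index, trapdoor))

def rank(enc_docs, trapdoor):
    """Server side: order document names by descending similarity score."""
    return sorted(enc_docs, key=lambda name: score(enc_docs[name], trapdoor),
                  reverse=True)

# Vocabulary (assumed): ["cloud", "privacy", "graph"]
docs = {"d1": [1, 1, 0], "d2": [0, 1, 1], "d3": [1, 0, 0]}
enc_docs = {name: encrypt_index(vec) for name, vec in docs.items()}
trapdoor = make_trapdoor([1, 1, 0])  # query: "cloud privacy"
ranking = rank(enc_docs, trapdoor)
```

Exact fractions are used so that the masked scores match the plaintext inner products exactly; a plain matrix mask like this one is only an illustration and should not be read as providing the privacy guarantees the abstract claims for its improved schemes.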
42

Practical Private Information Retrieval

Olumofin, Femi George January 2011 (has links)
In recent years, online privacy has attracted much interest, especially as more Internet users than ever are beginning to care about the privacy of their online activities. Privacy concerns are even prompting legislators in some countries to demand from service providers a more privacy-friendly Internet experience for their citizens. These are welcome developments, in stark contrast to the Internet censorship and surveillance that legislators in some nations have been known to promote. The development of Internet systems that can protect user privacy requires private information retrieval (PIR) schemes that are practical, because no other efficient techniques exist for preserving the confidentiality of a user's retrieval requests and responses from an Internet system holding unencrypted data. This thesis studies how PIR schemes can be made more relevant and practical for the development of systems that protect users' privacy. Private information retrieval schemes are cryptographic constructions for retrieving data from a database without the database (or database administrator) being able to learn any information about the content of the query. PIR can be applied to preserve the confidentiality of queries to online data sources in many domains, such as online patents, real-time stock quotes, Internet domain names, location-based services, online behavioural profiling and advertising, and search engines. In this thesis, we study private information retrieval and obtain results that seek to make PIR more relevant in practice than previous treatments of the subject in the literature, which have been mostly theoretical. We also show that PIR is the most computationally efficient known technique for providing access privacy under realistic computation powers and network bandwidths. Our result covers all currently known varieties of PIR schemes.
We provide a more detailed summary of our contributions below. Our first result addresses an existing question regarding the computational practicality of private information retrieval schemes. We show that, contrary to earlier arguments, recent lattice-based computational PIR schemes and multi-server information-theoretic PIR schemes are much more computationally efficient than a trivial transfer of the entire PIR database from the server to the client (i.e., trivial download). Our result shows that the end-to-end response times of these schemes are one to three orders of magnitude (10 to 1000 times) smaller than the trivial download of the database for realistic computation powers and network bandwidths. This result extends and clarifies the well-known result of Sion and Carbunar on the computational practicality of PIR. Our second result is a novel approach for preserving the privacy of sensitive constants in an SQL query, which improves substantially upon earlier work. Specifically, we provide an expressive SQL data access model atop the existing rudimentary index- and keyword-based data access models of PIR. The resulting SQL-based model improves query throughput by a factor of between 7 and 480 over previous work. We then provide a PIR-based approach for preserving access privacy over large databases. Unlike previously published access privacy approaches, we explore new ideas about privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. This work addresses an important open problem about how real systems can systematically apply existing PIR schemes for querying large databases.
In terms of applications, we apply PIR to solve user privacy problems in the domains of patent database query and location-based services, user and database privacy problems in the domain of the online sales of digital goods, and a scalability problem for the Tor anonymous communication network. We develop practical tools for most of our techniques, which can be useful for adding PIR support to existing and new Internet system designs.
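To make the multi-server information-theoretic PIR idea concrete, here is a minimal sketch of the classic two-server XOR construction. It assumes two servers that hold identical copies of the database and do not collude; it is a textbook illustration, not one of the schemes evaluated in the thesis, and the function names are hypothetical.

```python
import secrets

def make_queries(n, index):
    """Client: build two bit vectors that differ only at `index`.

    Each vector on its own is uniformly random, so a single server
    learns nothing about which record is wanted.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2

def answer(database, query):
    """Server: XOR together the records selected by the query bits."""
    acc = 0
    for record, bit in zip(database, query):
        if bit:
            acc ^= record
    return acc

def reconstruct(a1, a2):
    """Client: every record selected by both queries cancels out,
    leaving only the record at the requested index."""
    return a1 ^ a2

# Both (non-colluding) servers hold the same database of integer records.
database = [17, 42, 99, 7, 256]
q1, q2 = make_queries(len(database), 2)
record = reconstruct(answer(database, q1), answer(database, q2))
```

Each server touches every record to compute its answer, which is why the computational cost of PIR relative to simply downloading the whole database is the practicality question the abstract addresses.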
43

Secure and high-performance big-data systems in the cloud

Tang, Yuzhe 21 September 2015 (has links)
Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process the fast-changing big data, various scalable systems (e.g. key-value stores and MapReduce) have recently emerged in industry. However, there is a huge gap between what these open-source software systems can offer and what the real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limit their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimized operational overhead. Third, the demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by the Healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX, for secondary index support on top of write-optimized key-value stores (e.g. HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the so-called index-repair operations which are expensive. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores are idling, which wastes resources. To partition the streaming workload evenly to all the cores and to search for the best partitioning among many possibilities, I proposed a heuristic based search strategy that achieves locally optimal partitioning with lightweight search overhead. 
The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of that partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different persons, I proposed a personalized privacy definition and substantiated this new privacy requirement by injecting false positives into the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an inexpensive addition-homomorphic secret sharing mechanism and to perform the distributed computation in a scalable P2P overlay.
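The addition-homomorphic secret sharing mentioned above can be sketched as follows. This is a generic additive sharing scheme over integers modulo a fixed value, shown only to illustrate why sums can be computed on shares; the modulus, the three-party setup, and the function names are assumptions, not details taken from ε-PPI.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus; any value larger than the sums works

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def add_shared(*share_lists):
    """Each party adds the shares it holds locally; no party sees any input."""
    return [sum(column) % MODULUS for column in zip(*share_lists)]

def reconstruct(shares):
    """Combine all shares to reveal the shared value."""
    return sum(shares) % MODULUS

# Three data providers each secret-share a private count among three parties.
inputs = [5, 7, 30]
shared_inputs = [share(value, 3) for value in inputs]
total = reconstruct(add_shared(*shared_inputs))
```

Only local additions are needed to compute the sum, which is what makes this kind of sharing cheap compared with general-purpose multi-party computation.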
45

FULLY HOMOMORPHIC ENCRYPTION BASED DATA ACCESS FRAMEWORK FOR PRIVACY-PRESERVING HEALTHCARE ANALYTICS

Ganduri, Sri Lasya 01 December 2021 (has links)
The main aim of this thesis is to develop a library for integrating fully homomorphic encryption-based computations with a standard database. Fully homomorphic encryption is an encryption scheme that allows functions to be evaluated directly on encrypted data, without decrypting it, and that, once the result is decrypted, yields the same output as if the functions had been run on the plaintext. This approach is a promising solution for preserving privacy in health care systems, where millions of patients’ data are stored. Personal health care tools gather medical data and store it in a database. Once this library is imported into the database, data entered into the database is encrypted, and computations can be performed on the encrypted data without decrypting it.
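The idea of computing on encrypted values can be illustrated with a toy example. The sketch below uses the Paillier cryptosystem, which is only additively homomorphic, as a stand-in: it is not fully homomorphic encryption and is not the library developed in the thesis. The primes are tiny and insecure, chosen purely for readability, and the function names are hypothetical. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key generation with small fixed primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# The database stores only ciphertexts, yet can still sum two readings.
pub, priv = keygen()
c_sum = add_encrypted(pub, encrypt(pub, 120), encrypt(pub, 80))
reading_total = decrypt(pub, priv, c_sum)
```

A fully homomorphic scheme additionally supports multiplication of encrypted values, which is what allows arbitrary functions, rather than only sums, to be evaluated on ciphertexts.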
46

Towards Building Privacy-Preserving Language Models: Challenges and Insights in Adapting PrivGAN for Generation of Synthetic Clinical Text

Nazem, Atena January 2023 (has links)
The growing development of artificial intelligence (AI), particularly neural networks, is transforming applications of AI in healthcare, yet it raises significant privacy concerns due to potential data leakage. As neural networks memorise training data, they may inadvertently expose sensitive clinical data to privacy breaches, which can engender serious repercussions like identity theft, fraud, and harmful medical errors. While regulations such as GDPR offer safeguards through guidelines, rooted and technical protections are required to address the problem of data leakage. Reviews of various approaches show that one avenue of exploration is the adaptation of Generative Adversarial Networks (GANs) to generate synthetic data for use in place of real data. Since GANs were originally designed and mainly researched for generating visual data, there is a notable gap for further exploration of adapting GANs with privacy-preserving measures for generating synthetic text data. Thus, to address this gap, this study aims at answering the research questions of how a privacy-preserving GAN can be adapted to safeguard the privacy of clinical text data and what challenges and potential solutions are associated with these adaptations. To this end, the existing privGAN framework—originally developed and tested for image data—was tailored to suit clinical text data. Following the design science research framework, modifications were made while adhering to the privGAN architecture to incorporate reinforcement learning (RL) for addressing the discrete nature of text data. For synthetic data generation, this study utilised the 'Discharge summary' class from the Noteevents table of the MIMIC-III dataset, which is clinical text data in American English. The utility of the generated data was assessed using the BLEU-4 metric, and a white-box attack was conducted to test the model's resistance to privacy breaches. 
The experiment yielded a very low BLEU-4 score, indicating that the generator could not produce synthetic data that captured the linguistic characteristics and patterns of real data. The relatively low white-box attack accuracy of one discriminator (0.2055) suggests that the trained discriminator was not effective at inferring sensitive information. While this may indicate a potential for preserving privacy, increasing the number of discriminators yielded less favourable results (0.361). In light of these results, it is noted that the adapted approach of defining the rewards as a measure of the discriminators’ uncertainty can signal a contradictory learning strategy and lead to low data utility. This study underscores the challenges in adapting privacy-preserving GANs for text data due to the inherent complexity of GAN training and the required computational power. To obtain better results in terms of utility and to confirm the effectiveness of the privacy measures, further experiments are required to consider a more direct and granular reward system for the generator and to find an optimal learning rate. As such, the findings reiterate the necessity for continued experimentation and refinement in adapting privacy-preserving GANs for clinical text.
47

Ex Ante Approaches for Security, Privacy, and Enforcement in Spectrum Sharing

Bahrak, Behnam 17 December 2013 (has links)
Cognitive radios (CRs) are devices that are capable of sensing the spectrum and using its free portions in an opportunistic manner. The free spectrum portions are referred to as white spaces or spectrum holes. It is widely believed that CRs are one of the key enabling technologies for realizing a new regulatory spectrum management paradigm, viz. dynamic spectrum access (DSA). CRs often employ software-defined radio (SDR) platforms that are capable of executing artificial intelligence (AI) algorithms to reconfigure their transmission/reception (TX/RX) parameters to communicate efficiently while avoiding interference with licensed (a.k.a. primary or incumbent) users and unlicensed (a.k.a. secondary or cognitive) users. When different stakeholders share a common resource, such as the case in spectrum sharing, security, privacy, and enforcement become critical considerations that affect the welfare of all stakeholders. Recent advances in radio spectrum access technologies, such as CRs, have made spectrum sharing a viable option for significantly improving spectrum utilization efficiency. However, those technologies have also contributed to exacerbating the difficult problems of security, privacy and enforcement. In this dissertation, we review some of the critical security and privacy threats that impact spectrum sharing. We also discuss ex ante (preventive) approaches which mitigate the security and privacy threats and help spectrum enforcement. / Ph. D.
48

Toward Privacy-Preserving and Secure Dynamic Spectrum Access

Dou, Yanzhi 19 January 2018 (has links)
Dynamic spectrum access (DSA) technique has been widely accepted as a crucial solution to mitigate the potential spectrum scarcity problem. Spectrum sharing between the government incumbents and commercial wireless broadband operators/users is one of the key forms of DSA. Two categories of spectrum management methods for shared use between incumbent users (IUs) and secondary users (SUs) have been proposed, i.e., the server-driven method and the sensing-based method. The server-driven method employs a central server to allocate spectrum resources while considering incumbent protection. The central server has access to the detailed IU operating information, and based on some accurate radio propagation model, it is able to allocate spectrum following a particular access enforcement method. Two types of access enforcement methods -- exclusion zone and protection zone -- have been adopted for server-driven DSA systems in the current literature. The sensing-based method is based on recent advances in cognitive radio (CR) technology. A CR can dynamically identify white spaces through various incumbent detection techniques and reconfigure its radio parameters in response to changes of spectrum availability. The focus of this dissertation is to address critical privacy and security issues in the existing DSA systems that may severely hinder the progress of DSA's deployment in the real world. Firstly, we identify serious threats to users' privacy in existing server-driven DSA designs and propose a privacy-preserving design named P²-SAS to address the issue. P²-SAS realizes the complex spectrum allocation process of protection-zone-based DSA in a privacy-preserving way through Homomorphic Encryption (HE), so that none of the IU or SU operation data would be exposed to any snooping party, including the central server itself. Secondly, we develop a privacy-preserving design named IP-SAS for the exclusion-zone-based server-driven DSA system.
We extend the basic design that only considers semi-honest adversaries to include malicious adversaries in order to defend the more practical and complex attack scenarios that can happen in the real world. Thirdly, we redesign our privacy-preserving SAS systems entirely to remove the somewhat-trusted third party (TTP) named Key Distributor, which in essence provides a weak proxy re-encryption online service in P²-SAS and IP-SAS. Instead, in this new system, RE-SAS, we leverage a new crypto system that supports both a strong proxy re-encryption notion and MPC to realize privacy-preserving spectrum allocation. The advantages of RE-SAS are that it can prevent single point of vulnerability due to TTP and also increase SAS's service performance dramatically. Finally, we identify the potentially crucial threat of compromised CR devices to the ambient wireless infrastructures and propose a scalable and accurate zero-day malware detection system called GuardCR to enhance CR network security at the device level. GuardCR leverages a host-based anomaly detection technique driven by machine learning, which makes it autonomous in malicious behavior recognition. We boost the performance of GuardCR in terms of accuracy and efficiency by integrating proper domain knowledge of CR software. / Ph. D.
49

Web services oriented approach for privacy-preserving data sharing / Une approche orientée service pour la préservation des données confidentielles dans les compositions de services Web

Tbahriti, Salah Eddine 03 December 2012 (has links)
Although Web service composition is considered one of the most promising technologies for integrating multiple heterogeneous data sources and for realizing complex, personalized operations, the protection of personal data remains one of the major concerns associated with it. During composition, data must be exchanged between all parties: Web services that collect and provide data, individuals whose data may be provided and managed by Web services, systems that compose Web services to answer complex queries, and the end clients (requesters) of the services. As a consequence, personal data is exchanged and manipulated by every entity in the system, and managing privacy among them is far from easy. Our goal in this thesis is to design and develop a framework that enhances Web service composition with privacy protection mechanisms. To this end, we propose a general approach with three components. First, we propose a formal privacy model that allows Web services to describe their privacy constraints on personal data. The model goes beyond traditional data-oriented models by specifying privacy constraints not only at the level of the data handled but also at the level of the operations invoked by the services. Second, we develop a compatibility-matching algorithm that formally checks the compatibility between the privacy requirements and the privacy policies of all services within a composition. Third, where some services in the composition are incompatible with respect to their privacy specifications, we introduce a novel approach based on a negotiation model to find a compatible composition (i.e., one in which the privacy specifications of all participating services are compatible). Finally, we implemented the techniques presented in this thesis in the PAIRSE prototype and conducted an extensive performance study of the proposed algorithms.
50

Préservation de la confidentialité des données externalisées dans le traitement des requêtes top-k / Privacy preserving top-k query processing over outsourced data

Mahboubi, Sakina 21 November 2018 (has links)
Outsourcing corporate or individual data to a cloud provider, e.g. using Database-as-a-Service, is practical and cost-effective. But it introduces a major problem: how to preserve the privacy of the outsourced data while supporting expressive user queries. A simple solution is to encrypt the data before it is outsourced. Then, to answer a query, the user client can retrieve the encrypted data from the cloud, decrypt it, and evaluate the query over plaintext (non-encrypted) data. This solution is not practical, as it does not take advantage of the computing power provided by the cloud for evaluating queries. In this thesis, we consider an important kind of query, top-k queries, and address the problem of privacy-preserving top-k query processing over encrypted data in the cloud. A top-k query allows the user to specify a number k, and the system returns the k tuples that are most relevant to the query. The relevance of a tuple to the query is determined by a scoring function. We first propose a complete system, called BuckTop, that can efficiently evaluate top-k queries over encrypted data without having to decrypt it in the cloud. BuckTop includes a top-k query processing algorithm that works on the encrypted data, stored at one cloud node, and returns a set that is proved to contain the encrypted data corresponding to the top-k results. It also comes with an efficient filtering algorithm that is executed in the cloud on encrypted data and removes most of the false positives included in the returned set. When the outsourced data is large, it is typically partitioned over multiple nodes in a distributed system. For this case, we propose two new systems, called SDB-TOPK and SD-TOPK, that can evaluate top-k queries over encrypted distributed data without having to decrypt it at the nodes where it is stored. In addition, SDB-TOPK and SD-TOPK have a powerful filtering algorithm that filters out as many false positives as possible at the nodes and returns a small set of encrypted data to be decrypted on the user side. We analyze the security of our systems and propose efficient strategies to enforce it. We validated our solutions by implementing BuckTop, SDB-TOPK and SD-TOPK and comparing them to baseline approaches over synthetic and real databases. The results show excellent response time compared to baseline approaches. They also show the efficiency of our filtering algorithm, which eliminates almost all false positives. Furthermore, our systems yield a significant reduction in communication cost between the distributed system nodes when computing the query result.
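For readers unfamiliar with top-k semantics, the following minimal sketch shows what such a query computes on plaintext data: the k tuples with the highest value of a scoring function. It deliberately ignores encryption, bucketing, and false-positive filtering, so it is not an implementation of BuckTop, SDB-TOPK, or SD-TOPK; the sample tuples and the scoring function are hypothetical.

```python
import heapq

def top_k(tuples, k, score):
    """Return the k tuples with the highest score, best first."""
    return heapq.nlargest(k, tuples, key=score)

# Hypothetical relation: (item, attribute_1, attribute_2)
rows = [("a", 3, 9), ("b", 8, 2), ("c", 6, 7), ("d", 1, 1)]

def sum_score(row):
    """Assumed scoring function: the sum of the two numeric attributes."""
    return row[1] + row[2]

best_two = top_k(rows, 2, sum_score)
```

The difficulty the thesis addresses is obtaining this same result when the server holds only encrypted attribute values and therefore cannot evaluate the scoring function directly.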
