Global ETD Search

1	Strategic behavior and database privacy Krehbiel, Sara 21 September 2015 This dissertation focuses on strategic behavior and database privacy. First, we look at strategic behavior as a tool for distributed computation. We blend the perspectives of game theory and mechanism design in proposals for distributed solutions to the classical set cover optimization problem. We endow agents with natural individual incentives, and we show that centrally broadcasting non-binding advice effectively guides the system to a near-optimal state while keeping the original incentive structure intact. We next turn to the database privacy setting, in which an analyst wishes to learn something from a database, but the individuals contributing the data want to protect their personal information. The notion of differential privacy allows us to do both by obscuring true answers to statistical queries with a small amount of noise. The ability to conduct a task differentially privately depends on whether the amount of noise required for privacy still permits statistical accuracy. We show that it is possible to give a satisfying tradeoff between privacy and accuracy for a computational problem called independent component analysis (ICA), which seeks to decompose an observed signal into its underlying independent source variables. We do this by releasing a perturbation of a compact representation of the observed data. This approach allows us to preserve individual privacy while releasing information that can be used to reconstruct the underlying relationship between the observed variables. In almost all of the differential privacy literature, the privacy requirement must be specified before looking at the data, and the noise added for privacy limits the statistical utility of the sanitized data. 
The third part of this dissertation ties together privacy and strategic behavior to answer the question of how to determine an appropriate level of privacy when data contributors prefer more privacy but an analyst prefers more accuracy. The proposed solution to this problem views privacy as a public good and uses market design techniques to collect these preferences and then privately select and enforce a socially efficient level of privacy. Differential privacy Game theory
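The idea of obscuring a statistical query with a small amount of noise can be illustrated with the standard Laplace mechanism for a counting query. This is a minimal sketch under stated assumptions, not the method of the dissertation above: the function names, the toy `ages` data, and the choice of a counting query are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`."""
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1 (one person changes it by at most 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: how many people are at least 40 years old?
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=random.Random(0))
print(noisy)
```

A smaller `epsilon` means a larger noise scale, which is the privacy and accuracy tradeoff the abstract describes.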
2	Introducing Differential Privacy Mechanisms for Mobile App Analytics of Dynamic Content Latif, Sufian January 2021 No description available. Computer Science
3	Building trustworthy machine learning systems in adversarial environments Wang, Ning 26 May 2023 Modern AI systems, particularly with the rise of big data and deep learning in the last decade, have greatly improved our daily life and at the same time created a long list of controversies. AI systems are often subject to malicious and stealthy subversion that jeopardizes their efficacy. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly boost the accuracy of machine learning models, they also create opportunities for adversaries to tamper with models or extract sensitive data. Malicious data providers can compromise machine learning systems by supplying false data and intermediate computation results. Even a well-trained model can be deceived into misbehaving by an adversary who provides carefully designed inputs. Furthermore, curious parties can derive sensitive information about the training data by interacting with a machine learning model. These adversarial scenarios, known as poisoning attacks, adversarial example attacks, and inference attacks, have demonstrated that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. To address these problems, we proposed the following solutions: (1) FLARE, which detects and mitigates stealthy poisoning attacks by leveraging latent space representations; (2) MANDA, which detects adversarial examples by utilizing evaluations from diverse sources, i.e., model-based prediction and data-based evaluation; (3) FeCo, which enhances the robustness of machine learning-based network intrusion detection systems by introducing a novel representation learning method; and (4) DP-FedMeta, which preserves data privacy and improves the privacy-accuracy trade-off in machine learning systems through a novel adaptive clipping mechanism. 
/ Doctor of Philosophy / Over the past few decades, machine learning (ML) has become increasingly popular for enhancing efficiency and effectiveness in data analytics and decision-making. Notable applications include intelligent transportation, smart healthcare, natural language generation, intrusion detection, etc. While machine learning methods are often employed for beneficial purposes, they can also be exploited for malicious ones. Well-trained language models have demonstrated generalizability deficiencies and intrinsic biases; generative ML models used for creating art have been repurposed by fraudsters to produce deepfakes; and facial recognition models trained on big data have been found to leak sensitive information about data owners. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly improve the accuracy of ML models, they also enable adversaries to corrupt models and infer sensitive data. This leads to various adversarial attacks, such as model poisoning during training, adversarially crafted data in testing, and data inference. It is evident that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. This research focuses on building trustworthy machine learning systems in adversarial environments from a data perspective. It encompasses two themes: securing ML systems against security or privacy vulnerabilities (security of AI) and using ML as a tool to develop novel security solutions (AI for security). For the first theme, we studied adversarial attack detection in both the training and testing phases and proposed FLARE and MANDA to secure machine learning systems in the two phases, respectively. Additionally, we proposed a privacy-preserving learning system, dpfed, to defend against privacy inference attacks. 
We achieved a good trade-off between accuracy and privacy by proposing an adaptive data clipping and perturbing method. In the second theme, the research is focused on enhancing the robustness of intrusion detection systems through data representation learning. adversarial machine learning anomaly detection differential privacy
4	Privacy-aware Federated Learning with Global Differential Privacy Airody Suresh, Spoorthi 31 January 2023 There is an increasing need for low-power neural systems as neural networks become more widely used in embedded devices with limited resources. Spiking neural networks (SNNs) are proving to be a more energy-efficient alternative to conventional artificial neural networks (ANNs), which are recognized for being computationally heavy. Despite its significance, not enough attention has been paid to training SNNs with large-scale distributed machine learning techniques like Federated Learning (FL). As federated learning involves many energy-constrained devices, there is a significant opportunity to take advantage of the energy efficiency offered by SNNs. However, it is necessary to address the real-world communication constraints in an FL system, and this is done with the help of three communication reduction techniques, namely, model compression, partial device participation, and periodic aggregation. Furthermore, the convergence of federated learning systems is also affected by data heterogeneity. Federated learning systems are capable of protecting the private data of clients from adversaries. However, by analyzing the uploaded client parameters, confidential information can still be revealed. To combat privacy attacks on FL systems, various attempts have been made to incorporate differential privacy within the framework. In this thesis, we investigate the trade-offs between communication costs and training variance under a Federated Learning system with Differential Privacy applied at the parameter server (curator model). / Master of Science / Federated Learning is a decentralized method of training neural network models; it employs several participating devices to independently learn a model on their local data partition. 
These local models are then aggregated at a central server to achieve the same performance as if the model had been trained centrally. Federated Learning systems, however, accumulate communication overhead, which various communication reduction techniques can lower. Spiking Neural Networks, being an energy-efficient alternative to Artificial Neural Networks, can be utilized in Federated Learning systems. This is because FL systems consist of a network of energy-constrained devices. Federated learning systems are helpful in preserving the privacy of data in the system. However, an attacker can still obtain meaningful information from the parameters that are transmitted during a session. To this end, differential privacy techniques are utilized to combat privacy concerns in Federated Learning systems. In this thesis, we compare and contrast different communication costs and parameters of a federated learning system with differential privacy applied to it. Differential Privacy Federated learning Communication Constraints
5	Implementing Differential Privacy for Privacy Preserving Trajectory Data Publication in Large-Scale Wireless Networks Stroud, Caleb Zachary 14 August 2018 Wireless networks collect vast amounts of log data concerning usage of the network. This data aids in informing operational needs related to performance, maintenance, etc., but it is also useful for outside researchers in analyzing network operation and user trends. Releasing such information to these outside researchers poses a threat to the privacy of users. The competing needs for utility and privacy must both be addressed. This thesis studies the concept of differential privacy for fulfillment of these goals of releasing high utility data to researchers while maintaining user privacy. The focus is specifically on physical user trajectories in authentication manager log data since this is a rich type of data that is useful for trend analysis. Authentication manager log data is produced when devices connect to physical access points (APs), and trajectories are sequences of these spatiotemporal connections from one AP to another for the same device. The fulfillment of this goal is pursued with a variable length n-gram model that creates a synthetic database which can be easily ingested by researchers. We found that there are shortcomings to the algorithm chosen in specific application to the data chosen, but differential privacy itself can still be used to release sanitized datasets while maintaining utility if the data has a low sparsity. / Master of Science / Wireless internet networks store historical logs of user device interactions with them. For example, when a phone or other wireless device connects, data is stored by the Internet Service Provider (ISP) about the device, username, time, and location of connection. A database of this type of data can help researchers analyze user trends in the network, but the data contains personally identifiable information for the users. 
We propose and analyze an algorithm which can release this data in a high utility manner for the researchers, yet maintain user privacy. This is based on a verifiable approach to privacy called differential privacy. This algorithm is found to provide utility and privacy protection for datasets with many users compared to the size of the network. e-differential privacy differential privacy PPTDP privacy preserving data publication PPDP
6	Data-level privacy through data perturbation in distributed multi-application environments de Souza, Tulio January 2016 Wireless sensor networks used to have a main role as a monitoring tool for environmental purposes and animal tracking. This spectrum of applications, however, has dramatically grown in the past few years. Such evolution means that what used to be application-specific networks are now multi-application environments, often with federation capabilities. This shift results in a challenging environment for data privacy, mainly caused by the broadening of the spectrum of data access points and involved entities. This thesis first evaluates existing privacy preserving data aggregation techniques to determine how suitable they are for providing data privacy in this more elaborate environment. Such evaluation led to the design of the set difference attack, which exploits the fact that they all rely purely on data aggregation to achieve privacy, which is shown through simulation not to be suitable to the task. It also indicates that some form of uncertainty is required in order to mitigate the attack. Another relevant finding is that the attack can also be effective against standalone networks, by exploiting the node availability factor. Uncertainty is achieved via the use of differential privacy, which offers a strong and formal privacy guarantee through data perturbation. In order to make it suitable to work in a wireless sensor network environment, which mainly deals with time-series data, two new approaches to address it have been proposed. These have a contrasting effect when it comes to utility and privacy levels, offering a flexible balance between privacy and data utility for sensed entities and data analysts/consumers. Lastly, this thesis proposes a framework to assist in the design of privacy preserving data aggregation protocols to suit application needs while at the same time complying with desired privacy requirements. 
The framework's evaluation compares and contrasts several scenarios to demonstrate the level of flexibility and effectiveness that the designed protocols can provide. Overall, this thesis demonstrates that data perturbation can be made significantly practical through the proposed framework. Although some problems remain, with further improvements to data correlation methods and better use of some intrinsic characteristics of such networks, the use of data perturbation may become a practical and efficient privacy preserving mechanism for wireless sensor networks. 004
7	Fundamental Limits in Data Privacy: From Privacy Measures to Economic Foundations January 2016 Data privacy is emerging as one of the most serious concerns of big data analytics, particularly with the growing use of personal data and the ever-improving capability of data analysis. This dissertation first investigates the relation between different privacy notions, and then puts the main focus on developing economic foundations for a market model of trading private data. The first part characterizes differential privacy, identifiability and mutual-information privacy by their privacy-distortion functions, which give the optimal achievable privacy level as a function of the maximum allowable distortion. The results show that these notions are fundamentally related and exhibit certain consistency: (1) The gap between the privacy-distortion functions of identifiability and differential privacy is upper bounded by a constant determined by the prior. (2) Identifiability and mutual-information privacy share the same optimal mechanism. (3) The mutual-information optimal mechanism satisfies differential privacy with a level at most a constant away from the optimal level. The second part studies a market model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The value of epsilon units of privacy is measured by the minimum payment such that an individual's equilibrium strategy is to report data in an epsilon-differentially private manner. For the setting with binary private data that represents individuals' knowledge about a common underlying state, asymptotically tight lower and upper bounds on the value of privacy are established as the number of individuals becomes large, and the payment-accuracy tradeoff for learning the state is obtained. 
The lower bound assures the impossibility of using lower payment to buy epsilon units of privacy, and the upper bound is given by a designed reward mechanism. When the individuals' valuations of privacy are unknown to the data collector, mechanisms with possible negative payments (aiming to penalize individuals with "unacceptably" high privacy valuations) are designed to fulfill the accuracy goal and drive the total payment to zero. For the setting with binary private data following a general joint probability distribution with some symmetry, asymptotically optimal mechanisms are designed in the high data quality regime. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016 Electrical engineering Data privacy Differential privacy Economic foundations Mechanism design
8	Privacy in Complex Sample Based Surveys Shawn A Merrill (11806802) 20 December 2021 In the last few decades, there has been a dramatic uptick in the issues related to protecting user privacy in released data, both in statistical databases and anonymized records. Privacy-preserving data publishing is a field established to handle these releases while avoiding the problems that plagued many earlier attempts. This issue is of particular importance for governmental data, where both the release and the privacy requirements are frequently governed by legislation (e.g., HIPAA, FERPA, Clery Act). This problem is doubly compounded by the complex survey methods employed to counter problems in data collection. The preeminent definition for privacy is that of differential privacy, which protects users by limiting the impact that any individual can have on the result of any query. The thesis proposes models for differentially private versions of current survey methodologies and discusses the evaluation of those models. We focus on the issues of missing data and weighting, which are common techniques employed in complex surveys to counter problems with sampling and response rates. First we propose a model for answering queries on datasets with missing data while maintaining differential privacy. Our model uses k-Nearest Neighbor imputation to replicate donor values while protecting the privacy of the donor. Our model provides significantly better bias reduction in realistic experiments using existing data, as well as providing less noise than a naive solution. Our second model proposes a method of performing Iterative Proportional Fitting (IPF) in a differentially private manner, a common technique used to ensure that survey records are weighted consistently with known values. We also focus on the general philosophical need to incorporate privacy when creating new survey methodologies, rather than assuming that privacy can simply be added at a later step. 
Theoretical Computer Science Differential Privacy Privacy Privacy preserving data publishing
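For readers unfamiliar with Iterative Proportional Fitting, the weighting technique named in the abstract above, the following is a minimal sketch of the classical, non-private procedure on a two-way table. It does not implement the thesis's differentially private variant; the function name `ipf` and the example table and targets are assumptions made for illustration.

```python
def ipf(table, row_targets, col_targets, iterations=50):
    """Rescale a two-way table until its margins match the known targets."""
    weights = [row[:] for row in table]  # work on a copy of the input
    for _ in range(iterations):
        # Scale each row so that it sums to its known row total.
        for i, target in enumerate(row_targets):
            row_sum = sum(weights[i])
            if row_sum > 0:
                weights[i] = [w * target / row_sum for w in weights[i]]
        # Scale each column so that it sums to its known column total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in weights)
            if col_sum > 0:
                for row in weights:
                    row[j] *= target / col_sum
    return weights


# Hypothetical survey cell counts and known population margins.
sample = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(sample, row_targets=[40.0, 60.0], col_targets=[45.0, 55.0])
print(fitted)
```

The row and column targets must share the same grand total for the iteration to settle on a table that matches both sets of margins.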
9	Analyzing Sensitive Data with Local Differential Privacy Tianhao Wang (10711713) 30 April 2021 Vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to effectively extract knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of privacy-preserving data collection and analysis. We focus on a technique called local differential privacy (LDP) and study several aspects of it. In particular, the thesis serves as a comprehensive study of multiple aspects of the LDP field. We investigated the following seven problems: (1) We studied LDP primitives, i.e., the basic mechanisms that are used to build LDP protocols. (2) We then studied the problem when the domain size is very big (e.g., larger than $2^{32}$), where finding the values with high frequency is a challenge, because one needs to enumerate through all values. (3) Another interesting setting is when each user possesses a set of values, instead of a single private value. (4) With the basic problems visited, we then aim to make the LDP protocols practical for real-world scenarios. We investigated the case where each user's data is high-dimensional (e.g., in the census survey, each user has multiple questions to answer), and the goal is to recover the joint distribution among the attributes. (5) We also built a system for companies to issue SQL queries over the data protected under LDP, where each user is associated with some public weights and holds some private values; an LDP version of the values is sent to the server from each user. 
(6) To further increase the accuracy of LDP, we study how to add post-processing steps to protocols to make them consistent while achieving high accuracy for a wide range of tasks, including frequencies of individual values, frequencies of the most frequent values, and frequencies of subsets of values. (7) Finally, we investigate a different model of LDP which is called the shuffler model. While users still use LDP algorithms to report their sensitive data, now there exists a semi-trusted shuffler that shuffles the users' reports and then sends them to the server. This model provides better utility but at the cost of requiring trust that the shuffler does not collude with the server. Computer System Security Data Structures Database Management Local Differential privacy
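One of the simplest LDP primitives of the kind mentioned in item (1) above is binary randomized response. The sketch below is an illustration under stated assumptions and is not taken from the dissertation: the function names, the simulated population, and the 30% true rate are hypothetical. Each user perturbs a private bit locally, and the server debiases the aggregate.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports, epsilon):
    """Debias the observed share of 1s to estimate the true share."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p), solved here for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical population in which 30% of users hold the sensitive bit 1.
rng = random.Random(42)
true_bits = [1] * 15000 + [0] * 35000
reports = [randomize_bit(bit, 1.0, rng) for bit in true_bits]
print(estimate_frequency(reports, 1.0))
```

The server never sees a raw bit, only the randomized reports, which is what distinguishes the local model from the curator model.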
10	Privacy Preserving in Online Social Network Data Sharing and Publication Gao, Tianchong 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Following the trend of online data sharing and publishing, researchers have raised concerns about privacy. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. For defenders, preserving the private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original one, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation starts from the fundamental problem of defining utility and privacy. It draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. For attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers’ prior knowledge and published data to catch the targets. This dissertation focuses on settings in which the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of a seed mapping method and the innovation of a generating-based attack method. 
To conclude, this dissertation studies the online data privacy problem from both defenders’ and attackers’ points of view and introduces privacy and utility enhancement mechanisms from several novel angles. Privacy protection Online social networks Differential privacy Anonymization De-anonymization
