Global ETD Search

1	Introducing Differential Privacy Mechanisms for Mobile App Analytics of Dynamic Content Latif, Sufian January 2021 (has links) No description available. Computer Science
2	Building trustworthy machine learning systems in adversarial environments Wang, Ning 26 May 2023 (has links) Modern AI systems, particularly with the rise of big data and deep learning in the last decade, have greatly improved our daily life and at the same time created a long list of controversies. AI systems are often subject to malicious and stealthy subversion that jeopardizes their efficacy. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly boost the accuracy of machine learning models, they also create opportunities for adversaries to tamper with models or extract sensitive data. Malicious data providers can compromise machine learning systems by supplying false data and intermediate computation results. Even a well-trained model can be deceived into misbehaving by an adversary who provides carefully designed inputs. Furthermore, curious parties can derive sensitive information about the training data by interacting with a machine-learning model. These adversarial scenarios, known as poisoning attacks, adversarial example attacks, and inference attacks, have demonstrated that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. To address these problems, we proposed the following solutions: (1) FLARE, which detects and mitigates stealthy poisoning attacks by leveraging latent space representations; (2) MANDA, which detects adversarial examples by utilizing evaluations from diverse sources, i.e., model-based prediction and data-based evaluation; (3) FeCo, which enhances the robustness of machine learning-based network intrusion detection systems by introducing a novel representation learning method; and (4) DP-FedMeta, which preserves data privacy and improves the privacy-accuracy trade-off in machine learning systems through a novel adaptive clipping mechanism.
/ Doctor of Philosophy / Over the past few decades, machine learning (ML) has become increasingly popular for enhancing efficiency and effectiveness in data analytics and decision-making. Notable applications include intelligent transportation, smart healthcare, natural language generation, intrusion detection, etc. While machine learning methods are often employed for beneficial purposes, they can also be exploited with malicious intent. Well-trained language models have demonstrated generalizability deficiencies and intrinsic biases; generative ML models used for creating art have been repurposed by fraudsters to produce deepfakes; and facial recognition models trained on big data have been found to leak sensitive information about data owners. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly improve the accuracy of ML models, they also enable adversaries to corrupt models and infer sensitive data. This leads to various adversarial attacks, such as model poisoning during training, adversarially crafted data in testing, and data inference. It is evident that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. This research focuses on building trustworthy machine-learning systems in adversarial environments from a data perspective. It encompasses two themes: securing ML systems against security or privacy vulnerabilities (security of AI) and using ML as a tool to develop novel security solutions (AI for security). For the first theme, we studied adversarial attack detection in both the training and testing phases and proposed FLARE and MANDA to secure machine learning systems in the two phases, respectively. Additionally, we proposed a privacy-preserving learning system, dpfed, to defend against privacy inference attacks.
We achieved a good trade-off between accuracy and privacy by proposing an adaptive data clipping and perturbing method. In the second theme, the research is focused on enhancing the robustness of intrusion detection systems through data representation learning. adversarial machine learning anomaly detection differential privacy
3	Implementing Differential Privacy for Privacy Preserving Trajectory Data Publication in Large-Scale Wireless Networks Stroud, Caleb Zachary 14 August 2018 (has links) Wireless networks collect vast amounts of log data concerning usage of the network. This data aids in informing operational needs related to performance, maintenance, etc., but it is also useful for outside researchers in analyzing network operation and user trends. Releasing such information to these outside researchers poses a threat to the privacy of users. The dueling needs for utility and privacy must both be addressed. This thesis studies differential privacy as a way to release high utility data to researchers while maintaining user privacy. The focus is specifically on physical user trajectories in authentication manager log data, since this is a rich type of data that is useful for trend analysis. Authentication manager log data is produced when devices connect to physical access points (APs), and trajectories are sequences of these spatiotemporal connections from one AP to another for the same device. This goal is pursued with a variable length n-gram model that creates a synthetic database which can be easily ingested by researchers. We found that the chosen algorithm has shortcomings when applied to the chosen data, but differential privacy itself can still be used to release sanitized datasets while maintaining utility if the data has low sparsity. / Master of Science / Wireless internet networks store historical logs of user device interactions with them. For example, when a phone or other wireless device connects, data is stored by the Internet Service Provider (ISP) about the device, username, time, and location of connection. A database of this type of data can help researchers analyze user trends in the network, but the data contains personally identifiable information for the users.
We propose and analyze an algorithm which can release this data in a high utility manner for the researchers, yet maintain user privacy. This is based on a verifiable approach to privacy called differential privacy. This algorithm is found to provide utility and privacy protection for datasets with many users compared to the size of the network. e-differential privacy differential privacy PPTDP privacy preserving data publication PPDP
4	Data-level privacy through data perturbation in distributed multi-application environments de Souza, Tulio January 2016 (has links) Wireless sensor networks used to have a main role as a monitoring tool for environmental purposes and animal tracking. This spectrum of applications, however, has dramatically grown in the past few years. Such evolution means that what used to be application-specific networks are now multi-application environments, often with federation capabilities. This shift results in a challenging environment for data privacy, mainly caused by the broadening of the spectrum of data access points and involved entities. This thesis first evaluates existing privacy preserving data aggregation techniques to determine how suitable they are for providing data privacy in this more elaborate environment. This evaluation led to the design of the set difference attack, which exploits the fact that these techniques all rely purely on data aggregation to achieve privacy, an approach shown through simulation to be unsuitable for the task. It also indicates that some form of uncertainty is required in order to mitigate the attack. Another relevant finding is that the attack can also be effective against standalone networks, by exploiting the node availability factor. Uncertainty is achieved via the use of differential privacy, which offers a strong and formal privacy guarantee through data perturbation. To make it suitable for a wireless sensor network environment, which mainly deals with time-series data, two new approaches are proposed. These have contrasting effects on utility and privacy levels, offering a flexible balance between privacy and data utility for sensed entities and data analysts/consumers. Lastly, this thesis proposes a framework to assist in the design of privacy preserving data aggregation protocols to suit application needs while at the same time complying with desired privacy requirements.
The framework's evaluation compares and contrasts several scenarios to demonstrate the level of flexibility and effectiveness that the designed protocols can provide. Overall, this thesis demonstrates that data perturbation can be made significantly practical through the proposed framework. Although some problems remain, with further improvements to data correlation methods and better use of some intrinsic characteristics of such networks, the use of data perturbation may become a practical and efficient privacy preserving mechanism for wireless sensor networks. 004
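The weakness of aggregation-only privacy described in entry 4 can be illustrated with a minimal sketch. It is an assumption-laden toy, not the thesis's formulation of the set difference attack: the node names, the readings, and the helper functions `aggregate_sum` and `set_difference_attack` are all hypothetical. It shows only the general idea that two exact aggregates over node sets differing by one node reveal that node's reading.

```python
def aggregate_sum(readings: dict[str, float], members: set[str]) -> float:
    """Exact (unperturbed) sum an aggregator would publish for a set of nodes."""
    return sum(readings[node] for node in members)


def set_difference_attack(
    sum_a: float, members_a: set[str], sum_b: float, members_b: set[str]
) -> tuple[str, float]:
    """Recover one node's reading from two aggregates whose sets differ by one node."""
    extra = members_a - members_b
    if not members_b <= members_a or len(extra) != 1:
        raise ValueError("sets must differ by exactly one node")
    (target,) = extra
    return target, sum_a - sum_b


if __name__ == "__main__":
    # Hypothetical sensor readings; node n3 drops out of the second aggregate.
    readings = {"n1": 20.5, "n2": 21.0, "n3": 19.25}
    all_nodes = {"n1", "n2", "n3"}
    without_n3 = {"n1", "n2"}
    leaked = set_difference_attack(
        aggregate_sum(readings, all_nodes), all_nodes,
        aggregate_sum(readings, without_n3), without_n3,
    )
    print(leaked)
```

Adding calibrated noise to each published aggregate, as differential privacy does, is what makes the subtraction above uninformative.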
5	Fundamental Limits in Data Privacy: From Privacy Measures to Economic Foundations January 2016 (has links) abstract: Data privacy is emerging as one of the most serious concerns of big data analytics, particularly with the growing use of personal data and the ever-improving capability of data analysis. This dissertation first investigates the relation between different privacy notions, and then puts the main focus on developing economic foundations for a market model of trading private data. The first part characterizes differential privacy, identifiability and mutual-information privacy by their privacy--distortion functions, which is the optimal achievable privacy level as a function of the maximum allowable distortion. The results show that these notions are fundamentally related and exhibit certain consistency: (1) The gap between the privacy--distortion functions of identifiability and differential privacy is upper bounded by a constant determined by the prior. (2) Identifiability and mutual-information privacy share the same optimal mechanism. (3) The mutual-information optimal mechanism satisfies differential privacy with a level at most a constant away from the optimal level. The second part studies a market model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The value of epsilon units of privacy is measured by the minimum payment such that an individual's equilibrium strategy is to report data in an epsilon-differentially private manner. For the setting with binary private data that represents individuals' knowledge about a common underlying state, asymptotically tight lower and upper bounds on the value of privacy are established as the number of individuals becomes large, and the payment--accuracy tradeoff for learning the state is obtained. 
The lower bound assures the impossibility of using lower payment to buy epsilon units of privacy, and the upper bound is given by a designed reward mechanism. When the individuals' valuations of privacy are unknown to the data collector, mechanisms with possible negative payments (aiming to penalize individuals with "unacceptably" high privacy valuations) are designed to fulfill the accuracy goal and drive the total payment to zero. For the setting with binary private data following a general joint probability distribution with some symmetry, asymptotically optimal mechanisms are designed in the high data quality regime. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016 Electrical engineering Data privacy Differential privacy Economic foundations Mechanism design
6	Privacy in Complex Sample Based Surveys Shawn A Merrill (11806802) 20 December 2021 (has links) In the last few decades, there has been a dramatic uptick in the issues related to protecting user privacy in released data, both in statistical databases and anonymized records. Privacy-preserving data publishing is a field established to handle these releases while avoiding the problems that plagued many earlier attempts. This issue is of particular importance for governmental data, where both the release and the privacy requirements are frequently governed by legislation (e.g., HIPAA, FERPA, Clery Act). This problem is doubly compounded by the complex survey methods employed to counter problems in data collection. The preeminent definition for privacy is that of differential privacy, which protects users by limiting the impact that any individual can have on the result of any query. The thesis proposes models for differentially private versions of current survey methodologies and discusses the evaluation of those models. We focus on the issues of missing data and weighting, which are common techniques employed in complex surveys to counter problems with sampling and response rates. First, we propose a model for answering queries on datasets with missing data while maintaining differential privacy. Our model uses k-Nearest Neighbor imputation to replicate donor values while protecting the privacy of the donor. Our model provides significantly better bias reduction in realistic experiments using existing data, as well as providing less noise than a naive solution. Our second model proposes a method of performing Iterative Proportional Fitting (IPF) in a differentially private manner, a common technique used to ensure that survey records are weighted consistently with known values. We also focus on the general philosophical need to incorporate privacy when creating new survey methodologies, rather than assuming that privacy can simply be added at a later step.
Theoretical Computer Science Differential Privacy Privacy Privacy preserving data publishing
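The idea in entry 6 that differential privacy limits the impact any individual can have on a query result can be sketched with the standard Laplace mechanism applied to a counting query. This is a generic textbook illustration, not the models proposed in the thesis; the function names `laplace_noise` and `private_count` and the sample data are assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity of this query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 58, 62, 19, 77]  # hypothetical survey responses
    print(private_count(ages, lambda age: age > 50, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy.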
7	Analyzing Sensitive Data with Local Differential Privacy Tianhao Wang (10711713) 30 April 2021 (has links) Vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to effectively extract knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of privacy-preserving data collection and analysis. We focus on a technique called local differential privacy (LDP) and study several aspects of it. In particular, the thesis serves as a comprehensive study of multiple aspects of the LDP field. We investigate the following seven problems: (1) We study LDP primitives, i.e., the basic mechanisms that are used to build LDP protocols. (2) We then study the problem that arises when the domain size is very large (e.g., larger than $2^{32}$), where finding the values with high frequency is a challenge, because one needs to enumerate through all values. (3) Another interesting setting is when each user possesses a set of values, instead of a single private value. (4) With the basic problems addressed, we then aim to make the LDP protocols practical for real-world scenarios. We investigate the case where each user's data is high-dimensional (e.g., in the census survey, each user has multiple questions to answer), and the goal is to recover the joint distribution among the attributes. (5) We also build a system for companies to issue SQL queries over the data protected under LDP, where each user is associated with some public weights and holds some private values; an LDP version of the values is sent to the server from each user.
(6) To further increase the accuracy of LDP, we study how to add post-processing steps to protocols to make them consistent while achieving high accuracy for a wide range of tasks, including frequencies of individual values, frequencies of the most frequent values, and frequencies of subsets of values. (7) Finally, we investigate a different model of LDP called the shuffler model. While users still use LDP algorithms to report their sensitive data, there is now a semi-trusted shuffler that shuffles the users' reports and then sends them to the server. This model provides better utility, but at the cost of requiring trust that the shuffler does not collude with the server. Computer System Security Data Structures Database Management Local Differential privacy
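As a rough illustration of what an LDP primitive from problem (1) of entry 7 looks like, the sketch below implements binary randomized response, a classic mechanism in which each user perturbs their own bit before reporting it. It is not taken from the dissertation; the function names `randomize_bit` and `estimate_frequency` and the simulated population are assumptions made for the example.

```python
import math
import random


def randomize_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports: list[int], epsilon: float) -> float:
    """Unbiased server-side estimate of the fraction of users whose true bit is 1."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + f * (2 * p_truth - 1); solve for f.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    true_bits = [1] * 15000 + [0] * 35000  # hypothetical population, 30% ones
    reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
    print(estimate_frequency(reports, epsilon=1.0))
```

The server never sees a true bit, yet the aggregate estimate converges on the real frequency as the number of users grows.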
8	Privacy Preserving in Online Social Network Data Sharing and Publication Gao, Tianchong 12 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Following the trend of online data sharing and publishing, researchers have raised concerns about privacy. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. For defenders, preserving private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original data, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation starts from the fundamental problem of defining utility and privacy. It then draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. For attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers’ prior knowledge and published data to identify the targets. This dissertation focuses on the problems that arise when the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of a seed mapping method and the development of a generating-based attack method.
To conclude, this dissertation studies the online data privacy problem from both defenders’ and attackers’ point of view and introduces privacy and utility enhancement mechanisms in different novel angles. Privacy protection Online social networks Differential privacy Anonymization De-anonymization
9	Privacy Preserving Kin Genomic Data Publishing Shang, Hui 16 July 2020 (has links) No description available. Computer Science Kin genomic data Differential privacy Factor graph
10	On the Sample Complexity of Privately Learning Gaussians and their Mixtures / Privately Learning Gaussians and their Mixtures Aden-Ali, Ishaq January 2021 (has links) Multivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be near-optimal in the general case. From a technical standpoint, we provide analytic tools for arguing the existence of global "locally small" covers from local covers of the space. These are exploited using modifications of recent techniques for differentially private hypothesis selection. Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We provide the first sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then we can privately learn mixtures of distributions in F. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable. / Thesis / Master of Science (MSc) / Is it possible to estimate an unknown probability distribution given random samples from it?
This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals and well-known solutions may inadvertently result in the leakage of private information. In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis.
