11 |
Bucketization Techniques for Encrypted Databases: Quantifying the Impact of Query Distributions. Raybourn, Tracey, 06 May 2013.
No description available.
|
12 |
Managing the risks associated with IT security and data privacy in the software development industry : Challenges related to operational, financial, and reputational risks. Hintze, Elias; Lofterud, Lukas, January 2022.
This thesis examines how organisations within the IT software development industry manage risks associated with IT security and data privacy, in light of factors such as growing digitalisation and the Covid-19 pandemic. The research consists of four separate cases with interviewees in managerial positions at four different organisations, and presents the risks and challenges from an operational, financial, and reputational perspective. Several developments in existing threat methods are identified: the use of cryptocurrencies as a means to expose system vulnerabilities, and an increase in monitoring and surveillance, which brings considerations of follow-up and communication, along with the concept of moral hazard and its future implications. Furthermore, IT security organisations strive towards a risk tolerance approaching zero; as a result, discrepancies can occur between growth and risk. Compliance with data privacy requirements must also be considered as new legislation takes shape, while remaining attentive to changes in stakeholders' demands and expectations. The thesis contributes to the field of risk management and IT security by taking the new era of digitalisation into consideration, giving the field an updated outlook for the future as the importance of data privacy and IT security increases. It thereby provides information that can serve as guidelines for organisations in this rapidly developing global environment.
|
13 |
Data Cleaning with Minimal Information Disclosure. Gairola, Dhruv, 11 1900.
Businesses analyze large datasets in order to extract valuable insights from the data. Unfortunately, most real datasets contain errors that need to be corrected before any analysis. Businesses can utilize various data cleaning systems and algorithms to automate the correction of data errors. Many systems correct the data errors by using information present within the dirty dataset itself. Some also incorporate user feedback in order to validate the quality of the suggested data corrections. However, users are not always available for feedback. Hence, some systems rely on clean data sources to help with the data cleaning process. This involves comparing records between the dirty dataset and the clean dataset in order to detect high-quality fixes for the erroneous data. Every record in the dirty dataset is compared with every record in the clean dataset in order to find similar records, and the values of the records in the clean dataset can be used to correct the values of the erroneous records in the dirty dataset. Realistically, comparing records across two datasets may not be possible for privacy reasons. For example, there are laws that restrict the free movement of personal data. Additionally, different records within a dataset may have different privacy requirements. Existing data cleaning systems do not factor in these privacy requirements on the respective datasets, which motivates the need for privacy-aware data cleaning systems. In this thesis, we examine the role of privacy in the data cleaning process. We present a novel data cleaning framework that supports cooperation between the clean and the dirty datasets such that the clean dataset discloses a minimal amount of information and the dirty dataset uses this information to (maximally) clean its data. We investigate the tradeoff between information disclosure and data cleaning utility, modelling this tradeoff as a multi-objective optimization problem within our framework, and propose four optimization functions to solve it. Finally, we perform extensive experiments on datasets containing up to 3 million records, varying parameters such as the error rate of the dataset, the size of the dataset, and the number of constraints on the dataset, and measure the impact of these parameters on accuracy and performance. Our results demonstrate that disclosing a larger amount of information from the clean dataset helps clean the dirty dataset to a larger extent. We find that with 80% information disclosure (relative to the weighted optimization function), we are able to achieve a precision of 91% and a recall of 85%. We also compare our algorithms against each other to discover which ones produce better data repairs and which ones take longer to find repairs. We incorporate ideas from Barone et al. into our framework and show that our approach is 30% faster, but 7% worse in precision. We conclude that our data cleaning framework can be applied to real-world scenarios where controlling the amount of information disclosed is important. / Thesis / Master of Computer Science (MCS)
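To make the record-matching idea above concrete, here is a minimal, self-contained Python sketch of similarity-based repair under a disclosure budget. It is purely illustrative: the similarity measure, the budget accounting, and all names (repair, disclosure_budget, the sample records) are assumptions made for this example, not the framework or optimization functions proposed in the thesis.

# Illustrative sketch only: repair dirty records from a clean reference set while
# limiting how many attribute values the clean side discloses. Not the thesis's method.
from difflib import SequenceMatcher

def similarity(a, b):
    # Average character-level similarity over the attributes the two records share.
    keys = a.keys() & b.keys()
    return sum(SequenceMatcher(None, str(a[k]), str(b[k])).ratio() for k in keys) / len(keys)

def repair(dirty, clean, disclosure_budget, threshold=0.7):
    # Repair each dirty record from its most similar clean record, disclosing at most
    # `disclosure_budget` attribute values from the clean dataset in total.
    disclosed = 0
    repaired = []
    for rec in dirty:
        best = max(clean, key=lambda c: similarity(rec, c))
        fixed = dict(rec)
        if similarity(rec, best) >= threshold:
            for attr, val in best.items():
                if fixed.get(attr) != val and disclosed < disclosure_budget:
                    fixed[attr] = val      # take the clean value as the repair
                    disclosed += 1         # each disclosed value consumes budget
        repaired.append(fixed)
    return repaired, disclosed

clean = [{"name": "Alice Smith", "city": "Hamilton"}, {"name": "Bob Jones", "city": "Toronto"}]
dirty = [{"name": "Alcie Smith", "city": "Hamiltn"}, {"name": "Bob Jones", "city": "Tornto"}]
print(repair(dirty, clean, disclosure_budget=3))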
|
14 |
The Quantum Panopticon : A theory of surveillance for the quantum era. Olsson, Erik, January 2024.
This thesis examines how the race for quantum supremacy challenges current theoretical assumptions that underpin the data privacy literature. In pursuing this goal, the study examines the global surveillance infrastructure and introduces the concept of a quantum panopticon. As the traditional panopticon metaphor relies on a spatial dimension to understand surveillance, the quantum panopticon adds a temporal dimension, illustrating how a future watchman can look back on decrypted data. This theoretical contribution offers a new perspective on internet surveillance in the dawning quantum era. As such, the thesis brings the data preservation literature into dialogue with the cryptographic literature, while also connecting the ethical and political debate on data privacy with the more technical literature on encryption and surveillance.
|
15 |
Online Learning for Resource Allocation in Wireless Networks: Fairness, Communication Efficiency, and Data Privacy. Li, Fengjiao, 13 December 2022.
As the Next-Generation (NextG, 5G and beyond) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Note that resource allocation may require knowledge of network parameters (e.g., channel state information and available power level) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, who performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems that are motivated by autonomous resource management in NextG.
Key contributions of the dissertation are two-fold. First, we study reward maximization under uncertainty with fairness constraints, which is motivated by wireless scheduling with Quality of Service constraints (e.g., minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, which is motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We study both linear-parameterized and non-parametric global reward functions, which are modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that can be integrated with different differential privacy models. We show that the proposed algorithms can achieve near-optimal regret in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that our developed algorithms outperform the state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computation complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community. / Doctor of Philosophy / As the Next-Generation (NextG) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Note that resource allocation may require knowledge of network parameters (e.g., channel state information and available power level) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, who performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems that are motivated by resource allocation in the presence of uncertainty in NextG.
Key contributions of the dissertation are two-fold. First, we study reward maximization under uncertainty with fairness constraints, which is motivated by wireless scheduling with Quality of Service constraints (e.g., minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, which is motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We consider both linear-parameterized and non-parametric (unknown) global reward functions, which are modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that integrates different privacy models according to different privacy requirements or different scenarios. We show that the proposed algorithms can learn the unknown functions in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that our developed algorithms outperform the state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computation complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community.
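As a rough illustration of the first contribution (balancing reward maximization against per-arm fairness), the Python sketch below combines a standard UCB index with a virtual-queue term that tracks how far each arm has fallen behind a minimum selection rate. The construction, the function name fair_ucb, the queue update, and the trade-off weight of 10.0 are assumptions made for this example; this is not the dissertation's algorithm or its parameterization.

# Illustrative sketch only: UCB selection with a fairness "debt" queue per arm.
import numpy as np

def fair_ucb(n_arms, k, min_rates, means, horizon=5000, seed=0):
    # Select k of n_arms each round; each arm i should be chosen at rate >= min_rates[i].
    rng = np.random.default_rng(seed)
    pulls = np.zeros(n_arms)          # times each arm was selected
    rewards = np.zeros(n_arms)        # cumulative observed reward per arm
    queues = np.zeros(n_arms)         # virtual queues tracking fairness debt
    for t in range(1, horizon + 1):
        mean_hat = rewards / np.maximum(pulls, 1)
        bonus = np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
        bonus[pulls == 0] = np.inf    # force initial exploration
        # Combine fairness pressure (queue length) with the UCB index.
        score = queues + 10.0 * (mean_hat + bonus)
        chosen = np.argsort(score)[-k:]
        obs = rng.random(k) < means[chosen]   # Bernoulli rewards for chosen arms
        pulls[chosen] += 1
        rewards[chosen] += obs
        served = np.zeros(n_arms)
        served[chosen] = 1.0
        # Debt grows by the required rate, shrinks when the arm is actually selected.
        queues = np.maximum(queues + min_rates - served, 0.0)
    return pulls / horizon            # empirical selection rates

rates = fair_ucb(n_arms=6, k=2, min_rates=np.full(6, 0.2),
                 means=np.array([0.9, 0.8, 0.5, 0.4, 0.3, 0.2]))
print(rates)  # each entry should end up near or above the 0.2 minimum rate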
|
16 |
REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments. Desai, Humaid Ahmed Habibullah, 22 November 2023.
Federated Learning (FL) is a sub-domain of machine learning (ML) that enforces privacy by allowing the user's local data to reside on their device. Instead of having users send their personal data to a server where the model resides, FL flips the paradigm and brings the model to the user's device for training. Existing works share model parameters or use distillation principles to address the challenges of data heterogeneity. However, these methods ignore some of the other fundamental challenges in FL: device heterogeneity and communication efficiency. In practice, client devices in FL differ greatly in their computational power and communication resources. This is exacerbated by unbalanced data distribution, resulting in an overall increase in training times and the consumption of more bandwidth. In this work, we present a novel approach for resource-efficient FL called REFT, with variable pruning and knowledge distillation techniques to address the computational and communication challenges faced by resource-constrained devices.
Our variable pruning technique is designed to reduce computational overhead and increase resource utilization for clients by adapting the pruning process to their individual computational capabilities. Furthermore, to minimize bandwidth consumption and reduce the number of back-and-forth communications between the clients and the server, we leverage knowledge distillation to create an ensemble of client models and distill their collective knowledge to the server. Our experimental results on image classification tasks demonstrate the effectiveness of our approach in conducting FL in a resource-constrained environment. We achieve this by training Deep Neural Network (DNN) models while optimizing resource utilization at each client. Additionally, our method allows for minimal bandwidth consumption and a diverse range of client architectures while maintaining performance and data privacy. / Master of Science / In a world driven by data, preserving privacy while leveraging the power of machine learning (ML) is a critical challenge. Traditional approaches often require sharing personal data with central servers, raising concerns about data privacy. Federated Learning (FL), is a cutting-edge solution that turns this paradigm on its head. FL brings the machine learning model to your device, allowing it to learn from your data without ever leaving your device. While FL holds great promise, it faces its own set of challenges. Existing research has largely focused on making FL work with different types of data, but there are still other issues to be resolved. Our work introduces a novel approach called REFT that addresses two critical challenges in FL: making it work smoothly on devices with varying levels of computing power and reducing the amount of data that needs to be transferred during the learning process. Imagine your smartphone and your laptop. They all have different levels of computing power. REFT adapts the learning process to each device's capabilities using a proposed technique called Variable Pruning. Think of it as a personalized fitness trainer, tailoring the workout to your specific fitness level. Additionally, we've adopted a technique called knowledge distillation. It's like a student learning from a teacher, where the teacher shares only the most critical information. In our case, this reduces the amount of data that needs to be sent across the internet, saving bandwidth and making FL more efficient. Our experiments, which involved training machines to recognize images, demonstrate that REFT works well, even on devices with limited resources. It's a step forward in ensuring your data stays private while still making machine learning smarter and more accessible.
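The short Python sketch below illustrates the flavour of variable pruning described above: a client's pruning ratio is scaled to its relative compute budget, and the smallest-magnitude weights are zeroed out before the model is dispatched. The functions, the linear capability-to-ratio mapping, and the FLOPS figures are invented for illustration and are not REFT's actual implementation.

# Illustrative sketch only: capability-aware magnitude pruning per client.
import numpy as np

def pruning_ratio(client_flops, max_flops, min_ratio=0.0, max_ratio=0.8):
    # Weaker clients (fewer FLOPS) get a higher fraction of weights pruned.
    capability = client_flops / max_flops                 # 1.0 = strongest client
    return min_ratio + (1.0 - capability) * (max_ratio - min_ratio)

def prune_weights(weights, ratio):
    # Zero out the `ratio` fraction of weights with the smallest magnitude.
    flat = np.abs(weights).ravel()
    k = int(ratio * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

clients = {"phone": 1e9, "laptop": 5e9, "workstation": 1e10}   # hypothetical budgets
global_layer = np.random.randn(128, 64)
for name, flops in clients.items():
    r = pruning_ratio(flops, max_flops=max(clients.values()))
    sparse = prune_weights(global_layer, r)
    print(name, f"pruning ratio={r:.2f}", f"nonzero weights={np.count_nonzero(sparse)}")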
|
17 |
Influence Analysis Towards Big Social Data. Han, Meng, 03 May 2017.
Large-scale social data from online social networks, instant messaging applications, and wearable devices have seen exponential growth in the number of users and activities recently. The rapid proliferation of social data provides rich information and infinite possibilities for us to understand and analyze the complex inherent mechanism which governs the evolution of the new technology age. Influence, as a natural product of information diffusion (or propagation), represents the change in an individual's thoughts, attitudes, and behaviors resulting from interaction with others, and is one of the fundamental processes in social worlds. Therefore, influence analysis occupies a very prominent place in social data analysis, theory, models, and algorithms. In this dissertation, we study influence analysis under the scenario of big social data.

Firstly, we investigate the uncertainty of influence relationships in social networks. A novel sampling scheme is proposed which enables the development of an efficient algorithm to measure uncertainty. Considering the practicality of neighborhood relationships in real social data, a framework is introduced to transform uncertain networks into deterministic weighted networks, where the weight on an edge can be measured as a Jaccard-like index.

Secondly, focusing on the dynamics of social data, a practical framework is proposed that probes only partial communities to explore the real changes in a social network. Our probing framework minimizes the possible difference between the observed topology and the actual network through several representative communities. We also propose an algorithm that takes full advantage of our divide-and-conquer strategy, which reduces the computational overhead.

Thirdly, if we let the number of users who are influenced be the depth of propagation and the area covered by influenced users be the breadth, most existing research focuses only on influence depth rather than influence breadth. Timeliness, acceptance ratio, and breadth are three important factors that significantly affect the result of influence maximization in reality, but they are neglected by researchers most of the time. To fill this gap, a novel algorithm is investigated that incorporates time delay for timeliness, opportunistic selection for acceptance ratio, and broad diffusion for influence breadth. In our model, the breadth of influence is measured by the number of covered communities, and the tradeoff between depth and breadth of influence can be balanced by a specific parameter.

Furthermore, the problem of privacy-preserved influence maximization in both physical location networks and online social networks is addressed. We merge both the sensed location information collected from the cyber-physical world and the relationship information gathered from online social networks into a unified framework with a comprehensive model, and propose an efficient algorithm for the influence maximization problem. At the same time, a privacy-preserving mechanism is proposed to protect the cyber-physical location and link information from the application aspect.

Last but not least, to address the challenge of large-scale data, we take the lead in designing an efficient influence maximization framework based on two new models which incorporate the dynamism of networks and take the time constraints of the influence spreading process into consideration in practice.
All of the proposed influence analysis problems and models have been empirically studied and verified on diverse, large-scale, real-world social data in this dissertation.
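As a toy illustration of the first contribution above (turning an uncertain network into a deterministic weighted one), the Python sketch below assigns each edge a Jaccard-like weight computed from the overlap of its endpoints' neighbourhoods. The exact construction used in the dissertation may differ; everything here, including the sample graph, is assumed for illustration.

# Illustrative sketch only: Jaccard-like neighbourhood overlap as a deterministic edge weight.
from collections import defaultdict

def jaccard_weights(edges):
    # edges: iterable of (u, v) pairs. Returns {(u, v): weight}.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    weights = {}
    for u, v in edges:
        shared = neighbours[u] & neighbours[v]
        union = neighbours[u] | neighbours[v]
        weights[(u, v)] = len(shared) / len(union) if union else 0.0
    return weights

social_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
for edge, w in jaccard_weights(social_edges).items():
    print(edge, round(w, 2))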
|
18 |
An Empirical Investigation of the Relationship between Computer Self-Efficacy and Information Privacy Concerns. Awwal, Mohammad Abdul, 01 January 2011.
The Internet and the growth of Information Technology (IT) and their enhanced capabilities to collect personal information have given rise to many privacy issues. Unauthorized access to personal information may result in identity theft, stalking, harassment, and other invasions of privacy. Information privacy concerns are impediments to broad-scale adoption of the Internet for purchasing decisions. Computer self-efficacy has been shown to be an effective predictor of behavioral intention and a critical determinant of intention to use Information Technology. This study investigated the relationship between an individual's computer self-efficacy and information privacy concerns, and also examined the differences among different age groups and between genders regarding information privacy concerns and their relationships with computer self-efficacy.
A paper-based survey was designed to empirically assess computer self-efficacy and information privacy concerns. The survey was developed by combining existing validated scales for computer self-efficacy and information privacy concerns. The target population of this study was the residents of New Jersey, U.S.A. The assessment was done by using the mall-intercept approach in which individuals were asked to fill out the survey. The sample size for this study was 400 students, professionals, and mature adults.
The Shapiro-Wilk test was used for testing data normality and the Spearman rank-order test was used for correlation analyses. A MANOVA test was used for comparing mean values of computer self-efficacy and information privacy concerns between genders and among age groups. The results showed that the correlation between computer self-efficacy and information privacy concerns was significant and positive, and that there were differences between genders and among age groups regarding information privacy concerns and their relationships with computer self-efficacy.
This study contributed to the body of knowledge about the relationships among antecedents and consequences of information privacy concerns and computer self-efficacy. The findings of this study can help corporations to improve e-commerce by targeting privacy policy-making efforts to address the explicit areas of consumer privacy concerns. The results of this study can also help IT practitioners to develop privacy protection tools and processes to address specific consumer privacy concerns.
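For readers unfamiliar with the tests named in the method description above, the following Python sketch runs the same three analyses (Shapiro-Wilk normality test, Spearman rank-order correlation, and a MANOVA across groups) on synthetic data. The variable names, scales, and generated values are invented for illustration; they do not reproduce the study's survey data or results.

# Illustrative sketch only: the statistical pipeline on made-up data.
import numpy as np
import pandas as pd
from scipy.stats import shapiro, spearmanr
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "self_efficacy": rng.normal(3.5, 0.8, n),      # hypothetical computer self-efficacy score
    "privacy_concern": rng.normal(3.8, 0.7, n),    # hypothetical information privacy concern score
    "gender": rng.choice(["F", "M"], n),
    "age_group": rng.choice(["student", "professional", "mature"], n),
})

# Normality check (motivates a rank-based correlation if normality is violated).
print("Shapiro-Wilk:", shapiro(df["self_efficacy"]))

# Spearman rank-order correlation between the two constructs.
rho, p = spearmanr(df["self_efficacy"], df["privacy_concern"])
print(f"Spearman rho={rho:.3f}, p={p:.3f}")

# MANOVA: do the two dependent variables jointly differ by gender and age group?
mv = MANOVA.from_formula("self_efficacy + privacy_concern ~ gender + age_group", data=df)
print(mv.mv_test())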
|
19 |
Personalized Advertising: Examining the Consumer Attitudes of Generation Z Towards Data Privacy and Personalization : A study of consumer attitudes towards the commercial usage of personal data. Taneo Zander, Jennifer; Mirkovic, Anna-Maria, January 2019.
Background: The advancement of Internet technology and the ability of companies to process large amounts of information have made it possible for marketers to communicate with their customers through customized measures, namely personalized advertising. One of the primary aspects that differentiates personalized advertising from traditional advertising is the collection and use of consumers' personal information, which has presented marketers with numerous benefits and opportunities. However, this has also raised concerns among consumers regarding their privacy and the handling of their personal information. In this study, the attitudes of Generation Z are examined regarding data privacy, personalization, and the commercial usage of their personal information, as well as how these attitudes may impact consumer behavior.

Purpose: The purpose of this study is to examine the attitudes of consumers towards personalized advertising and the commercial usage of personal consumer data, with a focus on consumers belonging to Generation Z. Issues regarding data privacy and personalization are explored, as well as how consumer attitudes towards the personalization of advertisements may impact consumer behavior in the digital environment.

Method: A positivistic approach was applied with the intention of drawing conclusions about a population of people, namely Generation Z. A deductive approach was implemented to test an existing theory, the Theory of Planned Behavior (TPB), with the intention of examining whether Generation Z follows the trend found in the literature, namely that younger consumers (Millennials) are more positive towards personalized advertising and the sharing of personal data for commercial purposes than older generations. The empirical data was collected through a survey, which was later analyzed through statistical measures.

Conclusion: The results suggested a predominantly neutral attitude among the survey participants regarding personalized advertising and the sharing of personal data for commercial purposes. Moreover, a positive correlation between consumer attitudes and behavioral intention to interact with personalized advertisements was detected. However, the correlation was found to be rather weak, indicating that consumer attitudes are not necessarily the strongest predictor of behavioral intention among Generation Z consumers with regard to personalized advertising.
|
20 |
Towards an adaptive solution to data privacy protection in hierarchical wireless sensor networks. Al-Riyami, Ahmed, January 2016.
Hierarchical Wireless Sensor Networks (WSNs) are becoming attractive to many applications due to their energy efficiency and scalability. However, if such networks are deployed in a privacy-sensitive application context, such as home utility consumption, protecting data privacy becomes an essential requirement. Our threat analysis in such networks has revealed that PPDA (Privacy Preserving Data Aggregation), NIDA (Node ID Anonymity), and ENCD (Early Node Compromise Detection) are three essential properties for protecting data privacy. The scope of this thesis is protecting data privacy in hierarchical WSNs by addressing issues in relation to two of the three properties identified, i.e., NIDA and ENCD, effectively and efficiently. The effectiveness property is achieved by considering NIDA and ENCD in an integrated manner, and the efficiency property is achieved by using an adaptive approach to security provisioning. To this end, the thesis has made the following four novel contributions.

Firstly, this thesis presents a comprehensive analysis of the threats to data privacy and a literature review of the countermeasures proposed to address these threats. The analysis and literature review have led to the identification of two main areas for improvement: (1) to reduce the resources consumed as a result of protecting data privacy, and (2) to address the compatibility issue between NIDA and ENCD.

Secondly, a novel Adaptive Pseudonym Length Estimation (AdaptPLE) method has been proposed. The method allows the determination of a minimum acceptable length of the pseudonyms used in NIDA based on a given set of security and application-related requirements and constraints. In this way, we can balance the trade-off between an ID anonymity protection level and the costs (i.e., transmission and energy) incurred in achieving that protection level. To demonstrate its effectiveness, we have evaluated the method by applying it to two existing NIDA schemes, the Efficient Anonymous Communication (EAC) scheme and the Cryptographic Anonymous Scheme (CAS).

Thirdly, a novel Adaptive Early Node Compromise Detection (AdaptENCD) scheme for cluster-based WSNs has been proposed. This scheme allows earlier detection of compromised nodes, more effectively and efficiently than existing proposals. This is achieved by adjusting, at run-time, the transmission rate of heartbeat messages, used to detect nodes' aliveness, in response to the average message loss ratio in a cluster. This adaptive approach allows us to significantly reduce detection errors while keeping the number of transmitted heartbeat messages as low as possible, thus reducing transmission costs.

Fourthly, a novel Node ID Anonymity Preserving Scheme (ID-APS) for cluster-based WSNs has been proposed. ID-APS protects node ID anonymity while, at the same time, also allowing the global identification of nodes. This latter property supports the identification and removal of compromised nodes in the network, which is a significant improvement over the state-of-the-art solution, the CAS scheme. ID-APS supports both NIDA and ENCD by making hybrid use of dynamic and global identification pseudonyms. More importantly, ID-APS achieves these properties with lower overhead costs than CAS.

All proposed solutions have been analysed and evaluated comprehensively to prove their effectiveness and efficiency.
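To give one concrete flavour of what a pseudonym-length estimation might involve, the short Python sketch below computes the smallest bit-length for which the birthday-bound probability of any two of N node pseudonyms colliding stays below a target. This captures only one possible ingredient of such an estimation; the function, the bound, and the parameter values are assumptions for illustration and are not the AdaptPLE method itself.

# Illustrative sketch only: minimum pseudonym length under a collision-probability target.
import math

def min_pseudonym_bits(num_nodes, max_collision_prob):
    # Smallest L such that (N choose 2) / 2**L <= max_collision_prob (union/birthday bound).
    pairs = num_nodes * (num_nodes - 1) / 2
    return max(1, math.ceil(math.log2(pairs / max_collision_prob)))

for n in (50, 500, 5000):
    bits = min_pseudonym_bits(n, max_collision_prob=1e-6)
    print(f"{n} nodes -> at least {bits}-bit pseudonyms ({math.ceil(bits / 8)} bytes per ID field)")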
|