• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 63
  • 3
  • Tagged with
  • 85
  • 85
  • 36
  • 35
  • 29
  • 26
  • 23
  • 20
  • 19
  • 18
  • 15
  • 14
  • 14
  • 13
  • 13
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

DIFFERENTIAL PRIVACY IN DISTRIBUTED SETTINGS

Zitao Li (14135316) 18 November 2022 (has links)
<p>Data is considered the "new oil" in the information society and digital economy. While many commercial activities and government decisions are based on data, the public raises more concerns about privacy leakage when their private data are collected and used. In this dissertation, we investigate the privacy risks in settings where the data are distributed across multiple data holders, and there is only an untrusted central server. We provide solutions for several problems under this setting with a security notion called differential privacy (DP). Our solutions can guarantee that there is only limited and controllable privacy leakage from the data holder, while the utility of the final results, such as model prediction accuracy, can be still comparable to the ones of the non-private algorithms.</p> <p><br></p> <p>First, we investigate the problem of estimating the distribution over a numerical domain while satisfying local differential privacy (LDP). Our protocol prevents privacy leakage in the data collection phase, in which an untrusted data aggregator (or a server) wants to learn the distribution of private numerical data among all users. The protocol consists of 1) a new reporting mechanism called the square wave (SW) mechanism, which randomizes the user inputs before sharing them with the aggregator; 2) an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions.</p> <p><br></p> <p>Second, we study the matrix factorization problem in three federated learning settings with an untrusted server, i.e., vertical, horizontal, and local federated learning settings. We propose a generic algorithmic framework for solving the problem in all three settings. We introduce how to adapt the algorithm into differentially private versions to prevent privacy leakage in the training and publishing stages.</p> <p><br></p> <p>Finally, we propose an algorithm for solving the k-means clustering problem in vertical federated learning (VFL). A big challenge in VFL is the lack of a global view of each data point. To overcome this challenge, we propose a lightweight and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin (FM) sketch to convey the weight information of the synopsis points. We provide theoretical utility analysis for the cardinality estimation algorithm and further refine it for better empirical performance.</p>
12

Learning with constraints on processing and supervision

Acar, Durmuş Alp Emre 30 August 2023 (has links)
Collecting a sufficient amount of data and centralizing them are both costly and privacy-concerning operations. These practical concerns arise due to the communication costs between data collecting devices and data being personal such as text messages of an end user. The goal is to train generalizable machine learning models with constraints on data without sharing or transferring the data. In this thesis, we will present solutions to several aspects of learning with data constraints, such as processing and supervision. We focus on federated learning, online learning, and learning generalizable representations and provide setting-specific training recipes. In the first scenario, we tackle a federated learning problem where data is decentralized through different users and should not be centralized. Traditional approaches either ignore the heterogeneity problem or increase communication costs to handle it. Our solution carefully addresses the heterogeneity issue of user data by imposing a dynamic regularizer that adapts to the heterogeneity of each user without extra transmission costs. Theoretically, we establish convergence guarantees. We extend our ideas to personalized federated learning, where the model is customized to each end user, and heterogeneous federated learning, where users support different model architectures. As a next scenario, we consider online meta-learning, where there is only one user, and the data distribution of the user changes over time. The goal is to adapt new data distributions with very few labeled data from each distribution. A naive way is to store data from different distributions to train a model from scratch with sufficient data. Our solution efficiently summarizes the information from each task data so that the memory footprint does not scale with the number of tasks. Lastly, we aim to train generalizable representations given a dataset. We consider a setting where we have access to a powerful teacher (more complex) model. Traditional methods do not distinguish points and force the model to learn all the information from the powerful model. Our proposed method focuses on the learnable input space and carefully distills attainable information from the teacher model by discarding the over-capacity information. We compare our methods with state-of-the-art methods in each setup and show significant performance improvements. Finally, we discuss potential directions for future work.
13

Attack Strategies in Federated Learning for Regression Models : A Comparative Analysis with Classification Models

Leksell, Sofia January 2024 (has links)
Federated Learning (FL) has emerged as a promising approach for decentralized model training across multiple devices, while still preserving data privacy. Previous research has predominantly concentrated on classification tasks in FL settings, leaving  a noticeable gap in FL research specifically for regression models. This thesis addresses this gap by examining the vulnerabilities of Deep Neural Network (DNN) regression models within FL, with a specific emphasis on adversarial attacks. The primary objective is to examine the impact on model performance of two distinct adversarial attacks-output-flipping and random weights attacks. The investigation involves training FL models on three distinct data sets, engaging eight clients in the training process. The study varies the presence of malicious clients to understand how adversarial attacks influence model performance.  Results indicate that the output-flipping attack significantly decreases the model performance with involvement of at least two malicious clients. Meanwhile, the random weights attack demonstrates a substantial decrease even with just one malicious client out of the eight. It is crucial to note that this study's focus is on a theoretical level and does not explicitly account for real-world settings such as non-identically distributed (non-IID) settings,  extensive data sets, and a larger number of clients. In conclusion, this study contributes to the understanding of adversarial attacks in FL, specifically focusing on DNN regression models. The results highlights the importance of defending FL models against adversarial attacks, emphasizing the significance of future research in this domain.
14

Attack Strategies in Federated Learning for Regression Models : A Comparative Analysis with Classification Models

Leksell, Sofia January 2024 (has links)
Federated Learning (FL) has emerged as a promising approach for decentralized model training across multiple devices, while still preserving data privacy. Previous research has predominantly concentrated on classification tasks in FL settings, leaving  a noticeable gap in FL research specifically for regression models. This thesis addresses this gap by examining the vulnerabilities of Deep Neural Network (DNN) regression models within FL, with a specific emphasis on adversarial attacks. The primary objective is to examine the impact on model performance of two distinct adversarial attacks-output-flipping and random weights attacks. The investigation involves training FL models on three distinct data sets, engaging eight clients in the training process. The study varies the presence of malicious clients to understand how adversarial attacks influence model performance.  Results indicate that the output-flipping attack significantly decreases the model performance with involvement of at least two malicious clients. Meanwhile, the random weights attack demonstrates a substantial decrease even with just one malicious client out of the eight. It is crucial to note that this study's focus is on a theoretical level and does not explicitly account for real-world settings such as non-identically distributed (non-IID) settings,  extensive data sets, and a larger number of clients. In conclusion, this study contributes to the understanding of adversarial attacks in FL, specifically focusing on DNN regression models. The results highlights the importance of defending FL models against adversarial attacks, emphasizing the significance of future research in this domain.
15

Distributed Architectures for Enhancing Artificial Intelligence of Things Systems. A Cloud Collaborative Model

Elouali, Aya 23 November 2023 (has links)
In today’s world, IoT systems are more and more overwhelming. All electronic devices are becoming connected. From lamps and refrigerators in smart homes, smoke detectors and cameras in monitoring systems, to scales and thermometers in healthcare systems, until phones, cars and watches in smart cities. All these connected devices generate a huge amount of data collected from the environment. To take advantage of these data, a processing phase is needed in order to extract useful information, allowing the best management of the system. Since most objects in IoT systems are resource limited, the processing step, usually performed by an artificial intelligence model, is offloaded to a more powerful machine such as the cloud server in order to benefit from its high storage and processing capacities. However, the cloud server is geographically remote from the connected device, which leads to a long communication delay and harms the effectiveness of the system. Moreover, due to the incredibly increasing number of IoT devices and therefore offloading operations, the load on the network has increased significantly. In order to benefit from the advantages of cloud based AIoT systems, we seek to minimize its shortcomings. In this thesis, we design a distributed architecture that allows combining these three domains while reducing latency and bandwidth consumption as well as the IoT device’s energy and resource consumption. Experiments conducted on different cloud based AIoT systems showed that the designed architecture is capable of reducing up to 80% of the transmitted data. / En el mundo actual, los sistemas de IoT (Internet de las cosas) son cada vez más abrumadores. Todos los dispositivos electrónicos se están conectando entre sí. Desde lámparas y refrigeradores en hogares inteligentes, detectores de humo y cámaras para sistemas de monitoreo, hasta básculas y termómetros para sistemas de atención médica, pasando por teléfonos, automóviles y relojes en ciudades inteligentes. Todos estos dispositivos conectados generan una enorme cantidad de datos recopilados del entorno. Para aprovechar estos datos, es necesario un proceso de análisis para extraer información útil que permita una gestión óptima del sistema. Dado que la mayoría de los objetos en los sistemas de IoT tienen recursos limitados, la etapa de procesamiento, generalmente realizada por un modelo de inteligencia artificial, se traslada a una máquina más potente, como el servidor en la nube, para beneficiarse de su alta capacidad de almacenamiento y procesamiento. Sin embargo, el servidor en la nube está geográficamente alejado del dispositivo conectado, lo que conduce a una larga demora en la comunicación y perjudica la eficacia del sistema. Además, debido al increíble aumento en el número de dispositivos de IoT y, por lo tanto, de las operaciones de transferencia de datos, la carga en la red ha aumentado significativamente. Con el fin de aprovechar las ventajas de los sistemas de AIoT (Inteligencia Artificial en el IoT) basados en la nube, buscamos minimizar sus desventajas. En esta tesis, hemos diseñado una arquitectura distribuida que permite combinar estos tres dominios al tiempo que reduce la latencia y el consumo de ancho de banda, así como el consumo de energía y recursos del dispositivo IoT. Los experimentos realizados en diferentes sistemas de AIoT basados en la nube mostraron que la arquitectura diseñada es capaz de reducir hasta un 80% de los datos transmitidos.
16

REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments

Desai, Humaid Ahmed Habibullah 22 November 2023 (has links)
Federated Learning (FL) is a sub-domain of machine learning (ML) that enforces privacy by allowing the user's local data to reside on their device. Instead of having users send their personal data to a server where the model resides, FL flips the paradigm and brings the model to the user's device for training. Existing works share model parameters or use distillation principles to address the challenges of data heterogeneity. However, these methods ignore some of the other fundamental challenges in FL: device heterogeneity and communication efficiency. In practice, client devices in FL differ greatly in their computational power and communication resources. This is exacerbated by unbalanced data distribution, resulting in an overall increase in training times and the consumption of more bandwidth. In this work, we present a novel approach for resource-efficient FL called emph{REFT} with variable pruning and knowledge distillation techniques to address the computational and communication challenges faced by resource-constrained devices. Our variable pruning technique is designed to reduce computational overhead and increase resource utilization for clients by adapting the pruning process to their individual computational capabilities. Furthermore, to minimize bandwidth consumption and reduce the number of back-and-forth communications between the clients and the server, we leverage knowledge distillation to create an ensemble of client models and distill their collective knowledge to the server. Our experimental results on image classification tasks demonstrate the effectiveness of our approach in conducting FL in a resource-constrained environment. We achieve this by training Deep Neural Network (DNN) models while optimizing resource utilization at each client. Additionally, our method allows for minimal bandwidth consumption and a diverse range of client architectures while maintaining performance and data privacy. / Master of Science / In a world driven by data, preserving privacy while leveraging the power of machine learning (ML) is a critical challenge. Traditional approaches often require sharing personal data with central servers, raising concerns about data privacy. Federated Learning (FL), is a cutting-edge solution that turns this paradigm on its head. FL brings the machine learning model to your device, allowing it to learn from your data without ever leaving your device. While FL holds great promise, it faces its own set of challenges. Existing research has largely focused on making FL work with different types of data, but there are still other issues to be resolved. Our work introduces a novel approach called REFT that addresses two critical challenges in FL: making it work smoothly on devices with varying levels of computing power and reducing the amount of data that needs to be transferred during the learning process. Imagine your smartphone and your laptop. They all have different levels of computing power. REFT adapts the learning process to each device's capabilities using a proposed technique called Variable Pruning. Think of it as a personalized fitness trainer, tailoring the workout to your specific fitness level. Additionally, we've adopted a technique called knowledge distillation. It's like a student learning from a teacher, where the teacher shares only the most critical information. In our case, this reduces the amount of data that needs to be sent across the internet, saving bandwidth and making FL more efficient. Our experiments, which involved training machines to recognize images, demonstrate that REFT works well, even on devices with limited resources. It's a step forward in ensuring your data stays private while still making machine learning smarter and more accessible.
17

Decentralized machine learning on massive heterogeneous datasets : A thesis about vertical federated learning

Lundberg, Oskar January 2021 (has links)
The need for a method to create a collaborative machine learning model which can utilize data from different clients, each with privacy constraints, has recently emerged. This is due to privacy restrictions, such as General Data Protection Regulation, together with the fact that machine learning models in general needs large size data to perform well. Google introduced federated learning in 2016 with the aim to address this problem. Federated learning can further be divided into horizontal and vertical federated learning, depending on how the data is structured at the different clients. Vertical federated learning is applicable when many different features is obtained on distributed computation nodes, where they can not be shared in between. The aim of this thesis is to identify the current state of the art methods in vertical federated learning, implement the most interesting ones and compare the results in order to draw conclusions of the benefits and drawbacks of the different methods. From the results of the experiments, a method called FedBCD shows very promising results where it achieves massive improvements in the number of communication rounds needed for convergence, at the cost of more computations at the clients. A comparison between synchronous and asynchronous approaches shows slightly better results for the synchronous approach in scenarios with no delay. Delay refers to slower performance in one of the workers, either due to lower computational resources or due to communication issues. In scenarios where an artificial delay is implemented, the asynchronous approach shows superior results due to its ability to continue training in the case of delays in one or several of the clients.
18

Differentially Private Random Forests for Network Intrusion Detection in a Federated Learning Setting

Frid, Alexander January 2023 (has links)
För varje dag som går möter stora industrier en ökad mängd intrång i sina IT-system. De flesta befintliga verktyg som använder sig utav maskininlärning är starkt beroende av stora mängder data, vilket innebär risker under dataöverföringen. Därför har syftet med denna studie varit att undersöka om en decentraliserad integritetsbevarande strategi kan vara ett bra alternativ för att minska effektiviteten av dessa attacker. Mer specifikt skulle användningen av Random Forests, en av de mest populära algoritmerna för maskininlärning, kunna utökas med decentraliseringstekniken Federated Learning tisammans med Differential Privacy, för att skapa en ideal metod för att upptäcka nätverksintrång? Med hjälp av befintliga kodbibliotek för maskininlärnings och verklighetsbaserad data har detta projekt konstruerat olika modeller för att simulera hur väl olika decentraliserade och integritetsbevarande modeller kan jämföras med traditionella alternativ. De skapade modellerna innehåller antingen Federated Learning, Differential Privacy eller en kombination av båda. Huvuduppgiften för dessa modeller är att förbättra integriteten och samtidigt minimera minskningen av precision. Resultaten indikerar att båda teknikerna kommer med en liten minskning i noggrannhet jämfört med traditionella alternativ. Huruvida precisionsförlusten är acceptabel eller beror på det specifika användningsområdet. Det utvecklade kombinerade alternativet lyckades dock inte nå acceptabel precision vilket hindrar oss från att dra några slutsatser. / With each passing day, large industries face an increasing amount of intrusions into their IT environments. Most existing machine learning countermeasures heavily rely on large amounts of data which introduces risk during the data-transmission. Therefore, the objective of this study has been to investigate whether a decentralized privacy-preserving approach could be a sensible alternative to decrease the effectiveness of these attacks. More specifically could the use of Random Forests, one of the most popular machine learning algorithms, be extended using the decentralization technique Federated Learning in cooperation with Differential Privacy, in order to create an ideal approach for network intrusion detection? With the assistance of existing machine learning code-libraries and real-life data, this thesis has constructed various experimental models to simulates how well different decentralized and privacy-preserving approaches compare to traditional ones. The models created incorporate either Federated Learning, Differential Privacy or a combination of both. The main task of these models is to enhance privacy while minimizing the decrease in accuracy. The results indicate that both techniques comes with a small decrease in accuracy compared to traditional alternatives. whether the accuracy loss is acceptable or not may depend on the specific scenario. The developed combined approach however, failed to reach acceptable accuracy which prevents us from drawing any conclusions.
19

Federated Learning for Market Surveillance / Federerat Lärande för Marknadsövervakning

Song, Philip January 2022 (has links)
The increasing complexity of trading strategies, when combined with machine learning models, forces market surveillance corporations to develop increasingly sophisticated methods for recognizing potential misuse. One strategy is to employ traders’ weapons against themselves, namely machine learning. However, the data utilized in market surveillance is highly sensitive, what may be available for machine learning is limited. In this thesis, we examine how federated learning for time series data can be used to identify potential market abuse while maintaining client privacy and data security. We are interested in developing a time-series-specific neural network employing federated learning. We demonstrate that when this strategy is used, the performance of detecting potential market abuse is comparable to that of the standard data centralized approach. Specifically, a non-federated model, a federated model, and a federated model with extra data privacy and security protection are evaluated and compared. Each model utilize an LSTM autoencoder to identify market abuse. The results demonstrate that a federated model’s performance in detecting possible market abuse is comparable to that of a non-federated model. Moreover, a federated approach with extra data privacy and security experienced a slight performance loss but is still a competitive model in comparison to the other models. Although this approach results in increased privacy and security, there is a limit to how much privacy and security can be ensured, as excessive privacy led to extremely poor performance. Federated learning offers the ability to increase data privacy and security with little performance decrease. / Den ökande komplexiteten handelsstrategier, i kombination med maskininlärning modeller, tvingar marknadsövervakning företag att utveckla allt mer sofistikerade metoder för att identifiera potentiellt marknadsmissbruk. En strategi är att använda handlarnas vapen mot sig själva, nämligen maskininlärning. Däremot, data som används inom marknadsövervakning är mycket känslig och vad som kan finnas tillgängligt för maskininlärning är begränsat.I den här studien undersöker vi hur federerat lärande för tidsseriedata kan användas till att identifiera potentiellt marknadsmissbruk samtidigt som klienternas integritet och datasäkerhet bibehålls. Vi är intresserade av att utveckla ett tidsserie-specifikt neuralt nätverk med hjälp av federated inlärning. Vi visar att när denna strategi används är prestanda för att upptäcka potentiellt marknadsmissbruk jämförbart med det för den vanliga data-centraliserade metoden. Specifikt, en icke-federerad modell, en federerad modell och en federerad modell med extra dataintegritet och säkerhet utvärderas och jämförs. Varje modell använder en LSTM-Autoencoder för att identifiera marknadsmissbruk. Resultaten visar att en federerad modells prestanda när det gäller att upptäcka eventuellt marknadsmissbruk är jämförbar med en icke-federerad modell. Dessutom, ett federerat tillvägagångssätt med extra dataintegritet upplevde en liten prestandaförlust men är fortfarande en konkurrenskraftig modell i jämförelse med andra modeller. Även om detta tillvägagångssätt resulterar i ökad integritet och säkerhet, finns det en gräns för hur mycket som kan säkerställas. Federated learning möjliggör ökad datasekretess och säkerhet med liten prestandasänkning.
20

Implementation of Federated Learning on Raspberry Pi Boards : Implementation of Compressed FedAvg to reduce communication cost on Raspberry Pi Boards

Purba, Rini Apriyanti January 2021 (has links)
With the development of intelligent services and applications enabled by Artificial Intelligence (AI), the Internet of Things (IoT) is infiltrating many aspects of our everyday lives. The usability of phones and tablets is largely increasing as the primary computing device, since the powerful sensors allow these devices to have access to an unprecedented amount of data. However, there are risks and responsibilities to collect the data in a centralized location due to privacy issues. To overcome this challenge, Federated Learning (FL) allows users to collectively taking the benefits of shared models trained from this big data, without the need to centrally store it. In this research, we present and evaluate the implementation of federated learning on Raspberry Pi boards using the FedAvg method. In addition, the compression method such as quantization and sparsification was applied to the baseline implementation to improve communication efficiency. We accomplished the implementation by comparing the baseline implementation and the compressed Federated-Averaging (FedAvg) on Raspberry Pi boards in order to achieve the smallest cost and higher accuracy to fit IoT environment. / Med utvecklingen av intelligenta tjänster och applikationer möjliggjord av AI infiltrerar IoT många aspekter av vår vardag. Användbarheten för telefoner och surfplattor ökar till stor del som den primära datorenheten, eftersom de kraftfulla sensorerna tillåter dessa enheter att få tillgång till en oöverträffad mängd data. Det finns dock risker och ansvar för att lagra data på en central plats på grund av integritetsfrågor. För att övervinna denna utmaning tillåter Federated Learning (FL) användare att kollektivt ta fördelarna av delade modeller utbildade från denna stora data utan att behöva lagra den centralt. I denna forskning presenterar och utvärderar vi implementeringen av federerat lärande på Raspberry Pi-kort med FedAVG-metoden. Dessutom hade komprimeringsmetoden som kvantisering och versifiering tillämpats på basimplementeringen för att förbättra kommunikationseffektiviteten. Vi slutför implementeringen genom att jämföra baslinjeimplementeringen och den komprimerade FedAVG på Raspberry-Pi-kort för att uppnå lägsta kostnad och högre noggrannhet för att passa IoT-miljö

Page generated in 0.1038 seconds