Global ETD Search

1	Generalization in federated learning
Tenison, Irene 08 1900
Federated learning is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data with each other or with a central storage. It enhances data privacy because the data is decentralized and does not leave the client devices. Standard federated learning algorithms average model parameters or gradient updates to approximate the global model at the server. However, in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant client gradients. We hypothesize that, to generalize better across non-i.i.d. datasets, the algorithms should focus on learning the invariant mechanism that is constant across clients while ignoring spurious mechanisms that differ across clients. Inspired by recent work in the out-of-distribution literature, we propose a gradient masked averaging approach for FL as an alternative to the standard averaging of client updates. This client update aggregation technique can be adopted as a drop-in replacement in most existing federated algorithms. We perform extensive experiments with the gradient masked approach on multiple FL algorithms with in-distribution, real-world, and out-of-distribution (as the worst-case scenario) test datasets along with quantity imbalances, and show that it provides consistent improvements, particularly in the case of heterogeneous clients. Theoretical guarantees further support the proposed algorithm.
Federated Learning; Out of Distribution Generalization
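The abstract above does not spell out the masking rule, so the following is only a minimal sketch of what a gradient masked average could look like, assuming a coordinate-wise sign-agreement mask. The function name `gradient_masked_average`, the threshold `tau`, and the soft down-weighting rule are illustrative assumptions, not the thesis's definition.

```python
def gradient_masked_average(client_updates, tau=0.6):
    """Average client updates coordinate-wise, down-weighting coordinates
    on which clients disagree about the update direction.

    client_updates: list of equal-length lists of floats, one per client.
    tau: hypothetical agreement threshold in [0, 1] (an assumption).
    """
    n_clients = len(client_updates)
    n_params = len(client_updates[0])
    aggregated = []
    for j in range(n_params):
        column = [update[j] for update in client_updates]
        mean = sum(column) / n_clients
        # Sign of each client's update: +1, 0, or -1.
        signs = [(v > 0) - (v < 0) for v in column]
        # Agreement is 1.0 when all clients push in the same direction.
        agreement = abs(sum(signs)) / n_clients
        # Assumed rule: keep the plain mean when agreement >= tau,
        # otherwise scale the mean by the agreement score.
        mask = 1.0 if agreement >= tau else agreement
        aggregated.append(mask * mean)
    return aggregated


# Three clients, three parameters; the middle coordinate has mixed signs.
updates = [
    [0.5, 0.2, -0.1],
    [0.4, -0.3, -0.2],
    [0.6, 0.4, -0.3],
]
print(gradient_masked_average(updates))
```

With `tau=1.0` and all clients agreeing, this reduces to the standard mean, which is why such a rule could slot into an existing averaging step.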
2	Engineering-driven Machine Learning Methods for System Intelligence
Wang, Yinan 19 May 2022
Smart manufacturing is a revolutionary domain integrating advanced sensing technology, machine learning methods, and the industrial internet of things (IIoT). The development of sensing technology provides large amounts and various types of data (e.g., profiles, images, point clouds) to describe each stage of a manufacturing process. Machine learning methods can efficiently and effectively process and fuse large-scale datasets, and they demonstrate outstanding performance in different tasks (e.g., diagnosis, monitoring). Despite the advantages of incorporating machine learning methods into smart manufacturing, some concerns are widespread in practice: (1) most edge devices in a manufacturing system have only limited memory and computational capacity; (2) both performance and interpretability are desired from the data analytics method; (3) the connection to the internet exposes the manufacturing system to cyberattacks, which erodes the trustworthiness of data, models, and results. To address these limitations, this dissertation proposes systematic engineering-driven machine learning methods to improve system intelligence for smart manufacturing. The contributions of this dissertation can be summarized in three aspects. First, tensor decomposition is used to approximately compress the convolutional (Conv) layer in a Deep Neural Network (DNN), and a novel layer is proposed accordingly. Compared with the Conv layer, the proposed layer significantly reduces the number of parameters and the computational cost without degrading performance. Second, a physics-informed stochastic surrogate model is proposed by incorporating the idea of building and solving differential equations into the design of the stochastic process. The proposed method outperforms purely data-driven stochastic surrogates in recovering system patterns from noisy data points and in exploiting limited training samples to make accurate predictions and conduct uncertainty quantification. Third, a Wasserstein-based out-of-distribution detection (WOOD) framework is proposed to strengthen the DNN-based classifier with the ability to detect adversarial samples. The properties of the proposed framework are thoroughly discussed, and the statistical learning bound of the proposed loss function is theoretically investigated. The proposed framework is generally applicable to DNN-based classifiers and outperforms state-of-the-art benchmarks in identifying out-of-distribution samples.
Doctor of Philosophy
Global industries are experiencing the fourth industrial revolution, which is characterized by the use of advanced sensing technology, big data analytics, and the industrial internet of things (IIoT) to build smart manufacturing systems. The massive amount of data collected in the engineering process provides rich information to describe the complex physical phenomena in the manufacturing system. Big data analytics methods (e.g., machine learning, deep learning) are developed to exploit the collected data to complete specific tasks, such as checking the quality of the product or diagnosing the root cause of defects. Despite the outstanding performance of big data analytics methods in these tasks, concerns arise from engineering practice, such as the limited available computational resources, the models' lack of interpretability, and the threat of hacking attacks. In this dissertation, we propose systematic engineering-driven machine learning methods to address or mitigate these widespread concerns. First, a model compression technique is developed to reduce the number of parameters and the computational complexity of the deep learning model so that it fits the limited available computational resources. Second, physics principles are incorporated into the design of the regression method to improve its interpretability and enable it to better explore the properties of the data collected from the manufacturing system. Third, a cyberattack detection method is developed to strengthen the smart manufacturing system with the ability to detect potential hacking threats.
Smart Manufacturing; Engineering-driven Machine Learning; Model Compression; Physics-informed Stochastic Surrogate; Out-of-distribution Detection
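To give a rough sense of why tensor decomposition can shrink a convolutional layer, the sketch below compares parameter counts for a dense kernel and a rank-R CP (CANDECOMP/PARAFAC) factorization of it. The abstract does not say which decomposition the dissertation uses, so CP, the layer sizes, and the rank are all assumptions chosen for illustration; biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution kernel (bias ignored)."""
    return c_in * c_out * k * k


def cp_conv_params(c_in, c_out, k, rank):
    """Weights in a rank-`rank` CP factorization of the same 4-way kernel:
    one factor matrix per mode (input channels, kernel height,
    kernel width, output channels), each with `rank` columns."""
    return rank * (c_in + k + k + c_out)


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
full = conv_params(64, 128, 3)
compressed = cp_conv_params(64, 128, 3, rank=32)
print(full, compressed, full / compressed)
```

The compression comes at the cost of an approximation error that depends on the chosen rank, which is why the abstract speaks of compressing the layer "approximately".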
3	Asymmetry Learning for Out-of-distribution Tasks
Chandra Mouli Sekar (18437814) 02 May 2024
Despite their astonishing capacity to fit data, neural networks have difficulties extrapolating beyond the training data distribution. When the out-of-distribution prediction task is formalized as a counterfactual query on a causal model, the reason for their extrapolation failure is clear: neural networks learn spurious correlations in the training data rather than features that are causally related to the target label. This thesis proposes to perform a causal search over a known family of causal models to learn robust (maximally invariant) predictors for single- and multiple-environment extrapolation tasks.
First, I formalize the out-of-distribution task as a counterfactual query over a structural causal model. For single-environment extrapolation, I argue that symmetries of the input data are valuable for training neural networks that can extrapolate. I introduce Asymmetry learning, a new learning paradigm guided by the hypothesis that all (known) symmetries are mandatory even without evidence in training, unless the learner deems a symmetry inconsistent with the training data. Asymmetry learning performs a causal model search to find the simplest causal model defining a causal connection between the target labels and the symmetry transformations that affect the label. My experiments on a variety of out-of-distribution tasks on images and sequences show that the proposed methods extrapolate much better than standard neural networks.
Then, I consider multiple-environment out-of-distribution tasks in dynamical system forecasting that arise due to shifts in the initial conditions or parameters of the dynamical system. I identify key OOD challenges in the existing deep learning and physics-informed machine learning (PIML) methods for these tasks. To mitigate these drawbacks, I combine meta-learning and causal structure discovery over a family of given structural causal models to learn the underlying dynamical system. In three simulated forecasting tasks, I show that the proposed approach is 2x to 28x more robust than the baselines.
Deep learning; Robustness; Out-of-distribution; Causality; physics-informed machine learning; invariance; symmetry
4	On simulating and predicting pedestrian trajectories in a crowd
Bisagno, Niccolò 15 April 2020
Crowds gather at many venues, such as concerts, political rallies, and shopping malls, or simply form among people walking on the streets. More and more people are moving to urban areas, which creates many crowd scenarios. As a consequence, there is an increasing demand for automatic tools that can analyze and predict the behavior of crowds to ensure safety. Crowd motion analysis is a key feature in surveillance and monitoring applications, providing useful hints about potential threats to safety and security in urban and public spaces. It is well known that gatherings of people are generally difficult to model because of the diversity of the agents composing the crowd. Each individual is unique, driven not only by the destination but also by personality traits and attitude. The domain of crowd analysis has been widely investigated in the literature. Nevertheless, crowd gatherings have sometimes resulted in dangerous scenarios in recent years, such as stampedes and other dangerous situations. To take a step toward ensuring the safety of crowds, in this work we investigate two main research problems: we try to predict each person's future position, and we try to understand which factors are key for simulating crowds. Predicting in advance how a mass of people will fare in a given space would help ensure the safety of public gatherings.
Crowd Analysis; Crowd Simulation; Trajectory Prediction; Crowd Surveillance; LSTM; Out Of Distribution; Open Set
5	Out-of-distribution Recognition and Classification of Time-Series Pulsed Radar Signals / Out-of-distribution Igenkänning och Klassificering av Pulserade Radar Signaler
Hedvall, Paul January 2022
This thesis investigates out-of-distribution recognition for time-series data of pulsed radar signals. The classifier is a naive Bayesian classifier based on Gaussian mixture models and Dirichlet process mixture models. In the mixture models, we model the distribution of three pulse features in the time series: the radio frequency of the pulse, the duration of the pulse, and the pulse repetition interval, which is the time between pulses. We found that simple thresholds on the likelihood can effectively determine whether samples are out-of-distribution or belong to one of the classes trained on. In addition, we present a simple method that can be used for deinterleaving/pulse classification and show that it can robustly classify 100 interleaved signals and simultaneously determine whether pulses are out-of-distribution.
Out-of-Distribution; Gaussian Mixture Models; Dirichlet Process Mixture Models; Deinterleaving; Radar classification; Time-series analysis; Pulsed radar signals; Other Mathematics
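The likelihood-threshold idea in the abstract above can be illustrated with a deliberately simplified sketch: a naive Bayes classifier that uses a single Gaussian per feature and class (instead of the Gaussian or Dirichlet process mixtures the thesis uses) and rejects a pulse as out-of-distribution when even the best class log-likelihood falls below a threshold. All class names, parameter values, and the threshold are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def classify_pulse(pulse, class_models, threshold):
    """Return the most likely class name, or None if the pulse looks OOD.

    pulse: (radio_frequency, pulse_duration, pulse_repetition_interval)
    class_models: {name: [(mean, std), ...]} with one pair per feature;
        features are treated as independent (the naive Bayes assumption).
    threshold: assumed cut-off on the best class log-likelihood.
    """
    best_name, best_ll = None, -math.inf
    for name, params in class_models.items():
        ll = sum(gaussian_logpdf(x, m, s) for x, (m, s) in zip(pulse, params))
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name if best_ll >= threshold else None


# Hypothetical emitters: (mean, std) for frequency, duration, and interval.
models = {
    "emitter_a": [(9.4, 0.05), (1.0, 0.1), (100.0, 2.0)],
    "emitter_b": [(5.6, 0.05), (2.0, 0.1), (250.0, 5.0)],
}
print(classify_pulse((9.41, 1.05, 101.0), models, threshold=-10.0))
print(classify_pulse((7.5, 1.5, 180.0), models, threshold=-10.0))
```

In practice the threshold would be tuned on held-out data, and a mixture model per class would replace the single Gaussian used here.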
6	Evaluating Unsupervised Methods for Out-of-Distribution Detection on Semantically Similar Image Data / Utvärdering av oövervakade metoder för anomalidetektion på semantiskt liknande bilddata
Pierrau, Magnus January 2021
Out-of-distribution detection covers methods used to detect data that deviates from the underlying data distribution used to train a machine learning model. This is an important topic, as artificial neural networks have been shown to be capable of producing arbitrarily confident predictions, even for anomalous samples that deviate from the training distribution. Previous work has developed many reportedly effective methods for out-of-distribution detection, but these are often evaluated on data that is semantically different from the training data, and the results therefore do not necessarily reflect the performance these methods would show in more challenging conditions. In this work, six unsupervised out-of-distribution detection methods are evaluated and compared under more challenging conditions, in the context of classification of semantically similar image data using deep neural networks. It is found that the performance of all methods varies significantly across the tested datasets, and that no one method is consistently superior. Encouraging results are found for a method using ensembles of deep neural networks, but overall, the observed performance of all methods is considerably lower than in many related works, where easier tasks are used to evaluate these methods.
Out-of-distribution detection; anomaly detection; semantic similarity; image data; comparative evaluation; synthetic image data; Computer and Information Sciences
7	Out of Distribution Representation Learning for Network System Forecasting
Jianfei Gao (15208960) 12 April 2023
Representation learning algorithms, at the cutting edge of modern AI, have shown their ability to automatically solve complex tasks in diverse fields including computer vision, speech recognition, autonomous driving, and biology. Unsurprisingly, representation learning applications in computer networking domains, such as network management, video streaming, and traffic forecasting, have attracted increasing interest in recent years. However, the success of representation learning algorithms depends on consistency between the training and test data distributions, which cannot be guaranteed in some scenarios because of resource limitations, privacy, or other infrastructure reasons. Under such distribution shift, representation learning algorithms have to apply tuned models in environments whose data distributions differ substantially from the training distribution. This problem is known as Out-Of-Distribution (OOD) generalization and is still an open topic in machine learning. In this dissertation, I present solutions for OOD cases found in cloud services, which will help improve user experience. First, I implement Infinity SGD, which can extrapolate from light-load server logs to predict server performance under heavy load. Infinity SGD bridges light-load and heavy-load server status by modeling server status under different loads with a unified Continuous Time Markov Chain (CTMC) that shares the same parameters. I show that Infinity SGD can perform extrapolations that no previous work can, on a real-world testbed and in synthetic experiments. Next, I propose Veritas, a framework that answers what the user experience would be if a different ABR, a kind of video streaming data transfer algorithm, were used with the same server, client, and connection status. Veritas strictly follows a Structural Causal Model (SCM), which guarantees its power to answer what-if counterfactual and interventional questions for video streaming. I show that Veritas can accurately account for confounders when answering what-if questions on real-world emulations where no existing work can. Finally, I propose time-then-graph, a provably more expressive temporal graph neural network (TGNN) than previous works. We empirically show that time-then-graph is a more efficient and accurate framework for forecasting traffic on network data, which will serve as essential input data for Infinity SGD. In addition, in parallel with this dissertation, I formalize the Knowledge Graph (KG) as a doubly exchangeable attributed graph. Based on this formalization, I propose a doubly exchangeable representation blueprint that enables a complex logical reasoning task that no previous work addresses. This work may also find potential traffic classification applications in the networking field.
Deep learning; Computer Science; Machine Learning; Representation Learning; Out of Distribution
8	Diffusion models for anomaly detection in digital pathology
Bromée, Ruben January 2023
Challenges within the field of pathology lead to a high workload for pathologists. Machine learning has the ability to assist pathologists in their daily work and has shown good performance in a research setting. Anomaly detection is useful for preventing machine learning models used for classification and segmentation from being applied to data outside the training distribution of the model. The purpose of this work was to create an optimal anomaly detection pipeline for digital pathology data using a latent diffusion model and various image similarity metrics. The resulting pipeline uses a partial diffusion process, a combined similarity metric that aggregates the results of multiple other similarity metrics, and a contrast matching strategy for better anomaly detection performance. The pipeline performed well in an out-of-distribution detection task, with an ROC-AUC score of 0.90.
The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
out-of-distribution detection; digital pathology data; latent diffusion model; machine learning; neural network; pathology; Medical Image Processing
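For readers unfamiliar with the ROC-AUC figure quoted above, the sketch below computes it from anomaly scores using its rank interpretation: the probability that a randomly chosen out-of-distribution sample receives a higher anomaly score than a randomly chosen in-distribution sample, with ties counted as one half. The scores are made up for illustration and are unrelated to the thesis's data.

```python
def roc_auc(scores_in, scores_out):
    """ROC-AUC of an anomaly score, computed over all (in, out) pairs.

    scores_in: anomaly scores of in-distribution samples.
    scores_out: anomaly scores of out-of-distribution samples.
    """
    wins = 0.0
    for s_out in scores_out:
        for s_in in scores_in:
            if s_out > s_in:
                wins += 1.0  # OOD sample correctly ranked above
            elif s_out == s_in:
                wins += 0.5  # ties count as half
    return wins / (len(scores_in) * len(scores_out))


# Hypothetical anomaly scores (higher means more anomalous).
in_dist = [0.1, 0.2, 0.3, 0.4]
out_dist = [0.35, 0.5, 0.6]
print(roc_auc(in_dist, out_dist))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, so this metric is independent of any particular decision threshold.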
9	(Out-of-distribution?) : generalization in deep learning
Caballero, Ethan 08 1900
The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that linear classification tasks need much stronger restrictions on the distribution shifts; otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint, combined with invariance, helps address the key failures when invariant features capture all the information about the label, and also retains the existing success when they do not. We propose an approach that combines both these principles and demonstrate its effectiveness on linear unit tests and on various high-dimensional real datasets.
Deep learning; Generalization; Out-of-Distribution Generalization
10	Deep Contrastive Metric Learning to Detect Polymicrogyria in Pediatric Brain MRI
Zhang, Lingfeng 28 November 2022
Polymicrogyria (PMG) is a brain disease that mainly occurs in the pediatric brain. Severe PMG causes seizures, delayed development, and a series of other problems. For this reason, it is critical to identify PMG effectively and start treatment early. Radiologists typically identify PMG through magnetic resonance imaging scans. In this study, we create and release a pediatric MRI dataset (named the PPMR dataset) including PMG cases and controls from the Children's Hospital of Eastern Ontario (CHEO), Ottawa, Canada. The difference between PMG MRIs and control MRIs is subtle, and the true distribution of the features of the disease is unknown. Hence, we propose a novel center-based deep contrastive metric learning loss function (named cDCM Loss) to deal with this difficult problem. Cross-entropy-based loss functions do not lead to models that generalize well on small and imbalanced datasets with partially known distributions. We conduct exhaustive experiments on a modified CIFAR-10 dataset to demonstrate the efficacy of our proposed loss function compared to cross-entropy-based loss functions and the state-of-the-art Deep SAD loss function. Additionally, based on our proposed loss function, we customize a deep learning model structure that integrates dilated convolution, squeeze-and-excitation blocks, and feature fusion for our PPMR dataset, achieving 92.01% recall. Since our suggested method is a computer-aided tool to assist radiologists in selecting potential PMG MRIs, 55.04% precision is acceptable. To the best of our knowledge, this research is the first to apply machine learning techniques to identify PMG from MRI alone, and our method achieves better results than baseline methods.
Polymicrogyria; Pediatric Brain MRI Images; Small and Imbalanced Datasets; Out of Distribution; Deep Metric Learning; Supervised Anomaly Detection; Convolutional Neural Networks
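The trade-off between the recall and precision figures quoted above can be made concrete with the standard definitions. The sketch below computes precision, recall, and F1 from confusion-matrix counts, and combines the two reported rates into an F1 score; that F1 value is derived here by arithmetic and is not reported in the abstract. The counts in the first call are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives,
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, only to show the definitions at work.
print(precision_recall_f1(tp=8, fp=2, fn=2))

# F1 implied by the rates quoted in the abstract (derived, not reported).
print(round(f1_from_rates(0.5504, 0.9201), 3))
```

For a screening tool whose output is reviewed by a radiologist, favoring recall over precision means fewer missed cases at the price of more false alarms, which matches the abstract's argument.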
