Global ETD Search

341	Model independent searches for New Physics using Machine Learning at the ATLAS experiment / Recherche de Nouvelle Physique indépendante d'un modèle en utilisant l’apprentissage automatique sur l’experience ATLAS Jimenez, Fabricio 16 September 2019 (has links) We address the problem of model-independent searches for New Physics (NP) at the Large Hadron Collider (LHC) using the ATLAS detector. Particular attention is paid to the development and testing of novel Machine Learning techniques for that purpose. This work presents three main results. First, we put in place a system for automatic monitoring of generic signatures within TADA, an ATLAS software tool. We explored over 30 signatures in the 2017 data-taking period, and no particular discrepancy was observed with respect to simulations of Standard Model processes. Second, we propose a collective anomaly detection method for model-independent searches for NP at the LHC: a parametric approach that uses a semi-supervised learning algorithm. This approach uses a penalized likelihood and can simultaneously perform appropriate variable selection and detect possible collective anomalous behavior in data with respect to a given background sample. Third, we present preliminary studies on modeling background and detecting generic signals in invariant mass spectra using Gaussian processes (GPs) with no prior information on the mean.
Two methods were tested on two datasets: a two-step procedure on a dataset taken from the Standard Model simulations used for the ATLAS General Search, in the channel containing two jets in the final state, and a three-step procedure on a simulated dataset for signal (Z′) and background (Standard Model) in the search for resonances in the top-pair invariant mass spectrum. Our study is a first step towards a method that takes advantage of GPs as a modeling tool that can be applied to several signatures in a more model-independent setup. Large Hadron Collider Standard Model Beyond the Standard Model ATLAS New Physics Machine Learning Anomaly Detection Semi-supervised Penalized likelihood Gaussian Processes
342	Real-time anomaly detection with in-flight data : streaming anomaly detection with heterogeneous communicating agents / Détection des anomalies sur les données en vol en temps réel avec des agents communicants hétérogènes Aussel, Nicolas 21 June 2019 (has links) With the rise in the number of sensors and actuators in aircraft and the development of reliable data links from the aircraft to the ground, it has become possible to improve aircraft security and maintainability by applying real-time analysis techniques. However, given the limited availability of on-board computing and the high cost of the data links, current architectural solutions cannot fully leverage all the available resources, which limits their accuracy. Our goal is to provide a distributed algorithm for failure prediction that could be executed both on board the aircraft and on a ground station and that would produce on-board failure predictions in near real time under a communication budget.
In this approach, the ground station would hold fast computation resources and historical data, and the aircraft would hold limited computational resources and the current flight's data. In this thesis, we study the specificities of aeronautical data and the methods that already exist to produce failure predictions from them, and we propose a solution to the stated problem. Our contribution is detailed in three main parts. First, we study the problem of rare event prediction created by the high reliability of aeronautical systems. Many learning methods for classifiers rely on balanced datasets. Several approaches exist to correct dataset imbalance, and we study their efficiency on extremely imbalanced datasets. Second, we study the problem of log parsing, as many aeronautical systems do not produce easy-to-classify labels or numerical values but log messages in full text. We study existing methods based on a statistical approach and on Deep Learning to convert full-text log messages into a form usable as input by learning algorithms for classifiers. We then propose our own method based on Natural Language Processing and show how it outperforms the other approaches on a public benchmark. Last, we offer a solution to the stated problem by proposing a new distributed learning algorithm that relies on two existing learning paradigms, Active Learning and Federated Learning. We detail our algorithm and its implementation and provide a comparison of its performance with existing methods. Machine learning Real-time analysis Distributed software architecture Big data analysis Anomaly detection
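The abstract for entry 342 mentions approaches for correcting dataset imbalance without naming them. As a minimal sketch of one common option, random oversampling of the minority class, the following may help fix ideas. It is not the thesis's method; the function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest one.

    Illustrative only: the thesis does not say which imbalance-correction
    approaches it evaluates.
    """
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_label.values())
    out_samples, out_labels = [], []
    for y, xs in by_label.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data: 8 "no failure" records (label 0) and 2 "failure" records (label 1).
features = [[float(i)] for i in range(10)]
targets = [0] * 8 + [1] * 2
balanced_x, balanced_y = random_oversample(features, targets)
print(Counter(balanced_y))
```

With extremely imbalanced data, plain duplication like this can encourage overfitting to the few failure records, which is presumably one reason a comparative study of such approaches is worthwhile.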
343	Real time intelligent decision making from heterogeneous and imperfect data / La prise de décision intelligente en temps réel à partir de données hétérogènes et imparfaites Sfar, Hela 09 July 2019 (has links) Pervasive computing is advancing steadily. This paradigm is characterized by multiple sensors highly integrated in objects of the physical world. The development of personal applications using data provided by these sensors has prompted the creation of smart environments, which are designed as an advanced overlay framework that proactively, but sensibly, assists individuals in their everyday lives. A smart environment application gathers streaming data from the deployed sensors, then processes and analyzes the collected data before making decisions and executing actions on the physical environment. Online data processing consists mainly of data segmentation, which divides the data into fragments. In the literature, the fragment size is generally fixed. However, such a static view usually leads to imprecise outputs. Hence, dynamic segmentation using observation windows of variable size is an open issue. The analysis phase takes a segment of sensor data as input and extracts knowledge by means of reasoning or mining processes. In particular, understanding users' daily activities and preventing anomalous situations are a growing concern in the literature, but addressing these problems with small and imperfect data is still a key issue. Indeed, data provided by sensors is often imprecise, inaccurate, outdated, contradictory, or simply missing. Hence, handling uncertainty has become an important aspect. Moreover, monitoring the user to obtain a large amount of data about his or her daily routine is not always possible and is too intrusive. People are often unwilling to be monitored for a long period of time.
When enough data about the user has been acquired, most existing methods can provide precise recognition, but performance declines sharply with small datasets. In this thesis, we mainly explored the cross-fertilization of statistical and symbolic learning approaches, and the contributions are threefold: (i) DataSeg, an algorithm that takes advantage of both unsupervised learning and ontology representation for data segmentation. This combination dynamically chooses the segment size for several applications, unlike most existing methods. Moreover, unlike the approaches in the literature, DataSeg can be adapted to the features of any application; (ii) AGACY Monitoring, a hybrid model for activity recognition and uncertainty handling which uses supervised learning, possibilistic logic inference, and an ontology to extract meaningful knowledge from small datasets; (iii) CARMA, a method based on Markov Logic Networks (MLN) and causal association rules to detect the causes of anomalies in a smart environment so as to prevent their occurrence. By automatically extracting logic rules about anomaly causes and integrating them in the MLN rules, we reach a more accurate situation identification even with partial observations. Each of our contributions was prototyped, tested and validated using data obtained from real scenarios. Data segmentation Activity recognition Anomaly detection Smart environment Machine learning Symbolic methods
344	Towards a learning system for process and energy industry : Enabling optimal control, diagnostics and decision support Rahman, Moksadur January 2019 (has links) Driven by intense competition, increasing operational cost and strict environmental regulations, the modern process and energy industry needs to find the best possible way to adapt to maintain profitability. Optimization of control and operation of the industrial systems is essential to satisfy the contradicting objectives of improving product quality and process efficiency while reducing production cost and plant downtime. Use of optimization not only improves the control and monitoring of assets but also offers better coordination among different assets. Thus, it can lead to considerable savings in energy and resource consumption, and consequently offer a reduction in operational costs, by offering better control, diagnostics and decision support. This is one of the main driving forces behind developing new methods, tools and frameworks that can be integrated with the existing industrial automation platforms to benefit from optimal control and operation. The main focus of this dissertation is the use of different process models, soft sensors and optimization techniques to improve the control, diagnostics and decision support for the process and energy industry. A generic architecture for an optimal control, diagnostics and decision support system, referred to here as a learning system, is proposed. The research is centred around an investigation of different components of the proposed learning system. Two very different case studies within the energy-intensive pulp and paper industry and the promising micro-combined heat and power (CHP) industry are selected to demonstrate the learning system. One of the main challenges in this research arises from the marked differences between the case studies in terms of size, functions, quantity and structure of the existing automation systems. 
Typically, only a few pulp digesters are found in a Kraft pulping mill, but there may be hundreds of units in a micro-CHP fleet. The main argument behind the selection of these two case studies is that if the proposed learning system architecture can be adapted for these significantly different cases, it can be adapted for many other energy and process industrial cases. Within the scope of this thesis, mathematical modelling, model adaptation, model predictive control and diagnostics methods are studied for continuous pulp digesters, whereas mathematical modelling, model adaptation and diagnostics techniques are explored for the micro-CHP fleet. / FUDIPO – FUture DIrections for Process industry Optimization Learning system Supervisory system Pulp and paper Micro gas turbine Process modelling Model-based control Diagnostics Decision support Anomaly detection Fault detection Energy Engineering Energiteknik
345	Hluboké neuronové sítě pro detekci anomálií při kontrole kvality / Deep Neural Networks for Defect Detection Juřica, Tomáš January 2019 (has links) The goal of this work is to bring automatic defect detection to the manufacturing process of plastic cards. A card is considered defective when it is contaminated with a dust particle or a hair. The main challenges are the very small number of training samples (214 images), the small area of the target defects relative to the entire card (the average defect area is 0.0068% of the card), and the very complex background on which detection is performed. To accomplish the task, I decided to use the Mask R-CNN detection algorithm combined with augmentation techniques such as synthetic dataset generation. I trained the model on a synthetic dataset consisting of 20 000 images. In this way I was able to create a model achieving 0.83 AP at 0.1 IoU on the test set of the original data.
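The "0.83 AP at 0.1 IoU" figure in entry 345 relies on intersection over union (IoU) to decide whether a predicted defect matches a ground-truth defect. A minimal sketch of that matching criterion is given below. It assumes axis-aligned bounding boxes for simplicity (Mask R-CNN also produces masks), and the helper names `box_iou` and `is_match` are illustrative, not taken from the thesis.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.1):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return box_iou(pred, truth) >= threshold


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(is_match((0, 0, 2, 2), (1, 1, 3, 3)))
```

A threshold as low as 0.1 is lenient about localization, which is a plausible choice when the defects are tiny and a rough position is enough for quality control.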
346	Detekce anomálií běhu RTOS aplikace / Detecting RTOS Runtime Anomalies Arm, Jakub January 2020 (has links) Due to higher requirements on the computational power and safety, or functional safety, of equipment intended for use in the industrial domain, embedded systems containing a real-time operating system are still an active area of research. This thesis addresses a hardware-assisted control module that is based on runtime model-based verification of a target application. This subsystem is intended to increase the diagnostic coverage, particularly the detection of execution errors. After the specification of the architecture, the formal model is defined and implemented in hardware using FPGA technology. This thesis also discusses some other aspects and embodies new approaches in the area of embedded flow control, e.g. the integration of design patterns. The created module was tested in simulation using scenarios that follow a record of real program execution. The results suggest that the error detection time is lower than with standard techniques, such as a watchdog.
347	Detekce síťových anomálií / Network Anomaly Detection Pšorn, Daniel January 2012 (has links) This master's thesis deals with methods for detecting anomalies in network traffic. It first analyzes the basic concepts of anomaly detection and the technologies already in use. It then describes in more detail three methods for finding anomalies, along with several types of anomalies. The second part of the thesis describes the implementation of all three methods and presents the results of experiments on real data.
348	Développement d'algorithmes d'analyse spectrale en spectrométrie gamma embarquée / Embedded gamma spectrometry : new algorithms for spectral analysis Martin-Burtart, Nicolas 06 December 2012 (has links) Until the early 1980s, airborne gamma spectrometry was used primarily for geophysical applications such as mining prospection, measuring soil concentrations of the three natural radionuclides (K40, U238 and Th232). Over the last fifteen years, many new measurement systems have been developed, most of them after the Chernobyl accident, for intervention in case of nuclear incidents or for environmental monitoring. The algorithms developed have followed the missions of these systems. Most are dedicated to signal extraction at medium and high energy, the spectral regions characteristic of natural emitters (K40 and the U238 and Th232 decay chains) and fission products (mainly Cs137 and Co60). Below 400 keV, where special nuclear materials (SNM) emit, these methods can still be used, but the very intense scattering background makes them imprecise. A new algorithm called 2-windows (extended to 3) was developed. It allows accurate extraction and takes the flight altitude into account to minimize false detections. Monitoring the traffic of radioactive materials for homeland security purposes emerged a few years ago. This use requires a new type of algorithm: instead of methods sensitive to a particular nuclide or spectral region, it needs anomaly criteria that take the whole recorded spectrum into account and detect an anomaly whatever its origin (medical, industrial or nuclear). This work identified two families of algorithms suited to these conditions. Finally, detected anomalies have to be identified.
The IAEA recommends monitoring about thirty radionuclides. A brand new identification algorithm was developed, relying on several lines per element and resolving identification conflicts.
Embedded gamma spectrometry Spectral analysis Low energy Anomaly detection Identification 539.7
349	Hypervisor-based cloud anomaly detection using supervised learning techniques Nwamuo, Onyekachi 23 January 2020 (has links) Although cloud network flows are similar to conventional network flows in many ways, there are some major differences in their statistical characteristics. However, due to the lack of adequate public datasets, the proponents of many existing cloud intrusion detection systems (IDS) have relied on the DARPA dataset, which was obtained by simulating a conventional network environment. In this thesis, we show empirically that the DARPA dataset, by failing to match important statistical characteristics of real-world cloud data center traffic, is inadequate for evaluating cloud IDS. We analyze, as an alternative, a new public dataset collected through cooperation between our lab and a non-profit cloud service provider, which contains benign data and a wide variety of attack data. Furthermore, we present a new hypervisor-based cloud IDS using an instance-oriented feature model and supervised machine learning techniques. We investigate three different classifiers: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Experimental evaluation on a diversified dataset yields a detection rate of 92.08% and a false-positive rate of 1.49% for the random forest, the best performing of the three classifiers. / Graduate Cloud Anomaly Detection ISOT-CID Cloud Network vs Conventional Network DARPA 1998 IDS Dataset Hypervisor-Based Intrusion Detection
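Entry 349 reports a detection rate and a false-positive rate. As a reminder of how these two figures are usually derived from a classifier's predictions, here is a small sketch using the standard definitions: detection rate = TP / (TP + FN), false-positive rate = FP / (FP + TN). The function name `detection_metrics` and the toy labels are assumptions for illustration and do not come from the thesis or its dataset.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels,
    where 1 means attack and 0 means benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Toy example: 3 attacks (2 detected) and 5 benign flows (1 false alarm).
truth = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0]
dr, fpr = detection_metrics(truth, preds)
print(f"detection rate = {dr:.2%}, false-positive rate = {fpr:.2%}")
```

Reporting both numbers matters for an IDS because a high detection rate is of limited use if benign traffic triggers frequent false alarms.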
350	Forecasting anomalies in time series data from online production environments Sseguya, Raymond January 2020 (has links) Anomaly detection on time series forecasts can be used in many industries, especially in early-warning systems that predict anomalies before they happen. Infor (Sweden) AB is a software company that provides Enterprise Resource Planning cloud solutions. Infor is interested in predicting anomalies in its data, and that is the motivation for this thesis work. The general idea is to first forecast the time series and then detect and classify anomalies in the forecasted values. In this thesis work, the time series forecasting used to predict anomalous behaviour is done using two strategies, namely the recursive strategy and the direct strategy. The recursive strategy includes two methods: AutoRegressive Integrated Moving Average and Neural Network AutoRegression. The direct strategy is implemented with ForecastML-eXtreme Gradient Boosting. The three methods are then compared with respect to forecasting performance. Anomaly detection and classification are done by setting a decision rule based on a threshold. Since the true anomaly thresholds were not known in advance, an arbitrary initial anomaly threshold is set by combining statistical methods for outlier detection with human judgement by the company commissioners. These statistical methods include Seasonal and Trend decomposition using Loess + InterQuartile Range, Twitter + InterQuartile Range and Twitter + GESD (Generalized Extreme Studentized Deviate). After defining what an anomaly threshold is in the usage context of Infor (Sweden) AB, a decision rule is set and used to classify anomalies in the time series forecasts.
The results of comparing the classifications of the forecasts from the three time series forecasting methods are disappointing, and no recommendation is made concerning which model or algorithm Infor (Sweden) AB should use. However, the thesis work concludes by recommending other methods that can be tried in future research. Infor (Sweden) AB time series forecasting anomaly detection ARIMA neural network autoregression eXtreme Gradient Boosting package Computer and Information Sciences Data- och informationsvetenskap Probability Theory and Statistics Sannolikhetsteori och statistik
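The threshold-based decision rule described in entry 350 can be sketched as follows. This is a deliberately simplified illustration: it applies interquartile-range (IQR) fences directly to historical values, whereas the thesis combines the IQR with a seasonal-trend decomposition and with human judgement. The function names, the multiplier `k = 1.5` and the sample numbers are assumptions, not values from the thesis.

```python
import statistics


def iqr_fences(history, k=1.5):
    """Lower and upper anomaly thresholds from historical values,
    using the common Q1 - k*IQR and Q3 + k*IQR rule."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify_forecast(forecast, lower, upper):
    """Decision rule: a forecasted value is anomalous if it falls
    outside the [lower, upper] band."""
    return [value < lower or value > upper for value in forecast]


# Hypothetical historical observations and a four-step-ahead forecast.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0]
forecast = [11.0, 12.5, 30.0, 0.0]

low, high = iqr_fences(history)
flags = classify_forecast(forecast, low, high)
print(flags)
```

Because the rule is applied to forecasted rather than observed values, its usefulness depends directly on forecast quality, which is consistent with the abstract's emphasis on comparing the forecasting methods first.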

Search results