Global ETD Search

341	Detecção de Cross-Site Scripting em páginas Web Nunan, Angelo Eduardo 14 May 2012 (has links) CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Web applications are currently an important environment for accessing services available on the Internet. Ensuring the security of these resources has become a fundamental task. The structure of dynamic websites, composed of objects such as HTML tags, script functions, hyperlinks and advanced web browser features, enables numerous interactive services, such as e-commerce, Internet banking, social networking, blogs and forums. However, these features have also increased the potential security risks and attacks that result from malicious code injection. In this context, Cross-Site Scripting (XSS) has ranked at the top of lists of the greatest threats to web applications in recent years. This work presents a method based on supervised machine learning techniques to detect XSS in web pages. A set of features extracted from the URL and the web document is used to discriminate XSS attack patterns and to classify pages as malicious or non-malicious. / As aplicações web atualmente representam um importante ambiente de acesso aos serviços oferecidos na Internet. Garantir a segurança desses recursos se tornou uma tarefa elementar. A estrutura de sites dinâmicos constituída por um conjunto de objetos, tais como tags de HTML, funções de script, hiperlinks e recursos avançados em navegadores web levou a inúmeras funcionalidades e à interatividade de serviços, tais como e-commerce, Internet banking, redes sociais, blogs, fóruns, entre outros. 
No entanto, esses recursos têm aumentado potencialmente os riscos de segurança e os ataques resultantes da injeção de códigos maliciosos, onde o Cross-Site Scripting aparece em destaque, no topo das listas das maiores ameaças para aplicações web nos últimos anos. Este trabalho apresenta um método baseado em técnicas de aprendizagem de máquina supervisionada para detectar XSS em páginas web, a partir de um conjunto de características extraídas da URL e do documento web, capazes de discriminar padrões de ataques XSS e distinguir páginas web maliciosas das páginas web normais ou benignas Cross-site Scripting Segurança de aplicações web Detecção de anomalia Aprendizagem de máquina Cross-site scripting Web application security Anomaly detection Machine learning
342	Detecção de anomalias em aplicações Web utilizando filtros baseados em coeficiente de correlação parcial / Anomaly detection in web applications using filters based on partial correlation coefficient Silva, Otto Julio Ahlert Pinno da 31 October 2014 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Finding faults or causes of performance problems in modern Web computer systems is an arduous task that involves many hours of system metrics monitoring and log analysis. To aid administrators in this task, many anomaly detection mechanisms have been proposed that analyze the behavior of the system by collecting a large volume of statistical information on the condition and performance of the computer system. One of the approaches adopted by these mechanisms is monitoring through strong correlations found in the system. In this approach, collecting large amounts of data creates drawbacks associated with communication, storage and especially the processing of the collected information. 
Nevertheless, few anomaly detection mechanisms have a strategy for selecting the statistical information to be collected, i.e., for selecting the monitored metrics. This work presents three metric selection filters for anomaly detection mechanisms based on correlation monitoring. These filters are based on partial correlation, a technique capable of providing information not observable by common correlation methods. The filters were validated in a Web application scenario; to simulate this environment, we used TPC-W, an e-commerce Web transaction benchmark. The results of our evaluation show that one of our filters allowed the construction of a monitoring network with 8% fewer metrics than state-of-the-art filters, while achieving up to 10% more efficient fault coverage. / Encontrar falhas ou causas de problemas de desempenho em sistemas computacionais Web atuais é uma tarefa árdua que envolve muitas horas de análise de logs e métricas de sistemas. Para ajudar administradores nessa tarefa, diversos mecanismos de detecção de anomalia foram propostos visando analisar o comportamento do sistema mediante a coleta de um grande volume de informações estatísticas que demonstram o estado e o desempenho do sistema computacional. Uma das abordagens adotadas por esses mecanismos é o monitoramento por meio de correlações fortes identificadas no sistema. Nessa abordagem, a coleta desse grande número de dados gera inconvenientes associados à comunicação, armazenamento e, especialmente, com o processamento das informações coletadas. Apesar disso, poucos mecanismos de detecção de anomalias possuem uma estratégia para a seleção das informações estatísticas a serem coletadas, ou seja, para a seleção das métricas monitoradas. Este trabalho apresenta três filtros de seleção de métricas para mecanismos de detecção de anomalias baseados no monitoramento de correlações. 
Esses filtros foram baseados no conceito de correlação parcial, técnica que é capaz de fornecer informações não observáveis por métodos de correlações comuns. A validação desses filtros foi realizada sobre um cenário de aplicação Web, sendo que, para simular esse ambiente, nós utilizamos o TPC-W, um Benchmark de transações Web do tipo E-commerce. Os resultados obtidos em nossa avaliação mostram que um de nossos filtros permitiu a construção de uma rede de monitoramento com 8% menos métricas que filtros estado-da-arte, além de alcançar uma cobertura de falhas até 10% mais eficiente. Detecção de anomalias Sistemas distribuídos complexos Correlação parcial Filtragem de métricas Anomaly detection Complex distributed systems Partial correlation Metric filtering
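The abstract above does not spell out how its three filters are built, so the following is only a minimal sketch of the underlying quantity, the textbook first-order partial correlation between two metrics x and y after controlling for a third metric z. The function names (`pearson`, `partial_from_r`, `partial_corr`) and the sample values are illustrative assumptions, not taken from the dissertation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z, from pairwise r values."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x, y, z):
    """Partial correlation of metrics x and y after removing the linear effect of z."""
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Hypothetical metric samples: x and y both follow z, which inflates their raw correlation.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 6]
print(pearson(x, y), partial_corr(x, y, z))
```

In this made-up sample the raw correlation between x and y is positive, while the partial correlation given z is negative, which is the kind of information a plain correlation coefficient does not expose.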
343	Cadre méthodologique et applicatif pour le développement de réseaux de capteurs fiables / The design of reliable sensor networks : methods and applications Lalem, Farid 11 September 2017 (has links) Les réseaux de capteurs sans fil émergent comme une technologie innovatrice qui peut révolutionner et améliorer notre façon de vivre, de travailler et d'interagir avec l'environnement physique qui nous entoure. Néanmoins, l'utilisation d'une telle technologie soulève de nouveaux défis concernant le développement de systèmes fiables et sécurisés. Ces réseaux de capteurs sans fil sont souvent caractérisés par un déploiement dense et à grande échelle dans des environnements limités en termes de ressources. Les contraintes imposées sont la limitation des capacités de traitement, de stockage et surtout d'énergie car ils sont généralement alimentés par des piles. Nous visons comme objectif principal à travers cette thèse à proposer des solutions permettant de garantir un certain niveau de fiabilité dans un RCSF dédié aux applications sensibles. Nous avons ainsi abordé trois axes, qui sont : - Le développement de méthodes permettant de détecter les noeuds capteurs défaillants dans un RCSF, - Le développement de méthodes permettant de détecter les anomalies dans les mesures collectées par les nœuds capteurs, et par la suite, les capteurs usés (fournissant de fausses mesures). - Le développement de méthodes permettant d'assurer l'intégrité et l'authenticité des données transmises dans un RCSF. / Wireless sensor networks are emerging as an innovative technology that can revolutionize and improve the way we live, work and interact with the physical environment around us. Nevertheless, the use of such technology raises new challenges in the development of reliable and secure systems. These wireless sensor networks are often characterized by dense, large-scale deployment in resource-constrained environments. 
The constraints imposed are limited processing, storage and especially energy capacities, since the nodes are generally powered by batteries. Our main objective is to propose solutions that guarantee a certain level of reliability in a WSN dedicated to sensitive applications. We have thus addressed three axes, which are: - The development of methods for detecting failed sensor nodes in a WSN. - The development of methods for detecting anomalies in measurements collected by sensor nodes, and subsequently faulty sensors (providing false measurements). - The development of methods ensuring the integrity and authenticity of data transmitted over a WSN. Réseaux de capteurs sans fil Noeuds capteurs défaillants Fiabilité de données Noeuds frontières Copule Détection d'anomalies Wireless sensor networks Faulty sensor nodes Data reliability Border nodes Copula Anomaly detection
344	Monitoring et détection d'anomalie par apprentissage dans les infrastructures virtualisées / Monitoring and detection of learning abnormalities in virtualized infrastructures Sauvanaud, Carla 13 December 2016 (has links) Le cloud computing est un modèle de délivrance à la demande d’un ensemble de ressources informatiques distantes, partagées et configurables. Ces ressources, détenues par un fournisseur de service cloud, sont mutualisées grâce à la virtualisation de serveurs qu’elles composent et sont mises à disposition d’utilisateurs sous forme de services disponibles à la demande. Ces services peuvent être aussi variés que des applications, des plateformes de développement ou bien des infrastructures. Afin de répondre à leurs engagements de niveau de service auprès des utilisateurs, les fournisseurs de cloud se doivent de prendre en compte des exigences différentes de sûreté de fonctionnement. Assurer ces exigences pour des services différents et pour des utilisateurs aux demandes hétérogènes représente un défi pour les fournisseurs, notamment de part leur engagement de service à la demande. Ce défi est d’autant plus important que les utilisateurs demandent à ce que les services rendus soient au moins aussi sûrs de fonctionnement que ceux d’applications traditionnelles. Nos travaux traitent particulièrement de la détection d’anomalies dans les services cloud de type SaaS et PaaS. Les différents types d’anomalie qu’il est possible de détecter sont les erreurs, les symptômes préliminaires de violations de service et les violations de service. Nous nous sommes fixé quatre critères principaux pour la détection d’anomalies dans ces services : i) elle doit s’adapter aux changements de charge de travail et reconfiguration de services ; ii) elle doit se faire en ligne, iii) de manière automatique, iv) et avec un effort de configuration minimum en utilisant possiblement la même technique quel que soit le type de service. 
Dans nos travaux, nous avons proposé une stratégie de détection qui repose sur le traitement de compteurs de performance et sur des techniques d’apprentissage automatique. La détection utilise les données de performance système collectées en ligne à partir du système d’exploitation hôte ou bien via les hyperviseurs déployés dans le cloud. Concernant le traitement de ces données, nous avons étudié trois types de technique d’apprentissage : supervisé, non supervisé et hybride. Une nouvelle technique de détection reposant sur un algorithme de clustering est de plus proposée. Elle permet de prendre en compte l’évolution de comportement d’un système aussi dynamique qu’un service cloud. Une plateforme de type cloud a été déployée afin d’évaluer les performances de détection de notre stratégie. Un outil d’injection de faute a également été développé dans le but de cette évaluation ainsi que dans le but de collecter des jeux de données pour l’entraînement des modèles d’apprentissage. L’évaluation a été appliquée à deux cas d’étude : un système de gestion de base de données (MongoDB) et une fonction réseau virtualisée. Les résultats obtenus à partir d’analyses de sensibilité, montrent qu’il est possible d’obtenir de très bonnes performances de détection pour les trois types d’anomalies, tout en donnant les contextes adéquats pour la généralisation de ces résultats. / The development of virtualization technologies and of the Internet has contributed to the rise of the cloud computing model. Cloud computing enables the delivery of configurable computing resources with convenient, on-demand network access to these resources. Resources hosted by a provider can be applications, development platforms or infrastructures. Over the past few years, computing systems have been characterized by high development speed, parallelism, and the diversity of tasks handled by applications and services. 
In order to satisfy the Service Level Agreements (SLAs) drawn up with users, cloud providers have to handle stringent dependability demands. Meeting these demands while delivering various services makes cloud dependability a challenging task, especially because providers need to make their services available on demand. This task is all the more challenging because users expect cloud services to be at least as dependable as traditional computing systems. In this manuscript, we address the problem of anomaly detection in cloud services. A detection strategy for clouds should rely on several principal criteria. In particular, it should adapt to workload changes and reconfigurations, and at the same time require little configuration effort and adapt to several types of services. It should also be performed online and automatically. Finally, such a strategy needs to tackle the detection of different types of anomalies, namely errors, preliminary symptoms of SLA violations and SLA violations. We propose a new detection strategy based on system monitoring data. The data is collected online either from the service or from the underlying hypervisor(s) hosting the service. The strategy makes use of machine learning algorithms to classify anomalous behaviors of the service. Three techniques are used, relying respectively on supervised learning, on unsupervised learning, and on a combination of both types of learning. A new anomaly detection technique based on online clustering is developed, which can handle possible changes in a service's behavior. A cloud platform was deployed to evaluate the detection performance of our strategy. Moreover, a fault injection tool was developed with two goals: the collection of service observations with anomalies to train detection models, and the evaluation of the strategy in the presence of anomalies. The evaluation was applied to two case studies: a database management system and a virtual network function. 
Sensitivity analyses show that the detection performance of our strategy is high for the three anomaly types. The context for the generalization of the results is also discussed. Apprentissage automatique Cloud computing Détection d'anomalie Injection de faute Monitoring Virtualisation Machine learning Cloud computing Anomaly detection Fault injection Monitoring Virtualization 004
345	Tier-scalable reconnaissance: the future in autonomous C4ISR systems has arrived: progress towards an outdoor testbed Fink, Wolfgang, Brooks, Alexander J.-W., Tarbell, Mark A., Dohm, James M. 18 May 2017 (has links) Autonomous reconnaissance missions are called for in extreme environments, as well as in potentially hazardous (e.g., the theatre, disaster-stricken areas, etc.) or inaccessible operational areas (e.g., planetary surfaces, space). Such future missions will require increasing degrees of operational autonomy, especially when following up on transient events. Operational autonomy encompasses: (1) automatic characterization of operational areas from different vantages (i.e., spaceborne, airborne, surface, subsurface); (2) automatic sensor deployment and data gathering; (3) automatic feature extraction including anomaly detection and region-of-interest identification; (4) automatic target prediction and prioritization; and (5) subsequent automatic (re-)deployment and navigation of robotic agents. This paper reports on progress towards several aspects of autonomous C4ISR systems, including: Caltech-patented and NASA award-winning multi-tiered mission paradigm, robotic platform development (air, ground, water-based), robotic behavior motifs as the building blocks for autonomous telecommanding, and autonomous decision making based on a Caltech-patented framework comprising sensor-data-fusion (feature-vectors), anomaly detection (clustering and principal component analysis), and target prioritization (hypothetical probing). Autonomous C4ISR systems smart service systems multi-tiered architectures robotic agents navigational behavior motifs sensor-data-fusion framework objective anomaly detection target prioritization
346	Anomaly Detection in Electricity Consumption Data GHORBANI, SONIYA January 2017 (has links) Distribution grids play an important role in delivering electricity to end users. Electricity customers would like to have a continuous electricity supply without any disturbance. For customers such as airports and hospitals, electricity interruption may have devastating consequences. Therefore, many electricity distribution companies are looking for ways to prevent power outages. Sometimes power outages are caused on the grid side, for example by a transformer failure or a breakdown in power cables because of wind. And sometimes the outages are caused by the customers, for example by overload. In fact, a very high peak in electricity consumption and an irregular load profile may cause these kinds of failures. In this thesis, we used an approach consisting of two main steps for detecting customers with irregular load profiles. In the first step, we create a dictionary based on all common load profile shapes using daily electricity consumption for a one-month period. In the second step, the load profile shapes of customers for a specific week are compared with the load patterns in the dictionary. If the electricity consumption for any customer during that week is not similar to any of the load patterns in the dictionary, it will be grouped as an anomaly. In this case, load profile data are transformed to symbols using Symbolic Aggregate approXimation (SAX) and then clustered using hierarchical clustering. The approach is used to detect anomalies in the weekly load profiles of a data set provided by HEM Nät, a power distribution company located in the south of Sweden. electricity consumption smart meter data symbolic representation anomaly detection Engineering and Technology Teknik och teknologier Elektroteknik och elektronik
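The symbolic step named in the abstract above can be illustrated with a minimal SAX sketch: z-normalize the load profile, average it over equal-length segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The alphabet size, segment count, sample values and the mismatch-count lookup are assumptions made for illustration; the thesis itself groups the symbolic profiles with hierarchical clustering, which is not reproduced here.

```python
from bisect import bisect
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series to a SAX word (len(series) must divide evenly)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    normalized = [0.0 if sigma == 0 else (v - mu) / sigma for v in series]
    seg_len = len(series) // n_segments
    # piecewise aggregate approximation: one mean per segment
    paa = [mean(normalized[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    k = len(alphabet)
    # breakpoints cut the standard normal into k equiprobable regions
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect(breakpoints, v)] for v in paa)


def min_mismatches(word, dictionary):
    """Smallest number of differing symbols between word and any dictionary word."""
    return min(sum(a != b for a, b in zip(word, ref)) for ref in dictionary)


# Hypothetical profile: low consumption followed by high consumption.
word = sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2)
print(word, min_mismatches(word, ["aa", "dd"]))
```

A profile whose word differs from every dictionary entry by more than some chosen number of symbols could then be flagged for review; that threshold would have to be tuned on real data.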
347	Network security monitoring and anomaly detection in industrial control system networks Mantere, M. (Matti) 19 May 2015 (has links) Abstract Industrial control system (ICS) networks used to be isolated environments, typically separated by physical air gaps from the wider area networks. This situation has been changing and the change has brought with it new cybersecurity issues. The process has also exacerbated existing problems that were previously less exposed due to the systems’ relative isolation. This process of increasing connectivity between devices, systems and persons can be seen as part of a paradigm shift called the Internet of Things (IoT). This change is progressing and the industry actors need to take it into account when working to improve the cybersecurity of ICS environments and thus their reliability. Ensuring that proper security processes and mechanisms are being implemented and enforced on the ICS network level is an important part of the general security posture of any given industrial actor. Network security and the detection of intrusions and anomalies in the context of ICS networks are the main high-level research foci of this thesis. These issues are investigated through work on machine learning (ML) based anomaly detection (AD). Potentially suitable features, approaches and algorithms for implementing a network anomaly detection system for use in ICS environments are investigated. After investigating the challenges, different approaches and methods, a proof-of-concept (PoC) was implemented. The PoC implementation is built on top of the Bro network security monitoring framework (Bro) for testing the selected approach and tools. In the PoC, a Self-Organizing Map (SOM) algorithm is implemented using Bro scripting language to demonstrate the feasibility of using Bro as a base system. The implemented approach also represents a minimal case of event-driven machine learning anomaly detection (EMLAD) concept conceived during the research. 
The contributions of this thesis are as follows: a set of potential features for use in machine learning anomaly detection, proof of the feasibility of the machine learning approach in ICS network setting, a concept for event-driven machine learning anomaly detection, a design and initial implementation of user configurable and extendable machine learning anomaly detection framework for ICS networks. / Tiivistelmä Kehittyneet yhteiskunnat käyttävät teollisuuslaitoksissaan ja infrastruktuuriensa operoinnissa monimuotoisia automaatiojärjestelmiä. Näiden automaatiojärjestelmien tieto- ja kyberturvallisuuden tila on hyvin vaihtelevaa. Laitokset ja niiden hyödyntämät järjestelmät voivat edustaa usean eri aikakauden tekniikkaa ja sisältää useiden eri aikakauden heikkouksia ja haavoittuvaisuuksia. Järjestelmät olivat aiemmin suhteellisen eristyksissä muista tietoverkoista kuin omista kommunikaatioväylistään. Tämä automaatiojärjestelmien eristyneisyyden heikkeneminen on luonut uuden joukon uhkia paljastamalla niiden kommunikaatiorajapintoja ympäröivälle maailmalle. Nämä verkkoympäristöt ovat kuitenkin edelleen verrattaen eristyneitä ja tätä ominaisuutta voidaan hyödyntää niiden valvonnassa. Tässä työssä esitetään tutkimustuloksia näiden verkkojen turvallisuuden valvomisesta erityisesti poikkeamien havainnoinnilla käyttäen hyväksi koneoppimismenetelmiä. Alkuvaiheen haasteiden ja erityispiirteiden tutkimuksen jälkeen työssä käytetään itsejärjestyvien karttojen (Self-Organizing Map, SOM) algoritmia esimerkkiratkaisun toteutuksessa uuden konseptin havainnollistamiseksi. Tämä uusi konsepti on tapahtumapohjainen koneoppiva poikkeamien havainnointi (Event-Driven Machine Learning Anomaly Detection, EMLAD). 
Työn kontribuutiot ovat seuraavat, kaikki teollisuusautomaatioverkkojen kontekstissa: ehdotus yhdeksi anomalioiden havainnoinnissa käytettävien ominaisuuksien ryhmäksi, koneoppivan poikkeamien havainnoinnin käyttökelpoisuuden toteaminen, laajennettava ja joustava esimerkkitoteutus uudesta EMLAD-konseptista toteutettuna Bro NSM työkalun ohjelmointikielellä. anomaly detection cybersecurity industrial control system security information security intrusion detection machine learning network security automaatiojärjestelmien turvallisuus koneoppiminen kyberturvallisuus poikkeamien havainnointi tietoturva tunkeutumisen havainnointi
348	Model independent searches for New Physics using Machine Learning at the ATLAS experiment / Recherche de Nouvelle Physique indépendante d'un modèle en utilisant l’apprentissage automatique sur l’experience ATLAS Jimenez, Fabricio 16 September 2019 (has links) Nous abordons le problème de la recherche indépendante du modèle pour la Nouvelle Physique (NP), au Grand Collisionneur de Hadrons (LHC) en utilisant le détecteur ATLAS. Une attention particulière est accordée au développement et à la mise à l'essai de nouvelles techniques d'apprentissage automatique à cette fin. Le présent ouvrage présente trois résultats principaux. Tout d'abord, nous avons mis en place un système de surveillance automatique des signatures génériques au sein de TADA, un outil logiciel d'ATLAS. Nous avons exploré plus de 30 signatures au cours de la période de collecte des données de 2017 et aucune anomalie particulière n'a été observée par rapport aux simulations des processus du modèle standard. Deuxièmement, nous proposons une méthode collective de détection des anomalies pour les recherches de NP indépendantes du modèle au LHC. Nous proposons l'approche paramétrique qui utilise un algorithme d'apprentissage semi-supervisé. Cette approche utilise une probabilité pénalisée et est capable d'effectuer simultanément une sélection appropriée des variables et de détecter un comportement anormal collectif possible dans les données par rapport à un échantillon de fond donné. Troisièmement, nous présentons des études préliminaires sur la modélisation du bruit de fond et la détection de signaux génériques dans des spectres de masse invariants à l'aide de processus gaussiens (GPs) sans information préalable moyenne. 
Deux méthodes ont été testées dans deux ensembles de données : une procédure en deux étapes dans un ensemble de données tiré des simulations du modèle standard utilisé pour ATLAS General Search, dans le canal contenant deux jets à l'état final, et une procédure en trois étapes dans un ensemble de données simulées pour le signal (Z′) et le fond (modèle standard) dans la recherche de résonances dans le cas du spectre de masse invariant de paire supérieure. Notre étude est une première étape vers une méthode qui utilise les GPs comme outil de modélisation qui peut être appliqué à plusieurs signatures dans une configuration plus indépendante du modèle. / We address the problem of model-independent searches for New Physics (NP), at the Large Hadron Collider (LHC) using the ATLAS detector. Particular attention is paid to the development and testing of novel Machine Learning techniques for that purpose. The present work presents three main results. Firstly, we put in place a system for automatic generic signature monitoring within TADA, a software tool from ATLAS. We explored over 30 signatures in the data taking period of 2017 and no particular discrepancy was observed with respect to the Standard Model processes simulations. Secondly, we propose a collective anomaly detection method for model-independent searches for NP at the LHC. We propose the parametric approach that uses a semi-supervised learning algorithm. This approach uses penalized likelihood and is able to simultaneously perform appropriate variable selection and detect possible collective anomalous behavior in data with respect to a given background sample. Thirdly, we present preliminary studies on modeling background and detecting generic signals in invariant mass spectra using Gaussian processes (GPs) with no mean prior information. 
Two methods were tested in two datasets: a two-step procedure in a dataset taken from Standard Model simulations used for ATLAS General Search, in the channel containing two jets in the final state, and a three-step procedure from a simulated dataset for signal (Z′) and background (Standard Model) in the search for resonances in the top pair invariant mass spectrum case. Our study is a first step towards a method that takes advantage of GPs as a modeling tool that can be applied to several signatures in a more model independent setup. Grand collisionneur de hadrons ATLAS Large Hadron Collider Standard Model Beyond the Standard Model ATLAS New Physics Machine Learning Anomaly Detection Semi-supervised Penalized likelihood Gaussian Processes
349	Real-time anomaly detection with in-flight data : streaming anomaly detection with heterogeneous communicating agents / Détection des anomalies sur les données en vol en temps réel avec des agents communicants hétérogènes Aussel, Nicolas 21 June 2019 (has links) Avec l'augmentation du nombre de capteurs et d'actuateurs dans les avions et le développement de liaisons de données fiables entre les avions et le sol, il est devenu possible d'améliorer la sécurité et la fiabilité des systèmes à bord en appliquant des techniques d'analyse en temps réel. Cependant, étant donné la disponibilité limitée des ressources de calcul embarquées et le coût élevé des liaisons de données, les solutions architecturelles actuelles ne peuvent pas exploiter pleinement toutes les ressources disponibles, limitant leur précision. Notre but est de proposer un algorithme distribué de prédiction de panne qui pourrait être exécuté à la fois à bord de l'avion et dans une station au sol tout en respectant un budget de communication. Dans cette approche, la station au sol disposerait de ressources de calcul rapides et de données historiques et l'avion disposerait de ressources de calcul limitées et des données de vol actuelles. Dans cette thèse, nous étudierons les spécificités des données aéronautiques et les méthodes déjà existantes pour produire des prédictions de pannes à partir de ces dernières et nous proposerons une solution au problème posé. Notre contribution sera détaillée en trois parties. Premièrement, nous étudierons le problème de prédiction d'événements rares créé par la haute fiabilité des systèmes aéronautiques. Beaucoup de méthodes d'apprentissage en classification reposent sur des jeux de données équilibrés. 
Plusieurs approches existent pour corriger le déséquilibre d'un jeu de données et nous étudierons leur efficacité sur des jeux de données extrêmement déséquilibrés. Deuxièmement, nous étudierons le problème d'analyse textuelle de journaux car de nombreux systèmes aéronautiques ne produisent pas d'étiquettes ou de valeurs numériques faciles à interpréter mais des messages de journaux textuels. Nous étudierons les méthodes existantes basées sur une approche statistique et sur l'apprentissage profond pour convertir des messages de journaux textuels en une forme utilisable en entrée d'algorithmes d'apprentissage pour classification. Nous proposerons notre propre méthode basée sur le traitement du langage naturel et montrerons comment ses performances dépassent celles des autres méthodes sur un jeu de données public standard. Enfin, nous offrirons une solution au problème posé en proposant un nouvel algorithme d'apprentissage distribué s'appuyant sur deux paradigmes d'apprentissage existants, l'apprentissage actif et l'apprentissage fédéré. Nous détaillerons notre algorithme, son implémentation et fournirons une comparaison de ses performances avec les méthodes existantes / With the rise of the number of sensors and actuators in an aircraft and the development of reliable data links from the aircraft to the ground, it becomes possible to improve aircraft security and maintainability by applying real-time analysis techniques. However, given the limited availability of on-board computing and the high cost of the data links, current architectural solutions cannot fully leverage all the available resources, limiting their accuracy. Our goal is to provide a distributed algorithm for failure prediction that could be executed both on board the aircraft and on a ground station and that would produce on-board failure predictions in near real-time under a communication budget. 
In this approach, the ground station would hold fast computation resources and historical data and the aircraft would hold limited computational resources and current flight's data.In this thesis, we will study the specificities of aeronautical data and what methods already exist to produce failure prediction from them and propose a solution to the problem stated. Our contribution will be detailed in three main parts.First, we will study the problem of rare event prediction created by the high reliability of aeronautical systems. Many learning methods for classifiers rely on balanced datasets. Several approaches exist to correct a dataset imbalance and we will study their efficiency on extremely imbalanced datasets.Second, we study the problem of log parsing as many aeronautical systems do not produce easy to classify labels or numerical values but log messages in full text. We will study existing methods based on a statistical approach and on Deep Learning to convert full text log messages into a form usable as an input by learning algorithms for classifiers. We will then propose our own method based on Natural Language Processing and show how it outperforms the other approaches on a public benchmark.Last, we offer a solution to the stated problem by proposing a new distributed learning algorithm that relies on two existing learning paradigms Active Learning and Federated Learning. We detail our algorithm, its implementation and provide a comparison of its performance with existing methods Apprentissage automatique Analyse temps réel Architecture logicielle répartie Analyse de grands volumes de données Détéction d'anomalies Machine learning Real-time analysis Distributed software architecture Big data analysis Anomaly detection
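As an illustration of what "correcting a dataset imbalance" can mean in the rare-event setting described in this abstract, the following is a minimal sketch of random oversampling, one common rebalancing technique. It is not taken from the thesis: the function name `random_oversample` and the toy data (one failure among ten flights) are hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest one.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: label 1 marks the single failure among ten flights.
X = [[float(i)] for i in range(10)]
y = [0] * 9 + [1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 9 samples
```

With a single positive sample, oversampling only repeats that one sample, which hints at why extreme imbalance is treated in the abstract as a problem worth studying in its own right.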
350	Real time intelligent decision making from heterogeneous and imperfect data / La prise de décision intelligente en temps réel à partir de données hétérogènes et imparfaites Sfar, Hela 09 July 2019
Pervasive computing is advancing rapidly. This paradigm is characterized by multiple sensors highly integrated in objects of the physical world. The development of personal applications using data provided by these sensors has prompted the creation of smart environments, which are designed as an advanced overlay framework that proactively, but sensibly, assists individuals in their everyday lives. A smart environment application gathers streaming data from the deployed sensors, then processes and analyzes the collected data before making decisions and executing actions on the physical environment.
Online data processing consists mainly of data segmentation, which divides the data into fragments. Generally, in the literature, the fragment size is fixed. However, such a static view usually leads to imprecise outputs. Hence, dynamic segmentation using variable sizes of observation windows is an open issue. The analysis phase takes a segment of sensor data as input and extracts knowledge by means of reasoning or mining processes. In particular, understanding users' daily activities and preventing anomalous situations are a growing concern in the literature, but addressing these problems with small and imperfect data is still a key issue. Indeed, data provided by sensors is often imprecise, inaccurate, outdated, contradictory, or simply missing. Hence, handling uncertainty has become an important aspect. Moreover, monitoring users to obtain a large amount of data about their life routine is not always possible and is too intrusive. People are often unwilling to be monitored for a long period of time. When the acquired data about the user are sufficient, most existing methods can provide precise recognition, but their performance declines sharply with small datasets.
In this thesis, we mainly explored the cross-fertilization of statistical and symbolic learning approaches, and the contributions are threefold: (i) DataSeg, an algorithm that takes advantage of both unsupervised learning and ontology representation for data segmentation. This combination dynamically chooses the segment size for several applications, unlike most existing methods. Moreover, unlike the approaches in the literature, DataSeg can be adapted to any application features; (ii) AGACY Monitoring, a hybrid model for activity recognition and uncertainty handling which uses supervised learning, possibilistic logic inference, and an ontology to extract meaningful knowledge from small datasets; (iii) CARMA, a method based on Markov Logic Networks (MLN) and causal association rules to detect anomaly causes in a smart environment so as to prevent their occurrence. By automatically extracting logic rules about anomaly causes and integrating them in the MLN rules, we reach a more accurate situation identification even with partial observations. Each of our contributions was prototyped, tested and validated with data obtained from real scenarios.
Keywords: Data segmentation; Activity recognition; Anomaly detection; Smart environment; Machine learning; Symbolic methods
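To make the contrast between fixed-size and variable-size observation windows concrete, here is a minimal sketch that segments a stream of timestamped sensor events in two ways. It is not DataSeg, which according to the abstract combines unsupervised learning with an ontology. The gap-based rule, the function names and the toy event stream are all assumptions made for illustration.

```python
def fixed_windows(events, size):
    """Static segmentation: cut the stream every `size` events."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def gap_windows(events, max_gap):
    """Dynamic segmentation (illustrative rule only): close the current
    segment whenever two consecutive events are more than `max_gap`
    seconds apart, so segment sizes follow the data."""
    segments = []
    current = []
    for ts, sensor in events:
        if current and ts - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((ts, sensor))
    if current:
        segments.append(current)
    return segments


# Hypothetical stream of (timestamp in seconds, sensor name) events.
events = [
    (0, "kitchen"), (2, "kitchen"), (3, "fridge"),
    (60, "sofa"), (61, "tv"),
    (200, "door"),
]

static = fixed_windows(events, 2)
dynamic = gap_windows(events, 30)

print([len(s) for s in static])   # [2, 2, 2]
print([len(s) for s in dynamic])  # [3, 2, 1]
```

In this toy stream the second fixed window mixes a kitchen event with a living-room event that occurs almost a minute later, while the gap-based segments keep each burst of activity together. That is the kind of imprecision the abstract attributes to a static window size.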
