Global ETD Search

21	Apprentissage non supervisé de dépendances à partir de textes / Unsupervised dependency parsing from texts Arcadias, Marie 02 October 2015 (has links) Les grammaires de dépendance permettent de construire une organisation hiérarchique syntaxique des mots d’une phrase. La construction manuelle des arbres de dépendances étant une tâche exigeant temps et expertise, de nombreux travaux cherchent à l’automatiser. Visant à établir un processus léger et facilement adaptable nous nous sommes intéressés à l’apprentissage non supervisé de dépendances, évitant ainsi d’avoir recours à une expertise coûteuse. L’état de l’art en apprentissage non supervisé de dépendances (DMV) se compose de méthodes très complexes et extrêmement sensibles au paramétrage initial. Nous présentons dans cette thèse un nouveau modèle pour résoudre ce problème d’analyse de dépendances, mais de façon plus simple, plus rapide et plus adaptable. Nous apprenons une famille de grammaires (PCFG) réduites à moins de 6 non terminaux et de 15 règles de combinaisons des non terminaux à partir des étiquettes grammaticales. Les PCFG de cette famille que nous nommons DGdg (pour DROITE GAUCHE droite gauche) se paramètrent très légèrement, ainsi elles s’adaptent sans effort aux 12 langues testées. L’apprentissage et l’analyse sont effectués au moins deux fois plus rapidement que DMV sur les mêmes données. Et la qualité des analyses DGdg est pour certaines langues proches des analyses par DMV. Nous proposons une première application de notre méthode d’analyse de dépendances à l’extraction d’informations. Nous apprenons par des CRF un étiquetage en fonctions « sujet », « objet » et « prédicat », en nous fondant sur des caractéristiques extraites des arbres construits. / Dependency grammars allow the construction of a hierarchical organization of the words of sentences. Building dependency trees by hand can be very time-consuming and requires expert knowledge. 
In this regard, we are interested in unsupervised dependency learning. Currently, DMV gives the state-of-the-art results in unsupervised dependency parsing. However, DMV is known to be highly sensitive to initial parameters, and training a DMV model is heavy and slow. We present in this thesis a new model that solves this problem in a simpler, faster and more adaptable way. We learn a family of PCFGs using fewer than 6 nonterminal symbols and fewer than 15 combination rules from the part-of-speech tags. Tuning these PCFGs is light, so they adapt easily to the 12 languages we tested. Our proposed method for unsupervised dependency parsing achieves near state-of-the-art results for some languages while being at least twice as fast. Moreover, we describe a first application of dependency trees to relation extraction: we show how information from dependency structures can be integrated into conditional random fields (CRF) to improve a relation extraction task. Apprentissage non supervisé Grammaire de dépendances Grammaire hors contexte CYK Inside-Outside CRF Extraction de relations Unsupervised machine learning Dependency grammar Context-free grammar CKY Inside-Outside CRF Relation extraction 006.35
22	Computer Vision in Fitness: Exercise Recognition and Repetition Counting / Datorseende i fitness: Träningsigenkänning och upprepningsräkning Barysheva, Anna January 2022 (has links) Motion classification and action localization have rapidly become essential tasks in computer vision and video analytics. In particular, Human Action Recognition (HAR), which has important applications in clinical assessments, activity monitoring, and sports performance evaluation, has drawn a lot of attention in research communities. Nevertheless, the high-dimensional and time-continuous nature of motion data creates non-trivial challenges in action detection and action recognition. In this degree project, on a set of recorded unannotated mixed workouts, we test and evaluate unsupervised and semi-supervised machine learning models to identify the correct location, i.e., a timestamp, of various exercises in videos and to study different approaches in clustering detected actions. This is done by modelling the data via the two-step clustering pipeline using the Bag-of-Visual-Words (BoVW) approach. Moreover, the concept of repetition counting is under consideration as a parallel task. We find that clustering alone tends to produce cluster solutions with a mixture of exercises and is not sufficient to solve the exercise recognition problem. Instead, we use clustering as an initial step to aggregate similar exercises. This allows us to effectively find many repetitions of similar exercises for their further annotation. When combined with a subsequent Support Vector Machine (SVM) classifier, the BoVW concept proved itself, achieving an accuracy score of 95.5% on the labelled subset. Much attention has also been paid to various methods of dimensionality reduction and benchmarking their ability to encode the original data into a lower-dimensional latent space. / Rörelseklassificering och handlingslokalisering har snabbt blivit viktiga uppgifter inom datorseende och videoanalys. 
I synnerhet har HAR fångat en stor uppmärksamhet i forskarsamhällen, då den har viktiga tillämpningar i kliniska bedömningar, aktivitetsövervakning och utvärdering av sportprestanda. Likväl så skapar den högdimensionella och tidskontinuerliga naturen hos rörelsedata icke-triviala utmaningar i handlingsdetektering och handlingsigenkänning. I detta examensarbete testar vi samt utvärderar oövervakade och semi-övervakade maskininlärningsmodeller på en samling av inspelade blandade träningspass, som inte är noterade. Detta är för att identifiera den korrekta positionen, d.v.s en tidsstämpel, för olika övningar i videofilmer och för att studera olika tillvägagångssätt för att gruppera upptäckta handlingar. Detta görs genom att modellera data via tvåstegs klustringspipeline, med tillämpning av BoVW-metoden. Som en parallell uppgift övervägs även repetitionsräkning som koncept. Vi finner att kluster enbart tenderar att producera klusterlösningar med en blandning av övningar och är därför inte tillräckligt för att lösa problemet med övningsigenkänning. Istället, använder vi klustring som ett första steg för att sammanställa liknande övningar. Detta gör att vi effektivt kan hitta många upprepningar av liknande övningar för att vidare hantera dess anteckningar. Detta, kombinerad med en efterföljande SVM-klassificerare, visade sig att BoVW-konceptet är mycket effektivt, vilket uppnådde en noggrannhet på 95,5% på den märkta delmängden. Mycket uppmärksamhet har också ägnats åt olika metoder för dimensionalitetsreduktion och jämförelse av dessa metoders förmåga att koda originaldata till ett dimensionellt lägre latentutrymme. Exercise classification human action recognition repetition counting skeletal motion recognition unsupervised machine learning Övningsklassificering igenkänning av mänsklig handling upprepningsräkning igenkänning av skelettrörelse oövervakad maskininlärning Other Mathematics Annan matematik
23	Role Mining With Hierarchical Clustering and Binary Similarity Measures / Role mining med hierarkisk klustring och binära likhetsmått Olsson, Magnus January 2023 (has links) Role engineering, a critical task in role-based access control systems, is the process of identifying a complete set of roles that accurately reflect the structure of an organization. Role mining, a data-driven approach, utilizes data mining techniques on user-permission assignments represented as binary data to automatically derive these roles. However, relying solely on data-driven methods often leads to the generation of a large set of roles lacking interpretability. To address this limitation, this thesis presents a role mining algorithm, whose results can be viewed as an initial step in the role engineering process, in order to streamline the task of defining semantically meaningful roles, where human analysis is an inevitable post-processing step. The algorithm is based on hierarchical clustering analysis, and its main objective is identifying a sufficiently small set of roles that cover as large a proportion of the user-permission assignments as possible. To evaluate the performance of the algorithm, multiple real-world data sets representing diverse access control scenarios are utilized. The evaluation focuses on comparing various binary similarity measures, with the goal of determining the most suitable characteristics of a binary similarity measure to be used for role mining. The evaluation of different binary similarity measures provides insights into their effectiveness in achieving accurate role definitions to be used as a foundation for constructing meaningful roles. Ultimately, this research contributes to the advancement of role mining methodologies, facilitating improved access control systems that align with organizational needs and enhance security and efficiency. 
/ Role engineering går ut på att identifiera en komplett uppsättning roller som återspeglar strukturen i en organisation och är en viktig uppgift när organisationer övergår till rollbaserad åtkomstkontroll. Role mining är en datadriven metod som använder data mining-tekniker på användarnas behörighetstilldelningar för att automatiskt härleda dessa roller. Dessa tilldelningar kan representeras som binär data. Att enbart förlita sig på datadrivna metoder leder dock ofta till att en stor uppsättning svårtolkade roller genereras. För att adressera denna begränsning har en role mining-algoritm utvecklas i det här arbetet. Genom att applicera algoritmen på den binära tilldelningsdatan kan de erhållna resultaten betraktas som ett inledande steg i role engineering-processen. Syftet är att effektivisera arbetet med att definiera semantiskt meningsfulla roller, där mänsklig analys är en oundviklig fas. Algoritmen är baserad på hierarkisk klustring och har som huvudsyfte att identifiera en lagom stor uppsättning roller som täcker så stor del av behörighetstilldelningarna som möjligt. För att utvärdera algoritmens prestanda appliceras den på flertalet datamängder insamlade från varierande verkliga åtkomstkontrollsystem. Utvärderingen fokuserar på att jämföra olika binära likhetsmått med målet att bestämma de mest lämpliga egenskaperna för ett binärt likhetsmått som ska användas för role mining. Utvärderingen av olika binära likhetsmått ger insikter i deras effektivitet att uppnå korrekta rolldefinitioner som kan användas som grund för att konstruera meningsfulla roller. Denna forskning bidrar till framsteg inom role mining och syftar till att underlätta övergången till rollbaserad åtkomstkontroll samt förbättra metoderna för att identifiera roller som överensstämmer med organisationsbehov och förbättrar säkerhet och effektivitet. 
Access Control RBAC Role Engineering Role Mining Unsupervised Machine Learning Hierarchical Clustering Binary Similarity Measures IAM Rollbaserad åtkomstkontroll Rollgranskning Maskininlärning Hierarkisk klustring binära likhetsmått Other Mathematics Annan matematik
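To make the idea behind record 23 concrete, here is a minimal sketch, assuming single-linkage agglomerative clustering with the Jaccard measure as the binary similarity. It is not the algorithm from the thesis: the user-permission data, the threshold, and every function name below are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two permission sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_users(assignments: dict, threshold: float) -> list:
    """Single-linkage agglomerative clustering: repeatedly merge the two most
    similar clusters until no pair reaches `threshold`."""
    clusters = [{user} for user in sorted(assignments)]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of users.
        return max(jaccard(assignments[u], assignments[v]) for u in c1 for v in c2)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def candidate_roles(assignments: dict, clusters: list) -> list:
    """Candidate role per cluster: the permissions held by every member."""
    return [frozenset.intersection(*(assignments[u] for u in c)) for c in clusters]


# Hypothetical user-permission assignments (illustrative data only).
assignments = {
    "alice": frozenset({"read", "write"}),
    "bob": frozenset({"read", "write", "deploy"}),
    "carol": frozenset({"audit"}),
    "dave": frozenset({"audit", "report"}),
}
clusters = cluster_users(assignments, threshold=0.5)
roles = candidate_roles(assignments, clusters)
```

Taking the intersection of permissions keeps each candidate role small, at the cost of leaving some assignments (such as "deploy" here) uncovered; a human reviewer would still need to decide which candidates are semantically meaningful.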
24	Analysis of Drinking Water Delivery Patterns in the Northern Part of Stockholm – Effects of Population Growth, Holidays and Weather Conditions / Analys av dricksvattenleveransmönster i norra Stockholm – effekter av befolkningstillväxt, semester och väderförhållanden Elina, Irina January 2022 (has links) Global warming is widely reported to be a cause of water scarcity and increased water consumption. As a consequence, it becomes harder for water suppliers to be prepared for increased demands. It is possible to predict the upcoming demand with the help of machine learning tools, however, a preliminary analysis of water consumption patterns is important for a good prediction. This work focuses on water consumption patterns and studies their change with time as well as the effects of meteorological factors on it. In order to aid the investigation and scrutinization of the patterns, a new semi-automated tool was developed. Its algorithm is based on the Mann-Whitney U statistical test and performs grouping of the weeks with similar sets of hourly water consumption. It helps to frame off the seasons of the year within which the patterns are similar. Along with that, K-means clustering was applied to the data to retrieve the patterns and to compare the performance with the newly developed algorithm. On top of that, the effects of the population growth and meteorological variables on water consumption were studied. K-means clustering showed more robust performance than the newly developed algorithm and therefore the ways of improvement were discussed along with the significance of good data quality and thorough data pre-processing. It was detected that municipalities with the different housing situation had different persistent summer patterns of water consumption. In general municipalities with prevailing individual housing tend to consume more water during the summer per capita than others. 
Furthermore, municipalities with prevailing individual housing were observed to be less robust against temperature growth and humidity decrease than those with prevailing apartment housing as the latter increase their water consumption less significantly in response to mentioned meteorological variables change. Therefore, considering the population growth, the benefits of planning new multi-apartment dwelling areas in preference to individual housing were discussed in the context of sustainable water use and climate change. / Global uppvärmning kan orsaka både vattenbrist och ökad vattenförbrukning. Som en konsekvens blir det svårare för vattenförsörjningsföretag att förberedda sig på de ökande kraven. Det är möjligt att förutsäga den kommande efterfrågan med hjälp av verktyg för maskininlärning, men det är viktigt att analysera vattenförbrukningsmönster för att få en bra förutsägelse. Detta arbete fokuserar därför på att analysera, samt studerar effekterna av meteorologiska faktorer och hur semesterperioden påverkar vattenförbrukningen. Ett nytt halvautomatiskt verktyg utformades för att extrahera dagliga vattenförbrukningsmönster från förbrukningstidserier. Algoritmen anger vilka veckor på året som har liknande mönster och grupperar dem i så kallade konsumtionssäsonger. För att utvärdera prestandan för verktyget användes en grupperingsmetod den så kallade K-means clustering på samma data. Utöver det studerades även effekterna av befolkningstillväxten och meteorologiska variabler på vattenförbrukningen. K-means klustring visade sig ha en mer robust prestanda än den nya framtagna utvecklade algoritmen och därför diskuterades olika sätt att förbättra algoritmen samt vikten av god rådatakvalitet. Det upptäcktes att kommuner med olika bostadssituation reagerade olika på varmt och torrt väder samt vissa semesterhändelser. I allmänhet brukar kommuner med enbostadshusområden förbruka mer vatten under sommaren per capita än andra. 
Fördelar med att planera nya flerbostadsområden som ett mer hållbart alternativ till enbostadshusområden diskuterades i kontexten av befolkningstillväxt och klimatförändringar. clustering unsupervised machine learning water consumption patterns water demand klustring oövervakad maskininlärning vattenförbrukningsmönster vattenbehov Geosciences, Multidisciplinary Multidisciplinär geovetenskap Annan geovetenskap och miljövetenskap Other Environmental Engineering Annan naturresursteknik
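Record 24 describes grouping weeks with the Mann-Whitney U test. As a rough illustration of the statistic only (not the tool from the thesis), the sketch below computes U for pairs of weeks by direct pair counting. The readings and the function name are made up for the example.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: the number of pairs (xi, yj)
    with xi > yj, counting each tie as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical hourly consumption readings for three weeks (made-up numbers).
week_a = [10.0, 12.0, 11.0, 13.0]
week_b = [10.5, 12.5, 11.5, 12.0]
week_c = [20.0, 22.0, 21.0, 23.0]

n_pairs = len(week_a) * len(week_b)      # largest possible value of U
u_ab = mann_whitney_u(week_a, week_b)    # close to n_pairs / 2: similar weeks
u_ac = mann_whitney_u(week_a, week_c)    # at the extreme 0: clearly shifted
```

A value of U near half the number of pairs gives no evidence that the two weeks differ, while a value near 0 or near the maximum suggests a shift in consumption. Turning U into a p-value needs its null distribution; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used for that step.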
25	Classification de situations de conduite et détection des événements critiques d'un deux roues motorisé / Powered Two Wheelers riding patterns classification and critical events recognition Attal, Ferhat 06 July 2015 (has links) L'objectif de cette thèse est de développer des outils d'analyse de données recueillies sur les deux roues motorisés (2RMs). Dans ce cadre, des expérimentations sont menées sur des motos instrumentés dans un contexte de conduite réelle incluant à la fois des conduites normales dites naturelles et des conduites à risques (presque chute et chute). Dans la première partie de la thèse, des méthodes d'apprentissage supervisé ont été utilisées pour la classification de situations de conduite d'un 2RM. Les approches développées dans ce contexte ont montré l'intérêt de prendre en compte l'aspect temporel des données dans la conduite d'un 2RM. A cet effet, nous avons montré l'efficacité des modèles de Markov cachés. La seconde partie de cette thèse porte sur le développement d'outils de détection et de classification hors ligne des évènements critiques de conduite, ainsi que, la détection en ligne des situations de chute d'un 2RM. L'approche proposée pour la détection hors ligne des évènements critiques de conduite repose sur l'utilisation d'un modèle de mélange de densités gaussiennes à proportions logistiques. Ce modèle sert à la segmentation non supervisée des séquences de conduite. Des caractéristiques extraites du paramètre du modèle de mélange sont utilisées comme entrées d'un classifieur pour classifier les évènements critiques. Pour la détection en ligne de chute, une méthode simple de détection séquentielle d'anomalies basée sur la carte de contrôle MCUSUM a été proposée. 
Les résultats obtenus sur une base de données réelle ont permis de montrer l'efficacité des méthodologies proposées à la fois pour la classification de situations de conduite et à la détection des évènements critiques de conduite / This thesis aims to develop tools for analyzing and understanding the riding of Powered Two Wheelers (PTW). Experiments are conducted using instrumented PTWs in real riding conditions, including both normal (naturalistic) riding behaviors and critical riding behaviors (near fall and fall). The two objectives of this thesis are riding pattern classification and critical riding event detection. In the first part of this thesis, a machine-learning framework is used for the riding pattern recognition problem, which is formulated as a classification task to identify the class of riding patterns. The approaches developed in this context show the value of taking into account the temporal aspect of the data in PTW riding. Moreover, we show the effectiveness of hidden Markov models for this problem. The second part of this thesis focuses on the development of tools for the off-line detection and classification of critical riding events and for on-line fall detection. The detection and classification of critical riding events is performed in two steps: (1) the segmentation step, where the multidimensional time series data are modeled and segmented using a mixture model with quadratic logistic proportions; (2) the classification step, which uses a pattern recognition algorithm to assign each event, based on its extracted features, to one of three classes, namely Fall, Near Fall and Naturalistic riding. The fall detection problem is formulated as a sequential anomaly detection problem. The Multivariate CUmulative SUM (MCUSUM) control chart was applied to the data collected from sensors mounted on the motorcycle. 
The obtained results on a real database have shown the effectiveness of the proposed methodology for both riding pattern recognition and critical riding events detection problems Deux roues motorisés (2RMs) Classification de situations de conduite Détection en ligne des chutes d'un 2RM Détection d'anomalies Powered Two Wheelers (PTW) Riding pattern recognition Offline critical riding event detection On line fall detection Anomalies detection
26	Non-negative matrix decomposition approaches to frequency domain analysis of music audio signals Wood, Sean 12 1900 (has links) On étudie l’application des algorithmes de décomposition matricielles tel que la Factorisation Matricielle Non-négative (FMN), aux représentations fréquentielles de signaux audio musicaux. Ces algorithmes, dirigés par une fonction d’erreur de reconstruction, apprennent un ensemble de fonctions de base et un ensemble de coefficients correspondants qui approximent le signal d’entrée. On compare l’utilisation de trois fonctions d’erreur de reconstruction quand la FMN est appliquée à des gammes monophoniques et harmonisées: moindre carré, divergence Kullback-Leibler, et une mesure de divergence dépendante de la phase, introduite récemment. Des nouvelles méthodes pour interpréter les décompositions résultantes sont présentées et sont comparées aux méthodes utilisées précédemment qui nécessitent des connaissances du domaine acoustique. Finalement, on analyse la capacité de généralisation des fonctions de bases apprises par rapport à trois paramètres musicaux: l’amplitude, la durée et le type d’instrument. Pour ce faire, on introduit deux algorithmes d’étiquetage des fonctions de bases qui performent mieux que l’approche précédente dans la majorité de nos tests, la tâche d’instrument avec audio monophonique étant la seule exception importante. / We study the application of unsupervised matrix decomposition algorithms such as Non-negative Matrix Factorization (NMF) to frequency domain representations of music audio signals. These algorithms, driven by a given reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized musical scales: least squares, Kullback-Leibler divergence, and a recently introduced “phase-aware” divergence measure. 
Novel supervised methods for interpreting the resulting decompositions are presented and compared to previously used methods that rely on domain knowledge. Finally, the ability of the learned basis functions to generalize across musical parameter values, including note amplitude, note duration and instrument type, is analyzed. To do so, we introduce two basis function labeling algorithms that outperform the previous labeling approach in the majority of our tests, instrument type with monophonic audio being the only notable exception. Apprentissage machine non-supervisé Apprentissage machine semi-supervisé Factorisation matricielle non-négative Encodage parcimonieux Extraction de l’information musicale Détection de la hauteur de notes Unsupervised machine learning Semi-supervised machine learning Non-negative matrix factorization Sparse coding Music information retrieval Pitch detection
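The decomposition described in record 26 can be sketched as follows. This is a minimal illustration only, assuming the standard Lee-Seung multiplicative updates for the least-squares error, which is just one of the three error functions the abstract names. The toy "spectrogram", the rank, and all function names are hypothetical and do not come from the thesis.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Least-squares reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((vij - aij) ** 2 for vr, ar in zip(v, wh) for vij, aij in zip(vr, ar))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)
    with multiplicative updates for the least-squares objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy magnitude "spectrogram": rows are frequency bins, columns are frames.
# It mixes two made-up spectral templates, so a rank-2 factorization fits it.
spectrogram = [
    [1.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 1.0, 0.0],
]
w, h = nmf(spectrogram, rank=2)
error = squared_error(spectrogram, w, h)
```

Here the columns of `w` play the role of the learned basis functions and the rows of `h` the corresponding coefficients over time. Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout.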
29	Détection dynamique des intrusions dans les systèmes informatiques / Dynamic intrusion detection in computer systems Pierrot, David 21 September 2018 (has links) La démocratisation d’Internet, couplée à l’effet de la mondialisation, a pour résultat d’interconnecter les personnes, les états et les entreprises. Le côté déplaisant de cette interconnexion mondiale des systèmes d’information réside dans un phénomène appelé « Cybercriminalité ». Des personnes, des groupes mal intentionnés ont pour objectif de nuire à l’intégrité des systèmes d’information dans un but financier ou pour servir une cause. Les conséquences d’une intrusion peuvent s’avérer problématiques pour l’existence d’une entreprise ou d’une organisation. Les impacts sont synonymes de perte financière, de dégradation de l’image de marque et de manque de sérieux. La détection d’une intrusion n’est pas une finalité en soit, la réduction du delta détection-réaction est devenue prioritaire. Les différentes solutions existantes s’avèrent être relativement lourdes à mettre place aussi bien en matière de compétence que de mise à jour. Les travaux de recherche ont permis d’identifier les méthodes de fouille de données les plus performantes mais l’intégration dans une système d’information reste difficile. La capture et la conversion des données demandent des ressources de calcul importantes et ne permettent pas forcément une détection dans des délais acceptables. Notre contribution permet, à partir d’une quantité de données relativement moindre de détecter les intrusions. Nous utilisons les événements firewall ce qui réduit les besoins en terme de puissance de calcul tout en limitant la connaissance du système d’information par les personnes en charge de la détection des intrusions. Nous proposons une approche prenant en compte les aspects techniques par l’utilisation d’une méthode hybride de fouille de données mais aussi les aspects fonctionnels. L’addition de ces deux aspects est regroupé en quatre phases. 
La première phase consiste à visualiser et identifier les activités réseau. La deuxième phase concerne la détection des activités anormales en utilisant des méthodes de fouille de données sur la source émettrice de flux mais également sur les actifs visés. Les troisième et quatrième phases utilisent les résultats d’une analyse de risque et d’audit technique de sécurité pour une prioritisation des actions à mener. L’ensemble de ces points donne une vision générale sur l’hygiène du système d’information mais aussi une orientation sur la surveillance et les corrections à apporter. L’approche développée a donné lieu à un prototype nommé D113. Ce prototype, testé sur une plate-forme d’expérimentation sur deux architectures de taille différentes a permis de valider nos orientations et approches. Les résultats obtenus sont positifs mais perfectibles. Des perspectives ont été définies dans ce sens. / The expansion and democratization of the digital world, coupled with the effect of Internet globalization, has allowed individuals, countries, states and companies to interconnect and interact at levels never previously imagined. Cybercrime, in turn, is unfortunately one of the negative aspects of this rapid expansion of global interconnection. We often find malicious individuals and/or groups aiming to undermine the integrity of information systems, either for financial gain or to serve a cause. The consequences of an intrusion can be problematic for the existence of a company or an organization. The impacts include financial loss, damage to brand image and a perceived lack of professionalism. The detection of an intrusion is not an end in itself; reducing the delay between detection and reaction has become a priority. The different existing solutions prove to be cumbersome to set up. Research has identified the most effective data mining methods, but integration into an information system remains difficult. 
Capturing and converting the data requires significant computing resources and does not necessarily allow detection within acceptable time frames. Our contribution makes it possible to detect intrusions from a relatively small amount of data. We use firewall events, which reduces the need for computing power while limiting how much the people in charge of intrusion detection need to know about the information system. We propose an approach that takes into account the technical aspects, through the use of a hybrid data mining method, as well as the functional aspects. The combination of these two aspects is organized into four phases. The first phase is to visualize and identify network activities. The second phase concerns the detection of abnormal activities using data mining methods on the source of the flow but also on the targeted assets. The third and fourth phases use the results of a risk analysis and a technical security audit to prioritize the actions to be carried out. All these points give a general view of the hygiene of the information system and also guidance on the monitoring and corrections to be made. The approach developed led to a prototype named D113. This prototype, tested on an experimentation platform on two architectures of different sizes, made it possible to validate our orientations and approaches. The results obtained are positive but can be improved. Prospects have been defined in this direction. Sécurité Détection d'intrusions Pare-feu (firewall) Apprentissage supervisé Collecte de renseignements Journaux (logs) Évènements Attaques Clustering Validité des clusters Disponibilité Intégrité Confidentialité Traçabilité Apprentissage non-supervisé Security Intrusion detection Firewall Supervised machine learning Unsupervised machine learning Information gathering Logs Events Attacks Clustering Cluster validity Availability Integrity Privacy Traceability 004
