Global ETD Search

11	Anomaly detection technique for sequential data / Technique de détection d'anomalies utilisant des données séquentielles Pellissier, Muriel 15 October 2013 De nos jours, beaucoup de données peuvent être facilement accessibles. Mais toutes ces données ne sont pas utiles si nous ne savons pas les traiter efficacement et si nous ne savons pas extraire facilement les informations pertinentes à partir d'une grande quantité de données. Les techniques de détection d'anomalies sont utilisées par de nombreux domaines afin de traiter automatiquement les données. Les techniques de détection d'anomalies dépendent du domaine d'application, des données utilisées ainsi que du type d'anomalie à détecter. Pour cette étude nous nous intéressons seulement aux données séquentielles. Une séquence est une liste ordonnée d'objets. Pour de nombreux domaines, il est important de pouvoir identifier les irrégularités contenues dans des données séquentielles comme par exemple les séquences ADN, les commandes d'utilisateur, les transactions bancaires etc. Cette thèse présente une nouvelle approche qui identifie et analyse les irrégularités de données séquentielles. Cette technique de détection d'anomalies peut détecter les anomalies de données séquentielles dont l'ordre des objets dans les séquences est important ainsi que la position des objets dans les séquences. Les séquences sont définies comme anormales si une séquence est presque identique à une séquence qui est fréquente (normale). Les séquences anormales sont donc les séquences qui diffèrent légèrement des séquences qui sont fréquentes dans la base de données. Dans cette thèse nous avons appliqué cette technique à la surveillance maritime, mais cette technique peut être utilisée pour tous les domaines utilisant des données séquentielles. Pour notre application, la surveillance maritime, nous avons utilisé cette technique afin d'identifier les conteneurs suspects. 
En effet, de nos jours 90% du commerce mondial est transporté par conteneurs maritimes mais seulement 1 à 2% des conteneurs peuvent être physiquement contrôlés. Ce faible pourcentage est dû à un coût financier très élevé et au besoin trop important de ressources humaines pour le contrôle physique des conteneurs. De plus, le nombre de conteneurs voyageant par jour dans le monde ne cesse d'augmenter ; il est donc nécessaire de développer des outils automatiques afin d'orienter le contrôle fait par les douanes pour éviter les activités illégales comme les fraudes, les quotas, les produits illégaux, ainsi que les trafics d'armes et de drogues. Pour identifier les conteneurs suspects nous comparons les trajets des conteneurs de notre base de données avec les trajets des conteneurs dits normaux. Les trajets normaux sont les trajets qui sont fréquents dans notre base de données. Notre technique est divisée en deux parties. La première partie consiste à détecter les séquences qui sont fréquentes dans la base de données. La seconde partie identifie les séquences de la base de données qui diffèrent légèrement des séquences qui sont fréquentes. Afin de définir une séquence comme normale ou anormale, nous calculons une distance entre une séquence qui est fréquente et une séquence aléatoire de la base de données. La distance est calculée avec une méthode qui utilise les différences qualitatives et quantitatives entre deux séquences. / Nowadays, huge quantities of data are easily accessible, but these data are of little use unless we know how to process them efficiently and how to extract the relevant information from them. Anomaly detection techniques are used in many domains to help process data automatically. Which technique is appropriate depends on the application domain, the type of data, and the type of anomaly. For this study we are interested only in sequential data. 
A sequence is an ordered list of items, also called events. Identifying irregularities in sequential data is essential in many application domains, such as DNA sequences, system calls, user commands, and banking transactions. This thesis presents a new approach for identifying and analyzing irregularities in sequential data. The technique detects anomalies in sequential data where the order of the items in the sequences matters. Moreover, it considers not only the order of the events but also their position within the sequences. A sequence is flagged as anomalous if it is quasi-identical to usual behavior, that is, if it differs only slightly from a frequent (common) sequence. The differences between two sequences are based on the order of the events and their position in the sequence. In this thesis we applied the technique to maritime surveillance, but it can be used in any domain that relies on sequential data. In maritime surveillance, automated tools are needed to help customs target suspicious containers. Indeed, 90% of world trade is now transported by container, yet only 1-2% of containers can be physically checked, because of the high financial cost and the large human resources needed to inspect a container. As the number of containers travelling around the world every day is very large, containers must be controlled to prevent illegal activities such as fraud, quota-related offenses, illegal products, hidden activities, and drug or arms smuggling. In the maritime domain, we use this technique to identify suspicious containers by comparing the container trips in the data set with itineraries known to be normal (common). A container trip, also called an itinerary, is an ordered list of actions performed on a container at specific geographical positions. 
The possible actions are loading, transshipment, and discharging. For each action performed on a container, we know the container ID and its geographical position (port ID). The technique is divided into two parts. The first part detects the common (most frequent) sequences of the data set. The second part identifies the sequences that differ slightly from the common sequences, using a distance-based method to classify a given sequence as normal or suspicious. The distance is calculated using a method that combines quantitative and qualitative differences between two sequences. Détection d'anomalies Données séquentielles Expressions régulières Distance Extraction d'informations Sécurité maritime Anomaly detection Sequential data Regular expression Distance Extraction of information Maritime security 004
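The two-part procedure above can be sketched in Python. Its two steps are to find the common itineraries and then flag itineraries that differ only slightly from one of them. This is a minimal illustration under stated assumptions and is not the thesis's method. The following are all hypothetical:

- the support threshold `min_support` and the cut-off `max_distance`;
- the distance itself, which adds a positional mismatch count to a count of events found in only one sequence, loosely standing in for the combination of quantitative and qualitative differences;
- the port codes.

```python
from collections import Counter

def frequent_sequences(itineraries, min_support):
    """Part 1: itineraries occurring at least `min_support` times (hypothetical threshold)."""
    counts = Counter(tuple(trip) for trip in itineraries)
    return [trip for trip, n in counts.items() if n >= min_support]

def distance(a, b):
    """Hypothetical distance mixing two kinds of difference:
    - positional: indices where the events differ (extra events in the longer
      sequence count as mismatches), so order and position both matter;
    - content: events that appear in only one of the two sequences.
    """
    positional = sum(
        1
        for i in range(max(len(a), len(b)))
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    )
    content = len(set(a) ^ set(b))
    return positional + content

def classify(trip, common, max_distance):
    """Part 2: compare a trip with its nearest common itinerary."""
    d = min(distance(tuple(trip), c) for c in common)
    if d == 0:
        return "common"
    if d <= max_distance:
        return "suspicious"  # quasi-identical to a common itinerary
    return "unrelated"  # far from every common itinerary: not flagged by this rule

# Toy itineraries made of (action, port) events; port codes are made up.
usual = (("load", "A"), ("transship", "B"), ("discharge", "C"))
trips = [usual] * 3 + [
    (("load", "A"), ("transship", "X"), ("discharge", "C")),  # one port changed
    (("load", "Q"), ("discharge", "R")),
]
common = frequent_sequences(trips, min_support=3)
for trip in trips[2:]:
    print(classify(trip, common, max_distance=3))
```

With these toy trips the loop prints `common`, `suspicious` and `unrelated`. Only the trip that changes a single transshipment port lands in the "slightly different" band that the technique targets.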
12	Maskininlärning: avvikelseklassificering på sekventiell sensordata. En jämförelse och utvärdering av algoritmer för att klassificera avvikelser i en miljövänlig IoT produkt med sekventiell sensordata / Machine learning: anomaly classification on sequential sensor data. A comparison and evaluation of algorithms for classifying anomalies in an environmentally friendly IoT product with sequential sensor data Heidfors, Filip, Moltedo, Elias January 2019 Ett företag har tagit fram en miljövänlig IoT produkt med sekventiell sensordata och vill genom maskininlärning kunna klassificera avvikelser i sensordatan. Det har genom åren utvecklats ett flertal väl fungerande algoritmer för klassificering men det finns emellertid ingen algoritm som fungerar bäst för alla olika problem. Syftet med det här arbetet var därför att undersöka, jämföra och utvärdera olika klassificerare inom "supervised machine learning" för att ta reda på vilken klassificerare som ger högst träffsäkerhet att klassificera avvikelser i den typ av IoT produkt som företaget tagit fram. Genom en litteraturstudie tog vi först reda på vilka klassificerare som vanligtvis använts och fungerat bra i tidigare vetenskapliga arbeten med liknande applikationer. Vi kom fram till att jämföra och utvärdera Random Forest, Naïve Bayes klassificerare och Support Vector Machines ytterligare. Vi skapade sedan ett dataset på 513 exempel som vi använde för träning och validering för respektive klassificerare. Resultatet visade att Random Forest hade betydligt högre träffsäkerhet med 95,7% jämfört med Naïve Bayes klassificerare (81,5%) och Support Vector Machines (78,6%). Slutsatsen för arbetet är att Random Forest med sina 95,7% ger en tillräckligt hög träffsäkerhet så att företaget kan använda maskininlärningsmodellen för att förbättra sin produkt. Resultatet pekar också på att Random Forest, för det här arbetets specifika klassificeringsproblem, är den klassificerare som fungerar bäst inom "supervised machine learning" men att det eventuellt finns möjlighet att få ännu högre träffsäkerhet med andra tekniker som till exempel "unsupervised machine learning" eller "semi-supervised machine learning". 
/ A company has developed an environmentally friendly IoT device that produces sequential sensor data and wants to use machine learning to classify anomalies in that data. Over the years, several well-performing classification algorithms have been developed; however, no single algorithm is optimal for every problem. The purpose of this work was therefore to investigate, compare and evaluate different supervised machine learning classifiers to find out which one gives the highest accuracy in classifying anomalies for the kind of IoT device the company has developed. Through a literature review we first identified which classifiers are commonly used and have worked well in related work with similar purposes and applications. On that basis, we chose to further compare and evaluate Random Forest, Naïve Bayes and Support Vector Machines. We created a dataset of 513 examples that we used for training and evaluating each classifier. The results showed that Random Forest had clearly superior accuracy, 95.7%, compared with Naïve Bayes (81.5%) and Support Vector Machines (78.6%). We conclude that Random Forest, at 95.7%, gives high enough accuracy for the company to make good use of the machine learning model. The results also indicate that, for this thesis's specific classification problem, Random Forest is the best classifier within supervised machine learning, but that even higher accuracy might be achievable with other techniques such as unsupervised or semi-supervised machine learning. Machine learning Supervised learning Classifying algorithms Classifiers Random Forest Naïve Bayes Support vector machine Sensor data Sequential data Engineering and Technology Teknik och teknologier
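Of the three classifiers compared, Naïve Bayes is the easiest to write from scratch. A small sketch of it is given here, together with the accuracy metric used above. It assumes continuous sensor features with a Gaussian likelihood per feature; the abstract does not state which Naïve Bayes variant was used. The training data are toy values, not the 513-example dataset.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate log class priors and per-feature means/variances."""
    rows_by_class = defaultdict(list)
    for features, label in zip(X, y):
        rows_by_class[label].append(features)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps every variance strictly positive (no division by zero).
        variances = [sum((v - m) ** 2 for v in col) / n + eps for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior (features assumed independent)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)

def accuracy(model, X, y):
    """Fraction of correctly classified examples."""
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Toy two-feature sensor readings (hypothetical, not the thesis's dataset).
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y_train = ["normal"] * 3 + ["anomaly"] * 3
model = fit_gaussian_nb(X_train, y_train)
print(accuracy(model, [[0.1, 0.1], [5.0, 5.0]], ["normal", "anomaly"]))  # 1.0
```

In a real comparison, the same held-out split would be used to score Random Forest and Support Vector Machines as well, so that the three accuracies are comparable.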
13	On Computational Stylistics : mining Literary Texts for the Extraction of Characterizing Stylistic Patterns / De la stylistique computationnelle : fouille de textes littéraires pour l'extraction de motifs stylistiques caractérisants Boukhaled, Mohamed Amine 13 September 2016 Notre thèse se situe dans le domaine interdisciplinaire de la stylistique computationnelle, à savoir l'application des méthodes statistiques et computationnelles à l'étude du style littéraire. Historiquement, la plupart des travaux effectués en stylistique computationnelle se sont concentrés sur les aspects lexicaux. Dans notre thèse, l’accent est mis sur l'aspect syntaxique du style qui est beaucoup plus difficile à analyser étant donné sa nature abstraite. Comme contribution principale, dans cette thèse, nous travaillons sur une approche à l'étude stylistique computationnelle de textes classiques de littérature française d'un point de vue herméneutique, où découvrir des traits linguistiques intéressants se fait sans aucune connaissance préalable. Plus concrètement, nous nous concentrons sur le développement et l'extraction des motifs morphosyntaxiques. Suivant la ligne de pensée herméneutique, nous proposons un processus de découverte de connaissances pour la caractérisation stylistique axé sur la dimension syntaxique du style et permettant d'extraire des motifs pertinents à partir d'un texte donné. Ce processus proposé consiste en deux étapes principales, une étape d'extraction de motifs séquentiels suivie de l'application de certaines mesures d'intérêt. En particulier, l'extraction de tous les motifs syntaxiques possibles d'une longueur donnée est proposée comme un moyen particulièrement utile pour extraire des caractéristiques intéressantes dans un scénario exploratoire. Nous proposons, évaluons et présentons des résultats sur les trois mesures d'intérêt proposées, chacune basée sur un raisonnement théorique linguistique et statistique différent. 
/ The present thesis lies in the interdisciplinary field of computational stylistics, namely the application of statistical and computational methods to the study of literary style. Historically, most work in computational stylistics has focused on lexical aspects, especially in the early decades of the discipline. In this thesis, however, our focus is on the syntactic aspect of style, which is much harder to capture and analyze given its abstract nature. As our main contribution, we develop an approach to the computational stylistic study of classic French literary texts from a hermeneutic point of view, in which interesting linguistic patterns are discovered without any prior knowledge. More concretely, we focus on the development and extraction of complex yet computationally feasible stylistic features that are linguistically motivated, namely morpho-syntactic patterns. Following the hermeneutic line of thought, we propose a knowledge discovery process for stylistic characterization, with an emphasis on the syntactic dimension of style, that extracts relevant patterns from a given text. This process consists of two main steps: a sequential pattern mining step followed by the application of interestingness measures. In particular, extracting all possible syntactic patterns of a given length is proposed as a particularly useful way to find interesting features in an exploratory scenario. We propose three interestingness measures, each based on a different linguistic and statistical theoretical background, evaluate them experimentally, and report the results. Stylistique computationnelle Fouille de données séquentielles Découverte de connaissances Fouille de textes Motif morphosyntaxique Mesure d'intérêt Computational stylistics Sequential data mining Knowledge discovery 004
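The first step of the process extracts all syntactic patterns of a given length from a part-of-speech-tagged text. It can be illustrated as follows, under these assumptions:

- patterns are taken to be contiguous tag n-grams, although the thesis's sequential patterns may be defined differently;
- the tag sequences are invented;
- `relative_frequency_ratio` is a generic stand-in interestingness score, not one of the three measures the thesis proposes.

```python
from collections import Counter

def patterns_of_length(tags, k):
    """All contiguous part-of-speech patterns of length k (n-grams of tags)."""
    return Counter(tuple(tags[i:i + k]) for i in range(len(tags) - k + 1))

def relative_frequency_ratio(pattern, text_counts, ref_counts, alpha=1.0):
    """Stand-in interestingness score: how over-represented `pattern` is in the
    text compared with a reference corpus, with add-alpha smoothing."""
    vocab = len(set(text_counts) | set(ref_counts))
    p_text = (text_counts[pattern] + alpha) / (sum(text_counts.values()) + alpha * vocab)
    p_ref = (ref_counts[pattern] + alpha) / (sum(ref_counts.values()) + alpha * vocab)
    return p_text / p_ref

# Hypothetical tag sequences for an author's text and a reference corpus.
author_tags = ["DET", "NOUN", "ADJ", "VERB", "DET", "NOUN", "ADJ"]
reference_tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
text_counts = patterns_of_length(author_tags, 3)
ref_counts = patterns_of_length(reference_tags, 3)
ranked = sorted(
    text_counts,
    key=lambda p: relative_frequency_ratio(p, text_counts, ref_counts),
    reverse=True,
)
print(ranked[0])  # ('DET', 'NOUN', 'ADJ')
```

Enumerating every pattern of a fixed length, rather than only frequent ones, is what makes the exploratory scenario possible. The interestingness measure then decides which patterns are worth a stylistician's attention.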
14	Modélisation spatio-temporelle de la pollution atmosphérique urbaine à partir d'un réseau de surveillance de la qualité de l'air / Spatio-temporal modelling of atmospheric pollution based on observations provided by an air quality monitoring network at a regional scale Coman, Adriana 26 September 2008 Cette étude est consacrée à la modélisation spatio-temporelle de la pollution atmosphérique urbaine en utilisant un ensemble de méthodes statistiques exploitant les mesures de concentrations de polluants (NO2, O3) fournies par un réseau de surveillance de la qualité de l'air (AIRPARIF). Le principal objectif visé est l'amélioration de la cartographie des champs de concentration de polluants (le domaine d'intérêt étant la région d'Île-de-France) en utilisant, d'une part, des méthodes d'interpolation basées sur la structure spatiale ou spatio-temporelle des observations (krigeage spatial ou spatio-temporel), et d'autre part, des algorithmes, prenant en compte les mesures, pour corriger les sorties d'un modèle déterministe (Filtre de Kalman d'Ensemble). Les résultats obtenus montrent que dans le cas du dioxyde d'azote la cartographie basée uniquement sur l'interpolation spatiale (le krigeage) conduit à des résultats satisfaisants, car la répartition spatiale des stations est bonne. En revanche, pour l'ozone, c'est l'assimilation séquentielle de données appliquée au modèle (CHIMERE) qui permet une meilleure reconstitution de la forme et de la position du panache pendant les épisodes de forte pollution analysés. En complément de la cartographie, un autre but de ce travail est d'effectuer localement la prévision des niveaux d'ozone sur un horizon de 24 heures. L'approche choisie est celle mettant en œuvre des méthodes de type réseaux neuronaux. 
Les résultats obtenus en appliquant deux types d'architectures neuronales indiquent une précision correcte, surtout pour les 8 premières heures de l'horizon de prédiction. / This study is devoted to the spatio-temporal modelling of air pollution at a regional scale, using a set of statistical methods to exploit the measurements of pollutant concentrations (NO2, O3) provided by an air quality monitoring network (AIRPARIF). The main objective is to improve the mapping of pollutant fields using either interpolation methods based on the spatial or spatio-temporal structure of the data (spatial or spatio-temporal kriging) or algorithms that take the observations into account to correct the concentrations simulated by a deterministic model (Ensemble Kalman Filter). The results show that nitrogen dioxide mapping based only on spatial interpolation (kriging) gives the best results, because the monitoring sites are well distributed spatially. For ozone, it is sequential data assimilation that yields a better reconstruction of the plume's shape and position in the analyzed cases. Complementary to the pollutant mapping, another objective was to perform a local prediction of ozone concentrations over a 24-hour horizon; this task was performed using Artificial Neural Networks. The performance indices obtained with two types of neural architectures indicate fair accuracy, especially for the first 8 hours of the prediction horizon. Pollution atmosphérique Dioxyde d'azote Ozone Cartographie Prévision Géostatistique Assimilation séquentielle de données Réseaux neuronaux artificiels Atmospheric pollution Nitrogen dioxide Ozone Mapping Prediction Geostatistics Sequential data assimilation Artificial neural networks
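Kriging, used above to map NO2, estimates the concentration at an unmonitored point as a weighted average of station measurements, with weights derived from a variogram. Below is a minimal ordinary-kriging sketch. It assumes an exponential variogram with hypothetical `sill` and `range_` values and no nugget. It also uses invented station coordinates and concentrations. A small Gaussian-elimination solver is included so that no third-party libraries are needed. The sketch does not reproduce the thesis's fitted variograms or its spatio-temporal variant.

```python
import math

def exponential_variogram(h, sill=1.0, range_=10.0):
    """Hypothetical variogram model: gamma(h) = sill * (1 - exp(-h / range_))."""
    return sill * (1.0 - math.exp(-h / range_))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(stations, values, target, variogram=exponential_variogram):
    """Estimate the value at `target` from measurements at (x, y) `stations`."""
    n = len(stations)
    # Variogram between every pair of stations, bordered by ones so that the
    # extra unknown (a Lagrange multiplier) enforces sum(weights) == 1.
    a = [[variogram(math.dist(p, q)) for q in stations] + [1.0] for p in stations]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in stations] + [1.0]
    *weights, _lagrange = solve(a, b)
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights

stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]  # km, made up
no2 = [40.0, 50.0, 30.0, 60.0]  # made-up concentrations
estimate, weights = ordinary_kriging(stations, no2, (5.0, 5.0))
print(round(estimate, 3), [round(w, 3) for w in weights])
```

Because there is no nugget term, the estimate at a station location reproduces that station's measurement exactly. The weights always sum to 1, which is the unbiasedness constraint enforced by the last row of the system. At the centre of this symmetric layout, all four weights are equal.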
15	Anomaly Detection in Streaming Data from a Sensor Network / Anomalidetektion i strömmande data från sensornätverk Vignisson, Egill January 2019 In this thesis, unsupervised and semi-supervised machine learning techniques were analyzed as potential tools for anomaly detection in the sensor network that makes up the electrical system of a Scania truck. The experiments were designed to analyse the need for both point and contextual anomaly detection in this setting. For point anomaly detection, the Isolation Forest method was tested; for contextual anomaly detection, two recurrent neural network architectures using Long Short-Term Memory (LSTM) units were used. One model was a many-to-one regression model trained to predict a certain signal, while the other was an encoder-decoder network trained to reconstruct a sequence. Both models were trained in a semi-supervised manner, i.e. on data that depict only normal behaviour, which in theory should degrade performance on abnormal sequences and thus produce larger error terms. In both settings, the parameters of a Gaussian distribution were estimated from these error terms, which provided a convenient way to define a threshold deciding whether an observation is flagged as anomalous. Additional experiments were conducted in which an exponentially weighted moving average over a number of past observations was used to filter the signal. The models' performance on this task differed considerably, but the regression model showed a lot of promise, especially when combined with a filtering preprocessing step to reduce the noise in the data. However, model selection will always be governed by the nature of the particular task at hand, so the other methods might perform better in other settings. 
/ I den här avhandlingen var användningen av oövervakad och halv-övervakad maskininlärning analyserad som ett möjligt verktyg för att upptäcka avvikelser av anomali i det sensornätverk som det elektriska systemet i en Scanialastbil består av. Experimentet var konstruerat för att analysera behovet av både punkt och kontextuella avvikelser av anomali i denna miljö. För punktavvikelse av anomali var metoden Isolation Forest experimenterad med och för kontextuella avvikelser av anomali användes två arkitekturer av återkommande neurala nätverk. En av modellerna var helt enkelt en många-till-en-regressionsmodell tränad för att förutspå en viss signal, medan den andra var ett kodare-avkodare-nätverk tränat för att rekonstruera en sekvens. Båda modellerna blev tränade på ett halv-övervakat sätt, d.v.s. på data som endast visar normalt beteende, vilket teoretiskt skulle leda till minskad prestanda på onormala sekvenser och därmed större feltermer. I båda fallen estimerades parametrarna för en Gaussisk fördelning utifrån dessa feltermer, vilket ger ett bekvämt sätt att definiera en tröskel som avgör om iakttagelsen ska flaggas som en anomali eller inte. Ytterligare experiment genomfördes med ett exponentiellt viktat glidande medelvärde över ett visst antal tidigare iakttagelser för att filtrera signalen. Modellernas prestanda på denna uppgift var väldigt olika men regressionsmodellen lovade mycket, särskilt kombinerad med ett filtrerande förbehandlingssteg för att minska bruset i datan. Ändå kommer modellvalet alltid att styras av uppgiftens natur, så andra metoder skulle kunna ge bättre prestanda i andra miljöer. Anomaly Detection Sequential Data Semi-supervised Learning Recurrent Neural Network Applied Mathematics Anomalidetektion sekventiell data halv-övervakad maskininlärning återkommande neuralt nätverk tillämpad matematik. Mathematics Matematik
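The thresholding and filtering steps described above can be sketched independently of the LSTM models, which are omitted here. The sketch does three things:

- fits a Gaussian to the errors the model makes on normal data;
- flags new observations whose error exceeds the mean by more than `k` standard deviations;
- optionally smooths the raw signal with an exponentially weighted moving average.

The one-sided threshold rule, `k = 3`, the smoothing factor `alpha`, and the error values are assumptions for illustration, not values from the thesis.

```python
import statistics

def ewma(signal, alpha=0.3):
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = []
    for x in signal:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def fit_threshold(normal_errors, k=3.0):
    """Fit a Gaussian (mean, std) to errors measured on normal data and
    return the assumed one-sided threshold mean + k * std."""
    mu = statistics.fmean(normal_errors)
    sigma = statistics.pstdev(normal_errors, mu)
    return mu + k * sigma

def flag_anomalies(errors, threshold):
    """True where the model's error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical absolute prediction errors from a model trained on normal data.
validation_errors = [0.9, 1.1, 1.0, 1.0, 0.9, 1.1]
threshold = fit_threshold(validation_errors)
print(flag_anomalies([1.0, 1.05, 2.5], threshold))  # [False, False, True]
```

Smoothing before prediction lowers the error variance on normal data. This tightens the fitted threshold, which is one plausible reason the filtered regression model performed well.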
16	Topic change in robot-moderated group discussions : Investigating machine learning approaches for topic change in robot-moderated discussions using non-verbal features / Ämnesbyte i robotmodererade gruppdiskussioner : Undersöka maskininlärningsmetoder för ämnesändring i robotmodererad diskussion med hjälp av icke-verbala egenskaper Hadjiantonis, Georgios January 2024 Moderating group discussions among humans is often challenging and requires certain skills, particularly in deciding when to ask participants to elaborate or to change the current topic of the discussion. Recent research on Human-Robot Interaction in groups has demonstrated the positive impact of robot behavior on the quality and effectiveness of the interaction, and robots' ability to shape group dynamics and promote social behavior. This suggests potential for using social robots as discussion moderators to facilitate engaging and productive discussions among humans. Previous work on topic management in conversational agents was predominantly based on human engagement and topic personalization, with the agent playing an active, central role in the conversation. This thesis focuses exclusively on the moderation of group discussions: instead of moderating the topic based on assessed human engagement, it builds on previous research on non-verbal cues related to discussion topic structure and turn-taking to determine, in a content-free manner, whether participants intend to continue discussing the current topic. The thesis investigates the suitability of machine learning models, and the contribution of different audiovisual non-verbal features, for predicting appropriate topic changes. For this purpose, we used pre-recorded interactions between a robot moderator and human participants, which we annotated and from which we extracted acoustic and body-language features. 
We analyze the performance of sequential and non-sequential machine learning approaches using different feature sets, and compare them with rule-based heuristics. The results indicate promising performance in distinguishing cases where a topic change was inappropriate from cases where the topic could or should be changed, outperforming rule-based approaches and demonstrating the feasibility of using machine learning models for topic moderation. Regarding model type, the results suggest no distinct advantage of sequential over non-sequential approaches, indicating that simpler non-sequential models are effective. Acoustic features alone achieved comparable and, in some cases, better overall performance and robustness than body-language features alone or a combination of both types. In summary, this thesis provides a foundation for future research in robot-mediated topic moderation in groups using non-verbal cues, presenting opportunities to further improve social robots with topic moderation capabilities. / Att moderera gruppdiskussioner mellan människor kan ofta vara utmanande och kräver vissa färdigheter, särskilt när det gäller att bestämma när man ska be andra deltagare att utveckla eller ändra det aktuella ämnet för diskussionen. Ny forskning om människa-robotinteraktion i grupper har visat den positiva effekten av robotbeteende på interaktionens kvalitet och effektivitet och deras förmåga att forma gruppens dynamik och främja socialt beteende. I ljuset av detta finns det potential att använda sociala robotar som diskussionsmoderatorer för att underlätta engagerande och produktiva diskussioner bland människor. Tidigare arbete med ämneshantering hos konversationsagenter baserades till övervägande del på mänskligt engagemang och ämnesanpassning, där agenten hade en aktiv/central roll i samtalet. 
Denna avhandling fokuserar uteslutande på moderering av gruppdiskussioner; istället för att moderera ämnet baserat på utvärderat mänskligt engagemang, bygger avhandlingen på tidigare forskning om icke-verbala ledtrådar relaterade till diskussionsämnesstruktur och turtagning för att avgöra om deltagarna avser att fortsätta diskutera det aktuella ämnet på ett innehållsfritt sätt. Denna avhandling undersöker lämpligheten av maskininlärningsmodeller och bidraget från olika audiovisuella icke-verbala egenskaper för att förutsäga lämpliga ämnesändringar. För detta ändamål använde vi förinspelade interaktioner mellan en robotmoderator och mänskliga deltagare, som vi annoterade och från vilka vi extraherade akustiska och kroppsspråksrelaterade egenskaper. Vi tillhandahåller en analys av prestandan för sekventiella och icke-sekventiella maskininlärningsmetoder med olika uppsättningar egenskaper, samt en jämförelse med regelbaserad heuristik. Resultaten indikerar lovande prestation när det gäller att klassificera mellan fall när ett ämnesbyte var olämpligt kontra när ett ämnesbyte kunde eller borde ändras, överträffande regelbaserade tillvägagångssätt och demonstrerar genomförbarheten av att använda maskininlärningsmodeller för ämnesmoderering. När det gäller typen av modeller tyder resultaten inte på någon tydlig fördel med sekventiella metoder framför icke-sekventiella modelleringsmetoder, vilket indikerar effektiviteten hos enklare icke-sekventiella datamodeller. Akustiska egenskaper uppvisade jämförbara och, i vissa fall, förbättrade övergripande prestanda och robusthet jämfört med att endast använda kroppsspråksrelaterade egenskaper eller en kombination av båda typerna. Sammanfattningsvis ger denna avhandling en grund för framtida forskning inom robotmedierad ämnesmoderering i grupper som använder icke-verbala ledtrådar, och presenterar möjligheter att förbättra sociala robotar ytterligare med ämnesmodererande förmåga. 
Human-Robot Interaction topic moderation group discussion non-verbal features sequential data modeling Människa-robotinteraktion ämnesmoderering gruppdiskussion icke-verbala egenskaper sekventiell datamodellering Computer Sciences Datavetenskap (datalogi)
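The comparison above between sequential and non-sequential models largely comes down to how a window of frame-level features is presented to the classifier. A sequential model, for example a recurrent network, reads the frames in order, whereas a non-sequential model sees a fixed-length summary. The sketch shows one hypothetical way to build such a summary. The feature names, the mean/standard-deviation/last-value statistics and the two-frame window are assumptions, not the thesis's feature set.

```python
import statistics

def summarize_window(frames, feature_names):
    """Collapse a window of per-frame feature values into one fixed-length vector
    (mean, population std and last value of each feature) that a non-sequential
    classifier can take as input. A sequential model would instead consume
    `frames` in order."""
    vector = []
    for name in feature_names:
        values = [frame[name] for frame in frames]
        vector += [statistics.fmean(values), statistics.pstdev(values), values[-1]]
    return vector

# Two frames of hypothetical acoustic features.
frames = [{"pitch": 200.0, "intensity": 60.0}, {"pitch": 220.0, "intensity": 50.0}]
print(summarize_window(frames, ["pitch", "intensity"]))
```

The summary discards ordering within the window. The thesis's finding that non-sequential models performed comparably suggests, under this reading, that much of the useful signal survives such aggregation.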
17	Reconnaissance d’activités humaines à partir de séquences vidéo / Human activity recognition from video sequences Selmi, Mouna 12 December 2014 Cette thèse s’inscrit dans le contexte de la reconnaissance des activités à partir de séquences vidéo qui est une des préoccupations majeures dans le domaine de la vision par ordinateur. Les domaines d'application pour ces systèmes de vision sont nombreux, notamment la vidéosurveillance, la recherche et l'indexation automatique de vidéos ou encore l'assistance aux personnes âgées. Cette tâche reste problématique étant donné les grandes variations dans la manière de réaliser les activités, l'apparence de la personne et les variations des conditions d'acquisition des activités. L'objectif principal de ce travail de thèse est de proposer une méthode de reconnaissance efficace par rapport aux différents facteurs de variabilité. Les représentations basées sur les points d'intérêt ont montré leur efficacité dans les travaux de l'état de l'art ; elles ont généralement été couplées avec des méthodes de classification globales, vu que ces primitives sont temporellement et spatialement désordonnées. Les travaux les plus récents atteignent des performances élevées en modélisant le contexte spatio-temporel des points d'intérêt ; par exemple, certains travaux encodent le voisinage des points d'intérêt à plusieurs échelles. Nous proposons une méthode de reconnaissance des activités qui modélise explicitement l'aspect séquentiel des activités tout en exploitant la robustesse des points d'intérêt dans les conditions réelles. Nous commençons par l'extraction des points d'intérêt, dont nous avons montré la robustesse par rapport à l'identité de la personne par une étude tensorielle. 
Ces primitives sont ensuite représentées sous forme d'une séquence de sacs de mots (BOW) locaux : la séquence vidéo est segmentée temporellement à l'aide de la technique de fenêtre glissante, et chacun des segments ainsi obtenus est représenté par le BOW des points d'intérêt lui appartenant. Le premier niveau de notre système de classification séquentiel hybride consiste à appliquer les séparateurs à vaste marge (SVM) en tant que classifieur de bas niveau afin de convertir les BOWs locaux en vecteurs de probabilités des classes d'activité. Les séquences de vecteurs de probabilité ainsi obtenues sont utilisées comme entrées du classifieur séquentiel à champs aléatoires conditionnels cachés (HCRF). Ce dernier permet de classifier de manière discriminante les séries temporelles tout en modélisant leurs structures internes via les états cachés. Nous avons évalué notre approche sur des bases publiques ayant des caractéristiques diverses. Les résultats atteints semblent intéressants par rapport à ceux des travaux de l'état de l'art. De plus, nous avons montré que l'utilisation d'un classifieur de bas niveau permet d'améliorer la performance du système de reconnaissance, vu que le classifieur séquentiel HCRF traite directement des informations sémantiques issues des BOWs locaux, à savoir la probabilité de chacune des activités relativement au segment en question. En outre, les vecteurs de probabilités sont de faible dimension, ce qui contribue à éviter le problème de surapprentissage qui peut intervenir si la dimension du vecteur de caractéristiques est plus importante que le nombre de données ; c'est le cas lorsqu'on utilise les BOWs, qui sont généralement de dimension élevée. L'estimation des paramètres du HCRF dans un espace de dimension réduite permet aussi de réduire le temps d'entraînement. / Human activity recognition (HAR) from video sequences is one of the major active research areas of computer vision. 
HAR systems have numerous applications, including video surveillance, video search and automatic indexing, and assistance to frail elderly people. The task remains challenging because of the large variations in how activities are performed, in the appearance of the person, and in the acquisition conditions. The main objective of this thesis is to develop an efficient HAR method that is robust to these different sources of variability. Approaches based on interest points have shown excellent state-of-the-art performance in recent years. They are generally combined with global classification methods, since these primitives are temporally and spatially unordered. More recent studies have achieved high performance by modeling the spatial and temporal context of interest points, for instance by encoding the neighborhood of the interest points at several scales. In this thesis, we propose an activity recognition method based on a hybrid Support Vector Machine - Hidden Conditional Random Field (SVM-HCRF) model that explicitly models the sequential aspect of activities while exploiting the robustness of interest points in real conditions. We first extract the interest points and show, through a multilinear tensor analysis, that they are robust to the person's identity. These primitives are then represented as a sequence of local "Bags of Words" (BOW): the video is temporally segmented with a sliding window, and each resulting segment is represented by the BOW of the interest points it contains. The first layer of our hybrid sequential classification system is a Support Vector Machine that converts each local BOW extracted from the video sequence into a vector of activity-class probabilities. The resulting sequence of probability vectors is used as input to the HCRF, which classifies time series discriminatively while modeling their internal structure through hidden states. 
We have evaluated our approach on various human activity datasets, and the results are competitive with the current state of the art. In particular, we have shown that using a low-level classifier (SVM) improves the performance of the recognition system, because the sequential HCRF classifier then directly exploits semantic information from the local BOWs, namely the probability of each activity for the current local segment, rather than raw interest-point information. Furthermore, the probability vectors are low-dimensional, which significantly reduces the risk of overfitting that arises when the feature-vector dimension is high relative to the training-set size, as is precisely the case with BOWs, which generally have a very high dimension. Estimating the HCRF parameters in this low-dimensional space also significantly shortens the HCRF training phase. Human activity recognition Interest points Dense points Multilinear tensor analysis Classification of sequential data Support vector machines Hidden conditional random fields
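The temporal segmentation step described in this abstract (a sliding window over the video, with one local BOW per window) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: each interest point is assumed to have already been reduced to a (frame index, visual-word index) pair using a previously learned codebook, and the function name `local_bows` and the default window length and stride are hypothetical.

```python
from collections import Counter


def local_bows(points, num_frames, vocab_size, window=30, stride=15):
    """Return one normalized bag-of-words histogram per sliding window.

    points: list of (frame_index, word_index) pairs, one per interest point;
            word_index is assumed to come from a previously learned codebook.
    Windows cover frames [start, start + window) and advance by `stride`;
    a trailing partial window is dropped.
    """
    bows = []
    start = 0
    while start + window <= num_frames:
        end = start + window
        # Visual words of the interest points detected inside this window.
        counts = Counter(word for frame, word in points if start <= frame < end)
        total = sum(counts.values())
        # L1-normalize so windows with different numbers of interest points
        # are comparable; an empty window yields an all-zero histogram.
        bows.append([counts[w] / total if total else 0.0 for w in range(vocab_size)])
        start += stride
    return bows


# Toy input (hypothetical): five interest points in a 30-frame clip, 3-word codebook.
sequence = local_bows([(0, 1), (5, 2), (12, 1), (20, 0), (25, 2)],
                      num_frames=30, vocab_size=3, window=20, stride=10)
```

With a stride smaller than the window, consecutive windows overlap and share interest points. In the pipeline described above, each histogram in `sequence` would then be passed to the low-level SVM.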
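To make the second layer more concrete, the sketch below computes the class posterior of a chain-structured HCRF, p(y | x) proportional to the sum over hidden-state paths h of exp(psi(y, h, x)), by running the forward algorithm over hidden states once per class. It assumes a common three-part potential: a weight vector linking each hidden state to the observation (here, the per-segment SVM probability vector), a class/hidden-state compatibility score, and per-class transition scores between consecutive hidden states. The parameter layout and the name `hcrf_posterior` are illustrative assumptions, and training (maximizing the conditional log-likelihood) is omitted.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hcrf_posterior(x_seq, theta_h, theta_y, theta_e):
    """Return [p(y | x) for each class y] for a chain HCRF.

    x_seq:   list of T observation vectors (e.g. per-segment SVM probabilities)
    theta_h: H x D weights linking hidden state h to the observation features
    theta_y: Y x H compatibility scores between class y and hidden state h
    theta_e: Y x H x H scores for moving from hidden state g to h, per class
    """
    num_classes = len(theta_y)
    num_hidden = len(theta_h)

    def node_score(y, h, x):
        return sum(w * v for w, v in zip(theta_h[h], x)) + theta_y[y][h]

    log_z = []
    for y in range(num_classes):
        # alpha[h]: log of the summed scores of all hidden paths ending in state h.
        alpha = [node_score(y, h, x_seq[0]) for h in range(num_hidden)]
        for x in x_seq[1:]:
            alpha = [
                node_score(y, h, x)
                + logsumexp([alpha[g] + theta_e[y][g][h] for g in range(num_hidden)])
                for h in range(num_hidden)
            ]
        # log of the sum over all hidden paths of exp(psi(y, h, x)).
        log_z.append(logsumexp(alpha))

    # Normalize over classes: p(y | x) = exp(log_z[y]) / sum_y' exp(log_z[y']).
    total = logsumexp(log_z)
    return [math.exp(s - total) for s in log_z]


# Toy example (hypothetical parameters): 3 segments, 2 classes, 2 hidden states.
x = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
no_transitions = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
probs = hcrf_posterior(x, [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.0]], no_transitions)
```

Because the sum over hidden paths is computed per class and only then normalized, the hidden states stay latent and are never labeled, which is what lets an HCRF model the internal structure of each activity.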
18	Amélioration du système de recueils d'information de l'entreprise Semantic Group Company grâce à la constitution de ressources sémantiques / Improvement of the information system of the Semantic Group Company through the creation of semantic resources Yahaya Alassan, Mahaman Sanoussi 05 October 2017 (has links) Taking the semantic aspect of textual data into account during classification has emerged as a real challenge over the past ten years. This difficulty is compounded by the fact that most of the data available on social networks are short texts, which in particular makes methods based on the "bag of words" representation inefficient. The approach proposed in this research project differs from those of previous work on short-message enrichment for three reasons. First, we do not use external knowledge bases such as Wikipedia, because the short messages processed by the company generally come from specific domains. Second, because of how the tool works, the data to be processed are not used to build the resources. Third, to our knowledge, no existing work both exploits structured data such as the company's to build semantic resources and measures the impact of enrichment on an interactive text-stream clustering system. In this thesis, we propose building resources that enrich short messages in order to improve the performance of the semantic clustering tool of the company Succeed Together. This tool implements supervised and unsupervised classification methods. To build these resources, we use sequential data mining techniques. 
/ Taking into account the semantic aspect of textual data during classification has become a real challenge in the last ten years. This difficulty is compounded by the fact that most of the data available on social networks are short texts, which in particular makes methods based on the "bag of words" representation inefficient. The approach proposed in this research project differs from the approaches of previous work on the enrichment of short messages for three reasons. First, we do not use external knowledge bases such as Wikipedia, because the short messages processed by the company typically come from specific domains. Second, because of the way the tool operates, the data to be processed are not used to create the resources. Third, to our knowledge, no existing work both uses structured data such as the company's to build semantic resources and measures the impact of enrichment on an interactive text-stream clustering system. In this thesis, we propose creating resources that enrich short messages in order to improve the performance of the semantic clustering tool of the company Succeed Together. The tool implements supervised and unsupervised classification methods. To build these resources, we use sequential data mining techniques. Sequential pattern mining Unsupervised short texts clustering Supervised short texts clustering Sequential data mining Semantic resources extraction
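The abstract states only that sequential data mining techniques are used to build the resources; it does not specify the algorithm. As a hedged illustration, the sketch below mines the simplest kind of sequential pattern, an ordered pair of items (a occurs before b, gaps allowed), with support counted as the number of sequences that contain it. It then applies a hypothetical enrichment rule that appends items linked to a short message by a frequent pattern. The functions `frequent_pairs` and `enrich`, the toy records, and the enrichment rule are all assumptions made for illustration.

```python
from collections import Counter


def frequent_pairs(sequences, min_support):
    """Ordered pairs (a, b), with a occurring before b (gaps allowed), and their support.

    Support is the number of sequences containing the pair at least once;
    only pairs with support >= min_support are kept.
    """
    support = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen.add((a, b))
        support.update(seen)  # each pair counted once per sequence
    return {pair: s for pair, s in support.items() if s >= min_support}


def enrich(message_tokens, patterns):
    """Hypothetical rule: append items tied to a message token by a frequent pair."""
    present = set(message_tokens)
    added = []
    for a, b in sorted(patterns):
        for known, other in ((a, b), (b, a)):
            if known in present and other not in present and other not in added:
                added.append(other)
    return list(message_tokens) + added


# Toy structured records (hypothetical), each an ordered sequence of items.
records = [
    ["order", "invoice", "payment"],
    ["order", "payment"],
    ["invoice", "payment"],
    ["order", "delivery"],
]
patterns = frequent_pairs(records, min_support=2)
print(sorted(patterns.items()))
# [(('invoice', 'payment'), 2), (('order', 'payment'), 2)]
print(enrich(["invoice"], patterns))
# ['invoice', 'payment']
```

Real sequential pattern miners (for example PrefixSpan or GSP) grow patterns beyond length two; the pairwise version keeps the support-counting idea visible.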
19	Exploring sequential data with relational concept analysis / Exploration de données séquentielles à l’aide de l’analyse relationnelle de concepts Nica, Cristina 13 October 2017 (has links) Many sequential pattern extraction methods have been proposed to discover useful patterns that describe the analyzed data. Some of these works have focused on efficiently enumerating closed partially-ordered patterns (cpo-patterns), which makes their evaluation difficult for experts, since their number can be large. We therefore propose a new approach that directly extracts multilevel cpo-patterns organized into a hierarchy. We propose an original method within the framework of Relational Concept Analysis (RCA), called RCA-SEQ, which exploits the structure and properties of the lattices produced by RCA. RCA-SEQ comprises five steps: data preprocessing; RCA-based exploration of the data; automated extraction of a hierarchy of multilevel cpo-patterns by navigating the lattices produced by RCA; selection of relevant cpo-patterns; and evaluation of the patterns by experts. / Many sequential pattern mining methods have been proposed to discover useful patterns that describe the analysed sequential data. Several of these works have focused on efficiently enumerating all closed partially-ordered patterns (cpo-patterns), which makes their evaluation a laborious task for experts, since their number can be large. To address this issue, we propose a new approach that directly extracts multilevel cpo-patterns implicitly organised into a hierarchy. To this end, we devise an original method within the Relational Concept Analysis (RCA) framework, called RCA-SEQ, which exploits the structure and properties of the lattices from the RCA output. 
RCA-SEQ spans five steps: the preprocessing of the raw data; the RCA-based exploration of the preprocessed data; the automatic extraction of a hierarchy of multilevel cpo-patterns by navigating the lattices from the RCA output; the selection of relevant multilevel cpo-patterns; and the evaluation of the patterns by experts. Sequential data Relational Concept Analysis Closed partially-ordered patterns Multilevel patterns Hierarchy of patterns Measures of interest 005.7 006.33
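RCA generalizes Formal Concept Analysis (FCA) to several object sets linked by relations, and RCA-SEQ's navigation of the RCA lattices is not reproduced here. As background only, the sketch below enumerates the formal concepts of a single binary context (objects described by attributes) using the two FCA derivation operators. A concept is a pair (extent, intent) in which each set is mapped onto the other by the operators, and the intents are exactly the closed attribute sets, which is the notion of closure behind "closed" patterns. The naive enumeration over all attribute subsets is exponential and meant for tiny examples; the function names and the toy context are hypothetical.

```python
from itertools import combinations


def extent(attrs, context):
    """Objects having every attribute in `attrs` (derivation on attribute sets)."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(objs, context, all_attrs):
    """Attributes shared by every object in `objs` (derivation on object sets)."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """All (extent, intent) pairs of a binary context {object: set_of_attributes}.

    Naive: closes every attribute subset, so it is exponential in the number
    of attributes. Duplicate closures collapse because concepts are kept in a set.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            ext = extent(frozenset(subset), context)
            concepts.add((ext, intent(ext, context, all_attrs)))
    return concepts


# Toy context (hypothetical): three sequences described by the items they contain.
context = {"s1": {"a", "b"}, "s2": {"a", "c"}, "s3": {"a", "b", "c"}}
for ext, itn in sorted(formal_concepts(context), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(ext), sorted(itn))
# ['s1', 's2', 's3'] ['a']
# ['s1', 's3'] ['a', 'b']
# ['s2', 's3'] ['a', 'c']
# ['s3'] ['a', 'b', 'c']
```

Ordering these concepts by inclusion of their extents yields the concept lattice; RCA builds such lattices iteratively, adding relational attributes that refer to concepts of other contexts.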
20	Basil-GAN / Basilika-GAN Risberg, Jonatan January 2022 (has links) Developments in computer vision have sought to design deep neural networks that, when trained on a large set of images, can generate high-quality artificial images sharing semantic qualities with the original image set. A pivotal shift came with the introduction of the generative adversarial network (GAN) by Goodfellow et al. Building on Goodfellow's work, more advanced models using the same idea have shown great improvements in both image quality and data diversity. GAN models generate images by feeding samples from a vector space into a generative neural network. The structure of these so-called latent vector samples has been shown to correspond to semantic similarities between the generated images. In this thesis, the DCGAN model is trained on a novel data set consisting of image sequences of the growth process of basil plants from germination to harvest. We evaluate the trained model by comparing it with DCGAN performance on benchmark data sets such as MNIST and CIFAR10, and conclude that the model trained on the basil plant data set achieves results similar to those on MNIST and better than those on CIFAR10. To argue for the potential of more advanced GAN models, we compare the results from the DCGAN model with the contemporary StyleGAN2 model. We also investigate the latent vector space produced by the DCGAN model and confirm, in accordance with previous research, that the DCGAN model is able to generate a latent space with data-specific semantic structures. For the DCGAN model trained on the basil plant data set, the latent space distinguishes images of early-stage basil plants from those of late-stage plants in the growth phase. Furthermore, exploiting the sequential semantics of the basil plant data set, we attempt to generate an artificial growth sequence using linear interpolation. 
Finally, we present an unsuccessful attempt at visualising the latent space produced by the DCGAN model using a rudimentary approach to inverting the generator network function. / Developments in computer vision have aimed to design deep neural networks that are trained on a large set of images and can generate high-quality artificial images with the same semantic properties as the original images. A decisive shift occurred when Goodfellow et al. introduced the generative adversarial network (GAN). Building on Goodfellow's work, several more advanced models using the same idea have shown great improvements in both image quality and data diversity. GAN models generate images by feeding vectors from a vector space into a generative neural network. The structure of these so-called latent vectors turns out to correspond to semantic similarities between the corresponding generated images. In this degree project, the DCGAN model has been trained on a new data set consisting of image sequences of the growth process of basil plants from germination to harvest. We evaluate the trained model by comparing the DCGAN model against reference data sets such as MNIST and CIFAR10, and conclude that the DCGAN trained on the basil plant data set achieves results similar to those on the MNIST data set and better than those on the CIFAR10 data set. To demonstrate the potential of using more advanced GAN models, the results from the DCGAN model are compared with the more advanced StyleGAN2 model. We also examine the latent vector space produced by the DCGAN model and confirm that, in line with previous research, the DCGAN model can generate a latent space with data-specific semantic structures. For the DCGAN model trained on the basil plant data set, the latent space succeeds in distinguishing between images of basil plants at early and late stages of the growth process. 
Using the sequential semantics of the basil plant data set, an attempt is also made to generate an artificial growth sequence by linear interpolation. Finally, we present an unsuccessful attempt to visualize the latent space produced by the DCGAN model using a rudimentary approach to inverting the generator network function. GAN mathematical statistics deep neural networks generative models latent space exploration sequential data Other Mathematics
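The linear interpolation used to generate an artificial growth sequence can be sketched as follows. Latent vectors are taken at evenly spaced points on the straight line between two latent codes, and each would then be decoded by the trained generator. The generator itself is not shown. The 100-dimensional latent space, the random codes standing in for early- and late-stage plants, and the function name `interpolate_latents` are assumptions made for illustration.

```python
import random


def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the segment z_start -> z_end.

    The first vector equals z_start and the last equals z_end.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # interpolation weight, from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


rng = random.Random(0)
latent_dim = 100  # assumed latent size
z_early = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # hypothetical early-stage code
z_late = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]   # hypothetical late-stage code
frames = interpolate_latents(z_early, z_late, steps=8)
# Each frames[i] would be fed to the trained DCGAN generator (not shown) to obtain
# one image of the artificial growth sequence.
```

Linear interpolation is the simplest choice. With Gaussian latent priors, some works prefer spherical interpolation, which keeps intermediate vectors at a norm more typical of samples from the prior.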
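The abstract does not detail its rudimentary inversion approach. One common way to invert a generator G is to search for a latent vector z that minimizes the reconstruction error ||G(z) - x||² by gradient descent. The sketch below shows that idea on a toy linear "generator" G(z) = Wz, chosen so that the gradient 2Wᵀ(Wz - x) can be written by hand. A real DCGAN generator is nonlinear, would need automatic differentiation, and can have many local minima. All names, the learning rate, and the iteration count are hypothetical.

```python
def matvec(W, z):
    """Matrix-vector product W z for W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def invert_linear_generator(W, x, lr=0.05, iters=500):
    """Find z minimizing ||W z - x||^2 by plain gradient descent.

    W stands in for a generator network; starting point is z = 0.
    """
    dim = len(W[0])
    z = [0.0] * dim
    for _ in range(iters):
        residual = [g - t for g, t in zip(matvec(W, z), x)]
        # Gradient of the squared error with respect to z: 2 * W^T * residual.
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(dim)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Toy example (hypothetical): recover a known latent code from its "image".
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
x = matvec(W, [0.5, -1.0])
z_hat = invert_linear_generator(W, x)
```

For this toy W the loss is convex, so the learning rate only needs to be small enough for gradient descent to converge; with a nonlinear generator, results depend strongly on the starting point, which is one reason simple inversion attempts can fail.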
