About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
291

Identification du profil des utilisateurs d’un hypermédia encyclopédique à l’aide de classifieurs basés sur des dissimilarités : création d’un composant d’un système expert pour Hypergéo / Identification of hypermedia encyclopedic user's profile using classifiers based on dissimilarities : creating a component of an expert system for Hypergeo

Abou Latif, Firas 08 July 2011 (has links)
The objective of this thesis is to identify the profile of a hypermedia user in order to adapt the hypermedia to that user. The profile is determined using supervised learning algorithms such as SVM. The user model is one of the essential components of adaptive hypermedia, and one way to characterize this model is to associate the user with a profile. Web Usage Mining (WUM) identifies this profile from navigation traces; however, these techniques generally only work on large volumes of data. For small data volumes, we propose to use the structure and content of the hypermedia instead. To this end, we used kernel learning algorithms, for which we defined the key ingredient: a measure of similarity between traces based on a "distance" between documents of the site. Our approach was validated first on synthetic data and then on real data from the traces of users of Hypergéo, an encyclopedic website specialized in geography. Our results were compared with those obtained using a WUM technique (the characteristic-patterns algorithm). Finally, our proposals for identifying profiles a posteriori brought out five profiles. When a "semantic distance" between documents is applied, Hypergéo users are classified correctly according to their interests.
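
The core technical ingredient of this abstract — a kernel built from a dissimilarity between navigation traces and fed to an SVM — can be sketched with scikit-learn's support for precomputed kernels. The document coordinates, traces, profile labels and the trace "distance" below are hypothetical placeholders, not the Hypergéo data or the thesis's actual similarity measure.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical setup: 6 documents embedded in a 2-D "semantic" space,
# and navigation traces given as lists of visited document indices.
doc_coords = rng.normal(size=(6, 2))
traces = [[0, 1, 2], [1, 2], [3, 4, 5], [4, 5], [0, 2, 1], [3, 5]]
profiles = np.array([0, 0, 1, 1, 0, 1])  # assumed profile labels

def trace_distance(a, b):
    """Average pairwise 'semantic' distance between the documents of two traces."""
    d = np.linalg.norm(doc_coords[a][:, None, :] - doc_coords[b][None, :, :], axis=-1)
    return d.mean()

# Turn the dissimilarity into a kernel (Gaussian of the distance) and give it
# to an SVM as a precomputed Gram matrix. Such a kernel is not guaranteed to be
# positive semi-definite, which is acceptable for a sketch.
n = len(traces)
K = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = np.exp(-trace_distance(traces[i], traces[j]) ** 2)

clf = SVC(kernel="precomputed").fit(K, profiles)
print(clf.predict(K))  # re-predict on the training traces, just to exercise the model
```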
292

Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering / Contributions à l'apprentissage non supervisé à partir de flux de données massives en grande dimension : structuration, hashing et clustering

Morvan, Anne 12 November 2018 (has links)
This thesis focuses on how to perform unsupervised machine learning efficiently, in particular the closely related tasks of nearest-neighbor search and clustering, under time and space constraints for high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the throughput of the data-independent Cross-polytope LSH for approximate nearest-neighbor search with almost no loss of accuracy. Second, a novel streaming, data-dependent method is designed to learn compact binary codes from high-dimensional data points in a single pass. Besides theoretical guarantees, the quality of the obtained embeddings is assessed on the approximate nearest-neighbor search task. Finally, a space-efficient, parameter-free clustering algorithm is conceived, based on the recovery of an approximate Minimum Spanning Tree of the sketched data-dissimilarity graph on which suitable cuts are performed.
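
The last step described in this abstract — clustering by cutting a minimum spanning tree of a dissimilarity graph — can be illustrated in a few lines with SciPy. This sketch uses an exact MST on a small dense distance matrix rather than the thesis's sketched, streaming construction, and the number of cuts is chosen by hand.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import squareform, pdist

rng = np.random.default_rng(1)
# Two well-separated blobs as toy "high-dimensional" points.
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])

# Dense dissimilarity graph and its exact minimum spanning tree.
D = squareform(pdist(X))
mst = minimum_spanning_tree(D).toarray()

# Cut the (k-1) heaviest MST edges to obtain k clusters (here k = 2).
k = 2
edges = np.argwhere(mst > 0)
weights = mst[mst > 0]
for idx in np.argsort(weights)[-(k - 1):]:
    i, j = edges[idx]
    mst[i, j] = 0.0

# The remaining connected components are the clusters.
n_comp, labels = connected_components(mst, directed=False)
print(n_comp, labels)
```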
293

Vysoce výkonné prohledávání a dotazování ve vybraných mnohadimenzionálních prostorech v přírodních vědách / High-performance exploration and querying of selected multi-dimensional spaces in life sciences

Kratochvíl, Miroslav January 2020 (has links)
This thesis studies, implements and experiments with specific application-oriented approaches for exploring and querying multi-dimensional datasets. The first part of the thesis scrutinizes indexing of the complex space of chemical compounds and details the design of a high-performance retrieval system for small molecules. The resulting system is then utilized within the wider context of federated search in heterogeneous data and metadata related to chemical datasets. In the second part, the thesis focuses on fast visualization and exploration of many-dimensional data that originate from single-cell cytometry. Self-organizing maps are used to derive fast methods for analysis of the datasets and serve as the basis for a novel data visualization algorithm. Finally, a similar approach is utilized for highly interactive exploration of multimedia datasets. The main contributions of the thesis comprise the advancement in optimization and methods for querying the chemical data implemented in the Sachem database cartridge, the federated, SPARQL-based interface to Sachem that provides the heterogeneous search support, the dimensionality reduction algorithm EmbedSOM, the design and implementation of a specific EmbedSOM-backed analysis tool for flow and mass cytometry, and the design and implementation of the multimedia...
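
The basic idea of using a self-organizing map as a fast 2-D embedding of many-dimensional point clouds — the starting point of the EmbedSOM line of work, though not the EmbedSOM algorithm itself — can be shown with the MiniSom package. The toy data below is only a stand-in for cytometry measurements.

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(2)
# Toy stand-in for cytometry data: 500 cells x 10 markers, two populations.
X = np.vstack([rng.normal(0, 1, (250, 10)), rng.normal(4, 1, (250, 10))])

# Train a small 8x8 SOM on the data.
som = MiniSom(8, 8, input_len=10, sigma=1.5, learning_rate=0.5, random_seed=2)
som.train_random(X, num_iteration=2000)

# Project every cell to the grid coordinates of its best-matching unit;
# the resulting 2-D coordinates can be plotted directly.
embedding = np.array([som.winner(x) for x in X])
print(embedding[:5])
```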
294

Conjurer la malédiction de la dimension dans le calcul du noyau de viabilité à l'aide de parallélisation sur carte graphique et de la théorie de la fiabilité : application à des dynamiques environnementales / Dispel the dimensionality curse in viability kernel computation with the help of GPGPU and reliability theory : application to environmental dynamics

Brias, Antoine 15 December 2016 (has links)
Viability theory provides tools for maintaining a dynamical system within a constraint domain. The central concept of this theory is the viability kernel, the set of initial states from which there exists at least one controlled trajectory remaining in the constraint domain. However, the time and space needed to compute the viability kernel grow exponentially with the number of dimensions of the problem; this issue is known as "the curse of dimensionality". The curse is even more present when applying viability theory to uncertain systems, in which case the viability kernel becomes the set of states for which there exists a control strategy to stay in the constraint domain with at least a given probability until the time horizon. The objective of this thesis is to study and develop approaches to beat back the curse of dimensionality. We propose two lines of research: parallel computing and the use of reliability theory tools. The results are illustrated by several applications. The first line explores parallel computing on graphics cards. The GPU version of the program is up to 20 times faster than the sequential version and handles problems up to dimension 7. Beyond the gains in computation time, our work shows that most of the resources are spent computing the transition probabilities of the system. This observation links to the second line of research, which proposes an algorithm that computes an approximation of stochastic viability kernels using reliability methods to compute the transition probabilities. The memory space required by this algorithm is a linear function of the number of states of the grid, unlike the memory space required by the conventional dynamic programming algorithm, which depends quadratically on the number of states. These approaches make it possible to apply viability theory to higher-dimensional systems. We applied it to a model of phosphorus dynamics for the management of lake eutrophication, previously calibrated on data from Lake Bourget. In addition, the relationship between reliability and viability is highlighted with an application of stochastic viability kernel computation, otherwise known as reliability kernel computation, to reliable design in the case of a corroded beam.
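
The viability kernel admits a simple grid-based fixed-point approximation, which is exactly the kind of computation whose cost explodes with dimension. The one-dimensional toy system below (linear growth with a bounded control, state constrained to [0, 1]) only illustrates the kernel iteration; it is not the GPU-parallel or reliability-based algorithm of the thesis, and all parameters are invented.

```python
import numpy as np

# Toy 1-D dynamics x' = x + dt * (a*x + u), state constrained to [0, 1],
# control u in {-u_max, 0, +u_max}. All parameters are illustrative.
a, dt, u_max = 0.5, 0.1, 0.04
grid = np.linspace(0.0, 1.0, 201)
controls = np.array([-u_max, 0.0, u_max])

viable = np.ones_like(grid, dtype=bool)  # start from the whole constraint set

def successor_index(x, u):
    """Grid index of the successor state, or None if it leaves [0, 1]."""
    x_next = x + dt * (a * x + u)
    if x_next < 0.0 or x_next > 1.0:
        return None
    return int(round(x_next * (len(grid) - 1)))

# Kernel iteration: repeatedly discard states from which no control keeps
# the successor inside the current viable set, until nothing changes.
changed = True
while changed:
    changed = False
    for i, x in enumerate(grid):
        if not viable[i]:
            continue
        ok = any(
            (j := successor_index(x, u)) is not None and viable[j]
            for u in controls
        )
        if not ok:
            viable[i] = False
            changed = True

print(f"approximate viability kernel: [{grid[viable].min():.2f}, {grid[viable].max():.2f}]")
```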
295

Genomförbarhetsstudie av att känna igen två tankemönster i följd med EEG / Feasibility study of recognizing two subsequent thought patterns with EEG

Wilhelmsson, Oskar, Wikén, Victor January 2015 (has links)
This study implements a brain-computer interface using the EEG instrument MindWave Mobile Headset. We studied the feasibility of performing four operations using thought patterns. Four test subjects participated in the study; their task was to think in two subsequent thought patterns that together resulted in an operation. The EEG signal was pre-processed so that a pattern recognition algorithm (k-NN) could more easily recognize two thought patterns in the signal. To our knowledge this study has not been done before, and it thus aims to fill this gap in the scientific community. User groups with an interest in filling this gap include, among others, disabled people, gamers, and Virtual Reality users. We created a model of the best possible outcome of the method used in this study. Conclusions drawn from the result cannot be used to fully answer the problem statement, since that would be post hoc theorizing. However, three out of four operations were shown to be feasible in the model, with an indication that the fourth was also possible to perform. These results indicate that there are grounds to continue this study. The proposed follow-up study should include new measurements that are tested against the model to determine whether it is feasible to distinguish all four operations.
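
The recognition step described in this abstract — classifying a pre-processed EEG window with k-NN and combining two subsequent classifications into one operation — might look like the scikit-learn sketch below. The band-power features, labels and window layout are hypothetical and are not the features or data of the study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Hypothetical pre-processed EEG windows: 200 windows x 4 band-power features,
# labelled with one of two thought patterns.
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.5, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Two subsequent windows -> two recognized patterns -> one of four operations.
pattern_a, pattern_b = knn.predict(X_test[:2])
operation = 2 * pattern_a + pattern_b  # encodes the pattern pair as an operation id 0..3
print(f"recognized patterns ({pattern_a}, {pattern_b}) -> operation {operation}")
print("test accuracy:", knn.score(X_test, y_test))
```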
296

Classify part of day and snow on the load of timber stacks : A comparative study between partitional clustering and competitive learning

Nordqvist, My January 2021 (has links)
In today's society, companies are trying to find ways to utilize all the data they have, which contains valuable information and insights for making better decisions. This includes data used to keep track of timber that flows between forest and industry. The growth of Artificial Intelligence (AI) and Machine Learning (ML) has enabled the development of ML models that automate the measurement of timber on timber trucks based on images. However, to improve the results there is a need to extract information from unlabeled images in order to determine weather and lighting conditions. The objective of this study is to perform an extensive investigation of how to classify unlabeled images into the categories daylight, darkness, and snow on the load. A comparative study between partitional clustering and competitive learning is conducted to investigate which method gives the best results in terms of different clustering performance metrics. It also examines how dimensionality reduction affects the outcome. The algorithms K-means and Kohonen Self-Organizing Map (SOM) are selected for the clustering. Each model is investigated with respect to the number of clusters, size of dataset, clustering time, clustering performance, and manual samples from each cluster. The results indicate a noticeable clustering performance discrepancy between the algorithms with respect to the number of clusters, dataset size, and manual samples. The use of dimensionality reduction led to shorter clustering time but slightly worse clustering performance. The evaluation further shows that the clustering time of Kohonen SOM is significantly higher than that of K-means.
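
A minimal version of the comparison described in this abstract — K-means versus a Kohonen SOM on the same unlabeled image features, with optional dimensionality reduction and a clustering metric — could be sketched as below. The image features are random placeholders, the SOM uses the MiniSom package with each grid node treated as a cluster, and the silhouette score stands in for the (unspecified) metrics of the thesis.

```python
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
# Placeholder image features: 300 images x 64 features in three loose groups,
# standing in for daylight, darkness and snow-on-the-load conditions.
X = np.vstack([rng.normal(m, 1.0, (100, 64)) for m in (0.0, 3.0, 6.0)])

# Optional dimensionality reduction before clustering, as examined in the thesis.
X_red = PCA(n_components=10, random_state=4).fit_transform(X)

# Partitional clustering: K-means with k = 3.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=4).fit_predict(X_red)
print("K-means silhouette:", silhouette_score(X_red, km_labels))

# Competitive learning: a 3x1 Kohonen SOM, each node treated as one cluster.
som = MiniSom(3, 1, input_len=10, sigma=0.5, learning_rate=0.5, random_seed=4)
som.train_random(X_red, num_iteration=1000)
som_labels = np.array([som.winner(x)[0] for x in X_red])
if len(set(som_labels)) > 1:  # silhouette needs at least two occupied clusters
    print("SOM silhouette:", silhouette_score(X_red, som_labels))
```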
297

Multidimensionality of the models and the data in the side-channel domain / Multidimensionnalité des modèles et des données dans le domaine des canaux auxiliaires

Marion, Damien 05 December 2018 (has links)
Since the publication in 1999 of the seminal paper by Paul C. Kocher, Joshua Jaffe and Benjamin Jun, entitled "Differential Power Analysis", side-channel attacks have proved to be efficient ways to attack cryptographic algorithms. Indeed, it has been shown that information extracted from side channels such as execution time, power consumption or electromagnetic emanations can be used to recover secret keys. In this context, we first address the problem of dimensionality reduction: over the past twenty years, the complexity and the size of the data extracted from side channels have kept growing, and reducing the dimension of these data decreases the time and increases the efficiency of the attacks. The proposed dimension-reduction methods apply to complex leakage models and to data of any dimension. Second, a software leakage assessment methodology is proposed; it is based on the analysis of all the data manipulated during the execution of the evaluated software. The proposed methodology provides features that speed up and increase the efficiency of the analysis, especially in the context of evaluating white-box cryptography implementations.
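
Dimensionality reduction of side-channel traces is commonly illustrated with PCA, as in the sketch below, which compresses synthetic power traces to a handful of components before any attack step. The leakage model (a single leaking time sample plus Gaussian noise) is a deliberately crude assumption and is unrelated to the multidimensional leakage models of the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)

# Crude synthetic acquisition: 2000 power traces of 200 samples each, where
# sample 123 leaks the Hamming weight of a secret-dependent byte (assumed model).
n_traces, n_samples = 2000, 200
values = rng.integers(0, 256, n_traces)
hw = np.array([bin(v).count("1") for v in values], dtype=float)
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, 123] += 2.0 * hw  # the leaking time sample

# Compress every 200-sample trace down to 5 principal components.
reduced = PCA(n_components=5).fit_transform(traces)

# With this strong toy leakage, one retained component still correlates with the
# Hamming weight, so an attack could proceed on 5 dimensions instead of 200.
corr = [abs(np.corrcoef(reduced[:, k], hw)[0, 1]) for k in range(5)]
print("max |correlation| with Hamming weight after PCA:", round(max(corr), 3))
```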
298

Efficient learning on high-dimensional operational data

Zhang, Hongyi January 2019 (has links)
In a networked system, operational data collected by sensors or extracted from system logs can be used for target performance prediction, anomaly detection, etc. However, the number of metrics collected from a networked system is very large and can reach about 10⁶ for a medium-sized system. This project analyzes and compares different unsupervised machine learning methods, such as Unsupervised Feature Selection, Principal Component Analysis and Autoencoders, which can lead to efficient learning from high-dimensional data. The objective is to reduce the dimensionality of the input space while maintaining the prediction performance obtained by learning on the full feature space. The data used in this project is collected from a KTH testbed that runs a Video-on-Demand service and a Key-Value store under different types of traffic load. The findings confirm the manifold hypothesis, which states that real-world high-dimensional data lie on low-dimensional manifolds embedded within the high-dimensional space. In addition, the project investigates data visualization of infrastructure measurements through two-dimensional plots. The results show that data separation can be achieved by using different mapping methods.
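
The central experiment of this abstract — reduce the number of metrics while keeping prediction performance close to the full feature space — can be mimicked with a small scikit-learn pipeline. The synthetic "operational metrics", the latent factors and the target below are placeholders consistent with the manifold hypothesis, not the KTH testbed data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)

# Placeholder operational data consistent with the manifold hypothesis:
# 500 observed metrics are (noisy) linear functions of 10 latent factors,
# and the target (e.g. a response-time percentile) depends on those factors.
n, latent_dim, n_metrics = 1000, 10, 500
Z = rng.normal(size=(n, latent_dim))
A = rng.normal(size=(latent_dim, n_metrics))
X = Z @ A + 0.1 * rng.normal(size=(n, n_metrics))
y = Z @ rng.normal(size=latent_dim) + 0.1 * rng.normal(size=n)

full = Ridge()
reduced = make_pipeline(PCA(n_components=latent_dim), Ridge())

# Prediction quality (R^2) with and without dimensionality reduction.
print("R2, all 500 metrics   :", cross_val_score(full, X, y, cv=5).mean())
print("R2, PCA to 10 factors :", cross_val_score(reduced, X, y, cv=5).mean())
```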
299

Boson Mode, Dimensional Crossover, Medium Range Structure and Intermediate Phase in Lithium- and Sodium-Borate Glasses

Vignarooban, Kandasamy January 2012 (has links)
No description available.
300

IP Algorithm Applied to Proteomics Data

Green, Christopher Lee 30 November 2004 (has links) (PDF)
Mass spectrometry has been used extensively in recent years as a valuable tool in the study of proteomics; however, the data it produces is extremely high-dimensional. Reducing the dimensionality of the data often requires imposing many assumptions that can be harmful to subsequent analysis. The IP algorithm is a dimension reduction algorithm, similar in purpose to latent variable analysis. It is based on the principle of maximum entropy and therefore imposes a minimal number of assumptions on the data. Partial Least Squares (PLS) is an algorithm commonly used with proteomics data from mass spectrometry to reduce the dimension of the data. The IP algorithm and a PLS algorithm were applied to proteomics data from mass spectrometry to reduce its dimension. The data came from three groups of patients: those with no tumors, those with benign tumors, and those with malignant tumors. Reduced data sets were produced by the IP algorithm and the PLS algorithm, and logistic regression models were constructed using predictor variables extracted from these data sets. The three-level response indicated the tumor classification to which each patient belonged. Misclassification rates were determined for the IP algorithm and the PLS algorithm; the correct-classification rates associated with the IP algorithm were equal to or better than those associated with the PLS algorithm.
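
The IP algorithm itself is not a standard library routine, but the PLS baseline described in this abstract — reduce the mass-spectrometry dimension with partial least squares, then fit a logistic regression on the extracted components and report the misclassification rate — can be sketched with scikit-learn. The data below are synthetic placeholders for the three patient groups, not the study's proteomics measurements.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(7)

# Synthetic stand-in for mass-spectrometry profiles: 150 patients x 2000 m/z bins,
# three classes (no tumor / benign / malignant) with small mean shifts.
X = rng.normal(0.0, 1.0, (150, 2000))
y = np.repeat([0, 1, 2], 50)
X[y == 1, :30] += 0.6
X[y == 2, 30:60] += 0.6

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=7)

# PLS needs a numeric response; use the one-hot encoded classes to extract components.
pls = PLSRegression(n_components=5).fit(X_tr, label_binarize(y_tr, classes=[0, 1, 2]))
T_tr, T_te = pls.transform(X_tr), pls.transform(X_te)

# Logistic regression on the 5 extracted components.
clf = LogisticRegression(max_iter=1000).fit(T_tr, y_tr)
misclassification = 1.0 - clf.score(T_te, y_te)
print(f"misclassification rate (PLS + logistic regression): {misclassification:.2f}")
```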
