  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
181

Identification du profil des utilisateurs d’un hypermédia encyclopédique à l’aide de classifieurs basés sur des dissimilarités : création d’un composant d’un système expert pour Hypergéo / Identification of hypermedia encyclopedic user's profile using classifiers based on dissimilarities : creating a component of an expert system for Hypergeo

Abou Latif, Firas 08 July 2011 (has links)
The aim of this thesis is to identify the profile of a hypermedia user in order to adapt the hypermedia to that user. The profile is determined using supervised learning algorithms such as SVMs. The user model is one of the essential components of adaptive hypermedia, and one way to characterize this model is to associate the user with a profile. Web Usage Mining (WUM) identifies this profile from navigation traces. However, WUM techniques generally only work on large volumes of data. When little data is available, we propose to use the structure and content of the hypermedia instead. To this end, we used kernel-based learning algorithms, for which we defined the key ingredient: a similarity measure between traces based on a "distance" between documents of the site. Our approach was validated on synthetic data and then on real traces of users of Hypergéo, an encyclopedic website specialized in geography. Our results were compared with those obtained using a WUM technique (the characteristic-patterns algorithm). Finally, our proposals for identifying profiles a posteriori brought five profiles to light. When a "semantic distance" between documents is applied, Hypergéo users are correctly classified according to their interests.
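As an illustration of the kind of classifier described above, here is a minimal sketch: an SVM over a precomputed kernel derived from a dissimilarity between navigation traces. The traces, labels, and the dissimilarity function are invented placeholders, not the thesis's document-based "distance".

```python
import numpy as np
from sklearn.svm import SVC

def trace_dissimilarity(t1, t2):
    # Hypothetical placeholder; the thesis derives a "distance" between
    # traces from the structure and content of the site's documents.
    return abs(len(t1) - len(t2)) / max(len(t1), len(t2))

traces = [[1, 4, 2], [1, 4, 2, 7], [9, 8, 8, 3], [9, 3], [1, 2], [8, 3, 3]]
labels = np.array([0, 0, 1, 1, 0, 1])         # known profiles for training

# Turn pairwise dissimilarities into a kernel, e.g. K = exp(-gamma * D).
D = np.array([[trace_dissimilarity(a, b) for b in traces] for a in traces])
K = np.exp(-2.0 * D)

clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K))                          # predicted profile per trace
```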
182

Contributions to unsupervised learning from massive high-dimensional data streams : structuring, hashing and clustering / Contributions à l'apprentissage non supervisé à partir de flux de données massives en grande dimension : structuration, hashing et clustering

Morvan, Anne 12 November 2018 (has links)
This thesis focuses on how to efficiently perform unsupervised machine learning, namely the fundamentally linked tasks of nearest neighbor search and clustering, under time and space constraints on high-dimensional datasets. First, a new theoretical framework reduces the space cost and increases the throughput of the data-independent Cross-polytope LSH for approximate nearest neighbor search, with almost no loss of accuracy. Second, a novel streaming, data-dependent method is designed to learn compact binary codes from high-dimensional data points in a single pass. Beyond some theoretical guarantees, the quality of the obtained embeddings is assessed on the approximate nearest neighbor search task. Finally, a space-efficient, parameter-free clustering algorithm is conceived, based on recovering an approximate minimum spanning tree of the sketched data dissimilarity graph, on which suitable cuts are performed.
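A minimal sketch of the MST-cut clustering idea from the last contribution, under simplifying assumptions: an exact SciPy MST on a small dense dissimilarity graph, and a fixed number of cuts instead of the thesis's parameter-free cut selection.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])

D = squareform(pdist(X))                  # dense dissimilarity graph
mst = minimum_spanning_tree(D).toarray()  # exact MST (the thesis sketches it)

k = 2                                     # number of clusters to produce
edges = np.argwhere(mst > 0)              # MST edges, row-major order
weights = mst[mst > 0]                    # matching weights, same order
for i in np.argsort(weights)[::-1][:k - 1]:
    mst[tuple(edges[i])] = 0.0            # cut the k-1 heaviest edges

n_components, labels = connected_components(mst, directed=False)
print(n_components, labels)               # 2 recovered clusters
```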
183

Une approche basée sur les motifs fermés pour résoudre le problème de clustering par consensus / A closed patterns-based approach to the consensus clustering problem

Al-Najdi, Atheer 30 November 2016 (has links)
Clustering is the process of partitioning a dataset into groups so that instances in the same group are more similar to each other than to instances in any other group. Many clustering algorithms have been proposed, but none of them provides a good-quality partition in all situations. Consensus clustering aims to enhance the clustering process by combining different partitions, obtained from different algorithms, into a consensus solution of better quality. In this work, a new consensus clustering method called MultiCons is proposed. It uses frequent closed itemset mining to discover the similarities between the different base clustering solutions. The identified similarities are presented as clustering patterns, each of which defines the agreement between a set of base clusters on the grouping of a set of instances. By dividing these patterns into groups according to the number of base clusters that define the pattern, MultiCons generates a consensus solution from each group, resulting in multiple consensus candidates. These different solutions are presented in a tree-like structure called the ConsTree, which facilitates understanding of how the multiple consensuses are built, as well as of the relationships between the data instances and their structure in the data space. Five consensus functions are proposed in this work to build a consensus solution from the clustering patterns. Approach 1 simply merges any intersecting clustering patterns. Approach 2 can either merge or split intersecting patterns based on a proposed measure called the intersection ratio.
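The following toy sketch illustrates the consensus intuition only, not the published MultiCons algorithm: instances are linked when their base-cluster memberships agree in at least t base clusterings, and each agreement level t yields one consensus candidate, a crude stand-in for the pattern groups described above.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

base = np.array([
    [0, 0, 1, 1, 2, 2],    # base clustering 1: a label per instance
    [0, 0, 1, 1, 1, 2],    # base clustering 2
    [0, 1, 1, 1, 2, 2],    # base clustering 3
])
n_clusterings, n = base.shape

for t in range(1, n_clusterings + 1):
    # Link instances whose memberships agree in at least t base clusterings.
    agree = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            agree[i, j] = int(np.sum(base[:, i] == base[:, j]) >= t)
    _, labels = connected_components(agree, directed=False)
    print(f"agreement >= {t}:", labels)   # coarse-to-fine consensus candidates
```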
184

Användandet av algoritmer inom investeringar kopplat till OMX30 : Tillämpning av maskininlärning inom portföljhantering: En K-Betydelsemetod / The use of algorithms in investments linked to the OMX30 : Applying machine learning in portfolio management: a K-means method

Larsson Olsson, Simon January 2020 (has links)
Many investors use some form of data analysis before making a decision, whether long or short term. The choice of analysis method is generally determined by risk, removal of bias, and cost. One method that has been investigated is the use of machine learning in data analysis; its advantage is that it successfully handles complex, non-linear, and non-stationary problems. This essay investigates whether unsupervised machine learning using the K-means method, an approach that has not been explored to any great extent in either practice or theory for this purpose, can create a beneficial portfolio. The data used for the K-means method was historical data from the Swedish stock market between 1 January 2018 and 2 November 2020. The K-means analysis was based on the returns of all shares included in the OMX30 and their average deviation, and produced a cluster of 11 shares that could generate a relatively high return compared to the remaining shares. To assess whether the generated cluster was acceptable, an analysis of the Sharpe ratio and downside risk was performed, which showed that the portfolio had a good risk-adjusted return but a worse result on downside risk.
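A minimal sketch of the clustering step described above, with invented tickers and simulated returns standing in for the OMX30 data: K-means over per-stock mean return and volatility.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
tickers = [f"STOCK{i:02d}" for i in range(30)]       # stand-in for OMX30 names
returns = rng.normal(0.0005, 0.01, size=(700, 30))   # simulated daily returns

# One feature row per stock: mean daily return and its standard deviation.
features = np.c_[returns.mean(axis=0), returns.std(axis=0)]
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)

for c in range(5):
    members = [t for t, lab in zip(tickers, km.labels_) if lab == c]
    print(f"cluster {c}:", members)
```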
185

Designing an Interactive tool for Cluster Analysis of Clickstream Data

Collin, Sara, Möllerberg, Ingrid January 2020 (has links)
The purpose of this study was to develop an interactive tool that enables identification of different types of users of an application based on clickstream data. A complex hierarchical clustering tool called Recursive Hierarchical Clustering (RHC) was used. RHC visualises user types as clusters, where each cluster has its own distinguishing action pattern, i.e., one or several consecutive actions made by the user in the application. A case study was conducted on Plick, a mobile application for selling and buying second-hand clothes. During the project, the analysis and its results were found to be difficult for the operators of the tool to understand, so the interactive tool had to be extended to visualise the complex analysis and its results in an intuitive way. A literature study of how humans interpret information, and how to present it to operators, was conducted and led to a redesign of the tool. More information was added to each cluster to support understanding of the clustering results, and a reconfiguration option was created that let operators interact with the analysis: by changing the input file of the cluster analysis, the operator could change the end result. Usability tests showed that the added cluster information served to amplify and verify the original results presented by RHC; in some cases the original RHC result was instead used to verify user-group identifications that the operator made based solely on the added information. The tests also showed that the complex analysis and its results could be understood and configured without deep comprehension of the algorithm: user types could be identified successfully with the help of visual clues in the interface and default settings in the reconfiguration. The visualisation tool thus proved successful in identifying and visualising user groups in an intuitive way.
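As a rough illustration of clustering clickstreams by action patterns, the sketch below uses a generic stand-in (action-bigram counts plus agglomerative clustering) rather than the RHC algorithm used in the study; the session data is invented. Requires Python 3.10+ for itertools.pairwise.

```python
from collections import Counter
from itertools import pairwise        # Python 3.10+

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

sessions = [                          # invented clickstreams, one per user
    ["open", "search", "view", "view"],
    ["open", "search", "view", "buy"],
    ["open", "post", "edit", "post"],
    ["open", "post", "post"],
]

# Represent each session by its counts of consecutive action pairs (bigrams).
vocab = sorted({bg for s in sessions for bg in pairwise(s)})
rows = []
for s in sessions:
    counts = Counter(pairwise(s))
    rows.append([counts[bg] for bg in vocab])
X = np.array(rows, dtype=float)

Z = linkage(X, method="ward")
print(fcluster(Z, t=2, criterion="maxclust"))   # e.g. buyers vs. sellers
```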
186

Automated sleep scoring using unsupervised learning of meta-features / Automatiserad sömnmätning med användning av oövervakad inlärning av meta-särdrag

Olsson, Sebastian January 2016 (has links)
Sleep is an important part of life, as it affects one's performance during all waking hours. The study of sleep and wakefulness is therefore of great interest, particularly to the clinical and medical fields where sleep disorders are diagnosed. When studying sleep, it is common to talk about different types, or stages, of sleep, and a common task in sleep research is to determine the sleep stage of a sleeping subject as a function of time. This process is known as sleep stage scoring. In this study, I seek to determine whether there is any benefit to using unsupervised feature learning in the context of electroencephalogram-based (EEG) sleep scoring. More specifically, I study the effect of generating and using new feature representations of hand-crafted features of sleep data, called meta-features. For this purpose, two scoring algorithms have been implemented and compared. Both involve segmentation of the EEG signal, feature extraction, feature selection, and classification using a support vector machine (SVM). Unsupervised feature learning was implemented in the form of a dimensionality-reducing deep-belief network (DBN) through which the feature space was processed. Both scorers were shown to have a classification accuracy of about 76%. The application of unsupervised feature learning did not affect the accuracy significantly; it is speculated that with a better choice of parameters for the DBN in future work, the accuracy may improve significantly.
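A minimal sketch of the pipeline shape described above: hand-crafted epoch features, an optional unsupervised dimensionality reduction, then an SVM. PCA stands in for the thesis's deep-belief network, and the data and labels are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 40))      # 600 EEG epochs, 40 hand-crafted features
y = rng.integers(0, 5, size=600)    # 5 sleep stages (synthetic labels)

baseline = make_pipeline(StandardScaler(), SVC())
reduced = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())

for name, model in [("baseline", baseline), ("meta-features", reduced)]:
    accuracy = model.fit(X[:500], y[:500]).score(X[500:], y[500:])
    print(name, round(accuracy, 3))
```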
187

Semantic-Driven Unsupervised Image-to-Image Translation for Distinct Image Domains

Ackerman, Wesley 15 September 2020 (has links)
We expand the scope of image-to-image translation to include more distinct image domains, where the image sets have analogous structures, but may not share object types between them. Semantic-Driven Unsupervised Image-to-Image Translation for Distinct Image Domains (SUNIT) is built to more successfully translate images in this setting, where content from one domain is not found in the other. Our method trains an image translation model by learning encodings for semantic segmentations of images. These segmentations are translated between image domains to learn meaningful mappings between the structures in the two domains. The translated segmentations are then used as the basis for image generation. Beginning image generation with encoded segmentation information helps maintain the original structure of the image. We qualitatively and quantitatively show that SUNIT improves image translation outcomes, especially for image translation tasks where the image domains are very distinct.
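A highly simplified, hypothetical sketch of the pipeline shape described above (image to segmentation encoding, segmentation translated across domains, translated segmentation to image), using untrained stand-in networks; module names and shapes are invented and this is not the SUNIT architecture itself.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Sequential):
    def __init__(self, c_in, c_out):
        super().__init__(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

segmenter = ConvBlock(3, 8)        # image -> semantic segmentation encoding
translator = ConvBlock(8, 8)       # segmentation in domain A -> domain B
generator = ConvBlock(8, 3)        # translated segmentation -> image

x_a = torch.randn(1, 3, 64, 64)    # an image from domain A
seg_a = segmenter(x_a)
seg_b = translator(seg_a)          # structure mapped into domain B
x_b = generator(seg_b)             # generated domain-B image
print(x_b.shape)                   # torch.Size([1, 3, 64, 64])
```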
188

Classify part of day and snow on the load of timber stacks : A comparative study between partitional clustering and competitive learning

Nordqvist, My January 2021 (has links)
In today's society, companies are trying to find ways to utilize all the data they have, since it contains valuable information and insights for making better decisions. This includes data used to keep track of the timber that flows between forest and industry. The growth of Artificial Intelligence (AI) and Machine Learning (ML) has enabled the development of ML models that automate the measurement of timber on timber trucks based on images. However, to improve the results there is a need to extract information from unlabeled images in order to determine weather and lighting conditions. The objective of this study is to perform an extensive evaluation of classifying unlabeled images into the categories daylight, darkness, and snow on the load. A comparative study between partitional clustering and competitive learning is conducted to investigate which method gives the best results in terms of different clustering performance metrics, and to examine how dimensionality reduction affects the outcome. The algorithms K-means and the Kohonen Self-Organizing Map (SOM) are selected for the clustering. Each model is investigated with respect to the number of clusters, the size of the dataset, clustering time, clustering performance, and manual samples from each cluster. The results indicate a noticeable clustering performance discrepancy between the algorithms concerning the number of clusters, dataset size, and manual samples. The use of dimensionality reduction led to shorter clustering times but slightly worse clustering performance. The evaluation further shows that the clustering time of the Kohonen SOM is significantly higher than that of K-means.
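A minimal sketch of one comparison axis from the study: clustering time and silhouette score for K-means with and without dimensionality reduction. The feature vectors are simulated stand-ins for the image features used in the study.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))    # simulated image feature vectors

variants = [("raw", X), ("pca-50", PCA(n_components=50).fit_transform(X))]
for name, data in variants:
    start = time.perf_counter()
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
    elapsed = time.perf_counter() - start
    score = silhouette_score(data, labels)
    print(f"{name}: time={elapsed:.2f}s silhouette={score:.3f}")
```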
189

Involving behavior in the formation of sensory representations

Weiller, Daniel 07 July 2009 (has links)
Neurons are sensitive to specific aspects of natural stimuli which, according to various statistical criteria, form an optimal representation of the natural sensory input. Since these representations are purely sensory, it remains an open question whether they are suited to generating meaningful behavior. Here we introduce an optimization scheme that applies a statistical criterion to an agent's sensory input while taking its motor behavior into account. We first introduce a general cognitive model, and second develop an optimization scheme that increases the predictability of the sensory outcome of the agent's motor actions, applying it to a navigational paradigm. In the cognitive model, place cells divide the environment into discrete states, similar to hippocampal place cells. The agent learns the sensory outcome of its actions through the state-to-state transition probabilities, together with the extent to which these motor actions are caused by sensory-driven reflexive behavior (obstacle avoidance). Navigational decision making integrates both learned components to derive the actions most likely to lead to a navigational goal. We then introduced an optimization process that modifies the state distributions to increase the predictability of the sensory outcome of the agent's actions. The cognitive model successfully performs the navigational task, and the differentiation between transitions and reflexive processing increases both behavioral accuracy and behavioral adaptation to changes in the environment. Furthermore, the optimized sensory states are similar to place fields found in behaving animals, and the spatial distribution of states depends on the agent's motor capabilities as well as on the environment. We demonstrated the generality of predictability as a coding principle by comparing it to existing ones. Our results suggest that the agent's motor apparatus can play a profound role in the formation of place fields and thus in higher sensory representations.
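A toy sketch of the decision step described above: given learned per-action state-to-state transition probabilities, derive the actions most likely to lead to a goal state. Plain value iteration on random transitions stands in for the thesis's model; all numbers are invented.

```python
import numpy as np

n_states, n_actions, goal = 5, 2, 4
rng = np.random.default_rng(0)

# T[a, s, s2] = learned probability of reaching state s2 from s with action a.
T = rng.random((n_actions, n_states, n_states))
T /= T.sum(axis=2, keepdims=True)

reward = (np.arange(n_states) == goal).astype(float)
not_goal = (np.arange(n_states) != goal).astype(float)

V = np.zeros(n_states)
for _ in range(100):                   # value iteration toward the goal
    V = reward + 0.9 * np.max(T @ V, axis=0) * not_goal

policy = np.argmax(T @ V, axis=0)      # most promising action in each state
print(policy)
```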
190

Unsupervised word discovery for computational language documentation / Découverte non-supervisée de mots pour outiller la linguistique de terrain

Godard, Pierre 16 April 2019 (has links)
Language diversity is under considerable pressure: half of the world's languages could disappear by the end of this century. This realization has sparked many initiatives in documentary linguistics in the past two decades, and 2019 was proclaimed the International Year of Indigenous Languages by the United Nations to raise public awareness of the issue and foster initiatives for language documentation and preservation. Yet documentation and preservation are time-consuming processes, and the supply of field linguists is limited. Consequently, the emerging field of computational language documentation (CLD) seeks to assist linguists by providing them with automatic processing tools. The Breaking the Unwritten Language Barrier (BULB) project, for instance, is one of the efforts defining this new field, bringing together linguists and computer scientists. This thesis examines the particular problem of discovering words in an unsegmented stream of characters, or phonemes, transcribed from speech in a very-low-resource setting. This primarily involves a segmentation procedure, which can also be paired with an alignment procedure when a translation is available.
Using two realistic Bantu corpora for language documentation, one in Mboshi (Republic of the Congo) and the other in Myene (Gabon), we benchmark various monolingual and bilingual unsupervised word discovery methods. We then show that using expert knowledge in the Adaptor Grammar framework can vastly improve segmentation results, and we indicate ways to use this framework as a decision tool for the linguist. We also propose a tonal variant of a strong nonparametric Bayesian segmentation algorithm, making use of a modified backoff scheme designed to capture tonal structure. Finally, to leverage the weak supervision given by a translation, we propose and extend an attention-based neural segmentation method, significantly improving the segmentation performance of an existing bilingual method.
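To make the core task concrete, here is a toy sketch of segmenting an unsegmented stream under a unigram word model with Viterbi decoding. This is a deliberately simple stand-in, not the thesis's nonparametric Bayesian or neural methods; the lexicon and its probabilities are invented.

```python
import math

lexicon = {"ba": 0.3, "na": 0.3, "bana": 0.2, "nana": 0.2}   # invented model
stream = "banana"                      # unsegmented input
max_len = max(len(w) for w in lexicon)

# best[j] = (log-probability of the best segmentation of stream[:j],
#            start index of its final word)
best = [(0.0, 0)] + [(-math.inf, 0)] * len(stream)
for j in range(1, len(stream) + 1):
    for i in range(max(0, j - max_len), j):
        word = stream[i:j]
        if word in lexicon:
            score = best[i][0] + math.log(lexicon[word])
            if score > best[j][0]:
                best[j] = (score, i)

words, j = [], len(stream)             # backtrack through the best path
while j > 0:
    i = best[j][1]
    words.append(stream[i:j])
    j = i
print(words[::-1])                     # ['ba', 'nana']
```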
