  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Characterisation of a developer’s experience fields using topic modelling

Déhaye, Vincent January 2020 (has links)
Finding the most relevant candidate for a position is a ubiquitous challenge for organisations. It can also be hard for a candidate to convey on a concise resume everything they have experience with. Because candidates must select which experience to present and filter out the rest, a recruiter's search may miss them even though they do in fact have the desired experience. In software engineering, building experience usually leaves traces behind: the code one has produced. This project explores an automated way of extracting experience directly from code, tackling the screening challenge by defining common lexical patterns in code for different experience fields using topic modelling. Two techniques were compared. On one hand, Latent Dirichlet Allocation (LDA) is a generative statistical model that has proven to yield good results in topic modelling. On the other hand, Non-Negative Matrix Factorization (NMF) decomposes a matrix representing the code corpus as word counts per piece of code into non-negative factors. The code gathered consisted of 30 random repositories from the collaborators of the open-source Ruby-on-Rails project on GitHub, to which common natural-language-processing transformation steps were then applied. The two techniques were compared using perplexity for LDA, reconstruction error for NMF, and topic coherence for both. The first two measure how well the data is represented by the topics produced, while the last estimates how well the elements of a topic hang together, and can reflect human understandability and interpretability. Given that no similar work was available to benchmark against, the values obtained are hard to assess scientifically. However, the method seems promising, as we would have been fairly confident in assigning labels to 10 of the topics generated.
The results imply that one could probably use natural-language-processing methods directly on code production in order to extend the detected fields of experience of a developer, with finer granularity than traditional resumes and with field definitions that evolve dynamically with the technology.
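The NMF side of the comparison above can be pictured with a minimal sketch. The vocabulary, snippet counts, and cluster structure below are invented for illustration (this is not the thesis's pipeline); plain multiplicative-update NMF on a tiny term-count matrix of code snippets already pulls apart distinct lexical fields:

```python
import numpy as np

# Toy "documents": token counts per code snippet (rows) over a small,
# invented vocabulary (columns).
vocab = ["def", "class", "select", "join", "render", "css"]
X = np.array([
    [5, 2, 0, 0, 0, 0],   # general Python code
    [4, 3, 0, 0, 0, 0],
    [0, 0, 6, 4, 0, 0],   # SQL-heavy code
    [0, 0, 5, 3, 0, 0],
    [0, 0, 0, 0, 4, 5],   # front-end code
], dtype=float)

def nmf(X, k, iters=200, seed=0):
    """Plain multiplicative-update NMF: X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

W, H = nmf(X, k=3)
# Each row of H is a "topic": a weighting over vocabulary tokens.
for t, row in enumerate(H):
    top = [vocab[i] for i in np.argsort(row)[::-1][:2]]
    print(f"topic {t}: {top}")
```

On real repositories the matrix would be far larger and sparser, and the recovered topics would need the manual labelling step the abstract describes.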

BINARY MATRIX FACTORIZATION POST-PROCESSING AND APPLICATIONS

GEORGES MIRANDA SPYRIDES 06 February 2024 (has links)
Novel methods for matrix factorization introduce constraints on the decomposed matrices, allowing for unique kinds of analysis. One significant modification is binary matrix factorization for binary matrices. This technique can reveal common subsets and mixtures of subsets, making it useful in a variety of applications, such as market-basket analysis, topic modeling, and recommendation systems. Despite these advantages, current approaches face a trade-off between accuracy, scalability, and explainability: gradient-descent-based methods are scalable but yield high reconstruction errors when thresholded to binary matrices, while heuristic methods are not scalable. To overcome this, this thesis proposes a post-processing procedure for discretizing matrices obtained by gradient descent. This novel approach recovers the reconstruction error lost to thresholding and successfully processes larger matrices within a reasonable timeframe. We apply the technique to many applications, including a novel pipeline for discovering and visualizing patterns in petrochemical batch processes.

Triple Non-negative Matrix Factorization Technique for Sentiment Analysis and Topic Modeling

Waggoner, Alexander A 01 January 2017 (has links)
Topic modeling refers to the process of algorithmically sorting documents into categories based on some common relationship between them; this common relationship is considered the "topic" of the documents. Sentiment analysis refers to the process of algorithmically sorting a document into a positive or negative category depending on whether it expresses a positive or negative opinion on its topic. In this paper, I consider the open problem of classifying a document into both a topic category and a sentiment category. This has a direct application to the retail industry, where companies may want to scour the web for documents (blogs, Amazon reviews, etc.) that both speak about their product and give an opinion on it (positive, negative, or neutral). My solution uses a Non-negative Matrix Factorization (NMF) technique to determine the topic classifications of a document set, and then factors the matrix further to discover the sentiment expressed about each product category.
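One hedged way to picture the joint topic-sentiment idea (a toy sketch with an invented eight-word vocabulary and review counts, not the thesis's triple factorization): when the term space mixes topic words and sentiment words, an NMF with one component per topic-sentiment pair tends to recover those combinations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["camera", "lens", "phone", "battery",   # topic words
         "great", "love", "bad", "broken"]       # sentiment words

# Invented review term counts; each row mixes one topic with one sentiment.
X = np.array([
    [3, 2, 0, 0, 2, 1, 0, 0],   # camera, positive
    [2, 3, 0, 0, 1, 2, 0, 0],   # camera, positive
    [3, 2, 0, 0, 0, 0, 2, 1],   # camera, negative
    [0, 0, 3, 2, 2, 1, 0, 0],   # phone, positive
    [0, 0, 2, 3, 0, 0, 1, 2],   # phone, negative
], dtype=float)

# Multiplicative-update NMF with one component per topic-sentiment pair.
k = 4
W = rng.random((5, k)) + 1e-3; H = rng.random((k, 8)) + 1e-3
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

for comp in H:                   # top words per component
    print([vocab[i] for i in np.argsort(comp)[::-1][:3]])
```

The thesis's approach instead factors the matrices a second time, which this single-stage sketch does not reproduce.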

Decomposition methods for NMR signals of complex mixtures: models and applications

Toumi, Ichrak 28 October 2013 (has links)
The objective of this work was to test blind source separation (BSS) methods for separating the complex NMR spectra of mixtures into the simpler spectra of the pure compounds. In a first part, two known methods, JADE and NNSC, were applied in the DOSY setting, and an application to CPMG data was demonstrated. In a second part, we focused on developing an efficient algorithm, "beta-SNMF", which was shown to outperform NNSC for beta less than or equal to 2. Since in the literature the choice of beta has been adapted to the statistical assumptions on the additive noise, a statistical study of DOSY NMR noise was carried out to obtain a more complete picture of the NMR data under study.
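The separation task itself can be sketched with a non-negative factorization (a toy with two invented Gaussian-peak "spectra", not the thesis's JADE/NNSC/beta-SNMF algorithms): three observed mixtures of two pure spectra are factored back into estimated spectra and mixing coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 400)

def peaks(centers, widths):
    """Sum of Gaussian peaks; a stand-in for a pure-compound spectrum."""
    return sum(np.exp(-((x - c) / w) ** 2) for c, w in zip(centers, widths))

# Two invented "pure compound spectra" and three observed mixtures.
S = np.vstack([peaks([2, 5], [0.3, 0.2]),
               peaks([4, 7], [0.3, 0.4])])
A = np.array([[1.0, 0.3],
              [0.4, 1.0],
              [0.7, 0.6]])     # nonnegative mixing coefficients
X = A @ S

# Multiplicative-update NMF, X ~ W @ H: rows of H estimate pure spectra,
# W estimates the mixing coefficients (up to permutation and scale).
W = rng.random((3, 2)); H = rng.random((2, 400))
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(err)
```

Real NMR data adds noise whose statistics motivate the choice of beta in the thesis's beta-divergence variant; this noiseless toy sidesteps that question.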

Heuristic biclustering and coclustering based on matrix factorization: a study on textual data

Ramos Diaz, Alexandra Katiuska 16 October 2018 (has links)
Biclustering and coclustering are data-mining tasks that extract relevant information from data and have been applied successfully in a wide variety of domains, including those involving textual data, the focus of this research. In biclustering and coclustering, similarity criteria are applied simultaneously to the rows and columns of the data matrices, grouping objects and attributes at the same time and enabling the discovery of biclusters/coclusters. Their definitions vary according to their nature and objectives, and coclustering can be seen as a generalization of biclustering. When applied to textual data, these tasks require a vector-space representation, which commonly produces spaces characterized by high dimensionality and sparsity, affecting the performance of many algorithms. This work analyses the behaviour of the Cheng and Church biclustering algorithm and of the Non-Negative Block Value Decomposition (NBVD) coclustering algorithm in the context of textual data. Quantitative and qualitative experimental results are presented from runs of these algorithms on synthetic data sets created with different sparsity levels and on a real data set. The results are evaluated in terms of biclustering-specific measures, internal clustering measures computed from the row projections of the biclusters/coclusters, and the information generated. The analysis clarifies the difficulties these algorithms face in the experimental environment, as well as whether they can provide distinctive, useful information for text mining. Overall, the analyses showed that NBVD is better suited to high-dimensional, highly sparse data sets. The Cheng and Church algorithm, although it obtained good results with respect to its own objectives, produced results of low relevance in the context of textual data.
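For concreteness, the quantity at the heart of the Cheng and Church algorithm is the mean squared residue of a submatrix; a hedged sketch with toy numbers (not data from the thesis):

```python
import numpy as np

def mean_squared_residue(A, rows, cols):
    """Cheng & Church's H(I, J): mean squared residue of the submatrix
    after removing row, column and overall means; 0 for a perfectly
    additive (coherent) bicluster."""
    sub = A[np.ix_(rows, cols)]
    row_m = sub.mean(axis=1, keepdims=True)
    col_m = sub.mean(axis=0, keepdims=True)
    return ((sub - row_m - col_m + sub.mean()) ** 2).mean()

# A purely additive bicluster (row effect + column effect) scores ~0 ...
coherent = np.add.outer([0.0, 1.0, 3.0], [10.0, 12.0, 15.0])
print(mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2]))

# ... while random data scores clearly above 0.
noisy = np.random.default_rng(0).random((3, 3))
print(mean_squared_residue(noisy, [0, 1, 2], [0, 1, 2]))
```

The full algorithm greedily deletes and adds rows/columns to keep this score under a threshold; on sparse text matrices that greedy search is where the difficulties noted above arise.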

Wheel-track adhesion evaluation using spectral imaging

Nicodeme, Claire 04 July 2018 (has links)
The advantage of the train since its creation is its low rolling resistance, due to the iron-on-iron contact of the wheel on the rail, which leads to low adhesion. However, this low adhesion is also a major drawback: being dependent on environmental conditions, it is easily degraded when the rail is polluted (vegetation, grease, water, etc.). Today, the measures taken in response to degraded adhesion directly affect system performance and in particular lead to a loss of transport capacity. The objective of the project is to use new spectral-imaging technology to identify areas of reduced adhesion on the rails, and their cause, in order to raise alerts and quickly adapt the train's behaviour. The study strategy took three points into account: (1) the detection system, installed on board commercial trains, must be independent of the train; (2) detection and identification must not interact with the pollution, so as not to bias the measurement, hence a non-destructive testing approach was chosen; (3) spectral-imaging technology makes it possible to work both in the spatial domain (distance measurement, object detection) and in the spectral domain (material detection and recognition by analysis of spectral signatures). Within the three years of the thesis, we focused on validating the concept through laboratory studies and analyses, carried out on the premises of SNCF Ingénierie & Projets. The key steps were building an evaluation bench and choosing the vision system, creating a library of reference spectral signatures, and developing supervised and unsupervised pixel-classification algorithms. This work led to a patent filing and to papers published at IEEE conferences.

Non-negative matrix factorization for integrative clustering

Brdar Sanja 15 December 2016 (has links)
Integrative approaches are motivated by the desired improvements in robustness, stability and accuracy. Clustering, the prevailing technique for preliminary and exploratory analysis of experimental data, can benefit from integration across multiple partitions. In this thesis we propose integration methods based on non-negative matrix factorization that can fuse clusterings stemming from different data sets, different data-preprocessing steps, or different sub-samples of objects or features. The proposed methods are evaluated from several points of view on typical machine-learning data sets, synthetic data and, above all, on data from the bioinformatics realm, whose rise is fuelled by technological revolutions in molecular biology. The vast amounts of 'omics' data now available require sophisticated computational methods. We evaluated the methods on problems from cancer genomics, functional genomics and metagenomics.

Advanced methods of source separation applicable to linear-quadratic mixtures

Jarboui, Lina 18 November 2017 (has links)
In this thesis we propose new blind source separation (BSS) methods adapted to non-linear mixing models. BSS consists of estimating unknown source signals from their observed mixtures when very little information about the mixing model is available. The methodological contribution of this thesis is to take into account the non-linear interactions that can occur between sources by using the linear-quadratic (LQ) model. To this end, we developed three new BSS methods. The first addresses the hyperspectral-unmixing problem using a linear-quadratic model; it is based on the Sparse Component Analysis (SCA) method and requires the presence of pure pixels in the observed scene. For the same purpose, we propose a second hyperspectral-unmixing method adapted to the linear-quadratic model: a Non-negative Matrix Factorization (NMF) method based on the Maximum A Posteriori (MAP) estimator, which takes the available prior information about the distributions of the unknowns into account in order to estimate them better. Finally, we propose a third BSS method based on Independent Component Analysis (ICA), exploiting Second-Order Statistics (SOS) to handle a particular case of the linear-quadratic mixture, the bilinear mixture.
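The bilinear special case is compact enough to write down. In the sketch below (toy signals and invented coefficients, not the thesis's methods), each observation is a linear mix plus a cross term s1*s2; stacking s1*s2 as a third "virtual source" makes the model linear again, a reformulation that several LQ approaches exploit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
s1, s2 = rng.random(n), rng.random(n)        # two nonnegative sources

# Linear-quadratic observation: linear mix plus a bilinear cross term.
# The coefficients below are invented for illustration.
def lq_mix(a1, a2, b):
    return a1 * s1 + a2 * s2 + b * (s1 * s2)

X = np.vstack([lq_mix(1.0, 0.5, 0.3),
               lq_mix(0.4, 1.0, 0.2),
               lq_mix(0.8, 0.7, 0.5)])

# Treating s1*s2 as a third "virtual source" makes the model linear,
# X = A_ext @ [s1; s2; s1*s2], so the extended mixing matrix can be
# recovered by least squares when the sources are known.
S_ext = np.vstack([s1, s2, s1 * s2])
A_ext, *_ = np.linalg.lstsq(S_ext.T, X.T, rcond=None)
print(A_ext[:, 0])   # recovers the first mixture's coefficients
```

In the blind setting the sources are of course unknown; the SCA, NMF-MAP and SOS-based methods above differ precisely in how they estimate them without this oracle step.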

Candidate-job recommendation system: building a prototype of a machine-learning-based recommendation system for an online recruitment company

Hafizovic, Nedzad January 2019 (has links)
Recommendation systems are gaining popularity because of the complexity of the problems they solve, and they appear in many applications around us. Implementations differ, and two approaches stand out: systems that do not use machine learning and systems that do. The second approach, used in this project, is based on machine-learning collaborative-filtering techniques, which comprise numerous algorithms and data-processing methods. This document describes the process of building a job recommendation system for the recruitment industry, from data acquisition to the final result. The data used in the project were collected from Pitchler AB, a company that provides an online recruitment platform. The result is a machine-learning-based recommendation system used as an engine for the Pitchler AB IT recruitment platform.
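A minimal sketch of the collaborative-filtering core (the 4x5 interaction matrix below is invented; Pitchler's real data and model are not described in the abstract): factor the candidate-job matrix at low rank and rank a candidate's unseen jobs by predicted score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented candidate x job interaction matrix (1 = positive interaction);
# a real system would build this from the platform's logs.
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Rank-2 factorization R ~ U @ V fitted by gradient descent on the full
# matrix (zeros treated as weak negative signal). Rows of U embed
# candidates, columns of V embed jobs.
k = 2
U = 0.1 * rng.random((4, k)); V = 0.1 * rng.random((k, 5))
for _ in range(3000):
    E = U @ V - R                 # prediction error
    U -= 0.05 * (E @ V.T)
    V -= 0.05 * (U.T @ E)

scores = U @ V
# Recommend, for candidate 0, the unseen job with the highest score.
unseen = [j for j in range(5) if R[0, j] == 0]
best = max(unseen, key=lambda j: scores[0, j])
print(best)
```

Candidate 0 shares two jobs with candidate 1, so the factorization lifts the score of candidate 1's remaining job above the jobs favoured by the other group, which is exactly the "people like you also matched with" behaviour collaborative filtering provides.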
