241

Investigating the performance of matrix factorization techniques applied on purchase data for recommendation purposes

Holländer, John January 2015 (has links)
Automated systems for producing product recommendations to users are a relatively new area within the field of machine learning. Matrix factorization techniques have been studied extensively on data consisting of explicit feedback such as ratings, but to a lesser extent on implicit feedback data consisting of, for example, purchases. The aim of this study is to investigate how well matrix factorization techniques perform compared to other techniques when used to produce recommendations based on purchase data. We conducted experiments on data from an online bookstore as well as an online fashion store, running algorithms on the data and comparing the results with evaluation metrics. We present results showing that for many types of implicit feedback data, matrix factorization techniques are inferior to various neighborhood and association-rules techniques for producing product recommendations. We also present a variant of a user-based neighborhood recommender system algorithm (UserNN), which in all our tests outperformed both the matrix factorization algorithms and the k-nearest neighbors algorithm in both accuracy and speed. Depending on the dataset, UserNN achieved a precision approximately 2-22 percentage points higher than those of the matrix factorization algorithms, and 2 percentage points higher than that of the k-nearest neighbors algorithm. UserNN also outperformed the other algorithms in speed, with running times a factor of 3.5-5 lower than those of the k-nearest neighbors algorithm, and several orders of magnitude lower than those of the matrix factorization algorithms.
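The abstract does not spell out the UserNN algorithm itself, but the general shape of a user-based neighborhood recommender over binary purchase data can be sketched as follows; the cosine-similarity choice and the voting scheme are illustrative assumptions, not the author's implementation:

```python
import numpy as np

def recommend(purchases, user, k=10, n_items=5):
    """Recommend items for `user` from a binary user-item purchase matrix.

    purchases: (n_users, n_items) 0/1 array of implicit feedback.
    Scores each unpurchased item by a similarity-weighted vote over
    the k most similar users.
    """
    # Cosine similarity between the target user and all other users.
    norms = np.linalg.norm(purchases, axis=1) + 1e-10
    sims = purchases @ purchases[user] / (norms * norms[user])
    sims[user] = -np.inf                              # exclude the user themselves

    neighbors = np.argsort(sims)[-k:]                 # k nearest users
    scores = sims[neighbors] @ purchases[neighbors]   # weighted item votes
    scores[purchases[user] > 0] = -np.inf             # don't recommend owned items
    return np.argsort(scores)[-n_items:][::-1]

# Toy example: 4 users, 6 items.
P = np.array([[1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [1, 1, 1, 0, 0, 0]])
print(recommend(P, user=0, k=2, n_items=2))
```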
242

Topic Analysis of Tweets on the European Refugee Crisis Using Non-negative Matrix Factorization

Shen, Chong 01 January 2016 (has links)
The ongoing European Refugee Crisis has been one of the most popular trending topics on Twitter for the past 8 months. This paper applies topic modeling to large collections of tweets to discover the hidden patterns within these social media discussions. In particular, we perform topic analysis by solving Non-negative Matrix Factorization (NMF) as an Inexact Alternating Least Squares problem. We accelerate the computation using techniques including tweet sampling and augmented NMF, compare NMF results for different ranks, and visualize the outputs through topic representations and frequency plots. We observe that supportive sentiments maintained a strong presence while negative sentiments, such as safety concerns, emerged over time.
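As a rough illustration of the factorization step, here is a minimal NMF via inexact alternating least squares in the sense described (unconstrained least-squares solves projected onto the nonnegative orthant); the sampling and augmentation accelerations from the paper are not reproduced:

```python
import numpy as np

def nmf_ials(X, rank, n_iter=200, seed=0):
    """Rank-`rank` NMF of a nonnegative matrix X (terms x documents) via
    inexact alternating least squares: each factor is updated by an
    unconstrained least-squares solve, then projected onto the
    nonnegative orthant. A sketch of the general technique, not the
    paper's exact implementation.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))      # term-topic weights
    H = rng.random((rank, n))      # topic-document weights
    for _ in range(n_iter):
        H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 0)
        W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0)
    return W, H

# Each column of W is a "topic"; its largest entries are the topic's top terms.
X = np.abs(np.random.default_rng(1).random((100, 40)))
W, H = nmf_ials(X, rank=5)
print(np.linalg.norm(X - W @ H))
```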
243

Triple Non-negative Matrix Factorization Technique for Sentiment Analysis and Topic Modeling

Waggoner, Alexander A 01 January 2017 (has links)
Topic modeling refers to the process of algorithmically sorting documents into categories based on some common relationship between them; this common relationship is considered the "topic" of the documents. Sentiment analysis refers to the process of algorithmically sorting a document into a positive or negative category depending on whether the document expresses a positive or negative opinion on its topic. In this paper, I consider the open problem of classifying documents into both a topic category and a sentiment category. This has a direct application in the retail industry, where companies may want to scour the web for documents (blogs, Amazon reviews, etc.) that both speak about their product and give an opinion on it (positive, negative, or neutral). My solution uses a Non-negative Matrix Factorization (NMF) technique to determine the topic classifications of a document set, and then factors the matrix further to discover the sentiment behind each category.
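One common way to realize such a tri-factorization is X ≈ F S Gᵀ with multiplicative updates for the Frobenius objective; the sketch below is that generic technique, and the mapping of factors to topics and sentiment is an assumption, not necessarily the paper's exact formulation:

```python
import numpy as np

def tri_nmf(X, k, l, n_iter=300, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (documents x terms) as X ~ F @ S @ G.T,
    with F (documents x k) grouping documents, G (terms x l) grouping terms,
    and S (k x l) linking the two groupings. Standard multiplicative updates
    for the Frobenius objective; a generic sketch only.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k))
    S = rng.random((k, l))
    G = rng.random((n, l))
    for _ in range(n_iter):
        F *= (X @ G @ S.T) / (F @ S @ (G.T @ G) @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ (G.T @ G) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
    return F, S, G

X = np.abs(np.random.default_rng(1).random((60, 30)))
F, S, G = tri_nmf(X, k=4, l=3)
print(np.linalg.norm(X - F @ S @ G.T))
```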
244

Methods and algorithms for solving linear systems of equations on massively parallel computers

Donfack, Simplice 07 March 2012 (has links)
Multicore processors are nowadays considered the future of computing, and they will have an important impact on scientific computing. In this thesis, we study methods and algorithms for efficiently solving large sparse and dense linear systems on future petascale machines, in particular those with a significant number of cores. Given the increasing cost of communication compared with the time processors take to perform arithmetic operations, our approach embraces the communication-avoiding principle, accepting some redundant computations, and uses several adaptations to achieve better performance on multicore machines. We decompose the problem into several phases that are then designed and optimized separately. In the first part, we present an algorithm based on hypergraph partitioning that considerably reduces the fill-in incurred in the LU factorization of sparse unsymmetric matrices. In the second part, we present two communication-avoiding algorithms, for the LU and QR factorizations, that are adapted to multicore environments. The main contribution of this part is to reorganize the computations so as to reduce bus contention and use resources efficiently. We then extend this work to clusters of multicore processors. In the third part, we present a new scheduling and optimization approach. Data locality and load balancing involve a serious trade-off in the choice of scheduling strategy. On NUMA machines, for example, where data locality cannot be ignored, we have observed that in the presence of system perturbations ("OS noise") performance can quickly degrade and become difficult to predict. To overcome this bottleneck, we present an approach that combines static and dynamic scheduling of our algorithms' tasks. Our results, obtained on several architectures, show that all our algorithms are efficient and lead to significant performance gains: we achieve improvements of 30 to 110% over the corresponding routines in well-known numerical libraries.
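For readers unfamiliar with the kind of reorganization being described, a minimal blocked (tiled) LU factorization shows how grouping the work into tiles turns most of the arithmetic into large matrix-matrix products, the starting point that communication-avoiding factorizations push further. This pivot-free numpy sketch is purely illustrative and is not the thesis's algorithm:

```python
import numpy as np

def blocked_lu(A, b=2):
    """In-place blocked LU factorization without pivoting (L unit lower
    triangular, packed with U in one matrix). Working on b x b tiles turns
    the trailing update into one matrix-matrix product per step.
    Illustrative only: real implementations pivot for stability.
    """
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, b):
        e = min(k + b, n)
        # Factor the diagonal tile with unblocked LU.
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        if e < n:
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            U11 = np.triu(A[k:e, k:e])
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])        # U12
            A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T  # L21
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]                 # trailing update
    return A

A = np.random.default_rng(0).random((6, 6)) + 6 * np.eye(6)  # diagonally dominant
LU = blocked_lu(A, b=2)
L = np.tril(LU, -1) + np.eye(6); U = np.triu(LU)
print(np.allclose(L @ U, A))
```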
245

Theory and evaluation of word embedding based on matrix factorization and neural network

張文嘉, Jhang, Wun Jia Unknown Date (has links)
As machine learning achieves breakthroughs on more and more tasks, natural language processing problems in particular have attracted increasing attention, and in recent years word embedding has been one of the most exciting parts of NLP research. In this thesis, we discuss the two major learning approaches for word embedding. One is traditional matrix factorization, such as singular value decomposition; the other is based on neural network models (e.g., the Skip-gram model with negative sampling (Mikolov et al., 2013b)), which are iterative algorithms. Since an iterative process is sensitive to its initial starting values, we present an approach that initializes the Skip-gram model with negative sampling using word vectors obtained from singular value decomposition. We show that these refined starting points improve performance on the analogy task and succeed in capturing fine-grained semantic and syntactic regularities using vector arithmetic, with clear gains on some natural language processing tasks.
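A standard way to obtain such SVD-based starting values is to factor a PPMI co-occurrence matrix; the sketch below builds these vectors, which could then replace the random initialization of a skip-gram-with-negative-sampling model. The PPMI pipeline is an assumption about implementation details the abstract leaves open:

```python
import numpy as np

def svd_word_vectors(corpus, dim=50, window=2):
    """Word vectors by truncated SVD of a PPMI co-occurrence matrix
    (the classic matrix factorization route to embeddings)."""
    vocab = {w: i for i, w in enumerate(sorted({w for s in corpus for w in s}))}
    n = len(vocab)
    counts = np.zeros((n, n))
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[vocab[w], vocab[sent[j]]] += 1
    total = counts.sum()
    pw = counts.sum(axis=1) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / total / np.outer(pw, pw))
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)   # positive PMI
    U, s, _ = np.linalg.svd(ppmi)
    d = min(dim, n)
    return vocab, U[:, :d] * np.sqrt(s[:d])   # rows are word vectors

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab, vecs = svd_word_vectors(corpus, dim=4)
print(vecs[vocab["cat"]])
```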
246

Decomposition methods of NMR signals of complex mixtures: models and applications

Toumi, Ichrak 28 October 2013 (has links)
The objective of this work was to test blind source separation (BSS) methods for separating the complex NMR spectra of mixtures into the simpler spectra of the pure compounds. In a first part, the known methods JADE and NNSC were applied in conjunction for DOSY, and an application to CPMG data was demonstrated. In a second part, we focused on developing an effective algorithm, "beta-SNMF", which was shown to outperform NNSC for beta less than or equal to 2. Since in the literature the choice of beta has been adapted to statistical assumptions about the additive noise, a statistical study of DOSY NMR noise was carried out to obtain a more complete picture of the NMR data under study.
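For reference, the generic beta-divergence NMF on which beta-SNMF builds can be sketched with the standard multiplicative updates (beta = 2 gives the Euclidean distance, 1 the Kullback-Leibler divergence, 0 the Itakura-Saito divergence); the sparsity constraints of beta-SNMF itself are not reproduced here:

```python
import numpy as np

def beta_nmf(V, rank, beta=1.0, n_iter=200, eps=1e-9, seed=0):
    """NMF minimizing the beta-divergence between V and W @ H using the
    standard multiplicative updates. A generic sketch of beta-divergence
    NMF, not the thesis's beta-SNMF algorithm.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H

# Toy mixture spectra: rows are observed spectra, columns are "chemical shifts".
V = np.abs(np.random.default_rng(1).random((8, 200)))
W, H = beta_nmf(V, rank=3, beta=1.0)   # rows of H ~ pure-compound spectra
print(np.linalg.norm(V - W @ H))
```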
247

Incorporation of semantic metadata for recommendation in the cold start scenario

Fressato, Eduardo Pereira 06 May 2019 (has links)
In order to assist users in the decision-making process, several types of Web systems have started to incorporate recommender systems. The most commonly used approaches are content-based filtering, which recommends items based on their attributes; collaborative filtering, which recommends items according to the behavior of similar users; and hybrid systems, which combine two or more techniques. The content-based approach suffers from limited content analysis, a problem that can be reduced by using semantic information. Collaborative filtering, in turn, suffers from cold start, sparsity, and high data dimensionality. Among collaborative filtering techniques, those based on matrix factorization are generally more effective because they uncover the latent characteristics underlying the interactions between users and items. Although recommender systems draw on several recommendation techniques, most lack semantic information to represent the items in the collection. Studies in the area have analyzed the use of linked open data from the Web of Data as a source of semantic information. This work therefore investigates how semantic relationships computed from the knowledge bases available on the Web of Data can benefit recommender systems. It explores two questions in this context: how the similarity of items can be computed from semantic information, and how similarities between items can be incorporated into a matrix factorization technique so that the item cold-start problem is effectively mitigated. The results are a semantic similarity metric that exploits the hierarchy of the knowledge bases and outperformed other metrics on most datasets, and the Item-MSMF algorithm, which uses semantic information to mitigate the cold-start problem and achieved superior performance on all datasets evaluated in the cold-start scenario.
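A minimal sketch of the second question (folding item-to-item semantic similarities into matrix factorization) might look like the following, where a regularizer pulls the factors of semantically similar items together so that cold items inherit sensible factors; the details of Item-MSMF may differ:

```python
import numpy as np

def mf_semantic(R, S, k=10, lr=0.01, reg=0.05, alpha=0.1, n_epochs=50, seed=0):
    """SGD matrix factorization plus a semantic-similarity regularizer.
    R: (n_users, n_items) rating matrix with 0 for missing entries.
    S: (n_items, n_items) semantic similarity matrix in [0, 1].
    A sketch of similarity-regularized MF, not the Item-MSMF formulation.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
    users, items = np.nonzero(R)
    for _ in range(n_epochs):
        for u, i in zip(users, items):
            pu, qi = P[u].copy(), Q[i].copy()
            err = R[u, i] - pu @ qi
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
        # Pull each item's factor toward the similarity-weighted mean of
        # the others' factors; cold items are updated only by this term.
        norm = S.sum(axis=1, keepdims=True) + 1e-9
        Q += lr * alpha * (S @ Q / norm - Q)
    return P, Q
```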
248

Social Regularization in Recommender Systems with Collaborative Filtering

Zabanova, Tatyana 14 May 2019 (has links)
Models based on matrix factorization are among the most successful implementations of Recommender Systems. In this project, we study the possibilities of incorporating information from social networks to improve the quality of the model's predictions, both in traditional Collaborative Filtering and in Neural Collaborative Filtering.
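A representative objective for this kind of social regularization (in the spirit of Ma et al.'s SoReg, and not necessarily the exact formulation of the dissertation) augments the usual factorization loss with a term pulling each user's latent factors toward those of their friends:

```latex
\min_{P,Q}\;\sum_{(u,i)\in\Omega}\bigl(r_{ui}-p_u^{\top}q_i\bigr)^2
+\lambda\bigl(\lVert P\rVert_F^2+\lVert Q\rVert_F^2\bigr)
+\beta\sum_{u}\sum_{v\in\mathcal{F}(u)}\operatorname{sim}(u,v)\,\lVert p_u-p_v\rVert^2
```

Here Ω is the set of observed ratings, F(u) the set of friends of user u in the social network, and sim(u, v) a similarity weight derived from the graph; the same penalty can equally regularize the embedding layers of a neural collaborative filtering model.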
249

Phenomenological tests of perturbative quantum chromodynamics at high energy at the LHC

Ducloué, Bertrand 08 July 2014 (has links)
In the high-energy limit of QCD, the smallness of the strong coupling due to the presence of a hard scale can be compensated by large logarithms of the center-of-mass energy. All these logarithmically enhanced contributions can be resummed by the Balitsky-Fadin-Kuraev-Lipatov (BFKL) equation. Many processes have been proposed to study these dynamics. Among the most promising is the production of two forward jets separated by a large rapidity interval at hadron colliders, proposed by Mueller and Navelet. A BFKL calculation taking into account only the dominant contributions (leading logarithmic, or LL, accuracy) predicts a strong rise of the cross section with increasing rapidity separation between the jets and a large decorrelation of their azimuthal angles. However, such LL calculations could not successfully describe measurements of these observables performed at the Tevatron. In this thesis, we study this process at next-to-leading logarithmic (NLL) accuracy, taking into account NLL corrections both to the impact factors, which describe the transition from an incoming hadron to a jet, and to the Green's function, which describes the coupling between the impact factors. We investigate the magnitude of these NLL corrections and find that they are very large, leading to results very different from those of an LL calculation. In addition, we find that these results depend strongly on the choice of the scales involved in the process. We compare our results with recent data from the CMS collaboration on the azimuthal correlations of Mueller-Navelet jets at the LHC and find rather poor agreement. We show that this can be cured by using the Brodsky-Lepage-Mackenzie procedure to fix the renormalization scale, which leads to more stable results and a very good description of the CMS data. Finally, we show that at NLL accuracy the absence of strict energy-momentum conservation (a subleading effect in a BFKL calculation) should be a much less severe issue than at LL accuracy.
250

Parallelization of credal network inference using distributed computing for sparse matrix factorization

Pereira, Ramon Fortes 25 April 2017 (has links)
This study aims to improve the computational performance of credal network inference algorithms by applying parallel computing and distributed systems techniques to sparse matrix factorization algorithms. Roughly, parallel computing techniques transform a system into one whose algorithms can be executed concurrently, and matrix factorization comprises mathematical techniques for decomposing a matrix into a product of two or more matrices. Sparse matrices are matrices in which most values are zero. Credal networks are similar to Bayesian networks, which are acyclic graphs representing a joint probability through conditional probabilities and their independence relations; credal networks can be considered an extension of Bayesian networks for dealing with uncertainty or poor data quality. To apply the parallelized sparse matrix factorization technique to credal network inference, the inference uses the variable elimination method, in which the acyclic graph of the credal network is associated with a sparse matrix and eliminating a variable is analogous to eliminating a column.
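The analogy can be made concrete with a small symbolic sketch: eliminating a variable connects its remaining neighbors, exactly the fill-in created when the corresponding column is eliminated in sparse Gaussian elimination. The code below (illustrative, not the thesis's parallel implementation) counts that fill-in for a given elimination order:

```python
import numpy as np

def eliminate(adj, order):
    """Symbolic variable elimination on an undirected adjacency matrix.
    Eliminating variable j connects all of j's still-alive neighbors --
    the fill-in produced when column j is eliminated in sparse Gaussian
    elimination. Returns the fill-in count for the given order.
    """
    A = adj.copy().astype(bool)
    alive = np.ones(len(A), dtype=bool)
    fill = 0
    for j in order:
        nbrs = np.where(A[j] & alive)[0]
        for a in range(len(nbrs)):          # clique out j's neighbors
            for b in range(a + 1, len(nbrs)):
                if not A[nbrs[a], nbrs[b]]:
                    A[nbrs[a], nbrs[b]] = A[nbrs[b], nbrs[a]] = True
                    fill += 1
        alive[j] = False
    return fill

# Moralized graph of a small chain network 0-1-2-3: different elimination
# orders give different fill-in, i.e., different factor sizes during inference.
adj = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=bool)
print(eliminate(adj, [1, 2, 0, 3]), eliminate(adj, [0, 1, 2, 3]))
```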
