231

High Performance Parallel Algorithms for Tensor Decompositions / Algorithmes Parallèles pour les Décompositions des Tenseurs

Kaya, Oguz 15 September 2017
Tensor factorization is increasingly used to analyze high-dimensional, low-rank data of massive scale in numerous application domains, including recommender systems, graph analytics, health-care data analysis, signal processing, chemometrics, and many others. In these applications, computing tensor decompositions efficiently is crucial to handling datasets of such volume. The main focus of this thesis is the efficient decomposition of high-dimensional sparse tensors, with hundreds of millions to billions of nonzero entries, which arise in many emerging big-data applications. We achieve this through three major approaches. First, we provide distributed-memory parallel algorithms with an efficient point-to-point communication scheme that reduces communication cost. These algorithms are agnostic to the partitioning of tensor elements and low-rank factor matrices, which allows us to investigate partitioning strategies that minimize communication cost while establishing computational load balance. We use hypergraph-based techniques to analyze the computational and communication requirements of these algorithms, and employ hypergraph partitioning tools to find partitions that provide much better scalability. Second, we investigate effective shared-memory parallelizations of these algorithms. Here, we carefully identify unit computational tasks and their dependencies, and express them using a data structure that exposes the parallelism underneath. Third, we introduce a tree-based computational scheme that carries out the expensive core operations (multiplying the tensor with a set of vectors or matrices) faster, by factoring out common partial results, storing them, and reusing them effectively. With this scheme, we asymptotically reduce the number of tensor-vector and tensor-matrix multiplications for high-dimensional tensors, thereby making tensor decompositions significantly cheaper for both sequential and parallel algorithms. Finally, we extend this main line of research with two works on similar themes. The first applies the tree-based computational framework to dense tensor decompositions, with an in-depth analysis of computational complexity and methods for finding optimal tree structures that minimize computational cost. The second adapts the communication and partitioning schemes of our parallel sparse tensor decomposition algorithms to the widely used non-negative matrix factorization problem, obtaining significantly better parallel scalability than state-of-the-art implementations. All theoretical results in the thesis are corroborated by parallel experiments on both shared-memory and distributed-memory platforms. With these fast algorithms and their tuned implementations for modern HPC architectures, tensor and matrix decomposition becomes a practical tool for analyzing massive-scale datasets.
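To make the dominant kernel concrete, here is a minimal sequential sketch of sparse CP-ALS built around the MTTKRP operation that the thesis parallelizes and accelerates with its tree-based scheme. The COO representation and helper names are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def mttkrp(coords, vals, factors, mode):
    # Matricized-tensor times Khatri-Rao product: the dominant kernel of
    # CP-ALS, accumulated nonzero by nonzero from a COO sparse tensor.
    out = np.zeros_like(factors[mode])
    for idx, v in zip(coords, vals):
        row = np.full(out.shape[1], v)
        for m, f in enumerate(factors):
            if m != mode:
                row *= f[idx[m]]  # Hadamard product of the other factors' rows
        out[idx[mode]] += row
    return out

def cp_als(coords, vals, shape, rank, iters=20, seed=0):
    # Plain CP-ALS: update one factor at a time via the normal equations,
    # whose Gram matrix is the Hadamard product of the other factors' Grams.
    rng = np.random.default_rng(seed)
    factors = [rng.random((s, rank)) for s in shape]
    for _ in range(iters):
        for mode in range(len(shape)):
            gram = np.ones((rank, rank))
            for m, f in enumerate(factors):
                if m != mode:
                    gram *= f.T @ f
            factors[mode] = mttkrp(coords, vals, factors, mode) @ np.linalg.pinv(gram)
    return factors
```

The tree-based scheme of the thesis cuts the per-iteration cost by reusing partial Hadamard products shared between modes, which this naive version recomputes from scratch for every mode.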
232

Fatoração de inteiros e grupos sobre conicas / Integer factorization and groups on conics

Souza, Vera Lúcia Graciani de 13 August 2018
Advisor: Martinho da Costa Araujo / Dissertation (professional master's), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: The goal of this work is to factor integers using rational points on the unit circle, and to determine group structures on conics. The work begins with the basic concepts of algebra and number theory that establish that the set of rational points on the unit circle has a group structure, an idea that extends from rational points on the circle to rational points on conics. To find the rational points on the circle, a trigonometric parametrization is used: each point on the unit circle is associated with the angle it makes with the positive x-axis, so adding points on the circle amounts to adding their corresponding angles. With this "addition" of points on the circle, one can define a group structure that is then used to factor integers. For a general conic, the "addition" is determined algebraically by computing the slope of the line through two given points together with the conic's identity element, a construction that is also justified geometrically. The work determines the groups of rational points on conics, proves some results about these groups using quadratic residues, and concludes by deriving results about the sum of the coordinates of points on a conic. / Master's degree in Mathematics
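The group law described in the abstract comes from the angle-addition formulas, and a Pollard-style factoring attempt can be sketched from it. The following Python sketch (3.8+) is an illustration of the idea only; the starting point (3/5, 4/5) and the smoothness bound are arbitrary choices, and gcd(5, n) = 1 is assumed:

```python
from math import gcd

def circle_add(P, Q, n):
    # Group law on x^2 + y^2 = 1 (mod n) from the angle-addition formulas:
    # (x1, y1) + (x2, y2) = (x1*x2 - y1*y2, x1*y2 + y1*x2); identity (1, 0).
    x1, y1 = P
    x2, y2 = Q
    return ((x1 * x2 - y1 * y2) % n, (x1 * y2 + y1 * x2) % n)

def circle_mul(k, P, n):
    # k*P by double-and-add.
    R = (1, 0)
    while k:
        if k & 1:
            R = circle_add(R, P, n)
        P = circle_add(P, P, n)
        k >>= 1
    return R

def circle_factor(n, bound=1000):
    # For a prime p dividing n, the circle group mod p has order p - 1 or
    # p + 1 (depending on whether -1 is a quadratic residue mod p). If that
    # order is smooth, k!*P hits the identity mod p, so the y coordinate
    # becomes 0 mod p and gcd(y, n) exposes p.
    inv5 = pow(5, -1, n)
    P = (3 * inv5 % n, 4 * inv5 % n)  # the rational point (3/5, 4/5) mod n
    for k in range(2, bound):
        P = circle_mul(k, P, n)
        d = gcd(P[1], n)
        if 1 < d < n:
            return d
    return None

print(circle_factor(101 * 107))  # 107: the circle group mod 107 has smooth order 108
```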
233

Blind inverse imaging with positivity constraints / Inversion aveugle d'images avec contraintes de positivité

Lecharlier, Loïc 09 September 2014
In inverse problems in imaging, the operator or matrix describing the image-formation system is generally assumed to be known; equivalently, for a linear system, its impulse response is assumed known. This is not a realistic assumption for many practical applications, however, in which the operator is in fact unknown or only approximately known. One then faces a so-called "blind" inversion problem. In the case of translation-invariant systems, this is "blind deconvolution": both the original image (or object) and the impulse response must be estimated from the single observed image, which results from a convolution and is corrupted by measurement errors. This problem is notoriously difficult, and to overcome the ambiguities and numerical instabilities inherent in this type of inversion, one must resort to additional information or constraints, such as positivity, which has proven to be a powerful stabilizing lever in non-blind imaging problems. The thesis proposes new blind-inversion algorithms in a discrete or discretized setting, under the assumption that the unknown image, the matrix to be inverted, and the data are nonnegative. The problem is formulated as a (non-convex) optimization problem in which the data-fidelity term to be minimized, modeling either Poisson-type data (Kullback-Leibler divergence) or data corrupted by Gaussian noise (least squares), is augmented with penalty terms on the unknowns. The optimization strategy consists of alternating multiplicative updates of the image to be reconstructed and of the matrix to be inverted, derived from the minimization of surrogate cost functions valid in the nonnegative case. The fairly general framework accommodates several types of penalties, including the (smoothed) total variation of the image. An optional normalization of the impulse response or of the matrix is also applied at each iteration. Convergence results for these algorithms are established in the thesis, both for the decrease of the cost functions and for the convergence of the sequence of iterates to a stationary point. The proposed methodology is successfully validated through numerical simulations in several applications, such as blind deconvolution of astronomical images, nonnegative matrix factorization for hyperspectral imaging, and density deconvolution in statistics. / Doctorate in Sciences
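As a rough illustration of the alternating multiplicative updates described above, here is a Richardson-Lucy-style blind deconvolution sketch for the Poisson/Kullback-Leibler case, without the penalty terms. The centering of the kernel-update window is our assumption, and the thesis's algorithms add penalties and convergence guarantees that this toy version omits:

```python
import numpy as np
from scipy.signal import fftconvolve

def blind_rl(y, psf_shape, outer=30, inner=5, eps=1e-12):
    # Alternate multiplicative updates of image x and kernel h in
    # y ~ h * x, with x, h, y >= 0 (KL data term, no penalties).
    x = np.full(y.shape, y.mean())
    h = np.full(psf_shape, 1.0 / np.prod(psf_shape))
    for _ in range(outer):
        for _ in range(inner):  # image step, kernel fixed
            est = fftconvolve(x, h, mode="same") + eps
            x *= fftconvolve(y / est, h[::-1, ::-1], mode="same")
        for _ in range(inner):  # kernel step, image fixed
            est = fftconvolve(x, h, mode="same") + eps
            corr = fftconvolve(y / est, x[::-1, ::-1], mode="full")
            c0 = (corr.shape[0] - psf_shape[0]) // 2
            c1 = (corr.shape[1] - psf_shape[1]) // 2
            h *= corr[c0:c0 + psf_shape[0], c1:c1 + psf_shape[1]]
            h /= h.sum()  # normalization of the impulse response each iteration
    return x, h
```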
234

Nízko-dimenzionální faktorizace pro "End-To-End" řečové systémy / Low-Dimensional Matrix Factorization in End-To-End Speech Recognition Systems

Gajdár, Matúš January 2020
The project covers automatic speech recognition with neural networks trained using low-dimensional matrix factorization. We describe time-delay neural networks with factorization (TDNN-F) and without it (TDNN), implemented in PyTorch. We compare the PyTorch implementation against the Kaldi toolkit, achieving similar results in experiments with various network architectures. The last chapter describes the impact of low-dimensional matrix factorization on end-to-end speech recognition systems, together with a modification of such a system using TDNN(-F) networks. With specific network settings we achieved better results with the systems using factorization, and we additionally reduced training complexity by decreasing the number of network parameters through the use of TDNN(-F) networks.
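The low-dimensional factorization in TDNN-F replaces each weight matrix with a product of two thinner ones, keeping the first factor semi-orthogonal. A minimal PyTorch sketch of such a layer follows; the class name is ours, and the constraint step is a simplified gesture at the Kaldi update rule, which differs in detail:

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """W (out x in) becomes B @ A with bottleneck r, shrinking the
    parameter count from out*in to roughly r*(out + in)."""

    def __init__(self, in_dim, out_dim, bottleneck):
        super().__init__()
        self.A = nn.Linear(in_dim, bottleneck, bias=False)
        self.B = nn.Linear(bottleneck, out_dim)

    def forward(self, x):
        return self.B(self.A(x))

    @torch.no_grad()
    def semi_orthogonal_step(self, eps=0.125):
        # Nudge A toward A A^T = I, the TDNN-F constraint, applied
        # periodically during training.
        M = self.A.weight  # (bottleneck, in_dim)
        P = M @ M.t()
        I = torch.eye(P.shape[0], device=M.device)
        self.A.weight.sub_(eps * (P - I) @ M)

layer = FactorizedLinear(512, 512, bottleneck=64)  # illustrative sizes
print(sum(p.numel() for p in layer.parameters()))  # ~66k vs ~263k unfactorized
```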
235

Apprentissage de représentations en imagerie fonctionnelle / Learning representations from functional MRI data

Mensch, Arthur 28 September 2018
Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural responses to specific tasks or stimuli, and of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), which we study in a machine-learning setting: we learn models of brain activity that should generalize to unseen data. After reviewing standard fMRI data-analysis techniques, we propose new methods and models that benefit from the recently released large fMRI data repositories, with the goal of learning richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI), which we perform using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach introduces randomness into stochastic optimization loops and provides a speed-up of an order of magnitude across a variety of settings and datasets. We provide an extended empirical validation of this stochastic subsampling approach on datasets from fMRI, hyperspectral imaging, and collaborative filtering, and we derive convergence properties for our algorithm in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to fMRI data acquired on subjects undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models that predict stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images while simultaneously learning to decode the images from their reduced representation. This fosters transfer learning between studies, as the model captures undocumented cognitive aspects that the many fMRI studies share. As a consequence, our multi-study model outperforms single-study decoding. Our approach identifies a universally relevant representation of brain activity, supported by a few task-optimized networks learned during model fitting. Finally, on a related topic, we show how to use dynamic programming within end-to-end trained deep networks, with applications in natural language processing.
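The stochastic-subsampling idea (touching only a random fraction of the feature dimension at each online update) can be caricatured in a few lines. This toy gradient-style variant is our own simplification; the actual algorithm maintains surrogate statistics and uses sparse, penalized codes:

```python
import numpy as np

def subsampled_online_mf(samples, n_features, rank, frac=0.1, seed=0):
    # Online dictionary update that loads and updates only a random
    # subset of the rows of D per incoming sample, cutting per-iteration
    # cost roughly by the subsampling factor.
    rng = np.random.default_rng(seed)
    D = 0.01 * rng.standard_normal((n_features, rank))
    for t, x in enumerate(samples, start=1):
        rows = rng.choice(n_features, max(1, int(frac * n_features)), replace=False)
        code, *_ = np.linalg.lstsq(D[rows], x[rows], rcond=None)  # code from subsample
        resid = x[rows] - D[rows] @ code
        D[rows] += np.outer(resid, code) / t  # decaying step size
    return D
```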
236

Characterisation of a developer’s experience fields using topic modelling

Déhaye, Vincent January 2020 (has links)
Finding the most relevant candidate for a position is a ubiquitous challenge for organisations, and it can be equally arduous for a candidate to convey their experience on a concise resume. Because candidates must select which experience to present and filter out the rest, a person carrying out a search may fail to detect experience the candidate actually has. In software engineering, building one's experience usually leaves traces behind: the code one has produced. This project explores ways to tackle these screening challenges by extracting experience directly from code, using topic modelling to define common lexical patterns in code for different experience fields. Two techniques were compared. Latent Dirichlet Allocation (LDA) is a generative statistical model that has proven to yield good results in topic modelling. Non-Negative Matrix Factorization (NMF) decomposes a matrix representing the code corpus as word counts per piece of code into two low-rank non-negative factors. The code gathered consisted of 30 random repositories from collaborators of the open-source Ruby-on-Rails project on GitHub, to which common natural-language-processing transformation steps were then applied. The results of the two techniques were compared using perplexity for LDA, reconstruction error for NMF, and topic coherence for both. The first two measure how well the data is represented by the produced topics, while the latter estimates how well the elements of a topic hang and fit together, reflecting human understandability and interpretability. Given that no similar work was available as a benchmark, the performance values obtained are hard to assess scientifically. However, the method seems promising, as we would have been fairly confident in assigning labels to 10 of the generated topics. The results imply that natural-language-processing methods could be applied directly to code in order to extend the detected experience fields of a developer, with finer granularity than traditional resumes and with field definitions that evolve dynamically with technology.
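A sketch of the comparison in scikit-learn, using the evaluation quantities named above; the toy snippets stand in for the preprocessed repository code:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

snippets = ["def create user save redirect_to",   # illustrative stand-ins for
            "render json user status ok",          # preprocessed Ruby-on-Rails
            "select id from users where active"]   # code pieces
X = CountVectorizer().fit_transform(snippets)      # word counts per piece of code

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
nmf = NMF(n_components=2, init="nndsvd", random_state=0).fit(X)

print("LDA perplexity:", lda.perplexity(X))
print("NMF reconstruction error:", nmf.reconstruction_err_)
```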
237

A high order method for simulation of fluid flow in complex geometries

Stålberg, Erik January 2005 (has links)
A numerical high order difference method is developed for solution of the incompressible Navier-Stokes equations. The solution is determined on a staggered curvilinear grid in two dimensions and by a Fourier expansion in the third dimension. The description in curvilinear body-fitted coordinates is obtained by an orthogonal mapping of the equations to a rectangular grid, where space derivatives are determined by compact fourth order approximations. The time derivative is discretized with a second order backward difference method in a semi-implicit scheme, where the nonlinear terms are linearly extrapolated with second order accuracy. An approximate block factorization technique is used in an iterative scheme to solve the large linear system resulting from the discretization in each time step. The solver algorithm consists of a combination of outer and inner iterations. An outer iteration step involves the solution of two sub-systems, one for prediction of the velocities and one for solution of the pressure. No boundary conditions for the intermediate variables in the splitting are needed, and second order time accurate pressure solutions can be obtained. The method has been validated experimentally in earlier studies. Here it is validated for flow past a circular cylinder as an example of a physical test case, and the fourth order method is shown to be efficient in terms of grid resolution. The method is applied to external flow past a parabolic body and internal flow in an asymmetric diffuser in order to investigate the performance in two different curvilinear geometries and to give directions for future development of the method. It is concluded that the novel formulation of boundary conditions needs further investigation. A new iterative solution method for prediction of velocities allows for larger time steps due to less restrictive convergence constraints.
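For reference, the classical fourth-order compact (Padé) first-derivative approximation leads to tridiagonal systems of the kind solved at every step. A one-dimensional sketch follows, with third-order one-sided boundary closures that are our choice and not necessarily the thesis's:

```python
import numpy as np
from scipy.linalg import solve_banded

def compact_diff(f, h):
    # Interior (fourth order): f'_{i-1}/4 + f'_i + f'_{i+1}/4
    #                          = 3 (f_{i+1} - f_{i-1}) / (4h)
    n = len(f)
    ab = np.zeros((3, n))  # banded matrix: super-, main, sub-diagonal
    ab[0, 2:] = 0.25
    ab[1, :] = 1.0
    ab[2, :-2] = 0.25
    rhs = np.zeros(n)
    rhs[1:-1] = 3.0 * (f[2:] - f[:-2]) / (4.0 * h)
    # Third-order closures: f'_0 + 2 f'_1 = (-5f_0 + 4f_1 + f_2)/(2h), mirrored right
    ab[0, 1] = 2.0
    rhs[0] = (-5 * f[0] + 4 * f[1] + f[2]) / (2 * h)
    ab[2, -2] = 2.0
    rhs[-1] = (5 * f[-1] - 4 * f[-2] - f[-3]) / (2 * h)
    return solve_banded((1, 1), ab, rhs)

x = np.linspace(0.0, 1.0, 51)
err = np.abs(compact_diff(np.sin(x), x[1] - x[0]) - np.cos(x)).max()
print(err)  # ~1e-6: far below a second-order scheme at this resolution
```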
238

Machine Learning Approaches to Reveal Discrete Signals in Gene Expression

Changlin Wan 24 April 2022
Gene expression is an intricate process that determines different cell types and functions in metazoans, and much of its regulation is communicated through discrete signals: whether the DNA helix is open, whether an enzyme binds with its target, and so on. Understanding the regulatory signals of this selective expression process is essential to a full comprehension of biological mechanisms and complex biological systems. In this research, we seek to reveal the discrete signals in gene expression using novel machine learning approaches. Specifically, we focus on two types of data: chromatin conformation capture (3C) and single-cell RNA sequencing (scRNA-seq). To identify potential regulators, we utilize a new hypergraph neural network to predict genome interactions, finding that gene co-regulation may result from shared enhancer elements. To reveal discrete expression states from scRNA-seq data, we propose a novel model called LTMG that accounts for biological noise and shows better goodness of fit than existing models. Next, we apply Boolean matrix factorization to find co-regulation modules from the identified expression states, revealing general properties of cancer cells across different patients. Lastly, to find more reliable modules, we analyze bias in the data and propose BIND, the first algorithm to quantify column- and row-wise bias in a binary matrix.
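Of the methods listed, Boolean matrix factorization is the simplest to sketch. This greedy cover heuristic (an Asso-flavored simplification of ours, not the paper's algorithm) shows what mining co-regulation modules as Boolean rank-1 blocks means:

```python
import numpy as np

def greedy_bmf(X, rank):
    # Greedy Boolean factorization X ~ U o V over the Boolean semiring.
    # Each step seeds u with a column of X and keeps the rank-1 block
    # u v^T with the best cover gain (new 1s covered minus 0s wrongly set).
    X = X.astype(bool)
    n, m = X.shape
    U = np.zeros((n, rank), dtype=bool)
    V = np.zeros((rank, m), dtype=bool)
    covered = np.zeros_like(X)
    for k in range(rank):
        best_gain, best_uv = 0, None
        for j in range(m):
            u = X[:, j]
            if not u.any():
                continue
            gain = (X[u] & ~covered[u]).sum(axis=0) - (~X[u]).sum(axis=0)
            v = gain > 0
            if gain[v].sum() > best_gain:
                best_gain, best_uv = gain[v].sum(), (u, v)
        if best_uv is None:
            break
        U[:, k], V[k] = best_uv
        covered |= np.outer(*best_uv)
    return U, V

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
U, V = greedy_bmf(X, rank=2)  # recovers the two blocks as "modules"
```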
239

Automatic tag suggestions using a deep learning recommender system / Automatiska taggförslag med hjälp av ett rekommendationssystem baserat på djupinlärning

Malmström, David January 2019 (has links)
This study was conducted to investigate how well deep learning can be applied to the field of tag recommender systems. In the context of an image item, tag recommendations can be given based on tags already existing on the item, or on item content information. In the current literature, there are no works which jointly model the tags and the item content information using deep learning. Two tag recommender systems were developed. The first was a highly optimized hybrid baseline model based on matrix factorization and Bayesian classification. The second was based on deep learning. The two models were trained and evaluated on a dataset of user-tagged images and videos from Flickr. A percentage of the tags were withheld, and the evaluation consisted of predicting them. The deep learning model attained the same prediction recall as the baseline model in the main evaluation scenario, when half of the tags were withheld. However, the baseline model generalized better to the sparser scenarios, when a larger number of tags were withheld. Furthermore, the computations of the deep learning model were much more time-consuming than those of the baseline model. These results led to the conclusion that the baseline model is more practical, but that there is much potential in using deep learning for tag recommendation.
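The matrix-factorization half of such a baseline fits in a few lines. This SGD sketch over binary item-tag data, with one randomly sampled negative per observed pair, is our illustration rather than the study's tuned hybrid:

```python
import numpy as np

def mf_tag_scores(item_tags, n_items, n_tags, rank=32,
                  iters=10, lr=0.05, reg=0.01, seed=0):
    # Factor the binary item-tag matrix: observed (item, tag) pairs get
    # target 1, sampled negatives get 0 (a negative may occasionally
    # collide with a true tag; acceptable for a sketch).
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_items, rank))
    Q = rng.normal(0.0, 0.1, (n_tags, rank))
    for _ in range(iters):
        for i, t in item_tags:
            for tag, y in ((t, 1.0), (rng.integers(n_tags), 0.0)):
                err = y - P[i] @ Q[tag]
                P[i] += lr * (err * Q[tag] - reg * P[i])
                Q[tag] += lr * (err * P[i] - reg * Q[tag])
    return P @ Q.T  # scores; the top-k entries of a row are the suggested tags

scores = mf_tag_scores([(0, 1), (0, 2), (1, 2)], n_items=2, n_tags=4)
print(scores[0].argsort()[::-1][:2])  # two tag suggestions for item 0
```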
240

Improving Food Recipe Suggestions with Hierarchical Classification of Food Recipes / Förbättrande rekommendationer av matrecept genom hierarkisk klassificering av matrecept

Fathollahzadeh, Pedram January 2018 (has links)
Making personalized recommendations has become a central part of many platforms, and continues to grow in importance with increasing access to massive amounts of data online. Giving recommendations based on the interests of the individual, rather than recommending items that are popular, improves the user experience and, done right, can attract more customers. To make personalized recommendations, many platforms resort to machine learning algorithms. In the context of food recipes, these algorithms tend to be hybrid methods combining collaborative filtering, content-based methods, and matrix factorization. Most content-based approaches are ingredient-based and can be very fruitful; however, fetching and processing every single ingredient of every recipe can be computationally expensive. This paper therefore investigates whether clustering recipes according to the cuisine they belong to and their main protein can also improve rating predictions, compared to using only collaborative filtering and matrix factorization. The suggested content-based approach has the structure of a hierarchical classification, where recipes are first clustered into a cuisine group, then into a specific cuisine, and finally by main protein. The results suggest that the content-based approach can improve the predictions slightly but not significantly, and that it can reduce the sparsity of the rating matrix to some extent. However, its coverage, in terms of how many rating predictions it can give, suffers heavily under sparse data.
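The hierarchical content signal can be caricatured as a back-off over the user's cluster means, blended with a collaborative-filtering score. Column names, the hierarchy levels, and the blending weight below are illustrative assumptions, not the thesis's exact design:

```python
import pandas as pd

LEVELS = [["cuisine_group", "cuisine", "protein"],  # deepest level first
          ["cuisine_group", "cuisine"],
          ["cuisine_group"]]

def content_score(ratings, user, recipe):
    # Mean of the user's own ratings at the deepest matching level of
    # the hierarchy, backing off to coarser levels, then the global mean.
    mine = ratings[ratings.user == user]
    for keys in LEVELS:
        match = mine
        for k in keys:
            match = match[match[k] == recipe[k]]
        if len(match):
            return match.rating.mean()
    return ratings.rating.mean()

def predict(ratings, user, recipe, cf_score, alpha=0.8):
    # Blend collaborative filtering with the hierarchical content prior.
    return alpha * cf_score + (1 - alpha) * content_score(ratings, user, recipe)
```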
