Global ETD Search

51	Aceleração de uma variação do problema k-nearest neighbors / Acceleration of a variation of the K-nearest neighbors problem Morais Neto, Jorge Peixoto de 29 January 2014 (has links) Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2014-11-25T13:07:50Z No. of bitstreams: 2 Dissertação - Jorge Peixoto de Morais Neto - 2014.pdf: 1582808 bytes, checksum: 3115f942e2c8a9cf83601835af3af1c5 (MD5) license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2014-11-25T14:42:09Z (GMT) No. of bitstreams: 2 Dissertação - Jorge Peixoto de Morais Neto - 2014.pdf: 1582808 bytes, checksum: 3115f942e2c8a9cf83601835af3af1c5 (MD5) license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Made available in DSpace on 2014-11-25T14:42:09Z (GMT). No. of bitstreams: 2 Dissertação - Jorge Peixoto de Morais Neto - 2014.pdf: 1582808 bytes, checksum: 3115f942e2c8a9cf83601835af3af1c5 (MD5) license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) Previous issue date: 2014-01-29 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Let M be a metric space and let P be a subset of M. The well known k-nearest neighbors problem (KNN) consists in finding, given q 2 M, the k elements of P with are closest to q according to the metric of M. We discuss a variation of KNN for a particular class of pseudo-metric spaces, described as follows. Let m 2 N be a natural number and let d be the Euclidean distance in Rm. Given p 2 Rm: p := (p1; : : : ; pm) let C (p) be the set of the m rotations of p’s coordinates: C (p) := f(p1; : : : ; pm); (p2; : : : ; pm; p1); : : : ; (pm; p1; : : : ; pm􀀀1)g we define the special distance de as: de(p;q) := min p02C (p) d(p0;q): de is a pseudo-metric, and (Rm;de) is a pseudo-metric space. The class of pseudo-metric spaces under discussion is f(Rm;de) j m 2 N:g The brute force approach is too costly for instances of practical size. We present a more efficient solution employing parallelism, the FFT (fast Fourier transform) and the fast elimination of unfavorable training vectors.We describe a program—named CyclicKNN —which implements this solution.We report the speedup of this program over serial brute force search, processing reference datasets. / Seja M um espaço métrico e P um subconjunto de M. O conhecido problema k vizinhos mais próximos (k-neareast neighbors, KNN) consiste em encontrar, dado q 2 M, os k elementos de P mais próximos de q conforme a métrica de M. Abordamos uma variação do problema KNN para uma classe particular de espaços pseudo-métricos, descrita a seguir. Seja m 2 N um natural e seja d a distância euclidiana em Rm. Dado um vetor p 2 Rm: p := (p1; : : : ; pm) seja C (p) o conjunto das m rotações das coordenadas de p: C (p) := f(p1; : : : ; pm); (p2; : : : ; pm; p1); : : : ; (pm; p1; : : : ; pm􀀀1)g definimos a distância especial de como: de(p;q) := min p02C (p) d(p0;q): de é uma pseudo-métrica, e (Rm;de) é um espaço pseudo-métrico. A classe de espaços pseudo-métricos abordada é (Rm;de) j m 2 N: A solução por força bruta é cara demais para instâncias de tamanho prático. Nós apresentamos uma solução mais eficiente empregando paralelismo, a FFT (transformada rápida de Fourier) e a eliminação rápida de vetores de treinamento desfavoráveis. Desenvolvemos um programa—chamado CyclicKNN—que implementa essa solução. Reportamos o speedup desse programa em comparação com a força bruta sequencial, processando bases de dados de referência. Aceleração Análise de dados multidimensionais K-nearest neighbors K vizinhos mais próximos Matriz circulante Processamento de imagem Programação paralela Transformada rápida de Fourier Acceleration K-nearest neighbors Circulant matrix Image processing Fast Fourier transform Multidimensional data analysis Parallel programming
52	以文件分類技術預測股價趨勢 / Predicting Trends of Stock Prices with Text Classification Techniques 陳俊達, Chen, Jiun-da Unknown Date (has links) 股價的漲跌變化是由於證券市場中眾多不同投資人及其投資決策後所產生的結果。然而，影響股價變動的因素眾多且複雜，新聞也屬於其中一種，新聞事件不但是投資人用來得知該股票上市公司的相關營運資訊的主要媒介，同時也是影響投資人決定或變更其股票投資策略的主要因素之一。本研究提出以新聞文件做為股價漲跌預測系統的基礎架構，透過文字探勘技術及分類技術來建置出能預測當日個股收盤股價漲跌趨勢之系統。本研究共提出三種分類模型，分別是簡易貝氏模型、k最近鄰居模型以及混合模型，並設計了三組實驗，分別是分類器效能的比較、新聞樣本資料深度的比較、以及新聞樣本資料廣度的比較來檢驗系統的預測效能。實驗結果顯示，本研究所提出的分類模型可以有效改善相關研究中整體正確率高但各個類別的預測效能卻差異甚大的情況。而對於影響投資人獲利與否的關鍵類別"漲"及類別"跌"的平均預測效能上，本研究所提出的這三種分類模型亦同時具有良好的成效，可以做為投資人進行投資決策時的有效參考依據。 / Stocks' closing price levels can provide hints about investors' aggregate demands and aggregate supplies in the stock trading markets. If the level of a stock's closing price is higher than its previous closing price, it indicates that the aggregate demand is stronger than the aggregate supply in this trading day. Otherwise, the aggregate demand is weaker than the aggregate supply. It would be profitable if we can predict the individual stock's closing price level. For example, in case that one stock's current price is lower than its previous closing price. We can do the proper strategies(buy or sell) to gain profit if we can predict the stock's closing price level correctly in advance. In this thesis, we propose and evaluate three models for predicting individual stock's closing price in the Taiwan stock market. These models include a naïve Bayes model, a k-nearest neighbors model, and a hybrid model. Experimental results show the proposed methods perform better than the NewsCATS system for the "UP" and "DOWN" categories. 股價預測文字探勘簡易貝氏模型 k最近鄰居模型混合模型 Stock Price Prediction text mining naïve Bayesian models k-nearest neighbors models hybrid models
53	Data-Driven Predictions of Heating Energy Savings in Residential Buildings Lindblom, Ellen, Almquist, Isabelle January 2019 (has links) Along with the increasing use of intermittent electricity sources, such as wind and sun, comes a growing demand for user flexibility. This has paved the way for a new market of services that provide electricity customers with energy saving solutions. These include a variety of techniques ranging from sophisticated control of the customers’ home equipment to information on how to adjust their consumption behavior in order to save energy. This master thesis work contributes further to this field by investigating an additional incentive; predictions of future energy savings related to indoor temperature. Five different machine learning models have been tuned and used to predict monthly heating energy consumption for a given set of homes. The model tuning process and performance evaluation were performed using 10-fold cross validation. The best performing model was then used to predict how much heating energy each individual household could save by decreasing their indoor temperature by 1°C during the heating season. The highest prediction accuracy (of about 78%) is achieved with support vector regression (SVR), closely followed by neural networks (NN). The simpler regression models that have been implemented are, however, not far behind. According to the SVR model, the average household is expected to lower their heating energy consumption by approximately 3% if the indoor temperature is decreased by 1°C. Building Energy Machine Learning Energy Savings Heating Energy Indoor Temperature Neural Networks Support Vector Regression Random Forest Ridge Regression K-Nearest Neighbors Energy Systems Energisystem Computer and Information Sciences Data- och informationsvetenskap
54	SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION Jingzhao Dai (6642491) 11 June 2019 (has links) <p>Speech recognition is widely applied to translation from speech to related text, voice driven commands, human machine interface and so on [1]-[8]. It has been increasingly proliferated to Human’s lives in the modern age. To improve the accuracy of speech recognition, various algorithms such as artificial neural network, hidden Markov model and so on have been developed [1], [2].</p> <p>In this thesis work, the tasks of speech recognition with various classifiers are investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel features extraction methods of sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9] are developed and proposed. In order to meet diversity of classification algorithms, one-dimensional (1D) and two-dimensional (2D) features are required to be obtained. The 1D features are the array of power coefficients in frequency bands, which are dedicated for training SVM, KNN and RF classifiers while the 2D features are formed both in frequency domain and temporal variations. In fact, the 2D feature consists of the power values in decomposed bands versus consecutive speech frames. Most importantly, the 2D feature with geometric transformation are adopted to train CNN.</p> <p>Speech recognition including males and females are from the recorded data set as well as the standard data set. Firstly, the recordings with little noise and clear pronunciation are applied with the proposed feature extraction methods. After many trials and experiments using this dataset, a high recognition accuracy is achieved. Then, these feature extraction methods are further applied to the standard recordings having random characteristics with ambient noise and unclear pronunciation. Many experiment results validate the effectiveness of the proposed feature extraction techniques.</p> Sparse discrete wavelet decomposition Bandpass filter banks Support vector machine Support vector classification Random forest K nearest neighbors Convolutional neural networks
55	Data, learning and privacy in recommendation systems / Données, apprentissage et respect de la vie privée dans les systèmes de recommandation Mittal, Nupur 25 November 2016 (has links) Les systèmes de recommandation sont devenus une partie indispensable des services et des applications d’internet, en particulier dû à la surcharge de données provenant de nombreuses sources. Quel que soit le type, chaque système de recommandation a des défis fondamentaux à traiter. Dans ce travail, nous identifions trois défis communs, rencontrés par tous les types de systèmes de recommandation: les données, les modèles d'apprentissage et la protection de la vie privée. Nous élaborons différents problèmes qui peuvent être créés par des données inappropriées en mettant l'accent sur sa qualité et sa quantité. De plus, nous mettons en évidence l'importance des réseaux sociaux dans la mise à disposition publique de systèmes de recommandation contenant des données sur ses utilisateurs, afin d'améliorer la qualité des recommandations. Nous fournissons également les capacités d'inférence de données publiques liées à des données relatives aux utilisateurs. Dans notre travail, nous exploitons cette capacité à améliorer la qualité des recommandations, mais nous soutenons également qu'il en résulte des menaces d'atteinte à la vie privée des utilisateurs sur la base de leurs informations. Pour notre second défi, nous proposons une nouvelle version de la méthode des k plus proches voisins (knn, de l'anglais k-nearest neighbors), qui est une des méthodes d'apprentissage parmi les plus populaires pour les systèmes de recommandation. Notre solution, conçue pour exploiter la nature bipartie des ensembles de données utilisateur-élément, est évolutive, rapide et efficace pour la construction d'un graphe knn et tire sa motivation de la grande quantité de ressources utilisées par des calculs de similarité dans les calculs de knn. Notre algorithme KIFF utilise des expériences sur des jeux de données réelles provenant de divers domaines, pour démontrer sa rapidité et son efficacité lorsqu'il est comparé à des approches issues de l'état de l'art. Pour notre dernière contribution, nous fournissons un mécanisme permettant aux utilisateurs de dissimuler leur opinion sur des réseaux sociaux sans pour autant dissimuler leur identité. / Recommendation systems have gained tremendous popularity, both in academia and industry. They have evolved into many different varieties depending mostly on the techniques and ideas used in their implementation. This categorization also marks the boundary of their application domain. Regardless of the types of recommendation systems, they are complex and multi-disciplinary in nature, involving subjects like information retrieval, data cleansing and preprocessing, data mining etc. In our work, we identify three different challenges (among many possible) involved in the process of making recommendations and provide their solutions. We elaborate the challenges involved in obtaining user-demographic data, and processing it, to render it useful for making recommendations. The focus here is to make use of Online Social Networks to access publicly available user data, to help the recommendation systems. Using user-demographic data for the purpose of improving the personalized recommendations, has many other advantages, like dealing with the famous cold-start problem. It is also one of the founding pillars of hybrid recommendation systems. With the help of this work, we underline the importance of user’s publicly available information like tweets, posts, votes etc. to infer more private details about her. As the second challenge, we aim at improving the learning process of recommendation systems. Our goal is to provide a k-nearest neighbor method that deals with very large amount of datasets, surpassing billions of users. We propose a generic, fast and scalable k-NN graph construction algorithm that improves significantly the performance as compared to the state-of-the art approaches. Our idea is based on leveraging the bipartite nature of the underlying dataset, and use a preprocessing phase to reduce the number of similarity computations in later iterations. As a result, we gain a speed-up of 14 compared to other significant approaches from literature. Finally, we also consider the issue of privacy. Instead of directly viewing it under trivial recommendation systems, we analyze it on Online Social Networks. First, we reason how OSNs can be seen as a form of recommendation systems and how information dissemination is similar to broadcasting opinion/reviews in trivial recommendation systems. Following this parallelism, we identify privacy threat in information diffusion in OSNs and provide a privacy preserving algorithm for the same. Our algorithm Riposte quantifies the privacy in terms of differential privacy and with the help of experimental datasets, we demonstrate how Riposte maintains the desirable information diffusion properties of a network. Systèmes de recommandation K plus proches voisins Diffusion de l'information Respect de la vie privée Qualité et quantité de données Apprentissage Recommendation systems K nearest neighbors Information diffusion Privacy in social networks Data quality and quantity Learning in recommendation systems
56	An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification Alsouda, Yasser January 2019 (has links) Noise is defined as any undesired sound. Urban noise and its effect on citizens area significant environmental problem, and the increasing level of noise has become a critical problem in some cities. Fortunately, noise pollution can be mitigated by better planning of urban areas or controlled by administrative regulations. However, the execution of such actions requires well-established systems for noise monitoring. In this thesis, we present a solution for noise measurement and classification using a low-power and inexpensive IoT unit. To measure the noise level, we implement an algorithm for calculating the sound pressure level in dB. We achieve a measurement error of less than 1 dB. Our machine learning-based method for noise classification uses Mel-frequency cepstral coefficients for audio feature extraction and four supervised classification algorithms (that is, support vector machine, k-nearest neighbors, bootstrap aggregating, and random forest). We evaluate our approach experimentally with a dataset of about 3000 sound samples grouped in eight sound classes (such as car horn, jackhammer, or street music). We explore the parameter space of the four algorithms to estimate the optimal parameter values for the classification of sound samples in the dataset under study. We achieve noise classification accuracy in the range of 88% – 94%. urban noise sound pressure level (SPL) internet of things (IoT) machine learning support vector machine (SVM) k-nearest neighbors (KNN) bootstrap aggregating (Bagging) random forest Elektroteknik och elektronik
57	Um estudo sobre a extraÃÃo de caracterÃsticas e a classificaÃÃo de imagens invariantes Ã rotaÃÃo extraÃdas de um sensor industrial 3D / A study on the extraction of characteristics and the classification of invariant images through the rotation of an 3D industrial sensor Rodrigo Dalvit Carvalho da Silva 08 May 2014 (has links) CoordenaÃÃo de AperfeÃoamento de Pessoal de NÃvel Superior / Neste trabalho, Ã discutido o problema de reconhecimento de objetos utilizando imagens extraÃdas de um sensor industrial 3D. NÃs nos concentramos em 9 extratores de caracterÃsticas, dos quais 7 sÃo baseados nos momentos invariantes (Hu, Zernike, Legendre, Fourier-Mellin, Tchebichef, Bessel-Fourier e Gaussian-Hermite), um outro Ã baseado na Transformada de Hough e o Ãltimo na anÃlise de componentes independentes, e, 4 classificadores, Naive Bayes, k-Vizinhos mais PrÃximos, MÃquina de Vetor de Suporte e Rede Neural Artificial-Perceptron Multi-Camadas. Para a escolha do melhor extrator de caracterÃsticas, foram comparados os seus desempenhos de classificaÃÃo em termos de taxa de acerto e de tempo de extraÃÃo, atravÃs do classificador k-Vizinhos mais PrÃximos utilizando distÃncia euclidiana. O extrator de caracterÃsticas baseado nos momentos de Zernike obteve as melhores taxas de acerto, 98.00%, e tempo relativamente baixo de extraÃÃo de caracterÃsticas, 0.3910 segundos. Os dados gerados a partir deste, foram apresentados a diferentes heurÃsticas de classificaÃÃo. Dentre os classificadores testados, o classificador k-Vizinhos mais PrÃximos, obteve a melhor taxa mÃdia de acerto, 98.00% e, tempo mÃdio de classificaÃÃo relativamente baixo, 0.0040 segundos, tornando-se o classificador mais adequado para a aplicaÃÃo deste estudo. / In this work, the problem of recognition of objects using images extracted from a 3D industrial sensor is discussed. We focus in 9 feature extractors (where seven are based on invariant moments -Hu, Zernike, Legendre, Fourier-Mellin, Tchebichef, BesselâFourier and Gaussian-Hermite-, another is based on the Hough transform and the last one on independent component analysis), and 4 classifiers (Naive Bayes, k-Nearest Neighbor, Support Vector machines and Artificial Neural Network-Multi-Layer Perceptron). To choose the best feature extractor, their performance was compared in terms of classification accuracy rate and extraction time by the k-nearest neighbors classifier using euclidean distance. The feature extractor based on Zernike moments, got the best hit rates, 98.00 %, and relatively low time feature extraction, 0.3910 seconds. The data generated from this, were presented to different heuristic classification. Among the tested classifiers, the k-nearest neighbors classifier achieved the highest average hit rate, 98.00%, and average time of relatively low rank, 0.0040 seconds, thus making it the most suitable classifier for the implementation of this study. Transformada de Hough Naive Bayes MÃquina de Vetor de Suporte Invariant Moments Hough Transform Independent Component Analysis Naive Bayes k-Nearest Neighbors Support Vector Machine ENGENHARIA ELETRICA
58	Kombination von terrestrischen Aufnahmen und Fernerkundungsdaten mit Hilfe der kNN-Methode zur Klassifizierung und Kartierung von Wäldern / Combination of field data and remote sensing data with the knn-method (k-nearest neighbors method) for classification and mapping of forests Stümer, Wolfgang 30 August 2004 (has links) (PDF) Bezüglich des Waldes hat sich in den letzten Jahren seitens der Politik und Wirtschaft ein steigender Informationsbedarf entwickelt. Zur Bereitstellung dieses Bedarfes stellt die Fernerkundung ein wichtiges Hilfsmittel dar, mit dem sich flächendeckende Datengrundlagen erstellen lassen. Die k-nächsten-Nachbarn-Methode (kNN-Methode), die terrestrische Aufnahmen mit Fernerkundungsdaten kombiniert, stellt eine Möglichkeit dar, diese Datengrundlage mit Hilfe der Fernerkundung zu verwirklichen. Deshalb beschäftigt sich die vorliegende Dissertation eingehend mit der kNN-Methode. An Hand der zwei Merkmale Grundfläche (metrische Daten) und Totholz (kategoriale Daten) wurden umfangreiche Berechnungen durchgeführt, wobei verschiedenste Variationen der kNN-Methode berücksichtigt wurden. Diese Variationen umfassen verschiedenste Einstellungen der Distanzfunktion, der Wichtungsfunktion und der Anzahl k-nächsten Nachbarn. Als Fernerkundungsdatenquellen kamen Landsat- und Hyperspektraldaten zum Einsatz, die sich sowohl von ihrer spektralen wie auch ihrer räumlichen Auflösung unterscheiden. Mit Hilfe von Landsat-Szenen eines Gebietes von verschiedenen Zeitpunkten wurde außerdem der multitemporale Ansatz berücksichtigt. Die terrestrische Datengrundlage setzt sich aus Feldaufnahmen mit verschiedenen Aufnahmedesigns zusammen, wobei ein wichtiges Kriterium die gleichmäßige Verteilung von Merkmalswerten (z.B. Grundflächenwerten) über den Merkmalsraum darstellt. Für die Durchführung der Berechnungen wurde ein Programm mit Visual Basic programmiert, welches mit der Integrierung aller Funktionen auf der Programmoberfläche eine benutzerfreundliche Bedienung ermöglicht. Die pixelweise Ausgabe der Ergebnisse mündete in detaillierte Karten und die Verifizierung der Ergebnisse wurde mit Hilfe des prozentualen Root Mean Square Error und der Bootstrap-Methode durchgeführt. Die erzielten Genauigkeiten für das Merkmal Grundfläche liegen zwischen 35 % und 67 % (Landsat) bzw. zwischen 65 % und 67 % (HyMapTM). Für das Merkmal Totholz liegen die Übereinstimmungen zwischen den kNN-Schätzern und den Referenzwerten zwischen 60,0 % und 73,3 % (Landsat) und zwischen 60,0 % und 63,3 % (HyMapTM). Mit den erreichten Genauigkeiten bietet sich die kNN-Methode für die Klassifizierung von Beständen bzw. für die Integrierung in Klassifizierungsverfahren an. / Mapping forest variables and associated characteristics is fundamental for forest planning and management. The following work describes the k-nearest neighbors (kNN) method for improving estimations and to produce maps for the attributes basal area (metric data) and deadwood (categorical data). Several variations within the kNN-method were tested, including: distance metric, weighting function and number of neighbors. As sources of remote sensing Landsat TM satellite images and hyper spectral data were used, which differ both from their spectral as well as their spatial resolutions. Two Landsat scenes from the same area acquired September 1999 and 2000 regard multiple approaches. The field data for the kNN- method comprise tree field measurements which were collected from the test site Tharandter Wald (Germany). The three field data collections are characterized by three different designs. For the kNN calculation a program with integration all kNN functions were developed. The relative root mean square errors (RMSE) and the Bootstrap method were evaluated in order to find optimal parameters. The estimation accuracy for the attribute basal area is between 35 % and 67 % (Landsat) and 65 % and 67 % (HyMapTM). For the attribute deadwood is the accuracy between 60 % and 73 % (Landsat) and 60 % and 63 % (HyMapTM). Recommendations for applying the kNN method for mapping and regional estimation are provided. Fernerkundung Forstwissenschaften Grundfläche Klassifizierung Totholz k-nächste Nachbarn Methode kNN-Methode basal area classification deadwood forest k-nearest neighbors method knn-method remote sensing ddc:630 rvk:ZI 9560 Fernerkundung Forstwirtschaft Stichprobe Totholz Waldbestand
59	Generalized N-body problems: a framework for scalable computation Riegel, Ryan Nelson 13 January 2014 (has links) In the wake of the Big Data phenomenon, the computing world has seen a number of computational paradigms developed in response to the sudden need to process ever-increasing volumes of data. Most notably, MapReduce has proven quite successful in scaling out an extensible class of simple algorithms to even hundreds of thousands of nodes. However, there are some tasks---even embarrassingly parallelizable ones---that neither MapReduce nor any existing automated parallelization framework is well-equipped to perform. For instance, any computation that (naively) requires consideration of all pairs of inputs becomes prohibitively expensive even when parallelized over a large number of worker nodes. Many of the most desirable methods in machine learning and statistics exhibit these kinds of all-pairs or, more generally, all-tuples computations; accordingly, their application in the Big Data setting may seem beyond hope. However, a new algorithmic strategy inspired by breakthroughs in computational physics has shown great promise for a wide class of computations dubbed generalized N-body problems (GNBPs). This strategy, which involves the simultaneous traversal of multiple space-partitioning trees, has been applied to a succession of well-known learning methods, accelerating each asymptotically and by orders of magnitude. Examples of these include all-k-nearest-neighbors search, k-nearest-neighbors classification, k-means clustering, EM for mixtures of Gaussians, kernel density estimation, kernel discriminant analysis, kernel machines, particle filters, the n-point correlation, and many others. For each of these problems, no overall faster algorithms are known. Further, these dual- and multi-tree algorithms compute either exact results or approximations to within specified error bounds, a rarity amongst fast methods. This dissertation aims to unify a family of GNBPs under a common framework in order to ease implementation and future study. We start by formalizing the problem class and then describe a general algorithm, the generalized fast multipole method (GFMM), capable of solving all problems that fit the class, though with varying degrees of speedup. We then show O(N) and O(log N) theoretical run-time bounds that may be obtained under certain conditions. As a corollary, we derive the tightest known general-dimensional run-time bounds for exact all-nearest-neighbors and several approximated kernel summations. Next, we implement a number of these algorithms in a commercial database, empirically demonstrating dramatic asymptotic speedup over their conventional SQL implementations. Lastly, we implement a fast, parallelized algorithm for kernel discriminant analysis and apply it to a large dataset (40 million points in 4D) from the Sloan Digital Sky Survey, identifying approximately one million quasars with high accuracy. This exceeds the previous largest catalog of quasars in size by a factor of ten and has since been used in a follow-up study to confirm the existence of dark energy. Fast algorithms Generalized algorithms Tree codes Complexity analysis Database-resident computation Machine learning Nearest neighbors Kernel sums Affinity propagation Kernel discriminant analysis Quasar identification Big data Many-body problem Algorithms
60	Video Recommendation Based on Object Detection Nyberg, Selma January 2018 (has links) In this thesis, various machine learning domains have been combined in order to build a video recommender system that is based on object detection. The work combines two extensively studied research fields, recommender systems and computer vision, that also are rapidly growing and popular techniques on commercial markets. To investigate the performance of the approach, three different content-based recommender systems have been implemented at Spotify, which are based on the following video features: object detections, titles and descriptions, and user preferences. These systems have then been evaluated and compared against each other together with their hybridized result. Two algorithms have been implemented, the prediction and the top-N algorithm, where the former is the more reliable source for evaluating the system's performance. The evaluation of the system shows that the overall performance scores for predicting values of the users' liked and disliked videos are in the range from about 40 % to 70 % for the prediction algorithm and from about 15 % to 70 % for the top-N algorithm. The approach based on object detection performs worse in comparison to the other approaches. Hence, there seems to be is a low correlation between the user preferences and the video contents in terms of object detection data. Therefore, this data is not very suitable for describing the content of videos and using it in the recommender system. However, the results of this study cannot be generalized to apply for other systems before the approach has been evaluated in other environments and for various data sets. Moreover, there are plenty of room for refinements and improvements to the system, as well as there are many interesting research areas for future work. Spotify Machine Learning Artificial Intelligence Recommender Systems Content-Based Filtering Collaborative Filtering Hybrid Filtering Deep Learning Image Recognition Object Detection Natural Language Processing Paragraph Vectors Doc2Vec TensorFlow Classification K-Nearest Neighbors Cross-Validation Computer and Information Sciences Data- och informationsvetenskap

Search results