Global ETD Search

11	First-Order Algorithms for Communication Efficient Distributed Learning Khirirat, Sarit January 2019 Technological developments in devices and storage have made large volumes of data more accessible than ever. This transformation leads to optimization problems with data that are massive in both volume and dimension. In response to this trend, optimization on high-performance computing architectures has become increasingly popular. These scalable optimization solvers can achieve high efficiency by splitting computational loads among multiple machines. However, these methods also incur large communication overhead. When solving optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to save significant communication costs while exhibiting convergence almost comparable to that of the full-precision algorithms. To explain this behavior, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques that cover many compressors of practical interest. Then, we provide a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show an explicit dependency on compression accuracy and on the delay effect due to asynchrony of the algorithms. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression. 
In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. Even though convergence guarantees of optimization algorithms with error compensation have been established, there is very limited theoretical support that guarantees improved solution accuracy. We therefore develop theoretical explanations, which show that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy, especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation for parallel stochastic gradient descent across multiple machines. In particular, the error-compensated algorithms, unlike direct compression, result in a significant reduction in the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results. / The development of communication technology and data storage has made large data collections more accessible than ever. This change leads to numerical optimization problems with data sets that are large in both volume and dimension. In response to this trend, the popularity of high-performance computing architectures has grown more than ever before. Scalable optimization tools can achieve high efficiency by distributing the computational load among multiple machines. In practice, however, they come at the price of substantial communication overhead. This shifts the performance bottleneck from computation to communication. When real-world optimization problems with a large number of parameters are solved, communication between machines accounts for almost 80% of the training time. To reduce the communication load, a number of compression techniques have been proposed in the literature. 
Although optimization algorithms that use these compressors are reported to be as competitive as their full-precision counterparts, they suffer from a loss of accuracy. To shed light on this, we develop in this thesis theory and techniques for designing communication-efficient optimization algorithms that use only low-precision information. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we give an overview of compression techniques that cover many compressors of practical interest. Then, we present a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the parameters of compressed optimization algorithms, which are determined by the compressor's parameters, can be chosen to guarantee convergence. Our convergence guarantees show the dependence on the compressor's accuracy and on delay effects due to asynchrony of the algorithms. This lets us characterize the trade-off between iteration and communication complexity when compression is used. In the second part, we study the high performance of error compensation methods for compressed optimization algorithms. Although convergence guarantees with error compensation have been established, there is very limited theoretical support for competitive convergence guarantees with error compensation. We therefore develop theoretical explanations, which show that the use of error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors and thereby improves solution accuracy, especially for ill-conditioned quadratic optimization problems. We also present a strong convergence analysis for error compensation applied to stochastic gradient methods over a communication network of multiple machines. 
Unlike direct compression, the error-compensated algorithms yield a significant reduction in the compression error. Simulations of the algorithms in this thesis on real-world problems with benchmark data sets validate our theoretical results. Communication efficient learning Optimization algorithms Quantization Error compensation First-order algorithms Stochastic gradient descent Control Engineering
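The contrast between direct compression and error compensation described in entry 11 can be illustrated with a small sketch. This is not the thesis's algorithm or analysis: the rounding compressor, the diagonal quadratic objective, the step size, and all function names below are assumptions chosen only to make the idea concrete. With direct compression, small gradients are rounded to zero and the iterates stall; with error compensation, the rounding error is stored and added back before the next compression.

```python
import math

def quantize(v, delta):
    """Round each coordinate to the nearest multiple of delta (an assumed deterministic compressor)."""
    return [delta * math.floor(c / delta + 0.5) for c in v]

def grad(x, a):
    """Gradient of the diagonal quadratic f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [ai * xi for ai, xi in zip(a, x)]

def compressed_gd(x0, a, step, delta, iters, compensate):
    """Gradient descent that only ever applies compressed gradients."""
    x = list(x0)
    e = [0.0] * len(x)  # memory of compression errors not yet applied
    for _ in range(iters):
        g = grad(x, a)
        # With compensation, the stored error is added back before compressing.
        target = [gi + ei for gi, ei in zip(g, e)] if compensate else g
        c = quantize(target, delta)  # the vector that would be communicated
        if compensate:
            e = [ti - ci for ti, ci in zip(target, c)]
        x = [xi - step * ci for xi, ci in zip(x, c)]
    return x

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# Hypothetical toy problem: the minimizer is the origin.
a = [1.0, 4.0]
x0 = [3.2, -2.7]
x_direct = compressed_gd(x0, a, step=0.1, delta=1.0, iters=300, compensate=False)
x_ec = compressed_gd(x0, a, step=0.1, delta=1.0, iters=300, compensate=True)
print("distance to optimum, direct compression:", norm(x_direct))
print("distance to optimum, error compensation:", norm(x_ec))
```

In this toy setting the error-compensated run ends closer to the minimizer because the quantization error stays bounded and is eventually applied rather than discarded.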
12	Deep learning for portfolio optimization MBITI, JOHN N. January 2021 In this thesis, an optimal investment problem is studied for an investor who can only invest in a financial market modelled by an Itô-Lévy process, with one risk-free (bond) and one risky (stock) investment possibility. We present the dynamic programming method and the associated Hamilton-Jacobi-Bellman (HJB) equation to explicitly solve this problem. It is shown that with purification and simplification to the standard jump diffusion process, closed-form solutions for the optimal investment strategy and for the value function are attainable. It is also shown that an explicit solution can be obtained via finite training of a neural network using stochastic gradient descent (SGD) for a specific case. Portfolio optimization optimal portfolio jump diffusion Itô-Lévy process stochastic control dynamic programming HJB equation utility optimization stochastic gradient descent Deep learning neural network. Mathematics
13	Decentralized Learning over Wireless Networks with Imperfect and Constrained Communication : To broadcast, or not to broadcast, that is the question! Dahl, Martin January 2023 The ever-expanding volume of data generated by network devices such as smartphones, personal computers, and sensors has significantly contributed to the remarkable advancements in artificial intelligence (AI) and machine learning (ML) algorithms. However, effectively processing and learning from this extensive data usually requires substantial computational capabilities centralized in a server. Moreover, concerns regarding data privacy arise when collecting training data from distributed network devices. To address these challenges, collaborative ML with decentralized data has emerged as a promising solution for large-scale machine learning across distributed devices, driven by the parallel computing and learning trends. Collaborative and distributed ML can be broadly classified into two types, server-based and fully decentralized, based on whether the model aggregation is coordinated by a parameter server or performed in a decentralized manner through peer-to-peer communication. In cases where communication between devices occurs over wireless links, which are inherently imperfect, unreliable, and resource-constrained, how can we design communication protocols to achieve the best learning performance? This thesis investigates decentralized learning using decentralized stochastic gradient descent, an established algorithm for decentralized ML, in a novel setting with imperfect and constrained communication. "Imperfect" implies that communication can fail and "constrained" implies that communication resources are limited. The communication across a link between two devices is modeled as a binary event with either success or failure, depending on whether multiple neighbouring devices are transmitting information. 
To compensate for communication failures, every communication round can have multiple communication slots, which are limited and must be carefully allocated over the learning process. The quality of communication is quantified by introducing normalized throughput, describing the ratio of successful links in a communication round. To decide when devices should broadcast, both random and deterministic medium access policies have been developed with the goal of maximizing throughput, and these have shown very efficient learning performance. Finally, two schemes for allocating communication slots over communication rounds have been defined and simulated, the Delayed-Allocation and the Periodic-Allocation schemes, showing that it is better to allocate slots late rather than early, and neither too frequently nor too infrequently, although this can depend on several factors and requires further study. Decentralized Learning Medium Access Control Wireless Communications Machine Learning Imperfect Communication Resource-Constrained Resource Allocation Scheduling Communication Systems
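The notion of normalized throughput in entry 13 can be made concrete with a rough simulation. Everything specific here is an assumption rather than the thesis's model: the ring topology, the collision rule (a directed link succeeds in a slot only if the sender broadcasts while the receiver and the receiver's other neighbour stay silent), the random access policy with broadcast probability `p`, and the function name `normalized_throughput`.

```python
import random

def normalized_throughput(n, p, slots, seed=0):
    """Fraction of directed links on an n-node ring (n >= 3) that succeed at least once in a round."""
    rng = random.Random(seed)
    # Each node has two ring neighbours; a link is a (sender, receiver) pair.
    links = [(i, (i + d) % n) for i in range(n) for d in (-1, 1)]
    delivered = set()
    for _ in range(slots):
        tx = [rng.random() < p for _ in range(n)]  # who broadcasts in this slot
        for s, r in links:
            other = (2 * r - s) % n  # the receiver's other ring neighbour
            # Assumed collision rule: only the sender may be active around the receiver.
            if tx[s] and not tx[r] and not tx[other]:
                delivered.add((s, r))
    return len(delivered) / len(links)

for p in (0.1, 0.3, 0.6):
    print(p, normalized_throughput(n=20, p=p, slots=5))
```

Under this assumed model, broadcasting never or always gives zero throughput, which is one way to see why the choice of medium access policy matters.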
14	Large scale support vector machines algorithms for visual classification / Algorithmes de SVM pour la classification d'images à grande échelle Doan, Thanh-Nghi 07 November 2013 We present two major contributions: 1) a combination of several image descriptors for large-scale classification, and 2) parallel SVM algorithms for large-scale image classification. We also propose an incremental and parallel classification algorithm for when the data no longer fit in main memory. / We have proposed a novel method for combining multiple different features for image classification. For large-scale learning classifiers, we have developed parallel versions of both state-of-the-art linear and nonlinear SVMs. We have also proposed a novel algorithm to extend stochastic gradient descent SVM for large-scale learning. A class of large-scale incremental SVM classifiers has been developed in order to perform classification tasks on large datasets with a very large number of classes whose training data cannot fit into memory. Support vector machines Incremental learning method Stochastic gradient descent Balanced bagging Large scale classification
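15	推薦系統資料插補改良法-電影推薦系統應用 / Improving recommendations through data imputation-with application for movie recommendation 楊智博, Yang, Chih Po Unknown Date Many online stores and e-commerce businesses today rely on recommender systems to increase sales when selling products to consumers. Companies such as Amazon and Netflix study their customers' usage habits in depth, build dedicated recommender systems, and make personalized product recommendations to each customer. The techniques used in recommender systems fall into two broad classes, collaborative filtering and content filtering; this study examines latent factor models for collaborative filtering, using matrix factorization to recover the rating matrix. Koren et al. (2009) divide matrix factorization algorithms into roughly two kinds: stochastic gradient descent and alternating least squares. This study has three main aims: first, to compare the predictive ability of alternating least squares and stochastic gradient descent; second, to evaluate the two matrix factorization algorithms after bias terms are added; and third, to run alternating least squares and stochastic gradient descent, impute the missing values in the original data with their predictions, then factorize the completed data with singular value decomposition and observe the difference before and after. The results show that stochastic gradient descent requires less computation time than alternating least squares. In addition, imputing the missing values with the predictions of the two matrix factorization algorithms and then applying singular value decomposition also improves predictive ability. / Recommender systems have been widely used by Internet companies such as Amazon and Netflix to make recommendations for Internet users. Techniques for recommender systems can be divided into the content filtering approach and the collaborative filtering approach. Matrix factorization is a popular method for the collaborative filtering approach. It minimizes the objective function through stochastic gradient descent or alternating least squares. This thesis has three goals. First, we compare the alternating least squares method and the stochastic gradient descent method. Secondly, we compare the performance of the matrix factorization method with and without the bias term. Thirdly, we combine singular value decomposition and matrix factorization. As expected, we found that stochastic gradient descent takes less time than the alternating least squares method, and that the matrix factorization method with the bias term gives more accurate predictions. We also found that combining singular value decomposition with matrix factorization can improve the predictive accuracy. Recommender systems Matrix Factorization Stochastic Gradient Descent Alternating Least Squares Singular Value Decomposition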
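Matrix factorization with bias terms trained by stochastic gradient descent, as discussed in entry 15, can be sketched in a few lines. This is a minimal illustration in the spirit of the biased model of Koren et al. (2009), not the thesis's code: the toy ratings, the hyperparameters, and the names `train_mf`, `predict`, and `rmse` are all assumptions.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on the observed ratings."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]  # use the old values for both updates
                P[u][f] += lr * (err * qf - reg * pf)
                Q[i][f] += lr * (err * pf - reg * qf)
    return mu, bu, bi, P, Q

def predict(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(model, ratings):
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 3), (3, 2, 4)]
model = train_mf(ratings, n_users=4, n_items=3)
mu = model[0]
baseline_rmse = math.sqrt(sum((r - mu) ** 2 for _, _, r in ratings) / len(ratings))
print("RMSE predicting the global mean:", baseline_rmse)
print("RMSE after SGD training:        ", rmse(model, ratings))
```

The trained model's predictions for unobserved (user, item) pairs are what an imputation step of the kind described in the abstract would fill in before a subsequent singular value decomposition.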
16	Evaluation of computational methods for data prediction Erickson, Joshua N. 03 September 2014 Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered to be one of the most compelling and challenging problems for both academia and industry in the world of data analytics. But how is prediction done, what factors make it easier or harder to do, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need to have realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, a careful study involving experiments in regression, the prediction of continuous, numerical values, and classification, the assignment of a class to each sample, is provided. The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python library. The evaluation includes accuracy and running time of different algorithms across several data sets to establish tradeoffs between the approaches, and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included. 
/ Graduate / 0984 / erickson@uvic.ca regression classification evaluation data analysis prediction machine learning supervised learning linear regression support vector machine nearest neighbor logistic regression gaussian naive bayes stochastic gradient descent scikit-learn decision tree
17	Aplicação do Word2vec e do Gradiente descendente estocástico em tradução automática / Application of Word2vec and stochastic gradient descent to machine translation Aguiar, Eliane Martins de 30 May 2016 Word2vec is a neural-network-based system that processes text and represents words as vectors, using a distributed representation. A notable property is the semantic relationships found in the resulting models. This work aims to train two models with word2vec, one for Portuguese and one for English, and to use stochastic gradient descent to find a translation matrix between these two spaces. Natural language processing Neural networks Word2vec Continuous bag-of-words Stochastic gradient descent Machine translation Mathematics Neural networks (Computing)
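Finding a translation matrix between two embedding spaces with stochastic gradient descent, as entry 17 describes, can be sketched as a least-squares problem: given paired vectors (x, z), minimize the sum of squared residuals ||W x - z||^2 over W. The sketch below is an assumption-laden toy, not the dissertation's implementation: the two-dimensional vectors stand in for word2vec embeddings, the pairs are synthetic, and `learn_translation_matrix` is a hypothetical name.

```python
import random

def learn_translation_matrix(pairs, dim_src, dim_tgt, lr=0.05, epochs=500, seed=0):
    """SGD on sum ||W x - z||^2 over (source vector x, target vector z) pairs."""
    rng = random.Random(seed)
    W = [[0.0] * dim_src for _ in range(dim_tgt)]
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, z in data:
            # Residual of the current translation W x against the target z.
            residual = [sum(W[r][c] * x[c] for c in range(dim_src)) - z[r]
                        for r in range(dim_tgt)]
            # Gradient of ||W x - z||^2 with respect to W is 2 * residual * x^T.
            for r in range(dim_tgt):
                for c in range(dim_src):
                    W[r][c] -= lr * 2.0 * residual[r] * x[c]
    return W

# Synthetic stand-in for a bilingual dictionary: targets are an exact linear map of sources.
W_true = [[0.0, 1.0], [-1.0, 0.0]]
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
pairs = [(x, [sum(W_true[r][c] * x[c] for c in range(2)) for r in range(2)]) for x in sources]

W = learn_translation_matrix(pairs, dim_src=2, dim_tgt=2)
print(W)
```

With real embeddings the pairs would come from a seed dictionary of word translations, and a new source word would be translated by mapping its vector through W and taking the nearest target-language vector.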
18	Contributions à l'apprentissage grande échelle pour la classification d'images / Contributions to large-scale learning for image classification Akata, Zeynep 06 January 2014 Building algorithms that classify images at large scale has become an essential task because of the difficulty of searching the immense collections of unlabeled visual data on the Internet. The goal is to classify images according to their content to simplify the management of such databases. Large-scale image classification is a complex problem because of the size of the data sets, both in number of images and in number of classes. Some of these classes are said to be "fine-grained" (semantically close to one another) and may not even contain any labeled representative. In this thesis, we use state-of-the-art image representations and concentrate on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale classification and (2) a new algorithm based on label embedding for learning from scarce data. First, we introduce a benchmark of learning algorithms for large-scale classification in a fully supervised setting. It compares several objective functions for learning linear classifiers, such as "one-vs-rest", "multiclass", "ranking", and "weighted ranking", optimized by stochastic gradient descent. This benchmark concludes with a set of recommendations for large-scale classification. With a simple reweighting of the data, the "one-vs-rest" strategy performs better than all the others. 
Moreover, in online learning, a sufficiently small learning step proves enough to obtain state-of-the-art results. Finally, early stopping of stochastic gradient descent introduces a regularization that improves training speed as well as generalization ability. Second, when faced with thousands of classes, it is sometimes difficult to gather enough training data for each class. In particular, some classes may have no examples at all. We therefore propose a new algorithm suited to this "zero-shot" learning scenario. Our algorithm uses side data, such as attributes, to embed the classes in a Euclidean space. We also introduce a function to measure the compatibility between image and label. The parameters of this function are learned using a "ranking" objective. Our algorithm surpasses the state of the art for zero-shot learning and shows great flexibility by allowing other sources of side information, such as hierarchies, to be incorporated. It also allows a smooth transition from the zero-shot case to the case where few examples are available. / Building algorithms that classify images on a large scale is an essential task due to the difficulty of searching the massive amount of unlabeled visual data available on the Internet. We aim at classifying images based on their content to simplify the management of such large-scale collections. Large-scale image classification is a difficult problem as datasets are large with respect to both the number of images and the number of classes. Some of these classes are fine-grained and they may not contain any labeled representatives. In this thesis, we use state-of-the-art image representations and focus on efficient learning methods. 
Our contributions are (1) a benchmark of learning algorithms for large-scale image classification, and (2) a novel learning algorithm based on label embedding for learning with scarce training data. Firstly, we propose a benchmark of learning algorithms for large-scale image classification in the fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-vs-rest, multiclass, ranking and weighted average ranking, using stochastic gradient descent optimization. The output of this benchmark is a set of recommendations for large-scale learning. We experimentally show that online learning is well suited for large-scale image classification. With simple data rebalancing, One-vs-Rest performs better than all other methods. Moreover, in online learning, using a small enough step size with respect to the learning rate is sufficient for state-of-the-art performance. Finally, regularization through early stopping results in fast training and good generalization performance. Secondly, when dealing with thousands of classes, it is difficult to collect sufficient labeled training data for each class. For some classes we might not even have a single training example. We propose a novel algorithm for this zero-shot learning scenario. Our algorithm uses side information, such as attributes, to embed classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label. The parameters of this function are learned using a ranking objective. Our algorithm outperforms the state of the art for zero-shot learning. It is flexible and can accommodate other sources of side information such as hierarchies. It also allows for a smooth transition from zero-shot to few-shots learning. Label Embedding Large Scale Image Classification Linear SVMs Stochastic Gradient Descent Zero-Shot Learning Few-Shots Learning 004 510
19	STATISTICAL PHYSICS OF CELL ADHESION COMPLEXES AND MACHINE LEARNING Adhikari, Shishir Raj 26 August 2019 No description available. Biophysics Physics
20	A Study of the Loss Landscape and Metastability in Graph Convolutional Neural Networks / En studie av lösningslandskapet och metastabilitet i grafiska faltningsnätverk Larsson, Sofia January 2020 Many novel graph neural network models have reported impressive performance on benchmark datasets, but the theory behind these networks is still being developed. In this thesis, we study the trajectory of gradient descent (GD) and stochastic gradient descent (SGD) in the loss landscape of graph neural networks by replicating the study of Xing et al. [1] for feed-forward networks. Furthermore, we empirically examine whether the training process could be accelerated by an optimization algorithm inspired by stochastic gradient Langevin dynamics, and what effect the topology of the graph has on the convergence of GD, by perturbing its structure. We find that the loss landscape is relatively flat and that SGD does not encounter any significant obstacles during its propagation. The noise-induced gradient appears to aid SGD in finding a stationary point with desirable generalisation capabilities when the learning rate is poorly optimized. Additionally, we observe that the topological structure of the graph plays a part in the convergence of GD, but further research is required to understand how. / Many new graph neural networks have shown impressive results on existing datasets, yet the theory behind these networks is still under development. In this thesis, we study the trajectories of gradient descent (GD) and stochastic gradient descent (SGD) in the loss landscape of graph convolutional networks by replicating the study of feed-forward networks by Xing et al. [1]. In addition, we empirically investigate whether the training process can be accelerated by an optimization algorithm inspired by stochastic gradient Langevin dynamics, and whether the topology of the graph affects the convergence of GD, by altering its structure. 
We see that the loss landscape is relatively flat and that the noise induced in the gradient appears to help SGD find stable stationary points with desirable generalization properties when the learning rate has been poorly tuned. In addition, we observe that the topological structure of the graph affects the convergence of GD, but more research is needed to understand how. Graph neural networks Graph convolutional neural networks Loss landscape Gradient descent Stochastic gradient descent Stochastic gradient Langevin dynamics Probability Theory and Statistics
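Entry 20 mentions an optimizer inspired by stochastic gradient Langevin dynamics (SGLD). The thesis's own variant is not described in the abstract, so the sketch below shows only the plain SGLD update on an assumed one-dimensional toy loss: a stochastic gradient step of size step/2 plus Gaussian noise with variance equal to the step size. The data values, the step size, and the name `sgld_step` are illustrative assumptions.

```python
import math
import random

def sgld_step(theta, grad_estimate, step, rng):
    """One SGLD update: half a gradient step plus Gaussian noise of variance `step`."""
    return theta - 0.5 * step * grad_estimate + math.sqrt(step) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
data = [1.5, 2.0, 2.5]          # hypothetical observations; the expected loss is minimized at 2.0
theta, step = 0.0, 0.1
samples = []
for t in range(21000):
    y = rng.choice(data)        # minibatch of size one
    grad = theta - y            # stochastic gradient of 0.5 * (theta - y)**2
    theta = sgld_step(theta, grad, step, rng)
    if t >= 1000:               # discard an initial burn-in period
        samples.append(theta)

mean = sum(samples) / len(samples)
print("average iterate after burn-in:", mean)
```

Unlike plain SGD, the iterates do not settle at a point; they keep fluctuating around the minimizer, which is why the average over many iterates, rather than the final iterate, is reported here.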

Search results