Global ETD Search

11	Implementação do algoritmo da fft-2d para rede de transputers / Implementation of 2D-FFT algorithm for a transputer network Regina Fumie Eto 16 February 1993 The present work describes the implementation of the discrete FFT-2D algorithm on a distributed transputer network. First, a sequential implementation of the algorithm is presented. Then some parallelization techniques are analyzed and applied to the FFT-2D algorithm. Finally, the performance obtained is presented for networks containing one, two and four transputers. FFT Parallel processing Performance Speed-up Transputer
12	A Flexible, Natural Deduction, Automated Reasoner for Quick Deployment of Non-Classical Logic Mukhopadhyay, Trisha 20 March 2019 Automated Theorem Provers (ATPs) are software programs that carry out inferences over logico-mathematical systems, often with the goal of finding a proof of a given theorem. ATP systems are enormously powerful computer programs, capable of solving immensely difficult problems. Many automated theorem provers currently exist, such as E, Vampire, SPASS, ACL2 and Coq. However, the available theorem provers share some common problems: (1) Current ATP systems tend not to find proofs entirely on their own; they need human experts to supply lemmas, guide the proof, etc. (2) No single proof system provides a fully automated platform for both First-Order Logic (FOL) and Higher-Order Logic (HOL). (3) Current proof systems offer no easy way to quickly deploy and reason over new logical systems that a logic researcher may want to test. In response to these problems, I introduce the MATR framework. MATR is a platform-independent, codelet-based proof system (codelets being independently operating processes) with an easy-to-use Graphical User Interface (GUI), where multiple codelets can be selected based on the formal system desired. MATR provides a platform for different proof strategies, such as deduction and backward reasoning, along with different formal systems, such as non-classical logics. It enables users to design their own proof system by selecting from the list of codelets, without needing to write an ATP from scratch. Automated Theorem Prover Codelets First-Order Logic Higher-Order Logic Speed Up Theorem Workspace Computer Engineering
13	First-order distributed optimization methods for machine learning with linear speed-up Spiridonoff, Artin 27 September 2021 This thesis considers the problem of average consensus, distributed centralized and decentralized Stochastic Gradient Descent (SGD), and their communication requirements. Namely, (i) an algorithm for achieving consensus among a collection of agents is studied, and its convergence to the average is shown in the presence of link failures and delays. The new results improve upon prior work by relaxing some of the restrictive assumptions on communication, such as bounded link failures and intercommunication intervals, as well as by allowing for message delays. Next, (ii) a Robust Asynchronous Stochastic Gradient Push (RASGP) algorithm is proposed to minimize the separable objective F(z) = \sum_{i=1}^{n} f_i(z) in a harsh network setting characterized by asynchronous updates, message losses and delays, and directed communication. RASGP is shown to asymptotically perform as well as the best bounds on a centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all local functions f_i(z). Next, (iii) a new communication strategy is proposed for Local SGD, a centralized optimization algorithm in which workers make local updates and average their values only once in a while. It is shown that linear speed-up in the number of workers N is possible using only O(N) communication (averaging) rounds, independent of the total number of iterations T. Empirical evidence suggests this bound is close to tight, as it is further shown that √N or N^{3/4} communications fail to achieve linear speed-up. Finally, (iv) under mild assumptions, the main one being twice differentiability on any neighborhood of the optimal solution, one-shot averaging, which uses only a single round of communication, is shown to have an asymptotically optimal convergence rate.
Operations research Communication Local-SGD Optimization Speed-up Stochastic gradient descent
14	Graph compression using graph grammars Peternek, Fabian Hans Adolf January 2018 This thesis presents work on compressed graph representations via hyperedge replacement grammars. It comprises two main parts. First, the RePair compression scheme, known for strings and trees, is generalized to graphs using graph grammars. Given an object, the scheme produces a small context-free grammar generating the object (called a “straight-line grammar”). The theoretical foundations of this generalization are presented, followed by a description of a prototype implementation. This implementation is then evaluated on real-world and synthetic graphs. The experiments show that several graphs can be compressed more strongly by the new method than by current state-of-the-art approaches. The second part considers algorithmic questions about straight-line graph grammars. Two algorithms are presented for traversing the graph represented by such a grammar. Each has advantages and disadvantages: the first works with any grammar, but its runtime per traversal step depends on the input grammar. The second needs only constant time per traversal step, but works only for a restricted class of grammars and requires quadratic preprocessing time and space. Finally, speed-up algorithms are considered. These are algorithms that can decide specific problems in time depending only on the size of the compressed representation, and might thus be faster than a traditional algorithm would be on the decompressed structure. The idea of such algorithms is to reuse computation already done for the rules of the grammar. The speed-up achievable this way is proportional to the compression ratio of the grammar. The main results here are a method to answer “regular path queries” and a method to decide whether two grammars generate isomorphic trees.
15	From Word Embeddings to Large Vocabulary Neural Machine Translation Jean, Sébastien 04 1900 In this thesis, we examine some properties of word embeddings and propose a technique to handle large vocabularies in neural machine translation. We first look at a well-known analogy task and examine the effect of position-dependent weights, the choice of combination function and the impact of supervised learning. We then show that simple embeddings learnt with translational contexts can match or surpass the state of the art on the TOEFL synonym detection task and on the recently introduced SimLex-999 word similarity gold standard. Finally, motivated by impressive results obtained by small-vocabulary (30,000 words) neural machine translation embeddings on some word similarity tasks, we present a GPU-friendly approach to increase the vocabulary size by more than an order of magnitude. Although it was originally developed only for obtaining the embeddings, we show that this technique works quite well on actual translation tasks, especially for English to French (WMT'14).
Analogies Word2vec TOEFL SimLex-999 GPU WMT Speed-up Deep learning GPU-friendly
16	Reconnaissance de langages en temps réel par des automates cellulaires avec contraintes Borello, Alex 12 December 2011 This document deals with cellular automata as a model of computation used to recognise languages. In such a domain, it is always difficult to establish negative results, that is, typically, to prove that a given language is not recognised in some function of time by some class of automata. The document focuses in particular on low-complexity classes such as real time, about which many questions have remained open for several decades. In the first part, several techniques to weaken these classes of languages still further are investigated, thereby yielding examples of negative results. The second part is dedicated to the comparison of cellular automata with another model of language recognition, namely multi-head finite automata. This leads to a speed-up theorem when the finite automata are oblivious, which makes them a priori weaker than in the general case but leaves them nontrivial power.
Cellular automaton Language recognition Low-complexity classes Oblivious multi-head finite automaton Speed-up theorem
17	Analys av förutsättningar för småskalig vertikalaxlad vindkraft i byggd miljö : En förstudie åt AirSon Engineering AB Mesropyan, Diana, Espling, Joel January 2021 The aim of this study is to serve as a pre-study for AirSon Engineering AB regarding a small-scale wind turbine the company wants to install. This is done by collecting data on the wind speeds at the site, taking local regulations into consideration, and calculating the turbulence in the wind, which is affected by nearby obstacles and by the house next to which the wind turbine is planned to be installed. The study focuses on three main questions: What kind of production is to be expected? What are the economics of installing the wind turbine? What are the possibilities and limitations from a construction perspective? The study presents an analysis of the installation site and a comparison of the selected wind turbines with respect to their dimensions, production potential and economics. The emphasis of the analysis is on examining the wind turbines and determining which of them best fits AirSon with regard to all three aspects. Wind data were compiled into graphs using Matlab, and Excel was used to compile and present the results for the various wind turbines. A total of nine small-scale vertical-axis wind turbines with rated output powers between 1 kW and 10 kW have been examined and are presented as potential candidates for installation. The manufacturers whose wind turbines are presented are Aeolos, Toyoda and Ropatec. The study concludes with the authors' recommendation to AirSon of the wind turbine they consider the best fit. The study is furthermore intended as guidance, so that AirSon can follow up on it and work directly toward acquiring and installing the wind turbine.
Energy technology vertical axis wind power small scale wind power turbulence ridge speed-up effect Energy Engineering
18	Accelerated Deep Learning using Intel Xeon Phi Viebke, André January 2015 Deep learning, a sub-topic of machine learning inspired by biology, has recently attracted wide attention in industry and the research community. State-of-the-art applications in computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, deep learning algorithms learn from experience when performing a task. However, for the algorithm to learn it must be trained, which is computationally demanding. High Performance Computing can help ease the burden through parallelization, thereby reducing the training time; this is essential to fully utilize the algorithms in practice. Numerous works targeting GPUs have investigated ways to speed up training, but less attention has been paid to the Intel Xeon Phi coprocessor. In this thesis we present a parallelized implementation of a Convolutional Neural Network (CNN), a deep learning architecture, and our proposed parallelization scheme, CHAOS. Additionally, a theoretical analysis and a performance model discuss the algorithm in detail and allow for predictions if even more threads become available in the future. The algorithm is evaluated on an Intel Xeon Phi 7120p, a Xeon E5-2695v2 2.4 GHz and a Core i5 661 3.33 GHz using various architectures and thread counts on the MNIST dataset. Findings show a 103.5x, 99.9x and 100.4x speed-up for the large, medium and small architectures respectively for 244 threads compared to 1 thread on the coprocessor. Moreover, we observe a 10.9x to 14.1x (large to small) speed-up compared to the sequential version running on the Xeon E5. We managed to decrease training time from 7 days on the Core i5 and 31 hours on the Xeon E5 to 3 hours on the Intel Xeon Phi when training our large network for 15 epochs. Machine Learning Deep Learning Supervised Deep Learning Intel Xeon Phi Convolutional Neural Network CNN High Performance Computing CHAOS parallel computing coprocessor MIC speed up performance model evaluation Computer Sciences
19	Ladění výkonnosti databází / Database Performance Tuning Paulíček, Martin January 2011 The objective of this thesis was to study the problem of insufficient database processing performance and ways to improve performance through optimization of database configuration files, more powerful hardware and parallel processing. The master's thesis contains a description of relational databases, storage media and different forms of parallelism along with their use in database systems. It also describes the software developed for testing database performance. The program was used to test several database configuration files, various hardware, different database systems (PostgreSQL, Oracle) and the advantages of the parallel technique of partitioning. Test reports and evaluation results are described at the end of the thesis.
