51

Empirically Examining the Roadblocks to the Automatic Parallelization and Analysis of Open Source Software Systems

Alnaeli, Saleh M. 20 April 2015 (has links)
No description available.
52

Parallelization of the HIROMB ocean model

Wilhelmsson, Tomas January 2002 (has links)
No description available.
53

Arimaa challenge - statická ohodnocovací funkce / Arimaa challenge - static evaluation function

Hřebejk, Tomáš January 2014 (has links)
Arimaa is a strategic board game for two players. It was designed with the aim that it would be hard to create a computer program able to defeat the best human players. In this thesis, we focus on the design of a static evaluation function for Arimaa. The purpose of a static evaluation function is to determine which player is leading in a given position and how significant the lead is. We divided the problem into several parts, which were solved separately. We paid most attention to the efficient recognition of important patterns on the board, such as goal threats. The basic element of the proposed evaluation function is mobility: for each piece, the number of steps that the piece would need to reach other squares on the board is estimated. We also examined machine learning and developed a new algorithm for learning a static evaluation function from expert games. An implementation of an Arimaa-playing program that demonstrates the proposed methods is part of the thesis.
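As an illustrative aside (not code from the thesis), the mobility idea can be sketched as a breadth-first search that counts how many steps a single piece would need to reach each square of the board; the function name, the 8x8 board and the set of blocked squares below are hypothetical choices, and real Arimaa mobility would also have to account for freezing, traps and pushing/pulling.

    from collections import deque

    def mobility_steps(board_size, start, blocked):
        # Estimate, for one piece, how many steps it needs to reach every
        # square of a board_size x board_size board, ignoring squares in
        # `blocked`.  Returns a dict mapping square -> step count.
        steps = {start: 0}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if (0 <= nxt[0] < board_size and 0 <= nxt[1] < board_size
                        and nxt not in blocked and nxt not in steps):
                    steps[nxt] = steps[(r, c)] + 1
                    queue.append(nxt)
        return steps

    # Example: a piece on (3, 3) with two adjacent squares blocked.
    steps = mobility_steps(8, (3, 3), {(3, 4), (4, 3)})
    print(len(steps), "reachable squares, max distance", max(steps.values()))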
54

Energy efficient design of an adaptive switching algorithm for the iterative-MIMO receiver

Mohd Tadza, Noor Zahrinah Binti January 2015 (has links)
An efficient design dedicated to iterative multiple-input multiple-output (MIMO) receiver systems is now imperative, since data demands in wireless networks are increasing tremendously. This puts a massive burden on signal processing power, especially in small receiver systems where power sources are often shared or limited. This thesis proposes an attractive solution to both the wireless signal processing and the architectural implementation sides of the problem. A novel algorithm, dubbed the Adaptive Switching Algorithm, is proven not only to save more than a third of the energy consumption in the algorithmic design, but also to achieve an energy reduction of more than 50% in terms of processing power when the design is mapped onto state-of-the-art programmable hardware. Simulations are based in Matlab™ using a Monte Carlo approach, where multiple additive white Gaussian noise (AWGN) and Rayleigh fading channels for both fast and slow fading environments were investigated. The software selects the appropriate detection algorithm depending on the current channel conditions. The hardware design is based on the latest field-programmable gate array (FPGA) hardware from Xilinx®, specifically the Virtex-5 and Virtex-7 chipsets, chosen during the experimental phase to verify the results and to examine trends in the energy consumption of the proposed algorithm design. Savings come from dynamic allocation of hardware resources by applying power-minimization techniques depending on the processing requirements of the system. Having demonstrated the feasibility of the algorithm in controlled environments, realistic channel conditions were simulated using spatially correlated MIMO channels to test the algorithm's readiness for real-world deployment. The proposed algorithm is placed in both the MIMO detector and the iterative-decoder blocks of the receiver. The final full receiver design shows that the key to energy saving lies in the fact that both the software and hardware components of the Adaptive Switching Algorithm adopt adaptivity in their respective designs. The detector saves energy by selecting suitable detection schemes while the decoder provides adaptivity by limiting the number of decoding iterations, both updated in real time. The overall receiver can achieve more than 70% energy savings in comparison to state-of-the-art iterative-MIMO receivers, so it can be concluded that this level of 'intelligence' is an important direction towards more efficient iterative-MIMO receiver designs in the future.
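A minimal sketch (an assumption about the general idea, not the thesis implementation) of how such switching might look in software: a cheap detection scheme is selected when the estimated channel quality is good, a more robust one otherwise, and the decoder stops iterating once its updates become small; the scheme names, thresholds and the toy update rule are purely illustrative.

    def select_detector(snr_db, good_threshold_db=15.0):
        # Pick a cheaper detection scheme when the estimated SNR is good,
        # a more robust (and more power-hungry) one otherwise.
        return "linear_mmse" if snr_db >= good_threshold_db else "sphere_decoder"

    def iterative_decode(llrs, max_iters=8, early_stop=0.05):
        # Toy iterative loop: stop as soon as the update becomes small,
        # which caps the number of iterations (and hence the energy spent).
        estimate = list(llrs)
        used = 0
        for _ in range(max_iters):
            update = [0.5 * x for x in estimate]       # placeholder update rule
            delta = max(abs(a - b) for a, b in zip(update, estimate))
            estimate = update
            used += 1
            if delta < early_stop:
                break
        return estimate, used

    scheme = select_detector(snr_db=18.2)
    _, iterations = iterative_decode([1.2, -0.4, 0.7])
    print(scheme, "decoder iterations used:", iterations)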
55

Parallelization of backward deleted distance calculation in graph based features using Hadoop

Pillamari, Jayachandran January 1900 (has links)
Master of Science / Department of Computing & Information Sciences / Daniel Andresen / This project presents an approach to parallelizing the calculation of Backward Deleted Distance (BDD) in Graph Based Features (GBF) computation using Hadoop. The issues involved in calculating BDD are identified, and parallel computing technologies such as Hadoop are applied to solve them. The project introduces a new algorithm to parallelize the APSP problem in BDD calculation using Hadoop's MapReduce feature, and is implemented in Java and Hadoop technologies. The aim is to parallelize the calculation of BDD and thereby reduce GBF computation time. The process of BDD calculation was examined to identify the key places where it could be parallelized. Since BDD calculation involves computing the shortest paths between all pairs of given users, it can be viewed as an All Pairs Shortest Path (APSP) problem. The internal structure and implementation of the Hadoop MapReduce framework was studied and applied to the APSP problem. GBF features are one of the feature sets used in Ontology classifiers. In this project, GBF features are used to predict the friendship relationship between users whose direct link is deleted. The computation involves calculating the BDD between all pairs of users. The BDD for a user pair is the length of the shortest path between them when their direct link is deleted; in other words, the shortest distance between them other than the direct path. The project uses train and test data sets consisting of positive and negative instances: positive instances are user pairs having a friendship link between them, whereas negative instances have no direct link. Apache Hadoop is an emerging technology for scalable, distributed computing across clusters of computers; its MapReduce framework is used to develop applications that process large amounts of data in parallel on large clusters. The project was developed and implemented successfully and was tested for reliability and performance. Different data sets were used in this testing, considering various factors and typical graph representations, and the test results were analyzed to predict the behavior of the system. The results show that the system achieves good speedup, reducing processing time from 10 hours to 20 minutes.
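To make the BDD definition concrete, here is a small single-machine sketch (not the Hadoop implementation described above): delete the direct edge between a user pair and take the length of the shortest remaining path; the graph representation and function name are hypothetical.

    from collections import deque

    def backward_deleted_distance(adj, u, v):
        # Shortest-path length between u and v in an unweighted graph `adj`
        # (dict: node -> set of neighbours) after deleting the direct edge
        # u-v.  Returns None if no alternative path exists.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if {node, nbr} == {u, v}:      # skip the deleted direct link
                    continue
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    if nbr == v:
                        return dist[nbr]
                    queue.append(nbr)
        return dist.get(v)

    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(backward_deleted_distance(graph, 1, 2))   # 2, via node 3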
56

Fuzzy Gradual Pattern Mining Based on Multi-Core Architectures / Fouille de motifs graduels flous basée sur architectures multi-coeur

Quintero Flores, Perfecto Malaquias 19 March 2013 (has links)
Gradual patterns aim at describing co-variations in data, such as "the older, the higher the salary". They have been increasingly studied from the data mining point of view in recent years, leading to several ways of defining their meaning and several algorithms to automatically extract them. These definitions and algorithms consider that data can be ordered with respect to the values taken on the attributes (e.g. age and salary). However, in many application domains it is hardly possible to consider that data values are crisply ordered. For instance, when considering gene expression, it makes no sense from a biological point of view to say that Gene 1 is more expressed than Gene 2 if their expression levels differ only in the tenth decimal. This thesis therefore considers fuzzy orderings and proposes both formal definitions and algorithms to extract gradual patterns under fuzzy orderings. As these algorithms are both time and memory consuming, optimizations are proposed based on an efficient storage of the fuzzy ordering information coupled with parallel algorithms. Experimental results on synthetic and real databases show the interest of the proposal.
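As a hedged illustration of the fuzzy-ordering idea (assumed shapes, not the thesis's exact membership functions): instead of a crisp x > y test, a degree in [0, 1] that stays low for near-ties and ramps up once the difference exceeds a tolerance, combined across attributes with a minimum t-norm.

    def fuzzy_greater(x, y, tol):
        # Degree in [0, 1] to which x is "really greater" than y: near-ties
        # (difference below `tol`) only count partially.  Illustrative ramp.
        diff = x - y
        if diff <= 0:
            return 0.0
        return min(1.0, diff / tol)

    # Degree of the gradual pattern "the higher A, the higher B" on one pair
    # of rows, combining per-attribute degrees with a minimum t-norm.
    row1 = {"A": 30.0, "B": 2000.0}
    row2 = {"A": 30.2, "B": 2600.0}
    degree = min(fuzzy_greater(row2["A"], row1["A"], tol=0.5),
                 fuzzy_greater(row2["B"], row1["B"], tol=100.0))
    print(round(degree, 2))   # 0.4 -- limited by the near-tie on attribute A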
57

Avaliação de métodos de paralelização automática. / Evaluation of automatic parallelization methods.

Ferlin, Edson Pedro 24 March 1997 (has links)
This work presents some concepts and definitions of parallel processing that are applied to automatic parallelization, together with the analyses and conditions for data dependences, in order to apply the parallelization methods: Hyperplane, Unimodular Transformation, Communication-Free Data Allocation, and Partitioning & Labeling. In this way, a sequential program is transformed into its parallel equivalent. The resulting programs are run on a distributed-memory system with communication through MPI (Message-Passing Interface) message passing, and some metrics are obtained to evaluate and compare the methods.
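Purely as an illustration of the target execution model (distributed memory with MPI message passing), and not one of the four methods themselves, a tiny Python/mpi4py sketch that distributes loop iterations cyclically over the ranks and combines the partial results with a reduction; the workload and distribution are hypothetical.

    # Run with e.g.: mpiexec -n 4 python partial_sums.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    N = 1_000_000
    # Cyclic distribution of the iteration space over the ranks; each rank
    # computes a partial sum locally and only communicates at the end.
    local = sum(i * i for i in range(rank, N, size))

    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares:", total)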
58

Parallel-Node Low-Density Parity-Check Convolutional Code Encoder and Decoder Architectures

Brandon, Tyler 06 1900 (has links)
We present novel architectures for parallel-node low-density parity-check convolutional code (PN-LDPC-CC) encoders and decoders. Based on a recently introduced implementation-aware class of LDPC-CCs, these encoders and decoders take advantage of increased node parallelization to simultaneously decrease the energy-per-bit and increase the decoded information throughput. A series of progressively improved encoder and decoder designs are presented and characterized using synthesis results with respect to power, area and throughput. The best of the encoder and decoder designs significantly advance the state-of-the-art in terms of both the energy-per-bit and throughput/area metrics. One of the presented decoders, at an Eb/N0 of 2.5 dB, has a bit-error rate of 10⁻⁶, occupies 4.5 mm² in a 90-nm CMOS process, and achieves an energy per decoded information bit of 65 pJ and a decoded information throughput of 4.8 Gbit/s. We implemented an earlier non-parallel-node LDPC-CC encoder, decoder and channel emulator in silicon. Via two sets of tables, we give readers the ability to look up our decoder hardware metrics, across four different process technologies, for over 1000 variations of our PN-LDPC-CC decoders. By imposing practical decoder implementation constraints on power or area, which in turn drive trade-offs in code size versus the number of decoder processors, we compare the code BER performance. An extensive comparison to known LDPC-BC/CC decoder implementations is provided.
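For readers unfamiliar with the energy-per-bit metric, a quick back-of-the-envelope check of how the reported figures relate (an assumption about the metric's definition, not a statement from the thesis): energy per decoded information bit is average power divided by decoded throughput, so 65 pJ/bit at 4.8 Gbit/s implies a decoder power of roughly 0.3 W.

    # Hypothetical sanity check relating the reported figures.
    energy_per_bit = 65e-12   # J per decoded information bit (65 pJ)
    throughput = 4.8e9        # decoded information bits per second
    power = energy_per_bit * throughput
    print(f"implied decoder power: {power:.2f} W")   # about 0.31 W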
59

Hybrid Time-Domain Methods and Wire Models for Computational Electromagnetics

Ledfelt, Gunnar January 2001 (has links)
No description available.
60

Automatic Parallelization of Equation-Based Simulation Programs

Aronsson, Peter January 2006 (has links)
Modern equation-based object-oriented modeling languages which have emerged during the past decades make it easier to build models of large and complex systems. The increasing size and complexity of modeled systems requires high-performance execution of the simulation code derived from such models. More efficient compilation and code optimization techniques can help to some extent, but a number of heavy-duty simulation applications require the use of high-performance parallel computers in order to obtain acceptable execution times. Unfortunately, the additional performance offered by parallel computer architectures requires the simulation program to be expressed in a way that makes the potential parallelism accessible to the parallel computer. Manual parallelization of computer programs is generally a tedious and error-prone process. Therefore, it would be very attractive to achieve automatic parallelization of simulation programs. This thesis presents solutions to the research problem of finding practically usable methods for automatic parallelization of simulation codes produced from models in typical equation-based object-oriented languages. The methods have been implemented in a tool that automatically translates models in the Modelica modeling language into parallel code which can be efficiently executed on parallel computers. The tool has been evaluated on several application models. The research problem includes the problem of how to extract a sufficient amount of parallelism from equations represented in the form of a data dependency graph (task graph), requiring analysis of the code at a level as detailed as individual expressions. Moreover, efficient clustering algorithms for building clusters of tasks from the task graph are also required. One of the major contributions of this thesis is a new approach for merging fine-grained tasks by using a graph rewrite system. Results from using this method show that it is efficient in merging task graphs, thereby decreasing their size, while still retaining a reasonable amount of parallelism. Moreover, the new task-merging approach is generally applicable to programs which can be represented as static (or almost static) task graphs, not only to code from equation-based models. An early prototype called DSBPart was developed to parallelize code produced by the Dymola tool. The final research prototype is the ModPar tool, which is part of the OpenModelica framework. Results from using the DSBPart and ModPar tools show that the amount of parallelism in complex models varies substantially between different application models and in some cases can produce reasonable speedups. Also, different optimization techniques applied to the system of equations from a model affect the amount of parallelism of the model and thus influence how much is gained by parallelization.
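A hedged sketch of one simple task-merging rule in the spirit of the clustering step described above (not the actual graph rewrite system of the thesis): contract linear chains of the task graph by merging a task into its sole successor whenever that successor has no other predecessor, which shrinks the graph while keeping independent chains, and hence parallelism, intact; the data structures and names are illustrative.

    def merge_linear_chains(succ):
        # Merge tasks along edges where a task has exactly one successor and
        # that successor has exactly one predecessor.  `succ` maps each task
        # to the set of its successors (a DAG).  Returns a dict mapping each
        # task to a representative of the merged cluster it belongs to.
        pred = {t: set() for t in succ}
        for t, outs in succ.items():
            for s in outs:
                pred[s].add(t)

        cluster = {t: t for t in succ}

        def find(t):
            while cluster[t] != t:
                t = cluster[t]
            return t

        for t, outs in succ.items():
            if len(outs) == 1:
                (s,) = outs
                if len(pred[s]) == 1:          # t is s's only predecessor
                    cluster[find(s)] = find(t)
        return {t: find(t) for t in succ}

    # Two independent chains: a -> b -> c and d -> e.
    tasks = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"e"}, "e": set()}
    print(merge_linear_chains(tasks))   # each chain collapses to one cluster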
