  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Paralelní trénování neuronových sítí pro rozpoznávání řeči / Parallel Training of Neural Networks for Speech Recognition

Veselý, Karel January 2010 (has links)
This thesis deals with different parallelizations of the training procedure for artificial neural networks. The networks are trained as phoneme-state acoustic descriptors for speech recognition. Two effective parallelization strategies were implemented and compared. The first is data parallelization, where the training is split across several POSIX threads. The second is node parallelization, which uses the CUDA framework for general-purpose computing on modern graphics cards. The first strategy achieved a 4x speed-up, while with the second strategy we observed a nearly 10x speed-up. The Stochastic Gradient Descent algorithm with error backpropagation was used for the training. After a short introduction, the second chapter of this thesis presents the motivation and places neural networks in the context of speech recognition. The third chapter is theoretical: the anatomy of a neural network and the training method used are discussed. The following chapters focus on the design and implementation of the project and describe the phases of the iterative development. The last, extensive chapter describes the setup of the testing system and reports the experimental results. Finally, the obtained results are summarized and possible extensions of the project are proposed.
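The data-parallelization strategy described in this abstract — split the training data across threads, compute partial gradients concurrently, then combine them — can be sketched as follows. This is a hypothetical Python illustration, not the thesis's implementation (which uses POSIX threads and CUDA); `grad_fn` is an assumed placeholder for whatever computes a gradient on one data shard.

```python
import threading

def data_parallel_gradient(weights, shards, grad_fn):
    """One data-parallel step: each thread computes the gradient of its
    own data shard; the partial gradients are then averaged."""
    partial = [None] * len(shards)

    def worker(i):
        # each thread writes only its own slot, so no lock is needed
        partial[i] = grad_fn(weights, shards[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(shards))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # average the per-shard gradients component-wise
    return [sum(g[k] for g in partial) / len(shards)
            for k in range(len(weights))]
```

In real SGD training the averaged gradient would then be scaled by the learning rate and subtracted from the weights; CPython threads also limit true parallelism for pure-Python `grad_fn`, which is why the thesis's native-threaded and CUDA variants achieve the reported speed-ups.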
132

Applied Adaptive Optimal Design and Novel Optimization Algorithms for Practical Use

Strömberg, Eric January 2016 (has links)
The costs of developing new pharmaceuticals have increased dramatically during the past decades. Contributing to these increased expenses are the increasingly extensive and more complex clinical trials required to generate sufficient evidence regarding the safety and efficacy of the drugs. It is therefore of great importance to improve the effectiveness of the clinical phases by increasing the information gained throughout the process, so that the correct decision may be made as early as possible. Optimal Design (OD) methodology using the Fisher Information Matrix (FIM) based on Nonlinear Mixed Effect Models (NLMEM) has proven to be a useful tool for making more informed decisions throughout the clinical investigation. The calculation of the FIM for NLMEM, however, lacks an analytic solution and is commonly approximated by linearization of the NLMEM. Furthermore, two structural assumptions of the FIM are available: a full FIM, and a block-diagonal FIM which assumes that the fixed effects are independent of the random effects in the NLMEM. Once the FIM has been derived, it can be transformed into a scalar optimality criterion for comparing designs. The optimality criterion may be considered local, if the criterion is based on single point values of the parameters, or global (robust), where the criterion is formed over a prior distribution of the parameters. Regardless of design criterion, FIM approximation or structural assumption, the design will be based on the prior information regarding the model and parameters, and is thus sensitive to misspecification in the design stage. Model-based adaptive optimal design (MBAOD) has, however, been shown to be less sensitive to misspecification in the design stage. The aim of this thesis is to further the understanding and practicality of standard and MBAOD.
This is to be achieved by: (i) investigating how two common FIM approximations and the structural assumptions may affect the optimized design, (ii) reducing the runtimes of complex design optimization by implementing a low-level parallelization of the FIM calculation, (iii) further developing and demonstrating a framework for performing MBAOD, and (iv) investigating the potential advantages of using a global optimality criterion in the already robust MBAOD.
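The idea of turning the FIM into a scalar criterion for comparing designs can be illustrated on a deliberately simple case. The sketch below uses a plain linear model y = a + b·x with unit-variance errors (where the FIM is exactly XᵀX) and the local D-optimality criterion log det(FIM) — a far simpler setting than the linearized NLMEM FIM of the thesis, chosen only to make the design-comparison step concrete.

```python
import math

def fim_linear(design_points):
    """FIM for y = a + b*x with unit-variance errors: X^T X,
    where each design row of X is (1, x)."""
    n = len(design_points)
    s1 = sum(design_points)
    s2 = sum(x * x for x in design_points)
    return [[n, s1], [s1, s2]]

def d_criterion(fim):
    """log-determinant of a 2x2 FIM; larger means a more informative
    design (local D-optimality)."""
    det = fim[0][0] * fim[1][1] - fim[0][1] * fim[1][0]
    return math.log(det)

# spreading the two samples to the extremes of [0, 1] is more
# informative than clustering them around the middle
spread = d_criterion(fim_linear([0.0, 1.0]))
clustered = d_criterion(fim_linear([0.4, 0.6]))
```

A design optimizer would search over candidate `design_points` to maximize this criterion; a robust (global) variant would instead average the criterion over a prior distribution of the model parameters.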
133

HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL

Johnson, Buxton L., Sr. 01 January 2017 (has links)
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high-fidelity computational electromagnetics (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral-equation-based method-of-moments (MoM) code for frequency-domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI's CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI for efficient speedup of the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI's existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. The hybrid parallelization combines GEMINI's existing coarse-level MPI parallelization with fine-level OpenMP parallel threading. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
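The coarse/fine hybrid decomposition described here — MPI ranks each own a block of matrix rows, and threads within a rank fill individual rows — can be sketched in miniature. This is an assumed illustration in Python (the actual work uses MPI plus OpenMP in GEMINI's native code); thread pools stand in for both levels, and `entry_fn` is a placeholder for the impedance-matrix element computation.

```python
from concurrent.futures import ThreadPoolExecutor

def fill_matrix_hybrid(n, entry_fn, n_coarse=2, n_fine=2):
    """Fill an n x n matrix in two parallel levels: the coarse level
    splits the rows into interleaved blocks (stand-in for MPI ranks),
    and the fine level threads over the rows of each block
    (stand-in for OpenMP)."""
    matrix = [[0.0] * n for _ in range(n)]

    def fill_row(i):
        for j in range(n):
            matrix[i][j] = entry_fn(i, j)

    def fill_block(rows):
        # fine level: one worker per row within this block
        with ThreadPoolExecutor(max_workers=n_fine) as pool:
            list(pool.map(fill_row, rows))

    # coarse level: interleaved row blocks, one per "rank"
    blocks = [range(b, n, n_coarse) for b in range(n_coarse)]
    with ThreadPoolExecutor(max_workers=n_coarse) as pool:
        list(pool.map(fill_block, blocks))
    return matrix
```

Because every (i, j) entry is written by exactly one worker, no synchronization is needed during the fill — the property that makes matrix filling such a good target for hybrid parallelization.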
134

Analýza paralelizovatelnosti programů na základě jejich bytecode / Analysis of program parallelizability based on bytecode

Brabec, Michal January 2013 (has links)
Analysis of automatic program parallelization based on bytecode. There are many algorithms for automatic parallelization, and this work explores their possible application to programs based on their bytecode or similar intermediate code. All these algorithms require the identification of independent code segments, because if two parts of code do not interfere with one another, they can be run in parallel without any danger of data corruption. Dependence testing is an extremely complicated problem and, in the general case, it is not algorithmically solvable. However, independence can be discovered in special cases, and it can then serve as a basis for automatic parallelization, such as the use of vector instructions. The first step is function inlining, which allows the compiler to analyze the code more precisely, without unnecessary dependences caused by unknown functions. Next, it is necessary to identify all control-flow constructs, such as loops; after that the compiler can attempt to locate dependences between statements or instructions. Parallelization can be achieved only if the analysis discovers some independent parts of the code. This work is accompanied by an implementation of function inlining and code analysis for the .NET framework.
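The core independence check that the abstract describes — two code segments may run in parallel only if neither touches state the other writes — reduces, in its simplest form, to comparing read and write sets (Bernstein's conditions). A minimal sketch, assuming statements have already been abstracted to (reads, writes) pairs of variable names:

```python
def independent(stmt_a, stmt_b):
    """Two statements are independent (safe to run in parallel) iff
    neither one writes a location the other reads or writes."""
    reads_a, writes_a = stmt_a
    reads_b, writes_b = stmt_b
    return not (writes_a & (reads_b | writes_b) or writes_b & reads_a)

# statements modeled as (reads, writes) over variable names
s1 = ({"a"}, {"x"})   # x = f(a)
s2 = ({"b"}, {"y"})   # y = g(b)     -- disjoint state from s1
s3 = ({"x"}, {"z"})   # z = h(x)     -- flow-dependent on s1
```

Real bytecode analysis is much harder than this — aliasing, heap references, and unknown callees all blur the read/write sets, which is why the abstract stresses function inlining as a prerequisite — but this set-intersection test is the decision at the bottom of the pipeline.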
135

Automatic Parallelization using Pipelining for Equation-Based Simulation Languages

Lundvall, Håkan January 2008 (has links)
During the most recent decades, modern equation-based object-oriented modeling and simulation languages, such as Modelica, have become available. This has made it easier to build complex and more detailed models for use in simulation. To simulate such large and complex systems, it is sometimes not enough to rely on the ability of a compiler to optimize the simulation code and reduce the size of the underlying set of equations to speed up the simulation on a single processor. Instead, we must look for ways to utilize the increasing number of processing units available in modern computers. However, to gain any increased performance from a parallel computer, the simulation program must be expressed in a way that exposes the potential parallelism to the computer. Doing this manually is not a simple task, and most modelers are not experts in parallel computing. It is therefore very appealing to let the compiler parallelize the simulation code automatically. This thesis investigates techniques for automatically translating models in typical equation-based languages, such as Modelica, into parallel simulation code that enables high utilization of the available processors in a parallel computer. The two main ideas investigated are: first, to apply parallelization simultaneously to both the system equations and the numerical solver, and second, to use software pipelining to further reduce the time processors spend waiting for the results of other processors. Prototype implementations of the investigated techniques have been developed as part of the OpenModelica open-source compiler for Modelica. The prototype has been used to evaluate the parallelization techniques by measuring the execution time of test models on a few parallel architectures and comparing the results to sequential code as well as to the results achieved in earlier work. A measured speedup of 6.1 on eight processors on a shared-memory machine has been reached. It still remains to evaluate the methods for a wider range of test models and parallel architectures.
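The software-pipelining idea mentioned in the abstract — letting one stage start on the next work item while downstream stages are still busy with earlier ones — can be sketched with worker threads chained by queues. This is a generic illustration, not OpenModelica's generated code; the stages here are placeholder functions standing in for solver sub-steps.

```python
import queue
import threading

def pipeline(items, stages):
    """Run `stages` (a list of functions) as chained worker threads:
    stage i reads from queue i and writes to queue i+1, so stage i
    can start on item k+1 while stage i+1 still processes item k."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    SENTINEL = object()  # end-of-stream marker passed down the chain

    def run_stage(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=run_stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for x in items:
        qs[0].put(x)
    qs[0].put(SENTINEL)

    out = []
    while True:
        r = qs[-1].get()
        if r is SENTINEL:
            break
        out.append(r)
    for t in threads:
        t.join()
    return out
```

With one thread per stage and FIFO queues, results come out in input order, and the steady-state throughput is limited by the slowest stage rather than by the sum of all stages — the reduction in waiting time that pipelining buys.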
136

Analyzing OpenMP Parallelization Capabilities and Finding Thread Handling Optimums

Olofsson, Simon, Olsson, Emrik January 2018 (has links)
As physical limitations cap the clock frequencies available to a single thread, processor vendors increasingly build multi-core systems that divide work across multiple threads for increased overall processing power. To examine parallelization capabilities, a Fast Fourier Transform algorithm is used to benchmark parallel execution, comparing the brute-forced optimum with the results of various search algorithms and scenarios across three testbed systems. The tests use OpenMP instructions to specify the number of threads available for program execution. For smaller problem sizes the tests heavily favour fewer threads, whereas larger problems favour the native 'maximum' thread count. Several algorithms were compared as ways of searching for the optimal thread count at runtime. We showed that running at the maximum thread count is not always the optimal choice, as there is a clear relationship between the problem size and the optimal thread count in the experimental setup, across all three machines. The methodology also made it possible to adjust the thread count dynamically during a benchmark run; however, it is not certain that all applications are suitable for this type of dynamic thread assignment.
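The runtime search the abstract describes — finding the thread count that minimizes execution time without brute-forcing every value — can be illustrated with a toy cost model. The model below is an assumption for illustration only (parallel work shrinks with more threads while per-thread overhead grows); the thesis measures real FFT runtimes instead.

```python
def simulated_runtime(threads, work, overhead=2.0):
    """Assumed cost model: parallel share of the work plus a
    per-thread synchronisation cost that grows with thread count."""
    return work / threads + overhead * threads

def hill_climb(cost, lo, hi):
    """Greedy search: start at `lo` threads and add one thread
    while doing so still lowers the cost."""
    t = lo
    while t < hi and cost(t + 1) < cost(t):
        t += 1
    return t

def brute_force(cost, lo, hi):
    """Reference optimum: evaluate every thread count."""
    return min(range(lo, hi + 1), key=cost)
```

Because this cost model is convex in the thread count, the greedy climb finds the same optimum as brute force while evaluating far fewer candidates; on real, noisy runtime measurements the two can disagree, which is exactly what the benchmark comparison in the thesis quantifies.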
137

Análise dos caminhos de execução de programas para a paralelização automática de códigos binários para a plataforma Intel x86 / Analysis of the execution paths of programs to perform automatic parallelization of binary codes on the platform Intel x86

Eberle, André Mantini 06 October 2015 (has links)
Traditionally, computer programs have been developed using the sequential programming paradigm. With the advent of parallel computing systems, such as multi-core processors and distributed environments, the sequential paradigm became a barrier to the utilization of the available resources, since each program is restricted to a single processing unit. To address this issue, this master's work introduces a transparent, automatic parallelization methodology that operates directly on the binary code, using a binary rewriter.
The steps involved in the approach are: disassembly of an Intel x86 application and its translation into an intermediary language; analysis of this intermediary code to obtain flow and dependency graphs; partitioning of the application into parallel units using the obtained graphs; and reassembly of the application, writing it back to the original Intel x86 architecture. By transforming the compiled application, the approach aims to obtain a program that can exploit the parallel resources of multi-core computers, with no extra effort required from either users or developers.
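One step of the pipeline above — partitioning the application into parallel units once the dependency graph is known — amounts to a level-by-level topological grouping: units whose dependences are all satisfied by earlier groups can run concurrently. A hypothetical sketch (unit names and edges are invented for illustration; the thesis operates on graphs extracted from x86 binaries):

```python
from collections import defaultdict

def parallel_schedule(units, deps):
    """Group units into waves: every unit in a wave depends only on
    units in earlier waves, so the units of one wave can run in
    parallel. `deps` holds (a, b) pairs meaning a must finish before b."""
    indegree = {u: 0 for u in units}
    successors = defaultdict(list)
    for a, b in deps:
        successors[a].append(b)
        indegree[b] += 1

    wave = [u for u in units if indegree[u] == 0]
    schedule = []
    while wave:
        schedule.append(sorted(wave))
        nxt = []
        for u in wave:
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    nxt.append(v)
        wave = nxt
    return schedule
```

The number of waves bounds the critical path of the rewritten program, while the width of the widest wave bounds how many cores it can usefully occupy.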
138

Redução de perdas em sistemas de distribuição por reconfiguração de redes utilizando aceleradores de hardware / Reduction of losses in distribution systems by network reconfiguration using hardware accelerators

Gois, Marcilyanne Moreira 23 March 2017 (has links)
Network reconfiguration is a technique for changing network topologies by changing the states of normally open and normally closed switches. It is widely applied to problems of excessive ohmic losses in a power network; such losses represent a considerable cost for distribution companies. The loss-reduction problem via network reconfiguration can be modeled as a combinatorial optimization problem, in which one must determine the combination of switch states that yields the radial network configuration with the lowest level of losses. Several computational techniques have been proposed to deal with this problem. Among them, efficient data structures such as the Node-Depth Encoding (NDE) enable the radial modeling of distribution systems (DSs), and their combined use with optimization methods reduces the solution search space, which can yield better solutions. To increase processing capacity, this work addresses the loss-reduction problem in DSs via network reconfiguration on hardware accelerators, using the FPGA-parallelized, NDE-based hardware architecture (HP-NDE) proposed in (GOIS, 2011). A combinatorial problem is thus handled in hardware accelerators, significantly reducing the computational cost thanks to the high degree of parallelism in the search for solutions.
In this context, this work proposes an extension of the HP-NDE, modifying the communication bus of the original architecture to send and receive the data representing the DSs more efficiently. Moreover, the loss-reduction problem was mapped onto a degree-constrained minimum spanning forest problem (dc-MSFP), using an approximation based on a weight heuristic in which information about the electrical magnitudes and topological characteristics of the network is transformed into edge weights. With the extended HP-NDE and the dc-MSFP mapping, it was possible to obtain good-quality solutions (close to optimal) in significantly reduced time compared to other approaches.
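The combinatorial core of the problem — choose which switches to close so that the network stays radial (a spanning tree) and the losses are minimal — can be shown on a toy feeder. This sketch is an assumption-laden simplification: it scores a configuration by the plain sum of in-service branch resistances, whereas the thesis uses power-flow-based losses, NDE data structures, and FPGA acceleration; it exists only to make the "enumerate radial configurations, pick the cheapest" structure concrete.

```python
from itertools import combinations

def radial_configs(nodes, switches):
    """Yield every closed-switch set that leaves the network radial:
    exactly n-1 acyclic branches over n nodes form a spanning tree.
    Switches are (node_a, node_b, resistance) triples."""
    n = len(nodes)
    for closed in combinations(switches, n - 1):
        parent = {v: v for v in nodes}  # union-find forest

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        acyclic = True
        for a, b, _ in closed:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False  # closing this switch creates a loop
                break
            parent[ra] = rb
        if acyclic:
            yield closed

def losses(closed):
    # crude stand-in for ohmic losses: total in-service resistance
    return sum(r for _, _, r in closed)

def best_config(nodes, switches):
    """Brute-force search over radial configurations."""
    return min(radial_configs(nodes, switches), key=losses)
```

The search space grows combinatorially with the number of switches, which is precisely why the thesis moves this evaluation loop into parallel hardware rather than enumerating it sequentially as done here.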
