51

Paralelização de programas SISAL para sistemas MPI / Parallelization of SISAL programs for MPI systems

Nakashima, Raul Junji 15 March 1996 (has links)
This work implements a method for the partial parallelization of programs written in the functional language SISAL into programs with calls to routines of the MPI (Message Passing Interface) standard. The SISAL programs are transformed by partitioning the parallel forall loop, slicing its index range, and the generated code follows the SPMD (Single Program Multiple Data) model in a master/slave style. The proposal was validated by compiling a set of simple SISAL programs and comparing the results of the modified programs against those of the unmodified versions.
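The slicing idea is simple enough to sketch. Below is a minimal illustration in Python with mpi4py (the thesis itself generates C with MPI calls from SISAL, so everything here, including the loop body, is a hypothetical stand-in): each rank computes one contiguous slice of the forall index range, and a master rank gathers and reassembles the result.

```python
# Minimal sketch of slice partitioning for a data-parallel (forall) loop,
# written with mpi4py for illustration. The thesis generates C with MPI
# calls from SISAL source; the loop body here is a hypothetical stand-in.
from mpi4py import MPI

def body(i):
    # Stand-in for the pure forall loop body.
    return i * i

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
N = 1000  # loop bounds, known to every rank (SPMD style)

# Slice partitioning: each rank takes one contiguous slice of the index range.
lo = rank * N // size
hi = (rank + 1) * N // size
partial = [body(i) for i in range(lo, hi)]

# Master/slave flavour: rank 0 gathers the slices and reassembles the array.
slices = comm.gather(partial, root=0)
if rank == 0:
    result = [x for s in slices for x in s]
    assert result == [body(i) for i in range(N)]  # matches the sequential run
```

Run with, for example, `mpiexec -n 4 python slice_demo.py`; with a single process it degenerates to the sequential program, which mirrors how the thesis validates the transformed programs against the originals.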
52

Avaliação dos tempos de teste de produtos eletrônicos / Evaluation of electronic product test times

Ferraz, Leandro Alves 08 February 2010 (has links)
The introduction of the production-line concept by Henry Ford brought a great gain in the efficiency of resource usage and transformation during the industrial process. Later, Ohno (1988) argued that all waste should be removed from the process, explaining that waste is any activity that does not add value to the final product. Since a production test adds no value to the final product, serving only to assure its reliability, this work develops a methodology to identify and reduce the longest test times in an electronic product test sequence without lowering product quality. By analyzing how the test instruments are used, candidate test steps can be identified for concurrent execution (parallelism); running two or more tests in parallel reduces the total test time for the product. A baseline of test times and results was catalogued to serve as a benchmark, and statistical process-control indices (the mean and standard deviation of the results, together with the Cpk index over the test acceptance limits) are used to confirm both the efficiency and the efficacy of the methodology. Besides the methodology itself, the work also describes the analysis and automation tools developed during the time-optimization project. The methodology proved effective: not only were test times reduced, but the results remained equally reliable.
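The acceptance criterion rests on standard process-capability statistics. As a worked illustration (the data and limits below are hypothetical), Cpk is computed as min((USL - mean)/(3*sd), (mean - LSL)/(3*sd)), and the parallelized sequence is accepted only if its Cpk does not degrade relative to the sequential baseline:

```python
# Worked illustration of the acceptance criterion: compare the Cpk of a
# test step before and after parallelization. Data and limits are
# hypothetical; Cpk = min((USL - mean)/(3*sd), (mean - LSL)/(3*sd)).
import statistics

def cpk(samples, lsl, usl):
    mu = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation
    return min((usl - mu) / (3 * sd), (mu - lsl) / (3 * sd))

# Measurements of one test step run sequentially vs. in parallel with
# another test that uses different instruments.
sequential = [3.31, 3.29, 3.30, 3.33, 3.28]
parallel   = [3.30, 3.32, 3.29, 3.31, 3.30]
LSL, USL = 3.0, 3.6  # test acceptance limits

# Accept the parallelized sequence only if capability does not degrade.
print(cpk(sequential, LSL, USL), cpk(parallel, LSL, USL))
```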
53

Parallelism with limited nondeterminism

Finkelstein, Jeffrey 05 March 2017 (has links)
Computational complexity theory studies which computational problems can be solved with limited access to resources. The past fifty years have seen a focus on the relationship between intractable problems and efficient algorithms. However, the relationship between inherently sequential problems and highly parallel algorithms has not been as well studied. Are there efficient but inherently sequential problems that admit some relaxed form of highly parallel algorithm? In this dissertation, we develop the theory of structural complexity around this relationship for three common types of computational problems. Specifically, we show tradeoffs between time, nondeterminism, and parallelizability. By clearly defining the notions and complexity classes that capture our intuition for parallelizable and sequential problems, we create a comprehensive framework for rigorously proving parallelizability and non-parallelizability of computational problems. This framework provides the means to prove whether otherwise tractable problems can be effectively parallelized, a need highlighted by the current growth of multiprocessor systems. The views adopted by this dissertation—alternate approaches to solving sequential problems using approximation, limited nondeterminism, and parameterization—can be applied practically throughout computer science.
54

Design Issues in Parallel Architecture for Artificial Intelligence

Hewitt, Carl, Lieberman, Henry 01 November 1983 (has links)
Development of highly intelligent computers requires a conceptual foundation that will overcome the limitations of the von Neumann architecture. Architectures for such a foundation should meet the following design goals:

* Address the fundamental organizational issues of large-scale parallelism and sharing in a fully integrated way. This means attention to organizational principles, as well as hardware and software.
* Serve as an experimental apparatus for testing large-scale artificial intelligence systems.
* Explore the feasibility of an architecture based on abstractions, which serve as natural computational primitives for parallel processing. Such abstractions should be logically independent of their software and hardware host implementations.

In this paper we lay out some of the fundamental design issues in parallel architectures for Artificial Intelligence, delineate limitations of previous parallel architectures, and outline a new approach that we are pursuing.
55

A high performance pseudo-multi-core elliptic curve cryptographic processor over GF(2^163)

Zhang, Yu 22 June 2010 (has links)
An elliptic curve cryptosystem is a type of public-key system that can provide the same security level as Rivest, Shamir and Adleman (RSA) with a smaller key size. The keys of elliptic curve cryptography (ECC) can therefore be more compact, which brings advantages in circuit area, memory requirements, power consumption, performance, and bandwidth. However, compared to a private-key system such as the Advanced Encryption Standard (AES), ECC is still much more complicated and computationally intensive, so real applications usually combine a private-key system with a public-key system to achieve high performance. The ultimate goal of this research is to architect a high-performance ECC processor for demanding applications such as network servers and cellular sites.

In this thesis, a high-performance processor for ECC over the Galois field GF(2^163) using polynomial representation is proposed. It has three finite-field (FF) reduced-instruction-set-computer (RISC) cores and a main controller that together achieve instruction-level parallelism (ILP) with pipelining, so that the highly parallelized algorithm for elliptic curve point multiplication (PM) maps well onto the platform. Instructions for combined FF operations are proposed to reduce the clock-cycle count of the instruction set. The interconnection among the three FF cores and the main controller is derived by analyzing the data dependencies in the parallelized algorithm. A five-stage pipeline is employed, and the microcode executed on the three FF cores is manually optimized to save clock cycles. The proposed design reaches 185 MHz with 20,807 slices when implemented on a Xilinx XC4VLX80 FPGA device, and 263 MHz with 217,904 gates when synthesized with TSMC 0.18 µm CMOS technology. The implementation completes one ECC PM in 1428 cycles and is 1.3 times faster than the fastest implementation over GF(2^163) reported in the literature, while consuming 14.6% less area on the same FPGA device.
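To make the arithmetic concrete: in polynomial representation, elements of GF(2^163) are 163-bit vectors, addition is XOR, and multiplication is a carry-less multiply followed by reduction. The sketch below is a plain software model of that field arithmetic, assuming the NIST pentanomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 as the reduction polynomial (the thesis may use a different irreducible); it says nothing about the processor's three-core ILP design.

```python
# Plain software model of the field arithmetic the processor implements
# in hardware: GF(2^163) in polynomial (bit-vector) representation,
# assuming the NIST pentanomial f(x) = x^163 + x^7 + x^6 + x^3 + 1.
# Addition is XOR; multiplication is carry-less multiply then reduce.
M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # reduction polynomial

def gf_mul(a, b):
    """Carry-less multiplication followed by reduction modulo f(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    # Fold every bit of degree >= 163 back down using f(x).
    for bit in range(2 * M - 2, M - 1, -1):
        if (r >> bit) & 1:
            r ^= F << (bit - M)
    return r

def gf_sqr(a):
    return gf_mul(a, a)

# Sanity check: x * x equals x^2 in the field.
x = 0b10  # the polynomial "x"
assert gf_mul(x, x) == gf_sqr(x) == 0b100
```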
57

Parallelism and Epistasis in the de novo Evolution of Cooperation between Two Species

Douglas, Sarah Michael 06 June 2014 (has links)
Resolving the genetic and mechanistic bases of complex biological behaviors remains a central challenge in the post-genomic era. Among these is the emergence of interspecies cooperation, a feature common across levels of biological organization. Of the numerous examples afforded by nature, microbes arguably provide the greatest ability to connect underlying genotypes to cooperative phenotypes.
58

Software-centric and interaction-oriented system-on-chip verification.

Xu, Xiao Xi January 2009 (has links)
As the complexity of very-large-scale integrated circuits (VLSI) soars, the complexity of verifying them increases even faster. Design verification has become the biggest bottleneck in VLSI design, consuming around 70% of the effort and time in a typical design cycle. The problem is even more severe as the system-on-chip (SoC) design paradigm gains popularity. Unfortunately, the development of verification techniques has not kept up with the growth of design capability, and is being left further behind in the SoC era. In recent years, a new generation of hardware modelling languages, alongside best practices for using them, has emerged and evolved in an attempt to productively build an intelligent stimulation-observation environment referred to as the test-bench. Ironically, even as test-benches become more powerful and sophisticated under these best practices, known as verification methodologies, the overall verification approaches of today are still officially described as ad hoc and experimental, and are in great need of a methodological breakthrough. Our research was carried out to seek that breakthrough, and this thesis presents the outcome: a novel and holistic methodology that addresses the SoC verification problems. Furthermore, the methodology is completely independent of the underlying simulation technologies and could therefore extend its applicability to future VLSI designs. The methodology rests on two ideas. (a) We propose that system-level verification should resort to the SoC-native languages rather than the test-bench construction languages; the software native to the SoC should take on more critical responsibilities than the test-benches. (b) We challenge the fundamental assumption that “objects-under-test” and “tests” are distinct entities; instead, they should be understood as one type of entity, the interactions; interactions, together with the interference between interactions, i.e., the parallelism and resource competition, should be treated as the focus of system-level verification. These two ideas, namely software-centric verification and interaction-oriented verification, have yielded practical techniques. This thesis elaborates on them, including the transfer-resource-graph based test-generation method targeting parallelism, coverage measures of concurrency completeness using Petri nets, the automation of test programs that execute smartly in an event-driven manner, and a software observation mechanism that gives insight into system-level behaviours. / Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 2009
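To give a flavour of what interaction-oriented test generation can look like, the sketch below is one plausible reading, not the thesis's actual algorithm, and all names in it are hypothetical: transfers are modelled as (initiator, resource, target) tuples, and pairs of transfers that contend for a shared resource are selected, since such pairs exercise exactly the parallelism and resource competition the methodology targets.

```python
# Hedged sketch of interaction-oriented test generation: one plausible
# reading of pairing transfers that contend for a shared resource, so a
# generated test exercises parallelism and resource competition. This is
# not the thesis's actual algorithm; all names below are hypothetical.
from itertools import combinations

# Each transfer on a hypothetical SoC: (initiator, resource, target).
transfers = [
    ("cpu", "bus",  "sram"),
    ("dma", "bus",  "flash"),
    ("cpu", "bus",  "uart"),
    ("dma", "sram", "uart"),
]

def competing_pairs(transfers):
    """Yield pairs of transfers that contend for the same resource but
    have different initiators, so both can genuinely run in parallel."""
    for a, b in combinations(transfers, 2):
        if a[1] == b[1] and a[0] != b[0]:
            yield a, b

for a, b in competing_pairs(transfers):
    print(f"test: run {a[0]}->{a[2]} concurrently with {b[0]}->{b[2]}, "
          f"contending on {a[1]}")
```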
59

Exploração de paralelismo no roteamento global de circuitos VLSI / Parallel computing exploitation applied for VLSI global routing

Tumelero, Diego January 2015 (has links)
With the increasing functionality of integrated circuits comes a corresponding increase in design complexity. The IC design flow includes routing as one of its steps, which creates the wires that interconnect the circuit's cells. Because of its complexity, routing is divided into global and detailed routing. Global routing of VLSI circuits is one of the most complex tasks in the physical synthesis flow and is classified as an NP-complete problem. In this work, besides surveying the main parallel computing techniques applied to accelerate global routing, we analyzed the ISPD 2007/08 benchmark files. Based on these analyses, we propose a method that groups the nets and then checks for data dependencies within each group. This data-dependency check, which we call the collider, aims to create streams of nets that are independent of one another for parallel processing, that is, to support the independent routing of nets. The results show that this separation into groups, combined with concurrent comparison of the groups, can reduce the collider's execution time by a factor of 67 compared with the sequential version without groups. A 10x gain was also obtained when comparing the sequential grouped version with the parallel version.
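The collider idea can be sketched compactly. Assuming bounding-box overlap as a stand-in for the thesis's data-dependency test (the actual criterion is defined over the ISPD 2007/08 routing grids), the fragment below greedily packs nets into streams such that nets within a stream never collide and can therefore be routed in parallel:

```python
# Sketch of the "collider" idea: pack nets into streams of mutually
# independent nets that can be routed in parallel. Bounding-box overlap
# stands in for the thesis's data-dependency test, which is defined
# over the ISPD 2007/08 routing grids; the data here is illustrative.

def overlap(a, b):
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a, b
    return not (ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1)

# net name -> bounding box (x1, y1, x2, y2) of its pins
nets = {"n1": (0, 0, 3, 3), "n2": (5, 5, 8, 8), "n3": (2, 2, 6, 6)}

# Greedy grouping: a net joins the first stream where it collides with
# no already-placed net; otherwise it opens a new stream.
streams = []
for name, box in nets.items():
    for stream in streams:
        if all(not overlap(box, nets[other]) for other in stream):
            stream.append(name)
            break
    else:
        streams.append([name])

print(streams)  # [['n1', 'n2'], ['n3']]: each stream routes in parallel
```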
