31

Designing a Prototype of Communication Protocol for FlexRAM Architecture

Liu, Hsien-Ming 26 July 2001 (has links)
In recent years, many researchers have investigated a new class of computer architecture, intelligent memory (IRAM), to reduce the performance gap between the CPU and memory. To increase the flexibility of designing parallel applications, we develop communication mechanisms for FlexRAM, one of the IRAM architectures. The resulting protocol is called CPFR (Communication Protocol for FlexRAM). Because the original FlexRAM architecture lacks a complete communication mechanism, we construct CPFR from shared-memory features and a notification primitive centrally controlled by the main memory processor (P.Mem). In addition, CPFR provides a uniform programming interface for programmers. An example is used to demonstrate the usage of the communication protocol in detail.
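The abstract describes CPFR only at a high level. As a rough, hypothetical illustration of the general pattern it names (a shared-memory channel plus a notification primitive mediated by a central coordinator standing in for P.Mem), a C sketch might look like the following. All names (cpfr_channel, sender, receiver) are invented for this sketch and are not the actual CPFR or FlexRAM interface.

```c
/* Hypothetical sketch of a shared-memory channel with a notification
 * primitive mediated by a central coordinator (modeled here by ordinary
 * threads).  Names such as cpfr_channel are illustrative only; they are
 * not the actual CPFR/FlexRAM interface. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    int             payload;   /* data placed in the shared region        */
    int             ready;     /* notification flag                       */
    pthread_mutex_t lock;
    pthread_cond_t  cv;
} cpfr_channel;

static cpfr_channel chan = { 0, 0, PTHREAD_MUTEX_INITIALIZER,
                             PTHREAD_COND_INITIALIZER };

/* Sender deposits data in shared memory and issues the notification. */
static void *sender(void *arg) {
    (void)arg;
    pthread_mutex_lock(&chan.lock);
    chan.payload = 42;               /* write into the shared region      */
    chan.ready   = 1;
    pthread_cond_signal(&chan.cv);   /* the notification primitive        */
    pthread_mutex_unlock(&chan.lock);
    return NULL;
}

/* Receiver blocks until the notification arrives, then reads the data. */
static void *receiver(void *arg) {
    (void)arg;
    pthread_mutex_lock(&chan.lock);
    while (!chan.ready)
        pthread_cond_wait(&chan.cv, &chan.lock);
    printf("received %d\n", chan.payload);
    pthread_mutex_unlock(&chan.lock);
    return NULL;
}

int main(void) {
    pthread_t s, r;
    pthread_create(&r, NULL, receiver, NULL);
    pthread_create(&s, NULL, sender, NULL);
    pthread_join(s, NULL);
    pthread_join(r, NULL);
    return 0;
}
```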
32

Developing a VIA-RPI for LAM

Engler, Ralph, Wenzel, Tobias 01 March 2004 (has links) (PDF)
Development of an RPI (Request Progression Interface, i.e. a communication device) that uses VIA (Virtual Interface Architecture) instead of TCP on Ethernet networks.
33

Daugiakriterinių uždavinių lygiagretaus sprendimo strategijų tyrimas / Solving interactive multicriteria optimization tasks on parallel computers

Sosunova, Olga 11 June 2004 (has links)
The purpose of this work was to analyze the principles of parallel algorithm design and to present the solution of interactive multicriteria optimization tasks on parallel computers, using the MS Visual C++ 6.0 programming language and the MPICH message-passing package. Based on this analysis, a new strategy for solving multicriteria tasks over a computer network, shared among several users, was created.
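The abstract names MPICH as the transport but gives no further detail; a minimal C/MPI sketch of the kind of work distribution described (candidate evaluations farmed out across processes, with a reduction to collect the result) is shown below. The objective function, the number of candidates, and the cyclic distribution are placeholders, not taken from the thesis.

```c
/* Minimal work-distribution sketch with MPI: every rank evaluates a
 * cyclic subset of candidate points of a placeholder objective, and the
 * best value is collected on rank 0.  Purely illustrative. */
#include <mpi.h>
#include <stdio.h>

/* Placeholder objective standing in for one criterion evaluation. */
static double objective(double x) { return (x - 3.0) * (x - 3.0); }

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 16;                        /* candidate points          */
    double local_best = 1e30;                /* best value on this rank   */
    for (int i = rank; i < N; i += size) {   /* cyclic distribution       */
        double x = 10.0 * (double)i / N;
        double f = objective(x);
        if (f < local_best) local_best = f;
    }

    double best;
    MPI_Reduce(&local_best, &best, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("best objective value: %g\n", best);

    MPI_Finalize();
    return 0;
}
```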
34

FLENS - A Flexible Library for Efficient Numerical Solutions

Lehn, Michael Christian, January 2008 (has links)
Dissertation, University of Ulm, 2008.
35

A High Performance Gibbs-Sampling Algorithm for Item Response Theory Models

Patsias, Kyriakos 01 January 2009 (has links)
Item response theory (IRT) is a newer and improved theory compared with classical measurement theory. The fully Bayesian approach shows promise for IRT models; however, it is computationally expensive and therefore limited in many applications. It is important to seek ways to reduce the execution time, and a suitable solution is high performance computing (HPC), which offers considerable computational power and can handle applications with high computation and memory requirements. In this work, we modified the existing fully Bayesian algorithm for two-parameter normal ogive (2PNO) IRT models so that it can run on a high performance parallel machine. With this parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was reduced considerably.
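The thesis's actual sampler is not reproduced here; as a rough sketch of the data-parallel structure such a Gibbs sampler typically has (person parameters updated independently on each rank, then globally needed statistics combined with a collective), consider the C/MPI outline below. The conditional draws are simplified stand-ins, not the 2PNO full conditionals, and all sizes are arbitrary.

```c
/* Structural sketch of a data-parallel Gibbs sampler: each rank owns a
 * block of examinees, updates their ability parameters locally, and the
 * ranks combine the statistics needed for the item-parameter step with a
 * collective reduction.  The draws are simplified stand-ins, not the
 * 2PNO full conditionals. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double randu(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }

/* Box-Muller draw from N(mean, sd^2), used here as a placeholder. */
static double randn(double mean, double sd) {
    double u1 = randu(), u2 = randu();
    return mean + sd * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(1234 + rank);

    const int persons_per_rank = 250;     /* local block of examinees     */
    const int iters = 1000;
    double *theta = malloc(persons_per_rank * sizeof *theta);
    for (int i = 0; i < persons_per_rank; i++) theta[i] = 0.0;

    for (int t = 0; t < iters; t++) {
        /* Step 1: ability updates are conditionally independent, so each
         * rank samples its own block with no communication.             */
        double local_sum = 0.0, local_sq = 0.0;
        for (int i = 0; i < persons_per_rank; i++) {
            theta[i] = randn(0.5 * theta[i], 1.0);   /* placeholder draw */
            local_sum += theta[i];
            local_sq  += theta[i] * theta[i];
        }

        /* Step 2: item-parameter updates need statistics over ALL
         * examinees, so combine them with a collective.                 */
        double stats[2] = { local_sum, local_sq }, total[2];
        MPI_Allreduce(stats, total, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        /* ...each rank would now draw item parameters from conditionals
         * built from total[0] and total[1] (omitted in this sketch).    */
    }

    if (rank == 0) printf("finished %d iterations\n", iters);
    free(theta);
    MPI_Finalize();
    return 0;
}
```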
36

Um cluster híbrido com módulos de co-processamento em hardware (FPGAs) para processamento de alto desempenho / A hybrid cluster with hardware co-processing modules (FPGAs) for high-performance computing

BARROS JÚNIOR, Severino José de 10 September 2014 (has links)
FINEP/Petrobrás (CENPES) / Organizations that deal with computational systems increasingly seek to improve the performance of their applications, whose main characteristic is massive data processing. The solution usually employed for such problems is based on general-purpose processor architectures, whose hardware structure follows the von Neumann paradigm. This paradigm suffers from a deficiency known as the "von Neumann bottleneck": instructions that could be executed simultaneously, thanks to their data independence, end up being processed sequentially, hurting the potential performance of this class of applications. To increase parallel processing, organizations usually adopt a structure based on the association of several PCs connected by a high-speed network that work together to solve a large problem. This association is called a cluster, in which each member PC, called a node, performs part of the computation simultaneously, providing explicit parallelism for the application as a whole. Even with a significant increase in the number of independent processing elements, this growth is insufficient to meet the enormous computational demand of complex applications: it requires dividing groups of independent instructions among the nodes, which provides parallelism and thus better performance, yet the performance of each node remains degraded by the sequential bottleneck in its processors. To increase the parallelism of operations at each node, hybrid solutions composed of conventional CPUs and coprocessors have been adopted. One such coprocessor is the FPGA (Field Programmable Gate Array), usually connected to the PC through the PCIe bus. The work described in this dissertation proposes a development methodology for such a hybrid cluster, so as to increase the performance of scientific applications that require a large amount of data processing. The methodology is presented and two examples are discussed in detail.
37

Instalace a konfigurace Octave výpočetního clusteru / Installation and configuration of Octave computation cluster

Vitner, Petr January 2014 (has links)
This paper explores the possibilities and tools for creating a High-Performance Computing cluster. It contains a design for its creation and a detailed description of the setup and configuration in a virtual environment.
38

Designing High-Performance Remote Memory Access for MPI and PGAS Models with Modern Networking Technologies on Heterogeneous Clusters

Li, Mingzhe January 2017 (has links)
No description available.
39

Improving the Parallel Performance of the Boltzmann Transport Equation for Heat Transfer

Maddipati, Sai Ratna Kiran 28 September 2016 (has links)
No description available.
40

Design and Implementation of a Multi-Block Parallel Algorithm for Solving Navier-Stokes Equations on Structured Grids

Mittadar, Nirmal Tatavalli 03 August 2002 (has links)
A coarse-grain parallel multi-block algorithm was designed for CHEQNS, a multi-block solver for chemically reacting flows in local chemical equilibrium, and implemented using the Message Passing Interface (MPI). The parallel implementation conforms to the Single Program Multiple Data (SPMD) model and uses synchronous updates of fluxes across the block-block boundaries. The solution algorithm consists of block-decoupled Gauss-Seidel iterations; the coupling between sub-domains on different processors occurs at the Newton-iteration level. The implementation is general and can accept an arbitrary arrangement of blocks in a multi-block configuration, with multiple blocks per processor. It has been verified against results from the sequential multi-block solver for different types of flows, and the parallel performance has been studied in terms of speed-up and efficiency. The influence of parallelization on convergence was also studied.
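In MPI terms, the synchronous flux update across block-block boundaries described above is a boundary (halo) exchange performed once per iteration. The minimal 1-D C/MPI sketch below illustrates that pattern only; it is not taken from CHEQNS, and the block size, neighbour layout, and smoothing step are illustrative.

```c
/* Minimal 1-D halo-exchange sketch: each rank owns one block of cells and
 * swaps boundary values with its neighbours once per iteration, i.e. the
 * synchronous boundary-update pattern described above.  Illustrative only,
 * not taken from CHEQNS. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 8                       /* interior cells per block        */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[NLOCAL+1] are ghost cells holding neighbour boundary data. */
    double u[NLOCAL + 2];
    for (int i = 0; i <= NLOCAL + 1; i++) u[i] = (double)rank;

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int iter = 0; iter < 10; iter++) {
        /* Synchronous exchange of block-boundary values with both neighbours. */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                     &u[0],          1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Block-local (decoupled) sweep using the freshly received ghosts. */
        for (int i = 1; i <= NLOCAL; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }

    printf("rank %d: interior cell 1 = %f\n", rank, u[1]);
    MPI_Finalize();
    return 0;
}
```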
