1 
Avaliação do algoritmo de "ray tracing" em multicomputadores / Evaluation of the ray tracing algorithm in multicomputers. Eduardo Toledo Santos 29 June 1994
Computer graphics is moving toward the synthesis of ever more realistic images in ever less time. The algorithms used for realistic image synthesis demand high computing power, so producing such images quickly requires parallel computers. Today, the technique that yields the most realistic images is ray tracing. Multicomputers, in turn, are the most promising parallel architecture for reaching the performance that modern applications need.
This dissertation addresses the problem of implementing the ray tracing algorithm on multicomputers. The technique can be parallelized on distributed-memory parallel computers in many different ways, each involving a trade-off between processing speed and memory use. This work formalizes the problem and introduces tools for evaluating solutions in terms of processing efficiency and redundancy in memory use. It also presents a new taxonomy that both classifies proposals for parallel ray tracing implementations and guides the search for new solutions. The performance of the solutions in each class of the taxonomy is assessed qualitatively. Finally, new alternatives for parallelizing the ray tracing algorithm on multicomputers are suggested.
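The speed-versus-memory trade-off described above can be made concrete with two simple figures of merit. The sketch below is only illustrative: parallel efficiency follows the standard definition, while `memory_redundancy` (total memory held across nodes divided by the size of one copy of the scene) is an assumed definition, not necessarily the one used in the dissertation. All numbers are hypothetical.

```python
def efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    speedup = t_serial / t_parallel
    return speedup / processors


def memory_redundancy(units_per_node, scene_units):
    """Assumed definition: total memory held across all nodes, relative
    to the memory needed for a single copy of the scene."""
    return sum(units_per_node) / scene_units


# Hypothetical 8-node machine and a scene of 64 memory units.
# Replicating the whole scene on every node avoids communication
# but stores the scene 8 times over.
replicated = memory_redundancy([64] * 8, 64)

# Partitioning the scene stores it once but forces rays to be
# forwarded between nodes, which usually lowers efficiency.
partitioned = memory_redundancy([8] * 8, 64)

print(replicated, partitioned, efficiency(100.0, 25.0, 8))
```

A fully replicated scene sits at one extreme of the trade-off and a fully partitioned scene at the other; intermediate schemes fall between the two redundancy values.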

2 
Performance modelling and evaluation of virtual channels in multicomputer networks with bursty traffic. Min, Geyong, Ould-Khaoua, M. January 2004

3 
A performance model for wormhole-switched interconnection networks under self-similar traffic. Min, Geyong, Ould-Khaoua, M. January 2004
Many recent studies have convincingly demonstrated that network traffic exhibits a noticeable self-similar nature, which has a considerable impact on queuing performance. However, the networks used in current multicomputers have been designed and analyzed primarily under the assumption of the traditional Poisson arrival process, which is inherently unable to capture traffic self-similarity. Consequently, it is crucial to re-examine the performance properties of multicomputer networks in the context of more realistic traffic models before practical implementations expose their potential faults. Toward this end, this paper proposes the first analytical model for wormhole-switched k-ary n-cubes in the presence of self-similar traffic. Simulation experiments demonstrate that the proposed model exhibits a good degree of accuracy for various system sizes and under different operating conditions. The analytical model is then used to investigate the implications of traffic self-similarity for network performance. This study reveals that the network suffers considerable performance degradation when subjected to self-similar traffic, stressing the great need to improve network performance to ensure efficient support for this type of traffic.

4 
Automatic data distribution for massively parallel processors. García Almiñana, Jordi 16 April 1997
Massively Parallel Processor systems provide the computational power required to solve most large-scale High Performance Computing applications. Machines with physically distributed memory offer a cost-effective way to achieve this performance; however, these systems are very difficult to program and tune. In a distributed-memory organization each processor has direct access to its local memory and indirect access to the remote memories of other processors, but accessing a local memory location can be more than an order of magnitude faster than accessing a remote one. In these systems, the choice of a good data distribution strategy can dramatically improve performance, although different parts of the data distribution problem have been proved to be NP-complete.
The selection of an optimal data placement depends on the program structure, the program's data sizes, the compiler capabilities, and some characteristics of the target machine. In addition, there is often a trade-off between minimizing interprocessor data movement and balancing the load across processors. Automatic data distribution tools can assist the programmer in selecting a good data layout strategy. These are usually source-to-source tools that annotate the original program with data distribution directives.
Crucial aspects such as data movement, parallelism, and load balance have to be considered in a unified way to solve the data distribution problem efficiently.
In this thesis a framework for automatic data distribution is presented, in the context of a parallelizing environment for massively parallel processor (MPP) systems. The applications considered for parallelization are usually regular problems, in which data structures are dense arrays. The data mapping strategy generated is optimal for a given problem size and target MPP architecture, according to our current cost and compilation model.
A single data structure, named the Communication-Parallelism Graph (CPG), which holds symbolic information about the data movement and parallelism inherent in the whole program, is the core of our approach. This data structure allows the estimation of the data movement and parallelism effects of any data distribution strategy supported by our model. Assuming that some program characteristics have been obtained by profiling and that some specific target machine features have been provided, the symbolic information in the CPG can be replaced by constant values, expressed in seconds, that represent data movement overhead and the time saved through parallelization. The CPG is then used to model a minimal path problem, which is solved by a general-purpose linear 0-1 integer programming solver. Linear programming techniques guarantee that the solution provided is optimal, and they solve this kind of problem very efficiently.
The data mapping capabilities provided by the tool include alignment of the arrays, one- or two-dimensional distribution in BLOCK or CYCLIC fashion, a set of remapping actions to be performed between phases if profitable, and the associated parallelization strategy. The effects of control flow statements between phases are taken into account to improve the accuracy of the model. The novelty of the approach lies in handling all stages of the data distribution problem, which traditionally have been treated in several independent phases, in a single step, and in providing an optimal solution according to our model.
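To make the BLOCK and CYCLIC distributions mentioned above concrete, the following sketch maps the indices of a one-dimensional array to owning processors. It follows the usual HPF-style conventions (BLOCK uses contiguous chunks of size ceil(n/p), CYCLIC deals elements round-robin); the function names are hypothetical and are not taken from the thesis or its tool.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous
    blocks of size ceil(n / p), as in an HPF-style BLOCK distribution."""
    block_size = -(-n // p)  # integer ceiling of n / p
    return i // block_size


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin,
    as in an HPF-style CYCLIC distribution."""
    return i % p


n, p = 10, 4
print([block_owner(i, n, p) for i in range(n)])
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print([cyclic_owner(i, p) for i in range(n)])
# [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

BLOCK keeps neighbouring elements on the same processor, which tends to reduce data movement for stencil-like accesses, while CYCLIC spreads work more evenly when the cost per element varies. This is one instance of the trade-off between data movement and load balance described in the abstract.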

5 
Uma adaptação do MEF para análise em multicomputadores: aplicações em alguns modelos estruturais / Multicomputer finite element method analysis of usual structures models. Valério da Silva Almeida 24 March 1999
This work presents an adaptation of the procedures used in sequential finite element method (FEM) codes for use on multicomputers. A routine is developed to assemble the linear system partitioned among the processors. The resulting system of linear equations is solved with the public PIM (Parallel Iterative Method) package, which is adapted to exploit the usual features of FEM systems of equations: sparsity and symmetry. The parallel solution is further improved with two types of preconditioners: a generic incomplete Cholesky (IC) decomposition and polynomial preconditioning (POLY(0), or Jacobi). The method is fully adapted to parallel computing in a program developed to analyse floor structures.
The improved PIM is also used to solve the system of equations of a three-dimensional truss. Speedup, efficiency and solution time are reported for the floor and the truss. A study of the sequential performance of the generic IC preconditioner and of a generic preconditioner derived from a truncated Neumann series is also presented, using models of plates in bending and under in-plane actions.
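As an illustration of the Jacobi (POLY(0)) preconditioning mentioned above, the following is a minimal sequential sketch of the preconditioned conjugate gradient method for a small symmetric positive definite system. It is written in plain Python for readability, uses a dense matrix, and does not use or reproduce the PIM package or its interface; the function name and the test system are hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A using
    conjugate gradients with a Jacobi (diagonal) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^-1 with M = diag(A)

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive definite test system (hypothetical values).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = jacobi_pcg(A, b)
print(x)  # close to [1/11, 7/11]
```

In a distributed setting the matrix-vector product and the dot products are the steps that require communication between processors, while applying a diagonal preconditioner is purely local. That locality is one reason Jacobi preconditioning is attractive on multicomputers even though it is weaker than incomplete Cholesky.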

7 
Hyperplane Partitioning: An Approach To Global Data Partitioning For Distributed Memory Machines. Prakash, S R 07 1900
Automatic global data partitioning for Distributed Memory Machines (DMMs) is a difficult problem. Distributed memory machines are scalable, but because memory is distributed across processors, the placement of data (arrays) in the local memories of the different processors becomes crucial: any interprocessor communication for non-local data access is an order of magnitude costlier than access to local memory. Researchers have proposed varied solutions to this problem, most of which work only for uniform dependences in loops and suggest only HPF-like distributions. For non-uniform dependences, the loop was made to run sequentially.
In this work, we present a partitioning strategy called Hyperplane Partitioning, which also works well for loops with non-uniform dependences. In this method, the iteration space is partitioned into as many partitions as there are logical processors, in such a way that the overall interprocessor communication is minimized. The idea is to localize as many dependences as possible so that the overall communication, due both to non-local data access and to interprocessor synchronization, is reduced. These partitions are then induced onto the data spaces of the arrays referenced in the loop. Each processor then runs its part of the iteration space, keeping the data partition that it owns locally. Any non-local data access is implemented by interprocessor communication at run time.
Hyperplane Partitioning is also extended to a sequence of loops. This is done by first finding the Best Local Distribution (BLD) for every loop and then finding the grouping of adjacent loops (used only for finding the data partition) that gives the best global data partition. This sequence of distributions and redistributions is found by constructing a data structure called the Data Distribution Tree (DDT) and finding the least-cost path from the source to any of the leaf nodes of the DDT. The edge costs come from the communication cost incurred while running a loop with a particular distribution and from the redistribution needed to suit the next loop. For this purpose a communication cost estimator is developed, which works well for fewer dimensions. To handle complete programs, we use a heuristic to find the best global distribution for the entire program.
Several optimizations are studied: message optimization, which reduces the number of messages sent across processors; time optimization, which is achieved by uniform scheduling across processors; and space optimization, which keeps in each processor's local memory only the part of the array space that it owns. Hyperplane Partitioning is implemented using a synchronization algorithm that handles non-local memory access while obeying data dependence constraints; the algorithm is also proved correct. The target machine is the IBM SP2, with PVM as the message-passing library. The performance of the tool is evaluated on some standard benchmarks (ADI and RHS) and on programs designed by us to show the specific merits of the tool. The results show that loops with non-uniform dependences can also be run on DMMs with good speedups.
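The least-cost search over a sequence of loops can be illustrated with a small dynamic-programming sketch. This is a simplified stand-in and not the Data Distribution Tree construction from the thesis: it assumes every loop has the same set of candidate distributions and that run and redistribution costs are already known. The function name, the distribution labels and all cost values are hypothetical.

```python
def best_distribution_sequence(run_cost, redist_cost):
    """Pick one distribution per loop so that the total cost is minimal.

    run_cost[k][d]      : communication cost of running loop k under d
    redist_cost[d1][d2] : cost of redistributing from d1 to d2 between loops
    Returns (total_cost, list_of_chosen_distributions).
    """
    dists = list(run_cost[0])
    # best[d] = cheapest (cost, path) that ends with the current loop using d
    best = {d: (run_cost[0][d], [d]) for d in dists}
    for costs in run_cost[1:]:
        new_best = {}
        for d in dists:
            prev = min(dists, key=lambda q: best[q][0] + redist_cost[q][d])
            total = best[prev][0] + redist_cost[prev][d] + costs[d]
            new_best[d] = (total, best[prev][1] + [d])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical costs for three consecutive loops and two distributions.
run_cost = [
    {"row": 1, "col": 5},
    {"row": 6, "col": 1},
    {"row": 1, "col": 4},
]
redist_cost = {
    "row": {"row": 0, "col": 2},
    "col": {"row": 2, "col": 0},
}
print(best_distribution_sequence(run_cost, redist_cost))
# (7, ['row', 'col', 'row'])
```

In this example redistributing twice is cheaper than keeping a single distribution throughout (total cost 7, against 8 for staying with row and 10 for staying with col), which is the kind of decision the least-cost path is meant to capture.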
