11

Investigação de técnicas fotônicas de chaveamento aplicadas em arquiteturas paralelas. / Research about photonic techniques in parallel architectures.

Martins, João Eduardo Machado Perea 20 March 1998 (has links)
This work presents a study of optical interconnection networks applied to parallel computer architectures, in which several network models are proposed, simulated, and analyzed. The research is relevant because interconnection networks directly influence the cost and performance of parallel computer architectures. The first optical network model proposed is called SCF (Sistema Circular com Filas). It is a collision-free system in which a dedicated channel is used for communication control and each node has an exclusive channel for data reception. The system achieves high throughput, high utilization, and small queue sizes. A dedicated simulator was developed to evaluate the SCF network, and it was easily adapted to simulate the other network models proposed in this work. Three different models of optical distribution switches for Dataflow parallel architectures were also proposed, simulated, and analyzed. The simulation results show that relatively simple optical devices can be used to build high-performance systems.
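The SCF design sketched in the abstract, a collision-free system in which a shared control channel arbitrates transmissions and every node owns an exclusive reception channel, can be illustrated with a small queueing simulation. The Python sketch below is only a rough illustration under assumed parameters (node count, arrival rate, arbitration policy); it is not the dedicated simulator developed in the thesis.

```python
import random
from collections import deque

# Illustrative sketch of a collision-free interconnect in the spirit of SCF:
# a shared control channel arbitrates requests, and every node owns an
# exclusive reception channel, so data transfers never collide.
# All parameters below (node count, arrival rate, cycles) are assumptions.

NUM_NODES = 8
ARRIVAL_PROB = 0.3   # chance that a node generates a message per cycle
CYCLES = 10_000

queues = [deque() for _ in range(NUM_NODES)]  # per-node transmit queues
delivered = 0
queue_samples = 0

for cycle in range(CYCLES):
    # New traffic: each node may enqueue one message to a random destination.
    for src in range(NUM_NODES):
        if random.random() < ARRIVAL_PROB:
            dst = random.choice([n for n in range(NUM_NODES) if n != src])
            queues[src].append(dst)

    # Control channel grants at most one transmission per reception channel
    # per cycle, so no two sources ever drive the same destination at once.
    # Fixed-priority arbitration for simplicity; a real design would rotate.
    granted_destinations = set()
    for src in range(NUM_NODES):
        if queues[src] and queues[src][0] not in granted_destinations:
            granted_destinations.add(queues[src].popleft())
            delivered += 1

    queue_samples += sum(len(q) for q in queues)

print(f"throughput: {delivered / CYCLES:.2f} messages/cycle")
print(f"mean queue length per node: {queue_samples / (CYCLES * NUM_NODES):.2f}")
```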
12

Application of Parallel Computers to Enhance the Flow Modelling Capability in Aircraft Design

Sillén, Mattias January 2006 (has links)
The development process for new aircraft configurations needs to be more efficient in terms of performance, cost and time to market. The potential to influence these factors is highest in the early design phases. Thus, high confidence must be established in the product earlier than today. To accomplish this, the concept of virtual product development needs to be established. This implies having a mathematical representation of the product and its associated properties and functions, often obtained through numerical simulations. Building confidence in the product early in the development process through simulations postpones expensive testing and verification to later development stages, when the design is more mature. Applying this to aerodynamic design means introducing more advanced physical modelling of the flow as well as significantly reducing the turnaround time for flow solutions. This work describes the benefit of using parallel computers for flow simulations in the aircraft design process. Reduced turnaround time for flow simulations is a prerequisite for non-linear flow modelling in early design stages and a condition for introducing high-end turbulence models and unsteady simulations in later stages of the aircraft design process. The outcome also demonstrates the importance of bridging the gap between the research community and industrial applications. The choice of computer platform is very important for reducing the turnaround time of flow simulations. With the recent popularity of Linux clusters it is now possible to design cost-efficient systems for a specific application. Two flow solvers are investigated for parallel performance on various clusters. Hardware and software factors influencing the efficiency are analyzed and recommendations are made for cost efficiency and peak performance. / Report code: LiU-TEK-LIC-2006:27.
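As a back-of-the-envelope illustration of the parallel-performance analysis described above, the Python sketch below computes speedup and efficiency from wall-clock timings of a flow solver on a cluster. The timing numbers are invented for illustration only; they are not measurements from the thesis.

```python
# Hypothetical wall-clock timings (seconds) of a flow solver, indexed by
# process count. The numbers are invented; only the speedup/efficiency
# arithmetic is the point.
timings = {1: 8200.0, 2: 4300.0, 4: 2250.0, 8: 1230.0, 16: 720.0}

t1 = timings[1]
print(f"{'procs':>5} {'time [s]':>10} {'speedup':>8} {'efficiency':>10}")
for p, tp in sorted(timings.items()):
    speedup = t1 / tp               # S(p) = T(1) / T(p)
    efficiency = speedup / p        # E(p) = S(p) / p
    print(f"{p:>5} {tp:>10.1f} {speedup:>8.2f} {efficiency:>10.2f}")
```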
14

Efficient Simulation of Message-Passing in Distributed-Memory Architectures

Demaine, Erik January 1996 (has links)
In this thesis we propose a distributed-memory parallel-computer simulation system called PUPPET (Performance Under a Pseudo-Parallel EnvironmenT). It allows the evaluation of parallel programs run in a pseudo-parallel system, where a single processor is used to multitask the program's processes as if they were run on the simulated system. This allows development of applications and teaching of parallel programming without the use of valuable supercomputing resources. We use a standard message-passing language, MPI, so that when desired (e.g., when development is complete) the program can be run on a truly parallel system without any changes. There are several features in PUPPET that do not exist in any other simulation system. Support for all deterministic MPI features is available, including collective and non-blocking communication. Multitasking (more processes than processors) can be simulated, allowing the evaluation of load-balancing schemes. PUPPET is very loosely coupled with the program, so that a program can be run once and then evaluated on many simulated systems with multiple process-to-processor mappings. Finally, we propose a new model of direct networks that ignores network traffic, greatly improving simulation speed while often not significantly affecting accuracy.
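PUPPET's premise is that the same MPI source runs unchanged either multitasked on one processor under the simulator or on a real parallel machine. The minimal mpi4py sketch below (an illustration of this write-up, not part of PUPPET, and requiring an MPI installation plus mpi4py) shows the kind of deterministic point-to-point program such a simulator must reproduce.

```python
from mpi4py import MPI   # assumes an MPI implementation and mpi4py are installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Simple deterministic ring exchange: each rank sends to its right neighbour
# and receives from its left neighbour. The same source runs unchanged on
# one processor (multitasked) or on a real cluster.
right = (rank + 1) % size
left = (rank - 1) % size

token = {"origin": rank, "hops": 0}
received = comm.sendrecv(token, dest=right, source=left)
received["hops"] += 1

print(f"rank {rank}: got token from rank {received['origin']} "
      f"after {received['hops']} hop(s)")
```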
16

Design and Analysis of Four Architectures for FPGA-Based Cellular Computing

Morgan, Kenneth J. 09 November 2004 (has links)
The computational abilities of today's parallel supercomputers are often quite impressive, but these machines can be impractical for some researchers due to prohibitive costs and limited availability. These researchers might be better served by a more personal solution such as a "hardware acceleration" peripheral for a PC. FPGAs are the ideal device for the task: their configurability allows a problem to be translated directly into hardware, and their reconfigurability allows the same chip to be reprogrammed for a different problem. Efficient FPGA computation of parallel problems calls for cellular computing, which uses an array of independent, locally connected processing elements, or cells, that compute a problem in parallel. The architecture of the computing cells determines the performance of the FPGA-based computer in terms of the cell density possible and the speedup over conventional single-processor computation. This thesis presents the design and performance results of four computing-cell architectures. MULTIPLE performs all operations in one cycle, which takes the least amount of time but requires the most chip area. BIT performs all operations bit-serially, which takes a long time but allows a large cell density. The two other architectures, SINGLE and BOOTH, lie within these two extremes of the area/time spectrum. The performance results show that MULTIPLE provides the greatest speedup over common calculation software, but its usefulness is limited by its small cell density. Thus, the best architecture for a particular problem depends on the number of computing cells required. The results also show that with further research, next-generation FPGAs can be expected to accelerate single-processor computations as much as 22,000 times. / Master of Science
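The cellular-computing model described above, an array of independent, locally connected cells updated in lockstep, can be mimicked in software to show the programming style involved. The grid size, neighbourhood, and update rule in the Python sketch below are illustrative assumptions, not the MULTIPLE, BIT, SINGLE, or BOOTH cell designs evaluated in the thesis.

```python
# Software sketch of cellular computing: a 1-D array of locally connected
# cells, each updated from its own state and its two neighbours' states.
# The update rule (simple diffusion averaging) and sizes are assumptions.
N_CELLS = 16
STEPS = 8

cells = [0.0] * N_CELLS
cells[N_CELLS // 2] = 1.0          # single "hot" cell in the middle

for step in range(STEPS):
    nxt = [0.0] * N_CELLS
    for i in range(N_CELLS):
        left = cells[(i - 1) % N_CELLS]    # local connections only
        right = cells[(i + 1) % N_CELLS]
        nxt[i] = (left + cells[i] + right) / 3.0
    cells = nxt                            # all cells update in lockstep

print(["%.3f" % c for c in cells])
```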
17

Dynamic detection of the communication pattern in shared memory environments for thread mapping / Detecção dinâmica do padrão de comunicação em ambientes de memória compartilhada para o mapeamento de threads

Cruz, Eduardo Henrique Molina da January 2012 (has links)
The threads of parallel applications cooperate to fulfill their tasks and therefore communicate with each other. The communication latency between cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. With the increase in the number of cores per chip and the number of threads per core, this difference between communication latencies is growing. It is therefore important to map the threads of parallel applications taking the communication between them into account. In parallel applications based on the shared-memory paradigm, communication is implicit and occurs through accesses to shared variables, which makes it difficult to detect the communication pattern between threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master's thesis, we introduce two novel lightweight mechanisms to detect the communication pattern of threads. The first mechanism uses information about shared cache lines provided by cache coherence protocols. The second uses the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing. Both mechanisms rely entirely on hardware features, which makes thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time-consuming task, such as simulation, is required. We evaluated the mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We then generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is an NP-hard problem; to achieve polynomial complexity, we designed a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations, and snoop transactions were reduced by up to 31.9%, 41%, and 65.4%, respectively.
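To make the mapping step concrete, the Python sketch below pairs the most heavily communicating threads onto shared cores using maximum-weight matching; networkx provides an implementation of Edmonds' blossom algorithm as max_weight_matching. The 4-thread communication matrix is invented, and this simple pairing is only an illustration of the idea, not the heuristic developed in the thesis.

```python
import networkx as nx  # assumes networkx is installed

# Invented symmetric communication matrix: comm[i][j] = amount of
# communication between threads i and j (e.g. shared cache-line events).
comm = [
    [0, 90,  5, 10],
    [90, 0,  8,  3],
    [5,  8,  0, 70],
    [10, 3, 70,  0],
]

g = nx.Graph()
n = len(comm)
for i in range(n):
    for j in range(i + 1, n):
        g.add_edge(i, j, weight=comm[i][j])

# Edmonds-style maximum-weight matching: each matched pair of threads is
# placed on the same core so the heaviest communication stays local.
pairs = nx.max_weight_matching(g, maxcardinality=True)
for core, (a, b) in enumerate(sorted(map(sorted, pairs))):
    print(f"core {core}: threads {a} and {b}")
```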
20

Operating System Support For Optimistic Distributed Simulation

Raja, V 06 1900 (has links) (PDF)
No description available.
