Global ETD Search

1	Systolic arrays for the matrix iterative methods Haider, Shahid Abbas January 1993 Systolic array research was pioneered by H. T. Kung and C. E. Leiserson. Systolic arrays are special-purpose synchronous architectures consisting of simple, regular and modular processors that are regularly interconnected to form an array. Systolic arrays are well suited to compute-bound problems in linear algebra. In this thesis, numerical problems, especially iterative algorithms, are chosen and implemented on a linear systolic array. 005
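The abstract specifies neither the array design nor which iterative methods were mapped. The following is a minimal sketch under two assumptions: a textbook linear array in which PE j keeps x[j] stationary, and partial sums of y = Ax that move one PE to the right per clock cycle. It shows how such an array can supply the matrix-vector product at the core of a Jacobi iteration. The function names and the choice of Jacobi are illustrative assumptions, not the thesis's design.

```python
from typing import List, Optional, Tuple

def systolic_matvec(A: List[List[float]], x: List[float]) -> List[float]:
    """Cycle-by-cycle simulation of a linear systolic array computing y = A x.

    PE j holds x[j] fixed. The partial sum for row i enters PE 0 at cycle i
    and moves one PE to the right per cycle, so PE j updates it at cycle
    i + j by adding A[i][j] * x[j]. Row i leaves the array after PE n-1.
    """
    m, n = len(A), len(x)
    regs: List[Optional[Tuple[int, float]]] = [None] * n  # (row, partial sum) per PE
    y = [0.0] * m
    for cycle in range(m + n):
        # Collect the finished partial sum leaving the last PE.
        if regs[-1] is not None:
            row, total = regs[-1]
            y[row] = total
        # Shift all partial sums one PE to the right; feed a new row into PE 0.
        regs = [(cycle, 0.0) if cycle < m else None] + regs[:-1]
        # All PEs perform one multiply-accumulate in the same clock cycle.
        for j, item in enumerate(regs):
            if item is not None:
                row, partial = item
                regs[j] = (row, partial + A[row][j] * x[j])
    return y

def jacobi(A: List[List[float]], b: List[float], iters: int = 50) -> List[float]:
    """Jacobi iteration x <- x + D^{-1} (b - A x), using the systolic product."""
    x = [0.0] * len(b)
    for _ in range(iters):
        ax = systolic_matvec(A, x)
        x = [x[i] + (b[i] - ax[i]) / A[i][i] for i in range(len(b))]
    return x

print(systolic_matvec([[1, 2], [3, 4], [5, 6]], [1, 2]))  # [5.0, 11.0, 17.0]
print(jacobi([[4.0, 1.0], [2.0, 5.0]], [1.0, 2.0]))       # converges to about [0.1667, 0.3333]
```

Row i finishes at cycle i + n - 1, so an m-row product takes m + n - 1 cycles. Each Jacobi sweep reuses the same array.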
2	An investigation into efficient interfacing strategies for VLSI arithmetic processors based on residue number systems utilising diminished and augmented radix-2 moduli Pourbigharaz, Fariborz January 1995 No description available. 621.39
3	Performance Modeling of Single Processor and Multi-Processor Computer Architectures Commissariat, Hormazd P. 11 March 2000 Determining the optimum computer architecture configuration for a specific application or a generic algorithm is a difficult task. The complexity of today's computer architectures and systems makes it difficult and expensive to implement and test fully functional prototypes. High-level VHDL performance modeling is an efficient way to rapidly prototype and evaluate computer architectures. Once the architecture configuration is fixed, one would like to know the tolerance and expected performance of individual and critical components, and the best way to map the software tasks onto the processor(s). Trade-offs and engineering compromises can be analyzed, and the effects of component failures and communication bottlenecks can be studied. This thesis documents part of the research done for the RASSP (Rapid Prototyping of Application Specific Signal Processors) project, funded by Department of Defense contracts. The architectures modeled include a single-processor, single-global-bus system; a four-processor, single-global-bus system; a four-processor, multiple-local-bus, single-global-bus system; and finally, a four-processor, multiple-local-bus system interconnected by a crossbar switch. The hardware models used are mostly legacy models inherited from an earlier project; they were upgraded, modified and customized to suit the current research needs. The software tasks run on the processors are parts of the Synthetic Aperture Radar (SAR) signal and image processing algorithm. Components communicate by exchanging tokens, which are record structures.
The output is a trace file that tracks the passage of tokens through the various components of the architecture. The trace file is post-processed to obtain activity plots and latency plots for individual components. / Master of Science Performance Modeling Computer Architectures VHDL Multi-Processor
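The thesis does not describe its trace format. This sketch assumes a hypothetical plain-text trace with one token event per line (`<time_ns> <component> <ENTER|EXIT> <token_id>`). It shows how such a trace could be post-processed into per-component latencies, which are the raw data behind a latency plot. The format, function name and sample data are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def component_latencies(trace_lines: Iterable[str]) -> Dict[str, List[float]]:
    """Return, for each component, the latency of every token it handled.

    Hypothetical trace format, one event per line:
        <time_ns> <component> <ENTER|EXIT> <token_id>
    Blank lines and lines starting with '#' are ignored.
    """
    entered: Dict[Tuple[str, str], float] = {}
    latencies: Dict[str, List[float]] = defaultdict(list)
    for raw in trace_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        time_s, component, event, token = line.split()
        key = (component, token)
        if event == "ENTER":
            entered[key] = float(time_s)
        elif event == "EXIT" and key in entered:
            # Latency = time the token spent inside this component.
            latencies[component].append(float(time_s) - entered.pop(key))
    return dict(latencies)

trace = """\
0  bus  ENTER t1
5  bus  EXIT  t1
5  cpu0 ENTER t1
40 cpu0 EXIT  t1
10 bus  ENTER t2
18 bus  EXIT  t2
"""
lat = component_latencies(trace.splitlines())
for component, values in sorted(lat.items()):
    print(f"{component}: mean={mean(values):.1f} ns, max={max(values):.1f} ns")
```

An activity plot could be derived in the same pass by accumulating, per component, the intervals during which at least one token is inside.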
4	Evaluation of information systems development in the NHS using NIMSAD framework Kheong Lye, Sue January 1996 The principal focus of the research effort was the management of information systems development to support the increased information needs arising from the radical health reforms of 1989. This was undertaken in collaboration with a purchaser and a provider within the health service. An action research approach was adopted, in which the researcher was actively involved in the development and successful implementation of an information system. Initial findings revealed a variety of factors hindering the purchaser and the provider from successfully developing the intended information systems to support the contracting process required by the reforms. A disparity in relative strengths between the purchaser and the provider was considered a major constraint, hindering the purchaser from developing the intended information system and performing its designated role in the new internal market of the NHS. Through the rapid development of a computer-based information system, the immediate needs of the purchaser and the provider were satisfied, and the individuals and the organisation developed. Following the development, a reflective post-intervention evaluation was carried out using a conceptual problem-solving framework. Three important findings emerged from the systems development effort: [1] The use of prototyping in the evolutionary development of the intended information system is considered particularly pertinent and responsive to the uncertain requirements of organisations undergoing change. [2] Embracing a flexible blend of expert intervention and facilitation is an important element of the information systems development process. [3] The development of the individuals and the organisation is an intrinsic part of developing information systems.
Using the NIMSAD framework for post-intervention evaluation of the development effort, further findings were drawn from the critical evaluation of, and reflection on, the adopted approach. The systems development process was evaluated against three identified elements: the problem situation, the problem-solving process and the problem solver. The evaluation and reflection revealed deficiencies in the research, which indicate that: [1] Appreciating the context and content of the problem situation increases understanding of the 'problems', leading to the adoption of appropriate methodologies for conducting the problem-solving process. [2] The effectiveness of the adopted problem-solving process can be enhanced by validating the client's definition of the problem, facilitating participants' involvement, using prototyping innovatively and evaluating the process. [3] The personal characteristics of the problem solver significantly influence the possible solutions to the identified problems. Contributions from the evaluation of the research effort can be seen in: [1] The suggested reflexive model for action research, with emphasis on evaluating the actions of the researcher as a problem solver. [2] The need to maintain close links with the client and to communicate disparate perceptions of the problem and the problem situation. [3] The use of a flexible blend of expert intervention and facilitation (a hybrid approach enables the problem to be resolved from a multidisciplinary perspective). [4] A suggestion for further research into the personal characteristics of an effective problem solver. 658
5	Architecture, Performance and Applications of a Hierarchial Network of Hypercubes Kumar, Mohan J 02 1900 This thesis presents a multiprocessor topology, the hierarchical network of hypercubes, which has a low diameter and a low degree of connectivity yet exhibits versatile, hypercube-like characteristics. The hierarchical network of hypercubes consists of k-cubes interconnected in two or more hierarchical levels. The network has a hierarchical, expansive, recursive structure with a constant, predefined building block. This basic building block comprises a k-cube of processor elements and a network controller. The hierarchical network of hypercubes retains the positive features of the k-cube at different levels of the hierarchy and has been found to perform better than the binary hypercube in executing a variety of application problems. The ASCEND/DESCEND class of algorithms can be executed in O(log₂ N) parallel steps (N is the number of data elements) on a hierarchical network of hypercubes with N processor elements. A description of the topology is presented, and its architectural potential is analyzed in terms of fault-tolerant message routing, execution of a class of highly parallel algorithms, and simulation of artificial neural networks. Further, the proposed topology is found to be very efficient in executing multinode broadcast and total exchange algorithms. We subsequently propose an enhancement of the network to counter faults, and explore implementations of artificial neural networks to demonstrate efficient execution of application problems on the network. The fault-tolerant capabilities of the hierarchical network of hypercubes with two network controllers per k-cube of processor elements are comparable to those of the hypercube and the folded hypercube.
We also discuss various issues related to the suitability of multiprocessor architectures for simulating neural networks. A performance analysis of the ring, hypercube, mesh and hierarchical network of hypercubes for simulating artificial neural networks is presented. Our studies reveal that the hierarchical network of hypercubes performs better than the ring, mesh, hypernet and hypercube topologies in implementing artificial neural networks. Design and implementation aspects of the hierarchical network of hypercubes based on two schemes, namely dual-ported RAM communication and transputers, are also presented. Results of simulation studies for robotic applications using neural network paradigms on the transputer-based hierarchical network of hypercubes show that the proposed network can achieve response times on the order of a hundred microseconds. Computer and Information Science Computer network architectures Computer architectures Multiprocessors Parallel processing Hierarchical Network of Hypercubes
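To make the ASCEND class concrete, here is a minimal Python simulation on an ordinary binary hypercube. It does not model the hierarchical network itself or its two-level routing through network controllers. In step k, every processor combines its value with that of its neighbour across dimension k, so N = 2^d processors finish in log₂ N steps. The `ascend` helper and the two example step functions (all-reduce sum and inclusive prefix sum) are illustrative assumptions.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def ascend(values: List[T], step: Callable[[T, T, bool], T]) -> List[T]:
    """Simulate an ASCEND-class algorithm on a binary hypercube of N = 2^d PEs.

    In step k (k = 0, 1, ..., d-1) each processor p pairs with its neighbour
    across dimension k, p XOR 2^k, and replaces its value with
    step(own, partner, is_upper), where is_upper tells whether bit k of p is 1.
    All pairs work in parallel, so the run takes d = log2 N steps.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of processors must be a power of two")
    vals = list(values)
    for k in range(n.bit_length() - 1):
        prev = vals  # every PE reads its partner's value from the previous step
        vals = [step(prev[p], prev[p ^ (1 << k)], bool(p & (1 << k))) for p in range(n)]
    return vals

def sum_step(own: int, partner: int, is_upper: bool) -> int:
    # All-reduce: after log2 N steps every processor holds the global sum.
    return own + partner

def prefix_step(own: Tuple[int, int], partner: Tuple[int, int], is_upper: bool) -> Tuple[int, int]:
    # Each PE keeps (inclusive prefix, subcube total); upper PEs absorb the lower half's total.
    prefix, total = own
    _, partner_total = partner
    return (prefix + partner_total if is_upper else prefix, total + partner_total)

data = [1, 2, 3, 4, 5, 6, 7, 8]
totals = ascend(data, sum_step)
prefixes = [p for p, _ in ascend([(v, v) for v in data], prefix_step)]
print(totals)    # [36, 36, 36, 36, 36, 36, 36, 36]
print(prefixes)  # [1, 3, 6, 10, 15, 21, 28, 36]
```

A DESCEND algorithm is the same loop with k running from d-1 down to 0.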
6	Desenvolvimento de uma arquitetura reconfigurável para o processamento de modelos no ambiente ABACUS (Development of a reconfigurable architecture for model processing in the ABACUS environment) Lima, Verônica Aparecida Lopes. January 2007 Advisor: Norian Marranghello / Committee: Nobuo Oki / Committee: Wang Jiang Chau / Abstract: The aim of this work is to develop a statically reconfigurable architecture for a processing element (MPH) of the ABACUS circuit simulation environment. This processing element consists of a set of functional units that can be linked by means of control words stored in ROM, and whose interconnection can be modified so that the processing hardware adapts to the model of the circuit element being simulated. The design was described in VHDL and simulated with the Quartus II software. / Master's Reconfigurable digital systems. Circuit simulation.
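The abstract describes functional units whose interconnection is fixed by control words held in ROM. The actual MPH control-word format is not given, so this is only a rough behavioural sketch. The unit names, register names and ROM contents are all hypothetical. Each control word routes two registers into one functional unit and writes the result back, so loading a different ROM reconfigures the element for a different circuit model.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical functional units of the processing element.
FUNCTIONAL_UNITS: Dict[str, Callable[[float, float], float]] = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
}

@dataclass(frozen=True)
class ControlWord:
    unit: str   # functional unit enabled in this step
    src_a: str  # register routed to the unit's first input
    src_b: str  # register routed to the unit's second input
    dst: str    # register that receives the unit's output

def evaluate(rom: List[ControlWord], inputs: Dict[str, float]) -> Dict[str, float]:
    """Evaluate a circuit-element model by stepping through its ROM.

    The functional units never change; only the ROM contents decide how
    they are chained, so loading another ROM reconfigures the element.
    """
    regs = dict(inputs)
    for word in rom:
        unit = FUNCTIONAL_UNITS[word.unit]
        regs[word.dst] = unit(regs[word.src_a], regs[word.src_b])
    return regs

# Linear resistor: I = G * (Va - Vb)
RESISTOR_ROM = [
    ControlWord("SUB", "Va", "Vb", "Vd"),
    ControlWord("MUL", "G", "Vd", "I"),
]

# Resistor in parallel with a current source: I = G * (Va - Vb) + I0
RESISTOR_WITH_SOURCE_ROM = RESISTOR_ROM + [ControlWord("ADD", "I", "I0", "I")]

print(evaluate(RESISTOR_ROM, {"Va": 5.0, "Vb": 1.0, "G": 0.5})["I"])  # 2.0
print(evaluate(RESISTOR_WITH_SOURCE_ROM, {"Va": 5.0, "Vb": 1.0, "G": 0.5, "I0": 1.0})["I"])  # 3.0
```

Because the ROM is selected before simulation and not changed while an element is being evaluated, this mirrors the "static" reconfiguration in the abstract.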
7	Dynamic detection of the communication pattern in shared memory environments for thread mapping Cruz, Eduardo Henrique Molina da January 2012 The threads of parallel applications cooperate to fulfill their tasks, so they communicate with each other. The communication latency between cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core increase, this difference between communication latencies grows. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, communication is implicit and occurs through accesses to shared variables, which makes it difficult to detect the communication pattern between threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master's thesis, we introduce two novel lightweight mechanisms to detect the communication pattern of threads. The first mechanism uses the information about shared cache lines provided by cache coherence protocols. The second mechanism uses the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing.
Both our mechanisms rely entirely on hardware features, which makes thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time-consuming task, such as simulation, is required. We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is an NP-hard problem; therefore, to achieve polynomial complexity, we designed a heuristic method based on Edmonds' graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations and snoop transactions was reduced by up to 31.9%, 41% and 65.4%, respectively. Parallel processing Performance : Computers Distributed processing Thread mapping Parallel computer architectures Shared memory Communication Cache memory Cache coherence protocols TLB
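This Python sketch illustrates the two stages described in the abstract under simplifying assumptions. First, it approximates the TLB-based mechanism in software by comparing hypothetical per-thread page-set snapshots to build a communication matrix. Second, it pairs threads by maximum-weight perfect matching. Instead of the thesis's polynomial heuristic based on Edmonds' algorithm, it uses an exact bitmask dynamic program that is exponential in the thread count and only suitable for small examples. The topology, in which cores 2k and 2k+1 share a cache, is also a hypothetical assumption.

```python
from functools import lru_cache
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

def comm_matrix(snapshots: Sequence[Sequence[Set[int]]], n_threads: int) -> List[List[int]]:
    """Build a symmetric thread-to-thread communication matrix.

    Each snapshot holds one set of page numbers per thread (a software
    stand-in for sampled per-core TLB contents). Every page present in two
    threads' sets at the same time counts as one unit of communication.
    """
    m = [[0] * n_threads for _ in range(n_threads)]
    for sample in snapshots:
        for i, j in combinations(range(n_threads), 2):
            shared = len(sample[i] & sample[j])
            m[i][j] += shared
            m[j][i] += shared
    return m

def best_pairing(m: List[List[int]]) -> List[Tuple[int, int]]:
    """Pair threads so that total intra-pair communication is maximal.

    Exact maximum-weight perfect matching by bitmask dynamic programming,
    O(2^n * n) time: fine for a handful of threads, unlike the polynomial
    Edmonds-based heuristic described in the thesis.
    """
    n = len(m)
    if n % 2:
        raise ValueError("an even number of threads is required")
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(used: int) -> Tuple[int, Tuple[Tuple[int, int], ...]]:
        if used == full:
            return 0, ()
        # The lowest-numbered unpaired thread must be paired with someone.
        i = next(k for k in range(n) if not (used >> k) & 1)
        best: Tuple[int, Tuple[Tuple[int, int], ...]] = (-1, ())
        for j in range(i + 1, n):
            if not (used >> j) & 1:
                weight, pairs = solve(used | (1 << i) | (1 << j))
                if weight + m[i][j] > best[0]:
                    best = (weight + m[i][j], ((i, j),) + pairs)
        return best

    return list(solve(0)[1])

def map_to_cores(pairs: List[Tuple[int, int]]) -> Dict[int, int]:
    # Hypothetical topology: cores 2k and 2k+1 share a cache, so pair k goes there.
    mapping: Dict[int, int] = {}
    for k, (a, b) in enumerate(pairs):
        mapping[a], mapping[b] = 2 * k, 2 * k + 1
    return mapping

snapshots = [
    [{1, 2}, {7, 9}, {1, 2, 5}, {7}],
    [{1, 3}, {7}, {1, 3}, {7, 8}],
]
m = comm_matrix(snapshots, 4)
pairs = best_pairing(m)
print(pairs)                # [(0, 2), (1, 3)]
print(map_to_cores(pairs))  # {0: 0, 2: 1, 1: 2, 3: 3}
```

Threads 0 and 2 share the most pages, so they land on the two cores assumed to share a cache. Threads 1 and 3 take the next pair.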
8	Desenvolvimento de uma arquitetura reconfigurável para o processamento de modelos no ambiente ABACUS (Development of a reconfigurable architecture for model processing in the ABACUS environment) Lima, Verônica Aparecida Lopes [UNESP] 31 August 2007 Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / The aim of this work is to develop a statically reconfigurable architecture for a processing element (MPH) of the ABACUS circuit simulation environment. This processing element consists of a set of functional units that can be linked by means of control words stored in ROM, and whose interconnection can be modified so that the processing hardware adapts to the model of the circuit element being simulated. The design was described in VHDL and simulated with the Quartus II software. Reconfigurable digital systems Circuit simulation Multiprocessor computer architectures
