1 |
A robust MFSK transmission system for aeromobile HF radio channels
Clark, Paul Derrick John January 1999
No description available.
|
2 |
MULTI-STREAM DATA-DRIVEN TELEMETRY SYSTEM
Can, Ouyan, Chang-jie, Shi 11 1900
International Telemetering Conference Proceedings / November 04-07, 1991 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The Multi-Stream Data-Driven Telemetry System (MSDDTS) is a new generation system in China developed by Beijing Research Institute of Telemetry (BRIT) for high bit rate, multi-stream data acquisition, processing and display. Features of the MSDDTS include:
- Up to 4 data streams
- Data-driven architecture
- Multi-processor for parallel processing
- Modular, configurable, expandable and programmable
- Stand-alone capability
- External control by host computer

This paper addresses three important aspects of the MSDDTS. First, the system architecture is discussed. Second, three basic models of the system configuration are described. Third, the future development of the system is outlined.
|
3 |
ENHANCING FAIRNESS AND PERFORMANCE ON CHIP MULTI-PROCESSOR PLATFORMS WITH CONTENTION-AWARE SCHEDULING POLICIES
Marinakis, Theodoros 01 December 2019
Chip Multi-Processor (CMP) platforms, well-established in the server, desktop and embedded domains, succeeded in overcoming the power consumption and heat dissipation bottlenecks by integrating multiple cores, less complex and less powerful than their single-core ancestors, in a single die. A major issue induced by the design of CMPs is contention for the shared resources of the platform: the Last Level Cache (LLC) and main memory bandwidth. Applications running concurrently on the cores compete with each other for the shared resources and are subject to performance degradation. The way applications are assigned to the CMP is crucial for the overall performance of the system. A scheduling policy that accounts for contention will bring high performance speed-ups, whereas an agnostic one will generate unpredictable contention conditions. For this reason the significance of the scheduler has been elevated, as it is the component that determines which applications utilize the resources in each time period.

In this thesis, we address cross-core interference on CMP platforms by designing scheduling policies that improve performance and fairness. We deal with contention in three ways. In our first approach, we incorporate the notion of progress in order to balance unfairness among the applications of the workload. Performance degradation is not evenly distributed, and progress varies greatly among the applications. In order to provide a fair execution environment, we monitor, at run-time, the applications assigned to the CPU and prioritize them based on the extent to which they are affected by contention.

In our second approach, we target performance by mitigating contention on shared resources. It is necessary to decide, out of all the possible application schedules, the one that generates the least amount of resource interference. To achieve that, the first indispensable step is to extract an interference profile for the applications executed on the CMP. We accomplish that by applying pressure to all levels of the memory hierarchy and identifying the point at which performance is compromised. From our analysis, we understand that shared resources can tolerate a certain amount of pressure; applications can be grouped together if the overall generated pressure does not reach the saturation point of the shared resources. Having extracted this information, we place the applications in such a way that overall resource requirements are as balanced as possible across the execution.

Finally, we design a policy that improves performance and fairness at the same time. Applications that rely heavily on the LLC are separated from those with high main memory bandwidth requirements, in order to avoid the destructive effects caused by the LLC thrashing behavior of the latter. The group executed on the CPU is determined based on the key observation that the overall requirements of the group should not exceed the saturation limits of the CMP. Additionally, during execution, the progress of each application is estimated and those with the least accumulated progress are prioritized.

Our proposed policies are evaluated on an Intel Xeon E5-2620 v3 processor. A variety of benchmark suites were utilized to generate mixes of diverse characteristics. Our methodologies are implemented in user-space and can be deployed on Linux-based systems. Experimental results show the benefits of tackling contention in shared resources. We achieve throughput gains of up to 16%, and unfairness is reduced by 2.37x on average compared to the Linux scheduler.
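The combined policy described in this abstract (keep a group's aggregate demand below the saturation limits, and favor the applications with the least accumulated progress) can be illustrated with a small sketch. This is not the thesis's implementation: the `App` fields, the normalized pressure values, the limits and the greedy selection in `pick_group` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    llc_pressure: float  # hypothetical normalized LLC demand
    mem_pressure: float  # hypothetical normalized memory-bandwidth demand
    progress: float      # hypothetical estimate of accumulated progress


def pick_group(apps, cores, llc_limit=1.0, mem_limit=1.0):
    """Greedily pick up to `cores` apps, least progress first.

    An app is skipped if adding it would push the group's total LLC or
    memory-bandwidth pressure past the (assumed) saturation limits.
    """
    group, llc, mem = [], 0.0, 0.0
    for app in sorted(apps, key=lambda a: a.progress):
        if len(group) == cores:
            break
        fits_llc = llc + app.llc_pressure <= llc_limit
        fits_mem = mem + app.mem_pressure <= mem_limit
        if fits_llc and fits_mem:
            group.append(app)
            llc += app.llc_pressure
            mem += app.mem_pressure
    return group


apps = [
    App("a", 0.6, 0.1, 0.2),
    App("b", 0.5, 0.2, 0.1),
    App("c", 0.3, 0.7, 0.5),
    App("d", 0.1, 0.2, 0.9),
]
print([app.name for app in pick_group(apps, cores=2)])
```

A greedy pass keeps the sketch short; a real scheduler would also need to re-estimate progress and pressure at run-time, which is outside the scope of this example.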
|
4 |
Performance Modeling of Single Processor and Multi-Processor Computer Architectures
Commissariat, Hormazd P. 11 March 2000
Determining the optimum computer architecture configuration for a specific application or a generic algorithm is a difficult task. The complexity of today's computer architectures and systems makes it difficult and expensive to implement and test fully functional prototypes of computer architectures. High-level VHDL performance modeling of architectures is an efficient way to rapidly prototype and evaluate computer architectures.
Once the architecture configuration is fixed, one would like to know the tolerance and expected performance of individual/critical components and also the best way to map the software tasks onto the processor(s). Trade-offs and engineering compromises can be analyzed, and the effects of certain component failures and communication bottlenecks can be studied.
A part of the research work done for the RASSP (Rapid Prototyping of Application Specific Signal Processors) project funded by Department of Defense contracts is documented in this thesis. The architectures modeled include a single-processor, single-global-bus system; a four processor, single-global-bus system; a four processor, multiple-local-bus, single-global-bus system; and finally, a four processor multiple-local-bus system interconnected by a crossbar interconnection switch. The hardware models used are mostly legacy/inherited models from an earlier project and they were upgraded, modified and customized to suit the current research needs and requirements. The software tasks that are run on the processors are pieces of the signal and image processing algorithm run on the Synthetic Aperture Radar (SAR).
The communication between components/devices is achieved in the form of tokens which are record structures. The output is a trace file which tracks the passage of the tokens through various components of the architecture. The output trace file is post-processed to obtain activity plots and latency plots for individual components of the architecture. / Master of Science
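The post-processing step described above (turning a token trace into per-component latencies) can be sketched as follows. The abstract does not give the trace format, so the four-field line layout `time token component event`, the `enter`/`leave` event names and the function name are assumptions made purely for illustration.

```python
from collections import defaultdict


def component_latencies(trace_lines):
    """Return {component: [latency, ...]} from a hypothetical token trace.

    Each line is assumed to read 'time token component event', where event
    is 'enter' or 'leave'. Latency is the time a token spends in a component.
    """
    entered = {}                    # (token, component) -> entry time
    latencies = defaultdict(list)
    for line in trace_lines:
        time_s, token, component, event = line.split()
        key = (token, component)
        if event == "enter":
            entered[key] = float(time_s)
        elif event == "leave" and key in entered:
            latencies[component].append(float(time_s) - entered.pop(key))
    return dict(latencies)


trace = [
    "0 t1 bus enter",
    "5 t1 bus leave",
    "5 t1 cpu0 enter",
    "6 t2 bus enter",
    "9 t2 bus leave",
    "12 t1 cpu0 leave",
]
print(component_latencies(trace))
```

The resulting lists could then feed whatever plotting tool produces the latency plots; activity plots would need the entry and exit times kept as intervals rather than reduced to differences.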
|
5 |
COMPACT AIRBORNE REAL TIME DATA MONITOR SYSTEM - PRODUCTION MONITOR
Tolleth, Grant H. 10 1900
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California / This paper describes the Production Monitor (PM), a result of integrating very
diverse hardware architectures into a compact, portable, real time airborne data monitor,
and data analysis station. Flight testing of aircraft is typically conducted with personnel
aboard during flight. These personnel monitor real time data, play back recorded data, and
adjust test suites to certify or analyze systems as quickly as possible. In the past, Boeing
has used a variety of dissimilar equipment and software to meet our testing needs. During
the process of standardizing and streamlining testing processes, the PM was developed.
PM combines Data Flow, VME, Ethernet, and PC architectures into a single integrated
system. This approach allows PM to run applications, provide indistinguishable operator
interfaces, and use data bases and peripherals common to our other systems.
|
6 |
FUNCTIONAL ENHANCEMENT AND APPLICATIONS DEVELOPMENT FOR A HYBRID, HETEROGENEOUS SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE
Hegde, Sridhar 01 January 2004
Reconfigurable and dynamic computer architecture is an exciting area of research that is rapidly expanding to meet the requirements of compute-intensive real-time and non-real-time applications in key areas such as cryptography and signal/radar processing. To meet the demands of such applications, a parallel single-chip heterogeneous Hybrid Data/Command Architecture (HDCA) has been proposed. This single-chip multiprocessor architecture is reconfigurable at three levels: application, node and processor. It is currently being developed and experimentally verified via a three-phase prototyping process. A first-phase prototype with very limited functionality has been developed. This initial prototype was used as a base for further enhancements to functionality and performance, resulting in a second-phase virtual prototype, which is the subject of this thesis. The major contributions of the work reported here are adding processors, making the system reconfigurable at the node level, enhancing the ability of the system to fork to more than two processes, and designing more complex real-time and non-real-time applications that make use of, and can be used to test and evaluate, the enhanced and new functionality added to the architecture. A working proof of concept of the architecture is achieved through Hardware Description Language (HDL) based development and use of a Virtual Prototype of the architecture. The Virtual Prototype was used to evaluate the architecture's functionality and performance in executing several newly developed example applications. Recommendations are made to further improve the system functionality.
|
7 |
Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures
Vangal, Sriram R. January 2006
The ever-shrinking size of MOS transistors brings the promise of scalable Network-on-Chip (NoC) architectures containing hundreds of processing elements with on-chip communication, all integrated into a single die. Such a computational fabric will provide high levels of performance in an energy-efficient manner. To mitigate the emerging wire-delay problem and to address the need for substantial interconnect bandwidth, packet-switched routers are fast replacing shared buses and dedicated wires as the interconnect fabric of choice. With on-chip communication consuming a significant portion of the chip power and area budgets, there is a compelling need for compact, low-power routers. While applications dictate the choice of the compute core, the advent of multimedia applications, such as 3D graphics and signal processing, places stronger demands on self-contained, low-latency floating-point processors with increased throughput. Therefore, this work focuses on two key building blocks critical to the success of NoC design: high-performance, area- and energy-efficient router and floating-point processor architectures.

This thesis first presents a six-port four-lane 57 GB/s non-blocking router core based on wormhole switching. The router features double-pumped crossbar channels and destination-aware channel drivers that dynamically configure based on the current packet destination. This enables a 45% reduction in crossbar channel area, a 23% reduction in overall router area, up to a 3.8X reduction in peak channel power, and a 7.2% improvement in average channel power, with no performance penalty over a published design. In a 150 nm six-metal CMOS process, the 12.2 mm² router contains 1.9 million transistors and operates at 1 GHz at 1.2 V. We next present a new pipelined single-precision floating-point multiply accumulator core (FPMAC) featuring a single-cycle accumulate loop using base 32 and internal carry-save arithmetic, with delayed addition techniques. Combined algorithmic, logic and circuit techniques enable multiply-accumulates at speeds exceeding 3 GHz, with single-cycle throughput. Unlike existing FPMAC architectures, the design eliminates scheduling restrictions between consecutive FPMAC instructions. The optimizations allow the costly normalization step to be removed from the critical accumulate loop and conditionally powered down using dynamic sleep transistors on long accumulate operations, saving active and leakage power. In addition, an improved leading zero anticipator (LZA) and overflow detection logic applicable to carry-save format is presented. In a 90 nm seven-metal dual-VT CMOS process, the 2 mm² custom design contains 230K transistors. The fully functional first silicon achieves 6.2 GFLOPS of performance while dissipating 1.2 W at 3.1 GHz, 1.3 V supply.

It is clear that realization of successful NoC designs requires well-balanced decisions at all levels: architecture, logic, circuit and physical design. Our results from key building blocks demonstrate the feasibility of pushing the performance limits of compute cores and communication routers, while keeping active and leakage power, and area, under control. / Report code: LiU-TEK-LIC-2006:36.
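The FPMAC figures quoted in this abstract can be cross-checked with simple arithmetic. The snippet below is only a sanity check on the published numbers; the derived values (operations per cycle and GFLOPS per watt) are not stated in the abstract, and reading a multiply-accumulate as two floating-point operations is a common convention assumed here.

```python
# Figures quoted in the abstract for the FPMAC first silicon.
gflops = 6.2       # measured performance, GFLOPS
clock_ghz = 3.1    # operating frequency, GHz
power_w = 1.2      # dissipated power, W

# Derived quantities (not stated in the abstract).
flops_per_cycle = gflops / clock_ghz
gflops_per_watt = gflops / power_w

print(f"{flops_per_cycle:.1f} floating-point operations per cycle")
print(f"{gflops_per_watt:.2f} GFLOPS per watt")
```

Two operations per cycle is what one multiply and one add per clock would give, which fits the single-cycle throughput the abstract claims.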
|
9 |
Modelo de migração de tarefas para MPSoCs baseados em redes-em-chip / Task migration model for NoC-based MPSoCs
Barcelos, Daniel January 2008
Regarding embedded Multi-processor Systems-on-Chip (MPSoCs), dynamic task allocation and task migration are still open research areas. This work proposes a hybrid memory organization for NoC-based systems as a way to minimize the energy spent during code transfer when task migration or dynamic task allocation needs to be performed. A new flexible task migration mechanism is also introduced, which can use check-pointing or a more transparent technique. The increasing use of multi-processor architectures in embedded computing makes it important to evaluate different options for memory organization. While distributed memory allows faster accesses, a global memory makes it possible to share data without processor interference. The experiments target the reduction of communication energy in a context where task migration or dynamic task allocation is required. Results indicate that the proposed hybrid memory organization is more efficient than distributed-only or global-only organizations with regard to code migration. In some cases, the hybrid strategy also reduces task migration times.

In the hybrid approach, the code can be transferred from the node where the task was originally running or from a memory positioned at the center of the system. The choice between the two options is made at runtime in an intuitive way, based on the distance between the nodes involved in the transfer. Results indicate that the proposed hybrid organization reduces the code transfer energy by 24% and 10% on average, compared to global-only and distributed-only memory organizations, respectively.

The proposed migration model is based on the Java language and on message-passing communication. It is mainly software-based and does not require any system modification. The energy cost of the migration process is then evaluated, i.e., the energy spent on the sending and receiving cores and on the communication structure, a wormhole-based Network-on-Chip (NoC). Previous works have compared system figures before and after task migration, while this study evaluates the whole migration process. Finally, the minimum execution time of the embedded system required to amortize the energy spent on the migration process is derived as a function of the task size and of the distance between the cores on the NoC, considering that processors use Dynamic Voltage Scaling (DVS) to reduce power consumption according to their current workloads.
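Two ideas from this abstract lend themselves to a short sketch: choosing the code source by distance, and estimating how long a migrated task must run before the migration energy is recovered. Neither function reproduces the thesis's model. The 2D-mesh hop count, the tie-breaking rule and the simple break-even formula (migration energy divided by the power saved after migration) are assumptions made for illustration, as are all names and numbers.

```python
def hops(a, b):
    """Hop count between two (x, y) nodes, assuming a 2D mesh NoC."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_code_source(origin, center, destination):
    """Pick the nearer code source; ties go to the original node (assumed)."""
    if hops(origin, destination) <= hops(center, destination):
        return "origin"
    return "center"


def break_even_time(migration_energy_j, power_before_w, power_after_w):
    """Seconds of execution needed to amortize the migration energy.

    Assumes the only benefit is a constant power saving after migration;
    returns infinity when migration saves no power.
    """
    saving = power_before_w - power_after_w
    if saving <= 0:
        return float("inf")
    return migration_energy_j / saving


print(pick_code_source(origin=(0, 0), center=(2, 2), destination=(3, 3)))
print(pick_code_source(origin=(0, 0), center=(2, 2), destination=(0, 1)))
print(break_even_time(0.5, power_before_w=1.0, power_after_w=0.75))
```

In the abstract the break-even time also depends on task size and node distance; in this sketch both would enter through `migration_energy_j`, which is left as an input rather than modeled.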
|