1 |
A Configurable Router for Embedded Network-on-Chip Support in Field-Programmable Gate Arrays / Pau, Ronny / 27 September 2008
The scaling of VLSI technology has allowed extensive integration of processing resources on a single chip. Consequently, programmable chips are able to provide high logic and memory capacity for the implementation of complex systems. Field-programmable gate arrays (FPGAs), with their embedded memory and other specialized functionality, have become viable alternatives in many cases to costly application-specific integrated circuits as a system-on-chip (SoC) substrate. However, on-chip bus-based interconnects are no longer suitable for complex SoC design because of their limited scalability. The network-on-chip (NoC) paradigm has therefore emerged as a scalable approach for addressing this challenge.
FPGAs can also adopt the NoC paradigm in order to support more complex SoC implementations. The elements for NoC support can be implemented in conventional programmable logic within an FPGA; however, a dedicated approach for these NoC elements can lead to better performance and more efficient utilization of on-chip FPGA resources. A fixed network topology can be a disadvantage in NoC platforms due to misalignment with application requirements. It is therefore desirable to incorporate a certain level of configurability even for embedded NoC support within an FPGA.
This thesis presents the design and implementation of a configurable router intended as a dedicated embedded module for NoC support in an FPGA. The goal is to provide a general NoC infrastructure for the FPGA platform that balances trade-offs with regard to logic complexity, resource utilization, and flexibility. The configurable router provides flexibility in implementing a variety of network topologies with the convenience of a 3-bit input to the router for configuration. All of the necessary routing functionality for each topology is implemented in logic for performance and area efficiency. The overall router design provides general NoC support with reduced complexity, thereby achieving area efficiency and an adequate clock frequency for typical operation in conjunction with embedded soft processors.
Synthesis results are presented at the router level in order to characterize the hardware overhead for implementations in programmable logic as well as standard-cell technology, and at the system level in order to evaluate overall system resource utilization. Operational results are shown at the router level to demonstrate correctness and at the system level to demonstrate functionality of the multiprocessor systems that utilize the configurable router. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2008-09-24 23:24:01.907
|
2 |
A multiprocessing system-on-chip framework targeting stream-oriented applications / Cook, Darcy Philip / 19 January 2011
Over the past decade, the processing speed requirement of embedded systems has steadily increased. Since faster clocking of a single processor can no longer be relied on to increase the processing speed of the system (due to overheating and other constraints), the development of multiprocessors on a single chip has stepped up to meet the demand. One approach has been to design and develop a multiprocessing platform to handle a large set of heterogeneous applications. However, this development has been slow due to the intractable design space that results when both the hardware and software are required to be adjustable to meet the needs of the dissimilar applications. A different approach has been to limit the targeted applications to those that are similar in some sense. By limiting the targeted applications to a cohesive set, the design space can become manageable. This thesis proposes a framework for a multiprocessing system-on-chip (MPSoC), consisting of a cohesive hardware and software architecture intended specifically for problems that are stream-oriented (e.g., video streaming). The framework allows the hardware and software to be customized to fit a specific application within the cohesive set, while narrowing the design space to a manageable set of design parameters. In addition, this thesis designs and develops an analytic model, using a discrete-time Markov chain, to measure the performance of an MPSoC framework implementation when the number of concurrent processing elements is varied. Finally, a chaotic simulated annealing algorithm is developed to determine an optimal mapping and scheduling of tasks to processing elements within the MPSoC.
|
4 |
Quasi-static scheduling for fine-grained embedded multiprocessing / Boutellier, J. (Jani) / 27 October 2009
Abstract
Designing energy-efficient multiprocessing hardware for applications such as video decoding or MIMO-OFDM baseband processing is challenging because these applications require high throughput, as well as flexibility for efficient use of the processing resources. Application specific hardwired accelerator circuits are the most energy-efficient processing resources, but are inflexible by nature. Furthermore, designing an application specific circuit is expensive and time-consuming. A solution that maintains the energy-efficiency of accelerator circuits, but makes them flexible as well, is to make the accelerator circuits fine-grained.
Fine-grained application specific processing elements can be designed to implement general purpose functions that can be used in several applications and their small size makes the design and verification times reasonable. This thesis proposes an efficient method for orchestrating the use of heterogeneous fine-grained processing elements in dynamic applications without introducing tremendous orchestration overheads. Furthermore, the thesis presents a processing element management unit which performs scheduling and independent dispatching, and works with such low overheads that the use of low latency processing elements becomes worthwhile and efficient.
Dynamic orchestration of processing elements requires run-time scheduling that has to be done very fast and with as few resources as possible. For this purpose, this work proposes dividing the application into short static parts whose schedules can be determined at system design time. This approach, often called quasi-static scheduling, captures the dynamic nature of the application and minimizes the computations of run-time scheduling.
Enabling low-overhead quasi-static scheduling required simultaneously studying the computational complexity and performance of simple but efficient scheduling algorithms. The requirements led to the use of flow-shop scheduling. This thesis is the first work that adapts flow-shop scheduling algorithms to different multiprocessor memory architectures. An extension to the flow-shop model is also presented, which enables modeling a wider scope of applications than traditional flow-shop. The feasibility of the proposed approach is demonstrated with a real multiprocessor solution that is instantiated on a field-programmable gate array.
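As a rough illustration of the flow-shop idea mentioned above, the following Python sketch computes the makespan of a permutation flow shop and precomputes the best job order for one static part, in the spirit of deciding schedules at design time. It is not the thesis's algorithm: the function names, the `design_time_table` dictionary, the brute-force search, and the processing times are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def makespan(order, proc_times):
    """Completion time of the last job on the last machine.

    proc_times[j][m] is the time job j needs on machine m; every job
    visits machines 0..M-1 in the same order (the flow-shop assumption).
    """
    n_machines = len(proc_times[0])
    finish = [0] * n_machines  # finish[m]: time at which machine m becomes free
    for job in order:
        for m in range(n_machines):
            # The job can start on machine m once it has left machine m-1
            # and machine m has finished the previous job.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
    return finish[-1]


def best_order(proc_times):
    """Exhaustive search over job orders; only viable for short static parts."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda order: makespan(order, proc_times))


# Hypothetical static part: 3 jobs on 2 processing elements.
times = [[3, 2], [1, 4], [2, 1]]

# "Design time": compute the schedule once and store it.
design_time_table = {"part_a": best_order(times)}

# "Run time": only a table lookup is needed when part_a becomes active.
order = design_time_table["part_a"]
print(order, makespan(order, times))
```

The exhaustive search is affordable here only because each static part is assumed to be short; the point of the sketch is that the expensive step happens before the system runs.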
|
5 |
Symmetric MultiProcessing for the Pintos Instructional Operating System / Chao, Lance Rolin / 30 June 2017
For the last decade, practical limitations have prevented processor speeds from increasing significantly. To increase throughput, the computing industry has turned to multiprocessing; that is, executing computations in parallel on separate processing units. Making use of these additional units requires support from the operating system (OS). Indeed, most modern operating systems do have the capability of recognizing and utilizing multiprocessor hardware.
Pintos is an instructional operating system used by many institutions to teach important operating systems concepts. Pintos aims to increase student engagement by providing challenging programming projects in which students personally implement many core functionalities of an operating system. However, prior to this work, Pintos was a uniprocessor OS. This makes it difficult for Pintos to expose students to the same synchronization challenges that most modern kernel developers face. In addition, the first structured project, aimed at teaching scheduling policies, requires students to implement a uniprocessor variant of the MLFQS scheduler, which is no longer used in modern systems.
We implemented Symmetric MultiProcessing (SMP) support in Pintos. We also created a new scheduling assignment to expose students to a multiprocessor proportional-share scheduling policy called Completely Fair Scheduler and to introduce them to the concept of load balancing. Finally, we evaluate the effectiveness of our new Pintos framework in augmenting students’ knowledge of OS scheduling and enhancing their ability to code and debug in a low-level environment. / Master of Science / Operating system education remains a cornerstone of any undergraduate computer science curriculum. Instructional operating systems provide the necessary infrastructure to increase student engagement by allowing students to learn through challenging, hands-on projects. We present PintOS/SMP, an instructional operating system we built.
PintOS/SMP is based on the existing PintOS operating system that has been in use at Virginia Tech and other institutions for several years. PintOS, however, was not a multiprocessor operating system, which meant that it was unable to support additional execution units provided by the underlying hardware. Thus, it lacks realism in an era in which even smartphones are delivered with multiple execution units. We added multiprocessor support to PintOS in order to introduce students to the challenges that multiprocessor systems brought to OS developers. We also developed a new programming assignment in order to expose students to the techniques used to distribute work efficiently in multiprocessor systems.
We deployed PintOS/SMP in a capstone class with 23 students and evaluated it using a survey instrument. Students reported that the projects were useful and interesting and significantly enhanced their level of understanding of operating system concepts, which we confirmed through the use of test questions. Our results indicate that PintOS/SMP provides a challenging but enjoyable learning experience and is successful in reaching its educational goals.
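To make the notion of a proportional-share policy concrete, here is a minimal Python sketch of the core idea behind a Completely Fair Scheduler style run queue: always run the task with the smallest virtual runtime, and advance that virtual runtime more slowly for heavier tasks. This is a simplification under stated assumptions and is not the PintOS/SMP assignment or the Linux implementation: the heap stands in for the ordered structure a real kernel would use, and `REFERENCE_WEIGHT`, `simulate`, and the task weights are hypothetical names and values.

```python
import heapq

REFERENCE_WEIGHT = 1024  # illustrative weight of a "normal priority" task


def simulate(weights, n_slices, slice_len=1):
    """Run n_slices time slices; return how many slices each task received.

    weights maps a task name to its weight. A task's virtual runtime grows
    by slice_len * REFERENCE_WEIGHT / weight each time it runs, so heavier
    tasks accumulate virtual runtime more slowly and are picked more often.
    """
    # Run queue ordered by (virtual runtime, name); the name breaks ties.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    received = {name: 0 for name in weights}
    for _ in range(n_slices):
        vruntime, name = heapq.heappop(queue)  # task that has run the least
        received[name] += 1
        vruntime += slice_len * REFERENCE_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return received


# Task A has twice the weight of task B, so it should get about 2/3 of the CPU.
print(simulate({"A": 2048, "B": 1024}, n_slices=300))
```

The sketch covers a single run queue only; load balancing across processors, which the assignment also introduces, is outside its scope.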
|
6 |
An Interprocessor Communication Link for Data Minicomputers / Brett, Michael Edward / 05 1900
The ACTR (Asynchronous Communications Transmitter Receiver) is a serial data transfer link for the Data General ECLIPSE and NOVA minicomputer lines. The ACTR allows the interconnection of computers in the NOVA and ECLIPSE lines into a multiprocessor system by permitting blocks of data to be transferred through the computers' program I/O facilities. Such a small computer multiprocessor system is a powerful, highly flexible alternative to a single large computer in many applications. The major application of the ACTR is in systems where the linked processors are either far remote from one another or where the system is so configured that a master/slave environment is practical.
This report will deal with the theory of operation of the hardware as well as the software control of the ACTR. A method of handling the ACTR in a multi-tasking environment under the Data General operating systems, RDOS/RTOS, will also be developed. / Thesis / Master of Engineering (ME)
|
7 |
VME Based Ground Stations at McDonnell Douglas Aerospace Flight Test / Taylor, Bruce A. / 11 1900
International Telemetering Conference Proceedings / October 30-November 02, 1995 / Riviera Hotel, Las Vegas, Nevada / The ability to dynamically configure our ground stations to support a wide array of fighter/attack aircraft programs has led McDonnell Douglas Aerospace (MDA) to seek alternatives to commercially available ground stations. Cost effectiveness and fast response time to these widely varying needs are paramount to staying competitive in today's defense environment. VME (Versa Modular European) architecture has provided a platform that fulfills these requirements while requiring a minimum of in-house designs, which can be expensive and time-consuming to implement. MDA is now in its third generation of VME based ground systems. These systems are highly extensible due to their reliance on software and programmable hardware systems and are inexpensive due to their use of commercial grade VME cards. This paper describes the current generation TM/Quicklook Ground Station and the Data Editor (Preprocessor) Station, and it also provides a perspective on how the designers solved some common problems associated with VME architecture. These stations are now in use at MDA test sites in St. Louis, Patuxent River NAWC, Edwards AFB, and Eglin AFB.
|
8 |
Comparison and Prediction of Temporal Hotspot Maps / Arnesson, Andreas, Lewenhagen, Kenneth / January 2018
Context. To aid law enforcement agencies when coordinating and planning their efforts to prevent crime, there is a need to investigate methods used in such areas. With the help of crime analysis methods, law enforcement are more efficient and pro-active in their work. One analysis method is temporal hotspot maps. The temporal hotspot map is often represented as a matrix with a certain resolution such as hours and days, if the aim is to show occurrences of hour in correlation to weekday. This thesis includes a software prototype that allows for the comparison, visualization and prediction of temporal data.
Objectives. This thesis explores if multiprocessing can be utilized to improve execution time for the following two temporal analysis methods, Aoristic and Getis-Ord*. Furthermore, to what extent two temporal hotspot maps can be compared and visualized is researched. Additionally it was investigated if a naive method could be used to predict temporal hotspot maps accurately. Lastly this thesis explores how different software packaging methods compare to certain aspects defined in this thesis.
Methods. An experiment was performed, to answer if multiprocessing could improve execution time of Getis-Ord* or Aoristic. To explore how hotspot maps can be compared, a case study was carried out. Another experiment was used to answer if a naive forecasting method can be used to predict temporal hotspot maps. Lastly a theoretical analysis was executed to extract how different packaging methods work in relation to defined aspects.
Results. For both Getis-Ord* and Aoristic, the sequential implementations achieved the shortest execution time. The Jaccard measure calculated the similarity most accurately. The naive forecasting method created, proved not adequate and a more advanced method is preferred. Forecasting Swedish burglaries with three previous months produced a mean of only 12.1% overlap between hotspots. The Python package method accumulated the highest score of the investigated packaging methods.
Conclusions. The results showed that multiprocessing, in the language Python, is not beneficial to use for Aoristic and Getis-Ord* due to the high level of overhead. Further, the naive forecasting method did not prove practically useful in predicting temporal hotspot maps.
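The Jaccard measure mentioned in the results can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the thesis prototype: it treats a temporal hotspot map as a small matrix of counts (rows could be weekdays and columns hours) and calls a cell a hotspot when its count reaches a fixed threshold, whereas the thesis identifies hotspots with methods such as Getis-Ord*. The function names, the threshold, and the sample data are hypothetical.

```python
def hotspot_cells(matrix, threshold):
    """Return the set of (row, column) positions whose value reaches the threshold."""
    return {
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value >= threshold
    }


def jaccard(cells_a, cells_b):
    """Jaccard similarity |A & B| / |A | B| of two sets of hotspot cells."""
    union = cells_a | cells_b
    if not union:
        return 1.0  # two maps without any hotspots are treated as identical
    return len(cells_a & cells_b) / len(union)


# Two tiny made-up maps (2 rows x 3 columns instead of 7 weekdays x 24 hours).
observed = [[5, 0, 1],
            [7, 2, 6]]
forecast = [[6, 0, 0],
            [1, 3, 8]]

similarity = jaccard(hotspot_cells(observed, 5), hotspot_cells(forecast, 5))
print(f"hotspot overlap: {similarity:.1%}")
```

A value of 1.0 means the two maps flag exactly the same cells, and a value near 0 means they share almost none, which is how a low overlap figure for a forecast can be read.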
|
9 |
Asymmetric Multiprocessing Real Time Operating System on Multicore Platforms / January 2014
abstract: The need for multi-core architectures was recognized in the desktop computing domain some time ago. This trend is also beginning to be seen in deeply embedded systems, such as those in the automotive and avionics industries, owing to ever-increasing demands for computational bandwidth, responsiveness, and reliability, together with power consumption constraints. The adoption of such multi-core architectures in safety-critical systems is often met with resistance owing to the overhead of migrating the existing stable code base to the new system setup, which typically requires extensive re-design. Such a migration also brings about the need for exhaustive testing and validation, especially in safety-critical real-time systems.
This project highlights the steps to develop an asymmetric multiprocessing variant of Micrium µC/OS-II real-time operating system suited for a multi-core system. This RTOS variant also supports multi-core synchronization, shared memory management and multi-core messaging queues.
Since such specialized embedded systems are usually developed by system designers focused more on the functionality than on the coding standards, the adoption of automatic production code generation tools, such as SIMULINK's Embedded Coder, is increasingly becoming the industry norm. Such tools are capable of producing robust, industry compliant code with very little roll out time. This project documents the process of extending SIMULINK's automatic code generation tool for the AMP variant of µC/OS-II on Freescale's MPC5675K, a dual-core Microcontroller Unit. This includes code generation from task based models and multi-rate models. Apart from this, it also describes the development of additional software tools to allow semantically consistent communication between tasks on the same kernel and those across the kernels. / Dissertation/Thesis / Masters Thesis Computer Science 2014
|
10 |
Data Race Detection for Parallel Programs Using a Virtual Platform / Haverås, Daniel / January 2018
Data races are highly destructive bugs found in concurrent programs. Because of unordered thread interleavings, data races can randomly appear and disappear during the debugging process which makes them difficult to find and reproduce. A data race exists when multiple threads or processes concurrently access a shared memory address, with at least one of the accesses being a write. Such a scenario can cause data corruption, memory leaks, crashes, or incorrect execution. It is therefore important that data races are absent from production software. This thesis explores dynamic data race detection in programs running on Ericsson’s System Virtualization Platform (SVP), a SystemC/TLM-2.0-based virtual platform used for running software on simulated hardware. SVP is a bit-accurate simulator of Ericsson Many-Core Architecture (EMCA) hardware, enabling software and hardware to be developed in parallel, as well as providing unique insight into software execution. This latter property of SVP has been utilized to implement SVPracer, a proof-of-concept dynamic data race detector. SVPracer is based on a happens-before algorithm similar to Google’s ThreadSanitizer v2, but is significantly different in implementation as it relies entirely on instrumenting binary code during runtime without requiring code modification during build time. A set of test programs exhibiting various data races were written and compiled for EOS, the operating system (OS) running on EMCA Digital Signal Processors (DSPs). Similar programs were created for Linux using POSIX APIs, to compare SVPracer against ThreadSanitizer v2. Both SVPracer and ThreadSanitizer v2 correctly detect the data races present in the respective test programs. Further work must be done in SVPracer to eliminate some false positive results, caused by missing support for some OS functionality such as semaphores. 
Still, the present state of SVPracer is sufficient proof that dynamic data race detection is possible using a virtual platform. Future work could involve exploring other data race detection algorithms as well as implementing deadlock/livelock detection in virtual platforms.
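The data race definition given in this abstract (concurrent accesses to the same address, at least one of them a write) can be illustrated with a small happens-before check based on vector clocks. The Python sketch below is only an illustration of that general idea and does not reflect how SVPracer or ThreadSanitizer v2 is implemented: the `Access` record, the function names, and the hand-written clock values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: int      # id of the accessing thread
    addr: int        # memory address touched
    is_write: bool   # True for a write, False for a read
    clock: tuple     # vector clock of the thread at the time of the access


def happens_before(clock_a, clock_b):
    """True if clock_a is componentwise <= clock_b and the two clocks differ."""
    return clock_a != clock_b and all(x <= y for x, y in zip(clock_a, clock_b))


def is_race(first, second):
    """Two accesses race if they hit the same address from different threads,
    at least one is a write, and neither is ordered before the other."""
    if first.addr != second.addr or first.thread == second.thread:
        return False
    if not (first.is_write or second.is_write):
        return False
    ordered = (happens_before(first.clock, second.clock)
               or happens_before(second.clock, first.clock))
    return not ordered


# Unsynchronized: thread 0 writes and thread 1 reads with unrelated clocks.
write = Access(thread=0, addr=0x1000, is_write=True, clock=(1, 0))
racy_read = Access(thread=1, addr=0x1000, is_write=False, clock=(0, 1))

# Synchronized: thread 1 has merged thread 0's clock (for example after
# acquiring a lock that thread 0 released) before it reads.
safe_read = Access(thread=1, addr=0x1000, is_write=False, clock=(1, 2))

print(is_race(write, racy_read), is_race(write, safe_read))
```

In this model, missing support for a synchronization primitive means the clocks are never merged at that point, so ordered accesses look unordered; that is one way false positives of the kind the abstract attributes to unsupported semaphores can arise.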
|