Abstract
Designing energy-efficient multiprocessing hardware for applications such as video decoding or MIMO-OFDM baseband processing is challenging because these applications require both high throughput and enough flexibility to use the processing resources efficiently. Application-specific hardwired accelerator circuits are the most energy-efficient processing resources, but they are inflexible by nature. Furthermore, designing an application-specific circuit is expensive and time-consuming. A solution that maintains the energy efficiency of accelerator circuits while also making them flexible is to make the accelerator circuits fine-grained.
Fine-grained application-specific processing elements can be designed to implement general-purpose functions that can be used in several applications, and their small size keeps design and verification times reasonable. This thesis proposes an efficient method for orchestrating the use of heterogeneous fine-grained processing elements in dynamic applications without introducing large orchestration overheads. Furthermore, the thesis presents a processing element management unit that performs scheduling and independent dispatching, and works with such low overheads that the use of low-latency processing elements becomes worthwhile and efficient.
Dynamic orchestration of processing elements requires run-time scheduling that has to be done very fast and with as few resources as possible. To achieve this, this work proposes dividing the application into short static parts whose schedules can be determined at system design time. This approach, often called quasi-static scheduling, captures the dynamic nature of the application while minimizing the computation needed for run-time scheduling.
Enabling low-overhead quasi-static scheduling required simultaneously studying the computational complexity and the performance of simple but efficient scheduling algorithms. These requirements led to the use of flow-shop scheduling. This thesis is the first work that adapts flow-shop scheduling algorithms to different multiprocessor memory architectures. An extension to the flow-shop model is also presented, which enables modeling a wider scope of applications than the traditional flow-shop model. The feasibility of the proposed approach is demonstrated with a real multiprocessor solution that is instantiated on a field-programmable gate array.
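As an illustration of the flow-shop idea mentioned above, the following is a minimal sketch of a two-stage flow shop, where every job passes through stage 1 and then stage 2 in the same order. The abstract does not say which flow-shop algorithms the thesis uses; Johnson's rule, the classic ordering rule for the two-machine case, is assumed here purely as an example. The job names, the processing times, and the function names `johnson_order` and `makespan` are hypothetical.

```python
from typing import List, Tuple

# (name, time on stage 1, time on stage 2); all values are made up for illustration.
Job = Tuple[str, int, int]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs for a two-stage flow shop using Johnson's rule."""
    # Jobs that are no slower on stage 1 go first, by ascending stage-1 time.
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    # The remaining jobs go last, by descending stage-2 time.
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(sequence: List[Job]) -> int:
    """Finish time of the last job when each job runs on stage 1, then stage 2."""
    t1 = 0  # time at which stage 1 becomes free
    t2 = 0  # time at which stage 2 becomes free
    for _, a, b in sequence:
        t1 += a
        # Stage 2 starts once both the job has left stage 1 and stage 2 is free.
        t2 = max(t2, t1) + b
    return t2


jobs = [("A", 3, 2), ("B", 1, 4), ("C", 5, 3), ("D", 2, 2)]
schedule = johnson_order(jobs)
print([name for name, _, _ in schedule], makespan(schedule))
```

In a quasi-static setting, an ordering like this could be computed at design time for each short static part of the application, so that only the choice between precomputed orderings is left to run time. That pairing is an interpretation of the abstract, not a description of the thesis implementation.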
Identifier | oai:union.ndltd.org:oulo.fi/oai:oulu.fi:isbn978-951-42-9272-9 |
Date | 27 October 2009 |
Creators | Boutellier, J. (Jani) |
Publisher | University of Oulu |
Source Sets | University of Oulu |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess, © University of Oulu, 2009 |
Relation | info:eu-repo/semantics/altIdentifier/pissn/0355-3213, info:eu-repo/semantics/altIdentifier/eissn/1796-2226 |