Image Processing On Reconfigurable System-on-Chip

Real-time image processing requires not only sophisticated heuristic algorithms customized for a particular application, but also substantial computational power to handle a massive quantity of input image data. Reconfigurable System-on-Chip (rSoC), a powerful method of harnessing FPGA technology, is well suited to real-time image processing. It balances design cost and performance through a combination of hardware and software. However, hardware/software co-design requires specialized design skills, and the resulting designs are complex. This thesis investigates how best to use FPGA-based reconfigurable computing to provide efficient speed-up of real-time image processing algorithms. Existing rSoC systems, face detection and recognition algorithms, and hardware/software co-design methods are first reviewed and analyzed, and the advantages and disadvantages of existing research results are presented. All of these existing approaches have shortcomings. A new rSoC system without a separate host machine is presented for standalone embedded platforms. A new hardware/software co-design method, covering hardware/software communication and partitioning, is also explained. The rSoC system is highly modular, runs without a host machine, and supports the Linux operating system. Hardware and software designs can be rapidly implemented on this new platform. A new method for hardware/software communication in rSoC design is presented. It is based on shared memory and semaphores, and it makes hardware coprocessors appear like software processes. Individual processes in hardware-software systems can communicate without knowing whether other co-operating processes are hardware or software. This approach lets designers without specialist hardware knowledge readily access reusable hardware components. Processes can also be easily swapped between hardware and software.
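The communication idea described above, in which a process exchanges data through shared memory guarded by semaphores and cannot tell whether its partner is hardware or software, can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses Python threads in place of Linux processes, and the names `SharedChannel`, `software_enhance`, `hardware_enhance_stub`, `stage`, and `run_pipeline` are hypothetical. The "hardware" worker is only a software stand-in for a coprocessor.

```python
import threading


class SharedChannel:
    """One-slot shared buffer guarded by two semaphores (hypothetical helper)."""

    def __init__(self):
        self.buffer = None
        self.empty = threading.Semaphore(1)  # 1 while the slot is free
        self.full = threading.Semaphore(0)   # 1 while the slot holds data

    def put(self, item):
        self.empty.acquire()   # wait until the slot is free
        self.buffer = item
        self.full.release()    # signal that data is available

    def get(self):
        self.full.acquire()    # wait until data is available
        item = self.buffer
        self.empty.release()   # signal that the slot is free again
        return item


def software_enhance(frame):
    # Pure-software module: brighten each pixel, saturating at 255.
    return [min(255, p + 10) for p in frame]


def hardware_enhance_stub(frame):
    # Assumption: stand-in for a coprocessor that computes the same function.
    return [min(255, p + 10) for p in frame]


def stage(worker, inbox, outbox):
    # A pipeline stage only sees the channels, never the worker's nature.
    while True:
        frame = inbox.get()
        if frame is None:      # sentinel: shut the stage down
            outbox.put(None)
            return
        outbox.put(worker(frame))


def run_pipeline(worker, frames):
    inbox, outbox = SharedChannel(), SharedChannel()
    t = threading.Thread(target=stage, args=(worker, inbox, outbox))
    t.start()
    results = []
    for frame in frames:
        inbox.put(frame)
        results.append(outbox.get())
    inbox.put(None)            # ask the stage to stop
    outbox.get()               # consume the stage's shutdown acknowledgement
    t.join()
    return results
```

Because the caller of `run_pipeline` interacts only with the two channels, swapping `software_enhance` for `hardware_enhance_stub` requires no change to the surrounding code, which is the property the abstract attributes to the new communication method.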
The partitioning method handles the hardware/software partition iteratively during implementation. The partition is based on experimental profiling, so it is easier to realize and may achieve a better result than a fixed a priori partition. An example face recognition system has been implemented to test the new design method. It is a four-stage pipeline architecture comprising image capture, face detection, image enhancement, and face recognition. First, a software-only solution using the semaphore and shared-memory method is implemented on a Linux PC. The result of 5.5 frames per second indicates that the speed may not be fast enough for real-time image processing. Second, that software-only solution is moved to the new rSoC platform. Its performance of 0.1 frames per second is worse than on the PC platform, since the PC's CPU is much more powerful than the rSoC's. Finally, the new design method is used to move the bottleneck modules to hardware. Because the new hardware/software communication method is used, the software modules remain unchanged and unaware that other modules have moved to hardware. Results show that moving only one module to hardware was not helpful. However, when both bottleneck modules were moved to hardware, the system speedup was approximately 200, with a final system speed of 19 frames per second.
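As a quick check on the reported figures, the speedups can be recomputed from the frame rates quoted above. The sketch below uses only the three rates in the abstract; the helper names `speedup` and `frame_time_ms` are hypothetical. The rounded rates give a ratio of about 190, so the quoted "approximately 200" presumably reflects rounding of the 0.1 frames per second figure.

```python
def speedup(new_fps, old_fps):
    """Ratio of two frame rates."""
    return new_fps / old_fps


def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


# Frame rates as quoted in the abstract.
pc_software_fps = 5.5
rsoc_software_fps = 0.1
rsoc_hw_sw_fps = 19.0

print(round(speedup(rsoc_hw_sw_fps, rsoc_software_fps)))   # vs. software-only rSoC
print(round(speedup(rsoc_hw_sw_fps, pc_software_fps), 2))  # vs. software-only PC
print(round(frame_time_ms(rsoc_hw_sw_fps), 1))             # ms per frame on final system
```

The same figures show the final rSoC system running roughly 3.5 times faster than the software-only PC version.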

Identifier: oai:union.ndltd.org:ADTP/252893
Creators: Han, Jie
Source Sets: Australasian Digital Theses Program
Detected Language: English