Implementation of face detection algorithm with parallel extended-MMX instruction set
Tzeng, Hua-Yi, 20 August 2008
Face detection has many technical applications. Considering both accuracy and how regularly the data is arranged, we chose a neural-network-based recognition algorithm for implementation. The implementation has three parts: the Modified Census Transform, the computation of hypotheses, and the drawing of a square frame to mark each face. The Modified Census Transform has a regular computation pattern and a regular data arrangement, so it is well suited to SIMD execution; the other two parts have irregular data arrangements and are not easy to execute in parallel. This paper uses a SIMD processor architecture developed in our laboratory, together with its multi-data streaming capability, to implement the Modified Census Transform. The picture is divided into four parts that are processed at the same time, and the execution mode is switched according to the algorithm, so that data is fetched smoothly and has to be moved less often. We add two new instructions. The first operates on 16-bit data and uses four MMX registers to perform a 4×4 matrix transpose. The second loads data and sign-extends or zero-extends it at the same time. Both accelerate parallel execution over multiple data streams. We also support data streams that are not contiguous: a striping mode fetches data items separated by a constant distance, so that such streams can also be processed in parallel. In addition, we use the hypotheses to pick out the one person we want to find from among different people. We compare the hypotheses of two pictures; if they differ by less than 0.3%, the two pictures are taken to show the same person. Finally, we verify functional correctness on the UMVP-2500 platform. We compare performance with MMX and XScale and analyze the benefits of the multi-data streaming SIMD architecture. The speedup is 373% relative to MMX and 345% relative to XScale.
These results show that the multi-data streaming SIMD architecture achieves a speedup over the other SIMD architectures. The multi-data streaming SIMD architecture adds a new instruction that performs a 4×4 matrix transpose. Because the 4×4 transpose exchanges rows and columns, it gives us a new abstraction: conventional computation proceeds along a line, whereas the new abstraction works on a plane. MMX and XScale do not offer this abstraction.
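
The 4×4 transpose described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the instruction defined in the thesis: each register is modeled as a list of four 16-bit lanes, and all function names are hypothetical. The first version builds the transpose from unpack-style interleaves in the manner of standard MMX (PUNPCKLWD, PUNPCKHWD, PUNPCKLDQ, PUNPCKHDQ), which takes eight such steps; the second computes the same result in one step, as a dedicated transpose instruction would.

```python
# Illustrative model only: a "register" is a list of four 16-bit lanes,
# lane 0 first. All names here are hypothetical, not the thesis's mnemonics.

def unpack_lo16(a, b):
    """Interleave the two low 16-bit lanes of a and b (like PUNPCKLWD)."""
    return [a[0], b[0], a[1], b[1]]

def unpack_hi16(a, b):
    """Interleave the two high 16-bit lanes of a and b (like PUNPCKHWD)."""
    return [a[2], b[2], a[3], b[3]]

def unpack_lo32(a, b):
    """Concatenate the low 32-bit halves of a and b (like PUNPCKLDQ)."""
    return [a[0], a[1], b[0], b[1]]

def unpack_hi32(a, b):
    """Concatenate the high 32-bit halves of a and b (like PUNPCKHDQ)."""
    return [a[2], a[3], b[2], b[3]]

def transpose4x4_unpack(r0, r1, r2, r3):
    """Transpose four rows using eight unpack steps."""
    t0 = unpack_lo16(r0, r1)  # a0 b0 a1 b1
    t1 = unpack_hi16(r0, r1)  # a2 b2 a3 b3
    t2 = unpack_lo16(r2, r3)  # c0 d0 c1 d1
    t3 = unpack_hi16(r2, r3)  # c2 d2 c3 d3
    return (
        unpack_lo32(t0, t2),  # a0 b0 c0 d0
        unpack_hi32(t0, t2),  # a1 b1 c1 d1
        unpack_lo32(t1, t3),  # a2 b2 c2 d2
        unpack_hi32(t1, t3),  # a3 b3 c3 d3
    )

def transpose4x4_single(r0, r1, r2, r3):
    """Result a dedicated 4x4 transpose instruction would produce at once."""
    rows = (r0, r1, r2, r3)
    return tuple([rows[r][c] for r in range(4)] for c in range(4))
```

Both functions return the four columns of the input as new rows, so data that was laid out along a row becomes available along a column without further shuffling.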