Spelling suggestions: "subject:"multidata streaming"" "subject:"multipedata streaming""
1 |
The implementation of H.264 algorithm with parallel extended MMX instruction setShen, Cheng-Ying 20 August 2008 (has links)
The H.264 Protocol is an important method for the multimedia transmission and calculation, but it is difficult to work smoothly on the embedded systems because of the low clock in the working environment of the embedded system .Although many new multimedia instruction sets have been developed, the immediate multimedia calculation is still difficult to implement on the embedded system.
So this paper uses the ¡§Multimedia Operation Register¡¨, a SIMD architecture, to implement H.264 algorithm on the embedded system to improve the performance of handling multimedia calculation. Multimedia Operation Register, which performs the parallel execution of the multi-data-streaming, uses the bit slice concept to design operation pair combining bit storage cell and bit computation. According to the characteristic , which is the address having constant distance between more than two data being used saved in the Memory, this paper using the striping addressing mode , which can cooperate with the parallel execution of multi-data-streaming , to load the data having strode addresses from the Memory in one instructions. On the other hand, this paper designs a new instruction set based on the Intel MMX instruction set and the operation feature of multimedia calculation.
When a designer uses single-data-steaming to implement the H.264 Protocol by the multimedia instruction sets, he will use more interactions to do the same thing in every block. Now this paper can use fewer interactions to do the same thing because the Multimedia Operation Register can use the parallel execution of the multi-data-stream to calculate the data in many different blocks to implement H.264 Protocol at the same time. On the other hand, this paper can reallocate the number of the registers to the arithmetic unit which will be used smartly by changing the working mode. This paper also saves much execution time of some actions such as the transpose of the matrix, the data resorting and the SAD (Sum of Absolute Differences) calculation by using new instructions. In order to reduce the times of memory access, this paper uses the method which rotates the data between two registers to let the data been used as possible as it can. So the coding efficiency can be improved explosively by using all the methods which have been introduced.
The conclusion in this paper shows that the parallel execution of the multi-data-streaming will be a very important method to handle multimedia calculation. And this paper advances an innovative architecture to implement the parallel execution of the multi-data- streaming. According to the simulation in 5th chapter, the speedup of handling H.264 Protocol by Multimedia Operation Register is more than four times with MMX instruction set. In the SAD calculation, it even can have ten times advanced then MMX instruction set. At last the efficacy is even better than the latest multimedia instruction set -¡§SSE4¡¨.
|
2 |
Implementation of face detection algorithm with parallel extended-MMX instruction setTzeng, Hua-Yi 20 August 2008 (has links)
Face detection has many applications in technical area. We think about accuracy and regular arrangement of data of face detection. So, we select Recognition algorithms using neural network for implementation. The implementation method can be divided into three parts. One is Modified Census Transform. The other one is computing hypotheses. Other is square frame for mark face. Modified Census Transform is a regularly computing method and regular arrangement of data. Modified Census Transform is compatible using SIMD execution, but other parts is irregular arrangement of data and not easy to parallel execution. This paper uses SIMD processor architecture which develops in our laboratory to implementation of Modified Census Transform and multi-data streaming property. The picture is divided four parts to execute at the same time and changes different mode to execute according to different algorithm then fetch data is smooth and moving data can reduce frequency. Adding a new instruction that uses 16bits data format uses four MMX registers for 4¡Ñ4 transpose of the matrix. The other is loading data and extending signed bit or unsigned bit at the same time. They can accelerate parallel execution in multi-data streaming. We also support multi-data streaming that is not series. It uses striping mode to fetch multi-data which between the same distance then we can achieve to compute multi-data streaming. Besides, we use hypotheses to distinguish different person that we only want find one. We compare two hypotheses. If the difference in hypotheses between two different picture that there is small than 0.3%, they are the same person which in different picture. Finial, we verify the function is correct in UMVP-2500 platform. We compare efficiency with MMX and Xscale and analysis multi-data streaming SIMD architecture which has some benefits. We compare efficiency with MMX. We speed up 373%. We compare efficiency with Xscale. We speed up 345%. This result will show that multi-data streaming SIMD architecture compares speed up with others SIMD architecture. Multi-data streaming SIMD architecture adds a new instruction which is 4¡Ñ4 transpose of the matrix. Because the 4¡Ñ4 transpose of the matrix can change row and column, we have new abstraction. The common computation likes a line, but the new abstraction becomes a phase. MMX and Xscale are not this abstraction.
|
Page generated in 0.0575 seconds