1 |
Computational Study of Stokesian Suspensions using Particle Mesh Ewald Summation
Menon, Udayshankar K, January 2015
We consider fast computational methods for simulating the dynamics of a collection of particles dispersed in an unbounded Stokesian suspension. Stokesian suspensions are of great practical interest in the manufacturing and processing of various commercial products. The most popular dynamic simulation method for this kind of suspension was developed by Brady and Bossis (Brady and Bossis [1988]). This method uses a truncated multipole expansion to represent the fluid traction on particle surfaces. The hydrodynamic interactions in a Stokesian suspension are long-ranged, resulting in strongly coupled motion of all particles. For an N-particle system, this method imposes an O(N^3) computational cost, which limits the number of particles that can be simulated. More recent methods (Sierou and Brady [2001], Saintillan, Darve and Shaqfeh [2005]) have attempted to solve this problem using Particle Mesh Ewald summation techniques, distributing the moments on a grid and using Fast Fourier Transform algorithms, resulting in an O(N log N) computational cost. We review these methods and propose a version that we believe is somewhat superior. In the course of this study, we have identified and corrected errors in previous studies that may be of some importance in determining the bulk properties of suspensions. Finally, we show the utility of our method in determining certain properties of suspensions and compare them to existing analytical results.
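The cost reduction that grid-based FFT methods aim for can be illustrated with a toy problem. The sketch below is not the thesis's method and not a Stokes solver: it assumes a scalar, one-dimensional periodic grid with a made-up kernel, and only shows that a pairwise sum over M grid points (O(M^2)) gives the same result as a product in Fourier space (O(M log M)). The function names `fft`, `grid_sum_direct`, and `grid_sum_fft` are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def grid_sum_direct(kernel, source):
    """O(M^2) periodic sum: u[i] = sum_j kernel[(i - j) mod M] * source[j]."""
    m = len(source)
    return [sum(kernel[(i - j) % m] * source[j] for j in range(m))
            for i in range(m)]

def grid_sum_fft(kernel, source):
    """The same periodic sum in O(M log M) via the convolution theorem."""
    m = len(source)
    product = [a * b for a, b in zip(fft(kernel), fft(source))]
    # The inverse transform above is unnormalized, so divide by M here.
    return [v.real / m for v in fft(product, inverse=True)]

# Toy data (assumed for illustration): a symmetric kernel and sparse sources.
kernel = [1.0, 0.5, 0.25, 0.125, 0.0, 0.125, 0.25, 0.5]
source = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, -1.0, 0.0]
direct = grid_sum_direct(kernel, source)
fast = grid_sum_fft(kernel, source)
print(max(abs(a - b) for a, b in zip(direct, fast)))
```

In a Particle Mesh Ewald scheme, only the smooth, long-range part of the interaction is handled this way; spreading particle moments onto the grid and interpolating back are separate steps that this sketch omits.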
|
2 |
ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor
Kala, S, 12 1900
The rapid advancements in semiconductor technology have led to constant shrinking of transistor sizes as per Moore's Law. Wireless communications is one field which has seen explosive growth, thanks to the cramming of more transistors into a single chip. Design of these systems involves trade-offs between performance, area and power. The Fast Fourier Transform (FFT) is an important component in most wireless communication systems. FFTs are widely used in applications like OFDM transceivers, spectrum sensing in cognitive radio, image processing, radar signal processing, etc. The FFT is the most compute-intensive and time-consuming operation in most of the above applications. It is always a challenge to develop an architecture which gives high throughput while reducing the latency without much area overhead. Next generation wireless systems demand high transmission efficiency, and hence the FFT processor should be capable of doing computations much faster. Architectures based on smaller radices for computing longer FFTs are inefficient. In this thesis, a fully parallel unrolled FFT architecture based on a novel radix-4 engine is proposed, which caters to a wide range of applications. The radix-4 butterfly unit takes all four inputs in parallel and can selectively produce one out of the four outputs. The proposed architecture uses Radix-4^3 and Radix-4^4 algorithms for computation of various FFTs. The Radix-4^4 block can take all 256 inputs in parallel and can use the select control signals to generate one out of the 256 outputs. In existing Cooley-Tukey architectures, the output from each stage has to be reordered before the next stage can start computation. This needs intermediate storage after each stage. In our architecture, each stage can directly generate the reordered outputs and hence reduce these buffers. A solution for the output reordering problem in Radix-4^3 and Radix-4^4 FFT architectures is also discussed in this work.
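The idea of a radix-4 butterfly that takes four inputs and produces one selected output can be sketched in software. This is a behavioural illustration only: the abstract does not describe how the select signals are encoded or how the hardware is structured, so the function name `radix4_butterfly` and the integer `sel` argument are assumptions. It relies on the standard 4-point DFT, in which every twiddle factor is one of 1, -j, -1, j, so no general multiplier is needed inside the butterfly.

```python
import cmath

# Powers of -j: the only rotation factors a 4-point DFT needs.
ROT = (1, -1j, -1, 1j)

def radix4_butterfly(x0, x1, x2, x3, sel):
    """Return output `sel` (0..3) of a 4-point DFT of the four inputs.

    X[sel] = sum over n of x[n] * (-j)**(n * sel)
    `sel` stands in for the select control signals (hypothetical encoding).
    """
    return (x0
            + ROT[sel % 4] * x1
            + ROT[(2 * sel) % 4] * x2
            + ROT[(3 * sel) % 4] * x3)

def dft4_reference(x, k):
    """Direct 4-point DFT definition, used only to check the butterfly."""
    return sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))

inputs = (1, 2, 3, 4)
for sel in range(4):
    got = radix4_butterfly(*inputs, sel)
    assert abs(got - dft4_reference(inputs, sel)) < 1e-12
```

Choosing which output to compute, rather than always producing all four in natural order, is what lets a stage emit values in whatever order the next stage consumes them.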
Although the hardware complexity in terms of adders and multipliers is increased in our architecture, a significant reduction in intermediate memory requirement is achieved. FFTs of varying sizes, from 64 point to 64K point, have been implemented in ASIC using UMC 130 nm CMOS technology. The data representation used in this work is fixed-point format, and the selected word length is 16 bits to get maximum Signal to Quantization Noise Ratio (SQNR). The architecture has been found to be more suitable for computing FFTs of large sizes. For 4096 point and 64K point FFTs, this design gives comparable throughput with considerable reduction in area and latency when compared to state-of-the-art implementations. The 64K point FFT architecture resulted in a throughput of 1332 mega samples per second with an area of 171.78 mm^2 and total power of 10.7 W at 333 MHz.
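The reported 64K-point figures can be related to each other with simple arithmetic. The sketch below uses only the numbers quoted above (1332 mega samples per second, 333 MHz, 65536 points); the helper names are hypothetical, and the second figure assumes the quoted throughput is a sustained rate. It is not a latency value taken from the thesis.

```python
def samples_per_cycle(throughput_msps, clock_mhz):
    """Samples processed per clock cycle at the given rate and clock."""
    return throughput_msps / clock_mhz

def stream_time_us(n_points, throughput_msps):
    """Microseconds needed to pass n_points samples at the given rate."""
    # n samples / (throughput * 1e6 samples per second) = n / throughput microseconds
    return n_points / throughput_msps

print(samples_per_cycle(1332, 333))   # samples per clock cycle
print(stream_time_us(65536, 1332))    # microseconds for one 64K-point frame
```

The ratio works out to four samples per clock cycle, and one 65536-sample frame takes about 49.2 microseconds to stream at that rate.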
|