For a change of pace, here is a "What I Did This Summer" style post. This summer I worked at "MIT Haystack":http://www.haystack.mit.edu/ researching an update to an existing Digital Signal Processing (DSP) system. The topic was "GPU Based Polyphase Filter Banks for VLBI", which is a fairly terse title for most readers. In practice, I was responsible for designing a way to process radio-frequency time-domain data into the strength of each frequency channel as a function of time. All of this ran on an Nvidia graphics card. The project was funded by the National Science Foundation (NSF) through its Research Experiences for Undergraduates (REU) program.

<!-- :truncate: -->
h2. Compute Unified Device Architecture (CUDA)

CUDA provides a framework for developing code that runs on Nvidia graphics cards. This framework, among other things, has made me a fairly big Nvidia fan, as in my experience they write good-quality software. Code that executes on the graphics card is written as C++ routines called "kernels". Many threads run the same block of code in parallel, and each thread sees a slightly different environment (its indices), which determines its share of the work. This seems simple enough at first, and it may be, but the devil is in the details. A large part of this project went into studying the memory models CUDA presents, in order to minimize the overhead of memory accesses and to keep as many instructions as possible flowing down the pipeline.
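
To make the kernel idea concrete, here is a minimal sketch in plain Python rather than CUDA. It only emulates how every thread runs the same routine but derives a different global index from its block and thread numbers. The names @launch@ and @scale_kernel@ are hypothetical and are not part of the project; the index formula mirrors the usual CUDA idiom @blockIdx.x * blockDim.x + threadIdx.x@. Real hardware runs these threads concurrently, whereas this loop is sequential.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch: call `kernel` once per (block, thread) pair.

    Assumption: a sequential stand-in for what a GPU would run in parallel.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, out, factor):
    """Same code for every thread; only the indices differ."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(data):                       # guard: the grid may overshoot the data
        out[i] = factor * data[i]


if __name__ == "__main__":
    data = list(range(10))
    out = [0.0] * len(data)
    # 3 blocks of 4 threads cover 12 indices; the guard skips the last two.
    launch(scale_kernel, 3, 4, data, out, 2.0)
    print(out)
```

The bounds check matters because the number of launched threads is usually rounded up to a whole number of blocks, so some threads have no element to work on.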
In the end, the project was instruction-limited on one of the Tesla graphics cards. The total throughput was just shy of 900 megasamples per second. As this was the first time I had worked with CUDA, this was a success in my book.
Other REU projects implemented at the same time observed ~50x speedups over non-threaded CPU processing, and my project appears to be in the same ballpark, based on figures cited for typical non-threaded PFB processing.
h2. What is a PFB?

A Polyphase Filter Bank (PFB) is an FFT channelizing system that uses a multirate FIR filter to improve channel separation. As for the exact mathematics, I am still not entirely sure how the extra filtering improves channel separation over a basic FFT channelizer, but it does, and there are some papers on IEEE that deal with the specifics.
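
As a rough illustration of the structure described above, here is a minimal pure-Python sketch of one critically sampled PFB output frame: weight a block of @n_chan * n_taps@ samples with a prototype low-pass filter, fold the block into @n_chan@ sums, and transform the result. This is not the project's GPU code. The function names, the Hamming-windowed sinc prototype, and the sizes in the example are all assumptions chosen for illustration, and a naive DFT stands in for the FFT. One common way to look at the benefit is that the filter spans several FFT lengths' worth of samples, so it can have a flatter passband and steeper edges than the implicit rectangular window of a plain FFT.

```python
import cmath
import math


def prototype_filter(n_chan, n_taps):
    """Hamming-windowed sinc low-pass with n_chan * n_taps coefficients (assumed design)."""
    length = n_chan * n_taps
    center = (length - 1) / 2.0
    coeffs = []
    for n in range(length):
        t = (n - center) / n_chan
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        coeffs.append(sinc * hamming)
    return coeffs


def dft(x):
    """Naive O(N^2) DFT, standing in for the FFT stage."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def pfb_frame(samples, coeffs, n_chan):
    """Compute one frame of n_chan channel outputs from len(coeffs) input samples."""
    if len(samples) != len(coeffs) or len(coeffs) % n_chan:
        raise ValueError("need n_chan * n_taps samples and coefficients")
    n_taps = len(coeffs) // n_chan
    weighted = [s * c for s, c in zip(samples, coeffs)]
    # Polyphase fold: branch m sums every n_chan-th weighted sample.
    folded = [sum(weighted[p * n_chan + m] for p in range(n_taps))
              for m in range(n_chan)]
    return dft(folded)


if __name__ == "__main__":
    n_chan, n_taps = 8, 4
    coeffs = prototype_filter(n_chan, n_taps)
    # Complex tone centred on channel 3.
    tone = [cmath.exp(2j * cmath.pi * 3 * n / n_chan) for n in range(n_chan * n_taps)]
    for k, value in enumerate(pfb_frame(tone, coeffs, n_chan)):
        print(k, round(abs(value), 4))
```

A streaming channelizer would slide the input forward by @n_chan@ samples and repeat this for each new frame, which is where the "strength with respect to time" output comes from.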
This has piqued my interest in FFTs and their implementation. In particular, I now hope to write a moderately efficient FFT implementation for the simple power-of-2 cases. For these cases it seems to be a fairly simple problem, both computationally and implementation-wise, but the best way to find out is to implement it. To make use of available resources, I will likely read through the FFTW sources.
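
For reference, the power-of-2 case can be sketched with the textbook recursive radix-2 Cooley-Tukey decomposition. This is a minimal, unoptimized sketch, not the planned implementation and not how FFTW does it; the function name @fft_radix2@ is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: lower half
        out[k + n // 2] = even[k] - twiddle  # butterfly: upper half
    return out


if __name__ == "__main__":
    print(fft_radix2([1, 0, 0, 0, 0, 0, 0, 0]))
```

Each level of recursion halves the problem and does O(N) butterfly work, giving the familiar O(N log N) cost. A tuned implementation would typically replace the recursion and list slicing with an in-place iterative form and precomputed twiddle factors.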
For more information along these lines, you could read:

* "A Comparison of FFT and Polyphase Channelizers"
* "A Polyphase Filter For GPUs And Multi-Core Processors"
* "Multirate digital filters, filter banks, polyphase networks, and applications: a tutorial - Proceedings of the IEEE"

h2. EDIT: Results

* "Final Report":http://fundamental-code.com/log/mit-research/findings.pdf
* "Final Presentation":http://fundamental-code.com/log/mit-research/pres.pdf
* "Git Repository":http://fundamental-code.com/gitweb/?p=gpfb.git;a=summary