Friday, April 11, 2008

GPGPU - next gen performance

General-purpose computing on graphics processing units (GPUs) is the way of the future. Basically, the main reason AMD bought ATI was to get at this technology. Intel is developing its next-generation CPU architecture in combination with a GPU. The speed improvements are dramatic: anywhere from 2x to 30x gains in calculation performance.

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two biological sequences, and as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. This paper by Svetlin Manavski and Giorgio Valle describes SmithWaterman-CUDA, an open-source project to perform fast sequence alignment on the GPU. Although the software performs the optimal Smith-Waterman alignment, it is faster than heuristic approaches like FASTA and BLAST. Tests on protein data banks show speedups of up to 30x relative to reference CPU implementations. (Svetlin A. Manavski, Giorgio Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(Suppl 2):S10 (26 March 2008))
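To make the "product of the lengths" cost concrete, here is a minimal CPU sketch of the Smith-Waterman scoring recurrence. It is not the SmithWaterman-CUDA code. The function name `smith_waterman` and the simple match/mismatch/linear-gap scores are assumptions for illustration only; real protein searches typically use a substitution matrix such as BLOSUM62 with affine gap penalties. The doubly nested loop visits every one of the len(a) x len(b) cells, which is exactly the cost described above.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b.

    Hypothetical scoring scheme: fixed match/mismatch scores and a linear
    gap penalty (not the scoring used in the paper).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0 (empty prefix).
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):          # len(a) iterations ...
        for j in range(1, cols):      # ... times len(b) iterations
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2)` returns 13, from the local alignment GTT-AC / GTTGAC. Only the score is computed here; recovering the alignment itself needs a traceback from the best cell.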

GPGPU

Finally, some food for thought: The GPU is becoming so powerful that companies like nVidia are pitching them as GPGPUs and selling HPC (high-performance computing) products that provide massive amounts of power (128 processors, massively parallel) in a little box. So, imagine that we took this same concept a step further and implemented an entire library outside of WPF that allowed you to leverage those kinds of platforms for general programming. Just like DLINQ, where the expression is translated to SQL and remoted over to your DB server for processing, we could translate and remote over to one of these boxes and execute it in a nanosecond.

http://blog.hackedbrain.com/archive/2008/02/19/6141.aspx

Where technology like this really shines is in programs that have been optimized for parallel computing.
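As a small illustration of what "optimized for parallel computing" can mean, here is a sketch of one common restructuring of the Smith-Waterman recurrence described earlier: filling the score matrix by anti-diagonals (a "wavefront"). Every cell on anti-diagonal d = i + j depends only on diagonals d-1 and d-2, so all cells on one diagonal can be computed independently. This is a general technique, not necessarily how Manavski and Valle parallelize their implementation. The function name `sw_wavefront` and the simple linear-gap scoring are assumptions, and the inner loop runs sequentially here; on a GPU each cell of a diagonal could be handled by its own thread.

```python
def sw_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman best local score, filled one anti-diagonal at a time.

    Hypothetical scoring: fixed match/mismatch scores and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Anti-diagonal d contains the cells (i, j) with i + j == d, 1 <= i, j.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))  # keep j = d - i <= len(b)
        i_hi = min(rows - 1, d - 1)    # keep j = d - i >= 1
        # These cells only read diagonals d-1 and d-2, so the computations
        # below are mutually independent (parallelizable in principle).
        updates = []
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            value = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            updates.append((i, j, value))
        # Write the whole diagonal back only after it has been computed.
        for i, j, value in updates:
            H[i][j] = value
            best = max(best, value)
    return best
```

It gives the same scores as the usual row-by-row fill; for instance `sw_wavefront("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2)` also returns 13. The catch is that the available parallelism is small at the corners of the matrix and largest along the middle diagonals.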

PC Perspective recently interviewed John Carmack, one of the two Johns (along with John Romero) responsible for creating Wolfenstein and Doom. If any one person is likely to take this technology and demonstrate its full potential, it is probably him.

http://www.pcper.com/article.php?aid=532

That is my big takeaway message for a lot of people about the upcoming generation of general purpose computation on GPUs; a lot of people don’t seem to really appreciate how the vertex fragment rasterization approach to computer graphics has been unquestionably the most successful multi-processing solution ever. If you look back over 40 years of research and what people have done on trying to use multiple processors to solve problems, the fact that we can do so much so easily with the vertex fragment model, it’s a real testament to its value. A lot of people just think “oh of course I want more flexibility I’d love to have multiple CPUs doing all these different things” and there’s a lot of people that don’t really appreciate what the suffering is going to be like as we move through that; and that’s certainly going on right now as software tries to move things over, and it’s not “oh just thread your application”. Anyone that says that is basically an idiot, not appreciating the problems. There are depths of subtlety to all of this where it’s been an ivory tower research project since the very beginning and it’s by no means solved.