Learning CUDA and the fundamentals of parallel programming

After teaching myself CUDA (luckily for me, I started on the more recent SDKs), I wanted to share my experience of some of the most useful resources available, to help anyone interested in accelerating their own algorithms be as productive as possible.

In my opinion the fastest way to learn GPU programming with CUDA, and to find out whether it is useful to you, is to start writing CUDA kernels as soon as possible. Unfortunately, the prerequisite of an NVIDIA CUDA-capable GPU (at least since SDK v3.0) can put up the first barrier to success. Even if a compatible GPU is present, it may be old and not support all the latest features; it may be slow, since CUDA kernels run on a low-end GPU are unlikely to execute any faster than the equivalent code on a low-end CPU; and you still need to download and install the CUDA toolkit (currently 1.2 GB on Windows), all just to get started writing your first kernel.

This is the main reason why I recommend signing up for the free Udacity course, Intro to Parallel Programming. The title is a little misleading: whilst the material concerns the fundamentals of parallel programming, I think a better title would have been Intro to Parallel Programming with CUDA C/C++, as the material and problem sets are all taught from the perspective of the CUDA programming model, with the course assignments all given in C/C++. Even if you are looking to use CUDA Fortran, the material presented, except the assignments, is language-independent.

Intro to Parallel Programming is well put together, gently introducing some of the fundamental parallel programming patterns and motivating each pattern with examples and a real-world image processing assignment. That is why I really enjoyed the course. However, the best part, and the main reason for the recommendation, is that all the assignments can be completed in the cloud using the Udacity web interface, which compiles, runs, and times the execution of your code on high-end GPUs. This means you can write and execute CUDA kernels from any machine with a web browser! You can also download the complete problem sets to run on your local machine, which I would definitely recommend if the required hardware is available.

Although Intro to Parallel Programming is mostly self-contained, with very little need for external material, there are some excellent online resources to supplement it. One of these is Modern GPU, which is exactly as described:

“…code and commentary intended to promote new and productive ways of thinking about GPU computing.
This project is a library, an algorithms book, a tutorial, and a best-practices guide.”

Essentially, Modern GPU includes a lot of material covering parallel programming patterns and hence is very relevant to the Intro to Parallel Programming course. In addition there is, of course, the official CUDA programming guide, but to get started with CUDA and begin to think in parallel, the material presented by Udacity and the Modern GPU website should be sufficient.

If you prefer offline resources, there are many good books on CUDA, all of which cover the same material to a greater or lesser extent. Of the few I have personally read, some stand out as particularly useful.

Of these, my current favorite is Professional CUDA C Programming. I really like the approach of this book, which starts with a naïve implementation of the reduction algorithm and optimizes it in each chapter using the new feature introduced therein. Whilst this is well-trodden ground (the same example can be seen here and in the Udacity course itself), the book presents a good narrative, introducing the CUDA profiler nvprof early to justify and examine the changes proposed. Altogether this makes it a gentle introduction to something quite alien if you are used to serial programming on a CPU. Additionally, this is one of the newer books on CUDA, with more up-to-date chapters on CUDA streams and multi-GPU programming.
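For readers without a GPU to hand, the shape of that naïve reduction can still be explored on the host. The following is a minimal sketch in plain C++, not the code from the book or the Udacity course: it emulates the interleaved-addressing tree reduction that a single CUDA thread block would typically perform in shared memory. The inner loop stands in for the threads of the block, and the end of each outer pass stands in for a `__syncthreads()` barrier. The function name `block_reduce_sum` and the power-of-two size restriction are my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a naive tree reduction (sum) over one "block".
// Assumption: data.size() is a non-zero power of two, as is typical for a
// block-sized shared-memory buffer. The function name is hypothetical.
float block_reduce_sum(std::vector<float> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Each pass doubles the stride, so there are log2(n) passes in total.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        // On a GPU, every value of tid would be a separate thread.
        for (std::size_t tid = 0; tid < n; ++tid) {
            // Only threads whose index is a multiple of 2*stride do work;
            // this idle-thread pattern is what later optimizations remove.
            if (tid % (2 * stride) == 0) {
                data[tid] += data[tid + stride];
            }
        }
        // A real kernel would need a __syncthreads() barrier at this point.
    }
    return data[0];  // the block's total ends up in element 0
}
```

Calling `block_reduce_sum({1, 2, 3, 4, 5, 6, 7, 8})` returns 36 after three passes. The modulo test leaves most "threads" idle in every pass, which is the kind of inefficiency that the successive optimizations of the reduction example set out to address.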

If I had to recommend a second book it would be The CUDA Handbook. This is slightly older, and some may find it a little too much like a manual or the official programming guide. However, it is perfect as a reference text, discussing topics such as the CUDA runtime and driver API, nvcc, PTX and microcode, whilst also including a great chapter, with diagrams, on the implementation of parallel scan.
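To give a flavour of what parallel scan involves, here is a minimal host-only C++ sketch of an inclusive scan (prefix sum). It is not taken from The CUDA Handbook; I have assumed the simple Hillis-Steele formulation, in which every pass adds the element `offset` positions to the left and `offset` doubles each time. The inner loop again stands in for the GPU threads, and the two buffers mimic the double buffering a kernel would need to avoid reading values that another thread has already overwritten. The function name `inclusive_scan_emulated` is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side emulation of a Hillis-Steele inclusive scan (prefix sum).
// Assumption: this is an illustrative formulation, not the one used by
// any particular book or library. The function name is hypothetical.
std::vector<int> inclusive_scan_emulated(const std::vector<int>& input) {
    const std::size_t n = input.size();
    std::vector<int> in = input;  // values read during the current pass
    std::vector<int> out(n);      // values written during the current pass

    // log2(n) passes (rounded up); the offset doubles on each pass.
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        // On a GPU, every value of tid would be a separate thread.
        for (std::size_t tid = 0; tid < n; ++tid) {
            out[tid] = (tid >= offset) ? in[tid] + in[tid - offset]
                                       : in[tid];
        }
        // Swap buffers; a real kernel would also synchronize threads here.
        std::swap(in, out);
    }
    return in;
}
```

For the input `{1, 2, 3, 4, 5}` this yields `{1, 3, 6, 10, 15}`. This formulation performs more additions overall than a serial loop, and that trade-off between step count and total work is a large part of why scan rewards careful study.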

There is also a 3rd edition of Programming Massively Parallel Processors, which I have not yet read. Given the solid content of the 2nd edition, it may replace Professional CUDA C Programming as my favorite.

5 thoughts on “Learning CUDA and the fundamentals of parallel programming”

  1. Hi James,
    Thanks for sharing your experience. I am also learning CUDA from the Udacity course and the Professional CUDA C Programming book.
    Please, the link for Modern GPU doesn’t seem to be working.

    1. Hi, I have corrected the link to Modern GPU. If you are following the Udacity course you will want to stick with the Modern GPU 1.0 notes. There are a lot of links from there that will send you to Modern GPU 2.0.

  2. Hi, I’ve been reading your blog with interest and really appreciate the valuable tips you have. Can I ask, what did you make of the 3rd Edition of Programming Massively Parallel Processors? Thank you.

    1. Hi, honestly it has been a while since I read that book. All I can say is that at the time I really liked that it had sections on parallel patterns and application case studies.
