CS194: Engineering Parallel Software
Kurt Keutzer and Tim Mattson
Fall 2013
From cell phones to cloud computing, parallel processors are the computing platform of the future. This course will enable students to design, implement, optimize, and verify programs that run on parallel processors. Our approach reflects our view that a well-designed software architecture is a key to designing parallel software, and that design patterns and a pattern language are a key to software architecture. The course will use this pattern language as the basis for describing how to design, implement, verify, and optimize parallel programs. Following this approach, we will introduce each of the major patterns used in developing the high-level architecture of a program. These eight structural and thirteen computational patterns may be found at: http://parlab.eecs.berkeley.edu/wiki/patterns/patterns.
We also recognize that writing efficient parallel programs requires insight into the hardware architecture of contemporary parallel processors, as well as an understanding of how to write efficient code in general. As a result, a significant amount of time in the course will be spent on these topics as well.
Other lectures and laboratories of the course will focus on implementation using contemporary parallel programming languages, verification of parallel software using invariants and testing, and performance tuning and optimization.
Course Work and Grading
The course consists of twice-weekly lectures and once-weekly lab sessions. During the first two-thirds of the course, there will be a series of programming assignments and two take-home examinations.
Course Projects
The final third of the course will be an open-ended course project. Quad-core cell phones will be among the acceptable platforms. Students will create their own projects in project teams of 4-6 students.
Course Staff
Professor: Kurt Keutzer
Guest Lecturer: Tim Mattson, Intel
TAs: Patrick Li
Recommended Course Textbook
Patterns for Parallel Programming, T. Mattson, B. Sanders, B. Massingill, Addison-Wesley, 2005. (This text is being revised with this course in mind.)
Week | Date | What | Topic |
Week 1 | Tues 8/27 | No class | |
Week 1 | Wed 8/28 | No class | |
Week 1 | Thurs 8/29 | Lecture 1 | First Lecture: Intro, Background, Course Objectives and Course Projects Video Games –Keutzer |
Week 1 | Fri 8/30 | No class | |
Week 2 | Tues 9/3 | Lecture 2 | A programmer’s introduction to parallel computing: Amdahl’s law, Concurrency vs. Parallelism, and the jargon of parallel computing. Getting started with OpenMP and Pthreads. –Mattson |
Week 2 | Wed 9/4 | Discussion 1 | Intro to the Lab Environment. Assign Intro_1: Matrix multiplication with OpenMP and pthreads |
Week 2 | Thurs 9/5 | Lecture 3 | Parallel programming on shared memory computers: complete the introduction to OpenMP and pthreads. Along the way address granularity, parallel overhead, load balancing, and weak vs. strong scaling. –Mattson |
Week 2 | Fri 9/6 | No class | |
Week 3 | Tues 9/10 | Lecture 4 | Shared Memory Concurrency Issues: Races, Livelock, Deadlock, Dining Philosophers –Mattson |
Week 3 | Wed 9/11 | Discussion 2 | C++ for Java/C Programmers; Working with OpenMP and Pthreads. |
Week 3 | Thurs 9/12 | Lecture 5 | Software Architecture Overview: Overview of Computational and Structural Patterns –Keutzer |
Week 3 | Fri 9/13 | Intro_1 due | |
Week 4 | Tues 9/17 | Lecture 6 | Sequential Processor Performance: Notions of performance: Insufficiency of Big-O, Matrix-Multiply Example; Pipelining, Superscalar, etc.; Compiler Optimizations; Processor “Speed of Light” –Keutzer/Allen |
Week 4 | Wed 9/18 | Discussion 3 | pthreads examples. Assign Intro_2 |
Week 4 | Thurs 9/19 | Lecture 7 | Memory System Performance: Caches, Cache Hierarchies, benchmarking; Optimizing Matrix Multiplication –Keutzer |
Week 4 | Fri 9/20 | | |
Week 5 | Tues 9/24 | Lecture 8 | Optimizing Matrix Multiply, Introduction to Parallel Processor Architectures: Multi-Core, Cache Coherence, Memory Consistency; SIMD / SIMT; Vectors; NUMA –Mattson |
Week 5 | Wed 9/25 | Discussion 4 | Interactive session exploring performance issues in pthreads and OpenMP. They’ve been thinking about cache coherence … let’s use the discussion section to bring up a discussion of vectorization. |
Week 5 | Thurs 9/26 | Lecture 9 | Data Parallelism –Mattson |
Week 5 | Fri 9/27 | Intro_2 due | |
Week 6 | Tues 10/1 | Lecture 10 | Introduction to CUDA –Li |
Week 6 | Wed 10/2 | Discussion 5 | Assign MP1: Discuss the CUDA and OpenCL environments in the lab. Help students, where appropriate, install them on their own laptops. Project teams finalized. |
Week 6 | Thurs 10/3 | Lecture 11 | CUDA and OpenCL continued |
Week 7 | Tues 10/8 | Lecture 12 | The Roofline Model –Keutzer |
Week 7 | Wed 10/9 | Discussion 6 | CUDA and data parallel programming. |
Week 7 | Thurs 10/10 | Lecture 13 | Distributed Memory Systems, Supercomputing, and MPI –Mattson |
Week 7 | Sun | MP1 due | |
Week 8 | Tues 10/15 | Midterm review | Announce final project details and review for midterm |
Week 8 | Wed 10/16 | Discussion 7 | Midterm review/Project proposals due |
Week 8 | Thurs 10/17 | MIDTERM | |
Week 9 | Tues 10/22 | Lecture 14 | Design patterns, pattern languages, PLPP overview –not Kurt Keutzer |
Week 9 | Wed 10/23 | Discussion 8 | Discuss some exemplar projects from the past. |
Week 9 | Thurs 10/24 | Lecture 15 | PLPP algorithm structure and supporting structures –Keutzer |
Week 10 | Tues 10/29 | Lecture 16 | Structural patterns and software architecture –Keutzer |
Week 10 | Wed 10/30 | Discussion 9 | Project meetings: show up with evidence of work! Q and A session on GPGPU programming |
Week 10 | Thurs 10/31 | Lecture 17 | Graph algorithms, dynamic programming, and speech recognition –Keutzer |
Week 10 | Fri 11/1 | MP2 due | |
Week 11 | Tues 11/5 | Lecture 18 | Speech, part 2 –Keutzer, Li |
Week 11 | Wed 11/6 | Discussion 10 | Project meetings: show up with evidence of work! |
Week 11 | Thurs 11/7 | Lecture 19 | Sparse linear algebra and image contour detection –Li |
Week 12 | Tues 11/12 | Lecture 20 | Principal component analysis and 3D reconstruction –Li |
Week 12 | Wed 11/13 | Discussion 11 | Project meetings: show up with evidence of work! |
Week 12 | Thurs 11/14 | Lecture 21 | Object recognition –Li |
Week 13 | Tues 11/19 | Lecture 22 | Optimization patterns –Li |
Week 13 | Wed 11/20 | Discussion 12 | Project meetings: show up with evidence of work! |
Week 13 | Thurs 11/21 | Lecture 23 | Future of parallel computing –Keutzer |
Week 14 | Tues 11/26 | Lecture 24 | Use class time to talk about projects |
Week 15 | Tues 12/3 | Lecture 25 | Project presentations |
Week 15 | Wed 12/4 | Discussion 13 | Final exam review |
Week 15 | Thurs 12/5 | Lecture 26 | Project presentations |
Final Exam: |