Kurt Keutzer, EECS, University of California, Berkeley
Tim Mattson, Intel Research
Fall 2014
As the basic computing device in everything from individual cell phones to racks of hardware in cloud computing, parallel processors are emerging as the pervasive computing platform of our time. This course will enable advanced undergraduate students to design, implement, optimize, and verify programs that run on present generations of parallel processors.
Four principal themes are pursued in this course:
- Software Engineering
- Performance Programming
- Programming in Parallel Languages
- Course Project
Software Engineering and Software Architecture
Our approach to this course reflects our view that a well-designed software architecture is key to designing parallel software, and that design patterns and a pattern language are key to software architecture. The course will use Our Pattern Language as the basis for describing how to design, implement, verify, and optimize parallel programs. Following this approach, we will introduce each of the major patterns used in developing the high-level architecture of a program. Descriptions of these ten structural and thirteen computational patterns, together with other readings, may be found at https://patterns.eecs.berkeley.edu/.
Performance Programming
Writing efficient parallel programs requires insight into the hardware architecture of contemporary parallel processors as well as an understanding of how to write efficient code in general. As a result, a significant amount of time in the course will be spent looking “under the hood” of contemporary sequential processors and multiprocessors and identifying the key architectural details, such as non-uniform memory access (NUMA), that are necessary to write high-performance code.
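As one example of how such a detail surfaces in code, the sketch below relies on the usual first-touch page placement on Linux: pages of an array are physically placed on the NUMA node of the thread that first writes them, so initializing data with the same thread layout that later computes on it tends to improve memory bandwidth. The array size and the dot-product loop are illustrative assumptions, not course code.

    #include <cstdio>

    int main() {
        const long n = 1L << 25;       // ~256 MB of doubles per array (illustrative)
        double *a = new double[n];     // allocation alone does not place pages
        double *b = new double[n];

        // First touch: each thread initializes the chunk it will later work on,
        // so those pages land on that thread's local NUMA node.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

        // The same static schedule reuses the placement established above.
        double sum = 0.0;
        #pragma omp parallel for schedule(static) reduction(+ : sum)
        for (long i = 0; i < n; ++i) sum += a[i] * b[i];

        std::printf("dot = %.1f\n", sum);
        delete[] a;
        delete[] b;
        return 0;
    }

Built with g++ -O2 -fopenmp, the second loop typically runs noticeably faster than a version in which a single thread initializes both arrays.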
Programming in Parallel Languages
Other lectures and laboratories will focus on implementation in contemporary parallel programming languages, verification of parallel software using invariants and testing, and performance tuning and optimization. The languages covered typically include OpenMP, MPI, and OpenCL.
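As a taste of one of these languages, here is a minimal MPI sketch (assuming an MPI implementation such as Open MPI or MPICH is installed); it is illustrative only and not taken from the course labs. Each rank sums part of a range and the partial sums are combined with MPI_Reduce.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Each rank sums its own block of [0, n); the blocks partition the range.
        const long n = 1000000;
        const long begin = rank * n / size;
        const long end = (rank + 1) * n / size;
        double local = 0.0;
        for (long i = begin; i < end; ++i)
            local += static_cast<double>(i);

        // Combine the partial sums on rank 0.
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }

Built with mpicxx and launched with, for example, mpirun -np 4, the four processes each compute a quarter of the range and rank 0 prints the combined result.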
Course Projects
The final third of the course is devoted to an open-ended course project, which allows students to demonstrate their mastery of the concepts described above. Students will define their own projects and work in teams of 4-6 students.
Prerequisites
Students should have taken the following courses, or their equivalents:
- Basic programming course using Java, C or C++
- Undergraduate course on computer organization
- Linear algebra
It is recommended that students have taken:
- At least one upper division course that includes significant programming assignments (e.g. Compilers, Operating Systems, or Software Engineering)
Course Work and Grading
The course consists of twice-weekly lectures and a weekly lab session. During the first two-thirds of the course there will be a series of programming assignments, and there will be two examinations.
Course Staff
Professor: Kurt Keutzer
Guest Lecturer: Tim Mattson, Intel
TAs: Paden Tomasello, Peter Jin
Recommended Course Textbook
Patterns for Parallel Programming, T. Mattson, B. Sanders, and B. Massingill, Addison-Wesley, 2005.
Course Assignments Will Be Selected from the Following List
- Computer Architecture – Measure L1/L2/L3 bandwidth and latency on our lab machines. Also, investigate measured instruction-level parallelism (ILP) for a handful of different SGEMM implementations: performance in MFlop/s increases, but ILP drops. Also serves as a warm-up/refresher for the small subset of C++ we use for the lab assignments. Follows the material from lecture 3 (sequential processor performance).
- Parallel Matrix Multiply (DGEMM) – Write a naive parallel DGEMM using OpenMP for loops, OpenMP tasks, and Pthreads; a sketch of the OpenMP version appears after this list. Serves as a simple warm-up for the basic threading libraries. An advanced question asks how GCC converts code with OpenMP pragmas into parallel code. Follows the material from lectures 2 and 4 (parallel programming on shared-memory computers).
- Optimize Matrix Multiply (DGEMM) – Optimize the naive parallel matrix multiply for both locality and data parallelism (using SSE2); a cache-blocking sketch appears after this list. Students become familiar with SSE2 intrinsics, which they may want to use in their final projects. Follows the material from lectures 6 and 8 (memory-subsystem performance).
- Introduction to OpenCL – Students write both VVADD and SGEMM kernels in OpenCL. Follows lectures 9 and 10 (data parallelism and CUDA).
- OpenCL + OpenGL – Students perform a handful of simple graphics operations on an image. Follows lectures 9 and 10 (data parallelism and CUDA).
- Advanced OpenCL – Students write a reduction routine using the ideas presented in class, and array compaction using scan. Follows lectures 9 and 10 (data parallelism and CUDA).
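For concreteness, the following is a minimal sketch of the kind of naive parallel DGEMM the second assignment asks for, with the outer loop parallelized by an OpenMP "parallel for". The row-major layout and the function name dgemm_naive are illustrative assumptions, not part of the assignment handout.

    #include <vector>

    // C = A * B for square n x n matrices stored row-major in length n*n vectors.
    // Each OpenMP thread owns a disjoint set of rows of C, so no synchronization
    // is needed.
    void dgemm_naive(int n, const std::vector<double> &A,
                     const std::vector<double> &B, std::vector<double> &C) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
        }
    }

Compiled with g++ -fopenmp, the iterations of the outer loop are divided among threads; because each entry of C is written by exactly one thread, the loop needs no locks or atomics.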
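And here is a minimal sketch of the locality side of the third assignment: cache blocking, with vectorization (e.g., SSE2 intrinsics) left to be layered on top. The block size BS and the function name are illustrative assumptions; a real submission would tune BS for the lab machines.

    #include <algorithm>
    #include <vector>

    // C += A * B for square n x n row-major matrices, computed block by block so
    // the working set of the inner loops stays in cache. Assumes C starts zeroed.
    // Each (ii, jj) block of C is owned by one thread, so there are no write
    // conflicts across threads.
    void dgemm_blocked(int n, const std::vector<double> &A,
                       const std::vector<double> &B, std::vector<double> &C) {
        const int BS = 64;  // illustrative block edge; tune for the lab machines
        #pragma omp parallel for collapse(2)
        for (int ii = 0; ii < n; ii += BS)
            for (int jj = 0; jj < n; jj += BS)
                for (int kk = 0; kk < n; kk += BS)
                    for (int i = ii; i < std::min(ii + BS, n); ++i)
                        for (int k = kk; k < std::min(kk + BS, n); ++k) {
                            const double aik = A[i * n + k];
                            for (int j = jj; j < std::min(jj + BS, n); ++j)
                                C[i * n + j] += aik * B[k * n + j];
                        }
    }

The design choice here is to keep three BS x BS blocks resident in cache while the innermost loops run, trading a little loop overhead for far fewer misses to main memory.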
Syllabus, Fall 2014: Classes are at 2:00–3:30 PM PDT/PST