Parallel and Distributed Programming
Kenjiro Taura
What’s new (newest first)
- (Posted: Jan. 20, 2025) Plan for Jan. 20
- (Posted: Jan. 06, 2025) Plan for Jan. 06
- (Posted: Jan. 06, 2025) Details about how to get credit have been released
- (Posted: Dec. 23, 2024) Plan for Dec. 23
- (Posted: Dec. 16, 2024) Plan for Dec. 16
- (Posted: Dec. 9, 2024) Plan for Dec. 9
- (Posted: Dec. 2, 2024) Plan for Dec. 2
- (Posted: Nov. 25, 2024) Plan for Nov. 25
- SIMD high level approach (recap)
- pd04_simd_high_level
- SIMD low level approach
- pd05_simd_low_level
- pd30_mlp
- (Posted: Nov. 16, 2024)
- I released a new Jupyter notebook, pd30_mlp
- It will be the next assignment you submit
- See the notebook, as well as this GitHub page, for more details and further updates
- As the due date is still undecided, there is no entry in UTOL yet; I will create one once the due date is fixed
- (Posted: Nov. 13, 2024) Plan for Nov. 13
- pd03_omp_gpu
- SIMD high level approach
- pd04_simd_high_level
- SIMD low level approach
- pd05_simd_low_level
- (Posted: Nov. 11, 2024) Plan for Nov. 11
- (Posted: Nov. 11, 2024)
- We’ll have another class this week, on Wednesday, Nov. 13th
- There is no class next Monday, Nov. 18th; I’ll deliver on-demand materials instead
- (Posted: Nov. 02, 2024)
- (Posted: Oct. 27, 2024) Plan for Oct. 28
- (Posted: Oct. 20, 2024) Plan for Oct. 21
- (Posted: Oct. 06, 2024) Plan for Oct. 07
- (Posted: Sep. 28, 2024) Site up
Slides
- Introduction
- OpenMP
- CUDA
- OpenMP for GPU
- SIMD
- How to get nearly peak FLOPS (with CPU)
- What You Must Know about Memory, Caches, and Shared Memory
- Analyzing Data Access of Algorithms and How to Make Them Cache-Friendly
- Divide and Conquer
- Neural Network Basics
- Understanding Task Scheduling Algorithms
Languages
- All written materials (slides, home pages, etc.) will be in
English
- Lectures will be in English
Hands-on programming exercise
- You will have access to the latest CPU and GPU machines and gain hands-on experience in parallel programming
- This year, I emphasize a programming model targeting both CPUs and GPUs (OpenMP + GPU offloading)
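To give a flavor of what "OpenMP + GPU offloading" can look like, here is a minimal sketch in C. It is not taken from the course materials; the function name `saxpy` and its parameters are assumptions chosen for illustration. The same loop can run on a GPU when the compiler supports offloading, and it falls back to the host CPU (or to plain serial code when OpenMP is disabled) otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for 0 <= i < n.
 * The pragma asks OpenMP to offload the loop to a device if one is
 * available, copying x to the device and y both ways. Without an
 * offloading-capable compiler, or without OpenMP enabled, the loop
 * simply runs on the host. */
void saxpy(float a, const float *x, float *y, int n) {
  assert(x != NULL && y != NULL && n >= 0);
#pragma omp target teams distribute parallel for \
    map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}
```

The appeal of this style is that one source file serves both targets: the compiler flags, rather than the code, decide whether the loop runs on CPU cores or on a GPU. The exact flags depend on the compiler and are not shown here.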
How to get the credit
Topics covered
- Parallel Programming in Practice
- It’s easy! — a quick and gentle introduction to parallel problem
solving
- Some examples of parallel problem solving and programming
- Taxonomy of parallel machines and programming models
- What today’s machines look like — parallel computer architecture
- Distributed memory machines
- Multi-core / multi-socket nodes
- SIMD instructions
- Parallel programming models
- Finding and expressing parallelism
- Mapping computation onto compute resources
- Coordination and communication
- Examples of parallel programming languages/models
- Understanding performance of parallel programs (and achieving high
performance)
- The maximum performance of your CPU/GPU, and why your program does not reach it
- The maximum performance of memory, and why your program does not reach it
- How to reason about memory traffic of your programs
- Provable bounds of greedy schedulers
- Provable bounds of work-stealing schedulers
- Cache miss bounds of work-stealing schedulers
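For orientation, the scheduler bounds listed above are usually stated in the following textbook forms. The notation is an assumption and may differ from the course slides: $T_1$ is the total work, $T_\infty$ the critical path length (span), $P$ the number of processors, $T_P$ the execution time on $P$ processors, $Q_1$ the number of cache misses of the serial execution, $Q_P$ that of the parallel execution, and $C$ the cache size in lines.

```latex
% Greedy scheduler
T_P \le \frac{T_1}{P} + T_\infty

% Work-stealing scheduler (expected time)
\mathrm{E}[T_P] \le \frac{T_1}{P} + O(T_\infty)

% Cache misses under work stealing (commonly cited form)
Q_P \le Q_1 + O(C \, P \, T_\infty)
```

In all three, the first term is what an ideal parallel execution would cost and the second term is the overhead, which stays small when the computation has much more work than span ($T_1 / T_\infty \gg P$).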