Syllabus
0:00 Benchmarking Google’s TPUs vs Nvidia GPUs
0:33 Video Overview
1:12 H100 SXM, H200 SXM and TPU v6e hardware specs
4:47 Benchmarking Design with vLLM and llmperf
7:42 Price assumptions per hour
8:47 Tensor Parallel vs Pipeline Parallel
13:45 Pros and Cons of Tensor vs Pipeline Parallel
14:42 Where to test TPUs and GPUs
15:45 Future videos: Blackwell B200 and Amazon Trainium
16:15 Running inference on Nvidia GPUs
19:17 Running inference on Google TPUs
25:51 Running benchmarks with llmperf
28:23 Benchmarking Results: TPU vs GPU
33:21 Conclusion, Resources and Workshop
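
The tensor-vs-pipeline comparison in the syllabus can be illustrated with a toy sketch (this is not code from the video): a two-layer MLP run three ways, on plain Python lists standing in for devices. Tensor parallelism shards each weight matrix column-wise so every device sees the full input, while pipeline parallelism assigns whole layers to devices and passes activations between them. All weights, shapes, and helper names here are illustrative assumptions.

```python
def matmul(a, b):
    """Naive matrix multiply: a is m x k, b is k x n."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def split_cols(w, n):
    """Split a matrix column-wise into n shards (one per device)."""
    step = len(w[0]) // n
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(n)]

# Toy weights (illustrative, not from the video)
W1 = [[1, 2, 3, 4], [5, 6, 7, 8]]      # layer 1: 2 -> 4
W2 = [[1, 0], [0, 1], [1, 1], [2, 2]]  # layer 2: 4 -> 2
x = [[1, 1]]                           # a single input row

# Single device: run both layers in sequence.
full = matmul(matmul(x, W1), W2)

# Tensor parallel (2-way): each layer's weights are split column-wise;
# both devices process the same input, and the partial outputs are
# concatenated (the all-gather step in a real system).
h = [sum((matmul(x, shard)[0] for shard in split_cols(W1, 2)), [])]
tp = [sum((matmul(h, shard)[0] for shard in split_cols(W2, 2)), [])]

# Pipeline parallel (2-stage): device 0 holds layer 1, device 1 holds
# layer 2; only the activation tensor crosses the device boundary.
stage0_out = matmul(x, W1)   # "device 0"
pp = matmul(stage0_out, W2)  # "device 1"

assert full == tp == pp      # same math, different communication pattern
```

The trade-off the video's pros-and-cons chapter covers follows directly from the communication patterns: tensor parallelism exchanges partial results inside every layer (frequent, latency-sensitive collectives), while pipeline parallelism only hands activations between stages (fewer, larger transfers, at the cost of pipeline bubbles).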
Taught by
Trelis Research