Taming Multi-dimensional Parallelism: Communication Compression & Configuration for Scalable LLM Training
09 Jul 2025 · Jaeyong Song - AISys, EE, Seoul National University
Training today’s large language models (LLMs) demands thousands of GPUs orchestrated with multi-dimensional (e.g., 3D/5D) parallelism. This talk brings together two complementary systems that remove the twin roadblocks such scale introduces:
1) Optimus-CC attacks the communication bottleneck. It aggressively compresses not only data-parallel traffic but also pipeline back-propagation signals and embedding synchronizations. Error-suppression techniques, grounded in formal analysis, preserve model quality, and compression is applied selectively to critical-path transfers, yielding state-of-the-art speedups on multi-node clusters (see the first sketch after this list).
2) Pipette tackles the configuration bottleneck. It automatically finds memory-safe 3D-parallel splits and GPU mappings by modeling real-world, link-level bandwidth heterogeneity and per-GPU memory limits. The resulting configurations not only run but also accelerate LLM training where prior tuners stall or underperform (see the second sketch after this list).
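To make the error-suppression idea in 1) concrete, here is a minimal sketch of top-k gradient compression with error feedback, a common error-suppression scheme in this line of work. It is not the authors' implementation: the class name, method names, and the compression ratio are illustrative assumptions.

```python
# Minimal sketch, not code from the Optimus-CC paper: generic top-k gradient
# compression with error feedback (a common error-suppression scheme).
# Class/method names and the default ratio are illustrative assumptions.
import math
import torch


class ErrorFeedbackTopK:
    """Send only the largest-magnitude fraction of a gradient; feed the
    suppressed remainder back into the next round so the error stays bounded."""

    def __init__(self, ratio: float = 0.01):
        self.ratio = ratio          # fraction of elements to keep
        self.residual = None        # error carried over from previous rounds

    def compress(self, grad: torch.Tensor):
        if self.residual is None:
            self.residual = torch.zeros_like(grad)
        corrected = (grad + self.residual).flatten()   # re-inject past error
        k = max(1, int(corrected.numel() * self.ratio))
        _, idx = torch.topk(corrected.abs(), k)
        values = corrected[idx]
        # Everything we did not select becomes next round's residual.
        kept = torch.zeros_like(corrected, dtype=torch.bool)
        kept[idx] = True
        self.residual = torch.where(kept, torch.zeros_like(corrected),
                                    corrected).view_as(grad)
        return values, idx                             # sparse payload to send

    def decompress(self, values, idx, shape):
        out = torch.zeros(math.prod(shape), dtype=values.dtype,
                          device=values.device)
        out[idx] = values
        return out.view(shape)


# Example: compress a fake gradient, then reconstruct it on the receiver side.
comp = ErrorFeedbackTopK(ratio=0.05)
g = torch.randn(1024, 1024)
vals, idx = comp.compress(g)
g_hat = comp.decompress(vals, idx, g.shape)
```

Per the abstract above, Optimus-CC applies this kind of error-compensated compression not just to data-parallel gradients but also to pipeline back-propagation signals and embedding synchronizations, and only on critical-path transfers; its actual compressors and analysis differ from this toy.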
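For 2), the second sketch illustrates the flavor of search that a configurator like Pipette automates: enumerate (tensor, pipeline, data)-parallel degrees for a fixed GPU count, discard splits that exceed per-GPU memory, and rank the survivors with a cost model fed by measured link bandwidths. The formulas, constants, and names below are toy placeholders, not Pipette's actual model, which additionally handles fine-grained GPU mapping under link-level heterogeneity.

```python
# Minimal sketch, not Pipette's algorithm: brute-force search over 3D-parallel
# degrees with a memory-feasibility check and a toy, bandwidth-aware cost model.
# All formulas, constants, and names below are illustrative assumptions.
from dataclasses import dataclass
from itertools import product


@dataclass
class Cluster:
    num_gpus: int
    gpu_mem_gb: float
    intra_node_gbps: float   # measured fast-link bandwidth (e.g., NVLink)
    inter_node_gbps: float   # measured slow-link bandwidth (e.g., Ethernet/IB)


def mem_per_gpu_gb(params_billion: float, tp: int, pp: int) -> float:
    # Toy estimate: ~16 bytes/param for weights plus optimizer state,
    # sharded across tensor- and pipeline-parallel ranks; activations ignored.
    return 16.0 * params_billion / (tp * pp)


def relative_iter_cost(params_billion: float, tp: int, pp: int, dp: int,
                       c: Cluster) -> float:
    # Toy cost in arbitrary units: tensor-parallel traffic rides fast links,
    # data-parallel gradient sync rides slow links, plus a pipeline-bubble term.
    model = 16.0 * params_billion
    tp_cost = (model / pp) * (tp - 1) / c.intra_node_gbps if tp > 1 else 0.0
    dp_cost = (model / (tp * pp)) * (dp - 1) / c.inter_node_gbps if dp > 1 else 0.0
    bubble = 0.1 * (pp - 1) / pp
    return tp_cost + dp_cost + bubble


def best_config(params_billion: float, c: Cluster):
    candidates = []
    for tp, pp in product([1, 2, 4, 8], repeat=2):
        if c.num_gpus % (tp * pp):
            continue
        dp = c.num_gpus // (tp * pp)
        if mem_per_gpu_gb(params_billion, tp, pp) > c.gpu_mem_gb:
            continue                      # memory-unsafe split: reject it
        cost = relative_iter_cost(params_billion, tp, pp, dp, c)
        candidates.append(((tp, pp, dp), cost))
    return min(candidates, key=lambda kv: kv[1]) if candidates else None


cluster = Cluster(num_gpus=64, gpu_mem_gb=80.0,
                  intra_node_gbps=300.0, inter_node_gbps=25.0)
print(best_config(20.0, cluster))   # -> ((tp, pp, dp), relative cost)
```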
Together, Optimus-CC and Pipette show that principled communication compression and topology-aware configuration are both necessary and synergistic for unlocking fast, memory-efficient, and scalable LLM training. In this talk, I will share the key algorithmic insights of these two papers, along with some details of recent parallelism for Mixture-of-Experts (MoE) models as directions for future research.
Catering Courtesy of MICV Lab
Invited Talk