Yonsei

Yonsei MLSys Student Group

Opening and Closing the Multi-Tenant NPU Chapter: From Time-Sharing to Virtualization

Hyunkyu Kang - ASOLab, CS, Yonsei University

This presentation surveys the design of multi-tenant NPUs, exploring core mechanisms such as preemptible time-sharing, spatial partitioning, memory-centric control, and accelerator virtualization. It uses four representative architectures from 2020 to 2023 as case studies to illustrate how these mechanisms aim to improve NPU performance for concurrent neural network services. The limitations and trade-offs of these approaches are then examined against modern large-scale model serving workloads, clarifying where multi-tenancy brings clear benefits and where its impact is constrained by batching, scaling, and deployment patterns. The goal is to provide a concise, historically grounded view of how this research direction opened, evolved, and now appears to have stabilized.

PPT CV

Catering Courtesy of CoreLab