09 Jul 2025
Jayong Song - AISys, EE, Seoul National University
Training today’s large language models (LLMs) demands thousands of GPUs orchestrated with multi-dimensional (e.g., 3D/5D) parallelism.
This talk brings together two complementary systems that remove the twin roadblocks such scale introduces:
1) Optimus-CC attacks the communication bottleneck. It aggressively compresses not only data-parallel traffic but also pipeline back-propagation signals and embedding synchronizations. Error-suppression techniques—grounded in formal analysis—preserve model quality while selectively compressing only critical-path transfers, yielding state-of-the-art speedups on multi-node clusters.
2) Pipette tackles the configuration bottleneck. It automatically finds memory-safe 3D-parallel splits and GPU mappings by modeling real-world, link-level bandwidth heterogeneity and per-GPU memory limits. The resulting configurations execute—and accelerate—LLM training where prior tuners stall or underperform.
Together, Optimus-CC and Pipette show that principled communication compression and topology-aware configuration are both necessary and synergistic for unlocking fast, memory-efficient, and scalable LLM training.
I will share the key algorithmic insights of these papers, along with some details of recent parallelism techniques for Mixture-of-Experts (MoE) models as directions for future research.
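As a rough illustration of the general family of techniques Optimus-CC builds on, here is a minimal sketch of top-k gradient compression with error feedback, where the residual dropped by compression is re-added on the next step so the error stays bounded. The class and function names are hypothetical, and this is not Optimus-CC's exact algorithm.

```python
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, idx = flat.abs().topk(k)
    return flat[idx], idx

def decompress(values, idx, shape, device):
    out = torch.zeros(torch.Size(shape).numel(), device=device)
    out[idx] = values
    return out.view(shape)

class ErrorFeedbackCompressor:
    """Accumulates the compression residual and re-adds it on the next step,
    suppressing error accumulation instead of letting it compound."""
    def __init__(self, ratio: float = 0.01):
        self.ratio = ratio
        self.residual = None

    def step(self, grad: torch.Tensor):
        if self.residual is None:
            self.residual = torch.zeros_like(grad)
        corrected = grad + self.residual           # error feedback
        values, idx = topk_compress(corrected, self.ratio)
        sent = decompress(values, idx, grad.shape, grad.device)
        self.residual = corrected - sent           # remember what was dropped
        return values, idx                         # what actually goes on the wire
```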
CV
Catering Courtesy of MICV Lab
Invited Talk
02 Jul 2025
Jisoo Jang - SSLab, CS, Yonsei University
This talk explores the landscape of modern fuzzing, a well-proven, automated testing technique for assessing program robustness by providing invalid, unexpected, and random inputs. As a fundamental technique in system security, its application in contemporary research is crucial. To illustrate the journey from fundamental concepts to advanced application, the presentation dives into a detailed analysis of ReUSB and Moneta, two of the presenter’s recent works on fuzzing. These publications demonstrate how novel fuzzing techniques are integrated into system security research to uncover critical vulnerabilities.
PPT
CV
Catering Courtesy of ASO Lab
25 Jun 2025
Jinwoo Kim - CIP Lab, CS, Yonsei University
Traditional Image Signal Processing (ISP) pipelines use fixed algorithms, which makes them less adaptable to different imaging conditions and user preferences. While deep learning has improved individual ISP modules, a comprehensive and flexible solution is still needed. Diffusion models show great promise in generating, enhancing, and restoring images. This research investigates whether a diffusion-based model can replace the conventional ISP to create a unified and adaptive imaging system. Our proposed Diffusion Camera is a new framework that combines ISP processing and image editing into one diffusion-based model. This approach moves beyond traditional ISPs by unifying multiple tasks and allowing for personalization.
CV
Catering Courtesy of ASO Lab
18 Jun 2025
Jason Lee - CIPLab, AI, Yonsei University
This presentation introduces the past research and current interests of Jaeseong Lee (Jason Lee).
It begins with two papers on multimodal LLMs (MLLMs), accepted at IPIU 2025 (Oral) and the CVPR 2025 Workshop.
Next, he introduces two papers, one under review at ICCV 2025 and one to be submitted to the ICCV 2025 Workshop, which focus on fair generation in diffusion models and robust personalization.
Finally, he shares his current research interests.
PPT
CV
Catering Courtesy of AI-ISL
11 Jun 2025
Yoonjun Cho - AI-ISL, CS, Yonsei University
Quantization reduces the bitwidth of weights in large language models to enable efficient inference, but often comes at the cost of accuracy. In this talk, I begin with an overview of quantization techniques, outlining their evolution and recent research trends aimed at improving quality and scalability.
I will then present my recent work on quantization, which focuses on decomposing the weight matrix into a quantized matrix and low-rank components. The talk concludes with a brief discussion of future directions in quantization research.
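The decomposition idea can be illustrated with a toy sketch: approximate W as dequant(Q) plus a low-rank correction fitted to the quantization residual. The uniform quantizer and the truncated-SVD fit below are illustrative stand-ins, not the method presented in the talk.

```python
import torch

def uniform_quantize(w: torch.Tensor, bits: int = 4):
    """Symmetric uniform quantization to `bits` bits (toy quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q, scale

def decompose(w: torch.Tensor, bits: int = 4, rank: int = 8):
    """Split W into a quantized part plus a low-rank correction of the
    quantization residual: W ~= dequant(Q) + U @ V."""
    q, scale = uniform_quantize(w, bits)
    residual = w - q * scale
    u, s, vh = torch.linalg.svd(residual, full_matrices=False)
    U = u[:, :rank] * s[:rank]   # fold singular values into U
    V = vh[:rank, :]
    return q, scale, U, V

w = torch.randn(512, 512)
q, scale, U, V = decompose(w)
w_hat = q * scale + U @ V
print(f"relative error: {(w - w_hat).norm() / w.norm():.4f}")
```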
PPT
CV
Catering Courtesy of AI-ISL
04 Jun 2025
Kunmo Jeong - CoreLab, EE, Yonsei University
The growing demand for domain-specific accelerators in fields such as machine learning, graph analytics, and scientific computing has highlighted the need for productive and efficient hardware design methodologies. High-level synthesis (HLS) offers an attractive solution by generating hardware circuits from high-level code, with loop pipelining serving as a cornerstone for maximizing throughput in regular computations. However, existing static and dynamic HLS approaches fail to achieve high pipeline utilization for irregular loop nests that are characterized by data-dependent bounds, unpredictable memory access patterns, and loop-carried dependencies.
To address pipeline underutilization in irregular loops, this work proposes a new barrier-free pipeline architecture with a cross-level scheduling strategy, together with Selene, an HLS compiler that automatically synthesizes the proposed architecture.
Our approach introduces a fine-grained pipeline controller and an outer-loop iteration interleaving mechanism, enabling concurrent execution of multiple outer-loop iterations and efficient handling of data dependencies. Implemented within a commercial HLS flow (Vitis HLS), Selene delivers significant speedups of 5.21x and 5.68x over standard static and dynamic HLS tools, respectively, on a range of irregular workload benchmarks, demonstrating its effectiveness in overcoming longstanding barriers to efficient hardware generation for data-dependent applications.
PPT
CV
Catering Courtesy of CIP Lab
28 May 2025
Chaerim Lim - BDAI Lab, CS, Yonsei University
Data lakes were introduced to overcome the limitations of traditional data warehouses, especially in handling large and diverse datasets. They support schema-on-read and decoupled storage and compute, which are useful for modern data processing. This session introduces how data lakes have evolved in terms of architecture and system design. Their applications in AI workflows—such as data preparation, feature extraction, and model training—are also discussed. The session further addresses key challenges including metadata management, data quality, and query performance, and examines how AI techniques can be applied to improve the efficiency and automation of data lake operations.
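As a small illustration of schema-on-read and decoupled storage and compute, DuckDB can query Parquet files in place, inferring the schema at query time rather than at load time; the file path below is hypothetical.

```python
import duckdb

# Schema-on-read: no table definition or load step; the Parquet files'
# schema is discovered when the query runs. (Path is illustrative.)
result = duckdb.sql("""
    SELECT user_id, COUNT(*) AS events
    FROM 'lake/events/*.parquet'
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
""").fetchall()
```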
PPT
CV
Catering Courtesy of CIP Lab
21 May 2025
Kyobin Choo - MICV Lab, CS, Yonsei University
Image registration is the task of aligning two images by establishing meaningful spatial correspondences between them. It plays a vital role in a wide range of applications, including computer vision, remote sensing, and especially medical imaging, where it is essential for tasks such as longitudinal studies, multi-modal image fusion, and image-guided interventions.
This seminar provides a comprehensive overview of the evolution of deformable image registration, covering classical optimization-based methods as well as more recent deep learning approaches. A particular focus is placed on the technical challenges of multi-modal medical image registration, where aligning images from different modalities (e.g., MRI and CT) poses significant difficulties due to modality-specific appearance discrepancies.
The seminar concludes by introducing a recent approach that reformulates the multi-modal registration problem as a mono-modal one, enabling more robust and efficient optimization. This section will present my own research contributions in this direction.
CV
Catering Courtesy of CoreLab
14 May 2025
We invite you to join us for an engaging yet concise recap of the standout sessions from Yonsei MLSys. We’ll cover compiler optimizations that speed up large-scale model training, dive into kernel-fusion strategies, and explore hardware-aware scheduling techniques illustrated with real-world case studies. You’ll also get a quick live demo of system optimization, plus practical tips for GPU and NPU profiling and tuning.
Each presenter will deliver a focused five-minute overview, followed by an open Q&A panel to tackle your questions. If you’re looking to move beyond paper reading, this meetup offers hands-on exercises and the chance to swap ideas with fellow students and researchers. Please join us for a collaborative and insightful session!
Catering Courtesy of CoreLab
07 May 2025
Chanyoung Kim - MICV Lab, AI, Yonsei University
Vision Transformers (ViTs) have become prominent models in computer vision, offering competitive performance across a variety of tasks. Beyond their effectiveness, ViTs exhibit a range of interesting and non-obvious characteristics. For example, different layers capture varying types of information, each contributing uniquely to the overall representation. Furthermore, attention patterns show that some image patches focus more on local features, while others prioritize global context—even within a single input. These observations point to an organized yet adaptable way in which ViTs handle visual information.
In this seminar, we examine these hidden properties of ViTs through empirical findings and theoretical perspectives. We also explore how a deeper understanding of such characteristics can inform the design of downstream applications. By investigating the internal workings of ViTs, we aim to provide insights that can lead to more effective and targeted use of transformer-based models in visual learning tasks.
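As one concrete way to probe these properties, here is a small sketch (illustrative, not the seminar's analysis code) that uses forward hooks on a torchvision ViT to capture each encoder block's output and track how the [CLS] representation evolves across layers.

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT).eval()
layer_outputs = []

# Capture every encoder block's token representations with forward hooks.
for block in model.encoder.layers:
    block.register_forward_hook(lambda m, inp, out: layer_outputs.append(out.detach()))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

# Compare how much the [CLS] token changes from layer to layer.
for i in range(1, len(layer_outputs)):
    prev_cls, cur_cls = layer_outputs[i - 1][:, 0], layer_outputs[i][:, 0]
    sim = torch.cosine_similarity(prev_cls, cur_cls).item()
    print(f"layer {i - 1} -> {i}: CLS cosine similarity {sim:.3f}")
```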
PPT
CV
Catering Courtesy of ASO Lab
30 Apr 2025
Kiung Jung - ASO Lab, CS, Yonsei University
Recent deep learning models consist of tens of thousands of GPU kernels.
While early optimization efforts primarily focused on computation itself, advancements in both hardware and software have made non-computational overheads increasingly prominent.
In this talk, we discuss the impact of such overheads on real-world models and explore optimization strategies to mitigate them, with a particular focus on kernel launch overhead.
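One common mitigation for launch overhead is CUDA Graphs, sketched below with PyTorch's public API: a sequence of many small kernels is captured once and then replayed as a single graph launch, amortizing the per-launch CPU cost. The toy model and shapes are illustrative.

```python
import torch

model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(32)]).cuda().eval()
static_input = torch.randn(8, 1024, device="cuda")

with torch.no_grad():
    # Warm up on a side stream before capture, as the CUDA Graphs API requires.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Capture the forward pass (32 layers -> dozens of kernel launches) once.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    # Replay: a single graph launch replaces all the individual launches.
    static_input.copy_(torch.randn(8, 1024, device="cuda"))
    graph.replay()  # static_output now holds the result for the new input
```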
CV
Catering Courtesy of ASO Lab
23 Apr 2025
Sejong Yang - CIPLAB, CS, Yonsei University
In this presentation, Sejong Yang shares his journey toward character talking head generation. The first part introduces his life experiences and the motivations that led him to pursue this topic as his PhD goal. He also presents four related research projects, offering insights into recent trends and studies in face modeling and talking head generation. Although still a work in progress, this slide deck—also used in his recent job talks—illustrates his ongoing efforts to extend talking head video generation beyond humans to encompass a wide range of characters throughout his PhD journey.
PPT
CV
Catering Courtesy of MICV Lab
16 Apr 2025
Jason Lee - CIPLab, AI, Yonsei University
This presentation introduces the career journey and research interests of Jaeseong Lee (Jason Lee). It begins with an overview of the AI startup ecosystem, including how AI startups generate revenue and the roles of research scientists and engineers. Jason then shares insights from his experiences at three different AI startups, offering a more detailed perspective on their operations and culture. Finally, he briefly discusses his motivation for starting a PhD and outlines the focus of his upcoming thesis research.
PPT
CV
Catering Courtesy of MICV Lab
09 Apr 2025
Kunmo Jeong - CoreLab, EE, Yonsei University
High-Level Synthesis (HLS) has emerged as a key methodology for bridging the gap between software design and hardware implementation. Among various optimization strategies, Coarse-Grained Pipeline Architecture (CGPA) has shown promise in improving throughput by partitioning programs into pipeline stages. However, achieving effective parallelism remains challenging due to control and data dependencies. This work investigates the integration of Program Slicing-based Decoupled Software Pipelining (PS-DSWP) into CGPA to improve stage-level parallelism and performance. By applying program slicing techniques prior to software pipelining, the proposed approach enhances the decoupling of stages and reduces inter-stage communication overhead. Experimental results on representative benchmarks demonstrate notable improvements in resource utilization and latency reduction compared to baseline CGPA implementations. This study highlights the potential of PS-DSWP as a complementary technique for advancing pipeline-based HLS designs.
PPT
CV
Catering Courtesy of CoreLab
02 Apr 2025
Junyoung Jang - AIPR, School of Computing, KAIST
This talk covers post-training techniques aimed at enhancing the reasoning capabilities of Large Language Models (LLMs). Key topics include foundational reinforcement learning methodologies as well as advanced approaches such as PPO, DPO, GRPO, and DAPO. We analyze how these methods affect model reasoning performance and response consistency. Furthermore, we discuss the effectiveness and limitations of each technique through representative evaluation benchmarks and experimental results, offering insights into future research directions.
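As one concrete example from this family, the DPO objective reduces to a logistic loss over policy-versus-reference log-probability ratios for preferred (y_w) and rejected (y_l) responses. Below is a minimal sketch assuming per-sequence log-probabilities have already been computed.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed per-token log-probabilities for the
    chosen (w) or rejected (l) response under the policy or the frozen
    reference model; beta scales the implicit KL-style penalty.
    """
    chosen_ratio = policy_logp_w - ref_logp_w
    rejected_ratio = policy_logp_l - ref_logp_l
    # -log sigmoid(beta * margin) pushes the policy to prefer y_w over y_l.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```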
PPT
CV
Catering Courtesy of CoreLab
Invited Talk
29 Mar 2025
Enhyeok Jang - eSCaL, EE, Yonsei University
The recently proposed racetrack architecture based on trapped-ion qubits by Quantinuum, and the zoned architecture using Rydberg atoms by QuEra, have hardware characteristics that differ from conventional superconducting-based architectures with static topologies. These qubit systems, controlled by optical lasers, offer flexible connectivity for entangling operations, but they also introduce the new compilation challenge of shuttling overhead. In such systems, ions or neutral atoms must physically move to designated gate-operation zones, adding a new layer of complexity for qubit re-alignment. In this presentation, we explore compilation methodologies for designing variational quantum programs for quantum machine learning applications so that they can be trained effectively on these emerging architectures.
PPT
CV
Invited Talk
26 Mar 2025
Hyeoncheol Kim - ASO Lab, CS, Yonsei University
As the data demands of large language models (LLMs) continue to grow, challenges such as the memory wall have become increasingly prominent in modern AI accelerators. These limitations lead to constrained scalability, underutilized compute resources, and increased energy consumption. This talk presents research conducted by Hyeoncheol Kim, which explores system-level techniques to improve the efficiency of LLM inference across both GPU-based systems and real-world Processing-In-Memory (PIM) hardware. The presentation covers architectural considerations and compiler-level integration strategies designed to reduce data movement overhead and enhance execution throughput.
PPT
CV
Catering Courtesy of CIP Lab
19 Mar 2025
Yejin Lee - ToC Lab, CS, Yonsei University
Large Language Models (LLMs) are making coding easier and more efficient by helping with writing, debugging, and understanding code. Recent research has focused on improving their accuracy, reasoning, and ability to work with different programming languages. This presentation will cover key advancements, challenges, and what the future holds for LLMs in software development.
PPT
CV
Catering Courtesy of CIP Lab
12 Mar 2025
Kyobin Choo - MICV Lab, CS, Yonsei University
This presentation introduces the company projects and research endeavors led by Kyobin Choo, highlighting key challenges encountered while developing a positron emission tomography (PET) quantification solution for the early diagnosis of Alzheimer’s disease. These challenges naturally expanded into broader research in medical imaging, encompassing diverse computer vision tasks such as volumetric image segmentation, deformable image registration, denoising, and cross-modality image synthesis. In particular, this talk explores how diffusion models have been effectively leveraged to address medical image-specific challenges—distinct from those found in natural images—within generative tasks such as image-to-image translation.
PPT
CV
Catering Courtesy of MICV Lab
05 Mar 2025
Chaerim Lim - BDAI Lab, CS, Yonsei University
With the rapid advancement of machine learning (ML) and increasing industry adoption, organizations are exploring how to seamlessly integrate ML with their existing data and systems. As ML applications expand, a critical challenge arises: how can organizations leverage their well-established data infrastructure to support ML workflows effectively? This talk explores the intersection of ML and data management, focusing on how ML pipelines—from training to inference and deployment—can be optimized from a data-centric perspective. Additionally, we will discuss how data management systems can better support ML-driven applications, particularly in scenarios like Retrieval-Augmented Generation (RAG), and highlight ongoing research in this domain.
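To make the RAG scenario concrete, here is a toy sketch of the retrieval step, with the embedding model stubbed out by random unit vectors; a real system would use a learned encoder and a vector index instead of a brute-force scan.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    """Stand-in for a learned text encoder (returns random unit vectors)."""
    v = rng.normal(size=(len(texts), 384))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

corpus = ["notes on data lakes", "notes on RAG", "notes on feature stores"]
doc_vecs = embed(corpus)

def retrieve(query, k=2):
    q = embed([query])[0]
    sims = doc_vecs @ q            # cosine similarity (vectors are unit-norm)
    top = np.argsort(sims)[::-1][:k]
    return [corpus[i] for i in top]

retrieved = retrieve("how can data systems support RAG?")
# A real pipeline would now place `retrieved` into the LLM prompt as context.
```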
PPT
CV
Catering Courtesy of MICV Lab
26 Feb 2025
Sejong Yang - CIP Lab, CS, Yonsei University
This presentation explores the objectives pursued in deep learning research and the tools utilized to achieve them. It provides practical tips for machine learning researchers and offers system researchers insights into the workflow of machine learning research. The tools introduced include Slurm, Docker, Conda, Git, WandB, PyTorch, and PyTorch-Lightning.
PPT
CV
Catering Courtesy of ASO Lab
19 Feb 2025
Jinwoo Kim - CIP Lab, CS, Yonsei University
In this seminar, I will share my journey through computer vision research, from early explorations to recent developments. Starting with my first steps into video instance segmentation during my 2020 internship, I will walk through key milestones: my first ICCV paper in 2021, my experience with workshops, and the path to my first co-first author paper at CVPR 2022. I will also discuss my research on object-centric learning, which led to another CVPR paper in 2023, and my latest work on an object-centric dataset. Through these experiences, I will reflect on how each project began, the challenges faced, and the insights gained along the way.
PPT
CV
Catering Courtesy of ASO Lab
13 Feb 2025
Heelim Choi - Core Lab, EE, Yonsei University
Large Language Models (LLMs) have revolutionized AI applications, yet their high computational and memory demands pose significant challenges in inference and deployment.
This talk explores system software optimizations that enhance the efficiency of LLM execution.
We discuss key innovations such as PagedAttention (vLLM) for memory-efficient inference, GPU-accelerated serving with TensorRT-LLM, and distributed execution strategies in DeepSpeed and Megatron-LM.
These approaches enable higher throughput, lower latency, and reduced hardware costs, making LLM deployment more scalable.
Finally, we examine the real-world impact of these techniques and future directions in LLM system optimization.
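For a sense of how accessible these systems are, serving with vLLM (which applies PagedAttention under the hood) takes only a few lines; the model name below is illustrative.

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks, so many
# concurrent sequences share GPU memory without fragmentation.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # model name illustrative
params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```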
PPT
CV
Catering Courtesy of ASO Lab
04 Feb 2025
Kiung Jung - ASO Lab, CS, Yonsei University
In the rapidly evolving landscape of deep learning and high-performance computing, Triton emerges as a powerful Python-based domain-specific language (DSL) for writing highly optimized GPU kernels with ease. By abstracting away much of the complexity of traditional CUDA programming, Triton enables researchers and developers to write efficient, high-performance code with minimal effort.
Many state-of-the-art AI models, including DeepSeek and FlashAttention, leverage Triton DSL to develop custom kernels, significantly enhancing computational efficiency. By utilizing Triton, researchers can optimize memory access patterns, effectively parallelize computations, and outperform standard implementations.
For those seeking to accelerate their machine learning workloads, Triton provides an intuitive yet highly expressive approach to kernel development, bridging the gap between productivity and performance. Whether you’re optimizing deep learning frameworks or developing custom operations, Triton DSL empowers you to push the boundaries of computational efficiency.
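The canonical vector-addition kernel from the Triton tutorials gives a feel for the DSL: each program instance handles one block of elements, and masking guards the ragged tail.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                   # which block this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                   # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(98432, device="cuda")
y = torch.randn(98432, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)            # one program per 1024 elements
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```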
PPT
CV
Catering Courtesy of ASO Lab
22 Jan 2025
Chanyoung Kim - MICV Lab, AI, Yonsei University
This presentation provides a brief introduction to the research conducted by Chanyoung Kim. His work spans diverse areas, including audio-visual representation, image semantic segmentation, and medical imaging. Additionally, it addresses fundamental questions regarding the ViT attention mechanism and includes an in-depth analysis of these aspects.
PPT
CV
Catering Courtesy of ASO Lab
15 Jan 2025
Sejong Yang - CIP Lab, CS, Yonsei University
This presentation introduces the various research projects and tasks undertaken by Sejong Yang in the field of computer vision. It provides a concise overview of foundational theories related to models employed in diverse computer vision tasks. Special attention is given to the basic concepts underlying generative models, including their foundational theories and methods for leveraging generative model priors effectively.
PPT
CV
15 Jan 2025
Jinwoo Kim - CIP Lab, CS, Yonsei University
This presentation provides a brief introduction to a research domain called “compositional understanding in computer vision.”
Compositional understanding in computer vision focuses on comprehending “what objects are” and “how they interact with each other” in given visual inputs.
PPT
CV
07 Jan 2025
Heelim Choi - Core Lab, EE, Yonsei University
This presentation provides a concise introduction to compiler optimization and the importance of Intermediate Representation (IR) in supporting various hardware and software platforms. It explores applications in areas such as graph processing, homomorphic encryption, and deep learning, demonstrating how optimization techniques can enhance performance. To address memory bottlenecks, the concept of Processing In Memory (PIM) technology is introduced, with a focus on DRAM-based PIM hardware. The presentation outlines the design of PIM-friendly IR and its integration into compilers as a cohesive strategy for improving efficiency in memory-intensive tasks.
PPT
CV
07 Jan 2025
Kiung Jung - ASO Lab, CS, Yonsei University
A compiler is a software tool that acts as an interface between the user and the hardware. DL models, like classical programs, cannot be directly executed on GPUs without the assistance of compilers. In this talk, we will explore various DL compiler optimizations, such as kernel fusion, and delve into the PyTorch software stack. Additionally, we will examine how the Triton language is transformed from a Python Domain Specific Language (DSL) into PTX.
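Kernel fusion, for instance, can be seen directly with torch.compile: the chain of elementwise operations below can be fused by TorchInductor into a single generated kernel (a Triton kernel on GPU) instead of several separate launches. The function itself is illustrative.

```python
import torch

def gelu_bias(x, bias):
    # A chain of elementwise ops that a DL compiler can fuse into one kernel:
    y = x + bias
    return 0.5 * y * (1.0 + torch.tanh(0.79788456 * (y + 0.044715 * y**3)))

compiled = torch.compile(gelu_bias)   # on GPU, TorchInductor emits a fused Triton kernel
x = torch.randn(4096, 4096, device="cuda")
bias = torch.randn(4096, device="cuda")
out = compiled(x, bias)
```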
PPT
CV