Abstract: The rise of on-device inference of large language models (LLMs) is rapidly escalating the demand for memory-intensive operations on edge devices. While DRAM-based processing-in-memory (PIM) is a promising solution for overcoming the memory wall, edge devices require PIM to function both as a compute unit and as a memory device due to their limited memory capacity. Such PIM-enabled memory complicates partitioning and placing a tensor across DRAM banks in a PIM-operable manner. Notably, we highlight that LLM weights need to be accessible by both PIM and system-on-chip (SoC) processors, as the same weights are used for both SoC-favorable GEMM and PIM-favorable GEMV operations. This necessitates different memory mappings for PIM and SoC processors, leading to potential re-layout costs when switching between the two. To address this challenge, we propose FACIL, a flexible DRAM address mapping solution that efficiently places tensors in DRAM for PIM operations while allowing SoC processors to access the same data using contiguous virtual addresses. FACIL consists of (i) a memory controller that assigns a different DRAM address mapping to the page-offset bits of each huge page and (ii) a user-level library that determines the appropriate DRAM address mapping. We demonstrate that enabling re-layout-free access by both PIM and SoC processors benefits LLM inference on various on-device LLM tasks, including short conversation and code autocompletion, reducing the time-to-first-token by $2.37\times$ and $2.63\times$, respectively, over the SoC-PIM baseline.
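To make the per-huge-page mapping idea concrete, the sketch below illustrates (in C) how a memory controller could interpret the page-offset bits of a 2 MiB huge page under two different DRAM address mappings, one favoring SoC row-buffer locality and one striping consecutive cache lines across banks for all-bank PIM GEMV. This is not the paper's implementation; the bit-field widths (16 banks, 1 KiB column range, 64 B cache lines) and the two specific bit layouts are illustrative assumptions.

```c
/* Illustrative sketch of per-huge-page DRAM address mapping (assumed layout,
 * not FACIL's actual design): the same 21-bit page offset is decoded into
 * (bank, row, column) differently depending on the mapping assigned to the
 * huge page that contains it. */
#include <stdint.h>
#include <stdio.h>

#define HUGE_PAGE_BITS 21u   /* 2 MiB huge page -> 21 offset bits (assumed) */
#define BANK_BITS       4u   /* assume 16 banks per rank                    */
#define COL_BITS       10u   /* assume 1 KiB of columns per row slice       */
#define LINE_BITS       6u   /* 64 B cache line                             */

typedef enum { MAP_SOC, MAP_PIM } map_mode_t;  /* chosen per huge page */

typedef struct { uint32_t bank, row, col; } dram_coord_t;

/* Decode a page offset into DRAM coordinates under the page's mapping. */
static dram_coord_t decode_offset(uint32_t offset, map_mode_t mode)
{
    dram_coord_t c;
    if (mode == MAP_SOC) {
        /* SoC-friendly: column bits are lowest, so consecutive addresses
         * stay in one bank's open row and exploit row-buffer locality. */
        c.col  =  offset                           & ((1u << COL_BITS) - 1);
        c.bank = (offset >> COL_BITS)              & ((1u << BANK_BITS) - 1);
        c.row  =  offset >> (COL_BITS + BANK_BITS);
    } else {
        /* PIM-friendly: bank bits sit just above the cache-line offset, so
         * consecutive 64 B lines land in different banks and an all-bank
         * GEMV can fetch one operand slice per bank in parallel. */
        uint32_t line   =  offset                          & ((1u << LINE_BITS) - 1);
        uint32_t col_hi = (offset >> (LINE_BITS + BANK_BITS))
                          & ((1u << (COL_BITS - LINE_BITS)) - 1);
        c.bank = (offset >> LINE_BITS)             & ((1u << BANK_BITS) - 1);
        c.col  = (col_hi << LINE_BITS) | line;
        c.row  =  offset >> (COL_BITS + BANK_BITS);
    }
    return c;
}

int main(void)
{
    /* 16 consecutive cache lines: same bank under MAP_SOC,
     * 16 distinct banks under MAP_PIM. */
    for (uint32_t i = 0; i < 16; i++) {
        uint32_t off = i << LINE_BITS;
        dram_coord_t s = decode_offset(off, MAP_SOC);
        dram_coord_t p = decode_offset(off, MAP_PIM);
        printf("offset 0x%05x  SoC bank %2u  PIM bank %2u\n", off, s.bank, p.bank);
    }
    return 0;
}
```

In the abstract's terms, a user-level library would tag huge pages holding PIM-targeted weights with the bank-striped mapping and leave other pages with the locality-oriented mapping; either way the data remains contiguous in virtual address space, so the SoC processor needs no re-layout to access it.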