LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | CC BY-NC-SA 4.0
TL;DR: LithoGRPO unifies flow matching and GRPO reinforcement learning for fast inverse lithography, pairing them with an efficient shot-count evaluator to achieve state-of-the-art mask quality.
Abstract: In semiconductor manufacturing, lithography projects circuit layouts onto silicon wafers through an optical mask. As circuit features shrink below the wavelength of light, optical diffraction causes the printed patterns to deviate from their intended layouts. Inverse Lithography Technology (ILT) addresses this challenge by generating optimized masks that enhance the fidelity of pattern transfer onto wafers. While ILT resembles an image synthesis task, its reliance on explicit physical metrics for mask evaluation limits the applicability of existing generative models. We introduce LithoGRPO, an ILT framework that integrates the flow‑matching paradigm with GRPO‑based reinforcement learning (RL) fine‑tuning, enabling efficient exploration of diverse masks for a given target layout. Unlike purely generative or optimization‑based approaches, RL in LithoGRPO exploits the explicitly defined, physics‑based reward function of ILT, enabling optimization under complex, process‑aware constraints. To the best of our knowledge, this is the first framework that unifies flow matching and RL for mask optimization. To improve RL sampling efficiency, we propose a fast shot-counting algorithm for manufacturability evaluation, achieving over 130× speedup while preserving the mask ranking of the traditional shot-count metric. Extensive experiments demonstrate that LithoGRPO achieves state‑of‑the‑art performance over both optimization‑based and learning‑based methods, while maintaining efficient mask generation.
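As a rough illustration of the GRPO step mentioned in the abstract: GRPO samples a group of candidates for the same input and scores each one relative to the group, rather than using a learned value function. The sketch below shows only that group-relative normalization, assuming a scalar reward per sampled mask. The function name `group_relative_advantages`, the `eps` constant, and the reward values are hypothetical and are not taken from the paper; the actual physics-based reward and its weighting are not specified here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of masks sampled for the same target layout.

    Each advantage is the reward's deviation from the group mean, divided by the
    group standard deviation (eps avoids division by zero when all rewards match).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate masks of one layout (higher is better).
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Masks that score above the group mean receive positive advantages and are reinforced; those below receive negative ones. Because only relative scores within a group matter, a faster evaluator that preserves the ranking of masks can plausibly stand in for a slower metric during sampling.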
Lay Summary: Modern computer chips are built by shining light through a patterned "stencil," called a mask, onto a silicon wafer, much like a photographic projector. The catch is that today's circuit features are smaller than the wavelength of the light itself, so the projected image comes out blurred and distorted. To compensate, engineers deliberately distort the mask, designing it so that after the inevitable blurring, the correct shape still appears on the wafer. Designing such pre-distorted masks is called Inverse Lithography, and it is one of the most demanding computational problems in chip manufacturing. Existing approaches face a hard trade-off. Slow optimization methods can carefully tune each mask but take a long time, and they only handle quality measures that are mathematically smooth. Faster AI methods skip the heavy optimization but inherit the same blind spots, ignoring important manufacturing realities such as how many machine exposures ("shots") are needed to actually fabricate the mask, a quantity that ordinary learning cannot directly optimize. We introduce LithoGRPO, which combines a modern generative AI technique with reinforcement learning. Like a student who first learns by example and is then graded for getting better answers, our model first imitates good masks, then is rewarded on every quality measure that matters, including ones traditional learning could not handle. We also developed a new shot-counting algorithm that runs more than 130 times faster than the standard one, making the feedback loop practical. The result is a system that produces higher-quality, more manufacturable masks in a fraction of a second per layout, setting a new state of the art on standard benchmarks and pointing toward faster, cheaper chip manufacturing.
Link To Code: https://github.com/laiyao1/LithoGRPO
Primary Area: Applications->Everything Else
Keywords: Computational Lithography, Image Synthesis, Reinforcement Learning
Submission Number: 23102