Abstract: Highlights
• We propose LSFM-Diff, a diffusion model for UIE with dual LLaVA semantic guidance.
• We introduce WTIF-CR, a module that fuses text and features for fine-grained guidance.
• We design SGDA, a mechanism for spatially adaptive feature enhancement within UNet.
External IDs: dblp:journals/inffus/FanZHLZ26