EntroCap: Zero-shot image captioning with entropy-based retrieval

Published: 01 Jan 2025, Last Modified: 24 Mar 2025
Venue: Neurocomputing 2025
License: CC BY-SA 4.0
Abstract: Highlights
• EntroCap retrieves small targets and captions unseen images in zero-shot settings.
• A Hierarchical Projector learns global signals from CLIP for better discrimination.
• An Entropy-based Strategy adjusts logits, capturing small targets in local signals.
• A Balancing Gate refines GPT-2 for more accurate description generation.
• EntroCap achieves state-of-the-art transferability in cross-domain captioning.
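The highlights do not say how the Entropy-based Strategy adjusts logits, so the following is only a generic illustration of the quantity involved, not EntroCap's method. It computes the Shannon entropy of a softmax distribution over a vector of logits. The function names and the example logit values are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical similarity logits over four candidates (illustrative values only).
uniform_logits = [1.0, 1.0, 1.0, 1.0]  # no candidate stands out
peaked_logits = [8.0, 1.0, 1.0, 1.0]   # one candidate dominates

h_uniform = shannon_entropy(softmax(uniform_logits))
h_peaked = shannon_entropy(softmax(peaked_logits))
```

A uniform distribution over n candidates reaches the maximum entropy log(n), while a peaked distribution has lower entropy, so entropy can serve as a measure of how uncertain a set of logits is. How EntroCap turns such a measure into a logit adjustment is not specified in the highlights.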