Abstract: Highlights
• We propose a class-agnostic one-shot counting framework using only one reference image.
• Instead of text-guided multimodal models, we employ a lightweight ResNet-50 and SAFE to reduce feature confusion.
• We introduce the LOCO dataset with point- and box-level annotations and new metrics for evaluation.
• Extensive experiments show strong performance and generalization across multiple benchmarks.
External IDs: doi:10.1016/j.neunet.2025.107961