Abstract: Highlights
• An embedding set alignment module is proposed to extract fine-grained features.
• An adaptive semantic margin loss is introduced to align text and images adaptively.
• Extensive experiments on public benchmarks show that our method outperforms state-of-the-art (SOTA) methods.
External IDs: dblp:journals/ivc/ZhaoFZDY24