Abstract: Highlights
• We propose a multi-level attention network for referring expression comprehension (REC).
• We combine attribute attention and context attention to make REC more effective.
• We incorporate position information to help locate the target object.
• Experiments on three REC datasets show the effectiveness of our framework.
External IDs: dblp:journals/prl/SunZJHY23