Abstract: Highlights
• A CNN-Transformer hybrid decoding model is proposed to decode visual neural activity evoked by natural images into text describing the visual stimuli.
• A specific transformer architecture is investigated to improve decoding performance.
• The roles of visual duration, attention mapping, and visual regions are explored to understand the neural mechanisms in the human brain.