Abstract: Recent large machine learning models such as CLIP have shown impressive generalization performance on a variety of perception tasks. In this work, we explore to what extent they model human cognitive processes, focusing on how these models perceive optical illusions. We propose a simple way to assess this: we present illusions to the model as image and text prompts and observe how its output changes with illusory strength. Our results show that CLIP can indeed be fooled by several types of illusions relating to lightness and geometry.
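The probing protocol summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the simultaneous-contrast stimulus, the `strength` parameterization, and the names `make_contrast_stimulus`, `sweep_illusion`, and `score_fn` are all assumptions made for illustration. `score_fn` is a hypothetical stand-in for any image-text model (such as CLIP) that returns one similarity score per text prompt.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def make_contrast_stimulus(strength: float, size: int = 32, patch: int = 8) -> Image:
    """Two identical mid-gray patches on a split background (assumed stimulus).

    strength in [0, 1] controls how far the two backgrounds diverge from
    mid-gray: the left half gets darker, the right half gets lighter.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    left_bg = 0.5 - 0.5 * strength
    right_bg = 0.5 + 0.5 * strength
    half = size // 2
    img = [[left_bg if x < half else right_bg for x in range(size)]
           for _ in range(size)]
    top = (size - patch) // 2
    # Draw one physically identical gray patch at the center of each half.
    for cx in (half // 2, half + half // 2):
        left = cx - patch // 2
        for y in range(top, top + patch):
            for x in range(left, left + patch):
                img[y][x] = 0.5
    return img


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn similarity scores into a probability distribution over prompts."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sweep_illusion(
    score_fn: Callable[[Image, Sequence[str]], Sequence[float]],
    prompts: Sequence[str],
    strengths: Sequence[float],
) -> List[Tuple[float, List[float]]]:
    """Record the prompt probabilities at each illusory strength."""
    results = []
    for s in strengths:
        image = make_contrast_stimulus(s)
        probs = softmax(score_fn(image, prompts))
        results.append((s, probs))
    return results
```

In such a setup the prompts might read "the left square is lighter" and "the right square is lighter" (illustrative wording, not taken from the paper). Because the two patches are physically identical at every strength, a systematic shift in the prompt probabilities as `strength` grows would indicate that the model is affected by the illusion.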