SketchMLP: effectively utilize rasterized images and drawing sequences for sketch recognition

Tengjie Li, Shikui Tu, Lei Xu

Published: 2025, Last Modified: 15 Jun 2025, Mach. Learn. 2025, CC BY-SA 4.0

Abstract: As a versatile tool for communicating abstract concepts, freehand sketching has attracted significant attention and has been extensively explored in computer vision. Sketch recognition is one of its fundamental and challenging tasks. Sketches consist of a small number of sparse, simple lines, so they are hard to recognize from texture and color cues in the way natural images are. Consequently, introducing information about the drawing order and graph structure is an important avenue for improving model performance. Previous deep learning methods often require considerable expertise to process the sketches, including applying specific rules to color black-and-white sketches, designing hierarchical graphic representation structures, and decoupling strokes. But does sketch recognition necessarily require expert knowledge when sufficient data is available? In this paper, we propose a multi-layer perceptron (MLP) based model, namely SketchMLP, for sketch recognition, which directly takes sketch images and drawing sequences as inputs, avoiding dependence on expert knowledge for complex data processing. SketchMLP uses a gated MLP (gMLP) based sequence branch to discover the relationships between points in the drawing sequence and an axial shifted MLP (AS-MLP) based image branch to extract local visual features. Each branch produces its own recognition decision, and the two decisions are then combined by a mixture of experts (MoE) block to generate the final result. The MoE block automatically assigns weights to the branches based on the sketch instance. Extensive experimental results show the effectiveness of SketchMLP. On the largest sketch dataset, QuickDraw, SketchMLP achieves higher recognition accuracy than the state-of-the-art method SketchXAI. This is achieved with a significantly smaller parameter count, only about 41.3% of that of SketchXAI. The code is available at https://github.com/CMACH508/SketchMLP.
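The fusion step described in the abstract, where two branch decisions are blended with instance-dependent weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the gate is computed, so the function name `moe_fuse` and the `gate_scores` input are hypothetical, and the code simply assumes that some learned gate has already produced one score per branch for the given sketch.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(
    seq_logits: Sequence[float],
    img_logits: Sequence[float],
    gate_scores: Sequence[float],
) -> List[float]:
    """Blend the two branch decisions with per-instance gate weights.

    seq_logits  -- class scores from the sequence branch
    img_logits  -- class scores from the image branch
    gate_scores -- two raw scores, one per branch (hypothetical input;
                   a real model would compute them from the sketch itself)
    """
    if len(seq_logits) != len(img_logits):
        raise ValueError("both branches must score the same set of classes")
    if len(gate_scores) != 2:
        raise ValueError("expected exactly one gate score per branch")

    # Gate weights are non-negative and sum to 1.
    w_seq, w_img = softmax(gate_scores)
    # Each branch's own decision as a probability distribution.
    p_seq = softmax(seq_logits)
    p_img = softmax(img_logits)
    # Convex combination, so the result is again a distribution.
    return [w_seq * a + w_img * b for a, b in zip(p_seq, p_img)]


if __name__ == "__main__":
    # Toy three-class example in which the gate trusts the image branch more.
    fused = moe_fuse([2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [-1.0, 1.0])
    print(fused)
```

Because the gate weights form a convex combination, the fused output remains a valid probability distribution, and a sketch whose stroke order is uninformative can lean on the image branch (or the reverse) without any hand-written rule.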