Abstract: Most existing traffic sign-related works detect and recognize only a subset of traffic signs in isolation, which fails to capture the global semantic logic among signs and may convey inaccurate traffic instruction information. To address these issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally semantically interrelated traffic signs (e.g., driving instruction-related texts, symbols, and guide panels) into natural language to provide complete traffic instruction support for autonomous or assisted driving. Since existing works lack an effective framework for the proposed TSI task, we design a multi-task learning architecture (TSI-arch) that detects and recognizes various traffic signs with drastic changes in size and aspect ratio, while interpreting these signs into natural language, as a human would, according to the Chinese design criteria for road traffic signs. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images captured on highways and urban roads in China from a driver’s perspective. It contains rich location labels for texts, symbols, and guide panels, together with the corresponding natural language description labels. Experiments on our TSI-CN dataset demonstrate that the TSI task is achievable and that TSI-arch can successfully interpret traffic signs from scenes even when the semantic logic among signs is complicated.