Keywords: latency estimation, transformer encoder, graph convolutional network, code representations
TL;DR: We learn source code representations that are useful to predict the latency of DNN implementations.
Abstract: With the recent developments of Deep Learning, having an accurate and device specific latency prediction for Deep Neural Networks (DNNs) has become important for both the manual and automatic design of efficient DNNs. Directly predicting the latency of DNNs from their source code yields significant practical benefits. It opens a way towards profilers that can instantly feedback the latency of a given piece of deep learning code to the developer. In this paper, we conduct a preliminary study for source code based latency prediction of DNNs. We introduce Code Based Runtime Approximation (COBRA), that leverages a transformer encoder to learn representations of short code snippets. These representations are then aggregated by a Graph Convolutional Network (GCN) that captures the algorithmic dependencies and that estimates the latency of the implemented DNN. Our experiments with COBRA show promising results and indicate that latency prediction from code can be competitive with traditional latency prediction methods for DNNs.