Abstract: Deep networks have set the state of the art in most image analysis tasks by replacing handcrafted features with learned convolution filters within end-to-end trainable architectures. Still, the specification of a convolutional network is subject to much manual design: the shape and size of the receptive field for convolutional operations are sensitive choices that have to be tuned for each image analysis application. 3D fully convolutional multi-scale architectures with skip connections, which excel at semantic segmentation and landmark localisation, have huge memory requirements and rely on large annotated datasets, an important limitation for wider adoption in medical image analysis. We propose a novel and effective method based on a single trainable 3D convolution kernel that addresses these issues and enables high-quality results with a compact four-layer architecture and without sensitive hyperparameters for convolutions and architectural design. Instead of a manual choice of filter size, dilation of weights, and number of scales, our one binary extremely large and inflecting sparse kernel (OBELISK) automatically learns filter offsets in a differentiable continuous space together with weight coefficients. Geometric data augmentation can be directly incorporated into the training by simple coordinate transforms. This powerful new architecture has fewer than 130,000 parameters, can be trained in a few minutes with only 700 MB of memory, and achieves an increase in Dice overlap of +5.5% compared to the U-Net for CT multi-organ segmentation.
Keywords: deformable convolution, sparsity, deep learning, segmentation
Author Affiliation: University of Luebeck, Babylon Health London
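
Illustrative sketch: the core idea in the abstract, a sparse kernel whose sampling offsets live in a continuous space, can be illustrated with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`trilinear_sample`, `sparse_kernel_response`), the voxel-unit (z, y, x) coordinate convention, and the use of a plain matrix as the augmentation transform are all hypothetical choices made for illustration. In the actual method the offsets and weights would be trained by gradient descent; here they are fixed arrays, and only the forward evaluation at one location is shown.

```python
import numpy as np


def trilinear_sample(volume, points):
    """Sample a 3D volume at continuous (z, y, x) points by trilinear interpolation.

    Assumption for this sketch: points outside the volume are clamped to its border,
    and every volume dimension has at least 2 voxels.
    """
    shape = np.array(volume.shape)
    pts = np.clip(points, 0.0, shape - 1.0)
    # Lower corner of the enclosing voxel cell; capped so the upper corner stays in bounds.
    lo = np.minimum(np.floor(pts).astype(int), shape - 2)
    frac = pts - lo
    out = np.zeros(len(pts))
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                step = np.array([dz, dy, dx])
                corner = lo + step
                # Interpolation weight: frac along axes where we step up, 1 - frac otherwise.
                w = np.prod(np.where(step.astype(bool), frac, 1.0 - frac), axis=1)
                out += w * volume[corner[:, 0], corner[:, 1], corner[:, 2]]
    return out


def sparse_kernel_response(volume, centre, offsets, weights, transform=None):
    """Weighted sum of samples taken at centre + offsets (hypothetical helper).

    offsets:   (K, 3) continuous displacements, the quantities that would be learned.
    weights:   (K,) coefficients, learned jointly with the offsets in the real method.
    transform: optional (3, 3) matrix applied to the offsets, standing in for
               geometric augmentation expressed as a coordinate transform.
    """
    if transform is not None:
        offsets = offsets @ transform.T
    samples = trilinear_sample(volume, centre + offsets)
    return float(weights @ samples)


# Usage: a linear ramp volume, three fractional offsets, and a point-reflection "augmentation".
volume = np.fromfunction(lambda z, y, x: z + 2 * y + 3 * x, (9, 9, 9))
centre = np.array([4.0, 4.0, 4.0])
offsets = np.array([[0.5, 0.0, 0.0], [0.0, -1.25, 0.0], [0.0, 0.0, 2.5]])
weights = np.ones(3)

print(sparse_kernel_response(volume, centre, offsets, weights))
print(sparse_kernel_response(volume, centre, offsets, weights, transform=-np.eye(3)))
```

Because the interpolation weights vary smoothly with the sampling position, the response is differentiable with respect to the offsets almost everywhere, which is what makes learning them by gradient descent possible in principle. Applying the transform to the offsets rather than resampling the image is one way to read the abstract's remark that augmentation reduces to simple coordinate transforms.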