Abstract: Highlights•Present an end-to-end trainable DOLNet to detect document objects more accurately.•DOLNet consists of Cascade Mask R-CNN, composite backbones with deformable convolution.•Single model trained on IIIT-AR-13K achieves state-of-the-art performance.•Achieve state-of-the-art results on IIIT-AR-13K for detecting various document objects.•Achieve state-of-the-art results on benchmark datasets for table detection.
Loading