Abstract: Self-driving laboratories have grown significantly across research domains, particularly in chemistry, materials science, and the life sciences. However, a major challenge persists: most self-driving systems are costly because they rely on high-precision lab equipment, robotic platforms, and case-specific algorithms, which makes them less accessible for educational purposes. This paper takes a multidisciplinary approach. We first introduce a small-scale self-driving experiment platform tailored for educational use, focusing on the liquid-material mixing tasks common in chemistry and the life sciences. To track operational status in real time while maintaining self-driving capability and efficiency, we propose a novel system concept: employing a mobile robot as a lab supervisor that monitors the experiment process across multiple identical self-driving platforms. Specifically, this paper focuses on implementing the vision-based monitoring system. We present a deep learning architecture with a new training strategy that jointly addresses two tasks: (a) segmentation of vessels and their content materials and (b) volume estimation. The two tasks can be trained independently, yet inference runs end-to-end because both are integrated into the Mask R-CNN framework. Evaluated on a real dataset, the monitoring module shows promising detection capability, good real-time performance, and compatibility with the self-driving platform, indicating the feasibility of the proposed system.