Abstract: Deep learning has achieved remarkable success in computer vision tasks such as object detection, segmentation, and classification. These models, however, depend on large-scale, accurately annotated datasets, and producing such annotations by hand is labor-intensive and time-consuming. Existing annotation tools, such as LabelImg, LabelMe, and CVAT, have key limitations: they require extensive human intervention, scale poorly, and follow a general-purpose design that is not optimized for specialized tasks such as suspiciousness estimation. To address these challenges, this work introduces DL-AAT, an automated annotation tool that uses lightweight Deep Convolutional Neural Networks (DCNNs) to minimize manual effort in dataset preparation. DL-AAT integrates YOLO-Light, a state-of-the-art object detection module, to localize suspicious objects, and an efficient deep encoder to classify facial expressions automatically. Combining both in one tool removes the need for separate classification modules and streamlines the annotation workflow. Designed for computational efficiency, DL-AAT scales to large datasets and supports integration with custom object detection, segmentation, and classification models tailored to specific applications. Annotation performance is evaluated using Cohen’s Kappa coefficient on multiple benchmark datasets, including FER20E, FER2013, AffectNet, COCO, and OpenImage. The results indicate that DL-AAT reduces human intervention while maintaining high annotation precision.
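For readers unfamiliar with the evaluation metric, Cohen's Kappa measures agreement between two sets of labels after correcting for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `cohens_kappa` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa between two equal-length label sequences.

    Illustrative helper (hypothetical name), not taken from DL-AAT.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    # Degenerate case: both sequences use one identical class throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: a human annotator versus an automated tool.
human = ["happy", "sad", "happy", "neutral", "sad", "happy"]
tool = ["happy", "sad", "neutral", "neutral", "sad", "happy"]
kappa = cohens_kappa(human, tool)
```

A value of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance, which is why the metric is more informative than raw accuracy when class frequencies are imbalanced.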
External IDs: doi:10.1007/978-3-032-15809-3_32