A modified vision transformer architecture with scratch learning capabilities for effective fire detection
Abstract: Highlights
• We used a novel tokenization mechanism in ViT for effective fire scene classification.
• We employed a locality attention mechanism with a tuned three-dense-layer block of the multi-head unit in the MLP.
• We conducted extensive experimental evaluations on three benchmark datasets and one self-created dataset.
• The proposed model achieved state-of-the-art performance.
External IDs: dblp:journals/eswa/YarKHB24