WebPromptM2: A Website Classification Method Leveraging Prompt-Based Learning with Multimodal Features
Abstract: Website classification is crucial for tasks such as malicious website detection and information management. Current methods typically focus on feature extraction and algorithm selection and assume balanced website datasets, so their performance often degrades when the data are imbalanced. In this study, we propose an intelligent website classification method (WebPromptM2) based on prompt-based learning with multimodal features. We design a prompt template that incorporates the textual and visual elements of a website, yielding a multimodal representation of the website, and then leverage domain-specific expertise to establish a mapping between website categories and a set of label words. Finally, we fine-tune a masked pre-trained language model (PLM) and map its predictions back to the categories. We find that our method increases the recognition accuracy of tail classes and achieves superior performance on both long-tail and short-tail datasets.
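The pipeline outlined in the abstract (fill a prompt template, let a masked PLM score candidate words at the mask position, then map label words back to categories) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the template wording, the category names, the label words, the averaging rule, and the idea that visual elements enter the prompt as a short text description. The abstract does not specify any of these, and the PLM itself is replaced by a hand-written dictionary of mask probabilities.

```python
from typing import Dict, List

# Hypothetical template; the paper's actual wording is not given in the abstract.
# Visual content is assumed to arrive as a short text description.
TEMPLATE = (
    "Website text: {text} Image description: {visual} "
    "This website is about [MASK]."
)

# Hypothetical verbalizer: each category maps to a set of label words.
VERBALIZER: Dict[str, List[str]] = {
    "gambling": ["casino", "betting", "poker"],
    "news": ["news", "press", "journalism"],
    "shopping": ["shopping", "store", "retail"],
}


def build_prompt(text: str, visual: str) -> str:
    """Fill the template with the textual and visual elements of a site."""
    return TEMPLATE.format(text=text, visual=visual)


def classify(mask_probs: Dict[str, float]) -> str:
    """Map mask-position word probabilities to a category.

    Each category is scored by the mean probability of its label words;
    words missing from mask_probs count as 0.0.
    """
    scores = {
        category: sum(mask_probs.get(word, 0.0) for word in words) / len(words)
        for category, words in VERBALIZER.items()
    }
    return max(scores, key=scores.get)


# Stand-in for the masked PLM's output at the [MASK] position.
example_probs = {"casino": 0.4, "news": 0.1, "store": 0.05}
prompt = build_prompt("Live odds and jackpots", "a roulette wheel")
predicted = classify(example_probs)
```

In a real setting, `example_probs` would come from the fine-tuned masked PLM evaluated on `prompt`. Averaging over several label words per category is one common way to let a sparsely represented class draw on multiple words the PLM already knows, which is the kind of effect the abstract reports for tail classes.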
External IDs: dblp:conf/cscwd/LiuG0S0C24