Evaluating open- and closed-source LLM-based chatbots to combat cybercrime targeting senior citizens
Abstract: This study evaluates state-of-the-art open-source and closed-source LLMs, fine-tuned on a cybersecurity Q&A task, to assist senior citizens in recognizing and responding to cybercrime. We evaluate five LLMs and their fine-tuned variants using automatic metrics (F1, BERTScore, and n-gram-based overlap) and a human evaluation of response quality across seven criteria, including accuracy, relevance, and usefulness. Our results show that several open-source models, particularly the fine-tuned variants, outperform the closed-source model: Mistral3-LoRA leads on nearly all automatic metrics, and LLaMA3.1-LoRA achieves the highest recall. However, ChatGPT-4o performs slightly better in the human evaluation, with annotators preferring its responses for their formatting and polished language. Our chatbot application, code, and data are available at https://anonymous.4open.science/r/SeniorSafeAI-36F4/.
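The abstract names three automatic metrics: token-level F1, BERTScore, and n-gram-based overlap. The snippet below is a minimal illustrative sketch of how such metrics are commonly computed; it is not the authors' evaluation code, and it assumes the third-party Python packages `bert-score` and `sacrebleu` plus hypothetical example strings.

```python
# Illustrative sketch of the automatic metrics named in the abstract:
# token-level F1, BERTScore, and n-gram overlap (BLEU via sacreBLEU).
# Assumes `pip install bert-score sacrebleu`; example strings are invented.
from collections import Counter

import sacrebleu
from bert_score import score as bertscore


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-level F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


predictions = ["Do not click the link; report the email as phishing."]
references = ["Avoid clicking the link and report the message as a phishing attempt."]

# Lexical overlap between prediction and reference.
f1 = token_f1(predictions[0], references[0])

# Semantic similarity from contextual embeddings.
_, _, bert_f1 = bertscore(predictions, references, lang="en")

# Corpus-level n-gram overlap.
bleu = sacrebleu.corpus_bleu(predictions, [references])

print(f"token F1: {f1:.3f}  BERTScore F1: {bert_f1.mean().item():.3f}  BLEU: {bleu.score:.1f}")
```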
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Interdisciplinary recontextualization of NLP; Cybersecurity; Evaluation and metrics; Human evaluation; Automatic evaluation; Large language models; Senior citizens
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 3605