Text-to-CSEQL: Taking a Step in Natural Language Search for Cyberspace Assets Using Large Language Models
Abstract: Cyberspace search engines (CSEs) are systems designed to search and index information about network assets in cyberspace.
However, using CSEs to grasp the status of cyberspace assets still poses usability challenges, owing to specialized terminology and the expertise required by the cyberspace search engine query language (CSEQL).
To improve the usability of CSEs, it is essential to support natural language querying.
To this end, we propose a task called Text-to-CSEQL, which aims to translate natural language into CSEQL.
We then present a method based on a large language model (LLM) to enable natural language interaction with CSEs.
Specifically, we adopt retrieval-augmented generation (RAG) with the LLM by constructing a knowledge base.
Upon receiving a natural language input, the method retrieves relevant fields and examples from the knowledge base and crafts a well-formed prompt for the LLM.
To comprehensively assess the method, we construct a Text-to-CSEQL dataset and design a new domain-specific evaluation metric called Field Match (FM).
Extensive experiments demonstrate that our framework is highly effective, outperforming existing methods.
In addition, our method is adaptable and can accommodate various CSEs.
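The abstract gives no implementation details, so the following is a minimal, hypothetical sketch of the retrieval-augmented prompt construction it describes: retrieving relevant CSEQL fields and demonstrations from a knowledge base and assembling a prompt for the LLM. The field names, example pairs, and word-overlap scoring function are illustrative assumptions, not the authors' actual knowledge base or retriever.

```python
# Hypothetical sketch of RAG-style prompt construction for Text-to-CSEQL.
# All field names, examples, and the scoring heuristic are assumptions.

from dataclasses import dataclass


@dataclass
class KBEntry:
    field: str          # CSEQL field name, e.g. "port"
    description: str    # natural-language description of the field
    example: str        # a paired (question, CSEQL) demonstration


KNOWLEDGE_BASE = [
    KBEntry("port", "network port exposed by the asset",
            'Q: assets with port 443 open -> CSEQL: port="443"'),
    KBEntry("country", "country where the asset is located",
            'Q: assets located in Japan -> CSEQL: country="JP"'),
    KBEntry("app", "application or service running on the asset",
            'Q: hosts running nginx -> CSEQL: app="nginx"'),
]


def score(question: str, entry: KBEntry) -> int:
    """Toy relevance score: word overlap between the question and the field
    description (a real retriever would likely use embeddings)."""
    q_words = set(question.lower().split())
    d_words = set(entry.description.lower().split())
    return len(q_words & d_words)


def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant fields/examples and assemble an LLM prompt."""
    retrieved = sorted(KNOWLEDGE_BASE,
                       key=lambda e: score(question, e),
                       reverse=True)[:top_k]
    fields = "\n".join(f"- {e.field}: {e.description}" for e in retrieved)
    shots = "\n".join(e.example for e in retrieved)
    return (
        "Translate the question into a CSEQL query.\n"
        f"Relevant fields:\n{fields}\n"
        f"Examples:\n{shots}\n"
        f"Q: {question} -> CSEQL:"
    )


if __name__ == "__main__":
    # The assembled prompt would then be sent to an LLM; the model call is
    # omitted here, since the paper does not specify which model or API it uses.
    print(build_prompt("find assets in Germany with port 22 open"))
```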
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: text-to-text generation, retrieval-augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 5614