Text-to-CSEQL: Taking a Step in Natural Language Search for Cyberspace Assets Using Large Language Models
Abstract: Cyberspace search engines (CSEs) are systems designed to search and index information about network assets in cyberspace.
However, using CSEs to grasp the status of cyberspace assets still poses usability challenges, owing to specialized terminology and the expertise required by the cyberspace search engine query language (CSEQL).
To improve the usability of CSEs, it is essential to support natural language querying.
To this end, we propose a task called Text-to-CSEQL, which aims to translate natural language into CSEQL.
We then present a method based on a large language model (LLM) to enable natural language interaction with CSEs.
Specifically, we adopt retrieval-augmented generation (RAG) with the LLM by constructing a knowledge base.
Upon receiving a natural language input, the method retrieves relevant fields and examples from the knowledge base and crafts a well-formed prompt for the LLM.
To comprehensively assess the method, we construct a Text-to-CSEQL dataset and design a new domain-specific evaluation metric called Field Match (FM).
Extensive experiments demonstrate that our framework is highly effective, outperforming existing methods.
In addition, our method is adaptable and can accommodate various CSEs.
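The abstract gives no implementation details, so the following is a minimal, hypothetical sketch of the retrieval-augmented prompt construction it describes: retrieving relevant CSEQL fields and demonstrations from a knowledge base and assembling a prompt for the LLM. The field names, example pairs, and word-overlap scoring function are illustrative assumptions, not the authors' actual knowledge base or retriever.

```python
# Hypothetical sketch of RAG-style prompt construction for Text-to-CSEQL.
# All field names, examples, and the scoring heuristic are assumptions.

from dataclasses import dataclass


@dataclass
class KBEntry:
    field: str          # CSEQL field name, e.g. "port"
    description: str    # natural-language description of the field
    example: str        # a paired (question, CSEQL) demonstration


KNOWLEDGE_BASE = [
    KBEntry("port", "network port exposed by the asset",
            'Q: assets with port 443 open -> CSEQL: port="443"'),
    KBEntry("country", "country where the asset is located",
            'Q: assets located in Japan -> CSEQL: country="JP"'),
    KBEntry("app", "application or service running on the asset",
            'Q: hosts running nginx -> CSEQL: app="nginx"'),
]


def score(question: str, entry: KBEntry) -> int:
    """Toy relevance score: word overlap between the question and the field
    description (a real retriever would likely use embeddings)."""
    q_words = set(question.lower().split())
    d_words = set(entry.description.lower().split())
    return len(q_words & d_words)


def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant fields/examples and assemble an LLM prompt."""
    retrieved = sorted(KNOWLEDGE_BASE,
                       key=lambda e: score(question, e),
                       reverse=True)[:top_k]
    fields = "\n".join(f"- {e.field}: {e.description}" for e in retrieved)
    shots = "\n".join(e.example for e in retrieved)
    return (
        "Translate the question into a CSEQL query.\n"
        f"Relevant fields:\n{fields}\n"
        f"Examples:\n{shots}\n"
        f"Q: {question} -> CSEQL:"
    )


if __name__ == "__main__":
    # The assembled prompt would then be sent to an LLM; the model call is
    # omitted here, since the paper does not specify which model or API it uses.
    print(build_prompt("find assets in Germany with port 22 open"))
```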
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: text-to-text generation, retrieval-augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 5614