Keywords: Tokamaks, LLM, RAG, copilot
TL;DR: We present a retrieval-augmented generation system that assists tokamak operators, evaluate it, and discuss future directions for such systems.
Abstract: The tokamak is one of the most promising approaches for achieving nuclear fusion as an energy source. Consequently, many tokamaks have been built, accumulating rich experimental histories and datasets. While the quantitative data generated by tokamaks is invaluable, tokamaks also generate another, often underutilized data stream: text logs written by experimental operators. In this work, we leverage these extensive text logs by employing Retrieval-Augmented Generation (RAG) with state-of-the-art large language models (LLMs) to create a prototype "copilot". Instances of this copilot were created using text logs from the fusion experiments DIII-D and Alcator C-Mod and deployed for researchers to use. In this paper, we report on the datasets and methodology used to create this "copilot", along with its performance on three use cases: 1) semantic search of experiments, 2) assisting with device-specific operations, and 3) answering general tokamak questions. A survey of researchers indicated that RAG offers no clear advantage over the base GPT-4 model for general tokamak operations questions; in the first two use cases, however, we observe clear advantages of RAG over base LLMs and simple keyword search.
Submission Track: Original Research
Submission Number: 49