['1c1', '< Title: TOWARDS LLM4QPE: UNSUPERVISED PRETRAINING OF QUANTUM PROPERTY ESTIMATION AND A BENCHMARK', '---', '> Title: LLM4QPE: A Large Language Model Style Paradigm for Unsupervised Pretraining and Property Estimation of Quantum Systems', '3c3', '< Abstract: Estimating the properties of quantum systems such as quantum phase has been critical in addressing the essential quantum many-body problems in physics and chemistry. Deep learning models have been recently introduced to property estimation, surpassing conventional statistical approaches. However, these methods are tailored to the specific task and quantum data at hand. It remains an open and attractive question for devising a more universal task-agnostic pretraining model for quantum property estimation. In this paper, we propose LLM4QPE, a large language model style quantum task-agnostic pretraining and finetuning paradigm that 1) performs unsupervised pretraining on diverse quantum systems with different physical conditions; 2) uses the pretrained model for supervised finetuning and delivers high performance with limited training data, on downstream tasks. It mitigates the cost for quantum data collection and speeds up convergence. Extensive experiments show the promising efficacy of LLM4QPE in various tasks including classifying quantum phases of matter on Rydberg atom model and predicting two-body correlation function on anisotropic Heisenberg model.', '---', "> Abstract: Estimating properties of quantum systems, such as quantum phases, is fundamental to addressing complex quantum many-body problems in physics and chemistry. While deep learning models have shown promise in quantum property estimation (QPE), they are typically specialized for specific tasks and data, limiting their generalizability and requiring extensive labeled data. This paper introduces LLM4QPE, a novel Large Language Model (LLM)-style paradigm for quantum task-agnostic pretraining and finetuning. 
LLM4QPE addresses the limitations of existing methods by: 1) performing unsupervised pretraining on vast, diverse quantum datasets under varying physical conditions to learn universal quantum intricacies; and 2) leveraging the pretrained model for supervised finetuning on downstream tasks, achieving high performance with significantly limited labeled training data and accelerating convergence. This approach substantially mitigates the high cost and computational burden associated with quantum data collection and labeling. We demonstrate LLM4QPE's superior efficacy through extensive experiments on critical QPE tasks, including classifying quantum phases of matter on the Rydberg atom model and predicting two-body correlation functions on the anisotropic Heisenberg model. Our results highlight LLM4QPE's potential to revolutionize QPE, especially in resource-constrained scenarios.", '6,13c6', '< Estimating quantum system properties such as quantum phase is essential for verifying and evaluating quantum technologies (Huang et al., 2020;Gočanin et al., 2022), which is often in the form of many-body problems. Precise estimation of generic quantum systems is challenged due to the exponential complexity inherent in describing quantum many-body systems (Gebhart et al., 2023). Fortunately, physical systems of interest such as those generated by the dynamics of local Hamiltonians are not generic, since their particular structure guarantees that the full complexity of Hilbert space is in principle not required for their accurate description (Carrasquilla et al., 2019). Accordingly, statistical (including learning-based) approaches have emerged to characterize quantum systems from traditional Density Functional Theory (DFT) (Hohenberg & Kohn, 1964), Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986), to advanced variational methods e.g. 
Tensor Networks (TNs) (Orús, 2019) and Neural Network Quantum States (NNQS) (Zhang & Di Ventra, 2023).', '< There are basically two categories of variational methods for quantum property estimation (QPE). The first category refers to the TNs and NNQS which formulate QPE as an optimization problem where the quantum state is approximately represented by a parameterized wave function. The parameterized wave function is updated by minimizing the expectation values of relevant observable estimators, based on either density matrix renormalization group (DMRG) algorithm (White, 1992) or variational Monte Carlo (VMC) (McMillan, 1965). Afterwards the interested properties can be analyzed by preforming algebra operations on the wave function. Another line of research resorts to neural networks to serve as universal functions for directly approximating quantum system properties (Gilmer et al., 2017;Kawai & Nakagawa, 2020;Xiao et al., 2022), which we call NNQPE. The input to the neural networks is the measurement results of the quantum state, and the output is the property of interest. The parameters are optimized using gradient descent. The goal of NNQPE is to accurately characterize the properties of the quantum state using as few identical copies and measurements as possible. Compared with the TNs, this class of methods could more easily display nonlocal correlations, allowing in principle to capture quantum states with higher entanglement (Huang et al., 2022). Moreover, rather than TNs and the NNQS where additional computational overheads is required to extract the properties given the optimized parameterized wave function, NNQPE can directly predict the properties for unknown quantum states.', '< However, NNQPE suffers generalization ability issue, especially given limited measurement data for training (Gebhart et al., 2023). 
Although the generalizability could be improved by training the models based on extensive measurement data and corresponding labels, the labeling process, i.e., accurately estimating properties of quantum systems requires computational and memory resources that increase exponentially with the system size (Carleo et al., 2019). In particular, the labeling efforts for quantum systems are intensive. For example, DFT suffers from self-interaction error and delocalization error, making it difficult to represent quantum states with strong correlations (Verma & Truhlar, 2020). The sign problem (Loh Jr et al., 1990) implies that it is intractable for QMC to evaluate properties for large systems or systems with low temperatures (Troyer & Wiese, 2005;Huang et al., 2022). The maximum bond dimensions of TNs for precisely preserving the properties of quantum states such as the entanglement entropy scales exponentially w.r.t. the evolution time (Brandao & Horodecki, 2015). In conclusion, the labeling process is hard to complete classically due to the inherent separation between quantum and classical computing.', '< Furthermore, despite the significant promise of NNQPE, their application in harnessing advanced machine learning techniques for quantum physics remains in its early stages. Current models of NNQPE are tailored and trained for particular quantum systems and specific tasks. This approach contrasts sharply with the era of Large Language Models (LLMs) (Radford et al., 2018;Brown et al., 2020), which have achieved general-purpose language generation and understanding capabilities. In the realm of LLMs, pretraining serves as the primary method for capturing general language understanding and afterwards finetuning is adopted to adapt the model to accomplish specialized tasks. 
This distinction highlights the nascent yet evolving nature of applying sophisticated machine learning strategies within the quantum physics domain.', "< In fact, with the increasing scale of the quantum devices, a vast amount of quantum data are produced by quantum measurement (Brydges et al., 2019). Such data holds intricate details about the system. An open question is designing a versatile model, which undergoes extensive pretraining to master these quantum intricacies. The success of deep learning in handling high-dimensional data sheds lights on answering this question. First, the sheer volume of quantum data from measurements allows for the extraction of meaningful patterns and representations (Anshu & Arunachalam, 2024). Second, the universal approximation capabilities of neural networks suggest that given sufficient data and computational resources, it's possible to model the complex, nonlinear relationships inherent in quantum systems (Carleo et al., 2019;Gebhart et al., 2023). Lastly, the task-agnostic nature of pretraining (Liu et al., 2023) aligns with the quantum realm's diversity, where a single model can learn hidden features across various systems and physical conditions. This feasibility is further supported by the principle of transfer learning (Weiss et al., 2016), where knowledge gained in one context can significantly benefit task-specific applications.", '< In this paper, we introduce an LLM-style task-agnostic pretraining model for Quantum Property Estimation named LLM4QPE. This model is pretrained by leveraging vast (unlabeled) quantum data, across diverse quantum systems of the same family govern by different physical conditions. For the downstream tasks, we finetune LLM4QPE on two typical QPE tasks including classifying quantum phases of matter and predicting two-body correlation function. We also consider two families of quantum model including the Rydberg atom model and the anisotropic Heisenberg model. 
The results show its promising power for tackling QPE problems especially in scenarios with limited data availability. The contributions are: 1) Departure from most existing supervised learning QPE models reliant on restricted, task-specific labeled quantum data, we propose LLM4QPE, to our best knowledge, the first LLM-style model for quantum property estimation. Its unsupervised pretraining is fulfilled by maximizing the expected log likelihood of measurement bit strings, which is entirely unsupervised and task-agnostic.', '< 2) We develop the novel architecture of our LLM4QPE model. Specifically, to embed the batch-style discrete measurement records to a continuous space, a trainable LSTM embedding layer is attached to the transformer decoder. The LSTM-Transformer architecture provides an innate framework for handling diverse quantum data stemming from experiments under varying physical conditions, enabling prediction of the property of quantum systems of the same family.', '< 3) We collect a set of quantum data from simulations for unsupervised pretraining and supervised finetuning. For pretraining, the dataset consists of quantum state measurement records, the size of which scales linearly w.r.t. the system size and the number of measurements, along with the values of physical condition variables determining the evolution of quantum systems. Downstream tasks utilize a set of data generated from quantum systems of the same family, with additional system properties serving as labels for tasks like phase classification and correlation prediction. 
4) We verify the superiority of our approach by empirical studies on two QPE tasks: classifying quantum phases of matter on Rydberg atom model and predicting two-body correlation function on anisotropic Heisenberg model, given limited measurements on a resource-limited device.', '---', '> Estimating quantum system properties, such as quantum phases, is a cornerstone for advancing and validating quantum technologies (Huang et al., 2020; Gočanin et al., 2022). These estimations often involve solving quantum many-body problems, which are notoriously challenging due to the exponential complexity inherent in describing generic quantum systems (Gebhart et al., 2023). However, physical systems of interest, particularly those governed by local Hamiltonians, possess a specific structure that circumvents the need for the full Hilbert space complexity (Carrasquilla et al., 2019). This inherent structure has paved the way for the emergence of various statistical and learning-based approaches, ranging from traditional Density Functional Theory (DFT) (Hohenberg & Kohn, 1964) and Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986) to advanced variational methods like Tensor Networks (TNs) (Orús, 2019) and Neural Network Quantum States (NNQS) (Zhang & Di Ventra, 2023).', '14a8,21', '> Variational methods for Quantum Property Estimation (QPE) broadly fall into two categories. The first category includes TNs and NNQS, which frame QPE as an optimization problem. Here, the quantum state is approximated by a parameterized wave function, updated by minimizing expectation values of relevant observable estimators using algorithms such as Density Matrix Renormalization Group (DMRG) (White, 1992) or Variational Monte Carlo (VMC) (McMillan, 1965). Subsequently, desired properties are extracted via algebraic operations on the optimized wave function. 
The second category, termed Neural Network Quantum Property Estimation (NNQPE), employs neural networks as universal function approximators to directly predict quantum system properties (Gilmer et al., 2017; Kawai & Nakagawa, 2020; Xiao et al., 2022). NNQPE models take measurement results of the quantum state as input and directly output the property of interest, optimizing parameters via gradient descent. The primary objective of NNQPE is to accurately characterize quantum state properties with minimal identical copies and measurements. Compared to TNs, NNQPE methods can more readily capture non-local correlations and higher entanglement (Huang et al., 2022). Furthermore, NNQPE offers a direct prediction mechanism, circumventing the additional computational overhead required by TNs and NNQS to extract properties from optimized wave functions.', '> ', '> Despite its advantages, NNQPE faces significant challenges, particularly regarding generalization ability when confronted with limited measurement data for training (Gebhart et al., 2023). Improving generalizability often necessitates extensive measurement data and corresponding labels. However, the process of accurately labeling quantum system properties is computationally and memory-intensive, scaling exponentially with system size (Carleo et al., 2019). The labeling burden for quantum systems is particularly acute: DFT struggles with self-interaction and delocalization errors for strongly correlated quantum states (Verma & Truhlar, 2020); the sign problem renders QMC intractable for large or low-temperature systems (Loh Jr et al., 1990; Troyer & Wiese, 2005; Huang et al., 2022); and the maximum bond dimensions for TNs to preserve properties like entanglement entropy scale exponentially with evolution time (Brandao & Horodecki, 2015). 
These fundamental limitations underscore the difficulty of classical labeling due to the inherent quantum-classical computational divide.', '> ', '> Moreover, the application of advanced machine learning techniques in quantum physics, particularly NNQPE, is still in its nascent stages. Current NNQPE models are typically bespoke, trained for specific quantum systems and tasks. This contrasts sharply with the transformative success of Large Language Models (LLMs) (Radford et al., 2018; Brown et al., 2020), which have achieved remarkable general-purpose language generation and understanding capabilities through a paradigm of extensive pretraining followed by specialized finetuning. This LLM paradigm, where pretraining captures broad knowledge and finetuning adapts to specific tasks, represents a powerful, yet largely unexplored, avenue for quantum physics.', '> ', '> The escalating scale of quantum devices is generating vast amounts of quantum data from measurements (Brydges et al., 2019), rich with intricate details about quantum systems. This presents a compelling opportunity to develop a versatile model capable of mastering these quantum intricacies through extensive pretraining. The success of deep learning in high-dimensional data processing provides a strong foundation for this endeavor. Firstly, the sheer volume of quantum measurement data enables the extraction of meaningful patterns and representations (Anshu & Arunachalam, 2024). Secondly, the universal approximation capabilities of neural networks suggest that complex, nonlinear relationships in quantum systems can be modeled given sufficient data and resources (Carleo et al., 2019; Gebhart et al., 2023). Lastly, the task-agnostic nature of pretraining (Liu et al., 2023) is ideally suited for the diverse quantum realm, allowing a single model to learn hidden features across various systems and physical conditions. 
This is further bolstered by the principle of transfer learning (Weiss et al., 2016), where knowledge acquired in one context can significantly benefit related applications.', '> ', "> In this paper, we introduce LLM4QPE (Large Language Model for Quantum Property Estimation), an LLM-style task-agnostic pretraining and finetuning paradigm. LLM4QPE is pretrained using extensive unlabeled quantum data collected from diverse quantum systems within the same family, governed by varying physical conditions. For downstream tasks, we finetune LLM4QPE on two representative QPE problems: classifying quantum phases of matter and predicting two-body correlation functions. We validate our approach using two distinct families of quantum models: the Rydberg atom model and the anisotropic Heisenberg model. Our empirical results demonstrate LLM4QPE's superior performance, particularly in scenarios with limited data availability, highlighting its potential to address a critical bottleneck in QPE. Our key contributions are:", '> 1)  **A Novel LLM-style Paradigm for QPE:** We propose LLM4QPE, the first LLM-style model for quantum property estimation. Unlike most existing supervised QPE models that depend on restricted, task-specific labeled quantum data, LLM4QPE employs a fully unsupervised and task-agnostic pretraining procedure, maximizing the expected log-likelihood of measurement bit strings to learn fundamental quantum system characteristics.', '> 2)  **Innovative Architecture for Quantum Data:** We develop a novel architecture for LLM4QPE that effectively handles diverse quantum data. Specifically, a trainable Long Short-Term Memory (LSTM) embedding layer is integrated with a Transformer decoder to embed batch-style discrete measurement records into a continuous feature space. 
This LSTM-Transformer architecture provides an inherent framework for processing quantum data from experiments under varying physical conditions, enabling robust property prediction for quantum systems within the same family.', '> 3)  **Comprehensive Quantum Dataset Collection:** We curate and utilize a specialized set of quantum data from simulations for both unsupervised pretraining and supervised finetuning. The pretraining dataset comprises quantum state measurement records, scaling linearly with system size and number of measurements, alongside corresponding physical condition variables. Downstream tasks leverage finetuning datasets generated from the same family of quantum systems, augmented with system properties as labels for tasks like phase classification and correlation prediction.', "> 4)  **Empirical Validation and Superior Performance:** We rigorously verify the efficacy of LLM4QPE through extensive empirical studies on two challenging QPE tasks: classifying quantum phases of matter on the Rydberg atom model and predicting two-body correlation functions on the anisotropic Heisenberg model. Our results consistently demonstrate LLM4QPE's superior performance, especially under conditions of limited measurement information on resource-constrained devices.", '> ', '16,22c23,29', '< We introduce basic concepts of quantum computing. Please refer to (Nielsen & Chuang, 2010) for more details. We put the details on related work to Appendix A.', '< Quantum State and Density Operator. The quantum bit named as qubit is the basic unit of the quantum system. We call the ensemble of all qubits in a (sub)system the quantum state. The qubit is in superposition and becomes deterministic once the measurement is performed on it. How a quantum state is described mathematically depends on the chosen basis state. 
For example, by using two orthogonal computational basis states $|0\\rangle = \\begin{pmatrix} 1 \\\\ 0 \\end{pmatrix}$ and $|1\\rangle = \\begin{pmatrix} 0 \\\\ 1 \\end{pmatrix}$, one qubit can be described mathematically as a linear combination $|\\phi\\rangle = \\alpha|0\\rangle + \\beta|1\\rangle = \\begin{pmatrix} \\alpha \\\\ \\beta \\end{pmatrix}$ in the space $\\mathbb{C}^2$, where $\\alpha, \\beta \\in \\mathbb{C}$ are the amplitudes satisfying $|\\alpha|^2 + |\\beta|^2 = 1$. An alternate formulation for describing the quantum state is possible using a tool known as the density operator or density matrix. For example, the density matrix of $|0\\rangle$ is $\\rho_0 = |0\\rangle\\langle0| = \\begin{pmatrix} 1 & 0 \\\\ 0 & 0 \\end{pmatrix}$ where $\\langle0|$ denotes the conjugate transpose of $|0\\rangle$. For a generic $L$-qubit quantum state, it can be described by the so called wave function:', '< $|\\psi\\rangle = \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} \\Psi(\\sigma_1, \\dots, \\sigma_L)|\\sigma_1, \\dots, \\sigma_L\\rangle$, (1)', '< where $\\Psi: \\mathbb{Z}^L \\to \\mathbb{C}$ maps a fixed configuration $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$ of $L$ qubits to a complex number satisfying', '< $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} |\\Psi(\\sigma_1, \\dots, \\sigma_L)|^2 = 1$', '< , and $\\sigma_i \\in \\{1, \\dots, M\\}$ is one of the $M$ possible outcomes by performing quantum measurement on the $i$-th qubit. The wave function is formulated in a complex Hilbert space where the vector representation of the quantum state $|\\psi\\rangle \\in \\mathbb{C}^{M^L}$ and its density matrix $|\\psi\\rangle\\langle\\psi| \\in \\mathbb{C}^{M^L \\times M^L}$, which becomes astronomical for large $L$.', '< Quantum Measurement. It converts some of the quantum information into classical form (for further processing), as described by a set of measurement operators $\\{O_m\\}_{m=1}^M$ satisfying $\\sum_m O_m = I$, where $M$ is the total number of operators. Measuring a qubit leads to collapse of the wave function and produces potentially yield different outcomes. The possible outcomes correspond to the indices $m$ of measurement operators. Concretely, upon measuring the qubit $\\rho$, the probability of getting the result $m$ is given by $p(m) = \\text{tr}(\\rho O_m)$. For a quantum state with $L$ qubits, the common strategy is to measure each of the qubits in parallel (Leibfried et al., 1996;Jullien et al., 2014). 
According to the born rule of quantum mechanics, such a measurement procedure outputs a measurement string $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$ where $\\sigma_i \\in \\{1, \\dots, M\\}$ with probability $|\\Psi(\\sigma_1, \\dots, \\sigma_L)|^2$ as given in Eq. 1.', '---', '> This section introduces the foundational concepts of quantum computing essential for understanding LLM4QPE. For a more comprehensive background, readers are referred to (Nielsen & Chuang, 2010). Detailed discussions on related work are provided in Appendix A.', '> Quantum State and Density Operator. The quantum bit, or qubit, is the fundamental unit of information in a quantum system. A quantum state refers to the collective ensemble of all qubits within a (sub)system. A qubit, existing in a superposition of states, collapses to a deterministic outcome upon measurement. The mathematical description of a quantum state is basis-dependent. For instance, using the two orthogonal computational basis states, $|0\\rangle = \\begin{pmatrix} 1 \\\\ 0 \\end{pmatrix}$ and $|1\\rangle = \\begin{pmatrix} 0 \\\\ 1 \\end{pmatrix}$, a single qubit can be described as a linear combination $|\\phi\\rangle = \\alpha|0\\rangle + \\beta|1\\rangle = \\begin{pmatrix} \\alpha \\\\ \\beta \\end{pmatrix}$ in the complex space $\\mathbb{C}^2$, where $\\alpha, \\beta \\in \\mathbb{C}$ are amplitudes satisfying $|\\alpha|^2 + |\\beta|^2 = 1$. An alternative, and often more general, description of a quantum state is provided by the density operator or density matrix. For example, the density matrix for $|0\\rangle$ is $\\rho_0 = |0\\rangle\\langle0| = \\begin{pmatrix} 1 & 0 \\\\ 0 & 0 \\end{pmatrix}$, where $\\langle0|$ denotes the conjugate transpose of $|0\\rangle$. For a generic $L$-qubit quantum state, its wave function is given by:', '> $|\\psi\\rangle = \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} \\Psi(\\sigma_1, \\dots, \\sigma_L)|\\sigma_1, \\dots, \\sigma_L\\rangle$, (1)', '> where $\\Psi: \\mathbb{Z}^L \\to \\mathbb{C}$ maps a fixed configuration $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$ of $L$ qubits to a complex amplitude. 
These amplitudes satisfy the normalization condition:', '> $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} |\\Psi(\\sigma_1, \\dots, \\sigma_L)|^2 = 1$,', '> and $\\sigma_i \\in \\{1, \\dots, M\\}$ represents one of the $M$ possible outcomes from a quantum measurement on the $i$-th qubit. The wave function is formulated within a complex Hilbert space, where the vector representation of the quantum state $|\\psi\\rangle \\in \\mathbb{C}^{M^L}$ and its density matrix $|\\psi\\rangle\\langle\\psi| \\in \\mathbb{C}^{M^L \\times M^L}$ become astronomically large for increasing $L$.', '> Quantum Measurement. Quantum measurement is the process of converting quantum information into a classical form for subsequent processing. This process is described by a set of measurement operators $\\{O_m\\}_{m=1}^M$ satisfying $\\sum_m O_m = I$, where $M$ is the total number of possible outcomes. Measuring a qubit causes the wave function to collapse and yields one of the possible outcomes, corresponding to the index $m$ of the measurement operator. Specifically, upon measuring a qubit in state $\\rho$, the probability of obtaining result $m$ is given by $p(m) = \\text{tr}(\\rho O_m)$. For an $L$-qubit quantum state, a common strategy involves measuring each qubit in parallel (Leibfried et al., 1996; Jullien et al., 2014). According to the Born rule of quantum mechanics, this measurement procedure outputs a measurement string $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$, where each $\\sigma_i \\in \\{1, \\dots, M\\}$, with a probability $|\\Psi(\\sigma_1, \\dots, \\sigma_L)|^2$ as defined in Eq. 1.', '25,26c32,33', '< 3.1 OVERVIEW As shown in Fig. 1, our model involves two steps: pretraining and finetuning. For pretraining, the model is fed with unlabeled D p , and undergoes fully unsupervised training. 
Subsequently, the pretrained parameters are transferred to the supervised finetuning phase, where all the parameters are updated using labeled data D t for various downstream tasks with their task-specific supervised losses. Finally, we evaluate our LLM4QPE using dataset D e . Each downstream finetuning model possesses separate parameters, even though they initially share the same pretrained parameters. One of the most notable aspects of our model is the consistent structural similarity between pretraining and finetuning, with only a few small modifications when handling different downstream tasks.', '< The description of the quantum data is discussed in Sec. 3.2. We make an analogy between quantum data and text that, each measurement outcome σ i of a qubit is analogue to the token, and the number of the possible outcomes M is likely to the vocabulary size |V|. A measurement string σ, which resembles the sentence in texts, is a projection of the entire quantum system with correlative effects among them. The collection of measurement records R comprised of many measurement strings from various physical conditions are akin to the corpus gathered from various sources and genres. In fact, these have also been mentioned implicitly in (Sharir et al., 2020;Hibat-Allah et al., 2020;Cha et al., 2021;Zhang & Di Ventra, 2023). Yet existing works are still confined to the single task for training and testing, involving no pretraining. Our model, in contrast, draws inspiration from LLMs to handle quantum data. Specifically, the data type and data collection strategies are described in Sec. 3.2 and more details can be found in Appendix B. Given the generated datasets, we first discuss how to unsupervisely pretrain LLM4QPE in Sec. 3.3. Afterwards the pretrained parameters are updated towards a supervised loss for different tasks, as presented in Sec. 3.4.', '---', '> 3.1 OVERVIEW As illustrated in Fig. 
1, the LLM4QPE paradigm consists of two primary stages: pretraining and finetuning. In the pretraining phase, the model is exposed to a large volume of unlabeled quantum data, $D_p$, undergoing a fully unsupervised training process. Subsequently, the learned parameters from pretraining are transferred to the supervised finetuning phase. Here, all model parameters are further optimized using labeled data, $D_t$, for various downstream tasks, each guided by its specific supervised loss function. Finally, the performance of LLM4QPE is evaluated using a dedicated test dataset, $D_e$. It is important to note that while downstream finetuning models initially share the same pretrained parameters, they ultimately possess separate, task-specific parameters. A key design principle of LLM4QPE is the consistent structural similarity between its pretraining and finetuning configurations, requiring only minor modifications to adapt to different downstream tasks.', "> The detailed description of the quantum data is provided in Section 3.2. We draw a powerful analogy between quantum data and natural language text: each measurement outcome $\\sigma_i$ of a qubit is analogous to a 'token', and the total number of possible outcomes $M$ is akin to the 'vocabulary size' $|V|$. A measurement string $\\sigma$, which represents a projection of the entire quantum system with inherent correlative effects, resembles a 'sentence' in textual data. Furthermore, a collection of measurement records $R$, comprising many measurement strings from diverse physical conditions, can be considered a 'corpus' gathered from various sources and genres. While similar analogies have been implicitly explored in prior works (Sharir et al., 2020; Hibat-Allah et al., 2020; Cha et al., 2021; Zhang & Di Ventra, 2023), these existing methods are largely confined to single-task training and testing, without incorporating a pretraining step. 
Our LLM4QPE model, in stark contrast, explicitly draws inspiration from the LLM paradigm to process and understand quantum data. Specifically, the data types and collection strategies are elaborated in Section 3.2, with further details in Appendix B. Given these generated datasets, we first detail the unsupervised pretraining of LLM4QPE in Section 3.3. Following this, the pretrained parameters are adapted and optimized towards a supervised loss for various tasks, as presented in Section 3.4.", '29,30c36,40', '< We first provide the definition of the quantum dataset in Def. 1 in which the procedures of quantum dataset generation are provided. An easy-to-understand flowchart is also provided in Fig. 2. Definition 1 (Quantum Dataset). The quantum dataset is described as $\\mathcal{D} = \\{s_i\\}$. Each sample $s_i = (R_i, c_i, p_i)$ contains the measurement records $R_i$, the physical condition variables $c_i$ and the (optional) system property variables $p_i$. Let $L$ denote the number of qubits, $K$ represent the number of copies of each quantum state and $M$ denote the number of possible outcomes by performing measurement on a single qubit. We explain their meaning in detail below.', '< 1) $c_i \\in \\mathbb{R}^C$ represents the physical condition variables controlling the evolution of the quantum system. These variables can be directly obtained when initializing quantum experiments. The types of the variables could be system size, coupling strength of Hamiltonians, etc. 2) The measurement records, denoted as $R_i \\in \\mathbb{Z}^{K \\times L}$, are outcomes generated by quantum measurement. A quantum state is generated by evolving the system under a fixed physical condition ', '---', '> We begin by formally defining the quantum dataset in Definition 1, which also outlines the procedures for its generation. An intuitive flowchart visualizing this process is presented in Fig. 2. Definition 1 (Quantum Dataset). A quantum dataset is represented as $\\mathcal{D} = \\{s_i\\}$. 
Each sample $s_i = (R_i, c_i, p_i)$ comprises measurement records $R_i$, physical condition variables $c_i$, and (optionally) system property variables $p_i$. Let $L$ denote the number of qubits, $K$ represent the number of copies of each quantum state, and $M$ denote the number of possible outcomes from a single-qubit measurement. We elaborate on these components below:', '> 1)  $c_i \\in \\mathbb{R}^C$ represents the physical condition variables that govern the evolution of the quantum system. These variables, such as system size, coupling strength of Hamiltonians, etc., are directly accessible during the initialization of quantum experiments.', '> 2)  The measurement records, denoted as $R_i \\in \\mathbb{Z}^{K \\times L}$, are outcomes generated by quantum measurements. A quantum state is prepared by evolving the system under a fixed physical condition $c_i$. Subsequently, quantum measurements are performed independently on each qubit in parallel using a set of measurement operators $\\{O_m\\}_{m=1}^M$. Performing measurements on $L$ qubits yields a measurement string, represented as $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$, where each $\\sigma_l \\in \\{1, \\dots, M\\}$. This measurement procedure is repeated $K$ times for each copy of the quantum state. Finally, we collect $K \\times L$ measurement outcomes and store them within $R_i$.', '> 3)  (Optional) Certain system property $p_i \\in \\mathbb{R}^P$ represents the statistics of the quantum system conditioned on $c_i$, such as quantum phase, correlation function, entanglement entropy, purity, etc. The exact values of $p_i$ can be calculated through classical post-processing by analyzing either the wave functions or measurement statistics. We treat these properties as supervised labels used for finetuning the model.', '> It is important to mention that our quantum dataset generation process is akin to that described in Wang et al. (2022). 
A key distinction, however, is that LLM4QPE explicitly requires ground-truth labels of system properties for finetuning. This contrasts with the approach in Wang et al. (2022), where the authors propose reconstructing the quantum state via unsupervised learning on measurement records and then applying the classical shadow protocol (Huang et al., 2020) to predict specific quantum properties. This two-step strategy often introduces additional overhead. Furthermore, our experiments indicate that parameters optimized within LLM4QPE for specific objectives, such as quantum phases of matter and correlation functions, often lead to superior performance in our numerical results.', '50,52c60', '< Figure 2: Process of generating the quantum dataset. a) For each qubit of the quantum system, we perform quantum measurement using operators {O m } M m=1 and obtain an integer outcome m with probability p(m). b) Consider the quantum system govern by different physical conditions. Quantum measurements are performed on an ensemble of identical quantum states evolved under each of fixed physical conditions. Measurement can be done parallel for all the qubits of single copy of the quantum state and outputs a measurement string. This process is applicable and feasible to existing digital and analog quantum computers. c) The collected data are structured and packed into a series of tensors, which can be efficiently stored into classical devices and easy to process. specialized by c i . Afterwards quantum measurement is performed independently on each qubit in parallel using a set of measurement operators {O m } M m=1 . Performing measurement on L qubits results in a measurement string, represented as σ = (σ 1 , . . . , σ L ) where each σ l ∈ {1, . . . , M }. The measurement procedures above are repeated K times for each copy of the quantum state. 
Finally, we collect K × L measurement outcomes and store them within R i .', '< 3) (Optional) Certain system property p i ∈ R P represents the statistics of the quantum system conditioned on c i , such as the quantum phase, correlation function, entanglement entropy, purity, etc. The exact values of p i can be calculated by classical post-processing by analyzing the either the wave functions or measurement statistics. We treat these properties as supervised labels which used for finetuning the model.', '< It should be mentioned that the process of quantum dataset generation above is closed to Wang et al. (2022). The difference is that LLM4QPE requires additional ground-truth labels of system properties for finetuning, rather than the suggestions of Wang et al. (2022) in which the authors propose to reconstruct the quantum state by unsupervised learning on measurement records, afterwards classical shadow (Huang et al., 2020) is required to predict specific quantum properties. The two step strategy often introduces additional overheads. Furthermore, our experiments indicate that parameters in LLM4QPE are specifically optimized for corresponding objectives such as quantum phase of matters and correlation function, which often leads to superior performance in our numerical results.', '---', '> Figure 2: Process of generating the quantum dataset. a) For each qubit of the quantum system, we perform a quantum measurement using operators $\\{O_m\\}_{m=1}^M$ and obtain an integer outcome $m$ with probability $p(m)$. b) Consider the quantum system governed by different physical conditions. Quantum measurements are performed on an ensemble of identical quantum states evolved under each fixed physical condition. Measurements can be done in parallel on all qubits of a single copy of the quantum state, yielding one measurement string. This process is feasible on existing digital and analog quantum computers. 
c) The collected data are structured and packed into a series of tensors, which can be efficiently stored into classical devices and are easy to process.', '55,65c63,70', "< Unlike the previous studies (Czischek et al., 2022;Zhang & Di Ventra, 2023) which consider the pretraining as a warmup process to find suitable initialization for model's parameters and then finetune the model on the specific system with the same learning objective as pretraining. Instead, LLM4QPE regards the pretraining as the avenue to master the quantum intricacies across different systems of the same family. The pretrained parameters can be transferred towards various downstream tasks. LLM4QPE is pretrained in a fully unsupervised manner, as illustrated in Fig. 1b.", "< Quantum Data for Pretraining. The quantum dataset D p = {R i , c i } Np i=1 used for pretraining is constructed using the strategy discussed in Sec. 3.2. Here we discuss how to reorganize the data to adapt to LLM4QPE's unsupervised pretraining. Let K p be the number of measurement strings used for pretraining. We stack all the input measurement records {R i } Np i=1 along the first dimension and output E in ∈ Z NpKp×L , where each row is a measurement string σ b ∈ Z L . We also construct the matrix C in ∈ R NpKp×C where each row is the values of physical condition variables c b ∈ R C . For both the Rydberg atom model and the anisotropic Heisenberg model, we fix N p = 100 and K p = 1024. For each training iteration, we randomly sample B p rows of E in and C in . Such that the input of the model is", '< {(σ b , c b )|σ b ∈ E in , c b ∈ C in } Bp b=1 with batch size B p .', '< Input Embeddings. As shown in Fig. 1a, we consider three types of embeddings as input to capture the hidden patterns of the quantum system: token embeddings, condition embeddings and position embeddings. Since each element of the measurement string σ b is a discrete integer σ ∈ {1, . . . 
, M } which resembles to the token in NLP, we use learned embeddings to convert the measurement string σ b with additional start token s and output the token embeddings E t ∈ R Bp×(L+1)×d where d is the feature dimension. We empirically find that encoding the physical condition into the model can further improve the performance. A Feed-Forward Network (FFN) with one hidden layer is used to embed the physical condition c b into the feature vector E c ∈ R Bp×d . It is treated as a sentence-level embedding which will be added to all of the L measurement tokens, and we call it the global embedding. Subsequently, the input embeddings are the (broadcasting) summation E out = E t + E c + E p where E p is the positional embeddings as the same as (Vaswani et al., 2017). E out is then processed by deeper layers in the discussion below.', '< Model Architecture. As depicted in Fig. 1b, the main part of LLM4QPE is a multi-layer transformer decoder which originates from (Vaswani et al., 2017). The input is the embedding E out and the output is H ∈ R Bp×(L+1)×d , which are high-order representations of all the measurement strings and the conditional variables in a batch. Please refer to (Vaswani et al., 2017) for more details on transformer. For pretraining, given a fixed qubit configuration σ = (σ 1 , . . . , σ L ), LLM4QPE attempts to approximate the classical distribution p(σ 1 , . . . , σ L ) = |Ψ(σ 1 , . . . , σ L )| 2 in Eq. 1. Such joint distribution is approximated by factorizing it into a product of conditional probabilities:', '< p(σ 1 , . . . , σ L |c) = L l=1 p(σ l |σ l-1 , . . . , σ 1 , c).', '< (2)', '< The parameters are optimized by minimizing the average negative log-likelihood loss:', '< L unsup = 1 B p (σ,c)∈Dp -log p(σ 1 , . . . , σ L |c),(3)', '< which corresponds to the maximization of (conditional) likelihoods concerning the observed measurement outcomes. 
Pretraining is entirely unsupervised, enabling the model to be trained on extensive quantum data that encompass a wide range of physical conditions. To maintain the physical validity that restricts the output distribution to be normalized, a general strategy is employed to fix the last layer as the linear projection with softmax activation function, such that the output distribution satisfies', '< M σ1=1 • • • M σ L =1 p(σ 1 , . . . , σ L ) = 1 (see Appendix C for proof).', '---', '> Unlike prior studies (Czischek et al., 2022; Zhang & Di Ventra, 2023) that view pretraining primarily as a warm-up phase to find suitable parameter initialization for subsequent finetuning with the same learning objective, LLM4QPE redefines pretraining as a crucial avenue to master the intricate underlying physics across diverse quantum systems within the same family. The pretrained parameters are then robustly transferable to a wide array of distinct downstream tasks. LLM4QPE is pretrained in a fully unsupervised manner, as comprehensively illustrated in Fig. 1b.', "> Quantum Data for Pretraining. The quantum dataset $D_p = \\{R_i, c_i\\}_{i=1}^{N_p}$ designated for pretraining is constructed following the strategy detailed in Section 3.2. Here, we elaborate on how this data is reorganized to suit LLM4QPE's unsupervised pretraining paradigm. Let $K_p$ denote the number of measurement strings utilized for pretraining. We concatenate all input measurement records $\\{R_i\\}_{i=1}^{N_p}$ along the first dimension to form an input tensor $E_{in} \\in \\mathbb{Z}^{N_p K_p \\times L}$, where each row $\\sigma_b \\in \\mathbb{Z}^L$ represents a single measurement string. Concurrently, we construct a matrix $C_{in} \\in \\mathbb{R}^{N_p K_p \\times C}$, where each row $c_b \\in \\mathbb{R}^C$ corresponds to the physical condition variables. For both the Rydberg atom model and the anisotropic Heisenberg model, we consistently set $N_p = 100$ and $K_p = 1024$. 
In each training iteration, we randomly sample $B_p$ rows from $E_{in}$ and $C_{in}$. Thus, the input to the model for a given batch is $\\{( \\sigma_b, c_b ) | \\sigma_b \\in E_{in}, c_b \\in C_{in}\\}_{b=1}^{B_p}$ with a batch size $B_p$.", "> Input Embeddings. As depicted in Fig. 1a, LLM4QPE incorporates three distinct types of embeddings as input to effectively capture the hidden patterns inherent in the quantum system: token embeddings, condition embeddings, and position embeddings. Since each element $\\sigma \\in \\{1, \\dots, M\\}$ within a measurement string $\\sigma_b$ is a discrete integer, analogous to a 'token' in Natural Language Processing (NLP), we employ learned embeddings to convert the measurement string $\\sigma_b$ (augmented with an additional start token 's') into token embeddings $E_t \\in \\mathbb{R}^{B_p \\times (L+1) \\times d}$, where $d$ is the feature dimension. Our empirical studies confirm that encoding the physical condition into the model significantly enhances performance. A Feed-Forward Network (FFN) with a single hidden layer is utilized to embed the continuous physical condition $c_b$ into a feature vector $E_c \\in \\mathbb{R}^{B_p \\times d}$. This $E_c$ is treated as a sentence-level embedding, which is broadcasted and added to all $L$ measurement tokens, and is referred to as the global embedding. Subsequently, the final input embeddings are obtained by the broadcasting summation $E_{out} = E_t + E_c + E_p$, where $E_p$ represents the positional embeddings, identical to those used in (Vaswani et al., 2017). $E_{out}$ then serves as the input for deeper layers, as discussed below.", '> Model Architecture. As illustrated in Fig. 1b, the core component of LLM4QPE is a multi-layer Transformer decoder, derived from the architecture proposed in (Vaswani et al., 2017). 
The input to this decoder is the embedding $E_{out}$, and its output is $H \\in \\mathbb{R}^{B_p \\times (L+1) \\times d}$, which represents high-order representations of all measurement strings and conditional variables within a given batch. For a detailed exposition of the Transformer architecture, please refer to (Vaswani et al., 2017). During pretraining, given a fixed qubit configuration $\\sigma = (\\sigma_1, \\dots, \\sigma_L)$, LLM4QPE aims to approximate the classical probability distribution $p(\\sigma_1, \\dots, \\sigma_L) = |\\Psi(\\sigma_1, \\dots, \\sigma_L)|^2$ as defined in Eq. 1. This joint distribution is effectively approximated by factorizing it into a product of conditional probabilities:', '> $p(\\sigma_1, \\dots, \\sigma_L|c) = \\prod_{l=1}^{L} p(\\sigma_l|\\sigma_{l-1}, \\dots, \\sigma_1, c)$. (2)', "> The model's parameters are optimized by minimizing the average negative log-likelihood loss:", '> $\\mathcal{L}_{\\text{unsup}} = \\frac{1}{B_p} \\sum_{(\\sigma,c) \\in D_p} -\\log p(\\sigma_1, \\dots, \\sigma_L|c)$, (3)', '> which corresponds to maximizing the (conditional) likelihoods of the observed measurement outcomes. This entirely unsupervised pretraining enables the model to learn from extensive quantum data spanning a wide range of physical conditions. To ensure physical validity and maintain a normalized output distribution, a standard strategy is employed: the final layer is a linear projection followed by a softmax activation function, guaranteeing that the output distribution satisfies $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} p(\\sigma_1, \\dots, \\sigma_L) = 1$ (see Appendix C for a formal proof).', '68,79c73,83', '< The self-attention mechanism in the transformer allows LLM4QPE to model a wide range of downstream tasks, whether it involves classifying quantum phases of matter or predicting the entanglement entropy of quantum states. 
This adaptability is achieved simply by replacing the relevant inputs and outputs as needed. Rather than the two-step model (Wang et al., 2022) that uses the pretrained model to generate new measurement records conditioning on the physical variables and then predicts quantum properties based on classical shadow (Huang et al., 2020). LLM4QPE is an end-to-end task-agnostic pretrained model to provide property estimation for the quantum system.', '< Quantum Data for Finetuning and Input Embeddings. The dataset', '< D f = {(R j , c j ), p j } N f', '< i=j are generated using the random seed different from the seed for generating D p . Then we split D f to construct train/test dataset D t /D e . It is ensured that the sampled physical conditions for pretraining will not appear in finetuing, i.e. c j / ∈ {c i } for j ∈ {1, . . . , N f }. Note that the physical conditions for finetuning are sampled from the same distribution as the pretraining. The details about the data collection can be found in Appendix B. Unlike the pretraining where the input measurement records is a sentence-level vector σ b ∈ Z L , the input of fine-tuning becomes a batch of measurement records X i ∈ Z L×K f where K f is the number of measurement strings. The reason for such change can be explained through both intuitive and rational perspectives. Intuitively, single measurement string cannot reflect the whole picture of the quantum system. Rationally, predicting the properties of the quantum system in classical computers generally requires exponential number of measurements with respect to the system size L (Gebhart et al., 2023). Even though for some quantum systems with low entanglement, the number stills grows quasi-polynomially with L (Huang et al., 2022). Accordingly, the input of the model is replaced with {(X j , c j ), p j } Bt j=1 where the tuple (X j , c j ) is the input, p j is the corresponding label and B t is the batch size used for supervised finetuning. 
The embedding is also distinct from that of pretraining. The learned token embeddings for the measurement string σ i is not feasible for the batch-style records X j . To deal with it, a Long Short-Term Memory (LSTM) layer is attached in front of the decoder, as depicted in Fig. 1c. The LSTM layer converts the discrete measurement records X j and outputs high-order embeddings E rnn ∈ R Bt×L×d . The additional embeddings including physical condition embeddings and positional embeddings are transferred from pretraining. The output embedding is the summation given as', '< E out = E rnn + E c + E p transferred .', '< Feature Aggregation and Output Projection. The output of the L-layer transformer decoder is H ∈ R Bt×L×d . For a specific downstream task, the decoder is initialized with the pretrained parameters and all the parameters are finetuned towards a supervised loss. To obtain the feature representation for each of the B t training samples, a feature aggregation layer is attached after the last multi-head attention layer. This layer converts the hidden feature H along the second axis and output H ′ ∈ R Bt×d . Finally, additional linear projection layer is employed to project the feature into H ′′ ∈ R Bt×P , along with a task-dependent activated function which is taken to be tanh for predicting the correlation function, since we have the prior that each element of the label p j is in the range [-1, 1] (See Appendix B for details). While the log-softmax is adopted for classifying quantum phases of matter.', "< Learning Objective. The properties estimation for the quantum system are treated as the supervised learning tasks. Tow types of tasks are considered in this paper, including classifying quantum phases of matter and predicting correlation function. The former belongs to the regression task, while the latter can be regarded as a classification task. For each supervised task, we maintain a consistent architecture within LLM4QPE. 
We seamlessly integrate task-specific inputs and ground-truth labels into LLM4QPE and proceed to finetune all model's parameters in an end-to-end manner. Given that the training samples are {(X j , c j ), p j } Bt j=1 where B t is the batch size. For classifying quantum phases of matter, p j is the one-hot label. We minimize the observed data negative log-likelihood which yields a supervised loss for classification (with P classes):", '< L sup = - 1 B t j∈{1,...,Nt} P u=1 I [p j,u = 1] log f θ (X j , c j ) u ,(4)', '< where I[•] is an indicator function, N t is the size of training dataset and f θ (•) denotes the prediction of the model with parameters θ to be optimized. For predicting the correlation, p j is the continuous valued label. We adopt the Root Mean Square Error (RMSE) loss:', '< L sup = Lsup , Lsup = 1 B t j∈{1,...,Nt} P u=1 f θ (X j , c j ) u -p j,u 2 . (5', '< )', '< Detailed description of task-specific finetuning can be found in the experiment section.', '---', '> The inherent self-attention mechanism within the Transformer architecture empowers LLM4QPE to adeptly handle a diverse spectrum of downstream tasks, ranging from classifying quantum phases of matter to predicting the entanglement entropy of quantum states. This remarkable adaptability is achieved by simply adjusting the relevant inputs and outputs as required. Crucially, LLM4QPE distinguishes itself from two-step models (e.g., Wang et al., 2022) that first use a pretrained model to generate new measurement records conditioned on physical variables and then predict quantum properties via classical shadow protocols (Huang et al., 2020). Instead, LLM4QPE operates as an end-to-end task-agnostic pretrained model, directly providing property estimations for quantum systems.', "> Quantum Data for Finetuning and Input Embeddings. The finetuning dataset $D_f = \\{(R_j, c_j), p_j\\}_{j=1}^{N_f}$ is generated using a random seed distinct from that used for $D_p$. 
Subsequently, $D_f$ is partitioned into training ($D_t$) and evaluation ($D_e$) datasets. A critical aspect of our experimental design is ensuring that the physical conditions sampled for pretraining do not overlap with those used for finetuning, i.e., $c_j \\notin \\{c_i\\}$ for $j \\in \\{1, \\dots, N_f\\}$. Nonetheless, the physical conditions for finetuning are sampled from the same underlying distribution as those for pretraining. Further details on data collection can be found in Appendix B. A notable difference from pretraining is the input format for finetuning: instead of a sentence-level vector $\\sigma_b \\in \\mathbb{Z}^L$, the input becomes a batch of measurement records $X_j \\in \\mathbb{Z}^{L \\times K_f}$, where $K_f$ is the number of measurement strings per sample. This change is justified by both intuitive and rational considerations. Intuitively, a single measurement string provides only a partial snapshot of the quantum system; a collection of strings offers a more complete picture. Rationally, accurately predicting quantum system properties on classical computers typically demands an exponential number of measurements with respect to the system size $L$ (Gebhart et al., 2023), or at least quasi-polynomially for low-entanglement systems (Huang et al., 2022). Accordingly, the model's input is structured as $\\{(X_j, c_j), p_j\\}_{j=1}^{B_t}$, where the tuple $(X_j, c_j)$ is the input, $p_j$ is the corresponding label, and $B_t$ is the batch size for supervised finetuning. The embedding strategy also differs from pretraining. The learned token embeddings suitable for single measurement strings $\\sigma_i$ are not directly applicable to the batch-style records $X_j$. To address this, a Long Short-Term Memory (LSTM) layer is integrated at the forefront of the decoder, as depicted in Fig. 1c. This LSTM layer processes the discrete measurement records $X_j$ and outputs high-order embeddings $E_{rnn} \\in \\mathbb{R}^{B_t \\times L \\times d}$. 
The additional embeddings, including physical condition embeddings ($E_c$) and positional embeddings ($E_p$), are directly transferred from the pretrained model. The final output embedding for finetuning is the summation:", '> $E_{out} = E_{rnn} + E_c + E_{p}^{\\text{transferred}}$.', "> Feature Aggregation and Output Projection. The output of the $L$-layer Transformer decoder is $H \\in \\mathbb{R}^{B_t \\times L \\times d}$. For each specific downstream task, the decoder is initialized with the parameters obtained from pretraining, and all parameters are subsequently finetuned towards a task-specific supervised loss. To derive a concise feature representation for each of the $B_t$ training samples, a feature aggregation layer is appended after the last multi-head attention layer. This layer processes the hidden feature $H$ along the second axis, yielding an aggregated feature $H' \\in \\mathbb{R}^{B_t \\times d}$. Finally, an additional linear projection layer transforms $H'$ into $H'' \\in \\mathbb{R}^{B_t \\times P}$, where $P$ is the dimension of the predicted property. This projection is followed by a task-dependent activation function. For predicting the correlation function, we employ the tanh activation, leveraging the prior knowledge that each element of the label $p_j$ lies within the range $[-1, 1]$ (refer to Appendix B for details). Conversely, for classifying quantum phases of matter, the log-softmax activation function is adopted.", '> Learning Objective. The estimation of quantum system properties is framed as supervised learning tasks. This paper considers two distinct types of tasks: classifying quantum phases of matter (a classification task) and predicting correlation functions (a regression task). For each supervised task, LLM4QPE maintains a consistent underlying architecture. We seamlessly integrate task-specific inputs and ground-truth labels into LLM4QPE and proceed to finetune all model parameters in an end-to-end manner. 
Given a batch of training samples $\\{(X_j, c_j), p_j\\}_{j=1}^{B_t}$ where $B_t$ is the batch size:', '> For classifying quantum phases of matter, $p_j$ is a one-hot encoded label. We minimize the observed data negative log-likelihood, which translates to a supervised loss for classification with $P$ classes:', '> $\\mathcal{L}_{\\text{sup}} = - \\frac{1}{B_t} \\sum_{j=1}^{N_t} \\sum_{u=1}^{P} \\mathbb{I}[p_{j,u} = 1] \\log f_{\\theta}(X_j, c_j)_u$, (4)', "> where $\\mathbb{I}[\\cdot]$ is an indicator function, $N_t$ is the size of the training dataset, and $f_{\\theta}(\\cdot)$ denotes the model's prediction with parameters $\\theta$ to be optimized.", '> For predicting the correlation function, $p_j$ is a continuous-valued label. We adopt the Root Mean Square Error (RMSE) loss:', '> $\\mathcal{L}_{\\text{sup}} = \\sqrt{\\frac{1}{B_t} \\sum_{j=1}^{N_t} \\sum_{u=1}^{P} (f_{\\theta}(X_j, c_j)_u - p_{j,u})^2}$. (5)', '> A more detailed description of task-specific finetuning configurations can be found in the experimental section.', '82,83c86,87', '< In this section, we present the finetuning results on two quantum property estimation tasks including classifying quantum phases of matter and predicting correlation function. Two families of quan-    tum models are considered -the Rydberg atom model (Bernien et al., 2017) and the anisotropic Heisenberg model (Kranzl et al., 2023).', '< As baseline methods, we basically consider the classical shadow (Huang et al., 2020) -a learningfree protocol for constructing the representation of an unknown quantum state. Besides, we compare with some kernel methods including Radial Basis Function (RBF) Kernel (Huang et al., 2022) and Neural Tangent Kernel (NTK) (Huang et al., 2022). 
We further consider some advanced deep learning based methods, such as PixelCNN (Sharir et al., 2020) and a classical shadow based generative model (NN-shadow) (Wang et al., 2022) for comparison.', '---', "> In this section, we present a comprehensive evaluation of LLM4QPE's finetuning performance on two distinct quantum property estimation tasks: classifying quantum phases of matter and predicting correlation functions. We investigate two prominent families of quantum models: the Rydberg atom model (Bernien et al., 2017) and the anisotropic Heisenberg model (Kranzl et al., 2023).", '> For robust comparison, we consider several baseline methods. These include the classical shadow (Huang et al., 2020), a learning-free protocol for efficiently constructing quantum state representations. We also compare against kernel-based methods such as the Radial Basis Function (RBF) Kernel (Huang et al., 2022) and Neural Tangent Kernel (NTK) (Huang et al., 2022). Furthermore, we include advanced deep learning-based approaches like PixelCNN (Sharir et al., 2020) and a classical shadow-based generative model (NN-shadow) (Wang et al., 2022) to benchmark against state-of-the-art techniques.', '86,90c90,94', '< We first consider the Rydberg atom model with different system size L ∈ {19, 25, 31}. We pretrain LLM4QPE for different system sizes separately with a fixed number of sampled physical conditions N p = 100. Each physical condition variable c i is a 4-dimensional vector denoted as', '< [L i , ∆ i , Ω i , R 0 /a i ] ⊤', '< where ∆ is the detuning of a laser, Ω is the Rabi frequency and R 0 /a is the interaction range. The values of these four variables can be obtained directly when initializing the (simulated) quantum experiments. For each physical condition we generate K f measurement strings based on computational basis measurement operators, such that the total number of possible measurement outcomes is M = 2. Then LLM4QPE is pretrained with dataset D p . 
The pretrained parameters are transferred to finetune the model using D t , where the number of sampled physical conditions N t ∈ {25, 64, 100} and the number of measurement strings K f ∈ {64, 128, 256, 512, 1024}.', '< We fix the size of D e for evaluation to be N e = 10000. Following (Bernien et al., 2017), we consider three categories of quantum phase, i.e., Disorder, Z 2 , Z 3 to establish the label p j , which is a 3-dimensional one-hot vector. More details about the data generation can be found in Appendix B.', '< We also take evaluation without pretaining the LLM4QPE: all the parameters are initialized randomly in a uniform distribution [-1, 1]. We use accuracy and weighted F1 score as metrics for 3-class classification for evaluation of our models and baselines. The results are listed in Tab. 1 and LLM4QPE achieves the best mean accuracy except for one setting L = 31 with N t = 25. Fig. 3 shows the performance on varied K f . LLM4QPE achieves the best weighted F1 score across all systems and in particular, outperforms by a large margin when K f = 64. The results indicate that pretrained LLM4QPE can handle the input when a few number of measurement records are available, which is greatly instrumental due to the expensive and time-consuming (simulated) quantum experiments. We further plot the training dynamics of LLM4QPE with and without pretraining throughout the training epochs in Fig. 4. The curves indicate that the pretraining enables much faster convergence of supervised loss and achieves better finetuning accuracy. Meanwhile, the required number of epoch for the model to attain 90% of its peak weighted F1 score is provided in Fig. 5. It reflectw that within the same system size L, the pretrained LLM4QPE converges faster than the non-pretrained version, with a lower training error and a higher test weighted F1 score.', '---', '> We first investigate the Rydberg atom model across different system sizes $L \\in \\{19, 25, 31\\}$. 
LLM4QPE is pretrained separately for each system size, with a fixed number of sampled physical conditions $N_p = 100$. Each physical condition variable $c_i$ is represented as a 4-dimensional vector:', '> $[L_i, \\Delta_i, \\Omega_i, R_0/a_i]^\\top$,', '> where $\\Delta$ is the laser detuning, $\\Omega$ is the Rabi frequency, and $R_0/a$ denotes the interaction range. The values for these four variables are directly accessible during the initialization of (simulated) quantum experiments. For each physical condition, we generate $K_f$ measurement strings based on computational basis measurement operators, resulting in a total of $M=2$ possible measurement outcomes. LLM4QPE is then pretrained using the dataset $D_p$. The pretrained parameters are subsequently transferred and finetuned using $D_t$, where the number of sampled physical conditions $N_t \\in \\{25, 64, 100\\}$ and the number of measurement strings $K_f \\in \\{64, 128, 256, 512, 1024\\}$.', '> The evaluation dataset $D_e$ is fixed to a size of $N_e = 10000$. Following (Bernien et al., 2017), we define three categories of quantum phases: Disorder, $Z_2$ Ordered, and $Z_3$ Ordered, to establish the label $p_j$, which is a 3-dimensional one-hot vector. Further details regarding data generation are provided in Appendix B.', "> For comparative analysis, we also evaluate LLM4QPE without pretraining, where all parameters are initialized randomly from a uniform distribution within $[-1, 1]$. We employ accuracy and weighted F1 score as key metrics for this 3-class classification task to evaluate both our models and baselines. The results, summarized in Table 1, demonstrate that LLM4QPE achieves the best mean accuracy across most settings, with a single exception for $L=31$ and $N_t=25$. Figure 3 illustrates the performance across varied $K_f$. LLM4QPE consistently achieves the best weighted F1 score across all system sizes and, notably, exhibits a substantial performance margin when $K_f = 64$. 
These results underscore LLM4QPE's robust capability to handle inputs with a limited number of measurement records, a significant advantage given the expensive and time-consuming nature of (simulated) quantum experiments. Furthermore, Figure 4 plots the training dynamics of LLM4QPE with and without pretraining across epochs. The curves clearly indicate that pretraining facilitates significantly faster convergence of the supervised loss and leads to superior finetuning accuracy. Figure 5 further quantifies the number of epochs required for the model to reach 90% of its peak weighted F1 score. It consistently shows that for the same system size $L$, the pretrained LLM4QPE converges more rapidly than its non-pretrained counterpart, achieving lower training error and higher test weighted F1 scores.", '93,95c97,99', '< Next we consider a regression task -predicting correlation on the anisotropic Heisenberg model. This quantum model inherits the long-range interactions between every two quantum sites, leading to a complex dynamics which is hard to be simulated by classical computers (Orús, 2019). We restrict the system size L ∈ {8, 10, 12} due to memory limitations. The ground states of quantum systems with different physical conditions are calculated by eigenvalue decomposition. For each physical condition we generate K f measurement strings based on Pauli-6 measurement operators such that M = 6. Then we pretrain the LLM4QPE for different system sizes independently with training size N p = 100.', "< For model's finetuning, we vary the number of generated training samples N t ∈ {20, 50, 90} and fix the measurement strings K f = 64. The dataset used for evaluation is generated with N e = 200. To obtain the ground-truth labels, We calculate true values of the two-body correlation functions and collect them as the supervised labels, which is an L × L continuous-valued matrix where each entry is in the range [-1, 1]. The RMSE results is reported in Tab. 2. 
LLM4QPE outperforms baselines in all settings. The learning-based models baselines often fail to surpass the predictive accuracy of learning-free classical shadow. While our pretrained LLM4QPE stands out by a remarkable margin.", '< Finally, we study the effects of condition embedding and the LSTM embedding on both Rydberg atom model and anisotropic Heisenberg model. Note that we replace the LSTM with a fully connected layer with same input/output dimension. The results are given in Tab. 3, where the results consistently show that both embedding techniques contribute to some positive effects and suggest that these two techniques can both help to leverage useful information from input quantum data.', '---', '> We now turn our attention to a regression task: predicting the correlation function on the anisotropic Heisenberg model. This quantum model is characterized by long-range interactions between every pair of quantum sites, leading to complex dynamics that are computationally challenging for classical simulation (Orús, 2019). Due to memory constraints, we restrict the system size to $L \\in \\{8, 10, 12\\}$. The ground states of quantum systems under different physical conditions are computed via eigenvalue decomposition. For each physical condition, we generate $K_f$ measurement strings using Pauli-6 measurement operators, resulting in $M=6$ outcomes. LLM4QPE is pretrained independently for each system size with a training size of $N_p = 100$.', "> For the model's finetuning, we vary the number of generated training samples $N_t \\in \\{20, 50, 90\\}$ while keeping the number of measurement strings fixed at $K_f = 64$. The evaluation dataset is generated with $N_e = 200$. To obtain the ground-truth labels, we calculate the true values of the two-body correlation functions, which form an $L \\times L$ continuous-valued matrix where each entry is in the range $[-1, 1]$. These are collected as the supervised labels. 
The Root Mean Square Error (RMSE) results are reported in Table 2. LLM4QPE consistently outperforms all baselines across all settings. Notably, many learning-based baseline models struggle to surpass the predictive accuracy of the learning-free classical shadow protocol. In contrast, our pretrained LLM4QPE outperforms them by a remarkable margin.", "> Finally, we conduct an ablation study to investigate the individual contributions of condition embedding and LSTM embedding in both the Rydberg atom model and the anisotropic Heisenberg model. For this study, the LSTM layer is replaced with a fully connected layer having the same input/output dimensions. The results, presented in Table 3, consistently indicate that both embedding techniques positively contribute to the model's performance, suggesting that both help leverage useful information from the input quantum data.", '98,99c102,103', '< This paper proposes a task-agnostic unsupervised pretraining approach for estimation of the properties of the quantum systems via quantum datasets. The core of our approach is a transformer encoder enabling to learn useful hidden information in a fully unsupervised pretraining procedure.', '< The pretrained parameters can be transferred to solving downstream tasks, leading to more effective classifying quantum phases and predicting correlation function on a resource-limited device given limited measurement information.', '---', '> This paper introduces LLM4QPE, a novel task-agnostic unsupervised pretraining paradigm for estimating properties of quantum systems using quantum datasets. The core of our approach is a Transformer encoder, which learns useful hidden information through a fully unsupervised pretraining procedure.', "> The parameters acquired during pretraining can then be transferred to downstream tasks. 
This transfer learning capability leads to significantly more effective classification of quantum phases and prediction of correlation functions, particularly on resource-limited devices and with sparse measurement information. Our empirical results demonstrate LLM4QPE's superior performance across different quantum models and tasks, highlighting its potential to address key challenges in QPE.", '102,103c106,107', "< A.1 LEARNING-FREE METHODS FOR QPE Estimating the properties of the quantum system is a long-standing problem in quantum physics (D'Ariano et al., 2003). The main challenge is that the complexity of describing the quantum system using classical computers typically scales exponentially with respect to the system size (Nielsen & Chuang, 2010). Even though, in fact, the quantum systems studied in physical experiments generally can be described by a limited number of physical variables. This restriction leads to the studied quantum systems occupy only a small part of the exponentially large Hilbert space (Carrasquilla et al., 2019), such that they can be characterized by some classical methods within an acceptable error.", '< Traditional algorithms including the QMC (Ceperley & Alder, 1986) and DFT (Hohenberg & Kohn, 1964) has made success for investigating the electronic structure (or nuclear structure), principally the ground state of many-body systems, such as atoms, molecules, and the condensed phases (Gubernatis et al., 2016). However, these methods have scalability issues and are difficult to be used to deal with large-scale quantum many body problems. An alternative is a class of TNs methods (Orús, 2019) based on variational method and shows unprecedented performance in analyzing the characteristics of ground state. These methods including Matrix Product State (MPS) (Perez-Garcia et al., 2006) and Projected Entangled Pair States (PEPS) (Corboz, 2016). 
This class of methods approximates the wave function by decomposition of the high-order wave functions into multiple low-rank tensors. It is then possible to analyze properties of the quantum state by taking algebra operations on the wave function. Recently, the classical shadow protocol (Huang et al., 2020) suggests to use random measurements to characterize the quantum properties. Classical shadow has facilitated applications such as direct fidelity estimation (Struchalin et al., 2021) and state function prediction (Zhang et al., 2021).', '---', "> A.1 LEARNING-FREE METHODS FOR QPE Estimating the properties of quantum systems is a long-standing and critical problem in quantum physics (D'Ariano et al., 2003). The primary challenge stems from the fact that describing a quantum system using classical computers typically incurs an exponential scaling of complexity with respect to the system size (Nielsen & Chuang, 2010). However, quantum systems encountered in physical experiments are often non-generic and can be characterized by a limited number of physical variables. This structural restriction implies that such systems occupy only a small, accessible portion of the exponentially large Hilbert space (Carrasquilla et al., 2019), allowing for their characterization by classical methods within acceptable error bounds.", '> Traditional algorithms, including Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986) and Density Functional Theory (DFT) (Hohenberg & Kohn, 1964), have achieved significant success in investigating the electronic and nuclear structures, primarily the ground states, of many-body systems such as atoms, molecules, and condensed phases (Gubernatis et al., 2016). Nevertheless, these methods face scalability issues, rendering them challenging for large-scale quantum many-body problems. 
An alternative class of methods, Tensor Networks (TNs) (Orús, 2019), built upon variational principles, has demonstrated unprecedented performance in analyzing ground state characteristics. These methods encompass Matrix Product States (MPS) (Perez-Garcia et al., 2006) and Projected Entangled Pair States (PEPS) (Corboz, 2016). TNs approximate the wave function by decomposing high-order wave functions into multiple low-rank tensors, enabling the analysis of quantum state properties through algebraic operations on the wave function. More recently, the classical shadow protocol (Huang et al., 2020) has emerged, proposing the use of random measurements to efficiently characterize quantum properties. Classical shadow has facilitated various applications, including direct fidelity estimation (Struchalin et al., 2021) and state function prediction (Zhang et al., 2021).', '106,108c110,112', '< With the continuous development of machine learning technologies, neural network based methods have emerged to tackle the QPE problems. These methods can be categorized into two classes according to the purpose. The methods (Carleo & Troyer, 2017;Gao & Duan, 2017;Torlai et al., 2018;Schütt et al., 2019;Hibat-Allah et al., 2020;Zhang & Di Ventra, 2023) of the first class are called Neural Network Quantum State (NNQS), which replace the tensor used in TNs with a neural network as a parametric function approximator of quantum many-body wave functions. The parameterized wave function is updated by minimizing the expectation values of relevant observable estimators, based on either density matrix renormalization group (DMRG) algorithm (White, 1992) or variational Monte Carlo (VMC) (McMillan, 1965). Afterwards the interested properties can be analyzed by preforming algebra operations on the wave function. Another line of research (Gilmer et al., 2017;Kawai & Nakagawa, 2020;Xiao et al., 2022) is known as Neural Network Quantum Property Estimation (NNQPE). 
NNQPE directly optimizes the parameters towards a specific learning objective which represents a certain property of quantum systems such as the quantum phase.', '< For both NNQS and NNQPE, different neural network ansatz corresponds to solve quantum manybody problems with different physical structures. Examples include restricted Boltzmann machine (RBM) (Carleo & Troyer, 2017), recurrent neural networks (RNNs) (Carrasquilla et al., 2019), convolutional neural networks (CNNs) (Wu et al., 2019;Sharir et al., 2020;Wu et al., 2023), and transformers (Cha et al., 2021;Wang et al., 2022;Zhang & Di Ventra, 2023;Du et al., 2023).', '< Our work is closely related to NNQPE. While ours employs a unsupervised pretraining to extract the hidden information of the quantum systems govern by different parameters. We find empirically that this scheme can make the model perform better under a limited number of copies of quantum states and measurements. The recent work proposed by Zhu et al. (2022) implements a similar pretraining strategy for learning of quantum states, whereas our approach differs from it by avoiding assumptions about knowing the prior frequency about the measurement strings.', '---', '> With the continuous advancements in machine learning technologies, neural network-based methods have emerged as powerful tools to tackle Quantum Property Estimation (QPE) problems. These methods can be broadly categorized into two classes based on their primary objective. The first class, known as Neural Network Quantum States (NNQS) (Carleo & Troyer, 2017; Gao & Duan, 2017; Torlai et al., 2018; Schütt et al., 2019; Hibat-Allah et al., 2020; Zhang & Di Ventra, 2023), replaces the tensors used in TNs with neural networks as parametric function approximators for quantum many-body wave functions. 
In NNQS, the parameterized wave function is optimized by minimizing the expectation values of relevant observable estimators, typically using algorithms such as Density Matrix Renormalization Group (DMRG) (White, 1992) or Variational Monte Carlo (VMC) (McMillan, 1965). Subsequently, the properties of interest are extracted by performing algebraic operations on the optimized wave function. The second line of research (Gilmer et al., 2017; Kawai & Nakagawa, 2020; Xiao et al., 2022) is referred to as Neural Network Quantum Property Estimation (NNQPE). NNQPE directly optimizes the neural network parameters towards a specific learning objective that represents a particular property of quantum systems, such as the quantum phase.', '> For both NNQS and NNQPE, different neural network architectures (ansätze) are employed to solve quantum many-body problems with varying physical structures. Examples include Restricted Boltzmann Machines (RBMs) (Carleo & Troyer, 2017), Recurrent Neural Networks (RNNs) (Carrasquilla et al., 2019), Convolutional Neural Networks (CNNs) (Wu et al., 2019; Sharir et al., 2020; Wu et al., 2023), and Transformers (Cha et al., 2021; Wang et al., 2022; Zhang & Di Ventra, 2023; Du et al., 2023).', '> Our work, LLM4QPE, is closely related to NNQPE. However, a key distinguishing feature of our approach is the utilization of unsupervised pretraining to extract hidden information from quantum systems governed by diverse parameters. Our empirical findings demonstrate that this scheme significantly enhances model performance, particularly under conditions of limited copies of quantum states and measurements. While the recent work by Zhu et al. (2022) implements a similar pretraining strategy for learning quantum states, our approach differs by avoiding assumptions about prior knowledge of measurement string frequencies.', '111,112c115,116', '< A quantum dataset is a collection of data that describes quantum systems and their evolution. 
The collection of quantum data must take into account the following factors: 1) the method of data collection must be feasible on quantum devices and not contradict the disciplines of quantum mechanics; 2) the process of data collection is completely automated and does not require experienced experts to organize and label it and 3) the data must be structured and can be stored on resourcelimited classical devices, thus can be easy to be processed by the machine learning techniques without further post-processing. The quantum dataset we established satisfies these three points. It is also worth mentioning that our model can be used as a centralized infrastructure to process all these data uniformly, thanks to the unsupervised pretraining design of the model.', '< In this paper, we conduct simulated experiments to generate the quantum dataset in classical computers. For the anisotropic Heisenberg model, quantum measurement is performed using the Pauli-6 measurement operators such that M = 6, whereas computational basis measurement operators are employed for the Rydberg atom model leading to M = 2. Assume that variables c i describing the physical condition lives in a finite continuous space F within the physical restriction. Let . Afterwards we conduct simulated experiments for each c i and collect the corresponding measurement records. The system property p i is not needed since the pre-training phase is fully unsupervised. While for fine-tuning, we initialized the experiments with another random seed and sample N f physical conditions also within space F, resulting in {c j |c j ∈ F} N f j=1 . Note that We also collect the measurement records for each c j . The difference part is that we additionally calculate the system property p j and use it as supervised labels. We further split the D f into D t and D e for training and evaluation respectively with varied separation ratio. 
Details of the hyper-parameters and the experimental configurations of the dataset generation are discussed below.', '---', "> A quantum dataset is a structured collection of data that characterizes quantum systems and their dynamic evolution. The design and collection of such data must adhere to several crucial factors: 1) the data collection methodology must be experimentally feasible on actual quantum devices and consistent with the fundamental principles of quantum mechanics; 2) the data collection process should be fully automated, minimizing the need for expert intervention in organization and labeling; and 3) the data must be structured and efficiently storable on resource-limited classical devices, facilitating straightforward processing by machine learning techniques without extensive post-processing. The quantum dataset we have established for LLM4QPE meticulously satisfies all three of these criteria. Furthermore, our model's unsupervised pretraining design allows it to serve as a centralized infrastructure, uniformly processing this diverse data.", '> In this paper, we generate the quantum dataset through simulated experiments conducted on classical computers. For the anisotropic Heisenberg model, quantum measurements are performed using Pauli-6 measurement operators, yielding $M=6$ possible outcomes. For the Rydberg atom model, computational basis measurement operators are employed, resulting in $M=2$ outcomes. We assume that the physical condition variables $c_i$ reside within a finite continuous space $\\mathcal{F}$, respecting physical restrictions. We then conduct simulated experiments for each sampled $c_i$ and collect the corresponding measurement records. For the pretraining phase, the system property $p_i$ is not required, as pretraining is entirely unsupervised. 
For the finetuning phase, we initialize experiments with a distinct random seed and sample $N_f$ physical conditions, also within space $\\mathcal{F}$, generating $\\{c_j | c_j \\in \\mathcal{F}\\}_{j=1}^{N_f}$. We also collect the measurement records for each $c_j$, and we ensure that the physical conditions sampled for pretraining do not overlap with those used for finetuning. The difference from pretraining is that we additionally calculate the system property $p_j$ and use it as the supervised label. We further partition the finetuning dataset $D_f$ into training ($D_t$) and evaluation ($D_e$) sets with varied separation ratios. Detailed hyper-parameters and experimental configurations for dataset generation are discussed below.', '115,120c119,122', '< Rydberg atom model is a programmable quantum simulators capable of preparing interacting qubit systems (Bernien et al., 2017). Such quantum model can be effectively described as a two-level quantum system consisting the ground state |g⟩ (|0⟩) and the Rydberg state |r⟩(|1⟩). The quantum dynamics of this model is governed by the Hamiltonian', '< H Rydberg = i Ω 2 σ i x - i ∆n i + i<j V 0 |⃗ x i -⃗ x j | n i n j (6)', '< where σ x is the Pauli-X matrix, Ω is the Rabi frequency, ∆ is the detuning of a laser, V 0 is the Rydberg interaction constant, i, j is the Rydberg interaction constant and ⃗ x i is the position vector of the site i. n i = |r i ⟩ ⟨r i | is the occupation number operator at site i, and σ i x = |g i ⟩⟨r i | + |r i ⟩⟨g i | describes the coupling between the ground state |g i ⟩ and the Rydberg state |r i ⟩ at position i.', '< We follow the recent work in (Wang et al., 2022) to generate the quantum dataset. We refer the readers to their paper for details. Here we briefly introduce the main procedures. We consider the Rydberg atom model with system size L ∈ {19, 25, 31}. 
We fix the interaction constant V 0 = 862690 × 2π MHz µm 6 and vary the value of Ω ∈ [0, 5] and ∆ ∈ [-10, 15] to get different physical conditions c, where c is a 4-dimensional vector in the form [L, ∆, Ω, R 0 /a], where R 0 /a denote the interaction range with R 0 = (V 0 /Ω) 1/6 . Then the approximate ground state for diffident physical condition is prepared by the tool Bloqade.jl (blo, 2023). This tool can also output the measurement strings and the true phase of each physical condition. The measurement operators are chosen to be the computational basis {|0⟩⟨0|, |1⟩⟨1|} for the quantum measurement, such that the total number of the possible outcomes is M = 2. In this paper, three different phases are considered including the Disordered phase, Z 2 Ordered phase and Z 3 Ordered phase. We sample N p = 100 physical conditions with K p = 1024 measurement strings for pre-training, and N t ∈ {25, 64, 100} physical conditions with K f ∈ {64, 128, 256, 512, 1024} for fine-tuning. The number of physical conditions for evaluation is fixed to be N e = 10000. The supervised labels for fine-tuning are onehot encoded vectors of the true phases such that the dimension (number of classes) of p is 3. Note that it is ensured that the sampled physical conditions for pre-training will not appear in fine-tuning.  ', '< • • M σ k+1 =1 p(σ 1 , . . . , σ k+1 ) = M σ1=1 • • • M σ k+1 =1 |Ψ(σ 1 , . . . , σ k+1 )| 2 = M σ1=1 • • • M σ k+1 =1 k+1 i=1 |Ψ(σ i |σ i-1 , . . . , σ 1 )| 2 = M σ1=1 • • • M σ k =1 k i=1 |Ψ(σ i |σ i-1 , . . . , σ 1 )| 2 M σ k+1 =1 |Ψ(σ k+1 |σ k , . . . , σ 1 )| 2 = M σ1=1 • • • M σ k =1 |Ψ(σ 1 , . . . , σ k )| 2 = 1(11)', '< The proof then complete.', '---', '> The Rydberg atom model serves as a highly programmable quantum simulator, capable of preparing interacting qubit systems (Bernien et al., 2017). 
This quantum model can be effectively described as a two-level quantum system, comprising the ground state $|g\\rangle$ (or $|0\\rangle$) and the Rydberg state $|r\\rangle$ (or $|1\\rangle$). The quantum dynamics of this model are governed by the Hamiltonian:', '> $H_{\\text{Rydberg}} = \\sum_i \\frac{\\Omega}{2} \\sigma_i^x - \\sum_i \\Delta n_i + \\sum_{i<j} V_0 |\\vec{x}_i - \\vec{x}_j|^{-6} n_i n_j$ (6)', '> where $\\sigma_i^x$ is the Pauli-X matrix acting on site $i$, $\\Omega$ is the Rabi frequency, $\\Delta$ is the laser detuning, $V_0$ is the Rydberg interaction constant, and $\\vec{x}_i$ is the position vector of site $i$. The operator $n_i = |r_i\\rangle\\langle r_i|$ represents the occupation number at site $i$, and $\\sigma_i^x = |g_i\\rangle\\langle r_i| + |r_i\\rangle\\langle g_i|$ describes the coupling between the ground state $|g_i\\rangle$ and the Rydberg state $|r_i\\rangle$ at position $i$.', '> We follow the methodology outlined in (Wang et al., 2022) to generate the quantum dataset, referring readers to their paper for comprehensive details. Here, we provide a concise overview of the main procedures. We consider the Rydberg atom model with system sizes $L \\in \\{19, 25, 31\\}$. We fix the interaction constant $V_0 = 862690 \\times 2\\pi \\text{ MHz } \\mu\\text{m}^6$ and vary the values of $\\Omega \\in [0, 5]$ and $\\Delta \\in [-10, 15]$ to obtain diverse physical conditions $c$. Each $c$ is a 4-dimensional vector of the form $[L, \\Delta, \\Omega, R_0/a]$, where $R_0/a$ denotes the interaction range with $R_0 = (V_0/\\Omega)^{1/6}$. The approximate ground state for each physical condition is then prepared using the Bloqade.jl tool (blo, 2023). This tool also facilitates the output of measurement strings and the true phase for each physical condition. The measurement operators are chosen to be the computational basis $\\{|0\\rangle\\langle0|, |1\\rangle\\langle1|\\}$ for quantum measurement, resulting in a total of $M=2$ possible outcomes. 
In this paper, we consider three distinct phases: Disordered phase, $Z_2$ Ordered phase, and $Z_3$ Ordered phase. We sample $N_p = 100$ physical conditions with $K_p = 1024$ measurement strings for pre-training, and $N_t \\in \\{25, 64, 100\\}$ physical conditions with $K_f \\in \\{64, 128, 256, 512, 1024\\}$ for fine-tuning. The number of physical conditions for evaluation is fixed at $N_e = 10000$. The supervised labels for fine-tuning are one-hot encoded vectors of the true phases, meaning the dimension (number of classes) of $p$ is 3. It is strictly ensured that the physical conditions sampled for pre-training do not overlap with those used for fine-tuning.', '124d125', '< ', '126c127', '< We take an additional downstream task: predicting the second-order Rényi entanglement entropy -log(tr(ρ 2 A )) for the anisotropic Heisenberg model, where A is the left-half subsystem with system size L/2 of the L-qubit quantum system. The number of training size is set to be N t = 90 and the predicted RMSE results are given in Tab. 4. It can be observed that pre-training remains effective for predicting the entanglement entropy of the anisotropic Heisenberg model.', '---', '> As an additional downstream task, we investigate the prediction of the second-order Rényi entanglement entropy, defined as $-\\log(\\text{tr}(\\rho_A^2))$, for the anisotropic Heisenberg model. Here, $A$ represents the left-half subsystem, with system size $L/2$ of the total $L$-qubit quantum system. The number of training samples is set to $N_t = 90$, and the predicted RMSE results are presented in Table 4. These results consistently demonstrate that pre-training remains highly effective for accurately predicting the entanglement entropy of the anisotropic Heisenberg model.', '128,131c129,130', '< Section: D.2 MODEL SENSITIVITY TO THE NUMBER OF MEASUREMENTS', '< In Sec. 
4, we study the relationship between the number of measurements and the classification accuracy of quantum phase of matters on Rydberg atom model. It is empirically evident in Fig. 3 that achieving linear growth in classification accuracy requires an exponential increase in the number of measurements per training example. Beyond the scaling related to number of measurements, we dive into further research on the scaling relationship between accuracy and the size of the training set (i.e., the number of sampled physical conditions which determine the dynamics of the quantum systems). We constrain the number of measurement per example to 256 (since we find that a large value makes the accuracy reach saturation) and the results on the 31-qubit system are listed in the Tab. 5. The results show that the accuracy approximately exhibits linear growth w.r.t. training size. This finding is consistent with theoretical results presented in (Huang et al., 2022;Lewis et al., 2024), which demonstrate that there exists a polynomial scaling relationship between model performance and the size of training dataset. In this section, we consider fine tuning the LLM4QPE with out-of-distribution (OOD) dataset, which means the dataset used for fine-tuning and the dataset used for pre-training come from different distributions.', '< Here, we consider two different configurations to make the fine-tuning dataset out-of-distribution from the pre-training one: the first is to re-generate the fine-tuning data by modifying the physical variables and the second is to fine tune the model based on the parameters transferred from the model pretrained on fewer qubits. In the following, we consider the Rydberg atom model.', '< First, we take the evaluation that fine-tuning the model on 31-qubit system by using he parameters pre-trained on 19 and 25-qubit system. 
Note that the number of qubits is also a physical variable and we want to see if model parameters trained on small-scale systems could transfer and help model characterize larger-scale systems. The results are listed in Tab. 6. It is evident that pre-trained parameters transferred from small-scale systems is also useful for large-scale systems. ', '---', '> Section: D.2 MODEL SENSITIVITY TO THE NUMBER OF MEASUREMENTS AND TRAINING DATA SIZE', "> In Section 4, we explored the relationship between the number of measurements and the classification accuracy of quantum phases of matter on the Rydberg atom model. Figure 3 empirically illustrates that achieving linear growth in classification accuracy generally necessitates an exponential increase in the number of measurements per training example. Beyond the scaling with the number of measurements, we further investigate the scaling relationship between accuracy and the size of the training set (i.e., the number of sampled physical conditions that determine the quantum system's dynamics). We fix the number of measurements per example to 256, as we observe that larger values lead to accuracy saturation, and present the results for the 31-qubit system in Table 5. The results indicate that accuracy approximately exhibits linear growth with respect to the training size. This finding aligns with theoretical predictions presented in (Huang et al., 2022; Lewis et al., 2024), which suggest a polynomial scaling relationship between model performance and the size of the training dataset.", '133,134c132,136', '< Section: E LIMITATIONS', "< In this study, we concentrate on the classification of quantum phases of matter and the prediction of correlation functions for the Rydberg atom model and the anisotropic Heisenberg model, respectively. While the LLM4QPE model offers flexibility for addressing various quantum many-body challenges, such as reconstructing the density matrix. 
Our focus here is primarily on pretraining the model with a fixed number of measurement strings. The impact of varying the number of measurement strings on the model's performance presents a fascinating area for exploration. Additionally, the LLM4QPE model is characterized by a relatively small parameter count (tens of thousands of parameters) when compared to the significantly larger parameter sets of large language models. Due to the constraints imposed by the model's size, our pretraining efforts are confined to quantum systems govern by Hamiltonians from the same family. Looking forward, there is an anticipation to develop a more robust model, enriched with a greater number of parameters, through learning on datasets generated from diverse families of quantum systems.", '---', '> D.3 FINE-TUNING ON OUT-OF-DISTRIBUTION (OOD) DATASETS', '> In this section, we evaluate the fine-tuning performance of LLM4QPE on out-of-distribution (OOD) datasets, meaning the dataset used for fine-tuning originates from a different distribution than the one used for pre-training.', '> We consider two distinct configurations to create OOD fine-tuning datasets: firstly, by regenerating the fine-tuning data with modified physical variables, and secondly, by fine-tuning the model using parameters transferred from a model pretrained on fewer qubits. For this analysis, we focus on the Rydberg atom model.', '> First, we assess the fine-tuning performance on a 31-qubit system using parameters pretrained on 19- and 25-qubit systems. The number of qubits itself is a physical variable, and this experiment aims to determine if model parameters trained on smaller-scale systems can effectively transfer and aid in characterizing larger-scale systems. 
The results, summarized in Table 6, show that pretrained parameters transferred from smaller-scale systems also benefit larger-scale systems, indicating positive transfer across system sizes.', '> Second, we modify the laser detuning from its original range of $[-10, 15]$ (as used in the main experiments) to an OOD range of $[-20, -10] \\cup [15, 25]$ to generate an OOD fine-tuning dataset for the 19-qubit Rydberg atom model. The classification accuracies are listed in Table 7. In this specific OOD scenario, the pretrained LLM4QPE fails to perform better than LLM4QPE without pre-training. The primary reason for this degradation is that the significantly altered detuning values drive the quantum system into a very different dynamic regime, for which the pre-trained model has learned less relevant knowledge. Whether pre-training of LLM4QPE remains beneficial for OOD quantum datasets in other settings is an open question that we leave to future work.', '136c138,141', '< Section: ', '---', '> Section: E LIMITATIONS AND FUTURE WORK', "> In this study, our primary focus has been on the classification of quantum phases of matter and the prediction of correlation functions, specifically for the Rydberg atom model and the anisotropic Heisenberg model. While the LLM4QPE model inherently offers the flexibility to address a broader range of quantum many-body challenges, such as reconstructing the density matrix, these areas were beyond the scope of the current investigation. Our pretraining efforts were also primarily concentrated on a fixed number of measurement strings. A deeper exploration into the impact of varying the number of measurement strings on the model's performance represents a fascinating and important area for future research. 
Additionally, the current LLM4QPE model is characterized by a relatively modest parameter count (in the tens of thousands) when compared to the significantly larger parameter sets of contemporary large language models. Due to the constraints imposed by the model's current scale, our pretraining endeavors were confined to quantum systems governed by Hamiltonians from the same family. Looking forward, there is a clear anticipation and strong motivation to develop a more robust and expansive model, enriched with a substantially greater number of parameters, through learning on datasets generated from diverse families of quantum systems. This will push the boundaries of generalizability and applicability of LLM-style paradigms in quantum physics.", '> ', '> Section: ACKNOWLEDGEMENTS', '140c145', '< This paper proposes a novel approach for estimating the properties of quantum systems inspired by LLMs. The authors acknowledge the potential ethical implications of this research, such as the misuse of quantum data, the bias or error in the estimation results, and the impact on the development of quantum technologies. The authors have followed the best practices for data collection, model design, and evaluation, and have disclosed the sources of funding and the conflicts of interest. The authors also adhere to the principles of research integrity and comply with the relevant laws and regulations. The authors hope that this research will contribute to the advancement of quantum science and benefit the future research.', '---', '> This paper proposes a novel approach for estimating the properties of quantum systems inspired by LLMs. The authors acknowledge and have carefully considered the potential ethical implications of this research. These include, but are not limited to, the potential for misuse of quantum data, the presence of inherent biases or errors in estimation results, and the broader societal impact on the development and accessibility of quantum technologies. 
To mitigate these concerns, the authors have rigorously adhered to best practices for data collection, model design, and evaluation throughout the study. Furthermore, all sources of funding and any potential conflicts of interest have been transparently disclosed. The authors are committed to upholding the highest principles of research integrity and ensuring full compliance with all relevant laws and regulations. It is our sincere hope that this research will serve as a constructive contribution to the advancement of quantum science and will positively benefit future research endeavors in this rapidly evolving field.', '143c148', '< The generated quantum data of the Rydberg atom model and the anisotropic Heisenberg model is available at https://github.com/abel1231/qpe-data. The code to train the model and analyze the experimental results is available from the first author on reasonable request.', '---', '> The generated quantum data for both the Rydberg atom model and the anisotropic Heisenberg model is publicly available at https://github.com/abel1231/qpe-data, ensuring full data reproducibility. The code developed for training the LLM4QPE model and analyzing the experimental results can be obtained from the first author upon reasonable request, facilitating independent verification and further research.', '147,150c152,155', '< Exploring the effects of these long-range interactions of the quantum system is essential for understanding the quantum mechanics (Bermúdez et al., 2017). In this paper, we consider the recent progress for the long-range interactions with the experimentally realized power-law exponent of the anisotropic Heisenberg model (Kranzl et al., 2023). The dynamics of the anisotropic Heisenberg model is determined by the Hamiltonian', "< where σ i x,y,z is the Pauli matrix operated on the i-th site, h determines the Ising interactions between the magnons, and J ij is the long-range interaction strength satisfying J ij = J/|i -j| α . 
We follow the configuration of (Kranzl et al., 2023) to geenrate the quantum dataset. The values of h and J are fixed with 1 and 369 rad/s, and we vary the value of α ∈ (1, 2] uniformly. It is extremely hard to characterize the quantum system with long-range interactions using the existing computing techniques. Thus we restrict the system size L ∈ {8, 10, 12}. For all the systems we consider the number of measurement strings used for pre-training as K p = 1024 and fix the number of sampled physical conditions as N p = 100. For model's finetuning, we vary the number of generated training samples N t ∈ {20, 50, 90} and fix the measurement strings K f = 64. The physical condition c is defined as a vector whose dimension C = L 2 , in which each element is the coupling strength J ij for i, j ∈ {1, . . . , L}. The problem of finding the ground state is viewed as the eigenvalue decomposition problem and we obtain the ground state for each sampled physical condition by the scipy (Virtanen et al., 2020) built-in functions. The measurement records and the true values of the two-body correlation function and the entanglement entropy are obtained using the pennylane (Bergholm et al., 2018) toolbox. We consider the Pauli-6 POVM measurement operators with M = 6 outcomes, which are given as", '< and {|0⟩, |1⟩}, {|+⟩, |-⟩}, {|r⟩, |l⟩} stand for the eigenbasis of the Pauli operators σ z , σ x , and σ y , respectively. For the task of predicting the correlation matrix, the ground-truth label is a L × L matrix and each element of the matrix is the expectation value of the observable', '< Thus each element can be written as tr(ρO ij ) in the range [-1, 1], where ρ is the density matrix of the ground state for each sampled physical condition. We flatten the correlation function matrix to be the L 2 -dimensional continuous-valued vector and treat it as the supervised label for fine-tuning. 
While for the task of predicting the entanglement entropy, the label is a real number which can be calculated as -log(tr(ρ 2 A )), where A is the left-half subsystem with system size L/2 of the L-qubit quantum system.', '---', '> Exploring the intricate effects of long-range interactions within quantum systems is paramount for a deeper understanding of quantum mechanics (Bermúdez et al., 2017). In this paper, we leverage recent progress in experimentally realized power-law exponents for the anisotropic Heisenberg model (Kranzl et al., 2023) to study such interactions. The dynamics of the anisotropic Heisenberg model are governed by the Hamiltonian:', '> $H = \\sum_{i<j} J_{ij} (\\sigma_i^x \\sigma_j^x + \\sigma_i^y \\sigma_j^y + \\Delta \\sigma_i^z \\sigma_j^z) + \\sum_i h \\sigma_i^z$', '> where $\\sigma_i^{x,y,z}$ are the Pauli matrices acting on the $i$-th site, $h$ determines the Ising interactions between magnons, and $J_{ij}$ is the long-range interaction strength satisfying $J_{ij} = J/|i - j|^\\alpha$. We adopt the configuration from (Kranzl et al., 2023) for generating our quantum dataset. The values of $h$ and $J$ are fixed at 1 and 369 rad/s, respectively, while the power-law exponent $\\alpha$ is varied uniformly within the range $(1, 2]$. Characterizing quantum systems with long-range interactions using existing classical computing techniques is exceptionally challenging. Consequently, we restrict the system size to $L \\in \\{8, 10, 12\\}$. For all considered systems, we use $K_p = 1024$ measurement strings for pre-training and fix the number of sampled physical conditions at $N_p = 100$. For model finetuning, we vary the number of generated training samples $N_t \\in \\{20, 50, 90\\}$ and fix the number of measurement strings at $K_f = 64$. The physical condition $c$ is defined as a vector with dimension $C = L^2$, where each element corresponds to the coupling strength $J_{ij}$ for $i, j \\in \\{1, \\dots, L\\}$. 
The problem of finding the ground state is formulated as an eigenvalue decomposition problem, and we obtain the ground state for each sampled physical condition using built-in functions from SciPy (Virtanen et al., 2020). The measurement records and the true values of the two-body correlation function and entanglement entropy are obtained using the PennyLane (Bergholm et al., 2018) toolbox. We employ Pauli-6 POVM measurement operators, which yield $M=6$ outcomes and are given as $\\{\\frac{1}{3}|0\\rangle\\langle 0|, \\frac{1}{3}|1\\rangle\\langle 1|, \\frac{1}{3}|+\\rangle\\langle +|, \\frac{1}{3}|-\\rangle\\langle -|, \\frac{1}{3}|r\\rangle\\langle r|, \\frac{1}{3}|l\\rangle\\langle l|\\}$,', '> where $\\{|0\\rangle, |1\\rangle\\}$, $\\{|+\\rangle, |-\\rangle\\}$, $\\{|r\\rangle, |l\\rangle\\}$ stand for the eigenbases of the Pauli operators $\\sigma_z$, $\\sigma_x$, and $\\sigma_y$, respectively. For the task of predicting the correlation matrix, the ground-truth label is an $L \\times L$ matrix where each element is the expectation value of the observable $\\langle \\sigma_i^z \\sigma_j^z \\rangle$. Thus, each element can be written as $\\text{tr}(\\rho O_{ij})$ and falls within the range $[-1, 1]$, where $\\rho$ is the density matrix of the ground state for each sampled physical condition. We flatten this correlation function matrix into an $L^2$-dimensional continuous-valued vector, which serves as the supervised label for fine-tuning. For the task of predicting the entanglement entropy, the label is a real number calculated as $-\\log(\\text{tr}(\\rho_A^2))$, where $A$ denotes the left-half subsystem of size $L/2$ of the $L$-qubit quantum system.', '152,156c157,173', '< Section: C POOF OF THE NORMALIZED OUTPUT DISTRIBUTION', '< In the main text, we claim that the output (classical) distribution satisfies', '< as long as the last linear projection layer uses the softmax activated function. The proof is given below.', "< The softmax activated function is performed on the model's output, which is the product of conditional probabilities p(σ 1 , . . . , σ L ) = L i=1 p(σ i |σ i-1 , . . . , σ 1 ). It is easy to check the claim holds for L = 1. 
Given that the claim also holds for L = k. For L = k + 1, the following equation then be hold: M σi=1 p(σ i |σ i-1 , . . . , σ 1 ) = 1.", '< (10)', '---', '> Section: C PROOF OF THE NORMALIZED OUTPUT DISTRIBUTION', '> In the main text, we assert that the output (classical) probability distribution satisfies $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_L=1}^{M} p(\\sigma_1, \\dots, \\sigma_L) = 1$, provided that the last linear projection layer employs a softmax activation function. The formal proof is presented below.', "> The softmax activation function is applied to the model's output, which is designed to approximate the joint probability distribution $p(\\sigma_1, \\dots, \\sigma_L)$. As per Equation (2), this joint distribution is factorized into a product of conditional probabilities: $p(\\sigma_1, \\dots, \\sigma_L|c) = \\prod_{l=1}^{L} p(\\sigma_l|\\sigma_{l-1}, \\dots, \\sigma_1, c)$.", '> We can prove the normalization property by induction.', '> Base Case: For $L=1$, the output distribution is $p(\\sigma_1|c)$. If the last layer uses softmax, then $\\sum_{\\sigma_1=1}^{M} p(\\sigma_1|c) = 1$. 
This is trivially true by the definition of softmax.', '> Inductive Hypothesis: Assume that the claim holds for $L=k$, i.e., $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_k=1}^{M} p(\\sigma_1, \\dots, \\sigma_k|c) = 1$.', '> Inductive Step: For $L = k+1$, we need to show that $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_{k+1}=1}^{M} p(\\sigma_1, \\dots, \\sigma_{k+1}|c) = 1$.', '> Using the factorization from Equation (2):', '> $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_{k+1}=1}^{M} p(\\sigma_1, \\dots, \\sigma_{k+1}|c) = \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_{k+1}=1}^{M} \\left( \\prod_{l=1}^{k+1} p(\\sigma_l|\\sigma_{l-1}, \\dots, \\sigma_1, c) \\right)$', '> $= \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_k=1}^{M} \\left( \\prod_{l=1}^{k} p(\\sigma_l|\\sigma_{l-1}, \\dots, \\sigma_1, c) \\right) \\left( \\sum_{\\sigma_{k+1}=1}^{M} p(\\sigma_{k+1}|\\sigma_k, \\dots, \\sigma_1, c) \\right)$', '> Since the last linear projection layer uses softmax for $p(\\sigma_{k+1}|\\sigma_k, \\dots, \\sigma_1, c)$, we know that $\\sum_{\\sigma_{k+1}=1}^{M} p(\\sigma_{k+1}|\\sigma_k, \\dots, \\sigma_1, c) = 1$ for any given $(\\sigma_k, \\dots, \\sigma_1, c)$.', '> Therefore, the expression simplifies to:', '> $= \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_k=1}^{M} \\left( \\prod_{l=1}^{k} p(\\sigma_l|\\sigma_{l-1}, \\dots, \\sigma_1, c) \\right) \\cdot 1$', '> $= \\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_k=1}^{M} p(\\sigma_1, \\dots, \\sigma_k|c)$', '> By the inductive hypothesis, this sum equals 1.', '> Thus, $\\sum_{\\sigma_1=1}^{M} \\dots \\sum_{\\sigma_{k+1}=1}^{M} p(\\sigma_1, \\dots, \\sigma_{k+1}|c) = 1$. (11)', '> The proof is then complete.', '158d174', '< ', '160,215c176,231', '< [b0]  Bloqade;  Jl (2023). Package for the quantum computation and quantum simulation based on the neutralatom architecture. ', '< [b1] Anurag Anshu; Srinivasan Arunachalam (2024). A survey on the complexity of learning quantum states. 
Nature Reviews Physics', '< [b2] Ville Bergholm; Josh Izaac; Maria Schuld; Christian Gogolin; Shahnawaz Ahmed; Vishnu Ajith; M Sohaib Alam; Guillermo Alonso-Linaje; B Akashnarayanan; Ali Asadi (2018). Pennylane: Automatic differentiation of hybrid quantum-classical computations. ', '< [b3] Alejandro Bermúdez; Luca Tagliacozzo; Germán Sierra;  Richerme (2017). Long-range heisenberg models in quasiperiodically driven crystals of trapped ions. Physical Review B', '< [b4] Hannes Bernien; Sylvain Schwartz; Alexander Keesling; Harry Levine; Ahmed Omran; Hannes Pichler; Soonwon Choi; Alexander S Zibrov; Manuel Endres; Markus Greiner (2017). Probing many-body dynamics on a 51-atom quantum simulator. Nature', '< [b5] Gsl Fernando; Michał Brandao;  Horodecki (2015). Exponential decay of correlations implies area law. Communications in mathematical physics', '< [b6] Tom Brown; Benjamin Mann; Nick Ryder; Melanie Subbiah; Jared D Kaplan; Prafulla Dhariwal; Arvind Neelakantan; Pranav Shyam; Girish Sastry; Amanda Askell (2020). Language models are few-shot learners. Advances in neural information processing systems', '< [b7] Tiff Brydges; Andreas Elben; Petar Jurcevic; Benoît Vermersch; Christine Maier; Peter Ben P Lanyon; Rainer Zoller; Christian F Blatt;  Roos (2019). Probing rényi entanglement entropy via randomized measurements. Science', '< [b8] Giuseppe Carleo; Matthias Troyer (2017). Solving the quantum many-body problem with artificial neural networks. Science', '< [b9] Giuseppe Carleo; Ignacio Cirac; Kyle Cranmer; Laurent Daudet; Maria Schuld; Naftali Tishby; Leslie Vogt-Maranto; Lenka Zdeborová (2019). Machine learning and the physical sciences. Reviews of Modern Physics', '< [b10] Juan Carrasquilla; Giacomo Torlai; Roger G Melko; Leandro Aolita (2019). Reconstructing quantum states with generative models. Nature Machine Intelligence', '< [b11] David Ceperley; Berni Alder (1986). Quantum monte carlo. 
Science', '< [b12] Peter Cha; Paul Ginsparg; Felix Wu; Juan Carrasquilla; Eun-Ah Peter L Mcmahon;  Kim (2021). Attention-based quantum tomography. Machine Learning: Science and Technology', '< [b13] Philippe Corboz (2016). Variational optimization with infinite projected entangled-pair states. Physical Review B', '< [b14] Stefanie Czischek; Schuyler Moss; Matthew Radzihovsky; Ejaaz Merali; Roger G Melko (2022). Data-enhanced variational monte carlo simulations for rydberg atom arrays. Physical Review B', "< [b15] D' Mauro; Matteo Ga Ariano; Massimiliano F Paris;  Sacchi (2003). Quantum tomography. Advances in imaging and electron physics", '< [b16] Yuxuan Du; Yibo Yang; Tongliang Liu; Zhouchen Lin; Bernard Ghanem; Dacheng Tao (2023). Shadownet for data-centric quantum system learning. ', '< [b17] Xun Gao; Lu-Ming Duan (2017). Efficient representation of quantum many-body states with deep neural networks. Nature communications', '< [b18] Valentin Gebhart; Raffaele Santagati; Antonio ; Andrea Gentile; Erik M Gauger; David Craig; Natalia Ares; Leonardo Banchi; Florian Marquardt; Luca Pezzè; Cristian Bonato (2023). Learning quantum systems. Nature Reviews Physics', '< [b19] Justin Gilmer; S Samuel; Patrick F Schoenholz; Oriol Riley; George E Vinyals;  Dahl (2017). Neural message passing for quantum chemistry. PMLR', '< [b20] Aleksandra Gočanin; Ivan Šupić; Borivoje Dakić (2022). Sample-efficient device-independent quantum state verification and certification. PRX Quantum', '< [b21] James Gubernatis; Naoki Kawashima; Philipp Werner (2016). Quantum Monte Carlo Methods. Cambridge University Press', '< [b22] Mohamed Hibat-Allah; Martin Ganahl; Lauren E Hayward; Roger G Melko; Juan Carrasquilla (2020). Recurrent neural network wave functions. Physical Review Research', '< [b23] Pierre Hohenberg; Walter Kohn (1964). Inhomogeneous electron gas. Physical review', '< [b24] Hsin-Yuan Huang; Richard Kueng; John Preskill (2020). 
Predicting many properties of a quantum system from very few measurements. Nature Physics', '< [b25] Hsin-Yuan Huang; Richard Kueng; Giacomo Torlai; John Victor V Albert;  Preskill (2022). Provably efficient machine learning for quantum many-body problems. Science', '< [b26]  Jullien; B Roulleau;  Roche; Y Cavanna;  Jin;  Glattli (2014). Quantum tomography of an electron. Nature', '< [b27] Hiroki Kawai; O Yuya;  Nakagawa (2020). Predicting excited states from ground state wavefunction by supervised quantum machine learning. Machine Learning: Science and Technology', '< [b28] Florian Kranzl; Stefan Birnkammer; K Manoj; Alvise Joshi; Rainer Bastianello; Michael Blatt; Christian F Knap;  Roos (2023). Observation of magnon bound states in the long-range, anisotropic heisenberg model. Physical Review X', '< [b29] Dietrich Leibfried;  Meekhof; C H King; Wayne M Monroe; David J Itano;  Wineland (1996). Experimental determination of the motional quantum state of a trapped atom. Physical review letters', '< [b30] Laura Lewis; Hsin-Yuan Huang; Sebastian Viet T Tran; Richard Lehner; John Kueng;  Preskill (2024). Improved machine learning algorithm for predicting ground state properties. Nature communications', '< [b31] Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (2023). Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys', '< [b32] J E Loh;  Gubernatis;  Scalettar;  White;  Scalapino;  Sugar (1990). Sign problem in the numerical simulation of many-electron systems. Physical Review B', '< [b33]  William Lauchlin Mcmillan (1965). Ground state of liquid he 4. Physical Review', '< [b34] A Michael; Isaac L Nielsen;  Chuang (2010). Quantum computation and quantum information. Cambridge university press', '< [b35] Román Orús (2019). Tensor networks for complex quantum systems. 
Nature Reviews Physics', '< [b36] David Perez-Garcia; Frank Verstraete; Michael M Wolf; Ignacio Cirac (2006). Matrix product state representations. ', '< [b37] Alec Radford; Karthik Narasimhan; Tim Salimans; Ilya Sutskever (2018). Improving language understanding by generative pre-training. ', '< [b38] T Kristof; Michael Schütt; Alexandre Gastegger; K-R Tkatchenko; Reinhard J Müller;  Maurer (2019). Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature communications', '< [b39]  (). . Or', '< [b40] Yoav Sharir; Noam Levine; Giuseppe Wies; Amnon Carleo;  Shashua (2020). Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Physical review letters', '< [b41]  Gi Struchalin; A Ya; E V Zagorovskii;  Kovlakov;  Ss Straupe;  Kulik (2021). Experimental estimation of quantum state properties from classical shadows. PRX Quantum', '< [b42] Giacomo Torlai; Guglielmo Mazzola; Juan Carrasquilla; Matthias Troyer; Roger Melko; Giuseppe Carleo (2018). Neural-network quantum state tomography. Nature Physics', '< [b43] Matthias Troyer; Uwe-Jens Wiese (2005). Computational complexity and fundamental limitations to fermionic quantum monte carlo simulations. Physical review letters', '< [b44] Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N Gomez; Łukasz Kaiser; Illia Polosukhin (2017). Attention is all you need. Advances in neural information processing systems', '< [b45] Pragya Verma; Donald G Truhlar (2020). Status and challenges of density functional theory. 
Trends in Chemistry', '< [b46] Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; J Stéfan; Matthew Van Der Walt; Joshua Brett; K Wilson; Nikolay Jarrod Millman;  Mayorov; R J Andrew; Eric Nelson; Robert Jones; Eric Kern; C J Larson; İlhan Carey; Yu Polat; Eric W Feng; Jake Moore; Denis Vanderplas; Josef Laxalde; Robert Perktold; Ian Cimrman; E A Henriksen; Charles R Quintero; Anne M Harris; Antônio H Archibald; Fabian Ribeiro;  Pedregosa (2020). Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods', '< [b47] Haoxiang Wang; Maurice Weber; Josh Izaac; Cedric Yen-Yu Lin (2022). Predicting properties of quantum systems with conditional generative models. ', '< [b48] Karl Weiss; Taghi M Khoshgoftaar; Dingding Wang (2016). A survey of transfer learning. Journal of Big data', '< [b49]  Steven R White (1992). Density matrix formulation for quantum renormalization groups. Physical review letters', '< [b50] Dian Wu; Lei Wang; Pan Zhang (2019). Solving statistical mechanics using variational autoregressive networks. Physical review letters', '< [b51] Ya-Dong Wu; Yan Zhu; Ge Bai; Yuexuan Wang; Giulio Chiribella (2023). Quantum similarity testing with convolutional neural networks. Physical Review Letters', '< [b52] Tailong Xiao; Jingzheng Huang; Hongjing Li; Jianping Fan; Guihua Zeng (2022). Intelligent certification for quantum simulators via machine learning. npj Quantum Information', '< [b53] Ting Zhang; Jinzhao Sun; Xiao-Xu Fang; Xiao-Ming Zhang; Xiao Yuan; He Lu (2021). Experimental quantum state measurement with classical shadows. Physical Review Letters', '< [b54] Yuan-Hang Zhang; Massimiliano Di; Ventra  (2023). Transformer quantum state: A multipurpose model for quantum many-body problems. 
Physical Review B', '< [b55] Yan Zhu; Ya-Dong Wu; Ge Bai; Dong-Sheng Wang; Yuexuan Wang; Giulio Chiribella (2022). Flexible learning of quantum states with generative query neural networks. Nature Communications', '---', '> [b0] Bloqade.jl (2023). Package for the quantum computation and quantum simulation based on the neutral-atom architecture.', '> [b1] Anurag Anshu; Srinivasan Arunachalam (2024). A survey on the complexity of learning quantum states. Nature Reviews Physics.', '> [b2] Ville Bergholm; Josh Izaac; Maria Schuld; Christian Gogolin; Shahnawaz Ahmed; Vishnu Ajith; M Sohaib Alam; Guillermo Alonso-Linaje; B Akashnarayanan; Ali Asadi (2018). Pennylane: Automatic differentiation of hybrid quantum-classical computations.', '> [b3] Alejandro Bermúdez; Luca Tagliacozzo; Germán Sierra; Richerme (2017). Long-range heisenberg models in quasiperiodically driven crystals of trapped ions. Physical Review B.', '> [b4] Hannes Bernien; Sylvain Schwartz; Alexander Keesling; Harry Levine; Ahmed Omran; Hannes Pichler; Soonwon Choi; Alexander S Zibrov; Manuel Endres; Markus Greiner (2017). Probing many-body dynamics on a 51-atom quantum simulator. Nature.', '> [b5] Gsl Fernando; Michał Brandao; Horodecki (2015). Exponential decay of correlations implies area law. Communications in mathematical physics.', '> [b6] Tom Brown; Benjamin Mann; Nick Ryder; Melanie Subbiah; Jared D Kaplan; Prafulla Dhariwal; Arvind Neelakantan; Pranav Shyam; Girish Sastry; Amanda Askell (2020). Language models are few-shot learners. Advances in neural information processing systems.', '> [b7] Tiff Brydges; Andreas Elben; Petar Jurcevic; Benoît Vermersch; Christine Maier; Peter Ben P Lanyon; Rainer Zoller; Christian F Blatt; Roos (2019). Probing rényi entanglement entropy via randomized measurements. Science.', '> [b8] Giuseppe Carleo; Matthias Troyer (2017). Solving the quantum many-body problem with artificial neural networks. 
Science.', '> [b9] Giuseppe Carleo; Ignacio Cirac; Kyle Cranmer; Laurent Daudet; Maria Schuld; Naftali Tishby; Leslie Vogt-Maranto; Lenka Zdeborová (2019). Machine learning and the physical sciences. Reviews of Modern Physics.', '> [b10] Juan Carrasquilla; Giacomo Torlai; Roger G Melko; Leandro Aolita (2019). Reconstructing quantum states with generative models. Nature Machine Intelligence.', '> [b11] David Ceperley; Berni Alder (1986). Quantum monte carlo. Science.', '> [b12] Peter Cha; Paul Ginsparg; Felix Wu; Juan Carrasquilla; Eun-Ah Peter L Mcmahon; Kim (2021). Attention-based quantum tomography. Machine Learning: Science and Technology.', '> [b13] Philippe Corboz (2016). Variational optimization with infinite projected entangled-pair states. Physical Review B.', '> [b14] Stefanie Czischek; Schuyler Moss; Matthew Radzihovsky; Ejaaz Merali; Roger G Melko (2022). Data-enhanced variational monte carlo simulations for rydberg atom arrays. Physical Review B.', "> [b15] D' Mauro; Matteo Ga Ariano; Massimiliano F Paris; Sacchi (2003). Quantum tomography. Advances in imaging and electron physics.", '> [b16] Yuxuan Du; Yibo Yang; Tongliang Liu; Zhouchen Lin; Bernard Ghanem; Dacheng Tao (2023). Shadownet for data-centric quantum system learning.', '> [b17] Xun Gao; Lu-Ming Duan (2017). Efficient representation of quantum many-body states with deep neural networks. Nature communications.', '> [b18] Valentin Gebhart; Raffaele Santagati; Antonio; Andrea Gentile; Erik M Gauger; David Craig; Natalia Ares; Leonardo Banchi; Florian Marquardt; Luca Pezzè; Cristian Bonato (2023). Learning quantum systems. Nature Reviews Physics.', '> [b19] Justin Gilmer; S Samuel; Patrick F Schoenholz; Oriol Riley; George E Vinyals; Dahl (2017). Neural message passing for quantum chemistry. PMLR.', '> [b20] Aleksandra Gočanin; Ivan Šupić; Borivoje Dakić (2022). Sample-efficient device-independent quantum state verification and certification. 
PRX Quantum.', '> [b21] James Gubernatis; Naoki Kawashima; Philipp Werner (2016). Quantum Monte Carlo Methods. Cambridge University Press.', '> [b22] Mohamed Hibat-Allah; Martin Ganahl; Lauren E Hayward; Roger G Melko; Juan Carrasquilla (2020). Recurrent neural network wave functions. Physical Review Research.', '> [b23] Pierre Hohenberg; Walter Kohn (1964). Inhomogeneous electron gas. Physical review.', '> [b24] Hsin-Yuan Huang; Richard Kueng; John Preskill (2020). Predicting many properties of a quantum system from very few measurements. Nature Physics.', '> [b25] Hsin-Yuan Huang; Richard Kueng; Giacomo Torlai; John Victor V Albert; Preskill (2022). Provably efficient machine learning for quantum many-body problems. Science.', '> [b26] Jullien; B Roulleau; Roche; Y Cavanna; Jin; Glattli (2014). Quantum tomography of an electron. Nature.', '> [b27] Hiroki Kawai; O Yuya; Nakagawa (2020). Predicting excited states from ground state wavefunction by supervised quantum machine learning. Machine Learning: Science and Technology.', '> [b28] Florian Kranzl; Stefan Birnkammer; K Manoj; Alvise Joshi; Rainer Bastianello; Michael Blatt; Christian F Knap; Roos (2023). Observation of magnon bound states in the long-range, anisotropic heisenberg model. Physical Review X.', '> [b29] Dietrich Leibfried; Meekhof; C H King; Wayne M Monroe; David J Itano; Wineland (1996). Experimental determination of the motional quantum state of a trapped atom. Physical review letters.', '> [b30] Laura Lewis; Hsin-Yuan Huang; Sebastian Viet T Tran; Richard Lehner; John Kueng; Preskill (2024). Improved machine learning algorithm for predicting ground state properties. Nature communications.', '> [b31] Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (2023). Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys.', '> [b32] J E Loh; Gubernatis; Scalettar; White; Scalapino; Sugar (1990). 
Sign problem in the numerical simulation of many-electron systems. Physical Review B.', '> [b33] William Lauchlin Mcmillan (1965). Ground state of liquid he 4. Physical Review.', '> [b34] A Michael; Isaac L Nielsen; Chuang (2010). Quantum computation and quantum information. Cambridge university press.', '> [b35] Román Orús (2019). Tensor networks for complex quantum systems. Nature Reviews Physics.', '> [b36] David Perez-Garcia; Frank Verstraete; Michael M Wolf; Ignacio Cirac (2006). Matrix product state representations.', '> [b37] Alec Radford; Karthik Narasimhan; Tim Salimans; Ilya Sutskever (2018). Improving language understanding by generative pre-training.', '> [b38] T Kristof; Michael Schütt; Alexandre Gastegger; K-R Tkatchenko; Reinhard J Müller; Maurer (2019). Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature communications.', '> [b39] (). . Or.', '> [b40] Yoav Sharir; Noam Levine; Giuseppe Wies; Amnon Carleo; Shashua (2020). Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Physical review letters.', '> [b41] Gi Struchalin; A Ya; E V Zagorovskii; Kovlakov; Ss Straupe; Kulik (2021). Experimental estimation of quantum state properties from classical shadows. PRX Quantum.', '> [b42] Giacomo Torlai; Guglielmo Mazzola; Juan Carrasquilla; Matthias Troyer; Roger Melko; Giuseppe Carleo (2018). Neural-network quantum state tomography. Nature Physics.', '> [b43] Matthias Troyer; Uwe-Jens Wiese (2005). Computational complexity and fundamental limitations to fermionic quantum monte carlo simulations. Physical review letters.', '> [b44] Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N Gomez; Łukasz Kaiser; Illia Polosukhin (2017). Attention is all you need. Advances in neural information processing systems.', '> [b45] Pragya Verma; Donald G Truhlar (2020). Status and challenges of density functional theory. 
Trends in Chemistry.', '> [b46] Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; J Stéfan; Matthew Van Der Walt; Joshua Brett; K Wilson; Nikolay Jarrod Millman; Mayorov; R J Andrew; Eric Nelson; Robert Jones; Eric Kern; C J Larson; İlhan Carey; Yu Polat; Eric W Feng; Jake Moore; Denis Vanderplas; Josef Laxalde; Robert Perktold; Ian Cimrman; E A Henriksen; Charles R Quintero; Anne M Harris; Antônio H Archibald; Fabian Ribeiro; Pedregosa (2020). Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods.', '> [b47] Haoxiang Wang; Maurice Weber; Josh Izaac; Cedric Yen-Yu Lin (2022). Predicting properties of quantum systems with conditional generative models.', '> [b48] Karl Weiss; Taghi M Khoshgoftaar; Dingding Wang (2016). A survey of transfer learning. Journal of Big data.', '> [b49] Steven R White (1992). Density matrix formulation for quantum renormalization groups. Physical review letters.', '> [b50] Dian Wu; Lei Wang; Pan Zhang (2019). Solving statistical mechanics using variational autoregressive networks. Physical review letters.', '> [b51] Ya-Dong Wu; Yan Zhu; Ge Bai; Yuexuan Wang; Giulio Chiribella (2023). Quantum similarity testing with convolutional neural networks. Physical Review Letters.', '> [b52] Tailong Xiao; Jingzheng Huang; Hongjing Li; Jianping Fan; Guihua Zeng (2022). Intelligent certification for quantum simulators via machine learning. npj Quantum Information.', '> [b53] Ting Zhang; Jinzhao Sun; Xiao-Xu Fang; Xiao-Ming Zhang; Xiao Yuan; He Lu (2021). Experimental quantum state measurement with classical shadows. Physical Review Letters.', '> [b54] Yuan-Hang Zhang; Massimiliano Di; Ventra (2023). Transformer quantum state: A multipurpose model for quantum many-body problems. 
Physical Review B.', '> [b55] Yan Zhu; Ya-Dong Wu; Ge Bai; Dong-Sheng Wang; Yuexuan Wang; Giulio Chiribella (2022). Flexible learning of quantum states with generative query neural networks. Nature Communications.', '220,221c236,237', '< Caption: Figure 3 :3Figure 3: Comparison of weighted F1 score w.r.t. number of measurement strings on Rydberg atom model.', '< Data: ', '---', '> Caption: Figure 3: Comparison of weighted F1 score w.r.t. number of measurement strings on Rydberg atom model.', '> Data:', '225,226c241,242', '< Caption: Figure 4 :4Figure 4: The evolution of training loss and test weighted F1 score with increasing training epochs where Nt = 100 and K f = 1024.', '< Data: ', '---', '> Caption: Figure 4: The evolution of training loss and test weighted F1 score with increasing training epochs where Nt = 100 and K f = 1024.', '> Data:', '228c244', '< Figure fig_2: ', '---', '> Figure fig_2:', '231c247', '< Data: ', '---', '> Data:', '233c249', '< Figure fig_3: ', '---', '> Figure fig_3:', '236c252', '< Data: ', '---', '> Data:', '238c254', '< Figure fig_4: ', '---', '> Figure fig_4:', '240,241c256,257', '< Caption: Second, we modify the detuning of a laser from[-10, 15]  (which is exactly used in the paper) to[-20, -10] ∪ [15, 25]  to generate OOD fine-tuning dataset, on Rydberg atom model with 19 qubits. The classification accracy are listed in Tab. 7. The pre-trained one fails to perform better than the LLM4QPE w/o pre-train. The main reason is that the modified detuning values have driven the quantum evolution into a very different dynamics and the pre-trained model learns less knowledge about it. 
Whether pre-training of LLM4QPE remains beneficial for OOD quantum datasets in other settings remains an open question, and will be further explored in our future work.', '< Data: ', '---', '> Caption: Second, we modify the laser detuning from [-10, 15] (the range used in the paper) to [-20, -10] ∪ [15, 25] to generate an OOD fine-tuning dataset for the Rydberg atom model with 19 qubits. The classification accuracies are listed in Tab. 7. The pre-trained model fails to outperform LLM4QPE without pre-training. The main reason is that the modified detuning values drive the quantum system into a very different dynamical regime, about which the pre-trained model has learned little. Whether pre-training of LLM4QPE remains beneficial for OOD quantum datasets in other settings is an open question that will be explored in our future work.', '> Data:', '243c259', '< Figure tab_0: ', '---', '> Figure tab_0:', '314d329', '< ']
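
The appendix text in the diff above describes how labels for the anisotropic Heisenberg model are produced: build the Hamiltonian with couplings $J_{ij} = J/|i-j|^\alpha$, obtain the ground state by eigendecomposition, then compute the $\langle \sigma_i^z \sigma_j^z \rangle$ correlation matrix and the entropy $-\log(\text{tr}(\rho_A^2))$ of the left half. The following is a minimal sketch of that pipeline, not the authors' code. It assumes the Hamiltonian form printed in the revised text, uses `numpy.linalg.eigh` in place of the SciPy routines the paper mentions, sets `J = 1` instead of 369 rad/s, and picks a hypothetical anisotropy `delta = 1.0` because the text does not state its value. All function names are illustrative.

```python
import numpy as np

# Single-qubit Pauli matrices and identity.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def site_op(op, i, L):
    """Embed a single-qubit operator at site i (0-based) of an L-qubit chain."""
    out = np.array([[1.0 + 0j]])
    for k in range(L):
        out = np.kron(out, op if k == i else I2)
    return out


def heisenberg_hamiltonian(L, alpha, J=1.0, h=1.0, delta=1.0):
    """Dense long-range Hamiltonian with J_ij = J / |i - j|**alpha.

    delta (anisotropy) is a hypothetical value; the source does not give it.
    """
    xs = [site_op(SX, i, L) for i in range(L)]
    ys = [site_op(SY, i, L) for i in range(L)]
    zs = [site_op(SZ, i, L) for i in range(L)]
    H = np.zeros((2 ** L, 2 ** L), dtype=complex)
    for i in range(L):
        H += h * zs[i]
        for j in range(i + 1, L):
            J_ij = J / abs(i - j) ** alpha
            H += J_ij * (xs[i] @ xs[j] + ys[i] @ ys[j] + delta * zs[i] @ zs[j])
    return H


def ground_state(H):
    """Return the eigenvector with the lowest eigenvalue (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(H)
    return vecs[:, 0]


def zz_correlation_matrix(psi, L):
    """L x L matrix of <psi| sigma_i^z sigma_j^z |psi>."""
    zs = [site_op(SZ, i, L) for i in range(L)]
    C = np.empty((L, L))
    for i in range(L):
        for j in range(L):
            C[i, j] = np.real(np.vdot(psi, zs[i] @ (zs[j] @ psi)))
    return C


def renyi2_entropy_left_half(psi, L):
    """-log tr(rho_A^2), with A the first L // 2 qubits."""
    dim_a = 2 ** (L // 2)
    m = psi.reshape(dim_a, -1)      # rows index subsystem A, columns the rest
    rho_a = m @ m.conj().T          # reduced density matrix of A
    purity = np.real(np.trace(rho_a @ rho_a))
    return -np.log(purity)


if __name__ == "__main__":
    L, alpha = 4, 1.5
    psi = ground_state(heisenberg_hamiltonian(L, alpha))
    print(zz_correlation_matrix(psi, L).round(3))
    print(renyi2_entropy_left_half(psi, L))
```

Dense diagonalization scales as $2^L \times 2^L$ in memory, which is consistent with the text restricting the system size to $L \in \{8, 10, 12\}$; a sparse eigensolver would be the natural substitute for larger systems.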
