Title: TOWARDS LLM4QPE: UNSUPERVISED PRETRAINING OF QUANTUM PROPERTY ESTIMATION AND A BENCHMARK

Abstract: Estimating the properties of quantum systems such as quantum phase has been critical in addressing the essential quantum many-body problems in physics and chemistry. Deep learning models have been recently introduced to property estimation, surpassing conventional statistical approaches. However, these methods are tailored to the specific task and quantum data at hand. It remains an open and attractive question for devising a more universal task-agnostic pretraining model for quantum property estimation. In this paper, we propose LLM4QPE, a large language model style quantum task-agnostic pretraining and finetuning paradigm that 1) performs unsupervised pretraining on diverse quantum systems with different physical conditions; 2) uses the pretrained model for supervised finetuning and delivers high performance with limited training data, on downstream tasks. It mitigates the cost for quantum data collection and speeds up convergence. Extensive experiments show the promising efficacy of LLM4QPE in various tasks including classifying quantum phases of matter on Rydberg atom model and predicting two-body correlation function on anisotropic Heisenberg model.

Section: INTRODUCTION
Estimating quantum system properties such as quantum phase is essential for verifying and evaluating quantum technologies (Huang et al., 2020;Gočanin et al., 2022), which is often in the form of many-body problems. Precise estimation of generic quantum systems is challenged due to the exponential complexity inherent in describing quantum many-body systems (Gebhart et al., 2023). Fortunately, physical systems of interest such as those generated by the dynamics of local Hamiltonians are not generic, since their particular structure guarantees that the full complexity of Hilbert space is in principle not required for their accurate description (Carrasquilla et al., 2019). Accordingly, statistical (including learning-based) approaches have emerged to characterize quantum systems from traditional Density Functional Theory (DFT) (Hohenberg & Kohn, 1964), Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986), to advanced variational methods e.g. Tensor Networks (TNs) (Orús, 2019) and Neural Network Quantum States (NNQS) (Zhang & Di Ventra, 2023).
There are two main categories of variational methods for quantum property estimation (QPE). The first category comprises TNs and NNQS, which formulate QPE as an optimization problem where the quantum state is approximately represented by a parameterized wave function. The parameterized wave function is updated by minimizing the expectation values of relevant observable estimators, based on either the density matrix renormalization group (DMRG) algorithm (White, 1992) or variational Monte Carlo (VMC) (McMillan, 1965). Afterwards the properties of interest can be analyzed by performing algebraic operations on the wave function. Another line of research uses neural networks as universal function approximators that directly estimate quantum system properties (Gilmer et al., 2017;Kawai & Nakagawa, 2020;Xiao et al., 2022), which we call NNQPE. The input to the neural networks is the measurement results of the quantum state, and the output is the property of interest. The parameters are optimized using gradient descent. The goal of NNQPE is to accurately characterize the properties of the quantum state using as few identical copies and measurements as possible. Compared with TNs, this class of methods can more easily represent nonlocal correlations, allowing in principle the capture of quantum states with higher entanglement (Huang et al., 2022). Moreover, unlike TNs and NNQS, where additional computational overhead is required to extract the properties from the optimized parameterized wave function, NNQPE can directly predict the properties of unknown quantum states.
However, NNQPE suffers from limited generalization ability, especially given limited measurement data for training (Gebhart et al., 2023). Although generalizability could be improved by training the models on extensive measurement data and corresponding labels, the labeling process, i.e., accurately estimating properties of quantum systems, requires computational and memory resources that increase exponentially with the system size (Carleo et al., 2019). In particular, each of the classical labeling methods has its own limitations. For example, DFT suffers from self-interaction error and delocalization error, making it difficult to represent quantum states with strong correlations (Verma & Truhlar, 2020). The sign problem (Loh Jr et al., 1990) implies that it is intractable for QMC to evaluate properties for large systems or systems at low temperatures (Troyer & Wiese, 2005;Huang et al., 2022). The maximum bond dimensions of TNs needed to precisely preserve properties of quantum states such as the entanglement entropy scale exponentially w.r.t. the evolution time (Brandao & Horodecki, 2015). In conclusion, the labeling process is hard to complete classically due to the inherent separation between quantum and classical computing.
Furthermore, despite the significant promise of NNQPE, their application in harnessing advanced machine learning techniques for quantum physics remains in its early stages. Current models of NNQPE are tailored and trained for particular quantum systems and specific tasks. This approach contrasts sharply with the era of Large Language Models (LLMs) (Radford et al., 2018;Brown et al., 2020), which have achieved general-purpose language generation and understanding capabilities. In the realm of LLMs, pretraining serves as the primary method for capturing general language understanding and afterwards finetuning is adopted to adapt the model to accomplish specialized tasks. This distinction highlights the nascent yet evolving nature of applying sophisticated machine learning strategies within the quantum physics domain.
In fact, with the increasing scale of quantum devices, a vast amount of quantum data is produced by quantum measurement (Brydges et al., 2019). Such data holds intricate details about the system. An open question is how to design a versatile model that undergoes extensive pretraining to master these quantum intricacies. The success of deep learning in handling high-dimensional data sheds light on this question. First, the sheer volume of quantum data from measurements allows for the extraction of meaningful patterns and representations (Anshu & Arunachalam, 2024). Second, the universal approximation capabilities of neural networks suggest that, given sufficient data and computational resources, it is possible to model the complex, nonlinear relationships inherent in quantum systems (Carleo et al., 2019;Gebhart et al., 2023). Lastly, the task-agnostic nature of pretraining (Liu et al., 2023) aligns with the quantum realm's diversity, where a single model can learn hidden features across various systems and physical conditions. This feasibility is further supported by the principle of transfer learning (Weiss et al., 2016), where knowledge gained in one context can significantly benefit task-specific applications.
In this paper, we introduce an LLM-style task-agnostic pretraining model for Quantum Property Estimation named LLM4QPE. This model is pretrained by leveraging vast (unlabeled) quantum data, across diverse quantum systems of the same family governed by different physical conditions. For the downstream tasks, we finetune LLM4QPE on two typical QPE tasks: classifying quantum phases of matter and predicting the two-body correlation function. We also consider two families of quantum models: the Rydberg atom model and the anisotropic Heisenberg model. The results show its promising power for tackling QPE problems, especially in scenarios with limited data availability. The contributions are:
1) Departing from most existing supervised learning QPE models, which rely on restricted, task-specific labeled quantum data, we propose LLM4QPE, to the best of our knowledge the first LLM-style model for quantum property estimation. Its pretraining maximizes the expected log-likelihood of measurement bit strings and is entirely unsupervised and task-agnostic.
2) We develop the novel architecture of our LLM4QPE model. Specifically, to embed the batch-style discrete measurement records to a continuous space, a trainable LSTM embedding layer is attached to the transformer decoder. The LSTM-Transformer architecture provides an innate framework for handling diverse quantum data stemming from experiments under varying physical conditions, enabling prediction of the property of quantum systems of the same family.
3) We collect a set of quantum data from simulations for unsupervised pretraining and supervised finetuning. For pretraining, the dataset consists of quantum state measurement records, the size of which scales linearly w.r.t. the system size and the number of measurements, along with the values of physical condition variables determining the evolution of quantum systems. Downstream tasks utilize a set of data generated from quantum systems of the same family, with additional system properties serving as labels for tasks like phase classification and correlation prediction.
4) We verify the superiority of our approach by empirical studies on two QPE tasks: classifying quantum phases of matter on the Rydberg atom model and predicting the two-body correlation function on the anisotropic Heisenberg model, given limited measurements on a resource-limited device.

Section: PRELIMINARIES OF QUANTUM STATE AND QUANTUM MEASUREMENT
We introduce basic concepts of quantum computing. Please refer to (Nielsen & Chuang, 2010) for more details. Details on related work are deferred to Appendix A.
Quantum State and Density Operator. The quantum bit, or qubit, is the basic unit of a quantum system. We call the ensemble of all qubits in a (sub)system the quantum state. A qubit is in superposition and becomes deterministic once a measurement is performed on it. How a quantum state is described mathematically depends on the chosen basis states. For example, using the two orthogonal computational basis states $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, one qubit can be described mathematically as a linear combination $|\phi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ in the space $\mathbb{C}^2$, where $\alpha, \beta \in \mathbb{C}$ are the amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$. An alternative formulation for describing the quantum state uses the density operator, or density matrix. For example, the density matrix of $|0\rangle$ is $\rho_0 = |0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, where $\langle 0|$ denotes the conjugate transpose of $|0\rangle$. A generic $L$-qubit quantum state can be described by the so-called wave function:
$$|\psi\rangle = \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} \Psi(\sigma_1, \ldots, \sigma_L)\, |\sigma_1, \ldots, \sigma_L\rangle, \tag{1}$$
where $\Psi : \mathbb{Z}^L \to \mathbb{C}$ maps a fixed configuration $\sigma = (\sigma_1, \ldots, \sigma_L)$ of $L$ qubits to a complex number satisfying $\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_L)|^2 = 1$, and $\sigma_i \in \{1, \ldots, M\}$ is one of the $M$ possible outcomes of performing quantum measurement on the $i$-th qubit. The wave function is formulated in a complex Hilbert space where the vector representation of the quantum state is $|\psi\rangle \in \mathbb{C}^{M^L}$ and its density matrix is $|\psi\rangle\langle\psi| \in \mathbb{C}^{M^L \times M^L}$, whose size becomes astronomical for large $L$.
Quantum Measurement. Measurement converts some of the quantum information into classical form (for further processing), as described by a set of measurement operators $\{O_m\}_{m=1}^{M}$ satisfying $\sum_m O_m = I$, where $M$ is the total number of operators. Measuring a qubit collapses the wave function and can yield different outcomes. The possible outcomes correspond to the indices $m$ of the measurement operators. Concretely, upon measuring a qubit in state $\rho$, the probability of getting the result $m$ is given by $p(m) = \mathrm{tr}(\rho O_m)$. For a quantum state with $L$ qubits, the common strategy is to measure each of the qubits in parallel (Leibfried et al., 1996;Jullien et al., 2014). According to the Born rule of quantum mechanics, such a measurement procedure outputs a measurement string $\sigma = (\sigma_1, \ldots, \sigma_L)$, where $\sigma_i \in \{1, \ldots, M\}$, with probability $|\Psi(\sigma_1, \ldots, \sigma_L)|^2$ as given in Eq. 1.
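As a minimal illustration of the two rules above, the following sketch evaluates p(m) = tr(ρ O_m) for a single qubit and then samples measurement strings from |Ψ|². The six-outcome operator set is built as O_m = (1/3)|s_m⟩⟨s_m| over the Pauli eigenstates, which is one common construction and an assumption here; the random wave function, the sizes, and the helper names `outcome_probs` and `sample_strings` are purely illustrative.

```python
import numpy as np

# Six single-qubit Pauli eigenstates (Z+, Z-, X+, X-, Y+, Y-).
s = 1 / np.sqrt(2)
states = [
    np.array([1, 0], dtype=complex),
    np.array([0, 1], dtype=complex),
    np.array([s, s], dtype=complex),
    np.array([s, -s], dtype=complex),
    np.array([s, 1j * s], dtype=complex),
    np.array([s, -1j * s], dtype=complex),
]
# Assumed operator set: O_m = (1/3)|s_m><s_m|, so that sum_m O_m = I.
povm = [np.outer(v, v.conj()) / 3 for v in states]


def outcome_probs(rho, operators):
    """Return p(m) = tr(rho O_m) for every operator in the set."""
    return np.array([np.trace(rho @ O).real for O in operators])


rho0 = np.outer(states[0], states[0].conj())  # density matrix of |0>
p_single = outcome_probs(rho0, povm)


def sample_strings(psi, K, L, M, rng):
    """Draw K strings sigma in {1..M}^L with probability |Psi(sigma)|^2."""
    probs = np.abs(psi) ** 2
    flat = rng.choice(probs.size, size=K, p=probs)
    # Decode each flat index into L base-M digits, then shift to {1..M}.
    digits = np.stack(np.unravel_index(flat, (M,) * L), axis=1)
    return digits + 1


rng = np.random.default_rng(0)
L, M = 3, 2  # toy sizes: 3 qubits, computational-basis outcomes
psi = rng.normal(size=M**L) + 1j * rng.normal(size=M**L)
psi /= np.linalg.norm(psi)  # normalize so the probabilities sum to 1
R = sample_strings(psi, K=5, L=L, M=M, rng=rng)
print(p_single, R.shape)
```

Sampling from the full vector of M^L probabilities is only feasible for small L, which is exactly the scaling issue noted above.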

Section: LLM4QPE
3.1 OVERVIEW
As shown in Fig. 1, our model involves two steps: pretraining and finetuning. For pretraining, the model is fed with the unlabeled dataset $D_p$ and undergoes fully unsupervised training. Subsequently, the pretrained parameters are transferred to the supervised finetuning phase, where all the parameters are updated using labeled data $D_t$ for various downstream tasks with their task-specific supervised losses. Finally, we evaluate our LLM4QPE using dataset $D_e$. Each downstream finetuning model possesses separate parameters, even though they initially share the same pretrained parameters. One of the most notable aspects of our model is the consistent structural similarity between pretraining and finetuning, with only a few small modifications when handling different downstream tasks.
The quantum data is described in Sec. 3.2. We draw an analogy between quantum data and text: each measurement outcome $\sigma_i$ of a qubit is analogous to a token, and the number of possible outcomes $M$ is analogous to the vocabulary size $|V|$. A measurement string $\sigma$, which resembles a sentence in text, is a projection of the entire quantum system with correlations among its entries. The collection of measurement records $R$, comprising many measurement strings from various physical conditions, is akin to a corpus gathered from various sources and genres. In fact, these analogies have also been mentioned implicitly in (Sharir et al., 2020;Hibat-Allah et al., 2020;Cha et al., 2021;Zhang & Di Ventra, 2023). Yet existing works are still confined to a single task for training and testing, involving no pretraining. Our model, in contrast, draws inspiration from LLMs to handle quantum data. Specifically, the data type and data collection strategies are described in Sec. 3.2 and more details can be found in Appendix B. Given the generated datasets, we first discuss how to pretrain LLM4QPE in an unsupervised manner in Sec. 3.3. Afterwards the pretrained parameters are updated towards a supervised loss for different tasks, as presented in Sec. 3.4.

Section: DESCRIPTION OF THE QUANTUM DATASET GENERATED FROM SIMULATION
We first provide the definition of the quantum dataset in Def. 1, in which the procedures of quantum dataset generation are provided. An easy-to-understand flowchart is also provided in Fig. 2.
Definition 1 (Quantum Dataset). The quantum dataset is described as $D = \{s_i\}$. Each sample $s_i = (R_i, c_i, p_i)$ contains the measurement records $R_i$, the physical condition variables $c_i$ and the (optional) system property variables $p_i$. Let $L$ denote the number of qubits, $K$ the number of copies of each quantum state and $M$ the number of possible outcomes of performing measurement on a single qubit. We explain their meaning in detail below.
Figure 2: Process of generating the quantum dataset. a) For each qubit of the quantum system, we perform quantum measurement using operators $\{O_m\}_{m=1}^{M}$ and obtain an integer outcome $m$ with probability $p(m)$. b) Consider the quantum system governed by different physical conditions. Quantum measurements are performed on an ensemble of identical quantum states evolved under each fixed physical condition. Measurement can be done in parallel for all the qubits of a single copy of the quantum state and outputs a measurement string. This process is applicable to existing digital and analog quantum computers. c) The collected data are structured and packed into a series of tensors, which can be efficiently stored on classical devices and are easy to process.
1) $c_i \in \mathbb{R}^C$ represents the physical condition variables controlling the evolution of the quantum system. These variables can be directly obtained when initializing quantum experiments. The types of the variables could be system size, coupling strength of Hamiltonians, etc.
2) The measurement records, denoted as $R_i \in \mathbb{Z}^{K \times L}$, are outcomes generated by quantum measurement. A quantum state is generated by evolving the system under a fixed physical condition specified by $c_i$. Afterwards quantum measurement is performed independently on each qubit in parallel using a set of measurement operators $\{O_m\}_{m=1}^{M}$. Performing measurement on $L$ qubits results in a measurement string, represented as $\sigma = (\sigma_1, \ldots, \sigma_L)$ where each $\sigma_l \in \{1, \ldots, M\}$. The measurement procedure above is repeated $K$ times, once for each copy of the quantum state. Finally, we collect the $K \times L$ measurement outcomes and store them in $R_i$.
3) (Optional) A system property $p_i \in \mathbb{R}^P$ represents statistics of the quantum system conditioned on $c_i$, such as the quantum phase, correlation function, entanglement entropy, purity, etc. The exact values of $p_i$ can be calculated by classical post-processing, analyzing either the wave functions or the measurement statistics. We treat these properties as supervised labels, which are used for finetuning the model.
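The sample structure in Def. 1 can be sketched as a small container. This is only an illustration under stated assumptions: the class name `QuantumSample`, its field names, and the `validate` helper are hypothetical, and the toy condition values carry no physical meaning.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class QuantumSample:
    """One sample s_i = (R_i, c_i, p_i); names are illustrative only."""

    records: np.ndarray                # R_i, shape (K, L), integers in {1..M}
    condition: np.ndarray              # c_i, shape (C,)
    prop: Optional[np.ndarray] = None  # p_i, shape (P,); absent in pretraining

    def validate(self, M: int) -> None:
        """Check the shapes and value ranges stated in Def. 1."""
        if self.records.ndim != 2:
            raise ValueError("records must have shape (K, L)")
        if self.records.min() < 1 or self.records.max() > M:
            raise ValueError("outcomes must lie in {1, ..., M}")
        if self.condition.ndim != 1:
            raise ValueError("condition must have shape (C,)")


rng = np.random.default_rng(0)
K, L, M = 8, 5, 2  # toy sizes
sample = QuantumSample(
    records=rng.integers(1, M + 1, size=(K, L)),   # stand-in for real outcomes
    condition=np.array([float(L), 1.5, 1.0, 2.0]),  # placeholder values
)
sample.validate(M)
print(sample.records.shape, sample.prop)
```

Storage grows as K × L integers per sample, in line with the linear scaling noted in the contributions.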
It should be mentioned that the process of quantum dataset generation above is close to that of Wang et al. (2022). The difference is that LLM4QPE requires additional ground-truth labels of system properties for finetuning, whereas Wang et al. (2022) propose to reconstruct the quantum state by unsupervised learning on measurement records, after which classical shadow (Huang et al., 2020) is required to predict specific quantum properties. That two-step strategy often introduces additional overhead. Furthermore, the parameters in LLM4QPE are optimized directly for the corresponding objectives, such as quantum phases of matter and the correlation function, which often leads to superior performance in our numerical results.

Section: UNSUPERVISED PRETRAINING
Previous studies (Czischek et al., 2022;Zhang & Di Ventra, 2023) consider pretraining as a warmup process that finds a suitable initialization for the model's parameters, and then finetune the model on the specific system with the same learning objective as pretraining. In contrast, LLM4QPE regards pretraining as the avenue to master the quantum intricacies across different systems of the same family. The pretrained parameters can be transferred to various downstream tasks. LLM4QPE is pretrained in a fully unsupervised manner, as illustrated in Fig. 1b.
Quantum Data for Pretraining. The quantum dataset $D_p = \{(R_i, c_i)\}_{i=1}^{N_p}$ used for pretraining is constructed using the strategy discussed in Sec. 3.2. Here we discuss how to reorganize the data for LLM4QPE's unsupervised pretraining. Let $K_p$ be the number of measurement strings used for pretraining. We stack all the input measurement records $\{R_i\}_{i=1}^{N_p}$ along the first dimension to obtain $E_{\text{in}} \in \mathbb{Z}^{N_p K_p \times L}$, where each row is a measurement string $\sigma_b \in \mathbb{Z}^L$. We also construct the matrix $C_{\text{in}} \in \mathbb{R}^{N_p K_p \times C}$, where each row holds the values of the physical condition variables $c_b \in \mathbb{R}^C$. For both the Rydberg atom model and the anisotropic Heisenberg model, we fix $N_p = 100$ and $K_p = 1024$. For each training iteration, we randomly sample $B_p$ rows of $E_{\text{in}}$ and $C_{\text{in}}$, such that the input of the model is $\{(\sigma_b, c_b) \mid \sigma_b \in E_{\text{in}}, c_b \in C_{\text{in}}\}_{b=1}^{B_p}$ with batch size $B_p$.
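The stacking and batching step can be sketched as follows. The toy sizes, the random stand-in data, and the helper name `sample_batch` are assumptions; in particular, we assume the same row indices are drawn from both matrices so that each string stays paired with its condition.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes; the text above uses N_p = 100 and K_p = 1024.
N_p, K_p, L, C, M = 4, 8, 5, 3, 2

records = rng.integers(1, M + 1, size=(N_p, K_p, L))  # stand-in for {R_i}
conditions = rng.normal(size=(N_p, C))                # stand-in for {c_i}

# Stack records along the first dimension: one measurement string per row.
E_in = records.reshape(N_p * K_p, L)
# Repeat each condition K_p times so row b of C_in matches row b of E_in.
C_in = np.repeat(conditions, K_p, axis=0)


def sample_batch(E_in, C_in, B_p, rng):
    """Draw B_p aligned rows (sigma_b, c_b) without replacement."""
    idx = rng.choice(E_in.shape[0], size=B_p, replace=False)
    return E_in[idx], C_in[idx]


sigma_b, c_b = sample_batch(E_in, C_in, B_p=6, rng=rng)
print(sigma_b.shape, c_b.shape)
```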
Input Embeddings. As shown in Fig. 1a, we consider three types of embeddings as input to capture the hidden patterns of the quantum system: token embeddings, condition embeddings and position embeddings. Since each element of the measurement string $\sigma_b$ is a discrete integer $\sigma \in \{1, \ldots, M\}$, which resembles a token in NLP, we use learned embeddings to convert the measurement string $\sigma_b$, together with an additional start token $s$, into the token embeddings $E_t \in \mathbb{R}^{B_p \times (L+1) \times d}$, where $d$ is the feature dimension. We empirically find that encoding the physical condition into the model can further improve the performance. A Feed-Forward Network (FFN) with one hidden layer is used to embed the physical condition $c_b$ into the feature vector $E_c \in \mathbb{R}^{B_p \times d}$. It is treated as a sentence-level embedding which is added to all of the $L$ measurement tokens, and we call it the global embedding. Subsequently, the input embeddings are the (broadcasting) summation $E_{\text{out}} = E_t + E_c + E_p$, where $E_p$ denotes the positional embeddings, the same as in (Vaswani et al., 2017). $E_{\text{out}}$ is then processed by deeper layers, as discussed below.
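The broadcasting summation can be illustrated with a small NumPy sketch. Several details are assumptions rather than the paper's specification: the start token is given index 0, the FFN uses a ReLU hidden layer, the positional embeddings are taken to be the sinusoidal ones, the condition embedding is broadcast over all L+1 positions including the start token, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
B_p, L, M, C, d, hidden = 4, 5, 2, 3, 8, 16  # toy sizes, d must be even

sigma = rng.integers(1, M + 1, size=(B_p, L))  # batch of measurement strings
c = rng.normal(size=(B_p, C))                  # matching physical conditions

# Token embeddings: index 0 is the start token s, 1..M are the outcomes.
table = rng.normal(size=(M + 1, d))
tokens = np.concatenate([np.zeros((B_p, 1), dtype=int), sigma], axis=1)
E_t = table[tokens]                            # (B_p, L+1, d)

# Condition embedding: FFN with one hidden layer (ReLU is an assumption).
W1, b1 = rng.normal(size=(C, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, d)), np.zeros(d)
E_c = np.maximum(c @ W1 + b1, 0.0) @ W2 + b2   # (B_p, d)


def sinusoidal_positions(n_pos, d):
    """Sinusoidal positional table of shape (n_pos, d)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / (10000.0 ** (i / d))
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


E_p = sinusoidal_positions(L + 1, d)           # (L+1, d)

# Broadcast E_c over positions and E_p over the batch.
E_out = E_t + E_c[:, None, :] + E_p[None, :, :]
print(E_out.shape)
```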
Model Architecture. As depicted in Fig. 1b, the main part of LLM4QPE is a multi-layer transformer decoder which originates from (Vaswani et al., 2017). The input is the embedding $E_{\text{out}}$ and the output is $H \in \mathbb{R}^{B_p \times (L+1) \times d}$, which contains high-order representations of all the measurement strings and the condition variables in a batch. Please refer to (Vaswani et al., 2017) for more details on the transformer. For pretraining, given a fixed qubit configuration $\sigma = (\sigma_1, \ldots, \sigma_L)$, LLM4QPE attempts to approximate the classical distribution $p(\sigma_1, \ldots, \sigma_L) = |\Psi(\sigma_1, \ldots, \sigma_L)|^2$ in Eq. 1. This joint distribution is approximated by factorizing it into a product of conditional probabilities:
$$p(\sigma_1, \ldots, \sigma_L \mid c) = \prod_{l=1}^{L} p(\sigma_l \mid \sigma_{l-1}, \ldots, \sigma_1, c). \tag{2}$$
The parameters are optimized by minimizing the average negative log-likelihood loss:
$$\mathcal{L}_{\text{unsup}} = \frac{1}{B_p} \sum_{(\sigma, c) \in D_p} -\log p(\sigma_1, \ldots, \sigma_L \mid c), \tag{3}$$
which corresponds to the maximization of (conditional) likelihoods concerning the observed measurement outcomes. Pretraining is entirely unsupervised, enabling the model to be trained on extensive quantum data that encompass a wide range of physical conditions. To maintain physical validity, which requires the output distribution to be normalized, the last layer is fixed to be a linear projection with a softmax activation function, such that the output distribution satisfies $\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} p(\sigma_1, \ldots, \sigma_L) = 1$ (see Appendix C for proof).
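The normalization property can be checked numerically on a toy autoregressive model. The sketch below is not the transformer decoder: the conditional is a hypothetical linear map of the prefix followed by a softmax, the condition c is omitted for brevity, and the parameters `W` and `b` are random placeholders. It only illustrates that softmax conditionals chained as in Eq. 2 sum to one over all M^L strings, and how the negative log-likelihood of Eq. 3 is evaluated on a batch.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
L, M = 3, 2
W = rng.normal(size=(L, L, M))  # toy parameters (hypothetical)
b = rng.normal(size=(L, M))


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def conditional(prefix, l):
    """p(sigma_l | sigma_<l) as a softmax over the M outcomes."""
    feats = np.zeros(L)
    if len(prefix) > 0:
        feats[: len(prefix)] = np.asarray(prefix, dtype=float)
    return softmax(feats @ W[l] + b[l])


def log_prob(sigma):
    """log p(sigma) via the chain-rule factorization of Eq. 2."""
    return sum(
        np.log(conditional(sigma[:l], l)[sigma[l] - 1]) for l in range(L)
    )


# Sum the joint probability over every string in {1..M}^L.
all_strings = itertools.product(range(1, M + 1), repeat=L)
total = sum(np.exp(log_prob(s)) for s in all_strings)

# Average negative log-likelihood on a small batch of observed strings.
batch = [(1, 2, 1), (2, 2, 1)]
nll = -np.mean([log_prob(s) for s in batch])
print(total, nll)
```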

Section: SUPERVISED FINETUNING
The self-attention mechanism in the transformer allows LLM4QPE to model a wide range of downstream tasks, whether it involves classifying quantum phases of matter or predicting the entanglement entropy of quantum states. This adaptability is achieved simply by replacing the relevant inputs and outputs as needed. Unlike the two-step model (Wang et al., 2022), which uses the pretrained model to generate new measurement records conditioned on the physical variables and then predicts quantum properties based on classical shadow (Huang et al., 2020), LLM4QPE is an end-to-end task-agnostic pretrained model that provides property estimation for the quantum system.
Quantum Data for Finetuning and Input Embeddings. The dataset $D_f = \{((R_j, c_j), p_j)\}_{j=1}^{N_f}$ is generated using a random seed different from the seed used for generating $D_p$. We then split $D_f$ to construct the train/test datasets $D_t$/$D_e$. It is ensured that the physical conditions sampled for pretraining do not appear in finetuning, i.e., $c_j \notin \{c_i\}$ for $j \in \{1, \ldots, N_f\}$. Note that the physical conditions for finetuning are sampled from the same distribution as those for pretraining. The details of the data collection can be found in Appendix B. Unlike pretraining, where the input is a sentence-level vector $\sigma_b \in \mathbb{Z}^L$, the input of finetuning becomes a batch of measurement records $X_j \in \mathbb{Z}^{L \times K_f}$, where $K_f$ is the number of measurement strings. The reason for this change can be explained from both an intuitive and a formal perspective. Intuitively, a single measurement string cannot reflect the whole picture of the quantum system. Formally, predicting the properties of a quantum system on classical computers generally requires a number of measurements that is exponential in the system size $L$ (Gebhart et al., 2023). Even for some quantum systems with low entanglement, the number still grows quasi-polynomially with $L$ (Huang et al., 2022). Accordingly, the input of the model is replaced with $\{((X_j, c_j), p_j)\}_{j=1}^{B_t}$, where the tuple $(X_j, c_j)$ is the input, $p_j$ is the corresponding label and $B_t$ is the batch size used for supervised finetuning. The embedding is also distinct from that of pretraining. The learned token embeddings for a single measurement string $\sigma_i$ are not applicable to the batch-style records $X_j$. To deal with this, a Long Short-Term Memory (LSTM) layer is attached in front of the decoder, as depicted in Fig. 1c. The LSTM layer converts the discrete measurement records $X_j$ into high-order embeddings $E_{\text{rnn}} \in \mathbb{R}^{B_t \times L \times d}$. The additional embeddings, including the physical condition embeddings and positional embeddings, are transferred from pretraining. The output embedding is the summation
$$E_{\text{out}} = E_{\text{rnn}} + \underbrace{E_c + E_p}_{\text{transferred}}.$$
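One plausible reading of the LSTM embedding is sketched below: the LSTM runs along the qubit axis, and at each site it receives the K_f outcomes recorded there. This reading, the single-layer cell, the gate ordering, and all weights are assumptions made for illustration; the paper's exact layer configuration is not specified in this section.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_embed(X, Wx, Wh, b):
    """Run a single-layer LSTM over the L axis.

    X: (B, L, K) float array of measurement records.
    Returns the hidden states, shape (B, L, d).
    """
    B, L, _ = X.shape
    d = Wh.shape[0]
    h = np.zeros((B, d))
    c = np.zeros((B, d))
    outputs = []
    for l in range(L):
        gates = X[:, l, :] @ Wx + h @ Wh + b  # (B, 4d)
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1)


rng = np.random.default_rng(0)
B_t, L, K_f, d, M = 2, 5, 8, 4, 2  # toy sizes
X = rng.integers(1, M + 1, size=(B_t, L, K_f)).astype(float)
Wx = rng.normal(scale=0.1, size=(K_f, 4 * d))  # placeholder weights
Wh = rng.normal(scale=0.1, size=(d, 4 * d))
b = np.zeros(4 * d)

E_rnn = lstm_embed(X, Wx, Wh, b)
print(E_rnn.shape)
```

Because the output has one d-dimensional vector per site, it has the same shape as the per-token embeddings and can be summed with the transferred condition and positional embeddings.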
Feature Aggregation and Output Projection. The output of the L-layer transformer decoder is $H \in \mathbb{R}^{B_t \times L \times d}$. For a specific downstream task, the decoder is initialized with the pretrained parameters and all the parameters are finetuned towards a supervised loss. To obtain the feature representation for each of the $B_t$ training samples, a feature aggregation layer is attached after the last multi-head attention layer. This layer aggregates the hidden feature $H$ along the second axis and outputs $H' \in \mathbb{R}^{B_t \times d}$. Finally, an additional linear projection layer is employed to project the feature into $H'' \in \mathbb{R}^{B_t \times P}$, followed by a task-dependent activation function: tanh for predicting the correlation function, since we have the prior that each element of the label $p_j$ is in the range $[-1, 1]$ (see Appendix B for details), and log-softmax for classifying quantum phases of matter.
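A minimal sketch of the aggregation and projection step follows. The aggregation operator is not specified above, so mean pooling over the second axis is used here as an assumption; the function name `head`, the task labels, and the random weights are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
B_t, L, d, P = 4, 5, 8, 3  # toy sizes
H = rng.normal(size=(B_t, L, d))          # stand-in for the decoder output
W, b = rng.normal(size=(d, P)), np.zeros(P)


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def head(H, W, b, task):
    """Aggregate over the second axis, project to P outputs, then activate."""
    H1 = H.mean(axis=1)   # H' in R^{B_t x d}; mean pooling is an assumption
    H2 = H1 @ W + b       # H'' in R^{B_t x P}
    if task == "correlation":
        return np.tanh(H2)      # outputs constrained to [-1, 1]
    if task == "phase":
        return log_softmax(H2)  # log-probabilities over P classes
    raise ValueError(f"unknown task: {task}")


corr_out = head(H, W, b, "correlation")
phase_out = head(H, W, b, "phase")
print(corr_out.shape, phase_out.shape)
```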
Learning Objective. Property estimation for the quantum system is treated as a supervised learning task. Two types of tasks are considered in this paper: classifying quantum phases of matter and predicting the correlation function. The former is a classification task, while the latter is a regression task. For each supervised task, we maintain a consistent architecture within LLM4QPE. We integrate task-specific inputs and ground-truth labels into LLM4QPE and finetune all of the model's parameters in an end-to-end manner. The training samples are $\{((X_j, c_j), p_j)\}_{j=1}^{B_t}$, where $B_t$ is the batch size. For classifying quantum phases of matter, $p_j$ is the one-hot label. We minimize the negative log-likelihood of the observed data, which yields a supervised loss for classification (with $P$ classes):
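$$\mathcal{L}_{\text{sup}} = -\frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \mathbb{I}[p_{j,u} = 1] \log f_\theta(X_j, c_j)_u, \tag{4}$$
where $\mathbb{I}[\cdot]$ is an indicator function, $N_t$ is the size of the training dataset and $f_\theta(\cdot)$ denotes the prediction of the model with parameters $\theta$ to be optimized. For predicting the correlation, $p_j$ is the continuous-valued label. We adopt the Root Mean Square Error (RMSE) loss:
$$\mathcal{L}_{\text{sup}} = \sqrt{\tilde{\mathcal{L}}_{\text{sup}}}, \quad \tilde{\mathcal{L}}_{\text{sup}} = \frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \big( f_\theta(X_j, c_j)_u - p_{j,u} \big)^2. \tag{5}$$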
Detailed description of task-specific finetuning can be found in the experiment section.
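The two supervised losses (Eq. 4 and Eq. 5) can be written in a few lines of NumPy. This is a sketch over a single batch: the function names are illustrative, the model output is assumed to be log-probabilities for classification, and the sums are taken over the batch rows only.

```python
import numpy as np


def classification_loss(log_probs, onehot):
    """Eq. 4: negative log-likelihood of the true class, averaged over the batch."""
    return -np.sum(onehot * log_probs) / log_probs.shape[0]


def rmse_loss(pred, target):
    """Eq. 5: root of the batch-averaged squared error summed over the P outputs."""
    return np.sqrt(np.sum((pred - target) ** 2) / pred.shape[0])


# Classification example with P = 3 classes and a batch of 2.
log_probs = np.log(np.array([[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]))
onehot = np.array([[1, 0, 0], [0, 1, 0]])
ce = classification_loss(log_probs, onehot)

# Regression example with P = 2 outputs and a batch of 2.
pred = np.array([[0.0, 0.0], [1.0, 1.0]])
target = np.array([[0.0, 1.0], [1.0, 0.0]])
rmse = rmse_loss(pred, target)
print(ce, rmse)
```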

Section: EXPERIMENTS
In this section, we present the finetuning results on two quantum property estimation tasks: classifying quantum phases of matter and predicting the correlation function. Two families of quantum models are considered: the Rydberg atom model (Bernien et al., 2017) and the anisotropic Heisenberg model (Kranzl et al., 2023).
As baseline methods, we first consider the classical shadow (Huang et al., 2020), a learning-free protocol for constructing the representation of an unknown quantum state. In addition, we compare with kernel methods including the Radial Basis Function (RBF) kernel (Huang et al., 2022) and the Neural Tangent Kernel (NTK) (Huang et al., 2022). We further consider advanced deep learning based methods, such as PixelCNN (Sharir et al., 2020) and a classical shadow based generative model (NN-shadow) (Wang et al., 2022), for comparison.

Section: CLASSIFYING QUANTUM PHASES OF MATTER ON RYDBERG ATOM MODEL
We first consider the Rydberg atom model with different system sizes $L \in \{19, 25, 31\}$. We pretrain LLM4QPE for each system size separately with a fixed number of sampled physical conditions $N_p = 100$. Each physical condition variable $c_i$ is a 4-dimensional vector denoted as $[L_i, \Delta_i, \Omega_i, (R_0/a)_i]^\top$, where $\Delta$ is the detuning of a laser, $\Omega$ is the Rabi frequency and $R_0/a$ is the interaction range. The values of these four variables can be obtained directly when initializing the (simulated) quantum experiments. For each physical condition we generate $K_f$ measurement strings based on computational basis measurement operators, such that the total number of possible measurement outcomes is $M = 2$. Then LLM4QPE is pretrained with dataset $D_p$. The pretrained parameters are transferred to finetune the model using $D_t$, where the number of sampled physical conditions is $N_t \in \{25, 64, 100\}$ and the number of measurement strings is $K_f \in \{64, 128, 256, 512, 1024\}$.
We fix the size of $D_e$ for evaluation to be $N_e = 10000$. Following (Bernien et al., 2017), we consider three categories of quantum phase, i.e., Disorder, $\mathbb{Z}_2$ and $\mathbb{Z}_3$, to establish the label $p_j$, which is a 3-dimensional one-hot vector. More details about the data generation can be found in Appendix B.
We also evaluate LLM4QPE without pretraining: all the parameters are initialized randomly from a uniform distribution on $[-1, 1]$. We use accuracy and weighted F1 score as metrics for the 3-class classification when evaluating our models and the baselines. The results are listed in Tab. 1, and LLM4QPE achieves the best mean accuracy except for one setting, $L = 31$ with $N_t = 25$. Fig. 3 shows the performance for varied $K_f$. LLM4QPE achieves the best weighted F1 score across all systems and, in particular, outperforms the baselines by a large margin when $K_f = 64$. The results indicate that pretrained LLM4QPE can handle inputs when only a small number of measurement records are available, which is valuable given the expensive and time-consuming (simulated) quantum experiments. We further plot the training dynamics of LLM4QPE with and without pretraining throughout the training epochs in Fig. 4. The curves indicate that pretraining enables much faster convergence of the supervised loss and achieves better finetuning accuracy. Meanwhile, the number of epochs required for the model to attain 90% of its peak weighted F1 score is provided in Fig. 5. It shows that, for the same system size $L$, the pretrained LLM4QPE converges faster than the non-pretrained version, with a lower training error and a higher test weighted F1 score.
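For reference, the two reported metrics can be computed as in the sketch below, which uses the standard definition of the weighted F1 score (per-class F1 weighted by the class support in the ground truth). The function names and the toy labels are illustrative; the class indices 0, 1, 2 are a hypothetical stand-in for the three phases.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean(y_true == y_pred))


def weighted_f1(y_true, y_pred, n_classes):
    """Per-class F1 averaged with weights equal to the true class counts."""
    total = 0.0
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        total += f1 * np.sum(y_true == k)
    return float(total / len(y_true))


# Toy labels for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred, n_classes=3)
print(acc, f1)
```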

Section: PREDICTING CORRELATION FUNCTION ON ANISOTROPIC HEISENBERG MODEL
Next we consider a regression task: predicting the correlation function on the anisotropic Heisenberg model. This quantum model features long-range interactions between every pair of quantum sites, leading to complex dynamics that are hard to simulate on classical computers (Orús, 2019). We restrict the system size to $L \in \{8, 10, 12\}$ due to memory limitations. The ground states of quantum systems with different physical conditions are calculated by eigenvalue decomposition. For each physical condition we generate $K_f$ measurement strings based on Pauli-6 measurement operators, such that $M = 6$. Then we pretrain LLM4QPE for different system sizes independently with training size $N_p = 100$.
For finetuning, we vary the number of generated training samples $N_t \in \{20, 50, 90\}$ and fix the number of measurement strings to $K_f = 64$. The evaluation dataset is generated with $N_e = 200$. To obtain the ground-truth labels, we calculate the true values of the two-body correlation functions and collect them as the supervised labels; each label is an $L \times L$ continuous-valued matrix whose entries lie in the range [-1, 1]. The RMSE results are reported in Tab. 2. LLM4QPE outperforms the baselines in all settings. The learning-based baselines often fail to surpass the predictive accuracy of the learning-free classical shadow, whereas our pretrained LLM4QPE stands out by a clear margin.
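As a small illustration of the reported metric, the RMSE between predicted and true correlation matrices can be computed as below. The averaging convention (a single mean over all evaluation samples and all $L \times L$ entries) is an assumption, since the text does not spell it out, and the function name is hypothetical.

```python
import numpy as np

def correlation_rmse(pred, target):
    """Root-mean-square error over all samples and all matrix entries.

    pred, target: arrays of shape (num_samples, L, L) with entries in [-1, 1].
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy example with one sample and L = 2: predicting all zeros for an identity
# correlation matrix gives two unit errors out of four entries.
target = np.eye(2)[None, :, :]
pred = np.zeros((1, 2, 2))
print(correlation_rmse(pred, target))
```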
Finally, we study the effects of the condition embedding and the LSTM embedding on both the Rydberg atom model and the anisotropic Heisenberg model. For the LSTM ablation, we replace the LSTM with a fully connected layer with the same input/output dimensions. The results are given in Tab. 3. They consistently show that both embedding techniques have a positive effect, suggesting that both help the model leverage useful information from the input quantum data.

Section: CONCLUSION AND OUTLOOK
This paper proposes a task-agnostic unsupervised pretraining approach for estimating the properties of quantum systems from quantum datasets. The core of our approach is a transformer encoder that learns useful hidden information in a fully unsupervised pretraining procedure.
The pretrained parameters can be transferred to downstream tasks, leading to more effective classification of quantum phases and prediction of correlation functions on a resource-limited device given limited measurement information.

Section: A RELATED WORK

Section: A.1 LEARNING-FREE METHODS FOR QPE
Estimating the properties of a quantum system is a long-standing problem in quantum physics (D'Ariano et al., 2003). The main challenge is that the complexity of describing a quantum system on classical computers typically scales exponentially with the system size (Nielsen & Chuang, 2010). In practice, however, the quantum systems studied in physical experiments can generally be described by a limited number of physical variables. This restriction means that the studied quantum systems occupy only a small part of the exponentially large Hilbert space (Carrasquilla et al., 2019), so they can be characterized by classical methods within an acceptable error.
Traditional algorithms, including QMC (Ceperley & Alder, 1986) and DFT (Hohenberg & Kohn, 1964), have been successful in investigating the electronic structure (or nuclear structure), principally the ground state, of many-body systems such as atoms, molecules, and condensed phases (Gubernatis et al., 2016). However, these methods have scalability issues and are difficult to apply to large-scale quantum many-body problems. An alternative is the class of TN methods (Orús, 2019), which are based on the variational method and show unprecedented performance in analyzing the characteristics of ground states. These methods include Matrix Product States (MPS) (Perez-Garcia et al., 2006) and Projected Entangled Pair States (PEPS) (Corboz, 2016). This class of methods approximates the wave function by decomposing the high-order wave-function tensor into multiple low-rank tensors. It is then possible to analyze properties of the quantum state by performing algebraic operations on the wave function. More recently, the classical shadow protocol (Huang et al., 2020) proposes using random measurements to characterize quantum properties. Classical shadows have facilitated applications such as direct fidelity estimation (Struchalin et al., 2021) and state function prediction (Zhang et al., 2021).
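The idea of decomposing a high-order wave-function tensor into low-rank tensors can be illustrated with a textbook construction: a state vector is split into an MPS by successive singular value decompositions. The sketch below is a generic illustration under that assumption, not the algorithm of any specific work cited above; the function names and the optional `max_bond` truncation parameter are hypothetical.

```python
import numpy as np

def state_to_mps(psi, num_sites, max_bond=None):
    """Split a vector of 2**num_sites amplitudes into MPS tensors of shape (left, 2, right)."""
    tensors = []
    remainder = np.asarray(psi, dtype=complex).reshape(1, -1)
    for _ in range(num_sites - 1):
        left = remainder.shape[0]
        # Group (left bond, current site) as rows and all remaining sites as columns.
        remainder = remainder.reshape(left * 2, -1)
        u, s, vh = np.linalg.svd(remainder, full_matrices=False)
        keep = len(s) if max_bond is None else min(max_bond, len(s))
        tensors.append(u[:, :keep].reshape(left, 2, keep))
        # Carry the singular values to the right for the next split.
        remainder = s[:keep, None] * vh[:keep, :]
    tensors.append(remainder.reshape(remainder.shape[0], 2, 1))
    return tensors

def mps_to_state(tensors):
    """Contract the MPS tensors back into a full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)

# A 3-qubit GHZ state is represented exactly with bond dimension 2.
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)
mps = state_to_mps(ghz, num_sites=3, max_bond=2)
print([t.shape for t in mps], np.allclose(mps_to_state(mps), ghz))
```

Keeping the bond dimension small is what makes such representations tractable: the number of stored parameters grows linearly with the number of sites rather than exponentially, at the cost of an approximation error for highly entangled states.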

Section: A.2 LEARNING-BASED METHODS FOR QPE
With the continuous development of machine learning, neural-network-based methods have emerged to tackle QPE problems. These methods can be categorized into two classes according to their purpose. Methods of the first class (Carleo & Troyer, 2017; Gao & Duan, 2017; Torlai et al., 2018; Schütt et al., 2019; Hibat-Allah et al., 2020; Zhang & Di Ventra, 2023) are called Neural Network Quantum States (NNQS); they replace the tensors used in TNs with a neural network that serves as a parametric function approximator of quantum many-body wave functions. The parameterized wave function is updated by minimizing the expectation values of relevant observable estimators, based on either the density matrix renormalization group (DMRG) algorithm (White, 1992) or variational Monte Carlo (VMC) (McMillan, 1965). The properties of interest can then be analyzed by performing algebraic operations on the wave function. Another line of research (Gilmer et al., 2017; Kawai & Nakagawa, 2020; Xiao et al., 2022) is known as Neural Network Quantum Property Estimation (NNQPE). NNQPE directly optimizes the parameters towards a specific learning objective that represents a certain property of quantum systems, such as the quantum phase.
For both NNQS and NNQPE, different neural network ansätze are suited to quantum many-body problems with different physical structures. Examples include the restricted Boltzmann machine (RBM) (Carleo & Troyer, 2017), recurrent neural networks (RNNs) (Carrasquilla et al., 2019), convolutional neural networks (CNNs) (Wu et al., 2019; Sharir et al., 2020; Wu et al., 2023), and transformers (Cha et al., 2021; Wang et al., 2022; Zhang & Di Ventra, 2023; Du et al., 2023).
Our work is closely related to NNQPE, but it employs unsupervised pretraining to extract hidden information from quantum systems governed by different parameters. We find empirically that this scheme makes the model perform better given a limited number of copies of quantum states and measurements. The recent work by Zhu et al. (2022) implements a similar pretraining strategy for learning quantum states; our approach differs in that it avoids assuming prior knowledge of the frequencies of the measurement strings.

Section: B DETAILS OF THE QUANTUM DATASET GENERATION
A quantum dataset is a collection of data that describes quantum systems and their evolution. The collection of quantum data must take into account the following factors: 1) the method of data collection must be feasible on quantum devices and must not contradict the principles of quantum mechanics; 2) the process of data collection must be completely automated and must not require experienced experts to organize and label the data; and 3) the data must be structured and storable on resource-limited classical devices, so that it can be processed easily by machine learning techniques without further post-processing. The quantum dataset we establish satisfies these three requirements. It is also worth mentioning that our model can serve as a centralized infrastructure that processes all of these data uniformly, thanks to its unsupervised pretraining design.
In this paper, we conduct simulated experiments to generate the quantum dataset on classical computers. For the anisotropic Heisenberg model, quantum measurement is performed using the Pauli-6 measurement operators, such that $M = 6$, whereas computational-basis measurement operators are employed for the Rydberg atom model, leading to $M = 2$. Assume that the variables $c_i$ describing the physical condition live in a finite continuous space $F$ within the physical restrictions. Let $D_p = \{R_i, c_i\}_{i=1}^{N_p}$ denote the quantum dataset used for pre-training and $D_f = \{(R_i, c_i), p_i\}_{i=1}^{N_f}$ the dataset used for fine-tuning, where $|D_p| = N_p$ and $|D_f| = N_f$. For pre-training the model, we first uniformly sample a number of points $\{c_i \mid c_i \in F\}_{i=1}^{N_p}$. Afterwards we conduct simulated experiments for each $c_i$ and collect the corresponding measurement records. The system property $p_i$ is not needed since the pre-training phase is fully unsupervised. For fine-tuning, we initialize the experiments with another random seed and sample $N_f$ physical conditions, also within the space $F$, resulting in $\{c_j \mid c_j \in F\}_{j=1}^{N_f}$. We also collect the measurement records for each $c_j$. The difference is that we additionally calculate the system property $p_j$ and use it as the supervised label. We further split $D_f$ into $D_t$ and $D_e$ for training and evaluation, respectively, with varied split ratios. Details of the hyper-parameters and the experimental configurations of the dataset generation are discussed below.
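The sampling and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names, the fine-tuning set size of 200 and the 50/50 split are hypothetical, and the space $F$ is taken to be a simple box whose bounds reuse the $\Omega$ and $\Delta$ ranges quoted for the Rydberg atom model. The simulation that turns each condition into measurement records is omitted.

```python
import numpy as np

def sample_conditions(num_conditions, low, high, rng):
    """Uniformly sample physical conditions c_i from the box F = [low, high]."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return rng.uniform(low, high, size=(num_conditions, low.size))

def split_indices(num_items, train_fraction, rng):
    """Randomly split item indices into a training part (D_t) and an evaluation part (D_e)."""
    perm = rng.permutation(num_items)
    cut = int(round(train_fraction * num_items))
    return perm[:cut], perm[cut:]

# Assumed box F over (Omega, Delta).
low, high = [0.0, -10.0], [5.0, 15.0]

# Pre-training: conditions only; no property labels p_i are needed.
c_pre = sample_conditions(100, low, high, np.random.default_rng(0))

# Fine-tuning: a different random seed, same space F; a label p_j would be
# computed for every sampled condition before splitting.
fine_rng = np.random.default_rng(1)
c_fine = sample_conditions(200, low, high, fine_rng)
train_idx, eval_idx = split_indices(len(c_fine), train_fraction=0.5, rng=fine_rng)
print(c_pre.shape, len(train_idx), len(eval_idx))
```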

Section: B.1 RYDBERG ATOM MODEL
The Rydberg atom model describes a programmable quantum simulator capable of preparing interacting qubit systems (Bernien et al., 2017). Each atom in this model can be effectively described as a two-level quantum system consisting of the ground state $|g\rangle$ ($|0\rangle$) and the Rydberg state $|r\rangle$ ($|1\rangle$). The quantum dynamics of this model are governed by the Hamiltonian
$$H_{\mathrm{Rydberg}} = \sum_i \frac{\Omega}{2}\sigma_i^x - \sum_i \Delta n_i + \sum_{i<j} \frac{V_0}{|\vec{x}_i - \vec{x}_j|^6} n_i n_j \tag{6}$$
where $\sigma^x$ is the Pauli-X matrix, $\Omega$ is the Rabi frequency, $\Delta$ is the detuning of the laser, $V_0$ is the Rydberg interaction constant, $i$ and $j$ index the sites, and $\vec{x}_i$ is the position vector of site $i$. $n_i = |r_i\rangle\langle r_i|$ is the occupation number operator at site $i$, and $\sigma_i^x = |g_i\rangle\langle r_i| + |r_i\rangle\langle g_i|$ describes the coupling between the ground state $|g_i\rangle$ and the Rydberg state $|r_i\rangle$ at position $i$.
We follow the recent work of Wang et al. (2022) to generate the quantum dataset and refer readers to their paper for details; here we briefly introduce the main procedure. We consider the Rydberg atom model with system size $L \in \{19, 25, 31\}$. We fix the interaction constant $V_0 = 862690 \times 2\pi$ MHz µm$^6$ and vary $\Omega \in [0, 5]$ and $\Delta \in [-10, 15]$ to obtain different physical conditions $c$, where $c$ is a 4-dimensional vector of the form $[L, \Delta, \Omega, R_0/a]$ and $R_0/a$ denotes the interaction range with $R_0 = (V_0/\Omega)^{1/6}$. The approximate ground state for each physical condition is then prepared with the tool Bloqade.jl (Bloqade.jl, 2023). This tool can also output the measurement strings and the true phase for each physical condition. The measurement operators are chosen to be the computational basis $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$, such that the total number of possible outcomes is $M = 2$. In this paper, three different phases are considered: the disordered phase, the $Z_2$-ordered phase, and the $Z_3$-ordered phase. We sample $N_p = 100$ physical conditions with $K_p = 1024$ measurement strings for pre-training, and $N_t \in \{25, 64, 100\}$ physical conditions with $K_f \in \{64, 128, 256, 512, 1024\}$ measurement strings for fine-tuning. The number of physical conditions for evaluation is fixed to $N_e = 10000$. The supervised labels for fine-tuning are one-hot encoded vectors of the true phases, such that the dimension (number of classes) of $p$ is 3. Note that we ensure that the physical conditions sampled for pre-training do not appear in fine-tuning.
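To make the structure of the Rydberg Hamiltonian and of the computational-basis measurement records concrete, the following sketch builds the Hamiltonian for a short one-dimensional chain by exact diagonalization and samples bitstrings from its ground state. It is a toy illustration only: the dataset itself is produced with Bloqade.jl for much larger systems, and the function names, the 4-atom chain, the lattice spacing `a = 6.0` and the specific values of $\Omega$ and $\Delta$ are assumptions made for this example.

```python
import numpy as np

def rydberg_hamiltonian(positions, omega, delta, v0):
    """Dense Rydberg Hamiltonian for atoms at the given 1D positions."""
    n = len(positions)
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])  # |g><r| + |r><g|
    nr = np.array([[0.0, 0.0], [0.0, 1.0]])  # occupation |r><r| with |r> = |1>
    eye = np.eye(2)

    def embed(op, site):
        """Place a single-site operator at `site` and identities elsewhere."""
        out = np.array([[1.0]])
        for k in range(n):
            out = np.kron(out, op if k == site else eye)
        return out

    occ = [embed(nr, i) for i in range(n)]
    h = np.zeros((2 ** n, 2 ** n))
    for i in range(n):
        h += 0.5 * omega * embed(sx, i) - delta * occ[i]
    for i in range(n):
        for j in range(i + 1, n):
            dist = abs(positions[i] - positions[j])
            h += (v0 / dist ** 6) * (occ[i] @ occ[j])  # van der Waals interaction
    return h

def sample_bitstrings(h, num_shots, rng):
    """Sample computational-basis outcomes (M = 2 per site) from the ground state."""
    _, vecs = np.linalg.eigh(h)
    probs = np.abs(vecs[:, 0]) ** 2
    probs /= probs.sum()
    n = len(probs).bit_length() - 1
    idx = rng.choice(len(probs), size=num_shots, p=probs)
    # Site 0 is the most significant bit, matching the Kronecker-product ordering.
    return (idx[:, None] >> np.arange(n - 1, -1, -1)) & 1

a = 6.0                   # hypothetical lattice spacing in micrometres
v0 = 862690 * 2 * np.pi   # interaction constant quoted in the text
h = rydberg_hamiltonian(a * np.arange(4), omega=2.0, delta=5.0, v0=v0)
records = sample_bitstrings(h, num_shots=8, rng=np.random.default_rng(0))
print(records.shape)
```

Each row of `records` plays the role of one measurement string; collecting $K$ such rows for a given condition $c$ gives the measurement records that the model consumes.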
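$$
\begin{aligned}
\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \ldots, \sigma_{k+1})
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_{k+1})|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} \prod_{i=1}^{k+1} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} \prod_{i=1}^{k} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_{k+1} \mid \sigma_k, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_k)|^2 = 1
\end{aligned}
\tag{11}
$$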
This completes the proof.

Section: D ADDITIONAL NUMERICAL RESULTS


Section: D.1 RESULTS OF PREDICTING THE ENTANGLEMENT ENTROPY
We consider an additional downstream task: predicting the second-order Rényi entanglement entropy $-\log(\mathrm{tr}(\rho_A^2))$ for the anisotropic Heisenberg model, where $A$ is the left-half subsystem, of size $L/2$, of the $L$-qubit quantum system. The training size is set to $N_t = 90$, and the RMSE of the predictions is given in Tab. 4. It can be observed that pre-training remains effective for predicting the entanglement entropy of the anisotropic Heisenberg model.
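For a pure state given as a state vector, the label $-\log(\mathrm{tr}(\rho_A^2))$ can be evaluated directly, as in the sketch below. It assumes that the qubits are ordered so that the first $L/2$ tensor factors form the subsystem $A$ and that the logarithm is the natural logarithm; the labels in this work are obtained with the PennyLane toolbox, so this NumPy version is only an illustration with a hypothetical function name.

```python
import numpy as np

def renyi2_left_half(psi, num_qubits):
    """Second-order Renyi entropy of the left half of a pure state |psi>."""
    half = num_qubits // 2
    # Rows index subsystem A (left half), columns index the remaining qubits.
    mat = np.asarray(psi, dtype=complex).reshape(2 ** half, 2 ** (num_qubits - half))
    rho_a = mat @ mat.conj().T                 # reduced density matrix of A
    purity = np.real(np.trace(rho_a @ rho_a))  # tr(rho_A^2)
    return float(-np.log(purity))

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # maximally entangled pair
product = np.array([1.0, 0.0, 0.0, 0.0])            # |00>, no entanglement
print(renyi2_left_half(bell, 2), renyi2_left_half(product, 2))
```

The Bell pair gives $\log 2$, the maximum for a single-qubit subsystem, while the product state gives zero, which brackets the range of values that the regression target can take for $L/2 = 1$.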

Section: D.2 MODEL SENSITIVITY TO THE NUMBER OF MEASUREMENTS
In Sec. 4, we study the relationship between the number of measurements and the classification accuracy of quantum phases of matter on the Rydberg atom model. Fig. 3 shows empirically that achieving linear growth in classification accuracy requires an exponential increase in the number of measurements per training example. Beyond the scaling with the number of measurements, we further investigate the scaling relationship between accuracy and the size of the training set (i.e., the number of sampled physical conditions, which determine the dynamics of the quantum systems). We fix the number of measurements per example to 256 (since we find that larger values make the accuracy saturate) and list the results on the 31-qubit system in Tab. 5. The results show that the accuracy grows approximately linearly with the training size. This finding is consistent with the theoretical results of Huang et al. (2022) and Lewis et al. (2024), which demonstrate a polynomial scaling relationship between model performance and the size of the training dataset.
In this section, we consider fine-tuning LLM4QPE with an out-of-distribution (OOD) dataset, meaning that the dataset used for fine-tuning and the dataset used for pre-training come from different distributions.
Here, we consider two different configurations that make the fine-tuning dataset out-of-distribution relative to the pre-training one: the first re-generates the fine-tuning data with modified physical variables, and the second fine-tunes the model starting from parameters transferred from a model pretrained on fewer qubits. In the following, we consider the Rydberg atom model.
First, we evaluate fine-tuning the model on the 31-qubit system using the parameters pre-trained on the 19- and 25-qubit systems. Note that the number of qubits is also a physical variable, and we want to see whether model parameters trained on small-scale systems transfer and help the model characterize larger-scale systems. The results are listed in Tab. 6. They show that pre-trained parameters transferred from small-scale systems are also useful for large-scale systems.

Section: E LIMITATIONS
In this study, we concentrate on the classification of quantum phases of matter and the prediction of correlation functions for the Rydberg atom model and the anisotropic Heisenberg model, respectively, although the LLM4QPE model offers the flexibility to address other quantum many-body challenges, such as reconstructing the density matrix. Our focus here is primarily on pretraining the model with a fixed number of measurement strings; the impact of varying the number of measurement strings on the model's performance remains to be explored. Additionally, the LLM4QPE model has a relatively small parameter count (tens of thousands of parameters) compared with the significantly larger parameter sets of large language models. Due to the constraints imposed by the model's size, our pretraining efforts are confined to quantum systems governed by Hamiltonians from the same family. Looking forward, we anticipate developing a more robust model with a greater number of parameters by learning from datasets generated from diverse families of quantum systems.

Section: 
* Correspondence author. Work was partly supported by NSFC (62222607), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), SJTU Trans-med Awards Research (STAR) 20210106.

Section: ETHICS STATEMENT
This paper proposes a novel approach for estimating the properties of quantum systems inspired by LLMs. The authors acknowledge the potential ethical implications of this research, such as the misuse of quantum data, the bias or error in the estimation results, and the impact on the development of quantum technologies. The authors have followed the best practices for data collection, model design, and evaluation, and have disclosed the sources of funding and the conflicts of interest. The authors also adhere to the principles of research integrity and comply with the relevant laws and regulations. The authors hope that this research will contribute to the advancement of quantum science and benefit the future research.

Section: REPRODUCIBILITY STATEMENT
The generated quantum data of the Rydberg atom model and the anisotropic Heisenberg model is available at https://github.com/abel1231/qpe-data. The code to train the model and analyze the experimental results is available from the first author on reasonable request.
Published as a conference paper at ICLR 2024

Section: B.2 ANISOTROPIC HEISENBERG MODEL
Exploring the effects of long-range interactions in quantum systems is essential for understanding quantum mechanics (Bermúdez et al., 2017). In this paper, we consider the long-range anisotropic Heisenberg model with the experimentally realized power-law exponent (Kranzl et al., 2023). The dynamics of the anisotropic Heisenberg model are determined by the Hamiltonian
where $\sigma_i^{x,y,z}$ is the Pauli matrix acting on the $i$-th site, $h$ determines the Ising interactions between the magnons, and $J_{ij}$ is the long-range interaction strength satisfying $J_{ij} = J/|i - j|^{\alpha}$. We follow the configuration of Kranzl et al. (2023) to generate the quantum dataset. The values of $h$ and $J$ are fixed to 1 and 369 rad/s, respectively, and we sample $\alpha \in (1, 2]$ uniformly. It is extremely hard to characterize quantum systems with long-range interactions using existing computing techniques; we therefore restrict the system size to $L \in \{8, 10, 12\}$. For all systems, we use $K_p = 1024$ measurement strings for pre-training and fix the number of sampled physical conditions to $N_p = 100$. For finetuning, we vary the number of generated training samples $N_t \in \{20, 50, 90\}$ and fix the number of measurement strings to $K_f = 64$. The physical condition $c$ is defined as a vector of dimension $C = L^2$, in which each element is the coupling strength $J_{ij}$ for $i, j \in \{1, \ldots, L\}$. Finding the ground state is treated as an eigenvalue decomposition problem, and we obtain the ground state for each sampled physical condition with SciPy (Virtanen et al., 2020) built-in functions. The measurement records and the true values of the two-body correlation function and the entanglement entropy are obtained using the PennyLane (Bergholm et al., 2018) toolbox. We consider the Pauli-6 POVM measurement operators with $M = 6$ outcomes, which are given as
and $\{|0\rangle, |1\rangle\}$, $\{|+\rangle, |-\rangle\}$, $\{|r\rangle, |l\rangle\}$ stand for the eigenbases of the Pauli operators $\sigma_z$, $\sigma_x$, and $\sigma_y$, respectively. For the task of predicting the correlation matrix, the ground-truth label is an $L \times L$ matrix, each element of which is the expectation value of the observable
Thus each element can be written as $\mathrm{tr}(\rho O_{ij})$, which lies in the range [-1, 1], where $\rho$ is the density matrix of the ground state for each sampled physical condition. We flatten the correlation matrix into an $L^2$-dimensional continuous-valued vector and treat it as the supervised label for fine-tuning. For the task of predicting the entanglement entropy, the label is a real number calculated as $-\log(\mathrm{tr}(\rho_A^2))$, where $A$ is the left-half subsystem, of size $L/2$, of the $L$-qubit quantum system.
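Two of the objects used in this subsection, the condition vector built from the couplings $J_{ij}$ and the Pauli-6 POVM, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the code used to build the dataset: the function names are hypothetical, the diagonal entries $J_{ii}$ (which the power law leaves undefined) are assumed to be zero, and the POVM is assumed to take the standard Pauli-6 form in which each of the six eigenstate projectors is weighted by 1/3.

```python
import numpy as np

def coupling_condition(num_sites, j_strength, alpha):
    """Flattened L*L vector of couplings J_ij = J / |i - j|**alpha (diagonal set to 0)."""
    idx = np.arange(num_sites)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    coupling = np.zeros((num_sites, num_sites))
    mask = dist > 0
    coupling[mask] = j_strength / dist[mask] ** alpha
    return coupling.reshape(-1)

def pauli6_povm():
    """Six single-qubit POVM elements: projectors onto Pauli eigenstates, each scaled by 1/3."""
    s = 1 / np.sqrt(2)
    kets = [
        np.array([1, 0], dtype=complex),        # |0>
        np.array([0, 1], dtype=complex),        # |1>
        np.array([s, s], dtype=complex),        # |+>
        np.array([s, -s], dtype=complex),       # |->
        np.array([s, 1j * s], dtype=complex),   # |r>
        np.array([s, -1j * s], dtype=complex),  # |l>
    ]
    return [np.outer(k, k.conj()) / 3 for k in kets]

c = coupling_condition(num_sites=8, j_strength=369.0, alpha=1.5)
povm = pauli6_povm()
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # single qubit in |0>
probs = np.array([np.real(np.trace(rho @ m)) for m in povm])
print(c.shape, np.allclose(sum(povm), np.eye(2)), probs)
```

The six elements sum to the identity, so the outcome probabilities $\mathrm{tr}(\rho M_k)$ form a valid distribution over the $M = 6$ outcomes observed at each site.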

Section: C PROOF OF THE NORMALIZED OUTPUT DISTRIBUTION
In the main text, we claim that the output (classical) distribution satisfies
$$\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} p(\sigma_1, \ldots, \sigma_L) = 1$$
as long as the last linear projection layer uses the softmax activation function. The proof is given below.
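The softmax activation function is applied at each site to produce a conditional distribution, and the model's output is the product of these conditional probabilities, $p(\sigma_1, \ldots, \sigma_L) = \prod_{i=1}^{L} p(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)$. It is easy to check that the claim holds for $L = 1$. Suppose that the claim also holds for $L = k$. For $L = k + 1$, the softmax normalization of each conditional distribution gives
$$\sum_{\sigma_i=1}^{M} p(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1) = 1, \tag{10}$$
which, combined with the induction hypothesis, yields the chain of equalities in Eq. (11).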
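The normalization claim can also be checked numerically by brute force on a small system. The sketch below uses a toy autoregressive model, with random logits that depend on the preceding outcomes, as a stand-in for the transformer; the model, its parameters and the function names are assumptions made for illustration. Outcomes are indexed from 0 to $M - 1$ in the code rather than from 1 to $M$.

```python
import itertools
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array of logits."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def make_toy_model(num_sites, num_outcomes, rng):
    """Return conditional(i, prefix): a softmax distribution over outcome i given the prefix."""
    weights = rng.normal(size=(num_sites, num_sites, num_outcomes, num_outcomes))
    bias = rng.normal(size=(num_sites, num_outcomes))

    def conditional(i, prefix):
        logits = bias[i].copy()
        for j, s in enumerate(prefix):
            logits += weights[i, j, s]  # contribution of earlier outcome sigma_j = s
        return softmax(logits)

    return conditional

def joint_probability(conditional, sigma):
    """p(sigma_1, ..., sigma_L) as a product of conditional probabilities."""
    p = 1.0
    for i, s in enumerate(sigma):
        p *= conditional(i, sigma[:i])[s]
    return p

def total_probability(conditional, num_sites, num_outcomes):
    """Sum of the joint probability over all num_outcomes**num_sites strings."""
    strings = itertools.product(range(num_outcomes), repeat=num_sites)
    return sum(joint_probability(conditional, sigma) for sigma in strings)

model = make_toy_model(num_sites=3, num_outcomes=6, rng=np.random.default_rng(0))
print(total_probability(model, num_sites=3, num_outcomes=6))
```

Up to floating-point rounding the sum equals one for any choice of logits, which is the content of the induction argument: summing out the last site removes one softmax factor at a time.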


References:
[b0] Bloqade.jl (2023). Package for the quantum computation and quantum simulation based on the neutral-atom architecture.
[b1] Anurag Anshu; Srinivasan Arunachalam (2024). A survey on the complexity of learning quantum states. Nature Reviews Physics
[b2] Ville Bergholm; Josh Izaac; Maria Schuld; Christian Gogolin; Shahnawaz Ahmed; Vishnu Ajith; M Sohaib Alam; Guillermo Alonso-Linaje; B Akashnarayanan; Ali Asadi (2018). Pennylane: Automatic differentiation of hybrid quantum-classical computations. 
[b3] Alejandro Bermúdez; Luca Tagliacozzo; Germán Sierra; Philip Richerme (2017). Long-range heisenberg models in quasiperiodically driven crystals of trapped ions. Physical Review B
[b4] Hannes Bernien; Sylvain Schwartz; Alexander Keesling; Harry Levine; Ahmed Omran; Hannes Pichler; Soonwon Choi; Alexander S Zibrov; Manuel Endres; Markus Greiner (2017). Probing many-body dynamics on a 51-atom quantum simulator. Nature
[b5] Fernando GSL Brandão; Michał Horodecki (2015). Exponential decay of correlations implies area law. Communications in mathematical physics
[b6] Tom Brown; Benjamin Mann; Nick Ryder; Melanie Subbiah; Jared D Kaplan; Prafulla Dhariwal; Arvind Neelakantan; Pranav Shyam; Girish Sastry; Amanda Askell (2020). Language models are few-shot learners. Advances in neural information processing systems
[b7] Tiff Brydges; Andreas Elben; Petar Jurcevic; Benoît Vermersch; Christine Maier; Ben P Lanyon; Peter Zoller; Rainer Blatt; Christian F Roos (2019). Probing rényi entanglement entropy via randomized measurements. Science
[b8] Giuseppe Carleo; Matthias Troyer (2017). Solving the quantum many-body problem with artificial neural networks. Science
[b9] Giuseppe Carleo; Ignacio Cirac; Kyle Cranmer; Laurent Daudet; Maria Schuld; Naftali Tishby; Leslie Vogt-Maranto; Lenka Zdeborová (2019). Machine learning and the physical sciences. Reviews of Modern Physics
[b10] Juan Carrasquilla; Giacomo Torlai; Roger G Melko; Leandro Aolita (2019). Reconstructing quantum states with generative models. Nature Machine Intelligence
[b11] David Ceperley; Berni Alder (1986). Quantum monte carlo. Science
[b12] Peter Cha; Paul Ginsparg; Felix Wu; Juan Carrasquilla; Peter L McMahon; Eun-Ah Kim (2021). Attention-based quantum tomography. Machine Learning: Science and Technology
[b13] Philippe Corboz (2016). Variational optimization with infinite projected entangled-pair states. Physical Review B
[b14] Stefanie Czischek; Schuyler Moss; Matthew Radzihovsky; Ejaaz Merali; Roger G Melko (2022). Data-enhanced variational monte carlo simulations for rydberg atom arrays. Physical Review B
[b15] G Mauro D'Ariano; Matteo GA Paris; Massimiliano F Sacchi (2003). Quantum tomography. Advances in imaging and electron physics
[b16] Yuxuan Du; Yibo Yang; Tongliang Liu; Zhouchen Lin; Bernard Ghanem; Dacheng Tao (2023). Shadownet for data-centric quantum system learning. 
[b17] Xun Gao; Lu-Ming Duan (2017). Efficient representation of quantum many-body states with deep neural networks. Nature communications
[b18] Valentin Gebhart; Raffaele Santagati; Antonio Andrea Gentile; Erik M Gauger; David Craig; Natalia Ares; Leonardo Banchi; Florian Marquardt; Luca Pezzè; Cristian Bonato (2023). Learning quantum systems. Nature Reviews Physics
[b19] Justin Gilmer; Samuel S Schoenholz; Patrick F Riley; Oriol Vinyals; George E Dahl (2017). Neural message passing for quantum chemistry. PMLR
[b20] Aleksandra Gočanin; Ivan Šupić; Borivoje Dakić (2022). Sample-efficient device-independent quantum state verification and certification. PRX Quantum
[b21] James Gubernatis; Naoki Kawashima; Philipp Werner (2016). Quantum Monte Carlo Methods. Cambridge University Press
[b22] Mohamed Hibat-Allah; Martin Ganahl; Lauren E Hayward; Roger G Melko; Juan Carrasquilla (2020). Recurrent neural network wave functions. Physical Review Research
[b23] Pierre Hohenberg; Walter Kohn (1964). Inhomogeneous electron gas. Physical review
[b24] Hsin-Yuan Huang; Richard Kueng; John Preskill (2020). Predicting many properties of a quantum system from very few measurements. Nature Physics
[b25] Hsin-Yuan Huang; Richard Kueng; Giacomo Torlai; Victor V Albert; John Preskill (2022). Provably efficient machine learning for quantum many-body problems. Science
[b26] T Jullien; P Roulleau; B Roche; A Cavanna; Y Jin; D C Glattli (2014). Quantum tomography of an electron. Nature
[b27] Hiroki Kawai; Yuya O Nakagawa (2020). Predicting excited states from ground state wavefunction by supervised quantum machine learning. Machine Learning: Science and Technology
[b28] Florian Kranzl; Stefan Birnkammer; Manoj K Joshi; Alvise Bastianello; Rainer Blatt; Michael Knap; Christian F Roos (2023). Observation of magnon bound states in the long-range, anisotropic heisenberg model. Physical Review X
[b29] Dietrich Leibfried; D M Meekhof; B E King; C H Monroe; Wayne M Itano; David J Wineland (1996). Experimental determination of the motional quantum state of a trapped atom. Physical review letters
[b30] Laura Lewis; Hsin-Yuan Huang; Viet T Tran; Sebastian Lehner; Richard Kueng; John Preskill (2024). Improved machine learning algorithm for predicting ground state properties. Nature communications
[b31] Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (2023). Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys
[b32] E Y Loh; J E Gubernatis; R T Scalettar; S R White; D J Scalapino; R L Sugar (1990). Sign problem in the numerical simulation of many-electron systems. Physical Review B
[b33] William Lauchlin McMillan (1965). Ground state of liquid he 4. Physical Review
[b34] Michael A Nielsen; Isaac L Chuang (2010). Quantum computation and quantum information. Cambridge university press
[b35] Román Orús (2019). Tensor networks for complex quantum systems. Nature Reviews Physics
[b36] David Perez-Garcia; Frank Verstraete; Michael M Wolf; Ignacio Cirac (2006). Matrix product state representations. 
[b37] Alec Radford; Karthik Narasimhan; Tim Salimans; Ilya Sutskever (2018). Improving language understanding by generative pre-training. 
[b38] Kristof T Schütt; Michael Gastegger; Alexandre Tkatchenko; K-R Müller; Reinhard J Maurer (2019). Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature communications
[b40] Or Sharir; Yoav Levine; Noam Wies; Giuseppe Carleo; Amnon Shashua (2020). Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Physical review letters
[b41] G I Struchalin; Ya A Zagorovskii; E V Kovlakov; S S Straupe; S P Kulik (2021). Experimental estimation of quantum state properties from classical shadows. PRX Quantum
[b42] Giacomo Torlai; Guglielmo Mazzola; Juan Carrasquilla; Matthias Troyer; Roger Melko; Giuseppe Carleo (2018). Neural-network quantum state tomography. Nature Physics
[b43] Matthias Troyer; Uwe-Jens Wiese (2005). Computational complexity and fundamental limitations to fermionic quantum monte carlo simulations. Physical review letters
[b44] Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N Gomez; Łukasz Kaiser; Illia Polosukhin (2017). Attention is all you need. Advances in neural information processing systems
[b45] Pragya Verma; Donald G Truhlar (2020). Status and challenges of density functional theory. Trends in Chemistry
[b46] Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt; SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods
[b47] Haoxiang Wang; Maurice Weber; Josh Izaac; Cedric Yen-Yu Lin (2022). Predicting properties of quantum systems with conditional generative models. 
[b48] Karl Weiss; Taghi M Khoshgoftaar; Dingding Wang (2016). A survey of transfer learning. Journal of Big data
[b49]  Steven R White (1992). Density matrix formulation for quantum renormalization groups. Physical review letters
[b50] Dian Wu; Lei Wang; Pan Zhang (2019). Solving statistical mechanics using variational autoregressive networks. Physical review letters
[b51] Ya-Dong Wu; Yan Zhu; Ge Bai; Yuexuan Wang; Giulio Chiribella (2023). Quantum similarity testing with convolutional neural networks. Physical Review Letters
[b52] Tailong Xiao; Jingzheng Huang; Hongjing Li; Jianping Fan; Guihua Zeng (2022). Intelligent certification for quantum simulators via machine learning. npj Quantum Information
[b53] Ting Zhang; Jinzhao Sun; Xiao-Xu Fang; Xiao-Ming Zhang; Xiao Yuan; He Lu (2021). Experimental quantum state measurement with classical shadows. Physical Review Letters
[b54] Yuan-Hang Zhang; Massimiliano Di Ventra (2023). Transformer quantum state: A multipurpose model for quantum many-body problems. Physical Review B
[b55] Yan Zhu; Ya-Dong Wu; Ge Bai; Dong-Sheng Wang; Yuexuan Wang; Giulio Chiribella (2022). Flexible learning of quantum states with generative query neural networks. Nature Communications

Figures:
Figure fig_0: 3
Type: figure
Caption: Figure 3: Comparison of weighted F1 score w.r.t. number of measurement strings on Rydberg atom model.
Data: 

Figure fig_1: 4
Type: figure
Caption: Figure 4: The evolution of training loss and test weighted F1 score with increasing training epochs, where $N_t = 100$ and $K_f = 1024$.
Data: 

Figure fig_3: 
Type: figure
Caption: $D_p = \{R_i, c_i\}_{i=1}^{N_p}$ denote the quantum dataset used for pre-training and $D_f = \{(R_i, c_i), p_i\}_{i=1}^{N_f}$ for fine-tuning, where $|D_p| = N_p$ and $|D_f| = N_f$. For pre-training the model, we first uniformly sample a number of points $\{c_i \mid c_i \in F\}_{i=1}^{N_p}$
Data: 

Figure fig_4: 
Type: figure
Caption: Second, we modify the detuning of the laser from [-10, 15] (the range used in the main experiments) to [-20, -10] ∪ [15, 25] to generate an OOD fine-tuning dataset on the Rydberg atom model with 19 qubits. The classification accuracies are listed in Tab. 7. The pre-trained model fails to outperform LLM4QPE without pre-training. The main reason is that the modified detuning values drive the quantum evolution into very different dynamics, about which the pre-trained model has learned little. Whether pre-training of LLM4QPE remains beneficial for OOD quantum datasets in other settings is an open question that we will explore in future work.
Data: 

Figure tab_0: 
Type: table
Caption: Figure 1: Pretraining and finetuning of LLM4QPE. a) The output embeddings are the summation of token embeddings, condition embeddings and position embeddings. The three embeddings encode discrete measurement records, continuous physical variables and qubit positions, respectively. The token embeddings are replaced with the LSTM embeddings during finetuning. b) The main part of the model is a multi-layer transformer decoder. Pretraining is entirely unsupervised. The output target is to approximate the classical distribution of the wave function. c) The models for finetuning and pretraining share the same structure. The pretrained parameters are transferred to the finetuning stage and updated towards a task-specific supervised loss.
Data: 

Figure tab_2: 1
Type: table
Caption: Classification accuracy of quantum phases of matter on the Rydberg atom model with varied system size L and varied training size Nt, where K f is fixed to be 1024. The best results are highlighted in bold.
Data:
| Method | L=19, Nt=25 | L=19, Nt=64 | L=19, Nt=100 | L=25, Nt=25 | L=25, Nt=64 | L=25, Nt=100 | L=31, Nt=25 | L=31, Nt=64 | L=31, Nt=100 |
|---|---|---|---|---|---|---|---|---|---|
| RBF Kernel | 91.75 | 92.29 | 93.25 | 88.43 | 92.27 | 94.2 | 88.32 | 90.79 | 92.75 |
| NTK | 92.12 | 92.58 | 93.79 | 89.17 | 94.14 | 95.39 | 86.99 | 92.03 | 92.71 |
| PixelCNN | 92.18 | 92.79 | 92.98 | 88.91 | 91.59 | 94.73 | 85.29 | 92.21 | 92.98 |
| NN-shadow | 91.73 | 92.64 | 93.61 | 90.57 | 91.32 | 95.91 | 86.38 | 91.79 | 92.51 |
| LLM4QPE | 94.14 | 93.38 | 95.95 | 93.95 | 96.51 | 96.05 | 87.95 | 94.95 | 96.67 |
| LLM4QPE w/o pretrain | 93.80 | 92.89 | 93.35 | 90.85 | 95.35 | 95.27 | 87.45 | 92.77 | 94.32 |

Figure tab_4: 2
Type: table
Caption: RMSE of predicting the correlation on the anisotropic Heisenberg model with varied system size L and training size N_t. K_f is fixed to 64. The best results are in bold.
Data:
| Method | L = 8, N_t = 20 | L = 8, N_t = 50 | L = 8, N_t = 90 | L = 10, N_t = 20 | L = 10, N_t = 50 | L = 10, N_t = 90 | L = 12, N_t = 20 | L = 12, N_t = 50 | L = 12, N_t = 90 |
|---|---|---|---|---|---|---|---|---|---|
| Classical Shadow | 0.2015 | 0.1954 | 0.1967 | 0.2015 | 0.1997 | 0.2015 | 0.1991 | 0.2064 | 0.2117 |
| RBF Kernel | 0.2085 | 0.2077 | 0.2081 | 0.2104 | 0.2131 | 0.2079 | 0.2039 | 0.1931 | 0.2157 |
| NTK | 0.2062 | 0.2064 | 0.2052 | 0.2095 | 0.2085 | 0.2097 | 0.2141 | 0.1922 | 0.2105 |
| PixelCNN | 0.2257±0.015 | 0.2357±0.019 | 0.2239±0.024 | 0.2393 | 0.2289±0.023 | 0.2108±0.024 | 0.2390±0.024 | 0.2297±0.035 | 0.2267±0.038 |
| NN-shadow | 0.2069±0.022 | 0.2098±0.015 | 0.2057±0.012 | 0.2078±0.017 | 0.2054±0.017 | 0.1959±0.013 | 0.2037±0.029 | 0.2021±0.019 | 0.2102±0.026 |
| LLM4QPE | 0.1761±0.032 | 0.1612±0.022 | 0.1697±0.025 | 0.1986±0.011 | 0.1949±0.012 | 0.1893±0.023 | 0.1989±0.023 | 0.1787±0.021 | 0.1769±0.015 |
| LLM4QPE w/o pretrain | 0.2043±0.027 | 0.2057±0.036 | 0.1949±0.027 | 0.2179±0.015 | 0.1984±0.013 | 0.1981±0.025 | 0.2040±0.028 | 0.2097±0.031 | 0.2026±0.027 |

Figure tab_5: 3
Type: table
Caption: Ablation study results on condition embedding and LSTM embedding. We consider N_t = 64 with K_f = 1024 for the Rydberg model, and N_t = 50 with K_f = 64 for the Heisenberg model.
Data:
| Variant | Rydberg, L = 19 | Rydberg, L = 25 | Rydberg, L = 31 | Heisenberg, L = 8 | Heisenberg, L = 10 | Heisenberg, L = 12 |
|---|---|---|---|---|---|---|
| original | 93.38 | 96.51 | 94.95 | 0.1612 | 0.1949 | 0.1787 |
| w/o cond. embed. | 93.29 | 95.96 | 93.52 | 0.1906 | 0.2095 | 0.1981 |
| w/o LSTM embed. | 90.75 | 92.18 | 89.65 | 0.1929 | 0.1997 | 0.1904 |

Figure tab_6: 4
Type: table
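Caption: The RMSE of predicting the second-order Rényi entanglement entropy for the anisotropic Heisenberg model. We sample N_p = 100 physical conditions with K_p = 1024 measurement strings for pre-training.
Data:
L = 8
| Method | K_f = 64 | K_f = 128 | K_f = 256 | K_f = 512 | K_f = 1024 |
|---|---|---|---|---|---|
| Classical Shadow | 1.58282 | 1.56688 | 1.50989 | 1.40270 | 1.22974 |
| RBF Kernel | 0.07322 | 0.07160 | 0.07670 | 0.07692 | 0.07706 |
| NTK | 0.07117 | 0.06799 | 0.08834 | 0.08708 | 0.08690 |
| PixelCNN | 0.07198 | 0.07091 | 0.06849 | 0.06687 | 0.06784 |
| NN-shadow | 0.06860 | 0.06415 | 0.06403 | 0.06315 | 0.06221 |
| LLM4QPE | 0.06302 | 0.06141 | 0.06104 | 0.05998 | 0.06072 |
| LLM4QPE w/o Pretrain | 0.06649 | 0.06295 | 0.06228 | 0.06071 | 0.06034 |

L = 10
| Method | K_f = 64 | K_f = 128 | K_f = 256 | K_f = 512 | K_f = 1024 |
|---|---|---|---|---|---|
| Classical Shadow | 1.72379 | 1.71451 | 1.73135 | 1.72740 | 1.68556 |
| RBF Kernel | 0.02539 | 0.02257 | 0.02242 | 0.02002 | 0.01983 |
| NTK | 0.02497 | 0.02221 | 0.02129 | 0.01996 | 0.01947 |
| PixelCNN | 0.01907 | 0.01892 | 0.01948 | 0.01952 | 0.02089 |
| NN-shadow | 0.01844 | 0.01747 | 0.01664 | 0.01662 | 0.01657 |
| LLM4QPE | 0.01698 | 0.01623 | 0.01534 | 0.01517 | 0.01520 |
| LLM4QPE w/o Pretrain | 0.01711 | 0.01662 | 0.01696 | 0.01655 | 0.01532 |

L = 12
| Method | K_f = 64 | K_f = 128 | K_f = 256 | K_f = 512 | K_f = 1024 |
|---|---|---|---|---|---|
| Classical Shadow | 2.89481 | 2.90874 | 2.91391 | 2.90773 | 2.89722 |
| RBF Kernel | 0.08710 | 0.08242 | 0.08104 | 0.07081 | 0.07032 |
| NTK | 0.08432 | 0.08249 | 0.08071 | 0.07998 | 0.07381 |
| PixelCNN | 0.07406 | 0.07145 | 0.07107 | 0.06895 | 0.06677 |
| NN-shadow | 0.07261 | 0.06858 | 0.06573 | 0.06156 | 0.05924 |
| LLM4QPE | 0.05861 | 0.05812 | 0.05648 | 0.05623 | 0.05597 |
| LLM4QPE w/o Pretrain | 0.06624 | 0.06542 | 0.06381 | 0.06042 | 0.05931 |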

Figure tab_8: 5
Type: table
Caption: Classification accuracy of quantum phases of matter on the Rydberg atom model with varied training size N_t, where L = 31 and K_f = 256. The results are averaged over 3 runs with different random seeds.
Data:
| Method | N_t = 20 | N_t = 40 | N_t = 60 | N_t = 80 |
|---|---|---|---|---|
| LLM4QPE | 82.05 | 87.24 | 89.16 | 90.63 |
| LLM4QPE w/o pretrain | 79.17 | 81.78 | 85.96 | 88.47 |

Figure tab_9: 6
Type: table
Caption: Classification accuracy of quantum phases of matter on the 31-qubit Rydberg atom model. The pre-trained parameters are transferred from the model trained on a smaller system size. The training size is set to N_t = 100, and the number of measurements to K_f = 1024.
Data:
| Method | Accuracy |
|---|---|
| LLM4QPE (pre-trained on 19-qubit system) | 95.74 |
| LLM4QPE (pre-trained on 25-qubit system) | 96.13 |
| LLM4QPE (pre-trained on 31-qubit system) | 96.67 |
| LLM4QPE w/o pre-train | 94.32 |

Figure tab_10: 7
Type: table
Caption: Classification accuracy of quantum phases of matter on the 19-qubit Rydberg atom model. The training size is set to N_t = 100, and the number of measurements to K_f = 1024.
Data:
| Method | no OOD | OOD |
|---|---|---|
| LLM4QPE | 95.95 | 84.82 |
| LLM4QPE w/o pre-train | 93.35 | 94.23 |


Formulas:
Formula formula_0: $$|\psi\rangle = \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} \Psi(\sigma_1, \ldots, \sigma_L)\, |\sigma_1, \ldots, \sigma_L\rangle, \tag{1}$$

Formula formula_1: $\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_L)|^2 = 1$

Formula formula_2: [Figure residue: axis labels "# Measurement Strings" and "# Physical Conditions"; panel labels a), b), c)]

Formula formula_3: $\{(\sigma_b, c_b) \mid \sigma_b \in E_{\mathrm{in}},\ c_b \in C_{\mathrm{in}}\}_{b=1}^{B_p}$ with batch size $B_p$.

Formula formula_4: $p(\sigma_1, \ldots, \sigma_L \mid c) = \prod_{l=1}^{L} p(\sigma_l \mid \sigma_{l-1}, \ldots, \sigma_1, c).$

Formula formula_5: $$\mathcal{L}_{\mathrm{unsup}} = \frac{1}{B_p} \sum_{(\sigma, c) \in D_p} -\log p(\sigma_1, \ldots, \sigma_L \mid c), \tag{3}$$
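
The autoregressive factorization and the negative log-likelihood objective in Eq. (3) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `conditional_probs` is a hypothetical stand-in for the transformer decoder, its scoring rule is arbitrary, and outcomes are indexed from 0 rather than from 1.

```python
import math

def conditional_probs(prefix, condition, num_outcomes):
    """Toy stand-in for the decoder: returns p(sigma_l | sigma_<l, c).

    The scoring rule is hypothetical; it only needs to depend on the
    prefix and the condition and to yield a normalized distribution.
    """
    logits = [math.sin(condition * (k + 1) + sum(prefix) + len(prefix))
              for k in range(num_outcomes)]
    shift = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def sequence_log_prob(sigma, condition, num_outcomes):
    """log p(sigma_1, ..., sigma_L | c) as a sum of conditional log-probs."""
    log_p = 0.0
    for l, outcome in enumerate(sigma):
        probs = conditional_probs(sigma[:l], condition, num_outcomes)
        log_p += math.log(probs[outcome])
    return log_p

def unsupervised_loss(batch, num_outcomes):
    """Mean negative log-likelihood over a batch of (sigma, c) pairs."""
    nll = -sum(sequence_log_prob(s, c, num_outcomes) for s, c in batch)
    return nll / len(batch)

# A batch of B_p = 3 measurement strings with scalar conditions (made up).
batch = [((0, 1, 1, 0), 0.3), ((1, 1, 0, 0), -1.2), ((0, 0, 0, 1), 2.0)]
loss = unsupervised_loss(batch, num_outcomes=2)
print(f"L_unsup = {loss:.4f}")
```

Summing log-probabilities instead of multiplying probabilities avoids underflow for long measurement strings.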

Formula formula_6: $\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} p(\sigma_1, \ldots, \sigma_L) = 1$ (see Appendix C for proof).

Formula formula_7: $D_f = \{(R_j, c_j), p_j\}^{N_f}$

Formula formula_8: $E_{\mathrm{out}} = E_{\mathrm{rnn}} + \underbrace{E_c + E_p}_{\text{transferred}}.$

Formula formula_9: $$\mathcal{L}_{\mathrm{sup}} = -\frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \mathbb{I}[p_{j,u} = 1] \log f_\theta(X_j, c_j)_u, \tag{4}$$

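Formula formula_10: $$\mathcal{L}_{\mathrm{sup}} = \sqrt{\tilde{\mathcal{L}}_{\mathrm{sup}}}, \qquad \tilde{\mathcal{L}}_{\mathrm{sup}} = \frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \left( f_\theta(X_j, c_j)_u - p_{j,u} \right)^2. \tag{5}$$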
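
The two supervised objectives, the cross-entropy of Eq. (4) for classification and the squared-error loss of Eq. (5) for regression, can be illustrated as follows. This is a hedged sketch: the function names are hypothetical, the model outputs are hard-coded stand-ins for f_θ(X_j, c_j), the batch size B_t is taken to be the number of examples passed in, and the square root is applied on the assumption that the regression loss matches the RMSE reported in the tables.

```python
import math

def cross_entropy_loss(predictions, labels):
    """Eq. (4)-style loss.

    predictions[j][u] is the predicted probability of class u for example j;
    labels[j][u] is the one-hot target p_{j,u}.
    """
    total = 0.0
    for probs, target in zip(predictions, labels):
        for p, t in zip(probs, target):
            if t == 1:  # indicator I[p_{j,u} = 1]
                total += math.log(p)
    return -total / len(predictions)

def mean_squared_error(predictions, targets):
    """Squared error summed over output components, averaged over the batch."""
    total = 0.0
    for pred, target in zip(predictions, targets):
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(predictions)

def rmse_loss(predictions, targets):
    """Square root of the batch-averaged squared error (assumed RMSE form)."""
    return math.sqrt(mean_squared_error(predictions, targets))

# Hard-coded stand-ins for model outputs on a batch of two examples.
preds = [[0.9, 0.1], [0.2, 0.8]]
one_hot = [[1, 0], [0, 1]]

ce = cross_entropy_loss(preds, one_hot)
mse = mean_squared_error(preds, one_hot)
rmse = rmse_loss(preds, one_hot)
print(ce, mse, rmse)
```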

Formula formula_12: $[L_i, \Delta_i, \Omega_i, R_0/a_i]^{\top}$

Formula formula_13: $$H_{\mathrm{Rydberg}} = \sum_{i} \frac{\Omega}{2} \sigma_i^{x} - \sum_{i} \Delta\, n_i + \sum_{i<j} \frac{V_0}{|\vec{x}_i - \vec{x}_j|}\, n_i n_j \tag{6}$$

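Formula formula_14: $$\begin{aligned}
\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \ldots, \sigma_{k+1})
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_{k+1})|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} \prod_{i=1}^{k+1} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} \prod_{i=1}^{k} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_{k+1} \mid \sigma_k, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_k)|^2 = 1
\end{aligned} \tag{11}$$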
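
The induction step in Eq. (11) can be checked numerically: if every conditional distribution is normalized, the joint distribution obtained by multiplying the conditionals sums to 1 for any sequence length. The sketch below enumerates all strings for small lengths. The conditional `toy_conditional` is a hypothetical example chosen only to be normalized and prefix-dependent, and outcomes are indexed from 0.

```python
import itertools

def toy_conditional(prefix, num_outcomes):
    """Hypothetical normalized conditional p(sigma_i | sigma_<i)."""
    weights = [1.0 + ((k + 1) * (sum(prefix) + len(prefix) + 1)) % 5
               for k in range(num_outcomes)]
    total = sum(weights)
    return [w / total for w in weights]

def joint_prob(sigma, num_outcomes):
    """Product of conditionals along the string sigma."""
    p = 1.0
    for i, outcome in enumerate(sigma):
        p *= toy_conditional(sigma[:i], num_outcomes)[outcome]
    return p

def total_probability(length, num_outcomes):
    """Sum of the joint probability over all num_outcomes**length strings."""
    return sum(joint_prob(s, num_outcomes)
               for s in itertools.product(range(num_outcomes), repeat=length))

# Check normalization for lengths 1..5 with M = 3 outcomes per site.
totals = {length: total_probability(length, num_outcomes=3)
          for length in range(1, 6)}
for length, total in totals.items():
    print(length, total)
```

Brute-force enumeration scales as M^L, so this check is only practical for short strings; the proof covers arbitrary L.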

