Title: LLM4QPE: A Large Language Model Style Paradigm for Unsupervised Pretraining and Property Estimation of Quantum Systems

Abstract: Estimating properties of quantum systems, such as quantum phases, is fundamental to addressing complex quantum many-body problems in physics and chemistry. While deep learning models have shown promise in quantum property estimation (QPE), they are typically specialized for specific tasks and data, limiting their generalizability and requiring extensive labeled data. This paper introduces LLM4QPE, a novel Large Language Model (LLM)-style paradigm for quantum task-agnostic pretraining and finetuning. LLM4QPE addresses the limitations of existing methods by: 1) performing unsupervised pretraining on vast, diverse quantum datasets under varying physical conditions to learn universal quantum intricacies; and 2) leveraging the pretrained model for supervised finetuning on downstream tasks, achieving high performance with significantly limited labeled training data and accelerating convergence. This approach substantially mitigates the high cost and computational burden associated with quantum data collection and labeling. We demonstrate LLM4QPE's superior efficacy through extensive experiments on critical QPE tasks, including classifying quantum phases of matter on the Rydberg atom model and predicting two-body correlation functions on the anisotropic Heisenberg model. Our results highlight LLM4QPE's potential to revolutionize QPE, especially in resource-constrained scenarios.

Section: INTRODUCTION
Estimating quantum system properties, such as quantum phases, is a cornerstone for advancing and validating quantum technologies (Huang et al., 2020; Gočanin et al., 2022). These estimations often involve solving quantum many-body problems, which are notoriously challenging due to the exponential complexity inherent in describing generic quantum systems (Gebhart et al., 2023). However, physical systems of interest, particularly those governed by local Hamiltonians, possess a specific structure that circumvents the need for the full Hilbert space complexity (Carrasquilla et al., 2019). This inherent structure has paved the way for the emergence of various statistical and learning-based approaches, ranging from traditional Density Functional Theory (DFT) (Hohenberg & Kohn, 1964) and Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986) to advanced variational methods like Tensor Networks (TNs) (Orús, 2019) and Neural Network Quantum States (NNQS) (Zhang & Di Ventra, 2023).

Variational methods for Quantum Property Estimation (QPE) broadly fall into two categories. The first category includes TNs and NNQS, which frame QPE as an optimization problem. Here, the quantum state is approximated by a parameterized wave function, updated by minimizing expectation values of relevant observable estimators using algorithms such as Density Matrix Renormalization Group (DMRG) (White, 1992) or Variational Monte Carlo (VMC) (McMillan, 1965). Subsequently, desired properties are extracted via algebraic operations on the optimized wave function. The second category, termed Neural Network Quantum Property Estimation (NNQPE), employs neural networks as universal function approximators to directly predict quantum system properties (Gilmer et al., 2017; Kawai & Nakagawa, 2020; Xiao et al., 2022). NNQPE models take measurement results of the quantum state as input and directly output the property of interest, optimizing parameters via gradient descent. The primary objective of NNQPE is to accurately characterize quantum state properties with minimal identical copies and measurements. Compared to TNs, NNQPE methods can more readily capture non-local correlations and higher entanglement (Huang et al., 2022). Furthermore, NNQPE offers a direct prediction mechanism, circumventing the additional computational overhead required by TNs and NNQS to extract properties from optimized wave functions.

Despite its advantages, NNQPE faces significant challenges, particularly regarding generalization ability when confronted with limited measurement data for training (Gebhart et al., 2023). Improving generalizability often necessitates extensive measurement data and corresponding labels. However, the process of accurately labeling quantum system properties is computationally and memory-intensive, scaling exponentially with system size (Carleo et al., 2019). The labeling burden for quantum systems is particularly acute: DFT struggles with self-interaction and delocalization errors for strongly correlated quantum states (Verma & Truhlar, 2020); the sign problem renders QMC intractable for large or low-temperature systems (Loh Jr et al., 1990; Troyer & Wiese, 2005; Huang et al., 2022); and the maximum bond dimensions for TNs to preserve properties like entanglement entropy scale exponentially with evolution time (Brandao & Horodecki, 2015). These fundamental limitations underscore the difficulty of classical labeling due to the inherent quantum-classical computational divide.

Moreover, the application of advanced machine learning techniques in quantum physics, particularly NNQPE, is still in its nascent stages. Current NNQPE models are typically bespoke, trained for specific quantum systems and tasks. This contrasts sharply with the transformative success of Large Language Models (LLMs) (Radford et al., 2018; Brown et al., 2020), which have achieved remarkable general-purpose language generation and understanding capabilities through a paradigm of extensive pretraining followed by specialized finetuning. This LLM paradigm, where pretraining captures broad knowledge and finetuning adapts to specific tasks, represents a powerful, yet largely unexplored, avenue for quantum physics.

The escalating scale of quantum devices is generating vast amounts of quantum data from measurements (Brydges et al., 2019), rich with intricate details about quantum systems. This presents a compelling opportunity to develop a versatile model capable of mastering these quantum intricacies through extensive pretraining. The success of deep learning in high-dimensional data processing provides a strong foundation for this endeavor. Firstly, the sheer volume of quantum measurement data enables the extraction of meaningful patterns and representations (Anshu & Arunachalam, 2024). Secondly, the universal approximation capabilities of neural networks suggest that complex, nonlinear relationships in quantum systems can be modeled given sufficient data and resources (Carleo et al., 2019; Gebhart et al., 2023). Lastly, the task-agnostic nature of pretraining (Liu et al., 2023) is ideally suited for the diverse quantum realm, allowing a single model to learn hidden features across various systems and physical conditions. This is further bolstered by the principle of transfer learning (Weiss et al., 2016), where knowledge acquired in one context can significantly benefit related applications.

In this paper, we introduce LLM4QPE (Large Language Model for Quantum Property Estimation), an LLM-style task-agnostic pretraining and finetuning paradigm. LLM4QPE is pretrained using extensive unlabeled quantum data collected from diverse quantum systems within the same family, governed by varying physical conditions. For downstream tasks, we finetune LLM4QPE on two representative QPE problems: classifying quantum phases of matter and predicting two-body correlation functions. We validate our approach using two distinct families of quantum models: the Rydberg atom model and the anisotropic Heisenberg model. Our empirical results demonstrate LLM4QPE's superior performance, particularly in scenarios with limited data availability, highlighting its potential to address a critical bottleneck in QPE. Our key contributions are:
1)  **A Novel LLM-style Paradigm for QPE:** We propose LLM4QPE, the first LLM-style model for quantum property estimation. Unlike most existing supervised QPE models that depend on restricted, task-specific labeled quantum data, LLM4QPE employs a fully unsupervised and task-agnostic pretraining procedure, maximizing the expected log-likelihood of measurement bit strings to learn fundamental quantum system characteristics.
2)  **Innovative Architecture for Quantum Data:** We develop a novel architecture for LLM4QPE that effectively handles diverse quantum data. Specifically, a trainable Long Short-Term Memory (LSTM) embedding layer is integrated with a Transformer decoder to embed batch-style discrete measurement records into a continuous feature space. This LSTM-Transformer architecture provides an inherent framework for processing quantum data from experiments under varying physical conditions, enabling robust property prediction for quantum systems within the same family.
3)  **Comprehensive Quantum Dataset Collection:** We curate and utilize a specialized set of quantum data from simulations for both unsupervised pretraining and supervised finetuning. The pretraining dataset comprises quantum state measurement records, scaling linearly with system size and number of measurements, alongside corresponding physical condition variables. Downstream tasks leverage finetuning datasets generated from the same family of quantum systems, augmented with system properties as labels for tasks like phase classification and correlation prediction.
4)  **Empirical Validation and Superior Performance:** We rigorously verify the efficacy of LLM4QPE through extensive empirical studies on two challenging QPE tasks: classifying quantum phases of matter on the Rydberg atom model and predicting two-body correlation functions on the anisotropic Heisenberg model. Our results consistently demonstrate LLM4QPE's superior performance, especially under conditions of limited measurement information on resource-constrained devices.

Section: PRELIMINARIES OF QUANTUM STATE AND QUANTUM MEASUREMENT
This section introduces the foundational concepts of quantum computing essential for understanding LLM4QPE. For a more comprehensive background, readers are referred to Nielsen & Chuang (2010). Detailed discussions on related work are provided in Appendix A.
Quantum State and Density Operator. The quantum bit, or qubit, is the fundamental unit of information in a quantum system. A quantum state refers to the collective ensemble of all qubits within a (sub)system. A qubit, existing in a superposition of states, collapses to a deterministic outcome upon measurement. The mathematical description of a quantum state is basis-dependent. For instance, using the two orthogonal computational basis states, $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, a single qubit can be described as a linear combination $|\phi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ in the complex space $\mathbb{C}^2$, where $\alpha, \beta \in \mathbb{C}$ are amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$. An alternative, and often more general, description of a quantum state is provided by the density operator or density matrix. For example, the density matrix for $|0\rangle$ is $\rho_0 = |0\rangle\langle0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, where $\langle0|$ denotes the conjugate transpose of $|0\rangle$. For a generic $L$-qubit quantum state, its wave function is given by:
$|\psi\rangle = \sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_L=1}^{M} \Psi(\sigma_1, \dots, \sigma_L)|\sigma_1, \dots, \sigma_L\rangle$, (1)
where $\Psi: \mathbb{Z}^L \to \mathbb{C}$ maps a fixed configuration $\sigma = (\sigma_1, \dots, \sigma_L)$ of $L$ qubits to a complex amplitude. These amplitudes satisfy the normalization condition:
$\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_L=1}^{M} |\Psi(\sigma_1, \dots, \sigma_L)|^2 = 1$,
and $\sigma_i \in \{1, \dots, M\}$ represents one of the $M$ possible outcomes from a quantum measurement on the $i$-th qubit. The wave function is formulated within a complex Hilbert space, where the vector representation of the quantum state $|\psi\rangle \in \mathbb{C}^{M^L}$ and its density matrix $|\psi\rangle\langle\psi| \in \mathbb{C}^{M^L \times M^L}$ become astronomically large for increasing $L$.
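As a concrete instance of Eq. 1, consider a small illustrative case that is not taken from the experiments: $L = 2$ qubits with $M = 2$ outcomes each, and a state with equal weight on the strings $(1,1)$ and $(2,2)$. The amplitudes and the normalization check then read:

```latex
\begin{aligned}
|\psi\rangle &= \tfrac{1}{\sqrt{2}}\,|1,1\rangle + \tfrac{1}{\sqrt{2}}\,|2,2\rangle, \\
\Psi(1,1) &= \Psi(2,2) = \tfrac{1}{\sqrt{2}}, \qquad \Psi(1,2) = \Psi(2,1) = 0, \\
\sum_{\sigma_1=1}^{2} \sum_{\sigma_2=1}^{2} |\Psi(\sigma_1, \sigma_2)|^2 &= \tfrac{1}{2} + 0 + 0 + \tfrac{1}{2} = 1 .
\end{aligned}
```

Here the state vector has $M^L = 4$ entries. For $M = 2$ and $L = 31$ the same vector would have $2^{31} \approx 2.1 \times 10^9$ entries, which illustrates how quickly the representation grows with $L$.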
Quantum Measurement. Quantum measurement is the process of converting quantum information into a classical form for subsequent processing. This process is described by a set of measurement operators $\{O_m\}_{m=1}^M$ satisfying $\sum_m O_m = I$, where $M$ is the total number of possible outcomes. Measuring a qubit causes the wave function to collapse and yields one of the possible outcomes, corresponding to the index $m$ of the measurement operator. Specifically, upon measuring a qubit in state $\rho$, the probability of obtaining result $m$ is given by $p(m) = \text{tr}(\rho O_m)$. For an $L$-qubit quantum state, a common strategy involves measuring each qubit in parallel (Leibfried et al., 1996; Jullien et al., 2014). According to the Born rule of quantum mechanics, this measurement procedure outputs a measurement string $\sigma = (\sigma_1, \dots, \sigma_L)$, where each $\sigma_i \in \{1, \dots, M\}$, with a probability $|\Psi(\sigma_1, \dots, \sigma_L)|^2$ as defined in Eq. 1.
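The Born-rule sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two-qubit amplitudes are a hypothetical toy state, and the helper names `born_probabilities` and `sample_strings` are introduced here for illustration only.

```python
import random

# Hypothetical 2-qubit state with M = 2 outcomes per qubit (labelled 1 and 2):
# equal-magnitude amplitudes on (1, 1) and (2, 2), zero elsewhere.
amplitudes = {
    (1, 1): complex(2 ** -0.5, 0.0),
    (1, 2): 0j,
    (2, 1): 0j,
    (2, 2): complex(0.0, 2 ** -0.5),  # a complex phase does not change |Psi|^2
}

def born_probabilities(amps):
    """Return p(sigma) = |Psi(sigma)|^2 for every measurement string sigma."""
    return {sigma: abs(a) ** 2 for sigma, a in amps.items()}

def sample_strings(amps, num_copies, seed=0):
    """Draw one measurement string per copy of the state (K = num_copies)."""
    rng = random.Random(seed)
    probs = born_probabilities(amps)
    strings = list(probs)
    weights = [probs[s] for s in strings]
    return rng.choices(strings, weights=weights, k=num_copies)

probs = born_probabilities(amplitudes)
records = sample_strings(amplitudes, num_copies=8)
```

Each entry of `records` plays the role of one measurement string $\sigma$. Strings with zero amplitude are never drawn, so only the two strings with nonzero amplitude appear.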

Section: LLM4QPE
3.1 OVERVIEW
As illustrated in Fig. 1, the LLM4QPE paradigm consists of two primary stages: pretraining and finetuning. In the pretraining phase, the model is exposed to a large volume of unlabeled quantum data, $D_p$, undergoing a fully unsupervised training process. Subsequently, the learned parameters from pretraining are transferred to the supervised finetuning phase. Here, all model parameters are further optimized using labeled data, $D_t$, for various downstream tasks, each guided by its specific supervised loss function. Finally, the performance of LLM4QPE is evaluated using a dedicated test dataset, $D_e$. It is important to note that while downstream finetuning models initially share the same pretrained parameters, they ultimately possess separate, task-specific parameters. A key design principle of LLM4QPE is the consistent structural similarity between its pretraining and finetuning configurations, requiring only minor modifications to adapt to different downstream tasks.
We draw an analogy between quantum data and natural language text: each measurement outcome $\sigma_i$ of a qubit is analogous to a 'token', and the total number of possible outcomes $M$ is akin to the 'vocabulary size' $|V|$. A measurement string $\sigma$, which represents a projection of the entire quantum system with inherent correlative effects, resembles a 'sentence' in textual data. Furthermore, a collection of measurement records $R$, comprising many measurement strings from diverse physical conditions, can be considered a 'corpus' gathered from various sources and genres. While similar analogies have been implicitly explored in prior works (Sharir et al., 2020; Hibat-Allah et al., 2020; Cha et al., 2021; Zhang & Di Ventra, 2023), these existing methods are largely confined to single-task training and testing, without incorporating a pretraining step. LLM4QPE, in contrast, explicitly adopts the LLM paradigm to process and understand quantum data. The data types and collection strategies are described in Section 3.2, with further details in Appendix B. Given these datasets, we first detail the unsupervised pretraining of LLM4QPE in Section 3.3. The pretrained parameters are then adapted and optimized towards a supervised loss for various tasks, as presented in Section 3.4.

Section: DESCRIPTION OF THE QUANTUM DATASET GENERATED FROM SIMULATION
We begin by formally defining the quantum dataset in Definition 1, which also outlines the procedures for its generation. An intuitive flowchart visualizing this process is presented in Fig. 2.
Definition 1 (Quantum Dataset). A quantum dataset is represented as $\mathcal{D} = \{s_i\}$. Each sample $s_i = (R_i, c_i, p_i)$ comprises measurement records $R_i$, physical condition variables $c_i$, and (optionally) system property variables $p_i$. Let $L$ denote the number of qubits, $K$ represent the number of copies of each quantum state, and $M$ denote the number of possible outcomes from a single-qubit measurement. We elaborate on these components below:
1)  $c_i \in \mathbb{R}^C$ represents the physical condition variables that govern the evolution of the quantum system. These variables, such as system size, coupling strength of Hamiltonians, etc., are directly accessible during the initialization of quantum experiments.
2)  The measurement records, denoted as $R_i \in \mathbb{Z}^{K \times L}$, are outcomes generated by quantum measurements. A quantum state is prepared by evolving the system under a fixed physical condition $c_i$. Subsequently, quantum measurements are performed independently on each qubit in parallel using a set of measurement operators $\{O_m\}_{m=1}^M$. Performing measurements on $L$ qubits yields a measurement string, represented as $\sigma = (\sigma_1, \dots, \sigma_L)$, where each $\sigma_l \in \{1, \dots, M\}$. This measurement procedure is repeated $K$ times, once for each copy of the quantum state. Finally, we collect the resulting $K \times L$ measurement outcomes and store them within $R_i$.
3)  (Optional) The system property variables $p_i \in \mathbb{R}^P$ represent statistics of the quantum system conditioned on $c_i$, such as quantum phase, correlation function, entanglement entropy, purity, etc. The exact values of $p_i$ can be calculated through classical post-processing by analyzing either the wave functions or measurement statistics. We treat these properties as supervised labels used for finetuning the model.
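The three components of Definition 1 map naturally onto a small record type. The sketch below is one possible layout, not the authors' data format: the class name `QuantumSample`, the helper `make_sample`, and the uniformly random placeholder outcomes are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QuantumSample:
    """One sample s_i = (R_i, c_i, p_i) from Definition 1."""
    records: List[List[int]]                  # R_i: K rows of L outcomes in {1..M}
    condition: List[float]                    # c_i: C physical condition variables
    properties: Optional[List[float]] = None  # p_i: only present for finetuning

def make_sample(condition, num_copies, num_qubits, num_outcomes, rng, properties=None):
    """Build a sample with placeholder records.

    Assumption: outcomes are drawn uniformly at random purely to show the
    layout; real records come from measuring a (simulated) quantum state.
    """
    records = [
        [rng.randint(1, num_outcomes) for _ in range(num_qubits)]
        for _ in range(num_copies)
    ]
    return QuantumSample(records, list(condition), properties)

rng = random.Random(0)
unlabeled = make_sample([4.0, 0.5], num_copies=3, num_qubits=4, num_outcomes=2, rng=rng)
labeled = make_sample([4.0, 0.7], 3, 4, 2, rng, properties=[1.0, 0.0, 0.0])
```

Keeping `properties` optional lets the same type serve both the unlabeled pretraining data and the labeled finetuning data.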
It is important to mention that our quantum dataset generation process is akin to that described in Wang et al. (2022). A key distinction, however, is that LLM4QPE explicitly requires ground-truth labels of system properties for finetuning. This contrasts with the approach in Wang et al. (2022), where authors propose reconstructing the quantum state via unsupervised learning on measurement records, followed by classical shadow (Huang et al., 2020) for predicting specific quantum properties. This two-step strategy often introduces additional computational overheads. Furthermore, our experiments demonstrate that parameters optimized within LLM4QPE for specific objectives, such as quantum phases of matter and correlation functions, consistently lead to superior performance in our numerical results.

[Figure 2 panel labels: a), b), c); Quantum Evolution; Random Measurement; Digital quantum circuit; Analog quantum simulation; # Measurement Strings; # Physical Conditions]
Figure 2: Process of generating the quantum dataset. a) For each qubit of the quantum system, we perform a quantum measurement using operators $\{O_m\}_{m=1}^M$ and obtain an integer outcome $m$ with probability $p(m)$. b) Consider the quantum system governed by different physical conditions. Quantum measurements are performed on an ensemble of identical quantum states evolved under each fixed physical condition. Measurement can be done in parallel for all the qubits of a single copy of the quantum state and outputs a measurement string. This process is feasible on existing digital and analog quantum computers. c) The collected data are structured and packed into a series of tensors, which can be efficiently stored on classical devices and are easy to process.

Section: UNSUPERVISED PRETRAINING
Unlike prior studies (Czischek et al., 2022; Zhang & Di Ventra, 2023) that view pretraining primarily as a warm-up phase to find suitable parameter initialization for subsequent finetuning with the same learning objective, LLM4QPE redefines pretraining as a crucial avenue to master the intricate underlying physics across diverse quantum systems within the same family. The pretrained parameters are then robustly transferable to a wide array of distinct downstream tasks. LLM4QPE is pretrained in a fully unsupervised manner, as comprehensively illustrated in Fig. 1b.
Quantum Data for Pretraining. The quantum dataset $D_p = \{(R_i, c_i)\}_{i=1}^{N_p}$ designated for pretraining is constructed following the strategy detailed in Section 3.2. Here, we elaborate on how this data is reorganized to suit LLM4QPE's unsupervised pretraining paradigm. Let $K_p$ denote the number of measurement strings per physical condition utilized for pretraining. We concatenate all input measurement records $\{R_i\}_{i=1}^{N_p}$ along the first dimension to form an input tensor $E_{in} \in \mathbb{Z}^{N_p K_p \times L}$, where each row $\sigma_b \in \mathbb{Z}^L$ represents a single measurement string. Concurrently, we construct a matrix $C_{in} \in \mathbb{R}^{N_p K_p \times C}$, where each row $c_b \in \mathbb{R}^C$ holds the physical condition variables under which the string $\sigma_b$ in the same row of $E_{in}$ was measured. For both the Rydberg atom model and the anisotropic Heisenberg model, we consistently set $N_p = 100$ and $K_p = 1024$. In each training iteration, we randomly sample $B_p$ matching rows from $E_{in}$ and $C_{in}$. Thus, the input to the model for a given batch is $\{( \sigma_b, c_b ) | \sigma_b \in E_{in}, c_b \in C_{in}\}_{b=1}^{B_p}$ with a batch size $B_p$.
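The reorganization into $E_{in}$ and $C_{in}$ can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, the toy sizes stand in for $N_p = 100$ and $K_p = 1024$, and sampling without replacement within a batch is an assumed choice that the text does not specify.

```python
import random

def build_pretraining_tensors(records_per_condition, conditions):
    """Concatenate records along the first axis and repeat conditions to match.

    records_per_condition: list of N_p record arrays, each K_p x L
    conditions:            list of N_p condition vectors, each of length C
    Returns E_in (N_p*K_p rows of length L) and C_in (N_p*K_p rows of length C).
    """
    e_in, c_in = [], []
    for records, cond in zip(records_per_condition, conditions):
        for string in records:
            e_in.append(list(string))
            c_in.append(list(cond))  # every string keeps its own condition
    return e_in, c_in

def sample_batch(e_in, c_in, batch_size, rng):
    """Pick B_p row indices and return the paired (sigma_b, c_b) rows."""
    idx = rng.sample(range(len(e_in)), batch_size)
    return [(e_in[i], c_in[i]) for i in idx]

# Toy sizes for illustration only.
n_p, k_p, n_qubits = 3, 4, 5
rng = random.Random(0)
records = [[[rng.randint(1, 2) for _ in range(n_qubits)] for _ in range(k_p)]
           for _ in range(n_p)]
conds = [[float(i), 0.1 * i] for i in range(n_p)]
E_in, C_in = build_pretraining_tensors(records, conds)
batch = sample_batch(E_in, C_in, batch_size=4, rng=rng)
```

Sampling row indices once and indexing both arrays with them keeps each measurement string paired with the condition it was measured under.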
Input Embeddings. As depicted in Fig. 1a, LLM4QPE incorporates three distinct types of embeddings as input to effectively capture the hidden patterns inherent in the quantum system: token embeddings, condition embeddings, and position embeddings. Since each element $\sigma_l \in \{1, \dots, M\}$ within a measurement string $\sigma_b$ is a discrete integer, analogous to a 'token' in Natural Language Processing (NLP), we employ learned embeddings to convert the measurement string $\sigma_b$ (augmented with an additional start token 's') into token embeddings $E_t \in \mathbb{R}^{B_p \times (L+1) \times d}$, where $d$ is the feature dimension. Our empirical studies confirm that encoding the physical condition into the model significantly enhances performance. A Feed-Forward Network (FFN) with a single hidden layer is utilized to embed the continuous physical condition $c_b$ into a feature vector $E_c \in \mathbb{R}^{B_p \times d}$. This $E_c$ is treated as a sentence-level embedding, which is broadcast and added to all $L$ measurement tokens, and is referred to as the global embedding. Subsequently, the final input embeddings are obtained by the broadcasting summation $E_{out} = E_t + E_c + E_p$, where $E_p$ represents the positional embeddings, identical to those used in Vaswani et al. (2017). $E_{out}$ then serves as the input for deeper layers, as discussed below.
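A minimal sketch of the broadcasting sum $E_{out} = E_t + E_c + E_p$ for a single measurement string (no batch axis) is given below. Several details are assumptions made for illustration: the ReLU hidden activation of the FFN, the use of index 0 for the start token, the random initial weights, and the choice to add $E_c$ at every position including the start token, as the elementwise sum implies.

```python
import math
import random

def sinusoidal_positions(length, d):
    """Fixed sinusoidal position embeddings as in Vaswani et al. (2017)."""
    pe = [[0.0] * d for _ in range(length)]
    for pos in range(length):
        for i in range(0, d, 2):
            angle = pos / (10000 ** (i / d))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def condition_embedding(c, w1, b1, w2, b2):
    """One-hidden-layer FFN mapping c (length C) to a d-dimensional vector.

    Assumption: ReLU hidden activation; the text does not name one.
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, c)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def input_embeddings(sigma, c, token_table, ffn, start_token=0):
    """E_out = E_t + E_c + E_p for one measurement string."""
    tokens = [start_token] + list(sigma)  # L + 1 tokens
    d = len(token_table[0])
    e_t = [token_table[t] for t in tokens]
    e_c = condition_embedding(c, *ffn)    # same vector at every position
    e_p = sinusoidal_positions(len(tokens), d)
    return [[e_t[l][k] + e_c[k] + e_p[l][k] for k in range(d)]
            for l in range(len(tokens))]

rng = random.Random(0)
M, d, C, hidden = 2, 4, 3, 8

def rand(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

token_table = rand(M + 1, d)  # row 0 is the start token, rows 1..M the outcomes
ffn = (rand(hidden, C), [0.0] * hidden, rand(d, hidden), [0.0] * d)
E_out = input_embeddings([1, 2, 2, 1], [0.5, 1.0, 2.0], token_table, ffn)
```

The result has shape $(L+1) \times d$, matching one slice of $E_t$ along the batch axis.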
Model Architecture. As illustrated in Fig. 1b, the core component of LLM4QPE is a multi-layer Transformer decoder, derived from the architecture proposed in (Vaswani et al., 2017). The input to this decoder is the embedding $E_{out}$, and its output is $H \in \mathbb{R}^{B_p \times (L+1) \times d}$, which represents high-order representations of all measurement strings and conditional variables within a given batch. For a detailed exposition of the Transformer architecture, please refer to (Vaswani et al., 2017). During pretraining, given a fixed qubit configuration $\sigma = (\sigma_1, \dots, \sigma_L)$, LLM4QPE aims to approximate the classical probability distribution $p(\sigma_1, \dots, \sigma_L) = |\Psi(\sigma_1, \dots, \sigma_L)|^2$ as defined in Eq. 1. This joint distribution is effectively approximated by factorizing it into a product of conditional probabilities:
$p(\sigma_1, \dots, \sigma_L|c) = \prod_{l=1}^{L} p(\sigma_l|\sigma_{l-1}, \dots, \sigma_1, c)$. (2)
The model's parameters are optimized by minimizing the average negative log-likelihood loss:
$\mathcal{L}_{\text{unsup}} = \frac{1}{B_p} \sum_{b=1}^{B_p} -\log p(\sigma_b|c_b)$, (3)
which corresponds to maximizing the (conditional) likelihoods of the observed measurement outcomes. This entirely unsupervised pretraining enables the model to learn from extensive quantum data spanning a wide range of physical conditions. To ensure physical validity and maintain a normalized output distribution, a standard strategy is employed: the final layer is a linear projection followed by a softmax activation function, guaranteeing that the output distribution satisfies $\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_L=1}^{M} p(\sigma_1, \dots, \sigma_L) = 1$ (see Appendix C for a formal proof).
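The normalization property and the loss of Eq. 3 can be checked numerically on a toy model. In the sketch below, `conditional_logits` is a hypothetical stand-in for the Transformer decoder (any function of the prefix and the condition would do); only the per-step softmax and the product of conditionals follow the text.

```python
import itertools
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_logits(prefix, c, num_outcomes):
    """Toy stand-in for the decoder, chosen arbitrarily for illustration."""
    s = sum(prefix)
    return [c * (m + 1) + 0.3 * s * ((-1) ** m) for m in range(num_outcomes)]

def log_prob(sigma, c, num_outcomes):
    """log p(sigma | c) = sum_l log p(sigma_l | sigma_<l, c), as in Eq. 2."""
    total = 0.0
    for l, outcome in enumerate(sigma):
        probs = softmax(conditional_logits(sigma[:l], c, num_outcomes))
        total += math.log(probs[outcome - 1])  # outcomes are labelled 1..M
    return total

def nll_loss(batch, num_outcomes):
    """Average negative log-likelihood over a batch of (sigma, c) pairs."""
    return -sum(log_prob(s, c, num_outcomes) for s, c in batch) / len(batch)

M, L, c = 2, 3, 0.7
# Sum the model probability over all M**L measurement strings.
total_prob = sum(math.exp(log_prob(s, c, M))
                 for s in itertools.product(range(1, M + 1), repeat=L))
loss = nll_loss([((1, 2, 1), c), ((2, 2, 1), c)], M)
```

Because every conditional is a softmax output, `total_prob` equals 1 up to floating-point error regardless of how the logits are produced, which is the property the final softmax layer is meant to guarantee.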

Section: SUPERVISED FINETUNING
The inherent self-attention mechanism within the Transformer architecture empowers LLM4QPE to adeptly handle a diverse spectrum of downstream tasks, ranging from classifying quantum phases of matter to predicting the entanglement entropy of quantum states. This remarkable adaptability is achieved by simply adjusting the relevant inputs and outputs as required. Crucially, LLM4QPE distinguishes itself from two-step models (e.g., Wang et al., 2022) that first use a pretrained model to generate new measurement records conditioned on physical variables and then predict quantum properties via classical shadow protocols (Huang et al., 2020). Instead, LLM4QPE operates as an end-to-end task-agnostic pretrained model, directly providing property estimations for quantum systems.
Quantum Data for Finetuning and Input Embeddings. The finetuning dataset $D_f = \{((R_j, c_j), p_j)\}_{j=1}^{N_f}$ is generated using a random seed distinct from that used for $D_p$. Subsequently, $D_f$ is partitioned into training ($D_t$) and evaluation ($D_e$) datasets. A critical aspect of our experimental design is ensuring that the physical conditions sampled for pretraining do not overlap with those used for finetuning, i.e., $c_j \notin \{c_i\}_{i=1}^{N_p}$ for $j \in \{1, \dots, N_f\}$. Nonetheless, the physical conditions for finetuning are sampled from the same underlying distribution as those for pretraining. Further details on data collection can be found in Appendix B. A notable difference from pretraining is the input format for finetuning: instead of a single measurement string $\sigma_b \in \mathbb{Z}^L$, the input becomes a batch of measurement records $X_j \in \mathbb{Z}^{L \times K_f}$, where $K_f$ is the number of measurement strings per sample. This change has both an intuitive and a theoretical motivation. Intuitively, a single measurement string provides only a partial snapshot of the quantum system; a collection of strings offers a more complete picture. Theoretically, accurately predicting quantum system properties on classical computers typically demands an exponential number of measurements with respect to the system size $L$ (Gebhart et al., 2023), or at least quasi-polynomially many for low-entanglement systems (Huang et al., 2022). Accordingly, the model's input is structured as $\{(X_j, c_j), p_j\}_{j=1}^{B_t}$, where the tuple $(X_j, c_j)$ is the input, $p_j$ is the corresponding label, and $B_t$ is the batch size for supervised finetuning. The embedding strategy also differs from pretraining. The learned token embeddings suitable for single measurement strings $\sigma_b$ are not directly applicable to the batch-style records $X_j$.
To address this, a Long Short-Term Memory (LSTM) layer is integrated at the forefront of the decoder, as depicted in Fig. 1c. This LSTM layer processes the discrete measurement records $X_j$ and outputs high-order embeddings $E_{rnn} \in \mathbb{R}^{B_t \times L \times d}$. The additional embeddings, including physical condition embeddings ($E_c$) and positional embeddings ($E_p$), are directly transferred from the pretrained model. The final output embedding for finetuning is the summation:
$E_{out} = E_{rnn} + E_c + E_{p}^{\text{transferred}}$.
Feature Aggregation and Output Projection. The output of the multi-layer Transformer decoder is $H \in \mathbb{R}^{B_t \times L \times d}$, where $L$ is the number of qubits. For each specific downstream task, the decoder is initialized with the parameters obtained from pretraining, and all parameters are subsequently finetuned towards a task-specific supervised loss. To derive a concise feature representation for each of the $B_t$ training samples, a feature aggregation layer is appended after the last multi-head attention layer. This layer processes the hidden feature $H$ along the second axis, yielding an aggregated feature $H' \in \mathbb{R}^{B_t \times d}$. Finally, an additional linear projection layer transforms $H'$ into $H'' \in \mathbb{R}^{B_t \times P}$, where $P$ is the dimension of the predicted property. This projection is followed by a task-dependent activation function. For predicting the correlation function, we employ the tanh activation, leveraging the prior knowledge that each element of the label $p_j$ lies within the range $[-1, 1]$ (refer to Appendix B for details). Conversely, for classifying quantum phases of matter, the log-softmax activation function is adopted.
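The aggregation and projection steps can be sketched for a single sample as follows. The text does not state which aggregation is used, so mean pooling over the sequence axis is an assumption here; the function names and the toy weights are likewise hypothetical.

```python
import math

def mean_pool(h):
    """Aggregate H (L x d) along the sequence axis into H' (length d).

    Assumption: mean pooling; the aggregation operator is not specified.
    """
    length = len(h)
    return [sum(row[k] for row in h) / length for k in range(len(h[0]))]

def linear(x, weight, bias):
    """Project a d-dimensional vector to P outputs."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def log_softmax(z):
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def output_head(h, weight, bias, task):
    """Pool, project, then apply the task-dependent activation."""
    z = linear(mean_pool(h), weight, bias)
    if task == "correlation":
        return [math.tanh(v) for v in z]  # outputs constrained to [-1, 1]
    if task == "phase":
        return log_softmax(z)             # log-probabilities over P classes
    raise ValueError(f"unknown task: {task}")

h = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]]  # L = 3, d = 2
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # P = 3
b = [0.0, 0.0, 0.0]
phase_out = output_head(h, w, b, "phase")
corr_out = output_head(h, w, b, "correlation")
```

Only the final activation changes between the two tasks, which mirrors the statement that the architecture stays fixed and only inputs and outputs are adjusted.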
Learning Objective. The estimation of quantum system properties is framed as supervised learning tasks. This paper considers two distinct types of tasks: classifying quantum phases of matter (a classification task) and predicting correlation functions (a regression task). For each supervised task, LLM4QPE maintains a consistent underlying architecture. We seamlessly integrate task-specific inputs and ground-truth labels into LLM4QPE and proceed to finetune all model parameters in an end-to-end manner. Given a batch of training samples $\{(X_j, c_j), p_j\}_{j=1}^{B_t}$ where $B_t$ is the batch size:
For classifying quantum phases of matter, $p_j$ is a one-hot encoded label. We minimize the negative log-likelihood of the observed labels, which gives the supervised loss for classification with $P$ classes:
$\mathcal{L}_{\text{sup}} = - \frac{1}{B_t} \sum_{j=1}^{B_t} \sum_{u=1}^{P} \mathbb{I}[p_{j,u} = 1] \log f_{\theta}(X_j, c_j)_u$, (4)
where $\mathbb{I}[\cdot]$ is an indicator function, the outer sum runs over the $B_t$ samples of a batch drawn from the training dataset of size $N_t$, and $f_{\theta}(\cdot)$ denotes the model's prediction with parameters $\theta$ to be optimized.
For predicting the correlation function, $p_j$ is a continuous-valued label. We adopt the Root Mean Square Error (RMSE) loss:
$\mathcal{L}_{\text{sup}} = \sqrt{\frac{1}{B_t} \sum_{j=1}^{B_t} \sum_{u=1}^{P} (f_{\theta}(X_j, c_j)_u - p_{j,u})^2}$. (5)
A more detailed description of task-specific finetuning configurations can be found in the experimental section.
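Both supervised losses are short enough to write out directly. The sketch below evaluates them on one batch, normalizing by the batch size $B_t$ as in Eqs. 4 and 5; the function names and the toy predictions are hypothetical, and the classification function takes log-probabilities (for instance from a log-softmax head) as input.

```python
import math

def classification_loss(log_probs, labels):
    """Eq. 4 on one batch: mean of -log f(X_j, c_j)_u at the true class u.

    log_probs: B_t rows of P log-probabilities
    labels:    B_t one-hot rows of length P
    """
    total = 0.0
    for lp, onehot in zip(log_probs, labels):
        total -= sum(l for l, y in zip(lp, onehot) if y == 1)
    return total / len(log_probs)

def rmse_loss(preds, targets):
    """Eq. 5 on one batch: sqrt of (1 / B_t) times the summed squared error."""
    sq = sum((f - p) ** 2
             for pred, tgt in zip(preds, targets)
             for f, p in zip(pred, tgt))
    return math.sqrt(sq / len(preds))

log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
labels = [[1, 0, 0], [0, 1, 0]]
cls_loss = classification_loss(log_probs, labels)

reg_loss = rmse_loss([[0.5, -0.5]], [[0.0, 0.0]])
```

Note that Eq. 5 as written divides by the number of samples only, not by the number of output components $P$, and the sketch follows that convention.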

Section: EXPERIMENTS
In this section, we present a comprehensive evaluation of LLM4QPE's finetuning performance on two distinct quantum property estimation tasks: classifying quantum phases of matter and predicting correlation functions. We investigate two prominent families of quantum models: the Rydberg atom model (Bernien et al., 2017) and the anisotropic Heisenberg model (Kranzl et al., 2023).
For robust comparison, we consider several baseline methods. These include the classical shadow (Huang et al., 2020), a learning-free protocol for efficiently constructing quantum state representations. We also compare against kernel-based methods such as the Radial Basis Function (RBF) Kernel (Huang et al., 2022) and Neural Tangent Kernel (NTK) (Huang et al., 2022). Furthermore, we include advanced deep learning-based approaches like PixelCNN (Sharir et al., 2020) and a classical shadow-based generative model (NN-shadow) (Wang et al., 2022) to benchmark against state-of-the-art techniques.

Section: CLASSIFYING QUANTUM PHASES OF MATTER ON RYDBERG ATOM MODEL
We first investigate the Rydberg atom model across different system sizes $L \in \{19, 25, 31\}$. LLM4QPE is pretrained separately for each system size, with a fixed number of sampled physical conditions $N_p = 100$. Each physical condition variable $c_i$ is represented as a 4-dimensional vector:
$[L_i, \Delta_i, \Omega_i, R_0/a_i]^\top$,
where $\Delta$ is the laser detuning, $\Omega$ is the Rabi frequency, and $R_0/a$ denotes the interaction range. The values for these four variables are directly accessible during the initialization of (simulated) quantum experiments. For each physical condition, we generate $K_f$ measurement strings based on computational basis measurement operators, resulting in a total of $M=2$ possible measurement outcomes. LLM4QPE is then pretrained using the dataset $D_p$. The pretrained parameters are subsequently transferred and finetuned using $D_t$, where the number of sampled physical conditions $N_t \in \{25, 64, 100\}$ and the number of measurement strings $K_f \in \{64, 128, 256, 512, 1024\}$.
The evaluation dataset $D_e$ is fixed to a size of $N_e = 10000$. Following Bernien et al. (2017), we define three categories of quantum phases (Disordered, $Z_2$ Ordered, and $Z_3$ Ordered) to establish the label $p_j$, which is a 3-dimensional one-hot vector. Further details regarding data generation are provided in Appendix B.
For comparative analysis, we also evaluate LLM4QPE without pretraining, where all parameters are initialized randomly from a uniform distribution within $[-1, 1]$. We employ accuracy and weighted F1 score as key metrics for this 3-class classification task to evaluate both our models and baselines. The results, summarized in Table 1, demonstrate that LLM4QPE achieves the best mean accuracy across most settings, with a single exception for $L=31$ and $N_t=25$. Figure 3 illustrates the performance across varied $K_f$. LLM4QPE consistently achieves the best weighted F1 score across all system sizes and, notably, exhibits a substantial performance margin when $K_f = 64$. These results underscore LLM4QPE's robust capability to handle inputs with a limited number of measurement records, a significant advantage given the expensive and time-consuming nature of (simulated) quantum experiments. Furthermore, Figure 4 plots the training dynamics of LLM4QPE with and without pretraining across epochs. The curves clearly indicate that pretraining facilitates significantly faster convergence of the supervised loss and leads to superior finetuning accuracy. Figure 5 further quantifies the number of epochs required for the model to reach 90% of its peak weighted F1 score. It consistently shows that for the same system size $L$, the pretrained LLM4QPE converges more rapidly than its non-pretrained counterpart, achieving lower training error and higher test weighted F1 scores.
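For reference, the two metrics used above can be computed as in the following minimal sketch. It assumes the usual definition of the weighted F1 score (per-class F1 averaged with weights proportional to class support); the integer class encoding and the toy predictions are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += n_cls / len(y_true) * f1
    return score

# Hypothetical encoding: 0 = Disordered, 1 = Z2 Ordered, 2 = Z3 Ordered.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1w = weighted_f1(y_true, y_pred)
```

Weighting by support keeps the score meaningful when the three phases are unevenly represented among the sampled physical conditions.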

Section: PREDICTING CORRELATION FUNCTION ON ANISOTROPIC HEISENBERG MODEL
We now turn our attention to a regression task: predicting the correlation function on the anisotropic Heisenberg model. This quantum model is characterized by long-range interactions between every pair of quantum sites, leading to complex dynamics that are computationally challenging for classical simulation (Orús, 2019). Due to memory constraints, we restrict the system size to $L \in \{8, 10, 12\}$. The ground states of quantum systems under different physical conditions are computed via eigenvalue decomposition. For each physical condition, we generate $K_f$ measurement strings using Pauli-6 measurement operators, resulting in $M=6$ outcomes. LLM4QPE is pretrained independently for each system size with a training size of $N_p = 100$.
For the model's finetuning, we vary the number of generated training samples $N_t \in \{20, 50, 90\}$ while keeping the number of measurement strings fixed at $K_f = 64$. The evaluation dataset is generated with $N_e = 200$. To obtain the ground-truth labels, we calculate the true values of the two-body correlation functions, which form an $L \times L$ continuous-valued matrix where each entry is in the range $[-1, 1]$. These are collected as the supervised labels. The Root Mean Square Error (RMSE) results are reported in Table 2. LLM4QPE consistently outperforms all baselines across all settings. Notably, many learning-based baseline models struggle to surpass the predictive accuracy of the learning-free classical shadow protocol. In contrast, our pretrained LLM4QPE demonstrates a remarkable margin of superiority.
Finally, we conduct an ablation study to investigate the individual contributions of condition embedding and LSTM embedding in both the Rydberg atom model and the anisotropic Heisenberg model. For this study, the LSTM layer is replaced with a fully connected layer having the same input/output dimensions. The results, presented in Table 3, consistently indicate that both embedding techniques positively contribute to the model's performance, suggesting that they are crucial for leveraging useful information from the input quantum data.

Section: CONCLUSION AND OUTLOOK
This paper introduces LLM4QPE, a novel task-agnostic unsupervised pretraining paradigm for estimating properties of quantum systems using quantum datasets. The core of our approach lies in a Transformer decoder architecture, which effectively learns intricate hidden information through a fully unsupervised pretraining procedure.
The parameters acquired during pretraining are then successfully transferred to solve a variety of downstream tasks. This transfer learning capability leads to significantly more effective classification of quantum phases and prediction of correlation functions, particularly on resource-limited devices and with sparse measurement information. Our empirical results demonstrate LLM4QPE's superior performance across different quantum models and tasks, highlighting its potential to address key challenges in QPE.

Section: A RELATED WORK
Section: A.1 LEARNING-FREE METHODS FOR QPE
Estimating the properties of quantum systems is a long-standing and critical problem in quantum physics (D'Ariano et al., 2003). The primary challenge stems from the fact that describing a quantum system using classical computers typically incurs an exponential scaling of complexity with respect to the system size (Nielsen & Chuang, 2010). However, quantum systems encountered in physical experiments are often non-generic and can be characterized by a limited number of physical variables. This structural restriction implies that such systems occupy only a small, accessible portion of the exponentially large Hilbert space (Carrasquilla et al., 2019), allowing for their characterization by classical methods within acceptable error bounds.
Traditional algorithms, including Quantum Monte Carlo (QMC) (Ceperley & Alder, 1986) and Density Functional Theory (DFT) (Hohenberg & Kohn, 1964), have achieved significant success in investigating the electronic and nuclear structures, primarily the ground states, of many-body systems such as atoms, molecules, and condensed phases (Gubernatis et al., 2016). Nevertheless, these methods face scalability issues, rendering them challenging for large-scale quantum many-body problems. An alternative class of methods, Tensor Networks (TNs) (Orús, 2019), built upon variational principles, has demonstrated unprecedented performance in analyzing ground state characteristics. These methods encompass Matrix Product States (MPS) (Perez-Garcia et al., 2006) and Projected Entangled Pair States (PEPS) (Corboz, 2016). TNs approximate the wave function by decomposing high-order wave functions into multiple low-rank tensors, enabling the analysis of quantum state properties through algebraic operations on the wave function. More recently, the classical shadow protocol (Huang et al., 2020) has emerged, proposing the use of random measurements to efficiently characterize quantum properties. Classical shadow has facilitated various applications, including direct fidelity estimation (Struchalin et al., 2021) and state function prediction (Zhang et al., 2021).

Section: A.2 LEARNING-BASED METHODS FOR QPE
With the continuous advancements in machine learning technologies, neural network-based methods have emerged as powerful tools to tackle Quantum Property Estimation (QPE) problems. These methods can be broadly categorized into two classes based on their primary objective. The first class, known as Neural Network Quantum States (NNQS) (Carleo & Troyer, 2017; Gao & Duan, 2017; Torlai et al., 2018; Schütt et al., 2019; Hibat-Allah et al., 2020; Zhang & Di Ventra, 2023), replaces the tensors used in TNs with neural networks as parametric function approximators for quantum many-body wave functions. In NNQS, the parameterized wave function is optimized by minimizing the expectation values of relevant observable estimators, typically using algorithms such as Density Matrix Renormalization Group (DMRG) (White, 1992) or Variational Monte Carlo (VMC) (McMillan, 1965). Subsequently, the properties of interest are extracted by performing algebraic operations on the optimized wave function. The second line of research (Gilmer et al., 2017; Kawai & Nakagawa, 2020; Xiao et al., 2022) is referred to as Neural Network Quantum Property Estimation (NNQPE). NNQPE directly optimizes the neural network parameters towards a specific learning objective that represents a particular property of quantum systems, such as the quantum phase.
For both NNQS and NNQPE, different neural network architectures (ansätze) are employed to solve quantum many-body problems with varying physical structures. Examples include Restricted Boltzmann Machines (RBMs) (Carleo & Troyer, 2017), Recurrent Neural Networks (RNNs) (Carrasquilla et al., 2019), Convolutional Neural Networks (CNNs) (Wu et al., 2019; Sharir et al., 2020; Wu et al., 2023), and Transformers (Cha et al., 2021; Wang et al., 2022; Zhang & Di Ventra, 2023; Du et al., 2023).
Our work, LLM4QPE, is closely related to NNQPE. However, a key distinguishing feature of our approach is the utilization of unsupervised pretraining to extract hidden information from quantum systems governed by diverse parameters. Our empirical findings demonstrate that this scheme significantly enhances model performance, particularly under conditions of limited copies of quantum states and measurements. While the recent work by Zhu et al. (2022) implements a similar pretraining strategy for learning quantum states, our approach differs by avoiding assumptions about prior knowledge of measurement string frequencies.

Section: B DETAILS OF THE QUANTUM DATASET GENERATION
A quantum dataset is a structured collection of data that characterizes quantum systems and their dynamic evolution. The design and collection of such data must adhere to several crucial factors: 1) the data collection methodology must be experimentally feasible on actual quantum devices and consistent with the fundamental principles of quantum mechanics; 2) the data collection process should be fully automated, minimizing the need for expert intervention in organization and labeling; and 3) the data must be structured and efficiently storable on resource-limited classical devices, facilitating straightforward processing by machine learning techniques without extensive post-processing. The quantum dataset we have established for LLM4QPE meticulously satisfies all three of these criteria. Furthermore, our model's unsupervised pretraining design allows it to serve as a centralized infrastructure, uniformly processing this diverse data.
In this paper, we generate the quantum dataset through simulated experiments conducted on classical computers. For the anisotropic Heisenberg model, quantum measurements are performed using Pauli-6 measurement operators, yielding $M=6$ possible outcomes. For the Rydberg atom model, computational basis measurement operators are employed, resulting in $M=2$ outcomes. We assume that the physical condition variables $c_i$ reside within a finite continuous space $\mathcal{F}$, respecting physical restrictions. We then conduct simulated experiments for each sampled $c_i$ and collect the corresponding measurement records. For the pretraining phase, the system property $p_i$ is not required, as pretraining is entirely unsupervised. For the finetuning phase, we initialize experiments with a distinct random seed and sample $N_f$ physical conditions, also within space $\mathcal{F}$, generating $\{c_j | c_j \in \mathcal{F}\}_{j=1}^{N_f}$. Crucially, we ensure that the sampled physical conditions for pretraining do not overlap with those used for finetuning. For finetuning, we additionally calculate the system property $p_j$ and utilize it as supervised labels. We further partition the finetuning dataset $D_f$ into training ($D_t$) and evaluation ($D_e$) sets with varied separation ratios. Detailed hyper-parameters and experimental configurations for dataset generation are discussed subsequently.

Section: B.1 RYDBERG ATOM MODEL
The Rydberg atom model serves as a highly programmable quantum simulator, capable of preparing interacting qubit systems (Bernien et al., 2017). This quantum model can be effectively described as a two-level quantum system, comprising the ground state $|g\rangle$ (or $|0\rangle$) and the Rydberg state $|r\rangle$ (or $|1\rangle$). The quantum dynamics of this model are governed by the Hamiltonian:
$H_{\text{Rydberg}} = \sum_i \frac{\Omega}{2} \sigma_i^x - \sum_i \Delta n_i + \sum_{i<j} V_0 |\vec{x}_i - \vec{x}_j|^{-6} n_i n_j$ (6)
where $\sigma_i^x$ is the Pauli-X matrix acting on site $i$, $\Omega$ is the Rabi frequency, $\Delta$ is the laser detuning, $V_0$ is the Rydberg interaction constant, and $\vec{x}_i$ is the position vector of site $i$. The operator $n_i = |r_i\rangle\langle r_i|$ represents the occupation number at site $i$, and $\sigma_i^x = |g_i\rangle\langle r_i| + |r_i\rangle\langle g_i|$ describes the coupling between the ground state $|g_i\rangle$ and the Rydberg state $|r_i\rangle$ at position $i$.
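As an illustration of Equation (6), the sketch below assembles the dense Hamiltonian matrix for a few atoms in plain Python. It assumes atoms on a one-dimensional line and a basis ordering in which bit $i$ of the basis index stores $n_i$; the function name and the numerical values are illustrative and do not correspond to the paper's settings.

```python
def rydberg_hamiltonian(positions, omega, delta, v0):
    """Dense matrix of Eq. (6) for atoms at 1D `positions` (bit i of index = n_i)."""
    n_sites = len(positions)
    dim = 2 ** n_sites
    H = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        occ = [(s >> i) & 1 for i in range(n_sites)]  # n_i for basis state s
        # Diagonal part: -Delta * sum_i n_i + sum_{i<j} V0 |x_i - x_j|^-6 n_i n_j
        energy = -delta * sum(occ)
        for i in range(n_sites):
            for j in range(i + 1, n_sites):
                r = abs(positions[i] - positions[j])
                energy += v0 * r ** -6 * occ[i] * occ[j]
        H[s][s] = energy
        # Off-diagonal part: (Omega / 2) * sigma^x_i flips the state of atom i
        for i in range(n_sites):
            H[s][s ^ (1 << i)] += omega / 2
    return H

# Illustrative values only (not the paper's settings).
H = rydberg_hamiltonian(positions=[0.0, 1.0], omega=2.0, delta=1.0, v0=3.0)
```

The matrix has $2^L$ rows, so this explicit construction is only practical for very small $L$; it is meant to show how each term of the Hamiltonian acts on the computational basis.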
We follow the methodology outlined in (Wang et al., 2022) to generate the quantum dataset, referring readers to their paper for comprehensive details. Here, we provide a concise overview of the main procedures. We consider the Rydberg atom model with system sizes $L \in \{19, 25, 31\}$. We fix the interaction constant $V_0 = 862690 \times 2\pi \text{ MHz } \mu\text{m}^6$ and vary the values of $\Omega \in [0, 5]$ and $\Delta \in [-10, 15]$ to obtain diverse physical conditions $c$. Each $c$ is a 4-dimensional vector of the form $[L, \Delta, \Omega, R_0/a]$, where $R_0/a$ denotes the interaction range with $R_0 = (V_0/\Omega)^{1/6}$. The approximate ground state for each physical condition is then prepared using the Bloqade.jl tool (Bloqade.jl, 2023). This tool also facilitates the output of measurement strings and the true phase for each physical condition. The measurement operators are chosen to be the computational basis $\{|0\rangle\langle0|, |1\rangle\langle1|\}$ for quantum measurement, resulting in a total of $M=2$ possible outcomes. In this paper, we consider three distinct phases: Disordered phase, $Z_2$ Ordered phase, and $Z_3$ Ordered phase. We sample $N_p = 100$ physical conditions with $K_p = 1024$ measurement strings for pre-training, and $N_t \in \{25, 64, 100\}$ physical conditions with $K_f \in \{64, 128, 256, 512, 1024\}$ for fine-tuning. The number of physical conditions for evaluation is fixed at $N_e = 10000$. The supervised labels for fine-tuning are one-hot encoded vectors of the true phases, meaning the dimension (number of classes) of $p$ is 3. It is strictly ensured that the physical conditions sampled for pre-training do not overlap with those used for fine-tuning.
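The construction of the condition vector $c = [L, \Delta, \Omega, R_0/a]$ can be sketched as follows. Uniform sampling of $\Omega$ and $\Delta$, the lattice spacing `a`, and the random seed are assumptions made for illustration; the text specifies only the ranges and the value of $V_0$.

```python
import math
import random

V0 = 862690 * 2 * math.pi  # interaction constant from the text (MHz um^6)

def condition_vector(L, delta, omega, a):
    """Return c = [L, Delta, Omega, R0/a] with R0 = (V0 / Omega)^(1/6)."""
    if omega <= 0:
        raise ValueError("R0 is undefined for Omega <= 0")
    r0 = (V0 / omega) ** (1 / 6)
    return [L, delta, omega, r0 / a]

def sample_conditions(n, L, a, seed):
    """Sample n condition vectors with Omega in (0, 5] and Delta in [-10, 15]."""
    rng = random.Random(seed)
    conditions = []
    while len(conditions) < n:
        omega = rng.uniform(0.0, 5.0)
        delta = rng.uniform(-10.0, 15.0)
        if omega > 0:  # skip the endpoint Omega = 0, where R0 diverges
            conditions.append(condition_vector(L, delta, omega, a))
    return conditions

# `a` (lattice spacing, in um) and the seed are hypothetical choices.
pretrain = sample_conditions(n=100, L=19, a=5.0, seed=0)
```

Using a different seed for the fine-tuning set is one simple way to obtain physical conditions that are distinct from those used for pre-training.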

Section: D ADDITIONAL NUMERICAL RESULTS

Section: D.1 RESULTS OF PREDICTING THE ENTANGLEMENT ENTROPY
As an additional downstream task, we investigate the prediction of the second-order Rényi entanglement entropy, defined as $-\log(\text{tr}(\rho_A^2))$, for the anisotropic Heisenberg model. Here, $A$ represents the left-half subsystem, with system size $L/2$ of the total $L$-qubit quantum system. The number of training samples is set to $N_t = 90$, and the predicted RMSE results are presented in Table 4. These results consistently demonstrate that pre-training remains highly effective for accurately predicting the entanglement entropy of the anisotropic Heisenberg model.
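For a pure state, the label $-\log(\text{tr}(\rho_A^2))$ can be computed directly from the state vector, as in the sketch below. It assumes the natural logarithm and takes the left-half qubits to be the most significant bits of the basis index; both conventions, and the function name, are illustrative assumptions.

```python
import math

def renyi2_entropy(psi, n_qubits):
    """-log tr(rho_A^2) for the left half A of a pure state `psi`.

    psi is a list of 2**n_qubits amplitudes; the left-half qubits are taken
    to be the most significant bits of the basis index (an assumed convention).
    """
    n_a = n_qubits // 2
    dim_a, dim_b = 2 ** n_a, 2 ** (n_qubits - n_a)
    # View psi as a dim_a x dim_b matrix with entries psi[a * dim_b + b].
    rows = [psi[a * dim_b:(a + 1) * dim_b] for a in range(dim_a)]
    # rho_A[a][a'] = sum_b psi[a, b] * conj(psi[a', b]);
    # since rho_A is Hermitian, tr(rho_A^2) = sum_{a, a'} |rho_A[a][a']|^2.
    purity = 0.0
    for ra in rows:
        for rb in rows:
            overlap = sum(x * complex(y).conjugate() for x, y in zip(ra, rb))
            purity += abs(overlap) ** 2
    return -math.log(purity)

product_state = [1.0, 0.0, 0.0, 0.0]                         # |00>
bell_state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]  # (|00> + |11>)/sqrt(2)
s_product = renyi2_entropy(product_state, 2)
s_bell = renyi2_entropy(bell_state, 2)
```

The product state gives zero entropy and the Bell state gives $\log 2$, the two extremes for a single-qubit subsystem.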

Section: D.2 MODEL SENSITIVITY TO THE NUMBER OF MEASUREMENTS AND TRAINING DATA SIZE
In Section 4, we explored the relationship between the number of measurements and the classification accuracy of quantum phases of matter on the Rydberg atom model. Figure 3 empirically illustrates that achieving linear growth in classification accuracy generally necessitates an exponential increase in the number of measurements per training example. Beyond the scaling with the number of measurements, we further investigate the scaling relationship between accuracy and the size of the training set (i.e., the number of sampled physical conditions that determine the quantum system's dynamics). We fix the number of measurements per example to 256, as we observe that larger values lead to accuracy saturation, and present the results for the 31-qubit system in Table 5. The results indicate that accuracy approximately exhibits linear growth with respect to the training size. This finding aligns with theoretical predictions presented in (Huang et al., 2022; Lewis et al., 2024), which suggest a polynomial scaling relationship between model performance and the size of the training dataset.

Section: D.3 FINE-TUNING ON OUT-OF-DISTRIBUTION (OOD) DATASETS
In this section, we evaluate the fine-tuning performance of LLM4QPE on out-of-distribution (OOD) datasets, meaning the dataset used for fine-tuning originates from a different distribution than the one used for pre-training.
We consider two distinct configurations to create OOD fine-tuning datasets: first, fine-tuning the model using parameters transferred from a model pretrained on fewer qubits, and second, regenerating the fine-tuning data with modified physical variables. For this analysis, we focus on the Rydberg atom model.
First, we assess the fine-tuning performance on a 31-qubit system using parameters pretrained on 19- and 25-qubit systems. The number of qubits itself is a physical variable, and this experiment aims to determine if model parameters trained on smaller-scale systems can effectively transfer and aid in characterizing larger-scale systems. The results, summarized in Table 6, clearly demonstrate that pretrained parameters transferred from smaller-scale systems are indeed beneficial for larger-scale systems, indicating positive transfer learning capabilities across system sizes.
Second, we shift the laser detuning from its original range of $[-10, 15]$ (used in the main experiments) to an OOD range of $[-20, -10] \cup [15, 25]$ to generate an OOD fine-tuning dataset for the 19-qubit Rydberg atom model. The classification accuracies are listed in Table 7. In this specific OOD scenario, the pretrained LLM4QPE fails to outperform LLM4QPE without pre-training. The primary reason for this degradation is that the significantly altered detuning values drive the quantum system into a very different dynamic regime, for which the pre-trained model has learned less relevant knowledge. Whether pre-training of LLM4QPE is beneficial for OOD quantum datasets in other settings is an open question that we leave for future work.
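Drawing detuning values from the OOD range $[-20, -10] \cup [15, 25]$ can be sketched as follows. Uniform sampling over the union and the seed are assumptions; the text specifies only the range itself.

```python
import random

OOD_INTERVALS = [(-20.0, -10.0), (15.0, 25.0)]

def sample_ood_detuning(rng):
    """Draw Delta uniformly from the union [-20, -10] U [15, 25]."""
    # Both intervals have length 10, so choosing one with equal probability
    # gives a uniform density over the union.
    lo, hi = rng.choice(OOD_INTERVALS)
    return rng.uniform(lo, hi)

rng = random.Random(0)  # hypothetical seed
ood_deltas = [sample_ood_detuning(rng) for _ in range(1000)]
```

Every sampled value lies on or outside the boundary of the pre-training range $[-10, 15]$, which is what makes the resulting fine-tuning set out-of-distribution with respect to $\Delta$.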

Section: E LIMITATIONS AND FUTURE WORK
In this study, our primary focus has been on the classification of quantum phases of matter and the prediction of correlation functions, specifically for the Rydberg atom model and the anisotropic Heisenberg model. While the LLM4QPE model inherently offers the flexibility to address a broader range of quantum many-body challenges, such as reconstructing the density matrix, these areas were beyond the scope of the current investigation. Our pretraining efforts were also primarily concentrated on a fixed number of measurement strings. A deeper exploration into the impact of varying the number of measurement strings on the model's performance represents a fascinating and important area for future research. Additionally, the current LLM4QPE model is characterized by a relatively modest parameter count (in the tens of thousands) when compared to the significantly larger parameter sets of contemporary large language models. Due to the constraints imposed by the model's current scale, our pretraining endeavors were confined to quantum systems governed by Hamiltonians from the same family. Looking forward, there is a clear anticipation and strong motivation to develop a more robust and expansive model, enriched with a substantially greater number of parameters, through learning on datasets generated from diverse families of quantum systems. This will push the boundaries of generalizability and applicability of LLM-style paradigms in quantum physics.

Section: ACKNOWLEDGEMENTS
* Corresponding author. This work was partly supported by NSFC (62222607), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and SJTU Trans-med Awards Research (STAR) 20210106.

Section: ETHICS STATEMENT
This paper proposes a novel approach for estimating the properties of quantum systems inspired by LLMs. The authors acknowledge and have carefully considered the potential ethical implications of this research. These include, but are not limited to, the potential for misuse of quantum data, the presence of inherent biases or errors in estimation results, and the broader societal impact on the development and accessibility of quantum technologies. To mitigate these concerns, the authors have rigorously adhered to best practices for data collection, model design, and evaluation throughout the study. Furthermore, all sources of funding and any potential conflicts of interest have been transparently disclosed. The authors are committed to upholding the highest principles of research integrity and ensuring full compliance with all relevant laws and regulations. It is our sincere hope that this research will serve as a constructive contribution to the advancement of quantum science and will positively benefit future research endeavors in this rapidly evolving field.

Section: REPRODUCIBILITY STATEMENT
The generated quantum data for both the Rydberg atom model and the anisotropic Heisenberg model is publicly available at https://github.com/abel1231/qpe-data, ensuring full data reproducibility. The code developed for training the LLM4QPE model and analyzing the experimental results can be obtained from the first author upon reasonable request, facilitating independent verification and further research.
Published as a conference paper at ICLR 2024

Section: B.2 ANISOTROPIC HEISENBERG MODEL
Exploring the intricate effects of long-range interactions within quantum systems is paramount for a deeper understanding of quantum mechanics (Bermúdez et al., 2017). In this paper, we leverage recent progress in experimentally realized power-law exponents for the anisotropic Heisenberg model (Kranzl et al., 2023) to study such interactions. The dynamics of the anisotropic Heisenberg model are governed by the Hamiltonian:
$H = \sum_{i<j} J_{ij} (\sigma_i^x \sigma_j^x + \sigma_i^y \sigma_j^y + \Delta \sigma_i^z \sigma_j^z) + \sum_i h \sigma_i^z$
where $\sigma_i^{x,y,z}$ are the Pauli matrices acting on the $i$-th site, $h$ determines the Ising interactions between magnons, and $J_{ij}$ is the long-range interaction strength satisfying $J_{ij} = J/|i - j|^\alpha$. We adopt the configuration from (Kranzl et al., 2023) for generating our quantum dataset. The values of $h$ and $J$ are fixed at 1 and 369 rad/s, respectively, while the power-law exponent $\alpha$ is varied uniformly within the range $(1, 2]$. Characterizing quantum systems with long-range interactions using existing classical computing techniques is exceptionally challenging. Consequently, we restrict the system size to $L \in \{8, 10, 12\}$. For all considered systems, we use $K_p = 1024$ measurement strings for pre-training and fix the number of sampled physical conditions at $N_p = 100$. For model finetuning, we vary the number of generated training samples $N_t \in \{20, 50, 90\}$ and fix the number of measurement strings at $K_f = 64$. The physical condition $c$ is defined as a vector with dimension $C = L^2$, where each element corresponds to the coupling strength $J_{ij}$ for $i, j \in \{1, \dots, L\}$. The problem of finding the ground state is formulated as an eigenvalue decomposition problem, and we obtain the ground state for each sampled physical condition using built-in functions from SciPy (Virtanen et al., 2020). The measurement records and the true values of the two-body correlation function and entanglement entropy are obtained using the PennyLane (Bergholm et al., 2018) toolbox. We employ Pauli-6 POVM measurement operators, which yield $M=6$ outcomes and are given as:
$\mathcal{M}_{\text{Pauli-6}} = \left\{ \tfrac{1}{3}|0\rangle\langle 0|, \tfrac{1}{3}|1\rangle\langle 1|, \tfrac{1}{3}|+\rangle\langle +|, \tfrac{1}{3}|-\rangle\langle -|, \tfrac{1}{3}|r\rangle\langle r|, \tfrac{1}{3}|l\rangle\langle l| \right\}$,
where $\{|0\rangle, |1\rangle\}$, $\{|+\rangle, |-\rangle\}$, and $\{|r\rangle, |l\rangle\}$ denote the eigenbases of the Pauli operators $\sigma_z$, $\sigma_x$, and $\sigma_y$, respectively. For the task of predicting the correlation matrix, the ground-truth label is an $L \times L$ matrix where each element is the expectation value $\langle \sigma_i^z \sigma_j^z \rangle$ of the observable $O_{ij} = \sigma_i^z \sigma_j^z$. Thus, each element can be written as $\text{tr}(\rho O_{ij})$ and falls within the range $[-1, 1]$, where $\rho$ is the density matrix of the ground state for each sampled physical condition. We flatten this correlation function matrix into an $L^2$-dimensional continuous-valued vector, which serves as the supervised label for fine-tuning. For the task of predicting the entanglement entropy, the label is a real number calculated as $-\log(\text{tr}(\rho_A^2))$, where $A$ denotes the left-half subsystem of size $L/2$ of the $L$-qubit quantum system.
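The label construction for the correlation task can be sketched end to end for a small chain. The example below uses NumPy's `numpy.linalg.eigh` rather than the SciPy routines mentioned above, and the parameter values are illustrative; the argument `aniso` is a hypothetical name for the coefficient of the $\sigma^z \sigma^z$ term in the Hamiltonian. It is a minimal sketch under these assumptions, not the authors' data pipeline.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def site_op(op, i, n):
    """Embed a single-qubit operator at site i of an n-qubit chain."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == i else I2)
    return out

def heisenberg_hamiltonian(n, J, alpha, aniso, h):
    """H = sum_{i<j} J/|i-j|^alpha (XX + YY + aniso * ZZ) + h * sum_i Z_i."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n):
        H += h * site_op(Z, i, n)
        for j in range(i + 1, n):
            Jij = J / abs(i - j) ** alpha
            H += Jij * (site_op(X, i, n) @ site_op(X, j, n)
                        + site_op(Y, i, n) @ site_op(Y, j, n)
                        + aniso * site_op(Z, i, n) @ site_op(Z, j, n))
    return H

def zz_correlation_label(n, J, alpha, aniso, h):
    """Flattened n x n matrix of ground-state <Z_i Z_j> values."""
    _, vecs = np.linalg.eigh(heisenberg_hamiltonian(n, J, alpha, aniso, h))
    psi = vecs[:, 0]  # eigh returns eigenvalues in ascending order
    corr = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            O_ij = site_op(Z, i, n) @ site_op(Z, j, n)
            corr[i, j] = np.real(np.vdot(psi, O_ij @ psi))
    return corr.reshape(-1)

# Illustrative parameters for a 4-site chain.
label = zz_correlation_label(n=4, J=1.0, alpha=1.5, aniso=1.0, h=0.0)
```

Dense diagonalization scales as $2^L \times 2^L$ in memory, which is consistent with restricting the system size to $L \le 12$.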

Section: C PROOF OF THE NORMALIZED OUTPUT DISTRIBUTION
In the main text, we assert that the output (classical) probability distribution satisfies $\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_L=1}^{M} p(\sigma_1, \dots, \sigma_L) = 1$, provided that the last linear projection layer employs a softmax activation function. The formal proof is presented below.
The softmax activation function is applied to the model's output, which is designed to approximate the joint probability distribution $p(\sigma_1, \dots, \sigma_L)$. As per Equation (2), this joint distribution is factorized into a product of conditional probabilities: $p(\sigma_1, \dots, \sigma_L|c) = \prod_{l=1}^{L} p(\sigma_l|\sigma_{l-1}, \dots, \sigma_1, c)$.
We can prove the normalization property by induction.
Base Case: For $L=1$, the output distribution is $p(\sigma_1|c)$. If the last layer uses softmax, then $\sum_{\sigma_1=1}^{M} p(\sigma_1|c) = 1$. This is trivially true by the definition of softmax.
Inductive Hypothesis: Assume that the claim holds for $L=k$, i.e., $\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_k=1}^{M} p(\sigma_1, \dots, \sigma_k|c) = 1$.
Inductive Step: For $L = k+1$, we need to show that $\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \dots, \sigma_{k+1}|c) = 1$.
Using the factorization from Equation (2):
$\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \dots, \sigma_{k+1}|c) = \sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_{k+1}=1}^{M} \left( \prod_{l=1}^{k+1} p(\sigma_l|\sigma_{l-1}, \dots, \sigma_1, c) \right)$
$= \sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_k=1}^{M} \left( \prod_{l=1}^{k} p(\sigma_l|\sigma_{l-1}, \dots, \sigma_1, c) \right) \left( \sum_{\sigma_{k+1}=1}^{M} p(\sigma_{k+1}|\sigma_k, \dots, \sigma_1, c) \right)$
Since the last linear projection layer uses softmax for $p(\sigma_{k+1}|\sigma_k, \dots, \sigma_1, c)$, we know that $\sum_{\sigma_{k+1}=1}^{M} p(\sigma_{k+1}|\sigma_k, \dots, \sigma_1, c) = 1$ for any given $(\sigma_k, \dots, \sigma_1, c)$.
Therefore, the expression simplifies to:
$= \sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_k=1}^{M} \left( \prod_{l=1}^{k} p(\sigma_l|\sigma_{l-1}, \dots, \sigma_1, c) \right) \cdot 1$
$= \sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_k=1}^{M} p(\sigma_1, \dots, \sigma_k|c)$
By the inductive hypothesis, this sum equals 1.
Thus, $\sum_{\sigma_1=1}^{M} \dots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \dots, \sigma_{k+1}|c) = 1$. (11)
The proof is then complete.
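The normalization property can also be checked numerically by brute force on a small instance. In the sketch below, the network is replaced by an arbitrary stand-in that maps each prefix to softmax-normalized pseudo-random logits, and the condition $c$ is held fixed and omitted; this stand-in is an assumption for illustration and is not the paper's architecture.

```python
import itertools
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_toy_model(M, seed):
    """Return a function prefix -> conditional distribution over M outcomes.

    The logits are arbitrary pseudo-random numbers that depend on the prefix;
    this stands in for the network and is NOT the paper's architecture.
    """
    def conditional(prefix):
        rng = random.Random(hash((seed,) + tuple(prefix)))
        return softmax([rng.uniform(-3, 3) for _ in range(M)])
    return conditional

def joint_probability(conditional, sigma):
    """p(sigma_1..sigma_L) = prod_l p(sigma_l | sigma_{l-1}, ..., sigma_1)."""
    p = 1.0
    for l, s in enumerate(sigma):
        p *= conditional(sigma[:l])[s]
    return p

M, L = 6, 3
model = make_toy_model(M, seed=0)
total = sum(joint_probability(model, sigma)
            for sigma in itertools.product(range(M), repeat=L))
```

Summing over all $M^L = 216$ strings gives a total equal to one up to floating-point error, regardless of how the conditional logits are chosen, which mirrors the inductive argument above.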

References:
[b0] Bloqade.jl (2023). Package for the quantum computation and quantum simulation based on the neutral-atom architecture.
[b1] Anurag Anshu; Srinivasan Arunachalam (2024). A survey on the complexity of learning quantum states. Nature Reviews Physics.
[b2] Ville Bergholm; Josh Izaac; Maria Schuld; Christian Gogolin; Shahnawaz Ahmed; Vishnu Ajith; M Sohaib Alam; Guillermo Alonso-Linaje; B Akashnarayanan; Ali Asadi (2018). Pennylane: Automatic differentiation of hybrid quantum-classical computations.
[b3] Alejandro Bermúdez; Luca Tagliacozzo; Germán Sierra; Philip Richerme (2017). Long-range Heisenberg models in quasiperiodically driven crystals of trapped ions. Physical Review B.
[b4] Hannes Bernien; Sylvain Schwartz; Alexander Keesling; Harry Levine; Ahmed Omran; Hannes Pichler; Soonwon Choi; Alexander S Zibrov; Manuel Endres; Markus Greiner (2017). Probing many-body dynamics on a 51-atom quantum simulator. Nature.
[b5] Fernando G S L Brandão; Michał Horodecki (2015). Exponential decay of correlations implies area law. Communications in Mathematical Physics.
[b6] Tom Brown; Benjamin Mann; Nick Ryder; Melanie Subbiah; Jared D Kaplan; Prafulla Dhariwal; Arvind Neelakantan; Pranav Shyam; Girish Sastry; Amanda Askell (2020). Language models are few-shot learners. Advances in neural information processing systems.
[b7] Tiff Brydges; Andreas Elben; Petar Jurcevic; Benoît Vermersch; Christine Maier; Ben P Lanyon; Peter Zoller; Rainer Blatt; Christian F Roos (2019). Probing Rényi entanglement entropy via randomized measurements. Science.
[b8] Giuseppe Carleo; Matthias Troyer (2017). Solving the quantum many-body problem with artificial neural networks. Science.
[b9] Giuseppe Carleo; Ignacio Cirac; Kyle Cranmer; Laurent Daudet; Maria Schuld; Naftali Tishby; Leslie Vogt-Maranto; Lenka Zdeborová (2019). Machine learning and the physical sciences. Reviews of Modern Physics.
[b10] Juan Carrasquilla; Giacomo Torlai; Roger G Melko; Leandro Aolita (2019). Reconstructing quantum states with generative models. Nature Machine Intelligence.
[b11] David Ceperley; Berni Alder (1986). Quantum Monte Carlo. Science.
[b12] Peter Cha; Paul Ginsparg; Felix Wu; Juan Carrasquilla; Peter L McMahon; Eun-Ah Kim (2021). Attention-based quantum tomography. Machine Learning: Science and Technology.
[b13] Philippe Corboz (2016). Variational optimization with infinite projected entangled-pair states. Physical Review B.
[b14] Stefanie Czischek; Schuyler Moss; Matthew Radzihovsky; Ejaaz Merali; Roger G Melko (2022). Data-enhanced variational monte carlo simulations for rydberg atom arrays. Physical Review B.
[b15] G Mauro D'Ariano; Matteo G A Paris; Massimiliano F Sacchi (2003). Quantum tomography. Advances in Imaging and Electron Physics.
[b16] Yuxuan Du; Yibo Yang; Tongliang Liu; Zhouchen Lin; Bernard Ghanem; Dacheng Tao (2023). Shadownet for data-centric quantum system learning.
[b17] Xun Gao; Lu-Ming Duan (2017). Efficient representation of quantum many-body states with deep neural networks. Nature communications.
[b18] Valentin Gebhart; Raffaele Santagati; Antonio Andrea Gentile; Erik M Gauger; David Craig; Natalia Ares; Leonardo Banchi; Florian Marquardt; Luca Pezzè; Cristian Bonato (2023). Learning quantum systems. Nature Reviews Physics.
[b19] Justin Gilmer; Samuel S Schoenholz; Patrick F Riley; Oriol Vinyals; George E Dahl (2017). Neural message passing for quantum chemistry. PMLR.
[b20] Aleksandra Gočanin; Ivan Šupić; Borivoje Dakić (2022). Sample-efficient device-independent quantum state verification and certification. PRX Quantum.
[b21] James Gubernatis; Naoki Kawashima; Philipp Werner (2016). Quantum Monte Carlo Methods. Cambridge University Press.
[b22] Mohamed Hibat-Allah; Martin Ganahl; Lauren E Hayward; Roger G Melko; Juan Carrasquilla (2020). Recurrent neural network wave functions. Physical Review Research.
[b23] Pierre Hohenberg; Walter Kohn (1964). Inhomogeneous electron gas. Physical Review.
[b24] Hsin-Yuan Huang; Richard Kueng; John Preskill (2020). Predicting many properties of a quantum system from very few measurements. Nature Physics.
[b25] Hsin-Yuan Huang; Richard Kueng; Giacomo Torlai; Victor V Albert; John Preskill (2022). Provably efficient machine learning for quantum many-body problems. Science.
[b26] T Jullien; P Roulleau; B Roche; A Cavanna; Y Jin; D C Glattli (2014). Quantum tomography of an electron. Nature.
[b27] Hiroki Kawai; Yuya O Nakagawa (2020). Predicting excited states from ground state wavefunction by supervised quantum machine learning. Machine Learning: Science and Technology.
[b28] Florian Kranzl; Stefan Birnkammer; Manoj K Joshi; Alvise Bastianello; Rainer Blatt; Michael Knap; Christian F Roos (2023). Observation of magnon bound states in the long-range, anisotropic Heisenberg model. Physical Review X.
[b29] Dietrich Leibfried; D M Meekhof; B E King; C H Monroe; Wayne M Itano; David J Wineland (1996). Experimental determination of the motional quantum state of a trapped atom. Physical Review Letters.
[b30] Laura Lewis; Hsin-Yuan Huang; Viet T Tran; Sebastian Lehner; Richard Kueng; John Preskill (2024). Improved machine learning algorithm for predicting ground state properties. Nature Communications.
[b31] Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (2023). Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys.
[b32] E Y Loh Jr; J E Gubernatis; R T Scalettar; S R White; D J Scalapino; R L Sugar (1990). Sign problem in the numerical simulation of many-electron systems. Physical Review B.
[b33] William Lauchlin McMillan (1965). Ground state of liquid He4. Physical Review.
[b34] Michael A Nielsen; Isaac L Chuang (2010). Quantum computation and quantum information. Cambridge University Press.
[b35] Román Orús (2019). Tensor networks for complex quantum systems. Nature Reviews Physics.
[b36] David Perez-Garcia; Frank Verstraete; Michael M Wolf; Ignacio Cirac (2006). Matrix product state representations.
[b37] Alec Radford; Karthik Narasimhan; Tim Salimans; Ilya Sutskever (2018). Improving language understanding by generative pre-training.
[b38] Kristof T Schütt; Michael Gastegger; Alexandre Tkatchenko; K-R Müller; Reinhard J Maurer (2019). Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature Communications.
[b40] Or Sharir; Yoav Levine; Noam Wies; Giuseppe Carleo; Amnon Shashua (2020). Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Physical Review Letters.
[b41] G I Struchalin; Ya A Zagorovskii; E V Kovlakov; S S Straupe; S P Kulik (2021). Experimental estimation of quantum state properties from classical shadows. PRX Quantum.
[b42] Giacomo Torlai; Guglielmo Mazzola; Juan Carrasquilla; Matthias Troyer; Roger Melko; Giuseppe Carleo (2018). Neural-network quantum state tomography. Nature Physics.
[b43] Matthias Troyer; Uwe-Jens Wiese (2005). Computational complexity and fundamental limitations to fermionic quantum Monte Carlo simulations. Physical Review Letters.
[b44] Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N Gomez; Łukasz Kaiser; Illia Polosukhin (2017). Attention is all you need. Advances in neural information processing systems.
[b45] Pragya Verma; Donald G Truhlar (2020). Status and challenges of density functional theory. Trends in Chemistry.
[b46] Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt; SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods.
[b47] Haoxiang Wang; Maurice Weber; Josh Izaac; Cedric Yen-Yu Lin (2022). Predicting properties of quantum systems with conditional generative models.
[b48] Karl Weiss; Taghi M Khoshgoftaar; Dingding Wang (2016). A survey of transfer learning. Journal of Big Data.
[b49] Steven R White (1992). Density matrix formulation for quantum renormalization groups. Physical Review Letters.
[b50] Dian Wu; Lei Wang; Pan Zhang (2019). Solving statistical mechanics using variational autoregressive networks. Physical Review Letters.
[b51] Ya-Dong Wu; Yan Zhu; Ge Bai; Yuexuan Wang; Giulio Chiribella (2023). Quantum similarity testing with convolutional neural networks. Physical Review Letters.
[b52] Tailong Xiao; Jingzheng Huang; Hongjing Li; Jianping Fan; Guihua Zeng (2022). Intelligent certification for quantum simulators via machine learning. npj Quantum Information.
[b53] Ting Zhang; Jinzhao Sun; Xiao-Xu Fang; Xiao-Ming Zhang; Xiao Yuan; He Lu (2021). Experimental quantum state measurement with classical shadows. Physical Review Letters.
[b54] Yuan-Hang Zhang; Massimiliano Di Ventra (2023). Transformer quantum state: A multipurpose model for quantum many-body problems. Physical Review B.
[b55] Yan Zhu; Ya-Dong Wu; Ge Bai; Dong-Sheng Wang; Yuexuan Wang; Giulio Chiribella (2022). Flexible learning of quantum states with generative query neural networks. Nature Communications.

Figures:
Figure fig_0: 3
Type: figure
Caption: Figure 3: Comparison of weighted F1 score w.r.t. number of measurement strings on Rydberg atom model.
Data:

Figure fig_1: 4
Type: figure
Caption: Figure 4: Evolution of the training loss and the test weighted F1 score over training epochs, with $N_t = 100$ and $K_f = 1024$.
Data:

Figure fig_3:
Type: figure
Caption: $D_p = \{R_i, c_i\}_{i=1}^{N_p}$ denote the quantum dataset used for pre-training and $D_f = \{(R_i, c_i), p_i\}_{i=1}^{N_f}$ for fine-tuning, where $|D_p| = N_p$ and $|D_f| = N_f$. For pre-training the model, we first uniformly sample a number of points $\{c_i \mid c_i \in F\}_{i=1}^{N_p}$
Data:

Figure fig_4:
Type: figure
Caption: Second, we modify the laser detuning from $[-10, 15]$ (the range used in the paper) to $[-20, -10] \cup [15, 25]$ to generate an OOD fine-tuning dataset on the Rydberg atom model with 19 qubits. The classification accuracies are listed in Tab. 7. The pre-trained model fails to outperform LLM4QPE w/o pre-training. The main reason is that the modified detuning values drive the quantum evolution into very different dynamics, about which the pre-trained model has learned little. Whether pre-training of LLM4QPE remains beneficial for OOD quantum datasets in other settings is an open question that we will explore in future work.
Data:

Figure tab_0:
Type: table
Caption: Figure 1: Pretraining and finetuning of LLM4QPE. a) The output embeddings are the summation of token embeddings, condition embeddings and position embeddings. The three embeddings encode discrete measurement records, continuous physical variables and qubit positions, respectively. The token embeddings are replaced with the LSTM embeddings during finetuning. b) The main part of the model is a multi-layer transformer decoder. Pretraining is entirely unsupervised. The output target is to approximate the classical distribution of the wave function. c) The models for finetuning and pretraining share the same structure. The pretrained parameters are transferred to the finetuning stage and updated towards a task-specific supervised loss.
Data:

Figure tab_2: 1
Type: table
Caption: Classification accuracy of quantum phases of matter on the Rydberg atom model with varied system size $L$ and varied training size $N_t$, where $K_f$ is fixed to be 1024. The best results are highlighted in bold.
Data:
| Method | L=19, Nt=25 | L=19, Nt=64 | L=19, Nt=100 | L=25, Nt=25 | L=25, Nt=64 | L=25, Nt=100 | L=31, Nt=25 | L=31, Nt=64 | L=31, Nt=100 |
|---|---|---|---|---|---|---|---|---|---|
| RBF Kernel | 91.75 | 92.29 | 93.25 | 88.43 | 92.27 | 94.2 | 88.32 | 90.79 | 92.75 |
| NTK | 92.12 | 92.58 | 93.79 | 89.17 | 94.14 | 95.39 | 86.99 | 92.03 | 92.71 |
| PixelCNN | 92.18 | 92.79 | 92.98 | 88.91 | 91.59 | 94.73 | 85.29 | 92.21 | 92.98 |
| NN-shadow | 91.73 | 92.64 | 93.61 | 90.57 | 91.32 | 95.91 | 86.38 | 91.79 | 92.51 |
| LLM4QPE | 94.14 | 93.38 | 95.95 | 93.95 | 96.51 | 96.05 | 87.95 | 94.95 | 96.67 |
| LLM4QPE w/o pretrain | 93.80 | 92.89 | 93.35 | 90.85 | 95.35 | 95.27 | 87.45 | 92.77 | 94.32 |

Figure tab_4: 2
Type: table
Caption: RMSE of predicting the correlation on the anisotropic Heisenberg model with varied system size $L$ and training size $N_t$. $K_f$ is fixed to 64. The best results are in bold.
Data:
| Method | L=8, Nt=20 | L=8, Nt=50 | L=8, Nt=90 | L=10, Nt=20 | L=10, Nt=50 | L=10, Nt=90 | L=12, Nt=20 | L=12, Nt=50 | L=12, Nt=90 |
|---|---|---|---|---|---|---|---|---|---|
| Classical Shadow | 0.2015 | 0.1954 | 0.1967 | 0.2015 | 0.1997 | 0.2015 | 0.1991 | 0.2064 | 0.2117 |
| RBF Kernel | 0.2085 | 0.2077 | 0.2081 | 0.2104 | 0.2131 | 0.2079 | 0.2039 | 0.1931 | 0.2157 |
| NTK | 0.2062 | 0.2064 | 0.2052 | 0.2095 | 0.2085 | 0.2097 | 0.2141 | 0.1922 | 0.2105 |
| PixelCNN | 0.2257±0.015 | 0.2357±0.019 | 0.2239±0.024 | 0.2393 | 0.2289±0.023 | 0.2108±0.024 | 0.2390±0.024 | 0.2297±0.035 | 0.2267±0.038 |
| NN-shadow | 0.2069±0.022 | 0.2098±0.015 | 0.2057±0.012 | 0.2078±0.017 | 0.2054±0.017 | 0.1959±0.013 | 0.2037±0.029 | 0.2021±0.019 | 0.2102±0.026 |
| LLM4QPE | 0.1761±0.032 | 0.1612±0.022 | 0.1697±0.025 | 0.1986±0.011 | 0.1949±0.012 | 0.1893±0.023 | 0.1989±0.023 | 0.1787±0.021 | 0.1769±0.015 |
| LLM4QPE w/o pretrain | 0.2043±0.027 | 0.2057±0.036 | 0.1949±0.027 | 0.2179±0.015 | 0.1984±0.013 | 0.1981±0.025 | 0.2040±0.028 | 0.2097±0.031 | 0.2026±0.027 |

Figure tab_5: 3
Type: table
Caption: Ablation study results on condition embedding and LSTM embedding. We consider $N_t = 64$ with $K_f = 1024$ for the Rydberg model, and $N_t = 50$ with $K_f = 64$ for the Heisenberg model.
Data:
| Rydberg | L=19 | L=25 | L=31 |
|---|---|---|---|
| original | 93.38 | 96.51 | 94.95 |
| w/o cond. embed. | 93.29 | 95.96 | 93.52 |
| w/o LSTM embed. | 90.75 | 92.18 | 89.65 |

| Heisenberg | L=8 | L=10 | L=12 |
|---|---|---|---|
| original | 0.1612 | 0.1949 | 0.1787 |
| w/o cond. embed. | 0.1906 | 0.2095 | 0.1981 |
| w/o LSTM embed. | 0.1929 | 0.1997 | 0.1904 |

Figure tab_6: 4
Type: table
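Caption: The RMSE of predicting the second-order Rényi entanglement entropy for the anisotropic Heisenberg model. We sample $N_p = 100$ physical conditions with $K_p = 1024$ measurement strings for pre-training.
Data:
| Method (L = 8) | Kf=64 | Kf=128 | Kf=256 | Kf=512 | Kf=1024 |
|---|---|---|---|---|---|
| Classical Shadow | 1.58282 | 1.56688 | 1.50989 | 1.40270 | 1.22974 |
| RBF Kernel | 0.07322 | 0.07160 | 0.07670 | 0.07692 | 0.07706 |
| NTK | 0.07117 | 0.06799 | 0.08834 | 0.08708 | 0.08690 |
| PixelCNN | 0.07198 | 0.07091 | 0.06849 | 0.06687 | 0.06784 |
| NN-shadow | 0.06860 | 0.06415 | 0.06403 | 0.06315 | 0.06221 |
| LLM4QPE | 0.06302 | 0.06141 | 0.06104 | 0.05998 | 0.06072 |
| LLM4QPE w/o Pretrain | 0.06649 | 0.06295 | 0.06228 | 0.06071 | 0.06034 |

| Method (L = 10) | Kf=64 | Kf=128 | Kf=256 | Kf=512 | Kf=1024 |
|---|---|---|---|---|---|
| Classical Shadow | 1.72379 | 1.71451 | 1.73135 | 1.72740 | 1.68556 |
| RBF Kernel | 0.02539 | 0.02257 | 0.02242 | 0.02002 | 0.01983 |
| NTK | 0.02497 | 0.02221 | 0.02129 | 0.01996 | 0.01947 |
| PixelCNN | 0.01907 | 0.01892 | 0.01948 | 0.01952 | 0.02089 |
| NN-shadow | 0.01844 | 0.01747 | 0.01664 | 0.01662 | 0.01657 |
| LLM4QPE | 0.01698 | 0.01623 | 0.01534 | 0.01517 | 0.01520 |
| LLM4QPE w/o Pretrain | 0.01711 | 0.01662 | 0.01696 | 0.01655 | 0.01532 |

| Method (L = 12) | Kf=64 | Kf=128 | Kf=256 | Kf=512 | Kf=1024 |
|---|---|---|---|---|---|
| Classical Shadow | 2.89481 | 2.90874 | 2.91391 | 2.90773 | 2.89722 |
| RBF Kernel | 0.08710 | 0.08242 | 0.08104 | 0.07081 | 0.07032 |
| NTK | 0.08432 | 0.08249 | 0.08071 | 0.07998 | 0.07381 |
| PixelCNN | 0.07406 | 0.07145 | 0.07107 | 0.06895 | 0.06677 |
| NN-shadow | 0.07261 | 0.06858 | 0.06573 | 0.06156 | 0.05924 |
| LLM4QPE | 0.05861 | 0.05812 | 0.05648 | 0.05623 | 0.05597 |
| LLM4QPE w/o Pretrain | 0.06624 | 0.06542 | 0.06381 | 0.06042 | 0.05931 |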

Figure tab_8: 5
Type: table
Caption: Classification accuracy of quantum phases of matter on the Rydberg atom model with varied training size $N_t$, where $L = 31$ and $K_f = 256$. The results are averaged over 3 runs with different random seeds.
Data:
| Method | Nt=20 | Nt=40 | Nt=60 | Nt=80 |
|---|---|---|---|---|
| LLM4QPE | 82.05 | 87.24 | 89.16 | 90.63 |
| LLM4QPE w/o pretrain | 79.17 | 81.78 | 85.96 | 88.47 |

Figure tab_9: 6
Type: table
Caption: Classification accuracy of quantum phases of matter on the 31-qubit Rydberg atom model. The pre-trained parameters are transferred from models trained on smaller system sizes. The training size is set to $N_t = 100$, and the number of measurements to $K_f = 1024$.
Data:
| Method | Accuracy |
|---|---|
| LLM4QPE (pre-trained on 19-qubit system) | 95.74 |
| LLM4QPE (pre-trained on 25-qubit system) | 96.13 |
| LLM4QPE (pre-trained on 31-qubit system) | 96.67 |
| LLM4QPE w/o pre-train | 94.32 |

Figure tab_10: 7
Type: table
Caption: Classification accuracy of quantum phases of matter on the 19-qubit Rydberg atom model. The training size is set to $N_t = 100$, and the number of measurements to $K_f = 1024$.
Data:
| Method | no OOD | OOD |
|---|---|---|
| LLM4QPE | 95.95 | 84.82 |
| LLM4QPE w/o pre-train | 93.35 | 94.23 |


Formulas:
Formula formula_0: $$|\psi\rangle = \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} \Psi(\sigma_1, \ldots, \sigma_L)\, |\sigma_1, \ldots, \sigma_L\rangle, \tag{1}$$

Formula formula_1: $$\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_L)|^2 = 1$$

Formula formula_3: $\{(\sigma_b, c_b) \mid \sigma_b \in E_{\text{in}},\ c_b \in C_{\text{in}}\}_{b=1}^{B_p}$ with batch size $B_p$.

Formula formula_4: $$p(\sigma_1, \ldots, \sigma_L \mid c) = \prod_{l=1}^{L} p(\sigma_l \mid \sigma_{l-1}, \ldots, \sigma_1, c).$$

Formula formula_5: $$\mathcal{L}_{\text{unsup}} = \frac{1}{B_p} \sum_{(\sigma, c) \in D_p} -\log p(\sigma_1, \ldots, \sigma_L \mid c), \tag{3}$$
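The autoregressive factorization and the unsupervised loss of Eq. (3) can be illustrated with a minimal sketch. The function `conditional_probs` is a hypothetical stand-in for the transformer decoder (any rule that returns a valid distribution would do), outcomes are indexed from 0 rather than 1, and the batch values are made up for illustration.

```python
import math

M = 2  # hypothetical number of measurement outcomes per qubit


def conditional_probs(prefix, c):
    """Toy stand-in for p(sigma_l | sigma_{l-1}, ..., sigma_1, c).

    Assumption: the real model is a transformer decoder; this rule only
    has to return a normalized distribution over the M outcomes.
    """
    logits = [c * k + 0.1 * sum(prefix) * (k + 1) for k in range(M)]
    z = max(logits)
    exps = [math.exp(x - z) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob(sigma, c):
    """Chain rule: log p(sigma | c) = sum_l log p(sigma_l | prefix, c)."""
    lp = 0.0
    for l, s in enumerate(sigma):
        lp += math.log(conditional_probs(sigma[:l], c)[s])
    return lp


def unsup_loss(batch):
    """Average negative log-likelihood over a batch of (sigma, c) pairs."""
    return -sum(log_prob(sigma, c) for sigma, c in batch) / len(batch)


# Two made-up measurement strings with their physical condition c.
batch = [((0, 1, 1), 0.3), ((1, 0, 0), -0.7)]
loss = unsup_loss(batch)
print(loss)
```

Because each conditional is normalized by construction, the joint distribution never needs an explicit partition function, which is what makes the negative log-likelihood cheap to evaluate.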

Formula formula_6: $\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_L=1}^{M} p(\sigma_1, \ldots, \sigma_L) = 1$ (see Appendix C for proof).

Formula formula_7: $D_f = \{(R_j, c_j), p_j\}_{j=1}^{N_f}$

Formula formula_8: $$E_{\text{out}} = E_{\text{rnn}} + \underbrace{E_c + E_p}_{\text{transferred}}.$$

Formula formula_9: $$\mathcal{L}_{\text{sup}} = -\frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \mathbb{I}[p_{j,u} = 1] \log f_\theta(X_j, c_j)_u, \tag{4}$$
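As a worked illustration of the classification loss in Eq. (4), the sketch below evaluates the cross-entropy for one-hot labels. The names `softmax` and `sup_loss` and the numbers are hypothetical; `probs[j][u]` plays the role of $f_\theta(X_j, c_j)_u$ and `labels[j][u]` the role of $p_{j,u}$, with the batch taken to be the whole toy list.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    z = max(logits)
    exps = [math.exp(x - z) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sup_loss(probs, labels):
    """Cross-entropy with one-hot labels.

    The indicator I[p_{j,u} = 1] keeps only the log-probability that the
    model assigns to the true class of each sample j.
    """
    total = 0.0
    for f_j, p_j in zip(probs, labels):
        for f_ju, p_ju in zip(f_j, p_j):
            if p_ju == 1:
                total += math.log(f_ju)
    return -total / len(probs)


# Two made-up samples with P = 3 classes (e.g. three quantum phases).
probs = [softmax([2.0, 0.0, -1.0]), softmax([0.0, 0.0, 0.0])]
labels = [[1, 0, 0], [0, 0, 1]]
loss = sup_loss(probs, labels)
print(loss)
```

The second sample has a uniform prediction, so it contributes exactly $\log 3$ to the sum, which gives a quick sanity check on the implementation.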

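Formula formula_10: $$\mathcal{L}_{\text{sup}} = \sqrt{\tilde{\mathcal{L}}_{\text{sup}}}, \qquad \tilde{\mathcal{L}}_{\text{sup}} = \frac{1}{B_t} \sum_{j \in \{1, \ldots, N_t\}} \sum_{u=1}^{P} \left( f_\theta(X_j, c_j)_u - p_{j,u} \right)^2. \tag{5}$$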

Formula formula_12: $[L_i, \Delta_i, \Omega_i, R_0/a_i]^{\top}$

Formula formula_13: $$H_{\text{Rydberg}} = \sum_i \frac{\Omega}{2} \sigma_i^x - \sum_i \Delta n_i + \sum_{i<j} \frac{V_0}{|\vec{x}_i - \vec{x}_j|^6} n_i n_j \tag{6}$$
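To make the structure of the Rydberg Hamiltonian in Eq. (6) concrete, the following sketch builds its dense matrix for a tiny 1D array. The function name, the bit-ordering convention, the parameter values, and the `power` argument are illustrative assumptions; the default `power=6` assumes the standard van der Waals decay of the interaction with distance.

```python
def rydberg_hamiltonian(positions, omega, delta, v0, power=6):
    """Dense Hamiltonian in the occupation basis for a small 1D array.

    Assumed convention: bit i of a basis index is n_i (1 = atom i excited).
    """
    n = len(positions)
    dim = 2 ** n
    H = [[0.0] * dim for _ in range(dim)]
    for state in range(dim):
        occ = [(state >> i) & 1 for i in range(n)]
        # Diagonal part: detuning term plus pairwise interaction.
        energy = -delta * sum(occ)
        for i in range(n):
            for j in range(i + 1, n):
                r = abs(positions[i] - positions[j])
                energy += v0 / r ** power * occ[i] * occ[j]
        H[state][state] = energy
        # Off-diagonal part: (omega / 2) * sigma^x_i flips atom i.
        for i in range(n):
            H[state ^ (1 << i)][state] += omega / 2
    return H


# Three atoms with unit spacing and made-up couplings.
H = rydberg_hamiltonian([0.0, 1.0, 2.0], omega=1.0, delta=2.0, v0=4.0)
print(H[0b011][0b011], H[0b101][0b101])
```

A dense matrix is only feasible for a handful of atoms, since the dimension grows as $2^L$; the sketch is meant for checking signs and conventions, not for the system sizes studied in the experiments.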

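Formula formula_14: $$\begin{aligned}
\sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} p(\sigma_1, \ldots, \sigma_{k+1})
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_{k+1})|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_{k+1}=1}^{M} \prod_{i=1}^{k+1} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} \prod_{i=1}^{k} |\Psi(\sigma_i \mid \sigma_{i-1}, \ldots, \sigma_1)|^2 \sum_{\sigma_{k+1}=1}^{M} |\Psi(\sigma_{k+1} \mid \sigma_k, \ldots, \sigma_1)|^2 \\
&= \sum_{\sigma_1=1}^{M} \cdots \sum_{\sigma_k=1}^{M} |\Psi(\sigma_1, \ldots, \sigma_k)|^2 = 1
\end{aligned} \tag{11}$$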
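The inductive normalization argument can also be checked numerically by brute force. The sketch below is only an illustration: `conditional_probs` is a hypothetical normalized conditional (not the trained model), outcomes are indexed from 0, and the sizes `M` and `L` are kept small so that all strings can be enumerated.

```python
import itertools
import math

M, L = 3, 4  # hypothetical small sizes so all M**L strings can be enumerated


def conditional_probs(prefix):
    """Hypothetical normalized conditional p(sigma_k | sigma_{k-1}, ..., sigma_1)."""
    shift = sum((i + 1) * s for i, s in enumerate(prefix))
    logits = [math.sin(1.0 + k + shift) for k in range(M)]
    total = sum(math.exp(x) for x in logits)
    return [math.exp(x) / total for x in logits]


def joint_prob(sigma):
    """Product of conditionals, mirroring the autoregressive factorization."""
    p = 1.0
    for k, s in enumerate(sigma):
        p *= conditional_probs(sigma[:k])[s]
    return p


def total_mass(k):
    """Sum of the joint probability over all strings of length k."""
    return sum(joint_prob(sigma) for sigma in itertools.product(range(M), repeat=k))


# Each prefix length should carry total mass 1, matching the induction step.
masses = [total_mass(k) for k in range(1, L + 1)]
print(masses)
```

Summing out the last variable collapses the length-(k+1) sum to the length-k sum, so every entry of `masses` equals 1 up to floating-point error.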
