# Implementation Details

## Experimental Details
We estimate gradients using a one-shot estimator that captures contributions from both the noise and its inverse in each shot. Since every shot may correspond to a different circuit due to randomized sampling of noise operations, all gradients are computed using circuit-level sampling. 

The initial learning rate $\eta^{(1)}$ is selected with a grid search over a wide range and then refined with a binary search for better precision. We use the Adam optimizer, which adapts the effective step size $\eta^{(t)}$ over time through exponentially decaying moment estimates, which helps keep convergence stable throughout training. As stated in the paper, we keep the same initial $\eta^{(1)}$ for all configurations used in the experiments.
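The two-stage search for $\eta^{(1)}$ can be sketched as follows. This is a minimal illustration and not the repository's code: `final_loss` is a hypothetical stand-in for a short training run, the toy loss curve and the grid values are assumptions, and the refinement stage is written as a bisection-style bracket shrink around the best grid point, which is only one possible reading of "binary search".

```python
import math

def final_loss(lr: float) -> float:
    """Hypothetical stand-in for 'train briefly with this learning rate and
    return the final loss'. Toy curve with its minimum at lr = 0.03."""
    return (math.log10(lr) - math.log10(0.03)) ** 2

def grid_search(candidates):
    """Coarse stage: evaluate every candidate, return the index of the best."""
    losses = [final_loss(lr) for lr in candidates]
    return min(range(len(candidates)), key=losses.__getitem__)

def refine(lo: float, hi: float, steps: int = 40) -> float:
    """Refinement stage: shrink the bracket [lo, hi] in log space.
    Two interior probes are compared and the side that cannot contain the
    minimum of a unimodal loss is discarded."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)      # geometric midpoint of the bracket
        left = math.sqrt(lo * mid)    # probe in the lower half
        right = math.sqrt(mid * hi)   # probe in the upper half
        if final_loss(left) < final_loss(right):
            hi = right
        else:
            lo = left
    return math.sqrt(lo * hi)

# Assumed coarse grid: one candidate per decade from 1e-4 to 1.
grid = [10.0 ** k for k in range(-4, 1)]
i = grid_search(grid)
lo, hi = grid[max(i - 1, 0)], grid[min(i + 1, len(grid) - 1)]
eta_1 = refine(lo, hi)
print(f"best grid point: {grid[i]:g}, refined eta_1: {eta_1:.4f}")
```

Searching in log space is a design choice of this sketch; it keeps the resolution uniform across orders of magnitude, which is usually what matters for learning rates.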

For comparison, we implemented the method of Van Den Berg et al. [1] using the software in [2]. This implementation generates the noise parameters of the quantum system, which are then used during training to obtain mitigated outputs. The *ProcedureOverview.ipynb* file in that repository describes in detail how to generate the noise parameters/coefficients.

## Experimental Reproducibility

We set random seeds wherever applicable to make outcomes as consistent as possible. However, because of the inherent randomness of circuit execution, results may not be exactly the same from run to run.
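As a minimal sketch of what seeding does and does not cover, the snippet below uses only Python's standard `random` module; the helper name `make_rng` and the seed value are illustrative assumptions, not names from the repository. Seeding makes the classical pseudo-random choices repeatable, while measurement outcomes from a simulator or device can still differ unless that backend is seeded as well.

```python
import random

def make_rng(seed: int) -> random.Random:
    """Return a dedicated generator so sampling does not depend on global state."""
    return random.Random(seed)

# Two generators built from the same (assumed) seed produce the same draws.
rng_a = make_rng(1234)
rng_b = make_rng(1234)
draws_a = [rng_a.random() for _ in range(5)]
draws_b = [rng_b.random() for _ in range(5)]

# A generator with a different seed generally produces different draws.
rng_c = make_rng(4321)
draws_c = [rng_c.random() for _ in range(5)]

print("same seed, same draws:", draws_a == draws_b)
print("different seed, same draws:", draws_a == draws_c)
```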


## Experiment Compute Resources

As stated earlier, every shot corresponds to a different circuit because noise and inverse-noise operations are sampled at random. The one-shot estimation process therefore has to execute a large number of circuits for each parameter (or for a selected subset of parameters) to obtain a reliable gradient estimate. In addition, the gradient calculation and the parameter update in each iteration run on CPU on the server side. As a result, the overall computation is slow. For example, for a 6-qubit system with around 354 parameters and 2048 shots, full gradient computation over 2500 iterations takes roughly one and a half to two days.
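A back-of-the-envelope count makes the cost concrete. The sketch below assumes one distinct circuit per shot and per parameter in every iteration, which is a simplification for illustration; the actual count depends on how the estimator batches circuits.

```python
n_params = 354      # parameters in the 6-qubit example
shots = 2048        # shots per gradient component
iterations = 2500   # training iterations

# Assumption: one sampled circuit per shot, per parameter, per iteration.
circuits_per_iteration = n_params * shots               # 724,992
total_circuits = circuits_per_iteration * iterations    # 1,812,480,000

seconds_per_day = 24 * 60 * 60
for days in (1.5, 2.0):
    rate = total_circuits / (days * seconds_per_day)
    print(f"{days} days -> about {rate:,.0f} circuit executions per second")
```

Under this assumption the reported runtime corresponds to on the order of ten thousand circuit executions per second, which is why restricting the update to a subset of parameters shortens training noticeably.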

We will provide a toy example (with fewer qubits and parameters) along with the implementation to give a quick understanding of how it works.

### Remarks

Implementation-wise: During training, we are primarily concerned with the direction of the gradient rather than its exact magnitude. For sufficiently large $\widetilde{N}_1$, Hoeffding's lemma implies that the estimator $\left(y_t - \frac{1}{\widetilde{N}_1} \mathcal{O}_t^{(1)}\right)$ concentrates around its expected value with high probability. Consequently, when the current parameters $(\overrightarrow{\sigma}, \overrightarrow{\theta})$ are not near a stationary point, the sign of this estimator will, with high probability, match the true sign. Although the theoretical framework requires computing $\left(y_t - \frac{1}{\widetilde{N}_1} \mathcal{O}_t^{(1)}\right)$ independently for each parameter using $\widetilde{N}_1$ shots, we instead compute it once per update round and reuse it across all selected parameters. This practical modification significantly reduces the computational overhead associated with quantum system simulation.
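To give a feel for the concentration argument, the sketch below evaluates a standard one-sided Hoeffding bound on the probability that a sample mean lands on the wrong side of zero. It assumes that $\mathcal{O}_t^{(1)}$ is a sum of $\widetilde{N}_1$ independent shot outcomes, each bounded in $[-1, 1]$, and that the expected value of the estimator is at least `margin` away from zero; both the function names and these assumptions are illustrative and not taken from the paper.

```python
import math

def sign_error_bound(n_shots: int, margin: float, value_range: float = 2.0) -> float:
    """One-sided Hoeffding bound: probability that the mean of n_shots
    independent outcomes (each spanning at most value_range) deviates from
    its expectation by at least `margin` in the wrong direction."""
    return math.exp(-2.0 * n_shots * margin ** 2 / value_range ** 2)

def shots_for_confidence(margin: float, delta: float, value_range: float = 2.0) -> int:
    """Smallest number of shots for which the bound above is at most delta."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * margin ** 2))

# Assumed example: the true residual is 0.1 away from zero.
bound_2048 = sign_error_bound(2048, margin=0.1)
n_needed = shots_for_confidence(margin=0.1, delta=0.01)
print(f"sign-flip bound with 2048 shots: {bound_2048:.2e}")
print(f"shots needed for a 1% bound: {n_needed}")
```

The bound degrades quadratically as the margin shrinks, which matches the caveat above that the sign is only reliable away from stationary points.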


## Implementation of Van Den Berg method

Here’s an overview of the implementation procedures we followed for the Van Den Berg method:
1. Noise Characterization via Randomized Benchmarking -- We used randomized benchmarking to determine the structure of the noise channel. To ensure a fair comparison with our nmPQC approach, we opted for the same target accuracy level ε, which in turn dictated a similar sample size (2048).
2. Noise Model Fitting -- The output of the benchmarking phase was used to fit a non-negative least squares (NNLS) model to approximate the effective noise channel.

As stated before, these two steps are carried out using [2]. The *ProcedureOverview.ipynb* file of that repository describes in detail how to generate the noise parameters/coefficients. The only difference is that the backend must be created with the *get_noise_induced_backend* method from *backendPreparation.py*. Also make sure that the number of shots and the number of samples are both set to 2048 wherever required. Finally, store the pre-estimated noise in a pickle file.

3. Learning with Noise Mitigation -- The final step uses the learned noise model to invert the noise effects during the actual learning phase (i.e., model parameter optimization). Here we load the pre-estimated noise from the pickle file and apply it to invert the noise during the PQC optimization process.
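The hand-off between the characterization steps and the learning step can be sketched as a pickle round trip. The dictionary layout (Pauli label mapped to a fitted coefficient), the numeric values, the file name, and the helper names are all hypothetical; the actual object produced by [2] may have a different structure.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical fitted model: Pauli label -> non-negative coefficient.
noise_coefficients = {"XI": 0.0012, "IZ": 0.0008, "ZZ": 0.0021}

def save_noise(coeffs: dict, path: Path) -> None:
    """Write the pre-estimated noise model to disk after characterization."""
    with path.open("wb") as fh:
        pickle.dump(coeffs, fh)

def load_noise(path: Path) -> dict:
    """Read the pre-estimated noise model back before PQC optimization."""
    with path.open("rb") as fh:
        return pickle.load(fh)

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preestimated_noise.pkl"
    save_noise(noise_coefficients, path)
    restored = load_noise(path)

print(restored)
```

Only load pickle files that you created yourself, since unpickling executes arbitrary code embedded in the file.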

## Instructions for using nmPQC to generate mitigated, noisy, and noiseless outcomes

The repository contains a generic training procedure. To implement the noiseless scenario, make sure that neither noise nor inverse noise is applied and that only the PQC parameters are optimized. The noisy case is the same, except that noise (but not inverse noise) is present in the system.
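One way to keep the three scenarios apart is a small set of flags, sketched below. The class `RunMode`, its field names, and the helper `trainable_groups` are hypothetical and do not correspond to identifiers in the repository; they only encode the rules described above, plus the assumption that the mitigated case trains the inverse-noise parameters alongside the PQC parameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunMode:
    apply_noise: bool                 # insert sampled noise operations
    apply_inverse_noise: bool         # insert sampled inverse-noise operations
    train_inverse_noise_params: bool  # update inverse-noise parameters

MODES = {
    "noiseless": RunMode(False, False, False),
    "noisy": RunMode(True, False, False),
    "mitigated": RunMode(True, True, True),
}

def trainable_groups(mode: RunMode) -> list:
    """PQC parameters are always trained; inverse-noise parameters only
    when the mode enables them."""
    groups = ["pqc"]
    if mode.train_inverse_noise_params:
        groups.append("inverse_noise")
    return groups

for name, mode in MODES.items():
    print(name, mode, trainable_groups(mode))
```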

## Experiment 1 - static noise assumption
To ensure a fair comparison:
1. We used identical noise profiles across similar gates (e.g., CNOT) for all methods, as the Van Den Berg method assumes this.
2. We then trained models using both approaches and tracked mean squared error (MSE) per epoch.
3. Although our method requires more samples per iteration (due to the larger number of trainable parameters), it does not require a separate benchmarking phase.

## Experiment 2 - adaptability under dynamic noise

Here, we deliberately change the error rates of three CNOT gates just after optimization begins. The Van Den Berg method keeps using the previously computed noise model in the optimization procedure, whereas our method adapts to the change by dynamically updating the inverse noise parameters.
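The setup can be pictured with a toy error-rate schedule. Everything in the sketch below is hypothetical: the gate labels, the error-rate values, the iteration at which the change happens, and the function name `rates_at`. It only illustrates why a noise model characterized once before training becomes outdated for the three modified gates.

```python
def rates_at(iteration: int, base: dict, shifted: dict, change_at: int = 1) -> dict:
    """Error rates in effect at `iteration`: the base profile before
    `change_at`, and the base profile with the shifted gates overridden
    from `change_at` onward."""
    if iteration < change_at:
        return dict(base)
    return {**base, **shifted}

# Hypothetical gate labels and error rates, for illustration only.
base = {"cx_0_1": 0.01, "cx_1_2": 0.01, "cx_2_3": 0.01, "cx_3_4": 0.01}
shifted = {"cx_0_1": 0.03, "cx_1_2": 0.03, "cx_2_3": 0.03}

# A static model is characterized once, before training starts.
static_model = rates_at(0, base, shifted)

for it in (0, 1, 100):
    actual = rates_at(it, base, shifted)
    stale = [g for g in actual if actual[g] != static_model[g]]
    print(it, "gates with an outdated noise estimate:", stale)
```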

## Experiment 3 - convergence behavior based on subsampling

Here we assume the same noise model characteristics as in Experiment 1. See *experiment.py* for details.
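As a minimal sketch of parameter subsampling, the snippet below draws a random subset of parameter indices for one update round. It assumes that subsampling means uniform sampling without replacement; the function name `select_parameters`, the subset size, and the seed are illustrative choices, not values from the experiments.

```python
import random

def select_parameters(n_params: int, k: int, rng: random.Random) -> list:
    """Pick k distinct parameter indices uniformly at random for this round."""
    return sorted(rng.sample(range(n_params), k))

rng = random.Random(0)   # assumed seed, for repeatability of the example
n_params = 354           # parameter count from the 6-qubit example
k = 32                   # hypothetical subset size

selected = select_parameters(n_params, k, rng)
print(f"updating {len(selected)} of {n_params} parameters this round")
```

With this scheme the number of circuits per iteration scales with `k` instead of the full parameter count, at the price of updating each parameter less often.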

## Note

We provide two files, 'ExperimentsHEA.py' and 'Experiments.py'. The first handles the HEA structure, and the second can be used for gates with the generic $\exp(i \theta P)$ structure. For the experiments we use 'ExperimentsHEA.py'. For the toy problem we provide a version that uses 'Experiments.py' with simpler data to help readers get acquainted with the method.


## References

[1] Van Den Berg, E., Minev, Z. K., Kandala, A., & Temme, K. (2023). Probabilistic error cancellation with sparse Pauli–Lindblad models on noisy quantum processors. Nature Physics, 19(8), 1116-1121.

[2] McDonough, B. (2022, October). benmcdonough20/autonomouspertools: v0.2.0-alpha [Software]. Zenodo. https://doi.org/10.5281/zenodo.7197234
