### Code Appendix of Correcting the Sub-optimal Bit Allocation
* This repo contains the core code of the corrected bit allocation described in Sec. 4.4 and Alg. 4, which is our core algorithm and claim. It can be used to reproduce the results of Sec. 6.2 and Sec. 6.3, which are our core results.
* All links in this README point to third-party resources; we have ensured that they reveal no information about the authors' identity.

### Requirements
* See environment.yml. If you are using conda, you can automatically install everything by:
```bash
conda env create -f environment.yml
```

### Dataset Preparation
* Download the [HEVC CTC](https://hevc.hhi.fraunhofer.de/) and [UVG](http://ultravideo.fi/#testsequences) datasets (1080p/8bit/YUV/RAW), crop them so that both dimensions are multiples of 64, and convert them to PNG image sequences with ffmpeg:
```bash
ffmpeg -pix_fmt yuv420p -s ${ORIGINAL_W}x${ORIGINAL_H} -i $ORIGINAL_SEQUENCE_NAME -vf crop=$NEW_W:$NEW_H:0:0 $ORIGINAL_SEQUENCE_NAME/im%03d.png
```
* For HEVC CTC Class B and the UVG dataset, the original size is 1920x1080 and the new size is 1920x1024 (all sizes are width x height).
* For HEVC CTC Class C, the original size is 832x480 and the new size is 832x448.
* For HEVC CTC Class D, the original size is 416x240 and the new size is 384x192.
* For HEVC CTC Class E, the original size is 1280x720 and the new size is 1280x704.
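
The new sizes listed above are what you get by rounding each dimension down to the nearest multiple of 64. The following is a minimal sketch of that arithmetic, which also assembles a matching ffmpeg argument list. The helper names `crop_size` and `ffmpeg_args` and the example file names are illustrative assumptions and are not part of this repo.

```python
def crop_size(width, height, multiple=64):
    """Round each dimension down to the nearest multiple of `multiple`."""
    return (width // multiple) * multiple, (height // multiple) * multiple


def ffmpeg_args(src, out_dir, width, height, multiple=64):
    """Build an ffmpeg argument list that crops from the top-left corner.

    Both the -s option and the crop filter take width before height.
    `src` and `out_dir` are placeholders chosen by the caller.
    """
    new_w, new_h = crop_size(width, height, multiple)
    return [
        "ffmpeg",
        "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}",
        "-i", src,
        "-vf", f"crop={new_w}:{new_h}:0:0",
        f"{out_dir}/im%03d.png",
    ]


if __name__ == "__main__":
    sizes = {
        "HEVC Class B / UVG": (1920, 1080),
        "HEVC Class C": (832, 480),
        "HEVC Class D": (416, 240),
        "HEVC Class E": (1280, 720),
    }
    for name, (w, h) in sizes.items():
        print(name, crop_size(w, h))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting of the `%03d` pattern.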

### Pre-trained Model Preparation
#### DVC based
* Download the following pre-trained models and put them into the ./DVC_based/Checkpoints and ./DCVC_based/Checkpoints folders:
* [DVC](https://drive.google.com/drive/folders/1M54MPrAzaA0QVySnzUu9HZWx1bfIrTZ6), then rename them to ```DVC_{\lambda}.pth```;
* [DCVC](https://onedrive.live.com/redir?resid=2866592D5C55DF8C!1198&authkey=!AGZwZffbRsVcjSQ&e=iMeykH), then rename them to ```DCVC_{\lambda}.pth```;
* For intra coding, download the pre-trained models of Cheng et al. 2020 from CompressAI via ```python -u ./DCVC_based/Checkpoints/download_compressai_models.py```.
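
Before running anything, it can help to check that the renamed checkpoints are where the code expects them. The sketch below is a hypothetical helper, not part of this repo. It assumes that `{\lambda}` in the file names is replaced by the integer lambda values used later in this README (2048, 1024, 512, 256).

```python
from pathlib import Path

# Assumption: the default lambdas from the run examples in this README.
DEFAULT_LAMBDAS = (2048, 1024, 512, 256)


def expected_checkpoints(root, prefix, lambdas=DEFAULT_LAMBDAS):
    """List the checkpoint paths implied by the `<prefix>_<lambda>.pth` naming."""
    return [Path(root) / f"{prefix}_{lam}.pth" for lam in lambdas]


def missing_checkpoints(root, prefix, lambdas=DEFAULT_LAMBDAS):
    """Return the expected checkpoint paths that do not exist on disk."""
    return [p for p in expected_checkpoints(root, prefix, lambdas) if not p.is_file()]


if __name__ == "__main__":
    for root, prefix in (("./DVC_based/Checkpoints", "DVC"),
                         ("./DCVC_based/Checkpoints", "DCVC")):
        for path in missing_checkpoints(root, prefix):
            print("missing:", path)
```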

### Notes for the Core Code
* The core implementation of our bit allocation lies in lines 165-510 of ./DVC_based/main.py, which correspond to Alg. 4.
* To help understand our bit allocation method, we provide an annotated version in this README:
    ```python
    # the start point
    I_y_stack, I_z_stack, delta_I_stack = [], [], []
    mv_feature_stack, delta_mv_stack = [], []
    feature_stack, z_stack, delta_res_stack = [], [], []
    # sub_iter_I: number of gradient ascent steps K for the I frame
    # sub_iter_w, sub_iter_y: number of gradient ascent steps K for P frames (motion, residual)
    sub_iter_I, sub_iter_w, sub_iter_y, sub_lr = 2000, 400, 400, 1e-3
    for i in range(seqlen):
        # this loop iterates over frames in temporal order; the inner part
        # iterates over latents in motion-then-residual order. Combined,
        # the latents are optimized in topological order
        # correspond to line 3 of Alg. 4
        cur_frame = Var(cropped_blocks[i][n].cuda()) 
        # get current gt frame \bm{x}_i
        b, h, w = cur_frame.shape[0], cur_frame.shape[2], cur_frame.shape[3]
        num_pixels = b * h * w
        if i == 0:
            #################### I frame initialization ###################
            # FAVI to initialize latent
            # \bm{y}_0 \leftarrow f(\bm{x}_0)
            # correspond to line 4 of Alg. 4
            with torch.no_grad():
                # use deterministic rounding here
                arr = I_codec([cur_frame, "test_for_first", "testing"])
            # \bm{y}_0 is actually composed of three latents: y_0, z_0, and \Delta_0
            I_y_stack.append(arr['y'].detach().clone().requires_grad_(True))
            I_z_stack.append(arr['z'].detach().clone().requires_grad_(True))
            delta_I_stack.append(torch.tensor(arr["delta"]).clone().detach().requires_grad_(True))
            ################# I frame gradient ascent #####################
            cur_params = I_y_stack + I_z_stack + delta_I_stack
            optimizer_I = Adam(params=cur_params, lr=sub_lr)
            # add \bm{y}_0 to optimizer
            for sub_it in range(sub_iter_I):
                # for 0,...,K-1 in K: 
                # correspond to line 5 of Alg. 4
                optimizer_I.zero_grad()
                for sub_i in range(seqlen):
                    sub_cur_frame = Var(cropped_blocks[sub_i][n].cuda())
                    if sub_i == 0:
                        result = I_codec(
                            [sub_cur_frame, "finetune", "training", sub_it, sub_iter_I, I_y_stack[0], I_z_stack[0],
                                delta_I_stack[0]])
                        recon_image = result['x_hat']
                        I_likelihood_y, I_likelihood_z = result["likelihoods"]['y'], result["likelihoods"]['z']
                        y_bpp, z_bpp = cal_bpp(likelihood=I_likelihood_y, num_pixels=num_pixels), cal_bpp(
                            likelihood=I_likelihood_z, num_pixels=num_pixels)
                        bpp = y_bpp + z_bpp
                    else:
                        clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _, _, _, _, _, _ = \
                            net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                stage="test_for_first", mode="training")
                        recon_image = clipped_recon_image
                        bpp = bpp
                    ref_image = recon_image
                    distortion = cal_distoration(sub_cur_frame, recon_image)
                    rd_cost = cal_rd_cost(distortion, bpp, lambda_for_test) / seqlen
                    rd_cost.backward(retain_graph=True)
                optimizer_I.step()
                # update \bm{y}_0
                # correspond to line 7 of Alg. 4
                optimizer_I.zero_grad()
            for param in cur_params:
                param.requires_grad = False
            ###################### testing & logging omitted ##############
        else:
            ######### initialize \bm{w}_i motion vectors of frame i #######
            # FAVI to initialize latent
            # \bm{w}_i \leftarrow f(\bm{x}_i,\bm{w}_{<i},\bm{y}_{<i})
            # correspond to line 4 of Alg. 4
            for sub_i in range(i + 1):
                sub_cur_frame = Var(cropped_blocks[sub_i][n].cuda())
                if sub_i == 0:
                    with torch.no_grad():
                        result = I_codec(
                            [sub_cur_frame, "finetune", "test", 0, 0, I_y_stack[0], I_z_stack[0], delta_I_stack[0]])
                    ref_image = result['x_hat'].detach().clone()
                else:
                    with torch.no_grad():
                        if sub_i < i:
                            clipped_recon_image, _, _, _, _, _, _, _, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="finetune", mode="test", \
                                    feature=feature_stack[sub_i - 1][0], z=z_stack[sub_i - 1][0],
                                    delta=delta_res_stack[sub_i - 1][0], \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0],
                                    calrealbits=calrealbits)
                        elif sub_i == i:
                            clipped_recon_image, _, _, _, _, _, _, _, mvfeature, _, _, _, delta_mv, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="test_for_first", mode="test")
                            mv_feature_stack.append([mvfeature.detach().clone().requires_grad_(True)])
                            delta_mv_stack.append([delta_mv.detach().clone().requires_grad_(False)])
                        else:
                            assert (0)
                    ref_image = clipped_recon_image
            ############# actual tuning \bm{w}: mv of frame i #############
            cur_params = mv_feature_stack[i - 1]
            optimizer_mv = Adam(params=cur_params, lr=sub_lr)
            # add \bm{w}_i to optimizer
            for sub_it in range(sub_iter_w):
                # for 0,...,K-1 in K: 
                # correspond to line 5 of Alg. 4
                optimizer_mv.zero_grad()
                for sub_i in range(seqlen):
                    sub_cur_frame = Var(cropped_blocks[sub_i][n].cuda())
                    if sub_i == 0:
                        I_y_for_optim, I_z_for_optim, delta_I_for_optim = I_y_stack[0], I_z_stack[0], delta_I_stack[
                            0]
                        result = I_codec([sub_cur_frame, "finetune", "test", 0, 0, I_y_for_optim, I_z_for_optim,
                                            delta_I_for_optim])
                        recon_image = result['x_hat']
                        I_likelihood_y, I_likelihood_z = result["likelihoods"]['y'], result["likelihoods"]['z']
                        y_bpp, z_bpp = cal_bpp(likelihood=I_likelihood_y, num_pixels=num_pixels), cal_bpp(
                            likelihood=I_likelihood_z, num_pixels=num_pixels)
                        bpp = y_bpp + z_bpp
                    else:
                        if sub_i < i:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="finetune", mode="test", \
                                    feature=feature_stack[sub_i - 1][0], z=z_stack[sub_i - 1][0],
                                    delta=delta_res_stack[sub_i - 1][0], \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0],
                                    calrealbits=calrealbits)
                        elif sub_i == i:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=sub_it, total_iter=sub_iter_w,
                                    stage="finetune_flow", mode="training", \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0])
                        else:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _, _, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="test_for_first", mode="training")
                        recon_image = clipped_recon_image
                        bpp = bpp
                    ref_image = recon_image
                    if sub_i >= i:
                        distortion = cal_distoration(sub_cur_frame, recon_image)
                        rd_cost = cal_rd_cost(distortion, bpp, lambda_for_test) / seqlen
                        rd_cost.backward(retain_graph=True)
                optimizer_mv.step()
                # update \bm{w}_i
                # correspond to line 7 of Alg. 4
                optimizer_mv.zero_grad()
            for param in cur_params:
                param.requires_grad = False
            ###################### testing & logging omitted ##############
            ########### initialize \bm{y}_i residual of frame i ###########
            # FAVI to initialize latent
            # \bm{y}_i \leftarrow f(\bm{x}_i,\bm{w}_{\le i},\bm{y}_{<i})
            # correspond to line 4 of Alg. 4
            for sub_i in range(i + 1):
                sub_cur_frame = Var(cropped_blocks[sub_i][n].cuda())
                if sub_i == 0:
                    with torch.no_grad():
                        result = I_codec(
                            [sub_cur_frame, "finetune", "test", 0, 0, I_y_stack[0], I_z_stack[0], delta_I_stack[0]])
                    ref_image = result['x_hat'].detach().clone()
                else:
                    with torch.no_grad():
                        if sub_i < i:
                            clipped_recon_image, _, _, _, _, _, _, _, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="finetune", mode="test", \
                                    feature=feature_stack[sub_i - 1][0], z=z_stack[sub_i - 1][0],
                                    delta=delta_res_stack[sub_i - 1][0], \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0],
                                    calrealbits=calrealbits)
                        elif sub_i == i:
                            clipped_recon_image, _, _, _, _, _, _, _, _, feature, z, delta, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="test_for_stage1", mode="test", \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0])
                            feature_stack.append([feature.detach().clone().requires_grad_(True)])
                            z_stack.append([z.detach().clone().requires_grad_(True)])
                            delta_res_stack.append([delta.detach().clone().requires_grad_(True)])
                        else:
                            assert (0)
                    ref_image = clipped_recon_image
            ############## actual tuning \bm{y}_i: res of frame i #########
            cur_params = feature_stack[i - 1] + z_stack[i - 1] + delta_res_stack[i - 1]
            optimizer_res = Adam(params=cur_params, lr=sub_lr)
            # add \bm{y}_i to optimizer
            for sub_it in range(sub_iter_y):
                # for 0,...,K-1 in K: 
                # correspond to line 5 of Alg. 4
                optimizer_res.zero_grad()
                for sub_i in range(seqlen):
                    sub_cur_frame = Var(cropped_blocks[sub_i][n].cuda())
                    if sub_i == 0:
                        I_y_for_optim, I_z_for_optim, delta_I_for_optim = I_y_stack[0], I_z_stack[0], delta_I_stack[
                            0]
                        result = I_codec([sub_cur_frame, "finetune", "test", 0, 0, I_y_for_optim, I_z_for_optim,
                                            delta_I_for_optim])
                        recon_image = result['x_hat']
                        I_likelihood_y, I_likelihood_z = result["likelihoods"]['y'], result["likelihoods"]['z']
                        y_bpp, z_bpp = cal_bpp(likelihood=I_likelihood_y, num_pixels=num_pixels), cal_bpp(
                            likelihood=I_likelihood_z, num_pixels=num_pixels)
                        bpp = y_bpp + z_bpp
                    else:
                        if sub_i < i:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="finetune", mode="test", \
                                    feature=feature_stack[sub_i - 1][0], z=z_stack[sub_i - 1][0],
                                    delta=delta_res_stack[sub_i - 1][0], \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0],
                                    calrealbits=calrealbits)
                        elif sub_i == i:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=sub_it, total_iter=sub_iter_y,
                                    stage="finetune", mode="training", \
                                    feature=feature_stack[sub_i - 1][0], z=z_stack[sub_i - 1][0],
                                    delta=delta_res_stack[sub_i - 1][0], \
                                    mvfeature=mv_feature_stack[sub_i - 1][0], delta_mv=delta_mv_stack[sub_i - 1][0],
                                    calrealbits=calrealbits)
                        else:
                            clipped_recon_image, _, _, _, _, _, _, bpp, _, _, _, _, _, _, _, _, _ = \
                                net(referframe=ref_image, input_image=sub_cur_frame, iter=0, total_iter=0,
                                    stage="test_for_first", mode="training")
                        recon_image = clipped_recon_image
                        bpp = bpp
                    ref_image = recon_image
                    if sub_i >= i:
                        distortion = cal_distoration(sub_cur_frame, recon_image)
                        rd_cost = cal_rd_cost(distortion, bpp, lambda_for_test) / seqlen
                        rd_cost.backward(retain_graph=True)
                optimizer_res.step()
                # update \bm{y}_i
                # correspond to line 7 of Alg. 4
                optimizer_res.zero_grad()
            for param in cur_params:
                param.requires_grad = False
            ###################### testing & logging omitted ##############
    ```
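
The control flow of the annotated code can be hard to see through the codec calls. The following toy sketch keeps only that flow: latents are tuned one at a time in temporal order, earlier latents stay fixed, later frames are re-encoded from the updated reference, and only frames from the current one onward contribute to the cost. It is not the DVC network. The scalar "codec", the function names, the squared-latent rate proxy, plain gradient descent in place of Adam, and finite-difference gradients in place of autograd are all simplifying assumptions.

```python
def encode(x, ref, gain=0.8):
    """Toy amortized encoder. With lam = 4 the gain 0.8 is the per-frame
    optimum of the toy cost, i.e. a greedy encoder that ignores later frames."""
    return gain * (x - ref)


def greedy_latents(frames):
    """Encode every frame with the amortized encoder only (the baseline)."""
    ref, latents = 0.0, []
    for x in frames:
        y = encode(x, ref)
        ref += y                      # toy decoder: reference plus latent
        latents.append(y)
    return latents


def sequence_cost(frames, latents, lam):
    """Average R-D cost of a whole sequence for given latents."""
    ref, cost = 0.0, 0.0
    for x, y in zip(frames, latents):
        ref += y
        cost += lam * (x - ref) ** 2 + y ** 2   # distortion + toy rate
    return cost / len(frames)


def rollout_cost(frames, fixed, i, y_i, lam):
    """Cost of frames i..T-1 when frame i uses latent y_i.

    Frames before i use their already tuned latents; frames after i are
    re-encoded from the updated reference, so the cost reflects the effect
    of y_i on every later frame.
    """
    ref, cost = 0.0, 0.0
    for t, x in enumerate(frames):
        if t < i:
            y = fixed[t]
        elif t == i:
            y = y_i
        else:
            y = encode(x, ref)
        ref += y
        if t >= i:
            cost += lam * (x - ref) ** 2 + y ** 2
    return cost / len(frames)


def optimize_sequence(frames, lam=4.0, steps=200, lr=0.05, eps=1e-5):
    """Tune latents one frame at a time, in temporal order."""
    fixed = []
    for i, x in enumerate(frames):
        y = encode(x, sum(fixed))     # initialise from the amortized encoder
        for _ in range(steps):
            # central finite difference stands in for backpropagation
            grad = (rollout_cost(frames, fixed, i, y + eps, lam)
                    - rollout_cost(frames, fixed, i, y - eps, lam)) / (2 * eps)
            y -= lr * grad            # step that lowers the R-D cost
        fixed.append(y)               # freeze the latent, as requires_grad = False does
    return fixed


if __name__ == "__main__":
    frames = [1.0, 1.5, 1.2, 2.0]
    print("greedy cost:", sequence_cost(frames, greedy_latents(frames), 4.0))
    print("tuned cost: ", sequence_cost(frames, optimize_sequence(frames), 4.0))
```

In this toy setting the tuned latents reach a lower sequence cost than the greedy ones, because each latent is updated with its effect on later frames taken into account.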

### Run Bit Allocation

Here we provide testing code for our proposed bit allocation method on both DVC (Lu et al. 2019) and DCVC (Li et al. 2021).

#### Run Bit Allocation on DVC

* Set up your dataset directory in lines 19, 20, 58, 59 of ./DVC_based/dataset.py.
* Set up your model directory in lines 19, 20, 58, 59 of ./DVC_based/main.py.
* Then run
  ```bash
  cd ./DVC_based/
  python -u main.py --test_class=$DATASET --test_lambdas=$TARGET_LAMBDAS --factor=$BLOCK_SIZE --overlap=0 --gop_size=$GOP_SIZE --test_gop_num=$NUM_OF_GOP
  ```
* For example, to run HEVC Class D with the default lambdas (2048, 1024, 512, 256):
  ```bash
  python -u main.py --test_class=HEVC_D --test_lambdas="(2048, 1024, 512, 256)" --factor=16 --overlap=0 --gop_size=10 --test_gop_num=1
  ```

#### Run Bit Allocation on DCVC
* Set up your dataset directory in lines 19, 20, 58, 59 of ./DCVC_based/dataset.py.
* Set up your model directory in lines 646, 651 of ./DCVC_based/main.py and line 12 of ./DCVC_based/src/models/video_net.py.
* Then run
  ```bash
  cd ./DCVC_based/
  python -u main.py --test_class=$DATASET --test_lambdas=$TARGET_LAMBDAS --factor=$BLOCK_SIZE --overlap=0 --gop_size=$GOP_SIZE --test_gop_num=$NUM_OF_GOP
  ```
* For example, to run HEVC Class D with the default lambdas (2048, 1024, 512, 256):
  ```bash
  python -u main.py --test_class=HEVC_D --test_lambdas="(2048, 1024, 512, 256)" --factor=16 --overlap=0 --gop_size=10 --test_gop_num=1
  ```
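
The `--test_lambdas` value contains parentheses and spaces, which a shell interprets unless the value is quoted. One way to sidestep shell quoting is to build the argument list in Python. The sketch below is a hypothetical wrapper, not part of this repo. It uses only the flags shown above and assumes that main.py accepts the parenthesized tuple string exactly as written in the examples.

```python
import shlex


def build_command(test_class, lambdas, factor, gop_size, gop_num, overlap=0):
    """Assemble the main.py argument list with the flags shown in this README."""
    # Assumption: main.py accepts the tuple formatted as "(a, b, c)".
    lambda_arg = "(" + ", ".join(str(lam) for lam in lambdas) + ")"
    return [
        "python", "-u", "main.py",
        f"--test_class={test_class}",
        f"--test_lambdas={lambda_arg}",
        f"--factor={factor}",
        f"--overlap={overlap}",
        f"--gop_size={gop_size}",
        f"--test_gop_num={gop_num}",
    ]


if __name__ == "__main__":
    cmd = build_command("HEVC_D", (2048, 1024, 512, 256),
                        factor=16, gop_size=10, gop_num=1)
    # shlex.join shows a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, cwd="./DVC_based")` or `subprocess.run(cmd, cwd="./DCVC_based")`, so the same helper covers both codecs.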

### Reference
* Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Cai, C.; and Gao, Z. 2019. DVC: An end-to-end deep video compression framework. In Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 11006–11015.
* Li, J.; Li, B.; and Lu, Y. 2021. Deep contextual video compression. Advances in Neural Information Processing Systems, 34.