# README

This project is part of the Bachelor Thesis __From Occlusion to Transparency__. It
sets up prodigy using Docker and is an adaptation of the `judgment_outcome` task. The prodigy setup contains
the [facts_annotation recipe](recipes/facts_annotation.py) and the
[inspect_facts_annotation recipe](recipes/inspect_facts_annotation.py). The datasets for the annotation (de, fr, it)
were created with the [prodigy_dataset_creation](scrc/annotation/prodigy_dataset_creation/prodigy_dataset_creation.py) script.
The test set for the occlusion experiment can be created using
the [experiment_creator](scrc/annotation/judgment_explainability/occlusion/experiment_creator.py)
script.

This project also contains tools to analyse the annotation and the results from the occlusion experiments (``.\analysis``).

```
project structure
...
├── scrc
│   └─── annotation
│       ├── judgment_explainability
│       │   ├── analysis (Contains scripts for analysis)
│       │   ├── guidelines (Contains pdfs and .tex of guidelines)
│       │   ├── occlusion (Contains the experiment_creator script)
│       │   └── recipes (Contains prodigy recipes)
│       └── prodigy_dataset_creation (Contains the script to create the annotation datasets)
│
...
```

Note that some directories must be created manually, in particular `judgment_explainability/legal_expert_annotations` and the
data directories referenced in `.gitignore`.

## Facts annotation for Legal Judgment Prediction in Switzerland

For this project, the prodigy tool is used to gather annotations and subsequently explanations from legal experts to
create ground truths. These ground truths could give insight into the inner workings of the model used for the
prediction in the Swiss Court Ruling Corpus.

The facts are provided to prodigy by a JSONL file which can be created by running
the [prodigy_dataset_creation](scrc/annotation/prodigy_dataset_creation/prodigy_dataset_creator.py) script. Note that the input
files produced by the ``prodigy_dataset_creator`` can change when the script is run again (when the database changes).
It is therefore advised to save a copy of your files after you have started the annotation, so that your input dataset
remains the same. The annotated data is saved in `annotations.db`.

The `facts_annotation` recipe allows the annotation
of the fact section of a court ruling. It uses prodigy's block interface to display the span_manual task as well as the
link and title of the ruling and a free text comment section. The `inspect_facts_annotation` recipe allows the annotator to go
back and revise the annotations according to the newest guidelines. The guidelines for this annotation task can be found
in the `guidelines` directory. To create gold standard annotations, prodigy's
built-in [review recipe](https://prodi.gy/docs/recipes#review) is used.

To run a recipe, please follow the directions
given in `setup.sh` and `run.sh`. The different tasks use different ports, which are specified in the respective recipes.
Note that because we use a built-in recipe for the gold standard annotations, only one gold standard task can be run at
a time. The other recipes can be run in parallel to each other.
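
The advice to keep a copy of the input files can be followed with a small helper such as the one below. This is only a sketch and not part of the repository: the function name `snapshot_input`, the backup directory and the example file name are assumptions made for illustration. It copies a JSONL input file and tags the copy with a short content hash, so that a later run of the dataset creator cannot silently replace the data that was annotated.

```python
import hashlib
import shutil
from pathlib import Path


def snapshot_input(jsonl_path: str, backup_dir: str) -> Path:
    """Copy a prodigy input file into backup_dir, tagged with a content hash.

    Hypothetical helper: names and layout are illustrative only.
    """
    src = Path(jsonl_path)
    # A short SHA-256 prefix identifies the exact content that was annotated.
    digest = hashlib.sha256(src.read_bytes()).hexdigest()[:12]
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{src.stem}_{digest}{src.suffix}"
    # Identical content maps to the same file name, so repeated calls are no-ops.
    if not target.exists():
        shutil.copy2(src, target)
    return target
```

Calling `snapshot_input("annotation_input_de.jsonl", "input_backups")` (a hypothetical file name) before starting an annotation session returns the path of the frozen copy.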

## Occlusion for Legal Judgment Prediction in Switzerland

Occlusion is a method used to analyze and explain the predictions made by a neural network by examining the effect of 
obscuring parts of the input on the model's decision. In this implementation of occlusion, explanations are produced by 
removing elements from the input of the SJP task and analyzing the prediction confidence in comparison to a
non-occluded baseline. The model's prediction conditions are not changed, and the same training and validation set 
as XYZ is used, with only the test set being modified. 
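
One plausible way to read "prediction confidence in comparison to a non-occluded baseline" is as a simple difference per occluded variant. The sketch below assumes exactly that; the function name and the input format (a mapping from a variant identifier to the confidence the model reported for it) are illustrative assumptions, and the analysis scripts in this repository may compute the comparison differently.

```python
from typing import Dict


def confidence_deltas(baseline: float, occluded: Dict[str, float]) -> Dict[str, float]:
    """Return baseline confidence minus occluded confidence for each variant.

    Assumed interpretation: a positive value means the removed text supported
    the prediction, a negative value means it worked against the prediction.
    """
    return {variant: baseline - confidence for variant, confidence in occluded.items()}
```

For example, with a baseline confidence of `0.75`, a variant whose confidence drops to `0.5` gets a delta of `0.25`, which suggests the removed sentence supported the original prediction.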

### Occlusion Test Set Creation

To create the test set for the occlusion analysis, a subset of the original SJP test set is selected and sentence permutation
is applied to obscure each sentence of the fact section once. Additionally, larger groups of up to 4 sentences are
tested. The sentence splitting is determined using the legal expert annotations. The occlusion test sets can be generated by
running the `experiment_creator.py` script, which produces 12 test sets (4 for each language).
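
The idea of removing one sentence, or a group of sentences, at a time can be sketched as follows. This is not the code of `experiment_creator.py`: the function name and the output fields are hypothetical, and the sketch assumes that a group is any combination of sentences, which may differ from how the script selects groups. The sentence list is taken as given, since the splitting comes from the annotations.

```python
from itertools import combinations
from typing import Dict, List


def occlusion_variants(sentences: List[str], n_occluded: int) -> List[Dict]:
    """Build one variant of the facts per group of n_occluded removed sentences.

    Assumption: groups are all combinations of sentence positions.
    """
    variants = []
    for removed in combinations(range(len(sentences)), n_occluded):
        kept = [s for i, s in enumerate(sentences) if i not in removed]
        variants.append({
            "occluded_indices": list(removed),
            "occluded_text": " ".join(sentences[i] for i in removed),
            "facts": " ".join(kept),
        })
    return variants


# One list of variants per group size, mirroring the four test sets per language.
example = ["A.", "B.", "C."]
test_sets = {n: occlusion_variants(example, n) for n in range(1, 5)}
```

Group sizes larger than the number of sentences simply yield no variants, so short fact sections need no special handling.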

### Lower Court Insertion Test Set Creation

In addition to the "normal" occlusion, we also experiment with a setup called Lower Court Insertion (LCI), where we extract
the lower court instances and insert each lower court into each case. This task functions as a study of the bias from one
lower court to another. These experiments keep the same setup described above, again only adding a new test set for the model's prediction.
The LCI test sets are produced when running ``experiment_creator.py``.
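
A minimal sketch of the insertion step is shown below. It is illustrative only: the record fields (`id`, `lower_court`, `facts`) and the function name are hypothetical, and it assumes that the lower court is mentioned verbatim in the facts so that a plain string replacement is enough. The actual extraction in `experiment_creator.py` may work differently.

```python
from typing import Dict, List


def lower_court_insertions(cases: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Create one variant per case and per other lower court found in the data.

    Assumption: the lower court name occurs verbatim in the facts text.
    """
    courts = sorted({case["lower_court"] for case in cases})
    variants = []
    for case in cases:
        for court in courts:
            if court == case["lower_court"]:
                continue  # the unmodified case serves as the baseline
            variants.append({
                "id": case["id"],
                "original_lower_court": case["lower_court"],
                "inserted_lower_court": court,
                "facts": case["facts"].replace(case["lower_court"], court),
            })
    return variants
```

Comparing the prediction for each variant with the prediction for the unmodified case then shows how much the named lower court alone shifts the model's output.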

### Running on XYZ

Some parts of the implementation can take a very long time to run. If you have access to a high-performance computing cluster, we
recommend running these parts there. These instructions refer to XYZ, the cluster of XYZ. It is
recommended to first read the [documentation](https://xyz.org)
to get familiar with the infrastructure.

If you are already familiar with XYZ, follow these steps:

1. Open the `.bashrc` file in your `$HOME` folder and add the line `module load CUDA`.
2. Enter `module load Anaconda3` in the terminal.
3. Enter the conda environment using `eval "$(conda shell.bash hook)"`.

#### Occlusion

The aim of this implementation of the occlusion method is to produce new predictions with the SJP model using
the occlusion test sets created with ``experiment_creator.py``. To run the occlusion experiments on XYZ, clone
the SwissJudgementPrediction repository into your home directory using ``git clone https://github.com/xyz/SwissJudgementPrediction.git``.

1. If not already done, create a directory called `data` with a subdirectory for each language you want to test. Place `train.csv`,
   `val.csv` and your desired occlusion test set under each language directory.
2. Change the file name in `SwissJudgementPrediction/run_tc.py` from `test.csv` to the name of your occlusion test set.
3. Create a new environment called "sjp" and install the packages from the `env.yml` file using `conda env create -f env.yml`.
4. Activate the sjp environment using `conda activate sjp`.
5. Use the following command to install the right version of PyTorch: `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113`
6. Create a Weights & Biases account, get your API token, and enter `wandb login` inside your conda environment.
   After you have entered the token, it will be saved in the `.netrc` file in your `$HOME` folder.
7. Run occlusion using ``sbatch run_xyz_job.sh adapters test xlm-roberta-base hierarchical lang lang switzerland no_augmentation civil_law False``

The prediction results will be placed under ``/SwissJudgementPrediction/sjp/adapters/xlm-roberta-base-hierarchical/lang/2``.
Note that you might need to adapt some lines in ``run_xyz_job.sh`` to fit your setup.
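
Before submitting the job, it can help to check that the data layout from step 1 is complete. The helper below is a sketch under assumptions: the function name `missing_files` and the example test set name are hypothetical, and only the `train.csv`, `val.csv` and per-language directory layout is taken from the steps above.

```python
from pathlib import Path
from typing import List


def missing_files(data_dir: str, languages: List[str], test_file: str) -> List[Path]:
    """List every expected input file that is absent from data_dir/<language>/."""
    required = ["train.csv", "val.csv", test_file]
    return [
        Path(data_dir) / lang / name
        for lang in languages
        for name in required
        if not (Path(data_dir) / lang / name).is_file()
    ]
```

An empty result for a call such as `missing_files("data", ["de", "fr", "it"], "occlusion_test_set_1.csv")` (with a hypothetical test set name) means every language directory contains the three expected files.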


#### Analysis

The annotation_analysis uses BERT (for the bert_score). Please follow the general steps above before running the
analysis. Then clone this repository into your home directory
using ``git clone https://github.com/ninabaumgartner/SwissCourtRulingCorpus.git``. Change to the analysis directory
with ``cd SwissCourtRulingCorpus/scrc/annotation/judgment_explainability/analysis``.

1. For each py file in the analysis directory, change the import path
   from `scrc.annotation.judgment_explainability.analysis.utils.module as module`
   to `utils.module as module`.
2. Create a new environment called "judgment-explainability" with ``conda env create -f env.yml``.
3. Activate the "judgment-explainability" environment using ``conda activate judgment_explainability``.
4. Place your annotations under the directory ``legal_expert_annotation/language/`` using the ``scp`` command.
5. Run the analysis using ``sbatch run_analysis_xyz.sh``.
   Note that if you want to use another file structure or naming convention, the paths in the scripts have to be adapted
   accordingly.

Note that the other components of the analysis (except for bert_score) can be run without a high-performance computing cluster.
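
Changing the import path in every file by hand is error-prone, so the first step can be automated. The sketch below is not part of the repository and the function name is hypothetical; it assumes that a plain text replacement of the package prefix is sufficient and rewrites the files in place, so it is best run on a fresh clone.

```python
from pathlib import Path

OLD_PREFIX = "scrc.annotation.judgment_explainability.analysis.utils"
NEW_PREFIX = "utils"


def rewrite_imports(analysis_dir: str) -> int:
    """Replace the long package prefix in every .py file; return files changed."""
    changed = 0
    for py_file in sorted(Path(analysis_dir).glob("*.py")):
        text = py_file.read_text(encoding="utf-8")
        updated = text.replace(OLD_PREFIX, NEW_PREFIX)
        if updated != text:
            # Files are modified in place; untouched files are not rewritten.
            py_file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running `rewrite_imports(".")` from inside the analysis directory applies the change to every script there, and a second run reports `0` changed files.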

## Thesis
This work was part of the bachelor thesis "From Occlusion to Transparency". You can cite it as follows:
```
```
## References

