# AlignKGC #

This code repository accompanies the manuscript titled "Multilingual Knowledge Graph Completion With Joint Relation and Entity Alignment".
The code requires a GPU with 12GB+ memory. BERT and each AlignKGC variant take 20+ hours to train. All models have over 11 million trainable parameters.


## Set up the environment ##

```
conda create --name alignkgc-conda python=3.7
conda activate alignkgc-conda
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch 
conda install scikit-learn tqdm requests filelock packaging
conda install -c conda-forge tensorboard
pip install transformers==3.3.1
pip install matplotlib
```
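Once the environment is active, it may help to confirm that PyTorch sees a GPU that meets the 12GB+ memory requirement. The following is a minimal sketch, not part of this repository: `meets_memory_requirement` is a hypothetical helper, and reading "12GB" as 12 × 10^9 bytes is an assumption.

```python
# Sanity check (sketch): does the first visible GPU have enough memory?
# `meets_memory_requirement` is a hypothetical helper, not repository code.


def meets_memory_requirement(total_bytes, min_gb=12.0):
    """Return True if total_bytes is at least min_gb gigabytes (10**9 bytes each)."""
    return total_bytes >= min_gb * 10**9


def main():
    try:
        import torch  # installed by the conda commands above
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch.")
        return
    props = torch.cuda.get_device_properties(0)
    ok = meets_memory_requirement(props.total_memory)
    print(f"{props.name}: {props.total_memory / 10**9:.1f} GB, sufficient: {ok}")


if __name__ == "__main__":
    main()
```

If the check reports that no CUDA device is visible, the `cudatoolkit` version installed above may not match your driver.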

To use an RTX A6000 GPU, install the package `nvhpc-21-1-cuda-multi` [from NVIDIA](https://developer.nvidia.com/nvidia-hpc-sdk-211-downloads), then set the library path

```
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/cuda/11.0/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/math_libs/11.0/targets/x86_64-linux/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/REDIST/math_libs/11.0/targets/x86_64-linux/lib:
```

and replace the PyTorch install line above with

```
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
```
or with whatever `cudatoolkit` version your platform requires.


## Data import and preparation ##

DBP-5L can be downloaded from [here](https://drive.google.com/file/d/1iP_TtB6jAcVnLQJDGawKL4kzk8QyOUaU/view?usp=sharing) (anonymous).

<!---  OpenEA can be downloaded from [here](https://github.com/nju-websoft/OpenEA).  DBP15K can be downloaded from [here](http://ws.nju.edu.cn/jape/). -->

Download `bert-cased.zip` from [zenodo](https://zenodo.org/record/5501704/) (anonymous).

Create a `/path/to/data` folder outside the git working directory.
Unpack `DBP-5L` and `bert-cased` into `/path/to/data/`, so that `/path/to/data/DBP-5L` and `/path/to/data/bert-cased` are sibling directories.
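Before running the importers, the expected layout can be checked with a few lines of Python. This is a sketch only: `missing_data_dirs` is a hypothetical helper, and `/path/to/data` is the placeholder used throughout this README.

```python
# Layout check (sketch): are DBP-5L and bert-cased siblings under the data root?
# `missing_data_dirs` is a hypothetical helper, not repository code.
from pathlib import Path


def missing_data_dirs(data_root, expected=("DBP-5L", "bert-cased")):
    """Return the names in `expected` that are not directories under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    root = "/path/to/data"  # placeholder; substitute your own data folder
    missing = missing_data_dirs(root)
    if missing:
        print(f"missing under {root}: {missing}")
    else:
        print("all data directories found")
```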

To sample DBP-5L so that only a fraction of the entity alignments (EA) and a fraction of the relation alignments (RA) are revealed, run

```
python -m importers.dbp5l_sampler --inpath /path/to/data/DBP-5L/
```

Next, to produce inputs for KGCmono, KGCunion and AlignKGC variants, run
```
python -m importers.dbp5l_combiner [--ea_percent 20] [--ra_percent 20] --dbp5l /path/to/data/DBP-5L/
```
You can specify the EA and RA sample percents (the bracketed arguments are optional); otherwise, standard ranges will be swept. A subdirectory will be created under `/path/to/data/DBP-5L/combined/` for each percent pair.
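If only a few specific percent pairs are needed rather than the full sweep, the combiner can be invoked once per pair. The sketch below only uses the flags shown above; `combiner_command` and `run_pairs` are hypothetical helpers, and the example pairs are illustrative. It defaults to a dry run that prints the commands without executing them.

```python
# Driver (sketch): run importers.dbp5l_combiner for selected (EA, RA) pairs.
# `combiner_command` and `run_pairs` are hypothetical helpers, not repository code.
import shlex
import subprocess


def combiner_command(dbp5l, ea_percent, ra_percent):
    """Build the argv for one importers.dbp5l_combiner run."""
    return [
        "python", "-m", "importers.dbp5l_combiner",
        "--ea_percent", str(ea_percent),
        "--ra_percent", str(ra_percent),
        "--dbp5l", dbp5l,
    ]


def run_pairs(dbp5l, pairs, dry_run=True):
    """Print each command; execute it too unless dry_run is True."""
    for ea, ra in pairs:
        cmd = combiner_command(dbp5l, ea, ra)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Illustrative pairs; use percents that the sampling step produced.
    run_pairs("/path/to/data/DBP-5L/", [(20, 20), (50, 0)])
```

Run it from the repository root with `dry_run=False` so that `python -m importers.dbp5l_combiner` resolves.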


## Training and testing ##

### Fine-tune the combined BERT ###

```
python -m mBERT.combined_mBERT --bert_pretrained_path /path/to/data/bert-cased/ --dbp5l_path /path/to/data/DBP-5L/ --combined_base /path/to/data/DBP-5L/combined/
```

This saves fine-tuned mBERT models for a sweep of EA percents.


### Train Jaccard ###

```
./scripts/Jaccard.sh /path/to/data/DBP-5L/ EA RA
```

Here `EA` is an entity alignment percent (1 or 2 digits) and `RA` is a relation alignment percent. Use the percents available from the sampling step above.

The model will be written inside a randomly named subdirectory of `/path/to/data/DBP-5L/combined/Combined_EA_RA/model_Jaccard/`. Each run creates a new randomly named subdirectory, and all parameters are saved to a file inside it.
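Because each run lands in a randomly named subdirectory, the most recent one can be hard to spot. A minimal sketch for locating it by modification time follows; `latest_run_dir` is a hypothetical helper, and the example path assumes that `EA` and `RA` in `Combined_EA_RA` are replaced by the chosen percents.

```python
# Lookup (sketch): find the most recently modified run directory.
# `latest_run_dir` is a hypothetical helper, not repository code.
from pathlib import Path


def latest_run_dir(model_dir):
    """Return the newest subdirectory of model_dir, or None if there is none."""
    root = Path(model_dir)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Assumed directory name for EA=20, RA=20; adjust to your own run.
    print(latest_run_dir("/path/to/data/DBP-5L/combined/Combined_20_20/model_Jaccard/"))
```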


### Train Asymmetric (Hard-Asymmetric) ###

```
./scripts/Asymmetric.sh /path/to/data/DBP-5L EA RA
```


### Train AlignKGC (Soft-Asymmetric) ###

```
./scripts/AlignKGC.sh /path/to/data/DBP-5L EA RA
```


### Train AlignKGCmBERT (Soft-Asymmetric+mBERT) ###

```
./scripts/AlignKGCmBERT.sh /path/to/data/DBP-5L/ EA RA /path/to/data/bert-cased/
```
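The four training scripts share the same positional arguments, with `AlignKGCmBERT.sh` taking the `bert-cased` path as an extra one. The sketch below builds the command lines and rejects malformed percents before a 20+ hour run is launched. `training_command` is a hypothetical helper, and applying the 1-or-2-digit rule to `RA` as well as `EA` is an assumption.

```python
# Command builder (sketch): validate EA/RA and assemble a training command.
# `training_command` is a hypothetical helper, not repository code.
import re

# Script names taken from the training sections of this README.
SCRIPTS = ("Jaccard", "Asymmetric", "AlignKGC", "AlignKGCmBERT")


def training_command(script, dbp5l, ea, ra, bert_path=None):
    """Build the argv for one ./scripts/<script>.sh run."""
    if script not in SCRIPTS:
        raise ValueError(f"unknown script: {script}")
    for name, value in (("EA", ea), ("RA", ra)):
        # Assumption: both percents are written with 1 or 2 digits.
        if not re.fullmatch(r"\d{1,2}", str(value)):
            raise ValueError(f"{name} must be a 1- or 2-digit percent, got {value!r}")
    cmd = [f"./scripts/{script}.sh", dbp5l, str(ea), str(ra)]
    if script == "AlignKGCmBERT":
        if bert_path is None:
            raise ValueError("AlignKGCmBERT also needs the bert-cased path")
        cmd.append(bert_path)
    return cmd


if __name__ == "__main__":
    for script in SCRIPTS:
        cmd = training_command(script, "/path/to/data/DBP-5L/", 20, 20,
                               bert_path="/path/to/data/bert-cased/")
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` from the repository root.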


### Evaluate entity and relation alignments ###

```
python -m AlignKGC.eval_align --dbp5l /path/to/data/DBP-5L/ --ea_percent 50 --ra_percent 0 --model_path /path/to/best_valid_model.pt
```

### Evaluate KGC ###

The Python entry point in each shell script above accepts an `--eval_only` flag that prints KGC performance for each language.


