=======================
SSHG
=======================
Parts of the code are adapted from https://github.com/divelab/DIG.


Installation
============

#. Install ``uv``:

   .. code:: bash

      curl -LsSf https://astral.sh/uv/install.sh | sh

#. **Suggested:** Set the ``uv`` package cache directory to a directory on a mounted drive.
   For example,

   .. code:: bash

      echo "export UV_CACHE_DIR=/root/workspace/out/uv-cache" >> ~/.bashrc
      source ~/.bashrc

#. Install Python dependencies using ``uv``:

   .. code:: bash

      uv sync

#. Install additional dependencies not tracked by ``uv``:

   .. code:: bash

      uv pip install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
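
The wheel index URL in the last step is tied to a specific PyTorch and CUDA build (``1.13.0`` and ``cu116``). The sketch below shows one way to derive the matching URL from a version string, assuming the index follows the same ``torch-<version>+<cuda>.html`` pattern as the URL above. The helper names ``split_torch_version`` and ``pyg_wheel_index`` are illustrative and not part of this repository.

```python
def split_torch_version(version: str) -> tuple:
    """Split a string such as '1.13.0+cu116' into ('1.13.0', 'cu116').

    Assumption: a version without a '+' suffix is treated as a CPU build.
    """
    base, _, tag = version.partition("+")
    return base, tag or "cpu"


def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build a wheel index URL following the pattern used in the install step."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


if __name__ == "__main__":
    # The version string is hard-coded here; in practice it could be
    # taken from torch.__version__ once PyTorch is installed.
    print(pyg_wheel_index(*split_torch_version("1.13.0+cu116")))
```

If the installed PyTorch build differs from ``1.13.0+cu116``, the resulting URL can be passed to ``-f`` in place of the one shown above.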

Dataset Processing
==================

EC Dataset
----------

#. Install `GNU Parallel <https://www.gnu.org/software/parallel/parallel_tutorial.html>`_.
#. Edit the ``OUT_DIR`` variable in ``src/protein_fragments/constants.py`` to point to where the processed data should be placed.
#. Download all the protein data by running

   .. code:: bash

      parallel --eta --colsep , --header : python src/protein_fragments/download_proteins.py {pdb_id} :::: data/ECDataset/<split>_with_chain_functions.csv

   where ``<split>`` is one of ``training``, ``validation``, or ``testing``.

#. Process each dataset split by running

   .. code:: bash

      parallel --eta --colsep , --header : python src/protein_fragments/process_proteins.py {pdb_id} {case_id} {chain_function} :::: data/ECDataset/<split>_with_chain_functions.csv

   where ``<split>`` is one of ``training``, ``validation``, or ``testing``.
   Note that about 500 of the proteins will fail to process.
   The dataset split and ``pdb_id_and_case_id`` of each failing protein are listed in ``data/ECDataset/missing_proteins.csv``.

#. Running ``src/protein_fragments/ECDataset.py`` should print out the following:

   .. code:: python

      {'training': ECDataset(29215),
       'validation': ECDataset(2562),
       'testing': ECDataset(5651)}
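
The ``parallel`` commands above read each CSV row and substitute the named columns (``{pdb_id}``, ``{case_id}``, ``{chain_function}``) into one script invocation per row. As a rough illustration of that substitution, the sketch below builds the same argument lists in Python. The function ``build_commands`` and the sample rows are made up for this example; only the column names and script path come from the commands above.

```python
import csv
import io


def build_commands(csv_text, script, columns):
    """Return one argv list per CSV row, mirroring parallel's {column} substitution."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [["python", script, *(row[c] for c in columns)] for row in reader]


# Made-up rows for illustration only; the real split files may hold more columns.
sample = "pdb_id,case_id,chain_function\n1abc,0,A\n2xyz,1,B\n"

for argv in build_commands(
    sample,
    "src/protein_fragments/process_proteins.py",
    ["pdb_id", "case_id", "chain_function"],
):
    print(" ".join(argv))
```

Each argument list could be handed to ``subprocess.run`` to process rows one at a time when GNU Parallel is unavailable, although this would be considerably slower than the parallel invocation.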


Run Model
=========

.. code:: bash

   python src/fragment/main_ss_react --model <ProNet/GVPNet> --SS True --geo True --dataset_path <PATH2DATA>
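
To run both model variants in sequence, the command above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command lines. The function ``run_command`` and the placeholder path ``/path/to/data`` are hypothetical; the script path and flags are copied verbatim from the command above.

```python
import shlex


def run_command(model, dataset_path):
    """Assemble the argument list for one model, using the flags shown above."""
    return [
        "python", "src/fragment/main_ss_react",
        "--model", model,
        "--SS", "True",
        "--geo", "True",
        "--dataset_path", dataset_path,
    ]


# "/path/to/data" is a placeholder, not a real location.
for model in ("ProNet", "GVPNet"):
    print(shlex.join(run_command(model, "/path/to/data")))
```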
