Introduction
============

ChemicalX is a deep learning library for drug-drug interaction, polypharmacy
side effect, and synergy prediction. The library consists of data loaders
and integrated benchmark datasets. It also includes state-of-the-art deep
neural network architectures that solve the drug pair scoring task.
Implemented methods cover traditional SMILES string-based techniques
and neural message-passing based models.

.. code-block:: bibtex

     @article{chemicalx,
               arxivId = {2202.05240},
               author = {Rozemberczki, Benedek and Hoyt, Charles Tapley and Gogleva, Anna and Grabowski, Piotr and Karis, Klas and Lamov, Andrej and Nikolov, Andriy and Nilsson, Sebastian and Ughetto, Michael and Wang, Yu and Derr, Tyler and Gyori, Benjamin M},
               month = {feb},
               title = {{ChemicalX: A Deep Learning Library for Drug Pair Scoring}},
               url = {http://arxiv.org/abs/2202.05240},
               year = {2022}
     }


Overview
========
This section gives a brief overview of the fundamental concepts and features
of **ChemicalX** through simple examples. It covers the following topics:

.. contents::
    :local:

Design Philosophy
-----------------

When ``ChemicalX`` was created, we wanted to reuse the high-level
architectural elements of ``torch`` and ``torchdrug``. We also wanted to
put into practice the ideas outlined in `A Unified View of Relational Deep
Learning for Drug Pair Scoring`.

Drug Feature Set
^^^^^^^^^^^^^^^^

Drug feature sets are custom ``UserDict`` objects that allow fast
retrieval of the molecular graph and the drug-level features, such as
the Morgan fingerprint, of each drug. The ``get_feature_matrix`` and
``get_molecules`` methods batch drug features and molecular graphs
by drug identifier. Drug-level features are returned as a
``torch.FloatTensor`` matrix, while the molecular graphs are returned as
``PackedGraph`` objects generated by ``torchdrug``.
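
The dictionary-based design can be illustrated with a minimal, self-contained
sketch. The ``ToyDrugFeatureSet`` class below is hypothetical and is not the
ChemicalX implementation: it stores plain Python lists instead of
``torch.FloatTensor`` rows and omits molecular graphs. It only shows how a
``UserDict`` subclass might batch features by drug identifier.

.. code-block:: python

    from collections import UserDict


    class ToyDrugFeatureSet(UserDict):
        """Hypothetical stand-in for a drug feature set (not the ChemicalX class)."""

        def get_feature_matrix(self, drug_ids):
            # Collect the feature vectors of the requested drugs, preserving order.
            return [self.data[drug_id]["features"] for drug_id in drug_ids]


    # Illustrative identifiers and made-up 4-dimensional feature vectors.
    toy_drugs = ToyDrugFeatureSet(
        {
            "drug_a": {"smiles": "CCO", "features": [1.0, 0.0, 1.0, 0.0]},
            "drug_b": {"smiles": "CC(=O)O", "features": [0.0, 1.0, 1.0, 0.0]},
        }
    )

    # One row per requested drug, in the order of the identifiers.
    feature_matrix = toy_drugs.get_feature_matrix(["drug_b", "drug_a"])

Because the container behaves like a dictionary, single drugs can still be
looked up directly with ``toy_drugs["drug_a"]``.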

Context Feature Set
^^^^^^^^^^^^^^^^^^^
Like ``DrugFeatureSet``, ``ContextFeatureSet`` is a custom ``UserDict``
that stores biological or chemical context-specific feature vectors.
Each context identifier key maps to a ``torch.FloatTensor`` instance
holding the features of that context.

Labeled Triples
^^^^^^^^^^^^^^^

Labeled triples are labeled drug pairs where the label is
specific to a context. The ``LabeledTriples`` class is a wrapper around
a ``pandas`` dataframe that allows shuffling the triples and generating
training and test splits with the ``train_test_split`` method.
The class also provides basic descriptive statistics, such as the number of
negatively labeled instances and the total number of labeled triples.
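
The idea of shuffling and splitting triples can be sketched without any
dependencies. The example below is only an illustration under stated
assumptions: it uses plain tuples rather than a ``pandas`` dataframe, and the
``toy_train_test_split`` helper, its parameters, and all data values are
hypothetical and do not reflect the ``LabeledTriples`` API.

.. code-block:: python

    import random

    # Each toy triple is (left drug, right drug, context, label); values are made up.
    toy_triples = [
        ("drug_a", "drug_b", "context_1", 1.0),
        ("drug_a", "drug_c", "context_1", 0.0),
        ("drug_b", "drug_c", "context_2", 0.0),
        ("drug_a", "drug_b", "context_2", 1.0),
        ("drug_c", "drug_d", "context_1", 0.0),
    ]


    def toy_train_test_split(triples, train_fraction=0.8, seed=42):
        """Shuffle a copy of the triples and cut it into train and test parts."""
        shuffled = list(triples)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]


    train, test = toy_train_test_split(toy_triples)

    # Simple descriptive statistics of the kind described above.
    total_count = len(toy_triples)
    negative_count = sum(1 for triple in toy_triples if triple[3] == 0.0)

Fixing the seed keeps the split reproducible between runs, which matters when
comparing models on the same data.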

Dataset Loaders
^^^^^^^^^^^^^^^

Dataset loaders allow the prompt retrieval of integrated datasets. After
a loader is initialized, its methods return the respective
``DrugFeatureSet``, ``ContextFeatureSet``, and ``LabeledTriples``.

.. code-block:: python

    from chemicalx.data import DrugCombDB

    loader = DrugCombDB()

    context_set = loader.get_context_features()
    drug_set = loader.get_drug_features()
    triples = loader.get_labeled_triples()

Batch Generators and Drug Pair Batches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using instances of the ``DrugFeatureSet``, ``ContextFeatureSet``,
and ``LabeledTriples`` classes, one can initialize a ``BatchGenerator``
instance. This class generates ``DrugPairBatch`` instances, which contain
the drug and context features for the drug pairs in the batch. During the
training and evaluation of deep drug pair scoring models,
the ``DrugPairBatch`` acts as a custom data class.
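
A minimal sketch of this pattern follows, assuming plain dictionaries and
lists in place of the real feature sets and tensors. ``ToyDrugPairBatch`` and
``toy_batches`` are hypothetical names introduced for illustration; they are
not the ChemicalX ``DrugPairBatch`` or ``BatchGenerator`` classes.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterator, List


    @dataclass
    class ToyDrugPairBatch:
        """Hypothetical batch container (not the ChemicalX class)."""

        left_features: List[List[float]]
        right_features: List[List[float]]
        context_features: List[List[float]]
        labels: List[float]


    def toy_batches(drugs, contexts, triples, batch_size) -> Iterator[ToyDrugPairBatch]:
        """Yield batches by looking up features for each slice of triples."""
        for start in range(0, len(triples), batch_size):
            chunk = triples[start:start + batch_size]
            yield ToyDrugPairBatch(
                left_features=[drugs[left] for left, _, _, _ in chunk],
                right_features=[drugs[right] for _, right, _, _ in chunk],
                context_features=[contexts[context] for _, _, context, _ in chunk],
                labels=[label for _, _, _, label in chunk],
            )


    # Made-up features keyed by drug and context identifiers.
    drugs = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 1.0], "drug_c": [1.0, 1.0]}
    contexts = {"context_1": [0.5], "context_2": [0.1]}
    triples = [
        ("drug_a", "drug_b", "context_1", 1.0),
        ("drug_a", "drug_c", "context_2", 0.0),
        ("drug_b", "drug_c", "context_1", 0.0),
    ]

    batches = list(toy_batches(drugs, contexts, triples, batch_size=2))

With three triples and a batch size of two, the generator yields one full
batch followed by a smaller final batch.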

Models and Pipelines
--------------------

Model Layers
^^^^^^^^^^^^

Drug pair scoring models in ``ChemicalX`` inherit from ``torch``
neural network modules. Each model provides an ``unpack`` and a
``forward`` method: the first extracts the inputs the model needs
from the drug pair batch, while the second makes a forward pass
and returns propensities for the drug pairs in the
batch. Models have sensible default values for the hyperparameters
that do not depend on the dataset.
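
The division of labor between ``unpack`` and ``forward`` can be sketched in
plain Python. The ``ToyPairScorer`` class below is hypothetical: a real model
would subclass ``torch.nn.Module`` and operate on tensors, and the scoring
rule here (a logistic function of the summed inputs) is an arbitrary choice
made only to keep the example runnable without dependencies.

.. code-block:: python

    import math
    from types import SimpleNamespace


    class ToyPairScorer:
        """Hypothetical scorer; a real model would subclass torch.nn.Module."""

        def __init__(self, weight=0.5):
            # Default hyperparameter that does not depend on the dataset.
            self.weight = weight

        def unpack(self, batch):
            # Select only the parts of the batch that this model consumes.
            return batch.context_features, batch.left_features, batch.right_features

        def forward(self, context, left, right):
            # Score each drug pair with a logistic function of its summed inputs.
            scores = []
            for ctx, lft, rgt in zip(context, left, right):
                logit = self.weight * (sum(ctx) + sum(lft) + sum(rgt))
                scores.append(1.0 / (1.0 + math.exp(-logit)))
            return scores


    # A stand-in batch with two drug pairs and made-up feature values.
    batch = SimpleNamespace(
        context_features=[[0.0], [1.0]],
        left_features=[[1.0, 0.0], [0.0, 0.0]],
        right_features=[[0.0, 1.0], [0.0, -1.0]],
    )

    model = ToyPairScorer()
    scores = model.forward(*model.unpack(batch))

Keeping ``unpack`` separate means the training loop can treat every model the
same way, even when models consume different parts of the batch.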

Pipelines
^^^^^^^^^

Pipelines provide high-level abstractions for the end-to-end
training and evaluation of ChemicalX models. Given a dataset
and a model, a pipeline trains the model on the dataset and
generates scores and evaluation metrics.

.. code-block:: python

    from chemicalx import pipeline
    from chemicalx.models import DeepSynergy
    from chemicalx.data import DrugCombDB

    model = DeepSynergy(context_channels=112,
                        drug_channels=256)

    dataset = DrugCombDB()

    results = pipeline(dataset=dataset,
                       model=model,
                       batch_size=1024,
                       context_features=True,
                       drug_features=True,
                       drug_molecules=False,
                       labels=True,
                       epochs=100)

    results.summarize()

    results.save("~/test_results/")
