Experiments
===========

MoltenFlow provides utilities for managing reproducible experiment runs, including
automatic directory organization, training curve visualization, and evaluation
metrics.

Experiment Management
---------------------

The ``setup_run`` function creates a structured experiment directory based on
your configuration file:

.. code-block:: python

   from moltenflow.utils import setup_run, get_stage_dir

   # Set up experiment run
   run_ctx = setup_run("configs/experiments/esol_lipo_pipeline.yaml")

   # Access run information
   print(f"Run directory: {run_ctx.run_dir}")
   print(f"Config hash: {run_ctx.config_hash}")
   print(f"Loaded config: {run_ctx.config}")

   # Create stage-specific directories
   pretrain_dir = get_stage_dir(run_ctx, "pretrain")
   finetune_dir = get_stage_dir(run_ctx, "finetune")

Directory Structure
~~~~~~~~~~~~~~~~~~~

The experiment system creates the following structure:

.. code-block:: text

   experiments/
       {config_hash}/              # Unique ID from config contents
           config.yaml             # Copy of experiment config
           {timestamp}/            # Run-specific directory (YYYYMMDD_HHMMSS)
               run_metadata.json   # Run metadata
               pretrain/           # Stage outputs
                   vae_best.pt
                   training_history.json
                   training_curves.png
               finetune/
                   vae_best.pt
                   training_history.json
               flow/
                   flow_best.pt
                   training_history.json
               generated/
                   uncond_samples.csv
                   cond_samples.csv
               analysis/
                   metrics.json
                   training_curves.png
                   umap_*.png

The config hash ensures that experiments with the same configuration are grouped
together, while timestamps ensure unique runs.
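
As an illustration of this layout, the sketch below derives a content-based hash
and a timestamped run path. It is not MoltenFlow's implementation: the helper
names ``config_hash`` and ``run_dir_for``, the use of SHA-256, and the
12-character truncation are all assumptions made for the example.

.. code-block:: python

   import hashlib
   import json
   from datetime import datetime
   from pathlib import Path
   from typing import Optional


   def config_hash(config: dict, length: int = 12) -> str:
       """Hash the config contents (hypothetical helper, not the library API)."""
       # Sorting keys makes the hash independent of key order in the file.
       canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
       return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


   def run_dir_for(
       config: dict, root: str = "experiments", now: Optional[datetime] = None
   ) -> Path:
       """Build {root}/{config_hash}/{YYYYMMDD_HHMMSS} without creating it."""
       stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
       return Path(root) / config_hash(config) / stamp


   cfg_a = {"ablation": {"use_surrogate": True}, "guidance": {"gamma": 1000.0}}
   cfg_b = {"guidance": {"gamma": 1000.0}, "ablation": {"use_surrogate": True}}

   # Same contents in a different key order map to the same group directory.
   print(config_hash(cfg_a) == config_hash(cfg_b))
   print(run_dir_for(cfg_a, now=datetime(2024, 1, 2, 3, 4, 5)))

Hashing a canonical serialization rather than the raw file bytes means that
reordering keys or changing whitespace would not start a new experiment group,
which may or may not match how ``setup_run`` behaves.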

ESOL/Lipophilicity Pipeline
---------------------------

The ESOL/Lipophilicity pipeline runs experiments on real molecular property data.

Running the Pipeline
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Run full pipeline with property surrogate
   uv run python scripts/run_esol_lipo_pipeline.py

   # Run without surrogate (unconditioned generation only)
   uv run python scripts/run_esol_lipo_pipeline.py --no-surrogate

   # Skip pretraining with existing checkpoint
   uv run python scripts/run_esol_lipo_pipeline.py \
       --pretrain-checkpoint outputs/pretrain/vae_best.pt

   # Run specific stages
   uv run python scripts/run_esol_lipo_pipeline.py --stages pretrain,flow,generate

Pipeline Stages
~~~~~~~~~~~~~~~

1. **process_data**: Ensures ESOL/Lipophilicity data exists
2. **pretrain**: VAE reconstruction training
3. **finetune**: VAE + surrogate head training (if surrogate enabled)
4. **flow**: Flow matching model training
5. **generate**: Sample generation (unconditioned; conditioned if surrogate enabled)
6. **evaluate**: Comprehensive metrics and visualizations

Configuration
~~~~~~~~~~~~~

The pipeline configuration (``configs/experiments/esol_lipo_pipeline.yaml``)
supports ablation studies:

.. code-block:: yaml

   # Enable/disable surrogate for ablation
   ablation:
     use_surrogate: true  # false = unconditioned only

   # Property conditioning targets
   guidance:
     gamma: 1000.0
     target: [-3.0, 2.0]  # [ESOL, Lipophilicity]

Training History and Visualization
----------------------------------

Training functions return history objects when requested:

.. code-block:: python

   from moltenflow.training import train_vae
   from moltenflow.utils import plot_training_curves, plot_multi_stage_curves

   # Train with history
   history = train_vae(..., return_history=True)

   # Plot single stage
   plot_training_curves(
       history.to_list(),
       "pretrain_curves.png",
       title="VAE Pretraining"
   )

   # Plot multiple stages
   histories = {
       "pretrain": pretrain_history.to_list(),
       "finetune": finetune_history.to_list(),
       "flow": flow_history.to_list(),
   }
   plot_multi_stage_curves(histories, "all_curves.png")

Generation Metrics
------------------

The evaluation stage computes the metric groups described below.

Basic Metrics
~~~~~~~~~~~~~

- **Validity**: Fraction of generated SMILES parseable by RDKit
- **Uniqueness**: Fraction of unique valid molecules
- **Novelty**: Fraction not in training set
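
The three fractions above can be sketched in a few lines. This is a minimal
illustration rather than the library's evaluation code: the function name
``basic_metrics``, the ``novel_frac`` key, and the choice of denominators
(uniqueness over valid molecules, novelty over unique valid molecules) are
assumptions. The ``toy_canonicalize`` stand-in only checks parenthesis balance;
in practice one would parse with RDKit's ``Chem.MolFromSmiles`` and canonicalize
with ``Chem.MolToSmiles``.

.. code-block:: python

   from typing import Callable, Iterable, Optional


   def basic_metrics(
       generated: Iterable[str],
       training_set: Iterable[str],
       canonicalize: Callable[[str], Optional[str]],
   ) -> dict:
       """Validity, uniqueness, and novelty of generated strings."""
       generated = list(generated)
       # canonicalize returns None for strings that fail to parse.
       valid = [c for c in map(canonicalize, generated) if c is not None]
       unique = set(valid)
       train = {c for c in map(canonicalize, training_set) if c is not None}
       novel = unique - train
       return {
           "valid_frac": len(valid) / len(generated) if generated else 0.0,
           "unique_valid_frac": len(unique) / len(valid) if valid else 0.0,
           "novel_frac": len(novel) / len(unique) if unique else 0.0,
       }


   def toy_canonicalize(smiles: str) -> Optional[str]:
       """Stand-in validity check: accept strings with balanced parentheses."""
       depth = 0
       for ch in smiles:
           if ch == "(":
               depth += 1
           elif ch == ")":
               depth -= 1
               if depth < 0:
                   return None
       return smiles if smiles and depth == 0 else None


   generated = ["CCO", "CCO", "CC(C", "c1ccccc1"]
   metrics = basic_metrics(generated, ["CCO"], toy_canonicalize)
   print(metrics)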

Distribution Metrics
~~~~~~~~~~~~~~~~~~~~

- **Frechet Distance**: FCD (if available) or fingerprint-based FID
- **Descriptor KL**: KL divergence for molecular descriptors
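
To make the descriptor KL concrete, the sketch below estimates a KL divergence
between two samples of a single descriptor using shared histogram bins. The
estimator, the additive smoothing constant, the number of bins, and the
direction (generated relative to reference) are assumptions for illustration;
the document does not specify how MoltenFlow computes this metric.

.. code-block:: python

   import math


   def histogram_kl(p_samples, q_samples, n_bins=10, eps=1e-6):
       """Estimate KL(P || Q) from two samples using shared, equal-width bins."""
       lo = min(min(p_samples), min(q_samples))
       hi = max(max(p_samples), max(q_samples))
       width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

       def histogram(samples):
           counts = [0] * n_bins
           for x in samples:
               idx = min(int((x - lo) / width), n_bins - 1)
               counts[idx] += 1
           # Additive smoothing keeps empty bins from producing log(0).
           total = len(samples) + eps * n_bins
           return [(c + eps) / total for c in counts]

       p, q = histogram(p_samples), histogram(q_samples)
       return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


   reference = [0.2, 0.4, 0.5, 0.6, 0.8]  # e.g. a descriptor on training molecules
   generated = [0.1, 0.2, 0.3, 0.3, 0.4]  # the same descriptor on generated ones
   print(histogram_kl(generated, reference, n_bins=5))

Identical samples give a divergence of zero, and the value grows as the
generated distribution drifts away from the reference.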

Conditioned Generation Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For conditioned generation, additional metrics measure target accuracy:

.. code-block:: python

   from moltenflow.eval import compute_conditioned_target_metrics

   metrics = compute_conditioned_target_metrics(
       predicted_properties,  # (N, n_props) from surrogate
       target_values,         # (n_props,)
       tolerances=[0.5, 1.0, 2.0],
   )

   print(f"MAE: {metrics.mean_absolute_error}")
   print(f"Within 1.0: {metrics.within_tolerance[1.0]:.2%}")
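
For intuition, the quantities reported above can be computed by hand as follows.
This is a sketch under stated assumptions, not the body of
``compute_conditioned_target_metrics``: the name ``target_metrics`` is
hypothetical, and a sample is counted as within tolerance only when every
property is within that tolerance, which the document does not confirm.

.. code-block:: python

   def target_metrics(predicted, target, tolerances):
       """Per-property MAE and fraction of samples within each tolerance."""
       n_props = len(target)
       abs_err = [
           [abs(row[j] - target[j]) for j in range(n_props)] for row in predicted
       ]
       mae = [sum(row[j] for row in abs_err) / len(abs_err) for j in range(n_props)]
       # Assumption: a sample counts only if all properties are within tol.
       within = {
           tol: sum(all(e <= tol for e in row) for row in abs_err) / len(abs_err)
           for tol in tolerances
       }
       return mae, within


   predicted = [[-3.2, 2.1], [-1.5, 2.0], [-3.0, 3.5]]  # (N, n_props)
   target = [-3.0, 2.0]                                 # [ESOL, Lipophilicity]
   mae, within = target_metrics(predicted, target, tolerances=[0.5, 1.0, 2.0])
   print(mae, within)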

Metrics Output
~~~~~~~~~~~~~~

The ``analysis/metrics.json`` file contains:

.. code-block:: json

   {
     "unconditioned": {
       "basic": {"valid_frac": 0.95, "unique_valid_frac": 0.98, ...},
       "distribution": {"frechet_distance": 12.5, ...},
       "scaffold": {"scaffold_diversity": 0.45, ...}
     },
     "conditioned": {...},
     "conditioned_target": {
       "target_values": [-3.0, 2.0],
       "mean_absolute_error": [0.8, 0.6],
       "within_tolerance": {"0.5": 0.25, "1.0": 0.55, "2.0": 0.85}
     }
   }

ZINC250K Multi-Objective Pipeline
---------------------------------

The ZINC250K pipeline supports multi-objective optimization with three
RDKit-computed properties: QED (drug-likeness), SAS (synthetic accessibility),
and pLogP (penalized logP).

Dataset and Properties
~~~~~~~~~~~~~~~~~~~~~~

The ZINC250K dataset contains approximately 250,000 drug-like molecules.
Properties are computed using RDKit:

- **QED** (Quantitative Estimate of Drug-likeness): 0-1, higher is better
- **SAS** (Synthetic Accessibility Score): 1-10, lower is better
- **pLogP** (Penalized LogP): unbounded, typically higher is better for optimization

Running the Pipeline
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Run full VAE-based pipeline
   uv run python scripts/run_zinc250k_pipeline.py

   # Run with fingerprints (no VAE needed, for testing)
   uv run python scripts/run_zinc250k_pipeline.py --use-fingerprints

   # Run specific stages
   uv run python scripts/run_zinc250k_pipeline.py \
       --stages process_data,identify_pareto,evaluate,plot

Pipeline Stages
~~~~~~~~~~~~~~~

1. **process_data**: Load ZINC250K and compute QED/SAS/pLogP
2. **pretrain**: VAE reconstruction training
3. **finetune**: VAE + surrogate head for 3 properties
4. **flow**: Flow matching model training
5. **identify_pareto**: Find Pareto-optimal molecules in train/test sets
6. **optimize**: Run directional optimization on Pareto candidates
7. **evaluate**: Compute hypervolume metrics and bootstrap CIs
8. **plot**: Generate Pareto scatter plots with optimization arrows

Fingerprint Mode
~~~~~~~~~~~~~~~~

For testing the evaluation pipeline without a trained VAE, use fingerprint mode:

.. code-block:: bash

   uv run python scripts/run_zinc250k_pipeline.py --use-fingerprints

This mode:

- Skips VAE training stages (pretrain, finetune, flow)
- Uses Morgan fingerprints as molecular representations
- Validates Pareto identification, hypervolume computation, and plotting

Directional Optimization
~~~~~~~~~~~~~~~~~~~~~~~~

The pipeline supports directional guidance for maximize/minimize objectives:

.. code-block:: python

   from moltenflow.guidance.objectives import create_optimization_objective

   # Create objective: maximize QED, minimize SAS, maximize pLogP
   loss_fn = create_optimization_objective(
       {"qed": "max", "sas": "min", "plogp": "max"},
       property_order=["qed", "sas", "plogp"],
   )

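   # Use with guided sampling
   # (vae, flow, surrogate, and vocab are assumed to be loaded already)
   import torch

   from moltenflow.inference.sample import sample_guided_smiles

   n = 100  # Number of molecules to sample
   samples = sample_guided_smiles(
       vae, flow, surrogate, vocab,
       target=torch.zeros(n, 3),  # Target is unused for directional objectives
       gamma=500.0,
       n=n,
       loss_fn=loss_fn,
   )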
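
The internals of ``create_optimization_objective`` are not shown here. One
plausible form of a directional objective is a signed sum of predicted
properties, so that lowering the loss raises "max" properties and lowers "min"
properties. The sketch below uses plain Python lists instead of tensors, and
``make_directional_loss`` is a hypothetical name chosen for the example.

.. code-block:: python

   def make_directional_loss(directions, property_order):
       """Return a loss whose minimization follows the requested directions."""
       signs = []
       for name in property_order:
           direction = directions[name]
           if direction not in ("max", "min"):
               raise ValueError(f"Unknown direction for {name}: {direction}")
           # Maximizing a property is the same as minimizing its negative.
           signs.append(-1.0 if direction == "max" else 1.0)

       def loss_fn(predictions):
           # predictions: rows of property values in property_order
           per_row = [sum(s * v for s, v in zip(signs, row)) for row in predictions]
           return sum(per_row) / len(per_row)

       return loss_fn


   toy_loss = make_directional_loss(
       {"qed": "max", "sas": "min", "plogp": "max"},
       property_order=["qed", "sas", "plogp"],
   )
   baseline = [[0.90, 2.0, 1.0]]
   improved = [[0.95, 1.5, 2.0]]  # higher QED, lower SAS, higher pLogP
   print(toy_loss(baseline), toy_loss(improved))

An unweighted sum lets properties with larger numeric ranges dominate, so a real
objective would likely normalize or weight each term.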

Hypervolume Metrics
~~~~~~~~~~~~~~~~~~~

The evaluation computes hypervolume (HV) for multi-objective assessment:

.. code-block:: python

   from moltenflow.eval.metrics import (
       compute_hypervolume,
       compute_hypervolume_metrics,
       bootstrap_hypervolume_ci,
   )

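   # Optimization direction per property: QED, SAS, pLogP
   sense = ["max", "min", "max"]

   # Basic hypervolume (pareto_points, data, and ref_point are assumed to exist)
   hv = compute_hypervolume(pareto_points, ref_point, sense=sense)

   # With bootstrap confidence interval
   ci = bootstrap_hypervolume_ci(
       data, sense, ref_point,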
       n_bootstrap=1000,
       confidence=0.95,
   )
   print(f"HV: {ci.mean:.4f} +/- {ci.std:.4f}")
   print(f"95% CI: [{ci.ci_lower:.4f}, {ci.ci_upper:.4f}]")
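
To show what the hypervolume measures, the sketch below computes it for two
objectives with a simple sweep and wraps it in a percentile bootstrap. It is an
illustration only: the functions ``hypervolume_2d`` and ``bootstrap_hv_ci`` are
hypothetical, the restriction to two objectives is a simplification of the
three-property case, and resampling rows with replacement is an assumption about
how the bootstrap interval is formed.

.. code-block:: python

   import random


   def hypervolume_2d(points, ref_point, sense):
       """Area dominated by the points and bounded by ref_point (2 objectives)."""
       # Flip "min" objectives so that larger is better on both axes.
       signs = [1.0 if s == "max" else -1.0 for s in sense]
       ref = [s * r for s, r in zip(signs, ref_point)]
       oriented = [[s * v for s, v in zip(signs, p)] for p in points]

       area, best_y = 0.0, ref[1]
       # Sweep from the largest first objective down, adding each new slab.
       for x, y in sorted(oriented, key=lambda p: (-p[0], -p[1])):
           if x > ref[0] and y > best_y:
               area += (x - ref[0]) * (y - best_y)
               best_y = y
       return area


   def bootstrap_hv_ci(
       points, ref_point, sense, n_bootstrap=1000, confidence=0.95, seed=0
   ):
       """Percentile interval: resample rows, then recompute the hypervolume."""
       rng = random.Random(seed)
       values = sorted(
           hypervolume_2d(rng.choices(points, k=len(points)), ref_point, sense)
           for _ in range(n_bootstrap)
       )
       alpha = 1.0 - confidence
       lower = values[int(alpha / 2 * n_bootstrap)]
       upper = values[min(n_bootstrap - 1, int((1 - alpha / 2) * n_bootstrap))]
       return lower, upper


   points = [(0.8, 3.0), (0.6, 2.0), (0.5, 4.0)]  # (QED, SAS)
   ref_point = (0.0, 5.0)                         # worst acceptable QED and SAS
   sense = ["max", "min"]
   hv = hypervolume_2d(points, ref_point, sense)
   lower, upper = bootstrap_hv_ci(points, ref_point, sense, n_bootstrap=200)
   print(f"HV = {hv:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")

Dominated points such as ``(0.5, 4.0)`` add nothing to the area, which is why the
hypervolume depends only on the Pareto front and the reference point.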

Oracle Evaluation for Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For accurate evaluation, optimization results are compared against an RDKit oracle
rather than relying solely on surrogate model predictions:

.. code-block:: python

   from moltenflow.eval.oracle_eval import (
       compute_oracle_properties,
       compare_surrogate_to_oracle,
       plot_surrogate_vs_oracle,
   )

   # Compute oracle properties for optimized molecules
   oracle_result = compute_oracle_properties(optimized_smiles, ["qed", "sas", "plogp"])

   # Compare surrogate predictions to oracle values
   comparison = compare_surrogate_to_oracle(
       surrogate_predictions,
       oracle_result.oracle_properties,
       property_names=["qed", "sas", "plogp"],
       valid_mask=oracle_result.valid_mask,
   )

   # Generate comparison plots
   plot_surrogate_vs_oracle(
       surrogate_predictions,
       oracle_result.oracle_properties,
       property_names=["qed", "sas", "plogp"],
       save_path="surrogate_vs_oracle.png",
   )

The oracle evaluation ensures that reported hypervolume improvements are based on
actual property values rather than potentially optimistic surrogate predictions.

Ablation Options
~~~~~~~~~~~~~~~~

The pipeline supports two ablation modes for assessing the contribution of different
components:

**1. No Latent Space Organizing (skip_latent_organizing)**

Skip VAE finetuning and train a surrogate head on frozen pretrained latents:

.. code-block:: yaml

   ablation:
     skip_latent_organizing: true  # Train surrogate on frozen VAE latents

This ablation assesses the value of organizing the latent space around properties
during finetuning. When enabled:

- The VAE finetuning stage is skipped
- A standalone surrogate head is trained on frozen pretrained latents
- The surrogate can still provide guidance signals, but the latent space is not
  organized by property values

**2. No Flow Matching (use_flow)**

Use pure gradient ascent instead of flow-guided optimization:

.. code-block:: yaml

   ablation:
     use_flow: false  # Pure gradient ascent without flow velocity

This ablation assesses the value of flow matching for property-guided generation.
When ``use_flow`` is ``false``:

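- Generation and optimization use pure gradient updates, ``z_{t+1} = z_t - lr * g``
  (with ``g`` taken as the gradient of the guidance loss, the minus sign moves
  ``z`` toward better property values)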
- The flow model is not required (can be None)
- Unconditioned generation is skipped (requires flow)

You can combine both ablations to assess their independent and joint effects.
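
The update ``z_{t+1} = z_t - lr * g`` used when flow matching is disabled can be
illustrated on a toy problem. The sketch assumes ``g`` is the gradient of a loss
to be minimized; the quadratic loss, the learning rate, and the name
``gradient_steps`` are placeholders rather than MoltenFlow code. The step count
mirrors the ``steps: 80`` value in the example configuration.

.. code-block:: python

   def gradient_steps(z0, grad_fn, lr=0.1, steps=80):
       """Repeatedly apply z <- z - lr * g with g = grad_fn(z)."""
       z = list(z0)
       for _ in range(steps):
           g = grad_fn(z)
           z = [zi - lr * gi for zi, gi in zip(z, g)]
       return z


   # Toy loss ||z - z_star||^2 with gradient 2 * (z - z_star).
   z_star = [1.0, -2.0]


   def toy_grad(z):
       return [2.0 * (zi - si) for zi, si in zip(z, z_star)]


   z_final = gradient_steps([0.0, 0.0], toy_grad, lr=0.1, steps=80)
   print(z_final)  # close to z_star

In the real pipeline the gradient would come from backpropagating the guidance
loss through the surrogate head to the latent vector.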

Molecular Representation
~~~~~~~~~~~~~~~~~~~~~~~~

MoltenFlow supports two molecular representations for VAE training:

- **SMILES** (default): Standard SMILES strings
- **SELFIES**: Self-referencing embedded strings (100% valid by construction)

Configure the representation in the data section:

.. code-block:: yaml

   data:
     representation: selfies  # or "smiles" (default)

When using SELFIES:

- The VAE is trained on SELFIES tokens
- Generation produces SELFIES strings internally
- Output is automatically converted back to canonical SMILES
- Oracle evaluation always uses SMILES (RDKit requirement)

SELFIES guarantees that every generated string decodes to a valid molecule,
which can be beneficial for:

- Reducing invalid molecule generation
- Ensuring smooth gradients during optimization
- Training on diverse chemical spaces with complex functional groups

The SELFIES implementation uses the ``selfies`` package (already included as a
dependency) and is supported in all pipeline stages, including pretraining,
finetuning, generation, and optimization.

Configuration
~~~~~~~~~~~~~

See ``configs/experiments/zinc250k_pipeline.yaml`` for full options:

.. code-block:: yaml

   data:
     dataset: zinc250k
     properties: [qed, sas, plogp]
     property_directions: [max, min, max]

   # Surrogate head with bounded outputs
   surrogate:
     hidden_dim: 256
     output_bounds:
       qed: [0.0, 1.0]    # QED bounded 0-1
       sas: [1.0, 10.0]   # SAS bounded 1-10
       plogp: null        # pLogP unbounded

   # Pareto front with K-neighbors expansion
   pareto:
     sense: [max, min, max]
     k_neighbors: 5           # Expand selection with neighbors
     normalize_distance: true

   optimization:
     n_candidates: 100
     gamma: 500.0
     steps: 80

   evaluation:
     hypervolume:
       ref_point_margin: 0.1
     bootstrap:
       n_samples: 1000
       confidence: 0.95

   # UMAP visualization options
   plotting:
     umap_plots:
       pretrain_vs_finetuned: true
       pretrain_vs_finetuned_contours: true
       real_analysis: true
       multitype_overlay: true
       splits_overlay: true

Bounded Property Predictions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The surrogate head supports bounded output transformations using a scaled sigmoid:

.. code-block:: python

   from moltenflow.models.surrogate_head import SurrogateHead

   # Create surrogate with bounded outputs
   surrogate = SurrogateHead(
       K=8,
       d_latent=128,
       out_dim=3,
       output_bounds=[
           (0.0, 1.0),   # QED: bounded 0-1
           (1.0, 10.0),  # SAS: bounded 1-10
           None,         # pLogP: unbounded
       ],
   )

This ensures predictions stay within physically meaningful ranges while
maintaining differentiability for gradient-based optimization.
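
A scaled sigmoid maps an unbounded raw output ``x`` to
``lower + (upper - lower) * sigmoid(x)``. The sketch below applies that mapping
per property in plain Python; ``scaled_sigmoid`` and ``apply_bounds`` are
hypothetical helpers for illustration, and the actual ``SurrogateHead`` may
implement the transformation differently.

.. code-block:: python

   import math


   def sigmoid(x):
       """Numerically stable logistic function."""
       if x >= 0:
           return 1.0 / (1.0 + math.exp(-x))
       e = math.exp(x)
       return e / (1.0 + e)


   def scaled_sigmoid(x, lower, upper):
       """Squash an unbounded value into the interval [lower, upper]."""
       return lower + (upper - lower) * sigmoid(x)


   def apply_bounds(raw_outputs, output_bounds):
       """Apply per-property bounds; None leaves the raw value unchanged."""
       return [
           value if bounds is None else scaled_sigmoid(value, *bounds)
           for value, bounds in zip(raw_outputs, output_bounds)
       ]


   output_bounds = [(0.0, 1.0), (1.0, 10.0), None]  # QED, SAS, pLogP
   print(apply_bounds([0.0, 0.0, 7.5], output_bounds))
   print(apply_bounds([-1000.0, 1000.0, -3.0], output_bounds))

Because the sigmoid saturates, gradients shrink as predictions approach either
bound, which is worth keeping in mind when guidance pushes a property toward its
limit.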

Expanded Pareto Candidate Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the true Pareto front contains few candidates, use K-nearest neighbor
expansion to include "near-Pareto" molecules:

.. code-block:: python

   from moltenflow.eval.pareto import get_pareto_neighbors

   pareto_mask, selection_mask, neighbor_mask = get_pareto_neighbors(
       property_values,
       sense=["max", "min", "max"],
       k_neighbors=5,        # Include 5 nearest neighbors per Pareto point
       normalize=True,       # Normalize property space for distance
   )

   # pareto_mask: True Pareto-optimal points
   # selection_mask: Pareto + neighbors (use for optimization)
   # neighbor_mask: Near-Pareto points only
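
The idea behind the three masks can be sketched without the library. The code
below is not ``get_pareto_neighbors``: the helper names are hypothetical, the
dominance check is a simple quadratic-time loop, and min-max scaling is only an
assumption about what ``normalize=True`` means.

.. code-block:: python

   import math


   def pareto_mask(points, sense):
       """True for points that no other point dominates."""
       signs = [1.0 if s == "max" else -1.0 for s in sense]
       oriented = [[s * v for s, v in zip(signs, row)] for row in points]

       def dominates(a, b):
           # a is at least as good everywhere and strictly better somewhere
           return all(x >= y for x, y in zip(a, b)) and any(
               x > y for x, y in zip(a, b)
           )

       return [not any(dominates(o, row) for o in oriented) for row in oriented]


   def expand_with_neighbors(points, mask, k_neighbors, normalize=True):
       """Add the k nearest non-Pareto points of each Pareto point."""
       n_dims = len(points[0])
       scaled = [list(row) for row in points]
       if normalize:
           lows = [min(r[j] for r in points) for j in range(n_dims)]
           spans = [
               (max(r[j] for r in points) - lows[j]) or 1.0 for j in range(n_dims)
           ]
           scaled = [
               [(r[j] - lows[j]) / spans[j] for j in range(n_dims)] for r in points
           ]
       non_pareto = [i for i, m in enumerate(mask) if not m]
       neighbor = [False] * len(points)
       for i, is_pareto in enumerate(mask):
           if not is_pareto:
               continue
           by_distance = sorted(
               non_pareto, key=lambda j: math.dist(scaled[i], scaled[j])
           )
           for j in by_distance[:k_neighbors]:
               neighbor[j] = True
       selection = [m or nb for m, nb in zip(mask, neighbor)]
       return selection, neighbor


   # (QED, SAS) pairs: maximize QED, minimize SAS
   values = [(0.9, 3.0), (0.7, 2.0), (0.6, 2.5), (0.3, 6.0), (0.85, 3.2)]
   front = pareto_mask(values, ["max", "min"])
   selection, neighbors = expand_with_neighbors(values, front, k_neighbors=1)
   print(front, neighbors, selection)

Normalizing before measuring distance keeps a wide-ranged property such as SAS
from dominating the neighbor search.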

Pareto Visualization
~~~~~~~~~~~~~~~~~~~~

The pipeline generates scatter plots showing:

- All candidate molecules (gray points)
- Near-Pareto molecules (orange squares, if K-neighbors enabled)
- Pareto-optimal molecules (red circles, highlighted)
- Optimization arrows from selected candidates to optimized outputs (blue)

.. code-block:: python

   from moltenflow.utils.plotting import plot_pareto_optimization

   plot_pareto_optimization(
       candidates=property_values,
       optimized=optimized_values,
       pareto_mask=pareto_mask,
       prop_names=["QED", "SAS", "pLogP"],
       save_path="pareto_qed_vs_sas.png",
       prop_indices=(0, 1),  # Plot QED vs SAS
       near_pareto_mask=neighbor_mask,  # Show near-Pareto differently
       selection_mask=selection_mask,   # Draw arrows for all selected
   )

UMAP Visualizations
~~~~~~~~~~~~~~~~~~~

The pipeline generates comprehensive UMAP visualizations for latent space analysis:

.. code-block:: python

   from moltenflow.eval.umap_analysis import create_standard_umap_suite

   figures = create_standard_umap_suite(
       output_dir="analysis/",
       z_train=z_train,
       z_val=z_val,
       z_test=z_test,
       y_real=property_values,
       property_names=["QED", "SAS", "pLogP"],
       z_pretrain=z_pretrain,  # For pretrain vs finetuned comparison
       z_uncond=z_uncond,
       z_cond=z_cond,
       plot_config={
           "pretrain_vs_finetuned": True,
           "pretrain_vs_finetuned_contours": True,  # With property contours
           "real_analysis": True,
           "multitype_overlay": True,
           "splits_overlay": True,
       },
   )

Property Contour Overlays
~~~~~~~~~~~~~~~~~~~~~~~~~

For visualizing how properties are organized in latent space, use contour overlays.
The contour plots use RBF (Radial Basis Function) interpolation for smooth,
interpretable property gradients:

.. code-block:: python

   from moltenflow.eval.umap_analysis import plot_umap_with_contours

   plot_umap_with_contours(
       z_2d,                    # 2D UMAP coordinates
       property_values,         # Property values for coloring
       property_name="QED",
       n_levels=10,             # Number of contour levels
       cmap="viridis",
       save_path="umap_qed_contours.png",
       smoothing=0.0,           # RBF smoothing (0 = exact, higher = smoother)
       kernel="thin_plate_spline",  # RBF kernel type
   )

Available RBF kernels include ``thin_plate_spline`` (default, smooth minimal-energy
surface), ``cubic``, ``gaussian``, and others from ``scipy.interpolate.RBFInterpolator``.

This creates a smooth interpolated surface showing property gradients across
the latent space, making it easier to see how finetuning organizes the space.
