Budgeted Multi-Objective Optimization
=====================================

This module provides a framework for comparing molecular optimization methods
under fixed oracle budgets. It implements:

- **MoltenFlow**: Guided flow optimization from Pareto seeds
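- **BO (2-GP)**: Bayesian optimization with two independent Gaussian processes
  (GPs), one per objective, and the qEHVI (q-Expected Hypervolume Improvement)
  acquisition function
- **BO (MOGP)**: Bayesian optimization with a single multi-output GP and qEHVI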

Quick Start
-----------

Run a budgeted optimization experiment:

.. code-block:: bash

    # Run MoltenFlow optimizer
    python scripts/run_budgeted_optimization.py \
        --method moltenflow \
        --budget 100 \
        --seed 42

    # Run a BO baseline (install the optional BO dependencies first)
    pip install 'moltenflow[bo]'
    python scripts/run_budgeted_optimization.py \
        --method bo_2gp \
        --budget 100 \
        --seed 42

Generate visualization figures:

.. code-block:: bash

    python scripts/plot_optimization_results.py \
        --log-dir outputs/budgeted_optimization/ \
        --figures hv_convergence pareto_comparison \
        --summary

Key Concepts
------------

Oracle Budget
~~~~~~~~~~~~~

All methods operate under a fixed oracle call budget. Each decode-evaluate
cycle consumes one oracle call, whether or not the decoded molecule is valid.
Invalid molecules receive the penalty values (QED=0, -SA=-10). These coincide
with the hypervolume reference point, so an invalid molecule adds no
hypervolume and is dominated by any valid molecule.
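
The accounting rule can be sketched as a small wrapper. This is an
illustration only: ``BudgetedOracle``, ``toy_score``, and ``PENALTY`` are
hypothetical names, not the ``PropertyOracle`` API, and the scoring function
is a stand-in for RDKit.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Optional, Tuple

   # Penalty for invalid molecules, in (QED, -SA) order as described above.
   PENALTY = (0.0, -10.0)

   @dataclass
   class BudgetedOracle:
       """Hypothetical wrapper that charges one call per query, valid or not."""
       score_fn: Callable[[str], Optional[Tuple[float, float]]]
       budget: int
       calls: int = 0

       def __call__(self, smiles: str) -> Tuple[float, float]:
           if self.calls >= self.budget:
               raise RuntimeError("oracle budget exhausted")
           self.calls += 1  # charged before validity is known
           result = self.score_fn(smiles)
           return PENALTY if result is None else result

   def toy_score(smiles: str) -> Optional[Tuple[float, float]]:
       # Stand-in for real scoring: an empty string counts as invalid.
       return None if not smiles else (0.5, -3.0)

   oracle = BudgetedOracle(toy_score, budget=2)
   valid = oracle("CCO")   # returns the toy score
   invalid = oracle("")    # returns PENALTY, still counted

In this sketch the second query returns the penalty but still increments
``oracle.calls``, so a third query would exceed the budget.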

Initialization
~~~~~~~~~~~~~~

Two initialization regimes are supported:

1. **Random**: Sample N_0 molecules uniformly from the dataset
2. **Near-Pareto**: Compute Pareto front from a larger pool, then select
   seeds from the front and its K nearest neighbors
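
A minimal sketch of the near-Pareto selection follows. It assumes both
objectives are maximized (QED and -SA) and that neighbors are measured by
Euclidean distance in property space; the module may measure distance
differently. ``pareto_mask`` and ``near_pareto_seeds`` are hypothetical
helpers, not ``initialize_near_pareto``.

.. code-block:: python

   import numpy as np

   def pareto_mask(points):
       """Boolean mask of non-dominated rows; every objective is maximized."""
       pts = np.asarray(points, dtype=float)
       mask = np.ones(len(pts), dtype=bool)
       for i, p in enumerate(pts):
           # A row dominates p if it is >= everywhere and > somewhere.
           dominators = np.all(pts >= p, axis=1) & np.any(pts > p, axis=1)
           mask[i] = not dominators.any()
       return mask

   def near_pareto_seeds(points, k):
       """Indices of front members plus the k nearest neighbors of each."""
       pts = np.asarray(points, dtype=float)
       front = np.flatnonzero(pareto_mask(pts))
       selected = set(front.tolist())
       for i in front:
           dist = np.linalg.norm(pts - pts[i], axis=1)
           dist[i] = np.inf  # exclude the point itself
           selected.update(np.argsort(dist)[:k].tolist())
       return sorted(selected)

   # Toy pool of (QED, -SA) pairs, invented for illustration.
   pool = np.array([[0.9, -4.0], [0.8, -2.0], [0.7, -2.1],
                    [0.3, -6.0], [0.2, -7.0]])
   seeds = near_pareto_seeds(pool, k=1)

For this toy pool the front is rows 0 and 1, and row 2 is the nearest
neighbor of both, so ``seeds`` contains indices 0, 1, and 2.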

Hypervolume
~~~~~~~~~~~

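We measure optimization progress using hypervolume (HV) and hypervolume
improvement (HVI). HV is the area of objective space that the current Pareto
front dominates, bounded below by a reference point; HVI is the gain in HV
over the initial front. The reference point is fixed at (QED=0, -SA=-10) for
consistency across experiments.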
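
For two maximized objectives, HV can be computed with a simple sweep over the
front. The sketch below is a generic 2D implementation for illustration, not
the module's own routine; ``pareto_front_2d`` and ``hypervolume_2d`` are
hypothetical names and the points are invented.

.. code-block:: python

   def pareto_front_2d(points):
       """Non-dominated points (both objectives maximized), first objective descending."""
       front = []
       best_y = float("-inf")
       for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
           if y > best_y:  # strictly better in the second objective
               front.append((x, y))
               best_y = y
       return front

   def hypervolume_2d(points, ref):
       """Area dominated by `points` and bounded below by `ref`."""
       rx, ry = ref
       # Only points strictly better than the reference contribute.
       pts = [(x, y) for x, y in points if x > rx and y > ry]
       hv = 0.0
       prev_y = ry
       for x, y in pareto_front_2d(pts):
           hv += (x - rx) * (y - prev_y)  # horizontal strip added by this point
           prev_y = y
       return hv

   ref_point = (0.0, -10.0)                 # (QED, -SA)
   initial = [(0.5, -4.0)]
   optimized = initial + [(0.8, -6.0), (0.0, -10.0)]  # last point is the invalid penalty
   hv_initial = hypervolume_2d(initial, ref_point)
   hvi = hypervolume_2d(optimized, ref_point) - hv_initial

The penalty point sits exactly on the reference point, so it is filtered out
and contributes nothing to ``hvi``.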

Oracle vs Surrogate Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HVI is computed from **oracle** (RDKit) property values, not surrogate
predictions. Surrogate predictions may overestimate improvements, so scoring
with the oracle keeps the reported HVI accurate:

.. code-block:: python

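   from moltenflow.eval.oracle_eval import compute_oracle_properties

   # Assumed to be provided by the surrounding experiment code:
   # optimized_smiles, initial_pareto, ref_point, and
   # compute_hypervolume_improvement.

   # After optimization, score the decoded molecules with the oracle
   oracle_result = compute_oracle_properties(optimized_smiles, ["qed", "sas"])

   # Use oracle values, not surrogate predictions, for the HVI computation.
   # ref_point must follow the same sign convention as the points passed in.
   hvi = compute_hypervolume_improvement(
       baseline_points=initial_pareto,
       optimized_points=oracle_result.oracle_properties,
       sense=["max", "min"],
       ref_point=ref_point,
   )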

Pure Gradient Ascent Ablation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For ablation studies, optimization can use pure gradient ascent instead of
flow-guided optimization:

.. code-block:: python

   from moltenflow.inference.optimize_conditioned import optimize_molecules

   # Standard: flow + guidance
   results = optimize_molecules(
       vae, flow, surrogate, vocab, smiles, target,
       gamma=1.0, sigma=0.1, steps=30,
       use_flow=True,  # Default
   )

   # Ablation: pure gradient ascent (no flow velocity)
   results = optimize_molecules(
       vae, flow=None, surrogate=surrogate, vocab=vocab,
       input_smiles=smiles, target=target,
       gamma=1.0, sigma=0.1, steps=30,
       use_flow=False,  # Pure gradient ascent: z = z + lr * g (g = objective gradient)
   )

This ablation assesses the contribution of the flow model's velocity term
to optimization quality.
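
The difference between the two modes can be illustrated on a toy problem. This
sketch is not the update rule used by ``optimize_molecules``: the velocity
field, the objective, and the way ``gamma`` and ``lr`` enter the step are all
assumptions made for illustration, and noise (``sigma``) is omitted.

.. code-block:: python

   import numpy as np

   def objective_grad(z, target):
       # Gradient of f(z) = -0.5 * ||z - target||^2, which is maximized at z = target.
       return target - z

   def toy_velocity(z):
       # Hypothetical stand-in for a learned flow velocity: pulls z toward the origin.
       return -z

   def optimize(z0, target, steps=30, lr=0.1, gamma=1.0, use_flow=True):
       z = np.array(z0, dtype=float)
       for _ in range(steps):
           g = objective_grad(z, target)
           if use_flow:
               # Velocity term plus guidance scaled by gamma.
               z = z + lr * (toy_velocity(z) + gamma * g)
           else:
               # Ablation: plain gradient ascent on the objective.
               z = z + lr * g
       return z

   target = np.array([2.0, 0.0])
   z_flow = optimize([0.0, 0.0], target, use_flow=True)
   z_grad = optimize([0.0, 0.0], target, use_flow=False)

In this toy setting the velocity term changes where the iterates settle:
``z_flow`` approaches the midpoint between the origin and ``target``, while
``z_grad`` moves toward ``target`` itself. Comparing the two runs under the
same oracle budget is what the ablation measures.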

API Reference
-------------

Oracle
~~~~~~

.. autoclass:: moltenflow.optimization.PropertyOracle
   :members:
   :undoc-members:

.. autoclass:: moltenflow.optimization.OracleResult
   :members:

Initialization
~~~~~~~~~~~~~~

.. autofunction:: moltenflow.optimization.initialize_dataset
.. autofunction:: moltenflow.optimization.initialize_random
.. autofunction:: moltenflow.optimization.initialize_near_pareto

.. autoclass:: moltenflow.optimization.InitializedDataset
   :members:

Proposers
~~~~~~~~~

.. autoclass:: moltenflow.optimization.BaseProposer
   :members:
   :undoc-members:

.. autoclass:: moltenflow.optimization.MoltenFlowProposer
   :members:
   :undoc-members:

Runner
~~~~~~

.. autoclass:: moltenflow.optimization.BudgetedOptimizer
   :members:
   :undoc-members:

.. autoclass:: moltenflow.optimization.OptimizationResult
   :members:

Logging
~~~~~~~

.. autoclass:: moltenflow.optimization.OptimizationLogger
   :members:
   :undoc-members:

.. autofunction:: moltenflow.optimization.load_optimization_log
.. autofunction:: moltenflow.optimization.load_experiment_logs

Summary
~~~~~~~

.. autofunction:: moltenflow.optimization.generate_summary_table
.. autofunction:: moltenflow.optimization.bootstrap_ci
.. autofunction:: moltenflow.optimization.compute_method_summary

Plotting
~~~~~~~~

.. autofunction:: moltenflow.optimization.plot_hv_convergence
.. autofunction:: moltenflow.optimization.plot_pareto_comparison
.. autofunction:: moltenflow.optimization.plot_molecule_gallery
.. autofunction:: moltenflow.optimization.plot_all_figures

Configuration
-------------

Example configuration file (``configs/experiments/budgeted_optimization.yaml``):

.. code-block:: yaml

    optimization:
      budget: 100
      n_init: 20
      batch_size: 1

    init:
      method: "random"

    hypervolume:
      ref_point: [0.0, -10.0]

    moltenflow:
      gamma: 1.0
      sigma: 0.1
      steps: 30

    bo:
      num_restarts: 10
      raw_samples: 512

Output Format
-------------

The optimization logger produces structured output:

``optimization_log.csv``
    Per-step log with columns: ``step``, ``method``, ``seed``, ``init``,
    ``smiles``, ``qed``, ``neg_sa``, ``valid``, ``hv``, ``hvi``,
    ``cumulative_validity``

``run_metadata.json``
    Run configuration including reference point, budget, and timestamps

``pareto_t{step}.csv``
    Periodic Pareto front snapshots

``summary.json``
    Final summary statistics
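
As a rough illustration of consuming the per-step log, the sketch below reads
the column layout listed above with the standard ``csv`` module and keeps the
last logged ``hvi`` per run. The rows in ``SAMPLE_LOG`` are made up, the
encoding of ``valid`` is an assumption, and ``final_hvi_per_run`` is a
hypothetical helper rather than part of ``moltenflow.optimization``.

.. code-block:: python

   import csv
   import io

   # Made-up rows for illustration only; not real experiment output.
   SAMPLE_LOG = """step,method,seed,init,smiles,qed,neg_sa,valid,hv,hvi,cumulative_validity
   1,moltenflow,42,random,CCO,0.41,-1.98,True,4.10,0.00,1.0
   2,moltenflow,42,random,,0.0,-10.0,False,4.10,0.00,0.5
   3,moltenflow,42,random,CCN,0.45,-2.10,True,4.25,0.15,0.6667
   """

   def final_hvi_per_run(handle):
       """Map (method, seed) to the hvi value logged at the largest step."""
       last = {}
       for row in csv.DictReader(handle):
           key = (row["method"], int(row["seed"]))
           step = int(row["step"])
           if key not in last or step > last[key][0]:
               last[key] = (step, float(row["hvi"]))
       return {key: hvi for key, (_, hvi) in last.items()}

   summary = final_hvi_per_run(io.StringIO(SAMPLE_LOG))

To read a real run, pass an open file handle for ``optimization_log.csv``
instead of the in-memory sample.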
