# Configuration for domain mixture scaling law discovery with OpenEvolve
max_iterations: 50
checkpoint_interval: 1
log_level: "INFO"
random_seed: 42

# LLM configuration
llm:
  primary_model: null
  primary_model_weight: 1.0
  secondary_model: null
  secondary_model_weight: 0.0
  api_base: ""
  max_tokens: 16384
  timeout: 240
  retries: 10
  retry_delay: 10

# Prompt configuration
prompt:
  system_message: |
    You are an expert in scaling laws and machine learning who specializes in discovering and improving scaling law functions for different LLM training scenarios. Your task is to evolve both `scaling_law_func` (currently a naive power law) and the `fit_scaling_law` optimization routine (currently naive BFGS) so that they better model the relationship between domain mixture proportions and multi-domain loss values across different model sizes.

    **IMPORTANT: The scaling law function must use no more than 35 parameters.**

    Focus on mathematical accuracy across different model sizes, cross-domain generalization, parameter efficiency (simple forms that can be fitted with limited data), and numerical/theoretical stability.

    **DATA CHARACTERISTICS**
    - Features: Domain proportions (5 domains) - array of shape (n_mixtures, 5)
    - Labels: Multi-domain losses (5 domains) - array of shape (n_mixtures, 5)
    - Dataset size: 80 training mixtures (20 per model size)
    - Model sizes: 70M, 160M, 410M, and 1B parameters (4 separate groups)
    - Domain proportions: Each row sums to 1.0 (mixture weights)
    - Loss range: Per-domain cross-entropy losses fall between 1.8 and 4.2
    - Mixture configurations: Systematic exploration of different domain weight combinations
    - This is a multi-output regression problem with correlated domain performances

    The function signatures must remain:

    ```python
    def scaling_law_func(data_points, params):
        # data_points: (N,5) array of domain mixture proportions for 5 domains
        # params: Array of up to 35 parameters
        # Returns: Predicted multi-domain loss values (N,5)
        ...

    def fit_scaling_law(data_points, loss_values):
        # data_points: (N,5) array of domain mixture proportions for 5 domains
        # loss_values: (N,5) array of corresponding multi-domain losses
        # Returns: Optimized parameters (up to 35 parameters)
        ...
    ```
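
    For illustration only, here is a minimal sketch of one pair of functions that fits these signatures and the 35-parameter budget. The exponential form, the parameter layout, the exponent clipping, and the L-BFGS-B optimizer are all assumptions chosen for this example, not the required or expected solution:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    N_DOMAINS = 5
    N_PARAMS = 35  # 5 offsets + 5 log-scales + 5x5 interaction weights


    def scaling_law_func(data_points, params):
        # Assumed illustrative form: loss_i = c_i + exp(log_k_i + sum_j t_ij * r_j)
        r = np.atleast_2d(np.asarray(data_points, dtype=float))   # (N, 5)
        p = np.asarray(params, dtype=float)
        c = p[:N_DOMAINS]                                          # per-domain offset
        log_k = p[N_DOMAINS:2 * N_DOMAINS]                         # per-domain log-scale
        t = p[2 * N_DOMAINS:].reshape(N_DOMAINS, N_DOMAINS)        # row i: weights for output domain i
        z = log_k + r @ t.T                                        # (N, 5)
        # Clip the exponent so the optimizer cannot overflow exp().
        return c + np.exp(np.clip(z, -50.0, 50.0))


    def fit_scaling_law(data_points, loss_values):
        r = np.atleast_2d(np.asarray(data_points, dtype=float))
        y = np.atleast_2d(np.asarray(loss_values, dtype=float))

        def mse(p):
            return np.mean((scaling_law_func(r, p) - y) ** 2)

        # Start at the per-domain mean loss: with log_k = 0 and t = 0 the
        # exponential term equals 1, so c = mean - 1 predicts the mean.
        p0 = np.zeros(N_PARAMS)
        p0[:N_DOMAINS] = y.mean(axis=0) - 1.0
        result = minimize(mse, p0, method="L-BFGS-B")
        return result.x
    ```

    In this sketch, data statistics are used only inside `fit_scaling_law` to initialize the optimizer; `scaling_law_func` depends only on `data_points` and `params`.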

    Write all improvements between # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END markers.

    Do not use input-dependent features in `scaling_law_func`, such as the median, min, or max computed from `data_points`.

  num_top_programs: 3
  num_diverse_programs: 2
  use_template_stochasticity: true

# Database configuration for evolution
database:
  population_size: 100
  archive_size: 50
  num_islands: 5
  migration_interval: 25
  migration_rate: 0.1
  elite_selection_ratio: 0.1
  exploration_ratio: 0.2
  exploitation_ratio: 0.7
  feature_dimensions: ["combined_score", "complexity", "diversity"]
  feature_bins: 10

# Evaluator configuration
evaluator:
  timeout: 600
  max_retries: 3
  cascade_evaluation: false
  cascade_thresholds: [0.3, 0.6]
  parallel_evaluations: 4
  use_llm_feedback: false

# Evolution settings
diff_based_evolution: false
max_code_length: 100000