ReadMe for FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments

For anonymity, information regarding the city, terrain, and document paths has been omitted.

To facilitate understanding of the proposed algorithms, only agent-related code segments are presented here. The complete folder structure is documented in the accompanying "Folder_structure.pdf".

Following acceptance, we will release the full executable code so that its functionality can be validated.

"Code Modules Overview"

agri_eac_gnn_model.py
    Implements network architectures for GNN, BPN (BatteryPredictionNetwork), and FM-EAC tailored to agricultural applications.
agri_eac_pan_model.py
    Implements network architectures for PAN, BPN, and FM-EAC tailored to agricultural applications.
urban_eac_gnn_model.py
    Implements network architectures for GNN and FM-EAC tailored to urban applications.
urban_eac_pan_model.py
    Implements network architectures for PAN and FM-EAC tailored to urban applications.


"Detailed Description of Key Modules"

"agri_eac_gnn_model.py"
    Actor: Policy network generating actions conditioned on input states.
    Critic: Evaluation network estimating Q-values for state-action pairs.
    BatteryPredictionNetwork: Predicts energy consumption based on state and environmental features.
    normalize_adjacency_matrix(A): Normalizes adjacency matrices used in graph convolution layers.
    GCNLayer: Single graph convolutional layer processing node features with normalized adjacency.
    GNN: Two-layer graph neural network producing a global graph representation via node features and adjacency.
    GNN_Agent: 
        compute_gnn_loss(batch_state, batch_action, batch_reward, batch_done): Computes critic loss for training GNN and critic networks.
        choose_action(state, explore=True): Selects an action given the current state, optionally with exploration noise.
        store_transition(state, action, reward, next_state, done): Saves experience tuples into the replay buffer, managing buffer capacity.
        update(): Samples batches from the replay buffer and updates the GNN, Critic, and Actor networks with soft target updates.
        save(i, path, eps): Saves model weights (Actor, Critics, GNN) to checkpoint files.
        load(i, path, eps): Loads model weights from checkpoint files; raises an error if the files are absent.
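
The normalize_adjacency_matrix, GCNLayer, and GNN entries above can be illustrated with a minimal NumPy sketch. It is not the code from agri_eac_gnn_model.py. It assumes the common symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops, a ReLU activation, and mean pooling over nodes to obtain the global representation. The helper names gcn_layer and gnn_forward and the weights W1 and W2 are hypothetical.

```python
import numpy as np


def normalize_adjacency_matrix(A):
    """Assumed scheme: D^{-1/2} (A + I) D^{-1/2}, for a non-negative adjacency A."""
    A = np.asarray(A, dtype=float)
    A_hat = A + np.eye(A.shape[0])      # add self-loops so every degree is >= 1
    deg = A_hat.sum(axis=1)             # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, X, W):
    """One graph convolution (hypothetical helper): ReLU(A_norm @ X @ W)."""
    return np.maximum(A_norm @ X @ W, 0.0)


def gnn_forward(A, X, W1, W2):
    """Two stacked layers, then mean pooling over nodes (pooling is an assumption)."""
    A_norm = normalize_adjacency_matrix(A)
    H = gcn_layer(A_norm, X, W1)
    H = gcn_layer(A_norm, H, W2)
    return H.mean(axis=0)               # one feature vector for the whole graph


# Toy example: a 3-node path graph with one-hot node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
X = np.eye(3)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))            # illustrative random weights
W2 = rng.normal(size=(4, 2))
g = gnn_forward(A, X, W1, W2)           # global graph feature of length 2
```

In the sketch the same normalized adjacency is reused by both layers, so the normalization is computed once per forward pass.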


"agri_eac_pan_model.py"
    Actor: Policy network generating actions based on input states.
    Critic: Value network estimating Q-values for state-action pairs.
    BatteryPredictionNetwork: Estimates energy consumption from input state and environment features.
    PointArrayFeatureExtractor: Extracts features from input environmental point array data.
    PAN_Agent:
        choose_action(state, explore=True): Selects an action given the state, optionally applying exploration noise.
        store_transition(state, action, reward, next_state, done): Stores experience tuples in the replay buffer, handling buffer size constraints.
        update(): Samples from the replay buffer and updates the Critic and Actor networks, including soft target network updates.
        save(i, path, eps): Persists model parameters (Actor, Critics) to disk with iteration and episode identifiers.
        load(i, path, eps): Loads model parameters from saved checkpoints; raises an error if they are missing.
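
The store_transition and update entries above mention a capacity-limited replay buffer and soft target updates. The following standard-library sketch shows one common way to realize both. It is an assumption, not the released implementation: the class name ReplayBuffer, the function soft_update, the rate tau, and the first-in-first-out eviction policy are hypothetical, and plain dictionaries of floats stand in for network parameters.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer (assumed policy: drop the oldest transition when full)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store_transition(self, state, action, reward, next_state, done):
        # deque(maxlen=...) discards the oldest entry automatically once full.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), batch_size)
        # Transpose a list of transitions into per-field lists.
        states, actions, rewards, next_states, dones = map(list, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }


# Usage: a buffer of capacity 3 keeps only the 3 most recent transitions.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.store_transition(state=t, action=0.0, reward=1.0, next_state=t + 1, done=False)
states, actions, rewards, next_states, dones = buf.sample(batch_size=2)

# Usage: the target parameters move a fraction tau toward the online parameters.
new_target = soft_update({"w": 0.0}, {"w": 1.0}, tau=0.1)
```

A small tau keeps the target networks slowly varying, which is the usual motivation for soft rather than hard target updates.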
        

"urban_eac_gnn_model.py"
    Actor: Policy network producing two types of actions from input states:
        Primary action (mean mu and standard deviation std): continuous actions modeled as a Gaussian distribution.
        Secondary action (softmax_out): categorical distribution over three discrete options via softmax.
    Critic_Pri: Value network estimating Q-values for state-primary action pairs.
    Critic_Sec: Value network estimating Q-values for state-secondary action pairs.
    normalize_adjacency_matrix(A): Adds self-loops and normalizes adjacency matrix for graph convolutional layers.
    GCNLayer: Single graph convolution layer performing adjacency normalization and learnable feature transformation.
    GNN: Two-layer graph neural network processing node features and adjacency, outputting global graph features by mean pooling.
    GNN_Agent:
        compute_gnn_loss(batch_state, batch_action): Computes a loss from the critic Q-values that updates the GNN by encouraging higher Q-values.
        update(): Conducts a training step by sampling batches, computing target Q-values, updating critics and actors, and applying GNN loss optimization.
        choose_action(state): Samples actions combining continuous primary actions and discrete secondary actions from the actor output.
        save(i, path, eps): Saves model weights (actor, critics, GNN) to checkpoint files.
        load(i, path, eps): Loads model weights from checkpoint files.
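
The Actor and choose_action entries above describe a hybrid action: a continuous primary part drawn from a Gaussian (mu, std) and a discrete secondary part drawn from a three-way categorical distribution (softmax_out). The NumPy sketch below illustrates that sampling step only. The free function choose_action with explicit mu, std, and softmax_out arguments, the softmax helper, and the example values are hypothetical; the actual method takes a state and obtains these quantities from the actor network.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def choose_action(mu, std, softmax_out, rng):
    """Sample one hybrid action (illustrative signature, not the agent's)."""
    primary = rng.normal(mu, std)                            # continuous part
    secondary = rng.choice(len(softmax_out), p=softmax_out)  # index of the discrete option
    return primary, int(secondary)


rng = np.random.default_rng(0)
mu = np.array([0.2, -0.1])               # assumed 2-D primary action for illustration
std = np.array([0.05, 0.05])
probs = softmax(np.array([1.0, 2.0, 0.5]))   # three discrete options
primary, secondary = choose_action(mu, std, probs, rng)
```

Sampling the two parts independently is itself an assumption here; the source only states that both are produced from the actor output.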


"urban_eac_pan_model.py"
    Actor: Policy network outputting two action types from the input state:
        Primary action (mean mu and standard deviation std): continuous actions modeled as a Gaussian distribution, dimension (action_dim - 3).
        Secondary action (softmax_out): categorical distribution over three discrete options via softmax.
    Critic_Pri: Value network estimating Q-values for state and primary action pairs (continuous).
    Critic_Sec: Value network estimating Q-values for state and secondary action pairs (discrete, one-hot encoded).
    PointArrayFeatureExtractor: Extracts features from environmental point array inputs.
    PAN_Agent:
        update(): Executes a training iteration by sampling from the replay buffer, computing target Q-values, optimizing the critics via MSE loss, and updating the actor networks to maximize expected Q-values.
        choose_action(state): Samples combined continuous and discrete actions from actor outputs.
        save(i, path, eps): Saves current model parameters (actor and critics) with iteration and episode labels.
        load(i, path, eps): Loads model parameters from saved checkpoints.
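
The Critic_Sec and update entries above refer to one-hot encoded secondary actions, target Q-values, and an MSE critic loss. A minimal NumPy sketch of these pieces follows. It assumes the standard one-step bootstrapped target y = r + gamma * (1 - done) * Q_target(s', a'); the source does not state the exact target, so this form, the discount gamma, and the helper names one_hot, td_target, and mse_loss are assumptions.

```python
import numpy as np


def one_hot(indices, num_classes=3):
    """Encode discrete secondary actions as one-hot rows (three options)."""
    return np.eye(num_classes)[np.asarray(indices)]


def td_target(reward, done, next_q, gamma=0.99):
    """Assumed target: y = r + gamma * (1 - done) * Q_target(s', a')."""
    return reward + gamma * (1.0 - done) * next_q


def mse_loss(q_pred, target):
    """Mean squared error between predicted and target Q-values."""
    return float(np.mean((q_pred - target) ** 2))


# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
next_q = np.array([2.0, 5.0])            # stand-in for target-critic outputs
y = td_target(reward, done, next_q, gamma=0.9)

q_pred = np.array([2.8, 1.0])            # stand-in for critic outputs
loss = mse_loss(q_pred, y)

secondary_one_hot = one_hot([0, 2])      # rows [1, 0, 0] and [0, 0, 1]
```

The (1 - done) mask removes the bootstrap term for terminal transitions, so the target of the second sample reduces to its reward.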