r"""
The Synthesis module is responsible for generating new materials based on existing ones.

The pipeline consists of 5 kinds of workers:
1. Startup Worker - Initializes the pipeline and creates the necessary directories and config files.
   Populates the initial augmentations.
2. Augmentation Workers - Select how to add materials next, and set up the files in workspace/to_process.
   There are 3 types of augmentation workers:
    - Parametric: Generates new materials based on a parametric generator.
    - Hybridization: Combines two or more materials into a new material.
    - Enumerator: Runs an enumerator to generate many potential new materials.
3. Material Processor - Runs the material generation code and validates the output.
4. Integration Worker - Integrates the generated materials into the dataset. This updates dataset statistics
    and queues up new work for the augmentation workers.
5. Cleanup Worker - Cleans up the workspace, ensures that dataset statistics are up-to-date, and syncs the
   generated data with S3.

The workers communicate by means of files on disk. This means they can be run locally (though an external process
runner would be needed to start them all up) for testing, but the intention is to run them in AWS Batch.

The job architecture is that the startup, cleanup, integration, and augmentation workers are single jobs, whereas the
material processing workers are run as array jobs for parallelization. Since processing takes orders of magnitude
longer than the other steps, we assume that these single processes can always keep the pipeline busy.

The dependency graph, based on the numbering above, is
     
   ┌─►2.─┐    
1.─┼─►3.─┼─►5.
   └─►4.─┘    

This way, none of the continuous workers start until the setup is complete, and the cleanup process only runs
once every other worker has stopped.
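
As an illustration only, the graph above can be written as a dependency mapping and resolved into launch waves
with the standard library's graphlib (Python 3.9+). The worker names and the launch_waves helper are hypothetical
and are not part of this module:

```python
from graphlib import TopologicalSorter

# Hypothetical worker names; each key maps to the set of workers it waits for.
DEPENDS_ON = {
    "startup": set(),
    "augmentation": {"startup"},
    "processor": {"startup"},
    "integration": {"startup"},
    "cleanup": {"augmentation", "processor", "integration"},
}


def launch_waves(depends_on):
    # Group workers into waves; every worker in a wave has all of its dependencies finished.
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Calling launch_waves(DEPENDS_ON) gives three waves: the startup worker alone, then the three continuous workers
together, then the cleanup worker.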

The Material Processors run in a loop that constantly checks for new materials in workspace/to_process.
They look for a file called workspace/signals/stop_processing. If this file exists, a worker stops after
its next loop in which no work is done. To avoid duplicating work, each worker creates a lockfile in the to_process
subdirectory of the material it is working on, and deletes the directory when it is done. This does not completely
remove the possibility of duplicate work, but we add a small random delay to the start of each worker to try
and avoid this.
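
A minimal sketch of the processor loop described above, not the actual implementation. The names try_claim and
run_processor, the lockfile name "lockfile", the process callback, and the polling and jitter parameters are all
assumptions made for illustration. The sketch claims a material by exclusive file creation, which is atomic on a
local POSIX filesystem but may not be on shared or network storage:

```python
import os
import random
import shutil
import time
from pathlib import Path


def try_claim(material_dir: Path) -> bool:
    # Try to create the lockfile exclusively; True means this worker owns the material.
    try:
        fd = os.open(material_dir / "lockfile", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another worker already holds the lock
    except FileNotFoundError:
        return False  # the directory was finished and deleted in the meantime
    os.close(fd)
    return True


def run_processor(workspace: Path, process, poll_seconds: float = 5.0, start_jitter: float = 2.0) -> int:
    # Assumes the startup worker has already created workspace/to_process.
    to_process = workspace / "to_process"
    stop_signal = workspace / "signals" / "stop_processing"
    time.sleep(random.uniform(0, start_jitter))  # small random delay to desynchronize workers
    processed = 0
    while True:
        stop_requested = stop_signal.exists()
        worked = False
        for material_dir in sorted(p for p in to_process.iterdir() if p.is_dir()):
            if not try_claim(material_dir):
                continue
            process(material_dir)  # hypothetical callback: generate and validate one material
            shutil.rmtree(material_dir)  # removing the directory also removes the lockfile
            worked = True
            processed += 1
        if stop_requested and not worked:
            return processed  # stop signal seen and this loop did no work
        if not worked:
            time.sleep(poll_seconds)
```

The stop signal is only honored after an idle loop, so work that is already queued is still drained before a
worker exits.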

The integration worker runs in a loop that is constantly checking for new materials in workspace/to_integrate.
It computes any updated statistics for these materials, copies the materials to the final materials/ location,
and at the end of each loop updates the overall dataset statistics in metadata/ and populates
workspace/signals/augmentations/ with new augmentation directives (hybridize, sample, or enumerate with the given
settings). It is ultimately responsible for deciding when to stop processing and set the stop_processing signal.

Augmentation workers look at workspace/signals/augmentation/{parametric,hybridize,enumerate}.jsonl, which are 
append-only files where each line lists which materials to hybridize, generators to sample, or enumerators to run,
and what parameters should be used for this purpose. These files have a corresponding xxx.lockfile version to avoid
race conditions while the integration worker is updating them. Augmentation workers keep track of how many
lines they have processed, and ingest all new lines as new work on each loop. They generate materials into
a temporary directory, then copy them to the workspace/to_process directory.
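
A minimal sketch of how an augmentation worker might pick up only the new lines on each loop. The function name
read_new_directives and its parameters are hypothetical, and the lockfile path is passed in explicitly because its
exact name is not assumed here:

```python
import json
from pathlib import Path


def read_new_directives(jsonl_path: Path, lock_path: Path, lines_done: int):
    # Return (new_directives, updated_line_count), skipping lines handled on earlier loops.
    if lock_path.exists() or not jsonl_path.exists():
        return [], lines_done  # file is being updated, or nothing has been queued yet
    with jsonl_path.open() as fh:
        lines = fh.read().splitlines()
    # The file is append-only, so everything before lines_done has already been ingested.
    new = [json.loads(line) for line in lines[lines_done:] if line.strip()]
    return new, len(lines)
```

A worker would call this once per loop for its own file, carry the returned line count into the next loop, and
simply retry later when the lockfile is present.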

"""