# Code Structure (DFA-NSP)

High-level layout of the code and where to find core functionality, with concrete pointers to the main scripts.

## automata/
Implements DFAs and the two learning algorithms, L* and L*-NSP. Shared logic for membership and equivalence queries, observation tables, and caching lives in `automata/lstar_utils.py`.  
**LRU cache:** membership-query results are stored in a bounded LRU cache, implemented in `class MQ` inside `automata/lstar_utils.py`.
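
To illustrate the idea of a bounded LRU cache in front of a membership oracle, here is a minimal sketch. It is not the repository's `MQ` class: the class name `CachedMembershipOracle`, its `query` method, the `capacity` parameter, and the toy oracle `even_number_of_as` are all hypothetical names chosen for this example.

```python
from collections import OrderedDict
from typing import Callable


class CachedMembershipOracle:
    """Hypothetical bounded-LRU wrapper around a membership oracle.

    Illustrative only; the real implementation is `class MQ` in
    automata/lstar_utils.py and may differ in every detail.
    """

    def __init__(self, oracle: Callable[[str], bool], capacity: int = 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._oracle = oracle
        self._capacity = capacity
        self._cache: "OrderedDict[str, bool]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, word: str) -> bool:
        if word in self._cache:
            # Mark the entry as most recently used.
            self._cache.move_to_end(word)
            self.hits += 1
            return self._cache[word]
        # Cache miss: ask the (possibly expensive) oracle and store the answer.
        self.misses += 1
        answer = self._oracle(word)
        self._cache[word] = answer
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return answer


def even_number_of_as(word: str) -> bool:
    """Toy oracle (assumption): accept words with an even number of 'a'."""
    return word.count("a") % 2 == 0
```

A learner would call `CachedMembershipOracle(even_number_of_as, capacity=2).query("aa")` instead of calling the oracle directly. Repeated queries are then answered from the cache, which matters when each oracle call runs a language model.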

## datagen/
Defines target languages and DFA specifications, plus utilities to sample data from them. These components produce the training/evaluation corpora for language models and the ground-truth DFAs used for comparison after extraction.
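
As a rough picture of what a DFA specification plus a sampler can look like, the sketch below defines a small DFA and draws labeled strings from it. The `DFA` dataclass, `sample_labeled`, the uniform-length sampling scheme, and the parity language are assumptions made for illustration, not the definitions used in `datagen/`.

```python
import random
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class DFA:
    """Hypothetical DFA specification with a total transition function."""
    alphabet: Tuple[str, ...]
    start: int
    accepting: FrozenSet[int]
    delta: Dict[Tuple[int, str], int]

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def sample_labeled(dfa: DFA, n: int, max_len: int,
                   rng: random.Random) -> List[Tuple[str, bool]]:
    """Draw n random words (length uniform in 0..max_len) with DFA labels."""
    samples = []
    for _ in range(n):
        length = rng.randint(0, max_len)
        word = "".join(rng.choice(dfa.alphabet) for _ in range(length))
        samples.append((word, dfa.accepts(word)))
    return samples


# Toy target language (assumption): even number of 'a' over {a, b}.
PARITY = DFA(
    alphabet=("a", "b"),
    start=0,
    accepting=frozenset({0}),
    delta={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
)
```

Calling `sample_labeled(PARITY, 5, 8, random.Random(0))` yields five `(word, label)` pairs. The same `DFA` object can later serve as ground truth when checking an extracted automaton.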

## model_src/
Implements the Transformer language models, sampling helpers, and the oracle wrappers that let a trained LM serve as a membership or generative oracle for the learners. The oracle wrappers, including NSP labeling, are in `model_src/oracles.py`.
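
One way to picture an oracle wrapper is a function that turns an LM's next-token distribution into a yes/no membership answer. The sketch below is an assumption about how such a wrapper could work, not a description of `model_src/oracles.py`: the `<eos>` token, the thresholding rule, `make_membership_oracle`, and the stand-in model `toy_next_dist` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical interface: maps a prefix to a next-token probability table.
NextDist = Callable[[str], Dict[str, float]]

EOS = "<eos>"  # assumed end-of-sequence token name


def make_membership_oracle(next_dist: NextDist,
                           threshold: float = 0.5) -> Callable[[str], bool]:
    """Treat a word as a member if the model's end-of-sequence
    probability after reading it is at least `threshold` (assumed rule)."""
    def member(word: str) -> bool:
        return next_dist(word).get(EOS, 0.0) >= threshold
    return member


def toy_next_dist(prefix: str) -> Dict[str, float]:
    """Stand-in for a trained LM over {a, b}: it only allows ending
    a string after an even number of 'a'."""
    if prefix.count("a") % 2 == 0:
        return {"a": 0.3, "b": 0.3, EOS: 0.4}
    return {"a": 0.5, "b": 0.5, EOS: 0.0}


member = make_membership_oracle(toy_next_dist, threshold=0.2)
```

Here `member("abab")` asks the stand-in model whether it would end the string after `abab`. With a real LM, `toy_next_dist` would be replaced by a forward pass, which is why caching membership answers is worthwhile.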

## Top-level scripts (DFA-NSP/)
Entry points for running the pipeline: generate data, train LMs, label NSP data, and run L* or L*-NSP. Main scripts: `train_lm.py` (train LM), `lstar_data_gen.py` / `nsp_data_gen.py` (label data via LM), `run_lstar_nsp.py` (learn DFA). See `README.md` for command examples.
