Below is a citation-backed, implementation-oriented dossier to support your GATv2 + NS-3 hybrid research program. Where arXiv IDs are available, I provide them, along with URLs.

1) Cisco Networks Dataset (existence, availability, licensing, schema; and alternatives; Cisco/related graph papers)
- Existence and hosting
  - UCI Machine Learning Repository: “Cisco Secure Workload Networks of Computing Hosts”
    - States: “This dataset contains 22 disjoint graphs … TCP and UDP … in different enterprises … Ground truth grouping information is provided for two of the graphs …” License: “Creative Commons Attribution 4.0 International (CC BY 4.0).”
    - URL: https://archive.ics.uci.edu/dataset/735/cisco+secure+workload+networks+of+computing+hosts
  - SNAP: “Cisco Networks”
    - Listed under Computer communication networks; 22 graphs; listed properties: Directed: Yes, Edge features: Yes, Temporal: Yes.
    - URL: https://snap.stanford.edu/data/cisco-networks.html
  - Introductory dataset paper: IWSPA ’22 (ACM)
    - Omid Madani, Sai Ankith Averineni, Shashidhar Gandham. “A Dataset of Networks of Computing Hosts.” IWSPA 2022. PDF: http://omadani.net/pubs/2022/iwspa2022_dataset_madani.pdf (paper describes 21 graphs; UCI/SNAP list 22).
- Availability and licensing
  - Available for download on UCI; license CC BY 4.0 (UCI page explicitly states this).
- Schema highlights (from host pages and paper)
  - 22 disjoint directed, temporal enterprise graphs of host communications (TCP/UDP). Edge features available; ground-truth “groupings” for two graphs (functional/role-based node grouping).
- Graphification alignment with your spec
  - Nodes: hosts/compute entities. Edges: directed communications (flow aggregation over consecutive hours across days). Your features (flow timestamps, inter-arrival times, node degree) align naturally with the dataset’s temporal and edge-attributed nature; node degree is derivable per graph snapshot. Labels: dataset provides groupings for two graphs; attack labels are not broadly provided—see substitutes below for labeled IDS tasks.
- Closest public alternatives for enterprise network IDS and how to build graph representations
  - LANL (Los Alamos National Laboratory) host and network data sets
    - arXiv:1708.07518, “Unified Host and Network Data Set” (Turcotte, Kent, and Hash, 2017) https://arxiv.org/abs/1708.07518
    - Graph construction: nodes=hosts, optionally users/processes; edges=NetFlow communications or authentication events; labels: red-team ground truth is provided with the earlier LANL 2015 “Comprehensive, Multi-Source Cyber-Security Events” release rather than with the Unified data set; tasks: anomaly/attack detection. (Use event time windows to create temporal snapshots; edge attributes: counts/bytes/ports, inter-arrival stats.)
  - CIC-IDS 2017 / 2018 (widely used flow datasets; many arXiv analyses reference them)
    - Examples of arXiv papers that use these datasets (for performance baselines): e.g., arXiv:2109.07593; arXiv:2310.00070. Graph construction: nodes=IP addresses or (IP, port) endpoints; edges=bidirectional NetFlows aggregated in windows; labels from dataset attack types (binary or multiclass). Features: per-edge time, IAT, bytes, pkts, flags.
  - UNSW-NB15
    - Referenced in surveys (e.g., arXiv:1903.02460 “A Survey of Network-based Intrusion Detection Data Sets”). Similar graph construction as CIC: IP nodes; flows as edges; labels from dataset.
  - UGR’16 (large ISP flows)
    - Referenced in surveys (arXiv:1903.02460). Build IP-level graphs with flow edges; label by attack events (DoS, scan) provided in the dataset docs.
  - CTU-13 (botnet)
    - Used in several arXiv papers (e.g., arXiv:2004.00234; arXiv:2107.02896). Graph: nodes=hosts; edges=flows grouped by windows; label edges/hosts using botnet IP lists and scenario labeling (e.g., edges between bot/victim labeled attack).
  - MAWILab/MAWI
    - Recent flow benchmark on MAWI: arXiv:2506.17041, “MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection.” (Presents flow-based evaluation; can be graphified by aggregating flows between IPs/subnets.) https://arxiv.org/abs/2506.17041
- Cisco/related sources modeling enterprise networks as graphs
  - The Cisco dataset paper above (IWSPA ’22) is Cisco-authored.
  - Cisco whitepaper on graph-based email threat analysis (relationship graphs): “Using Relationship Graphs to Mitigate Email-based Threats” (Cisco) URL: https://www.cisco.com/c/en/us/products/collateral/security/cloud-email-security/graph-mitigate-email-based-threats-wp.html
  - Additional enterprise graph IDS work (not Cisco-specific but directly relevant):
    - arXiv:2503.14284, “Entente: Cross-silo Intrusion Detection on Network Log Graphs with Federated Learning.” https://arxiv.org/abs/2503.14284
    - arXiv:2411.10279, “Lateral Movement Detection via Time-aware Subgraph …” (time-aware subgraph methods on enterprise login graphs). https://arxiv.org/abs/2411.10279
    - arXiv:2504.13527, “Designing a Reliable Lateral Movement Detector Using a Graph …” (enterprise lateral movement). https://arxiv.org/abs/2504.13527

Graph construction recipes (for substitutes)
- Nodes: hosts (IP addresses) or devices; optionally users/processes for LANL. 
- Edges: communications (bidirectional flows summarized over a time window), with edge attributes: total bytes/packets, mean/std inter-arrival times, ports/protocols, start/end timestamps. 
- Node features: degree, in/out degree, rolling temporal stats (IAT mean/variance), counts by protocol/port. 
- Labels: 
  - Binary anomaly/attack labels from dataset (CIC, UNSW, UGR’16, CTU-13); per-flow labels can be mapped to edge labels; aggregate to node labels via max/any attack rule within window. 
  - LANL: use red-team windows and event-type labels to tag anomalous edges/subgraphs. 
- Train/val/test splits to mirror Cisco’s 14/4/4 graphs: reserve entire graphs (enterprises or days) as split units to prevent leakage.
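
The recipe above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the flow-record keys (src, dst, ts, bytes, pkts, label) and the function name build_window_graph are hypothetical, edges are kept directed, and node labels use the any-attack rule within the window.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_window_graph(flows, window_start, window_len):
    """Aggregate flow records into one directed graph snapshot.

    Each flow is a dict with hypothetical keys:
    src, dst, ts (seconds), bytes, pkts, label (0 benign / 1 attack).
    """
    window_end = window_start + window_len
    per_edge = defaultdict(list)
    for f in flows:
        if window_start <= f["ts"] < window_end:
            per_edge[(f["src"], f["dst"])].append(f)

    edges = {}
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    node_label = defaultdict(int)
    for (src, dst), group in per_edge.items():
        times = sorted(f["ts"] for f in group)
        iats = [b - a for a, b in zip(times, times[1:])]
        label = max(f["label"] for f in group)  # any-attack rule per edge
        edges[(src, dst)] = {
            "bytes": sum(f["bytes"] for f in group),
            "pkts": sum(f["pkts"] for f in group),
            "first_ts": times[0],
            "last_ts": times[-1],
            "iat_mean": mean(iats) if iats else 0.0,
            "iat_std": pstdev(iats) if iats else 0.0,
            "label": label,
        }
        out_deg[src] += 1
        in_deg[dst] += 1
        # A node is labeled attack if any incident edge is attack.
        node_label[src] = max(node_label[src], label)
        node_label[dst] = max(node_label[dst], label)

    nodes = {
        n: {"in_deg": in_deg[n], "out_deg": out_deg[n], "label": node_label[n]}
        for n in set(in_deg) | set(out_deg)
    }
    return nodes, edges


flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ts": 0.0, "bytes": 100, "pkts": 2, "label": 0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ts": 4.0, "bytes": 300, "pkts": 3, "label": 0},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "ts": 5.0, "bytes": 50, "pkts": 1, "label": 1},
    {"src": "10.0.0.1", "dst": "10.0.0.3", "ts": 70.0, "bytes": 10, "pkts": 1, "label": 0},
]
nodes, edges = build_window_graph(flows, window_start=0.0, window_len=60.0)
```

Sliding window_start across the capture yields a sequence of snapshots; keeping all snapshots of one enterprise (or day) in the same split is what prevents leakage.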

2) Model architectures and baselines
- GATv2
  - How Attentive are Graph Attention Networks? (ICLR 2022)
    - arXiv:2105.14491 https://arxiv.org/abs/2105.14491
  - Hyperparameters (extracted from the paper)
    - Number of heads: 1 or 8 (multi-head improves stability/expressivity).
    - Hidden dim: {64, 128, 256}; common node-level settings: 128–256.
    - Layers: typically 2–3 for node classification; deeper (e.g., 6) for certain tasks.
    - Dropout: 0.4, 0.6, or 0.8 explored.
    - Learning rate: 5e-4, 1e-3, 5e-3, 1e-2 (dataset-dependent).
    - Notes: GATv2 outperforms GAT across OGB and other benchmarks; dynamic attention more expressive than GAT’s static attention (paper’s contribution).
  - Benchmarks specific to security/traffic: the paper evaluates generic graph benchmarks (OGB); not security-specific. Use your dataset and baselines for security evaluation.
- GraphSAGE (Hamilton et al., 2017)
  - Inductive Representation Learning on Large Graphs
    - arXiv:1706.02216 https://arxiv.org/abs/1706.02216
  - PyG configurations and pointers
    - PyG SAGEConv API: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.SAGEConv.html
    - GraphSAGE model wrapper: https://pytorch-geometric.readthedocs.io/en/stable/generated/torch_geometric.nn.models.GraphSAGE.html
    - Typical settings (node classification): 2–3 layers; hidden dim 64–256; mean aggregator; dropout 0.2–0.5; LR 1e-3 to 1e-2; BatchNorm between layers. SAGEConv itself takes only node features and edge_index, so edge information (edge_attr or edge_weight) has to be folded in by preprocessing (e.g., aggregating edge attributes into node features) or by edge MLPs feeding into a message-passing layer that accepts edge features.
    - Heterogeneous temporal features: normalize timestamps (e.g., z-score), derive inter-arrival time features per edge window; bucket/cycle encodings for diurnal patterns; concatenate to node/edge feature tensors. (General practical guidance; no dataset-specific constraints.)
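
The temporal feature guidance above (z-scoring, inter-arrival statistics, cyclic encodings for diurnal patterns) could look roughly like this. The helper names and the 24-hour period are assumptions for illustration; timestamps are taken to be seconds.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of values; constant inputs map to zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def diurnal_encoding(ts_seconds, period=86400.0):
    """Encode time of day as (sin, cos) so 23:59 and 00:00 stay close."""
    phase = 2.0 * math.pi * (ts_seconds % period) / period
    return math.sin(phase), math.cos(phase)


def edge_time_features(timestamps):
    """Return [iat_mean, iat_std, sin_t, cos_t] for one edge in one window."""
    ts = sorted(timestamps)
    iats = [b - a for a, b in zip(ts, ts[1:])]
    sin_t, cos_t = diurnal_encoding(ts[0])
    return [
        mean(iats) if iats else 0.0,
        pstdev(iats) if iats else 0.0,
        sin_t,
        cos_t,
    ]


features = edge_time_features([10.0, 0.0, 30.0])
```

The resulting vectors can be concatenated to whatever node or edge feature tensors the chosen model consumes.
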
- GIN (Xu et al., 2018/2019)
  - How Powerful are Graph Neural Networks?
    - arXiv:1810.00826 https://arxiv.org/abs/1810.00826
  - PyG GIN model wrapper: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.GIN.html
  - Typical settings (graph/node classification): sum neighborhood aggregation followed by an MLP update, with sum pooling as the graph-level readout; 3–5 layers; hidden dim 64–256; ReLU+BatchNorm; dropout 0.2–0.5; LR ~ 1e-3–1e-2.
- Attention-based explanation heads for graphs (interpretable attention)
  - “Exploring Explainability Methods for Graph Neural Networks” arXiv:2211.01770 https://arxiv.org/abs/2211.01770
  - “Evaluating Link Prediction Explanations for Graph Neural Networks” arXiv:2308.01682 https://arxiv.org/abs/2308.01682
  - GATv2 itself (arXiv:2105.14491) motivates more faithful, dynamic attention; attention weights can be surfaced as explainable signals, but faithfulness should be validated against explainer metrics (below).
  - Recent explainability surveys (below) provide broader context on attention-based interpretability.
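
To make concrete what "surfacing attention weights" means, here is a dependency-free sketch of the GATv2 scoring rule for a single head, e_ij = a^T LeakyReLU(W_l h_i + W_r h_j), followed by a softmax over the neighbors of node i. The weights and features are toy values chosen for illustration, and the function names are hypothetical; a real model would use a library layer and learned parameters.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return [v if v > 0 else slope * v for v in x]


def gatv2_attention(h_i, neighbors, W_l, W_r, a):
    """Return attention weights of node i over its neighbors.

    neighbors maps a neighbor id to its feature vector h_j.
    """
    left = matvec(W_l, h_i)
    scores = {}
    for j, h_j in neighbors.items():
        right = matvec(W_r, h_j)
        hidden = leaky_relu([l + r for l, r in zip(left, right)])
        scores[j] = sum(ai * hi for ai, hi in zip(a, hidden))
    # Numerically stable softmax over the neighborhood.
    m = max(scores.values())
    exp = {j: math.exp(s - m) for j, s in scores.items()}
    z = sum(exp.values())
    return {j: e / z for j, e in exp.items()}


identity = [[1.0, 0.0], [0.0, 1.0]]
alpha = gatv2_attention(
    h_i=[0.0, 0.0],
    neighbors={"a": [1.0, 0.0], "b": [3.0, 0.0]},
    W_l=identity,
    W_r=identity,
    a=[1.0, 1.0],
)
ranked = sorted(alpha, key=alpha.get, reverse=True)
```

Ranking neighbors by alpha gives a candidate explanation; whether that ranking is faithful still has to be checked with the explainer metrics in section 4.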

3) NS-3 integration with ML (ns3-gym, ns3-ai), mapping features, counterfactuals
- ns3-gym (paper and usage)
  - “ns3-gym: Extending OpenAI Gym for Networking Research.” arXiv:1810.03943 https://arxiv.org/abs/1810.03943
  - Summary (from paper): integrates ns-3 with OpenAI Gym for RL; ns-3 simulations can interact with Python agents; provides templates/environments; emphasizes reproducible RL experiments in networking.
  - GitHub: https://github.com/tkn-tub/ns3-gym
- ns3-ai (add-on module for ns-3)
  - Repo: https://github.com/hust-diangroup/ns3-ai (installed under contrib/ rather than shipped in the ns-3 mainline tree). Use for in-the-loop learning or inference without standalone Gym environments.
- NS-3 manual pages for mapping feature engineering to simulation components
  - Traffic generators and monitoring:
    - OnOffApplication: https://www.nsnam.org/docs/models/html/applications.html#onoffapplication
      - Map flow timestamps/inter-arrival times to OnTime/OffTime distributions and DataRate/PacketSize params.
    - BulkSendApplication: https://www.nsnam.org/docs/models/html/applications.html#bulksendapplication
      - Map large-volume transfers or bursty flows (size-limited or unlimited).
    - FlowMonitor: https://www.nsnam.org/docs/models/html/flow-monitor.html
      - Collect per-flow stats (delay, jitter, throughput, drop).
  - Topology and link helpers:
    - PointToPointHelper: https://www.nsnam.org/docs/models/html/point-to-point.html
    - CsmaHelper: https://www.nsnam.org/docs/models/html/csma.html
    - Wifi: https://www.nsnam.org/docs/models/html/wifi.html
  - Routing:
    - Static routing: https://www.nsnam.org/docs/models/html/internet-stack.html#static-routing
    - OLSR (example dynamic routing): https://www.nsnam.org/docs/models/html/olsr.html
  - QoS policies and queueing:
    - Traffic Control layer (QueueDisc): https://www.nsnam.org/docs/models/html/traffic-control.html
    - CoDel: https://www.nsnam.org/docs/models/html/codel.html
    - FQ-CoDel: https://www.nsnam.org/docs/models/html/fq-codel.html
- Mapping engineered features to NS-3 parameters
  - Flow timestamps/inter-arrival times → OnOffApplication OnTime/OffTime, DataRate; BulkSend for bulk flows. Use FlowMonitor to extract per-flow delay and loss statistics for feedback (FlowMonitor reports one-way delay, not RTT).
  - Topology constraints → choose PointToPoint, CSMA, or WiFi helpers to mirror enterprise segments; set per-link bandwidth/delay/queue.
  - Routing tables → configure Ipv4StaticRouting; for “OSPF-like” shortest-path behavior in ns-3, use global routing (Ipv4GlobalRoutingHelper) or FRR extensions where applicable, or OLSR as a stand-in for dynamic behavior in controlled experiments.
  - QoS policies → configure QueueDiscs (RED/CoDel/FQ-CoDel) on NetDevices via Traffic Control; attributes can be toggled to emulate policy changes.
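
As an illustration of the first mapping above, the sketch below turns a packet trace into OnOffApplication-style attribute values. The burst-splitting rule (a gap longer than gap_threshold starts a new burst), the choice of exponential on/off distributions, and the function name are all assumptions; the output strings follow the usual ns-3 attribute syntax but should be checked against the ns-3 version in use.

```python
from statistics import mean


def onoff_params(packet_times, packet_sizes, gap_threshold):
    """Derive OnOffApplication-style attributes from one flow's packets.

    packet_times must be sorted (seconds); packet_sizes are in bytes.
    Single-packet bursts count as zero-length on periods.
    """
    bursts, gaps = [], []
    start = prev = packet_times[0]
    for t in packet_times[1:]:
        if t - prev > gap_threshold:
            bursts.append(prev - start)  # close the current burst
            gaps.append(t - prev)        # idle time before the next burst
            start = t
        prev = t
    bursts.append(prev - start)

    on_total = sum(bursts)
    # DataRate is the sending rate while the application is in the On state.
    rate_bps = int(8 * sum(packet_sizes) / on_total) if on_total > 0 else 0
    mean_off = mean(gaps) if gaps else 0.0
    return {
        "OnTime": f"ns3::ExponentialRandomVariable[Mean={mean(bursts):.6f}]",
        "OffTime": f"ns3::ExponentialRandomVariable[Mean={mean_off:.6f}]",
        "DataRate": f"{rate_bps}bps",
        "PacketSize": round(mean(packet_sizes)),
    }


params = onoff_params(
    packet_times=[0.0, 0.25, 0.5, 2.0, 2.25, 2.5],
    packet_sizes=[500] * 6,
    gap_threshold=1.0,
)
```

The returned dictionary is meant to be passed to the simulation script (for example as command-line arguments) rather than used inside ns-3 directly.
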
- Counterfactual scenario generation (what-if simulations)
  - Link failures: bring an interface down at time t (e.g., schedule Ipv4::SetDown on the interface index, as in the ns-3 dynamic-global-routing example), then let routing reconverge.
  - Routing changes: trigger recomputation (e.g., Ipv4GlobalRoutingHelper::RecomputeRoutingTables) or modify static routes mid-simulation where supported.
  - QoS policy toggles: switch QueueDisc types/parameters (e.g., enable FQ-CoDel, change RED thresholds) per the Traffic Control docs above; the FQ-CoDel docs mention an FqCoDel-L4S example.
  - References:
    - ns-3 docs cited above; useful search terms: “SetLinkDown example”, “recompute routing tables”, “FQ-CoDel example.”
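
Before launching full simulations, a link-failure what-if can be screened offline. The sketch below is a stand-in, not ns-3 code: it runs Dijkstra over a hypothetical link map with per-link delays, removes one link, and compares the baseline and counterfactual paths. Actual rerouting behavior would still come from the simulator.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over an undirected link map {(u, v): delay_ms}."""
    adj = {}
    for (u, v), w in links.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def link_failure_whatif(links, failed_link, src, dst):
    """Compare the path before and after removing one link."""
    remaining = {
        k: w for k, w in links.items()
        if k != failed_link and k != failed_link[::-1]
    }
    return {
        "baseline": shortest_path(links, src, dst),
        "counterfactual": shortest_path(remaining, src, dst),
    }


links = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5}
result = link_failure_whatif(links, failed_link=("B", "C"), src="A", dst="C")
```

Scenarios whose counterfactual path differs from the baseline (or disappears) are the ones most likely to be worth a full simulation run.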

4) Explanation fidelity metrics for GNNs (definitions, metrics, runtime)
- GNNExplainer
  - arXiv:1903.03894, “GNNExplainer: Generating Explanations for Graph Neural Networks.” https://arxiv.org/abs/1903.03894
  - Fidelity/quality definitions (extracted from paper):
    - Mutual Information (MI) between model prediction Y and explanatory subgraph/features (G_S, X_S): maximize MI ⇒ equivalently minimize conditional entropy H(Y|G_S, X_S).
    - Sparsity/size constraints: limit subgraph size |G_S| ≤ K_M and feature count K_F; entropy regularization encourages discrete/sparse masks.
    - Evaluation via alignment with ground-truth explanatory structures (treat as binary edge-label classification; measure accuracy/AUC over importance scores vs. true important edges on synthetic datasets).
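
Written out, the mutual-information objective described above takes the following form, using the same symbols (Y, G_S, X_S, K_M). Because H(Y) is fixed for a trained model, maximizing MI is the same as minimizing the conditional entropy term.

```latex
\max_{G_S}\; MI\bigl(Y, (G_S, X_S)\bigr)
  = H(Y) - H\bigl(Y \mid G = G_S,\; X = X_S\bigr),
\qquad \text{subject to } |G_S| \le K_M
```
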
- PGExplainer
  - arXiv:2011.04573, “Parameterized Explainer for Graph Neural Network.” https://arxiv.org/abs/2011.04573
  - Fidelity and efficiency (extracted from paper):
    - Also MI/conditional entropy framing; uses l1 and entropy regularization on latent edge variables to promote sparsity/discreteness.
    - Runtime: O(|E|) per-instance; parameters shared across edges and transferable to new graphs. Reported up to 108× faster than GNNExplainer in their experiments.
- Additional surveys/benchmarks (2021–2025)
  - arXiv:2203.09258, “Explainability in Graph Neural Networks: An Experimental Survey.”
  - arXiv:2012.15445, “Explainability in Graph Neural Networks: A Taxonomic Survey.”
  - arXiv:2210.15304, “Explaining the Explainers in Graph Neural Networks: a Comparative Study.”
  - arXiv:2204.08570, “A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability.”
  - arXiv:2411.02540, “GraphXAIN: Narratives to Explain Graph Neural Networks.”
- Metrics to report (definitions)
  - Fidelity: MI-based or conditional entropy reduction; in practice often reported as the prediction change when the explanation is removed (fidelity+) or when only the explanation is kept.
  - Deletion/Insertion: prediction score change when removing/adding the top-k edges/features.
  - Sparsity: size of the explanation mask.
  - Ground-truth edge/subgraph accuracy: on synthetic/annotated datasets.
  - The surveys above standardize these metrics. For our ablation (“remove simulation feedback”), use fidelity+ or deletion/insertion to quantify how much the explanations' alignment with model behavior degrades without simulation feedback.
- Typical runtimes
  - Paper-level results confirm PGExplainer’s significant speedup (up to 108× vs GNNExplainer). Absolute millisecond values are dataset/implementation dependent; use our own measurements to enforce the <100 ms explanation-time target, and prefer PGExplainer-style parameter sharing or our lightweight attention-head explanations.
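
A minimal sketch of the deletion-style metrics, assuming a hypothetical predict(edge_set) callable that returns the probability of the originally predicted class. The toy model and edge names are made up for illustration; sparsity is computed here as the fraction of edges left out of the explanation, which is one common convention.

```python
def top_k_edges(importance, k):
    """Pick the k edges with the highest importance scores."""
    return set(sorted(importance, key=importance.get, reverse=True)[:k])


def explanation_metrics(predict, edges, importance, k):
    """Compute fidelity+, fidelity- and sparsity for a top-k explanation."""
    chosen = top_k_edges(importance, k)
    full = predict(set(edges))
    without_expl = predict(set(edges) - chosen)  # delete the explanation
    only_expl = predict(chosen)                  # keep only the explanation
    return {
        "fidelity_plus": full - without_expl,
        "fidelity_minus": full - only_expl,
        "sparsity": 1.0 - len(chosen) / len(edges),
    }


# Toy model: the score rises with each "critical" edge that is present.
CRITICAL = {("h1", "h2"), ("h2", "h3")}


def toy_predict(edge_set):
    return 0.25 + 0.25 * len(CRITICAL & edge_set)


importance = {
    ("h1", "h2"): 0.9,
    ("h2", "h3"): 0.8,
    ("h1", "h4"): 0.1,
    ("h3", "h4"): 0.05,
}
metrics = explanation_metrics(toy_predict, list(importance), importance, k=2)
```

A high fidelity+ together with a fidelity- near zero indicates that the selected edges carry most of the prediction; running the same computation with and without simulation feedback gives the ablation comparison.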

5) Adversarial robustness on graphs (threat models and protocols)
- Foundational attack methods
  - Nettack: arXiv:1805.07984, “Adversarial Attacks on Neural Networks for Graph Data.” https://arxiv.org/abs/1805.07984
  - Metattack (a.k.a. Meta attack): arXiv:1902.08412, “Adversarial Attacks on Graph Neural Networks via Meta Learning.” https://arxiv.org/abs/1902.08412
  - RL-S2V: arXiv:1806.02371, “Adversarial Attack on Graph Structured Data.” https://arxiv.org/abs/1806.02371
  - Newer method: arXiv:2202.12993, “Projective Ranking-based GNN Evasion Attacks,” IEEE TKDE. https://arxiv.org/abs/2202.12993
- Adapting to enterprise network graphs
  - Malicious node injection: add nodes with edges to targeted enterprise hosts following degree/port distributions; constrain budgets to preserve degree distributions; features mimic benign traffic statistics while probing model decision boundaries (use above methods’ heuristics).
  - Edge perturbations: add/remove communication edges (flows) within budgets; maintain flow totals to remain stealthy.
  - Evaluation protocol: measure FPR (false-positive rate) under attack at fixed TPR or recall; report robustness curves vs. perturbation budget; include per-graph results. Trustworthy GNN survey (arXiv:2204.08570) provides broader robustness/evaluation context.
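
The evaluation protocol above can be sketched as follows. This version re-tunes the threshold at each perturbation budget so that the target TPR is met; fixing the threshold on clean data is an equally reasonable variant. The scores are toy values, both classes are assumed to be present, and the function names are hypothetical.

```python
def fpr_at_tpr(scores, labels, target_tpr):
    """Lowest FPR among thresholds whose TPR is at least target_tpr."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Scan thresholds from strictest to loosest; FPR only grows as the
    # threshold drops, so the first hit is the lowest achievable FPR.
    for thr in sorted(set(scores), reverse=True):
        tpr = sum(s >= thr for s in pos) / len(pos)
        if tpr >= target_tpr:
            return sum(s >= thr for s in neg) / len(neg)
    return 1.0


def robustness_curve(scores_by_budget, labels, target_tpr):
    """Map each perturbation budget to FPR at the fixed TPR."""
    return {
        budget: fpr_at_tpr(scores, labels, target_tpr)
        for budget, scores in sorted(scores_by_budget.items())
    }


labels = [1, 1, 0, 0, 0, 0]
scores_by_budget = {
    0: [0.9, 0.8, 0.1, 0.2, 0.3, 0.4],   # clean graph
    5: [0.9, 0.35, 0.1, 0.2, 0.3, 0.4],  # attack hides one malicious host
}
curve = robustness_curve(scores_by_budget, labels, target_tpr=1.0)
```

Reporting one such curve per graph, alongside the aggregate, keeps per-enterprise differences visible.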

6) Scalability and latency measurement (100–10k nodes)
- Best practices (PyG/DGL docs)
  - PyG speed and sparse tensor notes:
    - PyG speed guide: https://pytorch-geometric.readthedocs.io/en/latest/notes/speed.html
    - PyG SparseTensor: https://pytorch-geometric.readthedocs.io/en/latest/notes/sparse_tensor.html
  - DGL efficiency guide: https://docs.dgl.ai/en/latest/guide/basics-efficient.html
- Measurement protocol
  - Report hardware (CPU model, RAM, GPU model/VRAM, CUDA/cuDNN versions).
  - Use warm-up iterations; synchronize GPU timers (torch.cuda.synchronize); average over ≥50 runs per graph size; report mean±std and 95% CI.
  - Use SparseTensor adjacency, fused message passing (where available), neighbor sampling for large graphs, and mini-batching for multi-graph inference.
  - Explanation timing: time the explainer call only (not data loading); cache intermediate embeddings if your explanation head permits; aim for <100 ms at target sizes using PGExplainer-style or lightweight attention-head explainers.
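
A small timing harness in the spirit of the protocol above. The sync hook is where a GPU barrier such as torch.cuda.synchronize would be passed in; it is left as an optional callable so the sketch runs without a GPU. The confidence interval uses a normal approximation, which is an assumption that becomes reasonable at 50 or more runs.

```python
import math
import time
from statistics import mean, stdev


def measure_latency(fn, warmup=10, runs=50, sync=None):
    """Time fn() in milliseconds; sync() runs before each clock read."""
    def now():
        if sync is not None:
            sync()
        return time.perf_counter()

    for _ in range(warmup):
        fn()  # warm caches, JIT, and allocator before measuring

    samples = []
    for _ in range(runs):
        t0 = now()
        fn()
        t1 = now()
        samples.append((t1 - t0) * 1000.0)

    mu, sd = mean(samples), stdev(samples)
    half = 1.96 * sd / math.sqrt(len(samples))  # normal-approximation 95% CI
    return {
        "mean_ms": mu,
        "std_ms": sd,
        "ci95_ms": (mu - half, mu + half),
        "n": len(samples),
    }


stats = measure_latency(lambda: sum(range(1000)), warmup=2, runs=50)
```

Passing the explainer call alone as fn (with data already loaded) matches the "time the explainer call only" rule and makes the <100 ms target directly checkable against mean_ms and the interval's upper bound.
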
- Typical latencies in literature
  - Not uniformly