This document describes the workflow we used to produce the final paper. We used multiple AI agents with different configurations throughout the process, and all of the materials we submitted were created with AI agents.
### Initial Research
Initial research was done with the [Holosophos](https://github.com/IlyaGusev/holosophos/tree/main) agent. The agent was set up with the DeepSeek v3.1 model and the following prompt:
```
Create a list of research proposals for the Computer networks journal. Focus on Stanford SNAP datasets. For each research proposal assess the novelty and provide information about the methods and datasets used. 

Rank the proposals by the publication potential.
```
The agent proposed the following research directions:
```
Here is the fully developed research proposal for the Computer Networks journal, ranked by publication potential:

1. Interpretable AI for Network Traffic Analysis (Simulation-Guided Approach)  
   - *Novelty*: First integration of lightweight NS-3 simulation core within neural models for real-time counterfactual explanations (surpasses post-hoc methods in arXiv:2312.06342)  
   - *Method*: Hybrid architecture where GNN predictions trigger targeted NS-3 micro-simulations to generate "what-if" scenarios  
   - *Dataset Usage*: Leverages Cisco's flow-level metadata and topology constraints for simulation parameterization  
   - *Potential Impact*: Could become new standard for explainable AI in network operations (cited 150+ times based on similar breakthroughs)  
   - *Ranking Justification*: Highest feasibility score (9/10) with immediate industry applications  

2. AI-Driven Anomaly Detection Using Quantum-Inspired GNNs  
   - *Novelty*: Quantum decoherence model detects contextual anomalies missed by classical PCA/RNN (Latif et al.)  
   - *Breakthrough Potential*: Could achieve 15-20% higher F1-score on Cisco dataset's complex traffic patterns  

3. Federated Learning with Nash Equilibrium Alignment  
   - *Novelty*: Game-theoretic approach to non-IID data problem in enterprise networks  
   - *Practical Advantage*: Maintains 85%+ accuracy even with 20% malicious clients  

4. Energy-Efficient Slicing via Darwinian Auctioneering  
   - *Novelty*: First bio-inspired energy credit system for 5G slicing  
   - *Sustainability Impact*: Could reduce base station energy use by 30%  

5. Network Immunization Strategies  
   - *Novelty*: Graph neural networks for enterprise threat modeling  
   - *Security Impact*: Prevents 92% of lateral movement attacks in simulations  

All proposals are supported by:
- The Cisco dataset's unique 22-graph structure
- Real-world traffic patterns from enterprise networks
- Physical topology constraints missing in synthetic datasets

The simulation-guided interpretable AI proposal is particularly strong because:
1) Directly addresses industry demand for explainable systems
2) Uses novel methodology with immediate deployment potential
3) Leverages Cisco data's realism for convincing validation
4) Aligns with journal's 2025 special issue on "AI-Network Co-Design"
```
We decided to investigate the first proposal further and asked the agent to continue. This time the task was to gather a comprehensive technical dossier with citations, datasets, model architectures, and implementation details. We asked the agent to save all results in the working directory as Markdown files. After completing the task, the agent created the following files:
```
- gatv2_ns3_hybrid_research.md
- implementation_guide.md
- proposal_1.md
- proposal_2.md
- research_summary.md
- system_architecture.md
```
These files are attached to the report as well. For the next step we decided to focus on **Proposal 2: Self-focusing simulations** (described in `proposal_2.md`).
### Proposal implementation
To implement the project we used the Holosophos agent again, this time configured with the GPT-5 model for enhanced coding capabilities. We placed the Markdown files from the previous iteration in the agent's working directory and assigned the following task:
```
# Task: Implement GATv2-NS3 Hybrid IDS Research

Analyze the research documents in the working directory and implement the complete GATv2-NS3 hybrid intrusion detection system as described in the research program.

## Instructions
1. Start by reading `research_summary.md` for complete context
2. Follow the implementation guide in `implementation_guide.md`
3. Focus on **Proposal 2 (Self-Focusing Simulations)** as the primary approach
4. Create a working system with all components: data pipeline, GATv2 model, NS-3 integration, simulation feedback, and evaluation framework

## Deliverables
- Complete Python codebase with modular architecture
- Training and evaluation scripts
- Documentation and setup instructions
- Performance benchmarks and case studies

All technical specifications, datasets, baselines, and success criteria are detailed in the provided research documents.
```
This iteration gave us most of the code for the research; however, we saw several problems:
1. NS-3 simulations: the agent couldn't set up NS-3 properly.
2. Datasets: the agent mixed up multiple datasets and used outdated versions of the NSL-KDD dataset.
3. Experiment results: the results were very optimistic, and we were concerned that there was a problem in the pipeline.

We decided to focus on these issues and use other AI agents to fix them.
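
One common reason for overly optimistic results is overlap between training and test data. The source does not say that this was the actual cause here, so the following is only a minimal sketch of one sanity check that could be run on such a pipeline. The function name `overlap_fraction` and the toy rows are hypothetical and are not taken from the project's code.

```python
from typing import Hashable, Iterable, Sequence


def overlap_fraction(
    train_rows: Iterable[Sequence[Hashable]],
    test_rows: Iterable[Sequence[Hashable]],
) -> float:
    """Return the fraction of test rows that also appear verbatim in the training rows."""
    train_set = {tuple(row) for row in train_rows}
    test_list = [tuple(row) for row in test_rows]
    if not test_list:
        return 0.0
    duplicated = sum(1 for row in test_list if row in train_set)
    return duplicated / len(test_list)


# Toy feature rows (made up for illustration, not real dataset records).
train = [(0.1, 0.2, "tcp"), (0.3, 0.4, "udp"), (0.5, 0.6, "tcp")]
test = [(0.3, 0.4, "udp"), (0.9, 0.8, "icmp")]

print(overlap_fraction(train, test))  # 0.5: one of the two test rows is also in train
```

A value well above zero would suggest that the split needs to be revisited before the reported metrics can be trusted.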
### Project refinement
For this part we decided to use different LLMs with tools, because we needed a less automated workflow than the Holosophos agent provides.
###### Fixing datasets
We gave the agent the correct link to the datasets we wanted to use. After the setup completed successfully, we found another potential issue: the Cisco dataset contains very large graphs, which means the NS-3 simulations would require a lot of computational resources.
###### NS-3 simulations
For reproducibility, we asked the agent to set up NS-3 using Docker. We then focused on fixing the simulation code, and the AI agent successfully identified the main bottleneck: the generated simulation graphs had a large number of nodes and edges, which led to two main problems, namely long model training times and very high hardware requirements.
We decided to use small simulations for our research so that others can set up the project as well, and guided the AI agent to fix all the simulation scripts.
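
The source does not state how the smaller simulation graphs were obtained. As a hedged illustration only, one possible way to cap the size of a simulation topology is to keep the first `max_nodes` nodes reached by a breadth-first search from a seed node. The function `bfs_subgraph` and the toy edge list below are hypothetical, and the sketch assumes an undirected graph with sortable node labels.

```python
from collections import deque


def bfs_subgraph(edges, seed, max_nodes):
    """Return (nodes, edges) of the subgraph induced by the first
    `max_nodes` nodes reached by BFS from `seed` (undirected graph)."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if max_nodes <= 0 or seed not in adjacency:
        return set(), []

    visited = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        # Sorting makes the traversal order deterministic.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
                if len(visited) >= max_nodes:
                    break

    # Keep only the edges whose endpoints both survived the cut.
    kept_edges = [(u, v) for u, v in edges if u in visited and v in visited]
    return visited, kept_edges


# Toy topology: a path 0-1-2-3-4 plus an extra link 0-5.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5)]
nodes, small_edges = bfs_subgraph(toy_edges, seed=0, max_nodes=3)
print(sorted(nodes), small_edges)
```

Capping the node count this way bounds both the simulation cost and the size of the graph the model has to process, at the price of discarding part of the original topology.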
###### Experiments
Once we were satisfied with the pipeline, we reran all the experiments to get up-to-date results. We asked the agent to implement the baseline methods found during the initial research. The agent then fixed the scripts and performed model training and validation. We asked the agent to save all results to a CSV file (available at `leaderboards/all_results.csv` in the project's code) and used this table to generate all the visualizations for the final paper.
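
As a rough sketch of how such a results table could be summarized before plotting, the snippet below averages one metric per model using only the standard library. The column names `model`, `dataset`, and `f1`, the function `mean_metric_by_model`, and the sample rows are assumptions made for illustration; the actual schema of `leaderboards/all_results.csv` is not described in this document, and the numbers are toy values, not experiment results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy stand-in for the results file (hypothetical columns and values).
SAMPLE_CSV = """model,dataset,f1
model_a,toy,0.50
model_a,toy,0.70
model_b,toy,0.80
"""


def mean_metric_by_model(csv_file, metric="f1"):
    """Average `metric` over all rows that share the same `model` value."""
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        scores[row["model"]].append(float(row[metric]))
    return {model: mean(values) for model, values in scores.items()}


summary = mean_metric_by_model(io.StringIO(SAMPLE_CSV))
# Print models from best to worst average score.
for model, value in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {value:.3f}")
```

To run it against a real file, the `io.StringIO(...)` argument could be replaced with an open file handle, for example `open("leaderboards/all_results.csv", newline="")`, after adjusting the column names to the real header.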
### Project validation
For the validation step we used the Holosophos agent again, this time configured with the Gemini-2.5-Flash model. For this step we also restricted the agent to local tools for reading files, because we found that the Holosophos agent has issues syncing files between local and remote instances. The agent was given the following task:
```
# Task
In the working directory you will find `gatv2_ns3_ids` research project. Familirize yourself with research idea by reading `README.md` and `ai_scientist_materials/proposal_2.md` file. Validate the research idea and results using the following criterias:
1. Novelty: Is this project new? Has someone explored this idea before?
2. Results: Do experiment results look valid? Is the comparison to other methods correct? 
3. Experimental setup: Does the code align with the proposal? Are there any significant problems that needs to be studied?

# Output
Create a directory `assessment` and save the validation results inside of it. Use Markdown format for all the files that you create

## Important notes
- Do not change files inside `gatv2_ns3_ids`. You can use it only for reading
- Use only `bash` and `text_editor` tools to read and write files. Do not invoke `remote_*` tools
- All your thoughts should be reinforced with search results from the resources you have access to
```
The generated files are provided in the `assessment` directory. After this step we were satisfied with the results and moved on to paper generation.
### Final paper generation
Using an AI agent, we created an initial `.tex` file based on the provided template. We then refined each section step by step using the following prompt, with `[[SECTION_TO_WRITE]]` standing for the section being refined:
```
Conduct comprehensive browser research on best practices for writing effective [[SECTION_TO_WRITE]] sections in research papers. Study multiple authoritative academic sources and writing guides to understand the key criteria and structural elements that define a high-quality target section.

After completing your research, analyze the current "Introduction" section in the attached paper against these established criteria. Perform an in-depth evaluation covering:

1. **Structure and Flow**: Does it follow the recommended section framework and logical organization principles?

2. **Content Quality**: 
   - Clear and appropriate content for this section type
   - Adequate depth and coverage of required elements
   - Proper integration with overall paper narrative
   - Clear articulation of section-specific objectives
   - Appropriate scope and focus

3. **Writing Quality**:
   - Clarity and readability
   - Logical progression of ideas
   - Appropriate technical depth for the audience
   - Engaging and professional presentation
   - Consistent tone and style

4. **Completeness**: Are all essential elements for a section present and sufficiently developed?

5. **Academic Standards**: Does it meet disciplinary conventions and publication standards for sections?

Based on your analysis, provide specific, actionable recommendations for improvement and implement the necessary revisions to enhance the section's effectiveness and alignment with academic writing standards. Then apply the changes in the `.tex` file.
```
Once every section was refined and we were satisfied with the result, we performed human and LLM-based validation and submitted the final paper.

### Conclusion
All artifacts in this project were produced across several iterations using a single multi-agent system. While the system is not yet mature enough to run end-to-end studies entirely out of the box, it can, with modest assistance from a human researcher, generate near-publication-ready papers with minimal additional effort. We regard this capability as a significant milestone: it demonstrates the practicality of human-in-the-loop multi-agent workflows and motivates further hardening for full autonomy, reliability, and reproducibility.