# System Overview

We present a framework driven by large language models (LLMs) for **knowledge graph construction and narrative analysis**. The system transforms unstructured narrative texts (e.g., novels, screenplays) into structured multi-layer graphs that support downstream reasoning and question answering. Its design integrates **multi-agent collaboration**, **reflection-based quality control**, and **multi-database orchestration**, enabling both structured inference and evidence-grounded retrieval.

---

## Core Capabilities

- **Knowledge Extraction**: automatic identification of entities, relations, and attributes from long-form text.  
- **Multi-layer Graph Building**: construction of (i) a base knowledge graph, (ii) an event-causality graph, and (iii) a plot-unit graph.  
- **Multi-Agent Orchestration**: specialized agents (for extraction, schema probing, attribute enrichment, etc.) cooperate in a coordinated pipeline.  
- **Reflection Loops**: each extraction step is coupled with iterative evaluation and refinement.  
- **Hybrid QA Support**: the system integrates a graph database, a vector database, and a relational database to answer complex narrative questions.  
- **Scalability**: parallelized processing and modular configuration allow handling of large-scale narrative corpora.

---

## Multi-Agent Components

**Information Extraction Agent**  
Responsible for entity and relation detection. It retrieves similar past extraction experiences from the memory layer, extracts structured elements (characters, events, actions, objects, timepoints), and applies reflection scoring to refine its outputs.  

**Attribute Extraction Agent**  
Dedicated to capturing fine-grained attributes (e.g., properties of characters or objects) using an extract–reflect–optimize cycle.  
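
The extract–reflect–optimize cycle can be sketched as a small loop. This is an illustration only, not the code in `attribute_extraction_agent.py`: the `Reflection` class, the `extract` and `reflect` callables, the 0.8 threshold, and the three-round cap are all assumptions, and in practice both callables would wrap LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reflection:
    score: float   # quality score in [0, 1] (hypothetical scale)
    feedback: str  # critique handed to the next extraction attempt


def extract_with_reflection(
    text: str,
    extract: Callable[[str, str], dict],
    reflect: Callable[[str, dict], Reflection],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> tuple[dict, float]:
    """Repeat extract -> reflect until the score clears the threshold."""
    best, best_score, feedback = {}, -1.0, ""
    for _ in range(max_rounds):
        candidate = extract(text, feedback)
        review = reflect(text, candidate)
        if review.score > best_score:  # keep the best attempt seen so far
            best, best_score = candidate, review.score
        if review.score >= threshold:
            break
        feedback = review.feedback     # "optimize": retry with the critique
    return best, best_score


# Toy stand-ins for the LLM-backed extractor and reviewer.
def toy_extract(text: str, feedback: str) -> dict:
    attrs = {"name": "Alice"}
    if "hair" in feedback:
        attrs["hair"] = "red"
    return attrs


def toy_reflect(text: str, attrs: dict) -> Reflection:
    if "hair" in attrs:
        return Reflection(1.0, "")
    return Reflection(0.5, "missing hair color")


result = extract_with_reflection("Alice had red hair.", toy_extract, toy_reflect)
```

Keeping the best-scoring candidate, rather than the last one, guards against a later round that scores lower than an earlier one.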

**Graph Probing Agent**  
Ensures schema adaptivity. It updates background terminology, adjusts schema design, performs trial extractions, prunes redundant types, and iteratively refines the schema until convergence.  
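
One way to read "refines the schema until convergence" is as a fixed-point loop: run a trial extraction, prune, and stop once pruning changes nothing. The sketch below makes several assumptions: the schema is reduced to a set of type names, `trial_extract` is a hypothetical callable returning the type label of each extracted item, and "redundant" is approximated by "used fewer than `min_count` times", which is a stand-in criterion rather than the one the agent actually applies.

```python
from collections import Counter
from typing import Callable, Iterable


def probe_schema(
    schema: set[str],
    trial_extract: Callable[[set[str]], Iterable[str]],
    min_count: int = 2,
    max_iters: int = 5,
) -> set[str]:
    """Prune rarely used types until the schema stops changing."""
    current = set(schema)
    for _ in range(max_iters):
        usage = Counter(trial_extract(current))  # type label -> frequency
        pruned = {t for t in current if usage[t] >= min_count}
        if pruned == current:                    # fixed point: converged
            break
        current = pruned
    return current


# Toy trial extraction: a fixed sample of labels, filtered by the schema.
sample_labels = ["Character", "Character", "Event", "Event", "Event", "Weapon"]


def toy_trial(schema: set[str]) -> list[str]:
    return [label for label in sample_labels if label in schema]


refined = probe_schema({"Character", "Event", "Weapon", "Vehicle"}, toy_trial)
```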

**Query Agent**  
Routes user queries to the appropriate backend (graph DB, vector DB, relational DB). It supports structured reasoning over causal chains, semantic retrieval of narrative descriptions, and attribute lookups. Results from multiple backends can be fused into a single grounded response.
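
The routing and fusion behavior can be illustrated with a deliberately simple dispatcher. Everything here is hypothetical: the `Backend` enum, the cue-word table, and the `handlers` mapping are invented for the example, and keyword matching merely stands in for whatever decision procedure `retriever_agent.py` uses (which may well be LLM-driven).

```python
from enum import Enum
from typing import Callable


class Backend(Enum):
    GRAPH = "graph"
    VECTOR = "vector"
    RELATIONAL = "relational"


# Hypothetical cue words; a real router would be far less brittle.
CUES = {
    Backend.GRAPH: ("why", "cause", "lead to", "because"),
    Backend.RELATIONAL: ("wear", "costume", "prop", "how many"),
}


def route(query: str) -> list[Backend]:
    """Pick every backend whose cues match; default to semantic search."""
    q = query.lower()
    hits = [b for b, cues in CUES.items() if any(c in q for c in cues)]
    return hits or [Backend.VECTOR]


def answer(query: str, handlers: dict[Backend, Callable[[str], str]]) -> dict[str, str]:
    """Call each selected backend and collect the results by backend name."""
    return {b.value: handlers[b](query) for b in route(query)}


handlers = {
    Backend.GRAPH: lambda q: "causal chain",
    Backend.VECTOR: lambda q: "matching passages",
    Backend.RELATIONAL: lambda q: "attribute rows",
}
fused = answer("What does Anna wear, and why?", handlers)
```

A question that mixes a causal cue with an attribute cue is sent to both backends, and the per-backend results are returned together so that a later step can compose them into one response.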

---

## Architecture

The framework orchestrates three complementary storage and reasoning layers:

1. **Graph Database** for reasoning over event causality, plot structure, and character interactions.  
2. **Vector Database** for semantic retrieval of dialogues and descriptions with explicit evidence tracing.  
3. **Relational Database** for normalized attributes (e.g., costume, props, or other categorical data).  

These layers are coordinated through a hybrid QA layer that composes results across the three backends.  

---

## Workflow

1. **Document Processing**: narrative texts are split into semantically coherent chunks.  
2. **Knowledge Extraction**: entities, relations, and attributes are extracted with reflection-based refinement.  
3. **Graph Preprocessing**: entity disambiguation and clustering ensure canonical forms.  
4. **Graph Construction**: knowledge, event-causality, and plot-unit graphs are written into the graph DB.  
5. **Vectorization**: text chunks are embedded and stored for semantic retrieval.  
6. **Hybrid QA**: user queries are answered via the appropriate backend(s), combining structured reasoning and textual evidence.  
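
Steps 1 and 3 lend themselves to a short sketch. The code below is a simplified stand-in under stated assumptions: chunking is approximated by packing blank-line-separated paragraphs up to a character budget (the real pipeline aims for semantic coherence), and disambiguation is approximated by grouping mentions that share a normalized spelling. The function names and the 200-character budget are hypothetical.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)  # budget exceeded: close the chunk
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks


def canonicalize(mentions: list[str]) -> dict[str, str]:
    """Map each mention to the first-seen form with the same normalized key."""
    canonical: dict[str, str] = {}
    mapping: dict[str, str] = {}
    for m in mentions:
        key = re.sub(r"[^a-z0-9]", "", m.lower())  # drop case, spaces, punctuation
        mapping[m] = canonical.setdefault(key, m)
    return mapping


text = "A" * 120 + "\n\n" + "B" * 120 + "\n\n" + "C" * 50
chunks = split_into_chunks(text)
aliases = canonicalize(["Dr. Watson", "dr watson", "Holmes"])
```

Here the first paragraph becomes its own chunk because adding the second would exceed the budget, while the second and third fit together.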

---

## Core Directory Structure

The implementation follows a modular design, where each component under `core/` encapsulates a specific functional layer:

- **`core/agent/` – Agent Layer**  
  Specialized multi-agent components responsible for different extraction and reasoning tasks.  
  - `attribute_extraction_agent.py`: extracts fine-grained attributes of entities.  
  - `knowledge_extraction_agent.py`: performs entity and relation detection (information extraction agent).  
  - `graph_probing_agent.py`: adapts and refines the schema via probing.  
  - `retriever_agent.py`: routes queries to the appropriate backend (query agent).  
  - `cmp_extraction_agent.py`: comparative extraction across multiple text segments.  

- **`core/builder/` – Graph Builder Layer**  
  Modules for document processing, graph construction, and database integration.  
  - `document_processor.py`: text segmentation and metadata generation.  
  - `graph_builder.py`: construction of knowledge graphs.  
  - `database_builder.py`: integration with relational and vector databases.  
  - `graph_preprocessor.py`: entity disambiguation and clustering.  
  - `narrative_graph_builder.py`: construction of event-causality and plot graphs.  
  - `reflection.py`: evaluation and refinement loops.  
  - `manager/`: orchestration of document and graph management.  

- **`core/functions/` – Function Library**  
  Defines callable tool functions for use by agents in multi-agent collaboration.  

- **`core/memory/` – Memory Layer**  
  Implements semantic and episodic memory modules, supporting experience retrieval and reflection.  

- **`core/model_providers/` – Model Interfaces**  
  Wrappers around external LLMs and embedding providers, ensuring model-agnostic design.  

- **`core/models/` – Data Models**  
  Defines structured data classes for documents, chunks, entities, relations, and graph elements.  

- **`core/prompts/` – Prompt Templates**  
  Contains curated prompt templates for extraction, reflection, and schema probing.  

- **`core/schema/` – Schema Definitions**  
  Stores graph schema specifications, type hierarchies, and relation constraints.  

- **`core/storage/` – Storage Backends**  
  Abstracts access to graph, vector, and relational database connections.  

- **`core/utils/` – Utilities**  
  Shared helpers for configuration, logging, concurrency, and prompt management.  

This layered structure allows clear separation of concerns: **agents** handle reasoning and extraction, **builders** handle graph construction and storage, **memory** supports reflection, and **schemas/storage** ensure consistent representation across databases.  
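
To make the data-model layer concrete, the following sketch shows what structured classes for chunks, entities, and relations might look like. The class and field names are illustrative assumptions and are not taken from `core/models/`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str


@dataclass
class Entity:
    entity_id: str
    name: str
    type: str  # e.g. "Character" or "Event"
    attributes: dict[str, str] = field(default_factory=dict)
    source_chunks: list[str] = field(default_factory=list)  # evidence trail


@dataclass(frozen=True)
class Relation:
    subject_id: str
    predicate: str
    object_id: str
    evidence_chunk: str  # id of the chunk that supports this relation

    def as_triple(self) -> tuple[str, str, str]:
        return (self.subject_id, self.predicate, self.object_id)


chunk = Chunk("c1", "Holmes handed Watson the letter.")
holmes = Entity("e1", "Holmes", "Character", source_chunks=[chunk.chunk_id])
watson = Entity("e2", "Watson", "Character", source_chunks=[chunk.chunk_id])
gave = Relation(holmes.entity_id, "GAVE_LETTER_TO", watson.entity_id, chunk.chunk_id)
```

Carrying a chunk id on every entity and relation is one simple way to keep graph elements traceable to their textual evidence.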

---

## Outputs

- **Knowledge Graph**: entities, relations, and attributes.  
- **Event-Causality Graph**: temporal and causal dependencies among events.  
- **Plot-Unit Graph**: higher-level narrative structures distilled from events.  
- **Vector Index**: supports semantic retrieval with evidence.  
- **Processing Logs**: quality scores and reflection outcomes.
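
As an illustration of how the event-causality graph supports reasoning, the sketch below finds the shortest causal chain between two events with a breadth-first search. It assumes the graph has been exported as a plain adjacency mapping from each event to the events it causes; the event names and the `causal_chain` helper are hypothetical, and a deployed system would more likely issue a path query to the graph database.

```python
from collections import deque
from typing import Optional


def causal_chain(
    causes: dict[str, list[str]], start: str, goal: str
) -> Optional[list[str]]:
    """Return the shortest cause-to-effect path, or None if there is none."""
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            current: Optional[str] = node
            while current is not None:  # walk back to the start event
                chain.append(current)
                current = parents[current]
            return chain[::-1]
        for effect in causes.get(node, []):
            if effect not in parents:   # visit each event at most once
                parents[effect] = node
                queue.append(effect)
    return None


causes = {
    "storm": ["shipwreck"],
    "shipwreck": ["stranded", "lost cargo"],
    "stranded": ["rescue"],
}
chain = causal_chain(causes, "storm", "rescue")
```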

---

## Technical Highlights

- Multi-agent design with specialized agents for information extraction, attribute extraction, schema probing, and query routing.  
- Schema adaptivity through iterative probing and pruning.  
- Reflection-based loops to improve robustness of LLM extraction.  
- Hybrid orchestration of graph, vector, and relational backends for QA.  
- Modular and scalable implementation, agnostic to specific LLM or embedding models.  
