# USER:
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  > I have a series of JSON files in the data_use folder. Each JSON file contains a list, and each item in the list looks like this:    {                          │
│            "title": "Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators",                              │
│            "authors": "Bohan Xiao, Peiyong Wang, Qisheng He, Ming Dong; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition │
│     (CVPR), 2025, pp. 28232-28241",                                                                                                                       │
│            "abstract": "Image-to-Image (I2I) translation involves converting an im- age from one domain to another. Deterministic I2I transla- tion, such │
│     as in image super-resolution, extends this con- cept by guaranteeing that each input generates a consistent and predictable output, closely matching  │
│    the ground truth (GT) with high fidelity. In this paper, we propose a denois- ing Brownian bridge model with dual approximators (Dual- approx Bridge), │
│     a novel generative model that exploits the Brownian bridge dynamics and two neural network-based approximators (one for forward and one for reverse   │
│    pro- cess) to produce faithful output with negligible variance and high image quality in I2I translations. Our extensive exper- iments on benchmark    │
│    datasets including image generation and super-resolution demonstrate the consistent and supe- rior performance of Dual-approx Bridge in terms of im-   │
│    age quality and faithfulness to GT when compared to both stochastic and deterministic baselines. Project page and code:                                │
│    https://github.com/bohan95/dual-app-bridge",                                                                                                           │
│            "pdf_link": "https://openaccess.thecvf.com/content/CVPR2025/papers/Xiao_Deterministic_Image-to-Image_Translation_via_Denoising_Brownian_Bridge │
│    _Models_with_Dual_CVPR_2025_paper.pdf",                                                                                                                │
│            "page_url": "https://openaccess.thecvf.com/content/CVPR2025/html/Xiao_Deterministic_Image-to-Image_Translation_via_Denoising_Brownian_Bridge_M │
│    odels_with_Dual_CVPR_2025_paper.html"                                                                                                                  │
│        }. Please use the following code as a reference and help me build a knowledge base and store the vectors.                                          │
│    # --- 6. Helper & Main Execution ---                                                                                                                   │
│    def build_faiss_index(vectors: np.ndarray):                                                                                                            │
│        print("Building FAISS index...")                                                                                                                   │
│        dimension = vectors.shape[1]                                                                                                                       │
│        index = faiss.IndexFlatIP(dimension)                                                                                                               │
│        faiss.normalize_L2(vectors)                                                                                                                        │
│        index.add(vectors)                                                                                                                                 │
│        print("FAISS index built.")                                                                                                                        │
│        return index                                                                                                                                       │
│                                                                                                                                                           │
│    if __name__ == "__main__":                                                                                                                             │
│        cfg = Config()                                                                                                                                     │
│        llm_interface = LLMInterface(cfg.MODEL_NAME, cfg.DEVICE)                                                                                           │
│                                                                                                                                                           │
│        novelty_documents = [                                                                                                                              │
│            "AlphaFold predicts protein structures using a deep learning approach.",                                                                       │
│            "CRISPR-Cas9 is a gene-editing tool that allows for precise modification of DNA sequences.",                                                   │
│            "BERT is a transformer-based model for natural language understanding.",                                                                       │
│            "Graphene is a single layer of carbon atoms arranged in a two-dimensional honeycomb lattice.",                                                 │
│            "Lithium-ion batteries are a type of rechargeable battery.",                                                                                   │
│            "Perovskite solar cells include a perovskite-structured compound as the light-harvesting active layer."                                        │
│        ]                                                                                                                                                  │
│                                                                                                                                                           │
│        print(f"\nVectorizing {len(novelty_documents)} base documents...")                                                                                 │
│        vectors = np.array([llm_interface.get_vector(text) for text in novelty_documents]).astype('float32')                                               │
│                                                                                                                                                           │
```python
import numpy as np
import faiss

def build_faiss_index(vectors):
    """
    Build a FAISS index from the given vectors.
    
    Args:
        vectors: numpy array of shape (n_samples, n_features) containing the vectors
        
    Returns:
        A FAISS index object
    """
    # Convert vectors to float32 if they're not already
    if vectors.dtype != np.float32:
        vectors = vectors.astype(np.float32)
    
    # Determine the dimensionality of the vectors
    d = vectors.shape[1]
    
    # Create a FAISS index (using L2 distance by default)
    index = faiss.IndexFlatL2(d)
    
    # Add the vectors to the index
    index.add(vectors)
    
    return index

# Example usage:
# vectors = np.random.random((1000, 128)).astype('float32')
# novelty_db = build_faiss_index(vectors)
```
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

# AGENT: 



✦ Okay, I'll help you construct a knowledge base and vector storage.

  Since I cannot directly access external LLM services to generate real vectors, I'll provide Python code containing Config and LLMInterface classes, where the LLMInterface.get_vector
  method returns a simulated vector (a randomly generated NumPy array). You will need to replace that method with your actual LLM embedding implementation.

  This code will:
   1. Iterate through all JSON files in the data_use folder.
   2. Read the paper data from each file.
   3. Concatenate the title and abstract of each paper as the text to be vectorized.
   4. Use the simulated LLMInterface to generate vectors.
   5. Use faiss to build the vector index.

  Please make sure you have installed the numpy and faiss libraries (pip install numpy faiss-cpu, or use faiss-gpu in place of faiss-cpu for GPU support).
```python
import json
import os
import glob
import numpy as np
import faiss

# --- 1. Configuration Class ---
class Config:
    def __init__(self):
        # These would typically be loaded from environment variables or a config file
        self.MODEL_NAME = "your_llm_model_name"
        self.DEVICE = "cpu" # or "cuda" if you have a GPU

# --- 2. LLM Interface Class (Placeholder) ---
class LLMInterface:
    def __init__(self, model_name, device):
        self.model_name = model_name
        self.device = device
        print(f"Initializing LLMInterface with model: {self.model_name} on device: {self.device}")
        # In a real scenario, you would load your LLM model here
        # For demonstration, we'll just simulate vector generation

    def get_vector(self, text: str) -> np.ndarray:
        """
        This method should call your actual LLM embedding model
        to get a vector representation of the input text.
        For now, it returns a dummy vector.
        """
        # print(f"Generating vector for text (first 50 chars): '{text[:50]}...'")
        # Simulate a 768-dimensional embedding vector (common for many models)
        return np.random.rand(768).astype('float32')

# --- 3. Helper to build FAISS index ---
def build_faiss_index(vectors: np.ndarray):
    print("Building FAISS index...")
    dimension = vectors.shape[1]
    # Using IndexFlatIP for Inner Product similarity, common for normalized embeddings
    index = faiss.IndexFlatIP(dimension)
    faiss.normalize_L2(vectors) # Normalizes in place so inner product equals cosine similarity
    index.add(vectors)
    print(f"FAISS index built with {index.ntotal} vectors.")
    return index

# --- 4. Main Execution Logic ---
if __name__ == "__main__":
    cfg = Config()
    llm_interface = LLMInterface(cfg.MODEL_NAME, cfg.DEVICE)

    data_use_dir = "./data_use" # Assuming this script is run from the 'database' directory
    json_files = glob.glob(os.path.join(data_use_dir, "*.json"))

    if not json_files:
        print(f"No JSON files found in {data_use_dir}. Please ensure the path is correct and files exist.")
    else:
        print(f"Found {len(json_files)} JSON files in '{data_use_dir}'.")

        all_papers_text = []
        paper_metadata = [] # To store original paper data for retrieval later

        for file_path in json_files:
            print(f"Processing file: {file_path}")
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    papers_in_file = json.load(f)
                    if isinstance(papers_in_file, list):
                        for paper in papers_in_file:
                            title = paper.get("title", "")
                            abstract = paper.get("abstract", "")
                            # Combine title and abstract for vectorization
                            combined_text = f"Title: {title}\nAbstract: {abstract}"
                            all_papers_text.append(combined_text)
                            paper_metadata.append(paper) # Store original paper data
                    else:
                        print(f"Warning: File {file_path} does not contain a list of papers.")
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON from {file_path}: {e}")
            except Exception as e:
                print(f"An unexpected error occurred while reading {file_path}: {e}")

        if all_papers_text:
            print(f"\nVectorizing {len(all_papers_text)} documents...")
            # Generate vectors for all combined texts
            # Note: For very large datasets, you might want to process in batches
            vectors = np.array([llm_interface.get_vector(text) for text in all_papers_text]).astype('float32')

            # Build the FAISS index
            paper_knowledge_base_index = build_faiss_index(vectors)

            # You can now save the index and metadata for later use
            # faiss.write_index(paper_knowledge_base_index, "paper_faiss_index.bin")
            # with open("paper_metadata.json", "w", encoding='utf-8') as f:
            #     json.dump(paper_metadata, f, ensure_ascii=False, indent=4)

            print("\nKnowledge base and FAISS index created successfully.")
            print("You can now use 'paper_knowledge_base_index' for similarity searches.")
            # Example: Search for similar papers (dummy query)
            # query_text = "New methods for image recognition using deep learning"
            # query_vector = llm_interface.get_vector(query_text).reshape(1, -1)
            # faiss.normalize_L2(query_vector) # Normalize the query too, so scores are cosine similarities
            # D, I = paper_knowledge_base_index.search(query_vector, k=5) # Search for top 5 similar
            # print("\nTop 5 similar papers (indices):", I)
            # print("Similarity scores:", D)
            # for i in I[0]:
            #     print(f"- {paper_metadata[i]['title']}")
        else:
            print("No paper texts were extracted for vectorization.")
```
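
The index above relies on the fact that the inner product of two L2-normalized vectors equals their cosine similarity, which is why `faiss.normalize_L2` is paired with `IndexFlatIP`. The following is a minimal pure-Python sketch of that identity; the helper names (`l2_normalize`, `inner_product`, `cosine_similarity`) and the sample vectors are illustrative assumptions, not part of FAISS or of the script above.

```python
import math

def l2_normalize(vec):
    """Return a unit-length copy of vec (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def inner_product(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed directly from the unnormalized vectors."""
    denom = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / denom

# Hypothetical 3-dimensional "embeddings" used only for illustration.
doc = [3.0, 4.0, 0.0]
query = [6.0, 8.0, 1.0]

# The raw inner product depends on vector length, not just direction.
raw_score = inner_product(doc, query)

# After normalizing both sides, the inner product is the cosine similarity.
normalized_score = inner_product(l2_normalize(doc), l2_normalize(query))
```

The practical consequence is that a query vector should be normalized in the same way as the stored vectors before searching; otherwise the returned scores are scaled by the query's length and are no longer cosine similarities.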

# USER:

   ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  > Here is some more information:                                                                                                     │
│    # --- 3. LLM Interface ---                                                                                                         │
│    class LLMInterface:                                                                                                                │
│        def __init__(self, model_name, device):                                                                                        │
│            print(f"Loading model: {model_name} on {device}...")                                                                       │
│            self.tokenizer = AutoTokenizer.from_pretrained(model_name)                                                                 │
│            self.model = AutoModelForCausalLM.from_pretrained(                                                                         │
│                model_name,                                                                                                            │
│                dtype="auto",                                                                                                          │
│                device_map="auto"                                                                                                      │
│            ).eval()                                                                                                                   │
│            self.device = self.model.device                                                                                            │
│            print("Model loaded successfully.")                                                                                        │
│                                                                                                                                       │
│        def get_vector(self, text: str) -> np.ndarray:                                                                                 │
│            with torch.no_grad():                                                                                                      │
│                inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(self.device)                    │
│                outputs = self.model(**inputs, output_hidden_states=True)                                                              │
│                vector = outputs.hidden_states[-1][0, -1, :].cpu().to(torch.float32).numpy()                                           │
│                del inputs, outputs                                                                                                    │
│                return vector                                                                                                          │
│                                                                                                                                       │
│        def generate_chat_completion(self, messages: list, max_length: int, temperature: float = 0.7, thinking=True) -> str:           │
│            text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=thinking)  │
│            model_inputs = self.tokenizer([text], return_tensors="pt").to(self.device)                                                 │
│            generated_ids = self.model.generate(                                                                                       │
│                model_inputs.input_ids,                                                                                                │
│                attention_mask=model_inputs.attention_mask,                                                                            │
│                max_new_tokens=max_length,                                                                                             │
│                do_sample=True, top_p=0.9, temperature=temperature,                                                                    │
│                pad_token_id=self.tokenizer.eos_token_id                                                                               │
│            )                                                                                                                          │
│            response = self.tokenizer.batch_decode(generated_ids[:, model_inputs.input_ids.shape[-1]:], skip_special_tokens=True)[0]   │
│            del text, model_inputs, generated_ids                                                                                      │
│            return response                                                                                                            │
│                                                                                                                                       │
│        def get_prob_and_vector(self, text: str) -> tuple[float, np.ndarray]:                                                          │
│            with torch.no_grad():                                                                                                      │
│                try:                                                                                                                   │
│                    inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(self.device)                │
│                    outputs = self.model(**inputs, labels=inputs.input_ids, output_hidden_states=True)                                 │
│                    log_prob = -outputs.loss.item()                                                                                    │
│                    vector = outputs.hidden_states[-1][0, -1, :].cpu().to(torch.float32).numpy()                                       │
│                    del inputs, outputs                                                                                                │
│                    return log_prob, vector                                                                                            │
│                except Exception as e:                                                                                                 │
│                    print(f"Error in get_prob_and_vector: {e}")                                                                        │
│                    return 0.0, None                                                                                                   │
│                                                                                                                                       │
│    # --- 1. Configuration ---                                                                                                         │
│    class Config:                                                                                                                      │
│        MODEL_NAME = "../Qwen3-0.6B"                                                                                                   │
│        DEVICE = "cuda" if torch.cuda.is_available() else "cpu"                                                                        │
│        NUM_ITERATIONS = 30                                                                                                            │
│                                                                                                                                       │
│        # MCTS Core                                                                                                                    │
│        EXPLORATION_CONSTANT = 1.5                                                                                                     │
│        K_EXPAND = 3                                                                                                                   │
│        # N_ROLLOUTS is deprecated in Narrative MCTS, evaluation is direct.                                                            │
│                                                                                                                                       │
│        # LLM Generation Lengths                                                                                                       │
│        EXPAND_MAX_LENGTH = 1024                                                                                                       │
│        THEME_GEN_MAX_LENGTH = 1024                                                                                                    │
│                                                                                                                                       │
│        # Automated Theme Generation                                                                                                   │
│        NUM_CLUSTERS = 3                                                                                                               │
│                                                                                                                                       │
│        # T-MCTS/PE Hyperparameters                                                                                                    │
│        ALPHA_NOVELTY = 0.7                                                                                                            │
│        W_DIR = 1.0                                                                                                                    │
│        W_COH = 0.5                                                                                                                    │
│        W_NOV = 0.3                                                                                                                    │
│        W_PROG = 0.2                                                                                                                   │
│                                                                                                                                       │
│    # --- 2. Helper Functions ---                                                                                                      │
│    def parse_llm_json_output(response: str) -> dict | None:                                                                           │
│        """Robustly parses JSON from LLM output that might include markdown."""                                                        │
│        try:                                                                                                                           │
│            match = re.search(r"```json\n(.*?)\n```", response, re.DOTALL)                                                             │
│            if match:                                                                                                                  │
│                json_str = match.group(1)                                                                                              │
│                return json.loads(json_str)                                                                                            │
│            else:                                                                                                                      │
│                return json.loads(response)                                                                                            │
│        except (json.JSONDecodeError, IndexError):                                                                                     │
│            print(f"Warning: Failed to parse LLM response as JSON.")                                                                   │
│            return None                                                                                                                │
│                                                                                                                                       │
│    This time, store both the vectors and the database in files. The vector filename should include the model name, since the vectors are model-specific. Finally, write the code into a file.      │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

# AGENT:


✦ Okay, I understand. This time we will use the LLMInterface and Config classes you provided, and store the generated vectors, FAISS index, and original metadata in files, where the vector filename will include the model name. I will write the complete Python code into a file named build_paper_knowledge_base.py.
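
Because the configured MODEL_NAME is a relative path ("../Qwen3-0.6B"), embedding it verbatim in a filename would produce a path that escapes the output directory. A minimal sketch of one way to derive a filename-safe tag is shown below; the helper names `model_tag` and `output_paths`, and the filename patterns, are hypothetical and are not taken from the file written in this session.

```python
import os
import re

def model_tag(model_name: str) -> str:
    """Turn a model name or path such as "../Qwen3-0.6B" into a filename-safe tag."""
    # Keep only the last path component so prefixes like "../" are dropped.
    base = os.path.basename(os.path.normpath(model_name))
    # Replace any run of characters other than letters, digits, ".", "_" or "-".
    tag = re.sub(r"[^A-Za-z0-9._-]+", "_", base).strip("._")
    # Fall back to a generic tag if nothing usable is left.
    return tag or "model"

def output_paths(model_name: str, out_dir: str = ".") -> dict:
    """Build hypothetical output paths; only model-dependent files carry the tag."""
    tag = model_tag(model_name)
    return {
        "vectors": os.path.join(out_dir, f"paper_vectors_{tag}.npy"),
        "index": os.path.join(out_dir, f"paper_faiss_index_{tag}.bin"),
        # Metadata does not depend on the embedding model, so it has no tag.
        "metadata": os.path.join(out_dir, "paper_metadata.json"),
    }

paths = output_paths("../Qwen3-0.6B")
```

Tagging the vector and index files by model keeps embeddings from different models from silently overwriting each other, while the metadata file can be shared across models.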
 ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ ✓  WriteFile Writing to build_paper_knowledge_base.py                                                                                            │
 │                                                                                                                                                  │
 │    ... first 67 lines hidden ...                                                                                                                 │
 │     68             del inputs, outputs                                                                                                           │
 │     69             # Clear CUDA cache if using GPU                                                                                               │
 │     70             if self.device.type == 'cuda':                                                                                                │
 │     71                 torch.cuda.empty_cache()                                                                                                  │
 │     72             return vector                                                                                                                 │
 │     73                                                                                                                                           │
 │     74     def generate_chat_completion(self, messages: list, max_length: int, temperature: float = 0.7, thinking=True) -> str:                  │
 │     75         text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=thinking)         │
 │     76         model_inputs = self.tokenizer([text], return_tensors="pt").to(self.device)                                                        │
 │     77         generated_ids = self.model.generate(                                                                                              │
 │     78             model_inputs.input_ids,                                                                                                       │
 │     79             attention_mask=model_inputs.attention_mask,                                                                                   │
 │     80             max_new_tokens=max_length,                                                                                                    │
 │     81             do_sample=True, top_p=0.9, temperature=temperature,                                                                           │
 │     82             pad_token_id=self.tokenizer.eos_token_id                                                                                      │
 │     83         )                                                                                                                                 │
 │     84         response = self.tokenizer.batch_decode(generated_ids[:, model_inputs.input_ids.shape[-1]:], skip_special_tokens=True)[0]          │
 │     85         del text, model_inputs, generated_ids                                                                                             │
 │     86         if self.device.type == 'cuda':                                                                                                    │
 │     87             torch.cuda.empty_cache()                                                                                                      │
 │     88         return response                                                                                                                   │
 │     89                                                                                                                                           │
 │     90     def get_prob_and_vector(self, text: str) -> tuple[float, np.ndarray]:                                                                 │
 │     91         with torch.no_grad():                                                                                                             │
 │     92             try:                                                                                                                          │
 │     93                 inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(self.device)                       │
 │     94                 outputs = self.model(**inputs, labels=inputs.input_ids, output_hidden_states=True)                                        │
 │     95                 log_prob = -outputs.loss.item()                                                                                           │
 │     96                 vector = outputs.hidden_states[-1][0, -1, :].cpu().to(torch.float32).numpy()                                              │
 │     97                 del inputs, outputs                                                                                                       │
 │     98                 if self.device.type == 'cuda':                                                                                            │
 │     99                     torch.cuda.empty_cache()                                                                                              │
 │    100                 return log_prob, vector                                                                                                   │
 │    101             except Exception as e:                                                                                                        │
 │    102                 print(f"Error in get_prob_and_vector: {e}")                                                                               │
 │    103                 return 0.0, None                                                                                                          │
 │    104                                                                                                                                           │
 │    105 # --- 4. Helper to build FAISS index ---                                                                                                  │
 │    106 def build_faiss_index(vectors: np.ndarray):                                                                                               │
 │    107     print("Building FAISS index...")                                                                                                      │
 │    108     dimension = vectors.shape[1]                                                                                                          │
 │    109     # Using IndexFlatIP for Inner Product similarity, common for normalized embeddings                                                    │
 │    110     index = faiss.IndexFlatIP(dimension)                                                                                                  │
 │    111     faiss.normalize_L2(vectors)  # Normalize in place so inner product equals cosine similarity                                           │
 │    112     index.add(vectors)                                                                                                                    │
 │    113     print(f"FAISS index built with {index.ntotal} vectors.")                                                                              │
 │    114     return index                                                                                                                          │
 │    115                                                                                                                                           │
 │    116 # --- 5. Main Execution Logic ---                                                                                                         │
 │    117 if __name__ == "__main__":                                                                                                                │
 │    118     cfg = Config()                                                                                                                        │
 │    119     llm_interface = LLMInterface(cfg.MODEL_NAME, cfg.DEVICE)                                                                              │
 │    120                                                                                                                                           │
 │    121     data_use_dir = "./data_use" # Assuming this script is run from the 'database' directory                                               │
 │    122     json_files = glob.glob(os.path.join(data_use_dir, "*.json"))                                                                          │
 │    123                                                                                                                                           │
 │    124     if not json_files:                                                                                                                    │
 │    125         print(f"No JSON files found in {data_use_dir}. Please ensure the path is correct and files exist.")                               │
 │    126     else:                                                                                                                                 │
 │    127         print(f"Found {len(json_files)} JSON files in '{data_use_dir}'.")                                                                 │
 │    128                                                                                                                                           │
 │    129         all_papers_text = []                                                                                                              │
 │    130         paper_metadata = [] # To store original paper data for retrieval later                                                            │
 │    131                                                                                                                                           │
 │    132         for file_path in json_files:                                                                                                      │
 │    133             print(f"Processing file: {file_path}")                                                                                        │
 │    134             try:                                                                                                                          │
 │    135                 with open(file_path, 'r', encoding='utf-8') as f:                                                                         │
 │    136                     papers_in_file = json.load(f)                                                                                         │
 │    137                     if isinstance(papers_in_file, list):                                                                                  │
 │    138                         for paper in papers_in_file:                                                                                      │
 │    139                             title = paper.get("title", "")                                                                                │
 │    140                             abstract = paper.get("abstract", "")                                                                          │
 │    141                             # Combine title and abstract for vectorization                                                                │
 │    142                             combined_text = f"Title: {title}\nAbstract: {abstract}"                                                       │
 │    143                             all_papers_text.append(combined_text)                                                                         │
 │    144                             paper_metadata.append(paper) # Store original paper data                                                      │
 │    145                     else:                                                                                                                 │
 │    146                         print(f"Warning: File {file_path} does not contain a list of papers.")                                            │
 │    147             except json.JSONDecodeError as e:                                                                                             │
 │    148                 print(f"Error decoding JSON from {file_path}: {e}")                                                                       │
 │    149             except Exception as e:                                                                                                        │
 │    150                 print(f"An unexpected error occurred while reading {file_path}: {e}")                                                     │
 │    151                                                                                                                                           │
 │    152         if all_papers_text:                                                                                                               │
 │    153             print(f"\nVectorizing {len(all_papers_text)} documents...")                                                                   │
 │    154             # Generate vectors for all combined texts                                                                                     │
 │    155             # Note: For very large datasets, you might want to process in batches                                                         │
 │    156             vectors = np.array([llm_interface.get_vector(text) for text in all_papers_text]).astype('float32')                            │
 │    157                                                                                                                                           │
 │    158             # Extract a clean model name for filenames                                                                                    │
 │    159             model_name_for_file = os.path.basename(cfg.MODEL_NAME)                                                                        │
 │    160             if not model_name_for_file: # Handle cases like ".." or "/"                                                                   │
 │    161                 model_name_for_file = "default_model"                                                                                     │
 │    162                                                                                                                                           │
 │    163             # Save vectors to a .npy file                                                                                                 │
 │    164             vectors_filename = f"vectors_{model_name_for_file}.npy"                                                                       │
 │    165             np.save(vectors_filename, vectors)                                                                                            │
 │    166             print(f"Vectors saved to {vectors_filename}")                                                                                 │
 │    167                                                                                                                                           │
 │    168             # Build the FAISS index                                                                                                       │
 │    169             paper_knowledge_base_index = build_faiss_index(vectors)                                                                       │
 │    170                                                                                                                                           │
 │    171             # Save the FAISS index                                                                                                        │
 │    172             faiss_index_filename = f"faiss_index_{model_name_for_file}.bin"                                                               │
 │    173             faiss.write_index(paper_knowledge_base_index, faiss_index_filename)                                                           │
 │    174             print(f"FAISS index saved to {faiss_index_filename}")                                                                         │
 │    175                                                                                                                                           │
 │    176             # Save the paper metadata                                                                                                     │
 │    177             metadata_filename = "paper_metadata.json"                                                                                     │
 │    178             with open(metadata_filename, "w", encoding='utf-8') as f:                                                                     │
 │    179                 json.dump(paper_metadata, f, ensure_ascii=False, indent=4)                                                                │
 │    180             print(f"Paper metadata saved to {metadata_filename}")                                                                         │
 │    181                                                                                                                                           │
 │    182             print("\nKnowledge base and FAISS index created successfully.")                                                               │
 │    183             print("You can now use the saved files for your application.")                                                                │
 │    184         else:                                                                                                                             │
 │    185             print("No paper texts were extracted for vectorization.")                                                                     │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ Code has been written to the build_paper_knowledge_base.py file.

You can execute it by running the following command:
   python build_paper_knowledge_base.py
Before running it, make sure the required libraries are installed in your environment: torch, transformers, numpy, and either faiss-cpu or faiss-gpu.
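
Once the script has finished, the saved files can be loaded again for retrieval. The following is only a minimal sketch under some assumptions: `load_knowledge_base` and `top_k_papers` are hypothetical helper names that are not part of the script, and the sketch relies on row `i` of the index corresponding to entry `i` of `paper_metadata.json`, which holds because the script appends texts and metadata in the same order. Since the index is built from L2-normalized vectors, the query vector is normalized as well so that the inner-product scores can be read as cosine similarities.

```python
import json

import numpy as np


def load_knowledge_base(index_path: str, metadata_path: str = "paper_metadata.json"):
    """Load the FAISS index and the paper metadata written by the script."""
    # Imported lazily so that top_k_papers below does not itself depend on FAISS.
    import faiss

    index = faiss.read_index(index_path)
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    return index, metadata


def top_k_papers(index, query_vector, metadata, k=5):
    """Return up to k (score, paper) pairs, best match first."""
    if not metadata:
        return []
    # The index holds L2-normalized vectors, so normalize the query the same way.
    q = np.asarray(query_vector, dtype="float32").reshape(1, -1).copy()
    norm = np.linalg.norm(q)
    if norm > 0:
        q /= norm
    k = min(k, len(metadata))
    scores, ids = index.search(q, k)
    # FAISS pads missing results with -1, so skip those ids.
    return [(float(s), metadata[i]) for s, i in zip(scores[0], ids[0]) if i != -1]


# Hypothetical usage; "<model_name>" stands for os.path.basename(cfg.MODEL_NAME):
# index, metadata = load_knowledge_base("faiss_index_<model_name>.bin")
# query_vec = llm_interface.get_vector("Title: ...\nAbstract: ...")
# for score, paper in top_k_papers(index, query_vec, metadata, k=3):
#     print(f"{score:.3f}  {paper.get('title', '')}")
```

The query should be embedded with the same model and the same "Title: ...\nAbstract: ..." text format that was used when building the index, otherwise the similarity scores will not be comparable.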