(/data2-HDD-SATA-20T/nzq/env/hipporag) (fishspeech) nzq@algernon:/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG$ python main_azure.py --dataset musique --llm_base_url https://gpt-nzq-east-us.openai.azure.com/ --llm_name gpt-4o-mini --embedding_name nvidia/NV-Embed-v2
[lifang535] len(all_queries): 1000
[lifang535] len(gold_docs) = 1000
INFO:src.hipporag.prompts.prompt_template_manager:Loading templates from directory: /data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/src/hipporag/prompts/templates
INFO:src.hipporag.TAG:Loaded graph from outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/graph.pickle with 113297 nodes, 1575025 edges
INFO:datasets:PyTorch version 2.5.1 available.
INFO:datasets:Polars version 1.29.0 available.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.31s/it]
INFO:src.hipporag.embedding_store:Loaded 11656 records from outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/chunk_embeddings/vdb_chunk.parquet
INFO:src.hipporag.embedding_store:Loaded 101641 records from outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/entity_embeddings/vdb_entity.parquet
INFO:src.hipporag.embedding_store:Loaded 125903 records from outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/fact_embeddings/vdb_fact.parquet
INFO:src.hipporag.prompts.prompt_template_manager:Loading templates from directory: /data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/src/hipporag/prompts/templates
[lifang535] [TAG] [topic_index] len(docs): 11656
INFO:src.hipporag.TAG:Indexing Documents
INFO:src.hipporag.TAG:Performing OpenIE
[lifang535] [TAG] [topic_index] self.global_config.openie_mode != 'offline'
INFO:src.hipporag.embedding_store:Inserting 0 new records, 11656 records already exist.
[lifang535] [TAG] [topic_index] self.global_config.save_openie
INFO:src.hipporag.TAG:OpenIE results saved to outputs/musique/openie_results_ner_gpt-4o-mini.json
[lifang535] chunk_ids (type=<class 'list'>): 
['chunk-0f19b42d483dc05cd5a62ea7dffa0864', 'chunk-34b4763cffb110a8f304510900b4f691']
[lifang535] chunk_triples (type=<class 'list'>): 
[[['lionel messi', 'enrolled in', 'royal spanish football federation'], ['lionel messi', 'started at', 'barcelona'], ['lionel messi', 'trained at', 'la masia'], ['lionel messi', 'befriended', 'cesc f bregas'], ['lionel messi', 'befriended', 'gerard piqu'], ['lionel messi', 'part of', 'baby dream team'], ['lionel messi', 'was top scorer in', '2002 03'], ['lionel messi', 'scored goals for', 'cadetes a'], ['cadetes a', 'won', 'copa catalunya'], ['copa catalunya', 'defeated', 'espanyol'], ['copa catalunya final', 'known as', 'partido de la m scara'], ['lionel messi', 'received an offer from', 'arsenal'], ['cesc f bregas', 'left for', 'england'], ['gerard piqu', 'left for', 'england'], ['lionel messi', 'chose to remain in', 'barcelona']], [['fc barcelona', 'finished', '2006 07 season without trophies'], ['fc barcelona', 'went on a', 'pre season us tour'], ['eto o', 'was key player for', 'fc barcelona'], ['lionel messi', 'is a rising star of', 'fc barcelona'], ['eto o', 'criticized', 'frank rijkaard'], ['ronaldinho', 'admitted lack of fitness affected', 'his form'], ['fc barcelona', 'was in first place of', 'la liga'], ['real madrid', 'overtook', 'fc barcelona'], ['fc barcelona', 'advanced to', 'semi finals of copa del rey'], ['fc barcelona', 'won first leg against', 'getafe'], ['lionel messi', 'scored a goal compared to', 'diego maradona s goal of the century'], ['fc barcelona', 'lost second leg against', 'getafe'], ['fc barcelona', 'participated in', '2006 fifa club world cup'], ['fc barcelona', 'was beaten by', 'internacional'], ['fc barcelona', 'was knocked out by', 'liverpool'], ['liverpool', 'were eventual runners up in', 'champions league']]]
[lifang535] entity_nodes (type=<class 'list'>): 
['', '0']
[lifang535] chunk_triple_entities (type=<class 'list'>): 
[['gerard piqu', 'arsenal', 'lionel messi', 'cesc f bregas', 'cadetes a', '2002 03', 'copa catalunya', 'copa catalunya final', 'england', 'partido de la m scara', 'espanyol', 'la masia', 'baby dream team', 'barcelona', 'royal spanish football federation'], ['ronaldinho', 'semi finals of copa del rey', '2006 fifa club world cup', 'his form', 'lionel messi', 'fc barcelona', '2006 07 season without trophies', 'frank rijkaard', 'internacional', 'la liga', 'champions league', 'diego maradona s goal of the century', 'getafe', 'pre season us tour', 'eto o', 'real madrid', 'liverpool']]
[lifang535] facts (type=<class 'list'>): 
[('heinz barwich', 'worked during', 'world war ii'), ('few integrated schools', 'were located in', 'new orleans')]
INFO:src.hipporag.TAG:Encoding Entities
INFO:src.hipporag.embedding_store:Inserting 0 new records, 101641 records already exist.
INFO:src.hipporag.TAG:Encoding Facts
INFO:src.hipporag.embedding_store:Inserting 0 new records, 125903 records already exist.
^CTraceback (most recent call last):
  File "/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/main_azure.py", line 280, in <module>
    main()
  File "/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/main_azure.py", line 275, in main
    hipporag.topic_index(docs)
  File "/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/src/hipporag/TAG.py", line 358, in topic_index
    time.sleep(100000) # lifang535 add
KeyboardInterrupt

(/data2-HDD-SATA-20T/nzq/env/hipporag) (fishspeech) nzq@algernon:/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG$ ls
 aggregated_cache.txt   images              outputs                               settings.yaml    tests_azure.py
 aggregated_json        LICENSE            'outputs copy'                         setup.py        'tests_azure_v0(original).py'
 CONTRIBUTING.md        main_azure.py      'outputs copy 2'                       src              tests_local.py
 demo_azure.py          main_azure_TAG.py  'outputs_v2(截取一下数据集跑跑试试)'   static           tests_openai.py
 demo_local.py          main_dpr.py         README.md                             TAG_data
 demo_openai.py         main.py             reproduce                            'TAG_data copy'
 demo.py                output              requirements.txt                      TAG_experiment
(/data2-HDD-SATA-20T/nzq/env/hipporag) (fishspeech) nzq@algernon:/data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG$ python main_azure.py --dataset musique --llm_base_url https://gpt-nzq-east-us.openai.azure.com/ --llm_name gpt-4o-mini --embedding_name nvidia/NV-Embed-v2
[lifang535] len(all_queries): 1000
[lifang535] len(gold_docs) = 1000
INFO:src.hipporag.TAG:Creating working directory: outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2
INFO:src.hipporag.prompts.prompt_template_manager:Loading templates from directory: /data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/src/hipporag/prompts/templates
INFO:datasets:PyTorch version 2.5.1 available.
INFO:datasets:Polars version 1.29.0 available.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.40s/it]
INFO:src.hipporag.embedding_store:Creating working directory: outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/chunk_embeddings
INFO:src.hipporag.embedding_store:Creating working directory: outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/entity_embeddings
INFO:src.hipporag.embedding_store:Creating working directory: outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/fact_embeddings
INFO:src.hipporag.prompts.prompt_template_manager:Loading templates from directory: /data2-HDD-SATA-20T/nzq/jmf/new_rag_2/experiment/HippoRAG/src/hipporag/prompts/templates
[lifang535] [TAG] [topic_index] len(docs): 11656
INFO:src.hipporag.TAG:Indexing Documents
INFO:src.hipporag.TAG:Performing OpenIE
[lifang535] [TAG] [topic_index] self.global_config.openie_mode != 'offline'
INFO:src.hipporag.embedding_store:Inserting 11656 new records, 0 records already exist.
Batch Encoding:   0%|                                                                                                                                                     | 0/11656 [00:00<?, ?it/s]
/data2-HDD-SATA-20T/nzq/huggingface_cache/modules/transformers_modules/nvidia/NV-Embed-v2/c50d55f43bde7e6a18e0eaa15a62fd63a930f1a1/modeling_nvembed.py:349: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  'input_ids': torch.tensor(batch_dict.get('input_ids').to(batch_dict.get('input_ids')).long()),
/data2-HDD-SATA-20T/nzq/env/hipporag/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
Batch Encoding: 11660it [13:03, 14.89it/s]
INFO:src.hipporag.embedding_store:Saving new records.
INFO:src.hipporag.embedding_store:Saved 11656 records to outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/chunk_embeddings/vdb_chunk.parquet
[lifang535] [TAG] [topic_index] len(chunk_keys_to_process) > 0
NER:   1%|█                                                                          | 168/11656 [00:23<23:17,  8.22it/s, total_prompt_tokens=44752, total_completion_tokens=10359, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d646a65f0 state=finished raised BadRequestError>]
NER:   4%|██▋                                                                       | 414/11656 [00:56<22:57,  8.16it/s, total_prompt_tokens=108087, total_completion_tokens=25668, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d646a4970 state=finished raised BadRequestError>]
NER:   9%|██████▉                                                                  | 1104/11656 [02:28<25:25,  6.92it/s, total_prompt_tokens=293231, total_completion_tokens=69808, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0ea0160 state=finished raised BadRequestError>]
NER:  19%|█████████████▉                                                          | 2250/11656 [05:00<22:27,  6.98it/s, total_prompt_tokens=600789, total_completion_tokens=142701, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0c94d60 state=finished raised BadRequestError>]
NER:  21%|███████████████▎                                                        | 2483/11656 [05:31<20:21,  7.51it/s, total_prompt_tokens=661043, total_completion_tokens=155901, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0d5e050 state=finished raised BadRequestError>]
NER:  23%|████████████████▌                                                       | 2681/11656 [05:57<19:37,  7.62it/s, total_prompt_tokens=713469, total_completion_tokens=167567, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f318802e860 state=finished raised BadRequestError>]
NER:  24%|█████████████████▎                                                      | 2795/11656 [06:12<19:43,  7.49it/s, total_prompt_tokens=741880, total_completion_tokens=173948, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0d89000 state=finished raised AssertionError>]
NER:  24%|█████████████████▎                                                      | 2796/11656 [06:12<18:58,  7.78it/s, total_prompt_tokens=742090, total_completion_tokens=174001, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d64773c40 state=finished raised BadRequestError>]
NER:  28%|████████████████████▎                                                   | 3282/11656 [07:15<17:12,  8.11it/s, total_prompt_tokens=872374, total_completion_tokens=204189, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0be80d0 state=finished raised BadRequestError>]
NER:  30%|█████████████████████▌                                                  | 3488/11656 [07:42<17:53,  7.61it/s, total_prompt_tokens=928857, total_completion_tokens=218257, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcff0280 state=finished raised BadRequestError>]
NER:  31%|█████████████████████▉                                                  | 3559/11656 [07:51<20:33,  6.57it/s, total_prompt_tokens=947443, total_completion_tokens=222464, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f3358c35960 state=finished raised BadRequestError>]
NER:  33%|███████████████████████▋                                               | 3895/11656 [08:35<16:41,  7.75it/s, total_prompt_tokens=1038514, total_completion_tokens=244246, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f3358c381c0 state=finished raised BadRequestError>]
NER:  34%|████████████████████████▍                                              | 4011/11656 [08:50<18:11,  7.01it/s, total_prompt_tokens=1067357, total_completion_tokens=250957, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f3358c395d0 state=finished raised AssertionError>]
NER:  35%|████████████████████████▋                                              | 4061/11656 [08:56<16:47,  7.54it/s, total_prompt_tokens=1079378, total_completion_tokens=253860, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0b76320 state=finished raised BadRequestError>]
NER:  39%|███████████████████████████▍                                           | 4501/11656 [09:55<16:45,  7.11it/s, total_prompt_tokens=1193696, total_completion_tokens=281611, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f31880cd150 state=finished raised BadRequestError>]
NER:  39%|███████████████████████████▊                                           | 4558/11656 [10:02<15:02,  7.86it/s, total_prompt_tokens=1209097, total_completion_tokens=285340, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0fb6830 state=finished raised BadRequestError>]
NER:  40%|████████████████████████████▍                                          | 4671/11656 [10:17<15:01,  7.75it/s, total_prompt_tokens=1236879, total_completion_tokens=291962, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f1012560 state=finished raised AssertionError>]
NER:  44%|███████████████████████████████▍                                       | 5151/11656 [11:20<14:42,  7.37it/s, total_prompt_tokens=1365065, total_completion_tokens=323554, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f36f0e74a90 state=finished raised BadRequestError>]
NER:  48%|█████████████████████████████████▊                                     | 5559/11656 [12:14<13:30,  7.52it/s, total_prompt_tokens=1471153, total_completion_tokens=348666, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcbc8a30 state=finished raised BadRequestError>]
NER:  50%|███████████████████████████████████▎                                   | 5806/11656 [12:47<13:48,  7.06it/s, total_prompt_tokens=1535743, total_completion_tokens=364205, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcb1f460 state=finished raised BadRequestError>]
NER:  50%|███████████████████████████████████▍                                   | 5823/11656 [12:49<12:03,  8.06it/s, total_prompt_tokens=1540266, total_completion_tokens=365181, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:unhashable type: 'dict'
NER:  51%|████████████████████████████████████▏                                  | 5941/11656 [13:05<13:25,  7.10it/s, total_prompt_tokens=1570869, total_completion_tokens=372604, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcb40f70 state=finished raised BadRequestError>]
NER:  52%|█████████████████████████████████████▎                                  | 6039/11656 [13:18<12:28,  7.50it/s, total_prompt_tokens=1.6e+6, total_completion_tokens=378308, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcc0eda0 state=finished raised BadRequestError>]
NER:  52%|█████████████████████████████████████▎                                  | 6044/11656 [13:18<10:47,  8.67it/s, total_prompt_tokens=1.6e+6, total_completion_tokens=378615, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfca36a70 state=finished raised BadRequestError>]
NER:  53%|█████████████████████████████████████▎                                 | 6126/11656 [13:29<12:10,  7.57it/s, total_prompt_tokens=1619795, total_completion_tokens=383745, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcd3f820 state=finished raised BadRequestError>]
NER:  54%|█████████████████████████████████████▉                                 | 6236/11656 [13:43<11:53,  7.60it/s, total_prompt_tokens=1649438, total_completion_tokens=391331, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcbecaf0 state=finished raised BadRequestError>]
NER:  54%|██████████████████████████████████████▏                                | 6266/11656 [13:47<11:21,  7.91it/s, total_prompt_tokens=1656988, total_completion_tokens=393317, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d6460a800 state=finished raised BadRequestError>]
NER:  54%|██████████████████████████████████████▍                                | 6311/11656 [13:53<12:17,  7.25it/s, total_prompt_tokens=1668291, total_completion_tokens=396041, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:unhashable type: 'dict'
NER:  56%|███████████████████████████████████████▋                               | 6524/11656 [14:21<11:17,  7.58it/s, total_prompt_tokens=1724171, total_completion_tokens=411110, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d64634160 state=finished raised AssertionError>]
NER:  57%|████████████████████████████████████████▎                              | 6609/11656 [14:32<11:16,  7.46it/s, total_prompt_tokens=1745758, total_completion_tokens=416360, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:invalid decimal literal (<string>, line 2)
NER:  57%|████████████████████████████████████████▌                              | 6666/11656 [14:39<10:16,  8.10it/s, total_prompt_tokens=1761370, total_completion_tokens=419968, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcabf730 state=finished raised BadRequestError>]
NER:  59%|██████████████████████████████████████████▏                            | 6925/11656 [15:13<09:57,  7.92it/s, total_prompt_tokens=1830673, total_completion_tokens=436252, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcef28f0 state=finished raised BadRequestError>]
NER:  61%|██████████████████████████████████████████▉                            | 7057/11656 [15:30<10:03,  7.62it/s, total_prompt_tokens=1865382, total_completion_tokens=444373, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f31880f10c0 state=finished raised BadRequestError>]
NER:  61%|███████████████████████████████████████████                            | 7073/11656 [15:32<10:15,  7.45it/s, total_prompt_tokens=1869504, total_completion_tokens=445152, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df032b430 state=finished raised BadRequestError>]
NER:  65%|█████████████████████████████████████████████▉                         | 7535/11656 [16:33<09:44,  7.05it/s, total_prompt_tokens=1985705, total_completion_tokens=472900, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df0281d80 state=finished raised BadRequestError>]
NER:  66%|███████████████████████████████████████████████                        | 7721/11656 [16:58<08:57,  7.32it/s, total_prompt_tokens=2033934, total_completion_tokens=484058, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df00c7d00 state=finished raised AssertionError>]
NER:  67%|███████████████████████████████████████████████▏                       | 7754/11656 [17:02<08:06,  8.01it/s, total_prompt_tokens=2043025, total_completion_tokens=486686, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f3358c37250 state=finished raised BadRequestError>]
NER:  68%|████████████████████████████████████████████████▎                      | 7923/11656 [17:25<09:29,  6.55it/s, total_prompt_tokens=2087143, total_completion_tokens=496908, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcd80820 state=finished raised AssertionError>]
NER:  70%|█████████████████████████████████████████████████▍                     | 8114/11656 [17:50<07:32,  7.82it/s, total_prompt_tokens=2136447, total_completion_tokens=508624, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f3358c35a50 state=finished raised BadRequestError>]
NER:  76%|██████████████████████████████████████████████████████▏                | 8887/11656 [19:33<05:50,  7.90it/s, total_prompt_tokens=2343964, total_completion_tokens=556864, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d44613f10 state=finished raised BadRequestError>]
NER:  78%|████████████████████████████████████████████████████████                | 9083/11656 [19:59<05:38,  7.61it/s, total_prompt_tokens=2.4e+6, total_completion_tokens=568990, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df0151cc0 state=finished raised BadRequestError>]
NER:  82%|██████████████████████████████████████████████████████████             | 9539/11656 [21:00<04:35,  7.68it/s, total_prompt_tokens=2516684, total_completion_tokens=600761, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:'[' was never closed (<string>, line 2)
NER:  84%|███████████████████████████████████████████████████████████▊           | 9810/11656 [21:36<03:59,  7.71it/s, total_prompt_tokens=2584666, total_completion_tokens=618447, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df00ebd90 state=finished raised BadRequestError>]
NER:  85%|████████████████████████████████████████████████████████████▎          | 9911/11656 [21:49<03:59,  7.28it/s, total_prompt_tokens=2611662, total_completion_tokens=624352, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df012fac0 state=finished raised BadRequestError>]
NER:  87%|████████████████████████████████████████████████████████████▌         | 10094/11656 [22:14<03:50,  6.77it/s, total_prompt_tokens=2659853, total_completion_tokens=635690, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df03928f0 state=finished raised BadRequestError>]
NER:  87%|█████████████████████████████████████████████████████████████         | 10159/11656 [22:23<03:09,  7.90it/s, total_prompt_tokens=2677931, total_completion_tokens=640077, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfca126b0 state=finished raised BadRequestError>]
NER:  93%|█████████████████████████████████████████████████████████████████▍    | 10897/11656 [24:00<01:41,  7.51it/s, total_prompt_tokens=2877321, total_completion_tokens=689982, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2df0170c10 state=finished raised BadRequestError>]
NER:  97%|████████████████████████████████████████████████████████████████████  | 11335/11656 [24:59<00:42,  7.57it/s, total_prompt_tokens=2992709, total_completion_tokens=717886, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2d84504a00 state=finished raised BadRequestError>]
NER:  98%|███████████████████████████████████████████████████████████████████████▎ | 11380/11656 [25:05<00:35,  7.86it/s, total_prompt_tokens=3e+6, total_completion_tokens=720749, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:RetryError[<Future at 0x7f2dfcb1f3a0 state=finished raised BadRequestError>]
NER: 100%|██████████████████████████████████████████████████████████████████████| 11656/11656 [25:42<00:00,  7.56it/s, total_prompt_tokens=3075144, total_completion_tokens=737121, num_cache_hit=0]
Extracting triples:   4%|██▏                                                        | 432/11656 [01:01<24:26,  7.65it/s, total_prompt_tokens=263086, total_completion_tokens=82494, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-5ff7f8ab2fc50688463ab16a8d816f1f: RetryError[<Future at 0x7f2d2229e830 state=finished raised BadRequestError>]
Extracting triples:   5%|██▊                                                       | 565/11656 [01:19<27:11,  6.80it/s, total_prompt_tokens=343250, total_completion_tokens=108104, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-c63ef69d492fa4cc80a8e332dbade29b: RetryError[<Future at 0x7f35ec73df90 state=finished raised AssertionError>]
Extracting triples:   9%|█████▎                                                   | 1076/11656 [02:31<25:57,  6.79it/s, total_prompt_tokens=661640, total_completion_tokens=213947, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-33f0b7401f090ad6f6f472585335e57c: RetryError[<Future at 0x7f2d2217bd90 state=finished raised AssertionError>]
Extracting triples:  20%|███████████▏                                            | 2328/11656 [05:25<22:07,  7.03it/s, total_prompt_tokens=1433163, total_completion_tokens=467490, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-b3a98b36b964c9264597d4c96e340194: RetryError[<Future at 0x7f2d442eaa10 state=finished raised AssertionError>]
Extracting triples:  24%|█████████████▍                                          | 2803/11656 [06:30<19:03,  7.74it/s, total_prompt_tokens=1719305, total_completion_tokens=556058, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-66eb3c20deee2a06b7a93239229b98ea: RetryError[<Future at 0x7f2d2217abf0 state=finished raised BadRequestError>]
Extracting triples:  24%|█████████████▌                                          | 2821/11656 [06:32<21:27,  6.86it/s, total_prompt_tokens=1729680, total_completion_tokens=559443, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-7916fb85b930d9979672af4a8f6f1331: RetryError[<Future at 0x7f2d22f011e0 state=finished raised AssertionError>]
Extracting triples:  35%|███████████████████▍                                    | 4047/11656 [09:21<19:47,  6.41it/s, total_prompt_tokens=2487431, total_completion_tokens=808336, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-2744a560f51e5872276d3c453f4a4800: RetryError[<Future at 0x7f2d222f2830 state=finished raised AssertionError>]
Extracting triples:  39%|█████████████████████▊                                  | 4547/11656 [10:30<15:49,  7.48it/s, total_prompt_tokens=2791499, total_completion_tokens=908344, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-354639604404b6c0d003371580bc627a: RetryError[<Future at 0x7f2d22dbbfd0 state=finished raised AssertionError>]
Extracting triples:  39%|██████████████████████▎                                  | 4556/11656 [10:31<15:08,  7.82it/s, total_prompt_tokens=2.8e+6, total_completion_tokens=910088, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-9a86b30ac0c4d5e96ab239ddbc558f27: RetryError[<Future at 0x7f2d22d2d660 state=finished raised BadRequestError>]
Extracting triples:  50%|███████████████████████████▍                           | 5819/11656 [13:27<13:38,  7.14it/s, total_prompt_tokens=3569577, total_completion_tokens=1160542, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-9cf59b2ff8707fa7f6c11c28fe33c6fd: RetryError[<Future at 0x7f2d22b399c0 state=finished raised BadRequestError>]
Extracting triples:  51%|████████████████████████████                           | 5949/11656 [13:45<11:49,  8.04it/s, total_prompt_tokens=3648071, total_completion_tokens=1186863, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-b5d9e2c4f8152938bcd2722a15d2c9dd: RetryError[<Future at 0x7f2d212e5240 state=finished raised BadRequestError>]
Extracting triples:  52%|█████████████████████████████▌                           | 6039/11656 [13:57<13:34,  6.90it/s, total_prompt_tokens=3.7e+6, total_completion_tokens=1.2e+6, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-fad8071ca97e6765659cc87972e19b02: RetryError[<Future at 0x7f2d22db8190 state=finished raised BadRequestError>]
Extracting triples:  54%|█████████████████████████████▋                         | 6284/11656 [14:31<11:37,  7.70it/s, total_prompt_tokens=3853297, total_completion_tokens=1255258, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-d447741092bb25ca7ebbfddd5b0fa83d: RetryError[<Future at 0x7f2d22178250 state=finished raised BadRequestError>]
Extracting triples:  57%|███████████████████████████████▏                       | 6602/11656 [15:15<10:36,  7.94it/s, total_prompt_tokens=4047349, total_completion_tokens=1320589, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-7f0936ca9c71c0e8872ff7bd908f1804: unterminated string literal (detected at line 10) (<string>, line 10)
Extracting triples:  60%|████████████████████████████████▊                      | 6961/11656 [16:04<09:48,  7.98it/s, total_prompt_tokens=4268401, total_completion_tokens=1392131, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-a00ea624e6ceb4385b6cdb9cb1c85b85: RetryError[<Future at 0x7f2d22f2b6d0 state=finished raised BadRequestError>]
Extracting triples:  61%|█████████████████████████████████▍                     | 7099/11656 [16:23<10:57,  6.93it/s, total_prompt_tokens=4352845, total_completion_tokens=1419640, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-2e3c338ddd80f2c67849b06151392c63: RetryError[<Future at 0x7f2d44158b20 state=finished raised AssertionError>]
Extracting triples:  62%|██████████████████████████████████▍                     | 7180/11656 [16:34<09:17,  8.04it/s, total_prompt_tokens=4.4e+6, total_completion_tokens=1435452, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-e9c0a1aba12f1f1ad135fdb071924683: RetryError[<Future at 0x7f36f0c95450 state=finished raised BadRequestError>]
Extracting triples:  65%|████████████████████████████████████▏                   | 7544/11656 [17:23<09:45,  7.02it/s, total_prompt_tokens=4618802, total_completion_tokens=1.5e+6, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-353f6ed8de75528705abcf9c18cd9d4d: RetryError[<Future at 0x7f36f0d88a00 state=finished raised AssertionError>]
Extracting triples:  69%|██████████████████████████████████████                 | 8073/11656 [18:36<08:16,  7.22it/s, total_prompt_tokens=4940454, total_completion_tokens=1608710, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-aff75a3440825261ed1d6a7b5b48c420: invalid syntax. Perhaps you forgot a comma? (<string>, line 8)
Extracting triples:  93%|██████████████████████████████████████████████████▍   | 10876/11656 [25:04<01:56,  6.71it/s, total_prompt_tokens=6669010, total_completion_tokens=2185161, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-6e0e0c9db67783fa73ede8e4a6e98e56: RetryError[<Future at 0x7f2d221b44c0 state=finished raised AssertionError>]
Extracting triples:  98%|████████████████████████████████████████████████████▋ | 11376/11656 [26:12<00:39,  7.15it/s, total_prompt_tokens=6976765, total_completion_tokens=2287937, num_cache_hit=0]WARNING:src.hipporag.information_extraction.openie_openai:Exception for chunk chunk-74b05225ddebc31937af96efd5b10a45: RetryError[<Future at 0x7f2d224071c0 state=finished raised BadRequestError>]
Extracting triples: 100%|██████████████████████████████████████████████████████| 11656/11656 [26:56<00:00,  7.21it/s, total_prompt_tokens=7145921, total_completion_tokens=2342583, num_cache_hit=0]
[lifang535] [TAG] [topic_index] self.global_config.save_openie
INFO:src.hipporag.TAG:OpenIE results saved to outputs/musique/openie_results_ner_gpt-4o-mini.json
[lifang535] chunk_ids (type=<class 'list'>): 
['chunk-0f19b42d483dc05cd5a62ea7dffa0864', 'chunk-34b4763cffb110a8f304510900b4f691']
[lifang535] chunk_triples (type=<class 'list'>): 
[[['lionel messi', 'enrolled in', 'royal spanish football federation'], ['lionel messi', 'joined', 'barcelona s youth academy'], ['barcelona s youth academy', 'is known as', 'la masia'], ['lionel messi', 'befriended', 'cesc f bregas'], ['lionel messi', 'befriended', 'gerard piqu'], ['lionel messi', 'became part of', 'baby dream team'], ['lionel messi', 'was top scorer in', '2002    03'], ['lionel messi', 'scored', '36 goals'], ['lionel messi', 'played for', 'cadetes a'], ['cadetes a', 'won', 'treble of the league'], ['cadetes a', 'won', 'spanish cup'], ['cadetes a', 'won', 'catalan cup'], ['copa catalunya', 'was a victory over', 'espanyol'], ['copa catalunya', 'is known as', 'partido de la m scara'], ['lionel messi', 'suffered', 'broken cheekbone'], ['lionel messi', 'received an offer from', 'arsenal'], ['cesc f bregas', 'left for', 'england'], ['gerard piqu', 'left for', 'england'], ['lionel messi', 'chose to remain in', 'barcelona']], [['fc barcelona', 'finished', '2006 07 season without trophies'], ['fc barcelona', 'had injuries to', 'eto o'], ['fc barcelona', 'had injuries to', 'lionel messi'], ['eto o', 'criticized', 'frank rijkaard'], ['eto o', 'criticized', 'ronaldinho'], ['ronaldinho', 'admitted', 'lack of fitness affected his form'], ['fc barcelona', 'were in', 'first place in la liga'], ['real madrid', 'overtook', 'fc barcelona'], ['fc barcelona', 'advanced to', 'semi finals of the copa del rey'], ['fc barcelona', 'won first leg against', 'getafe'], ['lionel messi', 'scored a goal bringing comparison to', 'diego maradona s goal of the century'], ['fc barcelona', 'lost second leg against', 'getafe'], ['fc barcelona', 'took part in', '2006 fifa club world cup'], ['fc barcelona', 'were beaten by', 'internacional'], ['fc barcelona', 'were knocked out of', 'champions league'], ['liverpool', 'were eventual runners up against', 'fc barcelona']]]
[lifang535] entity_nodes (type=<class 'list'>): 
['', '0']
[lifang535] chunk_triple_entities (type=<class 'list'>): 
[['england', 'espanyol', 'la masia', 'gerard piqu', 'arsenal', 'lionel messi', 'barcelona s youth academy', 'partido de la m scara', '36 goals', 'cesc f bregas', 'copa catalunya', 'spanish cup', 'catalan cup', 'cadetes a', 'barcelona', 'treble of the league', 'broken cheekbone', '2002    03', 'baby dream team', 'royal spanish football federation'], ['getafe', 'frank rijkaard', 'champions league', 'semi finals of the copa del rey', 'internacional', 'ronaldinho', 'fc barcelona', 'liverpool', '2006 07 season without trophies', 'lionel messi', 'eto o', '2006 fifa club world cup', 'diego maradona s goal of the century', 'first place in la liga', 'real madrid', 'lack of fitness affected his form']]
[lifang535] facts (type=<class 'list'>): 
[('soviet union', 'data found', 'incorrect information about navy size'), ('back home', 'released on', 'reprise records')]
INFO:src.hipporag.TAG:Encoding Entities
INFO:src.hipporag.embedding_store:Inserting 102122 new records, 0 records already exist.
Batch Encoding:   0%|                                                                                                                                                    | 0/102122 [00:00<?, ?it/s]/data2-HDD-SATA-20T/nzq/huggingface_cache/modules/transformers_modules/nvidia/NV-Embed-v2/c50d55f43bde7e6a18e0eaa15a62fd63a930f1a1/modeling_nvembed.py:349: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  'input_ids': torch.tensor(batch_dict.get('input_ids').to(batch_dict.get('input_ids')).long()),
/data2-HDD-SATA-20T/nzq/env/hipporag/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
Batch Encoding:  22%|█████████████████████████████▌                                                                                                          | 22230/102122 [04:12<17:31, 76.01it/s]
Batch Encoding: 102125it [19:03, 89.27it/s]
INFO:src.hipporag.embedding_store:Saving new records.
INFO:src.hipporag.embedding_store:Saved 102122 records to outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/entity_embeddings/vdb_entity.parquet
INFO:src.hipporag.TAG:Encoding Facts
INFO:src.hipporag.embedding_store:Inserting 127957 new records, 0 records already exist.
Batch Encoding: 127960it [32:57, 64.70it/s]                                                                                                                                                         
INFO:src.hipporag.embedding_store:Saving new records.
INFO:src.hipporag.embedding_store:Saved 127957 records to outputs/musique/gpt-4o-mini_nvidia_NV-Embed-v2/fact_embeddings/vdb_fact.parquet

