# Kurisu-G²

## Documentation/Download Section

## Kt_Gen Docs

## Installation

The project dependencies are managed with Poetry. Install them with:

```bash
poetry install
```

Then activate the virtual environment:

```bash
poetry env activate
```

spaCy also needs a language model, which is installed separately. For English:

```bash
python -m spacy download en_core_web_sm
```

or, for French:

```bash
python -m spacy download fr_core_news_sm
```
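To confirm that a model landed in the active environment, a small check like the one below may help. It is only a sketch: it relies on the fact that spaCy models are installed as regular Python packages, and the helper name `is_model_installed` is made up for illustration and is not part of the project.

```python
from importlib.util import find_spec


def is_model_installed(package_name: str) -> bool:
    """Return True if the given package can be found in this environment."""
    # spaCy models such as en_core_web_sm are installed as ordinary packages,
    # so looking up their import spec is enough to detect them.
    return find_spec(package_name) is not None


if __name__ == "__main__":
    model = "en_core_web_sm"
    status = "installed" if is_model_installed(model) else "missing"
    print(f"{model}: {status}")
```

Run it inside the Poetry environment, since a model installed elsewhere will not be visible to the project.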

The Kurisu-G² framework uses Ollama, which can be installed with:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Then pull the model:

```bash
ollama pull llama3:8b
```

Both the `kt_gen` and `faqtorisation` folders contain a `.env` file with a path pointing to the `all-MiniLM-L6-v2` folder. Fetch that folder with `git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2`.

Warning: this clone requires `git lfs` to download the model weights.
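If `git lfs` was missing at clone time, the large files in the folder are small text stubs instead of real weights. The sketch below looks for such stubs by checking for the Git LFS pointer header; the function name `find_lfs_pointers` and the example folder path are assumptions made for illustration.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> list[str]:
    """Return relative paths of files that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if ".git" in relative.parts:  # skip repository metadata
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(relative.as_posix())
    return pointers


if __name__ == "__main__":
    # Hypothetical location; use the path configured in your .env instead.
    folder = Path("all-MiniLM-L6-v2")
    if folder.is_dir():
        print(find_lfs_pointers(str(folder)) or "no LFS pointer stubs found")
```

A non-empty result suggests running `git lfs install` followed by `git lfs pull` inside the cloned folder.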

## Launch

I didn’t create Poetry run scripts. The main Streamlit pages exposing the features are grouped in the `st_app.py` app and can be accessed by running:

```bash
streamlit run src/kt_gen/st_app.py
```

## Project Structure

The project is organized as follows:
```
├── src
│   ├── kt_gen
│   │   ├── __init__.py
│   │   ├── st_app.py
│   │   ├── utils
│   │   │   ├── __init__.py
│   │   │   ├── llm
│   │   │   ├── streamlit
│   │   │   ├── model
│   │   ├── pages
│   │   ├── knowledge_graph
│   │   │   ├── utils
│   │   │   │   ├── pot_gpu
│   │   ├── examples
```

Scripts related to document graphs are in the `knowledge_graph` folder, utility scripts are in `utils`, and the Streamlit pages are in `pages`.  
All versions using the fused Gromov–Wasserstein (FGW) distance are in `kg_fgw.py`, and the baseline versions are in `kg_base_algo.py`.

Example document graphs are located in the `examples` folder.

The `.env` file sets the path to the chosen embedding model. By default, this is Hugging Face's `all-MiniLM-L6-v2`, which is downloaded from the web if nothing is specified in `.env`.
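The fallback described above can be sketched roughly as follows. This is not the project's actual code: the variable name `EMBEDDING_MODEL_PATH` and the function `resolve_embedding_model` are hypothetical, so check the `.env` file for the real key.

```python
import os

# Hub identifier used when no local path is configured.
DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def resolve_embedding_model(env=None) -> str:
    """Return the configured local model path, or the default Hub identifier."""
    env = os.environ if env is None else env
    # EMBEDDING_MODEL_PATH is a hypothetical variable name used for illustration.
    configured = env.get("EMBEDDING_MODEL_PATH", "").strip()
    return configured or DEFAULT_EMBEDDING_MODEL


if __name__ == "__main__":
    print(resolve_embedding_model())
```

The returned string can then be handed to the embedding loader; `sentence-transformers` accepts either a local directory or a Hub model id in the same argument.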

The Ollama localhost address is hardcoded to the default port 11434; I haven’t yet migrated this to the `.env`.
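One possible shape for that migration is shown below, as a sketch only. The variable name `OLLAMA_BASE_URL` and the helper `ollama_base_url` are assumptions and do not exist in the project today; the default mirrors the address that is currently hardcoded.

```python
import os

# Same address the project currently hardcodes (Ollama's default port).
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_base_url(env=None) -> str:
    """Return the Ollama base URL from the environment, or the local default."""
    env = os.environ if env is None else env
    # OLLAMA_BASE_URL is a hypothetical variable name, not one the project defines.
    url = env.get("OLLAMA_BASE_URL", "").strip()
    # Drop trailing slashes so that paths can be appended consistently.
    return (url or DEFAULT_OLLAMA_URL).rstrip("/")


if __name__ == "__main__":
    print(ollama_base_url())
```

Keeping the default in code means existing setups keep working when the variable is absent.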

The `utils/pot_gpu` folder contains scripts to run approximations of the fused Gromov–Wasserstein distance on GPU, which speeds up processing time on very large knowledge graphs.
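For readers unfamiliar with the distance, the sketch below evaluates the FGW objective for a fixed coupling in plain Python. It is not the code in `pot_gpu` and performs no optimization: it only shows the quantity being minimized, using the squared loss and the common convention of weighting the feature term by `1 - alpha` and the structure term by `alpha`. The function name `fgw_cost` is illustrative. The quadruple loop costs O(n²m²), which hints at why approximations and GPU execution matter on large graphs.

```python
def fgw_cost(M, C1, C2, T, alpha=0.5):
    """Evaluate the fused Gromov-Wasserstein objective for a given coupling T.

    M  : n x m matrix of feature distances between nodes of the two graphs
    C1 : n x n structure matrix of the first graph (e.g. shortest-path distances)
    C2 : m x m structure matrix of the second graph
    T  : n x m coupling (transport plan) between the two node sets
    """
    n, m = len(C1), len(C2)
    # Feature (Wasserstein) part: cost of matching node i to node j.
    feature_term = sum(M[i][j] * T[i][j] for i in range(n) for j in range(m))
    # Structure (Gromov-Wasserstein) part: mismatch between pairwise relations.
    structure_term = sum(
        (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * feature_term + alpha * structure_term


if __name__ == "__main__":
    # Two identical 2-node graphs with uniform node weights.
    C = [[0.0, 1.0], [1.0, 0.0]]
    M = [[0.0, 1.0], [1.0, 0.0]]  # node i has the same features as node i
    aligned = [[0.5, 0.0], [0.0, 0.5]]
    swapped = [[0.0, 0.5], [0.5, 0.0]]
    print(fgw_cost(M, C, C, aligned), fgw_cost(M, C, C, swapped))
```

The actual distance is the minimum of this cost over all couplings with the prescribed marginals, which is the part that solvers compute or approximate.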

## Streamlit Pages

The Streamlit pages are organized as follows:
```
├── pages
│   ├── kg_text_brut.py
│   ├── st_knowledge_cards.py
│   ├── st_question_generation.py
│   ├── st_graph_llm_all.py
```

The `kg_text_brut.py` page lets you create a document graph from raw text.  
The `st_graph_llm_all.py` page is used to test the GPU implementation of the fused Gromov–Wasserstein distance.  
The `st_question_generation.py` page generates questions from a graph.  
The `st_knowledge_cards.py` page generates documentation by retrieving context and clustering questions.
