# Routoo

[![made-with-python](https://img.shields.io/badge/Made%20with-Python-red.svg)](#python)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 

Overview
-------------
### TL;DR

Routoo is a pioneering architecture that orchestrates expert LLMs to deliver state-of-the-art AI solutions. It's designed to dynamically select and leverage the best expert models for any given task, continuously evolving to incorporate new expertise and achieve superior performance with cost efficiency.

Content
-------------

- **[News Release](#news)**
- **[Installation](#installation)**
- **[Citation](#citation)**  


<a name="news"/>  

News Release
-------------

### Routoo: Next-level LLMs by Routoo

Routoo features a lightweight LLM that knows when and how to leverage underlying LLM experts to perform a task with the best outcome. We create multiple new state-of-the-art results:

- **State-of-the-art open-source**: When leveraging open-source models, the Routoo achieves **76% accuracy on the MMLU benchmark— with the same budget as Mixtral (70.6%)**.

- **Achieving and beyond GPT4**: Combining open-source models with GPT4, the Routoo nearly matches GPT4's performance at half the cost. Moreover, it even **surpasses GPT4’s performance with 25% less cost**.

- **Accessible**: Routoo can be served on accessible consumer hardware and can be deployed on any cloud provider or on-prem.

<a name="installation"/>  

Installation
-------------

Setup the repo and install required pakages.
```
pip intall -e .
```

### Running on Ec2 using Vllm
For users who favor EC2 on-premise private deployment, we have open-sourced an AWS AMI that comes preinstalled with PyTorch2 and VLLM. We provide demo configurations that facilitate deploying one model per instance. If you wish to deploy multiple models on the same instance, simply alter the tmux session name in the corresponding configuration. It is recommended to have the following environment variables prepared for the seamless launch of an instance declared in *.env* file :     
```py
AWS_ACCESS_KEY_ID       = "*****"
AWS_SECRET_ACCESS_KEY   = "*****"
SECURITY_GROUP_ID       = "*****"    
```
The SecurityGroupId is utilized in the instance creation process. It is imperative to ensure that the inference port is open for this specific security group.  
Refer this file *app/configs/demo_orch_ec2_mix.json* for preparing required configs for ec2 deploynent.

### Running on Sagemaker
Amazon SageMaker is a service designed to facilitate one-click private on-premise deployment of models, offering managed infrastructure, tools, and workflows. Users are required to set up their own execution role, thereby gaining access to SageMaker deployments. [link](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html). Following vaiables should be declared in *.env* file :   
```py
AWS_SAGEMAKER_ROLE_NAME = "*****"   
AWS_ACCESS_KEY_ID       = "*****"    
AWS_SECRET_ACCESS_KEY   = "*****"   
HUGGING_FACE_HUB_TOKEN  = "*****"   
```
Refer this file *app/configs/demo_orch_sagemaker_mix.json* for preparing required configs for sagemaker deployments.


### Closed Source Expert
We currently offer support for OpenAI text generation models, and API's that are callable using openai sdk. In the near future, we plan to extend our support to include all major closed-source language model APIs (Claude2, Gemini, etc). To establish a connection with OpenAI, the following variables must be specified in the .env file:
```py
OPENAI_ORGANIZATION     = "*****"
OPENAI_API_KEY          = "*****"
```


#### Run Routoo
```python
import json
from app.routoo import Routoo

config = json.load(open("app/configs/demo_orch_sagemaker_mix.json", "r"))

# init
routoo = Routoo(config)

# boot the machines
routoo.load_routoo_server()
routoo.load_experts_server()

# start the inference endpoints
routoo.start_inference_endpoints(max_wait_time=120)

# Wait until all endpoints are up
status = False
while not status:
    status = routoo.check_servers_state()
    if status: print("Servers are running..."); break
    time.sleep(30)

# Get text Generations, running the complete pipeline
response = routoo.get_response("If you could have a conversation with a fictional character, who would it be and why?")
print(response)

# turn off the machines
routoo.routoo.stop_server()
for expert_id, expert in routoo.experts.items():
    res = expert.stop_server()
    print(res)
```

<a name="citation"/>  

Citation
-------------  

If you use this code for your research, please cite the following work:  

