- Task overview:
You should formalize a player-blackbox interaction process into runnable Python code. Focus on three things:
  - Generate the blackbox function, which implements an interactive puzzle. It must check the validity of the player's queries and give feedback according to the truth.
  - Generate a boolean format-checking function, which checks the format of the answer when the player submits one.
  - Generate the main function, which calls the essential functions.


- Detailed Coding Instructions:
  - Before starting, here are all the required imports and the path setup. Place these lines at the beginning of the program, in this order:
    - `import os`
    - `import sys`
    - `current_path = os.path.abspath(__file__)`
    - `oracle_path = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(current_path))))`
    - `if oracle_path not in sys.path:`
    - `    sys.path.insert(0, oracle_path)`
    - `from eva_models import ReasoningLLM`
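
As an illustration of the path setup above: the four nested `os.path.dirname` calls climb four directory levels from the current file, so `oracle_path` is the folder expected to contain `eva_models`. Below is a minimal sketch using a hypothetical helper `ancestor_dir` and a made-up file path (neither is part of the required program). It uses `posixpath` so the result is the same on every platform.

```python
import posixpath


def ancestor_dir(path, levels):
    """Apply dirname `levels` times (hypothetical helper, for illustration only)."""
    for _ in range(levels):
        path = posixpath.dirname(path)
    return path


# Made-up layout: the script sits three folders below the project root.
example_file = "/project/tasks/puzzle/easy/run.py"
oracle_path = ancestor_dir(example_file, 4)
print(oracle_path)  # /project
```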

  - Generate the code of a blackbox:
    - Write a function named `blackbox`. It implements {algorithm}. The algorithm is described as follows: {description}. 
    - Note that `blackbox` only answers each query; it must not include any other functionality.
    - The function takes two arguments, `truth` and `query`, and returns feedback according to the description above. `truth` is a string representing the correct answer (i.e. the answer the player must finally give), and `query` is a string representing the player's query.
    - If `query` does not follow the correct format, `return` a `str` that tells the player how to correct it.
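
To show the expected shape of `blackbox`, here is a minimal sketch for a hypothetical number-guessing puzzle (a hidden integer from 1 to 100, with "higher" / "lower" / "correct" feedback). The puzzle, the range and the feedback wording are assumptions for illustration only; the real function must follow the algorithm described above.

```python
def blackbox(truth, query):
    """Answer one query for a hypothetical number-guessing puzzle."""
    guess = query.strip()
    # Check the query format first and return correcting advice as a str.
    if not (guess.isascii() and guess.isdigit()):
        return "Invalid query: please send a single whole number, e.g. 42."
    value = int(guess)
    if not 1 <= value <= 100:
        return "Invalid query: the number must be between 1 and 100."
    # Give feedback according to the truth (assumed to be a valid number string).
    secret = int(truth)
    if value < secret:
        return "higher"
    if value > secret:
        return "lower"
    return "correct"
```

Note that both a malformed query and a well-formed one produce a returned string; the function never raises and keeps no state between calls.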

  - Generate the boolean format-checking function:
    - Write a function named `check_answer_format`. It checks whether the format of the answer matches that of `truth`, according to the description above.
    - The function takes exactly one argument, `answer`, and returns `True` or `False`.
    - Note that `answer` is given directly in the form described above. Do not assume that `answer` includes any other text.
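
A minimal sketch of `check_answer_format`, again assuming a hypothetical puzzle whose answer is a whole number from 1 to 100 written with digits only. The accepted format is an assumption for illustration; the real check must mirror the format of `truth` in the description above.

```python
def check_answer_format(answer):
    """Return True only if answer is a bare integer string in 1..100 (assumed format)."""
    if not isinstance(answer, str):
        return False
    # Reject anything that is not purely ASCII digits, including surrounding text.
    if not (answer.isascii() and answer.isdigit()):
        return False
    return 1 <= int(answer) <= 100
```

The function checks only the format, not correctness: a wrong but well-formed answer still yields `True`.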

  - Generate the main code:
    - The main function takes the following input variables: `main(model_family, model_name, task, eva_mode, n_runs, difficulty, task_id, failure_num, output_dir, max_turns, version, mode, thinking_mode)`.
    - First, the main code needs to instantiate the `ReasoningLLM` class through `player = ReasoningLLM(model_family, model_name, task, eva_mode, n_runs, difficulty, task_id, thinking_mode, mode)`.
    - Then call `player.evaluate(failure_num, version, max_turns)` and `player.save_history(output_dir, version)` to run the evaluation and save the logs.
    - Finish the main function.
    - The main function must contain only the code above; do not include any other code.
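
The steps above can be sketched as follows. So that the sketch runs on its own, it defines a stand-in `ReasoningLLM` class that only records calls in a hypothetical `CALL_LOG` list; in the real program the class comes from `eva_models` and the stand-in must not appear.

```python
CALL_LOG = []  # hypothetical log, used only by the stand-in class below


class ReasoningLLM:
    """Stand-in for eva_models.ReasoningLLM (assumption, illustration only)."""

    def __init__(self, model_family, model_name, task, eva_mode, n_runs,
                 difficulty, task_id, thinking_mode, mode):
        self.task = task

    def evaluate(self, failure_num, version, max_turns):
        CALL_LOG.append(("evaluate", failure_num, version, max_turns))

    def save_history(self, output_dir, version):
        CALL_LOG.append(("save_history", output_dir, version))


def main(model_family, model_name, task, eva_mode, n_runs, difficulty, task_id,
         failure_num, output_dir, max_turns, version, mode, thinking_mode):
    # Note the constructor order: thinking_mode comes before mode.
    player = ReasoningLLM(model_family, model_name, task, eva_mode, n_runs,
                          difficulty, task_id, thinking_mode, mode)
    player.evaluate(failure_num, version, max_turns)
    player.save_history(output_dir, version)
```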

  - Add `if __name__ == "__main__":`, `args = sys.argv[1:]`, `main(args[0], args[1], args[2], args[3], int(args[4]), args[5], args[6], int(args[7]), args[8], int(args[9]), int(args[10]), args[11], bool(eval(args[12])))` at the end of the program to make it runnable.
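
To make the type conversions in that call easier to see, here is a minimal sketch that wraps them in a hypothetical helper `parse_cli_args` (the helper name and the sample values are assumptions; the required program passes the converted values to `main` directly). The last argument goes through `eval` because `bool("False")` is `True` for any non-empty string; `eval` executes whatever text it receives, so it should only see trusted input.

```python
def parse_cli_args(args):
    """Convert the 13 raw command-line strings into the types main() expects."""
    return (
        args[0],               # model_family
        args[1],               # model_name
        args[2],               # task
        args[3],               # eva_mode
        int(args[4]),          # n_runs
        args[5],               # difficulty
        args[6],               # task_id
        int(args[7]),          # failure_num
        args[8],               # output_dir
        int(args[9]),          # max_turns
        int(args[10]),         # version
        args[11],              # mode
        bool(eval(args[12])),  # thinking_mode: the text "True" or "False"
    )


# Made-up sample values, as they would arrive in sys.argv[1:].
sample = ["fam", "model", "task", "eva", "2", "easy", "0",
          "3", "out", "10", "1", "m", "False"]
parsed = parse_cli_args(sample)
```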


- Tips: DO NOT output any other text. Make sure the output is runnable. Code and think step by step.