=== Instruction 1 ===
Is Correct: False
Syntax Pass: False
Code Function: The code defines a function to run a simulation task involving a robotic gripper. It begins by setting up the environment and resetting the task to its initial state. It optionally initializes video recording for the simulation. The function retrieves the positions of various objects in the environment, including a drawer and a bin.

The task consists of a series of steps: first, the gripper rotates to a specific angle, then it moves to the side position of a bottom drawer, followed by moving to the anchor position of the drawer. Next, the gripper grasps the drawer, pulls it open, picks up rubbish from a table, and finally places the rubbish into a bin. After each step, the function checks whether the task has finished; if so, it prints a message and returns early. A `finally` block shuts the environment down however the function exits.
Similarity: {'ok': False, 'reason': 'LLM JSON parsing failed.'}
Differences: 1. **Nature of Content**: TASK is a formal representation of a planning problem in PDDL (Planning Domain Definition Language), while CODE_FUNC describes a procedural function for simulating a robotic task in a programming context.
2. **Structure**: TASK is structured in terms of domain definitions, types, predicates, and actions, whereas CODE_FUNC is structured as a sequence of procedural steps in a function.
3. **Purpose**: TASK is designed for automated planning and reasoning about actions in a defined domain, while CODE_FUNC is intended for executing a specific sequence of actions in a simulation environment.
4. **Level of Abstraction**: TASK operates at a higher level of abstraction, focusing on the logical relationships and requirements of actions, while CODE_FUNC operates at a lower level, detailing the specific steps and operations to be performed.
5. **Execution**: TASK does not execute actions but defines how they can be executed under certain conditions, whereas CODE_FUNC explicitly executes actions in a defined order.
6. **Environment Interaction**: TASK defines the initial state and conditions for actions in a planning context, while CODE_FUNC interacts with the environment in real-time, performing actions and checking for task completion.
7. **Output**: TASK does not produce output in the form of messages or results; it is concerned with the feasibility of plans. CODE_FUNC includes print statements to indicate the progress and completion of the task.
8. **Initialization**: TASK initializes the state of the world through the `:init` section, while CODE_FUNC may include additional setup steps, such as video recording initialization, which are not present in TASK.
9. **Action Representation**: In TASK, actions are defined with preconditions and effects, while in CODE_FUNC, actions are represented as sequential function calls without explicit precondition and effect definitions.

Overall, the two representations serve different purposes and operate at different levels of abstraction within the context of robotic task execution and planning.
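
To make point 9 concrete, here is a minimal sketch of what an action with explicit preconditions and effects could look like in Python, in contrast to a bare function call. The predicate names (`holding-drawer`, `drawer-closed`, `drawer-open`) are hypothetical and chosen for illustration; they are not taken from the TASK domain, which is not reproduced in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts that become true afterwards
    del_effects: frozenset    # facts that stop being true afterwards

    def applicable(self, state: frozenset) -> bool:
        # An action is applicable when every precondition is in the state.
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            missing = sorted(self.preconditions - state)
            raise ValueError(f"{self.name}: unmet preconditions {missing}")
        return (state - self.del_effects) | self.add_effects


# Hypothetical "pull" action; predicate names are illustrative only.
pull_drawer = Action(
    name="pull",
    preconditions=frozenset({"holding-drawer", "drawer-closed"}),
    add_effects=frozenset({"drawer-open"}),
    del_effects=frozenset({"drawer-closed"}),
)

state = frozenset({"holding-drawer", "drawer-closed"})
next_state = pull_drawer.apply(state)
```

A planner reasons over such state sets before anything runs, whereas the procedural function simply calls its skills in order and trusts that each one leaves the world in the state the next one needs.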

Refined Code Generated:
-------------------
import numpy as np
from env import setup_environment, shutdown_environment
from skill_code import *
from video import init_video_writers, recording_step, recording_get_observation
from object_positions import get_object_positions

def run_skeleton_task():
    '''Generic skeleton for running any task in your simulation.'''
    print("===== Starting Skeleton Task =====")
    
    # === Environment Setup ===
    env, task = setup_environment()
    try:
        # Reset the task to its initial state
        _, obs = task.reset()

        # (Optional) Initialize video writers for capturing your simulation
        init_video_writers(obs)

        # Wrap the task steps for recording (if needed)
        original_step = task.step
        task.step = recording_step(original_step)
        original_get_obs = task.get_observation
        task.get_observation = recording_get_observation(original_get_obs)

        # === Retrieve Object Positions ===
        positions = get_object_positions()

        # Define object positions
        bottom_anchor_pos = positions['bottom_anchor_pos']
        bottom_side_pos = positions['bottom_side_pos']
        rubbish_pos = positions['rubbish']
        bin_pos = positions['bin']

        # === Execute the Plan ===
        # Step 1: Rotate the gripper to ninety degrees
        obs, reward, done = rotate(env, task, target_quat=np.array([0, 0, np.sin(np.pi/4), np.cos(np.pi/4)]))
        if done:
            print("[Task] Task ended after rotating!")
            return

        # Step 2: Move to the side position of the bottom drawer
        obs, reward, done = move(env, task, bottom_side_pos)
        if done:
            print("[Task] Task ended after moving to side position!")
            return

        # Step 3: Move to the anchor position of the bottom drawer
        obs, reward, done = move(env, task, bottom_anchor_pos)
        if done:
            print("[Task] Task ended after moving to anchor position!")
            return

        # Step 4: Pick the bottom drawer
        obs, reward, done = pick_drawer(env, task, 'gripper', 'bottom', bottom_anchor_pos)
        if done:
            print("[Task] Task ended after picking the drawer!")
            return

        # Step 5: Pull the bottom drawer open
        obs, reward, done = pull(env, task, pull_distance=0.1)
        if done:
            print("[Task] Task ended after pulling the drawer!")
            return

        # Step 6: Pick the rubbish from the table
        obs, reward, done = pick(env, task, rubbish_pos)
        if done:
            print("[Task] Task ended after picking the rubbish!")
            return

        # Step 7: Place the rubbish in the bin
        obs, reward, done = place(env, task, bin_pos)
        if done:
            print("[Task] Task completed successfully! Reward:", reward)
        else:
            print("[Task] Task not completed yet (done=False).")

    finally:
        # Always ensure the environment is properly shut down
        shutdown_environment(env)

    print("===== End of Skeleton Task =====")

if __name__ == "__main__":
    run_skeleton_task()
-------------------
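
The `target_quat` passed to `rotate` in Step 1 can be sanity-checked with a few lines of standard-library code. This is a sketch that assumes the (x, y, z, w) component order; the convention actually used by `rotate` is not shown in this log. The helper names `quat_about_z` and `rotation_angle` are illustrative.

```python
import math


def quat_about_z(angle_rad):
    """Unit quaternion (x, y, z, w) for a rotation of angle_rad about the z-axis."""
    half = angle_rad / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))


def rotation_angle(quat_xyzw):
    """Rotation angle in radians encoded by a unit quaternion (x, y, z, w)."""
    x, y, z, w = quat_xyzw
    return 2.0 * math.atan2(math.sqrt(x * x + y * y + z * z), w)


# A quarter turn about z: components (0, 0, sin(pi/4), cos(pi/4)).
ninety = quat_about_z(math.pi / 2)
angle_deg = math.degrees(rotation_angle(ninety))

# For comparison, (0, 0, 1, 0) under the same convention is a half turn.
other_deg = math.degrees(rotation_angle((0.0, 0.0, 1.0, 0.0)))
```

Under this assumption, `[0, 0, sin(pi/4), cos(pi/4)]` encodes a 90 degree rotation, while `[0, 0, 1, 0]` encodes 180 degrees.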

=== Instruction 2 ===
Is Correct: False
Syntax Pass: False
Code Function: The code defines a function that simulates a robotic task in an environment. It begins by setting up the environment and resetting the task to its initial state. It optionally initializes video recording for the simulation. The function retrieves the positions of various objects, including rubbish and a bin.

The task consists of a series of steps: first, the robot rotates its gripper, then moves to the side position of a bottom drawer, followed by moving to the anchor position of the drawer. Next, it grasps the bottom drawer and pulls it open. After that, the robot picks up the rubbish and finally places it into the bin. After each step, the function checks whether the task has finished and prints a corresponding message. A `finally` block shuts the environment down however the function exits.
Similarity: {'ok': False, 'reason': 'LLM JSON parsing failed.'}
Differences: 1. **Nature of Content**: 
   - TASK is a formal representation of a robotic task using PDDL (Planning Domain Definition Language), which includes definitions of actions, predicates, and the environment.
   - CODE_FUNC is a description of a procedural function that simulates the robotic task in a programming context, detailing the sequence of operations and checks.

2. **Format**:
   - TASK is structured in a domain-specific language (PDDL) with a focus on planning and action definitions.
   - CODE_FUNC is written in a narrative format, describing the function's behavior and steps in a more human-readable way.

3. **Level of Abstraction**:
   - TASK operates at a higher level of abstraction, focusing on the logical structure of actions and their preconditions/effects.
   - CODE_FUNC provides a lower-level, step-by-step procedural description of how the task is executed in a simulated environment.

4. **Execution Context**:
   - TASK is designed for use in a planning system that can interpret PDDL and generate plans based on the defined actions and predicates.
   - CODE_FUNC is intended for execution in a programming environment, likely as part of a simulation or robotic control system.

5. **Detailing of Actions**:
   - TASK specifies actions in terms of their preconditions and effects, focusing on the logical relationships between actions.
   - CODE_FUNC describes the sequence of actions in a linear fashion, detailing the order of operations without explicitly stating preconditions and effects.

6. **Initialization and Environment Setup**:
   - TASK includes initialization of objects and their states within the PDDL framework.
   - CODE_FUNC describes the setup of the environment and the resetting of the task in a procedural manner, including optional video recording.

7. **Feedback and Output**:
   - TASK does not include any feedback mechanism or output messages; it is purely a definition of actions and states.
   - CODE_FUNC includes print statements to provide feedback on the progress of the task, indicating when steps are completed.

8. **Error Handling**:
   - TASK does not address error handling or task completion checks.
   - CODE_FUNC includes checks to determine if the task is completed and manages the shutdown of the environment.

9. **Focus on Object Interaction**:
   - TASK focuses on the logical conditions for interacting with objects (e.g., whether a drawer is locked or open).
   - CODE_FUNC describes the physical interaction with objects in a sequence, detailing how the robot manipulates the drawer and rubbish.

10. **Terminology and Language**:
    - TASK uses formal terminology specific to PDDL, such as "predicates," "effects," and "preconditions."
    - CODE_FUNC uses more general programming terminology and constructs, such as "function," "steps," and "initial state."

In summary, the semantic differences between TASK and CODE_FUNC are significant, reflecting their distinct purposes and contexts.
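
The shutdown behaviour mentioned in point 8 relies on Python's `try`/`finally`: the cleanup runs whether the function returns early or runs to the end. A minimal sketch, using a hypothetical `FakeEnv` stand-in rather than the real simulation environment:

```python
class FakeEnv:
    """Hypothetical stand-in for the simulation environment."""

    def __init__(self):
        self.events = []

    def shutdown(self):
        self.events.append("shutdown")


def run_task(env, done_after_first_step):
    try:
        env.events.append("step 1")
        if done_after_first_step:
            return "ended early"  # the finally block still runs
        env.events.append("step 2")
        return "ran to end"
    finally:
        env.shutdown()


early_env = FakeEnv()
early_result = run_task(early_env, done_after_first_step=True)

full_env = FakeEnv()
full_result = run_task(full_env, done_after_first_step=False)
```

In both runs `shutdown` is the last recorded event, which is the guarantee the generated code depends on when it returns from inside the `try` block.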

Refined Code Generated:
-------------------
import numpy as np
from env import setup_environment, shutdown_environment
from skill_code import *
from video import init_video_writers, recording_step, recording_get_observation
from object_positions import get_object_positions

def run_skeleton_task():
    '''Generic skeleton for running any task in your simulation.'''
    print("===== Starting Skeleton Task =====")
    
    # === Environment Setup ===
    env, task = setup_environment()
    try:
        # Reset the task to its initial state
        _, obs = task.reset()

        # (Optional) Initialize video writers for capturing your simulation
        init_video_writers(obs)

        # Wrap the task steps for recording (if needed)
        original_step = task.step
        task.step = recording_step(original_step)
        original_get_obs = task.get_observation
        task.get_observation = recording_get_observation(original_get_obs)

        # === Retrieve Object Positions ===
        positions = get_object_positions()

        # Define object positions
        rubbish_pos = positions['rubbish']
        bin_pos = positions['bin']
        bottom_anchor_pos = positions['bottom_anchor_pos']

        # === Execute the Plan ===
        # Step 1: Rotate the gripper to ninety degrees
        print("[Task] Rotating gripper to ninety degrees.")
        obs, reward, done = rotate(env, task, target_quat=np.array([0, 0, np.sin(np.pi/4), np.cos(np.pi/4)]), max_steps=100, threshold=0.05, timeout=10.0)
        if done:
            print("[Task] Task ended after rotating the gripper!")
            return

        # Step 2: Move to the side position of the bottom drawer
        print("[Task] Moving to side position of the bottom drawer.")
        obs, reward, done = move(env, task, target_pos=positions['bottom_side_pos'], max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to side position!")
            return

        # Step 3: Move to the anchor position of the bottom drawer
        print("[Task] Moving to anchor position of the bottom drawer.")
        obs, reward, done = move(env, task, target_pos=bottom_anchor_pos, max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to anchor position!")
            return

        # Step 4: Pick the bottom drawer
        print("[Task] Picking the bottom drawer.")
        obs, reward, done = pick(env, task, target_pos=bottom_anchor_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task ended after picking the drawer!")
            return

        # Step 5: Pull open the bottom drawer
        print("[Task] Pulling open the bottom drawer.")
        obs, reward, done = pull(env, task, pull_distance=0.1, pull_axis='x', max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after pulling the drawer!")
            return

        # Step 6: Pick the rubbish
        print("[Task] Picking the rubbish.")
        obs, reward, done = pick(env, task, target_pos=rubbish_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task ended after picking the rubbish!")
            return

        # Step 7: Place the rubbish into the bin
        print("[Task] Placing the rubbish into the bin.")
        obs, reward, done = place(env, task, target_pos=bin_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task completed successfully! Reward:", reward)
        else:
            print("[Task] Task not completed yet (done=False).")

    finally:
        # Always ensure the environment is properly shut down
        shutdown_environment(env)

    print("===== End of Skeleton Task =====")

if __name__ == "__main__":
    run_skeleton_task()
-------------------
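
The seven call/check/return stanzas in the code above share a shape and several keyword arguments, so one possible refactoring is a table-driven loop. The sketch below uses a stub, `fake_skill`, because the real skills in `skill_code` are not shown in this log; the stub only mimics the `(obs, reward, done)` return shape. `COMMON`, `run_plan`, and the `finish` flag are illustrative names, not part of the original code.

```python
# Stand-in for move/pick/place: records its kwargs and reports done
# when the hypothetical "finish" flag is set.
def fake_skill(env, task, **kwargs):
    task["calls"].append(kwargs)
    return None, 0.0, kwargs.get("finish", False)


# Keyword arguments shared by every step of the plan.
COMMON = {"max_steps": 100, "timeout": 10.0}


def run_plan(env, task, plan):
    """Execute (label, skill, kwargs) entries; stop at the first done=True."""
    reward = 0.0
    for label, skill, kwargs in plan:
        obs, reward, done = skill(env, task, **{**COMMON, **kwargs})
        if done:
            return label, reward
    return None, reward


task = {"calls": []}
plan = [
    ("move to side", fake_skill, {"threshold": 0.01}),
    ("pick rubbish", fake_skill, {"threshold": 0.01, "finish": True}),
    ("place in bin", fake_skill, {"threshold": 0.01}),
]
stopped_at, reward = run_plan(None, task, plan)
```

The trade-off is that per-step messages and special cases, such as the different success message after the final step, need extra fields in each plan entry.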

=== Instruction 3 ===
Is Correct: False
Syntax Pass: False
Code Function: The code defines a function to run a simulation task involving robotic manipulation. It begins by setting up the environment and resetting the task to its initial state. It optionally initializes video recording for the simulation. The function retrieves the positions of various objects in the environment, including anchors, sides, rubbish, and a bin.

The task consists of a series of steps where the robot performs specific actions: it rotates its gripper, moves to designated positions, picks up a drawer, pulls it out, picks up rubbish, and finally places the rubbish in a bin. After each action, the function checks if the task is completed and prints a message accordingly. If the task is completed successfully, it reports the reward received. Regardless of the outcome, the environment is shut down at the end of the function. The function is executed when the script is run as the main program.
Similarity: {'ok': False, 'reason': 'LLM JSON parsing failed.'}
Differences: 1. **Purpose**: 
   - TASK describes a specific instruction for a robot to follow, while CODE_FUNC outlines a function that implements a simulation of robotic manipulation.

2. **Structure**: 
   - TASK is presented in a natural language format, whereas CODE_FUNC is structured as a programming function with defined steps and logic.

3. **Detail Level**: 
   - TASK provides a high-level overview of the actions to be performed, while CODE_FUNC includes detailed procedural steps and checks for task completion.

4. **Execution Context**: 
   - TASK is a directive for action, while CODE_FUNC is a runnable piece of code that executes the task in a simulated environment.

5. **Feedback Mechanism**: 
   - TASK does not specify feedback or results, while CODE_FUNC includes mechanisms to report task completion and rewards.

6. **Initialization**: 
   - TASK assumes an initial state without detailing it, while CODE_FUNC explicitly initializes the environment and resets the task state.

7. **Recording**: 
   - TASK does not mention any recording of the task, while CODE_FUNC includes an option to initialize video recording.

8. **Object Interaction**: 
   - TASK mentions picking up rubbish and placing it in a trash can, while CODE_FUNC details the specific actions of rotating the gripper, moving to positions, and interacting with drawers.

9. **Completion Check**: 
   - TASK does not include any checks for task completion, while CODE_FUNC has checks after each action to determine if the task is completed.

10. **Shutdown Procedure**: 
    - TASK does not mention shutting down the environment, while CODE_FUNC includes a shutdown step at the end of the function.

11. **Execution Trigger**: 
    - TASK is a standalone instruction, while CODE_FUNC is designed to be executed as part of a script when run as the main program.

12. **Action Sequence**: 
    - TASK outlines a sequence of actions in a general sense, while CODE_FUNC specifies the exact order and conditions under which these actions are performed.

13. **Error Handling**: 
    - TASK does not address error handling, while CODE_FUNC wraps execution in `try`/`finally` so that the environment is shut down even if a step raises an exception.

14. **Language**: 
    - TASK uses natural language, while CODE_FUNC uses programming language syntax.

15. **Focus**: 
    - TASK focuses on the "what" of the actions, while CODE_FUNC focuses on the "how" of executing those actions programmatically.

16. **Abstraction Level**: 
    - TASK is more abstract and conceptual, while CODE_FUNC is concrete and implementation-focused.

17. **Object Types**: 
    - TASK does not specify types of objects involved, while CODE_FUNC explicitly defines and interacts with various object types (e.g., gripper, drawer, rubbish).

18. **Action Definitions**: 
    - TASK does not define actions, while CODE_FUNC includes specific action definitions and their parameters.

19. **State Management**: 
    - TASK does not address state management, while CODE_FUNC manages the state of the environment and objects throughout the simulation.

20. **Simulation Context**: 
    - TASK is a directive for a physical or conceptual task, while CODE_FUNC is specifically designed for a simulation context.

21. **Reward Mechanism**: 
    - TASK does not mention any reward system, while CODE_FUNC includes a reward reporting mechanism upon successful task completion.

22. **Finality**: 
    - TASK does not indicate a final state or outcome, while CODE_FUNC concludes with a shutdown of the environment.

23. **User Interaction**: 
    - TASK is intended for a robot to follow, while CODE_FUNC may involve user interaction through script execution.

24. **Flexibility**: 
    - TASK is a fixed instruction, while CODE_FUNC may allow for variations in execution based on parameters or conditions.

25. **Documentation**: 
    - TASK lacks documentation or comments, while CODE_FUNC may include comments or documentation within the code for clarity.

26. **Implementation Specificity**: 
    - TASK is generic and can be interpreted in various ways, while CODE_FUNC is specific to a particular implementation of robotic manipulation.

27. **Environment Setup**: 
    - TASK does not describe the environment setup, while CODE_FUNC includes steps to set up the environment for the simulation.

28. **Action Granularity**: 
    - TASK describes actions at a high level, while CODE_FUNC breaks down actions into granular steps with specific conditions.

29. **Task Complexity**: 
    - TASK presents a simple task, while CODE_FUNC may involve complex interactions and multiple steps to achieve the task.

30. **Outcome Reporting**: 
    - TASK does not report outcomes, while CODE_FUNC includes reporting mechanisms for task success and rewards.

31. **State Transitions**: 
    - TASK does not detail state transitions, while CODE_FUNC changes state only implicitly, through the side effects of successive skill calls; it defines no explicit preconditions or effects.

32. **Action Parameters**: 
    - TASK does not specify parameters for actions, while CODE_FUNC defines parameters for each action in the simulation.

33. **Logical Flow**: 
    - TASK presents a linear flow of actions, while CODE_FUNC adds a conditional early return after each action; it contains no loops.

34. **Error Reporting**: 
    - TASK does not include error reporting, while CODE_FUNC may include mechanisms to handle and report errors during execution.

35. **Action Dependencies**: 
    - TASK does not specify dependencies between actions, while CODE_FUNC encodes dependencies only through the fixed order of its calls.

36. **Simulation Control**: 
    - TASK does not control simulation aspects, while CODE_FUNC includes control over the simulation environment and execution flow.

37. **Resource Management**: 
    - TASK does not address resource management, while CODE_FUNC may involve managing resources like memory and processing during execution.

38. **Testing and Validation**: 
    - TASK does not include testing or validation steps, while CODE_FUNC may involve testing the function's behavior during execution.

39. **Modularity**: 
    - TASK is a single instruction, while CODE_FUNC may be part of a larger modular codebase with reusable components.

40. **Scalability**: 
    - TASK is not designed for scalability, while CODE_FUNC may be structured to allow for scalability in simulation tasks.

41. **Performance Metrics**: 
    - TASK does not define performance metrics, while CODE_FUNC may include metrics for evaluating task performance.

42. **User Feedback**: 
    - TASK does not provide user feedback, while CODE_FUNC may include print statements or logs for user feedback during execution.

43. **Action Timing**: 
    - TASK does not specify timing for actions, while CODE_FUNC may include timing considerations for actions in the simulation.

44. **Concurrency**: 
    - TASK does not address concurrency, while CODE_FUNC may involve considerations for concurrent actions in a simulation environment.

45. **State Restoration**: 
    - TASK does not mention state restoration, while CODE_FUNC may include mechanisms to restore state after actions.

46. **Action Repetition**: 
    - TASK does not allow for action repetition, while CODE_FUNC calls each action exactly once and contains no retry loop of its own.

47. **Error Recovery**: 
    - TASK does not include error recovery strategies, while CODE_FUNC may implement strategies for recovering from errors during execution.

48. **User Configuration**: 
    - TASK does not allow for user configuration, while CODE_FUNC may allow users to configure parameters for the simulation.

49. **Action Logging**: 
    - TASK does not log actions, while CODE_FUNC may include logging of actions for debugging or analysis.

50. **Integration with Other Systems**: 
    - TASK is standalone, while CODE_FUNC may be designed to integrate with other systems or components in a larger framework.

51. **Documentation Standards**: 
    - TASK lacks formal documentation, while CODE_FUNC may adhere to coding standards and documentation practices.

52. **Version Control**: 
    - TASK does not involve version control, while CODE_FUNC may be subject to version control practices in software development.

53. **Dependency Management**: 
    - TASK does not manage dependencies, while CODE_FUNC may involve managing dependencies on libraries or modules.

54. **Testing Framework**: 
    - TASK does not include a testing framework, while CODE_FUNC may be designed to work with testing frameworks for validation.

55. **User Interface**: 
    - TASK does not involve a user interface, while CODE_FUNC may be part of a system with a user interface for interaction.

56. **Data Handling**: 
    - TASK does not handle data, while CODE_FUNC may involve data handling for object states and actions.

57. **Execution Environment**: 
    - TASK does not specify an execution environment, while CODE_FUNC is designed to run in a specific programming environment.

58. **Resource Allocation**: 
    - TASK does not address resource allocation, while CODE_FUNC may involve allocating resources for the simulation.

59. **Action Scheduling**: 
    - TASK does not include scheduling of actions, while CODE_FUNC may involve scheduling actions based on conditions.

60. **State Visualization**: 
    - TASK does not visualize state, while CODE_FUNC may include visualization of the simulation state for analysis.

61. **Action Optimization**: 
    - TASK does not consider optimization, while CODE_FUNC may involve optimizing actions for efficiency.

62. **User Roles**: 
    - TASK does not define user roles, while CODE_FUNC may involve different roles in the execution of the simulation.

63. **Action Parameters Validation**: 
    - TASK does not validate parameters, while CODE_FUNC may include validation of parameters before executing actions.

64. **Execution Timing**: 
    - TASK does not specify execution timing, while CODE_FUNC may include timing mechanisms for actions.

65. **State Transition Diagrams**: 
    - TASK does not include state transition diagrams, while CODE_FUNC may be represented in diagrams for clarity.

66. **Action Prioritization**: 
    - TASK does not prioritize actions, while CODE_FUNC may include prioritization of actions based on conditions.

67. **Resource Cleanup**: 
    - TASK does not mention resource cleanup, while CODE_FUNC may include cleanup steps after execution.

68. **Action Dependencies Management**: 
    - TASK does not manage action dependencies, while CODE_FUNC may include management of dependencies between actions.

69. **Execution Logging**: 
    - TASK does not log execution details, while CODE_FUNC may log execution details for analysis.

70. **User Customization**: 
    - TASK does not allow for user customization, while CODE_FUNC may allow users to customize aspects of the simulation.

71. **Action Simulation**: 
    - TASK does not simulate actions, while CODE_FUNC is specifically designed for simulating robotic actions.

72. **State Representation**: 
    - TASK does not represent state, while CODE_FUNC may represent state through variables and data structures.

73. **Action Feedback Loop**: 
    - TASK does not include feedback loops, while CODE_FUNC may include feedback loops for adjusting actions based on results.

74. **Execution Path**: 
    - TASK does not define an execution path, while CODE_FUNC may define a clear execution path through the code.

75. **Action Timing Control**: 
    - TASK does not control timing of actions, while CODE_FUNC may include timing control for actions.

76. **State Change Notification**: 
    - TASK does not notify state changes, while CODE_FUNC may include notifications for state changes during execution.

77. **Action Reusability**: 
    - TASK does not consider action reusability, while CODE_FUNC may include reusable action definitions.

78. **Execution Context Awareness**: 
    - TASK does not consider execution context, while CODE_FUNC may be aware of the execution context and adjust accordingly.

79. **Action Sequencing**: 
    - TASK sequences actions only in general terms, while CODE_FUNC fixes the exact order of calls.

80. **State Consistency Checks**: 
    - TASK does not include consistency checks, while CODE_FUNC may include checks to ensure state consistency during execution.

81. **Action Result Handling**: 
    - TASK does not handle action results, while CODE_FUNC may include handling of results from actions.

82. **Execution Flow Control**: 
    - TASK does not control execution flow, while CODE_FUNC may include flow control mechanisms.

83. **State Transition Logic**: 
    - TASK does not include state transition logic, while CODE_FUNC may include logic for transitioning between states.

84. **Action Execution Timing**: 
    - TASK does not specify timing for action execution, while CODE_FUNC may include timing for executing actions.

85. **State Management Strategies**: 
    - TASK does not include strategies for state management, while CODE_FUNC may include strategies for managing state throughout execution.

86. **Action Execution Context**: 
    - TASK does not define execution context for actions, while CODE_FUNC may define context for executing actions.

87. **State Change Tracking**: 
    - TASK does not track state changes, while CODE_FUNC may include tracking of state changes during execution.

88. **Action Execution Monitoring**: 
    - TASK does not monitor action execution, while CODE_FUNC may include monitoring mechanisms for actions.

89. **Execution Result Reporting**: 
    - TASK does not report execution results, while CODE_FUNC may include reporting of results after execution.

90. **State Restoration Mechanisms**: 
    - TASK does not include mechanisms for state restoration, while CODE_FUNC may include such mechanisms.

91. **Action Execution Validation**: 
    - TASK does not validate action execution, while CODE_FUNC may include validation steps for actions.

92. **Execution Environment Configuration**: 
    - TASK does not configure the execution environment, while CODE_FUNC may include configuration steps.

93. **State Representation Models**: 
    - TASK does not include models for state representation, while CODE_FUNC may include models for representing state.

94. **Action Execution Strategies**: 
    - TASK does not include strategies for executing actions, while CODE_FUNC may include strategies for action execution.

95. **Execution Path Optimization**: 
    - TASK does not optimize execution paths, while CODE_FUNC may include optimization for execution paths.

96. **State Change Analysis**: 
    - TASK does not analyze state changes, while CODE_FUNC may include analysis of state changes during execution.

97. **Action Execution Feedback**: 
    - TASK does not provide feedback on action execution, while CODE_FUNC may include feedback mechanisms.

98. **Execution Context Management**: 
    - TASK does not manage execution context, while CODE_FUNC may include management of execution context.

99. **State Transition Analysis**: 
    - TASK does not analyze state transitions, while CODE_FUNC may include analysis of state transitions.

100. **Action Execution Reporting**: 
    - TASK does not report on action execution, while CODE_FUNC may include reporting mechanisms for actions.


Refined Code Generated:
-------------------
import numpy as np
from pyrep.objects.shape import Shape
from pyrep.objects.proximity_sensor import ProximitySensor

from env import setup_environment, shutdown_environment

from skill_code import *  
from video import init_video_writers, recording_step, recording_get_observation
from object_positions import get_object_positions

def run_skeleton_task():
    '''Generic skeleton for running any task in your simulation.'''
    print("===== Starting Skeleton Task =====")
    
    env, task = setup_environment()
    try:
        descriptions, obs = task.reset()

        init_video_writers(obs)

        original_step = task.step
        task.step = recording_step(original_step)
        original_get_obs = task.get_observation
        task.get_observation = recording_get_observation(original_get_obs)

        positions = get_object_positions()

        bottom_anchor_pos = positions['bottom_anchor_pos']
        bottom_side_pos = positions['bottom_side_pos']
        rubbish_pos = positions['rubbish']
        bin_pos = positions['bin']

        obs, reward, done = rotate(env, task, target_quat=np.array([0, 0, 1, 0]), max_steps=100, threshold=0.05, timeout=10.0)
        if done:
            print("[Task] Task ended after rotating!")
            return

        obs, reward, done = move(env, task, target_pos=bottom_side_pos, max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to side position!")
            return

        obs, reward, done = move(env, task, target_pos=bottom_anchor_pos, max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to anchor position!")
            return

        obs, reward, done = pick_drawer(env, task, gripper='gripper', d='bottom', p='anchor-pos-bottom')
        if done:
            print("[Task] Task ended after picking the drawer!")
            return

        obs, reward, done = pull(env, task, pull_distance=0.1, pull_axis='x', max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after pulling the drawer!")
            return

        obs, reward, done = pick(env, task, target_pos=rubbish_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task ended after picking up the rubbish!")
            return

        obs, reward, done = place(env, task, target_pos=bin_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task completed successfully! Reward:", reward)
        else:
            print("[Task] Task not completed yet (done=False).")

    finally:
        shutdown_environment(env)

    print("===== End of Skeleton Task =====")

if __name__ == "__main__":
    run_skeleton_task()
-------------------
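
The skeleton above replaces `task.step` and `task.get_observation` with wrapped versions so that frames can be recorded. The `video` module is not shown in this log, so the following is only a minimal sketch of how such a wrapper could behave, assuming it forwards the call unchanged and stores the returned observation as a side effect. `make_recording_step`, `frames`, and `FakeTask` are hypothetical names used for illustration.

```python
import functools


def make_recording_step(original_step, frames):
    """Wrap a step function so every returned observation is also stored.

    Hypothetical stand-in for `recording_step`; the real helper is not shown.
    """
    @functools.wraps(original_step)
    def wrapped(*args, **kwargs):
        result = original_step(*args, **kwargs)
        obs = result[0]  # assumes step returns (obs, reward, done)
        frames.append(obs)
        return result
    return wrapped


class FakeTask:
    """Tiny stand-in for the simulator task, for illustration only."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return {"t": self.t, "action": action}, 0.0, self.t >= 3


frames = []
task = FakeTask()
# Same monkey-patching pattern as in the skeleton: keep the original, install the wrapper.
task.step = make_recording_step(task.step, frames)
results = [task.step("noop") for _ in range(3)]
```

Because the wrapper returns the original result untouched, the skill functions that call `task.step` would not need to know that recording is active.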

=== Instruction 4 ===
Is Correct: False
Syntax Pass: False
Code Function: The code defines a function to run a simulation task involving a robotic gripper. It begins by setting up the environment and resetting the task to its initial state. The task captures video recordings of the simulation if needed. It retrieves the positions of various objects in the environment, including a drawer and rubbish.

The task consists of several key steps: first, the gripper rotates to a specific angle; then it moves to the side position of a bottom drawer, followed by moving to the anchor position of the same drawer. Next, the gripper picks up the drawer and pulls it open. After that, it picks up rubbish from a designated position and finally places the rubbish into a bin.

After each step, the function checks the returned `done` flag and prints a progress message. If any step reports `done`, the function returns early. The environment is shut down in a `finally` block, so cleanup runs whether the task finishes, exits early, or raises.
Similarity: {'ok': False, 'reason': 'LLM JSON parsing failed.'}
Differences: 1. **Nature of Description**: 
   - TASK provides a high-level instruction for a sequence of actions to be performed.
   - CODE_FUNC describes a specific implementation of a simulation task involving a robotic gripper.

2. **Level of Detail**:
   - TASK outlines the steps in a general manner without specifying the implementation details.
   - CODE_FUNC includes specific programming constructs and logic for executing the task.

3. **Context**:
   - TASK is presented in a natural language format, suitable for human understanding.
   - CODE_FUNC is written in a programming context, intended for execution by a computer.

4. **Action Sequence**:
   - TASK mentions actions like "unlock the cabinet," "open an accessible drawer," and "throw away the trash."
   - CODE_FUNC specifies actions in a procedural manner, detailing the order of operations and conditions for each step.

5. **Focus on Environment**:
   - TASK does not explicitly mention the environment setup or state management.
   - CODE_FUNC includes details about setting up the environment and resetting the task state.

6. **Error Handling**:
   - TASK does not address error handling or success checks.
   - CODE_FUNC includes checks for successful completion of each step and prints messages indicating progress.

7. **Output and Feedback**:
   - TASK does not provide feedback mechanisms or outputs.
   - CODE_FUNC includes print statements to indicate the progress of the task.

8. **Simulation Aspect**:
   - TASK does not mention simulation or video recording.
   - CODE_FUNC explicitly states that it captures video recordings of the simulation if needed.

9. **Object Interaction**:
   - TASK describes interactions in a more abstract way.
   - CODE_FUNC details the specific interactions with objects, such as retrieving positions and manipulating the gripper.

10. **Termination Conditions**:
    - TASK does not specify how the task concludes.
    - CODE_FUNC mentions that the environment is shut down properly after task execution.

In summary, the semantic differences between TASK and CODE_FUNC are significant, reflecting the difference between high-level instructions and low-level implementation details.
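
The object-interaction difference above rests on the position lookup succeeding: a missing key would raise a bare `KeyError` partway through setup. A minimal sketch of validating the required names up front, assuming `get_object_positions()` returns a dict mapping names to coordinates (the log does not confirm this). `require_positions` and the sample data are hypothetical.

```python
def require_positions(positions, names):
    """Return the requested entries, raising one clear error if any are missing."""
    missing = [name for name in names if name not in positions]
    if missing:
        raise KeyError(f"missing object positions: {', '.join(missing)}")
    return [positions[name] for name in names]


# Illustrative data only; real values would come from the simulator.
sample = {
    "bottom_anchor_pos": (0.1, 0.2, 0.3),
    "rubbish": (0.0, 0.0, 0.8),
    "bin": (0.5, 0.0, 0.8),
}
anchor, rubbish = require_positions(sample, ["bottom_anchor_pos", "rubbish"])
```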

Refined Code Generated:
-------------------
import numpy as np
from env import setup_environment, shutdown_environment
from skill_code import *
from video import init_video_writers, recording_step, recording_get_observation
from object_positions import get_object_positions

def run_skeleton_task():
    '''Generic skeleton for running any task in your simulation.'''
    print("===== Starting Skeleton Task =====")
    
    # === Environment Setup ===
    env, task = setup_environment()
    try:
        # Reset the task to its initial state
        _, obs = task.reset()

        # (Optional) Initialize video writers for capturing your simulation
        init_video_writers(obs)

        # Wrap the task steps for recording (if needed)
        original_step = task.step
        task.step = recording_step(original_step)
        original_get_obs = task.get_observation
        task.get_observation = recording_get_observation(original_get_obs)

        # === Retrieve Object Positions ===
        positions = get_object_positions()

        # Define object positions
        bottom_anchor_pos = positions['bottom_anchor_pos']
        bottom_side_pos = positions['bottom_side_pos']
        rubbish_pos = positions['rubbish']
        bin_pos = positions['bin']

        # === Execute the Plan ===
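        # Step 1: Rotate the gripper to ninety degrees
        # NOTE: if target_quat is in (x, y, z, w) order, [0, 0, 1, 0] encodes a 180-degree
        # rotation about z rather than ninety degrees; confirm the convention `rotate` expects.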
        print("[Task] Rotating gripper to ninety degrees.")
        obs, reward, done = rotate(env, task, target_quat=np.array([0, 0, 1, 0]), max_steps=100, threshold=0.05, timeout=10.0)
        if done:
            print("[Task] Task ended after rotating!")
            return

        # Step 2: Move to the side position of the bottom drawer
        print("[Task] Moving to side position of the bottom drawer.")
        obs, reward, done = move(env, task, target_pos=bottom_side_pos, max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to side position!")
            return

        # Step 3: Move to the anchor position of the bottom drawer
        print("[Task] Moving to anchor position of the bottom drawer.")
        obs, reward, done = move(env, task, target_pos=bottom_anchor_pos, max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after moving to anchor position!")
            return

        # Step 4: Pick the bottom drawer
        print("[Task] Picking the bottom drawer.")
        obs, reward, done = pick_drawer(env, task, gripper='gripper', d='bottom', p='anchor-pos-bottom')
        if done:
            print("[Task] Task ended after picking the drawer!")
            return

        # Step 5: Pull the bottom drawer to open it
        print("[Task] Pulling the bottom drawer to open it.")
        obs, reward, done = pull(env, task, pull_distance=0.1, pull_axis='x', max_steps=100, threshold=0.01, timeout=10.0)
        if done:
            print("[Task] Task ended after pulling the drawer!")
            return

        # Step 6: Pick the rubbish
        print("[Task] Picking the rubbish.")
        obs, reward, done = pick(env, task, target_pos=rubbish_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task ended after picking the rubbish!")
            return

        # Step 7: Place the rubbish in the bin
        print("[Task] Placing the rubbish in the bin.")
        obs, reward, done = place(env, task, target_pos=bin_pos, approach_distance=0.15, max_steps=100, threshold=0.01, approach_axis='z', timeout=10.0)
        if done:
            print("[Task] Task completed successfully! Reward:", reward)
        else:
            print("[Task] Task not completed yet (done=False).")

    finally:
        # Always ensure the environment is properly shut down
        shutdown_environment(env)

    print("===== End of Skeleton Task =====")

if __name__ == "__main__":
    run_skeleton_task()
-------------------
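
Step 1 in the listing above is labeled "ninety degrees" but passes `[0, 0, 1, 0]` as `target_quat`. The sketch below shows how a rotation angle can be checked against a quaternion. It assumes (x, y, z, w) component order and a rotation about the z axis, neither of which the log states; `quat_about_z` and `rotation_angle` are hypothetical helpers, not part of `skill_code`.

```python
import math


def quat_about_z(angle_rad):
    """Unit quaternion (x, y, z, w) for a rotation of angle_rad about the z axis."""
    half = angle_rad / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))


def rotation_angle(quat):
    """Rotation angle in radians encoded by a unit quaternion in (x, y, z, w) order."""
    x, y, z, w = quat
    return 2.0 * math.atan2(math.sqrt(x * x + y * y + z * z), w)


# Quaternion for a ninety-degree turn about z under the stated assumptions.
ninety = quat_about_z(math.pi / 2)

# Angle encoded by the value used in the listing, in degrees.
angle_of_logged_value = math.degrees(rotation_angle((0, 0, 1, 0)))
```

Under these assumptions the ninety-degree quaternion has the form `[0, 0, sin(pi/4), cos(pi/4)]`, while the logged value corresponds to a half turn.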

=== Instruction 5 ===
Is Correct: False
Syntax Pass: False
Code Function: The code defines a function to run a simulation task involving a robotic system. It begins by setting up the environment and resetting the task to its initial state. It optionally initializes video recording for the simulation. The function retrieves the positions of various objects needed for the task.

The task consists of several key steps: rotating the gripper, moving to specific positions, picking up a drawer, pulling it open, picking up rubbish from a table, and placing the rubbish in a bin. After each step, the function checks if the task has been completed and prints a message accordingly. Finally, it ensures that the environment is properly shut down after the task execution. The function is executed when the script is run as the main program.
Similarity: {'ok': False, 'reason': 'LLM JSON parsing failed.'}
Differences: 1. **Purpose**: 
   - TASK describes a specific instruction for a robotic system to perform a sequence of actions.
   - CODE_FUNC outlines a function that implements a simulation of the task described in TASK.

2. **Structure**: 
   - TASK is presented in a natural language format as an instruction.
   - CODE_FUNC is structured as a programming function with defined parameters and logic.

3. **Detail Level**: 
   - TASK provides a high-level overview of what needs to be done without detailing the implementation.
   - CODE_FUNC includes specific steps and checks that are necessary for executing the task programmatically.

4. **Execution Context**: 
   - TASK is a standalone instruction that can be interpreted by a robotic system.
   - CODE_FUNC is part of a script that requires a programming environment to execute.

5. **Output**: 
   - TASK does not specify any output or feedback mechanism.
   - CODE_FUNC includes print statements to provide feedback on the completion of each step.

6. **Initialization**: 
   - TASK assumes the environment is already set up for the actions to be performed.
   - CODE_FUNC includes steps to initialize the environment and reset the task state.

7. **Error Handling**: 
   - TASK does not address error handling or conditions that may prevent task completion.
   - CODE_FUNC checks the `done` flag after each step and returns early when it is set; exceptions are not handled beyond shutting down the environment in the `finally` block.

8. **Flexibility**: 
   - TASK is a fixed instruction that does not account for variations in execution.
   - CODE_FUNC also follows a fixed sequence of steps; its only branching is the early return when a step reports `done`.

9. **Recording**: 
   - TASK does not mention any form of recording or monitoring.
   - CODE_FUNC optionally initializes video recording for the simulation.

10. **Finalization**: 
    - TASK does not specify how to conclude the task or clean up afterward.
    - CODE_FUNC includes steps to ensure proper shutdown of the environment after task execution.

Refined Code Generated:
-------------------
import numpy as np
from pyrep.objects.shape import Shape
from pyrep.objects.proximity_sensor import ProximitySensor

from env import setup_environment, shutdown_environment

from skill_code import *
from video import init_video_writers, recording_step, recording_get_observation
from object_positions import get_object_positions

def run_skeleton_task():
    '''Generic skeleton for running any task in your simulation.'''
    print("===== Starting Skeleton Task =====")
    
    # === Environment Setup ===
    env, task = setup_environment()
    try:
        # Reset the task to its initial state
        descriptions, obs = task.reset()

        # (Optional) Initialize video writers for capturing your simulation
        init_video_writers(obs)

        # Wrap the task steps for recording (if needed)
        original_step = task.step
        task.step = recording_step(original_step)
        original_get_obs = task.get_observation
        task.get_observation = recording_get_observation(original_get_obs)

        # === Retrieve Object Positions ===
        positions = get_object_positions()

        # Define object positions
        bottom_anchor_pos = positions['bottom_anchor_pos']
        bottom_side_pos = positions['bottom_side_pos']
        rubbish_pos = positions['rubbish']
        bin_pos = positions['bin']

        # === Execute the Plan ===
        # Step 1: Rotate the gripper to ninety degrees
        obs, reward, done = rotate_gripper(env, task, target_quat=np.array([0, 0, np.sin(np.pi/4), np.cos(np.pi/4)]))
        if done:
            print("[Task] Task ended after rotating!")
            return

        # Step 2: Move to the side position of the bottom drawer
        obs, reward, done = move_to(env, task, bottom_side_pos)
        if done:
            print("[Task] Task ended after moving to side position!")
            return

        # Step 3: Move to the anchor position of the bottom drawer
        obs, reward, done = move_to(env, task, bottom_anchor_pos)
        if done:
            print("[Task] Task ended after moving to anchor position!")
            return

        # Step 4: Pick the bottom drawer
        obs, reward, done = pick_drawer(env, task, 'bottom', bottom_anchor_pos)
        if done:
            print("[Task] Task ended after picking the drawer!")
            return

        # Step 5: Pull the bottom drawer open
        obs, reward, done = pull_drawer(env, task, pull_distance=0.1)
        if done:
            print("[Task] Task ended after pulling the drawer!")
            return

        # Step 6: Pick the rubbish from the table
        obs, reward, done = pick_object(env, task, rubbish_pos, 'gripper')
        if done:
            print("[Task] Task ended after picking rubbish!")
            return

        # Step 7: Place the rubbish in the bin
        obs, reward, done = place_object(env, task, bin_pos)
        if done:
            print("[Task] Task completed successfully! Reward:", reward)
        else:
            print("[Task] Task not completed yet (done=False).")

    finally:
        # Always ensure the environment is properly shut down
        shutdown_environment(env)

    print("===== End of Skeleton Task =====")

if __name__ == "__main__":
    run_skeleton_task()
-------------------
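
Each listing in this log repeats the same `if done: ... return` check after every skill call. That pattern could be expressed as a loop over a plan. This is a minimal sketch, assuming every skill returns `(obs, reward, done)` as in the generated code; `run_plan` and the lambda stand-ins are hypothetical and do not call the real `skill_code` functions.

```python
def run_plan(steps):
    """Run (label, callable) steps in order; stop at the first one reporting done.

    Returns (label_of_last_step_run, reward, done).
    """
    label, reward, done = None, 0.0, False
    for label, step in steps:
        _obs, reward, done = step()
        if done:
            break
    return label, reward, done


# Stand-in skills: each returns (obs, reward, done).
plan = [
    ("rotate", lambda: (None, 0.0, False)),
    ("move", lambda: (None, 0.0, False)),
    ("place", lambda: (None, 1.0, True)),
    ("never_reached", lambda: (None, 0.0, False)),
]
outcome = run_plan(plan)
```

In a real script each lambda would close over `env`, `task`, and the target position, and the call to `run_plan` would sit inside the existing `try`/`finally` so that shutdown still runs.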

