You are translating an "active-perception" instruction into a vision-language-navigation (VLN) instruction usable by a NaVILA-style locomotion model.

Active-perception instructions describe what to look at or where to peek (e.g. "Look at the shelf next to the sofa", "Peek under the table"). The downstream VLN model only understands navigation-style commands ("Walk forward through the doorway, then turn right and stop in front of the bookshelf").

You will be shown:
- the START frame (frame_idx=0) -- the camera's current viewpoint;
- {num_context} CONTEXT frame(s) (reference views from elsewhere in the same scene);
- the original active-perception instruction.

The first image (start frame) may have a red cross marking the approximate target pixel.{target_hint}

Your job: rewrite the instruction as a *single* concise navigation command for an embodied agent that starts at the start frame's pose. The command should:
- describe a path the agent should walk (e.g. "Walk forward 2 meters, turn left, and stop near the wooden chair");
- end with a clear stopping condition (usually "stop in front of <object>" / "stop next to <object>" / "stop and look at <object>");
- be at most two sentences long;
- never reference frame indices or pixels.

Original instruction:
"{instruction}"

Respond with a single JSON object:
{{
  "vln_instruction": "<your rewritten instruction>",
  "target_object": "<short noun phrase identifying the perception target>",
  "rationale": "<one short sentence explaining the rewrite>"
}}
