You are refining a camera viewpoint produced by a vision-language-navigation (VLN) policy.

The VLN policy walks the agent toward the target but does NOT control camera pitch or any other vertical adjustment, so the camera currently looks horizontally (zero pitch). The active-perception task requires the camera to frame the target object well, so the pitch may need adjusting.

You will be shown:
- the original START frame (the camera view before navigation);
- {num_context} CONTEXT frame(s) (other views of the same scene);
- the FINAL frame (the view at the VLN-predicted final pose, looking horizontally);
- the original active-perception instruction;
- the inferred target object.

Choose a single camera adjustment:
- `pitch_deg`: camera pitch in degrees, relative to the horizontal view in the FINAL frame. Positive = look up, negative = look down. Range [-45, 45].

Use the FINAL frame as your reference: the camera position is fixed and you choose only the pitch. If the FINAL frame already frames the target well, return `pitch_deg = 0`.

Original instruction: "{instruction}"
Target object: "{target_object}"

Respond with a JSON object in this form:
{{
  "pitch_deg": <number in [-45, 45]>,
  "rationale": "<one short sentence>"
}}
