For questions where the expected output (format, specificity, etc.) is unclear, use the "can question be answered in free-form" rating to mark it 'leaning no' or 'unsure'. Be strict when the format or specificity of the response differs from the reference: mark the response correct only if it is a superset of the reference.
For numerical-answer questions, compute the relative error by entering the two numbers in the provided widget.
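For reference, relative error is conventionally the absolute difference between the two numbers divided by the magnitude of the reference. The sketch below illustrates that convention only; the function name `relative_error` and the handling of a zero reference are assumptions made for illustration, and the widget's actual formula and edge-case behavior are not documented here.

```python
def relative_error(response: float, reference: float) -> float:
    """Return |response - reference| / |reference| (the usual convention)."""
    if reference == 0:
        # Assumption: relative error is undefined for a zero reference, so
        # report 0 for an exact match and infinity for anything else.
        return 0.0 if response == 0 else float("inf")
    return abs(response - reference) / abs(reference)


if __name__ == "__main__":
    # A response of 105 against a reference of 100 is off by 5 percent.
    print(relative_error(105.0, 100.0))
```

Under this convention the result is unitless, so a response of 9.8 against a reference of 10 gives roughly 0.02 (2%), regardless of the units of the answer.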
When in doubt about whether an answer matches the reference, or whether a response that differs from it is still correct, use the options provided for expressing uncertainty.
Keyboard Shortcuts
Q, W, E, R, T: Select ratings 1-5 for the currently focused question
↑ / ↓: Navigate between rating questions
Page Up / Page Down: Navigate to previous/next question