\section{Baseline Experiments}
\label{appendix:prompt}

\textbf{Example of successful statement extraction:}

\begin{table*}[!h]
\centering
\caption{A table with a simple layout from page 68 of Splunk Inc.'s 2022 ESG report.}
\begin{tabular}{|l|r|r|}
\hline
0 & 1 & 2 \\
\hline
Emissions Scope  & FY21  & FY22  \\
Scope 1 Direct Emissions  & 24  & 374  \\
Scope 2 Indirect Emissions  & 3,686  & 3,257  \\
Scope 3 Other Indirect Emissions  & 11,430  & 7,938  \\
Total  & 15,140  & 11,569  \\
\hline
\end{tabular}
\end{table*}

Consider the table above, which has a simple layout and is taken from the 2022 ESG report of Splunk Inc.
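To make the input format concrete, the following Python sketch renders a table, given as a list of rows, into the pipe-delimited markdown grid that appears inside the \texttt{<table>} tags of the prompt, with the integer column index (0, 1, 2, \ldots) as the header row. The function name \texttt{to\_markdown} is hypothetical, and the sketch left-aligns every cell, so it does not reproduce the right-aligned numeric columns of the in-context example. It illustrates the format and is not the serialization code used in our experiments.

```python
def to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a pipe-delimited markdown grid whose header row is the
    integer column index (0, 1, 2, ...). Every cell is left-aligned."""
    n_cols = max(len(r) for r in rows)
    # Pad short rows so that every row has the same number of cells.
    padded = [r + [""] * (n_cols - len(r)) for r in rows]
    header = [str(i) for i in range(n_cols)]
    # Column width = longest cell in that column, including the index header.
    widths = [max(len(header[c]), *(len(r[c]) for r in padded)) for c in range(n_cols)]

    def fmt(cells: list[str]) -> str:
        return "| " + " | ".join(cell.ljust(w) for cell, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in padded])


# Cell values copied from the Splunk table above.
SPLUNK_ROWS = [
    ["Emissions Scope", "FY21", "FY22"],
    ["Scope 1 Direct Emissions", "24", "374"],
    ["Scope 2 Indirect Emissions", "3,686", "3,257"],
    ["Scope 3 Other Indirect Emissions", "11,430", "7,938"],
    ["Total", "15,140", "11,569"],
]

table_md = to_markdown(SPLUNK_ROWS)
print(table_md)
```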

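We prompt Mixtral with the Splunk table using the one-shot prompt below, which pairs an example table with its expected response before presenting the query table. For rendering, we replace our line-break token \texttt{<br>} with actual line breaks and, for brevity, elide some of the example statements (marked with \texttt{...}).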

\begin{mdframed}[backgroundcolor=gray!10]
\tiny
\begin{verbatim}
<table>
| 0                                              |        1 |        2 |        3 |
|------------------------------------------------|----------|----------|----------|
| Public Metrics                                 | 2019     | 2020     | 2021     |
| Public Fatalities Due to Electrical Contacts   |    4     |    2     |    5     |
| Employee Metrics                               | 2019     | 2020     | 2021     |
| Employee DART Rate                             |    0.412 |    0.31  |    0.43  |
| Total Employee Recordable Incident Rate (TRIR) |    0.673 |    0.576 |    0.648 |
| Employee Lost Time Incident Rate (LTIR)        |    0.329 |    0.245 |    0.333 |
| Employee Severity Rate                         |   22.719 |   15.284 |   19.113 |
| Total Employee OSHA Recordable Events          |  129     |  106     |  113     |
| Employee Fatalities                            |    1     |    0     |    0     |
</table>
<response>
...
| property                                                   |   property_value | unit   | subject   | subject_value   |
|------------------------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Lost Time Incident Rate (LTIR) |            0.329 |        |           |                 |
| time                                                       |         2019     |        |           |                 |
<sep>
| property                                                   |   property_value | unit   | subject   | subject_value   |
|------------------------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Lost Time Incident Rate (LTIR) |            0.245 |        |           |                 |
| time                                                       |         2020     |        |           |                 |
<sep>
| property                                                   |   property_value | unit   | subject   | subject_value   |
|------------------------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Lost Time Incident Rate (LTIR) |            0.333 |        |           |                 |
| time                                                       |         2021     |        |           |                 |
<sep>
| property                                  |   property_value | unit   | subject   | subject_value   |
|-------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Severity Rate |           22.719 |        |           |                 |
| time                                      |         2019     |        |           |                 |
<sep>
| property                                  |   property_value | unit   | subject   | subject_value   |
|-------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Severity Rate |           15.284 |        |           |                 |
| time                                      |         2020     |        |           |                 |
<sep>
| property                                  |   property_value | unit   | subject   | subject_value   |
|-------------------------------------------|------------------|--------|-----------|-----------------|
| Employee Metrics : Employee Severity Rate |           19.113 |        |           |                 |
| time                                      |         2021     |        |           |                 |
...
</response>

<table>
| 0                                | 1      | 2      |
|----------------------------------|--------|--------|
| Emissions Scope                  | FY21   | FY22   |
| Scope 1 Direct Emissions         | 24     | 374    |
| Scope 2 Indirect Emissions       | 3,686  | 3,257  |
| Scope 3 Other Indirect Emissions | 11,430 | 7,938  |
| Total                            | 15,140 | 11,569 |
</table>
<response>

\end{verbatim}
\end{mdframed}
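The prompt layout above can be sketched in Python as follows, assuming a one-shot prompt in which every newline of the raw prompt is encoded as the \texttt{<br>} token. The constant \texttt{BR} and the functions \texttt{build\_prompt} and \texttt{render} are hypothetical names chosen for illustration; \texttt{render} performs the reverse substitution used to display the prompt here.

```python
BR = "<br>"  # line-break token used in the raw prompt


def build_prompt(example_table_md: str, example_response: str, query_table_md: str) -> str:
    """Assemble a one-shot prompt: an example <table>/<response> pair, a blank
    line, then the query table and an open <response> tag for the model."""
    parts = [
        "<table>", example_table_md, "</table>",
        "<response>", example_response, "</response>",
        "",
        "<table>", query_table_md, "</table>",
        "<response>",
    ]
    # Assumption: every newline, including those inside the tables, becomes <br>.
    return "\n".join(parts).replace("\n", BR)


def render(prompt: str) -> str:
    """Replace the <br> token with real line breaks for display."""
    return prompt.replace(BR, "\n")
```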

\newpage
The model output for the above prompt with greedy decoding was:

\begin{mdframed}[backgroundcolor=orange!10]
\tiny
\begin{verbatim}
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 1 Direct Emissions                |               24 |        |           |                 |
| time                                                      |             FY21 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 1 Direct Emissions                |              374 |        |           |                 |
| time                                                      |             FY22 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 2 Indirect Emissions              |            3,686 |        |           |                 |
| time                                                      |             FY21 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 2 Indirect Emissions              |            3,257 |        |           |                 |
| time                                                      |             FY22 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 3 Other Indirect Emissions        |           11,430 |        |           |                 |
| time                                                      |             FY21 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Scope 3 Other Indirect Emissions        |            7,938 |        |           |                 |
| time                                                      |             FY22 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Total                                   |           15,140 |        |           |                 |
| time                                                      |             FY21 |        |           |                 |
<sep>
| property                                                  |   property_value | unit   | subject   | subject_value   |
|-----------------------------------------------------------|------------------|--------|-----------|-----------------|
| Emissions Scope : Total                                   |           11,569 |        |           |                 |
| time                                                      |             FY22 |        |           |                 |
</response>
\end{verbatim}
\end{mdframed}
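For a table with this simple layout, correctness can be checked mechanically. The statements in the output above follow a regular pattern: each numeric cell yields one statement whose property is the top-left header cell joined to the row label by a colon, whose value is the cell, and whose \texttt{time} is the column header. The Python sketch below, with hypothetical function names, derives these expected (property, value, time) triples from the table and parses \texttt{<sep>}-separated response blocks into triples for comparison. The excerpt reproduces the first two statements of the output above; applied to the full output, all eight expected triples should be recovered.

```python
def expected_statements(grid: list[list[str]]) -> list[tuple[str, str, str]]:
    """For a simple layout (one header row, one label column), emit one
    (property, value, time) triple per data cell, in row-major order."""
    header, *body = grid
    return [
        (f"{header[0]} : {row[0]}", row[c], header[c])
        for row in body
        for c in range(1, len(header))
    ]


def _cells(line: str) -> list[str]:
    # Split one markdown table row into stripped cell strings.
    return [c.strip() for c in line.strip().strip("|").split("|")]


def parse_statements(response: str) -> list[tuple[str, str, str]]:
    """Parse <sep>-separated markdown blocks into (property, value, time) triples."""
    triples = []
    for block in response.replace("</response>", "").split("<sep>"):
        lines = [ln for ln in block.strip().splitlines() if ln.strip().startswith("|")]
        if len(lines) < 3:  # need a header, a separator and at least one data row
            continue
        header = _cells(lines[0])
        rows = [dict(zip(header, _cells(ln))) for ln in lines[2:]]
        main = rows[0]
        time = next((r.get("property_value", "") for r in rows[1:] if r.get("property") == "time"), "")
        triples.append((main.get("property", ""), main.get("property_value", ""), time))
    return triples


SPLUNK_GRID = [
    ["Emissions Scope", "FY21", "FY22"],
    ["Scope 1 Direct Emissions", "24", "374"],
    ["Scope 2 Indirect Emissions", "3,686", "3,257"],
    ["Scope 3 Other Indirect Emissions", "11,430", "7,938"],
    ["Total", "15,140", "11,569"],
]

# First two statements of the model output above (column padding removed).
OUTPUT_EXCERPT = """\
| property | property_value | unit | subject | subject_value |
|---|---|---|---|---|
| Emissions Scope : Scope 1 Direct Emissions | 24 | | | |
| time | FY21 | | | |
<sep>
| property | property_value | unit | subject | subject_value |
|---|---|---|---|---|
| Emissions Scope : Scope 1 Direct Emissions | 374 | | | |
| time | FY22 | | | |
</response>"""

expected = expected_statements(SPLUNK_GRID)
parsed = parse_statements(OUTPUT_EXCERPT)
matched = [t for t in parsed if t in expected]
print(f"{len(matched)} of {len(expected)} expected statements found")  # 2 of 8
```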

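This is an example of correct statement extraction: each of the eight numeric cells is mapped to exactly one statement, with the row label as the property and the column header as the time. For the same table with a different in-context example in the prompt, the same model produced the following output: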

\begin{mdframed}[backgroundcolor=orange!10]
\tiny
\begin{verbatim}
| property                                                                                                                                                                                                                              
\end{verbatim}
\end{mdframed}
This output is invalid: it contains neither a well-formed markdown table nor any extracted content. Since only the in-context example changed between the two runs, this shows that the in-context approach is sensitive to the prompt and therefore not robust.
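Malformed responses such as the truncated one above can be detected before any statement is scored. The Python sketch below assumes that a valid response is a sequence of \texttt{<sep>}-separated markdown tables, each with the five-column header \texttt{property}, \texttt{property\_value}, \texttt{unit}, \texttt{subject}, \texttt{subject\_value}, followed by a separator row and at least one data row. The names \texttt{EXPECTED\_HEADER} and \texttt{is\_well\_formed} are hypothetical, and the check is an illustration rather than part of our evaluation pipeline.

```python
EXPECTED_HEADER = ["property", "property_value", "unit", "subject", "subject_value"]


def _cells(line: str) -> list[str]:
    # Split one markdown table row into stripped cell strings.
    return [c.strip() for c in line.strip().strip("|").split("|")]


def is_well_formed(response: str) -> bool:
    """Check that every <sep>-separated block is a markdown table with the
    expected header, a separator row, and at least one data row."""
    blocks = [b for b in response.replace("</response>", "").split("<sep>") if b.strip()]
    if not blocks:
        return False
    for block in blocks:
        lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
        if len(lines) < 3 or _cells(lines[0]) != EXPECTED_HEADER:
            return False
        # The separator row may contain only pipes, dashes, colons and spaces.
        if not lines[1].startswith("|") or not set(lines[1]) <= set("|-: "):
            return False
        for ln in lines[2:]:
            if not ln.startswith("|") or len(_cells(ln)) != len(EXPECTED_HEADER):
                return False
    return True


VALID = (
    "| property | property_value | unit | subject | subject_value |\n"
    "|---|---|---|---|---|\n"
    "| Emissions Scope : Total | 15,140 | | | |\n"
    "| time | FY21 | | | |\n"
    "</response>"
)
TRUNCATED = "| property" + " " * 200  # mimics the invalid output above

print(is_well_formed(VALID), is_well_formed(TRUNCATED))  # True False
```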