\section{True Cost Accounting using Knowledge Graphs}
Traditional \gls{tca} inspects the lifecycle of a product to identify externalities that are not reflected in its price. It assumes unrestricted access to company data and insight into the supply chain, neither of which is usually publicly available. To illustrate how \gls{tca} can be approximated with public data, we attempt to reproduce an existing report on the true cost of coffee beans from a particular roaster, ``Bocca Coffee''~\cite{truepricecoffee}.
Suppose we know the roaster's revenue in a given year to be \euro6.9 million. For illustration, assume we measure emissions of 100 tons of $\text{CO}_2$ equivalent, water consumption of 6000\,m$^3$, and electricity consumption of 100\,MWh. We then consult a table of costs for each externality,%
\footnote{Here we use the monetisation factors from \url{https://github.com/Truepricemethod/Monetisation_factors}; see Appendix~\ref{app:cost_table}.}
which prices emissions at 312\,\euro/ton and water at 1.62\,\euro/m$^3$. For electricity, we use the estimate of 70\,\euro/MWh by \citet{sovacool2021energy}.
Aggregating these priced externalities yields a hidden cost per euro of revenue. Multiplying it by the price of a product gives that product's approximate hidden cost, and adding this to the market price gives an approximate true price. This assumes that every product contributes to every externality in proportion to its price.
Figure~\ref{fig:bocca_coffee} compares this approximation with a full \gls{tca} report.
Even with plausible inputs, the approximation undershoots the ground truth because the available data are incomplete. Nonetheless, it yields insights that the market price alone does not capture.
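
The arithmetic behind this estimate can be sketched in a few lines of Python. The sketch uses only the illustrative quantities and monetisation factors given above; the function names and the 20\,\euro{} product price in the example are hypothetical, and the proportionality assumption is built into \texttt{approximate\_true\_price}.
\begin{minted}[fontsize=\footnotesize]{python}
# Sketch of the approximate true price computation (illustrative numbers from the text).

# Externality -> (annual quantity, monetisation factor in EUR per unit)
EXTERNALITIES: dict[str, tuple[float, float]] = {
    "ghg_emissions_tCO2e": (100.0, 312.0),
    "water_consumption_m3": (6000.0, 1.62),
    "electricity_MWh": (100.0, 70.0),
}
REVENUE_EUR = 6.9e6


def hidden_cost_per_euro(externalities: dict[str, tuple[float, float]],
                         revenue: float) -> float:
    """Sum of monetised externalities divided by annual revenue."""
    total = sum(qty * factor for qty, factor in externalities.values())
    return total / revenue


def approximate_true_price(price: float, cost_per_euro: float) -> float:
    """Market price plus a hidden cost proportional to that price.

    Assumes every product contributes to every externality in
    proportion to its price.
    """
    return price + price * cost_per_euro


rate = hidden_cost_per_euro(EXTERNALITIES, REVENUE_EUR)
# Hypothetical 20 EUR bag of beans.
print(f"hidden cost per euro: {rate:.5f} EUR")
print(f"approximate true price: {approximate_true_price(20.0, rate):.2f} EUR")
\end{minted}
With these inputs the monetised externalities sum to 47,920\,\euro{}, i.e., roughly 0.7 cents of hidden cost per euro of revenue.
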
The rest of this section explains the architecture we developed to scale this approach.

\begin{figure}
	\centering
	\includesvg[scale=0.35]{res/coffee_approx.svg}%
	\caption{\gls{atca} result for Bocca Coffee beans (right) compared to real data from \cite{truepricecoffee} (left).}
	\label{fig:bocca_coffee}
\end{figure}


\subsection{System Overview}
At the core of this demo lies a \gls{kg} supplemented by a \gls{ui}.
Figure~\ref{fig:architecture} gives an overview of the architecture, which is split into three parts described below: ingestion, storage, and data uses.

\paragraph{Ingestion}
The \gls{kg} is constructed from Wikirate~\cite{Wikirate} and OpenProductsFacts~\cite{OpenProductsFacts}. Sources can be toggled in the \gls{ui} to suit different use cases; e.g., a targeted analysis of fast food products might draw on OpenFoodFacts~\cite{OpenFoodFacts} for ingredient information, whereas an investigation into child labour might not need it.
Wikirate is an open data platform that crowdsources company data on \gls{esg} issues.
For this demo, we transform a subset of~1000 companies and 1000 metrics into \gls{rdf} triples following the schema shown in Appendix~\ref{app:schema}.
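
As a rough illustration of this step, the following sketch serialises one company record as N-Triples. The predicates \texttt{hcr:Name} and \texttt{hcr:OpenCorporatesID} are taken from the query in Figure~\ref{lst:federated}; the company IRI pattern, the record layout, and the placeholder values are assumptions, and the actual schema is the one in Appendix~\ref{app:schema}.
\begin{minted}[fontsize=\footnotesize]{python}
HCR = "http://hiddencostreport.org/schema#"
# Hypothetical IRI pattern for company resources; assumes IDs are IRI-safe.
COMPANY_BASE = "http://hiddencostreport.org/company/"


def nt_literal(value: str) -> str:
    """Quote a string literal, escaping characters reserved in N-Triples."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def company_to_ntriples(record: dict[str, str]) -> list[str]:
    """Serialise one (hypothetical) Wikirate-style company record."""
    subject = f"<{COMPANY_BASE}{record['id']}>"
    lines = [f"{subject} <{HCR}Name> {nt_literal(record['name'])} ."]
    oc_id = record.get("open_corporates_id")
    if oc_id:  # not every company has an OpenCorporates identifier
        lines.append(f"{subject} <{HCR}OpenCorporatesID> {nt_literal(oc_id)} .")
    return lines


# Placeholder values for illustration only.
for line in company_to_ntriples({"id": "c1", "name": "Bocca Coffee",
                                 "open_corporates_id": "xx/00000000"}):
    print(line)
\end{minted}
In practice, an RDF library would handle escaping and IRI validation; plain string formatting merely keeps the sketch dependency-free.
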
% OpenProductsFacts is also exported and transformed to triple format. 
At the time of writing, OpenProductsFacts contains only around 40,000 products. For products it does not cover, users can instead specify a price manually.
Additional sources can either be integrated directly into the graph or injected at query time via federated queries.
The latter conveniently reuses existing resources linked through shared identifiers, such as the OpenCorporates ID of a company, as illustrated in Figure~\ref{lst:federated}.


\begin{figure}
	\footnotesize
	\centering
	\begin{minted}{sparql}
PREFIX hcr: <http://hiddencostreport.org/schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?company_name ?industry WHERE {
  ?company hcr:Name ?company_name ;
           hcr:OpenCorporatesID ?OCID .

  SERVICE <https://query.wikidata.org/sparql> {
    ?wdEntity wdt:P1320 ?OCID ;
              wdt:P452 ?industry .
  }
}
\end{minted}
	\caption{Example SPARQL query that fetches the industries of companies from Wikidata~\cite{vrandecic2014wikidata} via a federated query.}
	\label{lst:federated}
\end{figure}

\begin{figure}
	\centering
	\input{res/architecture-diagram}
	\caption{A data ingestion component transforms data into a common format, followed by on-disk storage and indexing. The data is then consumed by three interfaces: the \gls{ui}, the Qlever endpoint, and our text-to-SPARQL system.}
	\label{fig:architecture}
\end{figure}

All sources are harmonized and disambiguated using a set of heuristics supplemented by manual curation; e.g., we apply a form of stemming to company names that strips organizational indicators such as ``inc.'' or ``limited''.
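
A minimal sketch of such suffix stripping in Python follows. The suffix list is an illustrative assumption, and only trailing tokens are removed, so that a name like ``Limited Brands'' is left intact.
\begin{minted}[fontsize=\footnotesize]{python}
import re

# Illustrative, non-exhaustive legal-form suffixes (an assumption).
LEGAL_SUFFIXES = {
    "inc", "incorporated", "ltd", "limited", "llc", "plc", "corp",
    "corporation", "co", "company", "gmbh", "ag", "sa", "bv", "nv",
}


def normalize_company_name(name: str) -> str:
    """Lower-case a company name and strip trailing legal-form suffixes."""
    # Drop punctuation so that "Inc." and "inc", or "B.V." and "bv", coincide.
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    tokens = cleaned.split()
    # Remove suffixes from the end only, always keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_company_name("Acme Holdings Co., Ltd."))  # acme holdings
\end{minted}
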
Additionally, metrics are categorized to allow unit conversion and cost lookup from a predefined table.
Our hierarchical keyword matching assigns a category to approximately 30\% of metrics. This may already be close to the ceiling, as many metrics, such as the number of employees or the readability of financial reports, cannot meaningfully be translated into costs directly.
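
One possible shape of hierarchical keyword matching is sketched below: a metric title is first matched against top-level categories and then refined through their children, stopping at the deepest level where a keyword occurs. The taxonomy, its keywords, and the first-match-wins rule at each level are hypothetical simplifications.
\begin{minted}[fontsize=\footnotesize]{python}
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    keywords: tuple[str, ...]
    children: list[Category] = field(default_factory=list)


# Hypothetical taxonomy; real categories would mirror the cost table.
TAXONOMY = [
    Category("ghg_emissions", ("emission", "ghg", "co2"), [
        Category("scope_1", ("scope 1",)),
        Category("scope_2", ("scope 2",)),
    ]),
    Category("water", ("water",), [
        Category("water_consumption", ("consumption", "withdrawal")),
        Category("water_discharge", ("discharge",)),
    ]),
    Category("energy", ("energy", "electricity")),
]


def match_category(metric_title: str, taxonomy: list[Category]) -> list[str]:
    """Descend the taxonomy, taking the first category whose keywords occur
    in the title; return the path of matched names (empty if none match)."""
    title = metric_title.lower()
    path: list[str] = []
    level = taxonomy
    while level:
        hit = next((c for c in level if any(k in title for k in c.keywords)), None)
        if hit is None:
            break
        path.append(hit.name)
        level = hit.children
    return path


print(match_category("Scope 1 GHG Emissions", TAXONOMY))  # ['ghg_emissions', 'scope_1']
\end{minted}
Pure substring matching, as in this sketch, places \textit{water discharge quality} under a water discharge category without looking at the reported unit.
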

To check this claim, we randomly sample 50 metrics and inspect their assigned categories.
We find~17~(34\%) correctly assigned metrics, of which five are monetizable, nine are disclosure rates, and three are derived metrics, such as \textit{energy consumption per dollar of revenue}.
Among the monetizable metrics is \textit{water discharge quality}, whose unit is implicitly misinterpreted as tons of water when it actually reports tons of organic material in the water discharge.
Of the remaining 33 metrics, three derived metrics and two disclosure rates are missed. None of the unclassified metrics in the sample are easily monetizable.
\enlargethispage{1\baselineskip}

\paragraph{Data Storage}
After parsing into \gls{rdf} triples, the data is stored in an Oxigraph~\cite{Pellissier_Tanon_Oxigraph} database, chosen for its speed and high-level interface. Internally, Oxigraph uses the RocksDB key-value store.%
\footnote{\url{https://rocksdb.org/}}
During parsing, we also build an index that maps company names and known aliases to their IDs, which avoids unnecessary string matching at runtime.
The data is then additionally indexed with Qlever~\cite{bast2017qlever} to enable query autocompletion and faster string search.
Since Qlever does not support editing the graph after indexing, we keep the Oxigraph store and use Qlever only for exploratory queries.
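
The name-and-alias index can be as simple as a dictionary from normalized labels to sets of company IDs. The sketch below illustrates this under assumptions: the input tuple layout and the case-folding normalization are hypothetical, and storing sets makes ambiguous aliases explicit instead of silently overwriting one company with another.
\begin{minted}[fontsize=\footnotesize]{python}
from collections import defaultdict
from collections.abc import Iterable


def normalize_label(label: str) -> str:
    """Case-insensitive, whitespace-collapsed key (a simplifying assumption)."""
    return " ".join(label.casefold().split())


def build_alias_index(
    companies: Iterable[tuple[str, str, list[str]]],
) -> dict[str, set[str]]:
    """Map every company name and alias to the IDs of all companies using it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for company_id, name, aliases in companies:
        for label in (name, *aliases):
            index[normalize_label(label)].add(company_id)
    return dict(index)


def lookup(index: dict[str, set[str]], label: str) -> set[str]:
    """Return candidate IDs for a label; an empty set means no exact match."""
    return index.get(normalize_label(label), set())


# Hypothetical companies sharing the alias "Bocca".
index = build_alias_index([("c1", "Bocca Coffee", ["Bocca"]),
                           ("c2", "Bocca Roasters", ["Bocca"])])
print(sorted(lookup(index, "BOCCA")))  # ['c1', 'c2']
\end{minted}
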


\paragraph{Data Uses}
Once the \gls{kg} is constructed, we provide two ways to interact with it:
First, \gls{atca} through the \gls{ui}: specifying a company and a date runs a set of pre-written queries and generates a visualization of the approximate hidden cost by metric.
We also visualize how the approximate hidden cost evolves over a window around the queried year, which helps reveal outliers caused by data quality issues.
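
The window view is meant for visual inspection; as a hypothetical complement, a simple check could flag years whose hidden cost deviates strongly from the window median, since a unit mix-up such as kilograms reported as tons shifts a value by a factor of 1000. The function name, window size, threshold factor, and example values below are assumptions, not part of the described system.
\begin{minted}[fontsize=\footnotesize]{python}
from statistics import median


def flag_outlier_years(costs_by_year: dict[int, float], year: int,
                       window: int = 2, factor: float = 10.0) -> list[int]:
    """Flag years within +/- `window` of `year` whose hidden cost differs from
    the window median by more than `factor` in either direction."""
    # Only positive values are compared; missing or zero years are skipped.
    in_window = {y: c for y, c in costs_by_year.items()
                 if abs(y - year) <= window and c > 0}
    if len(in_window) < 3:
        return []  # too few points for a meaningful median
    mid = median(in_window.values())
    return sorted(y for y, c in in_window.items()
                  if c > mid * factor or c < mid / factor)


# Illustrative values; 2021 mimics a unit error (factor 1000).
costs = {2018: 50_000.0, 2019: 48_000.0, 2020: 47_920.0,
         2021: 47_920_000.0, 2022: 52_000.0}
print(flag_outlier_years(costs, 2020))  # [2021]
\end{minted}
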


Second, we offer direct query access to the graph for fine-grained control. Such queries can be issued through the \gls{ui}; in addition, we expose a Qlever~\cite{bast2017qlever} endpoint, which provides convenience features such as autocompletion, syntax highlighting, and query performance analysis.
To make the graph accessible to users unfamiliar with SPARQL syntax, we also provide an \gls{llm}-aided text-to-SPARQL interface.



