% \cy{Add Nikolaj feedback. Add qualitative analysis}
% 56 SpecBot-generated invariants \\
% 54 are validated by Z3 unit tests.
\begin{table}[!htp]\centering
\caption{Statistics of the studied data structures in Z3}\label{tab:z3-data}
\small
\begin{tabular}{l|rrrrrrr}\toprule
&ema &dlist &heap &hashtable &permutation &scoped\_vector &bdd\_manager \\\midrule
\# LoC &57 &243 &309 &761 &177 &220 &1635 \\
\# dependencies &0 &2 &1 &9 &2 &2 &13 \\
\bottomrule
\end{tabular}
\end{table}

As a real-world case study, we apply \tech to synthesize invariants for 7 core data structures from Z3~\cite{z3}, ranging from the simple 57-line \CodeIn{ema} class to the complex 1635-line \CodeIn{bdd\_manager}. The remaining classes are \CodeIn{dlist}, \CodeIn{heap}, \CodeIn{hashtable}, \CodeIn{permutation}, and \CodeIn{scoped\_vector}; Table~\ref{tab:z3-data} reports the implementation size and the number of dependent classes for each.
% \shuvendu{Say how many generations, isolated unit tests,etc.}\livia{addressed}
Our results were validated by one of the Z3 authors, who confirmed at least one \emph{correct and useful} invariant for each studied class; for \CodeIn{bdd\_manager}, 11 such invariants were confirmed, including the 2 already written by the Z3 developers.

% \shuvendu{Say a sentence or two about z3, use cases and the codebase popularity etc.}
Z3 is a widely adopted SMT solver used in a variety of high-stakes applications requiring rigorous correctness, such as formal verification, program analysis, and automated reasoning. It is integrated into tools like LLVM~\cite{llvm}, KLEE~\cite{klee}, Dafny~\cite{leino2010dafny} and Frama-C~\cite{kirchner2015frama}.
% \shuvendu{Say the stats of these modules LOC etc.}\livia{addressed}
We selected the Z3 codebase because of these stringent correctness requirements, which make it a suitable testbed for assessing the usefulness of synthesized invariants.


% 1 \textit{correct and useful} invariant for each of the \CodeIn{ema}, \CodeIn{dlist}, \CodeIn{heap}, \CodeIn{hashtable}, and \CodeIn{permutation} classes, 3 for \CodeIn{scoped\_vector}, and 11 for \CodeIn{bdd\_manager}.
% \cy{maybe we can put a table here for each class with the LoC and number of useful invariants by \tech}
% \shuvendu{Can you clarify if correct and useful includes the invariants already present?}\livia{addressed}



The \CodeIn{bdd\_manager} class\footnote{\url{https://github.com/Z3Prover/z3/blob/master/src/math/dd/dd_bdd.h}} is particularly noteworthy. 
It was chosen because it is a self-contained example with developer-written unit tests for validation, presenting a realistic yet manageable challenge. 
Note that the existing developer tests were used after invariants were generated, not as input to the LLM. 
The \CodeIn{bdd\_manager} class in Z3 is a utility for managing Binary Decision Diagrams (BDDs), which are data structures used to represent Boolean functions efficiently. In BDDs, Boolean functions are represented as directed acyclic graphs, where each non-terminal node corresponds to a Boolean variable, and edges represent the truth values of these variables (\textit{true} or \textit{false}). This representation simplifies complex Boolean expressions and enables efficient operations on Boolean functions.
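To make this representation concrete, the following is a minimal sketch of a toy BDD node table. It is not Z3's \CodeIn{bdd\_manager}: the names \CodeIn{ToyBdd}, \CodeIn{ToyNode}, \CodeIn{mk\_node}, \CodeIn{eval}, and \CodeIn{well\_ordered} are hypothetical, and the sketch assumes that the terminals occupy indices 0 and 1 and that every child sits at a strictly lower level than its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified BDD node table (illustration only, not Z3's code).
// Indices 0 and 1 are the terminal nodes "false" and "true".
struct ToyNode {
    unsigned level;  // position of the node's variable in the variable order
    unsigned lo;     // child followed when the variable is false
    unsigned hi;     // child followed when the variable is true
};

struct ToyBdd {
    std::vector<ToyNode> nodes{{0, 0, 0}, {0, 1, 1}};

    static bool is_const(unsigned n) { return n <= 1; }

    // Appends a decision node and returns its index.
    unsigned mk_node(unsigned level, unsigned lo, unsigned hi) {
        nodes.push_back({level, lo, hi});
        return static_cast<unsigned>(nodes.size() - 1);
    }

    // Evaluates the function rooted at `root`;
    // assignment[l] is the value of the variable at level l.
    bool eval(unsigned root, const std::vector<bool>& assignment) const {
        assert(root < nodes.size());
        unsigned n = root;
        while (!is_const(n)) {
            const ToyNode& node = nodes[n];
            n = assignment[node.level] ? node.hi : node.lo;
        }
        return n == 1;
    }

    // Structural invariant: children are valid indices and every
    // non-terminal child has a strictly lower level than its parent.
    bool well_ordered() const {
        for (std::size_t i = 2; i < nodes.size(); ++i) {
            const ToyNode& n = nodes[i];
            if (n.lo >= nodes.size() || n.hi >= nodes.size()) return false;
            if (!is_const(n.lo) && nodes[n.lo].level >= n.level) return false;
            if (!is_const(n.hi) && nodes[n.hi].level >= n.level) return false;
        }
        return true;
    }
};
```

For example, the conjunction $x_0 \wedge x_1$ can be built with \CodeIn{x0 = mk\_node(0, 0, 1)} followed by \CodeIn{mk\_node(1, 0, x0)}. The level check in \CodeIn{well\_ordered} is the kind of structural property that a class invariant for a BDD implementation is expected to capture.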


With 382 lines of code in its header and 1253 lines in the implementation file, \CodeIn{bdd\_manager} is substantially larger than the other studied classes (Table~\ref{tab:z3-data}), offering an opportunity to evaluate \tech's capability to generate meaningful invariants relevant to real-world scenarios.
\tech achieves this by compositional generation, recursively traversing the source program's AST (Section~\ref{sec:llm}). 
Recursive generation is crucial for large classes like \CodeIn{bdd\_manager}, whose source exceeds the LLM's context window. Decomposing the class and processing its components separately allowed us to fit the relevant parts into the model's input. This suggests that recursive invariant generation remains applicable to large real-world codebases beyond the benchmarks.
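The general idea of bottom-up, compositional generation can be sketched as follows. This is an illustrative sketch only and not \tech's implementation, which is described in Section~\ref{sec:llm}: the types \CodeIn{CodeUnit} and \CodeIn{Generator}, the function \CodeIn{generate\_bottom\_up}, and the prompt format are all hypothetical, and the LLM call is abstracted as a callback. The sketch assumes that each unit's prompt contains its own source plus the already generated invariants of its dependencies, rather than their full source.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical unit of code (e.g., a class or nested type) and the
// names of the units it depends on.
struct CodeUnit {
    std::string source;
    std::vector<std::string> deps;
};

// Stand-in for the LLM call: maps a prompt to generated invariants.
using Generator = std::function<std::string(const std::string&)>;

// Post-order traversal: generate invariants for dependencies first, then
// build the unit's prompt from its own source plus those invariants.
std::string generate_bottom_up(const std::string& name,
                               const std::map<std::string, CodeUnit>& units,
                               const Generator& generate,
                               std::map<std::string, std::string>& memo) {
    auto cached = memo.find(name);
    if (cached != memo.end()) return cached->second;

    assert(units.count(name) == 1);
    const CodeUnit& unit = units.at(name);
    memo[name] = "";  // placeholder so that cyclic dependencies terminate

    std::string prompt = unit.source;
    for (const std::string& dep : unit.deps) {
        const std::string summary =
            generate_bottom_up(dep, units, generate, memo);
        prompt += "\n// invariants of " + dep + ": " + summary;
    }

    memo[name] = generate(prompt);
    return memo[name];
}
```

Under these assumptions, the prompt for a large class grows with the size of its dependencies' invariants rather than with the size of their source, which is what keeps each individual request within the context window.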

The \CodeIn{bdd\_manager} class includes a developer-written member function for checking its well-formedness, shown in Figure~\ref{fig:bdd_well_formed}, which we removed before running \tech. Of the 56 invariants generated by \tech, one of Z3's main authors identified 11 distinct \textit{correct and useful} invariants (e.g., Figure~\ref{fig:bdd_correct_useful}), including the 2 developer-written invariants; these invariants could potentially be integrated into the codebase.
A further 5 distinct invariants were labeled \textit{ok}, meaning correct but of limited utility (e.g., Figure~\ref{fig:bdd_ok}); 16 distinct invariants were labeled \textit{correct but useless}, for example because they are already checked during compilation, such as type checks and constants (e.g., Figure~\ref{fig:bdd_correct_useless}); and 2 were labeled \textit{incorrect} (e.g., Figure~\ref{fig:bdd_incorrect}). The remaining invariants were repetitions within these categories. This evaluation aligns with \tech's validation results, as our validation pipeline also identified 2 incorrect invariants that failed the \CodeIn{bdd\_manager} unit tests.
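The idea of validating candidate invariants against existing tests can be sketched as follows. This is a minimal illustration under stated assumptions, not \tech's validation pipeline: \CodeIn{ToyStack}, \CodeIn{Candidate}, \CodeIn{Step}, and \CodeIn{failing\_candidates} are hypothetical names, a test is modeled as a sequence of operations on the object, and every candidate is checked in the initial state and after each operation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical class under test.
struct ToyStack {
    std::vector<int> items;
    std::size_t pushes = 0;
    std::size_t pops = 0;

    void push(int v) { items.push_back(v); ++pushes; }
    void pop() {
        if (items.empty()) return;
        items.pop_back();
        ++pops;
    }
};

// A candidate invariant: a name plus a predicate over the object state.
struct Candidate {
    std::string name;
    std::function<bool(const ToyStack&)> holds;
};

// One operation of a unit test.
using Step = std::function<void(ToyStack&)>;

// Replays the steps on a fresh object, checking every candidate in the
// initial state and after each step. Returns the names of the candidates
// that were violated at least once.
std::vector<std::string> failing_candidates(
        const std::vector<Candidate>& candidates,
        const std::vector<Step>& steps) {
    ToyStack state;
    std::vector<bool> violated(candidates.size(), false);

    auto check_all = [&]() {
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            assert(candidates[i].holds != nullptr);
            if (!candidates[i].holds(state)) violated[i] = true;
        }
    };

    check_all();
    for (const Step& step : steps) {
        step(state);
        check_all();
    }

    std::vector<std::string> failed;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (violated[i]) failed.push_back(candidates[i].name);
    }
    return failed;
}
```

A candidate that survives such a replay is only validated with respect to the states the tests reach, so passing the tests does not by itself establish that an invariant is correct or useful.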
% \shuvendu{Why did you choose to report invariants that already failed the unit test?}

Overall, the Z3 author's evaluation supports \tech's potential utility in real-world, large-scale codebases.
%\shuvendu{What about the other modules that you mention in "Benchmarks" subsection? We can talk about how many were labeled correct and useful for the different classes without mentioning the pull requests that compromise anonymity.} \livia{addressed}

%\shuvendu{Did this module require us to construct the class invariant compositionally over the types or did the files fit into the prompt?}\livia{addressed}

%\shuvendu{Give anecdotes of invariants in the above categories.}\livia{addressed}
% correct and useful:
% Node consistency: Each node's index should * 3
% Free nodes should be internal and have a reference count of zero * 2
% Reference counts should not exceed the maximum allowed value 
% For non-constant nodes, the levels of m\_lo and m\_hi should be less than
% Variable consistency: Variables should be correctly rese
%  Invariant 1 */\nfor (unsigned i = 0; i < m\_nodes.s
% Invariant 2 */\nfor (unsigned i : 
%   Invariant 4: m\_nodes[i].m\_refcount is between 0 and max\_rc]
% Invariant 13: m\_cost\_bdd is a valid BDD */\nassert

% The size of m\_var2bdd should be twice the size of m\_var2l

% The size of m\_level2var should be equal to the size of m\_var2level

% ok:
% Cache consistency non-null * 4
% Invariant 6: m\_free\_nodes contains valid indices
% The reorder reference count vector should be the same s
% The level2nodes vector should have entries for each level

% correct and useless:
%  Marking consistency: Marked nodes should be correctly i
%  8 more identifiers are already declared as const or static const.
%  4 more This is not too interesting because it is covered by the type system. It wo
%  not interesting * 2 covered by type

% maybe:
% All nodes in m\_nodes should have valid levels and refer


% incorrect:

% The number of nodes should not exceed the maximum number of BDD nodes
% The mark level should be non-zero

% repetitions:


\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++]
bool bdd_manager::well_formed() {
    bool ok = true;
    for (unsigned n : m_free_nodes) {
        ok &= (lo(n) == 0 && hi(n) == 0 && m_nodes[n].m_refcount == 0);
        if (!ok) {
            IF_VERBOSE(0, verbose_stream() << "free node is not internal " << n << " " << lo(n) << " " << hi(n) << " " << m_nodes[n].m_refcount << "\n";
            display(verbose_stream()););
            UNREACHABLE();
            return false;
        }
    }
    
    for (bdd_node const& n : m_nodes) {
        if (n.is_internal()) continue;
        unsigned lvl = n.m_level;
        BDD lo = n.m_lo;
        BDD hi = n.m_hi;
        ok &= is_const(lo) || level(lo) < lvl;
        ok &= is_const(hi) || level(hi) < lvl;
        ok &= is_const(lo) || !m_nodes[lo].is_internal();
        ok &= is_const(hi) || !m_nodes[hi].is_internal();
        if (!ok) {
            IF_VERBOSE(0, display(verbose_stream() << n.m_index << " lo " << lo << " hi " << hi << "\n"););
            UNREACHABLE();
            return false;
        }
    }
    return ok;
}
\end{lstlisting}
    \caption{Developer-written well-formedness check (class invariants) of the Z3 \CodeIn{bdd\_manager} class}
    \label{fig:bdd_well_formed}
\end{figure}

\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++]
// Node consistency: Each node's index should match its position in m_nodes
for (unsigned i = 0; i < m_nodes.size(); ++i) {
    assert(m_nodes[i].m_index == i);
}
\end{lstlisting}
    \caption{\textit{Correct and useful} invariant for the \CodeIn{bdd\_manager} class}
    \label{fig:bdd_correct_useful}
\end{figure}

\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++]
// Cache consistency: Entries in the operation cache should be valid
for (const auto* e : m_op_cache) {
    assert(e != nullptr);
    assert(e->m_result != null_bdd);
}
\end{lstlisting}
    \caption{\textit{Ok} invariant for the \CodeIn{bdd\_manager} class}
    \label{fig:bdd_ok}
\end{figure}

\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++]
// m_is_new_node is a boolean
assert(m_is_new_node == true || m_is_new_node == false);
\end{lstlisting}
    \caption{\textit{Correct but useless} invariant for the \CodeIn{bdd\_manager} class}
    \label{fig:bdd_correct_useless}
\end{figure}

% \begin{figure}[htp]
%     \centering
% \begin{lstlisting}[language=c++]
% // All nodes in m_nodes should have valid levels and reference counts
% for (const auto& node : m_nodes) {
%     assert(node.m_refcount <= max_rc);
%     assert(node.m_level < (1 << 22));
% }
% \end{lstlisting}
%     \caption{Maybe invariant for \CodeIn{bdd\_manager} class}
%     \label{fig:bdd_maybe}
% \end{figure}

\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++]
// The number of nodes should not exceed the maximum number of BDD nodes
assert(m_nodes.size() <= m_max_num_bdd_nodes);
\end{lstlisting}
    \caption{\textit{Incorrect} invariant for the \CodeIn{bdd\_manager} class}
    \label{fig:bdd_incorrect}
\end{figure}

% The primary responsibilities of the \CodeIn{bdd\_manager} class include:

% \begin{itemize}
%     \item \textbf{Memory Management}: The \CodeIn{bdd\_manager} class manages memory for BDD nodes. Since BDDs can have numerous nodes, efficient memory management is critical. The class allocates and frees nodes as needed and may use strategies to reuse memory from deleted nodes.

%     \item \textbf{Node Management and Allocation}: BDD nodes are the building blocks of BDDs, each representing a variable in a Boolean expression. The \CodeIn{bdd\_manager} class handles the creation and connection of these nodes, ensuring properties like unique tables to avoid duplicate nodes for the same variable combination.

%     \item \textbf{Operations on BDDs}: The \CodeIn{bdd\_manager} class provides methods for logical operations (such as \textit{AND}, \textit{OR}, \textit{NOT}) on BDDs. These operations are essential for combining and manipulating Boolean expressions, enabling efficient computations for Boolean functions.

%     \item \textbf{Garbage Collection}: As BDDs can grow large, the \CodeIn{bdd\_manager} class likely includes garbage collection mechanisms to reclaim memory from unused nodes, thus ensuring memory efficiency.

%     \item \textbf{Invariant Management}: To maintain correctness, \CodeIn{bdd\_manager} enforces structural invariants, such as ensuring each node has a unique path and maintaining an optimized graph structure.
% \end{itemize}

% The \CodeIn{bdd\_manager} class is crucial in \textit{z3Prover} for efficiently handling Boolean functions in symbolic reasoning and decision-making tasks, which are fundamental to SMT solving.

