\begin{figure*}[ht]
    \centering
    \includegraphics[width=0.95\textwidth]{figure/framework.pdf}  % Adjust the width as needed
    \caption{Overview of \tech.}
    \label{fig:framework}
\end{figure*}

% The \tech framework (Figure~\ref{fig:framework}) describes an automatic pipeline for inferring and validating class invariants from source programs. In the Generation Phase, \tech takes a complete source program as input. It performs internal validation with co-generated test cases (we refer to as \textit{filtering tests}) to identify high-confidence invariant candidates (we refer to as \textit{filtered invariants})(Section~\ref{subsec:generation}). When one candidate fails \textit{filtering tests}, a possible refinement loop is employed to repair the target buggy invariant candidate.
% In the Validation Phase, when available, existing tests for the input source program then provide external validation, serving as ground truth to further evaluate these \tech generated \textit{filtered invariants} (Section~\ref{subsec:verification}).

An overview of the \tech framework is shown in Figure~\ref{fig:framework}. It outlines an automated pipeline for inferring class invariants from source code. 
% \shuvendu{Validation phase cannot be part of \tech. Otherwise, we do not need to generate tests, and exploit the validation tests. We should only describe them in Evaluation metrics.}\livia{addressed}
\tech takes a complete source program as input and outputs invariant candidates it has identified with high confidence (called \textit{filtered invariants}).
\tech starts with a preprocessing step that performs static analysis on the program (Section~\ref{sec:preprocess}). Next, an LLM generates candidate invariants and filtering test suites (Section~\ref{sec:llm}). The code is then instrumented so that candidate invariants can be checked at run time (Section~\ref{sec:instrumentation}).
Finally, \tech uses generated tests to prune invariant candidates (Section~\ref{sec:testing}), and a refinement loop is used to iteratively improve the results (Section~\ref{sec:refinement}). 

% In the \emph{validation phase}, described in detail in Section~\ref{subsec:verification}, existing tests from the input source program serve as ground truth to evaluate the \textit{filtered invariants}. 
% It is important to note that these ground truth tests are not available during the \emph{generation phase}; they are used solely for evaluation purposes.
% \shuvendu{We can separate this as (a) generation, (b) heuristic pruning, (c) refinement. We should mention that the fact we instrument these invariants allows us to use the compiler to rule out syntactic/type-based errors in the invariants.} \livia{addressed}
%\shuvendu{We should state that we do not expect these validation tests to be present during inference time. The validation phase is only the evaluation phase.}\livia{addressed}


\subsection{Generation}
\label{subsec:generation}


%Through some prompt templates, \tech calls LLM APIs and get in response some generated tests
%It then produces both invariants and test cases.
%These generated tests play a crucial role in validation by identifying inaccurate invariants. When invariants %fail to pass these tests, \tech employs a feedback-driven refinement process to improve them. The generation phase outputs only those invariants that successfully pass this internal validation process.
% serve as the candidate invariants for the subsequent verification phase.
% \livia{what to call astred}
\subsubsection{Preprocessing}
\label{sec:preprocess}

As illustrated in Figure~\ref{fig:framework}, the generation phase begins with a static analysis of the source program. \tech uses a Tree-sitter-based parser for program preprocessing; Tree-sitter~\cite{treesitter} is a parser generator tool that builds a syntax tree from source files.

\tech parses the entire source program into an abstract syntax tree (AST) to extract class members and their recursive dependencies. It then identifies the target class and gathers the details relevant to forming correct class invariants (e.g., method declarations, field declarations, and nested class definitions). \tech recursively analyzes all identified classes (i.e., the target class and the classes it depends on) and sorts them in reverse topological order, so that generation proceeds from the leaf classes upward, as shown in Algorithm~\ref{algo:algo}.
% \clark{What exactly are the outputs of the preprocessing stage, and how are these used in the LLM stage?} \livia{The output is a sorted sequence of ASTs lining up for LLM generation. For each AST, you have the option to include/exclude source code based on the length of the program and LLM context window. \eg user of \tech may choose to exclude method implementation when dealing with \CodeIn{bdd\_manager} bc the source files are too large, but may choose to include everything from \CodeIn{AvlTree} bc it is short enough for any LLM context window.}

\begin{algorithm}[t]
\caption{Function to generate invariants for target class AST}
\small
\label{algo:algo}
\DontPrintSemicolon
\SetKwProg{Fn}{Function}{:}{}

\SetKwFunction{GenerateInvariant}{\textsc{GenerateInvariant}}
\SetKwFunction{getClassRecursively}{\textsc{getClassRecursively}}
\SetKwFunction{sort}{\textsc{reverseTopologicalSort}}
\SetKwFunction{getClassDependencies}{\textsc{getClassDependencies}}
\SetKwFunction{needsInvariant}{\textsc{needsInvariant}}
\SetKwFunction{getCodeForClass}{\textsc{getCodeForClass}}
\SetKwFunction{getCompletions}{\textsc{generateInvariantWithLLM}}

\SetKwFunction{getClassMethods}{\textsc{getClassMethods}}
\SetKwFunction{getText}{\textsc{getText}}
% \livia{change block, slice, annotate to types and member methods}

\Fn{\GenerateInvariant{target\_class}}{
    % \If{target\_class = None}{
    %     \Return{``''}\;
    % }
    \If{target\_class.id $\in$ invariants\_dict}{
        \Return{invariants\_dict[target\_class.id]}\; \label{line:cache}
    }
    
    dep\_classes $\leftarrow$ \getClassRecursively{target\_class}\; \label{line:collect}
    rev\_topsorted\_classes $\leftarrow$ \sort{dep\_classes}\; \label{line:sort}
    
    \ForEach{class $\in$ rev\_topsorted\_classes}{
            class\_code $\leftarrow$ \getCodeForClass{class, invariants\_dict}\; \label{line:classcode}
            
            \ForEach{dep $\in$ \getClassDependencies{class}}{
            \tcp{Loop invariant: invariants for dep have already been generated}
                dep\_code $\leftarrow$ \getCodeForClass{dep, invariants\_dict}\; \label{line:dev-code}
                class\_code $\leftarrow$ class\_code  + dep\_code\; \label{line:dev-code-ed}
            }
            
            invariants\_dict[class.id] $\leftarrow$ \getCompletions{class\_code}\; \label{line:geninv}
    }
    
    \Return{invariants\_dict[target\_class.id]}\;
}

\Fn{\getCodeForClass{class, invariants\_dict, include\_method\_bodies=False}}{
    class\_text $\leftarrow$ class.get\_declaration\_text()\tcp*{Header} \label{line:helper-1}
    \If{class.id $\in$ invariants\_dict \textbf{ and } invariants\_dict[class.id]}{
        class\_text $\leftarrow$ invariants\_dict[class.id] + class\_text  \tcp*{Prepend generated invariants} \label{line:helper-2}
    }
    
    \If{include\_method\_bodies}{
        \tcp{Add method bodies if context allows} \label{line:contextwd}
    }
    \Return{class\_text}\;
}
\end{algorithm}

%\shuvendu{Why do you need the doubly nested loop for classes in Algorithm~\ref{algo:algo}}
%\shuvendu{Who decides the value of get\_method\_bodies flag?}\livia{user decide}

\subsubsection{Generation by LLM}
\label{sec:llm}

After building the source program AST, \tech uses LLMs to analyze the class module and infer both invariants for the target class and tests that exercise the class's implementation as thoroughly as possible. \tech uses a fixed system prompt that defines class invariants and outlines two main tasks: (1) generating class invariants from the source code, and (2) creating a test suite of valid API calls without specifying expected outputs (see the prompt details in the Appendix).
Next, \tech instantiates a user prompt template with the actual target class.

From the source program AST, \tech identifies program dependencies and populates the prompt template with the leaf struct/class. Starting from the leaf nodes, \tech leverages previously generated invariants by including them in the prompt for later classes. 
To accommodate the LLM's context window limit, only the relevant child classes of the current target class are included in the prompt, with method implementations and private fields/methods hidden when necessary.

Algorithm~\ref{algo:algo} presents the invariant generation process for a source program AST. The main function \GenerateInvariant takes a \CodeIn{target\_class} and uses \CodeIn{invariants\_dict} as a cache to avoid redundant computation (Line~\ref{line:cache}).
The algorithm first collects the target class and its recursive dependencies via \getClassRecursively (Line~\ref{line:collect}) and sorts them with \sort so that every class is processed after the classes it depends on (Line~\ref{line:sort}). For each \CodeIn{class} in the sorted order, it obtains the class code through \getCodeForClass (Line~\ref{line:classcode}). For each dependency \CodeIn{dep} of the current \CodeIn{class}, it retrieves the \CodeIn{dep\_code} and appends it to \CodeIn{class\_code} (Lines~\ref{line:dev-code}--\ref{line:dev-code-ed}). 
The algorithm then generates invariants using \getCompletions and stores them in \CodeIn{invariants\_dict} (Line~\ref{line:geninv}).
The helper function \getCodeForClass builds the textual representation of a class by combining its declaration text with any invariants already stored in \CodeIn{invariants\_dict} (Lines~\ref{line:helper-1}--\ref{line:helper-2}).
It optionally includes method bodies when the context window allows (Line~\ref{line:contextwd}).
The algorithm concludes by returning \CodeIn{invariants\_dict[target\_class.id]}, the invariants generated for the target class.


\tech accommodates large codebases by dividing the source program into smaller modules that fit within the LLM's context window. It then iteratively generates invariants and test cases, starting from leaf classes and working up towards the root class. At each step, \tech leverages previous invariants generated for child classes to inform the invariants for parent classes.
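
The leaf-to-root ordering described above can be sketched as a post-order traversal of the class dependency graph. The following is a minimal illustration and not the implementation used by \tech: the \CodeIn{DepGraph} type and the \CodeIn{generation\_order} function are hypothetical names introduced here, and the sketch assumes that dependencies are available as a map from each class name to the names of the classes it depends on.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency graph: class name -> names of classes it depends on.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Returns the classes reachable from `target`, ordered so that every class
// appears after all of its dependencies (reverse topological order).
// Illustrative only; the visited set also stops the recursion on cycles.
std::vector<std::string> generation_order(const DepGraph& deps,
                                          const std::string& target) {
    std::vector<std::string> order;
    std::set<std::string> visited;
    std::function<void(const std::string&)> visit =
        [&](const std::string& cls) {
            if (!visited.insert(cls).second) return;  // already handled
            auto it = deps.find(cls);
            if (it != deps.end()) {
                for (const auto& dep : it->second) visit(dep);
            }
            order.push_back(cls);  // emitted only after its dependencies
        };
    visit(target);
    return order;
}
```

Under these assumptions, a graph in which \CodeIn{AvlTree} depends on \CodeIn{Node} yields \CodeIn{Node} first and \CodeIn{AvlTree} second, so the invariants generated for \CodeIn{Node} are available when the prompt for \CodeIn{AvlTree} is built.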

For the \CodeIn{AvlTree} example, we begin by instantiating the prompt with \CodeIn{Node} for annotation, followed by \CodeIn{AvlTree}, since \CodeIn{Node} is a nested struct that \CodeIn{AvlTree} depends on, as illustrated in Figure~\ref{fig:avl_header}. In this specific case, however, Algorithm~\ref{algo:algo} does not make a difference due to the small size of the source program; the entire \CodeIn{AvlTree} code fits easily within the LLM's context window.

\begin{figure}[htp]
    \centering
\begin{lstlisting}[language=c++, escapechar=!, basicstyle=\ttfamily\scriptsize, basewidth=0.5em]
template <typename T>
class AvlTree {
public:
  AvlTree();
  AvlTree(const AvlTree &t);
  AvlTree &operator=(const AvlTree &t);
  ~AvlTree();

  void insert(const T &v);
  void remove(const T &v);
  bool contains(const T &v);
  void clear();
  int height();
  int size();
  bool empty();
  std::vector<T> in_order_traversal() const;
  std::vector<T> pre_order_traversal() const;
  std::vector<T> post_order_traversal() const;
private:
  struct Node {
    T data;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    int balance_factor();
  };
// rest of the file
};
\end{lstlisting}
    \caption{Header file of AvlTree.}
    \label{fig:avl_header}
\end{figure}

In contrast, when working with the \CodeIn{bdd\_manager} class in Z3 (around 1700 lines of code), \tech begins generation with \CodeIn{bdd}, a class on which \CodeIn{bdd\_manager} depends. Algorithm~\ref{algo:algo} enables \tech to partition the \CodeIn{bdd\_manager} class and output meaningful class invariants (see Section~\ref{sec:z3}).
%\shuvendu{Provide details on the prompt here as Clark mentions}. 
%\livia{addressed}
% \clark{This subsection covers one of the key innovations of the paper.  So you need a lot more details about how this works.  How is the information obtained during preprocessing used?  What prompt do you use? How did you come up with it?  Did you try other things?}\livia{addressed with Algo}
% They are both correct invariants and pass ground truth unit tests.
% \cy{TODO: explain with some example here}



% \subsubsection{Postprocessing}
% \tech parses the response from the target invariants and extracts the invariants and the tests.
% \cy{maybe we could skip this}

\subsubsection{Instrumentation}
\label{sec:instrumentation}

To check candidate invariants, each public method is automatically instrumented with a \CodeIn{check\_invariant} call at both the start and the end of its implementation. This allows us to verify that invariants hold both before and after method execution. Each invariant is implemented as a separate method, which prevents name conflicts with the local variables of the instrumented methods.

When a specific invariant is being checked, its code is plugged into the \CodeIn{check\_invariant} function with assertions. This ensures that during pruning, whenever a public API call is made, each invariant candidate is automatically verified.
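
As an illustration of this scheme, the sketch below shows what an instrumented class might look like. It is not output produced by \tech: the class \CodeIn{BoundedStack} and the candidate invariant \CodeIn{invariant\_size\_within\_capacity} are hypothetical and chosen only to keep the example short, and each method is assumed to have a single exit point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target class, used only for illustration.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {}

    void push(int v) {
        check_invariant();  // inserted at method entry
        if (items_.size() < capacity_) items_.push_back(v);
        check_invariant();  // inserted at method exit
    }

    void pop() {
        check_invariant();
        if (!items_.empty()) items_.pop_back();
        check_invariant();
    }

    std::size_t size() const {
        check_invariant();
        std::size_t n = items_.size();
        check_invariant();
        return n;
    }

private:
    // Candidate invariant under test, kept in its own method so that it
    // cannot clash with local variables of the instrumented methods.
    bool invariant_size_within_capacity() const {
        return items_.size() <= capacity_;
    }

    // The candidate is plugged in here and checked with an assertion.
    void check_invariant() const { assert(invariant_size_within_capacity()); }

    std::vector<int> items_;
    std::size_t capacity_;
};
```

With this layout, any test that calls \CodeIn{push}, \CodeIn{pop}, or \CodeIn{size} exercises the candidate invariant without containing an assertion of its own, and a candidate that refers to a nonexistent member is rejected by the compiler before any test is run.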

Additional examples of instrumentation and invariant checking are provided in Appendix~\ref{sec:appendix_approach_details}.

\subsection{Heuristic Pruning}
\label{sec:testing}

The LLM generates several test suites that serve as filters for invariant candidates. To select among them, we use line coverage as a metric, since it provides a straightforward proxy for how thoroughly a test suite exercises the class.

When generating tests, \tech creates valid sequences of API calls without asserting expected behavior, since our goal is to filter invariant candidates rather than test the source program directly. We compile and run every generated test suite with the source program and select the one with the highest line coverage as the \textit{filtering tests}.

\tech dynamically expands the \textit{filtering tests} only if coverage falls below a specified threshold (default $80\%$). In our experiments, the \textit{filtering tests} for each benchmark task include 5 to 15 individual tests, with each test comprising 5 to 20 lines of code (see example in Appendix~\ref{sec:appendix_approach_details}).
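
The selection rule can be summarized by the following sketch, which assumes that line coverage has already been measured for each generated suite. The \CodeIn{SuiteResult} record and the two functions are hypothetical names introduced for illustration; only the highest-coverage rule and the $80\%$ default come from the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one generated test suite after it has been
// compiled and run against the source program.
struct SuiteResult {
    std::string name;
    double line_coverage;  // fraction of lines covered, in [0, 1]
};

// Returns the index of the suite with the highest line coverage.
// Assumes `suites` is non-empty; ties keep the earliest suite.
std::size_t select_filtering_suite(const std::vector<SuiteResult>& suites) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < suites.size(); ++i) {
        if (suites[i].line_coverage > suites[best].line_coverage) best = i;
    }
    return best;
}

// True when the chosen suite falls below the coverage threshold and the
// filtering tests should therefore be expanded.
bool needs_expansion(const SuiteResult& chosen, double threshold = 0.80) {
    return chosen.line_coverage < threshold;
}
```

How ties between suites are broken is not specified in the text; the sketch simply keeps the first suite that reaches the maximum.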

If an invariant candidate compiles and all \textit{filtering tests} run without violating its assertion, it is designated a \textit{filtered invariant} and included in the final output of \tech.

\subsection{Refinement}
\label{sec:refinement}

For invariants that fail during compilation or runtime, \tech implements a feedback-driven refinement process. The system collects compiler output, error messages, and test results, then feeds this information back to the LLM using a dedicated prompt template. 

This feedback loop allows \tech to repair failing invariants by providing the LLM with specific error information and the context in which the error occurred. We set a default threshold of 3 refinement attempts per invariant, balancing the cost of LLM calls with the benefit of repairs.
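
The control flow of this loop can be sketched as follows. The sketch is illustrative rather than the implementation of \tech: \CodeIn{Checker} stands in for compiling and running a candidate against the \textit{filtering tests}, \CodeIn{Repairer} stands in for the LLM call with the refinement prompt, and both names, like \CodeIn{CheckResult} and \CodeIn{refine}, are hypothetical.

```cpp
#include <functional>
#include <string>

// Hypothetical outcome of compiling and running one candidate invariant.
struct CheckResult {
    bool passed;
    std::string feedback;  // compiler output, error messages, test results
};

// Stand-ins for the two external steps (assumptions, not real APIs).
using Checker = std::function<CheckResult(const std::string& invariant)>;
using Repairer = std::function<std::string(const std::string& invariant,
                                           const std::string& feedback)>;

// Repairs `candidate` in place, using at most `max_attempts` repair calls.
// Returns true if the final candidate passes the check.
bool refine(std::string& candidate, const Checker& check,
            const Repairer& repair, int max_attempts = 3) {
    CheckResult result = check(candidate);
    for (int attempt = 0; !result.passed && attempt < max_attempts; ++attempt) {
        candidate = repair(candidate, result.feedback);  // feed errors back
        result = check(candidate);
    }
    return result.passed;
}
```

A candidate that still fails after the last attempt is discarded in this sketch, which mirrors the rule that only candidates passing the \textit{filtering tests} are reported.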

Refinement allows \tech to fix common issues such as type errors, undefined references, and logical inconsistencies. More detailed examples of the refinement process are provided in Appendix~\ref{sec:appendix_approach_details}.

% \subsection{Neural Verifier}
% \viraj{TODO}
% When test execution is not feasible.

% \begin{figure}[ht]
%     \centering
%     \includegraphics[width=\linewidth]{figure/neural_verifier.pdf}  % Adjust the width as needed
%     \caption{Naive Neural Verifier}
%     \label{fig:pdf-figure}
% \end{figure}




