
\begin{figure*}[t]
    \centering
    \includegraphics[width=0.95\textwidth]{figure/framework.pdf}
    \caption{Overview of \tech.}
    \label{fig:framework}
\end{figure*}

The \tech framework automatically analyzes source code to generate class-level invariants. Given a complete source program as input, it produces validated invariant candidates in two main phases: class invariant generation (\S~\ref{subsec:generation}) and class invariant verification (\S~\ref{subsec:verification}).


\livia{purpose of tests generation is to filter invariants. Not for testing code.}

\subsection{Generation Phase}
\label{subsec:generation}

As depicted in Figure~\ref{fig:framework}, the generation phase begins by pre-processing the source code.
It then uses an LLM to generate both candidate invariants and test cases.
The generated tests are not meant to test the code itself; they serve only as a filter that identifies and discards inaccurate invariants.
The invariants that survive this filter are the output of this phase and serve as the candidate invariants for the subsequent verification phase.

\subsubsection{Pre-processing}

\tech uses ?????, a Tree-sitter-based static analysis tool, to pre-process the program. It first parses the entire source program into an abstract syntax tree (AST). \tech then locates the target class and extracts all relevant information, namely the class, its members, and its recursive dependencies. It recursively performs the same analysis on all of the target class's sub-classes. Finally, \tech topologically sorts all identified classes (\ie the target class and all its sub-classes) and starts generation from the leaf classes, so that each class is processed after the classes it depends on.
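The ordering step can be illustrated with a short C++17 sketch that topologically sorts a class-dependency graph by depth-first post-order, so that leaf classes come first. The graph representation (a map from each class name to the classes it depends on) and the name \CodeIn{generation\_order} are hypothetical; the sketch only assumes that the dependencies extracted during pre-processing are available in this form.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: each class name maps to the names of the
// classes it depends on (e.g., AvlTree -> its node class).
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order visit: a class is appended only after all of its
// dependencies, so classes without dependencies (leaves) come first.
static void visit(const std::string& cls, const DepGraph& deps,
                  std::set<std::string>& done,
                  std::set<std::string>& in_progress,
                  std::vector<std::string>& order) {
  if (done.count(cls)) return;
  if (in_progress.count(cls))
    throw std::runtime_error("dependency cycle involving " + cls);
  in_progress.insert(cls);
  auto it = deps.find(cls);
  if (it != deps.end()) {
    for (const auto& dep : it->second) {
      // Self-references (a node holding pointers to nodes) are skipped.
      if (dep != cls) visit(dep, deps, done, in_progress, order);
    }
  }
  in_progress.erase(cls);
  done.insert(cls);
  order.push_back(cls);
}

// Returns the target class and every class it transitively depends on,
// ordered so that each class appears after its dependencies.
std::vector<std::string> generation_order(const std::string& target,
                                          const DepGraph& deps) {
  std::set<std::string> done, in_progress;
  std::vector<std::string> order;
  visit(target, deps, done, in_progress, order);
  return order;
}
\end{lstlisting}

For instance, if \CodeIn{AvlTree} depends only on a node class, the node class is returned first and \CodeIn{AvlTree} last. Longer dependency cycles are reported as errors in this sketch; how \tech itself handles such cycles is not specified here.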

\subsubsection{Generation by LLM}

Once all relevant code has been gathered, \tech employs an LLM to analyze the class module and infer both invariants and tests. Importantly, when generating tests, \tech focuses on creating valid sequences of API calls to the class rather than predicting specific test output values. This simplifies test generation because it removes the need to infer a test oracle.
\tech uses these API-call sequences to filter out inaccurate invariants: it executes the tests and checks whether each generated invariant holds throughout the execution. For example, the invariant in Figure~\ref{fig:avl_correct_inv1} states that the number of nodes is calculated correctly, and the invariant in Figure~\ref{fig:avl_correct_inv2} states that \CodeIn{AvlTree} satisfies the binary search tree (BST) property. Both are correct invariants and pass the ground-truth unit tests.
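The filtering step can be sketched in C++17 as follows. To keep the example short, a deliberately simplified, unbalanced binary search tree stands in for \CodeIn{AvlTree}; the class, the candidate invariants, and the helper \CodeIn{filter\_invariants} are illustrative assumptions, not code generated by \tech. The generated test is only a sequence of API calls with no expected outputs, and a candidate survives only if it holds after every call.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for AvlTree: an unbalanced BST.
class Bst {
 public:
  struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
  };

  void insert(int key) {
    std::unique_ptr<Node>* cur = &root_;
    while (*cur) {
      if (key == (*cur)->key) return;  // duplicates are ignored
      cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    }
    *cur = std::make_unique<Node>(key);
    ++size_;
  }
  const Node* root() const { return root_.get(); }
  int size() const { return size_; }

 private:
  std::unique_ptr<Node> root_;
  int size_ = 0;
};

int count_nodes(const Bst::Node* n) {
  return n ? 1 + count_nodes(n->left.get()) + count_nodes(n->right.get()) : 0;
}

// BST property: every key in the subtree lies strictly between lo and hi.
bool is_bst(const Bst::Node* n, long long lo, long long hi) {
  if (!n) return true;
  return lo < n->key && n->key < hi && is_bst(n->left.get(), lo, n->key) &&
         is_bst(n->right.get(), n->key, hi);
}

// A named candidate invariant, as it might be extracted from an LLM response.
using Invariant = std::pair<std::string, std::function<bool(const Bst&)>>;

// Replays one oracle-free API-call sequence (here: a list of inserts) for
// each candidate and keeps the candidates that hold after every call.
std::vector<std::string> filter_invariants(const std::vector<Invariant>& cands,
                                           const std::vector<int>& inserts) {
  std::vector<std::string> kept;
  for (const auto& [name, holds] : cands) {
    Bst tree;
    bool ok = holds(tree);
    for (std::size_t i = 0; ok && i < inserts.size(); ++i) {
      tree.insert(inserts[i]);  // no expected output is checked
      ok = holds(tree);
    }
    if (ok) kept.push_back(name);
  }
  return kept;
}
\end{lstlisting}

With the insertion sequence 5, 3, 8, a candidate stating that the root has no left child holds after the first insertion but is discarded once a smaller key is inserted, while node-count and BST-property candidates survive.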
% \cy{TODO: explain with some example here}

% And as \tech is filtering the invariants, it needs only to call public methods after they have been instrumented with \CodeIn{check\_invariant} functions.
% \livia{append prompt} In the user prompt, \tech provides the current target class module source code including header and cpp files. 

\subsubsection{Post-processing}
\tech parses the LLM response and extracts the candidate invariants and the generated tests.
\cy{maybe we could skip this}
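The exact response format is not specified here. Purely as an illustration, the following C++17 sketch assumes that the prompt asks the LLM to delimit each invariant and each test with hypothetical marker lines (\CodeIn{// BEGIN INVARIANT} and \CodeIn{// END INVARIANT}, and likewise for \CodeIn{TEST}); the function \CodeIn{extract\_blocks} is also hypothetical.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Extracts every snippet that the LLM enclosed between the hypothetical
// marker lines "// BEGIN <kind>" and "// END <kind>".
std::vector<std::string> extract_blocks(const std::string& response,
                                        const std::string& kind) {
  const std::string begin = "// BEGIN " + kind + "\n";
  const std::string end = "// END " + kind;
  std::vector<std::string> blocks;
  std::size_t pos = 0;
  while ((pos = response.find(begin, pos)) != std::string::npos) {
    const std::size_t start = pos + begin.size();
    const std::size_t stop = response.find(end, start);
    if (stop == std::string::npos) break;  // unterminated block: ignore it
    blocks.push_back(response.substr(start, stop - start));
    pos = stop + end.size();
  }
  return blocks;
}
\end{lstlisting}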

\subsubsection{Instrumentation}
\label{instrumentation}
As shown in Figure~\ref{fig:avl_correct_inv1}, invariants are expressed as assertions and wrapped in a \CodeIn{check\_invariant} function. Each public method is instrumented with a call to \CodeIn{check\_invariant} at the very beginning and at the very end of its implementation. Consequently, whenever a test invokes a public API during filtering, the invariant candidate is checked automatically.
\cy{Not sure whether we need to put much implementation details here}
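The instrumentation pattern can be sketched in C++17 on a hypothetical bounded stack rather than on \CodeIn{AvlTree}, to keep the example short. The candidate invariant is supplied as a policy type so that a correct and an incorrect candidate can be compared; the class, the policy types, and the \CodeIn{InvariantViolation} exception are illustrative assumptions. In this sketch a violation throws instead of aborting, so that a filtering harness could record which candidate failed.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Raised by check_invariant; throwing (instead of aborting) lets a
// filtering harness record which candidate failed.
struct InvariantViolation : std::logic_error {
  using std::logic_error::logic_error;
};

// Two hypothetical candidate invariants relating size and capacity.
struct SizeWithinCap {  // correct candidate
  static bool holds(std::size_t size, std::size_t cap) { return size <= cap; }
};
struct SizeBelowCap {  // incorrect candidate (too strict)
  static bool holds(std::size_t size, std::size_t cap) { return size < cap; }
};

// Hypothetical bounded stack instrumented with the candidate invariant Inv.
template <typename Inv>
class BoundedStack {
 public:
  explicit BoundedStack(std::size_t cap) : cap_(cap) {
    check_invariant();  // constructors can only check on exit
  }

  void push(int v) {
    check_invariant();  // at the very beginning
    if (items_.size() < cap_) items_.push_back(v);  // full stack: ignore
    check_invariant();  // at the very end
  }

  bool pop() {
    check_invariant();
    bool removed = !items_.empty();
    if (removed) items_.pop_back();
    check_invariant();  // before the (only) return
    return removed;
  }

 private:
  // The candidate invariant, expressed as an assertion.
  void check_invariant() const {
    if (!Inv::holds(items_.size(), cap_))
      throw InvariantViolation("candidate invariant violated");
  }

  std::size_t cap_;
  std::vector<int> items_;
};
\end{lstlisting}

Under a generated call sequence of two pushes on a stack of capacity two, \CodeIn{SizeWithinCap} keeps holding, whereas the stricter \CodeIn{SizeBelowCap} is violated at the exit of the second push and would therefore be discarded.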

\subsubsection{Pruning}

Not all tests are equally effective at identifying inaccurate invariants. To select the most suitable test suite, we use line coverage as the selection metric. The tests with the highest coverage are considered `good tests' and serve as the gold standard for filtering invariants. An invariant is deemed `good' if and only if it passes these `good tests' (\ie it holds throughout their execution).
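The text does not fix how many of the highest-coverage tests are kept. The following C++17 sketch assumes a simple top-$k$ rule, with $k$ a hypothetical parameter, and assumes that each test run has already been summarized by the set of source lines it covered and the set of candidate invariants it violated (\CodeIn{TestRun} is a hypothetical record type).

\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one generated test's execution.
struct TestRun {
  std::string name;
  std::set<int> covered_lines;     // source lines executed by the test
  std::set<std::string> violated;  // candidate invariants it violated
};

// Keeps the k tests with the highest line coverage as the "good tests";
// tests with equal coverage keep their original order.
std::vector<TestRun> select_good_tests(std::vector<TestRun> runs, std::size_t k) {
  std::stable_sort(runs.begin(), runs.end(),
                   [](const TestRun& a, const TestRun& b) {
                     return a.covered_lines.size() > b.covered_lines.size();
                   });
  if (runs.size() > k) runs.resize(k);
  return runs;
}

// An invariant is "good" iff no good test violates it.
std::vector<std::string> good_invariants(const std::vector<std::string>& cands,
                                         const std::vector<TestRun>& good_tests) {
  std::vector<std::string> kept;
  for (const auto& inv : cands) {
    bool holds = std::none_of(good_tests.begin(), good_tests.end(),
                              [&inv](const TestRun& t) { return t.violated.count(inv) > 0; });
    if (holds) kept.push_back(inv);
  }
  return kept;
}
\end{lstlisting}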

When tests cannot be executed locally, \tech instead adopts a neural pruning method. \viraj{TODO}

\subsubsection{Refinements}

For invariants that fail the `good tests', \tech feeds the compiler output and error messages back to the LLM. For example, compiling the invariant in Figure~\ref{fig:avl_bst_before_refinement} with GCC produces the error shown in Figure~\ref{fig:gcc_compiler_error}; a single round of refinement fixes it into the correct version shown in Figure~\ref{fig:avl_correct_inv2}.
% \cy{what is the feedback here}
Along with this feedback, the LLM receives the `good tests' and the invariant in question and is asked to produce an improved invariant.
An invariant is added to the `good invariants' if and only if it holds during the execution of the `good tests'.
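The refinement loop can be sketched in C++17 as follows. The two hooks are hypothetical placeholders, not real APIs: \CodeIn{run\_good\_tests} stands for compiling the instrumented code and running the `good tests', and \CodeIn{ask\_llm} stands for sending the invariant together with the compiler or test feedback to the LLM. The bound \CodeIn{max\_rounds} on the number of refinements is likewise an assumption.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Outcome of compiling an instrumented invariant and running the good tests.
struct CheckResult {
  bool passed;
  std::string feedback;  // e.g., compiler error messages or a failing test
};

// Hypothetical hooks (not real APIs): compile and run the good tests on an
// invariant; ask the LLM for a revised invariant given the feedback.
using RunGoodTests = std::function<CheckResult(const std::string& invariant)>;
using AskLlm = std::function<std::string(const std::string& invariant,
                                         const std::string& feedback)>;

// Returns the first version of the invariant that passes the good tests,
// trying at most max_rounds refinements, or std::nullopt if none passes.
std::optional<std::string> refine(std::string invariant,
                                  const RunGoodTests& run_good_tests,
                                  const AskLlm& ask_llm, int max_rounds) {
  for (int round = 0; round <= max_rounds; ++round) {
    CheckResult result = run_good_tests(invariant);
    if (result.passed) return invariant;  // becomes a "good invariant"
    if (round < max_rounds) invariant = ask_llm(invariant, result.feedback);
  }
  return std::nullopt;
}
\end{lstlisting}

In the GCC example described in this subsection, one iteration of the loop would suffice.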



\subsection{Verification Phase}
\label{subsec:verification}

The verification phase takes the `good' invariants from the generation phase as input and instruments the source code with each candidate invariant, in the same way as in Section~\ref{instrumentation}.
To evaluate the validity of these invariants, we use the benchmark's unit tests as the ground truth. Note that these unit tests differ from the generated tests: the generated tests are sequences of API calls without expected outputs, whereas the unit tests compare actual outputs against expected outputs for given inputs.
\tech executes the instrumented code with these unit tests and deems a candidate invariant correct if it holds throughout their execution.
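To make the distinction between the two kinds of tests concrete, the following C++17 sketch instruments a hypothetical \CodeIn{Counter} class with the candidate invariant that \CodeIn{value\_} is never negative. The generated test only issues API calls, while the unit test also compares outputs against expected values; the class, the exception types, and \CodeIn{invariant\_verified} are illustrative assumptions, and only the unit test would be used during verification.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <stdexcept>

// Raised by the instrumented check_invariant (hypothetical).
struct InvariantViolation : std::logic_error {
  using std::logic_error::logic_error;
};
// Raised by a unit test whose actual output differs from the expected one.
struct OutputMismatch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical class instrumented with the candidate "value_ is never negative".
class Counter {
 public:
  void inc() { check_invariant(); ++value_; check_invariant(); }
  void reset() { check_invariant(); value_ = 0; check_invariant(); }
  // Const accessor: the state cannot change, so one check suffices.
  int value() const { check_invariant(); return value_; }

 private:
  void check_invariant() const {
    if (value_ < 0) throw InvariantViolation("value_ >= 0 violated");
  }
  int value_ = 0;
};

// Generated test: a valid sequence of API calls with no expected outputs.
void generated_test() {
  Counter c;
  c.inc();
  c.reset();
  c.inc();
}

// Ground-truth unit test: API calls plus input/output comparisons.
void unit_test() {
  Counter c;
  c.inc();
  c.inc();
  if (c.value() != 2) throw OutputMismatch("expected 2");
  c.reset();
  if (c.value() != 0) throw OutputMismatch("expected 0");
}

// A test supports the candidate if no violation occurs. OutputMismatch is
// deliberately not caught: it signals a failing test, not a false invariant.
bool invariant_verified(void (*test)()) {
  try {
    test();
    return true;
  } catch (const InvariantViolation&) {
    return false;
  }
}
\end{lstlisting}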

\subsection{Neural Verifier}
\viraj{TODO}
This verifier is intended for cases where test execution is not feasible.

\begin{figure}[ht]
    \centering
    \includegraphics[width=\linewidth]{figure/neural_verifier.pdf}
    \caption{Naive Neural Verifier}
    \label{fig:pdf-figure}
\end{figure}

