%\shuvendu{Lock on Related work.}

In this section, we discuss how \tech relates to prior work on synthesizing program invariants using static, dynamic, and neural techniques.
% https://dl.acm.org/doi/pdf/10.1145/3510003.3510141
% https://www.gregorykapfhammer.com/download/research/papers/key/Cobb2011-paper.pdf#page=2.92 
% uses 787 67 41 test cases

 % https://lingming.cs.illinois.edu/publications/issta2014.pdf#page=5.75

% Towards combining the cognitive abilities of large language models with the rigor of deductive progam verification

\subsection{Static approaches}
Static techniques, such as interpolation~\cite{mcmillan2004interpolants} or abstract interpretation~\cite{cousot1977abstract}, perform a symbolic analysis of source code to compute static over-approximations of runtime behavior and represent them as program invariants over suitable domains.
These techniques are often used to prove safety properties of code.
Most focus on synthesizing loop invariants and method pre/postconditions; only a few target module-level specifications~\cite{lahiri-cav09}. 
Given the undecidability of program verification, these techniques scale poorly to real-world programs, especially in the presence of complex data structures and frameworks. 
In contrast, \tech can be applied to large codebases to synthesize high-quality class invariants but does not guarantee soundness by construction. 
%symbolic execution~\cite{Tillmann2006} rely on source code for invariant generation. 
%They usually give up the dynamic aspect of the code which is critical in revealing which inputs to the program are relevant and capturing the user's real intent.
%Dynamic techniques (e.g., ICE-learning~\cite{garg2014learning,garg2016learning}, LoopInvGen~\cite{padhi2016loopinvgen}) derive invariants from concrete examples. 


\subsection{Dynamic approaches}
Dynamic synthesis techniques, such as Daikon~\cite{ernst2007daikon}, DIG~\cite{nguyen2012dig}, SLING~\cite{le2019sling}, and specification mining~\cite{ammons2002}, learn invariants by observing the dynamic behaviors of programs over a set of concrete execution traces. 
One advantage of these dynamic techniques is that they can be agnostic to the code and generally applicable to different languages. 
However, these approaches are limited by the templates or patterns over which the invariants can be expressed. 
DySy~\cite{dysy} employs dynamic symbolic execution to alleviate the problem of fixed templates for bounded executions but resorts to ad-hoc abstraction for loops or recursion. 
\citet{hellendoorn2019are} trained models to predict the quality of invariants generated by tools such as Daikon, but these models do not generate new invariants. 
SpecFuzzer~\cite{facundo2022specfuzzer} generates numerous candidate assertions via fuzzing to construct templates and filters them using Daikon and mutation testing. 
Finally, Geminus~\citep{boockmann2024geminius} aims to synthesize sound and complete class invariants that represent the set of reachable states, guiding its search with random test cases (termed Random Walk).

Unlike these approaches, \tech can generate a much larger class of invariants by leveraging multimodal inputs, including source code, test cases, comments, and even naming conventions learned from training data, to enhance invariant synthesis.
Further, unlike prior dynamic approaches, \tech can draw on LLM-based test generation (an active area of research~\cite{codamosa-icse23,schäfer2023empiricalevaluationusinglarge,yang2024whitefox}), which reduces the need for a high-quality test suite to obtain the invariants.

For the use case of static verification, learning-based approaches have been used to iteratively improve the quality of the synthesized inductive invariants~\cite{garg2014learning, garg2016learning, padhi2016loopinvgen} from dynamic traces. 
However, these approaches have not been evaluated on real-world programs due to the need for symbolic reasoning. 

\subsection{Neural approaches}
LLM-based invariant synthesis is an emerging area of research with some noteworthy recent contributions. \citet{pei2023learning} trained a model for zero-shot invariant synthesis, which incurs high training costs and lacks feedback-driven repair. 
Their approach uses Daikon-generated invariants as both training data and ground truth, which can lead to spurious invariants. %SpecGen~\cite{ma2024specgen} generates user intent and proof artifacts, aiming for the input program verification. They rely solely on the source code, assuming it to be entirely correct, and do not leverage examples. They did, however, evaluate the quality of the generated specifications through a user study.

Prior work on nl2postcond~\cite{nl2postcond} prompts LLMs to generate pre- and postconditions for Python and Java benchmarks, illustrating LLMs' ability to generate high-quality specifications. 
However, that work neither prunes incorrect invariants nor generates class invariants, as \tech does. 
Combining it with \tech to generate complete class-level specifications, including pre- and postconditions for the public methods of a class, is an interesting direction for future work.

Two very recent neuro-symbolic pipelines extend LLM prompting to \emph{other} kinds of specifications. \citet{WuASE24} combine GPT-4 with bounded model checking to infer \emph{loop invariants}: the LLM enumerates candidate predicates, a BMC oracle filters them, and the surviving predicates are reassembled into provable invariants, yielding a 97\,\% success rate on 316 numeric-loop benchmarks.
\citet{WenCAV24} (\textsc{AutoSpec}) weave static slicing and an off-the-shelf program verifier with LLM generation to synthesize \emph{function-level contracts}; \textsc{AutoSpec} verifies 79\,\% of heterogeneous benchmarks plus an X.509 parser case study.
Both systems rely on \emph{static} or SMT-based oracles and target scalar loops or procedure specifications,
whereas \tech tackles \emph{pointer-rich class/object invariants} in idiomatic~C++ and validates them chiefly through \emph{dynamic} test-suite execution plus mutation testing.
This different oracle allows our approach to scale to data-structure codebases for which precise SMT models are hard to obtain.


For static verification, recent work includes the use of LLMs for intent formalization from natural language~\cite{lahiri2024evaluating} and for inferring specifications and inductive program invariants~\cite{loopy,ma2024specgen}.
None of these techniques scale to real-world programs due to the need for complex symbolic reasoning. 

%\subsection{Class Invariant Synthesis.}



%\paragraph{Loop Invariant Synthesis.}

%There is much prior work on loop invariant synthesis. Counterexample-guided refinement techniques such as ICE~\cite{garg2014learning, garg2016learning} and LoopInvGen~\cite{padhi2016loopinvgen} operate through a learning-checking loop. In each iteration, a candidate invariant is inferred from positive, negative, and implication examples, and then checked for correctness. Our approach adopts a similar refinement loop.

%\paragraph{Quality of Invariants.}

%\citet{hellendoorn2019are} trained models to predict the quality of invariants. However, their definition of quality is only concerned with whether the invariants always hold, and their prediction focuses on pre/post-conditions. The training data for their models were collected through both random testing and manual labeling.

% In summary, \tech seeks to navigate the gap in current research by producing high-quality, user-intended object invariants using both dynamic and static data, enhanced through the capabilities of LLMs.
