% \subsection{Can \tech find bugs in code?}
% We provide an anecdote how \tech with a user in the loop found a bug in the code for queue.

% \subsection{Limitation}
At present, \tech judges an invariant's correctness using the same test suite that the LLM generates alongside that invariant. This design keeps the pipeline fully automated, but it risks co-adaptation: the model can drift toward invariants that merely fit the behaviors exercised by its own tests, so an invariant that survives pruning may be less general than it appears.
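
The following sketch illustrates the co-adaptation risk on a toy example. The \CodeIn{Stack} class, the candidate invariant, and both tests are hypothetical and are not taken from our benchmarks; they only show how an over-fitted invariant can survive a co-generated test yet fail on an independent one.

\begin{lstlisting}[language=c++]
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical unbounded stack, for illustration only.
class Stack {
 public:
  void push(int v) { data_.push_back(v); }
  std::size_t size() const { return data_.size(); }
 private:
  std::vector<int> data_;
};

// Over-fitted candidate: true on small inputs, but not a
// property of the class itself.
bool overfitInvariant(const Stack& s) { return s.size() <= 3; }

// Pushes `count` elements and checks the candidate after each push.
bool holdsAfterPushes(int count) {
  Stack s;
  bool ok = true;
  for (int i = 0; i < count; ++i) {
    s.push(i);
    ok = ok && overfitInvariant(s);
  }
  return ok;
}

// Co-generated test: never pushes more than three elements.
bool cogeneratedTest() { return holdsAfterPushes(3); }

// Independent test: pushes a fourth element.
bool independentTest() { return holdsAfterPushes(4); }

int main() {
  assert(cogeneratedTest());          // candidate survives pruning
  assert(independentTest() == false); // but it is not a real invariant
  std::cout << "over-fitted invariant survives only its own test\n";
  return 0;
}
\end{lstlisting}

In this sketch, the candidate is retained only because the co-generated test never exercises a state that contradicts it.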

\tech uses generated tests for invariant pruning, but the test suite may include spurious tests that incorrectly prune valid invariants. In particular, a generated test might not represent a valid sequence of method calls; for example, invoking \CodeIn{pop()} before any \CodeIn{push()} could violate an assertion and cause a valid invariant to be pruned.
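
A minimal sketch of such a spurious test is shown below. The \CodeIn{Queue} class here is a hypothetical ring buffer written for illustration, and we assume that \CodeIn{pop()} has an unchecked precondition that the queue is non-empty. Under that assumption, the size invariant is valid for every legal call sequence, yet a test that pops first drives the object into an illegal state and makes the invariant appear false.

\begin{lstlisting}[language=c++]
#include <array>
#include <cassert>
#include <iostream>

// Hypothetical ring-buffer queue, for illustration only.
class Queue {
 public:
  // Precondition (unchecked): the queue is not full.
  void push(int v) {
    data_[tail_] = v;
    tail_ = (tail_ + 1) % kCap;
    ++n_;
  }
  // Precondition (unchecked): the queue is not empty.
  int pop() {
    int v = data_[head_];
    head_ = (head_ + 1) % kCap;
    --n_;
    return v;
  }
  // Candidate invariant: valid whenever preconditions are respected.
  bool invariant() const { return 0 <= n_ && n_ <= kCap; }
 private:
  static constexpr int kCap = 4;
  std::array<int, kCap> data_{};
  int head_ = 0, tail_ = 0, n_ = 0;
};

// Valid call sequence: push before pop.
bool validTest() {
  Queue q;
  q.push(1);
  q.pop();
  return q.invariant();
}

// Spurious call sequence: pop on an empty queue.
bool spuriousTest() {
  Queue q;
  q.pop();  // violates the precondition, so n_ becomes -1
  return q.invariant();
}

int main() {
  assert(validTest());
  assert(spuriousTest() == false);  // a valid invariant looks broken
  std::cout << "spurious test would prune a valid invariant\n";
  return 0;
}
\end{lstlisting}

A pruner that treats every failing test as evidence against the invariant would discard the size invariant here, even though the fault lies in the test's call sequence.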

Another limitation is the LLM's context window, which restricts the amount of code that can be processed in a single call and makes large codebases challenging to handle. \tech partially addresses this issue through compositional generation, which breaks the code into manageable parts. Ongoing advancements in LLMs, as highlighted in recent work~\cite{liu2024lost,gao2023retrieval}, are also expected to mitigate this limitation.

For future work, we plan to integrate invariant generation with the generation of formal specifications for member functions, giving the LLM a more comprehensive understanding of program behavior. Additionally, we aim to evaluate \tech on larger and more complex systems beyond Z3 to assess its scalability across diverse codebases.


% \dave{Prompt size is important. Probably mention that there's lots of work on the LLM side to improve this.}\livia{addressed}

% \begin{itemize}
%     \item Large prompt size (z3): Compositional prompt
%     \item Unable to run tests (Drivers)
% \end{itemize}


% \subsection{Future Work}
% Integrate with the generation of specifications for member functions.
% Do more and larger systems. (Not specifically z3.)
% \dave{Future work can be brief, maybe longer term than this, and not detailed.
% A lot of it is obvious.  I'd say:  Integrate with the generation of specifications for member functions.
% Do more and larger systems. (Not specifically z3.)
% If you learned about new problems from this work, you might include them (but you are also giving them away to the competition, which may or may not matter).
% }

% \begin{itemize}
%     \item C++ data structures
%     \item Java?
%     \item Defects4j?
%     \item More code in z3?
% \end{itemize}
% \subsection{Found Bug in \CodeIn{Queue}}


% \begin{figure}[htp]
% \centering
% \begin{lstlisting}[language=c++, escapechar=!]
% T Queue::back() {
%   if (n <= 0)
%     throw std::out_of_range("Queue is empty");
%   assert (0 < tail < maxSize);
% !\CodeDelete\textbf{-}!  return data[tail];
% !\CodeAdd\textbf{+}!  return data[(tail - 1 + maxSize) % maxSize];
% }
% \end{lstlisting}
%     \caption{\CodeIn{Queue} bug in \CodeIn{expand} method}
%     \label{fig:inv_addition_kill}
% \end{figure}




