Abstract: Large language models such as CodeBERT perform very well on tasks such as natural language code search. We show that this is most likely due to the high token overlap and similarity between the queries and the code in datasets obtained from large codebases, rather than to any deeper understanding of the syntax or semantics of the query or the code.
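The kind of surface-level lexical similarity the abstract refers to can be made concrete with a token-overlap measure such as Jaccard similarity between a query and a code snippet. The sketch below is a minimal, hypothetical illustration (the tokenizer and example pair are not from the paper): when a query shares identifiers and keywords with the code, a purely lexical score is already high, with no syntactic or semantic understanding involved.

```python
import re

def tokens(text):
    # A deliberately naive tokenizer: lowercased alphabetic runs,
    # so identifiers like sort_numbers split into "sort" and "numbers".
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard_overlap(query, code):
    # Jaccard similarity of the two token sets; 0.0 for empty inputs.
    q, c = tokens(query), tokens(code)
    return len(q & c) / len(q | c) if q | c else 0.0

# Hypothetical query/snippet pair with strong lexical overlap.
query = "sort a list of numbers in ascending order"
snippet = "def sort_numbers(numbers): return sorted(numbers)"
print(jaccard_overlap(query, snippet))
```

A model could score such pairs well by matching tokens alone, which is the confound the abstract argues inflates code-search benchmarks.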
Paper Type: long