Abstract: Deep learning has recently made rapid progress in antibody design, which plays a key role in the advancement of therapeutics. A dominant paradigm is to train a single model to jointly generate the antibody sequence and structure as a candidate. However, joint generation requires the model to produce both discrete amino acid categories and continuous 3D coordinates; this limits the space of possible architectures and may lead to suboptimal performance. In response, we propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation from structure prediction. Although simple, this approach allows the use of powerful neural architectures and yields notable performance improvements. We also find that widely used non-autoregressive generators favor sequences with excessively repeated tokens. Such sequences are out-of-distribution and prone to undesirable developability properties that can trigger harmful immune responses in patients. To resolve this, we introduce a composition-based objective that allows an efficient trade-off between high performance and low token repetition. Our results demonstrate that ASSD consistently outperforms existing antibody design models, while the composition-based objective successfully mitigates token repetition in non-autoregressive models. Our code is available at https://anonymous.4open.science/r/ASSD_public-5E17.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the reviewers' requests and feedback, we have made the following revisions:
1. Further emphasized that our aim is not to develop a state-of-the-art model, but rather to propose a general approach for enhancing existing joint prediction models.
2. Clarified that the token repetition issue in non-autoregressive (NAR) models is limited to one-shot or few-shot models.
3. Highlighted that our proposed regularization loss can, in principle, be adapted to any metric (Section 4.2).
4. Explained the repeated entries in Table 7 (caption, Section 7.1).
5. Corrected a missing reference on page 7.
6. Modified the legend in Figure 4b.
7. Added a discussion of why NAR models outperform their autoregressive (AR) counterparts (Section 8, Appendix H).
8. Revised the discussion section on structure-based drug design (Section 8).
9. Clarified the meaning of the color bar in Figure 4.
Assigned Action Editor: ~Mark_Coates1
Submission Number: 3272