Bridging Protein Structure to Sequence via Local Structure for Inverse Folding

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | License: CC BY 4.0
Keywords: Protein inverse folding; Protein language model; Hierarchical generation
Abstract: Inverse folding, the design of protein sequences for a given structure, has important applications in protein engineering. Protein structures are inherently hierarchical: local structures (e.g., α-helices and β-sheets) are connected by loops and coils. However, most existing methods treat inverse folding as a direct 3D-structure-to-1D-sequence task, ignoring the hierarchical information embedded in local structures. In this work, we propose Hier-IF, a controllable inverse folding model that explicitly incorporates structural hierarchy. Hier-IF reformulates the task as a “Tertiary Structure (TS) to Local Structure (LS) to Sequence (Seq)” process: it first generates the sequence tokens corresponding to local structures and then builds the connecting loops and coils. During training, we introduce classifier-free guidance for controllable hierarchical generation and employ a bidirectional structure-sequence reconstruction loss. During sampling, we design a remasking strategy that enables controllable generation following the structural hierarchy. Evaluated across multiple datasets, Hier-IF outperforms the baselines and achieves high structural fidelity in local structures. Visualizations of the generated results and ablation studies under different experimental settings further validate the effectiveness of our approach and provide interpretability for hierarchical protein inverse folding.
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22811
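
The two-stage order described in the abstract (local structures first, then loops and coils) combined with classifier-free guidance can be illustrated with a minimal sketch. This is not the authors' implementation. The 3-state labels (H, E, C), the function names, the `predict` callback, and the guidance form (1 + w) * conditional - w * unconditional are all assumptions made for illustration, and the remasking strategy and training loss are omitted.

```python
# Illustrative sketch only; every name here is hypothetical, not from Hier-IF.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Assumed 3-state labels: H = helix, E = strand, anything else = loop/coil.
LOCAL_STRUCTURE = {"H", "E"}


def decoding_stages(ss_labels):
    """Split residue indices into two stages: local structures, then loops/coils."""
    local = [i for i, s in enumerate(ss_labels) if s in LOCAL_STRUCTURE]
    loops = [i for i, s in enumerate(ss_labels) if s not in LOCAL_STRUCTURE]
    return [local, loops]


def guided_logits(cond, uncond, w):
    """Classifier-free guidance: push conditional logits away from unconditional ones.

    w = 0 returns the conditional logits unchanged; larger w strengthens guidance.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]


def hierarchical_decode(ss_labels, predict, w=1.0, mask="_"):
    """Greedy two-stage decoding over a fully masked sequence.

    `predict(seq, i, conditional)` is a hypothetical model callback that returns
    one logit per entry of ALPHABET for position i, given the partial sequence.
    """
    seq = [mask] * len(ss_labels)
    for stage in decoding_stages(ss_labels):
        for i in stage:
            logits = guided_logits(predict(seq, i, True), predict(seq, i, False), w)
            seq[i] = ALPHABET[logits.index(max(logits))]
    return "".join(seq)
```

Because loop and coil positions are filled only after every helix and strand position, each loop residue in this sketch is predicted with the full local-structure context already in place.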