TEDLH: Domain HMMs for sensitive detection of remote homologues

Claudia Alvarez Carreño, Anton S. Petrov, Vaishali P. Waman, Ian Sillitoe, Christine Orengo

Published: 08 Jan 2026, Last Modified: 27 Jan 2026, Crossref, CC BY-SA 4.0
Abstract:

Motivation: The Encyclopedia of Domains (TED) provides domain annotations for proteins in the AlphaFold Protein Structure Database (AFDB) using a consensus of three state-of-the-art structure-based methods. We used these TED domain annotations to construct profile Hidden Markov models (HMMs), collectively forming the TED Library of HMMs (TEDLH). TEDLH enables sensitive sequence and profile searches, supporting systematic exploration of protein domain families and their evolutionary relationships.

Results: TEDLH links domain HMMs to experimentally determined CATH-PDB structures through direct (primary) and transitive (secondary and tertiary) relationships. Fewer than half of TEDLH HMMs are directly linked to a CATH-PDB domain; the remaining models are connected through transitive relationships. These transitive links extend coverage into more divergent regions of sequence space and better represent CATH superfamily diversity.

HMM–HMM comparisons within CATH superfamily 3.30.70.100 illustrate how transitive relationships expand sequence coverage in TEDLH. In this superfamily, 4,813 TEDLH HMMs are connected to 212 CATH-PDB representatives. Primary, secondary, and tertiary relationships progressively capture more divergent sequences (pairwise sequence identity <20%) that retain structural similarity (TM-score >0.6) and a conserved two-layer α/β sandwich core fold.

All-against-all HMM–HMM comparisons across TEDLH also reveal sequence similarities across the CATH hierarchy (cross-hits). At low query coverage (<50%), cross-hits are more frequent between CATH classes, whereas at higher coverage thresholds (>70%) they predominantly occur between superfamilies. These cross-hits are not driven by superfamily size or sequence diversity and can provide guidance for CATH curation. As an example, analysis of cross-hits between superfamilies 2.170.130.30 and 3.10.20.30 reveals evolutionary relationships between these groups.

Availability and Implementation: TEDLH is compatible with HH-suite3 and is available from FigShare https://doi.org/10.6084/m9.figshare.28531754 for local use.

Contact: c.carreno{at}ucl.ac.uk
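
The direct (primary) and transitive (secondary and tertiary) relationships described in the Results can be pictured as hop counts in a graph of pairwise HMM hits. The sketch below is an illustrative assumption, not the authors' procedure: it takes a list of already-filtered hit pairs and a set of CATH-PDB seed identifiers, then labels each HMM by its shortest number of hops to any seed using breadth-first search. The function name `assign_link_levels`, the input format, and the identifiers in the usage example are all hypothetical.

```python
from collections import deque

# Hypothetical mapping from hop count to the relationship names used in the abstract.
LEVEL_NAMES = {1: "primary", 2: "secondary", 3: "tertiary"}


def assign_link_levels(hits, cath_pdb_ids, max_level=3):
    """Return {hmm_id: hop_count} for HMMs within max_level hops of a seed.

    hits         -- iterable of (id_a, id_b) pairs, assumed to be significant
                    pairwise hits that were filtered beforehand
    cath_pdb_ids -- set of identifiers treated as CATH-PDB seed domains
    """
    # Build an undirected adjacency map from the pairwise hits.
    neighbours = {}
    for a, b in hits:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    levels = {}
    seen = set(cath_pdb_ids)
    queue = deque((seed, 0) for seed in cath_pdb_ids)

    # Breadth-first search: the first time a node is reached is its shortest path.
    while queue:
        node, depth = queue.popleft()
        if depth == max_level:
            continue  # do not expand beyond the tertiary level
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                levels[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return levels


if __name__ == "__main__":
    # Toy data with made-up identifiers.
    toy_hits = [("pdbA", "hmm1"), ("hmm1", "hmm2"), ("hmm2", "hmm3"), ("hmm3", "hmm4")]
    for hmm_id, hops in sorted(assign_link_levels(toy_hits, {"pdbA"}).items()):
        print(hmm_id, LEVEL_NAMES[hops])
```

In the toy data, `hmm4` lies four hops from the seed and is therefore left unlabelled. How hits are scored and filtered before this step (for example by E-value, probability, or coverage) is not specified here and would need to follow the paper's own criteria.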