EmbedTAD: Using Graph Embedding and Unsupervised Learning to Identify TADs from High-Resolution Hi-C Data
Abstract: Topologically Associating Domains (TADs) are functional, self-interacting regions whose boundaries are enriched with various proteins. Identifying TADs is essential for studying several biological characteristics, including immune system function and chromosome organization. In this study, we propose EmbedTAD, a method for identifying TADs from high-resolution Hi-C data. EmbedTAD uses NetMF, a graph embedding technique with low computational requirements, and clusters the resulting embeddings into TADs with the HDBSCAN algorithm. We demonstrate that EmbedTAD detects TAD rearrangements during T-cell differentiation and can differentiate between active and inactive cells. Furthermore, we show that EmbedTAD recovers a significant number of TADs that are also present in PLAC-seq data, demonstrating its reproducibility. We confirm that the TADs it detects have distinct ChIP-seq signals surrounding their boundaries, including CTCF, RAD21, and SMC3. Overall, EmbedTAD reliably and efficiently identifies TADs with minimal computational resources, outperforming many state-of-the-art methods. In summary, EmbedTAD applies graph embedding and unsupervised learning to discover TADs, with use cases demonstrated on PLAC-seq and T-cell data and supported by computational and biological validation.
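The embedding step named in the abstract can be illustrated with a minimal sketch of the small-window NetMF factorization, applied to a Hi-C contact matrix treated as a weighted graph. This is not the authors' implementation: the function name `netmf_embedding`, its parameters (`dim`, `window`, `negative`), and the toy matrix are assumptions chosen for illustration. The clustering step (HDBSCAN over the rows of the embedding) is not shown here.

```python
import numpy as np

def netmf_embedding(contact, dim=4, window=3, negative=1.0):
    """Hypothetical helper: small-window NetMF embedding of a contact matrix.

    Each genomic bin is a graph node and each contact count an edge weight.
    """
    A = np.asarray(contact, dtype=float)
    A = (A + A.T) / 2.0                  # enforce symmetry
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                  # guard against empty bins
    vol = A.sum()                        # graph volume
    P = A / deg[:, None]                 # random-walk matrix D^-1 A
    S = np.zeros_like(P)
    Pr = np.eye(A.shape[0])
    for _ in range(window):              # sum of P^1 ... P^window
        Pr = Pr @ P
        S += Pr
    M = (vol / (negative * window)) * S / deg[None, :]   # (...) D^-1
    logM = np.log(np.maximum(M, 1.0))    # truncated element-wise log
    U, s, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(s[:dim]) # one embedding row per bin

# Toy example (assumed data): two 5-bin blocks with dense internal contacts.
toy = np.ones((10, 10))
toy[:5, :5] = 10.0
toy[5:, 5:] = 10.0
emb = netmf_embedding(toy, dim=4, window=3)
```

Each row of `emb` is a low-dimensional vector for one bin. Under the pipeline described in the abstract, a density-based clusterer such as HDBSCAN would then group these rows into candidate TADs.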
External IDs: doi:10.1038/s42003-025-09224-z