BioCode: A Data-Driven Procedure to Learn the Growth of Biological Networks

Emre Sefer

2022 (modified: 05 Feb 2023)IEEE ACM Trans. Comput. Biol. Bioinform. 2022Readers: Everyone

Abstract: Probabilistic biological network growth models have been utilized for many tasks including but not limited to capturing mechanism and dynamics of biological growth activities, null model representation, capturing anomalies, etc. Well-known examples of these probabilistic models are Kronecker model, preferential attachment model, and duplication-based model. However, we should frequently keep developing new models to better fit and explain the observed network features while new networks are being observed. Additionally, it is difficult to develop a growth model each time we study a new network. In this paper, we propose B <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">io C <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">ode , a framework to automatically discover novel biological growth models matching user-specified graph attributes in directed and undirected biological graphs. B <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">io C <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">ode designs a basic set of instructions which are common enough to model a number of well-known biological graph growth models. We combine such instruction-wise representation with a genetic algorithm based optimization procedure to encode models for various biological networks. We mainly evaluate the performance of B <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">io C <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">ode in discovering models for biological collaboration networks, gene regulatory networks, and protein interaction networks which features such as assortativity, clustering coefficient, degree distribution closely match with the true ones in the corresponding real biological networks. As shown by the tests on the simulated graphs, the variance of the distributions of biological networks generated by B <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">io C <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">ode is similar to the known models’ variance for these biological network types.

0 Replies