TL;DR: We study the desirable traits of a feedforward computational graph used by a neural network, finding fidelity and mixing time to be important, complementary metrics to optimise.
Abstract: As implied by the plethora of literature on graph rewiring, the choice of computational graph employed by a neural network can have a significant impact on its downstream performance. Certain effects related to the computational graph, such as under-reaching and over-squashing, may even render the model incapable of learning certain functions. Most of these effects have been thoroughly studied only in the domain of undirected graphs; however, recent years have seen a significant rise of interest in feedforward computational graphs: directed graphs without any back edges. In this paper, we study the desirable properties of a feedforward computational graph, identifying two important complementary measures, fidelity and mixing time, and evaluating several popular graph choices through the lens of these measures. Our study is backed both by theoretical analyses of the asymptotic behaviour of these metrics on various graphs, and by experiments correlating the metrics with the performance of trained neural network models that use the corresponding graphs.
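To make the object of study concrete, here is a minimal, hypothetical sketch (in Python/NumPy; not the paper's actual metrics or code) contrasting two popular feedforward graphs, the fully-connected DAG used by causal attention and a simple chain, via a random-walk hitting-time proxy for how quickly information can propagate:

```python
import numpy as np

def feedforward_adjacency(n, kind="full"):
    """A[i, j] = 1 iff there is an edge i -> j. All edges point forward
    (j > i), so the graph is a DAG with node n - 1 as its unique sink."""
    A = np.zeros((n, n))
    if kind == "full":     # causal-attention style: i -> j for every j > i
        A[np.triu_indices(n, k=1)] = 1.0
    elif kind == "chain":  # i -> i + 1 only
        A[np.arange(n - 1), np.arange(1, n)] = 1.0
    return A

def mean_hitting_time(A, n_walks=2000, seed=0):
    """Average number of steps a uniform forward random walk from node 0
    takes to reach the sink: a crude proxy for information-propagation
    speed (not the paper's definition of mixing time)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0
    for _ in range(n_walks):
        node, steps = 0, 0
        while node != n - 1:
            node = rng.choice(np.flatnonzero(A[node]))  # uniform forward step
            steps += 1
        total += steps
    return total / n_walks

for kind in ("full", "chain"):
    print(kind, mean_hitting_time(feedforward_adjacency(64, kind)))
# The fully-connected DAG reaches the sink in roughly log(n) steps on
# average, while the chain always takes exactly n - 1 steps.
```

The gap between these two extremes is exactly the kind of trade-off the paper's metrics are designed to quantify.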
Lay Summary: AI systems frequently have to process information that arrives _sequentially_ -- whether forecasting the evolving friendship structure of a social network, predicting future words in a sentence, or anything in between. A key defining property of contemporary efficient approaches to such tasks is that they process this data in a _feedforward_ manner: information from future snapshots of the data is explicitly disallowed from influencing the past. But even under this constraint, we do not need to allow _all_ previous snapshots to directly interact with the current one; in fact, several prior theoretical results suggest that doing so can be problematic when data is collected over long horizons. Naturally, we are interested in discovering a "good" set of allowed connections between data points under such a feedforward regime -- a _good feedforward computational graph_. In this work, we define two interesting yet complementary metrics that quantify how information is transferred within a given feedforward graph. We establish several theoretical connections between these metrics and known structural quantities of graphs, and analyse several well-known graph distributions from this perspective, both theoretically and empirically. To our surprise, we found this question to be relatively understudied even in mathematics, despite its importance: many existing concepts for characterising good graphs apply only to graphs in which backward connections are allowed. As our research takes some of the first steps towards answering this question, we hope it meaningfully paves the way for the works that follow.
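For readers who prefer code, the feedforward constraint itself can be pictured as a masking pattern; the snippet below is a toy illustration of ours (not taken from the paper) of how forbidding future-to-past influence restricts which snapshot pairs may interact:

```python
import numpy as np

# Toy illustration (not from the paper): the feedforward constraint as a
# boolean mask. Entry [t, s] says whether snapshot t may read from
# snapshot s; keeping only s <= t forbids future -> past information flow.
T = 5
mask = np.tril(np.ones((T, T), dtype=bool))
print(mask.astype(int))
# Row t has ones only in columns 0..t: snapshot t sees its past, never its
# future. Sparsifying this mask -- keeping only *some* of the past -- yields
# the alternative feedforward graphs whose quality the paper aims to measure.
```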
Primary Area: Deep Learning->Graph Neural Networks
Keywords: graphs, graph neural networks, random walks, mixing time
Submission Number: 12222