Graph Attention Networks

Petar Veličković; Guillem Cucurull; Arantxa Casanova; Adriana Romero; Pietro Liò; Yoshua Bengio

15 Feb 2018 (modified: 22 Jun 2025) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of computationally intensive matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).

TL;DR: A novel approach to processing graph-structured data by neural networks, leveraging attention over a node's neighborhood. Achieves state-of-the-art results on transductive citation network tasks and an inductive protein-protein interaction task.

Keywords: Deep Learning, Graph Convolutions, Attention, Self-Attention

Code: [PetarV-/GAT](https://github.com/PetarV-/GAT) + [89 community implementations (Papers with Code)](https://paperswithcode.com/paper/?openreview=rJXMpikCZ)

Data: [Brazil Air-Traffic](https://paperswithcode.com/dataset/brazil-air-traffic), [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [Chameleon (60%/20%/20% random splits)](https://paperswithcode.com/dataset/chameleon-60-20-20-random-splits-1), [Citeseer](https://paperswithcode.com/dataset/citeseer), [Cora](https://paperswithcode.com/dataset/cora), [Cornell (60%/20%/20% random splits)](https://paperswithcode.com/dataset/cornell-60-20-20-random-splits), [Deezer-Europe](https://paperswithcode.com/dataset/deezer-europe-1), [Film (60%/20%/20% random splits)](https://paperswithcode.com/dataset/film-60-20-20-random-splits), [Flickr30k](https://paperswithcode.com/dataset/flickr30k), [JHMDB](https://paperswithcode.com/dataset/jhmdb), [OGB](https://paperswithcode.com/dataset/ogb), [PPI](https://paperswithcode.com/dataset/ppi), [Penn94](https://paperswithcode.com/dataset/penn94), [PubMed (60%/20%/20% random splits)](https://paperswithcode.com/dataset/pubmed-60-20-20-random-splits), [Pubmed](https://paperswithcode.com/dataset/pubmed), [Squirrel (60%/20%/20% random splits)](https://paperswithcode.com/dataset/squirrel-60-20-20-random-splits), [Texas (60%/20%/20% random splits)](https://paperswithcode.com/dataset/texas-60-20-20-random-splits-1), [USA Air-Traffic](https://paperswithcode.com/dataset/usa-air-traffic), [Wisconsin (60%/20%/20% random splits)](https://paperswithcode.com/dataset/wisconsin-60-20-20-random-splits-1), [ZINC](https://paperswithcode.com/dataset/zinc), [genius](https://paperswithcode.com/dataset/genius)

Community Implementations: [54 code implementations (CatalyzeX)](https://www.catalyzex.com/paper/graph-attention-networks/code)
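
Illustrative sketch: The abstract describes layers in which each node attends over its neighborhood's features and implicitly assigns different weights to different neighbors. The pure-Python sketch below shows one way a single attention head of this kind could look. It is not the authors' implementation (see the linked repository for that). The function name `gat_layer`, the dictionary-based neighbor lists, the toy graph, the weight values, and the LeakyReLU slope of 0.2 are all assumptions made for illustration, and the output nonlinearity and multi-head averaging or concatenation are omitted for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    # Slope value is an assumption chosen for this sketch.
    return x if x > 0 else slope * x


def matvec(W, h):
    # W is a list of rows (F_out x F_in); h is a vector of length F_in.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head masked self-attention over node neighborhoods (sketch).

    features:  list of N input feature vectors, each of length F_in
    neighbors: dict mapping node index -> list of neighbor indices
               (self-loops included explicitly by the caller)
    W:         shared linear map, F_out x F_in
    a:         attention vector of length 2 * F_out
    Returns (outputs, attention), where attention[i] maps j -> alpha_ij.
    """
    z = [matvec(W, h) for h in features]  # shared linear transform per node
    outputs, attention = [], []
    for i, zi in enumerate(z):
        nbrs = neighbors[i]
        # Masking: scores are computed only for nodes in the neighborhood.
        # zi + z[j] is list concatenation, i.e. [W h_i || W h_j].
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, zi + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted sum of transformed neighbor features.
        out = [
            sum(al * z[j][k] for al, j in zip(alpha, nbrs))
            for k in range(len(zi))
        ]
        outputs.append(out)
        attention.append(dict(zip(nbrs, alpha)))
    return outputs, attention


# Hypothetical toy input: a 3-node path graph 0 - 1 - 2 with self-loops.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity map, for readability only
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary attention parameters

outputs, attention = gat_layer(features, neighbors, W, a)
for i, weights in enumerate(attention):
    print(i, weights)
```

Because the softmax runs only over each node's listed neighbors, the layer needs no global matrix operation and can be applied to a graph not seen during training, which is the property the abstract highlights for inductive problems.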
