deeprobust.graph.defense package

Submodules

deeprobust.graph.defense.adv_training module

class AdvTraining(model, adversary=None, device='cpu')[source]

Adversarial training framework for defending against attacks.

Parameters
  • model – model to protect, e.g., GCN

  • adversary – attack model

  • device (str) – ‘cpu’ or ‘cuda’

adv_train(features, adj, labels, idx_train, train_iters, **kwargs)[source]

Start adversarial training.

Parameters
  • features – node features

  • adj – the adjacency matrix. The format can be a torch.Tensor or a scipy sparse matrix

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping

  • train_iters (int) – number of training epochs
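
Examples

A minimal usage sketch. Treating Random from deeprobust.graph.global_attack as a compatible adversary is an assumption made here for illustration; any attack model exposing the interface adv_train expects could be plugged in.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> from deeprobust.graph.defense.adv_training import AdvTraining
>>> from deeprobust.graph.global_attack import Random
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> model = GCN(nfeat=features.shape[1], nhid=16,
...             nclass=labels.max().item() + 1, device='cpu')
>>> adversary = Random()   # assumed adversary, for illustration only
>>> trainer = AdvTraining(model, adversary=adversary, device='cpu')
>>> trainer.adv_train(features, adj, labels, data.idx_train, train_iters=100)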

deeprobust.graph.defense.gcn module

class GCN(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device=None)[source]

2 Layer Graph Convolutional Network.

Parameters
  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • dropout (float) – dropout rate for GCN

  • lr (float) – learning rate for GCN

  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is False, weight_decay will be set to 0.

  • with_relu (bool) – whether to use the ReLU activation function. If False, GCN will be linearized.

  • with_bias (bool) – whether to include bias term in GCN weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train GCN.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gcn = GCN(nfeat=features.shape[1],
...           nhid=16,
...           nclass=labels.max().item() + 1,
...           dropout=0.5, device='cpu')
>>> gcn = gcn.to('cpu')
>>> gcn.fit(features, adj, labels, idx_train) # train without early stopping
>>> gcn.fit(features, adj, labels, idx_train, idx_val, patience=30) # train with early stopping
>>> gcn.test(idx_test)
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, initialize=True, verbose=False, normalize=True, patience=500, **kwargs)[source]

Train the GCN model. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • features – node features

  • adj – the adjacency matrix. The format can be a torch.Tensor or a scipy sparse matrix

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

  • normalize (bool) – whether to normalize the input adjacency matrix.

  • patience (int) – patience for early stopping, only valid when idx_val is given

initialize()[source]

Initialize parameters of GCN.

predict(features=None, adj=None)[source]

By default, the inputs should be an unnormalized adjacency matrix.

Parameters
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

Returns

output (log probabilities) of GCN

Return type

torch.FloatTensor
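
For instance, once fit has been called, predict can be invoked without arguments to reuse the stored training graph; a minimal sketch:

>>> output = gcn.predict()    # log probabilities on the stored graph
>>> pred = output.argmax(1)   # predicted label for each node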

test(idx_test)[source]

Evaluate GCN performance on test set.

Parameters

idx_test – node testing indices

class GraphConvolution(in_features, out_features, with_bias=True)[source]

Simple GCN layer, similar to https://github.com/tkipf/pygcn

forward(input, adj)[source]

Graph Convolutional Layer forward function
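
A standalone sketch of the layer; that it expects an already normalized (sparse) adjacency is an assumption carried over from the pygcn design it mirrors:

>>> import torch
>>> from deeprobust.graph.defense import GraphConvolution
>>> layer = GraphConvolution(in_features=8, out_features=4)
>>> x = torch.rand(5, 8)                  # 5 nodes, 8 features each
>>> adj_norm = torch.eye(5).to_sparse()   # stand-in for a normalized adjacency
>>> out = layer(x, adj_norm)              # -> shape [5, 4]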

deeprobust.graph.defense.gcn_preprocess module

class GCNJaccard(nfeat, nhid, nclass, binary_feature=True, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')[source]

GCNJaccard first preprocesses the input graph by dropping dissimilar edges and then trains a GCN on the processed graph. See more details in Adversarial Examples on Graph Data: Deep Insights into Attack and Defense, https://arxiv.org/pdf/1903.01610.pdf.

Parameters
  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • dropout (float) – dropout rate for GCN

  • lr (float) – learning rate for GCN

  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is False, weight_decay will be set to 0.

  • with_relu (bool) – whether to use the ReLU activation function. If False, GCN will be linearized.

  • with_bias (bool) – whether to include bias term in GCN weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train GCNJaccard.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNJaccard
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNJaccard(nfeat=features.shape[1],
...           nhid=16,
...           nclass=labels.max().item() + 1,
...           dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, threshold=0.03)
drop_dissimilar_edges(features, adj, metric='similarity')[source]

Drop dissimilar edges (faster version using numba).
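
For binary features the similarity is the Jaccard score between the feature vectors of an edge's endpoints; a sketch of the criterion (the details of the numba kernel may differ):

>>> import numpy as np
>>> def jaccard(a, b):
...     # a, b: binary feature vectors of two connected nodes
...     intersection = np.count_nonzero(a * b)
...     union = np.count_nonzero(a) + np.count_nonzero(b) - intersection
...     return intersection / union if union else 0.0
>>> # the edge (u, v) is dropped when jaccard(x_u, x_v) < threshold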

fit(features, adj, labels, idx_train, idx_val=None, threshold=0.01, train_iters=200, initialize=True, verbose=True, **kwargs)[source]

First drop dissimilar edges whose similarity is smaller than the given threshold, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • features – node features. The format can be numpy.array or scipy matrix

  • adj – the adjacency matrix.

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping

  • threshold (float) – similarity threshold for dropping edges. If two connected nodes have a similarity smaller than the threshold, the edge between them will be removed.

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

predict(features=None, adj=None)[source]

By default, the inputs should be an unnormalized adjacency matrix.

Parameters
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

Returns

output (log probabilities) of GCNJaccard

Return type

torch.FloatTensor

class GCNSVD(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')[source]

GCNSVD is a 2 Layer Graph Convolutional Network with Truncated SVD as preprocessing. See more details in All You Need Is Low (Rank): Defending Against Adversarial Attacks on Graphs, https://dl.acm.org/doi/abs/10.1145/3336191.3371789.

Parameters
  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • dropout (float) – dropout rate for GCN

  • lr (float) – learning rate for GCN

  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is False, weight_decay will be set to 0.

  • with_relu (bool) – whether to use the ReLU activation function. If False, GCN will be linearized.

  • with_bias (bool) – whether to include bias term in GCN weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train GCNSVD.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNSVD
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNSVD(nfeat=features.shape[1],
...           nhid=16,
...           nclass=labels.max().item() + 1,
...           dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, k=20)
fit(features, adj, labels, idx_train, idx_val=None, k=50, train_iters=200, initialize=True, verbose=True, **kwargs)[source]

First perform a rank-k approximation of the adjacency matrix via truncated SVD, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • features – node features

  • adj – the adjacency matrix. The format can be a torch.Tensor or a scipy sparse matrix

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping

  • k (int) – number of singular values and vectors to compute.

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

predict(features=None, adj=None)[source]

By default, the inputs should be an unnormalized adjacency matrix.

Parameters
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

Returns

output (log probabilities) of GCNSVD

Return type

torch.FloatTensor

truncatedSVD(data, k=50)[source]

Truncated SVD on input data.

Parameters
  • data – input matrix to be decomposed

  • k (int) – number of singular values and vectors to compute.

Returns

reconstructed matrix.

Return type

numpy.array
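
The reconstruction is equivalent to this scipy-based sketch (assuming the input converts to a sparse float matrix):

>>> import numpy as np
>>> import scipy.sparse as sp
>>> from scipy.sparse.linalg import svds
>>> def rank_k_approx(mat, k=50):
...     # keep only the k largest singular values/vectors
...     U, S, Vt = svds(sp.csr_matrix(mat, dtype=float), k=k)
...     return U @ np.diag(S) @ Vt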

deeprobust.graph.defense.pgd module

class PGD(params, proxs, alphas, lr=torch.optim.optimizer.required, momentum=0, dampening=0, weight_decay=0)[source]

Proximal gradient descent.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • proxs (iterable) – iterable of proximal operators

  • alphas (iterable) – iterable of coefficients for proximal gradient descent

  • lr (float) – learning rate

  • momentum (float) – momentum factor (default: 0)

  • weight_decay (float) – weight decay (L2 penalty) (default: 0)

  • dampening (float) – dampening for momentum (default: 0)
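
A minimal driving sketch; that the module exposes a ready-made prox_operators instance is an assumption based on how Pro-GNN uses this optimizer:

>>> import torch
>>> from deeprobust.graph.defense.pgd import PGD, prox_operators
>>> S = torch.nn.Parameter(torch.zeros(10, 10))   # e.g. an adjacency estimate
>>> optimizer = PGD([S], proxs=[prox_operators.prox_l1],
...                 alphas=[5e-4], lr=0.01, momentum=0.9)
>>> loss = (S - torch.eye(10)).pow(2).sum()       # toy smooth objective
>>> loss.backward()
>>> optimizer.step()   # gradient step followed by the proximal mapping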

class ProxOperators[source]

Proximal Operators.

prox_l1(data, alpha)[source]

Proximal operator for l1 norm.
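
This is the soft-thresholding operator; roughly, in torch terms:

>>> import torch
>>> def soft_threshold(data, alpha):
...     # prox of alpha * ||x||_1: shrink every entry toward zero by alpha
...     return torch.sign(data) * torch.clamp(torch.abs(data) - alpha, min=0)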

prox_nuclear(data, alpha)[source]

Proximal operator for nuclear norm (trace norm).

class SGD(params, lr=torch.optim.optimizer.required, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]
step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.

deeprobust.graph.defense.prognn module

class EstimateAdj(adj, symmetric=False, device='cpu')[source]

Provides a PyTorch parameter matrix for the estimated adjacency matrix and the corresponding operations.
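
A rough construction sketch; passing a dense torch tensor and the attribute name estimated_adj are assumptions taken from the Pro-GNN reference code:

>>> import torch
>>> from deeprobust.graph.defense.prognn import EstimateAdj
>>> adj = torch.eye(100)   # toy dense adjacency
>>> estimator = EstimateAdj(adj, symmetric=True, device='cpu')
>>> estimator.estimated_adj   # learnable nn.Parameter holding the estimate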

class ProGNN(model, args, device)[source]

ProGNN (Properties Graph Neural Network). See more details in Graph Structure Learning for Robust Graph Neural Networks, KDD 2020, https://arxiv.org/abs/2005.10203.

Parameters
  • model – the backbone GNN model used in ProGNN

  • args – model configs

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

See details in https://github.com/ChandlerBang/Pro-GNN.
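
A compressed sketch, reusing the data loaded as in the earlier examples; the args fields below are assumptions mirroring the reference repository's hyper-parameters:

>>> import argparse
>>> from deeprobust.graph.defense import GCN, ProGNN
>>> args = argparse.Namespace(debug=False, only_gcn=False, symmetric=False,
...     lr=0.01, lr_adj=0.01, weight_decay=5e-4, alpha=5e-4, beta=1.5,
...     gamma=1, lambda_=0, phi=0, epochs=200, outer_steps=1, inner_steps=2)
>>> gcn = GCN(nfeat=features.shape[1], nhid=16,
...           nclass=labels.max().item() + 1, device='cpu')
>>> prognn = ProGNN(gcn, args, device='cpu')
>>> prognn.fit(features, adj, labels, idx_train, idx_val)
>>> prognn.test(features, labels, idx_test)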

fit(features, adj, labels, idx_train, idx_val, **kwargs)[source]

Train Pro-GNN.

Parameters
  • features – node features

  • adj – the adjacency matrix. The format can be a torch.Tensor or a scipy sparse matrix

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices

test(features, labels, idx_test)[source]

Evaluate the performance of ProGNN on test set

deeprobust.graph.defense.r_gcn module

Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.

http://pengcui.thumedialab.com/papers/RGCN.pdf

Author’s TensorFlow implementation:

https://github.com/thumanlab/nrlweb/tree/master/static/assets/download

class GGCL_D(in_features, out_features, dropout)[source]

Graph Gaussian Convolution Layer (GGCL) when the input is a distribution

class GGCL_F(in_features, out_features, dropout=0.6)[source]

Graph Gaussian Convolution Layer (GGCL) when the input is the feature matrix

class GaussianConvolution(in_features, out_features)[source]

[Deprecated] Alternative Gaussian convolution layer.

class RGCN(nnodes, nfeat, nhid, nclass, gamma=1.0, beta1=0.0005, beta2=0.0005, lr=0.01, dropout=0.6, device='cpu')[source]

Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.

Parameters
  • nnodes (int) – number of nodes in the input graph

  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • gamma (float) – hyper-parameter for RGCN. See more details in the paper.

  • beta1 (float) – hyper-parameter for RGCN. See more details in the paper.

  • beta2 (float) – hyper-parameter for RGCN. See more details in the paper.

  • lr (float) – learning rate for GCN

  • dropout (float) – dropout rate for GCN

  • device (str) – ‘cpu’ or ‘cuda’.

fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, verbose=True, **kwargs)[source]

Train RGCN.

Parameters
  • features – node features

  • adj – the adjacency matrix. The format can be a torch.Tensor or a scipy sparse matrix

  • labels – node labels

  • idx_train – node training indices

  • idx_val – node validation indices. If not given (None), the RGCN training process will not adopt early stopping

  • train_iters (int) – number of training epochs

  • verbose (bool) – whether to show verbose logs

Examples

We can first load the dataset and then train RGCN.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import RGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = RGCN(nnodes=perturbed_adj.shape[0], nfeat=features.shape[1],
...              nclass=labels.max()+1, nhid=32, device='cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val,
...           train_iters=200, verbose=True)
>>> model.test(idx_test)
predict()[source]
Returns

output (log probabilities) of RGCN

Return type

torch.FloatTensor

test(idx_test)[source]

Evaluate the performance of RGCN on the test set

Module contents

All of the classes documented above (GCN, GCNSVD, GCNJaccard, RGCN, ProGNN, GraphConvolution, GGCL_F, GGCL_D) are re-exported at the package level; their documentation is identical to the submodule entries above and is not repeated here. The package additionally provides the following classes.

class GAT(nfeat, nhid, nclass, heads=8, output_heads=1, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

2 Layer Graph Attention Network based on PyTorch Geometric.

Parameters
  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • heads (int) – number of attention heads

  • output_heads (int) – number of attention output heads

  • dropout (float) – dropout rate for GAT

  • lr (float) – learning rate for GAT

  • weight_decay (float) – weight decay coefficient (l2 normalization) for GAT.

  • with_bias (bool) – whether to include bias term in GAT weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train GAT.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GAT
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> from deeprobust.graph.data import Dpr2Pyg
>>> gat = GAT(nfeat=features.shape[1],
...           nhid=8, heads=8,
...           nclass=labels.max().item() + 1,
...           dropout=0.5, device='cpu')
>>> gat = gat.to('cpu')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> gat.fit(pyg_data, patience=100, verbose=True) # train with early stopping
fit(pyg_data, train_iters=1000, initialize=True, verbose=False, patience=100, **kwargs)[source]

Train the GAT model. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • pyg_data – PyTorch Geometric dataset object

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

  • patience (int) – patience for early stopping, only valid when idx_val is given

initialize()[source]

Initialize parameters of GAT.

predict()[source]
Returns

output (log probabilities) of GAT

Return type

torch.FloatTensor

test()[source]

Evaluate GAT performance on test set.

Parameters

idx_test – node testing indices

train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class ChebNet(nfeat, nhid, nclass, num_hops=3, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

2 Layer ChebNet based on PyTorch Geometric.

Parameters
  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • num_hops (int) – number of hops in ChebConv

  • dropout (float) – dropout rate for ChebNet

  • lr (float) – learning rate for ChebNet

  • weight_decay (float) – weight decay coefficient (l2 normalization) for ChebNet.

  • with_bias (bool) – whether to include bias term in ChebNet weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train ChebNet.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import ChebNet
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> from deeprobust.graph.data import Dpr2Pyg
>>> cheby = ChebNet(nfeat=features.shape[1],
...           nhid=16, num_hops=3,
...           nclass=labels.max().item() + 1,
...           dropout=0.5, device='cpu')
>>> cheby = cheby.to('cpu')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> cheby.fit(pyg_data, patience=10, verbose=True) # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)[source]

Train the ChebNet model. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • pyg_data – PyTorch Geometric dataset object

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

  • patience (int) – patience for early stopping, only valid when idx_val is given

initialize()[source]

Initialize parameters of ChebNet.

predict()[source]
Returns

output (log probabilities) of ChebNet

Return type

torch.FloatTensor

test()[source]

Evaluate ChebNet performance on test set.

Parameters

idx_test – node testing indices

train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class SGC(nfeat, nclass, K=3, cached=True, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

SGC based on PyTorch Geometric, as described in Simplifying Graph Convolutional Networks.

Parameters
  • nfeat (int) – size of input feature dimension

  • nclass (int) – size of output dimension

  • K (int) – number of propagation steps in SGC

  • cached (bool) – whether to set the cache flag in SGConv

  • lr (float) – learning rate for SGC

  • weight_decay (float) – weight decay coefficient (l2 normalization) for SGC.

  • with_bias (bool) – whether to include bias term in SGC weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train SGC.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import SGC
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> from deeprobust.graph.data import Dpr2Pyg
>>> sgc = SGC(nfeat=features.shape[1], K=3, lr=0.1,
...           nclass=labels.max().item() + 1, device='cuda')
>>> sgc = sgc.to('cuda')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> sgc.fit(pyg_data, train_iters=200, patience=200, verbose=True) # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)[source]

Train the SGC model. When idx_val is not None, pick the best model according to the validation loss.

Parameters
  • pyg_data – PyTorch Geometric dataset object

  • train_iters (int) – number of training epochs

  • initialize (bool) – whether to initialize parameters before training

  • verbose (bool) – whether to show verbose logs

  • patience (int) – patience for early stopping, only valid when idx_val is given

initialize()[source]

Initialize parameters of SGC.

predict()[source]
Returns

output (log probabilities) of SGC

Return type

torch.FloatTensor

test()[source]

Evaluate SGC performance on test set.

Parameters

idx_test – node testing indices

train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class SimPGCN(nnodes, nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, lambda_=5, gamma=0.1, bias_init=0, with_bias=True, device=None)[source]

SimP-GCN: Node similarity preserving graph convolutional networks.

https://arxiv.org/abs/2011.09643

Parameters
  • nnodes (int) – number of nodes in the input graph

  • nfeat (int) – size of input feature dimension

  • nhid (int) – number of hidden units

  • nclass (int) – size of output dimension

  • lambda_ (float) – coefficient for the self-supervised learning (SSL) loss in SimP-GCN

  • gamma (float) – coefficient for the adaptive learnable self-loops

  • bias_init (float) – initial bias for the score

  • dropout (float) – dropout rate for GCN

  • lr (float) – learning rate for GCN

  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN.

  • with_bias (bool) – whether to include bias term in GCN weights.

  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load the dataset and then train SimPGCN.

See the detailed hyper-parameter setting in https://github.com/ChandlerBang/SimP-GCN.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import SimPGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> model = SimPGCN(nnodes=features.shape[0], nfeat=features.shape[1],
...                 nhid=16, nclass=labels.max()+1, device='cuda')
>>> model = model.to('cuda')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
initialize()[source]

Initialize parameters of SimPGCN.

myforward(fea, adj)[source]

Output the embedding and log_softmax.

predict(features=None, adj=None)[source]

By default, the inputs should be unnormalized data

Parameters
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.

Returns

output (log probabilities) of GCN

Return type

torch.FloatTensor

test(idx_test)[source]

Evaluate GCN performance on test set.

Parameters

idx_test – node testing indices

class Node2Vec[source]

node2vec: Scalable Feature Learning for Networks. KDD’16. To use this model, you need to “pip install node2vec” first.

Examples

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import Node2Vec
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test Node2vec on clean graph")
>>> model = Node2Vec()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test Node2vec on attacked graph")
>>> model = Node2Vec()
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
node2vec(adj, embedding_dim=64, walk_length=30, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1, p=4, q=1)[source]

Compute Node2Vec embeddings for the given graph.

Parameters
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph

  • embedding_dim (int, optional) – Dimension of the embedding

  • walks_per_node (int, optional) – Number of walks sampled from each node

  • walk_length (int, optional) – Length of each random walk

  • workers (int, optional) – Number of threads (see gensim.models.Word2Vec)

  • window_size (int, optional) – Window size (see gensim.models.Word2Vec)

  • num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)

  • p (float) – The hyperparameter p in node2vec

  • q (float) – The hyperparameter q in node2vec

class DeepWalk(type='skipgram')[source]

DeepWalk: Online Learning of Social Representations. KDD’14. The implementation is modified from https://github.com/abojchevski/node_embedding_attack

Examples

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import DeepWalk
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test DeepWalk on clean graph")
>>> model = DeepWalk()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk on attacked graph")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk SVD")
>>> model = DeepWalk(type="svd")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
deepwalk_skipgram(adj, embedding_dim=64, walk_length=80, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1)[source]

Compute DeepWalk embeddings for the given graph using the skip-gram formulation.

Parameters
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph

  • embedding_dim (int, optional) – Dimension of the embedding

  • walks_per_node (int, optional) – Number of walks sampled from each node

  • walk_length (int, optional) – Length of each random walk

  • workers (int, optional) – Number of threads (see gensim.models.Word2Vec)

  • window_size (int, optional) – Window size (see gensim.models.Word2Vec)

  • num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)

deepwalk_svd(adj, window_size=10, embedding_dim=64, num_neg_samples=1, sparse=True)[source]

Compute DeepWalk embeddings for the given graph using the matrix factorization formulation.

Parameters
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph

  • window_size (int) – Size of the window

  • embedding_dim (int) – Size of the embedding

  • num_neg_samples (int) – Number of negative samples

  • sparse (bool) – Whether to perform sparse operations

Returns

Embedding matrix.

Return type

np.ndarray, shape [num_nodes, embedding_dim]
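
A minimal call sketch, following the docstring above:

>>> model = DeepWalk(type="svd")
>>> embedding = model.deepwalk_svd(adj, window_size=10, embedding_dim=64)
>>> embedding.shape   # (n_nodes, 64)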

svd_embedding(x, embedding_dim, sparse=False)[source]

Computes an embedding by selecting the top (embedding_dim) largest singular values/vectors.

Parameters
  • x (sp.csr_matrix or np.ndarray) – The matrix that we want to embed

  • embedding_dim (int) – Dimension of the embedding

  • sparse (bool) – Whether to perform sparse operations

Returns

Embedding matrices.

Return type

np.ndarray, shape [?, embedding_dim], np.ndarray, shape [?, embedding_dim]