deeprobust.graph.defense package
Submodules
deeprobust.graph.defense.adv_training module

class AdvTraining(model, adversary=None, device='cpu')
Adversarial training framework for defending against attacks.
- Parameters
model – the model to protect, e.g., GCN
adversary – attack model
device (str) – ‘cpu’ or ‘cuda’
adv_train(features, adj, labels, idx_train, train_iters, **kwargs)
Start adversarial training.
- Parameters
features – node features
adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
labels – node labels
idx_train – node training indices
idx_val – node validation indices. If not given (None), GCN training process will not adopt early stopping
train_iters (int) – number of training epochs
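Examples
A minimal usage sketch. The Random global attack is used here purely as an illustrative adversary (an assumption, not from the original docs); any attack model exposing the interface AdvTraining expects can be plugged in.
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN, AdvTraining
>>> from deeprobust.graph.global_attack import Random
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train = data.idx_train
>>> model = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, device='cpu')
>>> adversary = Random()  # illustrative choice of adversary
>>> adv = AdvTraining(model, adversary=adversary, device='cpu')
>>> adv.adv_train(features, adj, labels, idx_train, train_iters=30)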
deeprobust.graph.defense.gcn module

class GCN(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device=None)
2 Layer Graph Convolutional Network.
- Parameters
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
dropout (float) – dropout rate for GCN
lr (float) – learning rate for GCN
weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
with_bias (bool) – whether to include bias term in GCN weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCN.
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gcn = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> gcn = gcn.to('cpu')
>>> gcn.fit(features, adj, labels, idx_train)  # train without early stopping
>>> gcn.fit(features, adj, labels, idx_train, idx_val, patience=30)  # train with early stopping
>>> gcn.test(idx_test)
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, initialize=True, verbose=False, normalize=True, patience=500, **kwargs)
Train the GCN model. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
features – node features
adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
labels – node labels
idx_train – node training indices
idx_val – node validation indices. If not given (None), GCN training process will not adopt early stopping
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
normalize (bool) – whether to normalize the input adjacency matrix.
patience (int) – patience for early stopping, only valid when idx_val is given
predict(features=None, adj=None)
By default, the inputs should be the unnormalized adjacency matrix.
- Parameters
features – node features. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
adj – adjacency matrix. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
- Returns
output (log probabilities) of GCN
- Return type
torch.FloatTensor
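For instance, a minimal sketch assuming the gcn instance trained in the class example above:
>>> output = gcn.predict()               # reuse the features/adj stored during training
>>> output = gcn.predict(features, adj)  # or predict on an explicitly given graph
>>> pred_labels = output.argmax(1)       # log probabilities -> predicted classes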
class GraphConvolution(in_features, out_features, with_bias=True)
Simple GCN layer, similar to https://github.com/tkipf/pygcn
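A minimal sketch of using the layer directly, assuming it follows the pygcn convention of forward(input, adj) with a pre-normalized (sparse) adjacency matrix:
>>> import torch
>>> from deeprobust.graph.defense import GraphConvolution
>>> layer = GraphConvolution(in_features=1433, out_features=16)
>>> x = torch.rand(100, 1433)              # dense node features
>>> adj_norm = torch.eye(100).to_sparse()  # stand-in for a normalized adjacency matrix
>>> out = layer(x, adj_norm)               # shape [100, 16]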
deeprobust.graph.defense.gcn_preprocess module

class GCNJaccard(nfeat, nhid, nclass, binary_feature=True, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')
GCNJaccard first preprocesses the input graph by dropping dissimilar edges and then trains a GCN on the processed graph. See more details in Adversarial Examples on Graph Data: Deep Insights into Attack and Defense, https://arxiv.org/pdf/1903.01610.pdf.
- Parameters
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
dropout (float) – dropout rate for GCN
lr (float) – learning rate for GCN
weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
with_bias (bool) – whether to include bias term in GCN weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCNJaccard.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNJaccard
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNJaccard(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, threshold=0.03)
drop_dissimilar_edges(features, adj, metric='similarity')
Drop dissimilar edges (faster version using numba).
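A minimal sketch of calling the preprocessing step on its own, continuing the class example above. It is assumed here that the similarity threshold is read from the model instance (fit normally sets it), so it is set explicitly first:
>>> model.threshold = 0.03  # assumed attribute; normally set via fit(..., threshold=0.03)
>>> cleaned_adj = model.drop_dissimilar_edges(features, perturbed_adj)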
fit(features, adj, labels, idx_train, idx_val=None, threshold=0.01, train_iters=200, initialize=True, verbose=True, **kwargs)
First drop dissimilar edges with similarity smaller than the given threshold, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
features – node features. The format can be numpy.array or scipy matrix
adj – the adjacency matrix.
labels – node labels
idx_train – node training indices
idx_val – node validation indices. If not given (None), GCN training process will not adopt early stopping
threshold (float) – similarity threshold for dropping edges. If two connected nodes have similarity smaller than the threshold, the edge between them will be removed.
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
predict(features=None, adj=None)
By default, the inputs should be the unnormalized adjacency matrix.
- Parameters
features – node features. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
adj – adjacency matrix. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
- Returns
output (log probabilities) of GCNJaccard
- Return type
torch.FloatTensor
class GCNSVD(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')
GCNSVD is a 2 Layer Graph Convolutional Network with Truncated SVD as preprocessing. See more details in All You Need Is Low (Rank): Defending Against Adversarial Attacks on Graphs, https://dl.acm.org/doi/abs/10.1145/3336191.3371789.
- Parameters
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
dropout (float) – dropout rate for GCN
lr (float) – learning rate for GCN
weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
with_bias (bool) – whether to include bias term in GCN weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCNSVD.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNSVD
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNSVD(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, k=20)
fit(features, adj, labels, idx_train, idx_val=None, k=50, train_iters=200, initialize=True, verbose=True, **kwargs)
First perform a rank-k approximation of the adjacency matrix via truncated SVD, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
features – node features
adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
labels – node labels
idx_train – node training indices
idx_val – node validation indices. If not given (None), GCN training process will not adopt early stopping
k (int) – number of singular values and vectors to compute.
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
predict(features=None, adj=None)
By default, the inputs should be the unnormalized adjacency matrix.
- Parameters
features – node features. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
adj – adjacency matrix. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
- Returns
output (log probabilities) of GCNSVD
- Return type
torch.FloatTensor
deeprobust.graph.defense.pgd module

class PGD(params, proxs, alphas, lr=torch.optim.optimizer.required, momentum=0, dampening=0, weight_decay=0)
Proximal gradient descent.
- Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
proxs (iterable) – iterable of proximal operators
alphas (iterable) – iterable of coefficients for proximal gradient descent
lr (float) – learning rate
momentum (float) – momentum factor (default: 0)
weight_decay (float) – weight decay (L2 penalty) (default: 0)
dampening (float) – dampening for momentum (default: 0)
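A minimal usage sketch, assuming the prox_operators helper defined alongside PGD in this module (as used by ProGNN) supplies the proximal operator:
>>> import torch
>>> from deeprobust.graph.defense.pgd import PGD, prox_operators
>>> A = torch.nn.Parameter(torch.rand(10, 10))
>>> # gradient step on the loss, followed by the l1 proximal operator scaled by alpha
>>> optimizer = PGD([A], proxs=[prox_operators.prox_l1], alphas=[5e-4], lr=0.01)
>>> loss = (A - torch.eye(10)).pow(2).sum()
>>> loss.backward()
>>> optimizer.step()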
deeprobust.graph.defense.prognn module

class EstimateAdj(adj, symmetric=False, device='cpu')
Provide a pytorch parameter matrix for the estimated adjacency matrix and corresponding operations.
class ProGNN(model, args, device)
ProGNN (Properties Graph Neural Network). See more details in Graph Structure Learning for Robust Graph Neural Networks, KDD 2020, https://arxiv.org/abs/2005.10203.
- Parameters
model – the backbone GNN model in ProGNN
args – model configs
device (str) – ‘cpu’ or ‘cuda’.
Examples
See details in https://github.com/ChandlerBang/Pro-GNN.
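As a minimal sketch (assuming data loaded as in the examples above, the fit/test interface used in that repository, and an argparse-style args namespace carrying the ProGNN hyper-parameters such as lr, lr_adj, alpha, beta, gamma, lambda_ and epochs):
>>> from deeprobust.graph.defense import GCN, ProGNN
>>> gcn = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, device='cpu')
>>> prognn = ProGNN(gcn, args, 'cpu')  # `args` holds the hyper-parameters listed above
>>> prognn.fit(features, perturbed_adj, labels, idx_train, idx_val)
>>> prognn.test(features, labels, idx_test)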
deeprobust.graph.defense.r_gcn module

Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
Author's Tensorflow implementation: https://github.com/thumanlab/nrlweb/tree/master/static/assets/download
class GGCL_D(in_features, out_features, dropout)
Graph Gaussian Convolution Layer (GGCL) when the input is a distribution.
class GGCL_F(in_features, out_features, dropout=0.6)
Graph Gaussian Convolution Layer (GGCL) when the input is a feature matrix.
class GaussianConvolution(in_features, out_features)
[Deprecated] Alternative Gaussian convolution layer.
class RGCN(nnodes, nfeat, nhid, nclass, gamma=1.0, beta1=0.0005, beta2=0.0005, lr=0.01, dropout=0.6, device='cpu')
Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
- Parameters
nnodes (int) – number of nodes in the input graph
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
gamma (float) – hyper-parameter for RGCN. See more details in the paper.
beta1 (float) – hyper-parameter for RGCN. See more details in the paper.
beta2 (float) – hyper-parameter for RGCN. See more details in the paper.
lr (float) – learning rate for RGCN
dropout (float) – dropout rate for RGCN
device (str) – ‘cpu’ or ‘cuda’.
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, verbose=True, **kwargs)
Train RGCN.
- Parameters
features – node features
adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
labels – node labels
idx_train – node training indices
idx_val – node validation indices. If not given (None), GCN training process will not adopt early stopping
train_iters (int) – number of training epochs
verbose (bool) – whether to show verbose logs
Examples
We can first load a dataset and then train RGCN.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import RGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = RGCN(nnodes=perturbed_adj.shape[0], nfeat=features.shape[1], nclass=labels.max()+1, nhid=32, device='cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
Module contents

The classes documented above (GCN, GCNJaccard, GCNSVD, RGCN, ProGNN, GraphConvolution, GGCL_F, GGCL_D) are also exported at the package level, along with the following additional defenses.
class GAT(nfeat, nhid, nclass, heads=8, output_heads=1, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
2 Layer Graph Attention Network based on pytorch geometric.
- Parameters
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
heads (int) – number of attention heads
output_heads (int) – number of attention output heads
dropout (float) – dropout rate for GAT
lr (float) – learning rate for GAT
weight_decay (float) – weight decay coefficient (l2 normalization) for GAT
with_bias (bool) – whether to include bias term in GAT weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GAT.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import GAT
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gat = GAT(nfeat=features.shape[1], nhid=8, heads=8, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> gat = gat.to('cpu')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> gat.fit(pyg_data, patience=100, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=1000, initialize=True, verbose=False, patience=100, **kwargs)
Train the GAT model. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
pyg_data – pytorch geometric dataset object
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
patience (int) – patience for early stopping, only valid when idx_val is given
class ChebNet(nfeat, nhid, nclass, num_hops=3, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
2 Layer ChebNet based on pytorch geometric.
- Parameters
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
num_hops (int) – number of hops in ChebConv
dropout (float) – dropout rate for ChebNet
lr (float) – learning rate for ChebNet
weight_decay (float) – weight decay coefficient (l2 normalization) for ChebNet
with_bias (bool) – whether to include bias term in ChebNet weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train ChebNet.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import ChebNet
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> cheby = ChebNet(nfeat=features.shape[1], nhid=16, num_hops=3, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> cheby = cheby.to('cpu')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> cheby.fit(pyg_data, patience=10, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)
Train the ChebNet model. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
pyg_data – pytorch geometric dataset object
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
patience (int) – patience for early stopping, only valid when idx_val is given
class SGC(nfeat, nclass, K=3, cached=True, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
SGC based on pytorch geometric. Simplifying Graph Convolutional Networks.
- Parameters
nfeat (int) – size of input feature dimension
nclass (int) – size of output dimension
K (int) – number of propagation steps in SGC
cached (bool) – whether to set the cache flag in SGConv
lr (float) – learning rate for SGC
weight_decay (float) – weight decay coefficient (l2 normalization) for SGC
with_bias (bool) – whether to include bias term in SGC weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train SGC.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import SGC
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> sgc = SGC(nfeat=features.shape[1], K=3, lr=0.1, nclass=labels.max().item() + 1, device='cuda')
>>> sgc = sgc.to('cuda')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> sgc.fit(pyg_data, train_iters=200, patience=200, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)
Train the SGC model. When idx_val is not None, pick the best model according to the validation loss.
- Parameters
pyg_data – pytorch geometric dataset object
train_iters (int) – number of training epochs
initialize (bool) – whether to initialize parameters before training
verbose (bool) – whether to show verbose logs
patience (int) – patience for early stopping, only valid when idx_val is given
class SimPGCN(nnodes, nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, lambda_=5, gamma=0.1, bias_init=0, with_bias=True, device=None)
SimP-GCN: Node similarity preserving graph convolutional networks.
- Parameters
nnodes (int) – number of nodes in the input graph
nfeat (int) – size of input feature dimension
nhid (int) – number of hidden units
nclass (int) – size of output dimension
lambda_ (float) – coefficient for the SSL loss in SimP-GCN
gamma (float) – coefficient for adaptive learnable self-loops
bias_init (float) – bias initialization for the score
dropout (float) – dropout rate for GCN
lr (float) – learning rate for GCN
weight_decay (float) – weight decay coefficient (l2 normalization) for SimP-GCN
with_bias (bool) – whether to include bias term in GCN weights.
device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train SimP-GCN. See detailed hyper-parameter settings in https://github.com/ChandlerBang/SimP-GCN.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import SimPGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> model = SimPGCN(nnodes=features.shape[0], nfeat=features.shape[1], nhid=16, nclass=labels.max()+1, device='cuda')
>>> model = model.to('cuda')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
predict(features=None, adj=None)
By default, the inputs should be unnormalized data.
- Parameters
features – node features. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
adj – adjacency matrix. If features and adj are not given, this function will use previously stored features and adj from training to make predictions.
- Returns
output (log probabilities) of SimPGCN
- Return type
torch.FloatTensor
class Node2Vec
node2vec: Scalable Feature Learning for Networks. KDD'15. To use this model, you need to "pip install node2vec" first.
Examples
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import Node2Vec
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test Node2vec on clean graph")
>>> model = Node2Vec()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test Node2vec on attacked graph")
>>> model = Node2Vec()
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
node2vec(adj, embedding_dim=64, walk_length=30, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1, p=4, q=1)
Compute Node2Vec embeddings for the given graph.
- Parameters
adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
embedding_dim (int, optional) – Dimension of the embedding
walks_per_node (int, optional) – Number of walks sampled from each node
walk_length (int, optional) – Length of each random walk
workers (int, optional) – Number of threads (see gensim.models.Word2Vec process)
window_size (int, optional) – Window size (see gensim.models.Word2Vec)
num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
p (float) – The hyperparameter p in node2vec
q (float) – The hyperparameter q in node2vec
class DeepWalk(type='skipgram')
DeepWalk: Online Learning of Social Representations. KDD'14. The implementation is modified from https://github.com/abojchevski/node_embedding_attack.
Examples
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import DeepWalk
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test DeepWalk on clean graph")
>>> model = DeepWalk()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk on attacked graph")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk SVD")
>>> model = DeepWalk(type="svd")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
deepwalk_skipgram(adj, embedding_dim=64, walk_length=80, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1)
Compute DeepWalk embeddings for the given graph using the skip-gram formulation.
- Parameters
adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
embedding_dim (int, optional) – Dimension of the embedding
walks_per_node (int, optional) – Number of walks sampled from each node
walk_length (int, optional) – Length of each random walk
workers (int, optional) – Number of threads (see gensim.models.Word2Vec process)
window_size (int, optional) – Window size (see gensim.models.Word2Vec)
num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
deepwalk_svd(adj, window_size=10, embedding_dim=64, num_neg_samples=1, sparse=True)
Compute DeepWalk embeddings for the given graph using the matrix factorization formulation.
- Parameters
adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
window_size (int) – Size of the window
embedding_dim (int) – Size of the embedding
num_neg_samples (int) – Number of negative samples
sparse (bool) – Whether to perform sparse operations
- Returns
Embedding matrix.
- Return type
np.ndarray, shape [num_nodes, embedding_dim]
svd_embedding(x, embedding_dim, sparse=False)
Compute an embedding by selecting the top embedding_dim largest singular values/vectors.
- Parameters
x (sp.csr_matrix or np.ndarray) – The matrix that we want to embed
embedding_dim (int) – Dimension of the embedding
sparse (bool) – Whether to perform sparse operations
- Returns
Embedding matrices.
- Return type
np.ndarray, shape [?, embedding_dim], np.ndarray, shape [?, embedding_dim]