2.2.0
Release date: 2022-12-01 15:31:52
Latest pyg-team/pytorch_geometric release: 2.5.3 (2024-04-19 19:37:44)
We are excited to announce the release of PyG 2.2 🎉🎉🎉
PyG 2.2 is the culmination of work from 78 contributors who have worked on features and bug-fixes for a total of over 320 commits since `torch-geometric==2.1.0`.
Highlights
pyg-lib Integration
We are proud to release `pyg-lib==0.1.0`, the first stable version of our new low-level Graph Neural Network library that drives all CPU and GPU acceleration needs of PyG, and to integrate it into PyG (#5330, #5347, #5384, #5388).
You can install `pyg-lib` as described in our `README.md`:
```bash
pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
```

```python
import pyg_lib
```
Once `pyg-lib` is installed, it will automatically be picked up by PyG, e.g., to accelerate neighborhood sampling routines or heterogeneous GNN execution:
- `pyg-lib` provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, substantially improving upon the neighborhood sampling techniques previously used in PyG.
- `pyg-lib` provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
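To make the "type-dependent transformation" idea concrete, the sketch below gives a plain PyTorch reference for a segment matrix multiplication: rows are assumed to be sorted by type, `ptr` marks where each type's segment starts and ends, and every segment is multiplied by its own weight matrix. `segment_matmul_reference` is a hypothetical helper that only illustrates the semantics; it is not pyg-lib's API, and its Python loop over types is exactly the per-type overhead that a fused GPU kernel aims to avoid.

```python
import torch


def segment_matmul_reference(inputs: torch.Tensor, ptr: torch.Tensor,
                             weight: torch.Tensor) -> torch.Tensor:
    """Multiply each contiguous segment of `inputs` by its own weight matrix.

    inputs: [num_rows, in_channels], rows sorted by type
    ptr:    [num_types + 1], rows ptr[i]:ptr[i + 1] belong to type i
    weight: [num_types, in_channels, out_channels]
    """
    outs = []
    for i in range(weight.size(0)):
        start, end = int(ptr[i]), int(ptr[i + 1])
        # Type-specific linear transformation for segment i.
        outs.append(inputs[start:end] @ weight[i])
    return torch.cat(outs, dim=0)


# Two node types: 3 rows of type 0 followed by 2 rows of type 1.
x = torch.randn(5, 4)
ptr = torch.tensor([0, 3, 5])
w = torch.randn(2, 4, 8)
out = segment_matmul_reference(x, ptr, w)  # Shape: [5, 8]
```

Keeping rows grouped by type is what allows each type's rows to be processed with a single dense matrix multiplication instead of per-node weight lookups.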
GraphStore and FeatureStore Abstractions
PyG 2.2 includes numerous primitives that integrate easily with simple paradigms for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so by introducing simple, easy-to-use, and extensible `FeatureStore` and `GraphStore` abstractions that plug directly into existing, familiar PyG interfaces (see the accompanying tutorial).
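```python
from torch_geometric.loader import NodeLoader

# `CustomFeatureStore`, `CustomGraphStore` and `CustomGraphSampler` are
# user-defined subclasses of PyG's `FeatureStore`, `GraphStore` and
# `BaseSampler` classes, respectively.
feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

# The loader fetches sampled subgraphs from `graph_store` and the
# corresponding node features from `feature_store`:
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass
```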
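The key layout used above (`store[group_name, attr_name, index]`) can be illustrated with a hypothetical dictionary-backed store. It is only a sketch of the idea that a loader pulls the feature rows of the sampled nodes on demand rather than holding everything in one `Data` object; the class name `DictFeatureStore` is made up and it does not implement PyG's `FeatureStore` interface.

```python
from typing import Dict, Optional, Tuple

import torch

Key = Tuple[str, str, Optional[torch.Tensor]]


class DictFeatureStore:
    """Hypothetical in-memory store keyed by (group_name, attr_name).

    Mimics the `store[group, attr, index]` indexing used above, but it is
    NOT an implementation of PyG's `FeatureStore` interface.
    """

    def __init__(self) -> None:
        self._tensors: Dict[Tuple[str, str], torch.Tensor] = {}

    def __setitem__(self, key: Key, tensor: torch.Tensor) -> None:
        group_name, attr_name, index = key
        if index is not None:
            raise ValueError("This sketch only stores full tensors (index=None)")
        self._tensors[(group_name, attr_name)] = tensor

    def __getitem__(self, key: Key) -> torch.Tensor:
        group_name, attr_name, index = key
        tensor = self._tensors[(group_name, attr_name)]
        # Gather only the requested rows, e.g., the nodes of a sampled mini-batch.
        return tensor if index is None else tensor[index]


store = DictFeatureStore()
store['paper', 'x', None] = torch.arange(12.0).view(6, 2)  # 6 papers, 2 features each
batch_x = store['paper', 'x', torch.tensor([4, 1])]  # Rows of papers 4 and 1
```

A real backend would replace the dictionary with a database or remote key-value service; only the row-gathering access pattern matters to the loader.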
Data loading and sampling routines are refactored and decomposed into the `torch_geometric.loader` and `torch_geometric.sampler` modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
Optimized and Fused Aggregations
PyG 2.2 further accelerates `scatter` aggregations by choosing specialized implementations depending on the device (CPU/GPU) and on whether a backward computation is required (requires `torch>=1.12.0` and `torch-scatter>=2.1.0`) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
We also optimized the usage of `nn.aggr.MultiAggregation` by fusing the computation of multiple aggregations together (#6036, #6040).
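One way several aggregations can be fused is by sharing intermediate reductions. The sketch below, a hypothetical helper and not PyG's implementation, computes the per-group count, sum and sum of squares once with `Tensor.index_add_`, then derives `sum`, `mean` and the (biased) variance from them via Var[x] = E[x²] − E[x]², instead of running three independent scatter reductions.

```python
import torch


def fused_sum_mean_var(x: torch.Tensor, index: torch.Tensor, dim_size: int):
    """Per-group sum, mean and biased variance of the rows of `x`.

    `index[i]` is the group of row i. Count, sum and sum of squares are
    computed once and shared by all three outputs.
    """
    num_features = x.size(1)
    ones = torch.ones(x.size(0), dtype=x.dtype)
    count = torch.zeros(dim_size, dtype=x.dtype).index_add_(0, index, ones)
    total = torch.zeros(dim_size, num_features, dtype=x.dtype).index_add_(0, index, x)
    total_sq = torch.zeros(dim_size, num_features, dtype=x.dtype).index_add_(0, index, x * x)

    count = count.clamp(min=1).view(-1, 1)  # Avoid division by zero for empty groups
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - E[x]^2
    return total, mean, var


x = torch.tensor([[1.0], [3.0], [5.0], [10.0]])
index = torch.tensor([0, 0, 1, 1])
s, m, v = fused_sum_mean_var(x, index, dim_size=2)
```

Note that the E[x²] − E[x]² form trades some numerical stability for speed when values are large relative to their spread.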
Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):
| Aggregators | Vanilla | Fusion |
|---|---|---|
| [sum, mean] | 0.3325s | 0.1996s |
| [sum, mean, min, max] | 0.7139s | 0.5037s |
| [sum, mean, var] | 0.6849s | 0.3871s |
| [sum, mean, var, std] | 1.0955s | 0.3973s |
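The gains are easier to compare as speedup ratios (Vanilla / Fusion). The snippet below only recomputes these ratios from the table's numbers; it runs no benchmark itself.

```python
# Timings from the table above (seconds, summed over 1000 runs).
timings = {
    "[sum, mean]": (0.3325, 0.1996),
    "[sum, mean, min, max]": (0.7139, 0.5037),
    "[sum, mean, var]": (0.6849, 0.3871),
    "[sum, mean, var, std]": (1.0955, 0.3973),
}

speedups = {name: vanilla / fused for name, (vanilla, fused) in timings.items()}
for name, speedup in speedups.items():
    print(f"{name:<24} {speedup:.2f}x")
# [sum, mean]              1.67x
# [sum, mean, min, max]    1.42x
# [sum, mean, var]         1.77x
# [sum, mean, var, std]    2.76x
```

The largest gain appears for `[sum, mean, var, std]`, the combination with the most shareable intermediate results.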
Lastly, we have incorporated "fused" GNN operators via the `dgNN` package, starting with a `FusedGATConv` implementation (#5140).
Community Sprint: Type Hints and TorchScript Support
We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.
We had our first community sprint on 10/12 to fully incorporate type hints and TorchScript support across the entire codebase, with the goal of improving its usability and cleanliness. 20 contributors participated, contributing 120 type hints within 2 weeks and adding around 2400 lines of code (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).
Explainability
Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. As part of it, we introduce the `torch_geometric.explain` module to provide a unified set of tools to explain the predictions of a PyG model or to explain the underlying phenomenon of a dataset.
Some of the features developed in the sprint are incorporated into this release:
- Added the `torch_geometric.explain` module (#5804, #6054, #6089)
- Moved and adapted the `GNNExplainer` module to `torch_geometric.explain` (#5967, #6065), with accompanying examples
- Extended `GNNExplainer` to support edge-level explanations (#6056)
- Added explainability support for heterogeneous GNNs via `to_captum_model` and `to_captum_input` (#5886, #5934)
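```python
from captum.attr import IntegratedGradients

from torch_geometric.data import HeteroData
from torch_geometric.nn import to_captum_input, to_captum_model

data = HeteroData(...)
model = HeteroGNN(...)  # A user-defined heterogeneous GNN

mask_type = 'edge'  # Attribute edges; 'node' and 'node_and_edge' are also accepted
output_idx = 10
metadata = data.metadata()

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, metadata)
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),  # `y`: labels of the explained node type (assumed defined)
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)
```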
Breaking Changes
- Renamed `drop_unconnected_nodes` to `drop_unconnected_node_types` and `drop_orig_edges` to `drop_orig_edge_types` in `AddMetaPaths` (#5490)
Deprecations
- The usage of `nn.models.GNNExplainer` is now deprecated in favor of `explain.GNNExplainer`
- The usage of `utils.dropout_adj` is now deprecated in favor of `utils.dropout_edge`
- The usage of `loader.RandomNodeSampler` is now deprecated in favor of `loader.RandomNodeLoader`
- The usage of `to_captum` is now deprecated in favor of `to_captum_model`
Features
Layers, Models and Examples
- Added a "Link Prediction on MovieLens" Colab notebook (#5823)
- Added a bipartite link-prediction example (#5834)
- Added the `SSGConv` layer (#5599)
- Added the `WLConvContinuous` layer for performing WL refinement with continuous attributes (#5316)
- Added the `PositionalEncoding` module (#5381)
- Added a node classification example instrumented with Weights and Biases (#5192)
Data Loaders
- Added support for triplet sampling in `LinkNeighborLoader` (#6004)
- Added a `temporal_strategy = uniform/last` option to `NeighborLoader` and `LinkNeighborLoader` (#5576)
- Added a `disjoint` option to `NeighborLoader` and `LinkNeighborLoader` (#5717, #5775)
- Added `HeteroData` support in `RandomNodeLoader` (#6007)
- Added `int32`-based `edge_index` support in `NeighborLoader` (#5948)
- Added support for `input_time` in `NeighborLoader` (#5763)
- Added `np.memmap` support in `NeighborLoader` (#5696)
- Added CPU affinitization support to `NeighborLoader` (#6005)
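Conceptually, temporal neighbor sampling only considers neighbors whose timestamp does not exceed the seed node's time; a `uniform` strategy then samples uniformly among those candidates, while `last` keeps the most recent ones. The pure-Python sketch below illustrates these semantics on a CSR adjacency (`rowptr`, `col`) with per-node timestamps. The function name and data layout are hypothetical; this is not the pyg-lib implementation.

```python
import random
from typing import List


def sample_temporal_neighbors(
    rowptr: List[int],
    col: List[int],
    node_time: List[int],
    seed: int,
    seed_time: int,
    num_neighbors: int,
    strategy: str = "uniform",
) -> List[int]:
    """Sample up to `num_neighbors` neighbors of `seed` that are not from the future."""
    neighbors = col[rowptr[seed]:rowptr[seed + 1]]
    # Only neighbors whose timestamp is <= the seed time are valid candidates.
    candidates = [v for v in neighbors if node_time[v] <= seed_time]
    if len(candidates) <= num_neighbors:
        return candidates
    if strategy == "uniform":
        return random.sample(candidates, num_neighbors)
    if strategy == "last":
        # Keep the most recent candidates.
        return sorted(candidates, key=lambda v: node_time[v])[-num_neighbors:]
    raise ValueError(f"Unknown strategy '{strategy}'")


# Node 0 has neighbors 1, 2, 3, 4 with timestamps 1, 5, 3, 9.
rowptr = [0, 4, 4, 4, 4, 4]
col = [1, 2, 3, 4]
node_time = [6, 1, 5, 3, 9]
print(sample_temporal_neighbors(rowptr, col, node_time, 0, 6, 2, "last"))  # [3, 2]
```

Node 4 is never returned for seed time 6 because its timestamp (9) lies in the future, which is what prevents information leakage during training on temporal graphs.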
Transformations
- Added a `FeaturePropagation` transform (#5387)
- Added `IndexToMask` and `MaskToIndex` transforms (#5375, #5455)
- Added `shuffle_node`, `mask_feature` and `add_random_edge` augmentations (#5548)
- Added `dropout_node`, `dropout_edge` and `dropout_path` augmentations (#5481, #5495, #5531)
- Added an `AddRandomMetaPaths` transform that adds edges based on random walks along a metapath (#5397)
- Added a `utils.to_smiles` function (#6038)
- Added `HeteroData` support for `transforms.Constant` (#5700)
Datasets
- Added the `LRGBDataset` to include 5 datasets from the Long Range Graph Benchmark (#5935)
- Added the `HydroNet` water cluster dataset (#5537, #5902, #5903)
- Added the `DGraphFin` dynamic graph dataset (#5504)
- Added the official splits to the `MalNetTiny` dataset (#5078)
- Added a `print_summary` method to `torch_geometric.data.Dataset` (#5438)
General Improvements
- Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
- Added the `utils.assortativity` function to compute the degree assortativity coefficient (#5587)
- Added support for filling labels with dummy values in `HeteroData.to_homogeneous()` (#5540)
- Added `torch.onnx.export` support, with an accompanying example (#5877, #5997)
- Added an option to make normalization coefficients trainable in `PNAConv` (#6039)
- Added a `semi_grad` option in `VarAggregation` and `StdAggregation` (#6042)
- Added a warning for invalid node and edge type names in `HeteroData` (#5990)
- Added `lr_scheduler_solver` and customized `lr_scheduler` classes (#5942)
- Added a `to_fixed_size` graph transformer (#5939)
- Added support for symbolic tracing in the `SchNet` model (#5938)
- Added support for customizing the interaction graph in the `SchNet` model (#5919)
- Added `SparseTensor` support to `SuperGATConv` (#5888)
- Added TorchScript support for `AttentiveFP` (#5868)
- Added a `return_semantic_attention_weights` argument to `HANConv` (#5787)
- Added temperature value customization in `dense_mincut_pool` (#5908)
- Added support for a tuple of `in_channels` in `GENConv` for bipartite message passing (#5627, #5641)
- Added an `Aggregation.set_validate_args` option to skip validation of `dim_size` (#5290)
- Added `BaseStorage.get()` functionality (#5240)
- Added support for batches of size one in `BatchNorm` (#5530, #5614)
- The `AttentionalAggregation` module can now be applied to compute attention on a per-feature level (#5449)
- Added TorchScript support to `ASAPooling` (#5395)
- Updated the unsupervised `GraphSAGE` example to leverage `LinkNeighborLoader` (#5317)
- Added a better out-of-bounds error message in `MessagePassing` (#5339)
- Added support to customize the activation function in `PNAConv` (#5262)
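The degree assortativity coefficient is the Pearson correlation between the degrees of the nodes at either end of an edge: positive when high-degree nodes tend to connect to each other, negative when they attach mostly to low-degree nodes. Below is a plain-Python sketch, assuming an undirected graph whose edges are listed in both directions; `degree_assortativity` is a hypothetical name and this is not the code of `utils.assortativity` itself.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple


def degree_assortativity(edges: List[Tuple[int, int]]) -> float:
    """Pearson correlation of the degrees at both endpoints of every edge.

    Assumes an undirected graph whose edges are listed in both directions,
    so the degree of a node equals the number of edges it is a source of.
    """
    degree = Counter(u for u, _ in edges)
    xs = [degree[u] for u, _ in edges]  # Degree of the source endpoint
    ys = [degree[v] for _, v in edges]  # Degree of the target endpoint
    n = len(edges)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # Undefined when all endpoint degrees are equal
    return cov / sqrt(var_x * var_y)


# Star graph: hub 0 connected to leaves 1, 2 and 3.
star = [(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0)]
print(degree_assortativity(star))  # -1.0 (the hub only attaches to leaves)
```

The coefficient is undefined for regular graphs (e.g., a triangle), where every endpoint has the same degree, so the sketch returns `nan` in that case.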
Bugfixes
- Fixed a bug in `TUDataset` in which node features were wrongly constructed whenever `node_attributes` only hold a single feature (e.g., in `PROTEINS`) (#5441)
- Fixed a bug in the `VirtualNode` transform in which node features were mistakenly treated as edge features (#5819)
- Fixed a bug when applying several scalers with `PNAConv` (#5514)
- Fixed `setter` and `getter` handling in `BaseStorage` (#5815)
- Fixed the `auto_select_device` routine in GraphGym for `pytorch_lightning>=1.7` (#5677)
- Fixed `RandomLinkSplit` in case there aren't enough negative edges to sample (#5642)
- Fixed the in-place modification to `mode_kwargs` in `MultiAggregation` (#5601)
- Fixed the `utils.to_dense_adj` routine in case `edge_index` is empty (#5476)
- Fixed `PointTransformerConv` to correctly use `sum` aggregation (#5332)
- Fixed the output of `InMemoryDataset.num_classes` in case a `transform` modifies `data.y` (#5274)
- Fail gracefully on `GLIBC` errors within `torch-spline-conv` (#5276)
Full Changelog
Added
- Extended `GNNExplainer` to support edge level explanations (#6056)
- Added CPU affinitization for `NodeLoader` (#6005)
- Added triplet sampling in `LinkNeighborLoader` (#6004)
- Added `FusedAggregation` of simple scatter reductions (#6036)
- Added a `to_smiles` function (#6038)
- Added option to make normalization coefficients trainable in `PNAConv` (#6039)
- Added `semi_grad` option in `VarAggregation` and `StdAggregation` (#6042)
- Allow for fused aggregations in `MultiAggregation` (#6036, #6040)
- Added `HeteroData` support for `to_captum_model` and added `to_captum_input` (#5934)
- Added `HeteroData` support in `RandomNodeLoader` (#6007)
- Added bipartite `GraphSAGE` example (#5834)
- Added `LRGBDataset` to include 5 datasets from the Long Range Graph Benchmark (#5935)
- Added a warning for invalid node and edge type names in `HeteroData` (#5990)
- Added PyTorch 1.13 support (#5975)
- Added `int32` support in `NeighborLoader` (#5948)
- Add `dgNN` support and `FusedGATConv` implementation (#5140)
- Added `lr_scheduler_solver` and customized `lr_scheduler` classes (#5942)
- Add `to_fixed_size` graph transformer (#5939)
- Add support for symbolic tracing of `SchNet` model (#5938)
- Add support for customizable interaction graph in `SchNet` model (#5919)
- Started adding `torch.sparse` support to PyG (#5906, #5944, #6003)
- Added `HydroNet` water cluster dataset (#5537, #5902, #5903)
- Added explainability support for heterogeneous GNNs (#5886)
- Added `SparseTensor` support to `SuperGATConv` (#5888)
- Added TorchScript support for `AttentiveFP` (#5868)
- Added `num_steps` argument to training and inference benchmarks (#5898)
- Added `torch.onnx.export` support (#5877, #5997)
- Enable VTune ITT in inference and training benchmarks (#5830, #5878)
- Add training benchmark (#5774)
- Added a "Link Prediction on MovieLens" Colab notebook (#5823)
- Added custom `sampler` support in `LightningDataModule` (#5820)
- Added a `return_semantic_attention_weights` argument to `HANConv` (#5787)
- Added `disjoint` argument to `NeighborLoader` and `LinkNeighborLoader` (#5775)
- Added support for `input_time` in `NeighborLoader` (#5763)
- Added `disjoint` mode for temporal `LinkNeighborLoader` (#5717)
- Added `HeteroData` support for `transforms.Constant` (#5700)
- Added `np.memmap` support in `NeighborLoader` (#5696)
- Added `assortativity` that computes degree assortativity coefficient (#5587)
- Added `SSGConv` layer (#5599)
- Added `shuffle_node`, `mask_feature` and `add_random_edge` augmentation methods (#5548)
- Added `dropout_path` augmentation that drops edges from a graph based on random walks (#5531)
- Add support for filling labels with dummy values in `HeteroData.to_homogeneous()` (#5540)
- Added `temporal_strategy` option to `neighbor_sample` (#5576)
- Added `torch_geometric.sampler` package to docs (#5563)
- Added the `DGraphFin` dynamic graph dataset (#5504)
- Added `dropout_edge` augmentation that randomly drops edges from a graph; the usage of `dropout_adj` is now deprecated (#5495)
- Added `dropout_node` augmentation that randomly drops nodes from a graph (#5481)
- Added `AddRandomMetaPaths` that adds edges based on random walks along a metapath (#5397)
- Added `WLConvContinuous` for performing WL refinement with continuous attributes (#5316)
- Added `print_summary` method for the `torch_geometric.data.Dataset` interface (#5438)
- Added `sampler` support to `LightningDataModule` (#5456, #5457)
- Added official splits to `MalNetTiny` dataset (#5078)
- Added `IndexToMask` and `MaskToIndex` transforms (#5375, #5455)
- Added `FeaturePropagation` transform (#5387)
- Added `PositionalEncoding` (#5381)
- Consolidated sampler routines behind `torch_geometric.sampler`, enabling ease of extensibility in the future (#5312, #5365, #5402, #5404, #5418)
- Added `pyg-lib` neighbor sampling (#5384, #5388)
- Added `pyg_lib.segment_matmul` integration within `HeteroLinear` (#5330, #5347)
- Enabled `bf16` support in benchmark scripts (#5293, #5341)
- Added `Aggregation.set_validate_args` option to skip validation of `dim_size` (#5290)
- Added `SparseTensor` support to inference and training benchmark suite (#5242, #5258, #5881)
- Added experimental mode in inference benchmarks (#5254)
- Added node classification example instrumented with Weights and Biases (W&B) logging and W&B Sweeps (#5192)
- Added experimental mode for `utils.scatter` (#5232, #5241, #5386)
- Added missing test labels in `HGBDataset` (#5233)
- Added `BaseStorage.get()` functionality (#5240)
- Added a test to confirm that `to_hetero` works with `SparseTensor` (#5222)
- Added `torch_geometric.explain` module with base functionality for explainability methods (#5804, #6054, #6089)
Changed
- Moved and adapted `GNNExplainer` from `torch_geometric.nn` to `torch_geometric.explain.algorithm` (#5967, #6065)
- Optimized scatter implementations for CPU/GPU, both with and without backward computation (#6051, #6052)
- Support temperature value in `dense_mincut_pool` (#5908)
- Fixed a bug in which `VirtualNode` mistakenly treated node features as edge features (#5819)
- Fixed `setter` and `getter` handling in `BaseStorage` (#5815)
- Fixed `path` in `hetero_conv_dblp.py` example (#5686)
- Fix `auto_select_device` routine in GraphGym for PyTorch Lightning>=1.7 (#5677)
- Support `in_channels` with `tuple` in `GENConv` for bipartite message passing (#5627, #5641)
- Handle cases of not having enough possible negative edges in `RandomLinkSplit` (#5642)
- Fix `RGCN+pyg-lib` for `LongTensor` input (#5610)
- Improved type hint support (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852)
- Avoid modifying `mode_kwargs` in `MultiAggregation` (#5601)
- Changed `BatchNorm` to allow for batches of size one during training (#5530, #5614)
- Integrated better temporal sampling support by requiring that local neighborhoods are sorted according to time (#5516, #5602)
- Fixed a bug when applying several scalers with `PNAConv` (#5514)
- Allow `.` in `ParameterDict` key names (#5494)
- Renamed `drop_unconnected_nodes` to `drop_unconnected_node_types` and `drop_orig_edges` to `drop_orig_edge_types` in `AddMetaPaths` (#5490)
- Improved `utils.scatter` performance by explicitly choosing better implementation for `add` and `mean` reduction (#5399)
- Fix `to_dense_adj` with empty `edge_index` (#5476)
- The `AttentionalAggregation` module can now be applied to compute attention on a per-feature level (#5449)
- Ensure equal lengths of `num_neighbors` across edge types in `NeighborLoader` (#5444)
- Fixed a bug in `TUDataset` in which node features were wrongly constructed whenever `node_attributes` only hold a single feature (e.g., in `PROTEINS`) (#5441)
- Breaking change: removed `num_neighbors` as an attribute of loader (#5404)
- `ASAPooling` is now jittable (#5395)
- Updated unsupervised `GraphSAGE` example to leverage `LinkNeighborLoader` (#5317)
- Replace in-place operations with out-of-place ones to align with `torch.scatter_reduce` API (#5353)
- Breaking bugfix: `PointTransformerConv` now correctly uses `sum` aggregation (#5332)
- Improve out-of-bounds error message in `MessagePassing` (#5339)
- Allow file names of a `Dataset` to be specified either as a property or a method (#5338)
- Fixed separating a list of `SparseTensor` within `InMemoryDataset` (#5299)
- Improved name resolving of normalization layers (#5277)
- Fail gracefully on `GLIBC` errors within `torch-spline-conv` (#5276)
- Fixed `Dataset.num_classes` in case a `transform` modifies `data.y` (#5274)
- Allow customization of the activation function within `PNAConv` (#5262)
- Do not fill `InMemoryDataset` cache on `dataset.num_features` (#5264)
- Changed tests relying on `dblp` datasets to instead use synthetic data (#5250)
- Fixed a bug for the initialization of activation function examples in `custom_graphgym` (#5243)
- Allow any integer tensors when checking `edge_index` input to message passing (#5281)
Removed
- Removed `scatter_reduce` option from experimental mode (#5399)
Full commit list: https://github.com/pyg-team/pytorch_geometric/compare/2.1.0...2.2.0