pytorch/pytorch v2.3.0

Release date: 2024-04-25

PyTorch 2.3 Release notes

Highlights

We are excited to announce the release of PyTorch® 2.3! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing users to migrate their own Triton kernels from eager mode without experiencing performance regressions or graph breaks. In addition, Tensor Parallelism improves the experience of training Large Language Models using native PyTorch functions; it has been validated on training runs for 100B-parameter models.
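As a flavor of what the Triton support enables, here is a minimal sketch of a user-defined Triton kernel called from a torch.compile'd function. It assumes a CUDA device with Triton installed; add_kernel and triton_add are our illustrative names, not part of the release.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

@torch.compile
def triton_add(x, y):
    # The kernel launch is traced by torch.compile without a graph break.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
print(torch.allclose(triton_add(x, x), x + x))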

This release is composed of 3393 commits and 426 contributors since PyTorch 2.2. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.3. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

Beta:
User-defined Triton kernels in torch.compile
Tensor parallelism within PyTorch Distributed
Support for semi-structured sparsity (sketch below)

Prototype:
torch.export adds new API to specify dynamic_shapes
Asynchronous checkpoint generation

Performance Improvements:
Weight-Only-Quantization introduced into Inductor CPU backend
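For the semi-structured sparsity item, a minimal sketch adapted from the public docs, assuming a CUDA GPU with the required sparse kernels; shapes and the 2:4 mask pattern are illustrative:

import torch
from torch.sparse import SparseSemiStructuredTensor, to_sparse_semi_structured

# Private flag used in the docs example; forces the CUTLASS-based kernels.
SparseSemiStructuredTensor._FORCE_CUTLASS = True

# 2:4 semi-structured sparsity: every group of 4 elements keeps exactly 2.
A = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).half().cuda()
B = torch.rand(2560, 3072).half().cuda()

A_sparse = to_sparse_semi_structured(A)  # compressed representation
out = torch.mm(A_sparse, B)              # sparse x dense matmul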


Tracked Regressions

torch.compile on macOS is considered unstable for 2.3, as there are known cases where it will hang (#124497)

torch.compile imports many unrelated packages when it is invoked (#123954)

This can significantly slow down the first invocation and cause instability when those packages are not fully compatible with PyTorch within a single process.

torch.compile is not supported on Python 3.12 (#120233)

PyTorch support for Python 3.12 in general is considered experimental. Please use a Python version between 3.8 and 3.11 instead. This issue has existed since PyTorch 2.2.

Backwards Incompatible Changes

Change default __torch_function__ behavior to be disabled when __torch_dispatch__ is defined (#120632)

Defining a subclass with a __torch_dispatch__ entry will now automatically disable __torch_function__. This aligns better with all the use cases we’ve observed for subclasses. The main behavior change is that the result of the __torch_dispatch__ handler no longer goes through the default __torch_function__ handler, which wrapped it into the current subclass. In particular, this allows your subclass to return a plain Tensor or another subclass from any op.

The original behavior can be recovered by adding the following to your Tensor subclass:

@classmethod
def __torch_function__(cls, func, types, args=(), kwargs=None):
    return super().__torch_function__(func, types, args, kwargs)
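To illustrate the new default, here is a minimal sketch (our example, not from the release notes) of a __torch_dispatch__ subclass; the UnitTensor name is hypothetical. Under 2.3 the plain Tensor it returns is no longer re-wrapped:

import torch
from torch.utils._pytree import tree_map

class UnitTensor(torch.Tensor):
    # Toy subclass: runs each op on plain tensors and returns the plain result.
    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_subclass(cls, elem)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.as_subclass(torch.Tensor) if isinstance(t, UnitTensor) else t
        # In 2.3 the plain Tensor returned here is what the caller sees;
        # previously the default __torch_function__ re-wrapped it as UnitTensor.
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

x = UnitTensor(torch.ones(3))
print(type(x + 1))  # <class 'torch.Tensor'> on 2.3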

ProcessGroupNCCL removes multi-device-per-thread support at the C++ level (#119099, #118674)

Removes no_dist and coordinator_rank from public DCP APIs (#121317)

As part of an overall effort to simplify our public-facing APIs for Distributed Checkpointing, we've decided to deprecate the coordinator_rank and no_dist parameters under torch.distributed.checkpoint. These parameters can lead to confusion about their intended effect and have limited value to begin with. One concrete example is https://github.com/pytorch/pytorch/issues/118337, where it is ambiguous which process group the coordinator rank refers to. In the case of no_dist, we consider this an implementation detail that should be hidden from the user. Starting in this release, no_dist is inferred from the initialized state of the process group: collectives are used if a process group is initialized, and not used otherwise.

2.2:

import torch.distributed.checkpoint as dcp

dcp.save(
    state_dict={"model": model.state_dict()},
    checkpoint_id="path_to_model_checkpoint",
    no_dist=True,
    coordinator_rank=0,
)
# ...
dcp.load(
    state_dict={"model": model.state_dict()},
    checkpoint_id="path_to_model_checkpoint",
    no_dist=True,
    coordinator_rank=0,
)

2.3:

# no_dist is inferred from the process group state, and rank 0 is always the coordinator.
import torch.distributed.checkpoint as dcp

dcp.save(
    state_dict={"model": model.state_dict()},
    checkpoint_id="path_to_model_checkpoint",
)
# ...
dcp.load(
    state_dict={"model": model.state_dict()},
    checkpoint_id="path_to_model_checkpoint",
)

Remove deprecated tp_mesh_dim arg (#121432)

Starting with PyTorch 2.3, the parallelize_module API only accepts a DeviceMesh (the tp_mesh_dim argument has been removed). If you have an N-D DeviceMesh for multi-dimensional parallelism, you can use mesh_nd["tp"] to obtain a 1-D DeviceMesh for tensor parallelism, as sketched below.
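A minimal sketch of the 2.3 calling convention, assuming 8 GPUs launched via torchrun; the FeedForward module and the parallelization plan are our illustrative examples:

import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module

class FeedForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(16, 32)

    def forward(self, x):
        return self.w1(x)

# A 2-D mesh for combined data + tensor parallelism (2 x 4 = 8 ranks).
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

# 2.3: pass the 1-D tensor-parallel submesh directly; no tp_mesh_dim argument.
model = parallelize_module(FeedForward(), mesh_2d["tp"], {"w1": ColwiseParallel()})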

torch.export

Enable fold_quantize by default in PT2 Export Quantization (#118701, #118605, #119425, #117797)

Previously, the PT2 Export Quantization flow did not generate quantized weights by default; instead, the quantized model kept fp32 weights in the pattern fp32 weight -> q -> dq -> linear. fold_quantize=True is now the default, so after convert_pt2e the quantized model contains quantized weights in the pattern int8 weight -> dq -> linear, and users will see a reduction in model size.

2.2:

folded_model = convert_pt2e(model, fold_quantize=True)
non_folded_model = convert_pt2e(model)

2.3:

folded_model = convert_pt2e(model)
non_folded_model = convert_pt2e(model, fold_quantize=False)
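For context, a minimal sketch of where convert_pt2e sits in the PT2 Export Quantization flow; the toy module is ours, and we assume the XNNPACKQuantizer-based flow documented for this release:

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

example_inputs = (torch.randn(1, 8),)
exported = capture_pre_autograd_graph(M(), example_inputs)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)            # calibration
quantized = convert_pt2e(prepared)   # 2.3: weights are folded to int8 by default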

Remove deprecated torch.jit.quantized APIs (#118406)

All functions and classes under torch.jit.quantized will now raise an error if called/instantiated. This API has long been deprecated in favor of torch.ao.nn.quantized.

2.2 (removed torch.jit.quantized API) -> 2.3 (corresponding torch.ao API)

torch.jit.quantized.quantize_rnn_cell_modules -> torch.ao.nn.quantized.dynamic.RNNCell
torch.jit.quantized.quantize_rnn_modules      -> torch.ao.quantization.quantize_dynamic
torch.jit.quantized.quantize_linear_modules   -> torch.ao.quantization.quantize_dynamic
torch.jit.quantized.QuantizedLinear           -> torch.ao.nn.quantized.dynamic.Linear
torch.jit.QuantizedLinearFP16                 -> torch.ao.nn.quantized.dynamic.Linear
torch.jit.quantized.QuantizedGRU              -> torch.ao.nn.quantized.dynamic.GRU
torch.jit.quantized.QuantizedGRUCell          -> torch.ao.nn.quantized.dynamic.GRUCell
torch.jit.quantized.QuantizedLSTM             -> torch.ao.nn.quantized.dynamic.LSTM
torch.jit.quantized.QuantizedLSTMCell         -> torch.ao.nn.quantized.dynamic.LSTMCell

Remove deprecated fbgemm operators (#112153)

TorchScript models that were exported with the deprecated torch.jit.quantized API will no longer be loadable, as the required internal operators have been removed. Please re-export your models using the newer torch.ao.quantization API instead.
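A minimal sketch of re-exporting such a model with the torch.ao.quantization path; the Model class is our illustrative example:

import torch
import torch.ao.quantization as aoq

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(16, 32)
        self.fc = torch.nn.Linear(32, 8)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out)

# Dynamically quantize the LSTM and Linear modules (int8 weights).
qmodel = aoq.quantize_dynamic(Model(), {torch.nn.LSTM, torch.nn.Linear}, dtype=torch.qint8)

# Re-export to TorchScript with the new quantized modules in place.
scripted = torch.jit.script(qmodel)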

Other

Deprecations

torch.autograd.Function: Using the torch.autograd.function.traceable decorator and getting/setting torch.autograd.Function's is_traceable is now deprecated (#121413)

These APIs were previously marked for internal use only. They will be removed in version 2.4.

torch.utils.checkpoint: not passing use_reentrant explicitly to activation checkpoint and checkpoint_sequential is deprecated (#116710)

torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.

(Note that this was already deprecated in a previous release. In this version, we improve the deprecation message.)
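A minimal sketch of the explicit form; the block function is illustrative:

import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # Recomputed during backward instead of storing activations.
    return torch.relu(x @ x.t())

x = torch.randn(8, 8, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # pass use_reentrant explicitly
y.sum().backward()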

Deprecated torch.backends.cuda.sdp_kernel and replaced it with torch.nn.attention.sdpa_kernel (#114689)

This PR deprecates torch.backends.cuda.sdp_kernel; users should use torch.nn.attention.sdpa_kernel instead. The old code will raise the following warning: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.

2.2:

import torch
from torch.backends.cuda import sdp_kernel

with sdp_kernel(enable_math=False, enable_flash=False, enable_mem_efficient=True):
    torch.nn.functional.scaled_dot_product_attention(...)

2.3:

import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

with sdpa_kernel(backends=[SDPBackend.EFFICIENT_ATTENTION]):
    torch.nn.functional.scaled_dot_product_attention(...)

Additional deprecations: Distributed API, Releng

New Features

Autograd API, CUDA, Distributed API, FX, torch.compile (Dynamo, Inductor), torch.export, Linalg, MPS, torch.nn API, Profiler, Python API, Sparse, Vulkan, XPU, Other

Improvements

Autograd API, Composability (Dynamic shapes), CPP API, CUDA, Distributed API, torch.compile (Dynamo, Inductor), torch.export, FX, JIT, NestedTensors, Linalg, MPS, torch.nn API, ONNX, Optimizer, Profiler, Python API, Quantization (PT2 Export Quantization Flow, XNNPACKQuantizer, X86 CPU Inductor Backend, DTypes, Others), Releng, ROCm, Other

Bug Fixes

Autograd API, Composability, CPP API, Distributed API, torch.compile (Dynamo, Inductor), torch.export, FX, JIT, Linalg, MPS, torch.nn API, Nested Tensors, ONNX, Optimizer, Profiler, Python API, Quantization, Releng, Sparse, Other

Performance

Composability, CUDA, Inductor, MPS, Optimizer, Profiler, Python API, ROCm, Other

Documentation

Autograd API, CUDA, Distributed API, FX, torch.compile (Inductor), torch.export, Linalg, torch.nn API, Optimizer, Other
