v1.2.0
Release date: 2019-08-09 00:06:38
We have just released PyTorch v1.2.0.
It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, and Distributed, as well as Performance and Eager Frontend improvements.
Highlights
[JIT] New TorchScript API
Version 1.2 includes a new, easier-to-use API for converting nn.Modules into ScriptModules. A sample usage is:
class MyModule(torch.nn.Module):
    ...
# Construct an nn.Module instance
module = MyModule(args)
# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)
torch.jit.script() will attempt to recursively compile the given nn.Module, including any submodules or methods called from forward(). See the migration guide for more info on what's changed and how to migrate.
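For illustration, here is a minimal (hypothetical) module showing the recursive behavior: scripting the outer module also compiles its submodule and the helper method called from forward():
import torch

class Gate(torch.nn.Module):
    def forward(self, x):
        return torch.sigmoid(x)

class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.gate = Gate()                    # submodule: compiled recursively
        self.linear = torch.nn.Linear(4, 4)

    def helper(self, x):
        # methods called from forward() are compiled as well
        return self.linear(x)

    def forward(self, x):
        return self.gate(self.helper(x))

scripted = torch.jit.script(MyModule())
print(scripted(torch.randn(2, 4)))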
[JIT] Improved TorchScript Python language coverage
In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:
- Early returns, breaks and continues.
- Iterator-based constructs, like for..in loops, zip(), and enumerate().
- NamedTuples.
- math and string library support.
- Support for most Python builtin functions.
See the detailed notes below for more information.
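For example, a small scripted function (purely illustrative) exercising early returns, for..in, and enumerate():
from typing import List

import torch

@torch.jit.script
def first_even(values: List[int]) -> int:
    # early return, for..in, and enumerate() are all supported in TorchScript
    for i, v in enumerate(values):
        if v % 2 == 0:
            return v
    return -1

print(first_even([3, 5, 8, 9]))  # prints 8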
Expanded Onnx Export
In PyTorch 1.2, working with Microsoft, we’ve added full support to export ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4) and 10 (v1.5). We’ve also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export. Here is a summary of all of the major improvements:
- Support for multiple Opsets including the ability to export dropout, slice, flip and interpolate in Opset 10.
- Improvements to ScriptModule including support for multiple outputs, tensor factories and tuples as inputs and outputs.
- More than a dozen additional PyTorch operators supported including the ability to export a custom operator.
Updated docs can be found here and also a refreshed tutorial using ONNXRuntime can be found here.
Tensorboard is no Longer Considered Experimental
Read the documentation or simply type `from torch.utils.tensorboard import SummaryWriter` to get started!
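A minimal sketch of logging a scalar (the tag and values are made up; by default events are written under ./runs):
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()          # writes event files under ./runs by default
for step in range(100):
    loss = torch.rand(1).item()   # stand-in for a real training loss
    writer.add_scalar("train/loss", loss, step)
writer.close()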
NN.Transformer
We include a standard nn.Transformer module, based on the paper “Attention is All You Need”. The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, the nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. New APIs include:
- nn.Transformer
- nn.TransformerEncoder and nn.TransformerEncoderLayer
- nn.TransformerDecoder and nn.TransformerDecoderLayer
See the Transformer Layers documentation for more info.
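A minimal sketch of the new module (sizes are arbitrary; inputs are expected in (sequence, batch, feature) order):
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)   # (source length, batch, d_model)
tgt = torch.rand(20, 32, 512)   # (target length, batch, d_model)
out = model(src, tgt)           # -> (20, 32, 512)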
Breaking Changes
Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne (!=)) return dtype has changed from torch.uint8 to torch.bool (21113)
Version 1.1:
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)
Version 1.2:
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])
For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.
Mask Inversion
In prior versions of PyTorch, the idiomatic way to invert a mask was to call 1 - mask. This behavior is no longer supported; use the ~ or bitwise_not() operator instead.
Version 1.1:
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)
Version 1.2:
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.
>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False, True, True])
sum(Tensor) (python built-in) does not upcast dtype like torch.sum
Python's built-in sum returns results in the same dtype as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the dtype of the tensor.
Version 1.1:
# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)
# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)
# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
Version 1.2:
# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)
# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)
# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
TLDR: use torch.sum instead of the built-in sum. Note that the built-in sum() behavior will more closely resemble torch.sum in the next release.
Note also that masking via torch.uint8 Tensors is now deprecated; see the Deprecations section for more information.
__invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral+Boolean dtypes instead of only torch.uint8. (22326)
Version 1.1:
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([ 1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)
Version 1.2:
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
torch.tensor(bool) and torch.as_tensor(bool) now infer torch.bool dtype instead of torch.uint8. (19097)
Version 1.1:
>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)
Version 1.2:
>>> torch.tensor([True, False])
tensor([ True, False])
nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)
Version 1.1:
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496],
requires_grad=True)
Version 1.2:
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
A number of deprecated Linear Algebra operators have been removed (22841)
Removed | Use Instead
---|---
btrifact | lu
btrifact_with_info | lu with get_infos=True
btrisolve | lu_solve
btriunpack | lu_unpack
gesv | solve
pstrf | cholesky
potrf | cholesky
potri | cholesky_inverse
potrs | cholesky_solve
trtrs | triangular_solve
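For instance, code that used the removed btrifact / btrisolve pair can switch to the lu-based API; a sketch with arbitrary shapes:
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)

# old (removed): A_LU, pivots = torch.btrifact(A); x = torch.btrisolve(b, A_LU, pivots)
A_LU, pivots = torch.lu(A)
x = torch.lu_solve(b, A_LU, pivots)
print(torch.allclose(A @ x, b, atol=1e-5))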
Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)
>>> x = torch.randn(2,3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`,
but `variable` and `tensor` have incompatible tensor type.
Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself (20614)
Version 1.1:
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 1])
>>> s.coalesce().values().shape
torch.Size([1])
Notice indices() and values() reflect the resized tensor shapes.
Version 1.2:
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 2])
>>> s.coalesce().values().shape
torch.Size([2])
Notice indices() and values() reflect the original tensor shapes.
Sparse Tensors: Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)
Version 1.1:
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad
# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
Version 1.2:
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad
# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
AssertionError
nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm (18787)
Example of new usage:
>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>> torch.nn.Linear(20, 100),
>>> torch.nn.BatchNorm1d(100)
>>> ).cuda()
>>> # creating process group (optional)
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
Error Checking: torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow the output tensor to be resized if it is also used as one of the inputs.
Version 1.1:
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
tensor([[0., 0., 0.],
[0., 0., 0.]])
Version 1.2:
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]
If you run into this error, please ensure the out parameter is of the correct output shape (post-broadcasting).
Error Checking: Improved Variable version tracking (20391, 22821, 21865)
PyTorch’s autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven’t been updated in-place since they were saved). See In Place Correctness Checks in the docs for more information.
In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the Variable() constructor, the nn.Parameter() constructor, after setting .data, and via nn.Module._apply (internal API).
Track changes through Variable constructor:
>>> x = torch.ones(1, requires_grad=True)+1
>>> y = x*x
# do an in-place update through Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.
Track changes on an nn.Parameter:
>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p
# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.
Track changes after setting .data:
>>> x = torch.zeros(1, requires_grad=True)+1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True)+1
>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.
[JIT] Python called from scripted modules must be @ignored
torch.jit.script now recursively compiles everything it finds in the original function, so if you had Python functions called from your scripted function or module, you must now explicitly @ignore them. See the new API guide for more details.
Version 1.1
def my_unscriptable_python_fn():
    # weird stuff

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()
Version 1.2
@torch.jit.ignore  # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()
NOTE: This is also a change to the behavior of the @torch.jit.ignore decorator. In version 1.1, @ignore tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, @ignore tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.
To get the old behavior, use @torch.jit.ignore(drop_on_export=True) (@torch.jit.ignore with no arguments is equivalent to @torch.jit.ignore(drop_on_export=False)).
[JIT] optimize for ScriptModules is now a context manager
Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not compilation time).
Version 1.1
@torch.jit.script(optimize=False)
def fn(inputs):
    ...

fn(inputs)
Version 1.2
@torch.jit.script
def fn(inputs):
    ...

with torch.jit.optimized_execution(False):
    fn(inputs)
[jit] script::Module is now a reference type
To better align with the PyTorch C++ API philosophy, script::Module and script::Method are now reference types. Our APIs have been updated to use script::Module instead of std::shared_ptr<script::Module>.
Version 1.1
using torch::jit::script::Module;
std::shared_ptr<Module> m = torch::jit::load("my_model.py");
m->forward(...);
Version 1.2
using torch::jit::script::Module;
Module m = torch::jit::load("my_model.py");
m.forward(...);
[C++ only] mean() / sum() / prod() APIs have changed slightly (21088)
Version 1.1 API:
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
Version 1.2 API:
Tensor sum(IntArrayRef dim, bool keepdim=false,
c10::optional<ScalarType> dtype=c10::nullopt) const;
that is, to override dtype, keepdim must now be provided.
Binary distribution and nightly changes
We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions on https://pytorch.org/ have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:
Wheels now have local version identifiers. Wheels for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL; just specify an appropriate version constraint like torch==1.2.0+cu92.
Version 1.1 (for Python 3.7 on Linux only):
pip install numpy
pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
Version 1.2 (works for all versions of Python, and both Linux and Mac):
pip install torch==1.2.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
CPU-only binaries on conda can be selected with the cpuonly feature. We’ve eliminated the pytorch-cpu conda package; instead, the CPU-only package can be enabled by installing the cpuonly metapackage. Similarly, there is no longer both a torchvision and a torchvision-cpu package; the cpuonly feature will ensure that the CPU version of torchvision is selected.
Version 1.1:
conda install -c pytorch pytorch-cpu
Version 1.2:
conda install -c pytorch pytorch cpuonly
Conda nightlies now live in the pytorch-nightly channel and no longer have “-nightly” in their name. We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same name as their corresponding stable versions (unlike before, when we had separate pytorch-nightly, torchvision-nightly, etc. packages). This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.
Version 1.1:
conda install -c pytorch pytorch-nightly
Version 1.2:
conda install -c pytorch-nightly pytorch
Wheel nightlies no longer have -nightly in their name. Similar to the changes we made in Conda, we no longer suffix wheel nightlies with “-nightly”, to make it harder to accidentally install a copy of nightly and stable at the same time.
Version 1.1:
pip install --pre torch_nightly -f https://download.pytorch.org/whl/nightly/torch_nightly.html
Version 1.2:
pip install --pre torch -f https://download.pytorch.org/whl/nightly/torch_nightly.html
New Features
Tensor Type Support
- torch.bool: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with torch.uint8. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)
- torch.sparse.HalfTensor: Added support for torch.float16 sparse Tensors on both CPU and CUDA. (19695)
- torch.bfloat16: Added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)
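A small illustrative sketch of the new dtypes (note that torch.bfloat16 support at this point is limited to basic creation and serialization):
import torch

mask = torch.tensor([True, False, True])          # a torch.bool tensor
vals = torch.tensor([1.0, 2.0, 3.0])
print(vals[mask])                                  # masking with a bool tensor

bf = torch.tensor([1.0, 2.0], dtype=torch.bfloat16)
print(bf.dtype)                                    # torch.bfloat16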
NN Package
- nn.Transformer: added implementation of Transformer from Attention is All You Need. (20170, 22588)
- nn.Embedding: support float16 embeddings on CUDA. (19695)
- nn.Flatten: added a Module that performs torch.flatten (see the sketch after this list). (22245)
- nn.functional.gelu: Added support for Gaussian Error Linear Units. (20665, 21237)
- nn.Module hooks: add ability to replace input/output via forward_pre_hook and forward_hook. (22285)
- nn.Module: add requires_grad_() method for turning on/off requires_grad for Module parameters. (22576)
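As a sketch of two of these additions, nn.Flatten and Module.requires_grad_() (the layer sizes are arbitrary):
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.Flatten(),                  # flattens all dims except the batch dim by default
    nn.Linear(8 * 30 * 30, 10),
)
model.requires_grad_(False)        # freeze every parameter in the module
print(model(torch.randn(2, 3, 32, 32)).shape)   # torch.Size([2, 10])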
Operators
- Tensor.to_sparse: now supports autograd. (20458)
- Tensor.fill_diagonal_: operator to fill the main diagonal of a Tensor. (21892)
- torch.qr: supports autograd. (21274)
- torch.bitwise_not: add operator for boolean/integer types. Also have the python ~ operator use this. (22283, 22320)
- torch.trapz: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)
- torch.var_mean / torch.std_mean: compute variance and mean at the same time (see the sketch after this list). (18731)
- torch.utils.ThroughputBenchmark: benchmark utility for measuring the throughput of PyTorch operators. (20766)
- Logging: lightweight at-most-once logging to record operators that are used (c10::Logging). (20745)
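A brief illustrative sketch of a few of the new operators (values are arbitrary):
import torch

x = torch.tensor([0.0, 1.0, 4.0, 9.0])
print(torch.trapz(x, dx=1.0))        # trapezoid-rule integral over unit spacing

var, mean = torch.var_mean(x)        # one call returning both statistics
print(var, mean)

m = torch.zeros(3, 3)
m.fill_diagonal_(5.0)                # in-place fill of the main diagonal
print(m)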
Optim Package
- optim.AdamW: introduce AdamW optimizer from Decoupled Weight Decay Regularization. (21250)
- optim.LBFGS: added support for strong Wolfe line search. (8824)
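For example, AdamW can be used much like Adam when decoupled weight decay is wanted (the hyperparameters below are illustrative):
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()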
Distributed Package
- DistributedDataParallel: support CPU modules. (20236)
- DistributedDataParallel: support sparse tensors. (19146)
- DistributedDataParallel: support local gradient accumulation. (21736)
IterableDataset
- IterableDataset: introduces a new type of Dataset designed for data read from a stream. (19228)
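A minimal sketch of an IterableDataset that streams items lazily (the range-based stream is just a stand-in for a real data source):
import torch
from torch.utils.data import IterableDataset, DataLoader

class CounterStream(IterableDataset):
    def __init__(self, start, end):
        super(CounterStream, self).__init__()
        self.start, self.end = start, end

    def __iter__(self):
        # the stream is consumed lazily, item by item
        return iter(range(self.start, self.end))

loader = DataLoader(CounterStream(0, 10), batch_size=4)
for batch in loader:
    print(batch)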
Tensorboard Package
- TensorBoard support in PyTorch has improved and is no longer experimental!
- SummaryWriter.flush: now supported. (20607)
- SummaryWriter.add_mesh: add support for 3D point clouds. (20413)
JIT Features
- Improved support for iterator infrastructure. TorchScript now supports looping through a List, Tuple, Dict, Tensor, String and you can also use zip(), enumerate(), and for...in. (21801, 22006, 21990, 21985)
- Support in membership checks. (21527)
- Improved support for strings and the string libraries. (20826, 20188, 20761, 21656, 20617)
- Improved math support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)
- Support for various other Python builtin functions. (21451)
- Support for NamedTuple. (21428)
- All the rest of the dict methods. (21979)
- sorted() keyword for lists and dicts. (23274)
- Add support for breaks and continues. (21692)
- Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports torch::List, torch::Dict and torch::Optional, and supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
- Support nn.GRU in script. (23266)
- Support pack_padded_sequence and pad_packed_sequence. (23249)
- Support torch._C._get_tracing_state in TorchScript. (23248)
- Support torch.as_tensor in TorchScript. (23247)
- Add support for recursive compilation on Modules. (20708)
- Add all builtin. (20521)
- Add Final[T] annotated members to __constants__. (21603)
- Add save() to scripted Functions. (20386)
- Support for serializing class attributes. (22953)
- Support for class annotations. (21379)
- Support Python 3.8 Constant node. (22007)
- Support for type annotations instead of torch.jit.annotate(). (21390)
- Support operator overloading for user-defined classes. (20033)
- Support recursive ModuleList / Sequential. (21306)
- Trace multiple methods in a single Module (see the sketch after this list). (19905)
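For the last item, a sketch of tracing more than one method at once via torch.jit.trace_module; the module and example inputs are made up for illustration:
import torch

class TwoMethods(torch.nn.Module):
    def forward(self, x):
        return x + 1

    def scale(self, x):
        return x * 2

mod = TwoMethods()
inputs = {"forward": torch.randn(3), "scale": torch.randn(3)}
traced = torch.jit.trace_module(mod, inputs)   # traces both methods
print(traced(torch.ones(3)), traced.scale(torch.ones(3)))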
Improvements
- Tensor.pin_memory(): only ask for context on current device. (22229)
- Tensor.view(): suggest using reshape() instead of contiguous() when the input is non-contiguous. (20968)
- Tensor.numpy(): throw TypeError instead of ValueError if the type isn’t supported. (21608)
- torch.norm: add support for p="nuc" with dim specified. (21022)
- torch.qr: support batching of input matrices. (20689)
- torch.qr: support some parameter akin to NumPy's mode option. (20689)
- torch.det / torch.logdet / torch.slogdet: added batching support. (22909)
- torch.cdist: support batching. (20934)
- torch.symeig: support batching. (21858)
- torch._dirichlet_grad: support CUDA. (21191)
- torch.randperm: support torch.float16. (22102)
- torch.Size is now pickle-able in Python2. (20952)
- torch.tensor / torch.as_tensor: infer device if input supports Numba’s __cuda_array_interface__. (20584)
- torch.isinf / torch.isfinite: throw TypeError instead of ValueError when a non-tensor is passed in. (20817)
- nn.MultiheadedAttention: add functional support. (20415)
- nn.MultiheadedAttention: added support for key/value to have different number of features. (21288)
- nn.MultiheadAttention: allow static key/values. (21288)
- nn.Conv{1,2,3}D: support torch.int64 dtype in forward. (20730, 22594)
- nn.AvgPool{1,2,3}D: support torch.int64 dtype in forward. (22433)
- nn.Module: make _save_to_state_dict overrideable. (21933)
- autograd: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)
- autograd: Track in-place changes of Tensors through Module._apply (internal API). (21865)
- autograd.profiler: Add shape aggregation support. (20035)
- autograd.profiler: Profile custom c10 ops. (20175)
- DataLoader: support setting batch_size=0 to disable automatic batching (collation) in DataLoader for easier bulk loading. (19228)
- DataLoader: add multiprocessing_context parameter. (22990)
- DataLoader: added error detection for worker_init_fn. (20150)
- DataLoader: Retry on EINTR. (21723)
- torch.cuda.set_rng_state / torch.cuda.get_rng_state: accept string as device parameter. (23448)
- CUDA: add warning when using Turing GPUs and CUDA <= 9000. (21468)
- CUDA: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)
- CPU: Improve CPUAllocator OOM message. (20618)
- [memory_format]: added support for torch.empty, torch.empty_like, Tensor.contiguous(), Tensor.is_contiguous() to specify / check the order in which dimensions are laid out in memory. (20455, 20558)
- distributions.MultivariateNormal: fix precision matrix instability. (21366)
- distributions.transforms.SigmoidTransform: fix numerical instability. (19802)
Distributed Improvements
- DistributedDataParallel: Support DDP forward/backward calls even if no module parameter is used. (19821)
- DistributedDataParallel: Only call into reducer if grad is enabled. (19897)
- DistributedDataParallel: Require finalizing DDP backward only when there are indeed gradients computed; this allows the application to completely discard DDP outputs and move on to the next iteration. (19901)
- DistributedDataParallel: Improve DDP backward reduction error messages. (20586)
- DistributedDataParallel: make DDP failure recoverable. (21591)
- DistributedDataParallel: Delay reduction of unused parameters until first autograd hook is called. (22219)
- c10d: support tensors shared across processes. (21449)
- c10d: ProcessGroupMPI: Add device guard around MPI operations. (22446)
- utils.data.distributed.DistributedSampler: Make shuffling optional. (22479)
Tensorboard Improvements
- Usage of kwarg-only arguments has been removed. (21786)
Numpy Compatibility Improvements
- Tensor.T: added numpy-like support for reversing dimensions. (20598)
- Tensor.ndim: NumPy equivalent property for the number of dimensions. (20565)
- Tensor.nonzero: added as_tuple argument (default False) that, when True, will return a tuple of Tensors, which matches the behavior of numpy.nonzero. (20293)
- torch.dtype: support passing in NumPy dtypes as arguments. (21215)
- torch.normal: add size parameter when called with two floats. (20545)
- torch.where: add one-argument overload that is an alias for Numpy-like nonzero. (21986)
- Support a number of argument name overrides, e.g. axis instead of dim. (20451)
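A quick sketch of a few of these NumPy-compatibility additions (purely illustrative):
import torch

t = torch.arange(6).reshape(2, 3)
print(t.ndim)                                 # 2, like numpy.ndarray.ndim
print(t.T.shape)                              # torch.Size([3, 2])
rows, cols = (t > 2).nonzero(as_tuple=True)   # tuple of index tensors, like numpy.nonzero
print(rows, cols)
print(torch.sum(t, axis=0))                   # axis accepted as an alias for dim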
JIT Improvements
- The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. (22177, 22178, 22179, 22180)
- Error message source range highlighting now includes filename, line number, and column number. (21157)
- Better Constant Propagation through Tuples. (22561)
- Add start and step parameters for range in TorchScript. (20795)
- Support for threading options for TorchScript inference (doc).
- Add max_pool2d to symbolic derivatives. (19661)
- Optimize matmul memory usage for certain cases. (23433)
- Avoid kernel launches for zero-sized tensor inputs. (22790)
- Add support for steps (strides) in tensor slices. (20929)
- Added error for classes that don't have an __init__ function. (21880)
- Allow classes to be used in their own methods. (20106)
- Better error message when a variable is conditionally defined. (20911)
- Consider contained types in alias analysis. (21431)
- Convenience APIs for script objects. (20226)
- Don't print backtrace for interpreter errors. (20925)
- Improve error msg for missing attribute. (20779)
- Improve error msg on inferred type. (21058)
- Improve error msg on recursive class defs. (21842)
- Include module names in recursive error stacks. (22921)
- Improve recursive scripting error message. (21841)
- Index into a tuple with non constant integer. (20081)
- Let ScriptModule buffer attributes also cast device/type. (19700)
- Lower batchmm to non-diff optimization. (19987)
- Make ScriptModule.training an attribute instead of a parameter. (21078)
- Make strtod_c compatible with different gcc abi. (21293)
- Make magic methods work with casts too. (20654)
- Improve performance of alias analysis. (20899)
- Print a warning if a type annotation prefix is invalid according to mypy. (20884)
- schema_matching.cpp: improve error messages. (21141)
- Resolve with closed over variables instead of stack frame. (22270)
- Report errors through call stack. (22280)
- Reduce number of stack manipulation instructions in interpreter. (21240)
C++ API Improvements
- nn::PoissonNLLLoss: Added support. (19316)
- nn::Module: added replace_module API to overwrite submodules in C++ Frontend. (22546)
- nn::Module::register_module / register_parameter / register_buffer: make public. (23196)
- data::datasets::ChunkDataReader: fix include headers and a vector issue. (19485)
- data::datasets::ChunkDataset: add new get_batch method. (21797)
- data::datasets::ChunkDataset: add checkpoint support. (21889)
- data::datasets::ChunkDataset: add support for cross-chunk shuffling. (22347)
- data::datasets::ChunkDataset: add sorting policy. (23053)
MKLDNN Tensor Improvements
Add support for a number of operators on MKLDNN Tensors including:
- Tensor.is_mkldnn: (22386)
- Tensor.transpose(): (21943)
- Tensor.zero_(): (20573)
- torch.empty: (21184)
- torch.mul: (20575)
- nn.AdaptiveAvgPool{1,2,3}D: (19818)
- nn.Sigmoid: (20820)
- nn.Softmax: (21516)
- nn.Module: support saving/loading MKLDNN modules. (20799)
- nn.MaxPool{1,2,3}D: support ceil_mode. (21310)
Bug Fixes
- Indexing: fix advanced indexing where there are more than (2^31)-1 bytes in the output. (20919)
- Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. (23123)
- Indexing: fix issue with slicing empty tensors. (20914)
- Tensor.index_copy_: fix segfault by properly checking dimension is in range. (21617)
- Tensor.copy_: Fix a bug where non-blocking was not being respected. (20305)
- Tensor.clone: Fix an issue with MKLDNN tensors. (20943)
- Tensor subclassing: give a proper error instead of crashing. (20283)
- torch.cat: Fix segfault with tensors that can't be indexed with 32-bit ints. (21530)
- torch.range / torch.linspace / torch.logspace: properly respect the current Stream. (21619)
- torch.lu: return the identity permutation instead of zeros when not using pivoting. (22242)
- torch.einsum: Fix an issue where the backward pass would potentially be skipped. (22111)
- torch.cosh: Fix an issue where torch.cos was instead calculated with torch.double dtype and vectorized instructions. (20797)
- torch.triu / torch.tril: handle strides correctly for in-place versions. (22730)
- torch.triu / torch.tril: Fix handling of batches > 65535 on CUDA. (21067)
- torch.inverse / torch.solve / torch.cholesky_solve / torch.triangular_solve: Fix batch sizes > 65535 on CUDA. (21689)
- torch.histc: return dtype is now the same as the input tensor on CUDA, matching CPU behavior. (20369)
- torch.histc: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)
- torch.randperm: handle non-contiguous out parameter. (23043)
- torch.unique: Fix empty tensor handling when dim is passed as an argument. (19000)
- torch.min / torch.max: properly error on empty tensor inputs, as with CPU tensors. (19612)
- CUDA: fix launch parameters for reductions. (22827)
- torch.hub: fix an issue with find_module. (20782)
- autograd: Fix a number of custom autograd Function corner cases by inverting the relationship between PyFunction and THPFunction. (22983)
- autograd: give “Trying to backward through the graph a second time" error instead of internal assert when the buffers are a list of Tensors (with indexing). (21533)
- optim.lr_scheduler.CosineAnnealingLR: rename from CosineAnnealingLr. (23242)
- distributions.Binomial: Fix overflow of log_prob when logits is large. (20679)
- distributions.SigmoidTransform: Fix numerical issues that could result in inf / -inf return values. (20288)
- distributions.Categorical.sample: fix a view bug. (23328)
- CUDA: Give proper error message for bad cuda forks. (23322)
- pickle: Fix Unpickling error when loading multiple objects from a file. (20270)
- NCCL: Fix race condition. (23040)
torch.nn Bug Fixes
- nn.Conv{1,2,3}D: fix memory leak on MKLDNN code path. (22392)
- nn.Conv{1,2,3}D: properly unpickle older pickled versions. (21687)
- nn.CTCLoss: fix backward on CUDA when 2d target tensor is larger than max_target_length. (20971)
- nn.CTCLoss: fix some numerical stability issues. (21392)
- nn.CTCLoss: disable buggy non-deterministic CudNN algorithm. (22977)
- nn.CTCLoss: fixed empty target handling. (21910, 23298)
- nn.SyncBatchNorm: fix syncing of running statistics when count size differs between GPUs. (22248)
- nn.SyncBatchNorm: retain requires_grad value when converting from nn.BatchNorm. (22569)
- nn.SyncBatchNorm: correctly handle process_group in convert_sync_batchnorm. (19240)
- nn.MultiheadedAttention: fix for torch.float16 dtype. (21658)
- nn.EmbeddingBag: fix NaN output when input is empty. (21400)
- nn.Dropout: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)
- nn.MaxPool: fix output size calculation in some corner cases. (22304)
- nn.MaxPool: return valid indices if all entries are -inf. (23161)
- nn.Softmax: respect the current Stream. (22470)
- nn.LogSoftmax: fix numerical stability issues. (21672)
- nn.Module.load_state_dict: break ref cycle. (20397)
- nn.Module: fix loading in 32-bit environments. (20900)
- nn.utils.rnn.pack_padded_sequence: Fix segfault on empty tensors. (21461)
- nn.utils.spectral_norm: fix loading state_dict when strict=False. (22545)
- CudNN: Fix uninitialized PoolWindow on Windows. (22405)
Distributed Bug fixes
- nn.parallel.DataParallel: fix error in no_grad mode. (21262)
- torch.distributed.all_gather: fix errors for views and aliases. (21490)
- c10d: fix collective communication errors on empty tensors. (20658)
JIT Bug Fixes
- Fix specialized list from dict keys. (23267)
- Switch keys to be sequential and stable in pickle serialization. (23280)
- deepCopy also copies type information of lists. (23271)
- dictKeys and dictItems ops on typed dicts return typed lists. (23270)
- Fix pickler bug where it would not load if no tensors were saved. (23263)
- Avoid multiple writes to files on export. (21186)
- Better error msg for mismatched dict key type. (22231)
- Better error msg for using Python builtin_function_or_method. (22935)
- Better error msg in __get_state__ to let a user know that ScriptModules can't be deep-copied at the moment. (20885)
- Better error msg when seeing an unsupported builtin function. (21068)
- dropout derivative should respect the train flag. (20760)
- Fix __constants__ for some nn modules. (21071)
- Fix ScriptModule.__dir__(). (22426)
- Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. (21425)
- Fix a bug in loop unrolling. (21239)
- Fix alias annotations for dict ops. (22900)
- Fix inaccurate SourceRange reporting. (21109)
- Fix broken indexing when using None and ellipses indexing together. (22905)
- Fix bug in CompilationUnit::define. (21886)
- Fix compilation order for class methods. (20094)
- Fix dead code elimination over loops. (22632)
- Fix dead code elimination in onnx export. (22476)
- Fix incorrect default on Graph::toString. (21370)
- Fix optional type promotion for classes. (21593)
- Fix optional type unification. (19813)
- Fix NameError with PYTORCH_JIT=0. (20120)
- Fix overspecializing constants in compilation. (22816)
- Fix pow() bug on overloads. (20824)
- Fix recursive method compilation. (21862)
- Fix reflection on weak modules, copy attributes. (20190)
- Fix slow unpickling. (21542)
- Fix input/output type mismatch. (20829)
- Fix insert_guard for norm decomposition. (19646)
- Fix Trace inlining of graphs with optional inputs. (22686)
- Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (20932)
- Fix tuple indexing bug. (21521)
- Fix type hints for None constants. (23029)
- Fix weak module cuda() _flat_weights bug. (21107)
- Fix WeakIValueEq. (21891)
- Fixed gcd to use 64 bit integers. (21041)
- Fixed list() not making a copy. (22093)
- Fix race condition on Module::forward method. (21398)
- Made a += b for lists do an in-place add. (21896)
- Made floor/ceil return ints. (21124)
- Out-of-memory on GPU due to the "weak_script" decorators. (20588)
- Override print when python is present. (21625)
- Set __file__ for torch.ops. (21888)
- Set correct list type in pybind_utils. (23188)
C++ Frontend bug fixes
- nn::RNN: Fix assertions in bidirectional RNN. (22850)
- nn::MaxPool / nn::AvgPool: expand incomplete kernel size, as in Python. (22073, 22075)
- Optim: Fix memory leak when weight_decay is applied to Adam, Adagrad, RMSProp. (23125)
- Optim::SGD: fix memory leak with weight_decay. (23007)
- torch::autograd::Scatter / torch::autograd::Gather: Fix nullptr bug. (20286)
- torch::nn::parallel::data_parallel: fix gradient computation error. (20910)
- [C++ Extensions] Fix an issue when building multiple extensions in the same directory. (20221)
Deprecations
Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.
See the Breaking Changes section for more details about torch.bool Tensors and comparison operators.
torch.masked_select, torch.masked_fill, torch.masked_scatter now expect torch.bool masks rather than torch.uint8.
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> a.masked_select(torch.tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.
tensor([2, 3])
# instead use torch.bool
>>> a.masked_select(torch.tensor([False, True, True]))
tensor([2, 3])
Comparison operators with out= parameters now expect torch.bool dtype rather than torch.uint8.
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: torch.gt received 'out' parameter with dtype torch.uint8, this behavior
is now deprecated, please use 'out' parameter with dtype torch.bool instead.
tensor([0, 1, 1], dtype=torch.uint8)
# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
Legacy autograd.Function (Function without static forward method) is now deprecated
>>> class MyLegacyFunction(Function):
>>> def forward(self, x):
>>> return x
>>>
>>> def backward(self, grad_output):
>>> return grad_output
>>>
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function
with static forward method.
# instead use new-style Autograd Function
>>> class MyFunction(Function):
>>> @staticmethod
>>> def forward(ctx, x):
>>> return x
>>>
>>> @staticmethod
>>> def backward(ctx, grad_output):
>>> return grad_output
>>>
>>> MyFunction.apply(torch.randn((3,), requires_grad=True))
See the torch.autograd.Function documentation for more details.
torch.gels: has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)
Performance
- Advanced Indexing: significantly improve performance of advanced indexing backward. (20557)
- Tensor.copy_: increase broadcasting CUDA copy performance by 25%. (20685)
- torch.matmul: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448)
- torch.bmm: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266)
- torch.inverse: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904)
- torch.topk: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865)
- torch.cdist: Improve CPU perf by up to 10x for some cases. (20605)
- torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen, increasing performance by up to 3x. (21287)
- torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)
- torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for up to 10x speedup. (21214)
- torch.sinh / torch.cosh: Parallelize and vectorize on CPU. (21115)
- torch.lerp: Vectorize on CPU. (22038)
- torch.eye: Parallelize on CPU. (21077)
- torch.randperm: Parallelize initialization in randperm on CPU. (21529)
- Vectorization: Don't split 256-bit AVX2 load/store intrinsics. (20609)
Torch.NN Performance Improvements
- nn.Softmax: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827)
- nn.Embedding / nn.EmbeddingBag: Optimize CUDA kernel, increasing performance up to 2.7x. (22016)
- nn.Linear: optimize BERT model perf by using mkldnn inner product. (21851)
- nn.Conv{1,2,3}D: improve perf for depthwise convolutions in torch.float16 on Volta and Turing GPUs. (22302)
- nn.RNN: optimize on CPU by fusing matmul ops. (22512)
- nn.Upsample: a number of significant perf improvements on CUDA. (21879, 21694)
- nn.functional.layer_norm: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883)
Documentation
- torch.bool: document the Boolean tensor type. (21601)
- torch.as_strided: add docs. (22842)
- torch.empty_strided: add docs. (23740)
- torch.lerp: clarify broadcasting requirements. (23268)
- torch.enable_grad / torch.no_grad / torch.set_grad_enabled: clarify interaction between these features. (23310)
- torch.autograd.grad_mode: Document that no_grad is thread local. (21755)
- torch.multiprocessing: Explain refcounting of CUDA tensors. (19904)
- torch.Tensor: Add a warning about memory usage. (20801)
- torch.utils.data.DataLoader: Document RNG state consumption. (22540)
- torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum. (20880)
- Document production environment features. (23010)
- Add note about contributing recently released research. (23513)
- Clarify performance implications of deterministic mode. (21337)
- Update cuda pinned memory note to include tensor.to. (20977)
Torch.NN Documentation
- nn.functional / nn.init: Break up NN in docs so they load faster. (21291)
- nn.functional.conv{1,2,3}d: Remove padding_mode. (20891)
- nn.functional.upsample / nn.functional.interpolate: add note about overshooting with mode='bicubic'. (23321)
- nn.init.zeros_ / nn.init.ones_: add documentation. (23145)
- nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask. (20071)
- nn.MultiheadAttention: Fix documentation for attention mask shape. (20850)
- nn.Softmax: Fixed to specify dimension to prevent warning in 1.1.0. (20310)
Contributor Documentation
- Updated web links on contribution_guide and governance documentation. (21243)
- Improve documentation for publishing hub models. (21307)
- Suggest a faster linker in the contributing guide. (21334)
- Add CUDA C++11 and profiling notes to the contribution guide. (21386)
Build Documentation
- Add magma for CUDA 10.1 to Windows docs. (19914)
- Improve build-from-source instructions. (20088)
- Add ninja to build instructions. (20079)
- Update libtorch build docs. (21150)
TensorBoard Documentation
- Tensorboard Documentation has been greatly improved! Browse the latest version here.
Torch HUB Documentation
ONNX
In PyTorch 1.2, we have added full support for ONNX Opsets 7, 8, 9 and 10 in the ONNX exporter, and we have also enhanced the constant folding pass to support Opset 10. Export of ScriptModules is better supported. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export.
Supporting More ONNX Opsets
- Add basic support for multiple ONNX Opsets and support for Opset 10. (19294)
- Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. (22421, 20036)
- Export Dropout for Opset 10. (20710)
- Export Slice and Flip for Opset 10. (20533)
- Export Interpolate (Resize) for Opset 10. (21434)
Enhancing the Support for ScriptModule
- Support multiple outputs in ScriptModule in ONNX Exporter. (20256)
- Support tensor factories in ScriptModule in ONNX Exporter. (20255)
- Support tuples as inputs and outputs in ScriptModule. (20784)
Exporting More Torch Operators to ONNX
- Export custom ops. (21321)
- Export torch.arange. (22601)
- Export torch.masked_fill. (22521)
- Export torch.floor, torch.ceil, torch.log2 and prim::shape. (17895)
- Export torch._dim_arange. (20078)
- Export torch.randn_like. (20093)
- Export torch._standard_gamma. (20126)
- Export torch.topk. (21104)
- Export __and__, __or__. (17894)
- Export torch.sign. (20470)
- Export torch.scatter. (18543)
- Export torch.rand. (20559)
- Export torch.gather. (21235)
- Export torch.cosine_similarity. (21884)
- Export torch.sum. (22240)
- Export torch.logsumexp. (22306)
- Export torch.layer_norm. (22265)
Extending Existing Exporting Logic
- Support torch.min and torch.max with dim. (19689)
- Support maxpool with dilations. (18721)
- Support RNN with batch_first=True. (19766)
- Support Upsample with dynamic input. (20116)
- Improve support for Loop export. (20445)
- Enable torch.full with scalar parameters. (21931)
- Added support for exporting models with variable length input/output to ONNX. (20034)
Optimizing Exported ONNX Graph
- Support constant folding in Opset 10. (22515)
- Support negative indexing for Slice in constant folding optimization. (21811)