v1.0rc1
Release date: 2018-10-02 14:28:33
This is a pre-release preview; do not rely on the tag to have a fixed set of commits, or rely on the tag for anything practical or important.
Table of Contents
- Highlights
- Breaking Changes
- Bug Fixes
- Other Improvements
- Deprecations
- Performance
- Documentation Improvements
Highlights
JIT
The JIT is a set of compiler tools for bridging the gap between research in PyTorch and production. It includes a language called Torch Script (don't worry, it is a subset of Python, so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT: scripting and tracing. Torch Script code can be aggressively optimized, and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```python
# Write in Python, run anywhere!
@torch.jit.script
def RNN(x, h, W_h, U_h, b_h):
    y = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
        y += [h]
    return torch.stack(y), h
```
As an example, see a tutorial on deploying a seq2seq model, loading an exported model from C++, or browse the docs.
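The other route into Torch Script is tracing. As a rough sketch (the torchvision model and the output file name here are illustrative, not part of this release note), an existing module can be traced on example inputs and serialized for later loading from C++:

```python
import torch
import torchvision

# Run the model once on example inputs to record (trace) its operations.
model = torchvision.models.resnet18()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# The traced ScriptModule can be serialized and later loaded from C++
# without any Python dependency.
traced.save("resnet18_traced.pt")
```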
torch.distributed new "C10D" library
The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:
- C10D is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI; see the sketch after this list.
- Significant Distributed Data Parallel performance improvements, especially for slower networks such as Ethernet-based hosts.
- Adds async support for all distributed collective operations in the torch.distributed package.
- Adds send and recv support in the Gloo backend.
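As a rough sketch of the async collectives (the gloo backend, env:// init method, and environment-variable setup below are assumptions to be adapted to your launcher), a collective can be launched with async_op=True and waited on later:

```python
import torch
import torch.distributed as dist

# Assumes MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are set in the
# environment, e.g. by torch.distributed.launch.
dist.init_process_group(backend="gloo", init_method="env://")

t = torch.ones(4)
work = dist.all_reduce(t, async_op=True)  # returns a work handle immediately
# ... overlap other computation here ...
work.wait()                               # block until the all-reduce finishes
print(t)                                  # each element now equals world_size
```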
C++ Frontend [API Unstable]
The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:
Python | C++ |
---|---|
import torch | #include <torch/torch.h> |
We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next month or two. Some parts of the API may undergo breaking changes during this time.
See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.
Breaking Changes
- Indexing a 0-dimensional tensor now throws an error instead of a warning; use tensor.item() to get a Python number (see the sketch after this list). (#11679).
- torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
- Operations that result in 0 element tensors may return changed shapes.
- Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n, z), where n = the number of nonzero elements and z = the number of dimensions of the input, but would always return a Tensor of shape (0,) when no nonzero elements existed.
- Now: Operations return their documented shape.
```python
# Previously: all 0-element tensors were collapsed to shape (0,)
>>> torch.nonzero(torch.zeros(2, 3))
tensor([], dtype=torch.int64)

# Now, the proper shape is returned
>>> torch.nonzero(torch.zeros(2, 3))
tensor([], size=(0, 2), dtype=torch.int64)
```
- Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
- torch.distributed: the TCP backend is removed; we recommend using the Gloo or MPI backends for CPU collectives and the NCCL backend for GPU collectives.
- Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
- Implicit numpy conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
- torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
- The torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history; see the sketch after this list. (#11061, #11815).
- torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable. (#9965).
- The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (it was torch.float32 or torch.float64, depending on the dtype of the integer). (#11941).
- Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
- CPP Extensions: deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions: arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.
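A minimal sketch of two of the changes above (the values are arbitrary): indexing a 0-dimensional tensor now raises, so .item() is the way to get the Python number, and torch.tensor called on an existing Tensor returns a detached copy.

```python
import torch

# 0-dimensional tensors: use .item() instead of indexing.
x = torch.tensor(3.14)
print(x.item())            # 3.14 as a plain Python float

# torch.tensor(Tensor) now returns a detached copy with no history.
a = torch.ones(2, requires_grad=True)
b = a * 2                  # b has a grad_fn
c = torch.tensor(b)        # copied data, no autograd history
print(b.grad_fn is None)   # False
print(c.grad_fn is None)   # True
```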
Additional New Features
N-dimensional empty tensors
- Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
```python
>>> torch.empty((0, 2, 4, 0), dtype=torch.float64)
tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
```
New Operators
- torch.argsort similar to numpy.argsort. (#9600).
- torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- torch.tensordot similar to numpy.tensordot; see the sketch after this list. (#10025).
- torch.broadcast_tensors similar to numpy.broadcast_arrays. (#10075).
- torch.narrow support for sparse tensors. (#11342).
- torch.matrix_rank similar to numpy.linalg.matrix_rank. (#10338).
- torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- torch.nn.CELU activation. (#8551).
- torch.nn.CTCLoss. (#9628).
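For instance, a quick look at two of the NumPy-style additions above (shapes and values are illustrative, not from the release note):

```python
import torch

x = torch.tensor([3.0, 1.0, 2.0])
print(torch.argsort(x))            # tensor([1, 2, 0]) - indices that sort x

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)
# Contract the last two dims of a with the first two dims of b,
# like numpy.tensordot with dims=2.
c = torch.tensordot(a, b, dims=2)
print(c.shape)                     # torch.Size([3, 6])
```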
New Distributions
- Weibull Distribution. (#9454).
- NegativeBinomial Distribution. (#9345).
- torch.mvlgamma multivariate log-gamma function; see the sketch after this list. (#9451).
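A small sketch of the new distributions and torch.mvlgamma (parameter values below are arbitrary):

```python
import torch
from torch.distributions import Weibull, NegativeBinomial

# Weibull(scale, concentration)
w = Weibull(torch.tensor([1.0]), torch.tensor([1.5]))
print(w.sample())

# NegativeBinomial(total_count, probs)
nb = NegativeBinomial(torch.tensor([10.0]), probs=torch.tensor([0.3]))
print(nb.sample())

# torch.mvlgamma(input, p): multivariate log-gamma applied elementwise
print(torch.mvlgamma(torch.tensor([2.5, 3.0]), 2))
```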
Additions to existing Operators and Distributions
- torch.unique now accepts an optional dim argument. (#10423).
- torch.norm now supports matrix norms. (#11261).
- torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- torch.distributions now support an expand method similar to torch.Tensor.expand, for example torch.distributions.bernoulli.Bernoulli.expand; see the sketch after this list. (#11341).
- torch.nn.functional.grid_sample now supports nearest neighbor interpolation and reflection padding. (#10051).
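A small sketch of two of the additions above (tensor values are illustrative):

```python
import torch

# torch.unique with the new dim argument: unique rows of a 2-D tensor.
x = torch.tensor([[1, 2], [1, 2], [3, 4]])
print(torch.unique(x, dim=0))            # tensor([[1, 2], [3, 4]])

# Distribution.expand: broadcast a distribution to a larger batch shape.
bern = torch.distributions.Bernoulli(probs=torch.tensor(0.3))
expanded = bern.expand(torch.Size([2, 3]))
print(expanded.batch_shape)              # torch.Size([2, 3])
print(expanded.sample().shape)           # torch.Size([2, 3])
```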
Bug Fixes
Serious
- torch.nn.functional.softmin was using the incorrect formula in 0.4.1. (#10066)
- torch.as_strided backwards (called via view) was incorrect with overlapping data locations. (#9538).
- Pointwise losses (e.g. torch.nn.MSELoss) were sometimes using the wrong reduction method. (#10018).
- torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
Backwards Compatibility
- torch.nn.Module load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781).
- Fix RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314).
Correctness
- torch.nn.Dropout fused kernel could change parameters in eval mode. (#10621).
- torch.unbind backwards has been fixed. (#9995).
- Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- torch.bernoulli now handles out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273).
- torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- torch.log on an expanded Tensor gave incorrect results on CPU. (#10269).
- torch.logsumexp now correctly modifies the out parameter if it is given. (#9755).
- torch.multinomial with replacement=True could select 0 probability events on CUDA. (#9960).
- torch.nn.ReLU will now properly propagate NaN. (#10277).
- torch.max and torch.min could return incorrect values on input containing inf/-inf. (#11091).
- Fixed an issue with calculated output sizes of torch.nn.Conv modules with stride and dilation. (#9640).
- torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
Error checking
- torch.gesv now properly checks LAPACK errors. (#11634).
- Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- Legacy Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument. (#11669).
- Properly check that the out parameter is a CPU Tensor for CPU unary ops. (#10358).
- torch.nn.InstanceNorm1d now correctly accepts 2-dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
Miscellaneous
- torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- Fixed a segfault when the grad passed to a hook function is None. (#12028).
- Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- dir(torch) has been fixed with Python 3.7. (#10271).
- Fixed a device-side assert in torch.multinomial when replacement=False and the input has fewer nonzero elements than num_samples. (#11933).
- Can now properly assign a torch.float16 dtype tensor to .grad. (#11781).
- Fixed "can only join a started process" error with torch.utils.data.DataLoader. (#11432).
- Prevent "unexpected exit" in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718).
- torch.einsum now handles spaces consistently. (#9994).
- Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- Fixed a printing error with large non-contiguous tensors. (#10405).
Other Improvements
- torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- torch.nn.parallel.data_parallel now accepts torch.device inputs. (#10189).
- torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- Improve printing of sparse tensors and grad_fns. (#10181).
- Only involve the CUDA device in a CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- sparse-to-sparse copy_ is now supported. (#9005).
- torch.bincount now supports 0 element inputs. (#9757).
- torch.nn.functional.conv2d error messages have been improved. (#11053).
- Allow conversion of np.int64 to a PyTorch scalar. (#9225).
- torch.einsum now handles varargs. (#10067).
- torch.symeig now returns 0-filled eigenvectors when eigenvectors=False is passed on CUDA, rather than uninitialized data. (#10645).
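As a small, hedged illustration of the torch.device improvement listed above (the device index is arbitrary):

```python
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    # torch.cuda functions now accept torch.device objects as well as ints.
    torch.cuda.set_device(dev)
    print(torch.cuda.current_device())   # 0
```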
Deprecations
CPP Extensions
- The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch-replace torch/torch.h with torch/extension.h.
- Usage of the following functions in C++ extensions is also deprecated:
  - torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
  - torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
  - torch::getVariableType. Replacement: None.
torch.distributed
- The old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at torch.nn.parallel.deprecated.DistributedDataParallel.
Performance
- torch.nn.functional.grid_sample on CPU now uses vectorized operations and is 2x~7x faster on CPUs with AVX2 enabled. (#9961).
- torch.norm has been vectorized and parallelized on CPU. (#11565).
- torch.max and torch.min have been parallelized on CPU. (#10343).
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- torch.nn.PReLU has been sped up on both CPU and GPU. (#11758).
- torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- torch.svd has been sped up on both CPU and GPU. (#11194).
- torch.einsum has been greatly sped up on CPU. (#11292).
- torch.clamp no longer does unnecessary copying. (#10352).
- torch.add, torch.sub, torch.mul, torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- torch.nn.RNN and related Modules have been ported to C++ and are more performant. (#10305, #10481).
- Profiler now has lower overhead. (#10969, #11773).
Documentation Improvements
- Reproducibility note added. (#11329).
- CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- torch.Tensor.flatten is now documented. (#9876).
- torch.digamma is now documented. (#10967).
- torch.allclose is now documented. (#11185).
- torch.eig return format clarified. (#10315).
- torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- torch.fft equation has been corrected. (#10760).
- torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- torch.nn.functional.dropout documentation has been clarified. (#10417).
- torch.nn.functional.pad documentation has been clarified. (#11623).
- Various mathematical formulas have been clarified. (#11106).