v1.4.0

pytorch/pytorch

Release date: 2020-01-16 08:03:49

PyTorch 1.4.0 Release Notes

The PyTorch v1.4.0 release is now available.

The release contains over 1,500 commits, with significant work spanning existing areas such as JIT, ONNX, Distributed, Performance, and the Eager Frontend, as well as improvements to the experimental mobile and quantization areas. It also contains new experimental features, including RPC-based model parallel distributed training and language bindings for Java (inference only).

PyTorch 1.4 is the last release that supports Python 2. For the C++ API, it is the last release that supports C++11: you should start migrating to Python 3 and building with C++14 to make the future transition from 1.4 to 1.5 easier.

Highlights

PyTorch Mobile - Build level customization

Following the experimental release of PyTorch Mobile in the 1.3 release, PyTorch 1.4 adds additional mobile support including the ability to customize build scripts at a fine-grain level. This allows mobile developers to optimize library size by only including the operators used by their models and, in the process, reduce their on device footprint significantly. Initial results show that, for example, a customized MobileNetV2 is 40% to 50% smaller than the prebuilt PyTorch mobile library. Learn more about how to create your own custom builds, and please engage with the community on the PyTorch forums to provide any feedback you have.
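As a sketch of the first step of such a custom build (the file name below is illustrative, and the exact build invocation that consumes it is described in the custom build documentation), the operators used by a scripted model can be recorded like this:

import torch
import torchvision
import yaml  # PyYAML

# Script the model and list the operators it actually uses.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
scripted = torch.jit.script(model)
ops = torch.jit.export_opnames(scripted)  # e.g. ["aten::add.Tensor", ...]

# The resulting YAML file is passed to the mobile build scripts so that only
# these operators are compiled into the library.
with open("mobilenetv2_ops.yaml", "w") as f:
    yaml.dump(ops, f)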

Distributed Model Parallel Training [Experimental]

With the scale of models, such as RoBERTa, continuing to increase into the billions of parameters, model parallel training has become ever more important to help researchers push the limits. This release provides a distributed RPC framework to support distributed model parallel training. It allows for running functions remotely and referencing remote objects without copying the real data around, and provides autograd and optimizer APIs to transparently run backwards and update parameters across RPC boundaries.
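As a minimal sketch of these primitives (worker names, tensor shapes, and rendezvous settings are illustrative):

import torch
import torch.distributed.rpc as rpc

def add(a, b):
    return a + b

# On the caller ("worker0", rank 0 of a 2-process job); the peer must call
# init_rpc("worker1", rank=1, world_size=2), and MASTER_ADDR/MASTER_PORT are
# expected in the environment for rendezvous.
rpc.init_rpc("worker0", rank=0, world_size=2)

# Run add() on worker1 and block until the result is copied back.
result = rpc.rpc_sync("worker1", add, args=(torch.ones(2), torch.ones(2)))

# Create a remote reference (RRef): the result stays on worker1 until it is
# explicitly fetched with to_here().
rref = rpc.remote("worker1", add, args=(torch.ones(2), torch.ones(2)))
value = rref.to_here()

rpc.shutdown()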

To learn more about the APIs and the design of this feature, see the torch.distributed.rpc documentation; for the full tutorials, see the PyTorch tutorials site.

As always, you can connect with community members and discuss more on the forums.

Java bindings [Experimental]

In addition to supporting Python and C++, this release adds experimental support for Java bindings. Based on the interface developed for Android in PyTorch Mobile, the new bindings allow you to invoke TorchScript models from any Java program. Note that the Java bindings are only available for Linux for this release, and for inference only. We expect support to expand in subsequent releases. See the code snippet below for how to use PyTorch within Java:
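The snippet below is a minimal sketch using the org.pytorch API (the model path is illustrative, and the model is assumed to take a single tensor input):

import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.Tensor;
import java.util.Arrays;

public class Demo {
    public static void main(String[] args) {
        // Load a serialized TorchScript model and run inference on one tensor.
        Module module = Module.load("demo-model.pt");
        Tensor input = Tensor.fromBlob(
            new float[] {1f, 2f, 3f, 4f, 5f, 6f},  // data
            new long[] {2, 3}                      // shape
        );
        Tensor output = module.forward(IValue.from(input)).toTensor();
        System.out.println(Arrays.toString(output.getDataAsFloatArray()));
    }
}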

Learn more about how to use PyTorch from Java, and see the full Javadoc API documentation, on the PyTorch website.

Pruning

Pruning functionalities have been added to PyTorch in the nn.utils.prune module. This provides out-of-the-box support for common magnitude-based and random pruning techniques, both structured and unstructured, both layer-wise and global, and it also enables custom pruning from user-provided masks.

To prune a tensor, first select a pruning technique among those available in nn.utils.prune (or implement your own by subclassing BasePruningMethod).

import torch
from torch.nn.utils import prune

t = torch.rand(2, 5)
p = prune.L1Unstructured(amount=0.7)
pruned_tensor = p.prune(t)  # zeroes out the 70% of entries with the smallest absolute value

To prune a module, select one of the pruning functions available in nn.utils.prune (or implement your own) and specify which module and which parameter within that module pruning should act on.

import torch.nn as nn

m = nn.Conv2d(3, 1, 2)
# Prune 2 of the 3 input-channel slices of 'weight' (shape [1, 3, 2, 2]) along
# dim=1, ranked by their L2 norm (n=2).
prune.ln_structured(module=m, name='weight', amount=2, n=2, dim=1)

Pruning reparametrizes the module by turning weight (in the example above) from a parameter to an attribute, and replacing it with a new parameter called weight_orig (i.e. appending "_orig" to the initial parameter name) that stores the unpruned version of the tensor. The pruning mask is stored as a buffer named weight_mask (i.e. appending "_mask" to the initial parameter name). Pruning is applied prior to each forward pass by recomputing weight through a multiplication with the updated mask using PyTorch's forward_pre_hooks.
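For example, after the ln_structured call above, the module's state can be inspected as follows (a small sketch; the exact contents depend on the module):

# 'weight' is recomputed from 'weight_orig' and 'weight_mask' before each forward.
print([name for name, _ in m.named_parameters()])  # includes 'weight_orig' (and 'bias')
print([name for name, _ in m.named_buffers()])     # includes 'weight_mask'
print(m.weight)              # the pruned tensor: weight_orig * weight_mask
print(m._forward_pre_hooks)  # contains the hook that reapplies the mask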

Iterative pruning is seamlessly enabled by repeatedly calling pruning functions on the same parameter (this automatically handles the combination of successive masks by making use of a PruningContainer under the hood).
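As a small sketch (the specific functions and amounts are illustrative), repeated calls simply stack on the same parameter:

# Each call computes a new mask and combines it with the existing one via a
# PruningContainer; 'weight_orig' is left untouched throughout.
prune.l1_unstructured(m, name='weight', amount=0.3)
prune.random_unstructured(m, name='weight', amount=0.1)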

nn.utils.prune is easily extensible to support new pruning functions by subclassing the BasePruningMethod base class and implementing the compute_mask method with the instructions to compute the mask according to the logic of the new pruning technique.
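For instance, a minimal (hypothetical) custom method that zeroes out every other entry of a tensor could look like this:

class EveryOtherPruning(prune.BasePruningMethod):
    """Illustrative custom pruning: zero out every other entry."""
    PRUNING_TYPE = 'unstructured'

    def compute_mask(self, t, default_mask):
        mask = default_mask.clone()
        mask.view(-1)[::2] = 0
        return mask

def every_other_unstructured(module, name):
    """Convenience wrapper mirroring the built-in pruning functions."""
    EveryOtherPruning.apply(module, name)
    return module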

Backwards Incompatible Changes

Python

torch.optim: Using Scheduler.get_lr() to obtain the last computed learning rate is no longer supported; call Scheduler.get_last_lr() instead. (26423)

Learning rate schedulers are now “chainable,” as mentioned in the New Features section below. Scheduler.get_lr was sometimes used for monitoring purposes to obtain the current learning rate. But since Scheduler.get_lr is also used internally for computing new learning rates, this actually returns a value that is “one step ahead.” To get the last computed learning rate, use Scheduler.get_last_lr instead.

Note that optimizer.param_groups[0]['lr'] remains, as in version 1.3.1, a way of getting the current learning rate used in the optimizer.

Tensor.unfold on a 0-dimensional Tensor now properly returns a 1-dimensional Tensor.

Version 1.3.1:
>>> torch.tensor(5).unfold(dimension=0, size=1, step=1)
tensor(5)

Version 1.4.0:
>>> torch.tensor(5).unfold(dimension=0, size=1, step=1)
tensor([5])

torch.symeig now returns a 0-element eigenvectors tensor when eigenvectors=False (the default).

Version 1.3.1:
>>> torch.symeig(torch.randn(3,3)).eigenvectors.shape
torch.Size([3, 3])

Version 1.4.0:
>>> torch.symeig(torch.randn(3,3)).eigenvectors.shape
torch.Size([0])

JIT

C++

[C++] The distinction between Tensor and Variable has been eliminated at the C++ level. (28287)

This change simplifies our C++ API and matches previous changes we made at the Python level that merged Tensor and Variable into a single type.

This change is unlikely to affect user code; the most likely exceptions are:

  1. Argument-dependent lookup for torch::autograd may no longer work. This can break because Variable is now defined as an alias for Tensor (using Variable = Tensor;). In this case, you must explicitly qualify the calls to torch::autograd functions.

  2. Because Variable and Tensor are now the same type, code which assumes that they are different types (e.g., for the purposes of templating, or std::enable_if checks) will not work until you delete the (now) redundant overload/specialization.

  3. Some operators may trace differently. If this happens, please file a bug. The most likely situations are:

  1. There are now more operations in your trace than before (usually, calls to aten::empty)
  2. There are now fewer operations in your trace than before (e.g., the trace complains that "there is no observable dependence" with the inputs)

[C++] arguments in torch::nn::LinearOptions are renamed to match the Python API. (27382)

[C++] arguments in torch::nn::Conv{1,2,3}dOptions are renamed to match the Python API. (28917) (29838)

[C++] torch::nn::Conv{1,2,3}dOptions no longer has the transposed argument. (31005)

[C++] All Reduction enums for torch::nn layers and functionals are changed to have torch::kEnumNAME syntax. (27942, 26837)

[C++] torch::tensor constructor is improved to match Python API behavior. (28523) (29632) (29066)

[C++] Some activation modules’ forward functions now take Tensor instead of Tensor& as input. (28501)

torch::nn layers affected: ELU / SELU / Hardtanh / LeakyReLU / ReLU / ReLU6 / RReLU / CELU. This change ensures that the above layers can be used in a torch::nn::Sequential module. If your C++ model uses any of the above layers, you must recompile your C++ code with the new libtorch binary.

New Features

torch.optim

Learning rate schedulers (torch.optim.lr_scheduler) now support “chaining.” This means that two schedulers can be defined and stepped one after the other to compound their effect, see example below. Previously, the schedulers would overwrite each other.

>>> import torch
>>> from torch.optim import SGD
>>> from torch.optim.lr_scheduler import ExponentialLR, StepLR
>>>
>>> model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
>>> optimizer = SGD(model, 0.1)
>>>
>>> scheduler1 = ExponentialLR(optimizer, gamma=0.9)
>>> scheduler2 = StepLR(optimizer, step_size=3, gamma=0.1)
>>>
>>> for epoch in range(5):
>>>     print(epoch, scheduler2.get_last_lr()[0])
>>>
>>>     optimizer.step()
>>>     scheduler1.step()
>>>     scheduler2.step()
    
0 0.1
1 0.09000000000000001
2 0.08100000000000002
3 0.00729000000000002
4 0.00656100000000002

Distributed

RPC [Experimental]

torch.distributed.rpc is a newly introduced package. It contains basic building blocks to run functions remotely in model training and inference, which will be useful for scenarios like distributed model parallel or implementing parameter server frameworks. More specifically, it contains four pillars: RPC, Remote Reference, Distributed Autograd, and Distributed Optimizer. Please refer to the documentation and the tutorial for more details.

JIT

Mobile

Improvements

Distributed

Improvements

RPC Improvements

Documentation

MISC

JIT

Mobile

Named Tensors

C++ API

New torch::nn modules

New torch::nn::functional functions

AMD Support

ONNX

In PyTorch 1.4, we have mainly focused on expanding the coverage for ONNX Opset 11, and enabling exporting torchvision models. Most of the torchvision models can be exported to ONNX (Opset 11, with fixed input size), including FasterRCNN, MaskRCNN, and KeypointRCNN. We have also enhanced export support for some tensor indexing scenarios, with more enhancements to come in the next release. In addition, 20+ new PyTorch operators are enabled in ONNX exporter.
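As a minimal sketch of exporting at Opset 11 with a fixed input size (the model choice and file name are illustrative):

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # fixed input size
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)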

Expanding Coverage for ONNX Opset 11

Exporting More Torch Operators/Models to ONNX

Enhancing Export/Test Infra

Quantization

Quantization updates include a mix of bug fixes and feature improvements, with the feature work expanding operator coverage and improving performance. We have also made significant progress toward graph mode quantization support.

Visualization

Other Improvements

Bug Fixes

Distributed

RPC

C++ API Bug Fixes

JIT

Quantization

Mobile

Other Bug fixes

Deprecations

Python 2 support is deprecated and will not be supported in the 1.5 release.

torch.optim: Scheduler.step(epoch) is now deprecated; use Scheduler.step() instead. (26432)

For example:

>>> for epoch in range(10):
>>>    optimizer.step()
>>>    scheduler.step(epoch)
DeprecationWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, DeprecationWarning)

[C++] C++11 is deprecated and will not be supported in the 1.5 release.

[C++] Tensor::is_variable() has been deprecated. As noted in the Backwards Incompatible Changes section, the distinction between variable and non-variable has been eliminated, so this check is no longer meaningful. Generally, is_variable() will now return true except in some special circumstances (see 29653 for more details). (29653)

[C++] torch::nn::modules_ordered_dict has been deprecated. It is generally no longer necessary and can just be removed. (28774)

torch.jit.quantized API has been deprecated in favor of torch.quantization.quantize_dynamic (28766)
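For reference, a minimal sketch of the replacement API (the model and layer set below are illustrative):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamically quantize the Linear layers to int8 weights.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)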

Performance

A benchmark suite is available to easily measure the performance of operators with a range of input shapes. The generated benchmark data fully characterize the performance of operators in terms of execution time. For more details see README.md in the benchmarks/operator_benchmark directory.
