pytorch/pytorch v1.5.0

Release date: 2020-04-22


PyTorch 1.5.0 Release Notes

Highlights

This release includes several major new API additions and improvements. These include new autograd APIs that allow easy computation of Hessians and Jacobians, a significant update to the C++ frontend, a 'channels last' memory format for more performant computer vision models, a stable release of the distributed RPC framework used for model-parallel training, and a new API, inspired by pybind11, for creating custom C++ classes. Additionally, torch_xla 1.5 is now available and tested with the PyTorch 1.5 release, providing a mature Cloud TPU experience.

C++ Frontend API [Now Stable]

The C++ frontend API is now at parity with Python, and the feature overall has been moved to 'stable' (previously tagged as experimental). Some of the major highlights include:
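To give a flavor of the now-stable frontend, here is a minimal sketch of a training step written entirely in C++ (assuming a libtorch install; the model, data, loss, and optimizer here are illustrative, not taken from the release notes):

#include <torch/torch.h>

int main() {
  // torch::nn modules and torch::optim optimizers mirror the Python API.
  torch::nn::Linear model(4, 1);
  torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.01);

  auto x = torch::randn({8, 4});
  auto y = torch::randn({8, 1});

  for (int epoch = 0; epoch < 3; ++epoch) {
    optimizer.zero_grad();
    auto loss = torch::mse_loss(model(x), y);  // functional ops are available under torch::
    loss.backward();
    optimizer.step();
  }
  return 0;
}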

Channels last memory format for Computer Vision models [Experimental]

Channels Last memory format is an alternative way of ordering NCHW tensors in memory while preserving the NCHW semantic dimensions ordering. Channels Last tensors are ordered in memory in such a way that channels become the densest dimension (aka storing images pixel-per-pixel).

Channels Last memory format unlocks the ability to use performance efficient convolution algorithms and hardware (NVidia’s Tensor Cores, FBGEMM, QNNPACK). Additionally it was designed to automatically propagate through the operators, which allows easy switching between memory layouts.

Learn more here on how to write memory format aware operators.
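As a minimal sketch (the tensor and module shapes are illustrative), converting an activation tensor and a convolution module to channels last is an explicit, per-tensor operation; the logical NCHW shape is unchanged:

import torch

x = torch.randn(8, 3, 32, 32)                              # logical NCHW
x = x.contiguous(memory_format=torch.channels_last)        # physically reordered (NHWC-like strides)
print(x.shape)                                             # torch.Size([8, 3, 32, 32]) -- semantics unchanged
print(x.is_contiguous(memory_format=torch.channels_last))  # True

conv = torch.nn.Conv2d(3, 16, kernel_size=3)
conv = conv.to(memory_format=torch.channels_last)          # convert the module's weights as well
out = conv(x)                                              # the memory format generally propagates to the output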

Custom C++ Classes [Experimental]

This release adds a new API for binding custom C++ classes into TorchScript and Python simultaneously. This API is almost identical in syntax to pybind11. It allows users to expose their C++ class and its methods to the TorchScript type system and runtime system such that they can instantiate and manipulate arbitrary C++ objects from TorchScript and Python. An example C++ binding:

template <class T>
struct MyStackClass : torch::CustomClassHolder {
  std::vector<T> stack_;
  MyStackClass(std::vector<T> init) : stack_(std::move(init)) {}

  void push(T x) {
    stack_.push_back(x);
  }
  T pop() {
    auto val = stack_.back();
    stack_.pop_back();
    return val;
  }
};

static auto testStack =
  torch::class_<MyStackClass<std::string>>("myclasses", "MyStackClass")
      .def(torch::init<std::vector<std::string>>())
      .def("push", &MyStackClass<std::string>::push)
      .def("pop", &MyStackClass<std::string>::pop)
      .def("size", [](const c10::intrusive_ptr<MyStackClass<std::string>>& self) {
        return self->stack_.size();
      });

Which exposes a class you can use in Python and TorchScript like so:

@torch.jit.script
def do_stacks(s : torch.classes.myclasses.MyStackClass):
    s2 = torch.classes.myclasses.MyStackClass(["hi", "mom"])
    print(s2.pop()) # "mom"
    s2.push("foobar")
    return s2 # ["hi", "foobar"]

You can try it out in the tutorial here.

Distributed RPC framework APIs [Now Stable]

The torch.distributed.rpc package aims at supporting a wide range of distributed training paradigms that do not fit into DistributedDataParallel. Examples include parameter server training, distributed model parallelism, and distributed pipeline parallelism. Features in the torch.distributed.rpc package can be categorized into four main sets of APIs.

Learn more here.
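As a minimal sketch (worker names, rendezvous setup, and the remote function are illustrative; it assumes MASTER_ADDR/MASTER_PORT are set in the environment and two processes are launched with ranks 0 and 1):

import torch
import torch.distributed.rpc as rpc

def run_worker0():
    rpc.init_rpc("worker0", rank=0, world_size=2)
    # Synchronous remote call: executes torch.add on worker1 and returns the result.
    ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3))
    # Remote reference: the result lives on worker1 until fetched with to_here().
    rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    print(ret, rref.to_here())
    rpc.shutdown()

def run_worker1():
    rpc.init_rpc("worker1", rank=1, world_size=2)
    rpc.shutdown()  # serves incoming requests until shutdown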

torch_xla 1.5 now available

torch_xla is a Python package that uses the XLA linear algebra compiler to accelerate the PyTorch deep learning framework on Cloud TPUs and Cloud TPU Pods. torch_xla aims to give PyTorch users the ability to do everything they can do on GPUs on Cloud TPUs as well while minimizing changes to the user experience. This release of torch_xla is aligned and tested with PyTorch 1.5 to reduce friction for developers and to provide a stable and mature PyTorch/XLA stack for training models using Cloud TPU hardware. You can try it for free in your browser on an 8-core Cloud TPU device with Google Colab, and you can use it at a much larger scale on Google Cloud.

See the full torch_xla release notes here and the full docs here.

New High level autograd API [Experimental]

PyTorch 1.5 brings new functions including jacobian, hessian, jvp, vjp, hvp and vhp to the torch.autograd.functional.* submodule. This feature builds on the current API and allows users to easily compute these quantities.
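A minimal sketch (the function f below is illustrative):

import torch
from torch.autograd.functional import jacobian, hessian

def f(x):
    return (x ** 3).sum()

x = torch.randn(3)
print(jacobian(f, x))   # gradient of f, i.e. 3 * x**2
print(hessian(f, x))    # 3x3 matrix with 6 * x on the diagonal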

See the full docs here.

Python 2 no longer supported

For PyTorch 1.5.0 we will no longer support Python 2, specifically version 2.7. Going forward, support for Python will be limited to Python 3, specifically Python 3.5, 3.6, 3.7 and 3.8 (first enabled in PyTorch 1.4.0).

Known Issues

torch.nn.parallel.DistributedDataParallel does not work in Single-Process Multi-GPU mode.

DistributedDataParallel (DDP) used to support two modes

  1. Single-Process Multi-GPU (SPMG): In this mode, each DDP process replicates the input module to all specified devices and trains on all module replicas. This mode is enabled when the application passes in a device_ids argument that contains multiple devices; if device_ids is not provided, DDP will try to use all available devices.
  2. Multi-Process Single-GPU (MPSG): This is the recommended mode, as it is faster than SPMG. In this mode, each DDP process directly works on the provided module without creating additional replicas. This mode is enabled when device_ids only contains a single device or if there is only one visible device (e.g., by setting CUDA_VISIBLE_DEVICES).

A recent change (#33907) in torch.nn.parallel.replicate breaks DDP's assumptions about replicated modules and leads to failures in SPMG mode. However, since SPMG is known to be slower due to GIL contention and the additional overhead of scattering inputs and gathering outputs, we are planning to retire this mode in future releases and make MPSG the only supported mode in DDP. The code below shows an example of the recommended way to construct DDP.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the default process group has already been initialized
# (e.g., via torch.distributed.init_process_group).
# Use "cuda:1" as the target device.
target_device = 1
local_model = torch.nn.Linear(2, 2).to(target_device)
ddp_model = DDP(local_model, device_ids=[target_device])

See #36268 for more discussion.

Tensor.exponential_(0) used to return Inf, now it incorrectly returns 0

Previously, in 1.4.0, x.exponential_(0) gave a tensor full of inf. In 1.5.0, it wrongly gives a tensor full of zeros.

Version 1.4.0:

>>> torch.randn(3).exponential_(0)
tensor([inf, inf, inf])

Version 1.5.0:

>>> torch.randn(3).exponential_(0)
# This is wrong!
tensor([0., 0., 0.])

See #36798 for more details

Backwards Incompatible Changes

Python

Tensor.clone, Tensor.to, Tensor.empty_like, and similar functions preserve stride information instead of returning contiguous tensors

clone, to, type, cuda, cpu, byte, char, double, bool, half, int, long, short, float, bfloat16, empty_like, full_like, ones_like, zeros_like, rand_like, randn_like, randint_like operators now propagate memory format (roughly, the strides) of the input tensor to the output tensor.

Since PyTorch operators generally support non-contiguous tensors, this should have no functional effect on most PyTorch programs.

The most common incompatibility with Python programs is with the view operator, which has specific stride requirements. If these requirements are no longer met as a result of this change, you will get an error message indicating that you should use reshape instead, i.e. "RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."
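For example, a hedged sketch of the kind of code that may now hit this error (the shapes are illustrative): cloning a transposed, dense-but-non-contiguous tensor now preserves its strides, so a later view can fail where reshape still works.

import torch

x = torch.randn(4, 3).t()   # non-contiguous (transposed) but dense
y = x.clone()               # 1.5.0: strides are preserved, so y is also non-contiguous

# y.view(12)                # may now raise the "view size is not compatible ..." error
z = y.reshape(12)           # reshape copies when necessary and always succeeds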

Another possible exception incompatibility is if you have a (usually) C++ operator implementation that works directly on memory (i.e. calls data_ptr and relies on the strides being contiguous).

In the following example, we go through the implementation of a simple clone operation and see how it needs to change between versions.

// Version 1.4.0
Tensor simple_clone(const Tensor& input) {
    TORCH_CHECK(input.dim() == 1);
    // Before 1.5.0, the result of `empty_like` was always contiguous.
    auto output = at::empty_like(input);
    auto input_stride = input.strides()[0];
    auto* output_ptr = output.data_ptr<float>();
    auto* input_ptr = input.data_ptr<float>();
    for (int64_t idx = 0; idx < input.size(0); idx++) {
        output_ptr[idx] = input_ptr[idx * input_stride];
    }
    return output;
}

// Version 1.5.0
Tensor simple_clone(const Tensor& input) {
    TORCH_CHECK(input.dim() == 1);
    // From 1.5.0 on, the result of `empty_like` may not be contiguous.
    auto output = at::empty_like(input);

    // As a result, we need to keep track of the output stride as well.
    auto input_stride = input.strides()[0];
    auto output_stride = output.strides()[0];
    auto* output_ptr = output.data_ptr<float>();
    auto* input_ptr = input.data_ptr<float>();
    for (int64_t idx = 0; idx < input.size(0); idx++) {
        output_ptr[idx * output_stride] = input_ptr[idx * input_stride];
    }
    return output;
}

The inferred dtype of np.float_, np.float64 scalars in tensor constructors (e.g. torch.tensor(...), torch.as_tensor(...)) is now torch.float64 instead of the default dtype (usually torch.float32). (#30486 (https://github.com/pytorch/pytorch/pull/30486))

Please explicitly pass in the desired dtype when constructing tensors with NumPy float64 scalars to get the old behavior.

Version 1.4.0:

# Old behavior: returns a torch.float32 tensor (by default)
>>> torch.tensor(np.float64(0))
tensor(0.)

Version 1.5.0:

# To keep the old behavior, explicitly pass the dtype
>>> torch.tensor(np.float64(0), dtype=torch.get_default_dtype())
tensor(0.)

This can cause your program to execute in torch.float64, potentially slowing it down or leading to errors for operators that don't support torch.float64 or mixed dtypes.

numpy integer scalars are now treated as integers for the purposes of type promotion (#30486 (https://github.com/pytorch/pytorch/pull/30486))

Previously, in 1.4.0, they were mistakenly treated as floats (so, for example, torch.ones(3) * np.int64(3) would return a float32 tensor). In 1.5.0, we've fixed that behavior; torch.ones(3) * np.int64(3) returns an int32 tensor.

This can cause your code to fail if you performed operations between PyTorch tensors and numpy scalars and then passed the result into an operation that does not support integral types or mixed types. To fix your code, please cast the resulting tensor to the desired dtype.

Version 1.4.0:

>>> torch.ones(3) * np.int64(3)
tensor([3., 3., 3.])

Version 1.5.0:

>>> (torch.ones(3) * np.int64(3)).float()
tensor([3., 3., 3.])

torch.autograd.Function: dropped support for old-style Functions (#33956).

In previous versions of PyTorch, there were two ways to write autograd Functions. We deprecated one of them in 1.3.0 and dropped support for it entirely in 1.5.0. Old-style autograd Functions will no longer work in user code.

These Functions can be identified by not having staticmethod forward and backward methods (see the example below). Please see the current documentation for how to write new-style Functions.

# Version 1.4.0
class Exp(torch.autograd.Function):
    def forward(self, i):
        result = i.exp()
        self.save_for_backward(result)
        return result

    def backward(self, grad_output):
        result, = self.saved_tensors
        return grad_output * result

Exp()(torch.tensor(1.))
# Version 1.5.0
class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        ctx.save_for_backward(result)
        return result
        
    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result

Exp.apply(torch.tensor(1.))   

torch.optim optimizers changed to fix in-place checks for the changes made by the optimizer (#33640, #34211)

If this causes your code to fail, there are two possible reasons:

Reason 1: The value of that parameter was actually saved and used and we were computing incorrect gradients in previous versions of PyTorch. This would result in an error message mentioning incorrect version numbers. You should replace code that uses self.my_param by self.my_param.clone() to make sure the saved version is different from the one that is modified by the optimizer. For example:

Before 1.5.0, the following may have worked.

def model(input, target, param):
    return (input * param ** 2 - target).norm()

param = torch.randn(2, requires_grad=True)
input = torch.randn(2)
target = torch.randn(2)
sgd = optim.SGD([param], lr=0.001)
loss = model(input, target, param)
loss.backward(retain_graph=True)
sgd.step()
loss.backward()
param.grad

If after upgrading to 1.5.0, the above fails due to a version counter error, then that means the gradient computed was incorrect. To remedy this, clone param before using it in the model:

def model(input, target, param):
    return (input * param ** 2 - target).norm()

param = torch.randn(2, requires_grad=True)
input = torch.randn(2)
target = torch.randn(2)
sgd = optim.SGD([param], lr=0.001)
loss = model(input, target, param.clone())
loss.backward(retain_graph=True)
sgd.step()
loss.backward()
param.grad

Reason 2: You know what you're doing and change the values back to the right thing before the next backward. However, you're running into an error because the version counter cannot be decremented. Open an issue with your particular use case and we will help you to work around the version counter issue.

utils.cpp_extensions now use ninja as the default compilation backend (#32495)

ninja enables parallel compilation of your C++ extension, greatly speeding up compilation. This change will not break most user code; if you do not have ninja installed, we fall back to the old distutils backend.

However, if you do have ninja installed, it is possible that this change will cause your C++ extension build to fail by oversubscribing your system with too many worker processes. There are two potential workarounds to this.

Method 1: If a previously succeeding python setup.py install now fails, try setting the MAX_JOBS environment variable.

Version 1.4.0:

python setup.py install

Version 1.5.0:

MAX_JOBS=2 python setup.py install

Method 2: Switch back to the old distutils backend inside your setup.py

Version 1.4.0:

cmdclass={'clean': clean,
          'build_ext': BuildExtension},

Version 1.5.0:

cmdclass={'clean': clean,
          'build_ext': BuildExtension.with_options(use_ninja=False)},

torch.optim.Adam, torch.optim.SGD changed to not modify gradients in-place (#30257)

In previous versions of PyTorch, the Adam and SGD optimizers modified gradients (e.g. param.grad) in-place via an in-place addition (param.grad += weight_decay * param). To make this consistent with the behavior of other optimizers and to prevent surprises about the behavior, we've changed them to stop modifying gradients in-place.

This should not have an effect on most PyTorch programs unless they relied on this behavior. The easiest way to replicate the old behavior is to create a custom optimizer that implements it.
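For instance, a hedged sketch of one way to replicate the old in-place behavior (the model and weight_decay value are illustrative): construct the optimizer with weight_decay=0 and re-apply the decay term to the gradients in place before calling step().

import torch

model = torch.nn.Linear(2, 2)
weight_decay = 1e-4
opt = torch.optim.SGD(model.parameters(), lr=0.1)   # weight_decay handled manually below

loss = model(torch.randn(4, 2)).sum()
loss.backward()

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(p, alpha=weight_decay)      # in-place on param.grad, like the old optimizers

opt.step()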

torch.masked_select now always returns a 1D tensor (#29923)

The behavior of torch.masked_select when both "self" and "mask" are 0-dimensional was changed. In previous versions of PyTorch, this would return a 0-dimensional tensor. Now, we return a 1-dimensional tensor to be consistent with other input sizes and our documentation.

Version 1.4.0:

>>> torch.masked_select(torch.tensor(0), torch.tensor(True))
tensor(0)

Version 1.5.0:

>>> torch.masked_select(torch.tensor(0), torch.tensor(True))
tensor([0])

torch.index_select on a 0-d tensor now returns a 0-d tensor. (#30790)

In previous versions of PyTorch, the output of torch.index_select on a 0D input tensor produced a 1D tensor. This was inconsistent with our documentation on it, which stated "The returned tensor has the same number of dimensions as the original tensor (input)." Now, we return a 0D tensor.

Version 1.4.0:

>>> torch.index_select(torch.tensor(5), 0, torch.tensor([0]))
tensor([5])

Version 1.5.0:

>>> torch.index_select(torch.tensor(5), 0, torch.tensor([0]))
tensor(5)

nn.MultiLabelMarginLoss: 'none' reduction on 1D tensor now returns a 0D tensor (#30768)

In previous versions of PyTorch, the output of nn.MultiLabelMarginLoss on 1D and 0D tensors incorrectly produced 1-D tensors. Now, those cases return a 0D tensor to be consistent with the 2-D tensor case.

Version 1.4.0:

>>> nn.MultiLabelMarginLoss(reduction='none')(torch.randn(3), torch.zeros(3, dtype=torch.long))
tensor([0.2959])

Version 1.5.0:

>>> nn.MultiLabelMarginLoss(reduction='none')(torch.randn(3), torch.zeros(3, dtype=torch.long))
tensor(0.2959)

nn.MultiMarginLoss: ‘none' reduction on 1D target now returns a 1D tensor (#30826)

In previous versions of PyTorch, the output of nn.MultiMarginLoss on a 1D target tensor produced a 0D output. We changed this to return a 1D output tensor to make it consistent with other input sizes, which return an output that matches the target shape.

Version 1.4.0:

>>> nn.MultiMarginLoss(reduction='none')(torch.tensor([1.]), torch.tensor([0]))
tensor(0.)

Version 1.5.0:

>>> nn.MultiMarginLoss(reduction='none')(torch.tensor([1.]), torch.tensor([0]))
tensor([0.])

Tensor.exponential_(lambda) no longer supports lambda < 0 (#32501)

lambda, the rate parameter of the exponential distribution, mathematically should be greater than 0. We've disabled support for lambda < 0 to be mathematically correct; most users will not have used a lambda less than zero.

Version 1.4.0:

tensor = torch.empty(3).exponential_(-1.5)

Version 1.5.0:

# Negative lambda not supported!

nn.BCELoss, nn.functional.binary_cross_entropy no longer accept inputs with the same number of elements that are not broadcastable (#31365)

Previously, we accepted input and target tensors whose shapes did not match as long as they had the same number of elements. However, this behavior was deprecated and we removed it in 1.5.0. In order to replicate the old behavior, please explicitly reshape your input and target tensors to have the same shape.

Version 1.4.0:

>>> input = torch.rand(3, 3)
>>> target = torch.randn(9)
>>> torch.nn.functional.binary_cross_entropy(input, target)

Version 1.5.0:

>>> input = torch.rand(3, 3)
>>> target = torch.randn(9)
>>> torch.nn.functional.binary_cross_entropy(input, target.reshape_as(input))

torch.normal out argument is now required to have the same size as the computed output (#32031)

Previously, on CPU devices, torch.normal(mean, std, out=out) would resize out to the correct size. To be consistent with the CUDA implementation, we’ve changed it so that out must either already have the correct size, or be an empty tensor with size [0]. To work around this, please ensure that your out tensor has the correct size.

Version 1.4.0:

>>> torch.normal(torch.zeros(3), torch.ones(3), out=torch.randn(2))
tensor([ 0.0300,  0.7830, -1.3579])

Version 1.5.0:

>>> torch.normal(torch.zeros(3), torch.ones(3), out=torch.randn(2))
RuntimeError: inconsistent tensor, output size ([2]) is not the same as broadcasted mean and std size (3)

Tensor.geometric_ no longer supports integral Tensors (#31878)

Previously, on CPU devices, Tensor.geometric_ supported Tensors with integral dtype. Now, it only supports floating point. We removed support for this because it doesn’t make sense for geometric_ to operate on integral dtypes.

Changed torch.floor_divide input positional argument name to self (#34552)

Before PyTorch 1.5, torch.floor_divide took two positional arguments: torch.floor_divide(input, other). We’ve changed the name of the input argument to self; this will break code that called torch.floor_divide via keyword argument. For example:

Version 1.4.0:

torch.floor_divide(input=x, other=y)

Version 1.5.0:

# Either of the following works.
torch.floor_divide(self=x, other=y)
torch.floor_divide(x, y)

C++ API

RNN / GRU / LSTM layers (#34322)

Upsample layer / F::interpolate function (#35025)

Optimizers

If you previously defined a custom optimizer class as:

struct MyOptimizer : Optimizer {
  using Optimizer::Optimizer;
  void step() override {...}
};

you would need to update your optimizer class definition as follows:

struct MyOptimizer : Optimizer {
  using Optimizer::Optimizer;
  torch::Tensor step(LossClosure closure = nullptr) override {
    ...
    // return `torch::Tensor()` if `closure` is nullptr
    // (i.e. we are not computing the loss)
    return torch::Tensor();
  }
};
Per-parameter optimizer state is now accessed through the corresponding ParamState class (AdagradParamState, SGDParamState, AdamParamState, RMSpropParamState, LBFGSParamState), for example:

auto& param_state = static_cast<AdagradParamState&>(
  *optimizer.state()[c10::guts::to_string(parameter.unsafeGetTensorImpl())]);

// Use the following to access parameter state:
//
// param_state.sum()
// param_state.step()
auto& param_state = static_cast<SGDParamState&>(
  *optimizer.state()[c10::guts::to_string(parameter.unsafeGetTensorImpl())]);

// Use the following to access parameter state:
//
// param_state.momentum_buffer()
auto& param_state = static_cast<AdamParamState&>(
  *optimizer.state()[c10::guts::to_string(parameter.unsafeGetTensorImpl())]);

// Use the following to access parameter state:
//
// param_state.step()
// param_state.exp_avg()
// param_state.exp_avg_sq()
// param_state.max_exp_avg_sq()
auto& param_state = static_cast<RMSpropParamState&>(
  *optimizer.state()[c10::guts::to_string(parameter.unsafeGetTensorImpl())]);

// Use the following to access parameter state:
//
// param_state.square_avg()
// param_state.momentum_buffer()
// param_state.grad_avg()
auto& param_state = static_cast<LBFGSParamState&>(
  *optimizer.state()[c10::guts::to_string(parameter.unsafeGetTensorImpl())]);

// Use the following to access parameter state:
//
// param_state.d()
// param_state.H_diag()
// param_state.prev_flat_grad()
// param_state.t()
// param_state.prev_loss()
// param_state.ro()
// param_state.al()
// param_state.old_dirs()
// param_state.old_stps()
// param_state.func_evals()
// param_state.n_iter()

Removed AutoGIL/AutoNoGIL in favor of pybind11::gil_scoped_* functions (#34301)

If your code released or acquired the GIL via AutoNoGIL or AutoGIL, please change the invocations to pybind11::gil_scoped_release or pybind11::gil_scoped_acquire, respectively.
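As a minimal sketch (assuming a pybind11-based extension; heavy_cpp_work is illustrative), the old helpers map onto pybind11's scoped guards:

#include <pybind11/pybind11.h>

void heavy_cpp_work();                     // assumed: pure C++, never touches Python state

void compute() {
  pybind11::gil_scoped_release no_gil;     // was: AutoNoGIL no_gil;
  heavy_cpp_work();
}                                          // GIL re-acquired when `no_gil` is destroyed

void callback_from_cpp_thread() {
  pybind11::gil_scoped_acquire gil;        // was: AutoGIL gil;
  // ... safe to touch Python objects / call back into Python here ...
}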

Others

JIT

Simple Executor Is Now On By Default

The simple executor skips a number of fusion-related passes and analyses that are very time-consuming. Disabling these optimizations fixes pathologically long compilation times. Users that rely on GPU fusion for their desired performance profile should turn on the profiling executor. We provide C++ and Python APIs to enable the profiling executor:
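As a hedged sketch of the Python side (these are internal, underscore-prefixed toggles, so treat the exact names as an assumption, and call them before the model runs for the first time):

import torch

# Assumed internal toggles for the profiling (fusing) executor.
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)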

Quantization

Remove qconfig_dict in top level eager mode quantization API (#31972).

In eager mode quantization, one needs to manually insert quant and dequant stubs in a model to specify where activations are quantized. Having a qconfig_dict that specifies the quantization configuration for each module is not useful as one needs to manually modify the model with quant/dequant stubs. The new API makes it explicit that the model needs to be manually modified for quantization.

# previously qconfig_dict was an optional argument to prepare
def prepare(model, qconfig_dict=None, inplace=False):

# now replaced with
def prepare(model, inplace=False):
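A hedged sketch of the resulting eager-mode workflow (the model, qconfig, and calibration data are illustrative; it assumes an x86 build with the fbgemm backend):

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where activations start being quantized
        self.fc = nn.Linear(4, 4)
        self.dequant = DeQuantStub()  # marks where activations are dequantized again

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = M().eval()
model.qconfig = get_default_qconfig('fbgemm')  # qconfig is set on the module, not via qconfig_dict
prepared = prepare(model)                      # note: no qconfig_dict argument
prepared(torch.randn(2, 4))                    # calibration pass
quantized = convert(prepared)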

RPC

Functional API for Distributed Autograd and Distributed Optimizer

More specifically, callers must pass context_id to torch.distributed.autograd.backward() and torch.distributed.optim.DistributedOptimizer.step(). (#33711)

# Before
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

with dist_autograd.context() as context_id:
    # Forward pass.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()
    # Backward pass.
    dist_autograd.backward([loss.sum()])
    # Optimizer.
    dist_optim = DistributedOptimizer(
        optim.SGD,
        [rref1, rref2],
        lr=0.05,
    )
    dist_optim.step()
# After
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

with dist_autograd.context() as context_id:
    # Forward pass.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()
    # Backward pass.
    dist_autograd.backward(context_id, [loss.sum()])
    # Optimizer.
    dist_optim = DistributedOptimizer(
        optim.SGD,
        [rref1, rref2],
        lr=0.05,
    )
    
    dist_optim.step(context_id)    

Disallow sending CUDA tensors over RPC

The motivation is to prevent potential invalid device errors when the number of devices on the sender and the receiver does not match. However, applications can always move CUDA tensors to CPU before sending (#33604).

Version 1.4.0:

import torch
import torch.distributed.rpc as rpc
rpc.init_rpc("worker0", rank=0, world_size=2)
x = torch.zeros(2, device=0)
ret = rpc.rpc_sync("worker1", torch.add, args=(x, 3))
rpc.shutdown()

Version 1.5.0:

import torch
import torch.distributed.rpc as rpc
rpc.init_rpc("worker0", rank=0, world_size=2)
x = torch.zeros(2, device=0)
ret = rpc.rpc_sync("worker1", torch.add, args=(x.cpu(), 3))
rpc.shutdown()

New Features

Python

Added new functional autograd API (#34066)

New __torch_function__ API Override Mechanism (#30730, #32194, #32799, #34240, #34303).

We introduced __torch_function__, an API override mechanism for subclassing torch.Tensor in Python. This is useful for creating custom objects that implement the torch.* APIs. It currently supports overriding most torch.* and torch.nn.functional APIs; we have also planned future support for subclassing torch.Tensor (see tracking issue #22402).

New Operators

Distributions

C++ API

Distributed

Mobile

Quantization

RPC

Improvements

AMD/ROCm

C++ API

Distributed

Distributions

Mobile

ONNX

Exporting More Torch Operators to ONNX

In PyTorch 1.5, we have added support for 10 additional operators and also enhanced support for another set of 10+ existing operators. We have also added support for exporting large models (> 2GB) to ONNX. Additionally, we have made enhancements and optimizations to the export of ScriptModules and will continue to do that in the next release. We have also made improvements to the custom op export experience.

Enhancing the Support for ScriptModule

Enhancing Existing Export Logic

Optimizing Exported ONNX Graph

Adding Utility Functions and Refactoring

Operator Benchmark

Quantization

RPC

Type Hints

Other

Bug Fixes

C++ API

Distributed

JIT

def foo(float_matrix, scalar_ten):
    # type: (Tensor, Tensor) -> Tuple[List[List[float]], bool]
    out1 : List[List[float]] = float_matrix.tolist()
    out2 = torch.jit.annotate(bool, scalar_ten.tolist())
    return out1, out2

Mobile

ONNX

Quantization

RPC

Other

Performance

Mobile

Quantization

RPC

Other

Documentation

Python

C++ API

RPC

Mobile

Quantization

Deprecations

Python

How to figure out which line in your code is raising a warning

Attempting to use deprecated behavior will raise warnings. Unfortunately, sometimes it is not entirely obvious which line of code the warning corresponds to, especially if the warning comes from our C++ backend. For example, with a file named foo.py with the following contents,

import torch
# This is newly deprecated behavior, see the next section
torch.tensor(1) / torch.tensor(2)

running it doesn’t give us the location of the warning:

> python foo.py
../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

We can use the warnings module to tell us where the warning is by asking it to treat warnings as errors:

import torch
import warnings
warnings.filterwarnings('error', message='Integer division')
# This is newly deprecated behavior, see the next section
torch.tensor(1) / torch.tensor(2)

Running the file now tells us exactly where the warning is:

> python foo.py
Traceback (most recent call last):
  File "foo.py", line 5, in <module>
    torch.tensor(1) / torch.tensor(2)
UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

Deprecated torch.div and torch.addcdiv integer floor division behavior (#34570)

In 1.5.0 and older PyTorch releases, torch.div and the / operator perform integer floor division when given integer inputs. In a future PyTorch release, torch.div (including the / operator) will perform "true" division as in Python 3 and NumPy.

To floor divide integer tensors, please use torch.floor_divide instead.

Before:

>>> torch.tensor(3) / torch.tensor(2)
../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tensor(1)

After:

# NB: the following is equivalent to torch.floor_divide(torch.tensor(3), torch.tensor(2))
>>> torch.tensor(3) // torch.tensor(2)
tensor(1)

The fix for torch.addcdiv is similar.

Before:

>>> input = torch.tensor(0)
>>> tensor = torch.tensor(1)
>>> other = torch.tensor(3)
>>> value = 1
>>> torch.addcdiv(input, tensor, other, value=value)
../aten/src/ATen/native/PointwiseOps.cpp:81: UserWarning: Integer division with addcdiv is deprecated, and in a future  release addcdiv will perform a true division of tensor1 and tensor2. The current addcdiv behavior can be replicated using floor_divide for integral inputs (self + value * tensor1 // tensor2) and division for float inputs (self + value * tensor1 / tensor2). The new addcdiv behavior can be implemented with true_divide (self + value * torch.true_divide(tensor1, tensor2).
tensor(0)

After:

>>> input = torch.tensor(0)
>>> tensor = torch.tensor(1)
>>> other = torch.tensor(3)
>>> value = 1
>>> (input + torch.floor_divide(value * tensor, other))
tensor(0)

Deprecated torch.full returning float tensors if no dtype is specified (#34709).

In a future PyTorch release, torch.full will infer its dtype from its fill value when the optional dtype and out parameters are unspecified, matching NumPy's inference for numpy.full. For example, torch.full(size, 1) will return a tensor of torch.long dtype, unlike today where it returns a tensor of torch.float dtype.
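A minimal sketch of passing dtype explicitly so the result is unambiguous both today and after the change:

import torch

a = torch.full((2,), 1, dtype=torch.float)  # explicit float fill, matches today's default
b = torch.full((2,), 1, dtype=torch.long)   # explicit integer fill, matches the future inference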

Deprecated torch.nn.modules.conv._ConvTransposeMixin (#31784).

This is an internal-facing class that is not a part of our public API. We’ve refactored some PyTorch internals to work without it and will remove it in a future release.

Deprecated positional args in multiple torch function signatures (#32009, #33428)

Below please find a list of deprecated signatures and what to change them to.

Before:

>>> torch.zeros(2,3).add(2, torch.ones(2, 3))
../torch/csrc/utils/python_arg_parser.cpp:750: UserWarning: This overload of add is deprecated:
        add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add(Tensor other, Number alpha)
tensor([[2., 2., 2.],
        [2., 2., 2.]])

After:

>>> torch.zeros(2, 3).add(torch.ones(2, 3), alpha=2)
tensor([[2., 2., 2.],
        [2., 2., 2.]])

Deprecated modifying in-place a view returned by a custom autograd Function (#32839).

Modifying in-place a view that was created by a custom Function leads to the custom backward not being called or being called with a partial gradient. This behavior will be removed in 1.6.

Please clone() the output of the Function to avoid incorrect gradient computation.

class Id(Function):
    @staticmethod
    def forward(ctx, input):
        return input.view_as(input)

    @staticmethod
    def backward(ctx, grad_input):
        return grad_input

Version 1.5.0 (deprecated pattern):

>>> input = torch.randn(3, requires_grad=True)
>>> other = torch.randn(3)
>>> output = Id.apply(input)
>>> output.copy_(other)
# Warning: Incorrect gradients

Version 1.5.0 (recommended pattern):

>>> input = torch.randn(3, requires_grad=True)
>>> other = torch.randn(3)
>>> output = Id.apply(input).clone()
>>> output.copy_(other)

Deprecated modifying in-place a view created inside a no_grad block (#32839)

Modifying in-place a view created inside a no_grad block is ambiguous and error-prone so we have deprecated it.

Here is an example of some code that we’ve deprecated. In previous versions of PyTorch, the following code throws a non-descriptive error message, but we've added a deprecation in 1.5.0.

>>> base = torch.rand(10, requires_grad=True)
>>> var = torch.rand([], requires_grad=True)
>>> with torch.no_grad():
>>>     view = base[1]
>>> view.copy_(var)
>>> torch.autograd.grad(base.sum(), var)
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone,
it is deprecated and will be forbidden  starting 1.6 (see https://github.com/pytorch/pytorch/pull/32839 for more details about this). You can clarify your code and remove this warning by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).

If you want to differentiate, you should change the above code to

>>> base = torch.rand(10, requires_grad=True)
>>> var = torch.rand([], requires_grad=True)
>>> view = base[1]
>>> view.copy_(var)
>>> torch.autograd.grad(base.sum(), var)
(tensor(1.),)

If you don’t want to differentiate, you should change it to

>>> base = torch.rand(10, requires_grad=True)
>>> var = torch.rand([], requires_grad=True)
>>> with torch.no_grad():
>>>     view = base[1]
>>>     view.copy_(var)

C++ API

Deprecated Tensor.type() (#30281)

Please use Tensor.options() instead.
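A minimal sketch (assuming libtorch) of the replacement, e.g. allocating a new tensor with the same dtype/device/layout as an existing one:

#include <torch/torch.h>

torch::Tensor make_buffer_like(const torch::Tensor& a) {
  // Deprecated: torch::empty({2, 2}, a.type());
  return torch::empty({2, 2}, a.options());
}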

Miscellaneous
