
v1.9.0

pytorch/pytorch

Release date: 2021-06-16 00:06:52


PyTorch 1.9 Release Notes

Highlights

We are excited to announce the release of PyTorch 1.9. The release is composed of more than 3,400 commits since 1.8, made by 398 contributors. Highlights include:

We’d like to thank the community for their support and work on this latest release. We’d especially like to thank Quansight and Microsoft for their contributions.

You can find more details on all the highlighted features in the PyTorch 1.9 Release blogpost.

Backwards Incompatible changes

Python API

torch.divide with rounding_mode='floor' now returns infinity rather than NaN when a non-zero number is divided by zero.

1.8.1:

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([nan, nan, nan])

1.9.0:

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([-inf, nan, inf])

The legacy constructor forms torch.*Tensor(tensor, device=device) and tensor.new(tensor, device=device) now raise a RuntimeError instead of returning an uninitialized tensor.

1.8.1:

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu') # uninitialized
tensor([7022349217739848992])
>>> a.new(a, device='cpu')
tensor([4294967295]) # uninitialized

1.9.0:

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu')
RuntimeError: Legacy tensor constructor of the form torch.Tensor(tensor, device=device) is
not supported. Use torch.tensor(...) or torch.as_tensor(...) instead.
>>> a.new(a, device='cpu')
RuntimeError: Legacy tensor new of the form tensor.new(tensor, device=device) is not
supported. Use torch.as_tensor(...) instead.
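As the error messages suggest, one way to migrate is to build the tensor with torch.as_tensor or torch.tensor. A minimal sketch of equivalent calls (the dtype and device arguments here are illustrative):

import torch

a = torch.tensor([1])

# Instead of torch.LongTensor(a, device='cpu') or a.new(a, device='cpu'):
b = torch.as_tensor(a, dtype=torch.long, device='cpu')    # shares memory with `a` when possible
c = a.clone().detach().to(device='cpu', dtype=torch.long)  # explicit copy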

rounding_mode='true' has been removed from torch.divide; use rounding_mode=None (the default), which performs true division.

1.8.1:

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode='true')
tensor([2.1000, 2.1000])

1.9.0:

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode=None) # equivalent to torch.divide(a, b, rounding_mode='true') from the prior release
tensor([2.1000, 2.1000])

The torch.tensor module can no longer be imported; import tensor from torch (or call torch.tensor directly) instead.

1.8.1:

>>> import torch.tensor as tensor
>>> torch.tensor(1.)
tensor(1.)

1.9.0:

>>> import torch.tensor as tensor
ModuleNotFoundError: No module named 'torch.tensor'
>>> from torch import tensor
>>> tensor(1.)
tensor(1.)

Autograd

1.8.1:

get_numerical_jacobian(torch.complex, (a, b), grad_out=2.0)

1.9.0:

def wrapped(fn):
    def wrapper(*input):
        return torch.real(fn(*input))
    return wrapper

get_numerical_jacobian(wrapped(torch.complex), (a, b), grad_out=1.0)

1.8.1:

# An example of a situation that will now return GradcheckError instead of
# RuntimeError is when there is a jacobian mismatch, which can happen
# for example when you forget to specify float64 for your inputs.
try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    assert type(e) is RuntimeError # explicitly check type -> NEEDS UPDATE

1.9.0:

try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    # GradcheckError inherits from RuntimeError so you can still catch this
    # with RuntimeError (No change necessary!)

    # BUT, if you explicitly check type...
    assert type(e) is torch.autograd.GradcheckError

torch.nn

torch.nn.MultiheadAttention constructed with bias=False no longer creates out_proj.bias, restoring the pre-1.6 behavior.

v1.6 - v1.8.1:

>>> mha = torch.nn.MultiheadAttention(4, 2, bias=False)
>>> print(mha.out_proj.bias)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)

pre 1.6 & 1.9.0:

>>> mha = torch.nn.MultiheadAttention(4, 2, bias=False)
>>> print(mha.out_proj.bias)
None

Full backward hooks registered with register_full_backward_hook are now also called when no input requires gradients; the corresponding grad_input entries are None.

1.8.1:

>>> m = torch.nn.Linear(2, 3)
>>> def hook(mod, grad_input, grad_output):
...     print('hook called:', grad_input, grad_output)
...
>>> m.register_full_backward_hook(hook)
>>> input_no_grad = torch.rand(1, 2, requires_grad=False)
>>> m(input_no_grad).sum().backward()
>>> input_grad = torch.rand(1, 2, requires_grad=True)
>>> m(input_grad).sum().backward()
hook called: (tensor([[0.1478, 0.6517]]),) (tensor([[1., 1., 1.]]),)

1.9.0:

>>> m = torch.nn.Linear(2, 3)
>>> def hook(mod, grad_input, grad_output):
...     print('hook called:', grad_input, grad_output)
...
>>> m.register_full_backward_hook(hook)
>>> input_no_grad = torch.rand(1, 2, requires_grad=False)
>>> m(input_no_grad).sum().backward()
hook called: (None,) (tensor([[1., 1., 1.]]),)
>>> input_grad = torch.rand(1, 2, requires_grad=True)
>>> m(input_grad).sum().backward()
hook called: (tensor([[0.1478, 0.6517]]),) (tensor([[1., 1., 1.]]),)

Dataloader

        # dataset returns numpy.random.randint(1, 10000) 
        ctx = mp.get_context('fork')
        gen = torch.Generator().manual_seed(0)
        dl = DataLoader(dataset, batch_size=2, num_workers=2, multiprocessing_context=ctx, generator=gen)
        for epoch in range(2):
            print("=" * 4, "Epoch", epoch, "=" * 4)
            for batch in dl:
                print(batch)

1.8.1:

# When using fork, each worker has the same random seed for NumPy random functions at each epoch.
========== Epoch 0 ==========
tensor([[ 0, 340],
        [ 1, 7512]])
tensor([[ 2, 340],
        [ 3, 7512]])
========== Epoch 1 ==========
tensor([[ 0, 340],
        [ 1, 7512]])
tensor([[ 2, 340],
        [ 3, 7512]])

1.9.0:

# Random seeds for NumPy are different across `DataLoader` workers in each epoch.
========== Epoch 0 ==========
tensor([[ 0, 8715],
        [ 1, 5555]])
tensor([[ 2, 6379],
        [ 3, 1432]])
========== Epoch 1 ==========
tensor([[ 0, 1374],
        [ 1, 996]])
tensor([[ 2, 143],
        [ 3, 3507]])
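If you need explicit control over per-worker NumPy seeding (for example, to reproduce a run or to deliberately reuse a stream), a common pattern is to derive the NumPy seed from the per-worker seed in a worker_init_fn. The sketch below is illustrative, not part of the release, and reuses the dataset from the snippet above:

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_numpy(worker_id):
    # Fold the per-worker torch seed into NumPy's 32-bit seed space.
    np.random.seed(torch.utils.data.get_worker_info().seed % 2**32)

dl = DataLoader(dataset, batch_size=2, num_workers=2,
                worker_init_fn=seed_numpy,
                generator=torch.Generator().manual_seed(0))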

IterableDataset now carries a type attribute, populated from the typing annotation given at each class declaration. This attribute lets IterableDataset support type inference and lazy initialization for the new DataLoader architecture, but it also introduces several BC-breaking restrictions.

1.8.1:

# Users could use a string to bypass an invalid type annotation without any error,
# and incorrect type annotations attached to the `__iter__` function were ignored.

1.9.0:

# The following scenarios now raise different exceptions.
# 1) The type annotation is now required to be valid; the previous workaround of
# using a string to represent an invalid type annotation is no longer supported.

# Raises Exception from the evaluation `eval("invalid_type", globals, locals)`
class DS(IterableDataset["invalid_type"]):  
     ...
# Raises TypeError if the return type of __iter__ is not an Iterator
class DS(IterableDataset[str]):
    def __iter__(self) -> str:
      ...
# Raises TypeError if the return type of __iter__ is of the form Iterator[X],
# but the argument type X is not a subtype of the IterableDataset.type attribute.
class DS(IterableDataset[str]):
    def __iter__(self) -> Iterator[int]:
       ...

#  IterableDataset now has a metaclass, which will conflict with
#  existing user-defined metaclasses on IterableDatasets
class DS(IterableDataset[str], metaclass=MyMeta): 
    ...
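For reference, a declaration that satisfies the new checks pairs a valid type parameter with a matching __iter__ return annotation. A minimal sketch:

from typing import Iterator
from torch.utils.data import IterableDataset

class DS(IterableDataset[int]):
    def __init__(self, n: int):
        self.n = n

    def __iter__(self) -> Iterator[int]:
        # Return type Iterator[int] matches the declared IterableDataset[int].
        return iter(range(self.n))

print(list(DS(3)))  # [0, 1, 2]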

Meta API

C++ API

1.8.1:

const at::Tensor a = at::randn({2, 2});
const at::Tensor b = at::ones({1, 4}, at::kInt);
at::Tensor& out = at::resize_as_(a, b); // success

1.9.0:

const at::Tensor b = at::ones({1, 4}, at::kInt);
at::Tensor& out = at::resize_as_(a, b);
// error: binding value of type 'const at::Tensor' to reference to type 'at::Tensor' drops 'const' qualifier
const at::Tensor& out = at::resize_as_(a, b); // success

at::Tensor out;  // undefined Tensor
const at::Tensor a = at::randn({2, 2});
at::IntArrayRef dim = {1};
at::sum_out(out, a, dim);
// c10::Error: Expected a Tensor of type Variable but found an undefined Tensor for argument #4 'out'

TorchScript

1.8.1:

class MyClass:
    def __init__(self, a):
        self.attr = a
        
class MyModule(torch.nn.Module):
    def __init__(self):
        self.attr = MyClass(4)
        
sm = torch.jit.script(MyModule())

1.9.0:

class MyClass:
    def __init__(self, a):
        self.attr = a
        
class MyModule(torch.nn.Module):
    def __init__(self):
        self.attr = MyClass(4)
 
# RuntimeError: Could not cast attribute 'attr' to type Tensor: Unable to cast Python instance of type <class 'int'> to C++ type 'at::Tensor'         
sm = torch.jit.script(MyModule()) 

This error occurs because MyClass is automatically scripted, but self.attr is inferred to be a Tensor instead of an int because a is not annotated. To fix this, annotate a with the right type (int), or mark attr as an attribute that should be ignored by the scripting process and not recursively processed; both fixes are sketched below:

class MyModule(torch.nn.Module):
    __jit_ignored_attributes__ = ["attr"]

    def __init__(self):
        self.attr = MyClass(4)
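Alternatively, annotating the constructor argument lets scripting infer attr as an int. A minimal sketch of that fix (the forward method and the super().__init__() call are added here for completeness):

import torch

class MyClass:
    def __init__(self, a: int):  # the annotation makes attr an int, not a Tensor
        self.attr = a

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attr = MyClass(4)

    def forward(self) -> int:
        return self.attr.attr

sm = torch.jit.script(MyModule())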

Quantization

quantize_fx.convert_fx no longer accepts the debug keyword argument; use is_reference instead.

1.8.1:

>>> import torch.quantization.quantize_fx as quantize_fx
>>> m = quantize_fx.convert_fx(m, debug=True)
(Runs successfully)

1.9.0:

>>> import torch.quantization.quantize_fx as quantize_fx
>>> m = quantize_fx.convert_fx(m, is_reference=True) # Runs successfully
>>> m = quantize_fx.convert_fx(m, debug=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: convert_fx() got an unexpected keyword argument 'debug'
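For context, a rough sketch of the FX graph mode quantization flow this applies to; the toy model, example input, and qconfig below are placeholders, not part of the release notes:

import copy
import torch
import torch.quantization.quantize_fx as quantize_fx

# Toy float model; any symbolically traceable eval-mode model works similarly.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
qconfig_dict = {"": torch.quantization.get_default_qconfig("fbgemm")}

prepared = quantize_fx.prepare_fx(model, qconfig_dict)
prepared(torch.randn(2, 4))  # calibration pass with example data

quantized = quantize_fx.convert_fx(copy.deepcopy(prepared))                     # quantized model
reference = quantize_fx.convert_fx(copy.deepcopy(prepared), is_reference=True)  # was debug=True in 1.8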

Distributed

1.8.1:

>>> # Assume the code below is run on 2 ranks in a distributed setting.
>>> rank_to_devices = { 0: [0, 1], 1: [2, 3] }
>>> # Each rank replicates model across 2 GPUs.
>>> model_ddp = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=rank_to_devices[rank]
    )
>>> # No error is raised, but the warning below is produced.
>>> UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES.

1.9.0:

>>> # Assume the code below is run on 2 ranks in a distributed setting.
>>> rank_to_devices = { 0: [0, 1], 1: [2, 3] }
>>> # Each rank replicates model across 2 GPUs.
>>> model_ddp = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=rank_to_devices[rank]
    )
>>> # Single process multi-GPU mode now produces an error on initialization.
>>> ValueError: device_ids can only be None or contain a single element.
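The replacement is one process per GPU, each constructing DDP with a single device id. A minimal sketch; MyModel is a placeholder, and LOCAL_RANK assumes a launcher that sets it (such as torch.distributed.run or torch.distributed.launch --use_env):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU.
dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = MyModel().to(local_rank)                 # MyModel: placeholder for your model
model_ddp = DDP(model, device_ids=[local_rank])  # exactly one device per process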

1.8.1:

#!/bin/bash
# Assumes training script train.py exists.
python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port="29500" --logdir test_logdir train.py
# Logs are written to $logdir/node_{}_local_rank_{}_stdout

1.9.0:

#!/bin/bash
# Assumes training script train.py exists.
python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port="29500" --log_dir test_logdir train.py
# Logs are written to $log_dir/$rank/stdout.log

Deprecations

Python API

Autograd

1.8.1:

{
  at::AutoNonVariableTypeMode guard(true);
}

1.9.0:

{
  c10::AutoDispatchBelowAutograd guard(true); // for kernel implementations
  // c10::InferenceMode guard(true); --> consider inference mode if you are looking for a user-facing API

}
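On the Python side, the user-facing counterpart mentioned in the comment above is the inference mode context manager introduced in 1.9. A minimal sketch:

import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.inference_mode():
    # No autograd tracking inside this block (lower overhead than no_grad).
    y = x * 2

print(y.requires_grad)  # False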

1.8.1:

        # Instantiating a custom function will raise a warning in 1.9
        Func().apply

1.9.0:

        # You should directly call the `apply` (classmethod) on the class
        Func.apply
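For completeness, a minimal custom Function used the recommended way, calling apply on the class rather than on an instance (Square is a made-up example):

import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)  # call apply on the class, not Square().apply
y.backward()
print(x.grad)        # tensor([6.])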

C++ API

Distributed

New features

Python API

Complex Numbers

torch.nn

Profiler

Autograd

Dataloader

CUDA

C++ API

TorchScript

torch.package

Mobile

Distributed

torch.fx

ONNX

Vulkan

Misc

Improvements

Python API

Complex Numbers

Autograd

torch.nn

Dataloader

C++ API

AMD

CUDA

torch.fx

Profiler

TorchScript

torch.package

Quantization

Mobile

Distributed

torch.distributed.Store

torch.distributed.rpc

DistributedDataParallel

torch.distributed

torch.distributed.nn.RemoteModule

torch.futures.Future

torch.nn.SyncBatchNorm

torch.distributed.pipeline

Added new torch.distributed.elastic module that upstreams pytorch/elastic

torch.distributed.optim.ZeroRedundancyOptimizer

Combine backtrace print into one string to avoid interleaving (#56961).
Raise exception rather than crash if GLOO_DEVICE_TRANSPORT is set to unknown value (#58518).

ONNX

Vulkan

Benchmark

Misc

Bug fixes

Python API

Complex Numbers

Autograd

torch.nn

Dataloader

C++ API

AMD

CUDA

Dispatcher

torch.fx

Profiler

TorchScript

torch.package

Quantization

Mobile

Distributed

torch.distributed.Store

torch.distributed.rpc

torch.distributed

DistributedDataParallel

torch.distributed

torch.distributed.pipeline

torch.distributed.elastic

torch.nn.SyncBatchNorm

torch.distributed.optim.ZeroRedundancyOptimizer

Fix monitored_barrier with wait_all_ranks (#58702).

ONNX

Vulkan

Benchmark

Misc

Performance

Python API

Complex Numbers

Autograd

torch.nn

C++ API

CUDA

Composability

torch.fx

Profiler

TorchScript

Quantization

Mobile

Distributed

torch.distributed

Vulkan

Docs

Python API

Autograd

torch.nn

Dataloader

AMD

CUDA

torch.fx

Profiler

TorchScript

torch.package

Quantization

Mobile

Distributed

torch.distributed.Store

torch.distributed.optim

torch.distributed.elastic

DistributedDataParallel

torch.distributed.rpc

torch.distributed.nn.RemoteModule

torch.distributed.pipeline

torch.distributed

ONNX
