v0.3.1
Release date: 2018-02-14 08:36:58
Binaries
- Removed support for CUDA capability 3.0 and 5.0 (they still work for source builds for now, but the commitment to support them going forward is removed)
- Stop binary releases for CUDA 7.5
- Add CPU-only binary releases that are 10x smaller in size than the full binary with CUDA capabilities.
As always, links to our binaries are on http://pytorch.org
New features
- Add Cosine Annealing Learning Rate Scheduler https://github.com/pytorch/pytorch/pull/3311
- Add `reduce` argument to `PoissonNLLLoss` to be able to compute unreduced losses https://github.com/pytorch/pytorch/pull/3770
- Allow `target.requires_grad=True` in `l1_loss` and `mse_loss` (compute loss wrt `target`) https://github.com/pytorch/pytorch/pull/3876
- Add `random_split` that randomly splits a dataset into non-overlapping new datasets of given lengths https://github.com/pytorch/pytorch/pull/4435
- Introduced scopes to annotate ONNX graphs to have better TensorBoard visualization of models https://github.com/pytorch/pytorch/pull/5153
- Allow `map_location` in `torch.load` to be a string, such as `map_location='cpu'` or `map_location='cuda:2'` https://github.com/pytorch/pytorch/pull/4203 (see the usage sketch after this list)
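A minimal usage sketch for several of the features above, written with current torch syntax rather than the 0.3-era Variable API; the model, optimizer, tensor shapes, and `checkpoint.pth` path are placeholders, not part of the release notes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, random_split

# Cosine annealing learning-rate schedule (model and optimizer are placeholders)
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

# Per-element (unreduced) Poisson NLL loss via the new `reduce` flag
loss_fn = nn.PoissonNLLLoss(reduce=False)
per_element = loss_fn(torch.rand(4, 2), torch.rand(4, 2))

# l1_loss / mse_loss can now backpropagate into the target as well
inp = torch.randn(3, requires_grad=True)
tgt = torch.randn(3, requires_grad=True)
F.mse_loss(inp, tgt).backward()  # tgt.grad is populated too

# random_split: non-overlapping subsets of the given lengths
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100))
train_set, val_set = random_split(dataset, [80, 20])

# torch.load now accepts a string map_location such as 'cpu' or 'cuda:2'
torch.save(model.state_dict(), 'checkpoint.pth')  # placeholder file name
state = torch.load('checkpoint.pth', map_location='cpu')
```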
Bug Fixes
Data Loader / Datasets / Multiprocessing
- Made DataLoader workers more verbose on bus error and segfault. Additionally, add a `timeout` option to the DataLoader, which will error if sample loading time exceeds the given value. https://github.com/pytorch/pytorch/pull/3474
- DataLoader workers used to all have the same random number generator (RNG) seed because of the semantics of the `fork` syscall. Now, each worker will have its RNG seed set to `base_seed + worker_id`, where `base_seed` is a random int64 value generated by the parent process. You may use `torch.initial_seed()` to access this value in `worker_init_fn`, which can be used to set other seeds (e.g. NumPy) before data loading. `worker_init_fn` is an optional argument that will be called on each worker subprocess with the worker id as input, after seeding and before data loading (see the sketch after this list) https://github.com/pytorch/pytorch/pull/4018
- Add additional signal handling in DataLoader worker processes when workers abruptly die.
- Negative value for n_workers now gives a ValueError https://github.com/pytorch/pytorch/pull/4019
- Fixed a typo in the `ConcatDataset.cumulative_sizes` attribute name https://github.com/pytorch/pytorch/pull/3534
- Accept longs in `default_collate` for DataLoader in Python 2 https://github.com/pytorch/pytorch/pull/4001
- Re-initialize autograd engine in child processes https://github.com/pytorch/pytorch/pull/4158
- Fix distributed dataloader so it pins memory to current GPU not GPU 0. https://github.com/pytorch/pytorch/pull/4196
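A minimal sketch of the per-worker seeding and the new `timeout` option described above, assuming a toy `TensorDataset`; the 30-second timeout and NumPy seeding are illustrative choices, not defaults:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id,
    # so it can be reused to seed other libraries (here NumPy) per worker.
    np.random.seed(torch.initial_seed() % 2 ** 32)


if __name__ == '__main__':
    dataset = TensorDataset(torch.randn(64, 3))
    loader = DataLoader(dataset,
                        batch_size=8,
                        num_workers=2,
                        worker_init_fn=worker_init_fn,
                        timeout=30)  # error out if fetching a batch takes longer than 30s
    for (batch,) in loader:
        pass
```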
CUDA / CuDNN
- allow cudnn for fp16 batch norm https://github.com/pytorch/pytorch/pull/4021
- Use the `enabled` argument in `torch.autograd.profiler.emit_nvtx` (was being ignored) https://github.com/pytorch/pytorch/pull/4032
- Fix cuBLAS arguments for fp16 `torch.dot` https://github.com/pytorch/pytorch/pull/3660
- Fix CUDA `index_fill_` boundary check with small tensor size https://github.com/pytorch/pytorch/pull/3953
- Fix CUDA Multinomial checks https://github.com/pytorch/pytorch/pull/4009
- Fix CUDA version typo in warning https://github.com/pytorch/pytorch/pull/4175
- Initialize cuda before setting cuda tensor types as default https://github.com/pytorch/pytorch/pull/4788
- Add missing lazy_init in cuda python module https://github.com/pytorch/pytorch/pull/4907
- Lazy init order in set device, should not be called in getDevCount https://github.com/pytorch/pytorch/pull/4918
- Make torch.cuda.empty_cache() a no-op when cuda is not initialized https://github.com/pytorch/pytorch/pull/4936
CPU
- Assert MKL ld* conditions for ger, gemm, and gemv https://github.com/pytorch/pytorch/pull/4056
torch operators
- Fix `tensor.repeat` when the underlying storage is not owned by `torch` (for example, coming from numpy) https://github.com/pytorch/pytorch/pull/4084
- Add proper shape checking to `torch.cat` https://github.com/pytorch/pytorch/pull/4087
- Add check for slice shape match in index_copy_ and index_add_. https://github.com/pytorch/pytorch/pull/4342
- Fix use after free when advanced indexing tensors with tensors https://github.com/pytorch/pytorch/pull/4559
- Fix triu and tril for zero-strided inputs on gpu https://github.com/pytorch/pytorch/pull/4962
- Fix blas addmm (gemm) condition check https://github.com/pytorch/pytorch/pull/5048
- Fix topk work size computation https://github.com/pytorch/pytorch/pull/5053
- Fix reduction functions to respect the stride of the output https://github.com/pytorch/pytorch/pull/4995
- Improve float precision stability of the `linspace` op, fixes #4419 https://github.com/pytorch/pytorch/pull/4470
autograd
- Fix python gc race condition with THPVariable_traverse https://github.com/pytorch/pytorch/pull/4437
nn layers
- Fix padding_idx getting ignored in backward for Embedding(sparse=True) https://github.com/pytorch/pytorch/pull/3842
- Fix cosine_similarity's output shape https://github.com/pytorch/pytorch/pull/3811
- Add rnn args check https://github.com/pytorch/pytorch/pull/3925
- NLLLoss works for arbitrary dimensions https://github.com/pytorch/pytorch/pull/4654
- More strict shape check on Conv operators https://github.com/pytorch/pytorch/pull/4637
- Fix maxpool3d / avgpool3d crashes https://github.com/pytorch/pytorch/pull/5052
- Fix setting using running stats in InstanceNorm*d https://github.com/pytorch/pytorch/pull/4444
Multi-GPU
- Fix DataParallel scattering for empty lists / dicts / tuples https://github.com/pytorch/pytorch/pull/3769
- Fix refcycles in DataParallel scatter and gather (fix elevated memory usage) https://github.com/pytorch/pytorch/pull/4988
- Broadcast output requires_grad only if corresponding input requires_grad https://github.com/pytorch/pytorch/pull/5061
core
- Remove hard file offset reset in load() https://github.com/pytorch/pytorch/pull/3695
- Have sizeof account for size of stored elements https://github.com/pytorch/pytorch/pull/3821
- Fix undefined FileNotFoundError https://github.com/pytorch/pytorch/pull/4384
- make torch.set_num_threads also set MKL threads (take 2) https://github.com/pytorch/pytorch/pull/5002
others
- Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 https://github.com/pytorch/pytorch/pull/4656
Performance improvements
- slightly simplified math in IndexToOffset https://github.com/pytorch/pytorch/pull/4040
- improve performance of maxpooling backwards https://github.com/pytorch/pytorch/pull/4106
- Add cublas batched gemm support. https://github.com/pytorch/pytorch/pull/4151
- Rearrange dimensions for pointwise operations for better performance. https://github.com/pytorch/pytorch/pull/4174
- Improve memory access patterns for index operations. https://github.com/pytorch/pytorch/pull/4493
- Improve CUDA softmax performance https://github.com/pytorch/pytorch/pull/4973
- Fixed double memory accesses of several pointwise operations. https://github.com/pytorch/pytorch/pull/5068
Documentation and UX Improvements
- Better error messages for blas ops with cuda.LongTensor https://github.com/pytorch/pytorch/pull/4160
- Add missing trtrs, orgqr, ormqr docs https://github.com/pytorch/pytorch/pull/3720
- change doc for Adaptive Pooling https://github.com/pytorch/pytorch/pull/3746
- Fix MultiLabelMarginLoss docs https://github.com/pytorch/pytorch/pull/3836
- More docs for Conv1d Conv2d https://github.com/pytorch/pytorch/pull/3870
- Improve Tensor.scatter_ doc https://github.com/pytorch/pytorch/pull/3937
- [docs] rnn.py: Note zero defaults for hidden state/cell https://github.com/pytorch/pytorch/pull/3951
- Improve Tensor.new doc https://github.com/pytorch/pytorch/pull/3954
- Improve docs for torch and torch.Tensor https://github.com/pytorch/pytorch/pull/3969
- Added explicit tuple dimensions to doc for Conv1d. https://github.com/pytorch/pytorch/pull/4136
- Improve svd doc https://github.com/pytorch/pytorch/pull/4155
- Correct instancenorm input size https://github.com/pytorch/pytorch/pull/4171
- Fix StepLR example docs https://github.com/pytorch/pytorch/pull/4478