v13.0.0rc1

Release date: 2023-12-07 15:58:06

Latest cupy/cupy release: v13.3.0 (2024-08-22 15:42:45)

This is the release note of v13.0.0rc1. See here for the complete list of solved issues and merged PRs.

This is a release candidate of the CuPy v13 series. Please start testing your workload with this release to prepare for the final v13 release. To install:

pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre

See the Upgrade Guide for the list of possible breaking changes in v13.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

NVIDIA cuTENSOR 2.0

NVIDIA cuTENSOR is a performant and flexible library for accelerating tensor linear algebra. CuPy v13 supports cuTENSOR 2.0, the latest major release of the library, which achieves higher performance than the cuTENSOR 1.x series.

NVIDIA RAPIDS cuSignal Integration

cuSignal is a library developed by the NVIDIA RAPIDS project that provides GPU-accelerated implementations of signal processing algorithms using CuPy as a backend. cuSignal includes scipy.signal-compatible APIs, so we share the same goals. After a discussion with the cuSignal team, we agreed to merge cuSignal into CuPy to provide users with a better experience using a unified library for SciPy routines on GPU.

Currently, most of the functions provided in cuSignal have been merged into CuPy, and the remaining items are expected to be merged into CuPy v13 in due course.

We would like to acknowledge and thank @awthomp and everyone involved in the cuSignal development for creating a great library and agreeing to this transition.

Distributed NDArray (experimental) (#7881)

Added initial support for sharding ndarrays across multiple GPU devices connected to the same host.

import numpy
from cupyx.distributed.array import distributed_array

shape = (16, 16)
cpu_array = numpy.random.rand(*shape)
# Set the chunk indexes for each device:
# device 0 holds rows 0-7 and device 1 holds rows 8-15
mapping = {
    0: [(slice(8), slice(None, None))],
    1: [(slice(8, None), slice(None, None))],
}
# The array is allocated on devices 0 and 1
multi_gpu_array = distributed_array(cpu_array, mapping)

This work was done by @shino16 during the Preferred Networks 2023 summer internship.
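
To make the index mapping in the example above concrete, the following pure-Python sketch resolves each slice against the array shape and lists the half-open index range a device would hold along each axis. The helper name chunk_bounds is hypothetical and not part of CuPy; it relies only on the built-in slice.indices method, assumes unit-step slices, and does not touch a GPU.

```python
def chunk_bounds(mapping, shape):
    """Resolve each device's slices to half-open (start, stop) ranges per axis.

    `chunk_bounds` is an illustrative helper, not a CuPy API.
    Assumes every slice has a step of 1.
    """
    bounds = {}
    for device, chunks in mapping.items():
        resolved = []
        for chunk in chunks:
            # slice.indices(n) clamps the slice to an axis of length n
            # and returns concrete (start, stop, step) values.
            resolved.append(tuple(
                s.indices(n)[:2] for s, n in zip(chunk, shape)
            ))
        bounds[device] = resolved
    return bounds


shape = (16, 16)
mapping = {
    0: [(slice(8), slice(None, None))],
    1: [(slice(8, None), slice(None, None))],
}
bounds = chunk_bounds(mapping, shape)
for device, chunks in bounds.items():
    print(f"device {device}: {chunks}")
```

Here device 0 resolves to rows 0 up to (but not including) 8 and device 1 to rows 8 up to 16, each spanning all 16 columns.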

Support for Python 3.12

Binary packages are now available for Python 3.12.

🛠️ Changes without compatibility

CUDA Runtime API is now statically linked

CuPy is now shipped with CUDA Runtime statically linked. Due to this, cupy.cuda.runtime.runtimeGetVersion() always returns the version of CUDA Runtime that CuPy is built with, regardless of the version of CUDA Runtime installed locally. If you need to retrieve the version of CUDA Runtime shared library installed locally, use cupy.cuda.get_local_runtime_version() instead.
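
As a rough sketch of how the two calls named above might be compared, the snippet below queries both and decodes the results. It assumes both functions return CUDA's usual integer encoding (1000 * major + 10 * minor, so 12020 means 12.2); the helpers decode_cuda_version and runtime_versions are hypothetical names introduced for illustration, and the behavior when no local shared library is found is an assumption handled defensively.

```python
def decode_cuda_version(version):
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    major, remainder = divmod(version, 1000)
    return major, remainder // 10


def runtime_versions():
    """Return the built-with and locally installed CUDA Runtime versions.

    Returns an empty dict when CuPy cannot be imported.
    """
    try:
        import cupy
    except ImportError:
        return {}
    versions = {
        # Version CuPy was built with (statically linked).
        "built_with": decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion()),
    }
    try:
        # Version of the CUDA Runtime shared library found on this machine.
        versions["local"] = decode_cuda_version(
            cupy.cuda.get_local_runtime_version())
    except Exception:
        # Assumption: the call may fail if no local shared library is found.
        pass
    return versions


if __name__ == "__main__":
    print(runtime_versions())
```

If the two entries differ, the "built_with" value is the one CuPy actually uses, since the runtime is statically linked.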

📝 Changes

New Features

Port lombscargle from cuSignal to cupyx.scipy.signal (#7563)
Port periodogram, welch and csd from cuSignal to cupyx.signal (#7564)
Port cusignal windows module to cupyx.scipy.signal (#7568)
Add cupy.lib.stride_tricks.sliding_window_view (#7575)
cupyx/scipy/signal: add place poles (#7666)
Add check_{NOLA, COLA} to cupyx.scipy.signal (#7675)
Port argrel{extrema, max, min} to cupyx.scipy.signal from cusignal (#7694)
Port waveforms from cusignal to cupyx.scipy.signal (#7696)
Port wavelets module from cusignal to cupyx.scipy.signal (#7700)
Add 2D signal b-splines to cupyx.scipy.signal (#7721)
Port firwin/firwin2 from cuSignal (#7722)
port upfirdn from cuSignal (#7749)
Support boolean COO sparse matrix (#7764)
Port gauss_spline from cuSignal (#7837)
Port stft/istft from CuSignal to cupyx.scipy.signal (#7838)
Port vectorstrength, coherence and spectrogram from CuSignal to cupyx.scipy.signal (#7853)
Port decimate, resample and resample_poly from cuSignal to cupyx.scipy.signal (#7855)
Add max_len_seq to cupyx.scipy.signal (#7867)
Add distributed ndarray (#7942)
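
Of the new functions listed above, cupy.lib.stride_tricks.sliding_window_view mirrors the NumPy function of the same name. As a rough illustration of what it computes for a 1-D input, the pure-Python sketch below builds the same windows as nested lists. Both sliding_windows_1d and cupy_windows are hypothetical helper names; the CuPy call is wrapped in a function so that the block itself runs without a GPU. Unlike this reference, the real function returns a view of the input rather than copies.

```python
def sliding_windows_1d(values, width):
    """Pure-Python reference: all contiguous windows of `width` items.

    Illustrative only; for a 1-D input of length n the result has
    n - width + 1 rows of `width` elements each.
    """
    if not 0 < width <= len(values):
        raise ValueError("width must be between 1 and len(values)")
    return [list(values[i:i + width]) for i in range(len(values) - width + 1)]


def cupy_windows(values, width):
    """GPU equivalent; requires CuPy v13 and a CUDA device."""
    import cupy
    from cupy.lib.stride_tricks import sliding_window_view
    return sliding_window_view(cupy.asarray(values), width)


windows = sliding_windows_1d([0, 1, 2, 3, 4], 3)
print(windows)
```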

Enhancements

Implement axis parameter on cupy.unique (#6886)
Load cuTENSOR from wheel distribution (#7025)
Soft link NVRTC for cupy_backends.cuda.libs.nvrtc (#7621)
Add a property to get access to the nccl handle. (#7823)
Remove cusolver_enabled, cub_enabled, thrust_enabled flags (#7840)
Lazy import cuSOLVER (#7843)
Lazy import cuSPARSE (#7847)
Lazy import cuFFT (#7849)
Static link to CUDA Runtime in CUB module (#7850)
Bundle CCCL in CuPy (#7851)
Lazy import cuRAND (#7856)
Use NVRTC for compiling kernels calling cupyx.jit.cub APIs (#7869)
Add optional argument device_id=-1 to get_current_stream (#7885)
Prohibit conversion from Variable to Python scalar in fusion (#7887)
Add __slots__ to cupy.ndarray (#7891)
Lazy import cuBLAS (#7921)
Allow Jitify to only cache CuPy-owned headers (#7934)
Ensure D2H copies are stream ordered and by default blocking (#7938)
Accelerate H2D copies when the source is on pinned memory (#7939)
Add Linux CI for Python 3.12 (#7940)
MNT: Suppress CUB compilation warnings (#7943)
Static link CUDA Runtime (#7954)
Add debug feature to preloading and softlink (#7977)
Support cuTensor 2.0 (#7984)
Bump supported NumPy & SciPy versions (#7992)
Softlink CUDA Driver (#7994)
Show local runtime version in cupy.show_config() (#7995)
Avoid using numpy.find_common_type (#7651)
ENH: Remove NINF, PINF, Inf,... usages (#7800)
Fix cupy.empty_like parameter name to prototype (#7827)
Make stream kwonly argument in ndarray.__dlpack__ (#7829)
Remove conversions of array with ndim > 0 to a scalar (#7886)
scipy.linalg.{tri/tril/triu} are deprecated in SciPy 1.11.0 (#7889)
Fix signal.medfilt complex error type for SciPy>=1.11 (#7890)
Fix cupyx.scipy.sparse._base tests for SciPy 1.11 (#7905)
Fix return type of division of csr_matrix and dense array for SciPy 1.11 (#7906)
Fix maxiter in TestLOBPCG (#7908)
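
Among the enhancements above, cupy.unique gains an axis parameter. Assuming it follows numpy.unique, passing axis=0 on a 2-D array deduplicates whole rows instead of individual elements and returns them in sorted order. The sketch below shows that behavior with a pure-Python reference; unique_rows and cupy_unique_rows are hypothetical helper names, and the CuPy call is kept inside a function so the block runs without a GPU.

```python
def unique_rows(rows):
    """Pure-Python reference for unique(..., axis=0) on a 2-D input.

    Illustrative only: returns the distinct rows in lexicographic order.
    """
    return [list(row) for row in sorted({tuple(row) for row in rows})]


def cupy_unique_rows(rows):
    """GPU equivalent; requires CuPy v13 and a CUDA device."""
    import cupy
    return cupy.unique(cupy.asarray(rows), axis=0)


data = [[1, 0], [0, 1], [1, 0]]
print(unique_rows(data))
```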

Performance Improvements

Optimize spmatrix._set_many (#7888)

Bug Fixes

Fix csr2dense to avoid race conditions (#7724)
Fix cuTENSOR contraction descriptor cache (#7814)
Fix handling of scalars in cupy.r_ (#7815)
Fix cupy.r_ for scalar inputs (#7896)
Fixed Improper Method Call: Replaced NotImplementedError with NotImplemented (#7900)
Provide .stop() method for cupyx.distributed._Backend (#7952)
Fix NVRTCError not calling initialize() (#7955)
Import cupyx.lapack inside cupy.linalg.solve (#7966)
Add lazy load for cupyx.lapack (#7993)
Fix issues with the initial state when a SOS filter has no IIR part (#7998)
Avoid using pkg_resources for cuTENSOR wheel discovery (#8012)

Code Fixes

MNT: suppress compiler warning from cupyx.cusolver (#7714)
Add type annotation in _creation.basic (#7739)
Fix nvrtc initialize not inlined for CUDA Python (#7842)
Fix coding style (#7844)
Reorganize directory structure around CCCL (#7920)
Remove deprecated ast expr in CuPy JIT (#7941)
Reorganize third party code under third_party directory (#7956)

Documentation

Add -U to pre-release installation command (#7803)
Fix get_window docstring reference (#7835)
Clarify sparse .transpose() return type in docstrings (#7868)
DOC: cupyx/scipy: add missing names (#7898)
Fix CUDA 12.2 for Windows notice (#7922)
Bump CuPy version in install.rst (#8002)
Update installation guide to note about cuTENSOR 2.0 support (#8003)
Update wheels list in README (#8006)

Installation

Avoid warning when uploading packages (#7792)
Fix ROCm Dockerfile not working (#7797)
Add cuSignal license (#7816)
Improve symlink handling and preflight (#7945)
Bump docker cuda version to 12 (#7973)

Tests

Add timeout to Windows CI (#7775)
Fix mypy not installed in pre-review test (#7832)
Execution tests for typing tests passing rows in typing_tests (#7836)
CI: Remove path length limitation on Windows CI image (#7857)
Fix Windows CI failures (#7862)
Skip test_pos_boolarray if numpy>=1.25 (#7893)
Add NumPy 1.25/1.26 & SciPy 1.11 to CI (#7897)
Skip some LOBPCG tests failing with SciPy 1.11 (#7924)
Support Python 3.12, add Windows CI (#7947)
Skip logspace test in NumPy 1.25 & 1.26 (#7946) (#7948)
Fix Windows test scripts (#7957)
Skip test_parameterize_pytest_impl test for pytest 7.4.3 (#7965)
Fix TestLOBPCG.test_maxit_None CUDA 12.2 CI failure (#8000)

Others

Fix publish workflow permission and output for review (#7788)
Fix backport workflow (#7831)
Avoid triggering Project Updates for updates from assignees (#7861)
Bump version to v13.0.0rc1 (#8015)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @ev-br @fazledyn-or @kerry-vorticity @kmaehashi @leofang @loganbvh @milesvant @mtsokol @mvnvidia @negin513 @shino16 @takagi

Related links: original page, download (tar), download (zip)

1. cupy-13.0.0rc1.tar.gz 3.34MB

2. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl 102.12MB

3. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 90.83MB

4. cupy_cuda11x-13.0.0rc1-cp310-cp310-win_amd64.whl 71.34MB

5. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_aarch64.whl 103.37MB

6. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 91.35MB

7. cupy_cuda11x-13.0.0rc1-cp311-cp311-win_amd64.whl 71.31MB

8. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_aarch64.whl 102.39MB

9. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 91.18MB

10. cupy_cuda11x-13.0.0rc1-cp312-cp312-win_amd64.whl 71.25MB

11. cupy_cuda11x-13.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl 102.92MB