v13.0.0rc1
Release date: 2023-12-07 15:58:06
Latest cupy/cupy release: v13.3.0 (2024-08-22 15:42:45)
This is the release note of v13.0.0rc1. See here for the complete list of solved issues and merged PRs.
This is a release candidate of the CuPy v13 series. Please start testing your workload with this release to prepare for the final v13 release. To install:
pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v13.
💬 Join the Matrix chat to talk with developers and users and ask quick questions!
🙌 Help us sustain the project by sponsoring CuPy!
✨ Highlights
NVIDIA cuTENSOR 2.0
NVIDIA cuTENSOR is a performant and flexible library for accelerating tensor linear algebra. CuPy v13 supports cuTENSOR 2.0, the latest major release of the library, which achieves higher performance than the cuTENSOR 1.x series.
NVIDIA RAPIDS cuSignal Integration
cuSignal is a library developed by the NVIDIA RAPIDS project that provides GPU-accelerated implementation of signal processing algorithms using CuPy as a backend. cuSignal includes scipy.signal-compatible APIs, so we share the same goals. After a discussion with the cuSignal team, we agreed to merge cuSignal into CuPy to provide users with a better experience using a unified library for SciPy routines on GPU.
Currently, most of the functions provided in cuSignal have been merged into CuPy, and the remaining items are expected to be merged into CuPy v13 in due course.
We would like to acknowledge and thank @awthomp and everyone involved in the cuSignal development for creating a great library and agreeing to this transition.
Distributed NDArray (experimental) (#7881)
Added initial support for sharding ndarrays across multiple GPU devices connected to the same host.
import numpy
from cupyx.distributed.array import distributed_array

shape = (16, 16)
cpu_array = numpy.random.rand(*shape)

# Set the chunk indexes for each device:
# device 0 holds rows 0..7 and device 1 holds rows 8..15
mapping = {
    0: [(slice(8), slice(None, None))],
    1: [(slice(8, None), slice(None, None))],
}

# The array is allocated on devices 0 and 1
multi_gpu_array = distributed_array(cpu_array, mapping)
This work was done by @shino16 during the Preferred Networks 2023 summer internship.
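To see what the chunk mapping in the example above describes, the following host-only sketch applies the same index tuples to a plain NumPy array. It does not call cupyx.distributed at all; the names chunks and reassembled are illustrative and not part of any CuPy API.

```python
import numpy

shape = (16, 16)
cpu_array = numpy.random.rand(*shape)

# Same mapping as in the release note: device id -> list of index tuples.
mapping = {
    0: [(slice(8), slice(None, None))],
    1: [(slice(8, None), slice(None, None))],
}

# Host-side illustration only: slice out the chunk each device would hold.
chunks = {
    device: [cpu_array[index] for index in indexes]
    for device, indexes in mapping.items()
}

for device, parts in chunks.items():
    print("device", device, "chunk shapes:", [part.shape for part in parts])

# Stacking the two row blocks back together reproduces the original array,
# which shows that the mapping covers every row exactly once.
reassembled = numpy.concatenate([chunks[0][0], chunks[1][0]], axis=0)
```

Each device ends up with an 8 x 16 block, so the two index tuples partition the rows without overlap.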
Support for Python 3.12
Binary packages are now available for Python 3.12.
🛠️ Changes without compatibility
CUDA Runtime API is now statically linked
CuPy is now shipped with CUDA Runtime statically linked. Due to this, cupy.cuda.runtime.runtimeGetVersion() always returns the version of CUDA Runtime that CuPy is built with, regardless of the version of CUDA Runtime installed locally. If you need to retrieve the version of CUDA Runtime shared library installed locally, use cupy.cuda.get_local_runtime_version() instead.
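A minimal sketch of comparing the two values, assuming CuPy v13 is installed. The helper format_cuda_version is hypothetical (not a CuPy function) and relies on the usual CUDA integer encoding of major * 1000 + minor * 10. The CuPy calls are guarded so the snippet also runs where CuPy or a local CUDA Runtime is missing.

```python
def format_cuda_version(version):
    """Turn an integer such as 12020 into the string '12.2'.

    Hypothetical helper; assumes the major * 1000 + minor * 10 encoding.
    """
    major = version // 1000
    minor = (version % 1000) // 10
    return f"{major}.{minor}"


try:
    import cupy
except ImportError:
    cupy = None

if cupy is None:
    print("CuPy is not installed; nothing to query")
else:
    try:
        # Version of the CUDA Runtime that CuPy was built (statically linked) with.
        built_with = cupy.cuda.runtime.runtimeGetVersion()
        print("Built with CUDA Runtime", format_cuda_version(built_with))
        # Version of the CUDA Runtime shared library found on this machine.
        local = cupy.cuda.get_local_runtime_version()
        print("Local CUDA Runtime", format_cuda_version(local))
    except Exception as exc:  # e.g. no local CUDA Runtime could be found
        print("Could not query CUDA Runtime version:", exc)
```

If the two printed versions differ, that is expected with static linking and does not by itself indicate a broken installation.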
📝 Changes
New Features
- Port lombscargle from cuSignal to cupyx.scipy.signal (#7563)
- Port periodogram, welch and csd from cuSignal to cupyx.signal (#7564)
- Port cusignal windows module to cupyx.scipy.signal (#7568)
- Add cupy.lib.stride_tricks.sliding_window_view (#7575)
- cupyx/scipy/signal: add place poles (#7666)
- Add check_{NOLA, COLA} to cupyx.scipy.signal (#7675)
- Port argrel{extrema, max, min} to cupyx.scipy.signal from cusignal (#7694)
- Port waveforms from cusignal to cupyx.scipy.signal (#7696)
- Port wavelets module from cusignal to cupyx.scipy.signal (#7700)
- Add 2D signal b-splines to cupyx.scipy.signal (#7721)
- Port firwin/firwin2 from cuSignal (#7722)
- port upfirdn from cuSignal (#7749)
- Support boolean COO sparse matrix (#7764)
- Port gauss_spline from cuSignal (#7837)
- Port stft/istft from CuSignal to cupyx.scipy.signal (#7838)
- Port vectorstrength, coherence and spectrogram from CuSignal to cupyx.scipy.signal (#7853)
- Port decimate, resample and resample_poly from cuSignal to cupyx.scipy.signal (#7855)
- Add max_len_seq to cupyx.scipy.signal (#7867)
- Add distributed ndarray (#7942)
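As an illustration of one of the additions listed above, here is a small sketch of sliding_window_view used to compute a moving sum. It assumes that cupy.lib.stride_tricks.sliding_window_view accepts the same positional arguments as the NumPy function of the same name. The snippet falls back to NumPy when CuPy or a GPU is unavailable; to_host is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

xp = np  # array module actually used; NumPy unless a GPU-backed CuPy is found
try:
    import cupy as cp
    from cupy.lib.stride_tricks import sliding_window_view as cp_view
    if cp.cuda.runtime.getDeviceCount() > 0:
        xp, sliding_window_view = cp, cp_view
except Exception:
    pass  # keep the NumPy fallback


def to_host(array):
    """Hypothetical helper: copy a CuPy array to the host, pass NumPy through."""
    return array.get() if hasattr(array, "get") else array


signal = xp.arange(6)
# Each row of `windows` is a length-3 view into `signal`, shifted by one element.
windows = sliding_window_view(signal, 3)
moving_sum = windows.sum(axis=-1)
print(to_host(moving_sum))
```

Because the result is a view, no per-window copies are made until a reduction such as sum is applied.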
Enhancements
- Implement axis parameter on cupy.unique (#6886)
- Load cuTENSOR from wheel distribution (#7025)
- Soft link NVRTC for cupy_backends.cuda.libs.nvrtc (#7621)
- Add a property to get access to the nccl handle. (#7823)
- Remove cusolver_enabled, cub_enabled, thrust_enabled flags (#7840)
- Lazy import cuSOLVER (#7843)
- Lazy import cuSPARSE (#7847)
- Lazy import cuFFT (#7849)
- Static link to CUDA Runtime in CUB module (#7850)
- Bundle CCCL in CuPy (#7851)
- Lazy import cuRAND (#7856)
- Use NVRTC for compiling kernels calling cupyx.jit.cub APIs (#7869)
- Add optional argument device_id=-1 to get_current_stream (#7885)
- Prohibit conversion from Variable to Python scalar in fusion (#7887)
- Add __slots__ to cupy.ndarray (#7891)
- Lazy import cuBLAS (#7921)
- Allow Jitify to only cache CuPy-owned headers (#7934)
- Ensure D2H copies are stream ordered and by default blocking (#7938)
- Accelerate H2D copies when the source is on pinned memory (#7939)
- Add Linux CI for Python 3.12 (#7940)
- MNT: Suppress CUB compilation warnings (#7943)
- Static link CUDA Runtime (#7954)
- Add debug feature to preloading and softlink (#7977)
- Support cuTensor 2.0 (#7984)
- Bump supported NumPy & SciPy versions (#7992)
- Softlink CUDA Driver (#7994)
- Show local runtime version in cupy.show_config() (#7995)
- Avoid using numpy.find_common_type (#7651)
- ENH: Remove NINF, PINF, Inf,... usages (#7800)
- Fix cupy.empty_like parameter name to prototype (#7827)
- Make stream kwonly argument in ndarray.__dlpack__ (#7829)
- Remove conversions of array with ndim > 0 to a scalar (#7886)
- scipy.linalg.{tri/tril/triu} are deprecated in SciPy 1.11.0 (#7889)
- Fix signal.medfilt complex error type for SciPy>=1.11 (#7890)
- Fix cupyx.scipy.sparse._base tests for SciPy 1.11 (#7905)
- Fix return type of division of csr_matrix and dense array for SciPy 1.11 (#7906)
- Fix maxiter in TestLOBPCG (#7908)
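The axis parameter on cupy.unique mentioned in the list above can be sketched as follows, assuming it follows the semantics of numpy.unique(..., axis=...). The snippet uses CuPy only when it is installed and a GPU is visible, and otherwise runs the same call on NumPy; to_host is a hypothetical helper.

```python
import numpy as np

xp = np  # array module actually used; NumPy unless a GPU-backed CuPy is found
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() > 0:
        xp = cp
except Exception:
    pass  # keep the NumPy fallback


def to_host(array):
    """Hypothetical helper: copy a CuPy array to the host, pass NumPy through."""
    return array.get() if hasattr(array, "get") else array


rows = xp.asarray([[1, 2], [3, 4], [1, 2], [3, 4], [5, 6]])
# With axis=0, whole rows are compared, so duplicate rows collapse to one.
unique_rows = xp.unique(rows, axis=0)
print(to_host(unique_rows))
```

Without the axis argument the array would be flattened first and the result would be the unique scalar values instead of the unique rows.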
Performance Improvements
- Optimize spmatrix._set_many (#7888)
Bug Fixes
- Fix csr2dense to avoid race conditions (#7724)
- Fix cuTENSOR contraction descriptor cache (#7814)
- Fix handling of scalars in cupy.r_ (#7815)
- Fix cupy.r_ for scalar inputs (#7896)
- Fixed Improper Method Call: Replaced NotImplementedError with NotImplemented (#7900)
- Provide .stop() method for cupyx.distributed._Backend (#7952)
- Fix NVRTCError not calling initialize() (#7955)
- Import cupyx.lapack inside cupy.linalg.solve (#7966)
- Add lazy load for cupyx.lapack (#7993)
- Fix issues with the initial state when a SOS filter has no IIR part (#7998)
- Avoid using pkg_resources for cuTENSOR wheel discovery (#8012)
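One entry in the list above concerns the two similarly named Python built-ins NotImplemented and NotImplementedError. The sketch below is a general reminder of how they differ in plain Python and does not describe what the patch itself changed. The classes Meters and Shape are hypothetical.

```python
class Meters:
    """Hypothetical value type used to show the NotImplemented sentinel."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            # Sentinel value: tells Python to try the reflected operation
            # on `other` instead of failing here.
            return NotImplemented
        return self.value == other.value


class Shape:
    """Hypothetical base class used to show the NotImplementedError exception."""

    def area(self):
        # Exception class: signals that a subclass must provide this method.
        raise NotImplementedError("subclasses must override area()")


print(Meters(2) == Meters(2))
print(Meters(2) == 2)
```

NotImplemented is returned from binary special methods, while NotImplementedError is raised. Mixing them up, for example writing raise NotImplemented, produces a TypeError rather than the intended behavior.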
Code Fixes
- MNT: suppress compiler warning from cupyx.cusolver (#7714)
- Add type annotation in _creation.basic (#7739)
- Fix nvrtc initialize not inlined for CUDA Python (#7842)
- Fix coding style (#7844)
- Reorganize directory structure around CCCL (#7920)
- Remove deprecated ast expr in CuPy JIT (#7941)
- Reorganize third party code under third_party directory (#7956)
Documentation
- Add -U to pre-release installation command (#7803)
- Fix get_window docstring reference (#7835)
- Clarify sparse .transpose() return type in docstrings (#7868)
- DOC: cupyx/scipy: add missing names (#7898)
- Fix CUDA 12.2 for Windows notice (#7922)
- Bump CuPy version in install.rst (#8002)
- Update installation guide to note about cuTENSOR 2.0 support (#8003)
- Update wheels list in README (#8006)
Installation
- Avoid warning when uploading packages (#7792)
- Fix ROCm Dockerfile not working (#7797)
- Add cuSignal license (#7816)
- Improve symlink handling and preflight (#7945)
- Bump docker cuda version to 12 (#7973)
Tests
- Add timeout to Windows CI (#7775)
- Fix mypy not installed in pre-review test (#7832)
- Execution tests for typing tests passing rows in typing_tests (#7836)
- CI: Remove path length limitation on Windows CI image (#7857)
- Fix Windows CI failures (#7862)
- Skip test_pos_boolarray if numpy>=1.25 (#7893)
- Add NumPy 1.25/1.26 & SciPy 1.11 to CI (#7897)
- Skip some LOBPCG tests failing with SciPy 1.11 (#7924)
- Support Python 3.12, add Windows CI (#7947)
- Skip logspace test in NumPy 1.25 & 1.26 (#7946) (#7948)
- Fix Windows test scripts (#7957)
- Skip test_parameterize_pytest_impl test for pytest 7.4.3 (#7965)
- Fix TestLOBPCG.test_maxit_None CUDA 12.2 CI failure (#8000)
Others
- Fix publish workflow permission and output for review (#7788)
- Fix backport workflow (#7831)
- Avoid triggering Project Updates for updates from assignees (#7861)
- Bump version to v13.0.0rc1 (#8015)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @emcastillo @ev-br @fazledyn-or @kerry-vorticity @kmaehashi @leofang @loganbvh @milesvant @mtsokol @mvnvidia @negin513 @shino16 @takagi
1. cupy-13.0.0rc1.tar.gz 3.34MB
2. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl 102.12MB
3. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 90.83MB
4. cupy_cuda11x-13.0.0rc1-cp310-cp310-win_amd64.whl 71.34MB
5. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_aarch64.whl 103.37MB
6. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 91.35MB
7. cupy_cuda11x-13.0.0rc1-cp311-cp311-win_amd64.whl 71.31MB
8. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_aarch64.whl 102.39MB
9. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 91.18MB
10. cupy_cuda11x-13.0.0rc1-cp312-cp312-win_amd64.whl 71.25MB
11. cupy_cuda11x-13.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl 102.92MB
12. cupy_cuda11x-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 91.55MB
13. cupy_cuda11x-13.0.0rc1-cp39-cp39-win_amd64.whl 71.49MB
14. cupy_cuda12x-13.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl 95.62MB
15. cupy_cuda12x-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 84.22MB
16. cupy_cuda12x-13.0.0rc1-cp310-cp310-win_amd64.whl 64.67MB
17. cupy_cuda12x-13.0.0rc1-cp311-cp311-manylinux2014_aarch64.whl 96.87MB
18. cupy_cuda12x-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 84.74MB
19. cupy_cuda12x-13.0.0rc1-cp311-cp311-win_amd64.whl 64.64MB
20. cupy_cuda12x-13.0.0rc1-cp312-cp312-manylinux2014_aarch64.whl 95.89MB
21. cupy_cuda12x-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 84.55MB
22. cupy_cuda12x-13.0.0rc1-cp312-cp312-win_amd64.whl 64.58MB
23. cupy_cuda12x-13.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl 96.43MB
24. cupy_cuda12x-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 84.94MB
25. cupy_cuda12x-13.0.0rc1-cp39-cp39-win_amd64.whl 64.82MB
26. cupy_rocm_4_3-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 39.24MB
27. cupy_rocm_4_3-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 39.73MB
28. cupy_rocm_4_3-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 39.49MB
29. cupy_rocm_4_3-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 39.88MB
30. cupy_rocm_5_0-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 57.14MB
31. cupy_rocm_5_0-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 57.64MB
32. cupy_rocm_5_0-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 57.4MB
33. cupy_rocm_5_0-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 57.78MB