
v13.0.0rc1

cupy/cupy

Release date: 2023-12-07 15:58:06


This is the release note of v13.0.0rc1. See here for the complete list of solved issues and merged PRs.

This is a release candidate of the CuPy v13 series. Please start testing your workload with this release to prepare for the final v13 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre. See the Upgrade Guide for the list of possible breaking changes in v13.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

NVIDIA cuTENSOR 2.0

NVIDIA cuTENSOR is a performant and flexible library for accelerating tensor linear algebra. CuPy v13 supports cuTENSOR 2.0, the latest major release of the library, which achieves higher performance than the cuTENSOR 1.x series.
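To illustrate the kind of tensor contraction cuTENSOR accelerates, here is the same operation expressed with NumPy's einsum as a CPU-only sketch; on GPU, the equivalent cupy.einsum call can dispatch to cuTENSOR when the cutensor accelerator is enabled (e.g. via the CUPY_ACCELERATORS environment variable):

```python
import numpy as np

# A tensor contraction C[i,k] = sum_j A[i,j] * B[j,k], written with
# einsum. Swapping np for cupy runs the same contraction on GPU, where
# cuTENSOR 2.0 can be used as the backend.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = np.einsum("ij,jk->ik", A, B)

# The contraction is equivalent to a matrix product here.
assert np.allclose(C, A @ B)
```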

NVIDIA RAPIDS cuSignal Integration

cuSignal is a library developed by the NVIDIA RAPIDS project that provides GPU-accelerated implementations of signal processing algorithms using CuPy as a backend. cuSignal offers scipy.signal-compatible APIs, so the two projects share the same goals. After discussion with the cuSignal team, we agreed to merge cuSignal into CuPy, giving users a better experience through a single unified library for SciPy routines on GPU.

Currently, most of the functions provided in cuSignal have been merged into CuPy, and the remaining items are expected to be merged into CuPy v13 in due course.

We would like to acknowledge and thank @awthomp and everyone involved in the cuSignal development for creating a great library and agreeing to this transition.
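As a small, CPU-only sketch of the kind of routine this integration covers: FIR filtering is a convolution of a signal with a tap vector, shown below with plain NumPy. The merged scipy.signal-compatible APIs live under the cupyx.scipy.signal namespace and run the same style of computation on GPU.

```python
import numpy as np

# A 3-tap moving-average FIR filter applied via convolution.
# cupyx.scipy.signal provides GPU-accelerated versions of such
# signal-processing routines with scipy.signal-compatible APIs.
x = np.array([1.0, 2.0, 3.0, 4.0])  # input signal
h = np.ones(3) / 3.0                # filter taps (moving average)
y = np.convolve(x, h, mode="valid")
```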

Distributed NDArray (experimental) (#7881)

Added initial support for sharding ndarrays across multiple GPU devices connected to the same host.

import numpy

from cupyx.distributed.array import distributed_array

shape = (16, 16)
cpu_array = numpy.random.rand(*shape)
# Set the chunk indexes for each device:
# device 0 holds rows 0..7 and device 1 holds rows 8..15
mapping = {
    0: [(slice(8), slice(None, None))],
    1: [(slice(8, None), slice(None, None))],
}
# The array is allocated on devices 0 and 1
multi_gpu_array = distributed_array(cpu_array, mapping)

This work was done by @shino16 during the Preferred Networks 2023 summer internship.
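The row mapping used above can be sanity-checked with NumPy alone: each device's entry is a tuple of slices selecting its chunk, and the two chunks exactly tile the original array (a CPU-only sketch; distributed_array itself requires multiple GPUs):

```python
import numpy as np

shape = (16, 16)
cpu_array = np.random.rand(*shape)
mapping = {
    0: [(slice(8), slice(None))],      # rows 0..7, all columns
    1: [(slice(8, None), slice(None))],  # rows 8..15, all columns
}

# Index with each device's slice tuple to get its chunk.
chunk0 = cpu_array[mapping[0][0]]
chunk1 = cpu_array[mapping[1][0]]
assert chunk0.shape == (8, 16) and chunk1.shape == (8, 16)

# Stacking the chunks reconstructs the original array.
assert np.array_equal(np.vstack([chunk0, chunk1]), cpu_array)
```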

Support for Python 3.12

Binary packages are now available for Python 3.12.

🛠️ Changes without compatibility

CUDA Runtime API is now statically linked

CuPy is now shipped with CUDA Runtime statically linked. Due to this, cupy.cuda.runtime.runtimeGetVersion() always returns the version of CUDA Runtime that CuPy is built with, regardless of the version of CUDA Runtime installed locally. If you need to retrieve the version of CUDA Runtime shared library installed locally, use cupy.cuda.get_local_runtime_version() instead.
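The integers returned by runtimeGetVersion() follow CUDA's usual version encoding (major * 1000 + minor * 10), which a small helper can decode. This is an illustrative sketch, not part of the CuPy API:

```python
def decode_cuda_version(v: int) -> str:
    """Decode a CUDA runtime version integer such as 11080.

    CUDA encodes its version as major * 1000 + minor * 10,
    e.g. 11080 -> "11.8" and 12020 -> "12.2".
    """
    return f"{v // 1000}.{(v % 1000) // 10}"
```

For example, if runtimeGetVersion() returns 11080, CuPy was built against CUDA Runtime 11.8 regardless of what is installed locally.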

📝 Changes

New Features

Enhancements

Performance Improvements

Bug Fixes

Code Fixes

Documentation

Installation

Tests

Others

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @andfoy @asi1024 @emcastillo @ev-br @fazledyn-or @kerry-vorticity @kmaehashi @leofang @loganbvh @milesvant @mtsokol @mvnvidia @negin513 @shino16 @takagi

Related links: original page · download (tar) · download (zip)

1. cupy-13.0.0rc1.tar.gz 3.34MB

2. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl 102.12MB

3. cupy_cuda11x-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 90.83MB

4. cupy_cuda11x-13.0.0rc1-cp310-cp310-win_amd64.whl 71.34MB

5. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_aarch64.whl 103.37MB

6. cupy_cuda11x-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 91.35MB

7. cupy_cuda11x-13.0.0rc1-cp311-cp311-win_amd64.whl 71.31MB

8. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_aarch64.whl 102.39MB

9. cupy_cuda11x-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 91.18MB

10. cupy_cuda11x-13.0.0rc1-cp312-cp312-win_amd64.whl 71.25MB

11. cupy_cuda11x-13.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl 102.92MB

12. cupy_cuda11x-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 91.55MB

13. cupy_cuda11x-13.0.0rc1-cp39-cp39-win_amd64.whl 71.49MB

14. cupy_cuda12x-13.0.0rc1-cp310-cp310-manylinux2014_aarch64.whl 95.62MB

15. cupy_cuda12x-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 84.22MB

16. cupy_cuda12x-13.0.0rc1-cp310-cp310-win_amd64.whl 64.67MB

17. cupy_cuda12x-13.0.0rc1-cp311-cp311-manylinux2014_aarch64.whl 96.87MB

18. cupy_cuda12x-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 84.74MB

19. cupy_cuda12x-13.0.0rc1-cp311-cp311-win_amd64.whl 64.64MB

20. cupy_cuda12x-13.0.0rc1-cp312-cp312-manylinux2014_aarch64.whl 95.89MB

21. cupy_cuda12x-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 84.55MB

22. cupy_cuda12x-13.0.0rc1-cp312-cp312-win_amd64.whl 64.58MB

23. cupy_cuda12x-13.0.0rc1-cp39-cp39-manylinux2014_aarch64.whl 96.43MB

24. cupy_cuda12x-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 84.94MB

25. cupy_cuda12x-13.0.0rc1-cp39-cp39-win_amd64.whl 64.82MB

26. cupy_rocm_4_3-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 39.24MB

27. cupy_rocm_4_3-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 39.73MB

28. cupy_rocm_4_3-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 39.49MB

29. cupy_rocm_4_3-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 39.88MB

30. cupy_rocm_5_0-13.0.0rc1-cp310-cp310-manylinux2014_x86_64.whl 57.14MB

31. cupy_rocm_5_0-13.0.0rc1-cp311-cp311-manylinux2014_x86_64.whl 57.64MB

32. cupy_rocm_5_0-13.0.0rc1-cp312-cp312-manylinux2014_x86_64.whl 57.4MB

33. cupy_rocm_5_0-13.0.0rc1-cp39-cp39-manylinux2014_x86_64.whl 57.78MB
