v11.0.0b2

cupy/cupy

Release date: 2022-04-27 15:44:54

Latest cupy/cupy release: v13.3.0 (2024-08-22 15:42:45)

This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

JIT Improvements (#6620, #6640, #6649, #6668)

CuPy JIT has been further enhanced thanks to @leofang and @eternalphane! It is now possible to use CUDA cooperative groups and access .shape and .strides attributes of ndarrays.

import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)

print(kernel.cached_code)

The above program emits the CUDA code as follows:

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
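The loop in the kernel above is a grid-stride loop: each thread starts at its global index `tid` and advances by the total thread count `ntid`. As a rough illustration only, the pure-Python sketch below mimics that index arithmetic on the CPU for the same launch configuration (`kernel[2, 32]`, 200 elements). The helper name `grid_stride_indices` is hypothetical and not part of CuPy.

```python
def grid_stride_indices(tid, size, ntid):
    """Indices that thread `tid` visits in a grid-stride loop."""
    return list(range(tid, size, ntid))

# Launch configuration from the example: kernel[2, 32] -> 2 blocks of 32 threads.
grid_dim, block_dim, size = 2, 32, 200
ntid = grid_dim * block_dim  # 64 threads in total

# First thread of block 0 and last thread of block 1.
first = grid_stride_indices(0, size, ntid)
last = grid_stride_indices(ntid - 1, size, ntid)
print(first)  # [0, 64, 128, 192]
print(last)   # [63, 127, 191]

# Every element is visited exactly once across all threads.
covered = sorted(
    i for t in range(ntid) for i in grid_stride_indices(t, size, ntid)
)
assert covered == list(range(size))
```

Because 200 is not a multiple of 64, the first eight threads copy four elements each and the remaining threads copy three.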

Initial MPI and sparse matrix support in cupyx.distributed (#6628, #6658)

CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently only send/recv primitives are supported but we will be adding support for collective calls in the following releases.

Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server that is otherwise used to exchange CPU-side values. We also recommend enabling MPI when communicating sparse matrices, because each communication call must exchange metadata, which leads to device synchronization if MPI is not enabled.

# run with mpiexec -n N python …

from mpi4py import MPI  # `import mpi4py` alone does not load the MPI submodule
import cupyx.distributed

mpi_comm = MPI.COMM_WORLD
workers = mpi_comm.Get_size()
rank = mpi_comm.Get_rank()

comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)
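To see why sparse communication needs per-call metadata, recall that the buffer sizes of a sparse matrix depend on its number of stored elements, not only on its shape. The pure-Python sketch below assumes a CSR layout and uses hypothetical helpers (`dense_to_csr`, `describe`). It is not CuPy's actual wire format, only an illustration of the information a receiver would plausibly need before it can allocate buffers.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def describe(rows):
    """Hypothetical metadata a receiver would need before allocating buffers."""
    data, indices, indptr = dense_to_csr(rows)
    return {
        "shape": (len(rows), len(rows[0])),
        "nnz": len(data),
        "indptr_len": len(indptr),
    }


# Two matrices with the same shape but different numbers of stored elements.
a = [[1, 0, 0], [0, 0, 2], [0, 3, 4]]
b = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
print(describe(a))  # {'shape': (3, 3), 'nnz': 4, 'indptr_len': 4}
print(describe(b))  # {'shape': (3, 3), 'nnz': 1, 'indptr_len': 4}
```

Both matrices are 3x3, yet the `data` and `indices` buffers differ in length, so the shape alone does not tell the receiver how much memory to reserve.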

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta package allows other libraries to declare a dependency on CuPy while transparently installing the CuPy binary wheel that matches the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.

This feature is currently experimental and subject to change, so for now we recommend that users not distribute packages relying on it. Your suggestions and comments are very welcome (please visit #6688).
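After installing through the meta package, it can be useful to check which concrete CuPy distribution ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). It assumes the resolved wheel is registered as a separate distribution whose name starts with `cupy`, and the helper name `installed_cupy_distributions` is hypothetical.

```python
from importlib import metadata


def installed_cupy_distributions():
    """Return sorted (name, version) pairs for distributions named 'cupy*'."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip broken installs that carry no name in their metadata.
        if name and name.lower().startswith("cupy"):
            found.add((name, dist.version))
    return sorted(found)


for name, version in installed_cupy_distributions():
    print(f"{name}=={version}")
```

On a machine without CuPy the loop prints nothing. Otherwise it lists every matching distribution together with its version.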

Changes

New Features

Enhancements

Performance Improvements

Bug Fixes

Documentation

Installation

Tests

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar

Related links: original page, download (tar), download (zip)

1、 cupy_cuda102-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 60.59MB

2、 cupy_cuda102-11.0.0b2-cp310-cp310-manylinux2014_aarch64.whl 34.84MB

3、 cupy_cuda102-11.0.0b2-cp310-cp310-win_amd64.whl 42.51MB

4、 cupy_cuda102-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 59.06MB

5、 cupy_cuda102-11.0.0b2-cp37-cp37m-manylinux2014_aarch64.whl 33.14MB

6、 cupy_cuda102-11.0.0b2-cp37-cp37m-win_amd64.whl 42.42MB

7、 cupy_cuda102-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 62.25MB

8、 cupy_cuda102-11.0.0b2-cp38-cp38-manylinux2014_aarch64.whl 36.29MB

9、 cupy_cuda102-11.0.0b2-cp38-cp38-win_amd64.whl 42.51MB

10、 cupy_cuda102-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 60.51MB

11、 cupy_cuda102-11.0.0b2-cp39-cp39-manylinux2014_aarch64.whl 34.79MB

12、 cupy_cuda102-11.0.0b2-cp39-cp39-win_amd64.whl 42.51MB

13、 cupy_cuda110-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 75.21MB

14、 cupy_cuda110-11.0.0b2-cp310-cp310-win_amd64.whl 57.09MB

15、 cupy_cuda110-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 73.68MB

16、 cupy_cuda110-11.0.0b2-cp37-cp37m-win_amd64.whl 57MB

17、 cupy_cuda110-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 76.87MB

18、 cupy_cuda110-11.0.0b2-cp38-cp38-win_amd64.whl 57.09MB

19、 cupy_cuda110-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 75.14MB

20、 cupy_cuda110-11.0.0b2-cp39-cp39-win_amd64.whl 57.09MB

21、 cupy_cuda111-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 94MB

22、 cupy_cuda111-11.0.0b2-cp310-cp310-win_amd64.whl 76.83MB

23、 cupy_cuda111-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 92.46MB

24、 cupy_cuda111-11.0.0b2-cp37-cp37m-win_amd64.whl 76.74MB

25、 cupy_cuda111-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 95.66MB

26、 cupy_cuda111-11.0.0b2-cp38-cp38-win_amd64.whl 76.84MB

27、 cupy_cuda111-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 93.92MB

28、 cupy_cuda111-11.0.0b2-cp39-cp39-win_amd64.whl 76.83MB

29、 cupy_cuda112-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 75.63MB

30、 cupy_cuda112-11.0.0b2-cp310-cp310-win_amd64.whl 57.58MB

31、 cupy_cuda112-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 74.09MB

32、 cupy_cuda112-11.0.0b2-cp37-cp37m-win_amd64.whl 57.49MB

33、 cupy_cuda112-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 77.29MB

34、 cupy_cuda112-11.0.0b2-cp38-cp38-win_amd64.whl 57.58MB

35、 cupy_cuda112-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 75.55MB

36、 cupy_cuda112-11.0.0b2-cp39-cp39-win_amd64.whl 57.58MB

37、 cupy_cuda113-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 72.8MB

38、 cupy_cuda113-11.0.0b2-cp310-cp310-win_amd64.whl 54.31MB

39、 cupy_cuda113-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 71.27MB

40、 cupy_cuda113-11.0.0b2-cp37-cp37m-win_amd64.whl 54.22MB

41、 cupy_cuda113-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 74.47MB

42、 cupy_cuda113-11.0.0b2-cp38-cp38-win_amd64.whl 54.31MB

43、 cupy_cuda113-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 72.73MB

44、 cupy_cuda113-11.0.0b2-cp39-cp39-win_amd64.whl 54.31MB

45、 cupy_cuda114-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 81.28MB

46、 cupy_cuda114-11.0.0b2-cp310-cp310-win_amd64.whl 63MB

47、 cupy_cuda114-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 79.75MB

48、 cupy_cuda114-11.0.0b2-cp37-cp37m-win_amd64.whl 62.91MB

49、 cupy_cuda114-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 82.94MB

50、 cupy_cuda114-11.0.0b2-cp38-cp38-win_amd64.whl 63.01MB

51、 cupy_cuda114-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 81.2MB

52、 cupy_cuda114-11.0.0b2-cp39-cp39-win_amd64.whl 63MB

53、 cupy_cuda115-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 78MB

54、 cupy_cuda115-11.0.0b2-cp310-cp310-win_amd64.whl 59.68MB

55、 cupy_cuda115-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 76.46MB

56、 cupy_cuda115-11.0.0b2-cp37-cp37m-win_amd64.whl 59.59MB

57、 cupy_cuda115-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 79.66MB

58、 cupy_cuda115-11.0.0b2-cp38-cp38-win_amd64.whl 59.69MB

59、 cupy_cuda115-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 77.92MB

60、 cupy_cuda115-11.0.0b2-cp39-cp39-win_amd64.whl 59.68MB

61、 cupy_cuda116-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 78.04MB

62、 cupy_cuda116-11.0.0b2-cp310-cp310-win_amd64.whl 59.7MB

63、 cupy_cuda116-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 76.5MB

64、 cupy_cuda116-11.0.0b2-cp37-cp37m-win_amd64.whl 59.61MB

65、 cupy_cuda116-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 79.7MB

66、 cupy_cuda116-11.0.0b2-cp38-cp38-win_amd64.whl 59.71MB

67、 cupy_cuda116-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 77.96MB

68、 cupy_cuda116-11.0.0b2-cp39-cp39-win_amd64.whl 59.7MB

69、 cupy_rocm_4_2-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 34.64MB

70、 cupy_rocm_4_2-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 33.31MB

71、 cupy_rocm_4_2-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 36.11MB

72、 cupy_rocm_4_2-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 34.56MB

73、 cupy_rocm_4_3-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 36.22MB

74、 cupy_rocm_4_3-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 34.9MB

75、 cupy_rocm_4_3-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 37.7MB

76、 cupy_rocm_4_3-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 36.15MB

77、 cupy_rocm_5_0-11.0.0b2-cp310-cp310-manylinux1_x86_64.whl 54.29MB

78、 cupy_rocm_5_0-11.0.0b2-cp37-cp37m-manylinux1_x86_64.whl 52.96MB

79、 cupy_rocm_5_0-11.0.0b2-cp38-cp38-manylinux1_x86_64.whl 55.77MB

80、 cupy_rocm_5_0-11.0.0b2-cp39-cp39-manylinux1_x86_64.whl 54.22MB
