v2.4.2
Release time: 2023-05-08 16:33:39
Latest PaddlePaddle/Paddle release: v3.0.0-beta0 (2024-06-27 18:00:34)
2.4.2 Release Note
Version 2.4.2 fixes known issues and adds a small number of features.
Training Framework (distributed included)
- Fixed an error raised when paddle.utils.dlpack.to_dlpack is called repeatedly in a for loop to create dlpack objects, and fixed a reference-counting error that caused the memory a dlpack object points to to be destructed unexpectedly. #50138
- Fixed out-of-bounds memory access in the paddle.multiplex API when the input tensors are multi-dimensional, and added a check mechanism. #49368
- Introduced cutlass and implemented a fused gather+gemm+scatter kernel; optimized the training and inference performance of sparse conv; optimized the inference performance of batch_norm on 1D input data. #50118
- Fixed a compilation failure in the gcc54 environment caused by the use of constexpr. #50421
- Moved the sum op kernel to the PHI operator library, and fixed a bug where infermeta could not obtain the correct dims of SelectedRows. #49342
- Fixed an occasional compilation error caused by an incorrect reference to an Eigen header file. #48157
- Fixed out-of-bounds memory access in the fold operator with large batch size (bs) inputs. #49491
- Added a type check to fix a pipeline-parallel hang caused by inconsistent dimensions when sending tensors. #50337
- Fixed a bug where the output of the backward operator could be None when the output gradient parameters of a custom operator are not in continuous order. #48656
- Fixed a bug in the paddle.squeeze_ API where the shape was modified repeatedly during the inplace operation, producing wrong results. #49903
- Fixed the issue that a Layer without parameters could not call backward in dynamic-to-static mode. #49812
- Fixed a compilation issue with CUDA 11.8 on Windows. #50205
- Fixed the unsupported error for FusedDropoutActBiasGrad on H100. #47285
- Added the debug_graphviz_path option to build_strategy. #46531
- Fixed an unclosed popen object. #47053
Deployment Direction (Paddle Inference)
- Improved the functionality and stability of mixed-precision inference. Refactored the underlying implementation of the two-stage convert_to_mixed_precision interface, and added a precision parameter to enable_use_gpu to support the one-stage approach. #49077, #49239, #49477
- Added support for compilation on the Jetson Ampere architecture. #49364
- Fixed a precision issue in the fc kernel in low-precision mode. #49781
- Fixed the wrong parameter type of the trt workspace under CAPI. #48350
- Fixed an inference error that occurred because arg_max and arg_min in Paddle 1.x models lack the flatten and dtype parameters. #49771
- Fixed missing lod logic information after the split infermeta refactoring. #49745
- Fixed an incorrect setting in the constant-folding pass that left conv2d weights non-persistable after folding, so that they did not enter the TensorRT engine. #50105
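To illustrate the kind of index validation that guards against the out-of-bounds access described for paddle.multiplex (#49368), here is a minimal pure-Python sketch of the row-selection semantics: row i of the output is row i of the candidate chosen by index[i]. This is not Paddle's implementation. The name multiplex_reference, its plain-list inputs, and its error messages are hypothetical and exist only for this example.

```python
from typing import List, Sequence


def multiplex_reference(inputs: Sequence[list], index: Sequence[int]) -> List[list]:
    """Hypothetical reference: output row i is row i of inputs[index[i]].

    `inputs` is a sequence of candidates, each a list of rows (a row may
    itself be nested, i.e. multi-dimensional). `index` holds one candidate
    id per output row.
    """
    if not inputs:
        raise ValueError("inputs must contain at least one candidate")

    num_rows = len(inputs[0])
    for candidate in inputs:
        if len(candidate) != num_rows:
            raise ValueError("all candidates must have the same number of rows")

    if len(index) > num_rows:
        raise ValueError("index has more entries than the candidates have rows")

    out = []
    for i, k in enumerate(index):
        # Validate before reading, so a bad index fails loudly instead of
        # touching memory that does not belong to any candidate.
        if not 0 <= k < len(inputs):
            raise IndexError(f"index[{i}]={k} is outside [0, {len(inputs)})")
        out.append(list(inputs[k][i]))
    return out


if __name__ == "__main__":
    img1 = [[1, 2], [3, 4]]
    img2 = [[5, 6], [7, 8]]
    print(multiplex_reference([img1, img2], [1, 0]))  # [[5, 6], [3, 4]]
```

Checking every index entry before the read is the design point: the cost is one comparison per row, and an invalid index becomes an explicit error rather than undefined behavior.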