v10.0.0
Release date: 2024-04-04 05:45:30
Key Features and Updates:
- Samples changes
- Parser changes
  - Added a new class, IParserRefitter, that can be used to refit a TensorRT engine with the weights of an ONNX model (a usage sketch follows this list).
  - kNATIVE_INSTANCENORM is now set to ON by default (a sketch for reverting this also follows the list).
  - Added support for IPluginV3 interfaces from TensorRT.
  - Added support for INT4 quantization.
  - Added support for the reduction attribute in ScatterElements.
  - Added support for the wrap padding mode in Pad.
- Plugin changes
  - Added a new plugin compliant with the ONNX ScatterElements operator.
  - The TensorRT plugin library no longer has a load-time link dependency on the cuBLAS or cuDNN libraries.
  - All plugins that previously relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() now use cuBLAS/cuDNN resources initialized by the plugin library itself, which dynamically loads the required cuBLAS/cuDNN library. Plugins that independently initialized their own cuBLAS/cuDNN resources have likewise moved to dynamic loading. If the respective library cannot be found on the library search path(s), these plugins will not work.
  - bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
  - reorgPlugin: Added version 2, which implements IPluginV2DynamicExt.
  - disentangledAttentionPlugin: Fixed a kernel bug.
- Demo changes
  - HuggingFace demos have been removed. Users accelerating Large Language Model inference with TensorRT should use TensorRT-LLM instead.
- Updated tooling
  - Polygraphy v0.49.9
  - ONNX-GraphSurgeon v0.5.1
  - TensorRT Engine Explorer v0.1.8
- Build Containers
  - RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding build container has been removed from TensorRT-OSS.
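Referenced from the Parser changes above, the following is a minimal C++ sketch of refitting an existing engine with updated ONNX weights through the new IParserRefitter class. It assumes an engine that was previously built from the same ONNX model and a user-supplied nvinfer1::ILogger; error handling is omitted, and the helper name `refitEngineFromOnnx` and the file paths are illustrative, not part of the release.

```cpp
// Sketch: refit a deserialized engine with weights read from an ONNX model.
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

#include "NvInfer.h"
#include "NvOnnxParser.h"

bool refitEngineFromOnnx(nvinfer1::ILogger& logger,
                         const char* enginePath,
                         const char* onnxPath)
{
    // Deserialize the previously built engine.
    std::ifstream engineFile(enginePath, std::ios::binary);
    std::vector<char> engineData((std::istreambuf_iterator<char>(engineFile)),
                                 std::istreambuf_iterator<char>());

    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(engineData.data(), engineData.size()));

    // Create a refitter for the engine and wrap it in an ONNX parser refitter.
    auto refitter = std::unique_ptr<nvinfer1::IRefitter>(
        nvinfer1::createInferRefitter(*engine, logger));
    auto parserRefitter = std::unique_ptr<nvonnxparser::IParserRefitter>(
        nvonnxparser::createParserRefitter(*refitter, logger));

    // Read the weights from the ONNX model and apply them to the engine.
    if (!parserRefitter->refitFromFile(onnxPath))
    {
        return false;
    }
    return refitter->refitCudaEngine();
}
```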
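Also noted above, kNATIVE_INSTANCENORM is now ON by default, so InstanceNormalization nodes are imported as native TensorRT layers rather than through the plugin. The sketch below shows one way to restore the previous plugin-based import by clearing the flag; it assumes the existing IParser::clearFlag/OnnxParserFlag API, and `makeParser` is an illustrative helper, not a TensorRT function.

```cpp
// Sketch: opt out of the new default and use the plugin-based
// InstanceNormalization import when parsing an ONNX model.
#include <memory>

#include "NvInfer.h"
#include "NvOnnxParser.h"

std::unique_ptr<nvonnxparser::IParser> makeParser(nvinfer1::INetworkDefinition& network,
                                                  nvinfer1::ILogger& logger)
{
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(network, logger));
    // Revert to the pre-10.0 behavior (plugin implementation of InstanceNormalization).
    parser->clearFlag(nvonnxparser::OnnxParserFlag::kNATIVE_INSTANCENORM);
    return parser;
}
```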