8.2.0-EA
Release date: 2021-10-06 03:03:02
Latest NVIDIA/TensorRT release: v10.3.0 (2024-08-09 07:23:49)
TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.
Added
- Demo applications showcasing TensorRT inference of HuggingFace Transformers.
  - Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators:
  - Einsum
  - IsNan
  - GatherND
  - Scatter
  - ScatterElements
  - ScatterND
  - Sign
  - Round
- Added support for building TensorRT Python API on Windows.
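
To make one of the newly supported operators concrete, here is a minimal pure-Python sketch of what ONNX GatherND computes when `batch_dims` is 0. It is a reference illustration over nested lists, not TensorRT's or the ONNX parser's implementation; the function name `gather_nd` is made up for this example.

```python
def gather_nd(data, indices):
    """Reference GatherND (batch_dims=0) over nested Python lists.

    The innermost lists of `indices` are index tuples into `data`;
    the leading dimensions of `indices` shape the output.
    """
    # Base case: a flat list of ints is one index tuple into `data`.
    if indices and all(isinstance(i, int) for i in indices):
        result = data
        for i in indices:
            result = result[i]
        return result
    # Otherwise recurse over the leading dimensions of `indices`.
    return [gather_nd(data, sub) for sub in indices]


data = [[0, 1], [2, 3]]
print(gather_nd(data, [[0, 0], [1, 1]]))  # full index tuples select scalars: [0, 3]
print(gather_nd(data, [[1], [0]]))        # partial index tuples select rows: [[2, 3], [0, 1]]
```

A full index tuple picks a single element, while a shorter tuple picks a whole sub-tensor, which is why the second call returns rows.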
Updated
- Notable API updates in the TensorRT 8.2.0.6 EA release. See the TensorRT Developer Guide for details.
  - Added three new IExecutionContext APIs, getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler(), which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
  - Eliminated the global logger; each Runtime, Builder or Refitter now has its own logger.
  - Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
  - Added new IGatherLayer modes: kELEMENT and kND.
  - Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT.
  - Added new IUnaryLayer operators: kSIGN and kROUND.
  - Added new runtime class IEngineInspector that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
  - ProfilingVerbosity enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to CUDA 11.4.
- Updated CMake to target C++14 builds.
- Updated the following ONNX operators:
  - Gather and GatherElements implementations to natively support negative indices.
  - Pad layer to support ND padding, along with edge and reflect padding mode support.
  - If layer with general performance improvements.
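
The fill, clamp, and reflect behaviours named in the slice modes above, like the edge and reflect padding modes of Pad, all answer the same question: what value to return when a read falls outside the tensor. The following is a rough 1-D sketch of the behaviour those names suggest, written in plain Python. The mode strings and the helper names `read_with_mode` and `padded` are assumptions made for this illustration and are not TensorRT APIs; consult the TensorRT Developer Guide for the exact semantics.

```python
def read_with_mode(values, index, mode, fill_value=0):
    """Read values[index], resolving out-of-bounds reads according to `mode`."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot read from an empty sequence")
    if 0 <= index < n:
        return values[index]
    if mode == "fill":
        # Out-of-bounds reads return a constant.
        return fill_value
    if mode == "clamp":
        # Out-of-bounds reads repeat the nearest edge element ("edge" padding).
        return values[min(max(index, 0), n - 1)]
    if mode == "reflect":
        # Mirror around the edge elements without repeating them.
        if n == 1:
            return values[0]
        period = 2 * (n - 1)
        pos = index % period
        if pos >= n:
            pos = period - pos
        return values[pos]
    raise ValueError(f"unknown mode: {mode}")


def padded(values, before, after, mode, fill_value=0):
    """Pad a 1-D list by reading past both ends with the given mode."""
    return [
        read_with_mode(values, i, mode, fill_value)
        for i in range(-before, len(values) + after)
    ]


print(padded([1, 2, 3], 2, 2, "fill"))     # [0, 0, 1, 2, 3, 0, 0]
print(padded([1, 2, 3], 2, 2, "clamp"))    # [1, 1, 1, 2, 3, 3, 3]
print(padded([1, 2, 3], 2, 2, "reflect"))  # [3, 2, 1, 2, 3, 2, 1]
```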
Removed
- Removed sampleMLP.
- Several flags of trtexec have been deprecated:
  - The --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
  - The --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
  - --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.
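
For scripts that still pass the deprecated trtexec flags, a small migration helper along these lines may be useful. This is only a sketch: `migrate_trtexec_args` is a hypothetical helper, not part of TensorRT, and the value mapping (verbose to detailed, default to layer_names_only, none to none) is an assumption based on the order in which the options are listed above.

```python
# Assumed mapping from old --nvtxMode values to --profilingVerbosity values.
NVTX_TO_PROFILING = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags the notes above describe as deprecated and having no effect.
NO_EFFECT_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def migrate_trtexec_args(args):
    """Return a copy of a trtexec argument list without the deprecated flags."""
    migrated = []
    for arg in args:
        if arg in NO_EFFECT_FLAGS:
            continue  # drop flags that no longer do anything
        if arg.startswith("--nvtxMode="):
            old_value = arg.split("=", 1)[1]
            new_value = NVTX_TO_PROFILING.get(old_value)
            if new_value is not None:
                migrated.append("--profilingVerbosity=" + new_value)
                continue
        migrated.append(arg)  # keep everything else (and unknown values) unchanged
    return migrated


old_args = ["trtexec", "--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]
print(migrate_trtexec_args(old_args))
# ['trtexec', '--onnx=model.onnx', '--profilingVerbosity=detailed']
```

Unrecognized --nvtxMode values are passed through untouched so that trtexec itself can report them.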
Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>