v1.4.0

Release date: 2020-07-18 04:24:43

Latest microsoft/onnxruntime release: v1.19.2 (2024-09-05 03:33:14)

Key Updates

Performance optimizations for Transformer models
- GPT2 - Enable optimizations for Attention with Past State and Attention Mask
- BERT - Improve EmbedLayerNormalization fusion coverage
Quantization updates
- Added new quantization operators: QLinearAdd, QAttention
- Improved quantization performance for transformer based models on CPU
  - More graph fusion
  - Further optimization in MLAS kernel
  - Introduced pre-packing for constant Matrix B of DynamicQuantizeMatMul and QAttention
New Python IOBinding APIs (bind_cpu_input, bind_output, copy_outputs_to_cpu) allow easier benchmarking
- Users no longer need to allocate inputs and outputs on non-CPU devices using third-party allocators.
- Users no longer need to copy inputs to non-CPU devices; ORT handles the copy.
- Users can now use copy_outputs_to_cpu to copy outputs from non-CPU devices to CPU for verification.
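
The IOBinding flow described above can be sketched roughly as follows. This is a minimal illustration, not an official sample: the helper name `run_with_io_binding` is hypothetical, the session is assumed to be an `onnxruntime.InferenceSession` created elsewhere, and the exact method signatures may differ between ORT versions.

```python
def run_with_io_binding(session, feeds):
    """Run a session through IOBinding and return the outputs on the CPU.

    session: assumed to be an onnxruntime.InferenceSession built elsewhere.
    feeds:   dict mapping input names to NumPy arrays that live on the CPU.
    """
    binding = session.io_binding()

    # Bind CPU-resident inputs; ORT copies them to the target device if needed.
    for name, array in feeds.items():
        binding.bind_cpu_input(name, array)

    # Bind every model output by name and let ORT allocate the buffers.
    for output in session.get_outputs():
        binding.bind_output(output.name)

    session.run_with_iobinding(binding)

    # Copy the results back to the CPU, e.g. to verify them against a baseline.
    return binding.copy_outputs_to_cpu()


# Hypothetical usage (model path and input name are placeholders):
#   import numpy as np, onnxruntime
#   sess = onnxruntime.InferenceSession("model.onnx")
#   outputs = run_with_io_binding(sess, {"input": np.zeros((1, 3), np.float32)})
```

Keeping the binding steps in one helper makes it easier to time only the `run_with_iobinding` call when benchmarking.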
CUDA support for Einsum (opset12)
ONNX Runtime Training updates
- Opset 12 support
- New sample for training experiment using Huggingface GPT-2.
  - Upgraded docker image built from the latest PyTorch release
Telemetry is now enabled by default for Python packages and GitHub release zip files (C API); see more details on what telemetry is collected in ORT and how
[Coming soon] Availability of Python package for ONNX Runtime 1.4 for Jetpack 4.4

New Execution Providers available for preview:

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members: