v1.13
Release date: 2022-07-27 17:04:41
Latest intel/neural-compressor release: v2.6 (2024-06-14 21:55:11)
Features
Quantization
- Support new quantization APIs for Intel TensorFlow
- Support FakeQuant (QDQ) quantization format for ITEX
- Improve INT8 quantization recipes for ONNX Runtime
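The QDQ idea behind FakeQuant can be sketched in plain Python. This is a minimal illustration of the concept only, not the ITEX or neural_compressor API; all names below are hypothetical:

```python
def fake_quant(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize to INT8 and immediately dequantize (a QDQ node pair),
    so the float graph 'sees' quantization error during calibration."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point     # quantize (round half to even)
        q = max(qmin, min(qmax, q))           # clamp to the INT8 range
        out.append((q - zero_point) * scale)  # dequantize back to float
    return out

# A value outside the representable range saturates at the clamp:
# with scale=0.5, 100.0 quantizes to 200, clamps to 127, dequantizes to 63.5
```

Inserting such quantize-dequantize pairs around weights and activations is what lets a framework simulate INT8 behavior while the graph still runs in float.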
Mixed Precision
- Enhance mixed precision interface to support BF16 (FP16) mixed with FP32
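BF16 keeps FP32's exponent range but truncates the mantissa to 7 bits. A minimal sketch of what "BF16 mixed with FP32" means numerically (pure Python, illustrative only, not the actual mixed precision interface):

```python
import struct

def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even
    on the upper 16 bits of its float32 bit pattern)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def mixed_dot(a, b):
    """Mixed precision pattern: BF16-rounded inputs, FP32-style accumulation."""
    return sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b))
```

For example, `to_bf16(3.14159)` rounds to `3.140625`, the nearest value with a 7-bit mantissa; keeping the accumulation in full precision is what preserves accuracy in mixed precision execution.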
Neural Architecture Search
- Support SuperNet-based neural architecture search (DyNAS)
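SuperNet-based search evaluates subnets sampled from a trained supernet. A toy sketch of the sample-and-score loop follows; this is plain random search with a made-up proxy objective, not DyNAS's actual multi-objective algorithm, and every name and number here is hypothetical:

```python
import random

# Hypothetical search space: each subnet picks a depth and width.
SEARCH_SPACE = {"depth": [2, 3, 4], "width": [32, 64, 128]}

def sample_subnet(rng):
    """Draw one subnet configuration from the supernet's search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_score(cfg):
    # Made-up objective: larger subnets gain "accuracy" but pay latency.
    accuracy = cfg["depth"] * 1.0 + cfg["width"] / 64.0
    latency = cfg["depth"] * cfg["width"] / 100.0
    return accuracy - 0.5 * latency

def search(trials=50, seed=0):
    """Sample subnets and keep the best-scoring one."""
    rng = random.Random(seed)
    return max((sample_subnet(rng) for _ in range(trials)), key=proxy_score)
```

The real DyNAS flow replaces the proxy score with measured accuracy/latency and the random sampler with an evolutionary search, but the loop structure is the same.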
Sparsity
- Support training for block-wise structured sparsity
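Block-wise structured sparsity zeroes contiguous blocks of a weight matrix rather than individual elements, producing patterns hardware can exploit. A minimal magnitude-based sketch, assuming a simple lowest-L1-norm criterion (illustrative only, not the trainer's actual pruning rule):

```python
def prune_blocks(weights, block=2, ratio=0.5):
    """Zero the `ratio` fraction of `block`x`block` tiles with the
    smallest L1 norm in a 2D list of weights."""
    rows, cols = len(weights), len(weights[0])
    tiles = []
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            norm = sum(abs(weights[r][c])
                       for r in range(i, min(i + block, rows))
                       for c in range(j, min(j + block, cols)))
            tiles.append((norm, i, j))
    tiles.sort()                          # ascending by block L1 norm
    for _, i, j in tiles[:int(len(tiles) * ratio)]:
        for r in range(i, min(i + block, rows)):
            for c in range(j, min(j + block, cols)):
                weights[r][c] = 0.0       # drop the whole block
    return weights
```

During sparse training this masking step would be reapplied each iteration so the surviving blocks adapt while the pruned pattern stays structured.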
Strategy
- Support operator-type based tuning strategy
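An operator-type based strategy tunes one quantization choice per op type (e.g. all Conv ops share a config) rather than per op instance, which sharply shrinks the search space. A hypothetical sketch of the loop; the op types, config names, and evaluator below are stand-ins, not neural_compressor's strategy API:

```python
import itertools

# Hypothetical per-op-type choices: every op of a type shares one config.
OP_TYPE_OPTIONS = {
    "Conv": ["int8_per_channel", "int8_per_tensor", "fp32"],
    "MatMul": ["int8_per_tensor", "fp32"],
}

def tune(evaluate, baseline, tolerance=0.01):
    """Try per-op-type config combinations in order; return the first
    whose measured accuracy stays within `tolerance` of `baseline`."""
    op_types = list(OP_TYPE_OPTIONS)
    for choice in itertools.product(*(OP_TYPE_OPTIONS[t] for t in op_types)):
        cfg = dict(zip(op_types, choice))
        if evaluate(cfg) >= baseline - tolerance:
            return cfg
    return None
```

With N op types and k choices each, this explores at most k^N combinations instead of one per operator instance, which is what makes type-level tuning tractable on large models.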
Productivity
- Support light (default) and full binary packages (light package size 0.5MB, full package size 2MB)
- Add experimental accuracy diagnostic feature for INT8 quantization, including tensor statistics visualization and fine-grained precision setting
- Add experimental one-click BF16/INT8 low-precision enabling and inference optimization, an industry-first code-free solution
Ecosystem
- Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
- Upstream 10 quantized Transformers-based models to HuggingFace Model Hub
Examples
- Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
- Add more quantization examples from TensorFlow Model Zoo
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3 & Ubuntu 18.04 & Windows 10
- TensorFlow 2.7, 2.8, 2.9
- Intel TensorFlow 2.7, 2.8, 2.9
- PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
- IPEX 1.10.0, 1.11.0, 1.12.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.9.0, 1.10.0, 1.11.0