v0.4.4
Release date: 2022-10-20 20:09:59
Latest release of nebuly-ai/optimate: chatllama0.0.4 (2023-03-27 21:47:20)
nebullvm 0.4.4 Release Notes
This release of Nebullvm provides new optimizers and various improvements in code stability.
New Features
- Update notebooks to use the new API.
- Improve test coverage.
- Add Intel Neural Compressor pruning and quantization.
- Model latency is now computed over all the data rather than only the first sample.
- Dynamic shape handling for OpenVINO has been updated to use the new method available from version 2
- The optimized model is now discarded if its result differs from that of the original model (metric_drop_ths=0).
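
The latency and discard changes above can be illustrated with a minimal sketch. It is not nebullvm's internal code: the function names (average_latency, metric_drop, keep_if_within_threshold) and the choice of maximum absolute difference as the metric are assumptions made for illustration. Only the parameter name metric_drop_ths comes from these notes.

```python
import time
from statistics import mean
from typing import Any, Callable, Optional, Sequence

Model = Callable[[Any], float]


def average_latency(model: Model, samples: Sequence[Any]) -> float:
    """Mean wall-clock seconds per call, timed over every sample (not just the first)."""
    if not samples:
        raise ValueError("at least one sample is required")
    timings = []
    for sample in samples:
        start = time.perf_counter()
        model(sample)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def metric_drop(original: Model, optimized: Model, samples: Sequence[Any]) -> float:
    """Largest absolute output difference over all samples (hypothetical metric)."""
    return max(abs(original(s) - optimized(s)) for s in samples)


def keep_if_within_threshold(
    original: Model,
    optimized: Model,
    samples: Sequence[Any],
    metric_drop_ths: float = 0.0,
) -> Optional[Model]:
    """Return the optimized model, or None when its outputs exceed the allowed drop."""
    if metric_drop(original, optimized, samples) > metric_drop_ths:
        return None  # discarded: results differ from the original model
    return optimized


# Toy stand-ins for real models (assumptions for the example only).
def original_model(x: float) -> float:
    return x * 2.0


def exact_model(x: float) -> float:
    return x + x  # same results as the original


def lossy_model(x: float) -> float:
    return x * 2.0 + 0.5  # drifts from the original


samples = [1.0, 2.0, 3.0]
kept = keep_if_within_threshold(original_model, exact_model, samples)
dropped = keep_if_within_threshold(original_model, lossy_model, samples)
latency = average_latency(exact_model, samples)
```

With a threshold of 0, any deviation from the original outputs causes the candidate to be rejected, so here kept is exact_model and dropped is None. A larger metric_drop_ths would trade accuracy for a wider choice of optimized models.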
Bug Fixes
- Fix an issue in ONNX quantization; it is now much faster than before.
- Fix a TensorRT bug in static quantization with the ONNX interface.
- Fixes and improvements to the TorchScript compiler: it now also supports trace and torch.fx for tracing the model.
- Fix a bug on macOS related to ONNX and int8 quantization.
- Fix a bug in SparseML that prevented it from working on Colab.
- Bug fixes in the DeepSparse compiler.
- Fixes and improvements to the internal ONNX model handling.
- Fix an issue in the TensorFlow backend.
- Fixes for Torch and ONNX TensorRT with transformers.
- Fix a bug in TensorRT static quantization when using a new version of polygraphy.
- Fix a bug with Hugging Face models when passing the tokenizer to the optimize_model function.
- Fix a bug when using quantization with only a few data samples.
Contributors
- Diego Fiori (@morgoth95)
- Valerio Sofi (@valeriosofi)