v0.1.0b2
Release date: 2023-12-21 21:11:05
Latest release of huggingface/optimum-nvidia: v0.1.0b8 (2024-09-17 21:09:22)
This release focuses on improving the previous one with additional test coverage, bug fixes, and usability improvements.
## TensorRT-LLM
- Updated TensorRT-LLM to commit f7eca56161d496cbd28e8e7689dbd90003594bd2
## Improvements
- Generally improved unit test coverage
- Initial documentation and updated build instructions
- The prebuilt container now supports the Volta and Turing (experimental) architectures, targeting V100 and T4 GPUs
- More in-depth usage of the TensorRT-LLM runtime Python/C++ bindings
## Bug Fixes
- Fixed an issue where the pipeline returned only the first output when provided with a batch
- Fixed an issue where `bfloat16` conversion did not load weights in the right format for the TRT engine builder
- Fixed an issue in non multi-head attention setups (e.g. grouped-query attention) where the key/value heads were not replicated by the proper factor
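The head-replication fix above concerns setups with fewer key/value heads than query heads (grouped-query or multi-query attention), where each KV head must be repeated `num_attention_heads // num_kv_heads` times. A minimal illustrative sketch of that replication (not the library's actual code; names are hypothetical):

```python
# Illustrative sketch of key/value head replication for non-MHA setups
# (grouped-query / multi-query attention). Plain lists stand in for tensors.

def repeat_kv_heads(kv_heads, num_attention_heads):
    """Repeat each key/value head so the count matches the query heads."""
    num_kv_heads = len(kv_heads)
    assert num_attention_heads % num_kv_heads == 0, "heads must divide evenly"
    factor = num_attention_heads // num_kv_heads  # the replication factor
    # Each KV head is repeated `factor` times, preserving head order.
    return [head for head in kv_heads for _ in range(factor)]

# 2 KV heads serving 8 query heads -> each KV head repeated 4 times.
expanded = repeat_kv_heads([["k0"], ["k1"]], 8)
print(expanded)  # [['k0'], ['k0'], ['k0'], ['k0'], ['k1'], ['k1'], ['k1'], ['k1']]
```

The bug described above corresponds to using a replication factor of 1 (or none at all) in this step, so the key/value tensors no longer lined up with the query heads.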
## Engine Builder Changes
- The RMSNorm plugin is being deprecated by NVIDIA for performance reasons, so we no longer attempt to enable it
## Model Support
- The Mistral family of models should theoretically work, but it is not yet extensively tested through our CI/CD. We plan to add official support in the next release.
## What's Changed
- bump trt llm version to 0.6.1 by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/27
- Fix issue returning only the first batch item after pipeline call. by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/29
- Update README.md by @eltociear in https://github.com/huggingface/optimum-nvidia/pull/31
- Missing comma in setup.py by @IlyasMoutawwakil in https://github.com/huggingface/optimum-nvidia/pull/19
- Quality by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/30
- Fix typo by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/40
- Update to latest trtllm f7eca56161d496cbd28e8e7689dbd90003594bd2 by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/41
- Enable more SM architectures in the prebuild docker by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/35
- Add initial set of documentation to build the `optimum-nvidia` container by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/39
- Fix caching for docker by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/15
- Initial set of unittest in CI by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/43
- Build from source instructions by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/38
- Enable testing on GPUs by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/45
- Enable HF Transfer in tests by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/51
- Let's make sure to use the repeated heads tensor when in a non-mha scenario by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/48
- Bump version to 0.1.0b2 by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/53
- Add more unittest by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/52
- Disable RMSNorm plugin as deprecated for performance reasons by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/55
- Rename LLamaForCausalLM to LlamaForCausalLM to match transformers by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/54
- AutoModelForCausalLM instead of LlamaForCausalLM by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/24
- Use the new runtime handled allocation by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/46
## New Contributors
- @eltociear made their first contribution in https://github.com/huggingface/optimum-nvidia/pull/31
- @IlyasMoutawwakil made their first contribution in https://github.com/huggingface/optimum-nvidia/pull/19
Full Changelog: https://github.com/huggingface/optimum-nvidia/compare/v0.1.0b1...v0.1.0b2