v0.1.0b2
Release date: 2023-12-21 21:11:05
Latest release of huggingface/optimum-nvidia: v0.1.0b8 (2024-09-17 21:09:22)
This release focuses on improving the previous one with additional test coverage, bug fixes, and usability improvements.
## TensorRT-LLM
- Updated TensorRT-LLM to commit f7eca56161d496cbd28e8e7689dbd90003594bd2
## Improvements
- Generally improved unit test coverage
- Initial documentation and updated build instructions
- The prebuilt container now supports the Volta and Turing (experimental) architectures, targeting V100 and T4 GPUs
- More in-depth usage of the TensorRT-LLM runtime Python/C++ bindings
## Bug Fixes
- Fixed an issue where the pipeline returned only the first output when provided with a batch
- Fixed an issue where `bfloat16` conversion did not load weights in the right format for the TRT engine builder
- Fixed an issue in non multi-head attention setups (e.g. grouped-query attention) where the key/value heads were not replicated by the proper factor
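The head-replication fix above concerns setups with fewer key/value heads than query heads (grouped-query or multi-query attention), where each KV head must be repeated `num_attention_heads // num_kv_heads` times. A minimal illustrative sketch of that replication (not the library's actual code; names are hypothetical):

```python
# Illustrative sketch of key/value head replication for non-MHA setups
# (grouped-query / multi-query attention). Plain lists stand in for tensors.

def repeat_kv_heads(kv_heads, num_attention_heads):
    """Repeat each key/value head so the count matches the query heads."""
    num_kv_heads = len(kv_heads)
    assert num_attention_heads % num_kv_heads == 0, "heads must divide evenly"
    factor = num_attention_heads // num_kv_heads  # the replication factor
    # Each KV head is repeated `factor` times, preserving head order.
    return [head for head in kv_heads for _ in range(factor)]

# 2 KV heads serving 8 query heads -> each KV head repeated 4 times.
expanded = repeat_kv_heads([["k0"], ["k1"]], 8)
print(expanded)  # [['k0'], ['k0'], ['k0'], ['k0'], ['k1'], ['k1'], ['k1'], ['k1']]
```

The bug described above corresponds to using a replication factor of 1 (or none at all) in this step, so the key/value tensors no longer lined up with the query heads.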
## Engine Builder Changes
- The RMSNorm plugin is being deprecated by NVIDIA for performance reasons, so we no longer attempt to enable it
## Model Support
- The Mistral family of models should theoretically work, but it is not yet extensively tested through our CI/CD. We plan to add official support in the next release.
## What's Changed
- bump trt llm version to 0.6.1 by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/27
- Fix issue returning only the first batch item after pipeline call. by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/29
- Update README.md by @eltociear in https://github.com/huggingface/optimum-nvidia/pull/31
- Missing comma in setup.py by @IlyasMoutawwakil in https://github.com/huggingface/optimum-nvidia/pull/19
- Quality by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/30
- Fix typo by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/40
- Update to latest trtllm f7eca56161d496cbd28e8e7689dbd90003594bd2 by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/41
- Enable more SM architectures in the prebuild docker by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/35
- Add initial set of documentation to build the `optimum-nvidia` container by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/39
- Fix caching for docker by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/15
- Initial set of unittest in CI by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/43
- Build from source instructions by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/38
- Enable testing on GPUs by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/45
- Enable HF Transfer in tests by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/51
- Let's make sure to use the repeated heads tensor when in a non-mha scenario by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/48
- Bump version to 0.1.0b2 by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/53
- Add more unittest by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/52
- Disable RMSNorm plugin as deprecated for performance reasons by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/55
- Rename LLamaForCausalLM to LlamaForCausalLM to match transformers by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/54
- AutoModelForCausalLM instead of LlamaForCausalLM by @laikhtewari in https://github.com/huggingface/optimum-nvidia/pull/24
- Use the new runtime handled allocation by @mfuntowicz in https://github.com/huggingface/optimum-nvidia/pull/46
## New Contributors
- @eltociear made their first contribution in https://github.com/huggingface/optimum-nvidia/pull/31
- @IlyasMoutawwakil made their first contribution in https://github.com/huggingface/optimum-nvidia/pull/19
Full Changelog: https://github.com/huggingface/optimum-nvidia/compare/v0.1.0b1...v0.1.0b2