v1.4
Release date: 2024-04-04 00:33:14
Latest release of intel/intel-extension-for-transformers: v1.4.2 (2024-05-24 20:23:38)
Highlights
- AutoRound is a state-of-the-art (SOTA) weight-only quantization (WOQ) algorithm for low-bit LLM inference on typical LLMs. This release adds support for AutoRound quantization and for inference with INT4 models quantized by AutoRound.
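For background on what WOQ means here, the sketch below shows plain round-to-nearest (RTN) INT4 weight quantization in pure NumPy. This is NOT the AutoRound algorithm itself (AutoRound additionally learns the rounding decisions via signed gradient descent) and not the library's implementation; it only illustrates the per-group INT4 quantize/dequantize round trip that WOQ inference relies on.

```python
import numpy as np

def quantize_int4_rtn(w: np.ndarray, group_size: int = 32):
    """Symmetric per-group round-to-nearest INT4 quantization of a 1-D weight vector."""
    w = w.reshape(-1, group_size)                     # split weights into groups
    scale = np.abs(w).max(axis=1, keepdims=True) / 7  # map the group max to INT4 range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate fp32 weights from INT4 codes and per-group scales."""
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, scale = quantize_int4_rtn(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())  # reconstruction error is bounded by scale / 2 per group
```

The per-group scale keeps the rounding error proportional to each group's magnitude, which is why WOQ methods quantize in small groups rather than per tensor.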
Features
- LLM Workflow/Neural Chat
  - Support Triton serving/deployment on HPU/GPU (4657036, c57c17e)
  - Enable HF/TGI endpoint (5b84e5, 525ea8, 34b3e9)
  - Enable RAG + ChatGPT flow (de8800)
  - [UI] Customized side-by-side view (5835c3)
  - Support multi-language TTS (260155a)
  - Support language detection & translation for RAG chat (99df35d8)
  - Add file management in the RAG API (b7fc01de)
  - Support DeepSpeed for the Textchat API (7b0b995)
- Transformers Extension for LLM Optimization
Productivity
- Add the BM25 algorithm to the retrievers (a19467d0)
- Add perplexity evaluation during training (2858ed1)
- Enhance embedding to support JIT models (588c60)
- Update the character-checking function to support Chinese characters (0da63fe1)
- Enlarge the context window for HPU graph recompile (dcaf17ac)
- Support IPEX bf16 & fp32 optimization for the embedding model (b51552)
- Enable lm_eval during training (2de883)
- Refine setup.py and requirements.txt (436847)
- Improve WOQ model saving and loading (30d9d10, 1065d81c)
- Add layer-wise quantization for WOQ RTN & GPTQ (15a848f3)
- Update SparseGPT example (3ae0cd0)
- Change the regular expression to support Unicode characters (fd2516b)
- Check and convert tensors to contiguous when saving models (d21bb3e)
- Support loading models from ModelScope using NeuralSpeed (20ae00)
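One item above adds BM25 to the retrievers. As background, here is a minimal sketch of textbook Okapi BM25 scoring over pre-tokenized documents; the function name, tokenization, and parameter defaults are illustrative, not the library's implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores

docs = [["intel", "extension", "transformers"],
        ["neural", "chat", "rag", "retriever"],
        ["bm25", "retriever", "ranking", "retriever"]]
scores = bm25_scores(["retriever", "rag"], docs)
```

The `k1` term saturates repeated occurrences of a term and `b` normalizes for document length, which is what makes BM25 a stronger lexical baseline than raw term frequency.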
Examples
- Support the microsoft/biogpt model (3e7e35)
- Add finetuning example for gemma-2b on ARC (ffa8f3c6)
- Add example using RAG + an OpenAI LLM (3c5959)
- Enable mistralai/Mixtral-8x7B-v0.1 LoRA finetuning on Gaudi2 (7539c35)
- Enable image2text finetuning example on CPU (ef94aeaa)
- Add LLaVA-NeXT (feff1ec0)
Bug Fixing
- Fix CLM tasks when transformers >= 4.38.1 (98bfcf8)
- Fix distilgpt2 TF signature issue (a7c15a9f)
- Add an error response when user input plus requested max tokens exceeds the model context window (ae91bf8)
- Fix audio plugin sample code issue and provide a way to set the TTS/ASR model path (db7da09)
- Fix modeling_auto trust_remote_code issue (3a0987)
- Fix lm-eval NeuralSpeed model loading (cd6e488)
- Fix weight-only config save issue (5c92fe31)
- Fix index error in the Child-Parent retriever (8797cfe)
- Fix WOQ INT8 weight unpacking (edede4)
- Fix GPTQ desc_act and static_group (528d7de)
- Fix request.client=None issue (494a571)
- Fix WOQ Hugging Face model loading (01b1a44)
- Fix SQ model restore loading (1e00f29)
- Remove redundant parameters from the WOQ saving config and fix a GPTQ issue (ef0882f6)
- Fix example error for Intel GPU WOQ (8fdde06)
- Fix WOQ AutoRound last-layer quantization issue (d21bb3e)
- Fix code-generation params (ab2fd05)
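Several of the fixes above touch WOQ weight packing and unpacking. As background, the sketch below packs two signed INT4 values into each uint8 byte and unpacks them losslessly; the low-nibble-first layout is an assumption for illustration, not necessarily the library's storage format.

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack signed INT4 values (range [-8, 7], even count) two per uint8 byte."""
    u = (q.astype(np.int8) & 0x0F).astype(np.uint8)   # two's-complement nibbles
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(p: np.ndarray) -> np.ndarray:
    """Recover the signed INT4 values from the packed bytes."""
    lo = (p & 0x0F).astype(np.int8)
    hi = ((p >> 4) & 0x0F).astype(np.int8)
    out = np.empty(p.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return np.where(out >= 8, out - 16, out).astype(np.int8)  # sign-extend the nibble

q = np.array([-8, -1, 0, 3, 7, -4], dtype=np.int8)
assert np.array_equal(unpack_int4(pack_int4(q)), q)  # lossless round trip
```

Getting the sign extension and nibble order exactly right is the usual source of unpack bugs like the ones fixed in this release.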
Validated Configurations
- Python 3.8, 3.9, 3.10, 3.11
- Ubuntu 20.04 & Windows 10
- Intel® Extension for TensorFlow 2.13.0, 2.14.0
- PyTorch 2.2.0+cpu, 2.1.0+cpu
- Intel® Extension for PyTorch 2.2.0+cpu, 2.1.0+cpu
Thanks to these Contributors
Thanks for the contributions from dillonalaird, igeni, sramakintel, alexsin368, and huiyan2021.
You are welcome to contribute to our project and report issues to us.