v1.3.2
Release date: 2024-02-24 13:47:02
Latest release of intel/intel-extension-for-transformers: v1.4.2 (2024-05-24 20:23:38)
Highlights
- Support NeuralChat-TGI serving with Docker (8ebff39)
- Support NeuralChat-vLLM serving with Docker (1988dd)
- Support SQL generation in NeuralChat (098aca7)
- Enable llava mmmu evaluation on Gaudi2 (c30353f)
- Improve LLM INT4 inference on Intel GPUs
Improvements
- Minimize dependencies for running a chatbot (a0c9dfe)
- Remove redundant knowledge id in audio plugin API (9a7353)
- Update parameters for NeuralSpeed (19fec91)
- Integrate backend code of Askdoc (c5d4cd)
- Refine finetuning data preprocessing with static shape for Gaudi2 (3f62ceb)
- Sync RESTful API with latest OpenAI protocol (2e1c79)
- Support WOQ model save and load (1c8078f)
- Extend API for GGUF (7733d4)
- Enable OpenAI compatible audio API (d62ff9e)
- Add pack_weight info acquire interface (18d36ef)
- Add customized system prompts (04b2f8)
- Support asym scheme for WOQ (c7f0b70)
- Update code_lm_eval to bigcode_eval (44f914e)
- Enable PDF figure-to-text in retrieval (d6a66b3)
- Enable retrieve-then-rerank pipeline (15feadf)
- Enable grammar check and query polishing to enhance RAG performance (a63ec0)
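Since the RESTful API is synced with the latest OpenAI protocol, clients can talk to a NeuralChat server with standard OpenAI-style request bodies. A hedged illustration of such a request payload (the endpoint path and field names follow the OpenAI protocol; the model name and server deployment are assumptions, not part of this release):

```json
{
  "model": "Intel/neural-chat-7b-v3-1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize weight-only quantization in one sentence."}
  ],
  "temperature": 0.7,
  "stream": false
}
```

POSTed to the server's `/v1/chat/completions` route, this should return an OpenAI-format completion object, so existing OpenAI client libraries can be pointed at a NeuralChat deployment without code changes.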
Examples
- Add Rank-One Model Editing (ROME) implementation and example (8dcf0ea7)
- Support GPTQ and AWQ models in NeuralChat (5b08de)
- Add Neural Speed example scripts (6a97d15, 3385c42)
- Add langchain extension example and update notebook (d40e2f1)
- Support deepseek-coder models in NeuralChat (e7f5b1d)
- Add autoround examples (71f5e84)
- Add BGE embedding model finetuning example (67bef24)
- Support DeciLM-7B and DeciLM-7B-instruct in NeuralChat (e6f87ab)
- Support GGUF model in NeuralChat (a53a33c)
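Several items above (GPTQ/AWQ models, the asym WOQ scheme, INT4 inference) rest on weight-only quantization: weights are stored as low-bit integers plus per-group scale and zero-point, and dequantized on the fly. Below is a minimal numpy sketch of an asymmetric per-group INT4 round-trip; it is illustrative only and not the library's actual kernels, which are fused native code:

```python
import numpy as np

def quantize_asym_int4(w, group_size=32):
    """Asymmetric per-group INT4 quantization: q in [0, 15]."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 2**4 - 1 quantization levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    zero = np.round(-w_min / scale)           # zero-point maps w_min near 0
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    """Recover approximate float weights from INT4 codes."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s, z = quantize_asym_int4(w)
w_hat = dequantize(q, s, z).reshape(-1)
err = np.abs(w - w_hat).max()
print("max abs error:", err)
```

The asymmetric scheme spends the full [0, 15] range on the actual min-max interval of each group, which is why it can be more accurate than a symmetric scheme on weights with skewed distributions; the per-element reconstruction error is bounded by half the group's scale.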
Bug Fixing
- Add trust_remote_code arg for lm_eval in WOQ example (9022eb)
- Fix CPU WOQ accuracy issue (e530f7)
- Change the default value for XPU weight-only quantization (4a78ba)
- Fix whisper forced_decoder_ids error (09ddad)
- Fix off-by-one error in masking (525076d)
- Fix backprop error for text-only examples (9cff14a)
- Use unk token instead of eos token (6387a0)
- Fix errors in trainer save (ff501d0)
- Fix Qdrant bug caused by langchain_core upgrade (eb763e6)
- Set trainer.save_model state_dict format to safetensors (2eca8c)
- Fix text-generation example accuracy scripts (a2cfb80)
- Resolve WOQ quantization error when running neuralchat (6c0bd77)
- Fix response issue of model.predict (3068496)
- Fix pydub library import issues (c37dab)
- Fix chat history issue (7bb3314)
- Update gradio APP to sync with backend change (362b7af)
Validated Configurations
- Python 3.10
- Ubuntu 22.04
- Intel® Extension for TensorFlow 2.13.0
- PyTorch 2.1.0+cpu
- Intel® Extension for PyTorch 2.1.0+cpu