0.2.1
Release date: 2024-01-08 22:57:51
Latest open-compass/opencompass release: 0.3.6 (2024-11-19 11:54:28)
We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.
🌟 Highlights:
- Add Agent and Code datasets: Diverse new datasets like GPQA, mastermath2024v1, and more, significantly expanding the scope of OpenCompass.
- Support Different JudgeLLM Subjective Evaluation: Providing more options when choosing JudgeLLMs.
- Support Needle in Haystack: Adds the Needle in a Haystack test for long-text evaluation.
- Add vLLM Evaluation: vLLM inference and evaluation are now supported.
Here's what's new:
🚀 New Features:
- 📦 Dataset Expansion
- 🛠 Functional Enhancements
- 📖 Documentation Updates
🐛 Bug Fixes:
- Addressed various issues including those in alignmentbench, configs, and postprocess scripts.
- Fixed bugs concerning subjective evaluation and EOS string detection.
- Quick fixes for improved performance and reliability.
🎉 Welcome New Contributors:
- A warm welcome to our first-time contributors: @BBuf, @DseidLi, @Skyfall-xzz, @RunningLeon, @zehuichen123, @AllentDan, @Connor-Shen, @Francis-llgg, @hzhwcmhf, @ChrisLiu6, @yanyc428, @tpoisonooo, @jiangjin1999
🔗 Full Changelog
- add rwkv-5-3b model by @BBuf in https://github.com/open-compass/opencompass/pull/666
- [Feature] Add double order of subjective evaluation and removing duplicated response among two models by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/692
- [Feat] update python action and slurm by @yingfhu in https://github.com/open-compass/opencompass/pull/694
- [Doc] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/698
- alignmentbench infer and judge by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/697
- [Fix] Update alignmentbench by @tonysy in https://github.com/open-compass/opencompass/pull/704
- removed redundant code in GSM8KDataset.load method. by @DseidLi in https://github.com/open-compass/opencompass/pull/700
- [Fix] fix a bug on configs/eval_mixtral_8x7b.py by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/706
- [Doc] Update Doc for Alignbench by @tonysy in https://github.com/open-compass/opencompass/pull/707
- [Fix] minor fix openai by @yingfhu in https://github.com/open-compass/opencompass/pull/711
- Add Judgellms by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/710
- [Feat] Update math/agent by @yingfhu in https://github.com/open-compass/opencompass/pull/716
- [Docs] update docker docs by @yingfhu in https://github.com/open-compass/opencompass/pull/718
- [Fix] Quick fix for max_out_len in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/719
- [Feature] Support the use of humaneval_plus. by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/720
- [Feature] Add reasonbench dataset by @Skyfall-xzz in https://github.com/open-compass/opencompass/pull/577
- [Feature] Add abbr for judgemodel in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/724
- Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend by @RunningLeon in https://github.com/open-compass/opencompass/pull/721
- [News] add news for T-Eval by @zehuichen123 in https://github.com/open-compass/opencompass/pull/727
- Add NeedleInAHaystack Test Support by @DseidLi in https://github.com/open-compass/opencompass/pull/714
- [Fix] Fixed abbr error of subjective alignbench and size partition by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/730
- add turbomind restful api support by @AllentDan in https://github.com/open-compass/opencompass/pull/693
- [Fix] Update merge script for non-split setting by @tonysy in https://github.com/open-compass/opencompass/pull/733
- [Sync] Sync with internal codes by @Leymore in https://github.com/open-compass/opencompass/pull/734
- [Feature] Add InfiniteBench by @philipwangOvO in https://github.com/open-compass/opencompass/pull/739
- Update LightllmApi and Fix mmlu bug by @helloyongyang in https://github.com/open-compass/opencompass/pull/738
- [Feature] Add other judgelm prompts for Alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/731
- [Feat] support sanitized mbpp dataset by @yingfhu in https://github.com/open-compass/opencompass/pull/745
- [Fix] SubSizePartition fix by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/746
- add chinese version of humaneval, mbpp by @Connor-Shen in https://github.com/open-compass/opencompass/pull/743
- [Fix] fix error in configs by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/750
- [Feature] Add Creationbench Dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/753
- [Feat] update code config by @yingfhu in https://github.com/open-compass/opencompass/pull/749
- update plot function in tools_needleinahaystack.py by @DseidLi in https://github.com/open-compass/opencompass/pull/747
- [Feature] Add new dataset mastermath2024v1 by @Francis-llgg in https://github.com/open-compass/opencompass/pull/744
- [Feature] Add GPQA Dataset by @Francis-llgg in https://github.com/open-compass/opencompass/pull/729
- change NeedleInAHaystackDataset to dynamic loading by @DseidLi in https://github.com/open-compass/opencompass/pull/754
- [Feature] Add support of Qwen API by @hzhwcmhf in https://github.com/open-compass/opencompass/pull/735
- [Feature] Support LLaMA2-Accessory by @ChrisLiu6 in https://github.com/open-compass/opencompass/pull/732
- [Fix] Fix small bug in alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/764
- [Feature] Add multi_round dataset evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/766
- [Feature] add subject ir dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/755
- [Update] Update introduction of CompassBench-2024-Q1 by @tonysy in https://github.com/open-compass/opencompass/pull/769
- [Fix] quick fix for postprocess by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/771
- Support Mbpp_plus dataset by @Connor-Shen in https://github.com/open-compass/opencompass/pull/770
- [Fix] fix typos in drop prompt by @yanyc428 in https://github.com/open-compass/opencompass/pull/773
- typo(installation.md): fix unzip commands by @tpoisonooo in https://github.com/open-compass/opencompass/pull/774
- Contamination analysis for MMLU, Hellaswag, and ARC_c by @liyucheng09 in https://github.com/open-compass/opencompass/pull/699
- [Docs] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/775
- [Feature] _batch_generate function, add the MultiTokenEOSCriteria by @jiangjin1999 in https://github.com/open-compass/opencompass/pull/772
- [Sync] Sync with internal codes 2023.01.08 by @Leymore in https://github.com/open-compass/opencompass/pull/777
For a full list of updates, visit our Full Changelog.
Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉
Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.