0.2.1

Release date: 2024-01-08 22:57:51

Latest release of open-compass/opencompass: 0.3.8 (2024-12-17 19:57:21)

We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.

🌟 Highlights:

Add Agent and Code datasets: Diverse new datasets like GPQA, mastermath2024v1, and more, significantly expanding the scope of OpenCompass.
Support Different JudgeLLMs for Subjective Evaluation: More options when choosing a judge LLM.
Support Needle in Haystack: Needle in Haystack tests are now available for long-text evaluation.
Add vLLM Evaluation: vLLM inference and evaluation are now supported.

Here's what's new:

🚀 New Features:

📦 Dataset Expansion:
- Added rwkv-5-3b model (#666)
- Integration of diverse datasets including GPQA, Creationbench, and more.
- Support for new datasets like mastermath2024v1, mbpp_plus, and sanitized_mbpp (#744, #770, #745)
🛠 Functional Enhancements:
- Subjective evaluation improvements (#692, #724)
- Updated python action, slurm, and docker docs (#694, #718)
- Turbomind API support and Qwen API integration (#693, #735)
📖 Documentation Updates:
- Updated contamination, alignmentbench, and other docs for better clarity (#698, #707)
- Fixed dead links and typos in various documents (#455, #773, #774)

🐛 Bug Fixes:

Addressed various issues including those in alignmentbench, configs, and postprocess scripts.
Fixed bugs concerning subjective evaluation and EOS string detection.
Quick fixes for improved performance and reliability.

🎉 Welcome New Contributors:

A warm welcome to our first-time contributors:
- @BBuf, @DseidLi, @Skyfall-xzz, @RunningLeon, @zehuichen123, @AllentDan, @Connor-Shen, @Francis-llgg, @hzhwcmhf, @ChrisLiu6, @yanyc428, @tpoisonooo, @jiangjin1999

🔗 Full Changelog

add rwkv-5-3b model by @BBuf in https://github.com/open-compass/opencompass/pull/666
[Feature] Add double order of subjective evaluation and removing duplicated response among two models by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/692
[Feat] update python action and slurm by @yingfhu in https://github.com/open-compass/opencompass/pull/694
[Doc] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/698
alignmentbench infer and judge by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/697
[Fix] Update alignmentbench by @tonysy in https://github.com/open-compass/opencompass/pull/704
removed redundant code in GSM8KDataset.load method. by @DseidLi in https://github.com/open-compass/opencompass/pull/700
[Fix] fix a bug on configs/eval_mixtral_8x7b.py by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/706
[Doc] Update Doc for Alignbench by @tonysy in https://github.com/open-compass/opencompass/pull/707
[Fix] minor fix openai by @yingfhu in https://github.com/open-compass/opencompass/pull/711
Add Judgellms by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/710
[Feat] Update math/agent by @yingfhu in https://github.com/open-compass/opencompass/pull/716
[Docs] update docker docs by @yingfhu in https://github.com/open-compass/opencompass/pull/718
[Fix] Quick fix for max_out_len in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/719
[Feature] Support the use of humaneval_plus. by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/720
[Feature] Add reasonbench dataset by @Skyfall-xzz in https://github.com/open-compass/opencompass/pull/577
[Feature] Add abbr for judgemodel in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/724
Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend by @RunningLeon in https://github.com/open-compass/opencompass/pull/721
[News] add news for T-Eval by @zehuichen123 in https://github.com/open-compass/opencompass/pull/727
Add NeedleInAHaystack Test Support by @DseidLi in https://github.com/open-compass/opencompass/pull/714
[Fix] Fixed abbr erro of subjective alignbench and size partition by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/730
add turbomind restful api support by @AllentDan in https://github.com/open-compass/opencompass/pull/693
[Fix] Update merge script for non-split settting by @tonysy in https://github.com/open-compass/opencompass/pull/733
[Sync] Sync with internal codes by @Leymore in https://github.com/open-compass/opencompass/pull/734
[Feature] Add InfiniteBench by @philipwangOvO in https://github.com/open-compass/opencompass/pull/739
Update LightllmApi and Fix mmlu bug by @helloyongyang in https://github.com/open-compass/opencompass/pull/738
[Feature] Add other judgelm prompts for Alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/731
[Feat] support sanitized mbpp dataset by @yingfhu in https://github.com/open-compass/opencompass/pull/745
[Fix] SubSizePartition fix by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/746
add chinese version of humaneval, mbpp by @Connor-Shen in https://github.com/open-compass/opencompass/pull/743
[Fix] fix erro in configs by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/750
[Feature] Add Creationbench Dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/753
[Feat] update code config by @yingfhu in https://github.com/open-compass/opencompass/pull/749
update plot function in tools_needleinahaystack.py by @DseidLi in https://github.com/open-compass/opencompass/pull/747
[Feature] Add new dataset mastermath2024v1 by @Francis-llgg in https://github.com/open-compass/opencompass/pull/744
[Feature] Add GPQA Dataset by @Francis-llgg in https://github.com/open-compass/opencompass/pull/729
change NeedleInAHaystackDataset to dynamic loading by @DseidLi in https://github.com/open-compass/opencompass/pull/754
[Feature] Add support of Qwen API by @hzhwcmhf in https://github.com/open-compass/opencompass/pull/735
[Feature] Support LLaMA2-Accessory by @ChrisLiu6 in https://github.com/open-compass/opencompass/pull/732
[Fix] Fix small bug in alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/764
[Feature] Add multi_round dataset evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/766
[Feature] add subject ir dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/755
[Update] Update introduction of CompassBench-2024-Q1 by @tonysy in https://github.com/open-compass/opencompass/pull/769
[Fix] quick fix for postprocess by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/771
Support Mbpp_plus dataset by @Connor-Shen in https://github.com/open-compass/opencompass/pull/770
[Fix] fix typos in drop prompt by @yanyc428 in https://github.com/open-compass/opencompass/pull/773
typo(installation.md): fix unzip commands by @tpoisonooo in https://github.com/open-compass/opencompass/pull/774
Contamination analysis for MMLU, Hellaswag, and ARC_c by @liyucheng09 in https://github.com/open-compass/opencompass/pull/699
[Docs] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/775
[Feature] _batch_generate function, add the MultiTokenEOSCriteria by @jiangjin1999 in https://github.com/open-compass/opencompass/pull/772
[Sync] Sync with internal codes 2023.01.08 by @Leymore in https://github.com/open-compass/opencompass/pull/777

For a full list of updates, visit our Full Changelog.

Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉

Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.
