0.3.9
Release date: 2024-12-31 17:28:48
Latest release of open-compass/opencompass: 0.3.9 (2024-12-31 17:28:48)
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.9!
🌟 Highlights ✨
This version introduces a number of new features and improvements that enhance the user experience and expand the capabilities of OpenCompass. Notable changes include support for G-Pass@k and LiveMathBench, as well as the introduction of the Bradley-Terry subjective evaluation method.
🚀 New Features
- 🆕 Support for G-Pass@k and LiveMathBench metrics to better evaluate model performance. (#1772)
- 🆕 Theorem QA 0shot CoT configuration has been added for more comprehensive evaluation scenarios. (#1783)
- 🆕 A customizable tokenizer for RULER offers greater flexibility in processing inputs. (#1731)
- 🆕 Added LiveStemBench Dataset to enrich our collection of datasets. (#1794)
- 🆕 Integration of JudgeLLM into o1 evaluation for improved assessment accuracy. (#1795)
- 🆕 Implementation of the Bradley-Terry subjective evaluation method on wildbench, alpacaeval, and compassarena datasets. (#1791)
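For readers unfamiliar with the Bradley-Terry method mentioned above, the following is a minimal, generic sketch of how pairwise win counts can be turned into per-model strength scores. It is not OpenCompass's implementation: the function name `fit_bradley_terry`, the `wins` matrix layout, and the iteration settings are all assumptions made for illustration, and the fitting routine is the standard minorization-maximization (MM) update for the Bradley-Terry model.

```python
def fit_bradley_terry(wins, iters=500, tol=1e-12):
    """Estimate Bradley-Terry strengths from a pairwise win matrix.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Returns a list of strengths that sum to 1 (hypothetical helper, not
    part of any OpenCompass API).
    """
    n = len(wins)
    p = [1.0 / n] * n  # start from equal strengths
    for _ in range(iters):
        new_p = []
        for i in range(n):
            # Total wins of item i against all other items.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # MM denominator: games played against j, weighted by 1 / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        # Normalize so the strengths stay on a fixed scale.
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p


# Example: three models compared 10 times per pair (made-up counts).
example_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
strengths = fit_bradley_terry(example_wins)
```

Under this model, the estimated probability that model i is preferred over model j is `strengths[i] / (strengths[i] + strengths[j])`, which is what allows a judge's pairwise verdicts to be summarized as a single ranking.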
📖 Documentation
- 📚 Updated OC academic content to the most recent information as of December 2024. (#1771)
🐛 Bug Fixes
- 🔧 Fixed Order error which was causing issues with sequence handling. (#1767)
- 🔧 Resolved an issue where the lark report was returning None. (#1769)
- 🔧 Corrected the path for saving Local Runner parameters. (#1768)
- 🔧 Amended the summarizer abbreviation for models to ensure proper identification. (#1789)
- 🔧 Fixed output_path errors to improve file handling reliability. (#1798)
⚙ Enhancements and Refactors
- 💪 Fullbench testcase has been integrated into the CI pipeline. (#1766)
- 💪 Volc status exception handling has been updated for more robust responses. (#1780)
- 💪 Removed daily step retry mechanism and updated PR score calculation for efficiency. (#1782)
- 💪 Deploy Python version has been updated to the latest stable release. (#1784)
- 💪 Pypi deploy workflow has been refined for smoother deployments. (#1786)
Thank you for being part of the OpenCompass community! Your support and contributions make each release possible.