v0.4.2

EleutherAI/lm-evaluation-harness

Release date: 2024-03-18 21:07:28

Latest release of EleutherAI/lm-evaluation-harness: v0.4.3 (2024-07-01 22:00:36)

lm-eval v0.4.2 Release Notes

We are releasing a new minor version of lm-eval for PyPI users! We've been very happy to see continued usage of the lm-evaluation-harness, including as a standard testbench to propel new architecture design (https://arxiv.org/abs/2402.18668), to ease new benchmark creation (https://arxiv.org/abs/2402.11548, https://arxiv.org/abs/2402.00786, https://arxiv.org/abs/2403.01469), to enable controlled experimentation on LLM evaluation (https://arxiv.org/abs/2402.01781), and more!

New Additions

Backwards Incompatibilities

There were a few breaking changes to lm-eval's general API or logic we'd like to highlight:

TaskManager API

Previously, users had to call lm_eval.tasks.initialize_tasks() to register the library's default tasks, or lm_eval.tasks.include_path() to include a custom directory of task YAML configs.

Old usage:

import lm_eval

lm_eval.tasks.initialize_tasks() 
# or:
lm_eval.tasks.include_path("/path/to/my/custom/tasks")

 
lm_eval.simple_evaluate(model=lm, tasks=["arc_easy"])

New intended usage:

import lm_eval
from lm_eval.tasks import TaskManager

# Optional: instantiate a TaskManager yourself only if you want to pass a custom path.
task_manager = TaskManager()  # pass include_path="/path/to/my/custom/tasks" if desired

lm_eval.simple_evaluate(model=lm, tasks=["arc_easy"], task_manager=task_manager)

get_task_dict() now also accepts an optional TaskManager object, for use when loading custom tasks.

The TaskManager change should allow much faster library startup, because requested tasks and groups are now loaded lazily.
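The idea behind lazy loading can be illustrated with a small standalone sketch. It does not use or reproduce lm-eval's code: LazyTaskRegistry and its methods are hypothetical names, and JSON files stand in for the YAML task configs so that the example needs only the standard library. The registry scans file names up front and parses a config only when that task is first requested.

```python
import json
import tempfile
from pathlib import Path


class LazyTaskRegistry:
    """Hypothetical registry: indexes config paths eagerly, parses them lazily."""

    def __init__(self, config_dir):
        # Startup cost is a single directory scan; no config file is opened here.
        self._paths = {p.stem: p for p in Path(config_dir).glob("*.json")}
        self._loaded = {}

    def names(self):
        # Every indexed task name, whether or not it has been parsed yet.
        return sorted(self._paths)

    def load(self, name):
        # Parse a config the first time it is requested, then serve it from cache.
        if name not in self._loaded:
            self._loaded[name] = json.loads(self._paths[name].read_text())
        return self._loaded[name]

    def loaded_count(self):
        return len(self._loaded)


# Demo: two configs on disk, only one of them is ever parsed.
with tempfile.TemporaryDirectory() as tmp:
    for task_name in ("arc_easy", "hellaswag"):
        (Path(tmp) / f"{task_name}.json").write_text(json.dumps({"task": task_name}))
    registry = LazyTaskRegistry(tmp)
    indexed = registry.names()          # both tasks are known by name
    config = registry.load("arc_easy")  # only this config is read and parsed
    parsed = registry.loaded_count()
```

With many task configs on disk, the work done at startup stays proportional to a directory listing rather than to the number of configs parsed.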

Updated Stderr Aggregation

Previous versions of the library reported erroneously large stderr values for groups of tasks such as MMLU.

We've since updated the formula to correctly aggregate standard errors for groups of tasks whose reported accuracies are aggregated via their mean across the dataset -- see #1390 and #1427 for more information.
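As a rough illustration of aggregating per-subtask standard errors into one group-level value, the sketch below uses the textbook pooled-variance formula. This is an assumption made for illustration: these notes do not state the formula the library adopted (the linked issues are the place to check), and pooled_group_stderr is a hypothetical helper name, not an lm-eval function.

```python
import math


def pooled_group_stderr(stderrs, sizes):
    """Combine per-subtask standard errors of a mean into one group-level value.

    Assumes each stderr is s_i / sqrt(n_i) for a subtask with n_i samples.
    """
    if len(stderrs) != len(sizes) or not sizes:
        raise ValueError("need exactly one stderr per subtask size")
    total = sum(sizes)
    if total <= len(sizes):
        raise ValueError("need more than one sample per subtask on average")
    # Recover each subtask's sample variance: s_i^2 = stderr_i^2 * n_i.
    # Pool the variances, weighting each by its degrees of freedom (n_i - 1).
    pooled_var = sum(
        (n - 1) * (se ** 2) * n for se, n in zip(stderrs, sizes)
    ) / (total - len(sizes))
    # Standard error of the mean over all pooled samples.
    return math.sqrt(pooled_var / total)


# Two equally sized subtasks with the same stderr.
group_stderr = pooled_group_stderr([0.02, 0.02], [100, 100])
```

Here group_stderr is 0.02 / sqrt(2), roughly 0.0141: pooling more samples gives a smaller standard error than either subtask alone, which is the behavior one would expect from a correctly aggregated group score.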

As always, please feel free to give us feedback or request new features! We're grateful for the community's support.

What's Changed

New Contributors

Full Changelog: https://github.com/EleutherAI/lm-evaluation-harness/compare/v0.4.1...v0.4.2
