v0.4.0
Released: 2023-12-04 23:08:53
What's Changed
- Replace stale `triviaqa` dataset link by @jon-tow in https://github.com/EleutherAI/lm-evaluation-harness/pull/364
- Update `actions/setup-python` in CI workflows by @jon-tow in https://github.com/EleutherAI/lm-evaluation-harness/pull/365
- Bump `triviaqa` version by @jon-tow in https://github.com/EleutherAI/lm-evaluation-harness/pull/366
- Update `lambada_openai` multilingual data source by @jon-tow in https://github.com/EleutherAI/lm-evaluation-harness/pull/370
- Update Pile Test/Val Download URLs by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/373
- Added ToxiGen task by @Thartvigsen in https://github.com/EleutherAI/lm-evaluation-harness/pull/377
- Added CrowSPairs by @aflah02 in https://github.com/EleutherAI/lm-evaluation-harness/pull/379
- Add accuracy metric to crows-pairs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/380
- hotfix(gpt2): Remove vocab-size logits slice by @jon-tow in https://github.com/EleutherAI/lm-evaluation-harness/pull/384
- Enable "low_cpu_mem_usage" to reduce the memory usage of HF models by @sxjscience in https://github.com/EleutherAI/lm-evaluation-harness/pull/390
- Upstream `hf-causal` and `hf-seq2seq` model implementations by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/381
- Hosting arithmetic dataset on HuggingFace by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/391
- Hosting wikitext on HuggingFace by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/396
- Change device parameter to cuda:0 to avoid runtime error by @Jeffwan in https://github.com/EleutherAI/lm-evaluation-harness/pull/403
- Update README installation instructions by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/407
- feat: evaluation using peft models with CLM by @zanussbaum in https://github.com/EleutherAI/lm-evaluation-harness/pull/414
- Update setup.py dependencies by @ret2libc in https://github.com/EleutherAI/lm-evaluation-harness/pull/416
- fix: add seq2seq peft by @zanussbaum in https://github.com/EleutherAI/lm-evaluation-harness/pull/418
- Add support for load_in_8bit and trust_remote_code model params by @philwee in https://github.com/EleutherAI/lm-evaluation-harness/pull/422
- Hotfix: patch issues with the `huggingface.py` model classes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/427
- Continuing work on refactor [WIP] by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/425
- Document task name wildcard support in README by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/435
- Add non-programmatic BIG-bench-hard tasks by @yurodiviy in https://github.com/EleutherAI/lm-evaluation-harness/pull/406
- Updated handling for device in lm_eval/models/gpt2.py by @nikhilpinnaparaju in https://github.com/EleutherAI/lm-evaluation-harness/pull/447
- [WIP, Refactor] Staging more changes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/465
- [Refactor, WIP] Multiple Choice + loglikelihood_rolling support for YAML tasks by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/467
- Configurable-Tasks by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/438
- single GPU automatic batching logic by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/394
- Fix bugs introduced in #394 #406 and max length bug by @juletx in https://github.com/EleutherAI/lm-evaluation-harness/pull/472
- Sort task names to keep the same order always by @juletx in https://github.com/EleutherAI/lm-evaluation-harness/pull/474
- Set PAD token to EOS token by @nikhilpinnaparaju in https://github.com/EleutherAI/lm-evaluation-harness/pull/448
- [Refactor] Add decorator for registering YAMLs as tasks by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/486
- fix adaptive batch crash when there are no new requests by @jquesnelle in https://github.com/EleutherAI/lm-evaluation-harness/pull/490
- Add multilingual datasets (XCOPA, XStoryCloze, XWinograd, PAWS-X, XNLI, MGSM) by @juletx in https://github.com/EleutherAI/lm-evaluation-harness/pull/426
- Create output path directory if necessary by @janEbert in https://github.com/EleutherAI/lm-evaluation-harness/pull/483
- Add results of various models in json and md format by @juletx in https://github.com/EleutherAI/lm-evaluation-harness/pull/477
- Update config by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/501
- P3 prompt task by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/493
- Evaluation Against Portion of Benchmark Data by @kenhktsui in https://github.com/EleutherAI/lm-evaluation-harness/pull/480
- Add option to dump prompts and completions to a JSON file by @juletx in https://github.com/EleutherAI/lm-evaluation-harness/pull/492
- Add perplexity task on arbitrary JSON data by @janEbert in https://github.com/EleutherAI/lm-evaluation-harness/pull/481
- Update config by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/520
- Data Parallelism by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/488
- Fix mgpt fewshot by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/522
- Extend `dtype` command line flag to `HFLM` by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/523
- Add support for loading GPTQ models via AutoGPTQ by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/519
- Change type signature of `quantized` and its default value for python < 3.11 compatibility by @passaglia in https://github.com/EleutherAI/lm-evaluation-harness/pull/532
- Fix LLaMA tokenization issue by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/531
- [Refactor] Make promptsource an extra / not required for installation by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/542
- Move spaces from context to continuation by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/546
- Use max_length in AutoSeq2SeqLM by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/551
- Fix typo by @kwikiel in https://github.com/EleutherAI/lm-evaluation-harness/pull/557
- Add load_in_4bit and fix peft loading by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/556
- Update task_guide.md by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/564
- [Refactor] Non-greedy generation ; WIP GSM8k yaml by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/559
- Dataset metric log [WIP] by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/560
- Add Anthropic support by @zphang in https://github.com/EleutherAI/lm-evaluation-harness/pull/562
- Add MultipleChoiceExactTask by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/537
- Revert "Add MultipleChoiceExactTask" by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/568
- [Refactor] [WIP] New YAML advanced docs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/567
- Remove the registration of "GPT2" as a model type by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/574
- [Refactor] Docs update by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/577
- Better docs by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/576
- Update evaluator.py cache_db argument str if model is not str by @poedator in https://github.com/EleutherAI/lm-evaluation-harness/pull/575
- Add --max_batch_size and --batch_size auto:N by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/572
- [Refactor] ALL_TASKS now maintained (not static) by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/581
- Fix seqlen issues for bloom, remove extraneous OPT tokenizer check by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/582
- Fix non-callable attributes in CachingLM by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/584
- Add error handling for calling `.to(device)` by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/585
- fixes some minor issues on tasks. by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/580
- Add 4bit-related args by @SONG-WONHO in https://github.com/EleutherAI/lm-evaluation-harness/pull/579
- Fix triviaqa task by @seopbo in https://github.com/EleutherAI/lm-evaluation-harness/pull/525
- [Refactor] Addressing Feedback on new docs pages by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/578
- Logging Samples by @farzanehnakhaee70 in https://github.com/EleutherAI/lm-evaluation-harness/pull/563
- Merge master into big-refactor by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/590
- [Refactor] Package YAMLs alongside pip installations of lm-eval by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/596
- fixes for multiple_choice by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/598
- add openbookqa config by @farzanehnakhaee70 in https://github.com/EleutherAI/lm-evaluation-harness/pull/600
- [Refactor] Model guide docs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/606
- [Refactor] More MCQA fixes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/599
- [Refactor] Hellaswag by @nopperl in https://github.com/EleutherAI/lm-evaluation-harness/pull/608
- [Refactor] Seq2Seq Models with Multi-Device Support by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/565
- [Refactor] CachingLM support via `--use_cache` by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/619
- [Refactor] batch generation better for `hf` model ; deprecate `hf-causal` in new release by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/613
- [Refactor] Update task statuses on tracking list by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/629
- [Refactor] `device_map` options for `hf` model type by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/625
- [Refactor] Misc. cleanup of dead code by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/609
- [Refactor] Log request arguments to per-sample json by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/624
- [Refactor] HellaSwag YAML fix by @nopperl in https://github.com/EleutherAI/lm-evaluation-harness/pull/639
- [Refactor] Add caveats to `parallelize=True` docs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/638
- fixed super_glue and removed unused yaml config by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/645
- [Refactor] Fix sample logging by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/646
- Add PEFT, quantization, remote code, LLaMA fix by @gakada in https://github.com/EleutherAI/lm-evaluation-harness/pull/644
- [Refactor] Handle `cuda:0` device assignment by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/647
- [refactor] Add prost config by @farzanehnakhaee70 in https://github.com/EleutherAI/lm-evaluation-harness/pull/640
- [Refactor] Misc. bugfixes ; edgecase quantized models by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/648
- Update `__init__.py` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/650
- [Refactor] Add Lambada Multilingual by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/658
- [Refactor] Add: SWAG,RACE,Arithmetic,Winogrande,PubmedQA by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/627
- [refactor] Add qa4mre config by @farzanehnakhaee70 in https://github.com/EleutherAI/lm-evaluation-harness/pull/651
- Update `generation_kwargs` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/657
- [Refactor] Move race dataset on HF to EleutherAI group by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/661
- [Refactor] Add Headqa by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/659
- [Refactor] Add Unscramble ; Toxigen ; Hendrycks_Ethics ; MathQA by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/660
- [Refactor] Port TruthfulQA (mc1 only) by @nopperl in https://github.com/EleutherAI/lm-evaluation-harness/pull/666
- [Refactor] Miscellaneous fixes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/676
- [Refactor] Patch to revamp-process by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/678
- Revamp process by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/671
- [Refactor] Fix padding ranks by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/679
- [Refactor] minor edits by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/680
- [Refactor] Migrate ANLI tasks to yaml by @yeoedward in https://github.com/EleutherAI/lm-evaluation-harness/pull/682
- edited output_path and added help to args by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/684
- [Refactor] Minor changes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/685
- [Refactor] typo by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/687
- [Test] fix test_evaluator.py by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/675
- Fix dummy model not invoking super class constructor by @yeoedward in https://github.com/EleutherAI/lm-evaluation-harness/pull/688
- [Refactor] Migrate webqs task to yaml by @yeoedward in https://github.com/EleutherAI/lm-evaluation-harness/pull/689
- [Refactor] Fix tests by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/693
- [Refactor] Migrate xwinograd tasks to yaml by @yeoedward in https://github.com/EleutherAI/lm-evaluation-harness/pull/695
- Early stop bug of `greedy_until` (`primary_until` should be a list of str) by @ZZR0 in https://github.com/EleutherAI/lm-evaluation-harness/pull/700
- Remove condition to check for `winograd_schema` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/690
- [Refactor] Use console script by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/703
- [Refactor] Fixes for when using `num_fewshot` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/702
- [Refactor] Updated anthropic to new API by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/710
- [Refactor] Cleanup for `big-refactor` by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/686
- Update README.md by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/720
- [Refactor] Benchmark scripts by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/612
- [Refactor] Fix Max Length arg by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/723
- Add note about MPS by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/728
- Update huggingface.py by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/730
- Update README.md by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/732
- [Refactor] Port over Autobatching by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/673
- [Refactor] Fix Anthropic Import and other fixes by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/724
- [Refactor] Remove Unused Variable in Make-Table by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/734
- [Refactor] logiqav2 by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/711
- [Refactor] Fix task packaging by @yeoedward in https://github.com/EleutherAI/lm-evaluation-harness/pull/739
- [Refactor] fixed openai by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/736
- [Refactor] added some typehints by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/742
- [Refactor] Port Babi task by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/752
- [Refactor] CrowS-Pairs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/751
- Update README.md by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/745
- [Refactor] add xcopa by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/749
- Update README.md by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/764
- [Refactor] Add Blimp by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/763
- [Refactor] Use evaluation mode for accelerate to prevent OOM by @tju01 in https://github.com/EleutherAI/lm-evaluation-harness/pull/770
- Patch Blimp by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/768
- [Refactor] Speedup hellaswag context building by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/774
- [Refactor] Patch crowspairs higher_is_better by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/766
- [Refactor] XNLI by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/776
- [Refactor] Update Benchmark by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/777
- [WIP] Update API docs in README by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/747
- [Refactor] Real Toxicity Prompts by @aflah02 in https://github.com/EleutherAI/lm-evaluation-harness/pull/725
- [Refactor] XStoryCloze by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/759
- [Refactor] Glue by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/761
- [Refactor] Add triviaqa by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/758
- [Refactor] Paws-X by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/779
- [Refactor] MC Taco by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/783
- [Refactor] Truthfulqa by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/782
- [Refactor] fix doc_to_target processing by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/786
- [Refactor] Add README.md by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/757
- [Refactor] Don't always require Perspective API key to run by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/788
- [Refactor] Added HF model test by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/791
- [Big refactor] HF test fixup by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/793
- [Refactor] Process Whitespace for greedy_until by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/781
- [Refactor] Fix metrics in Greedy Until by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/780
- Update README.md by @Wehzie in https://github.com/EleutherAI/lm-evaluation-harness/pull/803
- Merge Fix metrics branch by @uSaiPrashanth in https://github.com/EleutherAI/lm-evaluation-harness/pull/802
- [Refactor] Update docs by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/744
- [Refactor] Superglue T5 Parity by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/769
- Update main.py by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/817
- [Refactor] Coqa by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/820
- [Refactor] drop by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/821
- [Refactor] Asdiv by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/813
- [Refactor] Fix IndexError by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/819
- [Refactor] toxicity: API inside function by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/822
- [Refactor] wsc273 by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/807
- [Refactor] Bump min accelerate version and update documentation by @fattorib in https://github.com/EleutherAI/lm-evaluation-harness/pull/812
- Add mypy baseline config by @ethanhs in https://github.com/EleutherAI/lm-evaluation-harness/pull/809
- [Refactor] Fix wikitext task by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/833
- [Refactor] Add WMT tasks by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/775
- [Refactor] consolidated tasks tests by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/831
- Update README.md by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/838
- [Refactor] mgsm by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/784
- [Refactor] Add top-level import by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/830 (see the usage sketch after this list)
- Add pyproject.toml by @ethanhs in https://github.com/EleutherAI/lm-evaluation-harness/pull/810
- [Refactor] Additions to docs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/799
- [Refactor] Fix MGSM by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/845
- [Refactor] float16 MPS works in torch nightly by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/853
- [Refactor] Update benchmark by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/850
- Switch to pyproject.toml based project metadata by @ethanhs in https://github.com/EleutherAI/lm-evaluation-harness/pull/854
- Use Dict to make the code python 3.8 compatible by @chrisociepa in https://github.com/EleutherAI/lm-evaluation-harness/pull/857
- [Refactor] NQopen by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/859
- [Refactor] NQ-open by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/798
- Fix "local variable 'docs' referenced before assignment" error in write_out.py by @chrisociepa in https://github.com/EleutherAI/lm-evaluation-harness/pull/856
- [Refactor] 3.8 test compatibility by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/863
- [Refactor] Cleanup dependencies by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/860
- [Refactor] Qasper, MuTual, MGSM (Native CoT) by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/840
- undefined type and output_type when using promptsource fixed by @Hojjat-Mokhtarabadi in https://github.com/EleutherAI/lm-evaluation-harness/pull/842
- [Refactor] Deactivate select GH Actions by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/871
- [Refactor] squadv2 by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/785
- [Refactor] Set python3.8 as allowed version by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/862
- Fix positional arguments in HF model generate by @chrisociepa in https://github.com/EleutherAI/lm-evaluation-harness/pull/877
- [Refactor] MATH by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/861
- Create cot_yaml by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/870
- [Refactor] Port CSATQA to refactor by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/865
- [Refactor] CMMLU, C-Eval port ; Add fewshot config by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/864
- [Refactor] README.md for Asdiv by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/878
- [Refactor] Hotfixes to big-refactor by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/880
- Change Python Version to 3.8 in .pre-commit-config.yaml and GitHub Actions by @chrisociepa in https://github.com/EleutherAI/lm-evaluation-harness/pull/895
- [Refactor] Fix PubMedQA by @tmabraham in https://github.com/EleutherAI/lm-evaluation-harness/pull/890
- [Refactor] Fix error when calling `lm-eval` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/899
- [Refactor] bigbench by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/852
- [Refactor] Fix wildcards by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/900
- Add transformation filters by @chrisociepa in https://github.com/EleutherAI/lm-evaluation-harness/pull/883
- [Refactor] Flan benchmark by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/816
- [Refactor] WIP: Add MMLU by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/753
- Added notable contributors to the citation block by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/907
- [Refactor] Improve error logging by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/908
- [Refactor] Add _batch_scheduler in greedy_until by @AndyWolfZwei in https://github.com/EleutherAI/lm-evaluation-harness/pull/912
- add belebele by @ManuelFay in https://github.com/EleutherAI/lm-evaluation-harness/pull/885
- Update README.md by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/917
- [Refactor] Precommit formatting for Belebele by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/926
- [Refactor] change all mentions of `greedy_until` to `generate_until` by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/927
- [Refactor] Squadv2 updates by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/923
- [Refactor] Verbose by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/910
- [Refactor] Fix Unit Tests by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/905
- Fix `generate_until` rename by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/929
- [Refactor] Generate_until rename by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/931
- Fix "'tqdm' object is not subscriptable" error in huggingface.py when batch size is auto by @jasonkrone in https://github.com/EleutherAI/lm-evaluation-harness/pull/916
- [Refactor] Fix Default Metric Call by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/935
- Big refactor write out adaption by @MicPie in https://github.com/EleutherAI/lm-evaluation-harness/pull/937
- Update pyproject.toml by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/915
- [Refactor] Fix whitespace warning by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/949
- [Refactor] Update documentation by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/954
- [Refactor] fix two bugs when run with qasper_bool and toxigen by @AndyWolfZwei in https://github.com/EleutherAI/lm-evaluation-harness/pull/934
- [Refactor] Describe local dataset usage in docs by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/956
- [Refactor] Update README, documentation by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/955
- [Refactor] Don't load MMLU auxiliary_train set by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/953
- [Refactor] Patch for Generation Until by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/957
- [Refactor] Model written eval by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/815
- [Refactor] Bugfix: AttributeError: 'Namespace' object has no attribute 'verbose' by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/966
- [Refactor] Mmlu subgroups and weight avg by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/922
- [Refactor] Remove deprecated `gold_alias` task YAML option by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/965
- [Refactor] Logging fixes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/952
- [Refactor] fixes for alternative MMLU tasks. by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/981
- [Refactor] Alias fix by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/987
- [Refactor] Minor cleanup on base `Task` subclasses by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/996
- [Refactor] add squad from master by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/971
- [Refactor] Squad misc by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/999
- [Refactor] Fix CI tests by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/997
- [Refactor] will check if group_name is None by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1001
- [Refactor] Bugfixes by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1002
- [Refactor] Verbosity rework by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/958
- add description on task/group alias by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/979
- [Refactor] Upstream ggml from big-refactor branch by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/967
- [Refactor] Improve Handling of Stop-Sequences for HF Batched Generation by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1009
- [Refactor] Update README by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/1020
- [Refactor] Remove `examples/` folder by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1018
- [Refactor] vllm support by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/1011
- Allow Generation arguments on greedy_until reqs by @uSaiPrashanth in https://github.com/EleutherAI/lm-evaluation-harness/pull/897
- Social iqa by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/1030
- [Refactor] BBH fixup by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1029
- Rename bigbench.yml to default.yml by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/1032
- [Refactor] Num_fewshot process by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/985
- [Refactor] Use correct HF model type for MBart-like models by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1024
- [Refactor] Urgent fix by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1033
- [Refactor] Versioning by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1031
- fixes for sampler by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/1038
- [Refactor] Update README.md by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1046
- [refactor] mps requirement by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/1037
- [Refactor] Additions to example notebook by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1048
- Miscellaneous documentation updates by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/1047
- [Refactor] add notebook for overview by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1025
- Update README.md by @StellaAthena in https://github.com/EleutherAI/lm-evaluation-harness/pull/1049
- [Refactor] Openai completions by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1008
- [Refactor] Added support for OpenAI ChatCompletions by @DaveOkpare in https://github.com/EleutherAI/lm-evaluation-harness/pull/839
- [Refactor] Update docs ToC by @haileyschoelkopf in https://github.com/EleutherAI/lm-evaluation-harness/pull/1051
- [Refactor] Fix fewshot cot mmlu descriptions by @lintangsutawika in https://github.com/EleutherAI/lm-evaluation-harness/pull/1060
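The headline change in this release is the refactored evaluation stack described above: a unified `hf` model type (deprecating `hf-causal`/`hf-seq2seq`), a top-level Python entry point, a console script, automatic batch sizing, and request caching. The sketch below is a minimal, illustrative use of the new Python API, assuming a v0.4.0 install; the model name and cache path are placeholders, not part of these notes.

```python
# Minimal sketch (assumes v0.4.0; model choice and cache path are illustrative).
import lm_eval  # top-level import added in PR #830

results = lm_eval.simple_evaluate(
    model="hf",  # unified HF model type; hf-causal/hf-seq2seq deprecated (PR #613)
    model_args="pretrained=EleutherAI/pythia-160m,dtype=float16",  # dtype flag (PR #523)
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size="auto",  # automatic batch-size search (PRs #394, #572, #673)
    use_cache="lm_cache/pythia-160m",  # CachingLM, exposed as --use_cache on the CLI (PR #619)
)
print(results["results"])  # per-task metrics, keyed by task name
```

The console-script entry point added in PR #703 exposes the same options as CLI flags, e.g. `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --batch_size auto`.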
New Contributors
- @fattorib made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/373
- @Thartvigsen made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/377
- @aflah02 made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/379
- @sxjscience made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/390
- @Jeffwan made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/403
- @zanussbaum made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/414
- @ret2libc made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/416
- @philwee made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/422
- @yurodiviy made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/406
- @nikhilpinnaparaju made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/447
- @lintangsutawika made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/438
- @juletx made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/472
- @janEbert made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/483
- @kenhktsui made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/480
- @passaglia made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/532
- @kwikiel made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/557
- @poedator made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/575
- @SONG-WONHO made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/579
- @seopbo made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/525
- @farzanehnakhaee70 made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/563
- @nopperl made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/608
- @yeoedward made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/682
- @ZZR0 made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/700
- @tju01 made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/770
- @Wehzie made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/803
- @uSaiPrashanth made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/802
- @ethanhs made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/809
- @chrisociepa made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/857
- @Hojjat-Mokhtarabadi made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/842
- @AndyWolfZwei made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/912
- @ManuelFay made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/885
- @jasonkrone made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/916
- @MicPie made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/937
- @DaveOkpare made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/839
Full Changelog: https://github.com/EleutherAI/lm-evaluation-harness/compare/v0.3.0...v0.4.0