v0.14.1
Release date: 2023-10-06 19:10:29
Latest huggingface/tokenizers release: v0.15.0 (2023-11-15 03:06:30)
What's Changed
- Fix conda release by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1211
- Fix node release by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1212
- Printing warning to stderr. by @Narsil in https://github.com/huggingface/tokenizers/pull/1222
- Fixing padding_left sequence_ids. by @Narsil in https://github.com/huggingface/tokenizers/pull/1233
- Use LTO for release and benchmark builds by @csko in https://github.com/huggingface/tokenizers/pull/1157
- fix unigram.rs test_sample() by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1244
- implement a simple max_sentencepiece_length into BPE by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1228
- Makes `decode` and `decode_batch` work on borrowed content. by @mfuntowicz in https://github.com/huggingface/tokenizers/pull/1251
- Update all GH Actions with dependency on actions/checkout by @mfuntowicz in https://github.com/huggingface/tokenizers/pull/1256
- Parallelize unigram trainer by @mishig25 in https://github.com/huggingface/tokenizers/pull/976
- Update unigram/trainer.rs by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1257
- Fixing broken link. by @Narsil in https://github.com/huggingface/tokenizers/pull/1268
- fix documentation regarding regex by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1264
- Update Cargo.toml by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1266
- Update README.md - Broken link by @sbhavani in https://github.com/huggingface/tokenizers/pull/1272
- [doc build] Use secrets by @mishig25 in https://github.com/huggingface/tokenizers/pull/1273
- Improve error for truncation with too high stride by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1275
- Add unigram bytefallback by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1217
- revise type specification by @hiroshi-matsuda-rit in https://github.com/huggingface/tokenizers/pull/1289
- Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1291
- Update path name: master -> main by @bact in https://github.com/huggingface/tokenizers/pull/1292
- import Tuple from typing by @kellymarchisio in https://github.com/huggingface/tokenizers/pull/1295
- Fixing clippy warnings on 1.71. by @Narsil in https://github.com/huggingface/tokenizers/pull/1296
- Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1299
- feat: Added CITATION.cff. by @SamuelLarkin in https://github.com/huggingface/tokenizers/pull/1302
- Single warning for holes. by @Narsil in https://github.com/huggingface/tokenizers/pull/1303
- Give error when initializing tokenizer with too high stride by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1306
- Handle when precompiled charsmap is empty by @kellymarchisio in https://github.com/huggingface/tokenizers/pull/1308
- Derive clone for TrainerWrapper by @jonatanklosko in https://github.com/huggingface/tokenizers/pull/1317
- CD backports by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1318
- 0.13.4.rc1 by @Narsil in https://github.com/huggingface/tokenizers/pull/1319
- Release all at once for simplicity. by @Narsil in https://github.com/huggingface/tokenizers/pull/1320
- Fix stride condition. by @Narsil in https://github.com/huggingface/tokenizers/pull/1321
- pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322
- Add `expect()` for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
- Re-using scripts from safetensors. by @Narsil in https://github.com/huggingface/tokenizers/pull/1328
- Reduce number of different revisions by 1 by @Narsil in https://github.com/huggingface/tokenizers/pull/1329
- Python 38 arm by @Narsil in https://github.com/huggingface/tokenizers/pull/1330
- Move to maturing mimicking move for `safetensors`. + Rewritten node bindings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1331
- Updating the docs with the new command. by @Narsil in https://github.com/huggingface/tokenizers/pull/1333
- Update added tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1335
- update package version for dev by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1339
- Added ability to inspect a 'Sequence' pre-tokenizer. by @eaplatanios in https://github.com/huggingface/tokenizers/pull/1341
- Let's allow hf_hub < 1.0 by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1344
- Fixing the progressbar. by @Narsil in https://github.com/huggingface/tokenizers/pull/1353
- Preparing release. by @Narsil in https://github.com/huggingface/tokenizers/pull/1355
New Contributors
- @csko made their first contribution in https://github.com/huggingface/tokenizers/pull/1157
- @chris-ha458 made their first contribution in https://github.com/huggingface/tokenizers/pull/1244
- @sbhavani made their first contribution in https://github.com/huggingface/tokenizers/pull/1272
- @boyleconnor made their first contribution in https://github.com/huggingface/tokenizers/pull/1275
- @hiroshi-matsuda-rit made their first contribution in https://github.com/huggingface/tokenizers/pull/1289
- @bact made their first contribution in https://github.com/huggingface/tokenizers/pull/1292
- @kellymarchisio made their first contribution in https://github.com/huggingface/tokenizers/pull/1295
- @SamuelLarkin made their first contribution in https://github.com/huggingface/tokenizers/pull/1302
- @jonatanklosko made their first contribution in https://github.com/huggingface/tokenizers/pull/1317
- @mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull/1322
- @eaplatanios made their first contribution in https://github.com/huggingface/tokenizers/pull/1341
Full Changelog: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.1