python-v0.13.3rc1
Release date: 2023-04-04 21:06:22
Latest huggingface/tokenizers release: v0.15.0 (2023-11-15 03:06:30)
What's Changed
- Update pr docs actions by @mishig25 in https://github.com/huggingface/tokenizers/pull/1101
- Adding rust audit. by @Narsil in https://github.com/huggingface/tokenizers/pull/1099
- Revert "Update pr docs actions" by @mishig25 in https://github.com/huggingface/tokenizers/pull/1107
- Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1108
- Include license file in Rust crate by @ankane in https://github.com/huggingface/tokenizers/pull/1115
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1116
- [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in https://github.com/huggingface/tokenizers/pull/1120
- Fixing conda ssl location by @Narsil in https://github.com/huggingface/tokenizers/pull/1124
- Adding stale bot ? by @Narsil in https://github.com/huggingface/tokenizers/pull/1123
- Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1126
- Bump decode-uri-component from 0.2.0 to 0.2.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1125
- Bump cached-path from 0.5 to 0.6 by @hvaara in https://github.com/huggingface/tokenizers/pull/1127
- Wrap rustdoc html entity in code block by @hvaara in https://github.com/huggingface/tokenizers/pull/1130
- Fix broken links in docs by @hvaara in https://github.com/huggingface/tokenizers/pull/1133
- Bump derive_builder from 0.9 to 0.12 by @hvaara in https://github.com/huggingface/tokenizers/pull/1129
- Ignore Cargo.lock for subfolders by @hvaara in https://github.com/huggingface/tokenizers/pull/1131
- Fix one char super tiny typo by @fzyzcjy in https://github.com/huggingface/tokenizers/pull/1137
- [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. by @SeongBeomLEE in https://github.com/huggingface/tokenizers/pull/1136
- Bump json5, copy-webpack-plugin, webpack and webpack-cli in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1139
- Bump json5 from 2.2.0 to 2.2.3 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1140
- Add missing build targets by @Narsil in https://github.com/huggingface/tokenizers/pull/1145
- Adding python 3.8 for M1 by @Narsil in https://github.com/huggingface/tokenizers/pull/1147
- Made dirs optional by @ankane in https://github.com/huggingface/tokenizers/pull/1148
- Update info on environment variable for threading by @mert-kurttutan in https://github.com/huggingface/tokenizers/pull/1150
- Making `Tokenizer` clone. by @Narsil in https://github.com/huggingface/tokenizers/pull/1152
- Prevent using `from_pretrained` on invalid ids (better error message). by @Narsil in https://github.com/huggingface/tokenizers/pull/1153
- Improved version. by @Narsil in https://github.com/huggingface/tokenizers/pull/1154
- Update model.rs by @thomasw21 in https://github.com/huggingface/tokenizers/pull/1166
- Using clippy 1.67 by @Narsil in https://github.com/huggingface/tokenizers/pull/1167
- pyo3 v0.18 migration by @mert-kurttutan in https://github.com/huggingface/tokenizers/pull/1173
- Bump webpack from 5.75.0 to 5.76.0 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1181
- Fixing infinite loop in UnigramTrainer. by @Narsil in https://github.com/huggingface/tokenizers/pull/1182
- Bump dirs from 3.0 to 4.0 by @hvaara in https://github.com/huggingface/tokenizers/pull/1142
- Adding ByteFallback support for `tokenizers`. by @Narsil in https://github.com/huggingface/tokenizers/pull/1183
- Faster `datasets` train example by @lhoestq in https://github.com/huggingface/tokenizers/pull/1192
- Adding `Replace` to decoder (to undo the Replace Normalizer for Metaspace split). by @Narsil in https://github.com/huggingface/tokenizers/pull/1195
- Creating `normalizers.Prepend` (To be used instead of `Metaspace`). by @Narsil in https://github.com/huggingface/tokenizers/pull/1194
- Adding 2 new decoders: by @Narsil in https://github.com/huggingface/tokenizers/pull/1196
- Fixing decoder strip because of char boundaries. by @Narsil in https://github.com/huggingface/tokenizers/pull/1197
- Add `content` to Strip decoder to allow decoding mid tokens. by @Narsil in https://github.com/huggingface/tokenizers/pull/1199
- New version 0.13.3 by @Narsil in https://github.com/huggingface/tokenizers/pull/1205
New Contributors
- @ankane made their first contribution in https://github.com/huggingface/tokenizers/pull/1115
- @SeongBeomLEE made their first contribution in https://github.com/huggingface/tokenizers/pull/1120
- @hvaara made their first contribution in https://github.com/huggingface/tokenizers/pull/1127
- @fzyzcjy made their first contribution in https://github.com/huggingface/tokenizers/pull/1137
- @mert-kurttutan made their first contribution in https://github.com/huggingface/tokenizers/pull/1150
- @lhoestq made their first contribution in https://github.com/huggingface/tokenizers/pull/1192
Full Changelog: https://github.com/huggingface/tokenizers/compare/node-v0.13.2...python-v0.13.3rc1
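Several entries in the list above (ByteFallback support, the `Replace` decoder, `normalizers.Prepend`, the 2 new decoders, and the `content` option on the Strip decoder) appear to target SentencePiece-style decoding. The block below is a minimal pure-Python sketch of that idea under stated assumptions. It is not the library's implementation and does not call the `tokenizers` API: the helper names `byte_fallback` and `decode` are hypothetical, and the `<0xNN>` token format, the `▁` word marker, and the one-replacement-character-per-undecodable-byte behavior are assumptions for illustration.

```python
import re

# Assumed shape of a byte-fallback token: "<0x" + two hex digits + ">".
BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def byte_fallback(tokens):
    """Merge runs of <0xNN> tokens into decoded text; pass other tokens through."""
    out = []
    pending = bytearray()  # bytes collected from consecutive byte tokens

    def flush():
        if pending:
            try:
                out.append(pending.decode("utf-8"))
            except UnicodeDecodeError:
                # Assumption: one U+FFFD per byte token that cannot be decoded.
                out.append("\ufffd" * len(pending))
            pending.clear()

    for tok in tokens:
        match = BYTE_TOKEN.match(tok)
        if match:
            pending.append(int(match.group(1), 16))
        else:
            flush()
            out.append(tok)
    flush()
    return out


def decode(tokens):
    """Hypothetical decode chain: replace marker, byte fallback, fuse, strip."""
    pieces = [t.replace("\u2581", " ") for t in tokens]  # undo the "▁" marker
    pieces = byte_fallback(pieces)                       # <0xNN> runs -> text
    text = "".join(pieces)                               # fuse into one string
    # Drop the single leading space introduced by the prepended marker.
    return text[1:] if text.startswith(" ") else text


if __name__ == "__main__":
    # "你" is encoded in UTF-8 as the bytes E4 BD A0.
    print(decode(["\u2581Hello", "\u2581", "<0xE4>", "<0xBD>", "<0xA0>"]))
```

Removing only one leading space, rather than trimming all whitespace, keeps any spaces that were part of the original text.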