v1.1.0
Release date: 2024-09-19 17:28:57
Latest release of huggingface/setfit: v1.1.0 (2024-09-19 17:28:57)
This release introduces a new backend to finetune embedding models, based on the Sentence Transformers Trainer, tackles deprecations of other dependencies like `transformers`, deprecates Python 3.7 while adding support for new Python versions, and applies some other minor fixes. There shouldn't be any breaking changes.
Install this version with
pip install -U setfit
Defer the embedding model finetuning phase to Sentence Transformers (#554)
In SetFit v1.0, the old `model.fit` training from Sentence Transformers was replaced by a custom training loop with features the former was missing, such as loss logging and useful callbacks. Since then, Sentence Transformers v3 has been released, and it added all of the features that were previously lacking. To simplify training moving forward, training is now (once again) deferred to Sentence Transformers.
Because both the old and new training approaches are inspired by the `transformers` `Trainer`, there should not be any breaking changes. The primary notable change is that training now requires `accelerate` (as Sentence Transformers requires it), and we benefit from some of the Sentence Transformers training features, such as multi-GPU training.
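Since training now depends on `accelerate`, it can be handy to confirm what is installed before starting a run. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `missing_packages` are made up for illustration and are not part of SetFit.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(packages):
    """Return the names of the distributions that are not installed."""
    found = installed_versions(packages)
    return [name for name, version in found.items() if version is None]


if __name__ == "__main__":
    # PyPI distribution names of the packages mentioned in these notes.
    required = ["setfit", "sentence-transformers", "accelerate"]
    for name, version in installed_versions(required).items():
        print(f"{name}: {version or 'not installed'}")
```

If `accelerate` shows up as not installed, `pip install -U accelerate` should resolve it.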
Solve discrepancies with new versions of dependencies
To ensure compatibility with the latest versions of dependencies, the following issues have been addressed:
- Follow the (soft) deprecation of `evaluation_strategy` in favor of `eval_strategy` (#538). This previously resulted in crashes if your `transformers` version was too new.
- Avoid the now-deprecated `DatasetFilter` (#527). This previously resulted in crashes if your `huggingface-hub` version was too new.
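If your own scripts still pass `evaluation_strategy`, a small shim can ease the migration to `eval_strategy`. This is only a sketch: `normalize_eval_strategy` is a hypothetical helper, not a function from SetFit or `transformers`, and it simply rewrites a dictionary of keyword arguments.

```python
import warnings


def normalize_eval_strategy(kwargs):
    """Return a copy of kwargs that uses `eval_strategy` instead of `evaluation_strategy`.

    Hypothetical helper for illustration; not part of SetFit or transformers.
    """
    normalized = dict(kwargs)
    if "evaluation_strategy" in normalized:
        old_value = normalized.pop("evaluation_strategy")
        warnings.warn(
            "`evaluation_strategy` is deprecated; use `eval_strategy` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicitly passed new-style value wins over the deprecated one.
        normalized.setdefault("eval_strategy", old_value)
    return normalized
```

The returned dictionary can then be unpacked into whatever arguments object your script builds, so the rest of the code only ever sees the new name.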
Python version support
- Following the deprecation of Python 3.7 by the Python team, SetFit is also deprecating Python 3.7 moving forward. (#506)
- We've added official support for Python 3.11 and 3.12 now that both are included in our test suite. (#550)
Minor changes
- Firm up `max_steps` and `eval_max_steps`: rather than being a rough maximum limit, the limit is now exact. This can be helpful to avoid memory overflow, especially in situations with notable dataset imbalances. (#549)
- Training and validation losses are now nicely logged in notebooks. (#557)
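To illustrate the difference between a rough and an exact limit, here is a minimal sketch of pair generation that checks the cap before every append, so it can never overshoot. The function `capped_pairs` is hypothetical and is not SetFit's `generate_pairs` implementation; it only demonstrates the idea of an exact limit.

```python
def capped_pairs(sentences, labels, max_pairs):
    """Collect (sentence_a, sentence_b, same_label) tuples, stopping at exactly max_pairs.

    Hypothetical illustration of an exact cap; not SetFit's actual sampler.
    """
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            # Checking inside the inner loop keeps the limit exact. Checking
            # only once per outer iteration would make it a rough limit.
            if len(pairs) >= max_pairs:
                return pairs
            same_label = float(labels[i] == labels[j])
            pairs.append((sentences[i], sentences[j], same_label))
    return pairs
```

With four sentences there are six possible pairs, so `capped_pairs(sentences, labels, 4)` returns four of them, while a limit of 100 returns all six.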
Minor bug fixes
- Fix bug where the `device` parameter in `SetFitHead` is ignored if CUDA is not available. (#518)
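The intended behavior can be sketched as follows: an explicitly requested device always wins, and CUDA availability only decides the default. This is an assumption-laden illustration in plain Python. `resolve_device` is a hypothetical function, the availability flag is passed in as a plain boolean rather than queried from PyTorch, and none of this is SetFit's actual code.

```python
def resolve_device(requested, cuda_available):
    """Pick the device string for a model head.

    Hypothetical illustration: an explicit request is always honored, and
    CUDA availability is only consulted when no device was requested.
    """
    if requested is not None:
        return requested
    return "cuda" if cuda_available else "cpu"
```

For example, requesting `"mps"` on a machine without CUDA should yield `"mps"` rather than silently falling back to `"cpu"`.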
All Changes
- [`absa`] Add SetFitABSA notebook on FiQA by @tomaarsen in https://github.com/huggingface/setfit/pull/471
- Refactor training logs & warmup_proportion by @tomaarsen in https://github.com/huggingface/setfit/pull/475
- [`feat`] Set labels based on head classes, if possible by @tomaarsen in https://github.com/huggingface/setfit/pull/476
- `optimum-intel` notebook by @danielkorat in https://github.com/huggingface/setfit/pull/480
- Optimum-Intel notebook: fix quantization explanation by @danielkorat in https://github.com/huggingface/setfit/pull/483
- Optimum-Intel Notebook by @danielkorat in https://github.com/huggingface/setfit/pull/484
- Fix Errors in `setfit-onnx-optimum` Notebook by @danielkorat in https://github.com/huggingface/setfit/pull/496
- Add files via upload by @MosheWasserb in https://github.com/huggingface/setfit/pull/497
- Switch to differentiable head, larger input sequence by @danielkorat in https://github.com/huggingface/setfit/pull/489
- Bugfix: Error in optimum-intel notebook due to missing attributes after `torch.compile()` by @danielkorat in https://github.com/huggingface/setfit/pull/517
- Renamed `evaluation_strategy` to `eval_strategy` by @sergiopaniego in https://github.com/huggingface/setfit/pull/538
- [CI] Deprecate Python3.7 and invalidate cache weekly by @Wauplin in https://github.com/huggingface/setfit/pull/506
- Don't use deprecated `DatasetFilter` + update deps by @Wauplin in https://github.com/huggingface/setfit/pull/527
- Fix SetFitModel: not a dataclass, not a PyTorchModelHubMixin by @Wauplin in https://github.com/huggingface/setfit/pull/505
- [`tests`] Resolve remaining test failures by @tomaarsen in https://github.com/huggingface/setfit/pull/550
- Train via the Sentence Transformers Trainer from ST v3 by @tomaarsen in https://github.com/huggingface/setfit/pull/554
- Update absa.mdx with necessary imports by @splevine in https://github.com/huggingface/setfit/pull/533
- Fix bug where SetFitHead not moved to non-cuda devices on init by @ajmssc in https://github.com/huggingface/setfit/pull/518
- Fix pandas groupby -> apply warning by @tomaarsen in https://github.com/huggingface/setfit/pull/555
- Check if max pairs limit reached in `generate_pairs` and `generate_multilabel_pairs` by @OscarRunsCode in https://github.com/huggingface/setfit/pull/549
- Prevent sampling 2x more than requested when max_steps is set by @tomaarsen in https://github.com/huggingface/setfit/pull/556
- Create custom NotebookCallback subclass for embedding_loss, etc. by @tomaarsen in https://github.com/huggingface/setfit/pull/557
New Contributors
- @sergiopaniego made their first contribution in https://github.com/huggingface/setfit/pull/538
- @Wauplin made their first contribution in https://github.com/huggingface/setfit/pull/506
- @splevine made their first contribution in https://github.com/huggingface/setfit/pull/533
- @ajmssc made their first contribution in https://github.com/huggingface/setfit/pull/518
- @OscarRunsCode made their first contribution in https://github.com/huggingface/setfit/pull/549
Full Changelog: https://github.com/huggingface/setfit/compare/v1.0.3...v1.1.0