v0.3.4
Release date: 2020-08-25 00:24:24
- The documentation is substantially improved and can be found at: www.SBERT.net - Feedback welcome
- The dataset to hold training InputExamples (`datasets.SentencesDataset`) now uses lazy tokenization, i.e., examples are tokenized once they are needed for a batch. If you set `num_workers` to a positive integer in your `DataLoader`, tokenization will happen in a background thread. This substantially decreases the start-up time for training (see the sketch below).
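A minimal training sketch under these changes; the model name and toy examples are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

# Hypothetical toy pairs; replace with your own training data
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'The girl is playing guitar.'], label=0.1),
]

train_dataset = SentencesDataset(train_examples, model)  # tokenization is now lazy
# num_workers > 0 moves tokenization into background DataLoader workers
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16, num_workers=2)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```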
- `model.encode()` now also uses a PyTorch DataSet + DataLoader. If you set `num_workers` to a positive integer, tokenization will happen in the background, leading to faster encoding speed for large corpora.
- Added functions and an example for multi-GPU encoding - This method can be used to encode a corpus with multiple GPUs in parallel (see the sketch below). No multi-GPU support for training yet.
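A sketch of both encoding paths; the model name and corpus are placeholders, and passing `num_workers` to `encode()` follows the note above. The pool helpers (`start_multi_process_pool`, `encode_multi_process`, `stop_multi_process_pool`) are the multi-GPU encoding functions this release adds:

```python
from sentence_transformers import SentenceTransformer

# Hypothetical large corpus; replace with your own sentences
sentences = ['This is sentence {}'.format(i) for i in range(100000)]

if __name__ == '__main__':
    model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

    # Single-process encoding; num_workers > 0 tokenizes in the background
    embeddings = model.encode(sentences, batch_size=32, num_workers=2)

    # Multi-GPU encoding: one worker process per available CUDA device
    pool = model.start_multi_process_pool()
    embeddings = model.encode_multi_process(sentences, pool, batch_size=32)
    model.stop_multi_process_pool(pool)
```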
- Removed the `parallel_tokenization` parameters from `encode()` & `SentencesDataset` - No longer needed with lazy tokenization and DataLoader worker threads.
- Smaller bugfixes
Breaking changes:
- Renamed `evaluation.BinaryEmbeddingSimilarityEvaluator` to `evaluation.BinaryClassificationEvaluator` (migration sketch below)
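Updating existing code is essentially a rename; a minimal sketch, assuming sentence pairs with binary labels (all data below is illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

# Pairs with binary labels: 1 = similar / duplicate, 0 = dissimilar
sentences1 = ['A man is eating food.', 'The girl is playing guitar.']
sentences2 = ['A man is eating a meal.', 'A man is eating pasta.']
labels = [1, 0]

# Old (pre-0.3.4): evaluation.BinaryEmbeddingSimilarityEvaluator(...)
evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels)
score = evaluator(model)
```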