v0.4.1

UKPLab/sentence-transformers

Release date: 2021-01-04 22:04:01

Refactored Tokenization

Tokenization was refactored to run in batches for both training and inference, which speeds up tokenization; the training workflow itself is unchanged:

from torch.utils.data import DataLoader
from sentence_transformers import InputExample

train_examples = [InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
    InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
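
A dataloader built this way can be passed straight to model.fit; a minimal sketch, assuming a CosineSimilarityLoss setup (the model name and hyperparameters below are illustrative):

from sentence_transformers import SentenceTransformer, losses

# Illustrative checkpoint; any SentenceTransformer model name works here.
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

# CosineSimilarityLoss matches the float similarity labels used above.
train_loss = losses.CosineSimilarityLoss(model)

# fit() attaches its own batching collate function to the dataloader.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)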

Asymmetric Models

Adds a new models.Asym class that allows different encoding of sentences based on a tag (e.g. query vs. paragraph). Minimal example (the base model name below is illustrative):

from torch import nn
from sentence_transformers import SentenceTransformer, InputExample, models

base_model = 'distilbert-base-uncased'  # illustrative base model name
word_embedding_model = models.Transformer(base_model, max_seq_length=250)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
d1 = models.Dense(word_embedding_model.get_word_embedding_dimension(), 256, bias=False, activation_function=nn.Identity())
d2 = models.Dense(word_embedding_model.get_word_embedding_dimension(), 256, bias=False, activation_function=nn.Identity())
asym_model = models.Asym({'QRY': [d1], 'DOC': [d2]})
model = SentenceTransformer(modules=[word_embedding_model, pooling_model, asym_model])

# Your input examples have to look like this:
inp_example = InputExample(texts=[{'QRY': 'your query'}, {'DOC': 'your document text'}], label=1)

# Encoding (note: mixed inputs are not allowed)
model.encode([{'QRY': 'your query1'}, {'QRY': 'your query2'}])

Inputs that have the key 'QRY' are passed through the d1 dense layer, while inputs with the key 'DOC' go through the d2 dense layer. More documentation on how to design asymmetric models will follow soon.

New Namespace & Models for Cross-Encoder Cross-Encoder are now hosted at https://huggingface.co/cross-encoder. Also, new pre-trained models have been added for: NLI & QNLI.

Logging

Log messages now use a custom logger from Python's logging module, thanks to PR #623. This lets you choose which log messages you want to see from which components.
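
For example, verbosity can be adjusted per component with the standard logging module; a minimal sketch, assuming the library's loggers follow the usual module-path naming:

import logging

# Show INFO messages globally, but only WARNING and above from sentence-transformers.
logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)
logging.getLogger('sentence_transformers').setLevel(logging.WARNING)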

Unit Tests

A lot more unit tests have been added, which test the different components of the framework.
