v1.12.0rc1
Release date: 2022-12-19 17:40:16
Latest deepset-ai/haystack release: v2.4.0 (2024-08-15 17:39:00)
⭐ Highlights
Large Language Models with PromptNode
Introducing PromptNode, a new feature that brings the power of large language models (LLMs) to various NLP tasks. PromptNode is an easy-to-use, customizable node you can run on its own or in a pipeline. We've designed the API to be user-friendly and suitable for everyday experimentation, but also fully compatible with production-grade Haystack deployments.
By setting a prompt template for a PromptNode, you define what task you want it to do. This way, you can have multiple PromptNodes in your pipeline, each performing a different task. But that's not all. You can also inject the output of one PromptNode into the input of another one.
Out of the box, we support both Google T5 Flan and OpenAI GPT-3 models, and you can even mix and match these models in your pipelines.
from haystack.nodes.prompt import PromptNode
# Initialize the node:
prompt_node = PromptNode("google/flan-t5-base") # try also 'text-davinci-003' if you have an OpenAI key
prompt_node("What is the capital of Germany?")
PromptNode can do a lot more than simply query LLMs: it can manage prompt templates, run batches, share models among instances, be chained together in pipelines, and more. Check its documentation for details!
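The idea of one template per node, with the output of one node feeding the next, can be sketched in plain Python. This is a conceptual illustration only, not the PromptNode API: `TemplatedNode`, `fake_llm`, and the template strings are hypothetical names made up for this sketch, and the "model" is a stub that echoes its prompt.

```python
from string import Template
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"<answer to: {prompt}>"


class TemplatedNode:
    """Hypothetical node: fills one prompt template, then calls a model."""

    def __init__(self, template: str, model: Callable[[str], str]) -> None:
        self.template = Template(template)
        # The same model callable can be shared by several nodes.
        self.model = model

    def run(self, **variables: str) -> str:
        # Fill the template placeholders and pass the prompt to the model.
        return self.model(self.template.substitute(**variables))

    def run_batch(self, batch: List[Dict[str, str]]) -> List[str]:
        # Apply the same template to every set of variables in the batch.
        return [self.run(**variables) for variables in batch]


# Two nodes, two tasks, one shared model.
question_node = TemplatedNode("Write a question about: $text", fake_llm)
answer_node = TemplatedNode("Answer briefly: $question", fake_llm)

# Chain them: the first node's output becomes the second node's input.
question = question_node.run(text="Berlin")
answer = answer_node.run(question=question)
print(answer)
```

Swapping `fake_llm` for a function that calls an actual model would keep the chaining logic unchanged, which is the design point the sketch is meant to show.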
Support for BM25Retriever in InMemoryDocumentStore
InMemoryDocumentStore has always been the go-to document store for small prototypes. The addition of BM25 support officially makes it one of the document stores that support all Retrievers available in Haystack, just like FAISS and Elasticsearch-like stores, but without the external dependencies. Don't use it in production deployments that handle millions of documents, though. It's not the fastest document store out there.
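For readers unfamiliar with BM25, the ranking idea can be sketched in a few lines of standard-library Python. This is a minimal Okapi-style BM25 with whitespace tokenization and commonly used default parameters (`k1=1.5`, `b=0.75`); it is an assumption-laden illustration, not necessarily the variant, tokenizer, or parameters that Haystack uses internally.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight; the +1.0 keeps the idf positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Berlin is the capital of Germany",
    "Paris is the capital of France",
    "Haystack is an NLP framework",
]
scores = bm25_scores("capital of Germany", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because scoring needs only term counts, this kind of retrieval works without a model or an external search server, which fits the "no external dependencies" point above.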
:trophy: Honorable mention to @anakin87 for this outstanding contribution, among many many others! :trophy:
Haystack is always open to external contributions, and every little bit is appreciated. Don't know where to start? Have a look at the Contributors Guidelines.
Extended support for Cohere and OpenAI embeddings
We enabled EmbeddingRetriever to use the latest Cohere multilingual embedding models and OpenAI embedding models.
Simply use the model's full name (along with your API key) in EmbeddingRetriever to get them:
from haystack.nodes import EmbeddingRetriever

# Cohere
retriever = EmbeddingRetriever(embedding_model="multilingual-22-12", batch_size=16, api_key=api_key)
# OpenAI
retriever = EmbeddingRetriever(embedding_model="text-embedding-ada-002", batch_size=32, api_key=api_key, max_seq_len=8191)
Speeding up dense searches in batch mode (Elasticsearch and OpenSearch)
Whenever you need to execute multiple dense searches at once, ElasticsearchDocumentStore and OpenSearchDocumentStore can now do it in parallel. This not only speeds up run_batch and eval_batch for dense pipelines when used with those document stores but also significantly speeds up multi-embedding retrieval pipelines such as MostSimilarDocumentsPipeline.
For this, we measured a speedup of up to 49% on a realistic dataset.
Under the hood, our newly introduced query_by_embedding_batch document store function uses msearch to unleash the full power of your Elasticsearch/OpenSearch cluster.
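To give a rough idea of what batching dense queries into one msearch call involves, here is a sketch that builds a multi-search request body: newline-delimited JSON with one header line and one query line per embedding. The helper name `build_msearch_body`, the index name, the `embedding` field, and the `script_score` query with `cosineSimilarity` (an Elasticsearch 7-style dense query) are assumptions for illustration. The queries Haystack actually generates may differ, and OpenSearch typically uses a k-NN query instead.

```python
import json
from typing import List


def build_msearch_body(
    index: str,
    query_embeddings: List[List[float]],
    top_k: int = 10,
    embedding_field: str = "embedding",  # assumed field name for this sketch
) -> str:
    """Build an NDJSON msearch body with one dense query per embedding."""
    lines = []
    for vector in query_embeddings:
        # Header line: which index this particular search targets.
        header = {"index": index}
        # Body line: score every document by cosine similarity to the vector.
        body = {
            "size": top_k,
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": f"cosineSimilarity(params.query_vector, '{embedding_field}') + 1.0",
                        "params": {"query_vector": vector},
                    },
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search format requires a trailing newline.
    return "\n".join(lines) + "\n"


payload = build_msearch_body("document", [[0.1, 0.2], [0.3, 0.4]], top_k=3)
print(payload)
```

Sending all queries in a single request lets the cluster execute them concurrently instead of paying one network round trip per query.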
:warning: Deprecated Docker images discontinued
1.12 is the last release we're shipping with the old Docker images deepset/haystack-cpu, deepset/haystack-gpu, and their related tags. We'll remove the corresponding deprecated Dockerfiles /Dockerfile, /Dockerfile-GPU, and /Dockerfile-GPU-minimal from the codebase after the release.
What's Changed
Pipeline
- fix: ParsrConverter fails on pages without text by @anakin87 in https://github.com/deepset-ai/haystack/pull/3605
- fix: Convert eval metrics to python float by @tstadel in https://github.com/deepset-ai/haystack/pull/3612
- feat: add support for BM25Retriever in InMemoryDocumentStore by @anakin87 in https://github.com/deepset-ai/haystack/pull/3561
- chore: fix return type of aggregate_labels by @tstadel in https://github.com/deepset-ai/haystack/pull/3617
- refactor: change MultiModal retriever to be of type DenseRetriever by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/3598
- fix: Move entire forward pass of TableQA within torch.no_grad() by @sjrl in https://github.com/deepset-ai/haystack/pull/3636
- feat: add offsets_in_context to evaluation result by @julian-risch in https://github.com/deepset-ai/haystack/pull/3640
- bug: Use tqdm auto instead of plain tqdm by @vblagoje in https://github.com/deepset-ai/haystack/pull/3672
- fix: monkey patch for SklearnQueryClassifier by @anakin87 in https://github.com/deepset-ai/haystack/pull/3678
- feat: Update table reader tests to check the answer scores by @sjrl in https://github.com/deepset-ai/haystack/pull/3641
- feat: Adds all_terms_must_match parameter to BM25Retriever at runtime by @ugm2 in https://github.com/deepset-ai/haystack/pull/3627
- fix: fix PreProcessor split_by schema by @ZanSara in https://github.com/deepset-ai/haystack/pull/3680
- refactor: Generate JSON schema when missing by @masci in https://github.com/deepset-ai/haystack/pull/3533
- refactor: replace torch.no_grad with torch.inference_mode (where possible) by @anakin87 in https://github.com/deepset-ai/haystack/pull/3601
- Adjust get_type() method for pipelines by @vblagoje in https://github.com/deepset-ai/haystack/pull/3657
- refactor: improve Multilabel design by @anakin87 in https://github.com/deepset-ai/haystack/pull/3658
- feat: Update cohere embedding models #3704 by @vblagoje https://github.com/deepset-ai/haystack/pull/3704
- feat: Enable text-embedding-ada-002 for EmbeddingRetriever #3721 by @vblagoje https://github.com/deepset-ai/haystack/pull/3721
DocumentStores
- fix: Flatten DocumentClassifier output in SQLDocumentStore by @anakin87 in https://github.com/deepset-ai/haystack/pull/3273
- refactor: move milvus tests to their own module by @masci in https://github.com/deepset-ai/haystack/pull/3596
- feat: store metadata using JSON in SQLDocumentStore by @masci in https://github.com/deepset-ai/haystack/pull/3547
- fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in https://github.com/deepset-ai/haystack/pull/3603
- refactor: Move InMemoryDocumentStore tests to their own class by @masci in https://github.com/deepset-ai/haystack/pull/3614
- chore: remove redundant tests by @masci in https://github.com/deepset-ai/haystack/pull/3620
- refactor: Weaviate query with filters by @ZanSara in https://github.com/deepset-ai/haystack/pull/3628
- fix: use 9200 as the default port in launch_opensearch() by @masci in https://github.com/deepset-ai/haystack/pull/3630
- fix: revert Weaviate query with filters and improve tests by @ZanSara in https://github.com/deepset-ai/haystack/pull/3646
- feat: add query_by_embedding_batch by @tstadel in https://github.com/deepset-ai/haystack/pull/3546
- refactor: filters type by @tstadel in https://github.com/deepset-ai/haystack/pull/3682
- fix: pinecone metadata format by @jamescalam in https://github.com/deepset-ai/haystack/pull/3660
- fix: fixing broken BM25 support with Weaviate - fixes #3720 #3723 by @zoltan-fedor https://github.com/deepset-ai/haystack/pull/3723
Documentation
- fix: fixing the url for document merger by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/3615
- docs: Reformat code blocks in docstrings by @brandenchan in https://github.com/deepset-ai/haystack/pull/3580
Contributors to Tutorials
- fix: Tutorial 2, finetune a model, distillation code by Benvii https://github.com/deepset-ai/haystack-tutorials/pull/69
- chore: Update 01_Basic_QA_Pipeline.ipynb by gsajko https://github.com/deepset-ai/haystack-tutorials/pull/63
Other Changes
- test: add test to check id_hash_keys is not ignored by @julian-risch in https://github.com/deepset-ai/haystack/pull/3577
- fix: remove beir from all-gpu by @ZanSara in https://github.com/deepset-ai/haystack/pull/3669
- feat: Update DocumentMerger and TextIndexingPipeline imports by @brandenchan in https://github.com/deepset-ai/haystack/pull/3599
- fix: pin espnet in the audio extra by @ZanSara in https://github.com/deepset-ai/haystack/pull/3693
- refactor: update Squad data by @espoirMur in https://github.com/deepset-ai/haystack/pull/3513
- Update CONTRIBUTING.md by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/3624
- fix: revamp colab extra dependencies by @masci in https://github.com/deepset-ai/haystack/pull/3626
- refactor: remove test extra by @ZanSara in https://github.com/deepset-ai/haystack/pull/3679
- fix: remove beir from the base GPU image by @ZanSara in https://github.com/deepset-ai/haystack/pull/3692
- feat: Bump transformers version to remove torch scatter dependency by @sjrl in https://github.com/deepset-ai/haystack/pull/3703
New Contributors
- @espoirMur made their first contribution in https://github.com/deepset-ai/haystack/pull/3513
Full Changelog: https://github.com/deepset-ai/haystack/compare/v1.11.1...v1.12.0rc1