v1.13.0
Release date: 2023-01-27 21:43:53
⭐ Highlights
Stop words for PromptNode
PromptNode now supports stop words for all models. Users can pass a list of words through the ‘stop_words’ PromptNode parameter, and LLM text generation stops as soon as any of them is encountered. The stop words themselves are not included in the response.
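Conceptually, the stop-word behavior described above means cutting the generated text at the first stop word that appears and dropping the stop word itself. The snippet below is a minimal, library-independent sketch of that idea. `truncate_at_stop_words` is a hypothetical helper, not PromptNode's actual implementation, which may stop generation inside the model rather than post-process its output.

```python
from typing import Iterable


def truncate_at_stop_words(text: str, stop_words: Iterable[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop word.

    Hypothetical helper for illustration. The stop word itself is excluded
    from the result, and matching is case-sensitive.
    """
    cut = len(text)
    for word in stop_words:
        if not word:
            continue  # an empty string would match at position 0, so skip it
        index = text.find(word)
        if index != -1 and index < cut:
            cut = index  # keep the earliest match across all stop words
    return text[:cut]


generated = "Paris is the capital of France.\nQuestion: What about Spain?"
print(truncate_at_stop_words(generated, ["Question:"]))
```

Taking the earliest match across all stop words, rather than the first stop word in the list, matches the rule that generation stops once *any* stop word is encountered.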
A dedicated GitHub repository for Haystack demos
The source code for Haystack's Explore the World demo has been moved to a dedicated repository: https://github.com/deepset-ai/haystack-demos. Use this repository to check out the code, run it locally, fork it, customize it, and contribute!
New nodes: ImageToText and CsvTextConverter
This release sees two new nodes, both contributed by community members!
The first one is ImageToText (courtesy of our well-known @anakin87): an image captioning node that generates descriptions of image files and creates Haystack documents from them.
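The captioning-to-documents flow described above can be sketched without any model download by passing in the captioner as a function. This is an illustration of the pattern only: `images_to_documents`, the `caption_fn` hook, and the dict layout (`content`, `content_type`, `meta` with an `image_path` key) are assumptions, not ImageToText's real signature. In practice, `caption_fn` could wrap an image captioning model such as a Hugging Face image-to-text pipeline.

```python
from pathlib import Path
from typing import Callable, Dict, List


def images_to_documents(
    image_paths: List[str], caption_fn: Callable[[str], str]
) -> List[Dict]:
    """Turn image files into text documents by captioning each one.

    Hypothetical sketch: `caption_fn` stands in for an image captioning
    model, and the document layout is assumed for illustration.
    """
    documents = []
    for path in image_paths:
        caption = caption_fn(path)  # the generated description becomes the text
        documents.append(
            {
                "content": caption,
                "content_type": "text",
                "meta": {"image_path": path},  # keep a pointer back to the image
            }
        )
    return documents


def fake_captioner(path: str) -> str:
    """Stand-in captioner so the sketch runs without a real model."""
    return f"a photo stored at {Path(path).name}"


docs = images_to_documents(["img/cat.jpg", "img/dog.png"], fake_captioner)
print(docs[0]["content"])
```

Because the image's text representation is the caption, the resulting documents can be indexed and retrieved like any other text documents.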
The second one is CsvTextConverter, from @Benvii: a small utility node that loads a CSV file of FAQ question-answer pairs and sends them correctly to your DocumentStore, making it super handy for FAQ matching pipelines.
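To show what converting an FAQ CSV into documents involves, here is a minimal sketch using only the standard library. It assumes a header row with `question` and `answer` columns, stores the question as the document content (what an FAQ pipeline matches queries against), and keeps the answer in the metadata. The function name and document layout are hypothetical and may differ from what CsvTextConverter actually produces.

```python
import csv
import io
from typing import Dict, List, TextIO


def faq_csv_to_documents(csv_file: TextIO) -> List[Dict]:
    """Build one document per FAQ question-answer row.

    Hypothetical sketch: assumes `question` and `answer` header columns.
    """
    reader = csv.DictReader(csv_file)
    missing = {"question", "answer"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    documents = []
    for row in reader:
        question = (row["question"] or "").strip()
        if not question:
            continue  # skip rows that have no question to match against
        documents.append(
            {
                "content": question,
                "content_type": "text",
                "meta": {"answer": (row["answer"] or "").strip()},
            }
        )
    return documents


faq_csv = io.StringIO(
    "question,answer\n"
    "How do I reset my password?,Use the 'Forgot password' link.\n"
    "Where is my invoice?,Check the Billing page.\n"
)
docs = faq_csv_to_documents(faq_csv)
print(docs[0]["meta"]["answer"])
```

Keeping the question as content means a retriever compares incoming queries with stored questions, and the matching document's metadata then carries the answer to return.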
Check out the docs to learn more about them, and try them out!
Faster tokenization for GPT models with tiktoken
Haystack now supports faster tokenization with OpenAI's tiktoken library, which can dramatically speed up tokenization for GPT models. Where tiktoken is not supported (Python < 3.8, arm64, and macOS), Haystack falls back to the regular Hugging Face tokenizers. Thanks to @danielbichuetti for yet another amazing contribution!
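The fallback described above can be sketched as a platform check followed by a tokenizer choice. This is a simplified illustration under stated assumptions, not Haystack's code. It reads the release notes' list (Python < 3.8, arm64, macOS) as three independent conditions and treats Linux's `aarch64` as arm64. It uses the `cl100k_base` encoding from tiktoken and the GPT-2 tokenizer from transformers as the fallback. The helper names are hypothetical, and the two tokenizers can return different counts for the same text.

```python
import platform
import sys
from typing import Optional, Sequence


def tiktoken_supported(
    version_info: Sequence[int] = tuple(sys.version_info[:2]),
    machine: Optional[str] = None,
    system: Optional[str] = None,
) -> bool:
    """Hypothetical check mirroring the fallback rule in the release notes."""
    machine = (machine if machine is not None else platform.machine()).lower()
    system = system if system is not None else platform.system()
    if tuple(version_info[:2]) < (3, 8):
        return False
    if machine in ("arm64", "aarch64"):  # assumed spellings of arm64
        return False
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return False
    return True


def count_tokens(text: str) -> int:
    """Count GPT tokens, preferring tiktoken and falling back to Hugging Face."""
    if tiktoken_supported():
        try:
            import tiktoken
        except ImportError:
            pass  # tiktoken not installed: use the fallback below
        else:
            return len(tiktoken.get_encoding("cl100k_base").encode(text))
    # Fallback: GPT-2 tokenizer from transformers (counts may differ).
    from transformers import GPT2TokenizerFast

    return len(GPT2TokenizerFast.from_pretrained("gpt2").tokenize(text))
```

Calling `count_tokens("Hello world")` requires either tiktoken or transformers to be installed, and the fallback downloads the GPT-2 tokenizer on first use.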
What's Changed
Breaking Changes
- Migrating to use native Pytorch AMP by @sjrl in https://github.com/deepset-ai/haystack/pull/2827
- bug: consistent batch_size parameter names in distillation by @julian-risch in https://github.com/deepset-ai/haystack/pull/3811
- refactor: Move invocation_context from meta to own pipeline variable by @vblagoje in https://github.com/deepset-ai/haystack/pull/3888
Pipeline
- feat: Update cohere embedding models by @vblagoje in https://github.com/deepset-ai/haystack/pull/3704
- feat: add index parameter to TfidfRetriever by @anakin87 in https://github.com/deepset-ai/haystack/pull/3666
- feat: Use torch.inference_mode() for TableQA by @sjrl in https://github.com/deepset-ai/haystack/pull/3731
- feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in https://github.com/deepset-ai/haystack/pull/3721
- refactor: improve monkey patch for SklearnQueryClassifier by @anakin87 in https://github.com/deepset-ai/haystack/pull/3732
- refactor: remove unused code in TfidfRetriever by @anakin87 in https://github.com/deepset-ai/haystack/pull/3733
- refactor: Remove duplicate code in TableReader by @sjrl in https://github.com/deepset-ai/haystack/pull/3708
- fix: Make InferenceProcessor thread safe by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3709
- chore: adding template for prompt node by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/3738
- fix: Fixed local reader model loading by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/3663
- fix: Fix predict_batch in TransformersReader for single nested Document list by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3748
- feat: change PipelineConfigError to DocumentStoreError with more details by @julian-risch in https://github.com/deepset-ai/haystack/pull/3783
- bug: skip empty documents in reader by @julian-risch in https://github.com/deepset-ai/haystack/pull/3773
- fix: linefeeds in custom_query by @tstadel in https://github.com/deepset-ai/haystack/pull/3813
- fix: Convert table cells to strings for compatibility with TableReader by @sjrl in https://github.com/deepset-ai/haystack/pull/3762
- fix: Ensure eval mode for TableReader model for predictions by @sjrl in https://github.com/deepset-ai/haystack/pull/3743
- fix: gracefully handle FileExistsError during Preprocessor resource download by @wochinge in https://github.com/deepset-ai/haystack/pull/3816
- fix: make the crawler runnable and testable on Windows by @anakin87 in https://github.com/deepset-ai/haystack/pull/3830
- fix: ignore non-serializable params when hashing pipeline objects by @masci in https://github.com/deepset-ai/haystack/pull/3842
- feat: preprocessor raises warning when doc length exceeds threshold by @ZanSara in https://github.com/deepset-ai/haystack/pull/3837
- fix: remove string validation in YAML by @ZanSara in https://github.com/deepset-ai/haystack/pull/3854
- feat: Use truncate option for Cohere.embed by @sjrl in https://github.com/deepset-ai/haystack/pull/3865
- feat: ImageToText (caption generator) by @anakin87 in https://github.com/deepset-ai/haystack/pull/3859
- fix: Remove double super class init from ParsrConverter init by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3896
- feat: store id_hash_keys in Document objects to make documents clonable by @ZanSara in https://github.com/deepset-ai/haystack/pull/3697
- feat: adding the ability to use Ray Serve async functionality by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3769
- feat: support cl100k_base tokenization and increase performance for GPT2 by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/3897
- fix: Fix number of concurrent requests in RequestLimiter by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3705
- feat: Run commands inside docker container as a non root user by @vblagoje in https://github.com/deepset-ai/haystack/pull/3702
- fix: Removed overlooked torch scatter references by @sjrl in https://github.com/deepset-ai/haystack/pull/3719
- build: upgrade torch and let transformers pick the version by @julian-risch in https://github.com/deepset-ai/haystack/pull/3727
- feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in https://github.com/deepset-ai/haystack/pull/3667
- refactor: remove deprecated parameters from Summarizer by @anakin87 in https://github.com/deepset-ai/haystack/pull/3740
- refactor: Using with open() to read files by @sjrl in https://github.com/deepset-ai/haystack/pull/3787
- feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda by @vblagoje in https://github.com/deepset-ai/haystack/pull/3701
- fix: pin protobuf version by @masci in https://github.com/deepset-ai/haystack/pull/3789
- fix(docker): Use IMAGE_NAME in api image by @FabianHertwig in https://github.com/deepset-ai/haystack/pull/3786
- bug: Fix launch_milvus() by cd'ing to milvus_dir by @t0r0id in https://github.com/deepset-ai/haystack/pull/3795
- refactor: Change PromptNode registered templates from per class to per instance by @vblagoje in https://github.com/deepset-ai/haystack/pull/3810
- bug: The PromptNode handles all parameters as lists without checking if they are in fact lists by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3820
- feat: update the docker image for haystack-api service by @bilgeyucel in https://github.com/deepset-ai/haystack/pull/3835
- refactor: Simplify PromptTemplate substitution in PromptNode by @vblagoje in https://github.com/deepset-ai/haystack/pull/3876
- feat: PromptNode - implement stop words by @vblagoje in https://github.com/deepset-ai/haystack/pull/3884
- feat: Add retry with exponential back-off to PromptNode's OpenAI models by @vblagoje in https://github.com/deepset-ai/haystack/pull/3886
- chore: Add timeouts to external requests calls by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3895
- feat: Add CsvTextConverter by @Benvii in https://github.com/deepset-ai/haystack/pull/3587
- refactor: Improve stop_words handling, add unit test cases by @vblagoje in https://github.com/deepset-ai/haystack/pull/3918
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict #3872
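One of the Pipeline changes above adds retry with exponential back-off to PromptNode's OpenAI models. The general technique is sketched below: after each failed attempt the wait time is multiplied by a constant factor until the retry budget runs out, at which point the last error is raised. This is a generic illustration. The function name, default values, and the injectable `sleep` hook (used here so the behavior can be tested without waiting) are hypothetical and not taken from Haystack.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_exponential_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying failures with geometrically growing delays.

    Hypothetical sketch: makes at most `max_retries + 1` attempts.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)  # wait before the next attempt
            delay *= backoff_factor  # e.g. 1s, 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In a real client, `func` would wrap the remote call (for example, a request to an LLM API), and `retry_on` would be narrowed to transient errors such as rate limits or timeouts, so that permanent errors fail immediately. Production implementations often also add random jitter to the delay so that many clients do not retry in lockstep.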
Models
- fix: adjust max token size for openai ADA-v2 embeddings by @LeoGitGuy in https://github.com/deepset-ai/haystack/pull/3793
- feat: make new sklearn models default in QueryClassifier by @julian-risch in https://github.com/deepset-ai/haystack/pull/3777
DocumentStores
- Fixing broken BM25 support with Weaviate - fixes #3720 by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3723
- feat: make score_script first class citizen via knn_engine param by @tstadel in https://github.com/deepset-ai/haystack/pull/3284
- bug: skip validating empty embeddings by @julian-risch in https://github.com/deepset-ai/haystack/pull/3774
- fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field by @tstadel in https://github.com/deepset-ai/haystack/pull/3662
- fix: upgrade launch_es() to the version used in CI by @ZanSara in https://github.com/deepset-ai/haystack/pull/3858
- Adding condition to pinecone object. by @AI-Ahmed in https://github.com/deepset-ai/haystack/pull/3768
- fix: Allowing InMemStore and FAISSDocStore for indexing using single worker by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/3868
- fix: authenticate with aws4auth if set in OpenSearchDocumentStore by @FabianHertwig in https://github.com/deepset-ai/haystack/pull/3741
- Fixing the query_batch method of the deepsetcloud document store - … by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3724
- feat: add HA support for Weaviate by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3764
UI / Demo
- refactor: remove haystack demo along with deprecated Dockerfiles by @masci in https://github.com/deepset-ai/haystack/pull/3829
Documentation
- docs: Add info where the feedback is stored by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3772
- bug: fix the docs rest api reference url by @bilgeyucel in https://github.com/deepset-ai/haystack/pull/3775
- Docs: Update FAISSDocStore load and save descriptions by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3808
- fix: Add missing docstrings to PromptNode, PromptTemplate and PromptModel by @vblagoje in https://github.com/deepset-ai/haystack/pull/3821
- docs: OpensearchDocumentStore docstring by @ZanSara in https://github.com/deepset-ai/haystack/pull/3904
Other Changes
- proposal: Create a dedicated Github repository for Haystack demos by @masci in https://github.com/deepset-ai/haystack/pull/3695
- fix: build pdftotext from sources by @masci in https://github.com/deepset-ai/haystack/pull/3746
- fix: Trigger pipeline schema update on tagged releases by @askainet in https://github.com/deepset-ai/haystack/pull/3752
- ci: Add newline when generating OpenAPI specs by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3782
- test: Improve robustness of PromptNode unit tests by @vblagoje in https://github.com/deepset-ai/haystack/pull/3747
- feat: utility function to explicitly invoke JSON schema generation by @masci in https://github.com/deepset-ai/haystack/pull/3798
- fix: prevent posthog from sending errors to stderr by @julian-risch in https://github.com/deepset-ai/haystack/pull/4008
New Contributors
- @FabianHertwig made their first contribution in https://github.com/deepset-ai/haystack/pull/3786
- @t0r0id made their first contribution in https://github.com/deepset-ai/haystack/pull/3795
- @LeoGitGuy made their first contribution in https://github.com/deepset-ai/haystack/pull/3793
- @Benvii made their first contribution in https://github.com/deepset-ai/haystack/pull/3638
Full Changelog: https://github.com/deepset-ai/haystack/compare/v1.12.2...v1.13.0rc1