v1.14.0
Release date: 2023-02-28 21:59:45
Latest deepset-ai/haystack release: v2.4.0 (2024-08-15 17:39:00)
⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
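To make the "train the index" step more concrete, here is a minimal pure-Python sketch of the general IVF (inverted file) idea: cluster the vectors into cells, then search only the cells closest to the query. It does not use Haystack or OpenSearch, and it is not how OpenSearchDocumentStore is implemented. ToyIVFIndex, train_centroids, n_cells, and n_probe are hypothetical names chosen for this illustration, and Product Quantization is left out entirely.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vector):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], vector))


def train_centroids(vectors, n_cells, n_iters=10, seed=0):
    """Plain k-means; this plays the role of the IVF training step."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_cells)]
    for _ in range(n_iters):
        cells = [[] for _ in range(n_cells)]
        for v in vectors:
            cells[nearest(centroids, v)].append(v)
        for i, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVFIndex:
    """Hypothetical toy index: one list of (doc_id, vector) per cell."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.cells = [[] for _ in centroids]

    def add(self, doc_id, vector):
        self.cells[nearest(self.centroids, vector)].append((doc_id, vector))

    def search(self, query, top_k=3, n_probe=1):
        # Rank cells by centroid distance and scan only the n_probe closest.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: squared_distance(self.centroids[i], query))
        candidates = [item for i in order[:n_probe] for item in self.cells[i]]
        candidates.sort(key=lambda item: squared_distance(item[1], query))
        return [doc_id for doc_id, _ in candidates[:top_k]]


# Three well-separated 2-D clusters with 20 vectors each (ids 0-19, 20-39, 40-59).
rng = random.Random(42)
vectors = [[rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)]
           for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]

centroids = train_centroids(vectors, n_cells=3)
index = ToyIVFIndex(centroids)
for doc_id, v in enumerate(vectors):
    index.add(doc_id, v)

query = [5.0, 5.0]
hits = index.search(query, top_k=3, n_probe=1)             # scans one cell only
exhaustive = index.search(query, top_k=3, n_probe=3)       # scans every cell
brute_force = sorted(range(len(vectors)),
                     key=lambda i: squared_distance(vectors[i], query))[:3]
```

Probing fewer cells trades recall for speed; with n_probe equal to the number of cells, the toy search degenerates to an exhaustive scan. A larger training sample generally gives centroids that partition the data better, which is presumably why a training size is configurable at all.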
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in https://github.com/deepset-ai/haystack/pull/3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in https://github.com/deepset-ai/haystack/pull/4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in https://github.com/deepset-ai/haystack/pull/4038
- build: cache nltk models into the docker image by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/3960
- feat: Add page range support to PDF converters. by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in https://github.com/deepset-ai/haystack/pull/4000
- feat: add Shaper by @ZanSara in https://github.com/deepset-ai/haystack/pull/3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in https://github.com/deepset-ai/haystack/pull/3885
- fix: make the crawler more robust on Windows by @anakin87 in https://github.com/deepset-ai/haystack/pull/4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in https://github.com/deepset-ai/haystack/pull/4026
- refactor: replace mutable default arguments by @julian-risch in https://github.com/deepset-ai/haystack/pull/4070
- feat: Support multiple RayPipelines by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/4078
- Remove double batching in retrieve_batch by @sjrl in https://github.com/deepset-ai/haystack/pull/4014
- style: Update black by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in https://github.com/deepset-ai/haystack/pull/4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in https://github.com/deepset-ai/haystack/pull/4114
- Docs: Fix code block formatting by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/4175
- feat: Add OpenAIError to retry mechanism by @sjrl in https://github.com/deepset-ai/haystack/pull/4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in https://github.com/deepset-ai/haystack/pull/3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in https://github.com/deepset-ai/haystack/pull/3980
- fix: Add inner query for mysql compatibility by @julian-risch in https://github.com/deepset-ai/haystack/pull/4068
- feat: add support for custom headers by @hsm207 in https://github.com/deepset-ai/haystack/pull/4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in https://github.com/deepset-ai/haystack/pull/4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in https://github.com/deepset-ai/haystack/pull/4113
- refactor: complete the document stores test refactoring by @masci in https://github.com/deepset-ai/haystack/pull/4125
- feat: include testing facilities into haystack package by @masci in https://github.com/deepset-ai/haystack/pull/4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3979
- Docs: Update docstrings by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/4119
- docs: Update Annotation Tool README.md by @bogdankostic in https://github.com/deepset-ai/haystack/pull/4123
- feat: Add model_kwargs option to PromptNode by @sjrl in https://github.com/deepset-ai/haystack/pull/4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in https://github.com/deepset-ai/haystack/pull/4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in https://github.com/deepset-ai/haystack/pull/4155
- feat: Implement run_batch for PromptNode by @sjrl in https://github.com/deepset-ai/haystack/pull/4072
Other Changes
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/3929
- fix: change model in distillation test by @ZanSara in https://github.com/deepset-ai/haystack/pull/3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in https://github.com/deepset-ai/haystack/pull/3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in https://github.com/deepset-ai/haystack/pull/3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in https://github.com/deepset-ai/haystack/pull/3930
- Docs: Update ImageToText docstrings by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3966
- ci: Add Docker images testing by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in https://github.com/deepset-ai/haystack/pull/3949
- ci: Fix docker image testing on release by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3976
- Fix: Fix quotation marks by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in https://github.com/deepset-ai/haystack/pull/3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/3977
- Missing import for TransformersImageToText by @ZanSara in https://github.com/deepset-ai/haystack/pull/3984
- test: CI on py3.8 by @ZanSara in https://github.com/deepset-ai/haystack/pull/3926
- Simplifies and fix docker images tests on release by @silvanocerza in https://github.com/deepset-ai/haystack/pull/3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in https://github.com/deepset-ai/haystack/pull/3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in https://github.com/deepset-ai/haystack/pull/3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in https://github.com/deepset-ai/haystack/pull/4008
- fix: extend schema for prompt node results by @tstadel in https://github.com/deepset-ai/haystack/pull/3891
- proposal: TableCell by @sjrl in https://github.com/deepset-ai/haystack/pull/3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in https://github.com/deepset-ai/haystack/pull/4016
- ci: Automate release on PyPi by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4015
- ci: Fix PyPi release workflow by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4029
- ci: Bump act10ns/slack from v1 to v2 by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4031
- ci: latest version of pylint is failing, ignore new errors by @masci in https://github.com/deepset-ai/haystack/pull/4035
- ci: Add linting of workflow and related pre-commit hook by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4032
- ci: Fix pylint version to prevent crash by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4043
- ci: Add missing env var in PyPi release slack notification by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4052
- fix: allow Biadaptive & Triadaptive to work with EarlyStopping by @jackapbutler in https://github.com/deepset-ai/haystack/pull/4033
- proposal: Add Agents for extended LLM support by @julian-risch in https://github.com/deepset-ai/haystack/pull/3925
- ci: Fix pylint workflow check running on tests files by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4076
- fix: Add PromptTemplate repr method by @vblagoje in https://github.com/deepset-ai/haystack/pull/4058
- ci: Change actionlint pre-commit hook to use Dockerized tool by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4060
- ci: Make tests run conditionally in CI by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4086
- feat: OpenAI - warn users if max_tokens is too short by @anakin87 in https://github.com/deepset-ai/haystack/pull/4094
- Docs: Add shaper to api docs by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/4083
- feat: Update allowed models to be used with Prompt Node by @sjrl in https://github.com/deepset-ai/haystack/pull/4018
- ci: Add missing env vars in rest_api CI tests by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4098
- ci: Fix pylint CI check running with no files by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4097
- Proposal: Add a JsonConverter node by @bglearning in https://github.com/deepset-ai/haystack/pull/3959
- fix: query filters in REST API by @oryx1729 in https://github.com/deepset-ai/haystack/pull/4105
- fix: fix torchaudio version by @mayankjobanputra in https://github.com/deepset-ai/haystack/pull/4102
- ci: Exclude .github folder from triggering tests by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4120
- ci: Add workflow to label PRs that edit docstrings by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4115
- Update PromptTemplate unit tests by @vblagoje in https://github.com/deepset-ai/haystack/pull/4131
- ci: Add load arg to docker/bake-action before testing Docker images by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4124
- Revert changes introduced in PR #4124 by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4137
- ci: Fix Docker images test on release by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4153
- ci: Update docstring-labeler.yml workflow to safely run in PRs from forks by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4146
- Docs: Add filter to hide entity post processor by @agnieszka-m in https://github.com/deepset-ai/haystack/pull/4160
- ci: Use larger runner for Docker release workflow by @silvanocerza in https://github.com/deepset-ai/haystack/pull/4185
- fix: make all OpenAI API params in PromptNode and PromptModel controllable via model_kwargs by @tstadel in https://github.com/deepset-ai/haystack/pull/4183
New Contributors
- @jackapbutler made their first contribution in https://github.com/deepset-ai/haystack/pull/4033
Full Changelog: https://github.com/deepset-ai/haystack/compare/v1.13.2...v1.14.0