v2.0.0-beta.4
Release date: 2024-01-08 19:30:38
Latest released version of deepset-ai/haystack: v2.4.0 (2024-08-15 17:39:00)
Release Notes
v2.0.0-beta.4
⬆️ Upgrade Notes
- If you have a LocalWhisperTranscriber in a pipeline, change the audio_files input name to sources. Similarly for standalone invocation of the component, pass sources instead of audio_files to the run() method.
🚀 New Features
- Add HuggingFace TEI Embedders: HuggingFaceTEITextEmbedder and HuggingFaceTEIDocumentEmbedder.

  An example using HuggingFaceTEITextEmbedder to embed a string:

  ```python
  from haystack.components.embedders import HuggingFaceTEITextEmbedder

  text_to_embed = "I love pizza!"

  text_embedder = HuggingFaceTEITextEmbedder(
      model="BAAI/bge-small-en-v1.5",
      url="<your-tei-endpoint-url>",
      token="<your-token>",
  )

  print(text_embedder.run(text_to_embed))
  # {'embedding': [0.017020374536514282, -0.023255806416273117, ...]
  ```

  An example using HuggingFaceTEIDocumentEmbedder to create Document embeddings:

  ```python
  from haystack.dataclasses import Document
  from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder

  doc = Document(content="I love pizza!")

  document_embedder = HuggingFaceTEIDocumentEmbedder(
      model="BAAI/bge-small-en-v1.5",
      url="<your-tei-endpoint-url>",
      token="<your-token>",
  )

  result = document_embedder.run([doc])
  print(result["documents"][0].embedding)
  # [0.017020374536514282, -0.023255806416273117, ...]
  ```
- Adds AzureOpenAIDocumentEmbedder and AzureOpenAITextEmbedder as new embedders. These embedders are very similar to their OpenAI counterparts, but they use the Azure API instead of the OpenAI API.
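  A minimal sketch of using AzureOpenAITextEmbedder; the endpoint, deployment, and API key values are placeholders, and the parameter names are assumed to mirror the OpenAI embedders:

  ```python
  from haystack.components.embedders import AzureOpenAITextEmbedder

  # Placeholder Azure resource details -- replace with your own deployment.
  text_embedder = AzureOpenAITextEmbedder(
      azure_endpoint="https://<your-resource>.openai.azure.com",
      azure_deployment="text-embedding-ada-002",
      api_key="<your-azure-openai-key>",
  )

  result = text_embedder.run("I love pizza!")
  print(result["embedding"][:3])
  ```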
- Adds support for Azure OpenAI models with the AzureOpenAIGenerator and AzureOpenAIChatGenerator components.
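  A quick sketch of AzureOpenAIGenerator, assuming an init signature analogous to the OpenAI generator; the endpoint, deployment, and key are placeholders:

  ```python
  from haystack.components.generators import AzureOpenAIGenerator

  generator = AzureOpenAIGenerator(
      azure_endpoint="https://<your-resource>.openai.azure.com",
      azure_deployment="gpt-35-turbo",
      api_key="<your-azure-openai-key>",
  )

  result = generator.run(prompt="What is the capital of France?")
  print(result["replies"][0])
  ```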
- Adds RAG OpenAPI services integration.
- Introduces answer deduplication on the Document level based on an overlap threshold.
- Add Multiplexer. For an example of its usage, see https://github.com/deepset-ai/haystack/pull/6420.
- Adds support for single metadata dictionary input in TextFileToDocument.
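  For instance, a single dictionary can now be applied to every converted file; a sketch, with a placeholder file path:

  ```python
  from haystack.components.converters import TextFileToDocument

  converter = TextFileToDocument()
  # One metadata dictionary is applied to every resulting Document.
  result = converter.run(sources=["my_file.txt"], meta={"language": "en"})
  print(result["documents"][0].meta)
  ```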
⚡️ Enhancement Notes
- Add support for ByteStream inputs to LocalWhisperTranscriber and align its input socket names with the other components in Haystack.
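  A sketch of passing a ByteStream through the renamed sources socket; the audio file path is a placeholder:

  ```python
  from pathlib import Path

  from haystack.dataclasses import ByteStream
  from haystack.components.audio import LocalWhisperTranscriber

  transcriber = LocalWhisperTranscriber()  # loads a local Whisper model
  transcriber.warm_up()

  # sources accepts file paths as well as in-memory ByteStream objects.
  audio = ByteStream.from_file_path(Path("meeting_recording.mp3"))
  result = transcriber.run(sources=[audio])
  print(result["documents"][0].content)
  ```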
- Rename metadata to meta. Rename metadata_fields_to_embed to meta_fields_to_embed in all Embedders. Rename metadata_field to meta_field in MetaFieldRanker.
- Rename all metadata references to meta.
- Change DocumentWriter's default policy from DuplicatePolicy.FAIL to DuplicatePolicy.NONE. The DocumentStore protocol uses the same default so that each Document Store can choose the policy that fits it best.
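  The previous failing behaviour (or any other policy) can still be set explicitly; a sketch, assuming the in-memory store and the DuplicatePolicy import path shown here:

  ```python
  from haystack.document_stores.in_memory import InMemoryDocumentStore
  from haystack.document_stores.types import DuplicatePolicy
  from haystack.components.writers import DocumentWriter

  writer = DocumentWriter(
      document_store=InMemoryDocumentStore(),
      policy=DuplicatePolicy.FAIL,  # override the new NONE default
  )
  ```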
- Move serialize_type and deserialize_type into the utils module.
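  A small sketch of the round trip, assuming both helpers are re-exported from haystack.utils:

  ```python
  from typing import List

  from haystack.utils import serialize_type, deserialize_type

  type_string = serialize_type(List[int])   # e.g. "typing.List[int]"
  restored = deserialize_type(type_string)  # back to typing.List[int]
  ```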
- The HTMLToDocument converter now allows choosing which boilerpy3 extractor is used to extract the content from the HTML document. The default extractor has been changed to DefaultExtractor, which suits generic use cases better than the previous default (ArticleExtractor).
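  A sketch of selecting a specific boilerpy3 extractor, assuming the option is exposed as extractor_type and takes the boilerpy3 extractor class name:

  ```python
  from haystack.components.converters import HTMLToDocument

  # Fall back to the previous behaviour by requesting the ArticleExtractor.
  converter = HTMLToDocument(extractor_type="ArticleExtractor")
  result = converter.run(sources=["page.html"])
  ```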
- Adds scale_score, which lets users choose whether document scores are returned as raw logits or scaled between 0 and 1 (using the sigmoid function); this feature already existed in Haystack v1 and is being carried over. Adds calibration_factor, following the ExtractiveReader, so users can better control the spread of scores when scaling them with the sigmoid. Adds score_threshold, also carried over from the ExtractiveReader, which optionally lets users set a threshold so that only documents scoring above it are returned.
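  The note does not name the component; assuming these parameters land on TransformersSimilarityRanker, usage might look like this sketch:

  ```python
  from haystack.components.rankers import TransformersSimilarityRanker

  ranker = TransformersSimilarityRanker(
      scale_score=True,        # sigmoid-scale raw logits into [0, 1]
      calibration_factor=1.0,  # controls the spread of the scaled scores
      score_threshold=0.5,     # drop documents scoring below the threshold
  )
  ranker.warm_up()
  ```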
- Add a RAG self-correction loop example.
- Adds support for single metadata dictionary input in HTMLToDocument.
- Adds support for single metadata dictionary input in PyPDFToDocument.
- Split DynamicPromptBuilder into DynamicPromptBuilder and DynamicChatPromptBuilder
- Depend on our own rank_bm25 fork.
- Add meta_fields_to_embed following the implementation in SentenceTransformersDocumentEmbedder to be able to embed meta fields along with the content of the document.
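  As in SentenceTransformersDocumentEmbedder, the selected meta fields are embedded together with the document content; a sketch using that embedder:

  ```python
  from haystack.dataclasses import Document
  from haystack.components.embedders import SentenceTransformersDocumentEmbedder

  doc = Document(content="I love pizza!", meta={"title": "Food review"})

  embedder = SentenceTransformersDocumentEmbedder(meta_fields_to_embed=["title"])
  embedder.warm_up()
  result = embedder.run([doc])  # "Food review" is embedded along with the content
  ```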
- Add a new model_kwargs parameter to the TransformersSimilarityRanker so that different loading options supported by Hugging Face can be passed through. Add device availability checking when the user passes None to the device init parameter; the selection order is GPU, then MPS, then CPU.
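  A sketch of passing Hugging Face loading options through the new parameter; torch and the float16 dtype here are only illustrative:

  ```python
  import torch

  from haystack.components.rankers import TransformersSimilarityRanker

  ranker = TransformersSimilarityRanker(
      device=None,  # let the component pick GPU, then MPS, then CPU
      model_kwargs={"torch_dtype": torch.float16},
  )
  ranker.warm_up()
  ```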
- Update OpenAIChatGenerator to handle both tool and function calling. OpenAIChatGenerator now supports both the tools and functions generation_kwargs parameters, which enable tool/function invocation.
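  A sketch of passing an OpenAI-style tools definition through generation_kwargs; the weather tool below is purely illustrative:

  ```python
  from haystack.dataclasses import ChatMessage
  from haystack.components.generators.chat import OpenAIChatGenerator

  tools = [
      {
          "type": "function",
          "function": {
              "name": "get_current_weather",
              "description": "Get the current weather for a city",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }
  ]

  generator = OpenAIChatGenerator()  # uses the default OpenAI chat model; needs OPENAI_API_KEY
  result = generator.run(
      messages=[ChatMessage.from_user("What's the weather in Berlin?")],
      generation_kwargs={"tools": tools},
  )
  print(result["replies"][0])  # the reply carries the tool call chosen by the model
  ```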
- Upgrade to OpenAI client version 1.x
⚠️ Deprecation Notes
- Deprecate GPTGenerator and GPTChatGenerator. Replace them with OpenAIGenerator and OpenAIChatGenerator.
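  Migration is a rename of the imports; a sketch, assuming the generators live in their usual modules:

  ```python
  # Before (deprecated)
  # from haystack.components.generators import GPTGenerator
  # from haystack.components.generators.chat import GPTChatGenerator

  # After
  from haystack.components.generators import OpenAIGenerator
  from haystack.components.generators.chat import OpenAIChatGenerator
  ```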
🐛 Bug Fixes
- Fix Pipeline.connect() so that it connects sockets with the same name when multiple sockets with compatible types are found.