v2.0.0-beta.6
Release date: 2024-02-05 23:51:04
Latest released version of deepset-ai/haystack: v2.4.0 (2024-08-15 17:39:00)
Release Notes
v2.0.0-beta.6
⬆️ Upgrade Notes
- Upgraded the default converter in `PyPDFToDocument` to insert page breaks (`"\f"`) between each extracted page. This allows downstream components and applications to keep track of the original PDF page a portion of text comes from (see the page-break sketch after this list).
- ⚠️ Breaking change: Update secret handling for components using the `Secret` type. The following components are affected: `RemoteWhisperTranscriber`, `AzureOCRDocumentConverter`, `AzureOpenAIDocumentEmbedder`, `AzureOpenAITextEmbedder`, `HuggingFaceTEIDocumentEmbedder`, `HuggingFaceTEITextEmbedder`, `OpenAIDocumentEmbedder`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder`, `AzureOpenAIGenerator`, `AzureOpenAIChatGenerator`, `HuggingFaceLocalChatGenerator`, `HuggingFaceTGIChatGenerator`, `OpenAIChatGenerator`, `HuggingFaceLocalGenerator`, `HuggingFaceTGIGenerator`, `OpenAIGenerator`, `TransformersSimilarityRanker`, `SearchApiWebSearch`, `SerperDevWebSearch`.
  The default init parameters `api_key`, `token`, and `azure_ad_token` have been adjusted to use environment variables wherever possible. The `azure_ad_token_provider` parameter has been removed from Azure-based components. Components based on Hugging Face now require either a token or an environment variable if authentication is needed; the on-disk local token file is no longer supported.
  Required actions: check the expected environment variable name for the `api_key` of the affected component you are using and provide your API key via that environment variable. Alternatively, if that's not an option, use the `Secret.from_token` function to wrap any bare string API token. Note that pipelines using token secrets cannot be serialized/deserialized. A minimal migration sketch follows after this list.
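A small sketch (plain Python, not a dedicated Haystack API) of how downstream code could use the inserted page breaks to recover per-page text:

```python
from haystack.dataclasses import Document

# Pretend this Document came out of PyPDFToDocument; its content now contains "\f"
# between the text of consecutive pages.
doc = Document(content="Text of page one.\fText of page two.")

for page_number, page_text in enumerate(doc.content.split("\f"), start=1):
    print(f"page {page_number}: {page_text}")
```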
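A minimal migration sketch for the secret-handling change, using `OpenAIGenerator` as a stand-in for any of the affected components (the expected environment variable name differs per component):

```python
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

# Option 1 (preferred): export the environment variable the component expects,
# e.g. OPENAI_API_KEY for the OpenAI components, and rely on the new defaults.
generator = OpenAIGenerator()  # api_key is resolved from OPENAI_API_KEY at runtime

# Option 2: if an environment variable is not an option, wrap the bare token explicitly.
# Pipelines holding token-based secrets cannot be serialized/deserialized.
generator = OpenAIGenerator(api_key=Secret.from_token("sk-..."))
```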
🚀 New Features
- Expose a `Secret` type to provide a consistent API for any component that requires secrets for authentication. It currently supports string tokens and environment variables. Token-based secrets are automatically prevented from being serialized to disk (to prevent accidental leakage of secrets).

  ```python
  from haystack.utils import Secret


  @component
  class MyComponent:
      def __init__(self, api_key: Optional[Secret] = None, **kwargs):
          self.api_key = api_key
          self.backend = None

      def warm_up(self):
          # Call resolve_value to yield a single result. The semantics of the result is policy-dependent.
          # Currently, all supported policies will return a single string token.
          self.backend = SomeBackend(api_key=self.api_key.resolve_value() if self.api_key else None, ...)

      def to_dict(self):
          # Serialize the policy like any other (custom) data. If the policy is token-based, it will
          # raise an error.
          return default_to_dict(self, api_key=self.api_key.to_dict() if self.api_key else None, ...)

      @classmethod
      def from_dict(cls, data):
          # Deserialize the policy data before passing it to the generic from_dict function.
          api_key_data = data["init_parameters"]["api_key"]
          api_key = Secret.from_dict(api_key_data) if api_key_data is not None else None
          data["init_parameters"]["api_key"] = api_key
          return default_from_dict(cls, data)


  # No authentication.
  component = MyComponent(api_key=None)

  # Token-based authentication.
  component = MyComponent(api_key=Secret.from_token("sk-randomAPIkeyasdsa32ekasd32e"))
  component.to_dict()  # Error! Can't serialize authentication tokens.

  # Environment-variable-based authentication.
  component = MyComponent(api_key=Secret.from_env("OPENAI_API_KEY"))
  component.to_dict()  # This is fine.
  ```
- Adds support for the Exact Match metric to `EvaluationResult.calculate_metrics(...)`:

  ```python
  from haystack.evaluation.metrics import Metric

  exact_match_metric = eval_result.calculate_metrics(Metric.EM, output_key="answers")
  ```
- Adds support for the F1 metric to `EvaluationResult.calculate_metrics(...)`:

  ```python
  from haystack.evaluation.metrics import Metric

  f1_metric = eval_result.calculate_metrics(Metric.F1, output_key="answers")
  ```
- Adds support for the Semantic Answer Similarity (SAS) metric to `EvaluationResult.calculate_metrics(...)`:

  ```python
  from haystack.evaluation.metrics import Metric

  sas_metric = eval_result.calculate_metrics(
      Metric.SAS, output_key="answers", model="sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
  )
  ```
- Introducing the `HuggingFaceLocalChatGenerator`, a new chat-based generator designed for leveraging chat models from Hugging Face's (HF) model hub. Users can now perform inference with chat-based models in a local runtime, utilizing familiar HF generation parameters, stop words, and even custom chat templates for custom message formatting. This component also supports streaming responses and is optimized for compatibility with a variety of devices. Here is an example of how to use the `HuggingFaceLocalChatGenerator`:

  ```python
  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
  from haystack.dataclasses import ChatMessage

  generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
  generator.warm_up()
  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
  print(generator.run(messages))
  ```
⚡️ Enhancement Notes
- Change `Pipeline.add_component()` to fail if the `Component` instance has already been added to another `Pipeline`.
- Use `device_map` when loading a `TransformersSimilarityRanker` and `ExtractiveReader`. This allows for multi-device inference and for loading quantized models (e.g. `load_in_8bit=True`); see the ranker sketch after this list.
- Add a meta parameter to `ByteStream.from_file_path()` and `ByteStream.from_string()` (see the sketch after this list).
- Add query and document prefix options for the `TransformersSimilarityRanker` (see the ranker sketch after this list).
- The default in `default_streaming_callback` was confusing: this function was the go-to helper for quickly printing generated tokens as they arrive, yet it was not used by default. The function has therefore been renamed to `print_streaming_chunk` (usage sketch after this list).
- Speed up import of the `Document` dataclass. Importing `Document` was slow because the whole `pandas` and `numpy` packages were imported; only the necessary classes and functions are now imported.
- Introduces weighted score normalization for the `DocumentJoiner`'s reciprocal rank fusion, enhancing the relevance of document sorting by allowing customizable influence on the final scores (see the joiner sketch after this list).
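A hedged sketch combining the two `TransformersSimilarityRanker` enhancements above. The exact parameter names (`model_kwargs`, `query_prefix`, `document_prefix`) and the idea of forwarding `device_map` through `model_kwargs` are assumptions based on these notes, not verbatim API documentation:

```python
from haystack.components.rankers import TransformersSimilarityRanker

# Assumption: device_map is forwarded to Transformers' from_pretrained via model_kwargs,
# enabling multi-device inference or quantized loading (e.g. load_in_8bit=True).
ranker = TransformersSimilarityRanker(
    model="BAAI/bge-reranker-base",       # illustrative reranker model
    model_kwargs={"device_map": "auto"},
    query_prefix="query: ",               # assumed names for the new prefix options
    document_prefix="passage: ",
)
ranker.warm_up()
```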
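A small sketch of the new `meta` parameter on `ByteStream` constructors (the metadata keys are illustrative):

```python
from haystack.dataclasses import ByteStream

# Attach metadata at construction time instead of setting it afterwards.
stream = ByteStream.from_string("hello world", meta={"source": "greeting.txt"})
print(stream.meta)
```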
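A short usage sketch for the renamed streaming helper, assuming it is importable from `haystack.components.generators.utils`:

```python
from haystack.components.generators import OpenAIGenerator
from haystack.components.generators.utils import print_streaming_chunk  # assumed import path

# The helper prints each generated chunk as it arrives; pass it explicitly,
# since it is not wired in by default.
generator = OpenAIGenerator(streaming_callback=print_streaming_chunk)
```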
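A hedged sketch of weighted reciprocal rank fusion in `DocumentJoiner`; the `weights` values are illustrative, and the import path shown is the one used in current releases (in this beta the component may still live under `haystack.components.routers`):

```python
from haystack.components.joiners import DocumentJoiner  # may be haystack.components.routers in this beta

# Weight the joined branches so that, e.g., a BM25 retriever influences the fused
# reciprocal-rank-fusion scores more than an embedding retriever.
joiner = DocumentJoiner(
    join_mode="reciprocal_rank_fusion",
    weights=[0.7, 0.3],
)
```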
🐛 Bug Fixes
- Fix auto-complete never working for any `Component`.
- Fix Haystack imports failing when using a local development environment that doesn't have `haystack-ai` installed.
- Remove all mentions of Canals by renaming some variables: `__canals_input__` and `__canals_output__` have been renamed to `__haystack_input__` and `__haystack_output__`, respectively. `CANALS_VARIADIC_ANNOTATION` has been renamed to `HAYSTACK_VARIADIC_ANNOTATION` and its value changed from `__canals__variadic_t` to `__haystack__variadic_t`. The default Pipeline `debug_path` has been changed from `.canals_debug` to `.haystack_debug`.