0.12.5
Release date: 2024-02-27 06:37:57
Latest release of Unstructured-IO/unstructured: 0.13.7 (2024-05-09 01:28:21)
Features
- **Header and footer detection for fast strategy** `partition_pdf` with the `fast` strategy now detects elements that are in the top or bottom 5 percent of the page as headers and footers.
- **Add parent_element to overlapping case output** Adds `parent_element` to the output of the `identify_overlapping_or_nesting_case` and `catch_overlapping_and_nested_bboxes` functions.
- **Add table structure evaluation** Adds a new function that evaluates the structure of a table and returns a metric representing its quality. The function is used to evaluate the quality of both the table structure and the table contents.
- **Add AstraDB destination connector** Adds support for writing embedded documents into an AstraDB vector database.
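
The header and footer rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the `Box` class, the `classify_by_margin` function, and the choice to require the whole element to sit inside the margin band are assumptions made for illustration. Only the 5 percent threshold comes from the release note.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical element box; y is measured downward from the top of the page."""
    text: str
    y_top: float
    y_bottom: float


def classify_by_margin(box: Box, page_height: float, margin: float = 0.05) -> str:
    """Label an element by whether it lies in the top or bottom margin band.

    Assumption: the element must lie entirely inside the band to qualify.
    """
    if box.y_bottom <= page_height * margin:
        return "Header"
    if box.y_top >= page_height * (1 - margin):
        return "Footer"
    return "Body"


page_height = 1000.0
boxes = [
    Box("Annual Report", 10, 40),      # entirely within the top 5 percent
    Box("Main paragraph", 300, 420),   # middle of the page
    Box("Page 3", 960, 990),           # entirely within the bottom 5 percent
]
labels = [classify_by_margin(b, page_height) for b in boxes]
print(labels)
```

A positional rule like this is cheap, which suits the `fast` strategy, but it can mislabel body text that happens to start very close to a page edge.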
Fixes
- **Fix passing list type parameters when calling unstructured API via `partition_via_api()`** Updates `partition_via_api()` to convert all list type parameters to JSON-formatted strings before calling the unstructured client SDK. This supports image block extraction via `partition_via_api()`.
- **Add OctoAI embedder** Adds support for embeddings via OctoAI.
- **Fix `check_connection` in opensearch, databricks, postgres, azure connectors**
- **Fix: don't treat plain text files with double quotes as JSON** If a file can be deserialized as JSON but deserializes to a string, treat it as plain text even though it is valid JSON.
- **Fix cluster of bugs in `partition_xlsx()` that dropped content** The algorithm for detecting "subtables" within a worksheet dropped table elements for certain patterns of populated cells, such as when a trailing single-cell row appeared in a contiguous block of populated cells.
- **Improved documentation** Fixed broken links and improved readability on the Key Concepts page.
- **Rename `OpenAiEmbeddingConfig` to `OpenAIEmbeddingConfig`**
- **Fix `partition_json()` doesn't chunk** The `@add_chunking_strategy` decorator was missing from `partition_json()`, so pre-partitioned documents serialized to JSON were not chunked when a chunking strategy was specified.
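
Two of the fixes above describe logic that is easy to sketch with the standard library. The helper names `looks_like_json_document` and `serialize_list_params` are hypothetical and do not come from the unstructured codebase, and the parameter names in the usage lines are only illustrative. The sketch shows one way the described behavior could work, not how the library implements it.

```python
import json


def looks_like_json_document(text: str) -> bool:
    """Return True only if text parses as JSON and the result is not a bare string.

    A file containing only "hello" (with the quotes) is valid JSON, but it
    deserializes to a str, so it is treated as plain text here.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return not isinstance(parsed, str)


def serialize_list_params(params: dict) -> dict:
    """Convert list-valued parameters to JSON-formatted strings; leave others unchanged."""
    return {
        key: json.dumps(value) if isinstance(value, list) else value
        for key, value in params.items()
    }


quoted_text = looks_like_json_document('"hello"')
element_list = looks_like_json_document('[{"type": "Title"}]')
free_text = looks_like_json_document("just some words")

# Illustrative request parameters; the names are assumptions.
request_params = serialize_list_params(
    {"extract_image_block_types": ["Image", "Table"], "strategy": "hi_res"}
)
print(quoted_text, element_list, free_text)
print(request_params)
```

Checking the type of the parsed value, rather than only whether parsing succeeds, is what separates a quoted plain-text file from a real JSON document.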