0.12.4
Release date: 2024-02-09 06:17:40
Latest release of Unstructured-IO/unstructured: 0.13.7 (2024-05-09 01:28:21)
Enhancements
- Apply new version of `black` formatting. The `black` library recently introduced a new major version that introduces new formatting conventions. This change brings code in the `unstructured` repo into compliance with the new conventions.
- Move ingest imports to local scopes. Ingest dependencies are now imported in local scopes, so ingest connector classes can be imported without installing their external dependencies. This allows lightweight use of the classes; using instances as intended still requires the dependencies.
- Add support for `.p7s` files. `partition_email` can now process `.p7s` files. The signature for the signed message is extracted and added to metadata.
- Fall back to valid content types for emails. If the user-selected content type does not exist on the email message, `partition_email` now falls back to another valid content type if one is available.
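The content-type fallback described above can be illustrated with a minimal sketch built only on Python's standard `email` module. It is not the code used by `partition_email`; the function name `select_content` and the `VALID_CONTENT_TYPES` tuple are hypothetical and exist only for this example.

```python
from __future__ import annotations

from email import message_from_string
from email.message import Message

# Assumption for this sketch: the content types considered "valid".
VALID_CONTENT_TYPES = ("text/html", "text/plain")


def select_content(msg: Message, preferred: str = "text/html") -> tuple[str, str]:
    """Return (content_type, text), falling back if `preferred` is absent."""
    # Record the first payload seen for each valid content type.
    found: dict[str, str] = {}
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in VALID_CONTENT_TYPES and ctype not in found:
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            found[ctype] = payload.decode(charset, errors="replace")
    # Use the requested type when present, otherwise any other valid type.
    if preferred in found:
        return preferred, found[preferred]
    for ctype in VALID_CONTENT_TYPES:
        if ctype in found:
            return ctype, found[ctype]
    raise ValueError("no supported content type in message")


# A plain-text-only message: asking for text/html falls back to text/plain.
RAW = "From: a@example.com\nSubject: hi\nContent-Type: text/plain\n\nHello there\n"
ctype, body = select_content(message_from_string(RAW), preferred="text/html")
print(ctype, body.strip())
```

The sketch raises only when no valid content type exists at all, which matches the "if one is available" wording of the note.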
Features
- Add .heic file partitioning. .heic image files were previously unsupported and are now supported through partition_image().
- Add the ability to specify an alternate OCR implementation by implementing an `OCRAgent` interface and specifying it with the `OCR_AGENT` environment variable.
- Add Vectara destination connector. Adds support for writing partitioned documents into a Vectara index.
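As a rough illustration of the pattern behind the `OCR_AGENT` setting, the sketch below selects an implementation of an interface from an environment variable. Everything except the `OCR_AGENT` variable name is an assumption: the `OCRAgent` class here is a stand-in written for this example, and the method name `image_to_text`, the `AGENTS` registry, and the value `"echo"` are hypothetical. The values that `unstructured` actually accepts are not stated in these notes.

```python
import os
from abc import ABC, abstractmethod


class OCRAgent(ABC):
    """Hypothetical stand-in for an OCR agent interface."""

    @abstractmethod
    def image_to_text(self, image_bytes: bytes, language: str = "eng") -> str:
        """Return the text recognized in the image."""


class EchoOCRAgent(OCRAgent):
    """Toy agent that reports what it was given instead of running OCR."""

    def image_to_text(self, image_bytes: bytes, language: str = "eng") -> str:
        return f"<{len(image_bytes)} bytes, lang={language}>"


class NullOCRAgent(OCRAgent):
    """Toy agent that never recognizes any text."""

    def image_to_text(self, image_bytes: bytes, language: str = "eng") -> str:
        return ""


# Hypothetical registry mapping environment values to implementations.
AGENTS = {"echo": EchoOCRAgent, "null": NullOCRAgent}


def get_ocr_agent(default: str = "echo") -> OCRAgent:
    """Instantiate the agent named by the OCR_AGENT environment variable."""
    name = os.environ.get("OCR_AGENT", default)
    try:
        return AGENTS[name]()
    except KeyError:
        raise ValueError(f"unknown OCR agent: {name!r}") from None


os.environ["OCR_AGENT"] = "null"
print(type(get_ocr_agent()).__name__)
```

Keeping the selection in one function means callers depend only on the interface, so swapping OCR engines needs no code change beyond the environment.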
Fixes
- Fix `partition_pdf()` not working when using the chipper model with `file`.
- Handle common incorrect arguments for `languages` and `ocr_languages`. Users regularly receive errors on the API because they define `ocr_languages` or `languages` with additional quotation marks, brackets, and similar mistakes. This update handles common incorrect arguments and raises an appropriate warning.
- Default `hi_res_model_name` now relies on `unstructured-inference`. When no explicit `hi_res_model_name` is passed into `partition` or `partition_pdf_or_image`, the default model is picked by `unstructured-inference`'s settings or the OS environment variable `UNSTRUCTURED_HI_RES_MODEL_NAME`. It now returns the same model name regardless of the value of `infer_table_structure`. This function will be deprecated in the future; the default model name will then rely solely on `unstructured-inference` and will not consider the OS environment variable.
- Fix remove Vectara requirements from setup.py - there are no dependencies
- Add missing dependency files to package manifest. Updates the file path for the ingest dependencies and adds missing extra dependencies.
- Add title to Vectara upload - was not separated out from initial connector
- Fix change OpenSearch port to avoid a potential conflict with Elasticsearch in the ingest test
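The kind of cleanup described in the `languages` / `ocr_languages` fix can be sketched as follows. This is an illustrative guess at the idea, not the library's implementation: `normalize_languages` is a hypothetical helper, and the set of mistakes it tolerates (stray quotes, brackets, and `+`, comma, or whitespace separators) is an assumption.

```python
from __future__ import annotations

import re
import warnings


def normalize_languages(value: str | list[str]) -> list[str]:
    """Clean language codes passed with stray quotes, brackets, or separators."""
    original = [value] if isinstance(value, str) else list(value)
    raw = ",".join(str(item) for item in original)
    # Split on commas, plus signs, and whitespace, then strip quotes and brackets.
    tokens = re.split(r"[,+\s]+", raw)
    cleaned = [token.strip("'\"[](){}") for token in tokens]
    cleaned = [token for token in cleaned if token]
    # Warn only when the input actually needed fixing.
    if cleaned != original:
        warnings.warn(
            f"languages argument {value!r} was normalized to {cleaned!r}",
            UserWarning,
            stacklevel=2,
        )
    return cleaned


print(normalize_languages('["eng", "deu"]'))  # → ['eng', 'deu']
print(normalize_languages(["eng+deu"]))       # → ['eng', 'deu']
print(normalize_languages(["eng"]))           # → ['eng']
```

Well-formed input such as `["eng"]` passes through without a warning, so only callers who sent a malformed value see the notice.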