0.12.3
Release date: 2024-01-29 22:41:50
Latest release of Unstructured-IO/unstructured: 0.15.12 (2024-09-13 22:39:58)
Enhancements
- Driver for MongoDB connector. Adds a driver with unstructured version information to the MongoDB connector.
Features
- Add Databricks Volumes destination connector. Databricks Volumes connector added to ingest CLI. Users may now use unstructured-ingest to write partitioned data to a Databricks Volumes storage service.
Fixes
- Fix support for different Chipper versions and prevent running PDFMiner with Chipper
- Treat YAML files as text. Adds YAML MIME types to the file detection code and treats those files as text.
- Fix FSSpec destination connectors check_connection. FSSpec destination connectors did not use check_connection. There was an error when trying to ls the destination directory, since it may not exist at the moment of connector creation. Now check_connection calls ls on the bucket root, and this method is called on initialize of the destination connector.
- Fix databricks-volumes extra location. setup.py was pointing to the wrong location for the databricks-volumes extra requirements, which resulted in errors when trying to build the wheel for unstructured. This change points it to the correct path.
- Fix uploading None values to Chroma and Pinecone. Removes keys with None values for the Pinecone and Chroma destinations. Pins the Pinecone dependency.
- Update documentation. (i) Documents the best practice for table extraction: use the 'skip_infer_table_types' param instead of 'pdf_infer_table_structure'. (ii) Fixes CSS and RST issues and a typo in the documentation.
- Fix postgres storage of link_texts. Formatting of link_texts was breaking metadata storage.
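
The Chroma and Pinecone fix above removes keys with None values before upload. A minimal sketch of that idea follows. It is not the library's actual implementation: the helper name drop_none_values and the sample record are hypothetical, and handling nested dictionaries recursively is an assumption made here for illustration.

```python
from typing import Any


def drop_none_values(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `metadata` without keys whose value is None.

    Hypothetical helper for illustration only; nested dicts are cleaned
    recursively, which is an assumption rather than documented behavior.
    """
    cleaned: dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            # Skip the key entirely instead of uploading a None value.
            continue
        if isinstance(value, dict):
            value = drop_none_values(value)
        cleaned[key] = value
    return cleaned


# Made-up sample record, not real connector output.
record = {
    "filename": "report.pdf",
    "page_number": None,
    "coordinates": {"system": None, "points": [1, 2]},
}
cleaned_record = drop_none_values(record)
```

The function builds a new dictionary rather than mutating its input, so the original element metadata stays available to other destinations that may accept None.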