0.7.6
Release date: 2023-06-16 23:09:44
Latest release of Unstructured-IO/unstructured: 0.15.12 (2024-09-13 22:39:58)
Enhancements
- Convert fast strategy to ocr_only for images
- Adds support for page numbers in `.docx` and `.doc` when user- or renderer-created page breaks are present.
- Adds retry logic for the unstructured-ingest Biomed connector
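
The retry logic mentioned for the Biomed connector is not described further in these notes. As a rough illustration of the general technique only, the sketch below retries a failing call with exponential backoff. The name `call_with_retry` and its parameters are hypothetical and are not part of unstructured-ingest.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; not the connector's actual code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. A production connector would likely catch only network-related exceptions rather than every `Exception`.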
Features
- Provides users with the ability to extract additional metadata via regex.
- Updates `partition_docx` to include headers and footers in the output.
- Create `partition_tsv` and associated tests. Make additional changes to `detect_filetype`.
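
To make the regex metadata feature more concrete, here is a minimal sketch of the idea using only the standard library. It is not the library's implementation: the function name `extract_regex_metadata` and the shape of the returned dictionary are assumptions chosen for illustration.

```python
import re
from typing import Dict, List


def extract_regex_metadata(text: str, patterns: Dict[str, str]) -> Dict[str, List[dict]]:
    """Return every match of each named pattern, with its character offsets.

    Hypothetical helper: `patterns` maps a metadata field name to a regex.
    """
    results: Dict[str, List[dict]] = {}
    for field, pattern in patterns.items():
        results[field] = [
            {"text": match.group(), "start": match.start(), "end": match.end()}
            for match in re.finditer(pattern, text)
        ]
    return results


metadata = extract_regex_metadata(
    "SPEAKER 1: Hello there",
    {"speaker": r"SPEAKER \d"},
)
print(metadata)
```

Keeping the offsets alongside the matched text lets a caller map each piece of extracted metadata back to its position in the element.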
Fixes
- Remove fake api key in test `partition_via_api` since we now require valid/empty api keys
- Page number defaults to `None` instead of `1` when page number is not present in the metadata. A page number of `None` indicates that page numbers are not being tracked for the document or that page numbers do not apply to the element in question.
- Fixes an issue with some pptx files. Assume pptx shapes are found in top left position of slide in case the shape.top and shape.left attributes are `None`.
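
The pptx fix can be pictured with a small sketch. It assumes, as an illustration only, that shape positions are used to order shapes on a slide. The `Shape` dataclass and `order_shapes` function are hypothetical stand-ins, not the library's code: a missing `top` or `left` is treated as `0`, which places the shape at the top-left corner.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Shape:
    """Hypothetical stand-in for a pptx shape with optional coordinates."""
    name: str
    top: Optional[int]
    left: Optional[int]


def order_shapes(shapes: List[Shape]) -> List[Shape]:
    """Sort shapes top-to-bottom, then left-to-right.

    A coordinate of None is treated as 0, so such shapes sort first
    instead of raising a TypeError when compared with integers.
    """
    def position(shape: Shape) -> Tuple[int, int]:
        top = shape.top if shape.top is not None else 0
        left = shape.left if shape.left is not None else 0
        return (top, left)

    return sorted(shapes, key=position)
```

Without the fallback, comparing `None` with an integer during sorting raises a `TypeError` in Python 3, which is one plausible way such files could have failed.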