0.12.6
Release date: 2024-03-09 02:24:28
Latest release of Unstructured-IO/unstructured: 0.13.7 (2024-05-09 01:28:21)
Enhancements
- Improve ability to capture embedded links in `partition_pdf()` for `fast` strategy. Previously, a threshold value that affects the capture of embedded links was set to a fixed value by default. This change allows users to specify the threshold value for better capturing.
- Refactor `add_chunking_strategy` decorator to dispatch by name. Add `chunk()` function to be used by the `add_chunking_strategy` decorator to dispatch the chunking call based on a chunking-strategy name (which can be dynamic at runtime). This decouples chunking dispatch from only those chunkers known at "compile" time and enables runtime registration of custom chunkers.
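The dispatch-by-name idea in the second item can be illustrated with a small registry. This is a minimal sketch of the general pattern, not the library's actual code: `register_chunker`, `with_chunking`, `pair_up`, and `partition_words` are hypothetical names invented for the example, and the `chunk()` defined here is a toy stand-in that only shares its name with the function mentioned above.

```python
import functools
from typing import Callable, Dict, List

# Hypothetical registry mapping a chunking-strategy name to a chunker callable.
Chunker = Callable[[List[str]], List[str]]
_CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str, chunker: Chunker) -> None:
    """Register a chunker under a name at runtime (hypothetical helper)."""
    _CHUNKERS[name] = chunker


def chunk(elements: List[str], chunking_strategy: str) -> List[str]:
    """Look up the chunker by name and apply it (toy stand-in)."""
    try:
        chunker = _CHUNKERS[chunking_strategy]
    except KeyError:
        raise ValueError(f"unknown chunking strategy: {chunking_strategy!r}") from None
    return chunker(elements)


def with_chunking(partition_func):
    """Hypothetical decorator: chunk the result if a strategy name is passed."""
    @functools.wraps(partition_func)
    def wrapper(*args, chunking_strategy=None, **kwargs):
        elements = partition_func(*args, **kwargs)
        if chunking_strategy is None:
            return elements
        return chunk(elements, chunking_strategy)
    return wrapper


def pair_up(elements: List[str]) -> List[str]:
    """Toy custom chunker: join every two consecutive elements."""
    return [" ".join(elements[i:i + 2]) for i in range(0, len(elements), 2)]


@with_chunking
def partition_words(text: str) -> List[str]:
    """Toy partitioner: split text into whitespace-separated words."""
    return text.split()


# The custom chunker is registered after the decorator was applied,
# and the strategy name is an ordinary runtime value.
register_chunker("pairs", pair_up)
strategy_name = "pairs"
chunks = partition_words("a b c", chunking_strategy=strategy_name)
```

Because the decorator only forwards a name, it never needs to import or know about individual chunkers; anything registered before the call is reachable.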
Features
- Added Unstructured Platform Documentation. The Unstructured Platform is currently in beta. The documentation provides how-to guides for setting up workflow automation, scheduling jobs, and configuring source and destination connectors.
Fixes
- Partitioning raises on file-like object with `.name` not a local file path. When partitioning a file using the `file=` argument, and `file` is a file-like object (e.g. `io.BytesIO`) having a `.name` attribute, and the value of `file.name` is not a valid path to a file present on the local filesystem, `FileNotFoundError` is raised. This prevents use of the `file.name` attribute for downstream purposes to, for example, describe the source of a document retrieved from a network location via HTTP.
- Fix SharePoint dates with inconsistent formatting. Adds logic to conditionally support dates returned by office365 that may vary in date formatting or may be a datetime rather than a string.
- Include warnings about the potential risk of installing a version of `pandoc` which does not support RTF files, plus instructions that will help resolve that issue.
- Incorporate the `install-pandoc` Makefile recipe into relevant stages of the CI workflow, ensuring it is a version that supports RTF input files.
- Fix Google Drive source key. Allow passing a string for the source connector key.
- Fix table structure evaluation calculations. Replaced the special value `-1.0` with `np.nan` and corrected the row filtering of file metrics based on that.
- Fix Sharepoint-with-permissions test. Ignore permissions metadata, update test.
- Fix table structure evaluations for edge case. Fixes the issue where the prediction does not contain any table; evaluation no longer errors in that case.
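To see why replacing a `-1.0` sentinel with NaN matters for aggregated metrics, consider the sketch below. It assumes that `-1.0` marked a file with no usable score, which the notes above do not state explicitly. It uses only the standard library: `math.nan` is an ordinary float NaN, as is `np.nan`. The helper `mean_ignoring_nan` and the sample scores are made up for illustration and are not taken from the project's evaluation code.

```python
import math
from typing import List


def mean_ignoring_nan(scores: List[float]) -> float:
    """Average only the entries that are real scores; NaN marks 'no score'."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        # Nothing to average, so the aggregate is itself undefined.
        return math.nan
    return sum(valid) / len(valid)


# Illustrative per-file scores; the middle file has no usable score.
scores_with_sentinel = [0.8, -1.0, 0.6]
scores_with_nan = [0.8, math.nan, 0.6]

# A numeric sentinel silently participates in arithmetic and drags the mean down.
naive_mean = sum(scores_with_sentinel) / len(scores_with_sentinel)

# NaN has to be handled explicitly, so missing rows are filtered out on purpose.
nan_aware_mean = mean_ignoring_nan(scores_with_nan)

print(f"naive mean with -1.0 sentinel: {naive_mean:.3f}")
print(f"mean ignoring NaN rows: {nan_aware_mean:.3f}")
```

A NaN that is not filtered propagates into the result instead of producing a plausible but wrong number, which makes a missed filtering step easier to notice.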