0.15.10
Release date: 2024-09-10 20:55:31
Latest release of Unstructured-IO/unstructured: 0.15.12 (2024-09-13 22:39:58)
Enhancements
- Enhance `pdfminer` element cleanup. Expand removal of `pdfminer` elements to include those inside all `non-pdfminer` elements, not just `tables`.
- Modified analysis drawing tools to dump to files and draw from dumps. If the `analysis` parameter of the `partition_pdf` function is set to `True`, the layouts for Object Detection, Pdfminer Extraction, OCR, and the final layout are dumped as JSON files. The drawers now accept dict (dump) objects instead of instances of internal classes.
- Vectorize pdfminer elements deduplication computation. Use `numpy` operations to compute IOU and sub-region membership instead of a plain Python loop. This speeds up element deduplication on pages with many elements.
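
To illustrate the vectorized deduplication entry above, here is a minimal sketch of pairwise IOU and sub-region membership computed with `numpy` broadcasting. It is not the library's implementation: the function names, the `(x1, y1, x2, y2)` box layout, and the `0.5` threshold are all assumptions made for this example.

```python
import numpy as np


def pairwise_intersection(a, b):
    """Intersection area of every box in `a` with every box in `b`.

    Boxes are assumed to be (x1, y1, x2, y2) rows; the result has shape (N, M).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Broadcasting (N, 1) against (1, M) yields all N x M pairs at once.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    # Negative widths or heights mean the boxes do not overlap.
    return np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)


def box_areas(boxes):
    boxes = np.asarray(boxes, dtype=float)
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])


def pairwise_iou(a, b):
    """Intersection over union for every pair, shape (N, M)."""
    inter = pairwise_intersection(a, b)
    union = box_areas(a)[:, None] + box_areas(b)[None, :] - inter
    # Leave 0.0 where the union is empty instead of dividing by zero.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def subregion_mask(a, b, threshold=0.5):
    """True at [i, j] when at least `threshold` of a[i]'s area lies in b[j]."""
    inter = pairwise_intersection(a, b)
    area_a = box_areas(a)[:, None]
    ratio = np.divide(inter, area_a, out=np.zeros_like(inter), where=area_a > 0)
    return ratio >= threshold


# Hypothetical data: three pdfminer boxes and two boxes from another source.
pdfminer_boxes = [[0, 0, 10, 10], [20, 20, 30, 30], [1, 1, 5, 5]]
other_boxes = [[0, 0, 10, 10], [100, 100, 110, 110]]

# Keep only the pdfminer boxes that are not mostly inside any other box.
keep = ~subregion_mask(pdfminer_boxes, other_boxes).any(axis=1)
```

With this data, the first and third pdfminer boxes fall inside the first of the other boxes, so only the second one is kept. A nested Python loop would make the same N x M comparisons one pair at a time, while the broadcast version does them in a handful of array operations.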