magic_pdf-0.10.2-released
Release time: 2024-11-27 18:33:18
Latest release of opendatalab/MinerU: magic_pdf-0.10.5-released (2024-12-02 14:16:59)
What's Changed
- fix(pdf_parse): Fill text content into spans before discarded_block recognition, fixing empty text blocks in discarded_block. by @myhloli in https://github.com/opendatalab/MinerU/pull/1082
- refactor(txt_spans_extract_v2): optimize span processing and OCR logic by @myhloli in https://github.com/opendatalab/MinerU/pull/1086
- feat(ocr): filter out low confidence ocr results by @myhloli in https://github.com/opendatalab/MinerU/pull/1088
- feat(pdf_parse): add OCR score to span data by @myhloli in https://github.com/opendatalab/MinerU/pull/1089
- fix: test_rag by @icecraft in https://github.com/opendatalab/MinerU/pull/1105
- perf(image_processing): reduce maximum image size for analysis by @myhloli in https://github.com/opendatalab/MinerU/pull/1106
- fix: test_tools unittest by @icecraft in https://github.com/opendatalab/MinerU/pull/1104
- refactor(libs): remove unused imports and functions by @myhloli in https://github.com/opendatalab/MinerU/pull/1112
- Feat/add s3 read write example by @icecraft in https://github.com/opendatalab/MinerU/pull/1117
Full Changelog: https://github.com/opendatalab/MinerU/compare/magic_pdf-0.10.1-released...magic_pdf-0.10.2-released
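The two OCR entries above (#1088 and #1089) describe attaching an OCR score to span data and filtering out low-confidence results. A minimal sketch of that idea follows. It is not MinerU's implementation: the `score` and `content` keys, the `filter_low_confidence` helper, and the `MIN_OCR_SCORE` threshold are all hypothetical names and values chosen for illustration, since these notes do not state the real ones.

```python
from typing import Any, Dict, List

# Hypothetical threshold: the value MinerU actually uses is not given in these notes.
MIN_OCR_SCORE = 0.5


def filter_low_confidence(
    spans: List[Dict[str, Any]], min_score: float = MIN_OCR_SCORE
) -> List[Dict[str, Any]]:
    """Keep spans whose OCR score is at least min_score.

    Spans without a "score" key (assumed here to be text that did not
    come from OCR) are treated as fully trusted and kept.
    """
    return [span for span in spans if span.get("score", 1.0) >= min_score]


# Illustrative span data; the dict layout is an assumption, not MinerU's schema.
spans = [
    {"content": "Hello", "score": 0.98},
    {"content": "w0rld", "score": 0.12},
    {"content": "native text"},
]

kept = filter_low_confidence(spans)
print([span["content"] for span in kept])
```

Storing the score on each span, rather than discarding it after recognition, lets a later stage apply a different threshold without rerunning OCR.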
1. magic_pdf-0.10.2-py3-none-any.whl (970.4 KB)