magic_pdf-0.10.3-released
Release date: 2024-11-29 16:05:05
Latest opendatalab/MinerU release: magic_pdf-0.10.6-released (2024-12-11 18:58:21)
What's Changed
- fix(Hybrid OCR): enable Hybrid OCR for empty spans that contain a certain number of placeholders but no actual text by @myhloli in https://github.com/opendatalab/MinerU/pull/1132
- refactor(para): improve language detection and block splitting by @myhloli in https://github.com/opendatalab/MinerU/pull/1134
- feat(pdf_parse): filter out skewed text lines by @myhloli in https://github.com/opendatalab/MinerU/pull/1135
- refactor(ocr): improve text processing and span handling by @myhloli in https://github.com/opendatalab/MinerU/pull/1136
- refactor(pdf_check): improve character detection using PyMuPDF by @myhloli in https://github.com/opendatalab/MinerU/pull/1137
- feat(pdf_parse): add line start flag detection and optimize line stop flag logic by @myhloli in https://github.com/opendatalab/MinerU/pull/1138
- fix(ocr_mkcontent): handle empty paragraphs on pages by @myhloli in https://github.com/opendatalab/MinerU/pull/1139
- refactor(pdf_parse): adjust character-axis alignment algorithm by @myhloli in https://github.com/opendatalab/MinerU/pull/1140
- refactor(ocr): fix paddleocr failing to initialize in a multi-threaded environment by @myhloli in https://github.com/opendatalab/MinerU/pull/1141
Full Changelog: https://github.com/opendatalab/MinerU/compare/magic_pdf-0.10.2-released...magic_pdf-0.10.3-released
1. magic_pdf-0.10.3-py3-none-any.whl (971.42 KB)