magic_pdf-0.10.0-released
Release time: 2024-11-22 17:54:27
Latest opendatalab/MinerU release: magic_pdf-0.10.5-released (2024-12-02 14:16:59)
What's Changed
- fix: fix issue opendatalab#715 by @LollipopsAndWine in https://github.com/opendatalab/MinerU/pull/971
- docs(README): update GPU hardware recommendations and table recognition options by @myhloli in https://github.com/opendatalab/MinerU/pull/973
- docs: improve GPU support list formatting in README_zh-CN.md by @myhloli in https://github.com/opendatalab/MinerU/pull/974
- docs: update feature description for table conversion by @myhloli in https://github.com/opendatalab/MinerU/pull/975
- docs: update readme by @myhloli in https://github.com/opendatalab/MinerU/pull/977
- update ci by @dt-yy in https://github.com/opendatalab/MinerU/pull/986
- test(unitest): Restore unit test cases by @myhloli in https://github.com/opendatalab/MinerU/pull/998
- refactor(tests): extract common test utilities into test_commons.py by @myhloli in https://github.com/opendatalab/MinerU/pull/1001
- feat(ocr): improve handling of angled text boxes by @myhloli in https://github.com/opendatalab/MinerU/pull/1010
- refactor(para): improve paragraph splitting logic by @myhloli in https://github.com/opendatalab/MinerU/pull/1013
- build(setup): add old_linux specific dependencies by @myhloli in https://github.com/opendatalab/MinerU/pull/1016
- refactor(para): adjust right margin threshold based on block width by @myhloli in https://github.com/opendatalab/MinerU/pull/1018
- fix: using new data api replace old rw api by @icecraft in https://github.com/opendatalab/MinerU/pull/1006
- delete unused pipeline file by @liugongjian in https://github.com/opendatalab/MinerU/pull/1024
- refactor: move some constants or enums defs to config folder by @icecraft in https://github.com/opendatalab/MinerU/pull/1027
- fix: remove test code by @icecraft in https://github.com/opendatalab/MinerU/pull/1036
- fix(tools): handle empty language string in common.py by @myhloli in https://github.com/opendatalab/MinerU/pull/1045
- refactor(ocr_dict_merge): add threshold parameter for line merging by @myhloli in https://github.com/opendatalab/MinerU/pull/1046
- fix(ocr_mkcontent): improve hyphen handling at line ends by @myhloli in https://github.com/opendatalab/MinerU/pull/1047
- fix(remove_overlaps_min_spans): optimize overlap detection in OCR span list modification by @myhloli in https://github.com/opendatalab/MinerU/pull/1048
- feat(ocr): improve text detection and OCR accuracy by @myhloli in https://github.com/opendatalab/MinerU/pull/1049
- refactor(txt_parse): improve text extraction accuracy with new algorithm by @myhloli in https://github.com/opendatalab/MinerU/pull/1050
- fix: use concrete class instead of abstract class by @icecraft in https://github.com/opendatalab/MinerU/pull/1052
- fix(pdf_parse): improve line stop flag detection accuracy by @myhloli in https://github.com/opendatalab/MinerU/pull/1053
- test: comment out assertions for metascan classify and meta scan tests by @myhloli in https://github.com/opendatalab/MinerU/pull/1054
- Add test cases to json compressor util by @liugongjian in https://github.com/opendatalab/MinerU/pull/1056
- refactor(para): improve line stop flag and remove unused debug mode by @myhloli in https://github.com/opendatalab/MinerU/pull/1058
- fix(table): add null check for OCR result in rapid table prediction by @myhloli in https://github.com/opendatalab/MinerU/pull/1060
- refactor(model): move page total time logging to custom model analysis by @myhloli in https://github.com/opendatalab/MinerU/pull/1061
- fix(table): add null check for OCR result in rapid table prediction by @myhloli in https://github.com/opendatalab/MinerU/pull/1062
- fix(pdf_parse): improve OCR result handling by @myhloli in https://github.com/opendatalab/MinerU/pull/1064
New Contributors
- @liugongjian made their first contribution in https://github.com/opendatalab/MinerU/pull/1024
Full Changelog: https://github.com/opendatalab/MinerU/compare/magic_pdf-0.9.3-released...magic_pdf-0.10.0-released
1. magic_pdf-0.10.0-py3-none-any.whl (1.11 MB)