magic_pdf-0.9.0-released
Release date: 2024-11-01 19:04:55
Latest opendatalab/MinerU release: magic_pdf-0.9.3-released (2024-11-15 19:27:30)
What's Changed
- Update README_zh-CN.md (#404) by @drunkpig in https://github.com/opendatalab/MinerU/pull/409
- feat: add dockerfile by @Lincyaw in https://github.com/opendatalab/MinerU/pull/189
- fix(ocr_mkcontent): improve language detection and content formatting by @myhloli in https://github.com/opendatalab/MinerU/pull/458
- fix(self_modify): merge detection boxes for optimized text region detection by @myhloli in https://github.com/opendatalab/MinerU/pull/448
- fix(pdf-extract): adjust box threshold for OCR detection to fix issue about OCR mode lost some line by @myhloli in https://github.com/opendatalab/MinerU/pull/447
- feat: rename the file generated by command line tools by @icecraft in https://github.com/opendatalab/MinerU/pull/401
- fix(ocr_mkcontent): revise table caption output by @myhloli in https://github.com/opendatalab/MinerU/pull/397
- build(docker): update docker build step by @myhloli in https://github.com/opendatalab/MinerU/pull/471
- upload an introduction about chemical formula and update readme.md by @GDDGCZ518 in https://github.com/opendatalab/MinerU/pull/489
- fix: remove the default value of output option in tools/cli.py and to… by @icecraft in https://github.com/opendatalab/MinerU/pull/494
- feat: add test case by @dt-yy in https://github.com/opendatalab/MinerU/pull/499
- fixes #492 decrease span threshold for block filling by @myhloli in https://github.com/opendatalab/MinerU/pull/500
- fix(detect_all_bboxes): remove small overlapping blocks by merging by @myhloli in https://github.com/opendatalab/MinerU/pull/501
- feat(cli&analyze&pipeline): add start_page and end_page args for pagination by @myhloli in https://github.com/opendatalab/MinerU/pull/507
- Feat/support rag by @icecraft in https://github.com/opendatalab/MinerU/pull/510
- feat(gradio): add app by gradio by @myhloli in https://github.com/opendatalab/MinerU/pull/512
- fix: replace \u0002, \u0003 in common text by @drunkpig in https://github.com/opendatalab/MinerU/pull/521
- fix(end_page_id):Fix the issue where end_page_id is corrected to len-1 when its input is 0. by @myhloli in https://github.com/opendatalab/MinerU/pull/518
- fix(para): When an English line ends with a hyphen, do not add a space at the end. by @drunkpig in https://github.com/opendatalab/MinerU/pull/523
- Release: Release 0.7.1 version, update dev by @dt-yy in https://github.com/opendatalab/MinerU/pull/527
- Hotfix readme 0.7.1 by @Focusshang in https://github.com/opendatalab/MinerU/pull/529
- fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 by @papayalove in https://github.com/opendatalab/MinerU/pull/542
- fix: typo error in markdown by @icecraft in https://github.com/opendatalab/MinerU/pull/536
- fix(gradio): remove unused imports and simplify pdf display by @myhloli in https://github.com/opendatalab/MinerU/pull/534
- Feat/support footnote in figure by @icecraft in https://github.com/opendatalab/MinerU/pull/532
- refactor(pdf_extract_kit): implement singleton pattern for atomic models by @myhloli in https://github.com/opendatalab/MinerU/pull/533
- feat: mineru_web by @LollipopsAndWine in https://github.com/opendatalab/MinerU/pull/555
- features@add mineru gpu&web_api by @yanqiangmiffy in https://github.com/opendatalab/MinerU/pull/568
- docs(models_download): update model download instructions to use python script by @myhloli in https://github.com/opendatalab/MinerU/pull/560
- fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 by @papayalove in https://github.com/opendatalab/MinerU/pull/574
- feat(ocr): supports minority languages by @myhloli in https://github.com/opendatalab/MinerU/pull/577
- refactor(pdf_extract_kit): update model config and weight paths for UniMERNet-0.2.0 by @myhloli in https://github.com/opendatalab/MinerU/pull/584
- feat(gradio_app): add web app with PDF processing as a project by @myhloli in https://github.com/opendatalab/MinerU/pull/579
- fix: web_api by @LollipopsAndWine in https://github.com/opendatalab/MinerU/pull/580
- Release 0.8.0 by @drunkpig in https://github.com/opendatalab/MinerU/pull/587
- fix: 1. resolve incorrect pair relation of figure and footnote, 2. re… by @icecraft in https://github.com/opendatalab/MinerU/pull/603
- fix: restore the lang option in tools/cli.py by @icecraft in https://github.com/opendatalab/MinerU/pull/604
- fix: solve conflicts by @myhloli in https://github.com/opendatalab/MinerU/pull/607
- fix: remove useless files by @myhloli in https://github.com/opendatalab/MinerU/pull/608
- feat(gradio_app): add examples accordion to the PDF conversion interface by @myhloli in https://github.com/opendatalab/MinerU/pull/597
- feat(pipeline): pass language parameter for parsing and markdown conversion by @myhloli in https://github.com/opendatalab/MinerU/pull/602
- feat(ocr_mkcontent): support drop reason in none_with_reason mode by @myhloli in https://github.com/opendatalab/MinerU/pull/630
- feat(UNIPipe): change default drop_mode to NONE_WITH_REASON by @myhloli in https://github.com/opendatalab/MinerU/pull/631
- refactor(pdf_extract): use Image.crop directly with layout detection by @myhloli in https://github.com/opendatalab/MinerU/pull/635
- fix(pdf-extract): ensure model is set to evaluation mode before processing by @myhloli in https://github.com/opendatalab/MinerU/pull/636
- fix(pdf_extract_kit):change unimernet base -> small by @myhloli in https://github.com/opendatalab/MinerU/pull/639
- feat: add test case by @dt-yy in https://github.com/opendatalab/MinerU/pull/645
- feat: integrate the frontend UI and configure one-click startup by @LollipopsAndWine in https://github.com/opendatalab/MinerU/pull/668
- feat: remove unused files and update frontend style by @LollipopsAndWine in https://github.com/opendatalab/MinerU/pull/669
- docs: update project lists in README files to include web_api by @myhloli in https://github.com/opendatalab/MinerU/pull/670
- feat:add layoutreader to sort blocks by @myhloli in https://github.com/opendatalab/MinerU/pull/672
- refactor(model): improve timing information and performance by @myhloli in https://github.com/opendatalab/MinerU/pull/690
- feat: add arXiv paper link to header and adjust PDF parsing logic by @myhloli in https://github.com/opendatalab/MinerU/pull/693
- perf(pdf_extract_kit): conditional memory cleanup based on GPU capacity by @myhloli in https://github.com/opendatalab/MinerU/pull/694
- fix: caption or footnote match algorithm by @icecraft in https://github.com/opendatalab/MinerU/pull/695
- fix: caption|footnote match algorithm by @icecraft in https://github.com/opendatalab/MinerU/pull/696
- feat(layoutreader): support local model directory and improve model loading by @myhloli in https://github.com/opendatalab/MinerU/pull/698
- feat(docs): automate model download and configuration by @myhloli in https://github.com/opendatalab/MinerU/pull/699
- docs: add filename to wget command in model download scripts by @myhloli in https://github.com/opendatalab/MinerU/pull/700
- docs: update CUDA acceleration guides and README content by @myhloli in https://github.com/opendatalab/MinerU/pull/701
- Update README_Windows_CUDA_Acceleration_en_US.md by @myhloli in https://github.com/opendatalab/MinerU/pull/706
- feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support by @myhloli in https://github.com/opendatalab/MinerU/pull/716
- Update how_to_download_models_zh_cn.md by @myhloli in https://github.com/opendatalab/MinerU/pull/717
- fix: Solving the Grouping Anomaly Issue with Multiple Consecutive Non-Text Blocks by @myhloli in https://github.com/opendatalab/MinerU/pull/718
- feat: manage docs with sphinx by @icecraft in https://github.com/opendatalab/MinerU/pull/737
- feat(list&index block): detect and merge list and index blocks by @myhloli in https://github.com/opendatalab/MinerU/pull/740
- refactor(para_split_v3): merge list and index block detection by @myhloli in https://github.com/opendatalab/MinerU/pull/743
- fix(para_split_v3): refine list block detection in paragraph splitting by @myhloli in https://github.com/opendatalab/MinerU/pull/744
- update example files by @myhloli in https://github.com/opendatalab/MinerU/pull/747
- refactor(ocr):Increase the dilation factor in OCR to address the issue of word concatenation. by @myhloli in https://github.com/opendatalab/MinerU/pull/753
- refactor(para): improve paragraph splitting algorithm by @myhloli in https://github.com/opendatalab/MinerU/pull/765
- docs:Update the driver requirements on the Ubuntu system. by @myhloli in https://github.com/opendatalab/MinerU/pull/766
- update:update config json by @myhloli in https://github.com/opendatalab/MinerU/pull/769
- feat(model): add support for DocLayout-YOLO model by @myhloli in https://github.com/opendatalab/MinerU/pull/773
- build(setup): add doclayout_yolo dependency by @myhloli in https://github.com/opendatalab/MinerU/pull/774
- build(docker): add doclayout-yolo dependency by @myhloli in https://github.com/opendatalab/MinerU/pull/776
- feat: add support for non-PDF file conversion to PDF by @myhloli in https://github.com/opendatalab/MinerU/pull/777
- Feat/data api by @icecraft in https://github.com/opendatalab/MinerU/pull/782
- Feat/new table caption match by @icecraft in https://github.com/opendatalab/MinerU/pull/784
- refactor(parse_core): improve image and table block handling by @myhloli in https://github.com/opendatalab/MinerU/pull/785
- refactor(ocr): adjust OCR processing parameters by @myhloli in https://github.com/opendatalab/MinerU/pull/786
- fix: add init to magic_pdf.config by @myhloli in https://github.com/opendatalab/MinerU/pull/788
- fix: add init to magic_pdf.utils by @myhloli in https://github.com/opendatalab/MinerU/pull/789
- feat(draw_bbox): update bounding box drawing for tables and images by @myhloli in https://github.com/opendatalab/MinerU/pull/791
- Add multi_gpu process project by @randydl in https://github.com/opendatalab/MinerU/pull/793
- docs: update model download instructions and simplify demo scripts by @myhloli in https://github.com/opendatalab/MinerU/pull/794
- Feat/new table caption match by @icecraft in https://github.com/opendatalab/MinerU/pull/797
- docs(README): update for v0.9.0 release by @myhloli in https://github.com/opendatalab/MinerU/pull/798
- docs: update logo path in README files by @myhloli in https://github.com/opendatalab/MinerU/pull/799
- docs: update documentation path in README files by @myhloli in https://github.com/opendatalab/MinerU/pull/800
- perf: table model update with PP OCRv4 by @papayalove in https://github.com/opendatalab/MinerU/pull/802
- fix: add priority match rule by @icecraft in https://github.com/opendatalab/MinerU/pull/804
- refactor(table): disable StructEqTable support and add TableMaster support by @myhloli in https://github.com/opendatalab/MinerU/pull/805
- docs(README): update model download instructions for PDF-Extract-Kit 1.0 by @myhloli in https://github.com/opendatalab/MinerU/pull/806
- Dev->0.9 release by @myhloli in https://github.com/opendatalab/MinerU/pull/808
- (docs&build): switch to Aliyun PyPI mirror by @myhloli in https://github.com/opendatalab/MinerU/pull/809
- (docs&build): switch to Aliyun PyPI mirror by @myhloli in https://github.com/opendatalab/MinerU/pull/810
- fix(magic_pdf): handle missing image_path in spans by @myhloli in https://github.com/opendatalab/MinerU/pull/817
- fix(magic_pdf): handle missing image_path in spans by @myhloli in https://github.com/opendatalab/MinerU/pull/818
- perf: table model update with PP OCRv4 by @papayalove in https://github.com/opendatalab/MinerU/pull/820
- fix(pdf_parse): optimize span processing by removing outside spans by @myhloli in https://github.com/opendatalab/MinerU/pull/824
- fix(pdf_parse): optimize span processing by removing outside spans by @myhloli in https://github.com/opendatalab/MinerU/pull/825
- fix(pdf_parse): improve span removal logic for all content types by @myhloli in https://github.com/opendatalab/MinerU/pull/830
- fix(pdf_parse): improve span removal logic for all content types by @myhloli in https://github.com/opendatalab/MinerU/pull/831
- feat(pdf_parse): improve span filtering and add new block types by @myhloli in https://github.com/opendatalab/MinerU/pull/834
- fix(pdf_parse): improve span filtering by @myhloli in https://github.com/opendatalab/MinerU/pull/835
- Release 0.9.0 by @myhloli in https://github.com/opendatalab/MinerU/pull/838
New Contributors
- @Lincyaw made their first contribution in https://github.com/opendatalab/MinerU/pull/189
- @GDDGCZ518 made their first contribution in https://github.com/opendatalab/MinerU/pull/489
- @LollipopsAndWine made their first contribution in https://github.com/opendatalab/MinerU/pull/555
- @yanqiangmiffy made their first contribution in https://github.com/opendatalab/MinerU/pull/568
- @randydl made their first contribution in https://github.com/opendatalab/MinerU/pull/793
Full Changelog: https://github.com/opendatalab/MinerU/compare/magic_pdf-0.8.1-update-docs...magic_pdf-0.9.0-released
1. magic_pdf-0.9.0-py3-none-any.whl (1.09 MB)