magic_pdf-0.8.0-released
Release date: 2024-09-10 20:20:57
Latest opendatalab/MinerU release: magic_pdf-0.9.3-released (2024-11-15 19:27:30)
What's Changed
feat:
- Add RAG API
- Integrate RAG into the llama_index project
- Update Dockerfile
- Fine-grained model singletons, reducing memory usage and speeding up initialization
- The CLI and API add page-range parameters for parsing, so the start and end pages can be customized
- Support image footnotes
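The "fine-grained model singleton" item describes loading each model once and reusing it. The sketch below illustrates that general pattern only: `ModelSingleton`, `get_model`, the tuple keys and the loader callables are all hypothetical names, not magic_pdf's actual API. Keying the cache on the model name plus its configuration means different configurations still get separate instances, while repeated requests for the same one skip reinitialization.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class ModelSingleton:
    """Hypothetical process-wide cache of loaded models (illustration only)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Create the single shared instance on first use.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models: Dict[Tuple[Any, ...], Any] = {}
        return cls._instance

    def get_model(self, key: Tuple[Any, ...], loader: Callable[[], Any]) -> Any:
        # Load each distinct (model, config) key at most once, then reuse it.
        with self._lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]


# Usage: count how many times the (hypothetical) loader actually runs.
load_count = 0


def load_layout_model() -> object:
    global load_count
    load_count += 1
    return object()  # stand-in for an expensive model object


a = ModelSingleton().get_model(("layout", "cpu"), load_layout_model)
b = ModelSingleton().get_model(("layout", "cpu"), load_layout_model)
c = ModelSingleton().get_model(("layout", "cuda"), load_layout_model)
```

Here `a` and `b` are the same object and the loader runs once for that key; the `"cuda"` key triggers a second, separate load.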
bugfix:
- When removing the smaller of two overlapping blocks, retain that block's boundary information
- Lower the span block fill-ratio threshold from 0.6 to 0.3
- Low-score lines being lost in the OCR DET stage
- Merge multiple spans on a single line in the OCR DET stage
- Improve the segmentation logic for run-together (adhesive) English words
- Inaccurate layout boxes
- Incorrect merging of words split by line breaks
- Certain special characters appearing in the final output
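The release notes do not define the "fill" threshold. One plausible reading is the fraction of a span's area that falls inside a block, with a span assigned to the block when that fraction reaches the threshold. The sketch below assumes that reading. The function names and the `(x0, y0, x1, y1)` box convention are hypothetical, not magic_pdf's implementation. It shows why lowering the threshold from 0.6 to 0.3 keeps spans that only partly overlap a block.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed convention


def area(b: BBox) -> float:
    # Negative widths/heights (no overlap) are clamped to zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def fill_ratio(span: BBox, block: BBox) -> float:
    """Fraction of the span's area that lies inside the block."""
    inter = (max(span[0], block[0]), max(span[1], block[1]),
             min(span[2], block[2]), min(span[3], block[3]))
    span_area = area(span)
    return area(inter) / span_area if span_area > 0 else 0.0


def assign_spans(spans: List[BBox], block: BBox, threshold: float = 0.3) -> List[BBox]:
    # Keep the spans whose fill ratio meets the (assumed) threshold.
    return [s for s in spans if fill_ratio(s, block) >= threshold]


block = (0, 0, 10, 10)
spans = [(0, 0, 4, 2),    # fully inside: ratio 1.0
         (8, 0, 12, 2),   # half inside: ratio 0.5
         (9, 0, 13, 2)]   # quarter inside: ratio 0.25
```

With these boxes, a threshold of 0.6 keeps only the first span, while 0.3 also keeps the half-covered second span.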
Full Changelog: https://github.com/opendatalab/MinerU/compare/magic_pdf-0.7.1-released...magic_pdf-0.8.0-released
1. magic_pdf-0.8.0-py3-none-any.whl (1.06 MB)