v0.6.0
Release date: 2022-05-05 22:20:47
Latest release of open-mmlab/mmocr: v1.0.1 (2023-07-04 15:11:53)
Highlights
- A new recognition algorithm, MASTER, has been added to MMOCR. It was the winning solution for the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
- DBNet++ has been released! It adds a new Adaptive Scale Fusion module for feature enhancement. Benefiting from this, the new model achieves a 2% better h-mean score than its predecessor on the ICDAR2015 dataset.
- Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) to explore further information.
- To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
- Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. (Doc)
- Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. (Doc) If too many results are flooding your text recognition train log, you can trim it by specifying a subset of metrics in evaluation config. Check out the Evaluation section for details.
- MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit Labelme to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read tutorial docs to get started.
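As a rough illustration of the mean-score reporting mentioned in the highlights, the sketch below averages each metric over several datasets. The helper name `mean_scores`, the metric keys, and the numbers are all hypothetical; this is not MMOCR's implementation, only a sketch of the idea.

```python
from statistics import mean


def mean_scores(per_dataset):
    """Average every metric that is reported by all datasets.

    `per_dataset` maps a dataset name to a {metric: score} dict.
    Hypothetical helper for illustration, not MMOCR's actual code.
    """
    if not per_dataset:
        return {}
    results = list(per_dataset.values())
    # Only metrics present in every dataset can be averaged fairly.
    shared = set(results[0]).intersection(*results[1:])
    return {f"mean_{name}": mean(r[name] for r in results)
            for name in sorted(shared)}


# Made-up scores for two evaluation sets.
scores = {
    "dataset_a": {"word_acc": 0.80, "char_recall": 0.90},
    "dataset_b": {"word_acc": 0.60, "char_recall": 0.70},
}
print(mean_scores(scores))
```

Averaging only the metrics shared by all datasets is a design choice of this sketch; it keeps a checkpoint-to-checkpoint comparison from mixing scores that were computed on different subsets.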
Lmdb Dataset
Reading images or labels from files can be slow when the dataset is very large, e.g. on a scale of millions. Besides, in academia, most scene text recognition datasets are stored in lmdb format, with images and labels together. To follow mainstream practice and improve data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline, LoadImageFromLMDB. This section is a quick walkthrough to help you adopt this update in your research.
Specifications
To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:
- The parameter describing the data volume of the dataset is `num-samples` instead of `total_number` (deprecated).
- Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.
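To make the key layout concrete, here is a minimal sketch that writes and reads records following the two rules above. A plain Python dict stands in for an lmdb transaction so that the example runs without the lmdb package. The helper names `pack_samples` and `read_sample` are hypothetical, the 9-digit, 1-based numbering is inferred from the example keys, and storing the label as plain UTF-8 text is an assumption.

```python
def pack_samples(samples):
    """Build a key-value store in the layout described above.

    `samples` is a list of (image_bytes, label_text) pairs. A dict stands
    in for an lmdb database; lmdb stores bytes keys and values, which is
    why everything is encoded here.
    """
    store = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        # Keys follow the image-000000001 / label-000000001 pattern.
        store[f"image-{index:09d}".encode()] = image_bytes
        store[f"label-{index:09d}".encode()] = label.encode("utf-8")
    # The dataset size lives under `num-samples`, not `total_number`.
    store[b"num-samples"] = str(len(samples)).encode()
    return store


def read_sample(store, index):
    """Return (image_bytes, label_text) for a 1-based sample index."""
    total = int(store[b"num-samples"].decode())
    if not 1 <= index <= total:
        raise IndexError(f"index {index} outside 1..{total}")
    image = store[f"image-{index:09d}".encode()]
    label = store[f"label-{index:09d}".encode()].decode("utf-8")
    return image, label


store = pack_samples([(b"<jpeg bytes>", "HELLO"), (b"<jpeg bytes>", "WORLD")])
print(read_sample(store, 2))
```

Reading `num-samples` first gives the valid index range, so a loader can check an index before building the image and label keys.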
Usage
- Use existing academic lmdb datasets if they meet the specifications, or use the tool provided by MMOCR to pack images and annotations into an lmdb dataset.
  - Previously, MMOCR had a function `txt2lmdb` (deprecated) that only supported converting labels to lmdb format. However, its output is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility, lmdb_converter, to convert recognition datasets with both images and labels to lmdb format.
  - Say that your recognition data in MMOCR's format are organized as follows. (See an example in ocr_toy_dataset).

        # Directory structure
        ├── img_path
        │   ├── img1.jpg
        │   ├── img2.jpg
        │   └── ...
        └── label.txt (or label.jsonl)

        # Annotation format
        label.txt:
            img1.jpg HELLO
            img2.jpg WORLD
            ...

        label.jsonl:
            {"filename": "img1.jpg", "text": "HELLO"}
            {"filename": "img2.jpg", "text": "WORLD"}
            ...

  - Then pack these files up:

        python tools/data/utils/lmdb_converter.py {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}

  - Check out tools.md for more details.
- The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:
  - Set parser as `LineJsonParser` and `file_format` as 'lmdb' in dataset config:

        # configs/_base_/recog_datasets/ST_MJ_train.py
        train1 = dict(
            type='OCRDataset',
            img_prefix=train_img_prefix1,
            ann_file=train_ann_file1,
            loader=dict(
                type='AnnFileLoader',
                repeat=1,
                file_format='lmdb',
                parser=dict(
                    type='LineJsonParser',
                    keys=['filename', 'text'],
                )),
            pipeline=None,
            test_mode=False)

  - Use `LoadImageFromLMDB` in pipeline:

        # configs/_base_/recog_pipelines/crnn_pipeline.py
        train_pipeline = [
            dict(type='LoadImageFromLMDB', color_type='grayscale'),
            ...

- You are good to go! Start training and MMOCR will load data from your lmdb dataset.
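The two annotation formats from the first step can be read with a few lines of standard-library Python. The sketch below only illustrates the formats: `parse_label_line` is a hypothetical helper, not the converter's own parser. It assumes that a txt line separates the filename from the text at the first space and that each jsonl line is valid JSON with double-quoted strings.

```python
import json


def parse_label_line(line, label_format="txt"):
    """Return (filename, text) for one annotation line."""
    line = line.strip()
    if label_format == "jsonl":
        record = json.loads(line)
        return record["filename"], record["text"]
    if label_format == "txt":
        # Split once so that labels containing spaces stay intact.
        filename, text = line.split(" ", 1)
        return filename, text
    raise ValueError(f"unsupported label format: {label_format}")


print(parse_label_line("img1.jpg HELLO"))
print(parse_label_line('{"filename": "img2.jpg", "text": "WORLD"}', "jsonl"))
```

Running such a check over a label file before packing is a cheap way to catch malformed lines early; whether the converter reports them in the same way is not something this sketch claims.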
New Features & Enhancements
- Add analyze_logs in tools and its description in docs by @Y-M-Y in https://github.com/open-mmlab/mmocr/pull/899
- Add LSVT Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/896
- Add RCTW dataset converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/914
- Support computing mean scores in UniformConcatDataset by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/981
- Support loading images and labels from lmdb file by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/982
- Add recog2lmdb and new toy dataset files by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/979
- Add labelme converter for textdet and textrecog by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/972
- Update CircleCI configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/918
- Update Git Action by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/930
- More customizable fields in dataloaders by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/933
- Skip CIs when docs are modified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/941
- Rename Github tests, fix ignored paths by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/946
- Support latest MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/959
- Support dynamic threshold range in eval_hmean by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/962
- Update the version requirement of mmdet in docker by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/966
- Replace `opencv-python-headless` with `opencv-python` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/970
- Update Dataset Configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/980
- Add SynthText dataset config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/983
- Automatically report mean scores when applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/995
- Add DBNet++ by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/973
- Add MASTER by @JiaquanYe in https://github.com/open-mmlab/mmocr/pull/807
- Allow choosing metrics to report in text recognition tasks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/989
- Add HierText converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/948
- Fix lint_only in CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/998
Bug Fixes
- Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/927
- Fix a deprecate warning about mmdet.datasets.pipelines.formating by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/944
- Fix a Bug in ResNet plugin by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/967
- revert a wrong setting in db_r18 cfg by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/978
- Fix TotalText Anno version issue by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/945
- Update installation step of `albumentations` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/984
- Fix ImgAug transform by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/949
- Fix GPG key error in CI and docker by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/988
- update label.lmdb by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/991
- correct meta key by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/926
- Use new image by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/976
- Fix Data Converter Issues by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/955
Docs
- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/905
- Fix the misleading description in test.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/908
- Update recog.md for lmdb Generation by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/934
- Add MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/954
- Add wechat QR code to CN readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/960
- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/947
- Use QR codes from MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/971
- Renew dataset_types.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/997
New Contributors
- @Y-M-Y made their first contribution in https://github.com/open-mmlab/mmocr/pull/899
Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.5.0...v0.6.0