v0.5.0
版本发布时间: 2022-03-31 17:50:19
open-mmlab/mmocr最新发布版本:v1.0.1(2023-07-04 15:11:53)
Highlights
- MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain
.txt
file to JSON line format.jsonl
, and then revise a few configurations to enable theLineJsonParser
. For more information, please read our step-by-step tutorial. -
Tesseract is now available in MMOCR! While MMOCR is more flexible to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in
mmocr.utils.ocr
by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object byMMOCR(det=’Tesseract’, recog=’Tesseract’)
. Credit to @garvan2021 - We release data converters for 16 widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( Det & Recog ) to explore further information.
- Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who had actively participated in documentation translation!
Migration Guide - ResNet
Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like
architectures which are used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.
Plugin
-
Plugin
is a module category inherited from MMCV's implementation ofPLUGIN_LAYERS
, which can be inserted between each stage of ResNet or into a basicblock. You can find a simple implementation of plugin at mmocr/models/textrecog/plugins/common.py, or click the button below.Plugin Example
@PLUGIN_LAYERS.register_module() class Maxpool2d(nn.Module): """A wrapper around nn.Maxpool2d(). Args: kernel_size (int or tuple(int)): Kernel size for max pooling layer stride (int or tuple(int)): Stride for max pooling layer padding (int or tuple(int)): Padding for pooling layer """ def __init__(self, kernel_size, stride, padding=0, **kwargs): super(Maxpool2d, self).__init__() self.model = nn.MaxPool2d(kernel_size, stride, padding) def forward(self, x): """ Args: x (Tensor): Input feature map Returns: Tensor: The tensor after Maxpooling layer. """ return self.model(x)
Stage-wise Plugins
-
ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of basicblocks. For each stage, we provide two ports to insert stage-wise plugins by giving
plugins
parameters in ResNet.[port1: before stage] ---> [stage] ---> [port2: after stage]
-
E.g. Using a ResNet with four stages as example. Suppose we want to insert an additional convolution layer before each stage, and an additional convolution layer at stage 1, 2, 4. Then you can define the special ResNet18 like this
resnet18_speical = ResNet( # for simplicity, some required # parameters are omitted plugins=[ dict( cfg=dict( type='ConvModule', kernel_size=3, stride=1, padding=1, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')), stages=(True, True, True, True), position='before_stage') dict( cfg=dict( type='ConvModule', kernel_size=3, stride=1, padding=1, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')), stages=(True, True, False, True), position='after_stage') ])
-
You can also insert more than one plugin in each port and those plugins will be executed in order. Let's take ResNet in MASTER as an example:
Multiple Plugins Example
-
ResNet in Master is based on ResNet31. And after each stage, a module named
GCAModule
will be used. TheGCAModule
is inserted before the stage-wise convolution layer in ResNet31. In conlusion, there will be two plugins atafter_stage
port in the same time.resnet_master = ResNet( # for simplicity, some required # parameters are omitted plugins=[ dict( cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)), stages=(True, True, False, False), position='before_stage'), dict( cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)), stages=(False, False, True, False), position='before_stage'), dict( cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1), stages=[True, True, True, True], position='after_stage'), dict( cfg=dict( type='ConvModule', kernel_size=3, stride=1, padding=1, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')), stages=(True, True, True, True), position='after_stage') ])
-
-
In each plugin, we will pass two parameters (
in_channels
,out_channels
) to support operations that need the information of current channels.
Block-wise Plugin (Experimental)
-
We also refactored the
BasicBlock
used in ResNet. Now it can be customized with block-wise plugins. Check here for more details. -
BasicBlock is composed of two convolution layer in the main branch and a shortcut branch. We provide four ports to insert plugins.
[port1: before_conv1] ---> [conv1] ---> [port2: after_conv1] ---> [conv2] ---> [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
-
In each plugin, we will pass a parameter
in_channels
to support operations that need the information of current channels. -
E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:
Block-wise Plugin Example
resnet_31 = ResNet( in_channels=3, stem_channels=[64, 128], block_cfgs=dict(type='BasicBlock'), arch_layers=[1, 2, 5, 3], arch_channels=[256, 256, 512, 512], strides=[1, 1, 1, 1], plugins=[ dict( cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)), stages=(True, True, False, False), position='before_stage'), dict( cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)), stages=(False, False, True, False), position='before_stage'), dict( cfg=dict( type='ConvModule', kernel_size=3, stride=1, padding=1, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')), stages=(True, True, True, True), position='after_stage') ])
Full Examples
ResNet without plugins
-
ResNet45 is used in ASTER and ABINet without any plugins.
resnet45_aster = ResNet( in_channels=3, stem_channels=[64, 128], block_cfgs=dict(type='BasicBlock', use_conv1x1='True'), arch_layers=[3, 4, 6, 6, 3], arch_channels=[32, 64, 128, 256, 512], strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)]) resnet45_abi = ResNet( in_channels=3, stem_channels=32, block_cfgs=dict(type='BasicBlock', use_conv1x1='True'), arch_layers=[3, 4, 6, 6, 3], arch_channels=[32, 64, 128, 256, 512], strides=[2, 1, 2, 1, 1])
ResNet with plugins
-
ResNet31 is a typical architecture to use stage-wise plugins. Before the first three stages, Maxpooling layer is used. After each stage, a convolution layer with BN and ReLU is used.
resnet_31 = ResNet( in_channels=3, stem_channels=[64, 128], block_cfgs=dict(type='BasicBlock'), arch_layers=[1, 2, 5, 3], arch_channels=[256, 256, 512, 512], strides=[1, 1, 1, 1], plugins=[ dict( cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)), stages=(True, True, False, False), position='before_stage'), dict( cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)), stages=(False, False, True, False), position='before_stage'), dict( cfg=dict( type='ConvModule', kernel_size=3, stride=1, padding=1, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')), stages=(True, True, True, True), position='after_stage') ])
Migration Guide - Dataset Annotation Loader
The annotation loaders, LmdbLoader
and HardDiskLoader
, are unified into AnnFileLoader
for a more consistent design and wider support on different file formats and storage backends. AnnFileLoader
can load the annotations from disk
(default), http
and petrel
backend, and parse the annotation in txt
or lmdb
format. LmdbLoader
and HardDiskLoader
are deprecated, and users are recommended to modify their configs to use the new AnnFileLoader
. Users can migrate their legacy loader HardDiskLoader
referring to the following example:
# Legacy config
train = dict(
type='OCRDataset',
...
loader=dict(
type='HardDiskLoader',
...))
# Suggested config
train = dict(
type='OCRDataset',
...
loader=dict(
type='AnnFileLoader',
file_storage_backend='disk',
file_format='txt',
...))
Similarly, using AnnFileLoader
with file_format='lmdb'
instead of LmdbLoader
is strongly recommended.
New Features & Enhancements
- Update mmcv install by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/775
- Upgrade isort by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/771
- Automatically infer device for inference if not speicifed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/781
- Add open-mmlab precommit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/787
- Add windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/790
- Add CurvedSyntext150k Converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/719
- Add FUNSD Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/808
- Support loading annotation file with petrel/http backend by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/793
- Support different seeds on different ranks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/820
- Support json in recognition converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/844
- Add args and docs for multi-machine training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/849
- Add warning info for LineStrParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/850
- Deploy openmmlab-bot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/876
- Add Tesserocr Inference by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/814
- Add LV Dataset Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/871
- Add SROIE Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/810
- Add NAF Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/815
- Add DeText Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/818
- Add IMGUR Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/825
- Add ILST Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/833
- Add KAIST Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/835
- Add IC11 (Born-digital Images) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/857
- Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/861
- Add BID Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/862
- Add Vintext Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/864
- Add MTWI Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/867
- Add COCO Text v2 Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/872
- Add ReCTS Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/892
- Refactor ResNets by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/809
Bug Fixes
- Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in https://github.com/open-mmlab/mmocr/pull/763
- Update mmdet version limit by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/773
- Minimum version requirement of albumentations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/769
- Disable worker in the dataloader of gpu unit test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/780
- Standardize the type of torch.device in ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/800
- Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/685
- Add num_classes to configs of ABINet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/805
- Support loading space character from dict file by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/854
- Description in tools/data/utils/txt2lmdb.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/870
- ignore_index in SARLoss by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/869
- Fix a bug that may cause inplace operation error by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/884
- Use hyphen instead of underscores in script args by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/890
Docs
- Add deprecation message for deploy tools by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/801
- Reorganizing OpenMMLab projects in readme by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/806
- Add demo/README_zh.md by @EighteenSprings in https://github.com/open-mmlab/mmocr/pull/802
- Add detailed version requirement table by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/778
- Correct misleading section title in training.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/819
- Update README_zh-CN document URL by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/823
- translate testing.md. by @yangrisheng in https://github.com/open-mmlab/mmocr/pull/822
- Fix confused description for load-from and resume-from by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/842
- Add documents getting_started in docs/zh by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/841
- Add the model serving translation document by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/845
- Update docs about installation on Windows by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/852
- Update tutorial notebook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/853
- Update Instructions for New Data Converters by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/900
- Brief installation instruction in README by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/897
- update doc for ILST, VinText, BID by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/902
- Fix typos in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/903
- Recog dataset doc by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/893
- Reorganize the directory structure section in det.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/894
New Contributors
- @GPhilo made their first contribution in https://github.com/open-mmlab/mmocr/pull/763
- @xinke-wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/801
- @EighteenSprings made their first contribution in https://github.com/open-mmlab/mmocr/pull/802
- @BeyondYourself made their first contribution in https://github.com/open-mmlab/mmocr/pull/823
- @yangrisheng made their first contribution in https://github.com/open-mmlab/mmocr/pull/822
- @Mountchicken made their first contribution in https://github.com/open-mmlab/mmocr/pull/844
- @garvan2021 made their first contribution in https://github.com/open-mmlab/mmocr/pull/814
Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.1...v0.5.0