Project URL: https://github.com/chenjun2hao/Bert_OCR.pytorch
Unofficial PyTorch implementation of the paper, which transforms irregular text with a 2D layout directly into a character sequence via a 2D attentional scheme. The authors use a relation attention module to capture the dependencies within the feature maps and a parallel attention module to decode all characters in parallel.
At present, this implementation does not reach the accuracy reported in the paper. Some of the code is borrowed from deep-text-recognition-benchmark.
Model
[Figure: model architecture]
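As a rough illustration of the parallel attention module mentioned in the introduction, the sketch below computes one attention-weighted "glimpse" per output character slot over a flattened 2D feature map. It is a simplified pure-Python toy, not the repository's PyTorch code: the dot-product scoring, the `queries` (one hypothetical vector per output slot), and the function names are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_attention(features, queries):
    """Return one glimpse vector per output slot.

    features: n feature vectors (an H x W feature map flattened to n = H * W positions)
    queries:  max_len query vectors, one per output character slot (hypothetical)
    """
    channels = len(features[0])
    glimpses = []
    # Each slot depends only on the feature map, never on another slot's
    # output, so all slots could be computed at the same time.
    for query in queries:
        scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
        weights = softmax(scores)
        glimpse = [
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(channels)
        ]
        glimpses.append(glimpse)
    return glimpses


# Toy input: a 2 x 2 feature map flattened to 4 positions with 2 channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
queries = [[5.0, -5.0], [-5.0, 5.0], [0.0, 0.0]]
glimpses = parallel_attention(features, queries)
```

Because no slot waits for the previous character, the whole string can be predicted in one pass; a real model would presumably feed each glimpse to a classifier over the character set.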
Result
Accuracy on ICDAR2019 is currently only 51.15%; work to improve it is ongoing.
Feature
- Outputs the whole text string of an image in a single pass, unlike a seq2seq model that decodes one character at a time
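To make the single-pass idea concrete, here is a minimal sketch of how per-position scores might be turned into a string in one step. The character set, the end-of-sequence index `EOS`, and the shape of `logits` are hypothetical choices for this example and are not taken from the repository.

```python
# Hypothetical character set and end-of-sequence index, for illustration only.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"
EOS = len(CHARSET)


def decode_parallel(logits):
    """Decode a max_len x (len(CHARSET) + 1) score table in one pass.

    Every row is resolved independently with an argmax, so no position
    waits for the previous character as in a step-by-step seq2seq decoder.
    """
    indices = [max(range(len(row)), key=row.__getitem__) for row in logits]
    chars = []
    for idx in indices:
        if idx == EOS:  # stop at the first end-of-sequence prediction
            break
        chars.append(CHARSET[idx])
    return "".join(chars)


def one_hot(index, size=len(CHARSET) + 1):
    """Build a toy score row that puts all its mass on one class."""
    row = [0.0] * size
    row[index] = 1.0
    return row


# Toy scores for an image reading "toast", padded with EOS up to max_len = 8.
logits = [one_hot(CHARSET.index(c)) for c in "toast"] + [one_hot(EOS)] * 3
text = decode_parallel(logits)
```

The fixed maximum length with end-of-sequence padding is one common way to let a model emit variable-length strings from a fixed number of output slots; the repository may handle this differently.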
Requirements
PyTorch >= 1.1.0
Test
Download the pretrained model from Baidu (password: kdah).
Then run the demo on the images in the demo_image folder:
python demo.py --image_folder demo_image --saved_model <model_path/best_accuracy.pth>
- Some examples

| demo images | Bert_OCR |
|---|---|
| [image] | available |
| [image] | shakesshack |
| [image] | london |
| [image] | greenstead |
| [image] | toast |
| [image] | merry |
| [image] | underground |
| [image] | ronaldo |
| [image] | bally |
| [image] | university |
- Results on benchmark datasets
| IIIT5k_3000 | SVT | IC03_860 | IC03_867 | IC13_857 | IC13_1015 | IC15_1811 | IC15_2077 | SVTP | CUTE80 |
|---|---|---|---|---|---|---|---|---|---|
| 84.367 | 79.907 | 91.860 | 91.465 | 88.448 | 86.010 | 65.654 | 63.215 | 68.527 | 81.185 |
total_accuracy: 78.423
Train
- I prepared a small dataset for training. The images and labels are in ./dataset/BAIDU.
python train.py --root ./dataset/BAIDU/images/ --train_csv ./dataset/BAIDU/small_train.txt --val_csv ./dataset/BAIDU/small_train.txt








