OCR with Paddle

Last updated in the early hours of August 26, 2024

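I've been working on robots lately, which means OCR. I've basically given up on training my own models because my hardware isn't powerful enough, and using someone else's pretrained models works great. At first I picked models from Alibaba's ModelScope, but, damn, the detection model was TensorFlow and the recognition model was PyTorch, which left me speechless, so I switched to trying Paddle's.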

Windows environment

Anaconda

I'm still using Anaconda; after installing it, run

conda create -n paddle python=3.10
conda activate paddle

Paddle packages

Find the install command that matches your setup on the official website. I'm on Windows with CUDA 11.6; either pip or conda works

`python -m pip install paddlepaddle-gpu==2.4.1.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html`

Install PaddleHub

`pip install --upgrade paddlehub -i https://pypi.tuna.tsinghua.edu.cn/simple`

Install PaddleOCR

`pip install "paddleocr>=2.0.1"`

Possible problems

No module named 'lanms'

`pip install lanms-neo`

No module named 'Polygon'

`pip install Polygon3 -i https://pypi.tuna.tsinghua.edu.cn/simple`

No module named 'shapely'

`pip install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple`

No module named 'pyclipper'

`pip install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple`

AttributeError: module 'numpy' has no attribute 'int'. Did you mean: 'inf'?

The np.int alias was removed in NumPy 1.24, so pin an older version:

`pip install numpy==1.23`

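Ubuntu environment

Install Anaconda

`wget https://mirrors.bfsu.edu.cn/anaconda/archive/Anaconda3-2022.10-Linux-x86_64.sh`

Run the installer with `bash Anaconda3-2022.10-Linux-x86_64.sh`, press Enter through the prompts and answer yes; when it finishes, run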

echo 'export PATH="/root/anaconda3/bin:$PATH"'>>~/.bashrc
source ~/.bashrc
conda init bash

Exit the terminal and open a new one

Set up the Paddle environment

conda create -n paddle python=3.10
conda activate paddle
conda install paddlepaddle==2.4.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
pip install --upgrade paddlehub -i https://pypi.tuna.tsinghua.edu.cn/simple

Possible problems

If installing seqeval fails, install setuptools_scm first

`pip install setuptools_scm`

No module named 'skimage'

`pip install scikit-image`

No module named 'imgaug'

`conda install -c conda-forge imgaug`

Minimal example

This example uses PaddleHub to load ch_pp-ocrv3 for both detection and recognition; it does not require PaddleOCR to be installed

import paddlehub as hub
import cv2

class OcrRecognition:
    def __init__(self):
        self.ocr = hub.Module(name="ch_pp-ocrv3", enable_mkldnn=True)   

    def detection(self, imgPath):
        np_images = [cv2.imread(imgPath)]
        results = self.ocr.recognize_text(images=np_images,
                                          use_gpu=True,
                                          output_dir='downloadTemp',
                                          visualization=False,  # whether to save the result as an image file
                                          box_thresh=0.6,  # confidence threshold for detected text boxes
                                          text_thresh=0.5,  # confidence threshold for recognized Chinese text
                                          angle_classification_thresh=0.9,  # confidence threshold for text angle classification
                                          det_db_unclip_ratio=1.5)  # how much detected boxes are expanded
        resultList = []
        for result in results:
            data = result['data']
            for information in data:
                # print('text: ', information['text'], '\nconfidence: ', information['confidence'], '\ntext_box_position: ', information['text_box_position'])
                resultList.append(information['text'])

        return resultList

test = OcrRecognition()
print(test.detection("./test2.png"))

The test image:

The recognition result is below; I kept only the text

`['测试文本一号（789A）', '2', '测试文本二号（321B）', '3', '测试文本三号（123C）']`

The other fields (confidence, box position, saved image path) can be printed inside detection() with code like this

# for result in results:
#     data = result['data']
#     save_path = result['save_path']
#     for information in data:
#         print('text: ', information['text'], '\nconfidence: ', information['confidence'], '\ntext_box_position: ', information['text_box_position'])
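If you only want reasonably confident results, a small filter over that structure helps. This is a sketch that assumes the `recognize_text` output shape used above (a list with one dict per image, each holding `data` entries with `text`, `confidence` and `text_box_position`); `confident_texts` and its `min_confidence` parameter are my own hypothetical names, and the sample data is made up.

```python
def confident_texts(results, min_confidence=0.8):
    """Return, per image, the (text, confidence) pairs at or above min_confidence."""
    per_image = []
    for result in results:
        kept = [
            (item["text"], item["confidence"])
            for item in result.get("data", [])
            if item["confidence"] >= min_confidence
        ]
        per_image.append(kept)
    return per_image


# Made-up sample in the shape recognize_text returns (one dict per image)
sample = [{"data": [
    {"text": "TEST-789A", "confidence": 0.98,
     "text_box_position": [[0, 0], [90, 0], [90, 20], [0, 20]]},
    {"text": "2", "confidence": 0.55,
     "text_box_position": [[0, 30], [10, 30], [10, 50], [0, 50]]},
]}]
print(confident_texts(sample))  # [[('TEST-789A', 0.98)]]
```

In the class above you would call it on `results` inside `detection()` instead of collecting every `text`.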

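Recognition with PaddleOCR

Install PaddleOCR and run the command directly

`paddleocr --image_dir ./test2.png`

The image is the same one as in the minimal example above. On the first run this command automatically downloads the following: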

ch_PP-OCRv3_det_infer.tar (detection model)
ch_PP-OCRv3_rec_infer.tar (recognition model)
ch_ppocr_mobile_v2.0_cls_infer.tar (text direction classification model)
They were downloaded to C:\Users\dell/.paddleocr/whl
Output:

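It supports multiple languages, selected with the --lang= parameter, for example

`paddleocr --image_dir ./test2.png --lang=ch`

The official site has a table of the supported languages; I was too lazy to type it out and just took a screenshot.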

Then an example of using it from Python

from paddleocr import PaddleOCR

# PaddleOCR switches between its supported languages via the lang parameter,
# e.g. `ch`, `en`, `fr`, `german`, `korean`, `japan`
ocr = PaddleOCR(use_angle_cls=True, lang="ch") 
img_path = './test2.png'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

The result is shown below
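Each printed `line` is a box plus a `(text, score)` pair, so it's easy to turn a page into plain text in reading order. A sketch, assuming each line is `[box, (text, score)]` with `box` as four `[x, y]` points (what I see with PaddleOCR 2.6; other versions may differ) and that a page without text can come back as `None`. `lines_in_reading_order`, `min_score` and `row_tolerance` are hypothetical names of mine, and grouping rows by the top edge is a simple heuristic, not PaddleOCR's own sorting.

```python
def lines_in_reading_order(page, min_score=0.5, row_tolerance=10):
    """Texts from one page of ocr.ocr() output, top-to-bottom then left-to-right."""
    if not page:  # a page with no detected text may be None
        return []
    items = []
    for box, (text, score) in page:
        if score < min_score:
            continue
        x = min(p[0] for p in box)  # left edge of the box
        y = min(p[1] for p in box)  # top edge of the box
        items.append((y, x, text))
    items.sort()
    ordered, row, row_y = [], [], None
    for y, x, text in items:
        # start a new row when this box's top is clearly below the current row's top
        if row and y - row_y > row_tolerance:
            ordered.extend(t for _, t in sorted(row))
            row = []
        if not row:
            row_y = y
        row.append((x, text))
    ordered.extend(t for _, t in sorted(row))  # flush the last row
    return ordered
```

Usage with the example above: `for page in result: print(lines_in_reading_order(page))`.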

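The approach above calls the pip-installed paddleocr, whose source lives in D:\Anaconda3\envs\paddle\Lib\site-packages\paddleocr. To change the default models you have to edit paddleocr.py there: in the version I had, line 58 holds the model download URLs and line 52 the model storage directory. For example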

'det': {
    'ch': {
        'url':
        'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar',
    },

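Then copy the link of the model you want from the model zoo and replace this URL with it

If you don't want to install PaddleOCR with pip, clone the project instead

`git clone https://github.com/PaddlePaddle/PaddleOCR.git`

Then install the dependencies

`pip install -r requirements.txt`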

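Change the model directory; otherwise the models still get downloaded to your user directory on drive C. This is line 52 of paddleocr.py mentioned above

`BASE_DIR = os.path.expanduser("~/.paddleocr/")`

Change it to this project's directory

`BASE_DIR = os.path.expanduser("./inference/")`

Then add an entry point at the end of paddleocr.py

if __name__ == "__main__":
    main()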

Finally, run the script directly from the repository root

`python paddleocr.py --image_dir ./test2.png`

The models are then downloaded to PaddleOCR\inference\whl

Use predict_system.py to pick specific models; here it recognizes Japanese

python tools/infer/predict_system.py --image_dir="./doc/imgs/japan_1.jpg" --det_model_dir="./inference/ch_ppocr_server_v2.0_det_infer" --rec_model_dir="./inference/japan_PP-OCRv3_rec_infer" --rec_char_dict_path="ppocr/utils/dict/japan_dict.txt" --vis_font_path="doc/fonts/japan.ttf"

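Deploying with PaddleHub Serving

There is official documentation for this that you can read directly; below is my own record of using it

First clone the project

`git clone https://github.com/PaddlePaddle/PaddleOCR.git`

Go into the repository, create an inference folder, and download models from the model zoo into it: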

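Detection model, e.g. ch_PP-OCRv4_det_server_infer, ch_PP-OCRv3_det_infer
Recognition model, e.g. ch_PP-OCRv4_rec_server_infer, ch_PP-OCRv3_rec_infer
Text direction classification model, e.g. ch_ppocr_mobile_v2.0_cls_infer

Here I mainly use the chained detection + recognition service, whose code is in PaddleOCR/deploy/hubserving/ocr_system; go in there first and point it at the models above.

Edit params.py. It uses the v3 models by default, so if you downloaded the latest v4 models, change the detection and recognition model paths

cfg.det_model_dir = "./inference/ch_PP-OCRv4_det_server_infer/"
cfg.rec_model_dir = "./inference/ch_PP-OCRv4_rec_server_infer/"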

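Install the chained detection + recognition service module; if it fails, look for the error in the troubleshooting sections above

`hub install deploy\hubserving\ocr_system\`

Start the service

`hub serving start -m ocr_system`

Output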

 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8866
 * Running on http://192.168.2.137:8866
[2022-12-30 15:28:17,256] [    INFO] _internal.py:224 - Press CTRL+C to quit

Next, send a request to try it out. PaddleOCR ships an example client (tools/test_hubserving.py), but for a plain request the following code is enough

import requests
import json
import base64

def image_to_base64(image_bytes):
    # the service expects each image as a base64-encoded string
    return base64.b64encode(image_bytes).decode('utf8')

def main():
    headers = {"Content-type": "application/json"}
    with open("./test2.png", 'rb') as f:
        img = f.read()
    data = {'images': [image_to_base64(img)]}
    r = requests.post(
        url="http://127.0.0.1:8866/predict/ocr_system", headers=headers, data=json.dumps(data))
    res = r.json()["results"][0]
    print(res)


if __name__ == '__main__':
    main()
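If you'd rather not depend on `requests`, the same call can be sketched with the standard library alone. This assumes the `ocr_system` service above is listening on 127.0.0.1:8866 and that `results[0]` is a list of dicts with a `text` key, which is what the module returned for me; `build_payload`, `extract_texts` and `ocr_request` are my own helper names, not part of PaddleOCR.

```python
import base64
import json
import urllib.request

SERVICE_URL = "http://127.0.0.1:8866/predict/ocr_system"


def build_payload(image_bytes_list):
    """JSON body with each raw image file encoded as a base64 string."""
    images = [base64.b64encode(b).decode("utf8") for b in image_bytes_list]
    return json.dumps({"images": images}).encode("utf8")


def extract_texts(response_json):
    """Recognized strings for the first image in the response."""
    return [item["text"] for item in response_json["results"][0]]


def ocr_request(path, url=SERVICE_URL, timeout=30):
    with open(path, "rb") as f:
        body = build_payload([f.read()])
    # passing data= makes urllib send a POST request
    req = urllib.request.Request(url, data=body, headers={"Content-type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_texts(json.loads(resp.read().decode("utf8")))
```

With the service running, `print(ocr_request("./test2.png"))` should print just the recognized strings.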

Result:

Server log:

[2022/12/30 15:44:01] ppocr DEBUG: dt_boxes num : 6, elapse : 0.03400087356567383
[2022/12/30 15:44:01] ppocr DEBUG: cls num  : 6, elapse : 0.05700230598449707
[2022/12/30 15:44:02] ppocr DEBUG: rec_res num  : 6, elapse : 0.8120079040527344
[2022-12-30 15:44:02,251] [    INFO] _internal.py:224 - 127.0.0.1 - - [30/Dec/2022 15:44:02] "POST /predict/ocr_system HTTP/1.1" 200 -

If the service was started in the background with nohup, it can be stopped with:

`hub serving stop -p 8866`

If you change the configuration, uninstall the module first

`hub uninstall ocr_system`

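Training PaddleOCR on handwritten digits

Download the dataset

Run the following script to synthesize one- and two-digit handwritten numbers from MNIST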

import cv2
import random,os
import numpy as np
from tqdm import tqdm
from paddle.vision.datasets import MNIST

# Load the dataset
mnist_train = MNIST(mode='train', backend='cv2')
mnist_test = MNIST(mode='test', backend='cv2')

if not os.path.exists("./dataset"):
    os.makedirs("./dataset")
if not os.path.exists("./dataset/train"):
    os.makedirs("./dataset/train")
if not os.path.exists("./dataset/test"):
    os.makedirs("./dataset/test")

# Preprocess: crop each digit to its non-empty columns, invert it to black-on-white, and group by label
datas_train = {}
for i in range(len(mnist_train)):
    sample = mnist_train[i]
    x, y = sample[0], sample[1]

    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1]+1]
    if str(y[0]) in datas_train:
        datas_train[str(y[0])].append(x)
    else:
        datas_train[str(y[0])] = [x]

datas_test = {}
for i in range(len(mnist_test)):
    sample = mnist_test[i]
    x, y = sample[0], sample[1]

    _sum = np.sum(x, axis=0)
    _where = np.where(_sum > 0)
    x = 255 - x[:, _where[0][0]: _where[0][-1]+1]
    if str(y[0]) in datas_test:
        datas_test[str(y[0])].append(x)
    else:
        datas_test[str(y[0])] = [x]

# Concatenate random digit images with random white gaps to build samples for 0-99
datas_train_list = []
for num in tqdm(range(0, 100)):
    for _ in range(100):
        imgs = [255 - np.zeros((28, np.random.randint(10)))]
        for word in str(num):
            index = np.random.randint(0, len(datas_train[word]))
            imgs.append(datas_train[word][index])
            imgs.append(255 - np.zeros((28, np.random.randint(10))))
        img = np.concatenate(imgs, 1)
        cv2.imwrite('dataset/train/%03d_%04d.jpg' % (num, _), img)
        datas_train_list.append('train/%03d_%04d.jpg\t%d\n' % (num, _, num))

datas_test_list = []
for num in tqdm(range(0, 100)):
    for _ in range(50):
        imgs = [255 - np.zeros((28, np.random.randint(10)))]
        for word in str(num):
            index = np.random.randint(0, len(datas_test[word]))
            imgs.append(datas_test[word][index])
            imgs.append(255 - np.zeros((28, np.random.randint(10))))
        img = np.concatenate(imgs, 1)
        cv2.imwrite('dataset/test/%03d_%04d.jpg' % (num, _), img)
        datas_test_list.append('test/%03d_%04d.jpg\t%d\n' % (num, _, num))

# Write the label lists
with open('dataset/train.txt', 'w') as f:
    for line in datas_train_list:
        f.write(line)

with open('dataset/test.txt', 'w') as f:
    for line in datas_test_list:
        f.write(line)
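Before training it may be worth sanity-checking the label files. As far as I know, SimpleDataSet reads each line as `relative/image/path<TAB>label` relative to `data_dir`; the helper below (`check_label_file` is my own hypothetical function, not part of PaddleOCR) checks that format, that each image exists, and that every label character is in the digit dictionary.

```python
import os


def check_label_file(label_path, data_dir, charset="0123456789"):
    """Return a list of problems found in a PaddleOCR-style recognition label file."""
    problems = []
    with open(label_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                problems.append(f"line {lineno}: expected 'path<TAB>label'")
                continue
            rel_path, label = parts
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                problems.append(f"line {lineno}: missing image {rel_path}")
            if not label or set(label) - set(charset):
                problems.append(f"line {lineno}: bad label {label!r}")
    return problems
```

For example, `check_label_file("dataset/train.txt", "dataset")` should return an empty list for the files generated above.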

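Environment

Clone the PaddleOCR repository

`git clone https://github.com/PaddlePaddle/PaddleOCR.git`

Download a pretrained model; I chose ch_ppocr_server_v2.0_rec.
Create a pretrain_models folder in PaddleOCR and put the model in it.
Then create another folder, mnist, to hold the training files.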

Training

Create the recognition dictionary number.txt; it simply holds the digits 0 to 9, one per line.
Then create the training config file mnist.yml
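If you'd rather not type the dictionary by hand, a tiny script can write it. A minimal sketch; `write_digit_dict` is a hypothetical helper of mine, and the output path should match `character_dict_path` in the config.

```python
from pathlib import Path


def write_digit_dict(path="number.txt"):
    """Write the digits 0-9, one per line, as a recognition character dictionary."""
    digits = [str(d) for d in range(10)]
    Path(path).write_text("\n".join(digits) + "\n", encoding="utf-8")
    return digits
```

Calling `write_digit_dict("number.txt")` from the mnist folder creates the file.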

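Global:
  use_gpu: true
  # maximum number of training epochs (default 500)
  epoch_num: 100
  # log queue length; each print outputs the smoothed values in the queue (default 20)
  log_smooth_window: 20
  # logging interval (in iterations)
  print_batch_step: 10
  # where to save the model
  save_model_dir: D:\\code\\ggggg\\ocrTrain\\paddle\\PaddleOCR\\mnist\\out
  # model save interval in epochs (default 3)
  save_epoch_step: 1
  # evaluation interval
  # 2000 means evaluate every 2000 iterations; [1000, 2000] means start at iteration 1000 and evaluate every 2000
  eval_batch_step: [0, 200]
  # whether to compute metrics during training; these are the metrics on the current batch
  cal_metric_during_train: True
  # whether the pretrained model was saved in static-graph mode (currently only needed for detection)
  # load_static_weights: True
  # path of the pretrained model to load
  pretrained_model:   D:\\code\\ggggg\\ocrTrain\\paddle\\PaddleOCR\\pretrain_models\\ch_ppocr_server_v2.0_rec_pre\\best_accuracy
  # checkpoint path, used to resume interrupted training
  checkpoints: 
  save_inference_dir:
  # whether to use VisualDL to visualize the logs
  use_visualdl: False
  # image file or folder to run prediction on
  infer_img: D:\\code\\ggggg\\ocrTrain\\paddle\\PaddleOCR\\mnist\\9_4.png
  # dictionary path; if empty, lowercase letters + digits are used as the dictionary
  character_dict_path: D:\\code\\ggggg\\ocrTrain\\paddle\\PaddleOCR\\mnist\\number.txt
  character_type: en
  # maximum text length
  max_text_length: 16
  infer_mode: False
  # whether to recognize spaces
  use_space_char: False
  distort: True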

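Optimizer:
  # optimizer class; Momentum, Adam and RMSProp are currently supported
  name: Adam
  # exponential decay rate for the first-moment estimates
  beta1: 0.9
  # exponential decay rate for the second-moment estimates
  beta2: 0.999
  # learning-rate decay schedule
  lr:
    # schedule class name
    name: Cosine
    # base learning rate
    learning_rate: 0.001
  # network regularization
  regularizer:
    # regularizer class name
    name: 'L2'
    # regularization coefficient
    factor: 0.00001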

Architecture:
  # model type; rec, det and cls are currently supported
  model_type: rec
  # algorithm name
  algorithm: CRNN
  # transformation module (left empty)
  Transform:
  # backbone class name
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

# training configuration
Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\\code\\ggggg\\ocrTrain\\paddle\\dataset\\
    label_file_list: ["D:\\code\\ggggg\\ocrTrain\\paddle\\dataset\\train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug: 
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 28, 64]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\\code\\ggggg\\ocrTrain\\paddle\\dataset\\
    label_file_list: ["D:\\code\\ggggg\\ocrTrain\\paddle\\dataset\\test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 28, 64]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 0
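The config pairs `CTCHead`/`CTCLoss` with `CTCLabelDecode`. Greedy CTC decoding takes the most likely class at each time step, merges consecutive repeats, then drops the blank. In PaddleOCR's CTC label decoder the blank sits at index 0 and the dictionary characters follow from index 1, so `number.txt` gives 11 classes. The sketch below only illustrates that idea on a made-up sequence of per-step class indices; it is not the library's code.

```python
def ctc_greedy_decode(step_indices, charset, blank=0):
    """Merge consecutive repeats, drop blanks; class i > 0 maps to charset[i - 1]."""
    out = []
    prev = None
    for idx in step_indices:
        if idx != prev and idx != blank:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)


DIGITS = "0123456789"  # same order as number.txt
# Hypothetical per-step argmax: 0 = blank, 2 -> '1', 3 -> '2'
print(ctc_greedy_decode([0, 2, 2, 0, 0, 3, 3, 0], DIGITS))  # 12
```

This is also why a number like 11 needs a blank between the two 1s in the model's output: without it the repeats would be merged into a single 1.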

Start training, 100 epochs to begin with

`(paddle) D:\code\ggggg\ocrTrain\paddle\PaddleOCR\mnist>python ../tools/train.py -c ./mnist.yml`
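To get a feel for what 100 epochs means with this config, the arithmetic can be sketched. Assumptions, all mine: roughly 10,000 training images (100 samples per number), one card with `batch_size_per_card: 256` and `drop_last: True`, evaluation whenever `(step - start) % interval == 0` past `start` for `eval_batch_step: [0, 200]`, and a plain cosine decay from 0.001 to 0 over all iterations with no warmup. PaddleOCR's actual scheduler may differ in details.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=True):
    """Optimizer steps in one epoch."""
    return num_samples // batch_size if drop_last else math.ceil(num_samples / batch_size)


def eval_steps(total_steps, start=0, interval=200):
    """Iterations at which evaluation runs for eval_batch_step = [start, interval]."""
    return [s for s in range(start + 1, total_steps + 1) if (s - start) % interval == 0]


def cosine_lr(step, total_steps, base_lr=0.001):
    """Cosine decay from base_lr down to 0 over total_steps."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    n_train = 100 * 100            # numbers x samples per number (assumed)
    spe = steps_per_epoch(n_train, 256)
    total = spe * 100              # epoch_num: 100
    print(spe, total, len(eval_steps(total)))  # 39 3900 19
```

So under these assumptions a 100-epoch run is only a few thousand iterations, with about 19 evaluations along the way.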

Resume training from a checkpoint

`python ../tools/train.py -c ./mnist.yml -o Global.checkpoints=./your/trained/model`

Evaluation and testing

Evaluation

`python ../tools/eval.py -c ./mnist.yml -o Global.pretrained_model=./out/best_accuracy`

Recognizing an image directly

`python ../tools/infer_rec.py -c mnist.yml -o Global.pretrained_model=./out/best_accuracy Global.load_static_weights=false Global.infer_img=5.png`

Result

[2023/02/01 14:49:53] ppocr INFO: load pretrain successful from ./out/best_accuracy
[2023/02/01 14:49:53] ppocr INFO: infer_img: 5.png
[2023/02/01 14:49:55] ppocr INFO:        result: 5      0.989963948726654
[2023/02/01 14:49:55] ppocr INFO: success!

Converting to an inference model

Use the tools/export_model.py tool

`python ../tools/export_model.py -c ./mnist.yml -o Global.pretrained_model=./out/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/mnist`

This produces three files

inference.pdiparams
inference.pdiparams.info
inference.pdmodel

You can try it out directly with this command

python ../tools/infer/predict_rec.py --image_dir="5.png" --rec_model_dir="../inference/whl/rec/hw/ch_ppocr_server_v2.0_rec_pre_hw" --rec_char_dict_path="./number.txt" --use_space_char=False --enable_mkldnn=False --rec_image_shape="3, 28,64"

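Note that rec_image_shape must be set to the size used during training; otherwise the result may differ from what you got before conversion. Also, this model cannot recognize images with a transparent background.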
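The shape matters because the recognizer resizes each crop by height while keeping the aspect ratio, then pads to the target width, or squeezes the crop if it would be too wide. The function below mirrors that logic as I understand PaddleOCR's recognition preprocessing (`resize_norm_img`); treat it as an approximation, and the crop sizes are made up. It shows that the same crop ends up at a very different scale under the default "3, 48, 320" than under the "3, 28, 64" used for training.

```python
import math


def rec_resized_width(img_h, img_w, rec_image_shape):
    """Width (px) a crop occupies after resizing for a rec_image_shape like "3, 28, 64"."""
    _, target_h, target_w = (int(v) for v in rec_image_shape.split(","))
    ratio = img_w / float(img_h)
    if math.ceil(target_h * ratio) > target_w:
        # too wide: squeezed to exactly target_w, aspect ratio is lost
        return target_w
    # fits: height scaled to target_h, aspect ratio kept, the rest is padded
    return int(math.ceil(target_h * ratio))


# A made-up 40x90 (height x width) crop of a two-digit number
print(rec_resized_width(40, 90, "3, 28, 64"))   # 63
print(rec_resized_width(40, 90, "3, 48, 320"))  # 108
print(rec_resized_width(40, 200, "3, 28, 64"))  # 64 (squeezed)
```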

Result

`[2023/02/01 15:17:23] ppocr INFO: Predicts of 5.png:('5', 0.989963948726654)`

Adding the new model to paddleocr

Edit paddleocr.py and first change the model storage directory

# BASE_DIR = os.path.expanduser("~/.paddleocr/")
BASE_DIR = os.path.expanduser("./inference/")

Add an entry under 'rec' in MODEL_URLS

'hw': {
    'url':
    'ch_ppocr_server_v2.0_rec_pre_hw.tar',
    'dict_path': './mnist/number.txt'
},

Then put the model in the following folder

`PaddleOCR\inference\whl\rec\hw\ch_ppocr_server_v2.0_rec_pre_hw`
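The folder name isn't arbitrary. As far as I can tell, paddleocr.py builds the local model directory from `BASE_DIR`, `whl`, the model type, the `lang` key and the tar file name from the URL minus `.tar`, and only downloads when that folder lacks the inference files. A sketch of just the path logic, assuming that behavior; `local_model_dir` is my own helper, not a PaddleOCR function, and it uses `/` where Windows shows `\`.

```python
import posixpath


def local_model_dir(base_dir, model_type, lang, url):
    """Guessed model folder: <base>/whl/<type>/<lang>/<tar file name without .tar>."""
    file_name = url.split("/")[-1][:-len(".tar")]
    return posixpath.join(base_dir, "whl", model_type, lang, file_name)


print(local_model_dir("./inference/", "rec", "hw", "ch_ppocr_server_v2.0_rec_pre_hw.tar"))
# ./inference/whl/rec/hw/ch_ppocr_server_v2.0_rec_pre_hw
```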

For the same reason as above, rec_image_shape has to be changed too: find this spot in paddleocr.py and change the value to 3, 28, 64

if params.ocr_version == 'PP-OCRv3':
    params.rec_image_shape = "3, 48, 320"

Recognition

from paddleocr import PaddleOCR
if __name__ == "__main__" :
    ocr = PaddleOCR(use_angle_cls=False, use_gpu=True, lang="hw")
    img_path = "./mnist/5.png"
    result = ocr.ocr(img_path, det=False, cls=False)
    print(result)

Result

`[[('5', 0.989963948726654)]]`

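Turning the transparent parts of an image white

Once again I have to marvel at how good Copilot is; this was auto-suggested

import cv2

# -1 (cv2.IMREAD_UNCHANGED) keeps the alpha channel; every fully transparent pixel then becomes opaque white
image = cv2.imread("5_4.png", -1)
image[image[:, :, 3] == 0] = [255, 255, 255, 255]
cv2.imwrite("5_4__1.png", image)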

Below is how I did it before

import cv2
import os
import numpy as np

# Turn a transparent background white
def transparence2white(img):
    # only 4-channel (BGRA) images have an alpha channel
    if img is None or img.ndim != 3 or img.shape[2] < 4:
        return None
    for col in range(img.shape[1]):
        for row in range(img.shape[0]):
            point = img[row, col]
            if point[3] == 0:  # an alpha value of 0 means fully transparent
                img[row, col] = [255, 255, 255, 255]  # make it white and opaque
    return img

def changeImgTransparence2white(imgPath, savePath=None):
    try:
        # check that the file exists
        if os.path.exists(imgPath):
            img = cv2.imread(imgPath, -1)  # -1 keeps the alpha channel, so the data has 4 channels (BGRA) instead of 3
            if type(transparence2white(img)) is not np.ndarray:
                print("Image is not RGBA")
                return False
            else:
                if savePath is not None:
                    print(savePath)
                    cv2.imwrite(savePath, img)
                else:
                    filePath = os.path.join("./", os.path.splitext(imgPath)[0] + "_1" + os.path.splitext(imgPath)[1])
                    cv2.imwrite(filePath, img)
                return True
        else:
            print("Image does not exist")
            return False
    except Exception as ex:
        print("Other error: {}".format(ex))
        return False


if __name__  == '__main__':
    print(changeImgTransparence2white("5_4.png"))

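Converting RGBA to RGB

i.e. going from 32-bit to 24-bit color depth

import cv2

img = cv2.imread("5_4.png", -1)  # -1 keeps the alpha channel
img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)  # drops the alpha channel
cv2.imwrite("5_4__2.png", img)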
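Both approaches treat alpha as all-or-nothing: the white fill only touches fully transparent pixels, and `cvtColor` simply drops the alpha channel, so whatever color is stored under transparent pixels (often black) shows through. Compositing over white also handles semi-transparent edges: each channel becomes `(a * c + (255 - a) * 255) / 255`. A per-pixel sketch of that formula in plain Python (`over_white` is my own name; with OpenCV/NumPy you would apply the same formula to whole arrays):

```python
def over_white(b, g, r, a):
    """Composite one BGRA pixel (0-255 ints) over a white background, returning BGR."""
    return tuple(round((a * c + (255 - a) * 255) / 255) for c in (b, g, r))


print(over_white(0, 0, 0, 0))    # (255, 255, 255): fully transparent -> white
print(over_white(0, 0, 0, 255))  # (0, 0, 0): opaque black stays black
print(over_white(0, 0, 0, 128))  # (127, 127, 127): half transparent -> gray
```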


Original post: https://blog.kala.love/posts/9eb77f73/

Author: 久远·卡拉

Published: December 30, 2022
