20. Case Study: Recognizing Numeric CAPTCHAs


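Tesseract-OCR, called through pytesseract, takes an image as input. That image is expected to be a PIL Image, not the NumPy array type that OpenCV uses for images, so an OpenCV result is converted before recognition. (Recent pytesseract releases are documented to accept a NumPy array as well.)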

"""

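CAPTCHA recognition

1. Steps:

1. Preprocess: remove interference lines and dots

2. Choose among different structuring elements

3. Convert between PIL Image and numpy array

4. Recognize and output with tess.image_to_string

2. Errors and fixes

If this error appears: raise TesseractNotFoundError() pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

The fix depends on the operating system: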

On Linux

sudo apt update

sudo apt install tesseract-ocr

sudo apt install libtesseract-dev

On Mac

brew install tesseract

On Windows

First download the Tesseract installer: https://github.com/UB-Mannheim/tesseract/wiki

Then change the path that tesseract_cmd points to in pytesseract.py: tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

references: https://pypi.org/project/pytesseract/ (INSTALLATION section) and https://github.com/tesseract-ocr/tesseract/wiki#installation

"""
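Editing the installed pytesseract.py is not the only way to apply the Windows fix from the notes above: pytesseract also lets a script assign `pytesseract.pytesseract.tesseract_cmd` at runtime. The sketch below is one possible way to do that. The helper name `find_tesseract` and the fallback list are illustrative assumptions, and the Windows path is simply the one quoted in the notes, so adjust it to your actual install location.

```python
import shutil
from pathlib import Path

# Path quoted in the notes above; an assumption about where Tesseract was installed.
WINDOWS_DEFAULT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(WINDOWS_DEFAULT,)):
    """Return the path of the tesseract executable, or None if it is not found."""
    on_path = shutil.which("tesseract")  # already on PATH (typical on Linux/Mac)
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


tesseract_path = find_tesseract()

try:
    import pytesseract
except ImportError:
    pytesseract = None  # the Python wrapper itself is not installed

if pytesseract is not None and tesseract_path is not None:
    # Point the wrapper at the binary without touching the installed package
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

If `tesseract_path` ends up as `None`, the binary is neither on the PATH nor at any listed location, which is exactly the situation that raises TesseractNotFoundError.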

The 3*3 structuring element does not give good results, so we switch to a 2*2 structuring element instead.
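To see why the size of the structuring element matters, here is a small pure-Python sketch of binary opening on a toy image. It is an illustration under stated assumptions, not the code used for the screenshots: `parse`, `render`, `binary_open`, and the toy `captcha` grid are all made up for this example, and the function ignores the anchor and border conventions of a real library. In OpenCV the corresponding calls would be `cv.getStructuringElement` and `cv.morphologyEx` with `cv.MORPH_OPEN`.

```python
def parse(rows):
    """Turn rows of '#'/'.' characters into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def render(grid):
    """Turn a 0/1 grid back into rows of '#'/'.' characters."""
    return ["".join("#" if v else "." for v in row) for row in grid]


def binary_open(image, kh, kw):
    """Open a 0/1 image with a rectangle of kh rows by kw columns.

    A foreground pixel survives only if some placement of the rectangle
    covers it while lying entirely inside the foreground.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - kh + 1):
        for c in range(cols - kw + 1):
            cells = [(r + i, c + j) for i in range(kh) for j in range(kw)]
            if all(image[y][x] for y, x in cells):
                for y, x in cells:
                    out[y][x] = 1
    return out


# Toy image: a 2-pixel-wide digit stroke, a 1-pixel interference line, one dot
captcha = parse([
    "..........",
    "..##......",
    "..##...#..",
    "..##......",
    "#########.",
    "..##......",
    "..##......",
    "..........",
])

opened_2x2 = binary_open(captcha, 2, 2)  # line and dot removed, stroke kept
opened_3x3 = binary_open(captcha, 3, 3)  # too large: the stroke is erased too

print("\n".join(render(opened_2x2)))
```

In this toy image the stroke is only two pixels wide, so no 3*3 square fits inside it and the opening wipes out the digit together with the noise. Calling `binary_open(captcha, 2, 1)` (two rows, one column) also drops the horizontal line, while `binary_open(captcha, 1, 2)` keeps it, which is why the orientation of a non-square element matters.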

With a 1*2 structuring element:

With a 2*1 structuring element:

Recognizing the CAPTCHA:

Final code:
