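Abstract
This document records how I used tesseract_ocr to implement character recognition. It covers explanations of the functions involved and a working project example. If you repost it, please cite the source.
Project example
Here the calls into tesseract_ocr are split into three subfunctions: init_tesseract(), ocr(), and end_tesseract().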
void init_tesseract(tesseract::TessBaseAPI *api)
{
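/* Initializes the Tesseract API: language pack and data path, engine mode, character whitelist, and page segmentation mode. */
api->Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY); // pass NULL itself, not the string "NULL"; Init() returns 0 on success
api->SetVariable("tessedit_char_whitelist", "0123456789"); // whitelist: the set of characters recognition may choose from
/*
The first argument is the path to the language data; NULL uses the default location.
"eng" is the name of the language pack.
The third argument is the OCR engine mode:
0 = legacy Tesseract engine only: tesseract::OEM_TESSERACT_ONLY
1 = neural-network LSTM engine only: tesseract::OEM_LSTM_ONLY
2 = Tesseract + LSTM: tesseract::OEM_TESSERACT_LSTM_COMBINED
3 = default, based on what is available: tesseract::OEM_DEFAULT
(Tesseract 3.x named modes 1 and 2 OEM_CUBE_ONLY and OEM_TESSERACT_CUBE_COMBINED; those select the older Cube engine, not LSTM.)
The whitelist is not supported by the LSTM engine (this applies to Tesseract 4.0; check the release you use).
*/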
api->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
/*
PSM_OSD_ONLY Orientation and script detection only.
PSM_AUTO_OSD Automatic page segmentation with orientation and script detection. (OSD)
PSM_AUTO_ONLY Automatic page segmentation, but no OSD, or OCR.
PSM_AUTO Fully automatic page segmentation, but no OSD.
PSM_SINGLE_COLUMN Assume a single column of text of variable sizes.
PSM_SINGLE_BLOCK_VERT_TEXT Assume a single uniform block of vertically aligned text.
PSM_SINGLE_BLOCK Assume a single uniform block of text. (Default!)
PSM_SINGLE_LINE Treat the image as a single text line.
PSM_SINGLE_WORD Treat the image as a single word.
PSM_CIRCLE_WORD Treat the image as a single word in a circle.
PSM_SINGLE_CHAR Treat the image as a single character.
PSM_COUNT Number of enum entries.
*/
}
char* ocr(tesseract::TessBaseAPI *api, cv::Mat inputImg, float &conf)
{
char* showtxt;
api->SetImage(inputImg.data, inputImg.cols, inputImg.rows, inputImg.channels(), static_cast<int>(inputImg.step)); // channels() equals bytes per pixel only for 8-bit images
//api->SetRectangle(0, 0, inputImg.cols, inputImg.rows);
//Boxa* boxes = api.GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
/*
enum PageIteratorLevel {
RIL_BLOCK, // Block of text/image/separator line.
RIL_PARA, // Paragraph within a block.
RIL_TEXTLINE, // Line within a paragraph.
RIL_WORD, // Word within a textline.
RIL_SYMBOL // Symbol/character within a word.
};
*/
//api.SetAccuracyVSpeed(tesseract:);
//api.SetOutputName("out");
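showtxt = api->GetUTF8Text(); // run recognition and get the text as UTF-8
conf = api->MeanTextConf(); // mean confidence of the recognized text, 0 to 100
return showtxt; // the caller owns this buffer and must free it with delete[]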
}
void end_tesseract(tesseract::TessBaseAPI *api)
{
api->Clear();
api->End();
}
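One detail the listing leaves to the caller: ocr() returns the pointer from GetUTF8Text() directly, and Tesseract documents that this buffer must be freed with delete[]. Below is a minimal sketch of one way to handle that, by copying the text into a std::string and releasing the buffer immediately. So that it compiles without Tesseract installed, fake_get_utf8_text() is a hypothetical stand-in for api->GetUTF8Text(), and take_text() is an illustrative helper name, not part of any library.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for api->GetUTF8Text(): returns a buffer
// allocated with new[], which the caller owns.
char* fake_get_utf8_text()
{
    const char* recognized = "12345\n";
    char* buf = new char[std::strlen(recognized) + 1];
    std::strcpy(buf, recognized);
    return buf;
}

// Takes ownership of a new[]-allocated C string, copies it into a
// std::string, and frees the original buffer with delete[].
std::string take_text(char* raw)
{
    std::unique_ptr<char[]> owner(raw);  // delete[] runs when owner leaves scope
    return raw ? std::string(owner.get()) : std::string();
}
```

In the real program the call would look like std::string text = take_text(ocr(&api, img, conf)); which avoids leaking the buffer when the caller forgets delete[].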
This article covers only the first two subfunctions:
* init_tesseract()
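*api->Init() takes three arguments. The first is the path to the language data, which you can set yourself. The second is the name of the language pack to load; several packs can be loaded together, for example "eng+chi_sim". The third is the OCR engine mode. Note that LSTM is the only engine mode that does not support the whitelist. A whitelist restricts the recognition range: in the program above, results are chosen only from the characters 0123456789.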
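To make the language and whitelist parameters concrete, here is a small sketch. join_languages() builds the "eng+chi_sim" style string passed as the second argument of Init(), and keep_whitelisted() filters recognized text after the fact, a possible fallback when the engine in use ignores tessedit_char_whitelist. Both helper names are hypothetical and not part of the Tesseract API. The post-filter is a substitute technique: it only drops unwanted characters from the output and does not steer the recognizer toward whitelisted characters the way the built-in whitelist does.

```cpp
#include <string>
#include <vector>

// Joins language pack names with '+', the separator Init() expects
// when several language packs are loaded together.
std::string join_languages(const std::vector<std::string>& langs)
{
    std::string joined;
    for (const std::string& lang : langs) {
        if (!joined.empty()) joined += '+';
        joined += lang;
    }
    return joined;
}

// Drops every character of `text` that is not listed in `whitelist`.
// Works byte by byte, so it is only suitable for ASCII whitelists.
std::string keep_whitelisted(const std::string& text, const std::string& whitelist)
{
    std::string kept;
    for (char c : text) {
        if (whitelist.find(c) != std::string::npos) kept += c;
    }
    return kept;
}
```

For example, join_languages({"eng", "chi_sim"}).c_str() could be passed to Init(), and keep_whitelisted(text, "0123456789") applied to the text returned by ocr().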
*api->SetPageSegMode()