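I've recently been using Tesseract for text recognition (VS2019, C#). I followed the instructions on the official site and whatever Baidu turned up, and ended up with this code: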
var ocr = new TesseractEngine(Application.StartupPath+ @"\traineddata", "chi_sim", EngineMode.Default);
Page pages = ocr.Process(new Bitmap(Application.StartupPath + @"\01.jpg"));
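Every time I hit F5 it failed with: Tesseract.TesseractException: "Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details."
The official site explains this as a problem with the language data path or the language data files. I searched everywhere and found no fix (the Chinese posts all say roughly the same thing and none of them solved it; the English ones I couldn't follow, sigh!)
Nothing else for it, so let's try reading the source!
I opened the source and searched for "TesseractEngine". The relevant doc comment reads: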
/// The <paramref name="datapath"/> parameter should point to the directory that contains the 'tessdata' folder
/// for example if your tesseract language data is installed in <c>C:\Tesseract\tessdata</c> the value of datapath should
/// be <c>C:\Tesseract</c>. Note that tesseract will use the value of the <c>TESSDATA_PREFIX</c> environment variable if defined,
/// effectively ignoring the value of <paramref name="datapath"/> parameter.
Good grief! The folder that holds the language data has to be named "tessdata", and the path you pass in is the folder one level above it!
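The rule in that comment can be checked up front, before the engine is ever constructed. Below is a minimal sketch of such a check. The class name TessdataCheck and the method Describe are made up for illustration and are not part of the Tesseract library. It also assumes the language file is named <language>.traineddata (for example chi_sim.traineddata) and that the datapath semantics are exactly as the quoted comment describes, which may differ in other versions of the wrapper.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Tesseract library.
static class TessdataCheck
{
    // Returns null when the layout matches the rule in the doc comment,
    // otherwise a short message describing what looks wrong.
    public static string Describe(string dataPath, string language)
    {
        // Per the quoted comment, TESSDATA_PREFIX overrides the datapath argument.
        string prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        if (!string.IsNullOrEmpty(prefix))
            return "TESSDATA_PREFIX is set to '" + prefix + "', so datapath may be ignored.";

        // datapath must be the PARENT of a folder literally named "tessdata".
        string tessdataDir = Path.Combine(dataPath, "tessdata");
        if (!Directory.Exists(tessdataDir))
            return "No 'tessdata' folder found under '" + dataPath + "'.";

        // Assumption: language data is stored as <language>.traineddata.
        string trainedFile = Path.Combine(tessdataDir, language + ".traineddata");
        if (!File.Exists(trainedFile))
            return "Missing language file '" + trainedFile + "'.";

        return null;
    }
}

// Possible use in the form code from this post:
//   string problem = TessdataCheck.Describe(Application.StartupPath, "chi_sim");
//   if (problem != null) MessageBox.Show(problem);
```

With my original layout (a folder named "traineddata"), this check would have reported the missing 'tessdata' folder instead of leaving me with the generic initialisation error.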
Found the cause, time to fix it!
Rename the language data folder to "tessdata".
Then change the code as follows:
var ocr = new TesseractEngine(Application.StartupPath, "chi_sim", EngineMode.Default);
Page pages = ocr.Process(new Bitmap(Application.StartupPath + @"\01.jpg"));
textBox1.Text = pages.GetText();
Save, press F5 to run, and it works.
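As a side note, the engine, the image and the page all hold native resources, so it may be worth wrapping them in using blocks. The following is only a sketch under assumptions: OcrDemo and ReadText are names invented here, it loads the image with Pix.LoadFromFile instead of going through a Bitmap, and it assumes the same datapath behaviour described in the doc comment quoted earlier in this post.

```csharp
using System;
using Tesseract; // the same Tesseract wrapper used in this post

// Hypothetical wrapper for illustration; not part of the library.
static class OcrDemo
{
    // baseDir: the folder that CONTAINS "tessdata" (per the quoted doc comment).
    // imagePath: full path of the image to recognise.
    public static string ReadText(string baseDir, string imagePath)
    {
        // Each object is disposed when its using block ends.
        using (var engine = new TesseractEngine(baseDir, "chi_sim", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(image))
        {
            return page.GetText();
        }
    }
}

// Possible use in the form:
//   textBox1.Text = OcrDemo.ReadText(Application.StartupPath,
//       System.IO.Path.Combine(Application.StartupPath, "01.jpg"));
```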