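Pitfalls of Using Tesseract OCR from C#

I recently used Tesseract for text recognition (VS2019, C#). Following the official site and what I found on Baidu, I wrote this code: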

// The language data sits in a folder named "traineddata" next to the executable
var ocr = new TesseractEngine(Application.StartupPath + @"\traineddata", "chi_sim", EngineMode.Default);

Page pages = ocr.Process(new Bitmap(Application.StartupPath + @"\01.jpg"));

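Every run with F5 failed with: Tesseract.TesseractException: "Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details."

The official explanation is that the language data path or files are at fault. I searched everywhere without finding a fix: the Chinese posts all said much the same thing and did not help, and I could not follow the English ones.

With nothing else left to try, I turned to the source code.

Searching the source for "TesseractEngine" turned up the following comment: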

/// The <paramref name="datapath"/> parameter should point to the directory that contains the 'tessdata' folder

/// for example if your tesseract language data is installed in <c>C:\Tesseract\tessdata</c> the value of datapath should

/// be <c>C:\Tesseract</c>. Note that tesseract will use the value of the <c>TESSDATA_PREFIX</c> environment variable if defined,

/// effectively ignoring the value of <paramref name="datapath"/> parameter.

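So that is it: the folder that holds the language data must be named "tessdata", and the path passed to the constructor must be its parent folder, not the folder itself.

With the cause found, the fix is simple.

Rename the language data folder to "tessdata".

Then change the code as follows: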
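The rule in that comment can be checked before the engine is constructed, which gives a clearer message than the generic initialisation failure. The sketch below is only an illustration: `TessdataCheck` and `FindProblem` are hypothetical names, not part of the Tesseract wrapper, and it assumes a single language code whose data file is named `<language>.traineddata`.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Returns an empty string when the layout looks right,
    // otherwise a short description of the first problem found.
    // dataPath is the folder that should CONTAIN "tessdata".
    public static string FindProblem(string dataPath, string language)
    {
        // Per the source comment, a defined TESSDATA_PREFIX
        // effectively overrides the datapath argument.
        var prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        if (!string.IsNullOrEmpty(prefix))
            return "TESSDATA_PREFIX is set to '" + prefix + "' and takes precedence over datapath.";

        // The language folder must be named exactly "tessdata".
        string tessdata = Path.Combine(dataPath, "tessdata");
        if (!Directory.Exists(tessdata))
            return "Missing folder: " + tessdata;

        // Assumption: one language code, stored as <language>.traineddata.
        string model = Path.Combine(tessdata, language + ".traineddata");
        if (!File.Exists(model))
            return "Missing language file: " + model;

        return "";
    }
}
```

In the setup described here, calling `TessdataCheck.FindProblem(Application.StartupPath, "chi_sim")` before creating the engine would have reported the missing "tessdata" folder directly. Combined language strings such as "chi_sim+eng" are not handled by this sketch.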

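// datapath is the folder that contains "tessdata", not "tessdata" itself
using (var ocr = new TesseractEngine(Application.StartupPath, "chi_sim", EngineMode.Default))
using (var image = new Bitmap(Application.StartupPath + @"\01.jpg"))
using (Page page = ocr.Process(image))
{
    // The engine, bitmap and page are disposed when this block ends
    textBox1.Text = page.GetText();
}

Save, press F5, and it runs successfully.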
