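Safe browsing starts with the kids.
No long essays here, and no lecturing on how many ways the 茴 in 茴香豆 can be written.
PaddleOCR is Baidu's open-source OCR (optical character recognition) computer-vision model. For Chinese text recognition it is arguably "far ahead of its peers". Thanks to Baidu for open-sourcing it. Respect!
These excellent AI models seem to share one trait: none of them ships a C# example.
No matter, we dotnet-ers will build one anyway.
The series covers three models; the ONNX conversion is described in the Paddle documentation.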
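- det.onnx Detection: the text detection model.
- cls.onnx Classification: the text direction classification model.
- rec.onnx Recognition: the text recognition model.
This post reports on the first model, det.onnx (text detection): its inference process and the code that implements it. If you don't enjoy writing code, read the first half.
Results first, to steady the troops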
Model inputs and outputs
The mean and standard deviation can be found in the training configuration file
- Mean: [0.485f, 0.456f, 0.406f]
- Standard deviation (stddev): [0.229f, 0.224f, 0.225f]
Inputs float[p2o.DynamicDimension.0,3,p2o.DynamicDimension.1,p2o.DynamicDimension.2]
- p2o.DynamicDimension.0: batch size, dynamic
- 3: number of channels (in BGR order)
- p2o.DynamicDimension.1: image height, dynamic
- p2o.DynamicDimension.2: image width, dynamic
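Together these form a four-dimensional array in [batch, channel, height, width] order. Combined with the mean and standard deviation, a single-pixel image (batch 1, height 1, width 1) looks like this:
[
[
[ [ (B / 255f - mean[0]) / stddev[0] ] ],
[ [ (G / 255f - mean[1]) / stddev[1] ] ],
[ [ (R / 255f - mean[2]) / stddev[2] ] ]
]
]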
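The per-channel normalization above can be sketched as a small helper. This is only an illustration: the class and method names (`PixelNormalization`, `Normalize`, `NormalizeBgr`) are hypothetical, and the B, G, R channel order is assumed from the input description.

```csharp
using System;

public static class PixelNormalization
{
    // Mean and standard deviation quoted above, one value per channel index.
    private static readonly float[] Mean = { 0.485f, 0.456f, 0.406f };
    private static readonly float[] StdDev = { 0.229f, 0.224f, 0.225f };

    // Scales one 8-bit channel value to [0, 1], then standardizes it
    // with the mean and stddev of the given channel index.
    public static float Normalize(byte value, int channel)
    {
        return (value / 255f - Mean[channel]) / StdDev[channel];
    }

    // Returns the three normalized values of one pixel in B, G, R order
    // (assumed channel order, see the lead-in).
    public static float[] NormalizeBgr(byte b, byte g, byte r)
    {
        return new[] { Normalize(b, 0), Normalize(g, 1), Normalize(r, 2) };
    }
}
```

Calling `PixelNormalization.NormalizeBgr(255, 128, 0)` yields the three values that would be written to channels 0, 1 and 2 at one pixel position of the input tensor.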
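Outputs float[p2o.DynamicDimension.3,1,p2o.DynamicDimension.4,p2o.DynamicDimension.5]
- p2o.DynamicDimension.3: batch size, dynamic
- 1: number of channels
- p2o.DynamicDimension.4: image height, dynamic
- p2o.DynamicDimension.5: image width, dynamic
Together these form a four-dimensional array in [batch, channel, height, width] order. For example, a map one pixel high and three pixels wide: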
[
[
[
[ 0f, 0.2f, 0.3f ]
]
]
]
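- 0f: the probability that position [0,0] (row 0, column 0) is text
- 0.2f: the probability that position [0,1] (row 0, column 1) is text
- 0.3f: the probability that position [0,2] (row 0, column 2) is text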
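To turn such a probability map into a text / not-text decision, each value is compared with a threshold. A minimal sketch follows, assuming the map has already been copied into a plain `float[,]` indexed as [row, column]; the `ProbabilityMap` class and `Binarize` method are hypothetical names used only for illustration.

```csharp
public static class ProbabilityMap
{
    // Marks every position whose text probability is strictly above the threshold.
    public static bool[,] Binarize(float[,] scores, float threshold)
    {
        int height = scores.GetLength(0);
        int width = scores.GetLength(1);
        var mask = new bool[height, width];
        for (int h = 0; h < height; h++)
        {
            for (int w = 0; w < width; w++)
            {
                mask[h, w] = scores[h, w] > threshold;
            }
        }
        return mask;
    }
}
```

With the example row `[ 0f, 0.2f, 0.3f ]` and a threshold of 0.25f, only the last position is marked as text.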
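---------- A fancy divider ----------
That's the theory. The code is a little harder, but C# performs as steadily as ever
Prepare two packages first:
- SixLabors.ImageSharp for image processing
- Microsoft.ML.OnnxRuntime for model inference
Image preprocessing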
private static Tensor<float> ImageToTensor(Image<Rgb24> image)
{
    float[] mean = [0.485f, 0.456f, 0.406f];
    float[] stddev = [0.229f, 0.224f, 0.225f];
    // Tensor layout: [batch, channel, height, width]
    Tensor<float> result = new DenseTensor<float>([1, 3, image.Height, image.Width]);
    for (int w = 0; w < image.Width; w++)
    {
        for (int h = 0; h < image.Height; h++)
        {
            // The ImageSharp indexer takes (x, y), i.e. (column, row)
            var pixel = image[w, h];
            // Channels are written in B, G, R order, as listed in the input description
            result[0, 0, h, w] = (pixel.B / 255f - mean[0]) / stddev[0];
            result[0, 1, h, w] = (pixel.G / 255f - mean[1]) / stddev[1];
            result[0, 2, h, w] = (pixel.R / 255f - mean[2]) / stddev[2];
        }
    }
    return result;
}
Points to rectangles, using a stack-based (depth-first) flood fill
private static List<RectangleF> GetBoundingBoxes(bool[,] binaryMap)
{
    var boxes = new List<RectangleF>();
    int height = binaryMap.GetLength(0);
    int width = binaryMap.GetLength(1);
    bool[,] visited = new bool[height, width];
    for (int h = 0; h < height; h++)
    {
        for (int w = 0; w < width; w++)
        {
            if (binaryMap[h, w] && !visited[h, w])
            {
                RectangleF box = ExpandBoundingBox(binaryMap, visited, h, w);
                boxes.Add(box);
            }
        }
    }
    return boxes;
}
private static RectangleF ExpandBoundingBox(bool[,] binaryMap, bool[,] visited, int startH, int startW)
{
    int minH = startH, maxH = startH;
    int minW = startW, maxW = startW;
    // Flood fill over the 4-connected neighbours, tracking the component's bounds
    Stack<(int h, int w)> stack = new();
    stack.Push((startH, startW));
    while (stack.Count > 0)
    {
        (int h, int w) = stack.Pop();
        // Skip positions outside the map
        if (h < 0 || h >= binaryMap.GetLength(0) || w < 0 || w >= binaryMap.GetLength(1))
        {
            continue;
        }
        // Skip positions already handled or not classified as text
        if (visited[h, w] || !binaryMap[h, w])
        {
            continue;
        }
        visited[h, w] = true;
        minH = Math.Min(minH, h);
        maxH = Math.Max(maxH, h);
        minW = Math.Min(minW, w);
        maxW = Math.Max(maxW, w);
        stack.Push((h - 1, w));
        stack.Push((h + 1, w));
        stack.Push((h, w - 1));
        stack.Push((h, w + 1));
    }
    // Grow the tight bounds by the component's height on every side
    float x = minW - (maxH - minH);
    float y = minH - (maxH - minH);
    float width = maxW - minW + (maxH - minH) * 2;
    float height = (maxH - minH) * 3;
    return new RectangleF(x, y, width, height);
}
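To make the last four lines of `ExpandBoundingBox` concrete, here is the same arithmetic isolated into a small helper. It is a sketch for illustration only: `BoxExpansion.Expand` is a hypothetical name, and it returns a plain tuple instead of `RectangleF` so that it has no package dependency.

```csharp
public static class BoxExpansion
{
    // Applies the same arithmetic as ExpandBoundingBox to a component's
    // tight bounds and returns the grown box as (x, y, width, height).
    public static (float X, float Y, float Width, float Height) Expand(
        int minH, int maxH, int minW, int maxW)
    {
        int textHeight = maxH - minH;
        float x = minW - textHeight;                 // shift left by one text height
        float y = minH - textHeight;                 // shift up by one text height
        float width = maxW - minW + textHeight * 2;  // add one text height on each side
        float height = textHeight * 3;               // one text height above and below
        return (x, y, width, height);
    }
}
```

For a component spanning rows 10 to 19 and columns 50 to 149, `Expand(10, 19, 50, 149)` gives x = 41, y = 1, width = 117 and height = 27. The box is grown by one component height (9 pixels here) in every direction.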
Model inference and binarizing the result
public IEnumerable<RectangleF> Detection(Image<Rgb24> image, float threshold = 0.95f)
{
    // Resize to the fixed model input size, padding to keep the aspect ratio
    using var inputImage = image.Clone(x => x.Resize(new ResizeOptions
    {
        Size = new Size(IM_WIDTH, IM_HEIGHT),
        Mode = ResizeMode.Pad
    }));
    var input = ImageToTensor(inputImage);
    List<NamedOnnxValue> inputs = [NamedOnnxValue.CreateFromTensor(INPUT_NAME, input)];
    using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = _detSession.Run(inputs);
    Tensor<float> output = results[0].AsTensor<float>();
    // Binarize: a position counts as text when its probability exceeds the threshold
    bool[,] binaryMap = new bool[IM_HEIGHT, IM_WIDTH];
    for (int h = 0; h < IM_HEIGHT; h++)
    {
        for (int w = 0; w < IM_WIDTH; w++)
        {
            float score = output[0, 0, h, w];
            binaryMap[h, w] = score > threshold;
        }
    }
    // Map the boxes from the padded input image back to the original image
    (float neww, float newh, float rate) = CalculateTransform(image.Width, image.Height, inputImage.Width, inputImage.Height);
    float offsetX = (inputImage.Width - neww) / 2f / rate;
    float offsetY = (inputImage.Height - newh) / 2f / rate;
    List<RectangleF> rectangles = GetBoundingBoxes(binaryMap);
    foreach (var item in rectangles)
    {
        var rectangle = new RectangleF
        {
            X = item.X / rate - offsetX,
            Y = item.Y / rate - offsetY,
            Width = item.Width / rate,
            Height = item.Height / rate
        };
        yield return rectangle;
    }
}
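`CalculateTransform` is called above but not shown in the post. One possible implementation is sketched below, on the assumption that `ResizeMode.Pad` scales the image uniformly until it fits inside the target size and centers it in the padding. The `PadTransform` class name, the parameter names, and the body are guesses made for illustration, not the author's code.

```csharp
using System;

public static class PadTransform
{
    // Hypothetical reconstruction: returns the size of the scaled image inside
    // the padded canvas and the uniform scale factor that produced it.
    public static (float NewWidth, float NewHeight, float Rate) CalculateTransform(
        int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
    {
        // The smaller of the two ratios makes the whole image fit in the target.
        float rate = Math.Min(
            (float)targetWidth / sourceWidth,
            (float)targetHeight / sourceHeight);
        return (sourceWidth * rate, sourceHeight * rate, rate);
    }
}
```

For a 1920x1080 source and a 960x960 target this gives a rate of 0.5 and a scaled size of 960x540. The vertical padding is then (960 - 540) / 2 = 210 pixels in input coordinates, or 420 pixels after dividing by the rate, which is the `offsetY` that `Detection` subtracts.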