什么是空间索引

空间索引是一种用于优化空间数据查询的数据结构和算法。与传统的一维数据索引（如B树）不同，空间索引专门为多维数据（如二维或三维坐标）设计。其核心思想是将空间对象（如点、线、多边形）组织起来，以便能够快速地“裁剪”掉与查询无关的大量数据，从而显著减少需要进行精确计算的对象数量。

使用空间索引优化目标检测可视化文字重叠

代码

#ifndef POSITION_HPP__
#define POSITION_HPP__

#include <algorithm>
#include <functional>
#include <string>
#include <tuple>
#include <utility>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <list>

template <typename T>
struct Box
{
    T l, t, r, b;
};

template <typename T>
constexpr T area(const Box<T> &b)
{
    return std::max(T(0), b.r - b.l) * std::max(T(0), b.b - b.t);
}

template <typename T>
constexpr T intersectionArea(const Box<T> &a, const Box<T> &b)
{
    const T w = std::max(T(0), std::min(a.r, b.r) - std::max(a.l, b.l));
    const T h = std::max(T(0), std::min(a.b, b.b) - std::max(a.t, b.t));
    return w * h;
}

template <typename T>
inline float computeIoU(const Box<T> &a, const Box<T> &b)
{
    const T inter = intersectionArea(a, b);
    const T unionArea = area(a) + area(b) - inter;
    return (unionArea > 0) ? static_cast<float>(inter) / unionArea : 0.f;
}

template <typename T>
inline float computeOverlap(const Box<T> &a, const Box<T> &b)
{
    const T inter = intersectionArea(a, b);
    const T minArea = std::min(area(a), area(b));
    return (minArea > 0) ? static_cast<float>(inter) / minArea : 0.f;
}

template <typename T>
class SpatialIndex
{
public:
    SpatialIndex(int gridSize = 100) : gridSize(gridSize) {}

    void insert(const Box<T> &box)
    {
        forEachCell(box, [&](Key key)
                    { grid[key].push_back(&boxRefStorage.emplace_back(box)); });
    }

    void clear()
    {
        grid.clear();
        boxRefStorage.clear();
    }

    std::vector<const Box<T> *> query(const Box<T> &region) const
    {
        std::vector<const Box<T> *> results;
        std::unordered_set<const Box<T> *> seen;
        forEachCell(region, [&](Key key)
                    {
            auto it = grid.find(key);
            if (it != grid.end()) {
                for (auto* b : it->second) {
                    if (seen.find(b) == seen.end()) {
                        results.push_back(b);
                        seen.insert(b);
                    }
                }
            } });
        return results;
    }

private:
    struct Key
    {
        int x, y;
        bool operator==(const Key &o) const { return x == o.x && y == o.y; }
    };
    struct KeyHash
    {
        std::size_t operator()(const Key &k) const
        {
            return std::hash<int>()(k.x) ^ (std::hash<int>()(k.y) << 1);
        }
    };

    void forEachCell(const Box<T> &box, const std::function<void(Key)> &fn) const
    {
        int gx0 = static_cast<int>(box.l) / gridSize;
        int gy0 = static_cast<int>(box.t) / gridSize;
        int gx1 = static_cast<int>(box.r) / gridSize;
        int gy1 = static_cast<int>(box.b) / gridSize;
        for (int gx = gx0; gx <= gx1; ++gx)
            for (int gy = gy0; gy <= gy1; ++gy)
                fn({gx, gy});
    }

    int gridSize;
    std::unordered_map<Key, std::vector<const Box<T> *>, KeyHash> grid;
    std::list<Box<T>> boxRefStorage;
};

template <typename T>
class PositionManager
{
private:
    std::function<std::tuple<int, int, int>(const std::string &)> getFontSizeFunc;
    SpatialIndex<T> index;

    static Box<T> tupleToBox(const std::tuple<T, T, T, T> &tpl)
    {
        return {std::get<0>(tpl), std::get<1>(tpl), std::get<2>(tpl), std::get<3>(tpl)};
    }

public:
    template <typename Func>
    PositionManager(Func &&fontSizeFunc, int gridSize = 100)
        : getFontSizeFunc(std::forward<Func>(fontSizeFunc)), index(gridSize) {}

    std::tuple<T, T> selectOptimalPosition(const std::tuple<T, T, T, T> &box,
                                           int canvasWidth, int canvasHeight,
                                           const std::string &text)
    {
        int textWidth, textHeight, baseline;
        std::tie(textWidth, textHeight, baseline) = getFontSizeFunc(text);

        auto candidates = findCandidatePositions(box, canvasWidth, canvasHeight,
                                                 textWidth, textHeight, baseline);

        if (candidates.empty())
        {
            // 如果一个候选位置都没有（例如文本框比画布还大），需要一个安全的回退策略
            Box<T> fallbackBox = {T(0), T(0), T(textWidth), T(textHeight)};
            index.insert(fallbackBox);
            return {fallbackBox.l, fallbackBox.t + textHeight};
        }

        float minIoU = 1.1;
        Box<T> best = tupleToBox(candidates.front());

        for (const auto &cposition : candidates)
        {
            Box<T> cb = tupleToBox(cposition);
            float maxIoU = 0.f;

            // 查询时只与可能碰撞的 Box 计算 IoU
            for (auto *m : index.query(cb))
            {
                maxIoU = std::max(maxIoU, computeIoU(cb, *m));
            }

            if (maxIoU == 0.f) // 找到完全不重叠的位置，直接采纳
            {
                best = cb;
                minIoU = 0.f;
                break;
            }

            if (maxIoU < minIoU)
            {
                minIoU = maxIoU;
                best = cb;
            }
        }
        index.insert(best);
        return {best.l, best.t + textHeight};
    }

    void clearMarkedPositions()
    {
        index.clear();
    }

    std::vector<std::tuple<T, T, T, T>> findCandidatePositions(
        const std::tuple<T, T, T, T> &box, int canvasWidth,
        int canvasHeight, int textWidth, int textHeight, int baseline)
    {
        std::vector<std::tuple<T, T, T, T>> candidates;
        candidates.reserve(10);

        T left, top, right, bottom;
        std::tie(left, top, right, bottom) = box;

        Box<T> canvas{0, 0, T(canvasWidth), T(canvasHeight)};

        std::vector<std::tuple<T, T, T, T>> positions = {
            {left, top - textHeight - baseline, left + textWidth, top},
            {right, top, right + textWidth, top + textHeight + baseline},
            {left - textWidth, top, left, top + textHeight + baseline},
            {left, bottom, left + textWidth, bottom + textHeight + baseline},
            {right - textWidth, top - textHeight - baseline, right, top},
            {right - textWidth, bottom, right, bottom + textHeight + baseline},
            {left, top, left + textWidth, top + textHeight + baseline},
            {right - textWidth, top, right, top + textHeight + baseline},
            {right - textWidth, bottom - textHeight - baseline, right, bottom},
            {left, bottom - textHeight - baseline, left + textWidth, bottom}};

        for (const auto &p : positions)
        {
            Box<T> pb = tupleToBox(p);
            // 使用直接的坐标比较代替浮点数运算，更清晰且稳健
            if (pb.l >= canvas.l && pb.t >= canvas.t && pb.r <= canvas.r && pb.b <= canvas.b)
            {
                candidates.push_back(p);
            }
        }
        if (candidates.empty())
        {
            // 如果所有预设位置都不在画布内，尝试一个基本位置
            Box<T> baseBox = {left, top, left + textWidth, top + textHeight + baseline};
            if (intersectionArea(canvas, baseBox) > 0)
            { // 只要有部分重叠即可
                candidates.push_back({left, top, left + textWidth, top + textHeight + baseline});
            }
        }
        return candidates;
    }
};

#endif // POSITION_HPP__

空间索引的实现：SpatialIndex<T> 类

代码中的 SpatialIndex 类就是空间索引的实现。它将二维空间划分为一个虚拟的网格（像棋盘一样），然后记录每个物体（Box<T>）占据了哪些网格单元。
关键组成部分：

gridSize: 定义了网格单元的大小。例如，gridSize = 100 意味着整个空间被划分成一个个 100x100 像素的方块。
grid (std::unordered_map): 这是索引的核心数据结构。
- 键（Key）: 一个 Key 结构体，包含网格坐标 (x, y)。这就像棋盘上每个格子的坐标。
- 值（Value）: 一个 std::vector<const Box<T> *>，存储了所有与该网格单元相交的 Box 对象的指针。
boxRefStorage (std::list): grid 中存储的是指针，需要一个地方来实际存放 Box 对象。std::list 被选用是因为向其添加新元素不会导致已有元素的内存地址变化，从而保证了 grid 中存储的指针持续有效。

核心方法：

insert(const Box<T> &box) - 插入/注册
当一个 Box 被插入时，forEachCell 方法会计算出这个 Box 的边界覆盖了哪些网格单元。
例如，一个框被插入时，它会同时覆盖 (0,0), (0,1), (1,0), (1,1) 这四个网格单元。
然后，代码会将这个 Box 的指针添加到所有这些被覆盖的网格单元的列表中。

before_after.png

上图中，红色矩形框被插入时，它的指针会被同时添加到它覆盖的四个网格单元（A, B, C, D）的列表中。

query(const Box<T> &region) const - 查询
当需要查询某个区域 region 内可能有哪些 Box 时，代码同样使用 forEachCell 计算出 region 覆盖了哪些网格单元。
它只需要检查这些被覆盖的网格单元，将这些单元列表中的所有 Box 指针收集起来。
使用 std::unordered_set 来确保同一个 Box（可能跨越多个被查询的网格）只被返回一次。

PositionManager 如何利用索引进行优化

PositionManager 类的目标是为新文本找到一个与已存在文本框重叠最小的最佳位置。
未优化前的低效做法：
想象一下，如果没有 SpatialIndex，在 selectOptimalPosition 方法中，对于每一个候选位置（cposition），你都需要将它与所有已经放置的文本框（markedPositions）进行比较，计算 IoU（交并比）。伪代码如下：

// 低效做法
for (const auto& candidate_box : candidates) {
    float maxIoU = 0.f;
    // 必须遍历所有已存在的框
    for (const auto& marked_box : markedPositions) {
        maxIoU = std::max(maxIoU, computeIoU(candidate_box, marked_box));
    }
    // ... 根据 maxIoU 判断优劣
}

如果已经有 1000 个文本框被放置，有 10 个候选位置，那么就需要进行 10 * 1000 = 10000 次 computeIoU 计算。随着已放置文本框数量的增加，性能会急剧下降。
使用 SpatialIndex 的高效做法：
代码中的 selectOptimalPosition 方法巧妙地利用了空间索引：
同步数据：每当一个最佳位置 best 被选定，它不仅被添加到 markedPositions 向量中，还通过 index.insert(best); 被注册到空间索引里。
局部化查询：在评估每个候选框 cb 时，代码执行了以下关键步骤：

// 查询时只与可能碰撞的 Box 计算 IoU
for (auto *m : index.query(cb))
{
    maxIoU = std::max(maxIoU, computeIoU(cb, *m));
}

这里的 index.query(cb) 是优化的核心。它不会返回所有已存在的框，而只会返回那些与候选框 cb 在空间上邻近（即在同一个或相邻的网格单元中）的框。
上图中，当查询蓝色候选框时，索引只会返回与它在同一网格单元 C 中的红色框，而完全忽略了远处其他网格中的框，从而避免了不必要的计算。

使用空间索引优化目标检测可视化文字重叠

什么是空间索引

使用空间索引优化目标检测可视化文字重叠

代码

空间索引的实现：SpatialIndex<T> 类

PositionManager 如何利用索引进行优化

推荐阅读更多精彩内容