从KD树到聚类：手写一个C++欧几里得聚类算法，彻底搞懂点云分割的底层逻辑-酒店常州论坛

从KD树到聚类：手写一个C++欧几里得聚类算法，彻底搞懂点云分割的底层逻辑

在自动驾驶和机器人感知领域，点云处理是环境理解的核心技术之一。当我们面对数百万个无序的激光雷达点时，如何高效地识别出车辆、行人等物体？这就是点云聚类算法要解决的关键问题。本文将带您从零开始，用标准C++实现KD树和欧几里得聚类算法，揭开点云分割的神秘面纱。

1. KD树：空间索引的高效之道

1.1 理解KD树的核心思想

KD树（k-dimensional tree）是一种空间划分数据结构，特别适合处理多维数据的近邻搜索。想象一下图书馆的图书分类系统——先按学科分区，再按作者姓氏排序，这种分层分类的思想正是KD树的精髓所在。

在2D空间中，KD树的构建过程可以这样描述：

选择x轴作为第一个划分维度
找到当前维度上的中值点作为分割点
将空间划分为左右两部分
在子空间中切换维度（y轴），重复上述过程

struct Node { std::vector<float> point; // 存储点的坐标 int id; // 点的唯一标识 Node* left; // 左子树 Node* right; // 右子树 Node(std::vector<float> arr, int setId) : point(arr), id(setId), left(nullptr), right(nullptr) {} };

1.2 KD树的插入与搜索实现

插入操作需要考虑当前划分的维度。对于2D情况，我们交替使用x和y轴：

void insert(std::vector<float> point, int id) { int depth = 0; Node** current = &root; while (*current != nullptr) { int dim = depth % 2; // 0 for x, 1 for y if (point[dim] < (*current)->point[dim]) { current = &((*current)->left); } else { current = &((*current)->right); } depth++; } *current = new Node(point, id); }

搜索邻近点时，KD树的效率优势就显现出来了。相比暴力搜索的O(n)复杂度，KD树平均能达到O(log n)：

std::vector<int> search(std::vector<float> target, float distanceTol) { std::vector<int> ids; searchHelper(root, target, ids, 0, distanceTol); return ids; } void searchHelper(Node* node, std::vector<float>& target, std::vector<int>& ids, int depth, float distanceTol) { if (node == nullptr) return; // 检查当前节点是否在目标范围内 bool inRange = true; for (int i = 0; i < 2; ++i) { if (fabs(node->point[i] - target[i]) > distanceTol) { inRange = false; break; } } if (inRange) { float dist = sqrt(pow(node->point[0]-target[0], 2) + pow(node->point[1]-target[1], 2)); if (dist <= distanceTol) ids.push_back(node->id); } // 决定搜索方向 int dim = depth % 2; if (target[dim] - distanceTol < node->point[dim]) { searchHelper(node->left, target, ids, depth+1, distanceTol); } if (target[dim] + distanceTol > node->point[dim]) { searchHelper(node->right, target, ids, depth+1, distanceTol); } }

2. 欧几里得聚类算法原理

2.1 从区域生长到点云聚类

欧几里得聚类算法的核心思想类似于图像处理中的区域生长算法。从一个种子点出发，逐步"吞并"邻近的点，直到没有新的点可以加入当前簇。这个过程用递归可以优雅地实现：

void proximity(int pointId, const std::vector<std::vector<float>>& points, std::vector<int>& cluster, std::vector<bool>& processed, KdTree* tree, float distanceTol) { processed[pointId] = true; cluster.push_back(pointId); std::vector<int> nearby = tree->search(points[pointId], distanceTol); for (int id : nearby) { if (!processed[id]) { proximity(id, points, cluster, processed, tree, distanceTol); } } }

2.2 完整聚类流程实现

将上述过程应用到所有未处理的点上，就能得到完整的聚类结果：

std::vector<std::vector<int>> euclideanCluster( const std::vector<std::vector<float>>& points, KdTree* tree, float distanceTol) { std::vector<std::vector<int>> clusters; std::vector<bool> processed(points.size(), false); for (int i = 0; i < points.size(); ++i) { if (processed[i]) continue; std::vector<int> cluster; proximity(i, points, cluster, processed, tree, distanceTol); clusters.push_back(cluster); } return clusters; }

3. 从2D到3D：扩展空间维度

3.1 3D KD树的实现调整

将算法扩展到3D空间只需做少量修改。主要变化在于：

维度循环变为3个（x,y,z）
距离计算增加z分量
节点结构保持不变

// 修改搜索函数中的距离计算 float dist = sqrt( pow(node->point[0]-target[0], 2) + pow(node->point[1]-target[1], 2) + pow(node->point[2]-target[2], 2));

3.2 3D点云聚类的可视化思考

理解3D聚类时，可以想象在点云中"灌水"的过程：

水从某个点开始蔓延
遇到距离阈值就停止扩散
最终形成的"水洼"就是一个聚类

这种可视化方法有助于理解算法的空间扩展特性。

4. 性能优化与实践技巧

4.1 平衡KD树的构建策略

不平衡的KD树会严重影响搜索效率。实践中可以采用以下策略：

在构建时选择中值作为分割点
当树深度过大时进行再平衡
考虑使用近似中值来提高构建速度

void buildBalanced(std::vector<std::vector<float>>& points, std::vector<int>& ids) { root = buildHelper(points, ids, 0); } Node* buildHelper(std::vector<std::vector<float>>& points, std::vector<int>& ids, int depth) { if (points.empty()) return nullptr; int dim = depth % 3; // 对于3D情况 int median = points.size() / 2; // 按当前维度排序 auto comparator = [dim](const std::vector<float>& a, const std::vector<float>& b) { return a[dim] < b[dim]; }; std::sort(points.begin(), points.end(), comparator); std::sort(ids.begin(), ids.end(), [&points](int a, int b) { return points[a][0] < points[b][0]; }); Node* node = new Node(points[median], ids[median]); // 递归构建左右子树 std::vector<std::vector<float>> leftPoints( points.begin(), points.begin()+median); std::vector<std::vector<float>> rightPoints( points.begin()+median+1, points.end()); std::vector<int> leftIds(ids.begin(), ids.begin()+median); std::vector<int> rightIds(ids.begin()+median+1, ids.end()); node->left = buildHelper(leftPoints, leftIds, depth+1); node->right = buildHelper(rightPoints, rightIds, depth+1); return node; }

4.2 实际工程中的考量

在真实点云处理流水线中，还需要考虑：

降采样预处理：使用体素网格过滤减少点数
距离阈值选择：根据传感器特性和场景调整
并行化处理：对大规模点云使用多线程

// 简单的体素降采样示例 void voxelFilter(const std::vector<std::vector<float>>& points, float leafSize, std::vector<std::vector<float>>& filtered) { std::unordered_map<std::string, std::vector<float>> voxelMap; for (const auto& p : points) { int x = static_cast<int>(p[0] / leafSize); int y = static_cast<int>(p[1] / leafSize); int z = static_cast<int>(p[2] / leafSize); std::string key = std::to_string(x) + "_" + std::to_string(y) + "_" + std::to_string(z); if (voxelMap.find(key) == voxelMap.end()) { voxelMap[key] = p; } } for (const auto& pair : voxelMap) { filtered.push_back(pair.second); } }

5. 算法对比与进阶思考

5.1 KD树与其他空间索引结构对比

数据结构	构建复杂度	查询复杂度	适用场景
KD树	O(n log n)	O(log n)	低维空间
八叉树	O(n log n)	O(log n)	3D空间
R树	O(n log n)	O(log n)	动态数据
暴力搜索	O(1)	O(n)	小数据集

5.2 从欧几里得到DBSCAN：聚类算法的演进

欧几里得聚类实际上是DBSCAN算法在固定阈值下的特例。更先进的聚类算法可以考虑：

自适应距离阈值
密度变化感知
多尺度聚类

// 自适应阈值聚类的伪代码 void adaptiveCluster(const PointCloud& cloud, float minDist, float maxDist, int minPoints) { for (const auto& point : cloud) { if (visited(point)) continue; float radius = estimateLocalDensity(point); radius = std::clamp(radius, minDist, maxDist); Cluster cluster; expandCluster(point, cluster, radius, minPoints); if (cluster.size() >= minPoints) { saveCluster(cluster); } } }

在实现这些算法时，KD树仍然是加速邻近搜索的利器。理解KD树和基础聚类算法后，读者可以更轻松地掌握PCL等库中的高级功能，也能根据特定需求定制优化方案。

企业官网建设流程全解析