C3P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping

Xu Yang, Wenhao Li, Qijie Ge, Lulu Suo, Weijie Tang, Zhengyu Wei, Longxiang Huang, and Bo Wang All authors are with Deptrum Ltd. Wenhao Li and Qijie Ge contributed equally to this work. Corresponding author: Xu Yang [email protected]
Abstract

This work presents a compact, cumulative, and coalescible probabilistic voxel mapping method to enhance performance, accuracy, and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating them to update the uncertainty at every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-fold strategy. First, we introduce a compact point-free representation for probabilistic voxels and derive a cumulative update of the planar uncertainty without caching original point clouds. Our voxel structure only keeps track of a predetermined set of statistics for points that lie inside it. This method reduces the runtime complexity from O(MN)𝑂𝑀𝑁O(MN)italic_O ( italic_M italic_N ) to O(N)𝑂𝑁O(N)italic_O ( italic_N ) and the space complexity from O(N)𝑂𝑁O(N)italic_O ( italic_N ) to O(1)𝑂1O(1)italic_O ( 1 ) where M𝑀Mitalic_M is the number of iterations and N𝑁Nitalic_N is the number of points. Second, to further minimize memory usage and enhance mapping accuracy, we provide a strategy to dynamically merge voxels associated with the same physical planes by taking advantage of the geometric features in the real world. Rather than constantly scanning for these coalescible voxels at every iteration, our merging strategy accumulates voxels in a locality-sensitive hash and triggers merging lazily. On-demand merging reduces memory footprint with minimal computational overhead and improves localization accuracy thanks to cross-voxel denoising. Experiments exhibit 20% higher accuracy, 20% faster performance, and 70% lower memory consumption than the state-of-the-art.

I Introduction

LiDAR SLAM has been extensively studied in the past decade thanks to the wide availability of depth sensors. The direct and precise depth acquisition facilitates pose estimation and map reconstruction. Numerous works contribute to the advancement of LiDAR SLAM [1, 2, 3, 4, 5, 6, 7]. These works can be broadly divided into direct and indirect methods based on how they use raw point clouds.

Indirect methods explicitly extract geometric features like straight lines or planes from point clouds and perform localization and mapping with respect to the extracted features[1, 4, 7]. Although effective in creating sparse maps, feature extraction and correspondence association are computationally intensive and heavily rely on sensor-specific patterns like circular LiDAR scans.

In contrast, direct methods obviate the need for feature extraction and directly register raw points to the map[2, 6]. These approaches leverage all available raw points for mapping and create dense maps. Consequently, there is a growing demand for efficient map organization in real-time LiDAR SLAM, particularly on resource-constrained platforms like embedded devices.

Direct methods usually adopt one of the two kinds of map representations for point registration: irregular tree structures or regular voxel maps. K-d tree partitions the space into blocks of dynamic sizes. To locate a plane for pose estimation, a tree traverse searches for the nearest neighbors [6, 2]. The performance of traversal may degrade when the tree is heavily unbalanced. Tree re-balancing and maintenance are complicated and often cause unexpected global pauses for real-time tasks. In contrast, hash-based voxel maps are more effective and universal. Voxel map partitions the space into blocks of fixed size and locates them with a hash function. Recent works [5, 3] have demonstrated the outstanding efficiency of voxel maps. Given these advantages, voxel maps still face several challenges:

Refer to caption
Figure 1: Bias-variance tradeoff of voxel maps. (a) Small voxels capture details of the environment but are susceptible to noise, i.e., large variance (note the wiggling planes on the walls and the ground). (b) Large voxels are more resistant to noise but might lead to approximation bias (note the thick wall is over-smoothed into a plane).
  1. 1.

    Probabilistic voxel mapping requires keeping all points to update the map because an iteration on all points is required to recompute the uncertainty of voxel planes[3]. This non-cumulative process is both time-consuming and memory-inefficient.

  2. 2.

    A regular voxel structure cannot capture large geometric features, e.g., walls, floors, ceilings, in the real world. This inflexibility leads to a duplication in representing a single large feature with many voxels.

  3. 3.

    Bias-variance tradeoff on voxel sizes. Large voxels excel at collecting a substantial amount of points to reduce variance, but they tend to oversmooth the environment, resulting in a representation bias. Conversely, small voxels excel at capturing map details, i.e., lower bias, but are susceptible to noise, i.e., higher variance, especially when the point cloud is sparse. (see Fig. 1)

To address the above challenges, we propose C3P-VoxelMap with the following contributions:

  1. 1.

    We derive a compact point-free representation of plane uncertainty and a cumulative update scheme for each voxel. Our approach eliminates the recalculation of Jacobians for plane parameters w.r.t. each point and reduces the computation workload substantially. Moreover, the memory consumption of each voxel is independent of the number of points therein, resulting in significant memory savings.

  2. 2.

    We introduce a voxel merging strategy to dynamically adapt to large planar features in the real world. With voxel merging, a large planar feature only requires one or a few voxels to represent. Moreover, voxel merging improves mapping accuracy by estimating a single large feature with points from multiple voxels.

  3. 3.

    We propose an on-demand merging strategy with a locality-sensitive hash that triggers a merge operation only when enough coalescible voxels are aggregated. This strategy realizes voxel merging with minimal overhead and avoids brute-force search.

II Related Works

This section briefly reviews the related works on direct and voxel-based mapping methods in recent years and details the difference with our proposed C3P-VoxelMap.

Probabilistic voxel mapping originates from NDT [8] and has evolved into many variations [9, 10, 3]. [9] first proposed to model voxelized point clouds with Gaussian distributions and minimize the Mahalanobis distance between source points and the target voxel distributions, leading to an efficient inquiry and registration. LiTAMIN[11] incorporates NDT’s voxelization in the target point set, which is usually large in size while adopting a k-d tree for nearest neighbor search (NNS) in the source point set. LiTAMIN2[10] extends LiTAMIN by applying KL-divergence to measure the distribution-to-distribution distance.

Different from Fast-LIO2[6] that uses NNS of a k-d tree, Faster-LIO[5] points out that the strict NNS is unnecessary for LIO and utilizes the voxel structure with approximate NNS. To manage the unbounded growth of the map, incremental voxel pruning removes voxels that haven’t been recently used. Consequently, Faster-LIO achieves similar accuracy as Fast-LIO2[6] while being more efficient in computation and memory consumption.

In [6] and [5], local planes are treated as deterministic features. That is, the system only estimates the pose of a plane but not its uncertainty during state estimation. However, the pose of a plane is regressed from points from multiple LiDAR scans and is estimated in the global frame, thus, the uncertainty raised from both LiDAR measurements and ego-pose estimation contributes to the latent plane distribution. To address this issue, VoxelMap[3] proposes a probabilistic plane representation and explicitly parameterizes the plane as a multivariate function of all points. The uncertainty of a plane is jointly determined by the points and the ego-poses.

Probabilistic plane representation has demonstrated its effectiveness in improving mapping accuracy[3]. However, to update a plane with newly observed points, all past points are required to calculate the new uncertainty, which creates a big burden for both memory and computation. To enhance memory efficiency, VoxelMap++[12] replaces the 6-DoF plane parameters with a 3-DoF representation. Since each voxel contains only a single plane but hundreds of points, the memory usage is dominated by the points therein. Thus, the space savings from 3-DoF representation are limited. In this work, a compact representation and a cumulative plane update are proposed, eliminating the storage of past points and repeated computation.

To overcome the inability to adapt to large geometric features, [13] and [3] utilize a coarse-to-fine voxel hierarchy. A regular voxel hash map is used for the coarse level, and for finer granularity, each voxel is further subdivided into sub-voxels that are indexed by an octree. Representing larger or irregular features in this voxel hierarchy is still unmanageable. [12] proposes to merge voxels with similar plane parameters. This design shares some similarities with our on-demand merging, but there exist several key differences. First, instead of searching for coalescible voxels constantly with union-find [12], our on-demand merging is triggered only when enough mergeable voxels are gathered in the locality-sensitive hash. Therefore, the voxel merging overhead is much smaller. Second, our merging strategy incorporates cross-voxel updates of probabilistic planes with respect to the uncertainty from all voxels.

In sum, [6, 5, 3, 12] are the most relevant works to ours. The differences among these methods are listed in Table I. In terms of the state estimation, all the above methods are developed under the Iterative Error State Kalman filter (IESKF). Since this work mainly focuses on how to manage the voxel map compactly, we keep the state estimation algorithm the same. [6] gives a detailed introduction to IESKF.

TABLE I: Comparison of Methods
Map Structure Plane Rep. Plane Update
Fast-LIO2 Incremental K-d tree Det. No update
Faster-LIO Incremental Voxel Det. No update
VoxelMap Adaptive Voxel Prob. Non-cumulative
VoxelMap++ Mergeable Voxel Prob. Non-cumulative
Ours C3P Voxel Prob. Cumulative
  • Det. is short for Deterministic.

  • Prob. is short for Probabilistic.

III System Overview

C3P-VoxelMap incorporates a coalescible voxel map with probabilistic plane representation and optimizes the estimation with IESKF. The voxel map is a 3D grid of the space where each voxel stores a probabilistic plane extracted from point clouds aggregated across multiple frames. The uncertainty of a plane is represented by a covariance matrix, jointly determined by the noise of all past points and the associated camera poses.

IESKF optimizes probabilistic plane configurations and camera poses by minimizing the point-to-plane distance. This probabilistic representation provides a more accurate point-to-plane measurement model than deterministic representations, as it accounts for the plane’s uncertainty [3]. C3P-VoxelMap enables cumulative update to the covariance of probabilistic planes without storing and re-iterating on past point clouds (see Section IV).

To merge voxels representing the same physical planes on demand, voxels are hashed into buckets based on their locations and plane orientations. When a bucket accumulates sufficient voxels, small planes are merged into a large one. The uncertainty of the merged plane is updated using the same cumulative method as for individual voxel planes. Each voxel subsequently adopts the merged plane as its feature in the filtering process (see Section V).

IV Cumulative Probabilistic Update

IV-A Probabilistic Plane Representation

Following the formulation in probabilistic voxel mapping [3], our system employs planes as the feature in the voxel map, which exist ubiquitously in the real world. A probabilistic plane is composed of a normal vector 𝐧𝐧\mathbf{\mathbf{n}}bold_n, a center point 𝐪𝐪\mathbf{q}bold_q, and a covariance matrix 𝚺𝐧,𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}\ bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPTdenoting the uncertainty. The uncertainty model of a voxel plane considers both sensor noise and the estimated pose noise in the world frame.

Considering a point 𝐩𝐩\mathbf{p}bold_p as a 3-dimensional random variable sampled from a plane, the parameter of the plane can be formulated as a multi-variate function f𝑓fitalic_f of all points:

[𝐧gt,𝐪gt]Tsuperscriptsuperscript𝐧𝑔𝑡superscript𝐪𝑔𝑡𝑇\displaystyle\left[\mathbf{n}^{gt},\mathbf{q}^{gt}\right]^{T}[ bold_n start_POSTSUPERSCRIPT italic_g italic_t end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT italic_g italic_t end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT =f(𝐩1+𝜹𝐩1,𝐩2+𝜹𝐩2,,𝐩N+𝜹𝐩N)absent𝑓subscript𝐩1subscript𝜹subscript𝐩1subscript𝐩2subscript𝜹subscript𝐩2subscript𝐩𝑁subscript𝜹subscript𝐩𝑁\displaystyle=f\left(\mathbf{p}_{1}+\boldsymbol{\delta}_{{\mathbf{p}_{1}}},% \mathbf{p}_{2}+\boldsymbol{\delta}_{{\mathbf{p}_{2}}},\ldots,\mathbf{p}_{N}+% \boldsymbol{\delta}_{{\mathbf{p}_{N}}}\right)= italic_f ( bold_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_δ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , bold_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + bold_italic_δ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , bold_p start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT + bold_italic_δ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) (1)
[𝐧,𝐪]T+i=1N𝐟𝐩i𝜹𝐩iabsentsuperscript𝐧𝐪𝑇superscriptsubscript𝑖1𝑁𝐟subscript𝐩𝑖subscript𝜹subscript𝐩𝑖\displaystyle\approx[\mathbf{n},\mathbf{q}]^{T}+\sum_{i=1}^{N}\frac{\partial% \mathbf{f}}{\partial\mathbf{p}_{i}}\boldsymbol{\delta}_{{\mathbf{p}_{i}}}≈ [ bold_n , bold_q ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_italic_δ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT

where [𝐧gt,𝐪gt]superscript𝐧𝑔𝑡superscript𝐪𝑔𝑡\left[\mathbf{n}^{gt},\mathbf{q}^{gt}\right][ bold_n start_POSTSUPERSCRIPT italic_g italic_t end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT italic_g italic_t end_POSTSUPERSCRIPT ] denotes the ground truth plane, and 𝜹𝐩isubscript𝜹subscript𝐩𝑖\boldsymbol{\delta}_{\mathbf{p}_{i}}bold_italic_δ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT denotes the Gaussian distributed noise of the i𝑖iitalic_i-th point. Here, function f𝑓fitalic_f first computes the mean 𝐪𝐪\mathbf{q}bold_q and the covariance matrix 𝐀𝐀\mathbf{A}bold_A from the point clouds:

𝐪=1Ni=1N𝐩i𝐀=1Ni=1N𝐩i𝐩iT𝐪𝐪Tformulae-sequence𝐪1𝑁superscriptsubscript𝑖1𝑁subscript𝐩𝑖𝐀1𝑁superscriptsubscript𝑖1𝑁subscript𝐩𝑖superscriptsubscript𝐩𝑖𝑇superscript𝐪𝐪𝑇\mathbf{q}=\frac{1}{N}\sum_{i=1}^{N}{\mathbf{p}_{i}}\qquad\mathbf{A}=\frac{1}{% N}\sum_{i=1}^{N}{\mathbf{p}_{i}\mathbf{p}_{i}^{T}-\mathbf{q}\mathbf{q}^{T}}bold_q = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_A = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT - bold_qq start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (2)

then followed a singular value decomposition (SVD) of matrix 𝐀𝐀\mathbf{A}bold_A to obtain eigenvalues and eigenvectors:

𝐀=𝐔diag(λ1,λ2,λ3)𝐔T𝐔=[𝐮1,𝐮2,𝐮3]formulae-sequence𝐀𝐔diagsubscript𝜆1subscript𝜆2subscript𝜆3superscript𝐔𝑇𝐔subscript𝐮1subscript𝐮2subscript𝐮3\mathbf{A}=\mathbf{U}\operatorname{diag}(\lambda_{1},\lambda_{2},\lambda_{3})% \mathbf{U}^{T}\qquad\mathbf{U}=\left[\mathbf{u}_{1},\mathbf{u}_{2},\mathbf{u}_% {3}\right]bold_A = bold_U roman_diag ( italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) bold_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_U = [ bold_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_u start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ] (3)

where λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and λ3subscript𝜆3\lambda_{3}italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are singular values in a descend order, 𝐮1subscript𝐮1\mathbf{u}_{1}bold_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, 𝐮2subscript𝐮2\mathbf{u}_{2}bold_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and 𝐮3subscript𝐮3\mathbf{u}_{3}bold_u start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are singular vectors. The normal 𝐧𝐧\mathbf{n}bold_n of the voxel plane is simply obtained by

𝐧=𝐮3𝐧subscript𝐮3\mathbf{n}=\mathbf{u}_{3}bold_n = bold_u start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT (4)

The Jocabian of 𝐧𝐧\mathbf{n}bold_n, 𝐪𝐪\mathbf{q}bold_q w.r.t. each point 𝐩isubscript𝐩𝑖\mathbf{p}_{i}bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in (1) is computed as follows[3]:

𝐧𝐩i=𝐔[(𝐩i𝐪)T𝐅1(𝐩i𝐪)T𝐅2(𝐩i𝐪)T𝐅3],𝐪𝐩i=diag(1N,1N,1N)formulae-sequence𝐧subscript𝐩𝑖𝐔delimited-[]superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅1superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅2superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅3𝐪subscript𝐩𝑖diag1𝑁1𝑁1𝑁\frac{\partial\mathbf{n}}{\partial\mathbf{p}_{i}}=\mathbf{U}\left[\begin{array% }[]{c}\left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{1}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{2}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{3}\end{array}\right],% \frac{\partial\mathbf{q}}{\partial{\mathbf{p}_{i}}}=\operatorname{diag}\left(% \frac{1}{N},\frac{1}{N},\frac{1}{N}\right)divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = bold_U [ start_ARRAY start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] , divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = roman_diag ( divide start_ARG 1 end_ARG start_ARG italic_N end_ARG , divide start_ARG 1 end_ARG start_ARG italic_N end_ARG , divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ) (5)
𝐅m={1N(λ3λm)(𝐮m𝐧T+𝐧𝐮mT),m3,𝟎1×3,m=3.\mathbf{F}_{m}=\left\{\begin{array}[]{cc}\frac{1}{N\left(\lambda_{3}-\lambda_{% m}\right)}\left(\mathbf{u}_{m}\mathbf{n}^{T}+\mathbf{n}\mathbf{u}_{m}^{T}% \right)&,m\neq 3,\\ \mathbf{0}_{1\times 3}&,m=3.\end{array}\right.bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = { start_ARRAY start_ROW start_CELL divide start_ARG 1 end_ARG start_ARG italic_N ( italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT - italic_λ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) end_ARG ( bold_u start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_n start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT + bold_nu start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) end_CELL start_CELL , italic_m ≠ 3 , end_CELL end_ROW start_ROW start_CELL bold_0 start_POSTSUBSCRIPT 1 × 3 end_POSTSUBSCRIPT end_CELL start_CELL , italic_m = 3 . end_CELL end_ROW end_ARRAY (6)

Thus, the plane uncertainty is a linear combination of all 𝚺𝐩𝒊subscript𝚺subscript𝐩𝒊\boldsymbol{\Sigma_{\mathbf{p}_{i}}}bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT:

𝚺𝐧,𝐪=i=1N𝐟𝐩i𝚺𝐩𝒊𝐟𝐩iT,𝐟𝐩i=[𝐧𝐩i,𝐪𝐩i]formulae-sequencesubscript𝚺𝐧𝐪superscriptsubscript𝑖1𝑁𝐟subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐟subscript𝐩𝑖𝑇𝐟subscript𝐩𝑖𝐧subscript𝐩𝑖𝐪subscript𝐩𝑖\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}=\sum_{i=1}^{N}\frac{\partial% \mathbf{f}}{\partial\mathbf{p}_{i}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{% \frac{\partial\mathbf{f}}{\partial\mathbf{p}_{i}}}^{T},\frac{\partial\mathbf{f% }}{\partial\mathbf{p}_{i}}=\left[\frac{\partial\mathbf{n}}{\partial\mathbf{p}_% {i}},\frac{\partial\mathbf{q}}{\partial\mathbf{p}_{i}}\right]bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = [ divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG , divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ] (7)

where 𝚺𝐩𝒊3×3subscript𝚺subscript𝐩𝒊superscript33\boldsymbol{\Sigma_{\mathbf{p}_{i}}}\in\mathbb{R}^{3\times 3}bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT denotes the covariance matrix of the i𝑖iitalic_i-th point in the world coordinate frame.

IV-B Cumulative Update

Upon the arrival of new points, both the pose 𝐧𝐧\mathbf{\mathbf{n}}bold_n, 𝐪𝐪\mathbf{q}bold_q, and uncertainty 𝚺𝐧,𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}\ bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPTof a voxel plane need to be updated. Updating 𝐪𝐪\mathbf{q}bold_q cumulatively w.r.t. a new point 𝐩𝐩\mathbf{p}bold_p is straightforward.

𝐪=NN+1𝐪+1N+1𝐩superscript𝐪𝑁𝑁1𝐪1𝑁1𝐩\mathbf{q}^{\prime}=\frac{N}{N+1}\mathbf{q}+\frac{1}{N+1}\mathbf{p}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = divide start_ARG italic_N end_ARG start_ARG italic_N + 1 end_ARG bold_q + divide start_ARG 1 end_ARG start_ARG italic_N + 1 end_ARG bold_p (8)

The update of 𝐧𝐧\mathbf{n}bold_n is based on an cumulative update of 𝐀𝐀\mathbf{A}bold_A

𝐀=NN+1(𝐀+𝐪𝐪T)+1N+1𝐩𝐩T𝐪𝐪Tsuperscript𝐀𝑁𝑁1𝐀superscript𝐪𝐪𝑇1𝑁1superscript𝐩𝐩𝑇superscript𝐪superscriptsuperscript𝐪𝑇\mathbf{A}^{\prime}=\frac{N}{N+1}(\mathbf{A}+\mathbf{q}\mathbf{q}^{T})+\frac{1% }{N+1}\mathbf{p}\mathbf{p}^{T}-\mathbf{q}^{\prime}\mathbf{q^{\prime}}^{T}bold_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = divide start_ARG italic_N end_ARG start_ARG italic_N + 1 end_ARG ( bold_A + bold_qq start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) + divide start_ARG 1 end_ARG start_ARG italic_N + 1 end_ARG bold_pp start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT - bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (9)

The normal vector 𝐧superscript𝐧\mathbf{n}^{\prime}bold_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the third singular vector of the covariance matrix 𝐀3×3superscript𝐀superscript33\mathbf{A}^{\prime}\in\mathbb{R}^{3\times 3}bold_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT under singular value decomposition.

However, the cumulative update of 𝚺𝐧,𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}\ bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPTis non-trivial because this update triggers a re-computation of the Jacobian of 𝐧𝐧\mathbf{n}bold_n, 𝐪𝐪\mathbf{q}bold_q w.r.t. 𝐩isubscript𝐩𝑖\mathbf{p}_{i}bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. As shown in (7), the re-computation involves every point that falls into a voxel in the past. The re-computation is not only time-consuming but also memory-inefficient because all past points must be stored and re-iterated. We derive a new algorithm that makes cumulative updates possible and avoids the storage of past points.

By expanding (7) we have

𝚺𝐧,𝐪subscript𝚺𝐧𝐪\displaystyle\boldsymbol{\Sigma_{\mathbf{n},\mathbf{q}}}bold_Σ start_POSTSUBSCRIPT bold_n bold_, bold_q end_POSTSUBSCRIPT =i=1N𝐟𝐩i𝚺𝐩𝒊𝐟𝐩iTabsentsuperscriptsubscript𝑖1𝑁𝐟subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐟subscript𝐩𝑖𝑇\displaystyle=\sum_{i=1}^{N}\frac{\partial\mathbf{f}}{\partial{{\mathbf{p}_{i}% }}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{\partial\mathbf{f}}{\partial{{% \mathbf{p}_{i}}}}}^{T}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_f end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (10)
=i=1N[𝐧𝐩i𝚺𝐩𝒊𝐧𝐩iT𝐧𝐩i𝚺𝐩𝒊𝐪𝐩iT𝐪𝐩i𝚺𝐩𝒊𝐧𝐩iT𝐪𝐩i𝚺𝐩𝒊𝐪𝐩iT]absentsuperscriptsubscript𝑖1𝑁delimited-[]𝐧subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐧subscript𝐩𝑖𝑇𝐧subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐪subscript𝐩𝑖𝑇𝐪subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐧subscript𝐩𝑖𝑇𝐪subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐪subscript𝐩𝑖𝑇\displaystyle=\sum_{i=1}^{N}\left[\begin{array}[]{ll}\frac{\partial\mathbf{n}}% {\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{% \partial\mathbf{n}}{\partial{{\mathbf{p}_{i}}}}}^{T}&\frac{\partial\mathbf{n}}% {\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{% \partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}}^{T}\\ \frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{% \mathbf{p}_{i}}}}{\frac{\partial\mathbf{n}}{\partial{{\mathbf{p}_{i}}}}}^{T}&% \frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{% \mathbf{p}_{i}}}}{\frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}}^{T}% \end{array}\right]= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT [ start_ARRAY start_ROW start_CELL divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW end_ARRAY ]
=[i=1N𝐧𝐩i𝚺𝐩𝒊𝐧𝐩iTi=1N𝐧𝐩i𝚺𝐩𝒊𝐪𝐩iTi=1N𝐪𝐩i𝚺𝐩𝒊𝐧𝐩iTi=1N𝐪𝐩i𝚺𝐩𝒊𝐪𝐩iT]absentdelimited-[]superscriptsubscript𝑖1𝑁𝐧subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐧subscript𝐩𝑖𝑇superscriptsubscript𝑖1𝑁𝐧subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐪subscript𝐩𝑖𝑇superscriptsubscript𝑖1𝑁𝐪subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐧subscript𝐩𝑖𝑇superscriptsubscript𝑖1𝑁𝐪subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐪subscript𝐩𝑖𝑇\displaystyle=\left[\begin{array}[]{ll}\sum_{i=1}^{N}\frac{\partial\mathbf{n}}% {\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{% \partial\mathbf{n}}{\partial{{\mathbf{p}_{i}}}}}^{T}&\sum_{i=1}^{N}\frac{% \partial\mathbf{n}}{\partial{{\mathbf{p}_{i}}}}\boldsymbol{\Sigma_{{\mathbf{p}% _{i}}}}{\frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}}^{T}\\ \sum_{i=1}^{N}\frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}\boldsymbol% {\Sigma_{{\mathbf{p}_{i}}}}{\frac{\partial\mathbf{n}}{\partial{{\mathbf{p}_{i}% }}}}^{T}&\sum_{i=1}^{N}\frac{\partial\mathbf{q}}{\partial{{\mathbf{p}_{i}}}}% \boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{\partial\mathbf{q}}{\partial{{% \mathbf{p}_{i}}}}}^{T}\end{array}\right]= [ start_ARRAY start_ROW start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_q end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW end_ARRAY ]
=def[𝚺𝐧𝐧𝚺𝐧𝐪𝚺𝐧𝐪T𝚺𝐪𝐪]superscriptdefabsentdelimited-[]subscript𝚺𝐧𝐧subscript𝚺𝐧𝐪superscriptsubscript𝚺𝐧𝐪𝑇subscript𝚺𝐪𝐪\displaystyle\stackrel{{\scriptstyle\operatorname{def}}}{{=}}\left[\begin{% array}[]{cc}\boldsymbol{\Sigma_{\mathbf{n}\mathbf{n}}}&\boldsymbol{\Sigma_{% \mathbf{n}\mathbf{q}}}\\ \boldsymbol{\Sigma_{\mathbf{n}\mathbf{q}}}^{T}&\boldsymbol{\Sigma_{\mathbf{q}% \mathbf{q}}}\end{array}\right]start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG roman_def end_ARG end_RELOP [ start_ARRAY start_ROW start_CELL bold_Σ start_POSTSUBSCRIPT bold_nn end_POSTSUBSCRIPT end_CELL start_CELL bold_Σ start_POSTSUBSCRIPT bold_nq end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_Σ start_POSTSUBSCRIPT bold_nq end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL start_CELL bold_Σ start_POSTSUBSCRIPT bold_qq end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ]

Given the similarity in form among 𝚺𝐧𝐧subscript𝚺𝐧𝐧\boldsymbol{\Sigma_{\mathbf{n}\mathbf{n}}}bold_Σ start_POSTSUBSCRIPT bold_nn end_POSTSUBSCRIPT, 𝚺𝐧𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma_{\mathbf{n}\mathbf{q}}}bold_Σ start_POSTSUBSCRIPT bold_nq end_POSTSUBSCRIPT, and 𝚺𝐪𝐪subscript𝚺𝐪𝐪\boldsymbol{\Sigma_{\mathbf{q}\mathbf{q}}}bold_Σ start_POSTSUBSCRIPT bold_qq end_POSTSUBSCRIPT, we will use 𝚺𝐧𝐧subscript𝚺𝐧𝐧\boldsymbol{\Sigma_{\mathbf{n}\mathbf{n}}}bold_Σ start_POSTSUBSCRIPT bold_nn end_POSTSUBSCRIPT as an example to provide a detailed derivation of the cumulative update process for plane covariance. First, we substitute (5) in 𝚺𝐧𝐧subscript𝚺𝐧𝐧\boldsymbol{\Sigma_{\mathbf{n}\mathbf{n}}}bold_Σ start_POSTSUBSCRIPT bold_nn end_POSTSUBSCRIPT

𝚺𝐧𝐧subscript𝚺𝐧𝐧\displaystyle\boldsymbol{\Sigma_{\mathbf{n}\mathbf{n}}}bold_Σ start_POSTSUBSCRIPT bold_nn end_POSTSUBSCRIPT =i=1N𝐧𝐩i𝚺𝐩𝒊𝐧𝐩iTabsentsuperscriptsubscript𝑖1𝑁𝐧subscript𝐩𝑖subscript𝚺subscript𝐩𝒊superscript𝐧subscript𝐩𝑖𝑇\displaystyle=\sum_{i=1}^{N}\frac{\partial\mathbf{n}}{\partial{{\mathbf{p}_{i}% }}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}{\frac{\partial\mathbf{n}}{\partial{{% \mathbf{p}_{i}}}}}^{T}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG ∂ bold_n end_ARG start_ARG ∂ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (11)
=i=1N𝐔[(𝐩i𝐪)T𝐅1(𝐩i𝐪)T𝐅2(𝐩i𝐪)T𝐅3]𝚺𝐩𝒊[𝐅1T(𝐩i𝐪)𝐅2T(𝐩i𝐪)𝐅3T(𝐩i𝐪)]𝐔Tabsentsuperscriptsubscript𝑖1𝑁𝐔delimited-[]superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅1superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅2superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅3subscript𝚺subscript𝐩𝒊delimited-[]superscriptsubscript𝐅1𝑇subscript𝐩𝑖𝐪superscriptsubscript𝐅2𝑇subscript𝐩𝑖𝐪superscriptsubscript𝐅3𝑇subscript𝐩𝑖𝐪superscript𝐔𝑇\displaystyle=\sum_{i=1}^{N}\mathbf{U}\left[\begin{array}[]{c}\left(\mathbf{p}% _{i}-\mathbf{q}\right)^{T}\mathbf{F}_{1}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{2}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{3}\end{array}\right]% \boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\left[\begin{array}[]{c}\mathbf{F}_{1}^{% T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\\ \mathbf{F}_{2}^{T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\\ \mathbf{F}_{3}^{T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\end{array}\right]% \mathbf{U}^{T}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_U [ start_ARRAY start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ start_ARRAY start_ROW start_CELL bold_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW start_ROW start_CELL bold_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW start_ROW start_CELL bold_F start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW end_ARRAY ] bold_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT
=𝐔i=1N[(𝐩i𝐪)T𝐅1(𝐩i𝐪)T𝐅2(𝐩i𝐪)T𝐅3]𝚺𝐩𝒊[𝐅1T(𝐩i𝐪)𝐅2T(𝐩i𝐪)𝐅3T(𝐩i𝐪)]𝐔Tabsent𝐔superscriptsubscript𝑖1𝑁delimited-[]superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅1superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅2superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅3subscript𝚺subscript𝐩𝒊delimited-[]superscriptsubscript𝐅1𝑇subscript𝐩𝑖𝐪superscriptsubscript𝐅2𝑇subscript𝐩𝑖𝐪superscriptsubscript𝐅3𝑇subscript𝐩𝑖𝐪superscript𝐔𝑇\displaystyle=\mathbf{U}\sum_{i=1}^{N}\left[\begin{array}[]{c}\left(\mathbf{p}% _{i}-\mathbf{q}\right)^{T}\mathbf{F}_{1}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{2}\\ \left(\mathbf{p}_{i}-\mathbf{q}\right)^{T}\mathbf{F}_{3}\end{array}\right]% \boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\left[\begin{array}[]{c}\mathbf{F}_{1}^{% T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\\ \mathbf{F}_{2}^{T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\\ \mathbf{F}_{3}^{T}\left(\mathbf{p}_{i}-\mathbf{q}\right)\end{array}\right]% \mathbf{U}^{T}= bold_U ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT [ start_ARRAY start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ start_ARRAY start_ROW start_CELL bold_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW start_ROW start_CELL bold_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW start_ROW start_CELL bold_F start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) end_CELL end_ROW end_ARRAY ] bold_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT
=def𝐔𝐁𝐔Tsuperscriptdefabsentsuperscript𝐔𝐁𝐔𝑇\displaystyle\stackrel{{\scriptstyle\operatorname{def}}}{{=}}\mathbf{U}\mathbf% {B}\mathbf{U}^{T}start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG roman_def end_ARG end_RELOP bold_UBU start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT

where 𝐁3×3𝐁superscript33\mathbf{B}\in\mathbb{R}^{3\times 3}bold_B ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT.

This transformation isolates 𝐔𝐔\mathbf{U}bold_U from the point-related computation. Thus, to derive an update formula concerning new points, we only need to compute 𝐁𝐁\mathbf{B}bold_B cumulatively.

Let bmnsubscript𝑏𝑚𝑛b_{mn}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT be the element at m𝑚mitalic_m-th row and n𝑛nitalic_n-th column of 𝐁𝐁\mathbf{B}bold_B.

bmnsubscript𝑏𝑚𝑛\displaystyle b_{mn}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT =i=1N(𝐩i𝐪)T𝐅𝐦𝚺𝐩𝒊𝐅𝐧T(𝐩i𝐪)absentsuperscriptsubscript𝑖1𝑁superscriptsubscript𝐩𝑖𝐪𝑇subscript𝐅𝐦subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝐧𝑇subscript𝐩𝑖𝐪\displaystyle=\sum_{i=1}^{N}{\left({{\mathbf{p}_{i}}}-\mathbf{q}\right)^{T}% \mathbf{\mathbf{F}_{m}}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{\mathbf{F% }_{n}}^{T}\left({{\mathbf{p}_{i}}}-\mathbf{q}\right)}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT bold_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT bold_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_q ) (12)
=i=1N𝐩iT𝐅m𝚺𝐩𝒊𝐅nT𝐩ii=1N𝐩iT𝐅m𝚺𝐩𝒊𝐅nT𝐪absentsuperscriptsubscript𝑖1𝑁superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖superscriptsubscript𝑖1𝑁superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝑛𝑇𝐪\displaystyle=\sum_{i=1}^{N}{{{\mathbf{p}_{i}}}}^{T}\mathbf{F}_{m}\boldsymbol{% \Sigma_{{\mathbf{p}_{i}}}}\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}-\sum_{i=1}^{N}{% {{\mathbf{p}_{i}}}}^{T}\mathbf{F}_{m}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}% \mathbf{F}_{n}^{T}\mathbf{q}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_q
i=1N𝐪T𝐅m𝚺𝐩𝒊𝐅nT𝐩i+i=1N𝐪T𝐅m𝚺𝐩𝒊𝐅nT𝐪superscriptsubscript𝑖1𝑁superscript𝐪𝑇subscript𝐅𝑚subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖superscriptsubscript𝑖1𝑁superscript𝐪𝑇subscript𝐅𝑚subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝑛𝑇𝐪\displaystyle-\sum_{i=1}^{N}\mathbf{q}^{T}\mathbf{F}_{m}\boldsymbol{\Sigma_{{% \mathbf{p}_{i}}}}\mathbf{F}_{n}^{T}{{{\mathbf{p}_{i}}}}+\sum_{i=1}^{N}\mathbf{% q}^{T}\mathbf{F}_{m}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{F}_{n}^{T}% \mathbf{q}- ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_q start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_q start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_q

We would like to leverage statistics of all points to get rid of the dependency on the value of each point individually. To achieve this goal, we introduce two lemmas below.

Lemma 1

Given a standard orthogonal basis in 3superscript3\mathbb{R}^{3}blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT: 𝐞1=[1,0,0]T,𝐞2=[0,1,0]T,𝐞3=[0,0,1]Tformulae-sequencesubscript𝐞1superscript100𝑇formulae-sequencesubscript𝐞2superscript010𝑇subscript𝐞3superscript001𝑇\mathbf{e}_{1}=\left[1,0,0\right]^{T},\mathbf{e}_{2}=\left[0,1,0\right]^{T},% \mathbf{e}_{3}=\left[0,0,1\right]^{T}bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = [ 1 , 0 , 0 ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = [ 0 , 1 , 0 ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = [ 0 , 0 , 1 ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT 𝐀3×3for-all𝐀superscript33\forall\mathbf{A}\in\mathbb{R}^{3\times 3}∀ bold_A ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT , each element of 𝐀𝐀\mathbf{A}bold_A can be represented as 𝐚ij=𝐞iT𝐀𝐞jsubscript𝐚𝑖𝑗superscriptsubscript𝐞𝑖𝑇subscript𝐀𝐞𝑗\mathbf{a}_{ij}=\mathbf{e}_{i}^{T}\mathbf{A}\mathbf{e}_{j}bold_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = bold_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Ae start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ,where i,j{1,2,3}𝑖𝑗123i,j\in\{1,2,3\}italic_i , italic_j ∈ { 1 , 2 , 3 }

Lemma 2

Given 𝐱𝐱\mathbf{x}bold_x, 𝐲3𝐲superscript3\mathbf{y}\in\mathbb{R}^{3}bold_y ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT, 𝐀3×3𝐀superscript33\mathbf{A}\in\mathbb{R}^{3\times 3}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT, then 𝐱T𝐀𝐲superscript𝐱𝑇𝐀𝐲\mathbf{x}^{T}\mathbf{A}\mathbf{y}\in\mathbb{R}bold_x start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Ay ∈ blackboard_R, 𝐲𝐱T𝐀3×3superscript𝐲𝐱𝑇𝐀superscript33\mathbf{y}\mathbf{x}^{T}\mathbf{A}\in\mathbb{R}^{3\times 3}bold_yx start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_A ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT and 𝐱T𝐀𝐲=tr(𝐱T𝐀𝐲)=tr(𝐲𝐱T𝐀)superscript𝐱𝑇𝐀𝐲trsuperscript𝐱𝑇𝐀𝐲trsuperscript𝐲𝐱𝑇𝐀\mathbf{x}^{T}\mathbf{A}\mathbf{y}=\operatorname{tr}(\mathbf{x}^{T}\mathbf{A}% \mathbf{y})=\operatorname{tr}(\mathbf{y}\mathbf{x}^{T}\mathbf{A})bold_x start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Ay = roman_tr ( bold_x start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Ay ) = roman_tr ( bold_yx start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_A ), where tr(.)\operatorname{tr}(.)roman_tr ( . ) is the trace of the matrix.

Let bmnppsuperscriptsubscript𝑏𝑚𝑛𝑝𝑝b_{mn}^{pp}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT be the first term of bmnsubscript𝑏𝑚𝑛b_{mn}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT in (12), i.e.,

bmnpp=i=1N𝐩iT𝐅m𝚺𝐩𝒊𝐅nT𝐩isuperscriptsubscript𝑏𝑚𝑛𝑝𝑝superscriptsubscript𝑖1𝑁superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝚺subscript𝐩𝒊superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖b_{mn}^{pp}=\sum_{i=1}^{N}{{{\mathbf{p}_{i}}}}^{T}\mathbf{F}_{m}\boldsymbol{% \Sigma_{{\mathbf{p}_{i}}}}\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT

Below we show how to update bmnppsuperscriptsubscript𝑏𝑚𝑛𝑝𝑝b_{mn}^{pp}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT cumulatively. Other terms of bmnsubscript𝑏𝑚𝑛b_{mn}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT can be computed similarly.

Let 𝐯im=𝐅mT𝐩i3×1superscriptsubscript𝐯𝑖𝑚superscriptsubscript𝐅𝑚𝑇subscript𝐩𝑖superscript31\mathbf{v}_{i}^{m}=\mathbf{F}_{m}^{T}{{\mathbf{p}_{i}}}\in\mathbb{R}^{3\times 1}bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT = bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 1 end_POSTSUPERSCRIPT , 𝐃i=𝚺𝐩𝒊3×3subscript𝐃𝑖subscript𝚺subscript𝐩𝒊superscript33\mathbf{D}_{i}=\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\in\mathbb{R}^{3\times 3}bold_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT, 𝐯in=𝐅nT𝐩i3×1superscriptsubscript𝐯𝑖𝑛superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖superscript31\mathbf{v}_{i}^{n}=\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}\in\mathbb{R}^{3\times 1}bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 1 end_POSTSUPERSCRIPT.

According to Lemma 1, each element of 𝐯imsuperscriptsubscript𝐯𝑖𝑚\mathbf{v}_{i}^{m}bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, 𝐯insuperscriptsubscript𝐯𝑖𝑛\mathbf{v}_{i}^{n}bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝐃isubscript𝐃𝑖\mathbf{D}_{i}bold_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT can be explicitly represented by the basis.

𝐯im=[𝐞1T𝐅mT𝐩i𝐞2T𝐅mT𝐩i𝐞3T𝐅mT𝐩i]𝐯in=[𝐞1T𝐅nT𝐩i𝐞2T𝐅nT𝐩i𝐞3T𝐅nT𝐩i]formulae-sequencesuperscriptsubscript𝐯𝑖𝑚delimited-[]superscriptsubscript𝐞1𝑇superscriptsubscript𝐅𝑚𝑇subscript𝐩𝑖superscriptsubscript𝐞2𝑇superscriptsubscript𝐅𝑚𝑇subscript𝐩𝑖superscriptsubscript𝐞3𝑇superscriptsubscript𝐅𝑚𝑇subscript𝐩𝑖superscriptsubscript𝐯𝑖𝑛delimited-[]superscriptsubscript𝐞1𝑇superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖superscriptsubscript𝐞2𝑇superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖superscriptsubscript𝐞3𝑇superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖\displaystyle\begin{split}\mathbf{v}_{i}^{m}&=\left[\begin{array}[]{c}\mathbf{% e}_{1}^{T}\mathbf{F}_{m}^{T}{{\mathbf{p}_{i}}}\\ \mathbf{e}_{2}^{T}\mathbf{F}_{m}^{T}{{\mathbf{p}_{i}}}\\ \mathbf{e}_{3}^{T}\mathbf{F}_{m}^{T}{{\mathbf{p}_{i}}}\end{array}\right]\qquad% \mathbf{v}_{i}^{n}=\left[\begin{array}[]{c}\mathbf{e}_{1}^{T}\mathbf{F}_{n}^{T% }{{\mathbf{p}_{i}}}\\ \mathbf{e}_{2}^{T}\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}\\ \mathbf{e}_{3}^{T}\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}\end{array}\right]\end{split}start_ROW start_CELL bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_CELL start_CELL = [ start_ARRAY start_ROW start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = [ start_ARRAY start_ROW start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] end_CELL end_ROW (13)
𝐃i=[𝐞1T𝚺𝐩𝒊𝐞1𝐞1T𝚺𝐩𝒊𝐞2𝐞1T𝚺𝐩𝒊𝐞3𝐞2T𝚺𝐩𝒊𝐞1𝐞2T𝚺𝐩𝒊𝐞2𝐞2T𝚺𝐩𝒊𝐞3𝐞3T𝚺𝐩𝒊𝐞1𝐞3T𝚺𝐩𝒊𝐞2𝐞3T𝚺𝐩𝒊𝐞3]subscript𝐃𝑖delimited-[]superscriptsubscript𝐞1𝑇subscript𝚺subscript𝐩𝒊subscript𝐞1superscriptsubscript𝐞1𝑇subscript𝚺subscript𝐩𝒊subscript𝐞2superscriptsubscript𝐞1𝑇subscript𝚺subscript𝐩𝒊subscript𝐞3superscriptsubscript𝐞2𝑇subscript𝚺subscript𝐩𝒊subscript𝐞1superscriptsubscript𝐞2𝑇subscript𝚺subscript𝐩𝒊subscript𝐞2superscriptsubscript𝐞2𝑇subscript𝚺subscript𝐩𝒊subscript𝐞3superscriptsubscript𝐞3𝑇subscript𝚺subscript𝐩𝒊subscript𝐞1superscriptsubscript𝐞3𝑇subscript𝚺subscript𝐩𝒊subscript𝐞2superscriptsubscript𝐞3𝑇subscript𝚺subscript𝐩𝒊subscript𝐞3\displaystyle\begin{split}\mathbf{D}_{i}&=\left[\begin{array}[]{lll}\mathbf{e}% _{1}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{1}&\mathbf{e}_{1}^{T% }\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{2}&\mathbf{e}_{1}^{T}% \boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{3}\\ \mathbf{e}_{2}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{1}&\mathbf% {e}_{2}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{2}&\mathbf{e}_{2}% ^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{3}\\ \mathbf{e}_{3}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{1}&\mathbf% {e}_{3}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{2}&\mathbf{e}_{3}% ^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{3}\end{array}\right]\end% {split}start_ROW start_CELL bold_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL = [ start_ARRAY start_ROW start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARRAY ] end_CELL end_ROW (14)

Substituting (13-14) in bmnppsuperscriptsubscript𝑏𝑚𝑛𝑝𝑝b_{mn}^{pp}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT and expanding it yields a more concise form:

bmnppsuperscriptsubscript𝑏𝑚𝑛𝑝𝑝\displaystyle b_{mn}^{pp}italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT =i=1N𝐯imT𝐃i𝐯inabsentsuperscriptsubscript𝑖1𝑁superscriptsubscript𝐯𝑖𝑚𝑇subscript𝐃𝑖superscriptsubscript𝐯𝑖𝑛\displaystyle=\sum_{i=1}^{N}\mathbf{v}_{i}^{mT}\mathbf{D}_{i}\mathbf{v}_{i}^{n}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m italic_T end_POSTSUPERSCRIPT bold_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT (15)
=i=1Nj=13k=13𝐩iT𝐅m𝐞j𝐞jT𝚺𝐩𝒊𝐞k𝐞kT𝐅nT𝐩iabsentsuperscriptsubscript𝑖1𝑁superscriptsubscript𝑗13superscriptsubscript𝑘13superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖\displaystyle=\sum_{i=1}^{N}\sum_{j=1}^{3}\sum_{k=1}^{3}{{\mathbf{p}_{i}}}^{T}% \mathbf{F}_{m}\mathbf{e}_{j}\mathbf{e}_{j}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_% {i}}}}\mathbf{e}_{k}\mathbf{e}_{k}^{T}\mathbf{F}_{n}^{T}{{\mathbf{p}_{i}}}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT

According to the trace trick in Lemma 2 and the associative property of matrix multiplication, (15) can be further transformed as follows.

bmnpp=superscriptsubscript𝑏𝑚𝑛𝑝𝑝absent\displaystyle b_{mn}^{pp}=italic_b start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_p end_POSTSUPERSCRIPT = i=1Nj=13k=13tr(𝐩iT𝐅m𝐞j𝐞jT𝚺𝐩𝒊𝐞k𝐞kT𝐅nT𝐩i)superscriptsubscript𝑖1𝑁superscriptsubscript𝑗13superscriptsubscript𝑘13trsuperscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇subscript𝐩𝑖\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{3}\sum_{k=1}^{3}\operatorname{tr}\left(% {{\mathbf{p}_{i}}}^{T}\mathbf{F}_{m}\mathbf{e}_{j}\mathbf{e}_{j}^{T}% \boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{k}\mathbf{e}_{k}^{T}\mathbf{% F}_{n}^{T}{{\mathbf{p}_{i}}}\right)∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_tr ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (16)
=\displaystyle== i=1Nj=13k=13tr(𝐩i𝐩iT𝐅m𝐞j𝐞jT𝚺𝐩𝒊𝐞k𝐞kT𝐅nT)superscriptsubscript𝑖1𝑁superscriptsubscript𝑗13superscriptsubscript𝑘13trsubscript𝐩𝑖superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{3}\sum_{k=1}^{3}\operatorname{tr}\left(% {{\mathbf{p}_{i}}}{{\mathbf{p}_{i}}}^{T}\mathbf{F}_{m}\mathbf{e}_{j}\mathbf{e}% _{j}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{k}\mathbf{e}_{k}^{T}% \mathbf{F}_{n}^{T}\right)∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_tr ( bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT )
=\displaystyle== i=1Nj=13k=13tr(𝐞jT𝚺𝐩𝒊𝐞k𝐩i𝐩iT𝐅m𝐞j𝐞kT𝐅nT)superscriptsubscript𝑖1𝑁superscriptsubscript𝑗13superscriptsubscript𝑘13trsuperscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘subscript𝐩𝑖superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{3}\sum_{k=1}^{3}\operatorname{tr}\left(% \mathbf{e}_{j}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{k}{{% \mathbf{p}_{i}}}{{\mathbf{p}_{i}}}^{T}\mathbf{F}_{m}\mathbf{e}_{j}\mathbf{e}_{% k}^{T}\mathbf{F}_{n}^{T}\right)∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_tr ( bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT )
=\displaystyle== j=13k=13tr(i=1N(𝐞jT𝚺𝐩𝒊𝐞k𝐩i𝐩iT)𝐅m𝐞j𝐞kT𝐅nT)superscriptsubscript𝑗13superscriptsubscript𝑘13trsuperscriptsubscript𝑖1𝑁superscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘subscript𝐩𝑖superscriptsubscript𝐩𝑖𝑇subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇\displaystyle\sum_{j=1}^{3}\sum_{k=1}^{3}\operatorname{tr}\left(\sum_{i=1}^{N}% \left(\mathbf{e}_{j}^{T}\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}\mathbf{e}_{k}{{% \mathbf{p}_{i}}}{{\mathbf{p}_{i}}}^{T}\right)\mathbf{F}_{m}\mathbf{e}_{j}% \mathbf{e}_{k}^{T}\mathbf{F}_{n}^{T}\right)∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_tr ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT )
=\displaystyle== j=13k=13tr(𝐗j,k𝐅m𝐞j𝐞kT𝐅nT)superscriptsubscript𝑗13superscriptsubscript𝑘13trsubscript𝐗𝑗𝑘subscript𝐅𝑚subscript𝐞𝑗superscriptsubscript𝐞𝑘𝑇superscriptsubscript𝐅𝑛𝑇\displaystyle\sum_{j=1}^{3}\sum_{k=1}^{3}\operatorname{tr}\left(\mathbf{X}_{j,% k}\mathbf{F}_{m}\mathbf{e}_{j}\mathbf{e}_{k}^{T}\mathbf{F}_{n}^{T}\right)∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_tr ( bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT )

where

𝐗j,k=i=1N𝐞jT𝚺𝐩𝒊𝐞k𝐩i𝐩iTj,k{1,2,3}formulae-sequencesubscript𝐗𝑗𝑘superscriptsubscript𝑖1𝑁superscriptsubscript𝐞𝑗𝑇subscript𝚺subscript𝐩𝒊subscript𝐞𝑘subscript𝐩𝑖superscriptsubscript𝐩𝑖𝑇for-all𝑗𝑘123\mathbf{X}_{j,k}=\sum_{i=1}^{N}\mathbf{e}_{j}^{T}\boldsymbol{\Sigma_{{\mathbf{% p}_{i}}}}\mathbf{e}_{k}{{\mathbf{p}_{i}}}{{\mathbf{p}_{i}}}^{T}\quad\forall j,% k\in\{1,2,3\}bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∀ italic_j , italic_k ∈ { 1 , 2 , 3 } (17)

It is noteworthy that terms related to points, specifically 𝐩isubscript𝐩𝑖\mathbf{p}_{i}bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and its covariance matrix 𝚺𝐩𝒊subscript𝚺subscript𝐩𝒊\boldsymbol{\Sigma_{{\mathbf{p}_{i}}}}bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT, can be incrementally accumulated in 𝐗j,ksubscript𝐗𝑗𝑘\mathbf{X}_{j,k}bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT, thanks to 𝐗j,ksubscript𝐗𝑗𝑘\mathbf{X}_{j,k}bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT being independent of 𝐅msubscript𝐅𝑚\mathbf{F}_{m}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT. Furthermore, in (6), 𝐅msubscript𝐅𝑚\mathbf{F}_{m}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is computed from the singular vectors of the covariance matrix 𝐀𝐀\mathbf{A}bold_A, which has a trivial cumulative update formula in (9). Similarly, all terms in 𝚺𝐧,𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}\ bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPTcan be updated cumulatively with three statistics 𝐗j,k,𝐘jsubscript𝐗𝑗𝑘subscript𝐘𝑗\mathbf{X}_{j,k},\mathbf{Y}_{j}bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT , bold_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and 𝐙𝐙\mathbf{Z}bold_Z.

𝐘j=i=1N𝚺𝐩𝒊𝐞j𝐩iTj{1,2,3}formulae-sequencesubscript𝐘𝑗superscriptsubscript𝑖1𝑁subscript𝚺subscript𝐩𝒊subscript𝐞𝑗superscriptsubscript𝐩𝑖𝑇for-all𝑗123\displaystyle\begin{split}\mathbf{Y}_{j}&=\sum_{i=1}^{N}\boldsymbol{\Sigma_{{% \mathbf{p}_{i}}}}\mathbf{e}_{j}{{\mathbf{p}_{i}}}^{T}\quad\qquad\forall j\in\{% 1,2,3\}\end{split}start_ROW start_CELL bold_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_CELL start_CELL = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∀ italic_j ∈ { 1 , 2 , 3 } end_CELL end_ROW (18)
𝐙=i=1N𝚺𝐩𝒊𝐙superscriptsubscript𝑖1𝑁subscript𝚺subscript𝐩𝒊\displaystyle\begin{split}\mathbf{Z}&=\sum_{i=1}^{N}\boldsymbol{\Sigma_{{% \mathbf{p}_{i}}}}\end{split}start_ROW start_CELL bold_Z end_CELL start_CELL = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW (19)

Given a new point 𝐩isubscript𝐩𝑖\mathbf{p}_{i}bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and its covariance matrix 𝚺𝐩𝒊subscript𝚺subscript𝐩𝒊\boldsymbol{\Sigma_{\mathbf{p}_{i}}}bold_Σ start_POSTSUBSCRIPT bold_p start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT, all three statistics can be updated cumulatively.

IV-C Complexity Analysis

As 3×3333\times 33 × 3 symmetric matrices, only the upper-triangles of 𝐗j,ksubscript𝐗𝑗𝑘\mathbf{X}_{j,k}bold_X start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT and 𝐙𝐙\mathbf{Z}bold_Z need to be stored, each with six scalars respectively. Given that 𝐘jsubscript𝐘𝑗\mathbf{Y}_{j}bold_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is asymmetric, nine scalars are needed. Considering the number of matrices for each statistic, we can store all statistic matrices for a plane’s uncertainty in a total of 6×6+3×9+6=6966396696\times 6+3\times 9+6=696 × 6 + 3 × 9 + 6 = 69 scalars.

In contrast, non-cumulative methods like VoxelMap[3] store each point in a 3D vector and a 3×3333\times 33 × 3 covariance matrix, resulting in a space complexity of O(N)𝑂𝑁O(N)italic_O ( italic_N ) where N𝑁Nitalic_N is the number of points. Our compact representation has a constant O(1)𝑂1O(1)italic_O ( 1 ) space complexity regardless of the number of points.

Due to the re-computation in plane uncertainty update, non-cumulative methods have a time complexity of O(MN)𝑂𝑀𝑁O(MN)italic_O ( italic_M italic_N ) where M𝑀Mitalic_M is the number of iterations. Our cumulative method reduces the time complexity to O(N)𝑂𝑁O(N)italic_O ( italic_N ), making the updates much more efficient and scalable.

V On-demand Voxel Merging

The cumulative update strategy reduces the computation and storage for each voxel. Further improvement is achieved by reducing the number of voxels through merging. To reduce the overhead of searching for coalescible voxels in a brute-force manner, we design a locality-sensitive hash to aggregate voxels aligning on similar physical planes (Section V-A). We propose the merge procedure and criterion to ensure merging consistency (Section V-B). Moreover, we analyze the accuracy improvement of plane estimation due to voxel merging (Section V-C).

Refer to caption
Figure 2: Locality-sensitive hash keys are comprised of five parameters: θ𝜃\thetaitalic_θ, φ𝜑\varphiitalic_φ are spherical coordinates of normal 𝐧𝐧\mathbf{n}bold_n, d𝑑ditalic_d is distance from origin to the plane, u𝑢uitalic_u and v𝑣vitalic_v are projection coordinates of a voxel on the plane.

V-A Locality-Sensitive Hash

To hash voxel planes according to their proximity in the parameter space, we employ [θ,φ,d]𝜃𝜑𝑑[\theta,\varphi,d][ italic_θ , italic_φ , italic_d ] for a representation with minimum DoF, where azimuthal angle θ𝜃\thetaitalic_θ and polar angle φ𝜑\varphiitalic_φ are spherical coordinates computed from the normal vector 𝐧𝐧\mathbf{n}bold_n, as in Fig. 2, and d𝑑ditalic_d is computed by

d=𝐧T𝐪𝑑superscript𝐧𝑇𝐪d=-\mathbf{n}^{T}\mathbf{q}italic_d = - bold_n start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_q (20)

Given that a physical plane has limited area in the real world, voxel planes with similar parameters θ,φ,d𝜃𝜑𝑑\theta,\varphi,ditalic_θ , italic_φ , italic_d but distributed far away might be associated with different real-world features and should not be merged. To accommodate this property, we introduce a proximity coordinates [u,v]𝑢𝑣[u,v][ italic_u , italic_v ] computed as follows:

[uv]=[𝐞1T𝐑T𝐪𝐞2T𝐑T𝐪]delimited-[]𝑢𝑣delimited-[]superscriptsubscript𝐞1𝑇superscript𝐑𝑇𝐪superscriptsubscript𝐞2𝑇superscript𝐑𝑇𝐪\left[\begin{array}[]{c}u\\ v\end{array}\right]=\left[\begin{array}[]{c}\mathbf{e}_{1}^{T}\mathbf{R}^{T}% \mathbf{q}\\ \mathbf{e}_{2}^{T}\mathbf{R}^{T}\mathbf{q}\end{array}\right][ start_ARRAY start_ROW start_CELL italic_u end_CELL end_ROW start_ROW start_CELL italic_v end_CELL end_ROW end_ARRAY ] = [ start_ARRAY start_ROW start_CELL bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_R start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_q end_CELL end_ROW start_ROW start_CELL bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_R start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_q end_CELL end_ROW end_ARRAY ] (21)

where 𝐑𝐑\mathbf{R}bold_R indicates the rotation transform from global frame to the plane as computed in (22), making z𝑧zitalic_z-axis aligns with the direction of 𝐧𝐧\mathbf{n}bold_n, and 𝐞1subscript𝐞1\mathbf{e}_{1}bold_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, 𝐞2subscript𝐞2\mathbf{e}_{2}bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are defined in Lemma 1.

𝐑=[cosθsinθ0sinθcosθ0001][cosφ0sinφ010sinφ0cosφ]𝐑delimited-[]𝑐𝑜𝑠𝜃𝑠𝑖𝑛𝜃0𝑠𝑖𝑛𝜃𝑐𝑜𝑠𝜃0001delimited-[]𝑐𝑜𝑠𝜑0𝑠𝑖𝑛𝜑010𝑠𝑖𝑛𝜑0𝑐𝑜𝑠𝜑\mathbf{R}=\left[\begin{array}[]{ccc}cos\theta&-sin\theta&0\\ sin\theta&cos\theta&0\\ 0&0&1\end{array}\right]\left[\begin{array}[]{ccc}cos\varphi&0&sin\varphi\\ 0&1&0\\ -sin\varphi&0&cos\varphi\end{array}\right]bold_R = [ start_ARRAY start_ROW start_CELL italic_c italic_o italic_s italic_θ end_CELL start_CELL - italic_s italic_i italic_n italic_θ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL italic_s italic_i italic_n italic_θ end_CELL start_CELL italic_c italic_o italic_s italic_θ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARRAY ] [ start_ARRAY start_ROW start_CELL italic_c italic_o italic_s italic_φ end_CELL start_CELL 0 end_CELL start_CELL italic_s italic_i italic_n italic_φ end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - italic_s italic_i italic_n italic_φ end_CELL start_CELL 0 end_CELL start_CELL italic_c italic_o italic_s italic_φ end_CELL end_ROW end_ARRAY ] (22)

A voxel plane is hashed into a bucket indexed by 𝐤5𝐤superscript5\mathbf{k}\in\mathbb{R}^{5}bold_k ∈ blackboard_R start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT.

𝐤=(θΔθ,φΔφ,dΔd,uΔu,vΔv)T𝐤superscript𝜃Δ𝜃𝜑Δ𝜑𝑑Δ𝑑𝑢Δ𝑢𝑣Δ𝑣𝑇\mathbf{k}=\left(\lfloor\frac{\theta}{\Delta\theta}\rfloor,\lfloor\frac{% \varphi}{\Delta\varphi}\rfloor,\lfloor\frac{d}{\Delta d}\rfloor,\lfloor\frac{u% }{\Delta u}\rfloor,\lfloor\frac{v}{\Delta v}\rfloor\right)^{T}bold_k = ( ⌊ divide start_ARG italic_θ end_ARG start_ARG roman_Δ italic_θ end_ARG ⌋ , ⌊ divide start_ARG italic_φ end_ARG start_ARG roman_Δ italic_φ end_ARG ⌋ , ⌊ divide start_ARG italic_d end_ARG start_ARG roman_Δ italic_d end_ARG ⌋ , ⌊ divide start_ARG italic_u end_ARG start_ARG roman_Δ italic_u end_ARG ⌋ , ⌊ divide start_ARG italic_v end_ARG start_ARG roman_Δ italic_v end_ARG ⌋ ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (23)

where \lfloor\cdot\rfloor⌊ ⋅ ⌋ rounds a number down to the nearest integer; ΔθΔ𝜃\Delta\thetaroman_Δ italic_θ, ΔφΔ𝜑\Delta\varphiroman_Δ italic_φ, ΔdΔ𝑑\Delta droman_Δ italic_d, ΔuΔ𝑢\Delta uroman_Δ italic_u and ΔvΔ𝑣\Delta vroman_Δ italic_v are the widths of the bucket along each dimension. Voxels with similar plane orientation and proximate to each other are likely to be hashed into the same bucket.

V-B Voxel Merging

When a hash bucket accumulates enough voxels, a merge operation is triggered as shown in Fig. 3. Given multiple voxels in a bucket to be merged, we first find the voxel with the most points and then merge other voxels into this one by testing them against the merge criterion. This is because the uncertainty of a plane is gradually reduced with point accumulation. By taking the voxel with the most points as the reference, we are more confident in the merging correctness.

Refer to caption
Figure 3: Illustration of voxel merging. Through LSH, voxel planes proximate to each other in the parameter space are hashed into the same bucket. A merge operation is triggered once a bucket accumulates enough voxels.

To avoid merging voxels associated with different physical planes by error, we check the following merging criterion. Specifically, we require the eigenvalues of the covariance matrix 𝐀𝐀\mathbf{A}bold_A satisfy the planar assumption, i.e.,

λ3λ1<ηλ3λ2<ηformulae-sequencesubscript𝜆3subscript𝜆1𝜂subscript𝜆3subscript𝜆2𝜂\frac{\lambda_{3}}{\lambda_{1}}<\eta\qquad\frac{\lambda_{3}}{\lambda_{2}}<\etadivide start_ARG italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_ARG start_ARG italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG < italic_η divide start_ARG italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_ARG start_ARG italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG < italic_η (24)

where λ1,λ2,λ3subscript𝜆1subscript𝜆2subscript𝜆3\lambda_{1},\lambda_{2},\lambda_{3}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are eigenvalues of 𝐀𝐀\mathbf{A}bold_A in the descending order and η𝜂\etaitalic_η is a sufficient small threshold closed to zero.

Once merged with the reference voxel of a bucket, the parameters of the merged voxels, i.e., 𝐧𝐧\mathbf{n}bold_n, 𝐪𝐪\mathbf{q}bold_q and 𝚺𝐧,𝐪subscript𝚺𝐧𝐪\boldsymbol{\Sigma}_{\mathbf{n},\mathbf{q}}\ bold_Σ start_POSTSUBSCRIPT bold_n , bold_q end_POSTSUBSCRIPT, as well as the memory allocated for them, are released. In contrast, the memory of a reference voxel is shared by other voxels merged into it, leading to a considerable saving of memory footprint. Future points falling into any of these merged voxels cumulatively update the reference voxel. Thanks to the cumulative update approach presented in Section IV-B, the statistics term of each voxel can be summed trivially without iterating the vast number of points for the large plane being merged.

The voxel merging strategy enables a flexible representation of physical planes larger than the voxel dimension with only a single voxel. Consequently, the challenge of bias-variance trade-off in terms of voxel size detailed in Section I can be addressed.

V-C Accuracy Analysis

Voxel merging enhances mapping accuracy thanks to cross-voxel denoising. We provide a theoretic analysis in this section.

Given two voxels with point set \mathbb{P}blackboard_P and point set \mathbb{Q}blackboard_Q respectively. Both sets are sampled from the same physical plane. Suppose \mathbb{P}blackboard_P and \mathbb{Q}blackboard_Q contains M𝑀Mitalic_M and N𝑁Nitalic_N points following normal distribution:

={𝐩1,𝐩2,,𝐩M}𝐩i𝒩(𝝁p,𝚺p)formulae-sequencesubscript𝐩1subscript𝐩2subscript𝐩𝑀similar-tofor-allsubscript𝐩𝑖𝒩subscript𝝁𝑝subscript𝚺𝑝\mathbb{P}=\{\mathbf{p}_{1},\mathbf{p}_{2},...,\mathbf{p}_{M}\}\quad\forall% \mathbf{p}_{i}\sim\mathcal{N}(\boldsymbol{\mu}_{p},\boldsymbol{\Sigma}_{p})blackboard_P = { bold_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_p start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT } ∀ bold_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ caligraphic_N ( bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , bold_Σ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) (25)
={𝐪1,𝐪2,,𝐪N}𝐪i𝒩(𝝁q,𝚺q)formulae-sequencesubscript𝐪1subscript𝐪2subscript𝐪𝑁similar-tofor-allsubscript𝐪𝑖𝒩subscript𝝁𝑞subscript𝚺𝑞\mathbb{Q}=\{\mathbf{q}_{1},\mathbf{q}_{2},...,\mathbf{q}_{N}\}\quad\forall% \mathbf{q}_{i}\sim\mathcal{N}(\boldsymbol{\mu}_{q},\boldsymbol{\Sigma}_{q})blackboard_Q = { bold_q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT } ∀ bold_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ caligraphic_N ( bold_italic_μ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT , bold_Σ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) (26)

According to Section IV-B, the mean 𝝁pqsubscript𝝁𝑝𝑞\boldsymbol{\mu}_{pq}bold_italic_μ start_POSTSUBSCRIPT italic_p italic_q end_POSTSUBSCRIPT and covariance matrix 𝚺pqsubscript𝚺𝑝𝑞\boldsymbol{\Sigma}_{pq}bold_Σ start_POSTSUBSCRIPT italic_p italic_q end_POSTSUBSCRIPT after merging \mathbb{P}blackboard_P and \mathbb{Q}blackboard_Q can be updated cumulatively:

𝝁pq=t𝝁p+(1t)𝝁q,t=MM+Nformulae-sequencesubscript𝝁𝑝𝑞𝑡subscript𝝁𝑝1𝑡subscript𝝁𝑞𝑡𝑀𝑀𝑁\boldsymbol{\mu}_{pq}=t\boldsymbol{\mu}_{p}+(1-t)\boldsymbol{\mu}_{q},\quad t=% \frac{M}{M+N}bold_italic_μ start_POSTSUBSCRIPT italic_p italic_q end_POSTSUBSCRIPT = italic_t bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT + ( 1 - italic_t ) bold_italic_μ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT , italic_t = divide start_ARG italic_M end_ARG start_ARG italic_M + italic_N end_ARG (27)
𝚺pq=t𝚺p+(1t)𝚺q+t(1t)(𝝁p𝝁q)(𝝁p𝝁𝒒)Tsubscript𝚺𝑝𝑞𝑡subscript𝚺𝑝1𝑡subscript𝚺𝑞𝑡1𝑡subscript𝝁𝑝subscript𝝁𝑞superscriptsubscript𝝁𝑝subscript𝝁𝒒𝑇\boldsymbol{\Sigma}_{pq}=t\boldsymbol{\Sigma}_{p}+(1-t)\boldsymbol{\Sigma}_{q}% +t(1-t)(\boldsymbol{\mu}_{p}-\boldsymbol{\mu}_{q})(\boldsymbol{\mu}_{p}-% \boldsymbol{\mu_{q}})^{T}bold_Σ start_POSTSUBSCRIPT italic_p italic_q end_POSTSUBSCRIPT = italic_t bold_Σ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT + ( 1 - italic_t ) bold_Σ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT + italic_t ( 1 - italic_t ) ( bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT - bold_italic_μ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) ( bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT - bold_italic_μ start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT (28)

where t𝚺p+(1t)𝚺q𝑡subscript𝚺𝑝1𝑡subscript𝚺𝑞t\boldsymbol{\Sigma}_{p}+(1-t)\boldsymbol{\Sigma}_{q}italic_t bold_Σ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT + ( 1 - italic_t ) bold_Σ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT is linear combination of sample covariance weighted by points, and 𝝁p𝝁qsubscript𝝁𝑝subscript𝝁𝑞\boldsymbol{\mu}_{p}-\boldsymbol{\mu}_{q}bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT - bold_italic_μ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT is a vector pointing from the center of \mathbb{Q}blackboard_Q to the center of \mathbb{P}blackboard_P. Here we introduce two lemmas to analyze why voxel merging benefits the accuracy.

Lemma 3

Given a non-zero vector 𝐯n𝐯superscript𝑛\mathbf{v}\in\mathbb{R}^{n}bold_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, the matrix 𝐯𝐯Tsuperscript𝐯𝐯𝑇\mathbf{v}\mathbf{v}^{T}bold_vv start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT has only one non-zero eigenvalue λ=𝐯2𝜆superscriptdelimited-∥∥𝐯2\lambda=\lVert\mathbf{v}\rVert^{2}italic_λ = ∥ bold_v ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT with the corresponding eigenvector 𝐯𝐯\mathbf{v}bold_v, where delimited-∥∥\lVert\cdot\rVert∥ ⋅ ∥ denotes the Euclidean norm.

Lemma 4

For a symmetric matrix 𝐒=𝐯𝐯T𝐒superscript𝐯𝐯𝑇\mathbf{S}=\mathbf{v}\mathbf{v}^{T}bold_S = bold_vv start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, where 𝐯𝐯\mathbf{v}bold_v is a non-zero vector in nsuperscript𝑛\mathbb{R}^{n}blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, the eigenvectors and singular vectors of 𝐒𝐒\mathbf{S}bold_S are identical. The principal component of 𝐒𝐒\mathbf{S}bold_S computed using Principal Component Analysis is in the direction of 𝐯𝐯\mathbf{v}bold_v.

According to Lemma 3 and 4, apart from the sample covariance 𝚺psubscript𝚺𝑝\boldsymbol{\Sigma}_{p}bold_Σ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT and 𝚺qsubscript𝚺𝑞\boldsymbol{\Sigma}_{q}bold_Σ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT, 𝚺pqsubscript𝚺𝑝𝑞\boldsymbol{\Sigma}_{pq}bold_Σ start_POSTSUBSCRIPT italic_p italic_q end_POSTSUBSCRIPT in (28) increases along the direction of 𝝁p𝝁qsubscript𝝁𝑝subscript𝝁𝑞\boldsymbol{\mu}_{p}-\boldsymbol{\mu}_{q}bold_italic_μ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT - bold_italic_μ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT, which is expected to be on the physical plane. The increased variance on the plane enhances the magnitudes of the first and second principal components. As a result, the third principal component, i.e., the estimated normal direction in (4), gains increased robustness under the same noise level, as illustrated in Fig. 4.

Refer to caption
Figure 4: Illustration of accuracy enhancement thanks to voxel merging. The solid angle represents the robustness of plane estimation against noise. Merged voxel demonstrates a smaller angle than the original voxel under the same noise level.

VI Experiments

This section presents experiments on the accuracy and efficiency of C3P-VoxelMap against the state-of-the-art on benchmarks including KITTI[14], UTBM[15] and our self-collected data with a solid-state LiDAR Livox Mid360. The benchmarks cover both urban and indoor scenarios.

We compare our method with five direct LO/LIO methods, namely Fast-LIO2[6], Faster-LIO[5], VoxelMap[3], VoxelMap++[12] and LiTAMIN2[10]. A comparison of these approaches is shown in Table I. C3P-VoxelMap is built on [3] and publicly available on Github111https://github.com/deptrum/c3p-voxelmap.

VI-A Accuracy Evaluation

In this section, we evaluate our method on the odometry datasets of the KITTI Vision Benchmark[14] and UTBM[15]. Default parameters are used to evaluate open-sourced algorithms without extra tuning. Specifically, for VoxelMap and our proposed method, the maximum voxel size is 3m and the maximum octo-tree layer is 3, leading to a minimum voxel size of 0.375m. Since LiTAMIN2 is not open-sourced, we use results reported in the paper[10].

TABLE II: Accuracy (ATE In Meters) Comparison on UTBM Dateset
Approach 0713 0717 0719 0720 Average
(Length [m]) (5086) (4998) (4994) (5035) (5028)
Ours 9.71 10.28 13.68 10.84 11.09
VoxelMap 10.99 11.26 15.80 14.58 13.13
Faster-LIO 14.88 15.36 16.30 15.05 15.38
Fast-LIO2 13.11 14.98 16.18 15.45 14.9

The results of experiments on UTBM and KITTI are listed in Table II and III, respectively. Since KITTI datasets provide no IMU data and the LiDAR scans have already been undistorted, a constant motion model is employed for all methods. We utilize absolute trajectory error (ATE) defined in [16] as the metric to evaluate all methods. Lower ATE indicates higher odometry accuracy. The last column is the average ATE of all sequences weighted by the number of frames. It is shown that our method demonstrates around 20% higher accuracy than the state-of-the-art [3], especially on long sequences like KITTI-02 and UTBM-0720.

From Table II and III, Fast-LIO2 and Faster-LIO are similar in terms of accuracy, because both methods employ the same deterministic plane representation and the same measurement model in state estimation. This experiment indicates that the voxel structure does not contribute directly to the accuracy compared with the k-d tree structure. In contrast, VoxelMap[3] realizes higher accuracy by adopting probabilistic planes in the residual computation of IESKF. Our method further improves the accuracy with respect to VoxelMap thanks to the proposed on-demand merging.

For a better understanding of the accuracy improvement due to voxel merging, we visualize the merged voxels with distinct colors in Fig. 5. The large plane features, e.g., ground, walls, and ceilings, are merged successfully. Since the uncertainty of these planes is jointly determined by all merged voxels, a higher mapping accuracy is achieved than estimating the uncertainty of each voxel separately.

Refer to caption
Figure 5: Illustration of voxel planes: (a) w/o merge and (b) w/ merge. Voxel merge eliminates the wiggling of planes and unifies the poses of planes associated with the same physical plane.
TABLE III: Accuracy (ATE In Meters) Comparison on KITTI Odometry Training Sequences
Approach 00 01 02 03 04 05 06 07 08 09 10 Average
(Length [m]) (3724) (2453) (5067) (560) (393) (2205) (1232) (694) (3222) (1705) (919) (2016)
Ours 1.97 3.69 5.74 0.92 0.26 0.98 0.31 0.56 3.33 2.65 1.34 2.74
VoxelMap 3.34 4.52 10.88 0.95 0.18 1.10 0.30 0.71 2.77 1.84 1.34 3.95
VoxelMap++ - - - 1.39 0.41 - 0.33 0.55 - 1.40 1.71 -
Faster-LIO 5.016 19.495 9.692 0.977 0.397 1.661 2.142 1.018 4.307 2.326 1.866 5.25
Fast-LIO2 9.426 20.77 9.537 1.06 0.435 1.999 2.152 1.15 4.361 2.458 2.399 6.24
LiTAMIN2 5.8 15.9 10.7 0.8 0.7 2.4 0.9 0.6 2.5 2.1 1.0 5.1

Bold denotes the best accuracy for the case, “-” indicates a significant drift from the ground-truth.

TABLE IV: Time Evaluation (in milliseconds)
Map ID Ours (w/ merge) Ours (w/o merge) VoxelMap Faster-LIO Fast-LIO2
E M Total E M Total E M Total E M Total E M Total
KITTI 00 23.54 7.25 30.79 26.06 4.36 30.42 26.06 14.01 40.07 47.49 1.01 48.50 35.30 3.43 38.73
01 48.23 14.23 62.46 48.92 9.41 58.33 50.47 21.22 71.69 71.87 2.23 74.10 55.26 7.73 62.99
02 25.35 8.05 33.40 24.26 4.58 28.84 25.84 16.36 42.20 46.81 1.10 47.91 35.07 3.77 38.84
03 35.60 10.40 46.00 35.72 6.41 42.13 37.41 25.38 62.79 60.05 1.29 61.34 45.58 4.05 49.63
04 38.28 11.05 49.33 37.24 7.48 44.72 38.82 22.55 61.37 56.96 1.58 58.54 50.23 6.19 56.42
05 31.52 8.66 40.18 31.15 5.01 36.16 33.57 16.88 50.45 55.53 0.97 56.50 37.14 3.16 40.30
06 46.83 12.90 59.73 45.09 7.36 52.45 47.97 29.99 77.96 89.70 1.32 91.02 55.77 3.59 59.36
07 24.11 6.72 30.83 23.85 4.16 28.01 25.04 14.72 39.76 41.10 0.76 41.86 32.25 2.85 35.10
08 39.32 10.66 49.98 39.31 6.15 45.46 42.06 18.91 60.97 69.34 1.39 70.73 45.23 4.28 49.51
09 35.85 9.90 45.75 36.86 6.15 43.01 38.53 18.98 57.51 54.85 1.25 56.10 42.77 4.67 47.44
10 23.74 6.77 30.51 23.70 4.11 27.81 24.50 14.20 38.7 38.39 0.81 39.20 30.39 2.92 33.31
Avg. 31.37 9.06 40.43 31.59 5.38 36.97 33.18 17.66 50.84 55.58 1.17 56.75 39.78 3.91 43.69

“E” denotes the state estimation stage, “M” denotes the map update stage.

VI-B Efficiency Evaluation

Besides accuracy, we evaluate both the computation performance and the memory consumption of our algorithm against others. Experiments are conducted on a computer with Intel Core i5-12500H of 3.3GHz and 16GB RAM.

We notice some open-source implementations employ multi-threading techniques, e.g., TBB in Faster-LIO, OpenMP in Fast-LIO2 and VoxelMap. To compare the efficiency of the algorithm itself, we leave out the implementation-related performance gain from multi-threading. Thus, for a fair comparison, tests are performed on a single-threaded setup.

Time Efficiency: The performance results are listed in Table IV. All methods include two primary modules: state estimation and map update, and our C3P-VoxelMap outperforms others in the overall processing time.

For the state estimation stage, although the same IESKF framework is employed for all the tested methods, the difference in processing time cannot be ignored. As listed in Table IV, Faster-LIO consumes the longest 55.58 milliseconds in this stage as it requires a nearby voxel search which is time-consuming. Fast-LIO2 also spends 8.41 milliseconds more time than ours in indexing neighbor points in the k-d tree and re-estimate planes. In contrast, [3] and our C3P-VoxelMap obviate the need for NNS because each voxel contains an individual plane, which can be directly hashed in the voxel map. Thus, our method saves 20% and 43% time usage compared with Fast-LIO2 and Faster-LIO, respectively.

For the map update stage, thanks to our cumulative probabilistic update of voxel planes, the iteration of points and recalculation of Jacobians are eliminated and the time complexity is independent of the number of points. As a result, our C3P-VoxelMap runs 70%(w/o merge) and 49%(w/ merge) faster than the original VoxelMap as shown in Table IV. Since we not only update plane parameters but also update the uncertainty of the probabilistic plane, the time usage in our method takes longer than Fast-LIO2. However, our total processing time is 20% lower than VoxelMap and 7% lower than Fast-LIO2.

Memory Efficiency: We conduct two experiments to demonstrate the memory efficiency of our method against the others: First, we verify the constant complexity of cumulative probabilistic update. Second, we test the memory usage in both structured and unstructured environments.

To verify the constant space complexity against the number of voxel points therein, we adjust the maximum number of points N𝑁Nitalic_N stored in each voxel when comparing our method with the original VoxelMap[3]. If a voxel is updated by more than N𝑁Nitalic_N points, the plane uncertainty will not be updated anymore. We compare the memory consumption between the original VoxelMap and ours on KITTI’s 06 sequence under different voxel granularity: a single-layer voxel map with a fixed 2m voxel size, and a multi-layer voxel map with 3m root voxel and a 3-layer sub-voxel as used in VI-A.

Refer to caption
Figure 6: Comparison of memory usage between VoxelMap and ours.

As illustrated in Fig. 6, the multi-layer experiment consumes more memory than the single-layer setup as the granularity of the map is finer. However, compared to the linear complexity of the original VoxelMap, our method only occupies a constant memory that is independent of the number of points used by the voxel map, leading to constant memory usage.

TABLE V: Comparison of Memory Usage (in mb)
Approach 04 06 09 Indoor
(Length [m]) (393) (1232) (1705) (560)
C3P-VoxelMap 243.7 315.9 878.7 229.6
VoxelMap 521.0 1433.6 2150.4 1433.6

The comparison of the memory usage between C3P-VoxelMap and the original VoxelMap is listed in Table V. Besides urban environment tests on benchmark datasets, we also test our algorithm in an indoor environment with Livox Mid-360 LiDAR. The results demonstrate that our method behaves 70% greater memory efficiency than the VoxelMap.

VII Conclusion

This work presents a compact, cumulative, and coalescible probabilistic voxel mapping method termed C3P-VoxelMap, which involves a point-free representation of voxel planes, a cumulative probabilistic update method, and an on-demand voxel merging strategy. The plane uncertainty is represented by a fixed set of statistics regardless of the number of points and supports cumulative updates without re-computation. Our compact formulation reduces the space complexity from O(N)𝑂𝑁O(N)italic_O ( italic_N ) to O(1)𝑂1O(1)italic_O ( 1 ) and the time complexity from O(MN)𝑂𝑀𝑁O(MN)italic_O ( italic_M italic_N ) to O(N)𝑂𝑁O(N)italic_O ( italic_N ) with respect to the number of iterations and the number of points N. The memory efficiency is further optimized by merging voxels associated with the same physical planes. Thanks to the proposed locality-sensitive hash, the merging process is triggered on-demand with a small overhead. On-demand merging enhances the adaptability of our voxel map to represent the real world and achieves a balance in the bias-variance tradeoff. Moreover, this merging method improves the localization accuracy by smoothing out noises thanks to the cross-voxel updates. Experiments on KITTI and UTBM benchmark demonstrate that our C3P-VoxelMap outperforms the state-of-the-art with 20% higher accuracy, 20% higher performance, and 70% lower memory consumption. The high performance and low memory footprint of C3P-VoxelMap show promises in achieving real-time LiDAR SLAM on resource-constrained platforms.

References

  • [1] T. Shan and B. Englot, “Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 4758–4765.
  • [2] K. Chen, R. Nemiroff, and B. T. Lopez, “Direct lidar-inertial odometry: Lightweight lio with continuous-time motion correction,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 3983–3989.
  • [3] C. Yuan, W. Xu, X. Liu, X. Hong, and F. Zhang, “Efficient and probabilistic adaptive voxel mapping for accurate online lidar odometry,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 8518–8525, 2022.
  • [4] L. Zhou, G. Huang, Y. Mao, J. Yu, S. Wang, and M. Kaess, “𝒫𝒞𝒫𝒞\mathcal{PLC}caligraphic_P caligraphic_L caligraphic_C-lislam: Lidar slam with planes, lines, and cylinders,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7163–7170, 2022.
  • [5] C. Bai, T. Xiao, Y. Chen, H. Wang, F. Zhang, and X. Gao, “Faster-lio: Lightweight tightly coupled lidar-inertial odometry using parallel sparse incremental voxels,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4861–4868, 2022.
  • [6] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar-inertial odometry,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2053–2073, 2022.
  • [7] D. V. Nam and K. Gon-Woo, “Solid-state lidar based-slam: A concise review and application,” in 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), 2021, pp. 302–305.
  • [8] P. Biber and W. Strasser, “The normal distributions transform: a new approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), vol. 3, 2003, pp. 2743–2748 vol.3.
  • [9] M. Magnusson, A. Lilienthal, and T. Duckett, “Scan registration for autonomous mining vehicles using 3d-ndt,” Journal of Field Robotics, vol. 24, no. 10, pp. 803–827, 2007. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.20204
  • [10] M. Yokozuka, K. Koide, S. Oishi, and A. Banno, “Litamin2: Ultra light lidar-based slam using geometric approximation applied with kl-divergence,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 11 619–11 625.
  • [11] M. Yokozuka, K. Koide, S. Oishi, and A. Banno, “Litamin: Lidar-based tracking and mapping by stabilized icp for geometry approximation with normal distributions,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 5143–5150.
  • [12] C. Wu, Y. You, Y. Yuan, X. Kong, Y. Zhang, Q. Li, and K. Zhao, “Voxelmap++: Mergeable voxel mapping method for online lidar(-inertial) odometry,” IEEE Robotics and Automation Letters, vol. 9, no. 1, pp. 427–434, 2024.
  • [13] Z. Liu and F. Zhang, “Balm: Bundle adjustment for lidar mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3184–3191, 2021.
  • [14] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3354–3361.
  • [15] Z. Yan, L. Sun, T. Krajník, and Y. Ruichek, “Eu long-term dataset with multiple sensors for autonomous driving,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 10 697–10 704.
  • [16] Z. Zhang and D. Scaramuzza, “A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7244–7251.