Article

Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration

by
Yue Dai
1,†,
Shuilin Wang
1,†,
Chunfeng Shao
1,
Heng Zhang
2 and
Fucang Jia
3,4,*
1
College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2
Faculty of Robotics Science and Engineering, Northeastern University, Shenyang 110819, China
3
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
4
The Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, Shenzhen 518055, China
*
Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2024, 13(20), 4074; https://doi.org/10.3390/electronics13204074
Submission received: 14 August 2024 / Revised: 10 October 2024 / Accepted: 12 October 2024 / Published: 16 October 2024

Abstract: Point cloud registration is one of the fundamental tasks in computer vision, but it faces challenges under low-overlap conditions. Recent approaches use transformers and overlapping masks to improve perception, but mask learning only considers Euclidean distances between features, ignores mismatches caused by ambiguous geometric structures, and is often computationally inefficient. To address these issues, we introduce a novel matching framework. First, we fuse adaptive graph convolution with PPF features to obtain rich feature perception. We then construct a PGT framework that combines GeoTransformer with positional information encoding to enhance geometric perception between the source and target clouds. In addition, we improve the visibility of overlapping regions through information exchange and the AIS module, which, for subsequent keypoint extraction, preserves points with distinct geometric structures while suppressing the influence of non-overlapping regions to improve computational efficiency. Finally, the mask is refined through contrastive learning to preserve geometric and distance similarity, which helps compute the transformation parameters more accurately. We have conducted comprehensive experiments on synthetic and real-world scene datasets, demonstrating superior registration performance compared to recent deep learning methods. Our approach shows remarkable improvements of 68.21% in RMSE(R) and 76.31% in RMSE(t) on synthetic data, while also excelling in real-world scenarios with improvements of 76.46% in RMSE(R) and 45.16% in RMSE(t).

1. Introduction

Point cloud registration is one of the core tasks in the field of computer vision and is crucial for many application scenarios such as augmented reality [1,2], predictive modeling [3,4], 3D reconstruction [5,6] and autonomous driving [7,8,9]. The registration process aims to determine the optimal 3D rigid-body transformation between two sets of point clouds. The registration task is particularly challenging in cases where the point clouds partially overlap and the initial positions differ significantly [10].
Traditional methods remain the most widely used in practice and can be categorized into two types. The first is based on hand-crafted features: correspondences are found through elaborate descriptors (such as FPFH [11] and LOVS [12]), and false matches are then eliminated through random sample consensus (RANSAC [13]). The second comprises optimization-based methods such as the Iterative Closest Point (ICP [14]) algorithm and its variants, which estimate the optimal transformation matrix through an iterative process but are usually limited by non-convexity and can easily fall into local optima. Improved algorithms such as Go-ICP [15] use a global optimization strategy to avoid local optima, but they are far less computationally efficient than ICP and remain highly sensitive to the initial estimate, which makes them unsuitable for real-time applications [5].
In recent years, deep learning has shown new possibilities in the field of point cloud registration, which can be broadly categorized into two groups based on whether correspondence matching is required to compute correspondences: correspondence matching-based methods and correspondence-free methods.
Correspondence matching-based methods, such as Deep Closest Point (DCP [16]) and PRNet [17], learn feature matching to determine point correspondences but are prone to false matches due to hard point-pair assignments. IDAM [18] improves computational efficiency by directly eliminating geometrically inconspicuous regions through hard point elimination, but this tends to exclude points in overlapping regions that may correspond correctly. RPMNet [19] enhances feature perception by incorporating normal information and achieves high matching accuracy, but fails in low-overlap cases. ROPNet [20] improves registration accuracy by detecting salient overlap points and using them for registration, but it is susceptible to outliers and not robust enough. Recent Transformer-based approaches significantly improve feature perception through information interaction. Predator [21] uses graph neural networks and cross-attention to enable information interaction and to obtain overlap and saliency scores. Lepard [22] proposed a repositioning-based location modeling approach together with self-attention and cross-attention to enhance feature modeling. CMIGNet [23] combines cross-modal information to perceive transformations, which further improves feature perception, but its lack of perception of the point cloud's own structure leads to unsatisfactory registration accuracy. The Geometric Transformer [30], which focuses on geometric information through attention computation, alleviates the feature perception problem under low overlap, but its computational cost is substantially higher. UTOPIC [24] proposes an improved version that reduces the complexity of the geometric transformer and improves computational efficiency; however, it struggles to distinguish the many repetitive geometric structures in a scene. Correspondence-free methods, such as PCRNet [25] and PointNetLK [26], are usually faster and estimate the transformation directly through global feature aggregation. However, these methods are blind to the negative effects of non-overlapping regions. OMNet [27] learns overlapping masks to handle the effects of non-overlapping regions, but it ignores the point cloud's original geometric structure, which limits the accuracy of the learned masks.
To address the above problems, this study proposes a novel iterative registration network aimed at accurately estimating the 3D rigid transformations between point clouds, and in the process improving the robustness to noise and interference. Our main contributions include the following:
  • We introduce an adaptive graph feature extraction network and enhance the geometric structure information of the features by fusing Point Pair Feature (PPF [28]) information, which strengthens geometric perception during mask learning.
  • We propose a keypoint selection mechanism based on feature interaction with the AIS module to efficiently mine keypoints in overlapping regions and reduce the risk of losing correctly corresponding points during keypoint extraction.
  • We combine positional information encoding with GeoTransformer to construct the PGT module, further improving the features' awareness of geometric information while better capturing global context in both feature space and geometric space to obtain reliable geometry-aware feature representations.
  • We introduce an overlap mask learning module to mitigate the unfavorable effects of non-overlapping regions by learning overlap masks over the geometry-aware features. This enables the model to distinguish and utilize overlapping regions more accurately, thereby improving registration accuracy. At the same time, we use contrastive learning to constrain the mask learning process and improve feature discrimination.

2. Related Work

2.1. Correspondence Matching Based Methods

DCP [16] replaces manual feature descriptors with convolutional neural networks and employs an attention mechanism to promote information interaction between point pairs, thus enhancing approximate matching. PRNet [17] extends DCP with an iterative scheme and proposes detecting keypoints in overlapping regions, but its computational efficiency is unsatisfactory. IDAM [18] uses hard point elimination in keypoint selection through significance scores, selecting a subset of significant features to improve computational efficiency, and adopts a hybrid point-pair soft-elimination strategy in matching feature selection; however, it can easily eliminate points that initially have correct correspondences with the target, which has a negative impact. RPMNet [19] extracts hybrid features, makes full use of geometric information such as normals for registration, and further uses Sinkhorn normalization [29] to promote a bijective matching matrix, performing well on synthetic data. ROPNet [20] uses salient overlap region detection, which further improves computational efficiency and accuracy. However, none of the above methods work well at low or very low overlap: there is not necessarily a perfect one-to-one correspondence between two partially overlapping point clouds, they do not completely remove the interference from non-overlapping regions, and matching with sparse points is prone to mis-correspondence. Recently introduced Transformers based on self-attention and cross-attention enhance the perception of the target point cloud from the source point cloud. For example, the Geometric Transformer [30] obtains enhanced features by detecting transform-invariant geometric structure information in superpoint regions, followed by a coarse-to-fine robust point matching stage to obtain the optimal transformation matrix, but its use of KPConv [31] to extract descriptors and its complex correspondence matching stage increase the computational cost. UTOPIC [24] proposes a modified Geometric Transformer with lower complexity, but its additional optimizations complicate the pipeline. CMIGNet [23] uses cross-modal information to constrain point-pair matching; although it achieves some improvement in accuracy, it ignores the point cloud's own geometric structure, making the semantic fusion unclear. In contrast to the above approaches, our work enhances geometric perception and utilizes feature interaction and the AIS module to strengthen the geometric structure of the extracted features and discard features corresponding to points in non-overlapping regions, thereby improving computational efficiency.

2.2. Global Feature Based Methods

Unlike correspondence matching-based methods, global feature-based methods regress the transformation parameters directly from global features extracted from the two entire input point clouds, including both overlapping and non-overlapping regions. PointNetLK [26] is one of the pioneering works, combining PointNet [32] and the LK algorithm [33] into a recurrent network that iteratively regresses the registration parameters; alternating between the LK algorithm and the regression network improves robustness to noise. FMR [34] adds a decoder branch that optimizes the global feature distance between the inputs. However, both similarly ignore the effects of non-overlapping regions. OMNet [27] exploits information about overlapping points by learning an overlap mask, but it uses only the distance similarity between features to determine overlapping regions, ignoring the similarity of their geometric structure. UTOPIC [24] uses overlap uncertainty perception to account for ambiguous overlapping points, but this introduces additional computation, whereas only a small number of significant points are needed to produce an accurate registration. We use contrastive learning and exploit the geometric-structure similarity of the enhanced features to further refine the features and obtain more accurate overlap masks.

3. Methods

Our registration method is shown in Figure 1, where we use the rotation matrix R and translation vector t to represent the 3D transformation. As shown in Figure 2, we first extract features from the source and target point clouds to obtain the mixed features W_x and W_y, which fuse graph features with the PPF structure. We then perform geometry-aware enhancement through the GeoTransformer [24] and extract location information through an additional encoder to obtain the enhanced perceptual features F_x and F_y in Figure 3. We combine the AIS module and feature interaction to capture feature similarity in order to select the features corresponding to the keypoints in Figure 4. Before each iteration, the source point cloud is transformed using the rigid transformation obtained in the previous iteration, and the geometry-aware features are fed into the overlapping mask prediction module to improve mask prediction. Finally, we fuse the features from the two point clouds and feed them into the transformation computation module to output the transformation parameters for the next iteration. The transformation parameters from each iteration are used to adjust the point cloud pose for the next stage, they are recomputed each round from the adjusted point cloud, and they are trained independently for each iteration.
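To make the iterative pipeline concrete, the following Python sketch outlines the refinement loop under stated assumptions: the module names (`extract_features`, `select_keypoints`, `predict_masks`, `solve_transform`) are hypothetical placeholders rather than our actual interfaces, and only the re-posing of the source cloud with the current transform estimate is meant literally.

```python
import torch
from types import SimpleNamespace

def compose(R1, t1, R2, t2):
    """Compose two rigid transforms: apply (R1, t1) first, then (R2, t2)."""
    return R2 @ R1, R2 @ t1 + t2

def iterative_registration(X, Y, modules, n_iters=4):
    """Skeleton of the N-iteration refinement; `modules` bundles the network stages."""
    R, t = torch.eye(3), torch.zeros(3)
    for _ in range(n_iters):
        X_cur = X @ R.T + t                               # re-pose the source with the current estimate
        fx, fy = modules.extract_features(X_cur, Y)       # fused graph + PPF features, PGT enhancement
        kx, ky = modules.select_keypoints(fx, fy)         # feature interaction + AIS + top-K selection
        mx, my = modules.predict_masks(kx, ky)            # overlap mask prediction
        dR, dt = modules.solve_transform(kx, ky, mx, my)  # correspondence search + weighted SVD
        R, t = compose(R, t, dR, dt)                      # accumulate the incremental transform
    return R, t

# Dummy stand-ins so the skeleton runs end to end.
dummy = SimpleNamespace(
    extract_features=lambda x, y: (x, y),
    select_keypoints=lambda fx, fy: (fx, fy),
    predict_masks=lambda kx, ky: (torch.ones(len(kx)), torch.ones(len(ky))),
    solve_transform=lambda kx, ky, mx, my: (torch.eye(3), torch.zeros(3)),
)
R, t = iterative_registration(torch.rand(100, 3), torch.rand(100, 3), dummy)
print(R.shape, t.shape)
```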

3.1. Problem Definition

Consider a source point cloud X = {x_1, x_2, ..., x_n} ⊂ R^3 and a target point cloud Y = {y_1, y_2, ..., y_m} ⊂ R^3, where each point is represented by its 3D coordinates. The goal of point cloud registration is to identify correspondences between points in the source point cloud X and the target point cloud Y and to find the optimal spatial transformation T such that, after transforming the source point cloud X, it best matches the target point cloud Y.
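As a minimal illustration of this definition, the NumPy sketch below (with hypothetical toy data) applies a candidate rigid transformation (R, t) to the source cloud and measures the residual against the target; registration seeks the (R, t) that minimizes this residual over the true correspondences.

```python
import numpy as np

def apply_rigid_transform(X, R, t):
    """Apply a rigid transformation to an (n, 3) point cloud: x' = R x + t."""
    return X @ R.T + t

# Toy example (hypothetical data): a small source cloud and a known transform.
X = np.random.rand(100, 3)
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, -0.2, 0.3])

Y = apply_rigid_transform(X, R, t)           # target = transformed source (full overlap here)
residual = np.linalg.norm(apply_rigid_transform(X, R, t) - Y, axis=1).mean()
print(residual)                               # 0 when (R, t) is the true transform
```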

3.2. Multi-Feature Extraction and Fusion

As shown in Figure 2, we use an adaptive graph convolutional network [35] to extract multilevel graph features (64, 64, 128, 256) from each input point cloud. These features encompass rich local geometric and global contextual information. Simultaneously, the extracted PPF features are integrated with the local and global graph features through a Multilayer Perceptron (MLP) to yield a final output of dimension 512. This enables our features to blend multiscale graph information with geometric details, resulting in a more expressive representation.
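For reference, the sketch below computes the standard four-dimensional Point Pair Feature (‖d‖, ∠(n1, d), ∠(n2, d), ∠(n1, n2)) used by PPF-style descriptors [28]; how these values are grouped and passed to the MLP in our network is not shown, so the function only illustrates the geometric quantities involved.

```python
import numpy as np

def angle_between(a, b, eps=1e-8):
    """Angle in radians between two 3D vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def ppf(p1, n1, p2, n2):
    """Four-dimensional point pair feature for points p1, p2 with normals n1, n2."""
    d = p2 - p1
    return np.array([
        np.linalg.norm(d),       # distance between the two points
        angle_between(n1, d),    # angle between n1 and the connecting vector
        angle_between(n2, d),    # angle between n2 and the connecting vector
        angle_between(n1, n2),   # angle between the two normals
    ])

# Hypothetical pair of oriented points.
f = ppf(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
        np.array([0.5, 0.2, 0.1]), np.array([0.0, 1.0, 0.0]))
print(f.shape)  # (4,)
```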

3.3. Position Embedding and Geometry Transformer

Many studies have shown that Transformers can effectively capture contextual information and perform cross-feature fusion when processing point cloud data. However, previous approaches only provide deep features to the Transformer, ignoring the geometric information in the features, resulting in a lack of differentiation in the learned features. In addition, due to the lack of relative position information, it is difficult to model the complex feature space, which increases the computational complexity.
In the field of point cloud registration, DIT [36] uses a PointNet-like network for position encoding, and Lepard [22] proposed a solution that uses position encoding and repositioning techniques to encode the relative positions of points. However, their limitation is that they sacrifice the ability to perceive the geometric information of the point cloud. To address this problem, the Geometric Transformer [30] introduces an innovative geometric relative position embedding technique that enhances feature discrimination by embedding the geometric information of the point cloud into the transformer. Although this approach makes better use of geometric information, its high memory consumption limits its application to large-scale point cloud data.
To overcome this challenge, UTOPIC [24] proposes an improved version of the Geometric Transformer that encodes geometric information efficiently while consuming relatively little memory. By efficiently encoding and compressing the geometric information, UTOPIC provides high-quality geometric cues while reducing the memory footprint, making it more efficient on large-scale point cloud data. Its structure alternates a geometric self-attention module and a feature cross-attention module, iterated N times, and generates the enhanced features F_x and F_y. For the exact structure of the Geometry Transformer, please refer to the modified version in [24]; the original version is described in detail in [30]. We improve on this by taking both location modeling capability and feature awareness into account, and by improving saliency through residual connections. Our module is shown in Figure 3.
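The sketch below illustrates, under stated assumptions, the position-encoding branch of our PGT module from Figure 3: an FC–ReLU–FC–Sigmoid encoder gates the features, the result is concatenated and projected, and a residual connection preserves saliency. The layer sizes are assumptions, and the geometric self/cross-attention stack is abstracted as a placeholder.

```python
import torch
import torch.nn as nn

class PositionEncoder(nn.Module):
    """Encode per-point xyz coordinates into a gating signal for the features."""
    def __init__(self, dim=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.Sigmoid(),
        )

    def forward(self, xyz):          # (B, N, 3)
        return self.fc(xyz)          # (B, N, dim)

class PGTSketch(nn.Module):
    """Fuse position encoding with transformer-enhanced features, plus a residual link."""
    def __init__(self, dim=512, attention=None):
        super().__init__()
        self.pos = PositionEncoder(dim)
        # `attention` stands in for the geometric self/cross-attention stack.
        self.attention = attention if attention is not None else nn.Identity()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feat, xyz):    # feat: (B, N, dim), xyz: (B, N, 3)
        pos = self.pos(xyz)
        fused = self.proj(torch.cat([feat * pos, feat], dim=-1))
        return feat + self.attention(fused)   # residual connection keeps saliency

feat = torch.randn(2, 717, 512)
xyz = torch.rand(2, 717, 3)
print(PGTSketch()(feat, xyz).shape)  # torch.Size([2, 717, 512])
```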

3.4. Feature Interaction and AIS Module

We use feature interaction to supplement feature information from different point clouds, thereby generating more robust and discriminative feature descriptions. Specifically, we apply max pooling to the source and target point cloud features F_x and F_y to obtain their most representative features, and repeat them N times to obtain F̃_x and F̃_y. By superimposing these features, we can capture similar descriptions of overlapping areas, thereby reducing the probability of false matches. We further optimize feature extraction by computing soft correspondence matrices between the features to capture their similarity structure. The soft correspondence matrix is computed using the AIS module to obtain similarity weights. The AIS module consists of three steps: affinity, instance normalization, and Sinkhorn [29]. The affinity matrix A is computed as in Equation (1), where W is a learnable parameter of the affinity layer [37].
$$A_{i,j} = (\hat{H}_X)_i^{\top} \, W \, (\hat{H}_Y)_j \qquad (1)$$
Transforming A by instance normalization maps its elements to finite positive values. Finally, it is processed with the Sinkhorn [29] operator after appending an additional row and column of zeros, so that points without a valid counterpart can be assigned to the dummy row or column.
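The following sketch runs the three AIS steps on toy features: a learnable bilinear affinity, instance normalization, and a few Sinkhorn iterations after appending a slack row and column. The iteration count, the slack initialization, and the simplified handling of the slack entries are assumptions for illustration, not our exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ais(Hx, Hy, W, iters=5):
    """Affinity -> instance normalization -> Sinkhorn with a slack row/column."""
    # Affinity: A[i, j] = Hx_i^T W Hy_j  (W is a learnable d x d matrix).
    A = Hx @ W @ Hy.transpose(-1, -2)                 # (B, N, M)
    # Instance normalization keeps the scores in a bounded range.
    A = F.instance_norm(A.unsqueeze(1)).squeeze(1)
    # Append a slack row and column so unmatched points have somewhere to go.
    B, N, M = A.shape
    log_a = F.pad(A, (0, 1, 0, 1), value=0.0)         # (B, N+1, M+1)
    # Sinkhorn: alternate row/column normalization in log space.
    for _ in range(iters):
        log_a = log_a - torch.logsumexp(log_a, dim=2, keepdim=True)
        log_a = log_a - torch.logsumexp(log_a, dim=1, keepdim=True)
    return log_a.exp()[:, :N, :M]                     # doubly-normalized soft correspondences

Hx, Hy = torch.randn(2, 717, 512), torch.randn(2, 717, 512)
W = nn.Parameter(torch.eye(512))
print(ais(Hx, Hy, W).shape)                           # torch.Size([2, 717, 717])
```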

3.5. KeyPoint’s Feature Selection

Based on the features extracted by the information interaction and the AIS module, we use MLP-regressed weights to rank the features of all points and index the features corresponding to the top-K most significant points. The purpose of this step is to avoid extracting indistinctive features, which may blur the registration result; we therefore extract a subset of keypoints to optimize this process.
Unlike the previous keypoint extraction method IDAM [18], our method selects keypoints based on the interaction between features and the AIS module. Compared with selecting points based only on the significance scores produced by an MLP, this approach avoids excluding points that actually have correct correspondences. By considering the interaction between features, we can assign higher significance weights to features in the overlapping area, thereby more effectively eliminating the influence of non-overlapping areas. We prefer feature interaction for keypoint selection because it helps ensure the accuracy and clarity of the registration results. Figure 4 illustrates our keypoint selection module.
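A minimal sketch of the selection step is given below: an MLP regresses a significance score for every point, and the features and coordinates of the top-K points are gathered. The score-head dimensions and the value of K are assumptions.

```python
import torch
import torch.nn as nn

class KeypointSelector(nn.Module):
    """Score every point with an MLP and keep the top-K most significant ones."""
    def __init__(self, dim=512, k=256):
        super().__init__()
        self.k = k
        self.score_head = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat, xyz):                              # feat: (B, N, dim), xyz: (B, N, 3)
        scores = self.score_head(feat).squeeze(-1)             # (B, N) significance scores
        idx = torch.topk(scores, self.k, dim=1).indices        # indices of the top-K points
        batch = torch.arange(feat.size(0)).unsqueeze(-1)       # (B, 1) for batched gathering
        return feat[batch, idx], xyz[batch, idx], scores       # (B, K, dim), (B, K, 3), (B, N)

sel = KeypointSelector(k=256)
f, p, s = sel(torch.randn(2, 717, 512), torch.rand(2, 717, 3))
print(f.shape, p.shape)   # torch.Size([2, 256, 512]) torch.Size([2, 256, 3])
```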

3.6. Overlapping Mask Contrastive Learning

Masks are learned to distinguish between overlapping and non-overlapping points and to weight them accordingly. We follow the overlap mask learning approach of OMNet [27]. The difference is that we feed features that have been focused on geometric information into the overlap mask prediction module to generate new overlap masks and decoded features for computing the transformation matrix. We use a contrastive learning loss to constrain mask learning so that the masks are more accurate and focus on the representation of overlapping regions. Our module is represented by the function g(·), which contains a three-layer MLP (512, 512, 1) and uses a sigmoid activation to obtain the final mask. The input features are denoted F̂_x and F̂_y, and the output masks are denoted M_x^i and M_y^i. The mask obtained in the previous iteration guides the subsequent correspondence search phase by constraining the input features.
$$M_x^i = g(\hat{F}_x) \qquad (2)$$
$$M_y^i = g(\hat{F}_y) \qquad (3)$$
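A minimal sketch of the mask head g(·), assuming the (512, 512, 1) MLP with a sigmoid output described above; the final line shows one plausible (assumed) way the mask can re-weight the features for the next stage.

```python
import torch
import torch.nn as nn

class OverlapMaskHead(nn.Module):
    """g(.): three-layer MLP (512, 512, 1) with a sigmoid that outputs a per-point overlap score."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, feat):                              # (B, N, dim)
        return torch.sigmoid(self.mlp(feat)).squeeze(-1)  # (B, N) scores in [0, 1]

g = OverlapMaskHead()
Fx_hat = torch.randn(2, 256, 512)
Mx = g(Fx_hat)                                            # overlap mask for the source features
weighted = Fx_hat * Mx.unsqueeze(-1)                      # one plausible downstream use of the mask
print(Mx.shape, weighted.shape)
```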

3.7. Correspondences Search

We combine the spatial coordinates P_X^m, P_Y^m and features F_X^m, F_Y^m of the source point cloud X and the target point cloud Y, respectively, and include the intermediate features of the overlap mask prediction module. This enhances perception by using multi-level features, which are compressed into one dimension through convolution operations to obtain the coordinate matching matrix M_P and the feature matching matrix M_F. We add them to form the final matching matrix; meanwhile, we concatenate them and apply max aggregation and convolution operations to calculate the matching score matrix S of X. We can then calculate the weight w_i of the i-th point pair via Equation (4), where $\mathbb{1}[\cdot]$ denotes the indicator function and s_i denotes the matching score of x_i; the weights differentiate the importance of point pairs. Finally, we use Singular Value Decomposition (SVD) to derive the transformation via Equation (5), where x_i' is the corresponding point found for x_i according to the final matching matrix. Figure 5 illustrates our correspondence search module.
$$w_i = \frac{s_i \cdot \mathbb{1}\big[\, s_i \geq \operatorname{median}_m(s_m) \,\big]}{\sum_i s_i \cdot \mathbb{1}\big[\, s_i \geq \operatorname{median}_m(s_m) \,\big]} \qquad (4)$$
$$R, t = \underset{R,\, t}{\arg\min} \sum_i w_i \left\| R x_i + t - x_i' \right\|^2 \qquad (5)$$
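For completeness, the sketch below solves Equation (5) by weighted SVD (the Kabsch solution), assuming the correspondences x_i' and weights w_i have already been produced by the matching matrices.

```python
import math
import torch

def weighted_svd(src, corr, w):
    """Solve argmin_{R,t} sum_i w_i ||R src_i + t - corr_i||^2 via weighted SVD."""
    w = w / (w.sum() + 1e-8)                               # normalize the point-pair weights
    src_c = (w[:, None] * src).sum(0)                      # weighted centroids
    corr_c = (w[:, None] * corr).sum(0)
    H = ((src - src_c) * w[:, None]).T @ (corr - corr_c)   # weighted cross-covariance (3 x 3)
    U, _, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.det(Vt.T @ U.T)).item()           # guard against reflections
    D = torch.diag(torch.tensor([1.0, 1.0, d]))
    R = Vt.T @ D @ U.T
    t = corr_c - R @ src_c
    return R, t

# Synthetic check: recover a known rotation and translation from exact correspondences.
src = torch.rand(256, 3)
c, s = math.cos(0.6), math.sin(0.6)
R_true = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
corr = src @ R_true.T + torch.tensor([0.1, -0.2, 0.3])
R, t = weighted_svd(src, corr, torch.ones(256))
print(torch.allclose(src @ R.T + t, corr, atol=1e-3))      # True
```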

3.8. Loss Function

Overlapping Contrastive Loss. To highlight the features of overlapping areas and suppress the influence of non-overlapping areas, we introduce overlapping contrastive learning. We first transform the source point cloud by the ground-truth transformation and mark the overlapping points in the source point cloud X and the target point cloud Y according to a distance threshold. The features obtained in the overlap prediction module are then labeled as overlapping or non-overlapping features according to these indices. Pairs of overlapping point features form the positive set P. For non-overlapping pairs there are two cases, N_1 and N_2: N_1 pairs overlapping features of the source point cloud with non-overlapping features of the target point cloud, and N_2 pairs non-overlapping features of the source point cloud with overlapping features of the target point cloud. The contrastive loss is then computed, and the loss function is constructed as follows:
$$L_{OCL} = \frac{1}{|P|} \sum_{(i,j) \in P} \Big[ D\big(f_{X_i}, f_{Y_j}\big) - \sigma_p \Big]_+^2 + \frac{1}{|N_1|} \sum_{(i,j) \in N_1} \Big[ \sigma_n - D\big(f_{X_i}, f_{Y_j}\big) \Big]_+^2 + \frac{1}{|N_2|} \sum_{(i,j) \in N_2} \Big[ \sigma_n - D\big(f_{Y_i}, f_{X_j}\big) \Big]_+^2 \qquad (6)$$
In this formula, D(·, ·) denotes the Euclidean distance between features, and [·]_+ denotes max(·, 0). σ_p and σ_n are the margin distances for positive and negative pairs, which help prevent the network from overfitting. These margins constrain points in non-overlapping regions to some extent, while points in overlapping regions are selected within a certain range as far as possible, and anomalies are further eliminated by the subsequent network.
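A minimal PyTorch sketch of this overlapping contrastive loss is given below, assuming the positive and negative index sets have already been built from the ground-truth transform and the overlap threshold; the margin values σ_p and σ_n are placeholders, not our trained settings.

```python
import torch

def overlap_contrastive_loss(fx, fy, pos, neg1, neg2, sigma_p=0.1, sigma_n=1.4):
    """Hinge-style contrastive loss on feature distances for overlap mask learning.

    fx, fy: (N, d) and (M, d) point features; pos/neg1/neg2: (K, 2) index pairs.
    """
    def dist(a, b, pairs):
        return torch.norm(a[pairs[:, 0]] - b[pairs[:, 1]], dim=1)

    pos_term = torch.clamp(dist(fx, fy, pos) - sigma_p, min=0.0).pow(2).mean()    # pull positives together
    neg_term1 = torch.clamp(sigma_n - dist(fx, fy, neg1), min=0.0).pow(2).mean()  # push N1 pairs apart
    neg_term2 = torch.clamp(sigma_n - dist(fy, fx, neg2), min=0.0).pow(2).mean()  # push N2 pairs apart
    return pos_term + neg_term1 + neg_term2

# Hypothetical features and index sets.
fx, fy = torch.randn(256, 512), torch.randn(256, 512)
pos = torch.randint(0, 256, (64, 2))
neg1 = torch.randint(0, 256, (64, 2))
neg2 = torch.randint(0, 256, (64, 2))
print(overlap_contrastive_loss(fx, fy, pos, neg1, neg2))
```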
Keypoints Select Loss. Real keypoints cannot be annotated directly, so we supervise them with a mutual-supervised loss [18]. The central idea is that keypoints should exhibit low entropy because their matches are confident. We define the keypoint prediction loss as follows:
$$L_{MP} = \frac{1}{K} \sum_{i=1}^{K} \Big( a(i) - \sum_{j=1}^{K} M(i,j) \log M(i,j) \Big)^2 \qquad (7)$$
Match Score Loss. A cross-entropy loss is used to compute the matching weight loss. The ground-truth matching score is labeled ŝ_i, which indicates whether the pair lies within the distance threshold under the ground-truth transformation.
$$L_{MS}^{(n)} = -\frac{1}{K} \sum_{i=1}^{K} \Big[ \hat{s}_i \log s(i) + \big(1 - \hat{s}_i\big) \log \big(1 - s(i)\big) \Big] \qquad (8)$$
Correspondences Search Loss. The network iterates n times; the supervised loss on the matching matrix at the n-th iteration is defined as follows:
$$L_{CS}^{(n)} = -\frac{1}{K} \sum_{i=1}^{K} \hat{d}_i \log M^{(n)}(i, j^{*}) \qquad (9)$$
where j* is the index of the closest point under the ground-truth transformation, and d̂_i indicates whether this pair lies below the distance threshold. Our final loss is expressed as follows:
$$L_{total} = L_{OCL} + L_{MP} + \sum_{n} \Big( L_{MS}^{(n)} + L_{CS}^{(n)} \Big) \qquad (10)$$

4. Experiment

In this section, we first describe our experimental setup, including the datasets, the comparison algorithms, the evaluation metrics, and some implementation details. We performed a number of experiments on the ModelNet dataset under unseen categories and unseen shapes. To verify the effectiveness of registration under partial overlap and the generalization of the method, we also performed experiments on unseen categories with partial overlap and added Gaussian noise. We report the computation time of the comparison algorithms to evaluate computational efficiency. To assess robustness under different degrees of overlap, we conducted experiments with different overlap ratios across categories, and we also tested under different noise levels to verify robustness against strong noise. Finally, we performed ablation experiments to verify the effect of each of our modules on registration accuracy.

4.1. Experimental Setup

Dataset. ModelNet40 Dataset. The ModelNet dataset was generated from the ModelNet40 [38] dataset and consists of 5112 training, 1202 validation, and 1266 test samples. The first 20 categories are used for training and validation, and the remaining 20 categories are used for testing. Following previous work [27], for a given shape, 1024 points were randomly selected to form a point cloud, and three Euler-angle rotations in the range [0, 45°] and translations in the range [−0.5, 0.5] were randomly generated as the rigid transformation on each axis. For partial-overlap generation, similar to [19], the overlap was 70% and 717 points were retained per point cloud. To generate noisy point clouds, we jittered the points in both point clouds with noise sampled from N(0, 0.01), clipped to [−0.05, 0.05] on each axis, and finally shuffled and reordered the points of each point cloud.
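For reproducibility, the sketch below mirrors the pair-generation protocol described above (random Euler rotation in [0, 45°] per axis, translation in [−0.5, 0.5], clipped Gaussian jitter, and point shuffling); the partial-overlap cropping of [19] is omitted, and the helper name `make_pair` is ours for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def make_pair(points, noise=0.01, clip=0.05, rng=np.random.default_rng(0)):
    """Generate a (source, target, R, t) training pair from one ModelNet40 shape."""
    idx = rng.choice(len(points), 1024, replace=False)       # sample 1024 points
    src = points[idx]
    euler = rng.uniform(0.0, 45.0, size=3)                    # rotation per axis in [0, 45] degrees
    R = Rotation.from_euler('xyz', euler, degrees=True).as_matrix()
    t = rng.uniform(-0.5, 0.5, size=3)                        # translation in [-0.5, 0.5]
    tgt = src @ R.T + t
    if noise > 0:                                             # clipped Gaussian jitter on both clouds
        src = src + np.clip(rng.normal(0.0, noise, src.shape), -clip, clip)
        tgt = tgt + np.clip(rng.normal(0.0, noise, tgt.shape), -clip, clip)
    rng.shuffle(tgt)                                          # reorder points to break correspondence order
    return src, tgt, R, t

src, tgt, R, t = make_pair(np.random.rand(2048, 3))
print(src.shape, tgt.shape)
```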
The 7Scenes dataset [39] is a collection of commonly used indoor-environment point clouds captured with a handheld Kinect RGB-D camera. These point clouds are generated by merging depth maps and colour images before feeding them into subsequent models. The dataset covers seven different indoor environments (chess, fire, heads, office, pumpkin, red kitchen, and stairs) and is divided into 296 training samples and 57 test samples. We downsample each point cloud to 2048 points, copy it into source and target point clouds during preprocessing, and then crop each to 70% to obtain partial point clouds. The rotation and translation transformations are the same as for ModelNet.
Evaluation metrics. We follow the metrics used in previous work [19] and report anisotropic metrics, i.e., the root mean square error (RMSE) and the mean absolute error (MAE) of the Euler angles and translations.
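For reference, the anisotropic metrics can be computed as below by converting the predicted and ground-truth rotations to Euler angles before taking RMSE and MAE; the Euler convention ('xyz' here) is an assumption and must match the one used during data generation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def registration_errors(R_pred, t_pred, R_gt, t_gt):
    """Anisotropic RMSE/MAE of Euler angles (degrees) and translations, as in [19]."""
    e_pred = Rotation.from_matrix(R_pred).as_euler('xyz', degrees=True)
    e_gt = Rotation.from_matrix(R_gt).as_euler('xyz', degrees=True)
    r_err = e_pred - e_gt
    t_err = np.asarray(t_pred) - np.asarray(t_gt)
    return {
        'RMSE(R)': np.sqrt(np.mean(r_err ** 2)),
        'MAE(R)': np.mean(np.abs(r_err)),
        'RMSE(t)': np.sqrt(np.mean(t_err ** 2)),
        'MAE(t)': np.mean(np.abs(t_err)),
    }

# Toy check against an identity ground truth.
R = Rotation.from_euler('xyz', [10, 20, 30], degrees=True).as_matrix()
print(registration_errors(R, [0.1, 0.0, -0.1], np.eye(3), np.zeros(3)))
```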
Baseline Algorithms and Implementation Details. We compare our approach with recent deep learning-based methods: DCP [16], PointNetLK [26], IDAM [18], RPMNet [19], ROPNet [20], OMNet [27], FINet [40] and CMIGNet [23], tested using their official pre-trained models and code. We train our network end-to-end in PyTorch on an NVIDIA RTX 4090 GPU, running four iterations during both training and testing. The network is trained with the Adam [41] optimizer for 100 epochs with an initial learning rate of 10^{-4}.

4.2. Same Categories

We randomly selected 1024 points per model and applied the rotation and translation described in the preprocessing above, retaining 70% of the points to simulate partial point cloud registration. Our results are shown in Table 1, where our method achieves the smallest registration error; the graphical results are shown in Figure 6.

4.3. Unseen Categories

In this experiment, we evaluate the generalization ability of our method on unseen categories. Specifically, we test its performance on the 20 ModelNet40 test categories that did not appear during training. To ensure a fair comparison, the data preprocessing steps are the same as in the first experiment. Despite the challenge of unknown categories, our method still exhibits excellent performance. Our results are shown in Table 2, and the graphical results in Figure 7.

4.4. Gaussian Noise in Same Categories

Considering real-world applications, we performed an in-depth evaluation of the model's performance in the presence of noise. In practice, point cloud data are usually disturbed by various environmental factors, such as sensor errors or environmental interference, which introduce a large amount of noise. To simulate this, we extend the preprocessing steps used in the previous experiments: in addition to the usual translation and rotation, we add random Gaussian noise with a standard deviation of 0.01, clipped to [−0.05, 0.05]. This noise allows our model to be tested in more challenging conditions and its robustness with noisy data to be evaluated. Our results are shown in Table 3, and the graphical results in Figure 8.

4.5. Partial Visibility with Gaussian in Unseen Categories

As shown in Table 4, our method outperforms recent state-of-the-art deep learning methods under noise with partial overlap; the graphical results are shown in Figure 9. The closest competitor is CMIGNet [23], yet our accuracy is far superior owing to our enhanced geometry perception. We also report the average running time of the different methods. PointNetLK [26] and DCP [16] have lower computation times than ours but are far less accurate, which is related to their insufficient suppression of non-overlapping regions. RPMNet [19] uses normal information to further improve perception but also performs poorly here. OMNet [27] also performs information interaction but does not take feature saliency into account and therefore takes longer than our method. ROPNet [20] uses a mixture of features and significance scores to compute matches but does not consider the effect of ambiguous geometric structures, and its registration degrades dramatically under outliers. Since both RPMNet [19] and IDAM [18] are based on correspondence matching, their pipelines are more complex, while our computation time is shorter and our results better. Our computation time is shown in Figure 10.

4.6. Different Overlap Ratios with Gaussian in Unseen Categories

To test robustness to partial overlap, we conducted experiments with Gaussian noise and unseen categories at overlap ratios of 70%, 65%, 60%, 55%, 50%, 45%, 40% and 35%. The registration accuracies are presented in Table 5, and related visualization examples are shown in Figure 11. The results show that as the overlap ratio decreases, the overlapping region shrinks and less structural information is available for registration. In other words, as the overlap ratio drops from 70% to 35%, the registration accuracy continuously decreases, verifying the effect of low overlap on point cloud registration; the decline is especially pronounced near 35%.
However, the visualization results and the magnitude of the accuracy degradation show that our method remains robust at low overlap ratios. Therefore, our method can effectively handle point cloud registration at low overlap. We also compare against the recent method CMIGNet [23]; our error is significantly lower at all overlap ratios. The results are shown in Figure 12.

4.7. Noise at Different Levels

To assess the effect of different noise levels, we performed tests with the same setup as the overlap-ratio experiments but fixed the overlap ratio at 70%. We increased the Gaussian noise standard deviation from 0.01 to 0.05 to observe the registration performance under gradually increasing noise intensity. The results show that our method maintains good registration as the noise intensity grows. Specific figures are listed in Table 6, and our visualization results are displayed in Figure 13.

4.8. Evaluation on the 7Scenes Dataset

We perform a comparative evaluation on the 7Scenes dataset, using the Office category for testing. As shown in Table 7, our method performs well in real scenarios; Figure 14 provides a few examples from the 7Scenes dataset. Because point clouds of real scenes contain more geometric structure, most methods improve in registration accuracy, while our method pays more attention to geometric structure and therefore achieves higher accuracy.

4.9. Ablation Studies

We performed ablation experiments on the proposed modules. First, we label the feature extraction module that combines adaptive graph convolution with PPF features as the PA module, followed by the PGT module that combines the Geometry Transformer with positional information. We label the keypoint selection module as the FAS module, which combines feature interaction (FI) and the AIS module, as both aim to capture similarity. Finally, we label the overlap mask module as OM. All experiments were performed with partial overlap (70%) and added Gaussian noise.
The experimental results are shown in Table 8 and indicate that the keypoint selection module based on information interaction and the AIS module is effective, also because we fuse geometric information and further use the Geometry Transformer to enhance perception. Moreover, the Geometry Transformer's focus on geometric structure makes our masks concentrate on regions with similar geometric structures, which improves the learning accuracy of the masks. Using contrastive learning to constrain the overlap masks suppresses the anomalous effects of non-overlapping regions. We further performed ablation experiments on the AIS and FI modules, and Figure 15 shows the results.

5. Conclusions

Our approach utilizes information interaction and the AIS module to learn similarity correlation matrices that capture overlapping regions, and exploits geometric information enhancement to mine distinctive points in the overlapping regions, thereby improving computational efficiency and avoiding the risk of eliminating correctly corresponding points. The overlap mask is also learned to predict the overlapping regions, and its accuracy is further improved by introducing contrastive learning to refine the overlapping regions and suppress the influence of non-overlapping regions. However, our registration method may not work well where the geometry is not distinctive; in particular, when the overlap is low and there are many smooth, ambiguous geometric structures, the registration accuracy drops sharply. In future work, we will use multimodal information fusion to enhance feature perception and guide the registration process with the semantic information of images, which should yield higher registration accuracy.

Author Contributions

All authors have made significant contributions to this work. Y.D. and S.W. were responsible for designing the network framework and conducting the experiments; F.J. offered valuable suggestions and guidance on the network; C.S. and H.Z. meticulously prepared the training dataset; Y.D. and S.W. thoroughly analyzed the results; Y.D. and S.W. collaborated on drafting the original paper; F.J., Y.D. and S.W. contributed to revising the paper; F.J. acquired the funding support; F.J. supervised the work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 62172401 and 82227806, the Guangdong Natural Science Foundation under grant number 2024A0505040020, 2022A1515010439 and 2022A0505020019, the Shenzhen Scientific and Technology Program Grant under grant number SGDX20230116092200001 and JCYJ20220818101802005, the Zhuhai Science and Technology Program under grant number ZH22017002210017PWC.

Data Availability Statement

The ModelNet40 dataset can be downloaded at https://modelnet.cs.princeton.edu/, accessed on 15 August 2023.

Conflicts of Interest

The authors declare that there are no conflicts of interest with regard to this study.

Abbreviations

The following abbreviations are used in this manuscript:
3D	Three-dimensional
RMSE	Root mean square error
MAE	Mean absolute error
FPFH	Fast point feature histogram
SVD	Singular value decomposition
MLP	Multilayer perceptron
PPF	Point pair features
LOVS	Local voxelized structure
RANSAC	Random sample consensus
CNN	Convolutional neural network

References

  1. Azuma, R.T. A survey of augmented reality. Presence Teleoperators Virtual Environ. 1997, 6, 355–385. [Google Scholar] [CrossRef]
  2. Carmigniani, J.; Furht, B.; Anisetti, M.; Ceravolo, P.; Damiani, E.; Ivkovic, M. Augmented reality technologies, systems and applications. Multimed. Tools Appl. 2011, 51, 341–377. [Google Scholar] [CrossRef]
  3. Peng, Y.; Yamaguchi, H.; Funabora, Y.; Doki, S. Modeling fabric-type actuator using point clouds by deep learning. IEEE Access 2022, 10, 94363–94375. [Google Scholar] [CrossRef]
  4. Peng, Y.; He, M.; Hu, F.; Mao, Z.; Huang, X.; Ding, J. Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2405.07488. [Google Scholar] [CrossRef]
  5. Izadi, S.; Kim, D.; Hilliges, O.; Molyneaux, D.; Newcombe, R.; Kohli, P.; Shotton, J.; Hodges, S.; Freeman, D.; Davison, A.; et al. Kinectfusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the Annual ACM Symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 16–19 October 2011; pp. 559–568. [Google Scholar]
  6. Merickel, M. 3D reconstruction: The registration problem. Comput. Vision Graph. Image Process. 1988, 42, 206–219. [Google Scholar] [CrossRef]
  7. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469. [Google Scholar] [CrossRef]
  8. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  9. Dai, Y.; Jia, F.C. Geometry-Aware Enhancement-Based Point Elimination with Overlapping Mask Learning for Partial Point Cloud Registration. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–7. [Google Scholar]
  10. Eckart, B.; Kim, K.; Kautz, J. HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 705–721. [Google Scholar]
  11. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  12. Quan, S.; Ma, J.; Hu, F.; Fang, B.; Ma, T. Local voxelized structure for 3D binary feature representation and robust registration of point clouds from low-cost sensors. Inf. Sci. 2018, 444, 153–171. [Google Scholar] [CrossRef]
  13. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  14. Besl, P.J.; McKay, N.D. A method for registration of 3D shapes. IEEE Trans. Pattern. Anal. Mach. Vision 1992, 14, 239–256. [Google Scholar] [CrossRef]
  15. Yang, J.; Li, H.; Jia, Y. Go-ICP: Solving 3D registration efficiently and globally optimally. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1457–1464. [Google Scholar]
  16. Wang, Y.; Solomon, J.M. Deep closest point: Learning representations for point cloud registration. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3523–3532. [Google Scholar]
  17. Wang, Y.; Solomon, J.M. Prnet: Self-supervised learning for partial-to-partial registration. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8812–8824. [Google Scholar]
  18. Li, J.; Zhang, C.; Xu, Z.; Zhou, H.; Zhang, C. Iterative distance-aware similarity matrix convolution with mutual-supervised point elimination for efficient point cloud registration. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020; pp. 378–394. [Google Scholar]
  19. Yew, Z.J.; Lee, G.H. Rpm-net: Robust point matching using learned features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11824–11833. [Google Scholar]
  20. Zhu, L.; Liu, D.; Lin, C.; Yan, R.; Gómez-Fernández, F.; Yang, N.; Feng, Z. Point cloud registration using representative overlapping points. arXiv 2021, arXiv:2107.02583. [Google Scholar]
  21. Huang, S.; Gojcic, Z.; Usvyatsov, M.; Wieser, A.; Schindler, K. Predator: Registration of 3D point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 4267–4276. [Google Scholar]
  22. Li, Y.; Harada, T. Lepard: Learning partial point cloud matching in rigid and deformable scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5554–5564. [Google Scholar]
  23. Xie, Y.; Zhu, J.; Li, S.; Shi, P. Cross-modal information-guided network using contrastive learning for point cloud registration. IEEE Robot. Autom. Lett. 2023, 9, 103–110. [Google Scholar] [CrossRef]
  24. Chen, Z.; Chen, H.; Gong, L.; Yan, X.; Wang, J.; Guo, Y.; Wei, M. UTOPIC: Uncertainty-aware Overlap Prediction Network for Partial Point Cloud Registration. Comput. Graph. Forum 2022, 41, 87–98. [Google Scholar] [CrossRef]
  25. Sarode, V.; Li, X.; Goforth, H.; Aoki, Y.; Srivatsan, R.A.; Lucey, S.; Choset, H. Pcrnet: Point cloud registration network using pointnet encoding. arXiv 2019, arXiv:1908.07906. [Google Scholar]
  26. Aoki, Y.; Goforth, H.; Srivatsan, R.A.; Lucey, S. Pointnetlk: Robust & efficient point cloud registration using pointnet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7163–7172. [Google Scholar]
  27. Xu, H.; Liu, S.; Wang, G.; Liu, G.; Zeng, B. Omnet: Learning overlapping mask for partial-to-partial point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3132–3141. [Google Scholar]
  28. Deng, H.; Birdal, T.; Ilic, S. PPFNet: Global context aware local features for robust 3D point matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  29. Sinkhorn, R. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 1964, 35, 876–879. [Google Scholar] [CrossRef]
  30. Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Xu, K. Geometric transformer for fast and robust point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11143–11152. [Google Scholar]
  31. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
  32. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  33. Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the IJCAI, Vancouver, BC, Canada, 24–28 August 1981; pp. 674–679. [Google Scholar]
  34. Huang, X.; Mei, G.; Zhang, J. Feature-metric registration: A fast semi-supervised approach for robust point cloud registration without correspondences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11366–11374. [Google Scholar]
  35. Zhou, H.; Feng, Y.; Fang, M.; Wei, M.; Qin, J.; Lu, T. Adaptive graph convolution for point cloud analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4965–4974. [Google Scholar]
  36. Chen, G.; Wang, M.; Zhang, Q.; Yuan, L.; Yue, Y. Full transformer framework for robust point cloud registration with deep information interaction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 13368–13382. [Google Scholar] [CrossRef] [PubMed]
  37. Fu, K.; Liu, S.; Luo, X.; Wang, M. Robust point cloud registration framework based on deep graph matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8893–8902. [Google Scholar]
  38. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
  39. Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3DMatch: Learning local geometric descriptors from RGB-D reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1802–1811. [Google Scholar]
  40. Xu, H.; Ye, N.; Liu, G.; Zeng, B.; Liu, S. FINet: Dual branches feature interaction for partial-to-partial point cloud registration. AAAI Conf. Artif. Intell. 2022, 36, 2848–2856. [Google Scholar] [CrossRef]
  41. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. The overall architecture of our network. Our network mainly consists of feature extraction, the PGT module, feature interaction with the AIS module, the keypoint selection module, the overlapping mask prediction module and the correspondence search module. The inputs are a source point cloud X and a target point cloud Y of dimension M × 3, and the network loops over the obtained features N times to refine the registration results after extracting the keypoints. The source point cloud X and target point cloud Y undergo feature extraction to encode features, followed by the PGT module to enhance feature perception and encode relative position information, yielding the features F_x, F_y. Similar feature information is then captured by feature interaction with the AIS module, and significant features and points in the overlap region are selected by their scores. Finally, the transformation matrix T is obtained by the correspondence search module, and the overlap masks M_X, M_Y are optimized by contrastive learning. N indicates the number of iterations.
Figure 2. Our feature extraction module. We use adaptive graph convolution to extract the point cloud features, fusing the obtained multilevel features and obtaining the global features, followed by feature fusion of the extracted PPF geometrical features with the global features and the multilevel graph features to obtain 512-dimensional features.
Figure 3. Our PGT module. We use fully connected (FC) layers with Sigmoid and ReLU activation functions to build the position encoding module, concatenate the features with the position information, and enhance them with the GeoTransformer. Finally, we superimpose the original features to highlight saliency.
Figure 4. Our keypoint selection module. An MLP extracts significance scores, and the corresponding features and keypoints are selected based on the top-K scores.
Figure 5. Correspondences search module.
Figure 6. Visualization of the registration of the same category in ModelNet40. Red represents the source point cloud, blue represents the target point cloud, and green represents the source point cloud after registration. The settings for subsequent visualization results remain consistent. (a) plant, (b) vase, (c) night-stand, (d) plant.
Figure 7. Visualization of the registration of the unseen category in ModelNet40. (a) monitor, (b) range-hood, (c) glass-box, (d) night-stand.
Figure 8. Visualization results for Gaussian noise in ModelNet40. (a) door, (b) table, (c) mantel, (d) bookshelf.
Figure 9. Visualization results for Gaussian noise with low overlap in ModelNet40. (a) night-stand, (b) laptop, (c) vase, (d) table.
Figure 10. Our method achieves the best registration accuracy; although DCP is slightly faster, our registration accuracy far exceeds it.
Figure 11. Registration results of our algorithm at different degrees of overlap. It can be seen that our method still maintains a good registration accuracy when the overlap degree decreases sharply.
Figure 12. Comparison of our method with the recent CMIGNet method at different overlap ratios; the solid lines correspond to the rotation error and the dashed lines to the translation error. Our errors are much lower than those of CMIGNet.
Figure 13. Results of our registration under different noise levels; our registration maintains good accuracy as the noise level increases.
Figure 14. Our results on the real scene registration result. The color representation is consistent with previous experiments. Subfigures (ad) represent the alignment results of the point clouds acquired at different viewing angles.
Figure 15. Further ablation of our approach on the combined AIS module and the feature interaction (FI) module; the histograms show their effects.
Table 1. Registration results for the same categories in ModelNet40.

Method | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
DCP [16] | 7.158 | 4.259 | 0.0328 | 0.0276
PointNetLK [26] | 16.214 | 7.319 | 0.0418 | 0.0312
IDAM [18] | 2.876 | 0.758 | 0.0284 | 0.0167
RPMNet [19] | 0.971 | 0.526 | 0.0108 | 0.0072
ROPNet [20] | 1.323 | 0.688 | 0.0153 | 0.0108
OMNet [27] | 1.231 | 0.672 | 0.0145 | 0.0094
FINet [40] | 1.132 | 0.598 | 0.0132 | 0.0089
CMIGNet [23] | 0.875 | 0.452 | 0.0051 | 0.0042
Ours | 0.467 | 0.309 | 0.0039 | 0.0024
Table 2. Registration results for unseen categories in ModelNet40.

Method | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
DCP [16] | 9.452 | 7.025 | 0.0358 | 0.0211
PointNetLK [26] | 23.228 | 9.842 | 0.0615 | 0.0289
IDAM [18] | 3.279 | 0.518 | 0.0213 | 0.0072
RPMNet [19] | 1.823 | 0.724 | 0.0142 | 0.0044
ROPNet [20] | 1.802 | 0.679 | 0.0141 | 0.0042
OMNet [27] | 2.985 | 1.085 | 0.0164 | 0.0062
FINet [40] | 2.458 | 0.854 | 0.0112 | 0.0055
CMIGNet [23] | 0.892 | 0.486 | 0.0049 | 0.0035
Ours | 0.576 | 0.358 | 0.0044 | 0.0028
Table 3. ModelNet40 registration results under Gaussian noise.

Method | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
DCP [16] | 7.215 | 4.258 | 0.0268 | 0.0195
PointNetLK [26] | 19.253 | 9.352 | 0.0585 | 0.0475
IDAM [18] | 4.125 | 1.803 | 0.0255 | 0.0152
RPMNet [19] | 2.533 | 1.177 | 0.0234 | 0.0105
ROPNet [20] | 2.445 | 1.109 | 0.0221 | 0.0098
OMNet [27] | 2.412 | 0.988 | 0.0175 | 0.0092
FINet [40] | 1.813 | 0.962 | 0.0136 | 0.0087
CMIGNet [23] | 1.512 | 0.679 | 0.0063 | 0.0042
Ours | 1.082 | 0.601 | 0.0072 | 0.0038
Table 4. Registration results for Gaussian noise with low overlap in ModelNet40.

Method | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
DCP [16] | 9.825 | 6.803 | 0.0986 | 0.0759
PointNetLK [26] | 36.223 | 23.375 | 0.2711 | 0.2019
IDAM [18] | 9.712 | 5.876 | 0.1223 | 0.0523
RPMNet [19] | 2.697 | 1.003 | 0.0309 | 0.0119
ROPNet [20] | 2.166 | 1.072 | 0.0200 | 0.0098
OMNet [27] | 4.876 | 3.654 | 0.0632 | 0.0403
FINet [40] | 4.956 | 2.942 | 0.0507 | 0.0301
CMIGNet [23] | 4.423 | 2.312 | 0.0287 | 0.0156
Ours | 1.406 | 0.696 | 0.0068 | 0.0036
Table 5. Testing of point clouds with different overlap ratios under Gaussian noise.

Overlap Ratio | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
70% | 1.406 | 0.696 | 0.0068 | 0.0036
65% | 1.678 | 0.797 | 0.0119 | 0.0051
60% | 1.811 | 0.851 | 0.0118 | 0.0055
55% | 2.083 | 0.937 | 0.0154 | 0.0064
50% | 2.178 | 1.013 | 0.0168 | 0.0071
45% | 2.391 | 1.144 | 0.0175 | 0.0078
40% | 2.667 | 1.191 | 0.0197 | 0.0083
35% | 2.908 | 1.264 | 0.0206 | 0.0088
Table 6. Testing of point clouds with different levels of Gaussian noise.

Gaussian Noise | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
0.01 | 1.406 | 0.696 | 0.0068 | 0.0036
0.02 | 1.631 | 0.939 | 0.0078 | 0.0049
0.03 | 2.056 | 1.118 | 0.0089 | 0.0052
0.04 | 2.213 | 1.246 | 0.0099 | 0.0063
0.05 | 2.375 | 1.406 | 0.0104 | 0.0071
Table 7. Registration results on the 7Scenes dataset.

Method | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
DCP [16] | 6.712 | 4.175 | 0.1976 | 0.0171
PointNetLK [26] | 4.051 | 2.902 | 0.0322 | 0.0091
IDAM [18] | 8.591 | 5.874 | 0.0328 | 0.0232
RPMNet [19] | 1.084 | 0.603 | 0.0071 | 0.0037
ROPNet [20] | 1.391 | 0.735 | 0.0074 | 0.0044
OMNet [27] | 1.448 | 0.835 | 0.0077 | 0.0046
FINet [40] | 1.781 | 0.902 | 0.0093 | 0.0052
CMIGNet [23] | 0.803 | 0.481 | 0.0031 | 0.0017
Ours | 0.189 | 0.101 | 0.0017 | 0.0011
Table 8. Ablation studies on different modules.

PA | PGT | FAS | OM | RMSE(R) | MAE(R) | RMSE(t) | MAE(t)
– | – | – | – | 4.243 | 1.941 | 0.0276 | 0.0159
✓ | – | – | – | 3.894 | 1.577 | 0.0199 | 0.0076
✓ | ✓ | – | – | 1.996 | 0.801 | 0.0107 | 0.0052
✓ | ✓ | ✓ | – | 1.635 | 0.729 | 0.0096 | 0.0042
✓ | ✓ | ✓ | ✓ | 1.406 | 0.696 | 0.0068 | 0.0036
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dai, Y.; Wang, S.; Shao, C.; Zhang, H.; Jia, F. Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration. Electronics 2024, 13, 4074. https://doi.org/10.3390/electronics13204074

AMA Style

Dai Y, Wang S, Shao C, Zhang H, Jia F. Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration. Electronics. 2024; 13(20):4074. https://doi.org/10.3390/electronics13204074

Chicago/Turabian Style

Dai, Yue, Shuilin Wang, Chunfeng Shao, Heng Zhang, and Fucang Jia. 2024. "Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration" Electronics 13, no. 20: 4074. https://doi.org/10.3390/electronics13204074

