Jingnan Shi

Arun’s Method for 3D Registration

This post is about K. S. Arun’s seminal work on estimating the relative transformation between two sets of 3D points using correspondences. This problem is also called point set registration or scan matching.

What is 3D Registration?

Cpd_fish_affine.gif

Figure 1: A sample point set registration solution. Dllu, CC BY-SA 3.0, via Wikimedia Commons.

The problem of 3D registration in the broadest sense aims to align two sets of 3D point clouds. The two sets of 3D points can be unordered and without known correspondences. In other words, we do not have information about what corresponds to what in the two input point clouds. Eq. \ref{eq:3dreg-prob} shows this precisely: \[ \mathbf{P'} = \mathbf{R} \mathbf{P} \mathbf{C} + \mathbf{t} \mathbf{1}^{T} \label{eq:3dreg-prob} \tag{1} \] where \(P'\) and \(P\) are \(3 \times N\) matrices representing the two sets of \(N\) points to be registered; \(R\) is the rotation matrix in \(\mathrm{SO(3)}\); \(\mathbf{t}\) is the translation vector; \(\mathbf{1}\) is a \(N \times 1\) vector of all ones, and \(\mathbf{C}\) is a \(N \times N\) correspondence matrix, where the the sum of each row and column equals to \(1\).

Arun’s method solves Eq. \ref{eq:3dreg-prob} in the case of known \(\mathbf{C}\). In other words, Eq. \ref{eq:3dreg-prob} can be rewritten as: \[ \mathbf{P'} = \mathbf{R} \mathbf{P} + \mathbf{t} \mathbf{1}^{T} \label{eq:3dreg-prob-arun} \tag{2} \]

In the rest of this post, I will walk through Arun’s approach, and present a Python implementation of the algorithm together with some cool visualization.

Arun’s Method

Overview

The paper titled “Least-Squares Fitting of Two 3-D Point Sets”, published by A. S. Arun, T. S. Huang and S. D. Blostein on 1987, describes a non-iterative least-squares approach to match two sets of 3D points. The method itself uses singular value decomposition, which makes the algorithm easy to implement using widely available numerical packages.

Following Eq. \ref{eq:3dreg-prob-arun}, Arun et. al. are trying to minimize the following cost function: \[ \Sigma^{2} = \sum_{i=1}^{N} || p'_{i} - (\mathbf{R} p_{i} + \mathbf{t}) ||^{2} \]

Before going further, I want to introduce a few notations just to make things easier:

  • I will use \(\{p_{i}\}\) and \(\{p'_{i}\}\) to indicate the sets of points forming \(\mathbf{P}\) and \(\mathbf{P'}\).
  • I will use \(p\) and \(p'\) to represent the centroids of \(\{p_{i}\}\) and \(\{p'_{i}\}\) respectively
  • I will use \(\hat{\cdot}\) to denote the maximum likelihood estimates: \(\hat{\mathbf{R}}\) being the estimated rotation matrix, and \(\hat{\mathbf{t}}\) being the estimated translation.

The algorithm takes an decoupled approach:

  • First, it estimates \(\hat{\mathbf{R}}\).
  • Then, it finds \(\hat{\mathbf{t}}\) by \(\mathbf{t} = p' - \hat{\mathbf{R}} p\).

We can use this approach because the centroid of \(\{\hat{\mathbf{R}} p_{i} + \hat{\mathbf{t}}\}\) overlaps with the centroid of \(\{p'_{i}\}\) 1. Thus, if we calculate \(q_{i} = p_{i} - p\) and \(q'_{i} = p'_{i} - p'\), Eq. \ref{eq:3dreg-prob-arun} can be reduced to: \[ \Sigma^{2} = \sum_{i=1}^{N} || q'_{i} - \mathbf{R} q_{i} ||^{2} \label{eq:3dreg-R-only} \tag{3} \]

The main contribution of Arun’s method lies in the algorithm for finding \(\hat{\mathbf{R}}\), detailed below:

  1. Calculate the following quantities: \[ q_{i} = p_{i} - p \\ q'_{i} = p'_{i} - p' \] for all \(0 \leq i \leq N\), which are distances from each point to its centroid.
  2. Calculate \(\mathbf{H}\) (\(3 \times 3\)): \[ \mathbf{H} = \sum_{i=1}^{N} q_{i} q_{i}'^{T} \]
  3. Find SVD of \(\mathbf{H}\): \[ \mathbf{H} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^{T} \]
  4. Calculate \(\mathbf{X} = \mathbf{V} \mathbf{U}^{T}\)
  5. Check the determinant of \(\mathbf{X}\). If it equals to \(+1\), then \(\hat{\mathbf{R}} = \mathbf{X}\). If the determinant equals to \(-1\), the algorithm has failed.

Proof

To prove this actually works, first notice that Eq. \ref{eq:3dreg-R-only} can be expanded:

\begin{aligned} \Sigma^{2} &= \sum_{i=1}^{N} || q'_{i} - \mathbf{R} q_{i} ||^{2} \\ &= \sum_{i=1}^{N} (q'_{i} - \mathbf{R} q_{i})^{T} (q'_{i} - \mathbf{R} q_{i}) \\ &= \sum_{i=1}^{N} ({q'_{i}}^{T} q'_{i} + {q_{i}}^{T}\mathbf{R}^{T} \mathbf{R} q_{i} - {q'_{i}}^{T} \mathbf{R} q_{i}- q_{i}^{T} \mathbf{R}^{T} q'_{i} ) \\ &= \sum_{i=1}^{N} ({q'_{i}}^{T} q'_{i} + {q_{i}}^{T} q_{i} - 2{q'_{i}}^{T} \mathbf{R} q_{i}) \\ \end{aligned}

Since \(\mathbf{R}\) is the variable we are trying to optimize, the above is equivalent to maximizing the following value: \[ F = \sum_{i=1}^{N} ( {q'_{i}}^{T} \mathbf{R} q_{i}) \label{eq:trace-f} \tag{4} \] Note that \(F\) is a scalar, so it follows that \[ F = tr( \sum_{i=1}^{N} ( {q'_{i}}^{T} \mathbf{R} q_{i}) ) \] Using the cyclic property and linearity of trace, we have \[ F = \text{tr}( \mathbf{R} \mathbf{H}) \] where \[ \mathbf{H} = \sum_{i=1}^{N} ( q_{i} {q'_{i}}^{T}) \]

Let the SVD of \(\mathbf{H}\) be \(\mathbf{H} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^{T}\), where \(\mathbf{U}\) and \(\mathbf{V}\) are orthonormal matrices, and \(\mathbf{\Lambda}\) is a \(3 \times 3\) diagonal matrix with nonnegative elements. Let \(\mathbf{X} = \mathbf{V} \mathbf{U}^{T}\), and it follows that \[ \mathbf{X} \mathbf{H} = \mathbf{V} \mathbf{U}^{T} \mathbf{U} \mathbf{\Lambda} \mathbf{V}^{T} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^{T} \] which is symmetrical and positive definite (observe that you can divide \(\mathbf{\Lambda}\) into two matrices and have \(\mathbf{X} \mathbf{H} = \mathbf{K}^{T} \mathbf{K}\) where \(\mathbf{K} = \sqrt{\Lambda} \mathbf{V}^{T}\)).

Note that for any positive definite matrix \(\mathbf{A} \mathbf{A}^{T}\) and any orthonormal matrix \(\mathbf{B}\), we have \(\text{tr}(\mathbf{A} \mathbf{A}^{T}) \geq \text{tr}( \mathbf{B} \mathbf{A} \mathbf{A}^{T})\). For the proof, please refer to Arun’s original paper.

Thus, we have (for any orthonormal matrix \(\mathbf{B}\)), \(\text{tr}(\mathbf{X} \mathbf{H}) \geq \text{tr}(\mathbf{B} \mathbf{X} \mathbf{H})\). Hence, \(X\) maximizes \(F\) (Eq. \ref{eq:trace-f}) among all \(3 \times 3\) orthonormal matrices. And if the determinant of \(X\) equals to 1, it is the rotation matrix we are looking for.

Cases where the determinant equals to -1

If we assume no noise, geometrically, if the input points are colinear, then there exists infinitely many possible rotations that can minimize the objective function. If the input points are not coplanar, then there exists a unique rotation and no reflection that minimizes the objective function. The interesting case is where the points are coplanar but not colinear. In this case, there exists a rotation as well as a reflection that minimizes the objective function. The rotation matrix will have a determinant equals to +1, and the reflection will have a determinant equals to -1.

If the algorithm returns a matrix with determinant equals to -1, we can still recover a proper rotation matrix. First, observe that when the points are coplanar, the rank of the matrix \(\mathbf{H}\) will be 2 (not full-rank), because \(\mathbf{H}\) is the sum of many rank-1 matrices which are outer products of vectors on the same plane. Thus, one of the singular values of \(\mathbf{H}\) will be zero. Let the three singular values be \(\lambda_{1} > \lambda_{2} > \lambda_{3} = 0\). Then, \[ \mathbf{H} = \lambda_{1} u_{1} v_{1}^{t}+\lambda_{2} u_{2} v_{2}^{t}+0 \cdot u_{3} v_{3}^{t} \] where \(u_{i}\) and \(v_{i}\) are the columns of matrix \(\mathbf{U}\) and \(\mathbf{V}\) respectively. It is then clear that changing the sign of \(v_{3}\) does not affect \(\mathbf{H}\). Thus, if \(\mathbf{X} = \mathbf{V} \mathbf{U}^{T}\) has determinant equals to \(-1\), then \(\mathbf{X}' = \mathbf{V}' \mathbf{U}^{T}\) where \(\mathbf{V}' = [v_{1}, v_{2}, -v_{3}]\) will have determinant equals to \(+1\) and be a proper rotation matrix.

Implementation

The algorithm is really easy to implement:

def arun(A, B):
    """Solve 3D registration using Arun's method: B = RA + t
    """
    N = A.shape[1]
    assert B.shape[1] == N

    # calculate centroids
    A_centroid = np.reshape(1/N * (np.sum(A, axis=1)), (3,1))
    B_centroid = np.reshape(1/N * (np.sum(B, axis=1)), (3,1))

    # calculate the vectors from centroids
    A_prime = A - A_centroid
    B_prime = B - B_centroid

    # rotation estimation
    H = np.zeros([3, 3])
    for i in range(N):
        ai = A_prime[:, i]
        bi = B_prime[:, i]
        H = H + np.outer(ai, bi)
    U, S, V_transpose = np.linalg.svd(H)
    V = np.transpose(V_transpose)
    U_transpose = np.transpose(U)
    R = V @ np.diag([1, 1, np.linalg.det(V) * np.linalg.det(U_transpose)]) @ U_transpose

    # translation estimation
    t = B_centroid - R @ A_centroid

    return R, t

The only part that is slightly different from the paper is that instead of checking the determinants to see whether we need to take the negative of \(v_{3}\), we simply just use a diagonal matrix with the bottom right entry as np.linalg.det(V) * np.linalg.det(U_transpose) to avoid an extra if statement.

I have also made a small repo using the famous Stanford bunny model 2 as input data. You can run the code here yourself:

You can see that the estimated rotation and translation are exactly the same as expected. Note that in the code, I have not added any noise. In real applications, if the level of noise are severe, we need to robustify this method to make it performant. This will be the topic of a future post. Stay tuned!

Footnotes:

1

Huang, T. S., S. D. Blostein, and E. A. Margerum. “Least-squares estimation of motion parameters from 3-D point correspondences.” Proc. IEEE Conf. Computer Vision and Pattern Recognition. Vol. 10. 1986.