{"title": "On Learning Rotations", "book": "Advances in Neural Information Processing Systems", "page_first": 55, "page_last": 63, "abstract": "An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the Von Neumann divergence. The additive updates are skew-symmetric matrices with trace zero which comprise the Lie algebra of the rotation group. The orthogonality and unit determinant of the matrix  parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to interesting interpretations in terms of the computational topology of the compact Lie groups. The stability and the computational complexity of the algorithm are discussed.", "full_text": "On Learning Rotations\n\nRaman Arora\n\nUniversity of Wisconsin-Madison\n\nDepartment of Electrical and Computer Engineering\n\n1415 Engineering Drive, Madison, WI 53706\n\nrmnarora@u.washington.edu\n\nAbstract\n\nAn algorithm is presented for online learning of rotations. The proposed algorithm\ninvolves matrix exponentiated gradient updates and is motivated by the von Neu-\nmann divergence. The multiplicative updates are exponentiated skew-symmetric\nmatrices which comprise the Lie algebra of the rotation group. The orthonormal-\nity and unit determinant of the matrix parameter are preserved using matrix log-\narithms and exponentials and the algorithm lends itself to intuitive interpretation\nin terms of the differential geometry of the manifold associated with the rotation\ngroup. A complexity reduction result is presented that exploits the eigenstructure\nof the matrix updates to simplify matrix exponentiation to a quadratic form.\n\n1 Introduction\n\nThe problem of learning rotations \ufb01nds application in many areas of signal processing and machine\nlearning. 
It is an important problem since many other problems can be reduced to that of learning\nrotations; for instance, Euclidean motion in Rn\u22121 is simply rotation in Rn. A conformal embedding was\npresented in [1] that extends rotations to a representation for all Euclidean transformations.\nFurthermore, the rotation group provides a universal representation for all Lie groups. This was established\nin [2] by showing that any Lie algebra can be expressed as a bivector algebra. Since the Lie algebra\ndescribes the structure of the associated Lie group completely, any Lie group can be represented as\na rotation group.\nThe batch version of the problem was originally posed as the problem of estimating the attitude of\nsatellites by Wahba in 1965 [3]. In psychometrics, it was presented as the orthogonal Procrustes\nproblem [4]. It has been studied in various forms over the last few decades and finds application in\nmany areas of computer vision [5, 6, 7], face recognition [8], robotics [9, 10], crystallography [11]\nand physics [12].\nWhile the batch version of the problem is well understood, the online learning of rotations from\nvector instances is challenging since the manifold associated with the rotation group is a curved\nspace and it is not possible to form updates that are linear combinations of rotations [13]. The set\nof rotations about the origin in n-dimensional Euclidean space forms a compact Lie group, SO(n),\nunder the operation of composition. The manifold associated with the n-dimensional rotation group\nis the unit sphere Sn\u22121 in n-dimensional Euclidean space.\n\n1.1 Related Work\n\nThe online version of learning rotations was posed as an open problem by Smith and Warmuth\n[13]. Online learning algorithms were recently presented for some matrix groups. In [14], an online\nalgorithm was proposed for learning density matrix parameters and was extended in [15] to the\nproblem of learning subspaces of low rank. 
However, the extension of these algorithms to learning\nrotations will require repeated projection and approximation [13]. Adaptive algorithms were also\nstudied in [16] for optimization under unitary matrix constraint. The methods proposed there are steepest\ndescent methods on Riemannian manifolds.\n\n1.2 Our Approach\n\nThis paper presents an online algorithm for learning rotations that utilizes the Bregman matrix\ndivergence with respect to the quantum relative entropy (also known as von Neumann divergence) as a\ndistance measure between two rotation matrices. The resulting algorithm has matrix exponentiated\ngradient (MEG) updates [14]. The key ingredients of our approach are (a) the von Neumann divergence\nbetween rotation matrices [17], (b) the squared error loss function and (c) matrix exponentiated gradient\n(MEG) updates.\nAny Lie group is also a smooth manifold and the updates in the proposed algorithm have an intuitive\ninterpretation in terms of the differential topology of the associated manifold. We also utilize various\nelementary Lie algebra concepts to provide intuitive interpretation of the updates. The development\nin the paper closely follows that of the matrix exponentiated gradient (MEG) updates in [14] for\ndensity matrix parameters. The form of the updates is similar to the steepest descent methods of [16],\nbut the updates are derived for learning rotations from vector instances using an information-theoretic approach.\nThe MEG updates are reduced to a quadratic form in the Lie algebra element corresponding to the\ngradient of the loss function on the rotation group.\nThe paper is organized as follows. The problem is formulated in Section 2. Section 3 presents\nmathematical preliminaries in differential geometry and Bregman matrix divergence. The matrix\nexponentiated gradient updates are developed in Section 4. The MEG updates are simplified in\nSection 5. 
Experimental results are discussed in Section 6.\n\n2 Problem Statement\nLet xt be a stream of instances of n-dimensional unit vectors. Let R\u2217 be an unknown n \u00d7 n rotation\nmatrix that acts on xt to give the rotated vector yt = R\u2217xt. The matrix \u02c6Rt denotes the estimate\nof R\u2217 at instance t and \u02c6yt = \u02c6Rt xt represents the prediction for the rotated vector yt. The loss\nincurred due to error in prediction is Lt( \u02c6Rt) = d(\u02c6yt, yt), where d(\u00b7,\u00b7) is a distance function. The\nestimate of the rotation needs to be updated based on the loss incurred at every instance and the\nobjective is to develop an algorithm for learning R\u2217 that has a bounded regret.\nWe seek adaptive updates that solve the following optimization problem at each step,\n\n$$\\hat{R}_{t+1} = \\arg\\min_{R} \\; \\Delta_F(R, \\hat{R}_t) + \\eta \\, L_t(R), \\tag{1}$$\n\nwhere \u02c6Rt is the estimated rotation matrix at instance t, \u03b7 is the learning rate or the step-size and \u2206F\nis a matrix divergence that measures the discrepancy between matrices. This is a typical problem\nformulation in online learning where the objective comprises a loss function and a divergence term.\nThe parameter \u03b7 balances the trade-off between the two conflicting goals at each update: incurring\nsmall loss on the new data versus confidence in the estimate from the previously observed data.\nMinimizing the weighted objective therefore yields smooth updates while also reducing the loss.\nIn this paper, the updates are smoothed using the von Neumann divergence which is defined for\nmatrices as\n\n$$\\Delta_F(R, \\hat{R}_t) = \\operatorname{tr}(R \\log R - R \\log \\hat{R}_t - R + \\hat{R}_t), \\tag{2}$$\n\nwhere tr(A) is the trace of the matrix A. The search is over all R \u2208 SO(n), i.e. 
over all n \u00d7 n\nmatrices such that RT R = I, RRT = I and det(R) = 1.\n\n3 Mathematical Preliminaries\n\nThis section reviews some basic definitions and concepts in linear algebra and differential geometry\nthat are used to develop the updates in the next section.\n\n3.1 Matrix Calculus\nGiven a real-valued matrix function F : Rn\u00d7n \u2192 R, the gradient of the function with respect to the\nmatrix R \u2208 Rn\u00d7n is defined to be the matrix [18],\n\n$$\\nabla_R F(R) = \\begin{pmatrix} \\frac{\\partial F}{\\partial R_{11}} & \\cdots & \\frac{\\partial F}{\\partial R_{1n}} \\\\ \\vdots & \\ddots & \\vdots \\\\ \\frac{\\partial F}{\\partial R_{n1}} & \\cdots & \\frac{\\partial F}{\\partial R_{nn}} \\end{pmatrix}. \\tag{3}$$\n\nSome of the matrix derivatives that are used later in the paper are the following: for a constant matrix\n\u0393 \u2208 Rn\u00d7n,\n\n1. $\\nabla_R \\operatorname{tr}(\\Gamma R R^T) = (\\Gamma + \\Gamma^T) R$,\n2. $\\nabla_R \\det(R) = \\det(R) (R^{-1})^T$,\n3. $\\nabla_R (y - Rx)^T (y - Rx) = -2 (y - Rx) x^T$.\n\nA related concept in differential geometry is that of the space of vectors tangent to a group at the\nidentity element of the group. This is defined to be the Lie algebra associated with the group. It is\na convenient way of describing the infinitesimal structure of a topological group about the identity\nelement and completely determines the associated group. The utility of the Lie algebra is due to the\nfact that it is a vector space and thus it is much easier to work with it than with the linear group.\nA real n \u00d7 n matrix A is in the Lie algebra of the rotation group SO(n) if and only if it is a\nskew-symmetric matrix (i.e. AT = \u2212A). Furthermore, for any matrix A in the Lie algebra of SO(n),\nexp(\u03b7A) is a one-parameter subgroup of the rotation group, parametrized by \u03b7 \u2208 R [19].\nThe matrix exponential and logarithm play an important role in relating a matrix Lie group G and\nthe associated Lie algebra g. 
The exponential of a matrix R \u2208 Rn\u00d7n is given by the following\nseries,\n\n$$\\exp(R) = I + R + \\frac{1}{2!} R^2 + \\frac{1}{3!} R^3 + \\cdots \\tag{4}$$\n\nGiven an element A \u2208 g, the matrix exponential exp(A) is the corresponding element in the group.\nThe matrix logarithm log (R) is defined to be the inverse of the matrix exponential: it maps from the\nLie group G into the Lie algebra g. The matrix logarithm is a well-defined map since the exponential\nmap is a local diffeomorphism between a neighborhood of the zero matrix and a neighborhood of\nthe identity matrix [19, 20].\n\n3.2 Riemannian Gradient\nConsider a real-valued differentiable function, Lt : SO(n) \u2192 R, defined on the rotation group. The\nRiemannian gradient \u02dc\u2207RLt of the function Lt on the Lie group SO(n) evaluated at the rotation\nmatrix R and translated to the identity (to get a Lie algebra element) is given as [16]\n\n$$\\tilde{\\nabla}_R L_t = \\nabla_R L_t \\, R^T - R \\, \\nabla_R^T L_t, \\tag{5}$$\n\nwhere \u2207RLt is the matrix derivative of the cost function in the Euclidean space defined in (3) at\nmatrix R.\n\n3.3 Von Neumann Divergence\n\nIn any online learning problem, the choice of divergence between the parameters dictates the resulting\nupdates. This paper utilizes the von Neumann divergence which is a special case of the Bregman\ndivergence and measures discrepancy between two matrices.\nLet F be a convex differentiable function defined on a subset of Rn\u00d7n with the gradient f(R) =\n\u2207RF (R). 
The Bregman divergence between two matrices R1 and R2 is defined as\n\n$$\\Delta_F(R_1, R_2) := F(R_1) - F(R_2) - \\operatorname{tr}\\big((R_1 - R_2) f(R_2)^T\\big). \\tag{6}$$\n\nThe gradient of the Bregman divergence with respect to R1 is given as\n\n$$\\nabla_{R_1} \\Delta_F(R_1, R_2) = f(R_1) - f(R_2). \\tag{7}$$\n\nChoosing the function F in the definition of the Bregman divergence to be the von Neumann entropy,\ngiven as F (R) = tr(R log R \u2212 R), we obtain the von Neumann divergence [14, 17]:\n\n$$\\Delta_F(R_1, R_2) = \\operatorname{tr}(R_1 \\log R_1 - R_1 \\log R_2 - R_1 + R_2). \\tag{8}$$\n\nFinally, the gradient of the von Neumann entropy was shown to be f(R) = \u2207RF (R) = log R in\n[14]. Consequently, the gradient of the von Neumann divergence can be expressed as\n\n$$\\nabla_{R_1} \\Delta_F(R_1, R_2) = \\log(R_1) - \\log(R_2). \\tag{9}$$\n\n4 Online Algorithm\n\nThe problem of online learning of rotations can be expressed as the optimization problem\n\n$$\\hat{R}_{t+1} = \\arg\\min_{R} \\; \\Delta_F(R, \\hat{R}_t) + \\eta L_t(R) \\quad \\text{s.t.} \\quad R^T R = I, \\; R R^T = I, \\; \\det(R) = 1, \\tag{10}$$\n\nwhere \u02c6Rt is the estimate of the rotation matrix at time instance t and Lt is the loss incurred in the\nprediction of yt. The proposed adaptive updates are matrix exponentiated gradient (MEG) updates\ngiven as\n\n$$\\hat{R}_{t+1} = \\exp\\Big( \\log \\hat{R}_t - \\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T \\nabla_R L_t(\\hat{R}_t) \\big) \\Big), \\tag{11}$$\n\nwhere \u2207RLt( \u02c6Rt) is the gradient of the cost function in the Euclidean space with respect to the\nrotation matrix R and skew (\u00b7) is the skew-symmetrization operator on the matrices, skew (A) =\nA \u2212 AT . 
The updates seem intuitive, given the following elementary facts about the Lie algebraic\nstructure of the rotation group: (a) the gradient of the loss function gives the geodesic direction and velocity\nvector on the unit sphere, (b) a skew-symmetric matrix is an element of the Lie algebra [19, 20], (c) the\nmatrix logarithm maps a rotation matrix to the corresponding Lie algebra element, (d) composition\nof two elements of the Lie algebra yields another Lie algebra element and (e) the matrix exponential\nmaps a Lie algebra element to the corresponding rotation matrix.\nThe loss function is defined to be the squared error loss function and therefore the gradient of the\nloss function is given by the matrix $\\nabla_R L_t(\\hat{R}_t) = 2 (\\hat{y}_t - y_t) x_t^T$. This results in the online updates\n\n$$\\hat{R}_{t+1} = \\exp\\Big( \\log \\hat{R}_t - 2\\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T (\\hat{y}_t - y_t) x_t^T \\big) \\Big) = \\hat{R}_t \\exp\\Big( -2\\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T (\\hat{y}_t - y_t) x_t^T \\big) \\Big). \\tag{12}$$\n\n4.1 Updates Motivated by von Neumann Divergence\n\nThe optimization problem in (10) is solved using the method of Lagrange multipliers. First observe\nthat the constraints RT R = I and RRT = I are redundant since one implies the other. Introducing\nthe Lagrange multiplier matrix \u0393 for the orthonormality constraint and the Lagrange multiplier \u03bb\nfor the unit determinant constraint, the objective function can be written as\n\n$$J(R, \\Gamma, \\lambda) = \\Delta_F(R, \\hat{R}_t) + \\eta L_t(R) + \\operatorname{tr}\\big( \\Gamma (R R^T - I) \\big) + \\lambda (\\det(R) - 1). \\tag{13}$$\n\nTaking the gradient of both sides of the equation with respect to the matrix R, we get\n\n$$\\nabla_R J(R, \\Gamma, \\lambda) = \\nabla_R \\Delta_F(R, \\hat{R}_t) + \\eta \\, \\tilde{\\nabla}_R L_t(R) + (\\Gamma + \\Gamma^T) R + \\lambda \\det(R) (R^{-1})^T, \\tag{14}$$\n\nusing the matrix derivatives from Section 3.1 and the Riemannian gradient for the loss function from\neqn. (5). 
Putting \u2207R J (R, \u0393, \u03bb) = 0 and using the fact that \u2207R\u2206F (R, \u02c6Rt) = f(R) \u2212 f( \u02c6Rt), we get\n\n$$0 = f(R) - f(\\hat{R}_t) + \\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T \\nabla_R L_t(R) \\big) + (\\Gamma + \\Gamma^T) R + \\lambda \\det(R) (R^{-1})^T. \\tag{15}$$\n\nGiven that f is a bijective map, write\n\n$$R = f^{-1}\\Big( f(\\hat{R}_t) - \\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T \\nabla_R L_t(R) \\big) - (\\Gamma + \\Gamma^T) R - \\lambda \\det(R) (R^{-1})^T \\Big). \\tag{16}$$\n\nSince the objective is convex, it is sufficient to produce a choice of Lagrange multipliers that enforces\nthe rotation constraint. Choosing $\\lambda = \\det(R)^{-1}$ and $\\Gamma = -(1/2) (R^{-1})^T R^{-1}$ yields the\nfollowing implicit update\n\n$$\\hat{R}_{t+1} = \\exp\\Big( \\log \\hat{R}_t - \\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T \\nabla_R L_t(\\hat{R}_{t+1}) \\big) \\Big). \\tag{17}$$\n\nAs noted by Tsuda et al. in [14], implicit updates of the form above are usually not solvable in\nclosed form. However, by approximating \u2207RLt( \u02c6Rt+1) with \u2207RLt( \u02c6Rt) (as in [21, 14]), we obtain\nan explicit update\n\n$$\\hat{R}_{t+1} = \\exp\\Big( \\log \\hat{R}_t - \\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T \\nabla_R L_t(\\hat{R}_t) \\big) \\Big). \\tag{18}$$\n\nThe next result ensures the closure property for the matrix exponentiated gradient updates in the\nequation above. In other words, the estimates for the rotation matrix do not steer away from the\nmanifold associated with the rotation group. Therefore, if \u02c6R0 \u2208 SO(n) then \u02c6Rt+1 \u2208 SO(n).\nLemma 1. If \u02c6Rt \u2208 SO(n) then \u02c6Rt+1 given by the updates in (18) is a rotation matrix in SO(n).\n\nProof. Using the properties of the matrix logarithm and matrix exponential, express (18) as\n\n$$\\hat{R}_{t+1} = \\hat{R}_t \\exp(-\\eta S), \\tag{19}$$\n\nwhere $S = \\hat{R}_t^T \\nabla_R L_t(R) - \\nabla_R^T L_t(R) \\hat{R}_t$ is an n \u00d7 n dimensional skew-symmetric matrix with\ntrace zero. 
Then\n\n$$\\hat{R}_{t+1}^T \\hat{R}_{t+1} = \\big( \\hat{R}_t e^{-\\eta S} \\big)^T \\big( \\hat{R}_t e^{-\\eta S} \\big) = \\big( e^{-\\eta S} \\big)^T \\hat{R}_t^T \\hat{R}_t \\big( e^{-\\eta S} \\big) = \\big( e^{-\\eta S} \\big)^T \\big( e^{-\\eta S} \\big) = e^{\\eta(-S^T - S)} = e^{\\eta(S - S)} = I,$$\n\nwhere we used the facts that \u02c6Rt \u2208 SO(n), $(e^S)^T = e^{S^T}$, $S^T = -S$ and that $e^0 = I$. Similarly,\n$\\hat{R}_{t+1} \\hat{R}_{t+1}^T = I$. Finally, note that\n\n$$\\det(\\hat{R}_{t+1}) = \\det(\\hat{R}_t e^{-\\eta S}) = \\det(\\hat{R}_t) \\cdot \\det(e^{-\\eta S}) = e^{-\\eta \\operatorname{tr}(S)},$$\n\nsince the determinant of the exponential of a matrix is equal to the exponential of the trace of the matrix.\nAnd since S is a trace zero matrix, det( \u02c6Rt+1) = 1.\n\n4.2 Differential Geometrical Interpretation\n\nThe resulting updates in (18) have a nice interpretation in terms of the differential geometry of the\nrotation group. The gradient of the cost function, \u2207RLt( \u02c6Rt), in the Euclidean space gives a tangent\ndirection at the current estimate of the rotation matrix. The Riemannian gradient is computed as\n$\\nabla_R L_t(\\hat{R}_t) - \\hat{R}_t \\nabla_R^T L_t(\\hat{R}_t) \\hat{R}_t$. The Riemannian gradient at the identity element of the group is\nobtained by de-rotation by \u02c6Rt, giving \u02dc\u2207RLt( \u02c6Rt), as in (5). The gradient corresponds to an element\nof the Lie algebra, so(n), of the rotation group. The exponential map gives the corresponding rotation\nmatrix which is the multiplicative update to the estimate of the rotation matrix at the previous\ninstance.\n\n5 Complexity Reduction of MEG Updates\n\nThe matrix exponentiated gradient updates ensure that the estimates for the rotation matrix stay on\nthe manifold associated with the rotation group at each iteration. 
However, with the matrix\nexponentiation at each step, the updates are computationally intensive and in fact the computational\ncomplexity of the updates is comparable to other approaches that would require repeated\napproximation and projection onto the manifold. This section discusses a fundamental complexity reduction\nresult to establish a simpler update by exploiting the eigenstructure of the update matrix. First\nobserve that the matrix in the exponential in eqn. (12) (for the case of squared error loss function) can\nbe written as\n\n$$\\begin{aligned} S &= -2\\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T (\\hat{y}_t - y_t) x_t^T \\big) \\\\ &= -2\\eta \\, \\operatorname{skew}\\big( \\hat{R}_t^T (\\hat{R}_t x_t - R_* x_t) x_t^T \\big) \\\\ &= -2\\eta \\, \\operatorname{skew}\\big( x_t x_t^T - \\hat{R}_t^T R_* x_t x_t^T \\big) \\\\ &= 2\\eta \\big( \\hat{R}_t^T R_* x_t x_t^T - x_t x_t^T R_*^T \\hat{R}_t \\big) \\\\ &= A^T X - X A, \\end{aligned} \\tag{20}$$\n\nwhere $X \\equiv x_t x_t^T$ and $A \\equiv 2\\eta R_*^T \\hat{R}_t$. Each term in the matrix S is a rank-one matrix (due to pre-\nand post-multiplication with $x_t x_t^T$, respectively). Thus S is at most rank-two. Since S is\nskew-symmetric, it has (at most) two nonzero eigenvalues in a complex conjugate pair \u00b1j\u03bb (and n \u2212 2 zero\neigenvalues) [22], which allows the following simplification.\nLemma 2. The matrix exponentiated gradient updates in eqn. (12) are equivalent to the following\nupdates,\n\n$$\\hat{R}_{t+1} = \\hat{R}_t \\Big( I + \\frac{\\sin(\\lambda)}{\\lambda} S + \\frac{1 - \\cos(\\lambda)}{\\lambda^2} S^2 \\Big), \\tag{21}$$\n\nwhere $\\lambda = 2\\eta \\sqrt{1 - (y_t^T \\hat{y}_t)^2}$ and S is the skew-symmetric matrix given in eqn. (20) with eigenvalues\n\u00b1j\u03bb.\nNote that yt, \u02c6yt are unit vectors in Rn and therefore \u03bb is real-valued. 
The proof of the complexity\nreduction follows easily from a generalization of Rodrigues\u2019 formula for computing matrix\nexponentials of skew-symmetric matrices. The proof is not presented here due to space constraints\nbut the interested reader is referred to [23, 24]. Owing to the result above, the matrix exponential\nreduces to a simple quadratic form involving an element from the Lie algebra of the rotation group.\nThe pseudocode is given in Algorithm 1.\n\nChoose \u03b7\nInitialize \u02c6R1 = I\nfor t = 1, 2, . . . do\n    Obtain an instance of unit vector xt \u2208 Rn;\n    Predict the rotated vector \u02c6yt = \u02c6Rt xt;\n    Receive the true rotated vector yt = R\u2217 xt;\n    Incur the loss $L_t(\\hat{R}_t) = |y_t - \\hat{y}_t|^2$;\n    Compute the matrix $S = 2\\eta \\big( \\hat{R}_t^T y_t x_t^T - x_t y_t^T \\hat{R}_t \\big)$;\n    Compute the eigenvalues $\\lambda = 2\\eta \\sqrt{1 - (y_t^T \\hat{y}_t)^2}$;\n    Update the rotation matrix $\\hat{R}_{t+1} = \\hat{R}_t \\big( I + \\frac{\\sin(\\lambda)}{\\lambda} S + \\frac{1 - \\cos(\\lambda)}{\\lambda^2} S^2 \\big)$;\nend\n\nAlgorithm 1: Pseudocode for learning rotations using matrix exponentiated gradient updates\n\n6 Experimental Results\n\nThis section presents experimental results with the proposed algorithm for online learning of\nrotations. The performance of the algorithm is evaluated in terms of the Frobenius norm of the difference\nbetween the true rotation matrix and the estimate. Figure 1 shows the error plot with respect to time. The\nunknown rotation is a 12 \u00d7 12 dimensional matrix and changes randomly every 200 instances. The\ntrajectories are averaged over 1000 random simulations. It is clear from the plot that the estimation\nerror decays rapidly to zero and estimates of the rotation matrices are exact.\n\nFigure 1: Online learning of rotations: The estimate of the unknown rotation is updated every time a new\ninstance of rotation is observed. 
The true rotation matrix changes randomly at regular intervals\n(N=200). The error in Frobenius norm is plotted against the instance index.\n\nThe online algorithm is also found to be robust to a small amount of additive white Gaussian noise in\nobservations of the true rotated vectors, i.e. the observations are now given as yt = R\u2217xt + \u03b1 wt,\nwhere \u03b1 determines the signal to noise ratio. The performance of the algorithm is studied under\nvarious noise conditions. Figure 2 shows error plots with respect to time for various noise conditions\nin R20. The Frobenius norm error decays quickly to a noise floor determined by the SNR as well as\nthe step size \u03b7. In the simulations in Fig. 2 the step size was decreased gradually over time. It is not\nimmediately clear how to pick the optimal step size but a classic step size adaptation rule or Armijo\nrule may be followed [25, 16].\nThe tracking performance of the online algorithm is compared with the batch version. In Figure\n3, the unknown rotation R\u2217 \u2208 SO(30) changes slightly after every 30 instances. The smoothly\nchanging rotation is induced by composing the matrix R\u2217 with a matrix R\u03b4 every thirty iterations. The\nmatrix R\u03b4 is composed of 3 \u00d7 3 block-diagonal matrices, each corresponding to rotation about the\nX-axis in 3D space by \u03c0/360 radians. The batch version stores the last 30 instances in a 30 \u00d7 30\nmatrix X and the corresponding rotated vectors in a matrix Y. The estimate of the unknown rotation is\ngiven as YX\u22121. The batch version achieves zero error only at time instances when all the data in\nX, Y correspond to the same rotation whereas the online version consistently achieves a low error\nand tracks the changing rotation.\nIt is clear from the simulations that the Frobenius norm of the error decreases at each iteration. It is easy to\nshow this global stability of the updates proposed here in the noise-free scenario [24]. 
The proposed\nalgorithm was also applied to learning and tracking the rotations of 3D objects. Videos showing\nexperimental results with the 3D Stanford bunny [26] are posted online at [27].\n\n[Figure 1 plot: Estimation error versus time \u2212 SO(12); x-axis: Time (instance index); y-axis: Estimation Error; legend: Frobenius norm, Spectral norm]\n\n[Figure 2 plot: Avg. Estimation error (Frobenius norm) vs time \u2212 SO(20); x-axis: Time (instance index); y-axis: Estimation Error (Frobenius norm); legend: 0, 5\u00d710\u22124, 1\u00d710\u22123, 1.5\u00d710\u22123, 2\u00d710\u22123]\n\nFigure 2: Average error plotted against instance index for various noise levels.\n\n[Figure 3 plot: Tracking rotations in SO(30); x-axis: Time (instance index); y-axis: Error (Frobenius norm); legend: Batch version, Online algorithm]\n\nFigure 3: Comparing the performance of tracking rotations for the batch version versus the online\nalgorithm. The rotation matrix changes smoothly every M = 30 instances.\n\n7 Conclusion\n\nIn this paper, we have presented an online algorithm for learning rotations. The algorithm was\nmotivated using the von Neumann divergence and squared error loss function and the updates were\ndeveloped in the Lie algebra of the rotation group. The resulting matrix exponentiated gradient\nupdates were reduced to a simple quadratic form. The estimation performance of the proposed\nalgorithm was studied under various scenarios. Some of the future directions include identifying\nalternative loss functions that exploit the spherical geometry as well as identifying regret bounds for\nthe proposed updates.\nAcknowledgements: The author would like to thank W. A. Sethares, M. R. Gupta and A. B. Frigyik\nfor helpful discussions and feedback on early drafts of the paper.\n\nReferences\n\n[1] Rich Wareham, Jonathan Cameron, and Joan Lasenby, \u201cApplications of conformal geometric algebra in computer vision and graphics,\u201d in IWMM/GIAE, 2004, pp. 329\u2013349.\n[2] C. Doran, D. Hestenes, F. Sommen, and N. Van Acker, \u201cLie groups as spin groups,\u201d Journal of Mathematical Physics, vol. 34, no. 8, pp. 3642\u20133669, August 1993.\n[3] Grace Wahba, \u201cProblem 65-1, a least squares estimate of satellite attitude,\u201d SIAM Review, vol. 7, no. 3, July 1965.\n
[4] P. Schonemann, \u201cA generalized solution of the orthogonal Procrustes problem,\u201d Psychometrika, vol. 31, no. 1, pp. 1\u201310, March 1966.\n[5] P. Besl and N. McKay, \u201cA method for registration of 3D shapes,\u201d IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 239\u2013256, 1992.\n[6] Hannes Edvardson and \u00d6rjan Smedby, \u201cCompact and efficient 3D shape description through radial function approximation,\u201d Computer Methods and Programs in Biomedicine, vol. 72, no. 2, pp. 89\u201397, 2003.\n[7] D.W. Eggert, A. Lorusso, and R.B. Fisher, \u201cEstimating 3D rigid body transformations: a comparison of four major algorithms,\u201d Machine Vision and Applications, Springer, vol. 9, no. 5-6, Mar 1997.\n[8] R. Sala Llonch, E. Kokiopoulou, I. Tosic, and P. Frossard, \u201c3D face recognition with sparse spherical representations,\u201d Preprint, Elsevier, 2009.\n[9] Ameesh Makadia and Kostas Daniilidis, \u201cRotation recovery from spherical images without correspondences,\u201d IEEE Trans. Pattern Analysis Machine Intelligence, vol. 28, no. 7, pp. 1170\u20131175, 2006.\n[10] Raman Arora and Harish Parthasarathy, \u201cNavigation using a spherical camera,\u201d in International Conference on Pattern Recognition (ICPR), Tampa, Florida, Dec 2008.\n[11] Philip R. Evans, \u201cRotations and rotation matrices,\u201d Acta Cryst., vol. D57, pp. 1355\u20131359, 2001.\n[12] Richard L. 
Liboff, Introductory Quantum Mechanics, Addison-Wesley, 2002.\n[13] Adam Smith and Manfred Warmuth, \u201cLearning rotations,\u201d in Conference on Learning Theory (COLT), Finland, Jun 2008.\n[14] Koji Tsuda, Gunnar Ratsch, and Manfred K. Warmuth, \u201cMatrix exponentiated gradient updates for on-line learning and Bregman projection,\u201d Journal of Machine Learning Research, vol. 6, Jun 2005.\n[15] Manfred K. Warmuth, \u201cWinnowing subspaces,\u201d in Proc. 24th Int. Conf. on Machine Learning, 2007.\n[16] T.E. Abrudan, J. Eriksson, and V. Koivunen, \u201cSteepest descent algorithms for optimization under unitary matrix constraint,\u201d Signal Processing, IEEE Transactions on, vol. 56, no. 3, pp. 1134\u20131147, March 2008.\n[17] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge, 2000.\n[18] Kaare Brandt Petersen and Michael Syskind Pedersen, \u201cThe matrix cookbook,\u201d http://matrixcookbook.com, November 14, 2008.\n[19] Michael Artin, Algebra, Prentice Hall, 1991.\n[20] John A. Thorpe, Elementary Topics in Differential Geometry, Springer-Verlag, 1994.\n[21] J. Kivinen and M. K. Warmuth, \u201cExponentiated gradient versus gradient descent for linear predictors,\u201d Information and Computation, vol. 132, no. 1, pp. 1\u201363, Jan 1997.\n[22] L. J. Butler, Applications of Matrix Theory to Approximation Theory, MS Thesis, Texas Tech University, Aug. 1999.\n[23] J. Gallier and D. Xu, \u201cComputing exponentials of skew-symmetric matrices and logarithms of orthogonal matrices,\u201d International Journal of Robotics and Automation, vol. 17, no. 4, 2002.\n[24] Raman Arora, Group theoretical methods in signal processing: learning similarities, transformation and invariants, Ph.D. thesis, University of Wisconsin-Madison, Madison, August 2009.\n[25] E. 
Polak, Optimization: Algorithms and Consistent Approximations, Springer-Verlag, 1997.\n[26] Stanford University Computer Graphics Laboratory, \u201cThe Stanford 3D scanning repository,\u201d http://graphics.stanford.edu/data/.\n[27] Raman Arora, \u201cTracking rotations of 3D Stanford bunny,\u201d http://www.cae.wisc.edu/~sethares/links/raman/LearnROT/vids.html, 2009.\n", "award": [], "sourceid": 355, "authors": [{"given_name": "Raman", "family_name": "Arora", "institution": null}]}