{"title": "Efficient Convex Completion of Coupled Tensors using Coupled Nuclear Norms", "book": "Advances in Neural Information Processing Systems", "page_first": 6902, "page_last": 6910, "abstract": "Coupled norms have emerged as a convex method to solve coupled tensor completion. A limitation with coupled norms is that they only induce low-rankness using the multilinear rank of coupled tensors. In this paper, we introduce a new set of coupled norms known as coupled nuclear norms by constraining the CP rank of coupled tensors. We propose new coupled completion models using the coupled nuclear norms as regularizers, which can be optimized using computationally efficient optimization methods. We derive excess risk bounds for proposed coupled completion models and show that proposed norms lead to better performance. Through simulation and real-data experiments, we demonstrate that proposed norms achieve better performance for coupled completion compared to existing coupled norms.", "full_text": "Ef\ufb01cient Convex Completion of Coupled Tensors\n\nusing Coupled Nuclear Norms\n\nKishan Wimalawarne1 and Hiroshi Mamitsuka1,2\n\n1Bioinformatics Center, Kyoto University, Kyoto, Japan\n\n2Department of Computer Science, Aalto University, Espoo, Finland\n\nkishanwn@gmail.com, mami@kuicr.kyoto-u.ac.jp\n\nAbstract\n\nCoupled norms have emerged as a convex method to solve coupled tensor com-\npletion. A limitation with coupled norms is that they only induce low-rankness\nusing the multilinear rank of coupled tensors. In this paper, we introduce a new\nset of coupled norms known as coupled nuclear norms by constraining the CP\nrank of coupled tensors. We propose new coupled completion models using the\ncoupled nuclear norms as regularizers, which can be optimized using computa-\ntionally ef\ufb01cient optimization methods. We derive excess risk bounds for pro-\nposed coupled completion models and show that proposed norms lead to better\nperformance. 
Through simulation and real-data experiments, we demonstrate that the proposed norms achieve better performance for coupled completion compared to existing coupled norms.\n\n1 Introduction\n\nIn this paper, we investigate convex coupled norms for coupled tensor completion. Two tensors are considered to be coupled when they share a common mode. A well-explored problem with coupled tensors is coupled tensor completion, which studies the imputation of partially observed tensors using coupled tensors as side information (Acar et al., 2014; Bouchard et al., 2013). Coupled tensor completion arises in many real-world applications such as link prediction (Ermis et al., 2015), recommendation systems (Acar et al., 2014), and computer vision (Li et al., 2015). Moreover, the increasing availability of data from multiple sources makes coupled tensor completion an important research area requiring thorough investigation.\n\nOver the years, several methods have been proposed to solve coupled tensor completion (Acar et al., 2014; Ermis et al., 2015). However, many of these methods are non-convex models that lead to locally optimal solutions. Additionally, these non-convex models require the ranks of the coupled tensors to be specified, and these ranks are unknown in many situations. Coupled norms (Wimalawarne et al., 2018) have recently emerged as a convex solution for coupled completion. These coupled norms are modeled using trace norm regularization, which eliminates the requirement of pre-specifying ranks. In spite of these favorable qualities, coupled norms only induce low-rankness with respect to the multilinear rank of coupled tensors. 
This makes coupled norms sub-optimal for completion of coupled tensors with other low-rank structures.\n\nUntil recently, most of the research on convex norms that induce low-rankness of tensors has focused on constraining the multilinear rank (Tomioka and Suzuki, 2013; Wimalawarne et al., 2014). However, recent studies (Yuan and Zhang, 2016) have shown that the tensor nuclear norm, which is a convex relaxation to minimizing the CANDECOMP/PARAFAC (CP) rank (Carroll and Chang, 1970; Harshman, 1970; Hitchcock, 1927; Kolda and Bader, 2009), has favorable properties compared to low-rank inducing norms that constrain the multilinear rank (Tomioka and Suzuki, 2013; Wimalawarne et al., 2014). More specifically, Yang et al. (2015) showed that tensor completion using the tensor nuclear norm leads to better sample complexity compared to the overlapped norm (Tomioka and Suzuki, 2013; Liu et al., 2013), which was experimentally verified by Yuan and Zhang (2016). These advantages are unavailable for coupled norms since they neither support the tensor nuclear norm nor constrain the CP rank of coupled tensors.\n\nIn this paper, we investigate coupled completion through constraining the CP ranks of coupled tensors. We propose a set of convex coupled norms by extending the tensor nuclear norm. Additionally, we propose novel completion models that are regularized by the proposed norms, which obtain globally optimal solutions. We present theoretical analysis of the proposed completion models using excess risk bound analysis. Our analysis shows that the excess risk for two coupled $K$-mode tensors, $\\mathcal{X} \\in \\mathbb{R}^{n \\times \\cdots \\times n}$ and $\\mathcal{Y} \\in \\mathbb{R}^{n \\times \\cdots \\times n}$, both having the same CP rank $r$, is bounded by $O(r 2^{4K} K \\sqrt{n} (\\ln n)^{K-1/2})$. We show that the obtained excess risk bounds are smaller compared to excess risk bounds resulting from multilinear rank based coupled norms. Finally, we verify our theoretical claims by simulation and real-data experiments.\n\nWe use the following notations throughout the paper. Given a $K$-mode tensor $\\mathcal{T} \\in \\mathbb{R}^{n_1 \\times \\cdots \\times n_K}$, we specify the mode-$k$ unfolding (Kolda and Bader, 2009) by $T_{(k)} \\in \\mathbb{R}^{n_k \\times \\prod_{j \\neq k} n_j}$, which is obtained by concatenating all slices along the mode-$k$. Given two matrices, $M \\in \\mathbb{R}^{n_1 \\times n_2}$ and $N \\in \\mathbb{R}^{n_1 \\times n'_2}$, the notation $[M; N] \\in \\mathbb{R}^{n_1 \\times (n_2 + n'_2)}$ represents their concatenation on the common mode-1. We indicate the outer product between vectors $u_i \\in \\mathbb{R}^{n_i}, i = 1, \\ldots, N$ using the notation $\\otimes$ as $(u_1 \\otimes \\cdots \\otimes u_N)_{i_1, \\ldots, i_N} = \\prod_{l=1}^{N} u_{l, i_l}$. The $k$-mode product of a tensor $\\mathcal{T} \\in \\mathbb{R}^{n_1 \\times \\cdots \\times n_k \\times \\cdots \\times n_K}$ and a vector $v \\in \\mathbb{R}^{n_k}$ is defined as $\\mathcal{T} \\times_k v = \\sum_{i_k=1}^{n_k} \\mathcal{T}_{i_1, i_2, \\ldots, i_k, \\ldots, i_K} v_{i_k}$. Given that the rank of the mode-$k$ unfolding of $\\mathcal{T}$ is $r_k$, the multilinear rank of $\\mathcal{T}$ is defined as $(r_1, \\ldots, r_K)$.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n2 Review of Coupled Completion\n\nWe briefly review existing coupled completion methods in this section.\n\n2.1 Non-convex Factorization Methods\n\nCoupled completion models have been mostly investigated through factorization methods. In essence, these methods consider explicit factorization of a coupled tensor $\\mathcal{T} \\in \\mathbb{R}^{n_1 \\times n_2 \\times n_3}$ and a matrix $M \\in \\mathbb{R}^{n_1 \\times m}$ as $\\mathcal{T} = \\sum_{i=1}^{R} a_i \\otimes b_i \\otimes c_i$ having $a_i \\in \\mathbb{R}^{n_1}, b_i \\in \\mathbb{R}^{n_2}, c_i \\in \\mathbb{R}^{n_3}, i = 1, \\ldots, R$ and $M = \\sum_{i=1}^{R} a_i \\otimes d_i$ having $a_i \\in \\mathbb{R}^{n_1}, d_i \\in \\mathbb{R}^{m}, i = 1, \\ldots, R$, respectively, with a common rank $R$ and shared components $a_i, i = 1, \\ldots, R$. 
Many variations of factorization models for coupled completion have been proposed based on CP decomposition with shared and unshared components (Acar et al., 2014), Tucker decomposition (Ermis et al., 2015), and non-negative factorization (Ermis et al., 2015). However, due to factorization, these coupled completion models are non-convex and lead to locally optimal solutions. Furthermore, these methods require a priori specification of the rank ($R$) of each tensor, as well as the number of shared components between the factorized tensors.\n\n2.2 Convex Coupled Norms\n\nCoupled norms (Wimalawarne et al., 2018) are a set of convex norms designed by combining low-rank tensor and matrix norms. Given a tensor $\\mathcal{T} \\in \\mathbb{R}^{n_1 \\times n_2 \\times n_3}$ and a matrix $M \\in \\mathbb{R}^{n_1 \\times n'_2}$ coupled on mode $a$, coupled norms are defined as\n\n$$\\|\\mathcal{T}, M\\|^{a}_{(b,c,d)},$$\n\nwhere the subscripts $b, c, d \\in \\{O, L, S, -\\}$ specify the regularization method to be applied to each mode and the superscript $a$ specifies the mode in which the tensor and the matrix are coupled. 
Notations $O$, $L$, and $S$ indicate that the respective mode is regularized by using the overlapped trace norm (Tomioka and Suzuki, 2013), latent trace norm (Tomioka and Suzuki, 2013), and scaled latent trace norm (Wimalawarne et al., 2014), respectively, and $-$ indicates no regularization.\n\nAn example of a coupled norm that regularizes both coupled tensors using the overlapped trace norm is\n\n$$\\|\\mathcal{T}, M\\|^{1}_{(O,O,O)} := \\|[T_{(1)}; M]\\|_{\\mathrm{tr}} + \\sum_{k=2}^{3} \\|T_{(k)}\\|_{\\mathrm{tr}}.$$\n\nThe following norm is another example, where we consider $\\mathcal{T}$ as a summation of latent tensors $\\mathcal{T}^{(1)}$, $\\mathcal{T}^{(2)}$, and $\\mathcal{T}^{(3)}$ and apply the scaled latent trace norm as\n\n$$\\|\\mathcal{T}, M\\|^{1}_{(S,S,S)} = \\inf_{\\mathcal{T}^{(1)} + \\mathcal{T}^{(2)} + \\mathcal{T}^{(3)} = \\mathcal{T}} \\left( \\frac{1}{\\sqrt{n_1}} \\|[T^{(1)}_{(1)}; M]\\|_{\\mathrm{tr}} + \\sum_{k=2}^{3} \\frac{1}{\\sqrt{n_k}} \\|T^{(k)}_{(k)}\\|_{\\mathrm{tr}} \\right).$$\n\nGiven two partially observed tensors $\\hat{\\mathcal{T}}_1 \\in \\mathbb{R}^{n_1 \\times n_2 \\times \\cdots \\times n_K}, K \\geq 3$ and $\\hat{\\mathcal{T}}_2 \\in \\mathbb{R}^{n'_1 \\times n'_2 \\times \\cdots \\times n'_{K'}}, K' \\geq 2$, coupled on their mode-$a$ with observed indexes given by the mappings $\\Omega_{\\hat{\\mathcal{T}}_1}$ and $\\Omega_{\\hat{\\mathcal{T}}_2}$, coupled completion is performed by solving\n\n$$\\min_{\\mathcal{T}_1, \\mathcal{T}_2} \\frac{1}{2} \\|\\Omega_{\\hat{\\mathcal{T}}_1}(\\mathcal{T}_1 - \\hat{\\mathcal{T}}_1)\\|_{F}^{2} + \\frac{1}{2} \\|\\Omega_{\\hat{\\mathcal{T}}_2}(\\mathcal{T}_2 - \\hat{\\mathcal{T}}_2)\\|_{F}^{2} + \\lambda \\|\\mathcal{T}_1, \\mathcal{T}_2\\|^{a}_{\\mathrm{cn}},$$\n\nwhere $\\|\\mathcal{T}_1, \\mathcal{T}_2\\|^{a}_{\\mathrm{cn}}$ is a suitable coupled norm.\n\nAn important property of coupled norms is that the trace norm is applied with respect to each mode unfolding of tensors. This results in inducing low-rankness only by using the multilinear rank of coupled tensors. 
Furthermore, since the definitions of matrix rank and multilinear rank are different, concatenated regularization on the coupled mode may not be optimal for sharing information among the tensors.\n\n3 Proposed Method: Coupled Completion via Coupled Nuclear Norms\n\nIn this section, we propose a set of convex coupled norms that overcome limitations of existing coupled completion methods. The main tool we use to build our norms is the tensor nuclear norm (Yuan and Zhang, 2016; Yang et al., 2015; Lim and Comon, 2014), which is defined for a tensor $\\mathcal{T} \\in \\mathbb{R}^{n_1 \\times n_2 \\times \\cdots \\times n_K}$ as\n\n$$\\|\\mathcal{T}\\|_{*} = \\inf \\left\\{ \\sum_{j=1}^{\\infty} \\gamma_j \\,\\middle|\\, \\mathcal{T} = \\sum_{j=1}^{\\infty} \\gamma_j u_{1j} \\otimes u_{2j} \\otimes \\cdots \\otimes u_{Kj}, \\; \\|u_{kj}\\|_2^2 = 1, \\; \\gamma_j \\geq \\gamma_{j+1} > 0 \\right\\}. \\tag{1}$$\n\nIn practice, we consider that $\\mathcal{T}$ has a finite rank $R$, which is expressed by the notation $\\mathrm{rank}(\\mathcal{T}) = R$. When $K = 2$ and each $u_{kj}$ is orthogonal, the tensor nuclear norm is equivalent to the matrix nuclear norm.\n\nWe now propose coupled norms by only using the tensor nuclear norms, so that low-rankness of both coupled tensors is induced using the CP rank. We name our norms coupled nuclear norms. We introduce the following notation to define the coupled nuclear norms for two coupled tensors $\\mathcal{W} \\in \\mathbb{R}^{n_1 \\times n_2 \\times \\cdots \\times n_K}$ and $\\mathcal{V} \\in \\mathbb{R}^{n'_1 \\times n'_2 \\times \\cdots \\times n'_{K'}}$ as\n\n$$\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_b, b)(\\lambda_c, c)},$$\n\nwhere the superscript $a$ indicates the coupled mode, and each tuple $(\\lambda_b, b)$ and $(\\lambda_c, c)$ indicates the regularization method for each tensor. We specify $b, c \\in \\{F, L\\}$, where $F$ and $L$ indicate that a tensor is regularized as a whole or as a latent decomposition, respectively. Furthermore, we indicate $\\lambda_b \\in \\mathbb{R}$ and $\\lambda_c \\in \\mathbb{R}$ to specify regularization parameters for nuclear norms of each tensor. 
The subscript ccp is used to distinguish the proposed norms from coupled norms in (Wimalawarne et al., 2018).\n\nLet us now look at a few definitions of coupled nuclear norms. We start with the following norm\n\n$$\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F)(\\lambda_2, F)} = \\left\\{ \\|\\mathcal{W}\\|_{*} \\leq \\lambda_1, \\|\\mathcal{V}\\|_{*} \\leq \\lambda_2 \\,\\middle|\\, \\mathcal{W} = \\sum_{i=1}^{R} \\gamma_i x_{1i} \\otimes \\cdots \\otimes x_{ai} \\otimes \\cdots \\otimes x_{Ki}, \\; \\mathcal{V} = \\sum_{i=1}^{R} \\nu_i y_{1i} \\otimes \\cdots \\otimes x_{ai} \\otimes \\cdots \\otimes y_{K'i} \\right\\}, \\tag{2}$$\n\nwhere the subscripts with $F$ lead us to consider $\\mathcal{W}$ and $\\mathcal{V}$ as whole tensors without any latent decomposition. We assume that each tensor has a rank $R$ and all the component vectors $x_{ai}, i = 1, \\ldots, R$ on the coupled mode $a$ are common to both the tensors, while the tensor nuclear norm is applied to $\\mathcal{W}$ and $\\mathcal{V}$ to constrain their ranks.\n\nA limitation of the previous norm is that it assumes both $\\mathcal{W}$ and $\\mathcal{V}$ have the same rank and all components along the coupled mode are common. In practice, this can be a strong assumption and we need to have more freedom for ranks and the amount of sharing among tensors. To incorporate these features into coupled nuclear norms, we propose to use latent decomposition of tensors, such that we learn latent tensors that are coupled to other tensors as well as latent tensors that are uncoupled. 
Next, we assume a latent decomposition for $\\mathcal{W}$ and define the following norm\n\n$$\\inf_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L),(\\lambda_3, F)} = \\inf_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}} \\left\\{ \\|\\mathcal{W}^{(1)}\\|_{*} \\leq \\lambda_1, \\|\\mathcal{W}^{(2)}\\|_{*} \\leq \\lambda_2, \\|\\mathcal{V}\\|_{*} \\leq \\lambda_3 \\,\\middle|\\, \\mathcal{W}^{(1)} = \\sum_{i=1}^{R_1} \\gamma^{(1)}_i x^{(1)}_{1i} \\otimes \\cdots \\otimes x_{ai} \\otimes \\cdots \\otimes x^{(1)}_{Ki}, \\; \\mathcal{W}^{(2)} = \\sum_{i=1}^{R_2} \\gamma^{(2)}_i x^{(2)}_{1i} \\otimes \\cdots \\otimes x^{(2)}_{Ki}, \\; \\mathcal{V} = \\sum_{i=1}^{R_1} \\nu_i y_{1i} \\otimes \\cdots \\otimes x_{ai} \\otimes \\cdots \\otimes y_{K'i} \\right\\}, \\tag{3}$$\n\nwhere the subscript $(\\lambda_1, \\lambda_2, L)$ indicates that the tensor $\\mathcal{W}$ is considered as two latent tensors and their nuclear norms are constrained by $\\lambda_1$ and $\\lambda_2$. The subscript $(\\lambda_3, F)$ indicates that $\\mathcal{V}$ is considered as a whole without any latent decomposition. Further, the norm considers $\\mathcal{W}^{(1)}$ to have common factors with $\\mathcal{V}$ due to coupling, while $\\mathcal{W}^{(2)}$ is independent from any coupling with $\\mathcal{V}$. Due to the latent decomposition, the rank of $\\mathcal{W}$ is $R_1 + R_2$; however, only $R_1$ components of $x_a$ in $\\mathcal{W}$ are shared with $\\mathcal{V}$.\n\nIn addition to the above coupled nuclear norms, we can further define $\\inf_{\\mathcal{V}^{(1)} + \\mathcal{V}^{(2)} = \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_2, \\lambda_3, L)}$, where the tensor $\\mathcal{V}$ is considered to have a latent decomposition, and $\\inf_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}, \\mathcal{V}^{(1)} + \\mathcal{V}^{(2)} = \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L),(\\lambda_3, \\lambda_4, L)}$, where both the tensors are considered to have latent decompositions. Furthermore, our proposed norms can be extended to define norms for more than two coupled tensors.\n\nIt is important to note that the definitions of the proposed norms do not adhere to all the properties of a normed space. Rather, they can be considered as sets constructed by tensor nuclear norms. However, we refer to our definitions as norms since they are constructed by constraining the tensor nuclear norms. Further, we point out that the number of different norms we need for a coupled tensor using coupled nuclear norms is smaller compared to multilinear rank based coupled norms (Wimalawarne et al., 2018).\n\n3.1 New Coupled Completion Models\n\nWe now propose coupled completion models using coupled nuclear norms. Let us consider two partially observed tensors $\\mathcal{X} \\in \\mathbb{R}^{n_1 \\times n_2 \\times \\cdots \\times n_K}$ and $\\mathcal{Y} \\in \\mathbb{R}^{n'_1 \\times n'_2 \\times \\cdots \\times n'_{K'}}$ coupled on the mode $a$. Let us also consider $\\Omega_1 : \\mathbb{R}^{n_1 \\times n_2 \\times \\cdots \\times n_K} \\to \\mathbb{R}^{m_1}$ and $\\Omega_2 : \\mathbb{R}^{n'_1 \\times n'_2 \\times \\cdots \\times n'_{K'}} \\to \\mathbb{R}^{m_2}$ as mappings to observed elements of $\\mathcal{X}$ and $\\mathcal{Y}$, respectively, where $m_1$ and $m_2$ are the number of observed elements. Our objective is to impute missing elements of $\\mathcal{X}$ and $\\mathcal{Y}$ by performing coupled completion using our proposed norms. Let $\\mathcal{W}$ and $\\mathcal{V}$ be completed tensors that we want to obtain for $\\mathcal{X}$ and $\\mathcal{Y}$, respectively. 
To achieve this using $\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F)(\\lambda_1, F)}$, we propose a completion model as\n\n$$\\min_{\\mathcal{W}, \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F)(\\lambda_1, F)} \\quad \\mathrm{s.t.} \\quad \\Omega_1(\\mathcal{W}) = \\Omega_1(\\mathcal{X}), \\; \\Omega_2(\\mathcal{V}) = \\Omega_2(\\mathcal{Y}), \\tag{4}$$\n\nand another completion model by using $\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L)(\\lambda_3, F)}$ as\n\n$$\\min_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}, \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L)(\\lambda_1, F)} \\quad \\mathrm{s.t.} \\quad \\Omega_1(\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)}) = \\Omega_1(\\mathcal{X}), \\; \\Omega_2(\\mathcal{V}) = \\Omega_2(\\mathcal{Y}). \\tag{5}$$\n\nSimilarly, we can define completion models using $\\inf_{\\mathcal{V}^{(1)} + \\mathcal{V}^{(2)} = \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_2, \\lambda_3, L)}$ and $\\inf_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}, \\mathcal{V}^{(1)} + \\mathcal{V}^{(2)} = \\mathcal{V}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L),(\\lambda_3, \\lambda_4, L)}$.\n\nA key advantage of the proposed coupled nuclear norms is that they do not have overlapping group structures as in (Wimalawarne et al., 2018) and all tensors are regularized separately. This allows us to use a computationally feasible method such as the Frank-Wolfe optimization (Jaggi, 2013) to solve the proposed completion models. We provide a Frank-Wolfe based optimization method to solve the above completion models in Section B of the Appendix.\n\n4 Theoretical Analysis\n\nIn this section, we analyze excess risk bounds for the proposed coupled completion using coupled nuclear norms. We consider a partially observed $K$-mode tensor $\\mathcal{X} \\in \\mathbb{R}^{n \\times \\cdots \\times n}$ and a partially observed $K'$-mode tensor $\\mathcal{Y} \\in \\mathbb{R}^{n \\times \\cdots \\times n}$ coupled on their first modes. Let us consider two sets $S$ and $P$, whose elements contain indexes of arbitrary subsets of elements of $\\mathcal{X}$ and $\\mathcal{Y}$, respectively. Following (Shamir and Shalev-Shwartz, 2014), we split $S$ and $P$ uniformly at random into training and test sets; the set $S$ as $S_{\\mathrm{Train}}$ and $S_{\\mathrm{Test}}$ such that $S = S_{\\mathrm{Train}} \\cup S_{\\mathrm{Test}}$, and the set $P$ as $P_{\\mathrm{Train}}$ and $P_{\\mathrm{Test}}$ such that $P = P_{\\mathrm{Train}} \\cup P_{\\mathrm{Test}}$. 
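Furthermore, following (Shamir and Shalev-Shwartz, 2014), we consider the special case where $|S_{\\mathrm{Train}}| = |S_{\\mathrm{Test}}| = |S|/2$ and $|P_{\\mathrm{Train}}| = |P_{\\mathrm{Test}}| = |P|/2$.\n\nTo prove excess risk bounds, we recast each coupled nuclear norm as a hypothesis class for each completion model. Let us again denote $\\mathcal{W}$ and $\\mathcal{V}$ as completed tensors we want to learn from $\\mathcal{X}$ and $\\mathcal{Y}$, respectively. Given the coupled nuclear norm $\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_b, F)(\\lambda_c, F)}$, we define a hypothesis class as $\\mathsf{W} = \\{\\mathcal{W}, \\mathcal{V} : \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_b, F)(\\lambda_c, F)}, \\mathrm{rank}(\\mathcal{W}) = \\mathrm{rank}(\\mathcal{V}) = r\\}$ for some regularization parameters $\\lambda_b$ and $\\lambda_c$. Using the hypothesis class and a $\\Lambda$-Lipschitz continuous and $b_l$-bounded loss function $l(\\cdot, \\cdot)$ that measures the difference between the predicted and actual values, we write the average loss over training sets of a coupled completion model as\n\n$$L_{S_{\\mathrm{Train}}, P_{\\mathrm{Train}}}(\\mathcal{W}, \\mathcal{V}) := \\frac{1}{|S_{\\mathrm{Train}} \\cup P_{\\mathrm{Train}}|} \\left[ \\sum_{(i_1, \\ldots, i_K) \\in S_{\\mathrm{Train}}} l(\\mathcal{X}_{i_1, \\ldots, i_K}, \\mathcal{W}_{i_1, \\ldots, i_K}) + \\sum_{(j_1, \\ldots, j_{K'}) \\in P_{\\mathrm{Train}}} l(\\mathcal{Y}_{j_1, \\ldots, j_{K'}}, \\mathcal{V}_{j_1, \\ldots, j_{K'}}) \\right], \\tag{6}$$\n\nand the average loss over test sets can be constructed similarly as $L_{S_{\\mathrm{Test}}, P_{\\mathrm{Test}}}(\\mathcal{W}, \\mathcal{V})$ by substituting $S_{\\mathrm{Train}}$ and $P_{\\mathrm{Train}}$ in (6) with $S_{\\mathrm{Test}}$ and $P_{\\mathrm{Test}}$, respectively.\n\nBy the transductive Rademacher complexity theory (El-Yaniv and Pechyony, 2007; Shamir and Shalev-Shwartz, 2014), the excess risk can be upper bounded with probability $1 - \\delta$ as\n\n$$L_{S_{\\mathrm{Test}}, P_{\\mathrm{Test}}}(\\mathcal{W}, \\mathcal{V}) - L_{S_{\\mathrm{Train}}, P_{\\mathrm{Train}}}(\\mathcal{W}, \\mathcal{V}) \\leq 4 R_{S,P}(l \\circ \\mathcal{W}, l \\circ \\mathcal{V}) + b_l \\left( \\frac{11 + 4 \\sqrt{\\log \\frac{1}{\\delta}}}{\\sqrt{|S_{\\mathrm{Train}} \\cup P_{\\mathrm{Train}}|}} \\right), \\tag{7}$$\n\nwhere $R_{S,P}(l \\circ \\mathcal{W}, l \\circ \\mathcal{V})$ is the transductive Rademacher complexity, which is expressed as\n\n$$R_{S,P}(l \\circ \\mathcal{W}, l \\circ \\mathcal{V}) = \\frac{1}{|S \\cup P|} \\mathbb{E}_{\\sigma} \\left[ \\sup_{\\mathcal{W}, \\mathcal{V} \\in \\mathsf{W}} \\sum_{i_1, \\ldots, i_K} \\Sigma_{i_1, \\ldots, i_K} l(\\mathcal{X}_{i_1, \\ldots, i_K}, \\mathcal{W}_{i_1, \\ldots, i_K}) + \\sum_{j_1, \\ldots, j_{K'}} \\Sigma'_{j_1, \\ldots, j_{K'}} l(\\mathcal{Y}_{j_1, \\ldots, j_{K'}}, \\mathcal{V}_{j_1, \\ldots, j_{K'}}) \\right], \\tag{8}$$\n\nwhere $\\Sigma \\in \\mathbb{R}^{n \\times n \\times \\cdots \\times n}$ and $\\Sigma' \\in \\mathbb{R}^{n \\times n \\times \\cdots \\times n}$ are $K$-mode and $K'$-mode tensors, respectively, with $\\Sigma_{i_1, \\ldots, i_K} = \\sigma_{\\varphi} \\in \\{-1, 1\\}$ with probability $0.5$ if $(i_1, \\ldots, i_K) \\in S$ belongs to an index $\\varphi \\in 1, \\ldots, |S|$ or $\\Sigma_{i_1, \\ldots, i_K} = 0$ otherwise, and $\\Sigma'_{j_1, \\ldots, j_{K'}} = \\sigma_{|S| + \\varphi'} \\in \\{-1, 1\\}$ with probability $0.5$ if $(j_1, \\ldots, j_{K'}) \\in P$ belongs to an index $\\varphi' \\in 1, \\ldots, |P|$ or $\\Sigma'_{j_1, \\ldots, j_{K'}} = 0$ otherwise.\n\nIn the next two theorems, we show the Rademacher complexities for the proposed coupled nuclear norms $\\|\\mathcal{W}, \\mathcal{V}\\|_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_1, F)}$ and $\\|\\mathcal{W}, \\mathcal{V}\\|_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L),(\\lambda_3, F)}$ (detailed proofs of these theorems are given in Section C of the Appendix).\n\nTheorem 1. Let us consider $\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F)(\\lambda_2, F)}$ and its associated hypothesis class as $\\mathsf{W} = \\{\\mathcal{W}, \\mathcal{V} : \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, F)(\\lambda_1, F)}, \\mathrm{rank}(\\mathcal{W}) = \\mathrm{rank}(\\mathcal{V}) = r\\}$. Then the Rademacher complexity is bounded as\n\n$$R_{S,P}(l \\circ \\mathcal{W}, l \\circ \\mathcal{V}) \\leq \\frac{c \\Lambda}{|S \\cup P|} \\left[ r B_{\\mathcal{W}} 2^{3K + K'} K \\sqrt{n} (\\ln n)^{K - 1/2} + r B_{\\mathcal{V}} 2^{K + 3K'} K' \\sqrt{n} (\\ln n)^{K' - 1/2} \\right],$$\n\nwhere $\\gamma_1 \\leq B_{\\mathcal{W}}$ and $\\nu_1 \\leq B_{\\mathcal{V}}$ of (2) and $c$ is a constant.\n\nTheorem 2. Let us consider $\\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L)(\\lambda_3, F)}$ and its hypothesis class as $\\mathsf{W} = \\{\\mathcal{W}^{(1)}, \\mathcal{W}^{(2)}, \\mathcal{V} : \\inf_{\\mathcal{W}^{(1)} + \\mathcal{W}^{(2)} = \\mathcal{W}} \\|\\mathcal{W}, \\mathcal{V}\\|^{a}_{\\mathrm{ccp},(\\lambda_1, \\lambda_2, L)(\\lambda_3, F)}, \\mathrm{rank}(\\mathcal{W}^{(1)}) = \\mathrm{rank}(\\mathcal{V}) = r_1, \\mathrm{rank}(\\mathcal{W}^{(2)}) = r_2\\}$. 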
Then the Rademacher complexity is bounded as\n\n$$R_{S,P}(l \\circ \\mathcal{W}, l \\circ \\mathcal{V}) \\leq \\frac{c \\Lambda}{|S \\cup P|} \\left[ (r_1 B_{\\mathcal{W}_1} + r_2 B_{\\mathcal{W}_2}) 2^{3K + K'} K \\sqrt{n} (\\ln n)^{K - 1/2} + r_2 B_{\\mathcal{V}} 2^{K + 3K'} K' \\sqrt{n} (\\ln n)^{K' - 1/2} \\right],$$\n\nwhere $\\gamma^{(1)}_1 \\leq B_{\\mathcal{W}_1}$, $\\gamma^{(2)}_1 \\leq B_{\\mathcal{W}_2}$, $\\nu_1 \\leq B_{\\mathcal{V}}$ of (3) and $c$ is a constant.\n\nThe Rademacher complexities in Theorems 1 and 2 show that for $K \\geq K'$ and $r = r_1 = r_2$, the excess risk is bounded by $O(r 2^{4K} K \\sqrt{n} (\\ln n)^{K - 1/2})$. The excess risk for the coupled norm in (Wimalawarne et al., 2018) for coupled tensors with multilinear rank $(r', \\ldots, r')$ is bounded by $O(\\sqrt{r'} K [\\sqrt{n^{K-1}} + \\sqrt{n}])$. Since coupled nuclear norms are bounded by $\\sqrt{n} (\\ln n)^{K - 1/2}$ compared to coupled norms (Wimalawarne et al., 2018) that are bounded by $\\sqrt{n^{K-1}}$, coupled nuclear norms can lead to lower excess risk when coupled tensors have large dimensions ($n$ and $K$ are large). Though CP rank and multilinear rank cannot be compared directly, when the CP rank is smaller than the mode dimensions ($r < n$) our theoretical analysis shows that coupled nuclear norms are capable of better performance compared to multilinear rank based coupled norms. Additionally, the Rademacher complexity is divided by the total number of observed samples from both the coupled tensors ($|S \\cup P|$), leading to a lower Rademacher complexity compared to separate tensor completions.\n\n5 Experiments\n\nWe carried out several simulation and real-world data experiments to evaluate the empirical performance of our proposed methods.\n\n5.1 Simulation Experiments\n\nWe designed simulation experiments using coupled tensors constructed with both the CP rank and the multilinear rank. For each simulation, we created a tensor $\\mathcal{T} \\in \\mathbb{R}^{20 \\times 20 \\times 20}$ and a matrix $M \\in \\mathbb{R}^{20 \\times 30}$ with specified ranks and coupled them on their first modes (without loss of generality) by sharing a certain number of singular components along the first mode. We chose these dimensions of coupled tensors in accordance with simulation experiments in (Wimalawarne et al., 2018) for easier comparison. In order to create a tensor $\\mathcal{T}$ with CP rank of $r$, we generated the tensor as $\\mathcal{T} = \\sum_{a=1}^{r} \\zeta_a u_a \\otimes v_a \\otimes w_a$, where $u_a \\in \\mathbb{R}^{20}$, $v_a \\in \\mathbb{R}^{20}$, and $w_a \\in \\mathbb{R}^{20}$ are randomly generated unit vectors and $\\zeta_a \\in \\mathbb{R}$. To create $\\mathcal{T}$ with multilinear rank of $(r_1, r_2, r_3)$, we generated orthogonal matrices $U \\in \\mathbb{R}^{20 \\times r_1}$, $V \\in \\mathbb{R}^{20 \\times r_2}$, and $W \\in \\mathbb{R}^{20 \\times r_3}$, and a core tensor $\\mathcal{C} \\in \\mathbb{R}^{r_1 \\times r_2 \\times r_3}$ with elements randomly sampled from a Normal distribution and computed $\\mathcal{T} = \\mathcal{C} \\times_1 U \\times_2 V \\times_3 W$. We also created a rank $R$ matrix $M = X S Y^{\\top}$ with orthogonal matrices $X \\in \\mathbb{R}^{20 \\times R}$ and $Y \\in \\mathbb{R}^{R \\times 30}$, and a diagonal matrix $S \\in \\mathbb{R}^{R \\times R}$ randomly generated from $\\mathbb{R}_{+}$. We coupled $\\mathcal{T}$ and $M$ by sharing $r'$ components between them as $X(:, 1:r') = [u_1, \\ldots, u_{r'}]$ for CP rank based tensors and $X(:, 1:r') = U(:, 1:r')$ for multilinear rank based tensors. In order to generate datasets for simulations, we selected 30, 50, and 70 percent of the total number of elements of the tensor and the matrix as training sets, 10 percent as validation sets, and the rest as test sets. 
For each simulation we repeated experiments with 10 random selections.\n\nFigure 1: Performances of completion of the tensor with dimensions of $20 \\times 20 \\times 20$ and CP rank of 5 and the matrix with dimensions of $20 \\times 30$ and rank of 5, both sharing 5 components. (a) Matrix completion ($M$). (b) Tensor completion ($\\mathcal{T}$).\n\nWe experimented with the proposed coupled nuclear norms$^{1}$ $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_1, F)}$, $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_2, \\lambda_3, L),(\\lambda_2, F)}$, and $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_4, F),(\\lambda_4, \\lambda_5, L)}$. For visual convenience in figures, we use the shortened names ccp-1, ccp-2, and ccp-3 for $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_1, F)}$, $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_2, \\lambda_3, L),(\\lambda_2, F)}$, and $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_4, F),(\\lambda_4, \\lambda_5, L)}$, respectively. For all these norms, we used the regularization parameters $\\lambda_1, \\ldots, \\lambda_5$ in the range from 0.01 to 50 with intervals of 1. As baseline methods, we performed completion of each individual tensor using the overlapped trace norm (OTN) and the scaled latent trace norm (SLTN) and individual matrix completion using the matrix trace norm (MTN). We also used the tensor nuclear norm (TNN) as a baseline method to evaluate individual tensor completion. Additionally, we performed coupled completion with coupled norms (Wimalawarne et al., 2018). However, due to the difficulty in plotting all the norms in a single graph, only the result from the best coupled norm is plotted. For all the baseline methods, we selected the optimal regularization parameters from the range of 0.01 to 5 in divisions of 0.025.\n\nFor our first experiment we created $\\mathcal{T}$ by specifying a CP rank of 5 and $M$ with rank of 5. We coupled $\\mathcal{T}$ and $M$ by sharing all components on their first modes. 
Figure 1 shows that coupled nuclear norms have outperformed individual completion of the tensor and the matrix, as well as coupled completion by the coupled norm $(O, O, O)$.\n\nNext, we give a simulation experiment with coupled tensors using multilinear ranks. We constructed $\\mathcal{T}$ with multilinear rank of $(5, 5, 5)$ and $M$ with rank of 5 and shared all components on the first mode. Figure 2 shows that the proposed coupled nuclear norms $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_1, F),(\\lambda_1, F)}$ and $\\|\\mathcal{T}, M\\|_{\\mathrm{ccp},(\\lambda_2, \\lambda_3, L),(\\lambda_2, F)}$ have outperformed $(O, O, O)$ for both tensor and matrix completion, indicating that the proposed norms are versatile for coupled tensors with multilinear ranks.\n\nFigure 2: Performances of completion of the tensor with dimensions of $20 \\times 20 \\times 20$ and multilinear rank of $(5, 5, 5)$ and the matrix with dimensions of $20 \\times 30$ and rank of 5, both sharing 5 components. (a) Matrix completion ($M$). (b) Tensor completion ($\\mathcal{T}$).\n\n[Plots for Figures 1 and 2: x-axis, fraction of training samples; y-axis, MSE. Matrix panels compare MTN, (O,O,O), ccp-1, ccp-2, and ccp-3; tensor panels compare OTN, SLTN, TNN, (O,O,O), ccp-1, ccp-2, and ccp-3.]\n\nIn all the above experiments, coupled nuclear norms have performed comparably to or better than individual tensor and matrix completion. We give further simulation experiments in Section D of the Appendix.\n\n1 Code and data are available at http://kishan-wimalawarne.com/onewebmedia/NeurIPS_2018_code.rar\n\n5.2 Real Data Experiments\n\nWe used the UCLAF dataset for our real-data experiment.\n\n5.2.1 UCLAF Dataset\n\nThe UCLAF dataset (Zheng et al., 2010) is a commonly used benchmark dataset for coupled tensor completion (Ermis et al., 2015; Wimalawarne et al., 2018). 
The UCLAF dataset contains GPS data collected from 164 users in 168 locations performing 5 activities. These GPS data form a user-location-activity tensor $\\mathcal{T} \\in \\mathbb{R}^{164 \\times 168 \\times 5}$ consisting of only a few observed elements. In order to learn the unobserved elements, tensor completion can be performed. However, the UCLAF dataset also contains side information that can be coupled to the tensor to improve the completion procedure. Similar to (Wimalawarne et al., 2018), we used the coupling of $\\mathcal{T}$ with the user-location matrix $X \\in \\mathbb{R}^{164 \\times 168}$. We used the same random data selection and validation processes as in the simulation experiments.\n\nApart from the coupled nuclear norms, we experimented with the same baseline methods for tensors as in the previous section. For these experiments, we selected regularization parameters from a logarithmic linear scale from 0.01 to 5000 with 200 divisions. Additionally, we compared our results with the SDF model (Sorber et al., 2015) by using a CP rank of 2.\n\nFigure 3 shows that the coupled nuclear norm $\\| \\cdot \\|_{\\mathrm{ccp},(\\lambda, F),(\\lambda, F)}$ (ccp-1) gives the best performance. The coupled norm $(S, O, O)$, which has given the best performance among multilinear rank based coupled norms (Wimalawarne et al., 2018), is outperformed by all the coupled nuclear norms.\n\nFigure 3: Performances on the UCLAF data set\n\nBoth simulation and UCLAF data experiments indicate that coupled nuclear norms lead to better performance compared to existing coupled norms (Wimalawarne et al., 2018).\n\n6 Acknowledgment\n\nThis work has been partially supported by MEXT KAKENHI Grant Number 16H02868, Grant Number JPMJAC1503 ACCEL JST, FiDiPro Tekes (currently Business Finland) and AIPSE Academy of Finland.\n\n7 Conclusion and Future Work\n\nWe introduce coupled nuclear norms by integrating the CP rank into coupled norms. 
We propose new coupled completion models regularized by coupled nuclear norms and discuss optimization procedures to solve them. Our excess risk bounds for coupled completion show that the proposed norms lead to better performance compared to existing multilinear rank based coupled norms. Our theoretical analysis is validated through simulation and real-world data experiments, where we show that coupled nuclear norms can give better performance compared to existing methods. We believe that the proposed coupled nuclear norms should be further investigated to be widely applicable in real-world problems.\n\nApplying coupled nuclear norms to solve large-scale problems is an important future research direction. More specifically, developing computationally feasible optimization methods is important since computing the coupled nuclear norms can be computationally costly. Future research in this direction can consider developing globally optimal power methods (Anandkumar et al., 2017) to approximate coupled nuclear norms. Furthermore, theoretical analysis of coupled nuclear norms with more than two tensors is another important future research direction.\n\n[Plot for Figure 3: x-axis, fraction of training samples; y-axis, MSE; compares OTN, SLTN, (S,O,O), TNN, SDF, ccp-1, ccp-2, and ccp-3.]\n\nReferences\n\nAcar, E., Papalexakis, E. E., G\u00fcrdeniz, G., Rasmussen, M. A., Lawaetz, A. J., Nilsson, M., and Bro, R. (2014). Structure-revealing data fusion. BMC Bioinformatics, 15:239.\n\nAnandkumar, A., Ge, R., and Janzamin, M. (2017). Analyzing tensor power method dynamics in overcomplete regime. Journal of Machine Learning Research, 18:22:1\u201322:40.\n\nBouchard, G., Yin, D., and Guo, S. (2013). Convex collective matrix factorization. In AISTATS, volume 31 of JMLR Workshop and Conference Proceedings, pages 144\u2013152. JMLR.org.\n\nCarroll, J. D. and Chang, J.-J. (1970). 
Analysis of individual differences in multidimensional scaling via an n-way generalization of \u201cEckart-Young\u201d decomposition. Psychometrika, 35(3):283\u2013319.\n\nEl-Yaniv, R. and Pechyony, D. (2007). Transductive Rademacher complexity and its applications. In Learning Theory, volume 4539, pages 157\u2013171.\n\nErmis, B., Acar, E., and Cemgil, A. T. (2015). Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Mining and Knowledge Discovery, 29(1):203\u2013236.\n\nHarshman, R. A. (1970). Foundations of the PARAFAC procedure: models and conditions for an explanatory multimodal factor analysis. UCLA Working Papers in Phonetics, 16:1\u201384.\n\nHitchcock, F. L. (1927). The expression of a tensor or a polyadic as a sum of products. J. Math. Phys., 6(1):164\u2013189.\n\nJaggi, M. (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, volume 28, pages 427\u2013435. PMLR.\n\nKolda, T. G. and Bader, B. W. (2009). Tensor Decompositions and Applications. SIAM Review, 51(3):455\u2013500.\n\nLedoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical surveys and monographs. American Mathematical Society.\n\nLi, C., Zhao, Q., Li, J., Cichocki, A., and Guo, L. (2015). Multi-tensor completion with common structures. In AAAI, pages 2743\u20132749.\n\nLim, L. and Comon, P. (2014). Blind multilinear identification. IEEE Trans. Information Theory, 60(2):1260\u20131280.\n\nLiu, J., Musialski, P., Wonka, P., and Ye, J. (2013). Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):208\u2013220.\n\nNguyen, N. H., Drineas, P., and Tran, T. D. (2015). Tensor sparsification via a bound on the spectral norm of random tensors. Information and Inference: A Journal of the IMA, 4(3):195\u2013229.\n\nShamir, O. and Shalev-Shwartz, S. (2014). 
Matrix completion with the trace norm: Learning, bounding, and transducing. Journal of Machine Learning Research, 15:3401\u20133423.\n\nSorber, L., Barel, M. V., and Lathauwer, L. D. (2015). Structured data fusion. IEEE Journal of Selected Topics in Signal Processing, 9(4):586\u2013600.\n\nTomioka, R. and Suzuki, T. (2013). Convex tensor decomposition via structured Schatten norm regularization. In NIPS.\n\nVershynin, R. (2011). Spectral norm of products of random and deterministic matrices. Probability Theory and Related Fields, 150(3):471\u2013509.\n\nWimalawarne, K., Sugiyama, M., and Tomioka, R. (2014). Multitask learning meets tensor factorization: task imputation via convex optimization. In NIPS.\n\nWimalawarne, K., Yamada, M., and Mamitsuka, H. (2018). Convex coupled matrix and tensor completion. Neural Computation, 30(11):1\u201333.\n\nYang, Y., Feng, Y., and Suykens, J. A. K. (2015). A rank-one tensor updating algorithm for tensor completion. IEEE Signal Processing Letters, 22(10):1633\u20131637.\n\nYuan, M. and Zhang, C.-H. (2016). On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics, 16(4):1031\u20131068.\n\nZheng, V. W., Cao, B., Zheng, Y., Xie, X., and Yang, Q. (2010). Collaborative filtering meets mobile recommendation: A user-centered approach. In AAAI.", "award": [], "sourceid": 3443, "authors": [{"given_name": "Kishan", "family_name": "Wimalawarne", "institution": "Kyoto University"}, {"given_name": "Hiroshi", "family_name": "Mamitsuka", "institution": "Kyoto University"}]}