{"title": "Efficient Moments-based Permutation Tests", "book": "Advances in Neural Information Processing Systems", "page_first": 2277, "page_last": 2285, "abstract": "In this paper, we develop an efficient moments-based permutation test approach to improve the system\u2019s efficiency by approximating the permutation distribution of the test statistic with Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically without any permutation.  Experimental results using different test statistics are demonstrated using simulated data and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency.", "full_text": " \n\nEfficient Moments-based Permutation Tests \n\n \n \n\nChunxiao Zhou \nDept. of Electrical and Computer Eng. \nUniversity of Illinois at Urbana-Champaign \nChampaign, IL 61820 \nczhou4@gmail.com \n\nHuixia Judy Wang \nDept. of Statistics \nNorth Carolina State University \nRaleigh, NC 27695 \nwang@stat.ncsu.edu \n\nYongmei Michelle Wang \nDepts. of Statistics, Psychology, and Bioengineering \nUniversity of Illinois at Urbana-Champaign \nChampaign, IL 61820 \nymw@illinois.edu \n\nAbstract \n\nIn this paper, we develop an efficient moments-based permutation test \napproach to improve the test\u2019s computational efficiency by approximating \nthe permutation distribution of the test statistic with Pearson distribution \nseries. This approach involves the calculation of the first four moments of \nthe permutation distribution. We propose a novel recursive method to derive \nthese moments theoretically and analytically without any permutation. \nExperimental results using different test statistics are demonstrated using \nsimulated data and real data. 
The proposed strategy takes advantage of \nnonparametric permutation tests and parametric Pearson distribution \napproximation to achieve both accuracy and efficiency. \n\n1 Introduction \n\nPermutation tests are flexible nonparametric alternatives to parametric tests in small \nsamples, or when the distribution of a test statistic is unknown or mathematically intractable. \nIn permutation tests, no statistical assumption other than exchangeability is required. \nThe p-values can be obtained by using the permutation distribution. Permutation tests are \nappealing in many biomedical studies, which often have limited observations with unknown \ndistribution. They have been used successfully in structural MR image analysis [1, 2, 3], in \nfunctional MR image analysis [4], and in 3D face analysis [5]. \nThere are three common approaches to construct the permutation distribution [6, 7, 8]: (1) \nexact permutation enumerating all possible arrangements; (2) approximate permutation \nbased on random sampling from all possible permutations; (3) approximate permutation \nusing the analytical moments of the exact permutation distribution under the null hypothesis. \nThe main disadvantage of the exact permutation is the computational cost, due to the \nfactorial increase in the number of permutations with the increasing number of subjects. The \nsecond technique often gives inflated type I errors caused by random sampling. When a large \nnumber of repeated tests are needed, the random permutation strategy is also \ncomputationally expensive to achieve satisfactory accuracy. Regarding the third approach, \nthe exact permutation distribution may not have moments, or its moments may not be tractable. In \nmost applications, it is not the existence but the derivation of moments that limits the third \napproach. \n\nTo the best of our knowledge, there is no systematic and efficient way to derive the moments \nof the permutation distribution. 
Recently, Zhou [3] proposed a solution by converting the \npermutation of data to that of the statistic coefficients that are symmetric to the permutation. \nSince the test statistic coefficients usually have simple representations, it is easier to track the \npermutation of the test statistic coefficients than that of data. However, this method requires \nthe derivation of the permutation for each specific test statistic, which is not accessible to \npractical users. \nIn this paper, we propose a novel strategy by employing a general theoretical method to \nderive the moments of the permutation distribution of any weighted v-statistic, for both \nunivariate and multivariate data. We note that any moment of the permutation distribution \nfor weighted v-statistics [9] can be considered as a summation of the product of data \nfunction term and index function term over a high dimensional index set and all possible \npermutations. Our key idea is to divide the whole index set into several permutation \nequivalent (see Definition 2) index subsets such that the summation of the data/index \nfunction term over all permutations is invariant within each subset and can be calculated \nwithout conducting any permutation. Then we can obtain the moments by summing up \nseveral subtotals. The proposed method can be extended to other test statistics \nby replacing them with equivalent weighted v-statistics obtained through monotonic transforms. This is due to the fact that only the \norder of test statistics of all permutations matters for obtaining the p-values, so that the \nmonotonically transformed weighted v-statistic shares the same p-value with the original test statistic. Given \nthe first four moments, the permutation distribution can be well fitted by Pearson \ndistribution series. The p-values are then obtained without conducting any real permutation. 
 \nFor multiple comparison of two-group difference, given the sample size n1 = 21 and n2 = 21, \nthe number of tests m = 2,000, we need to conduct m\u00d7(n1+ n2)!/n1!/n2! \u2248 1.1\u00d710^15 \npermutations for the exact permutation test. Even for 20,000 random permutations per test, \nwe still need m\u00d720,000 \u2248 4\u00d710^7 permutations. Alternatively, our moments-based permutation \nmethod using Pearson distribution approximation only involves the calculation of the first \nfour analytically-derived moments of exact permutation distributions to achieve high \naccuracy (see section 3). Instead of calculating test statistics in factorial scale with exact \npermutation, our moments-based permutation only requires computation of polynomial \norder. For example, the computational cost for univariate mean difference test statistic and \nmodified multivariate Hotelling's T^2 test statistics [8] are O(n) and O(n^3), respectively, where \nn = n1+ n2. \n \n2 Methodology \nIn this section, we shall mainly discuss how to calculate the moments of the permutation \ndistribution for weighted v-statistics. For other test statistics, a possible solution is to \nreplace them with their equivalent weighted v-statistics by monotonic transforms. The \ndetailed discussion about equivalent test statistics can be found in [7, 8, 10]. \n \n2.1 Computational challenge \nLet us first look at a toy example. Suppose we have a two-group univariate data \n$x = (x_1, \\ldots, x_{n_1}, x_{n_1+1}, \\ldots, x_{n_1+n_2})$, where the first n1 elements are in group A and the rest, n2, are \nin group B. For comparison of the two groups, the hypothesis is typically constructed as: \n$H_0: \\mu_A = \\mu_B$ vs. $H_a: \\mu_A \\neq \\mu_B$, where $\\mu_A$, $\\mu_B$ are the population means of the groups A \nand B, respectively. Define $\\bar{x}_A = \\sum_{i=1}^{n_1} x_i / n_1$ and $\\bar{x}_B = \\sum_{i=n_1+1}^{n} x_i / n_2$ as the sample means of two \ngroups, where n=n1+n2. We choose the univariate group mean difference as the test \nstatistic, i.e., $T(x) = \\bar{x}_A - \\bar{x}_B = \\sum_{i=1}^{n} w(i) x_i$, where the index function \n$w(i) = 1/n_1$ if $i \\in \\{1, \\ldots, n_1\\}$ and $w(i) = -1/n_2$ if $i \\in \\{n_1+1, \\ldots, n\\}$. Then the total number of all possible \npermutations of $\\{1, \\ldots, n\\}$ is n!. To calculate the fourth moment of the permutation \ndistribution, \n\n$$E_\\pi(T^4(x)) = \\frac{1}{n!} \\sum_{\\pi \\in S_n} \\Big( \\sum_{i=1}^{n} w(i) x_{\\pi(i)} \\Big)^4 = \\frac{1}{n!} \\sum_{\\pi \\in S_n} \\sum_{i_1=1}^{n} \\sum_{i_2=1}^{n} \\sum_{i_3=1}^{n} \\sum_{i_4=1}^{n} w(i_1) w(i_2) w(i_3) w(i_4) x_{\\pi(i_1)} x_{\\pi(i_2)} x_{\\pi(i_3)} x_{\\pi(i_4)},$$ \n\nwhere $\\pi$ is the permutation operator and the symmetric group Sn [11] includes all distinct \npermutations. The above example shows that the moment calculation can be considered as a \nsummation over all possible permutations and a large index set. It is noticeable that the \ncomputational challenge here is to go through the factorial level permutations and \npolynomial level indices. \n \n2.
2 Partition the index set \nIn this paper, we assume that the test statistic T can be expressed as a weighted v-statistic of \ndegree d [9], that is, \n\n$$T(x) = \\sum_{i_1=1}^{n} \\cdots \\sum_{i_d=1}^{n} w(i_1, \\ldots, i_d) h(x_{i_1}, \\ldots, x_{i_d}),$$ \n\nwhere $x = (x_1, x_2, \\ldots, x_n)^T$ is a data set \nwith n observations, and w is a symmetric index function. h is a symmetric data function, \ni.e., invariant under permutation of $(i_1, \\ldots, i_d)$. Though the symmetry property is not required \nfor our method, it helps reduce the computational cost. Here, each observation $x_k$ can be \neither univariate or multivariate. In the above toy example, d=1 and h is the identity \nfunction. Therefore, the r-th moment of the test statistic from the permuted data is: \n\n$$E_\\pi(T^r(x)) = E_\\pi\\Big( \\Big( \\sum_{i_1, i_2, \\ldots, i_d} w(i_1, \\ldots, i_d) h(x_{\\pi(i_1)}, \\ldots, x_{\\pi(i_d)}) \\Big)^r \\Big) = E_\\pi\\Big[ \\sum_{i_1^{(1)}, \\ldots, i_d^{(1)}, \\ldots, i_1^{(r)}, \\ldots, i_d^{(r)}} \\Big\\{ \\prod_{k=1}^{r} w(i_1^{(k)}, \\ldots, i_d^{(k)}) \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\Big\\} \\Big].$$ \n\nThen we can exchange the summation order of permutations and that of indices, \n\n$$E_\\pi(T^r(x)) = \\sum_{i_1^{(1)}, \\ldots, i_d^{(1)}, \\ldots, i_1^{(r)}, \\ldots, i_d^{(r)}} \\Big\\{ \\Big( \\prod_{k=1}^{r} w(i_1^{(k)}, \\ldots, i_d^{(k)}) \\Big) E_\\pi\\Big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\Big) \\Big\\}.$$ \n\nThus any moment of permutation distribution can be considered as a summation of the \nproduct of data function term and 
index function term over a high dimensional index set and \nall possible permutations. \nSince all possible permutations map any index value between 1 and n to all possible index \nvalues from 1 to n with equal probability, $E_\\pi\\big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\big)$, the summation of \ndata function over all permutations, is only related to the equal/unequal relationship among \nindices. It is natural to divide the whole index set \n$U = \\{i_1, \\ldots, i_d\\}^r = \\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}$ \ninto the union of disjoint index subsets, in which \n$E_\\pi\\big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\big)$ is invariant. \nDefinition 1. Since h is a symmetric function, two index elements $(i_1, \\ldots, i_d)$ and $(j_1, \\ldots, j_d)$ \nare said to be equivalent if they are the same up to the order. For example, for d = 3, (1, 4, 5) \n= (1,5,4) = (4,1,5) = (4,5,1) = (5,1,4) = (5,4,1). \nDefinition 2. Two indices $\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}$ and $\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\}$ are \nsaid to be permutation equivalent/$\\equiv$ if there exists a permutation $\\pi \\in S_n$ such that \n$\\{(\\pi(i_1^{(1)}), \\ldots, \\pi(i_d^{(1)})), \\ldots, (\\pi(i_1^{(r)}), \\ldots, \\pi(i_d^{(r)}))\\} = \\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\}$. Here \"=\" means they \nhave same index elements by Definition 1. For example, for d = 2, n = 4, r = 2, {(1, 2), (2, \n3)} $\\equiv$ {(2, 4), (1, 4)} since we can apply $\\pi$: $1 \\to 1$, $2 \\to 4$, $3 \\to 2$, $4 \\to 3$, such that {($\\pi$(1), $\\pi$(2)), \n($\\pi$(2), $\\pi$(3))} = {(1, 4), (4, 2)}= {(2, 4), (1, 4)}. As a result, the whole index set for d = 2, r = \n2, can be divided into seven permutation equivalent subsets, [{(1, 1), (1, 1)}], [{(1, 1), (1, \n2)}], [{(1, 1), (2, 2)}], [{(1, 2), (1, 2)}], [{(1, 1), (2, 3)}], [{(1, 2), (1, 3)}], [{(1, 2), (3, 4)}], \nwhere [ ] denotes the equivalence class. Note that the number of the permutation equivalent \nsubsets is only related to the order of the weighted v-statistic d and the order of moment r, \nbut not related to the data size n, and it is small for the first several moments calculation \n(small r) with low order test statistics (small d). \nUsing the permutation equivalent relationship defined in Definition 2, the whole index set U \ncan be partitioned into several permutation equivalent index subsets. Then we can calculate \nthe r-th moment by summing up subtotals of all index subsets. This procedure can be done \nwithout any real permutations based on Proposition 1 and Proposition 2 below. \n\nProposition 1. 
We claim that the data function sum $E_\\pi\\big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\big)$ is invariant \nwithin each equivalent index subset, and \n\n$$E_\\pi\\Big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\Big) = \\frac{\\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in [\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]} \\prod_{k=1}^{r} h(x_{j_1^{(k)}}, \\ldots, x_{j_d^{(k)}})}{\\mathrm{card}([\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}])},$$ \n\nwhere $\\mathrm{card}([\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}])$ is the number of indices falling into the \npermutation equivalent index subset $[\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]$. \nProof sketch: \nSince all indices in the same permutation equivalent subset are equivalent with respect to the \nsymmetric group Sn, \n\n$$E_\\pi\\Big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\Big) = \\frac{1}{n!} \\sum_{\\pi \\in S_n} \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) = \\frac{1}{n!} \\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in [\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]} \\frac{n!}{\\mathrm{card}([\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}])} \\prod_{k=1}^{r} h(x_{j_1^{(k)}}, \\ldots, x_{j_d^{(k)}}) = \\frac{\\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in [\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]} \\prod_{k=1}^{r} h(x_{j_1^{(k)}}, \\ldots, x_{j_d^{(k)}})}{\\mathrm{card}([\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}])}.$$ \n\nProposition 2. Thus we can obtain the r-th moment by summing up the product of the \nindex partition sum $w_\\lambda$ and the data partition sum $h_\\lambda$ over all permutation equivalent \nsubsets, i.e., $E_\\pi(T^r(x)) = \\sum_{\\lambda \\in [U]} w_\\lambda h_\\lambda$, where $\\lambda = [\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]$ is any permutation \nequivalent subset of the whole index set U. [U] denotes the set of all distinct permutation \nequivalent classes of U. 
The data partition sum is \n\n$$h_\\lambda = \\frac{\\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in \\lambda} \\prod_{k=1}^{r} h(x_{j_1^{(k)}}, \\ldots, x_{j_d^{(k)}})}{\\mathrm{card}(\\lambda)},$$ \n\nand the index partition sum is \n\n$$w_\\lambda = \\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in \\lambda} \\prod_{k=1}^{r} w(j_1^{(k)}, \\ldots, j_d^{(k)}).$$ \n\nProof sketch: \nWith Proposition 1, $E_\\pi\\big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\big)$ is invariant within each equivalent index \nsubset, therefore, \n\n$$\\begin{aligned} E_\\pi(T^r(x)) &= \\sum_{i_1^{(1)}, \\ldots, i_d^{(1)}, \\ldots, i_1^{(r)}, \\ldots, i_d^{(r)}} \\Big\\{ \\Big( \\prod_{k=1}^{r} w(i_1^{(k)}, \\ldots, i_d^{(k)}) \\Big) E_\\pi\\Big( \\prod_{k=1}^{r} h(x_{\\pi(i_1^{(k)})}, \\ldots, x_{\\pi(i_d^{(k)})}) \\Big) \\Big\\} \\\\\\\\ &= \\sum_{\\lambda \\in [U]} \\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in \\lambda} \\Big\\{ \\Big( \\prod_{k=1}^{r} w(j_1^{(k)}, \\ldots, j_d^{(k)}) \\Big) E_\\pi\\Big( \\prod_{k=1}^{r} h(x_{\\pi(j_1^{(k)})}, \\ldots, x_{\\pi(j_d^{(k)})}) \\Big) \\Big\\} \\\\\\\\ &= \\sum_{\\lambda \\in [U]} \\sum_{\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\} \\in \\lambda} \\Big\\{ \\prod_{k=1}^{r} w(j_1^{(k)}, \\ldots, j_d^{(k)}) \\, h_\\lambda \\Big\\} = \\sum_{\\lambda \\in [U]} w_\\lambda h_\\lambda. \\end{aligned}$$ \n\nSince both the data partition sum $h_\\lambda$ and the index partition sum $w_\\lambda$ can be calculated by \nsummation over all distinct indices within each permutation equivalent index subset, no \nreal permutation is needed for computing the moments. \n \n2.3 Recursive calculation \nDirect calculation of the data partition sum and index partition sum leads to traversing \nthroughout the whole index set. So the computational cost is O(n^{dr}). In the following, we \nshall discuss how to reduce the cost by a recursive calculation algorithm. \nDefinition 3. Let $\\lambda = [\\{(i_1^{(1)}, \\ldots, i_d^{(1)}), \\ldots, (i_1^{(r)}, \\ldots, i_d^{(r)})\\}]$ and $\\nu = [\\{(j_1^{(1)}, \\ldots, j_d^{(1)}), \\ldots, (j_1^{(r)}, \\ldots, j_d^{(r)})\\}]$. \n$\\lambda$ and $\\nu$ are two different permutation equivalent subsets of the whole index set U. We say \nthat the partition order of $\\nu$ is less than that of $\\lambda$, i.e., $\\nu \\prec \\lambda$, if $\\lambda$ can be converted to $\\nu$ \nby merging two or more index elements. For instance, $\\nu = [(1,1),(2,3)] \\prec \\lambda = [(1,2),(3,4)]$, \nsince by merging 1 and 2, $\\lambda$ is converted to [{(1, 1), (3, 4)}] = [{(1, 1), (2, 3)}]. [{(1, 1), (3, \n4)}] and [{(1, 1), (2, 3)}] are the same permutation equivalent index subsets because we can \napply the permutation $\\pi$: $1 \\to 1$, $2 \\to 4$, $3 \\to 3$, $4 \\to 2$ to [{(1, 1), (3, 4)}]. Note that the merging \noperation may not be unique, for example, $\\lambda$ can also be converted to $\\nu$ by merging 3 and \n4. To clarify the concept of partition order, we list the order of all partitions when d=2 and \nr=2 in figure 1. The partition order of a permutation equivalent subset $\\nu$ is said to be lower \nthan that of another permutation equivalent subset $\\lambda$ if there is a directed path from $\\lambda$ to $\\nu$. \n\n[Figure 1 diagram with nodes [{(1, 1), (1, 1)}], [{(1, 1), (2, 2)}], [{(1, 1), (1, 2)}], [{(1, 2), (1, 2)}], [{(1, 1), (2, 3)}], [{(1, 2), (1, 3)}], [{(1, 2), (3, 4)}]] \n\nFigure 1: Order of all permutation equivalent subsets when d = 2 and r = 2. \n\nThe difficulty for computing data partition sum and index partition sum comes from two \nconstraints: equal constraint and unequal constraint. For example, in the permutation \nequivalent subset [{(1, 1), (2, 2)}], the equal constraint is that the first and the second index \nnumber are equal and the third and fourth index are also equal. On the other hand, the \nunequal constraint requires that the first two index numbers are different from those of the \nlast two. Due to the difficulties mentioned, we solve this problem by first relaxing the \nunequal constraint and then applying the principle of inclusion and exclusion. Thus, the \ncalculation of a partition sum can be separated into two parts: the relaxed partition sum \nwithout unequal constraint, and lower order partition sums. 
For example, \n\n$$w_{\\lambda=[(1,1),(2,2)]} = \\sum_{i \\neq j} w(i,i) w(j,j) = \\sum_{i,j} w(i,i) w(j,j) - \\sum_{i=j} w(i,i) w(j,j) = w^*_{\\lambda=[(1,1),(2,2)]} - w_{\\lambda=[(1,1),(1,1)]} = \\Big( \\sum_{i} w(i,i) \\Big)^2 - \\sum_{i} w(i,i)^2,$$ \n\nas the relaxed index partition \nsum $w^*_{\\lambda=[(1,1),(2,2)]} = \\sum_{i,j} w(i,i) w(j,j) = \\big( \\sum_{i} w(i,i) \\big)^2$. \n\nProposition 3. The index partition sum $w_\\lambda$ can be calculated by subtracting all lower order \npartition sums from the corresponding relaxed index partition sum $w^*_\\lambda$, i.e., \n\n$$w_\\lambda = w^*_\\lambda - \\sum_{\\nu \\prec \\lambda} w_\\nu \\frac{\\#(\\lambda) \\, \\#(\\lambda \\to \\nu)}{\\#(\\nu)},$$ \n\nwhere $\\#(\\lambda)$ is the number of distinct order-sensitive \npermutation equivalent subsets. For example, there are 2!2!2!/2!/2!=2 order-sensitive index \npartition types for $\\lambda = [(1,1),(2,3)]$. They are [(1, 1), (2, 3)] and [(2, 3), (1, 1)]. Note that [(1, 1), \n(2, 3)] and [(1, 1), (3, 2)] are the same type. $\\#(\\lambda \\to \\nu)$ is the number of different ways of \nmerging a higher order permutation equivalent subset $\\lambda$ to a lower order permutation equivalent \nsubset $\\nu$. \nThe calculation of the data partition sum is similar. Therefore, the computational cost \nmainly depends on the calculation of relaxed partition sum and the lowest order partition sum. \nSince the computational cost of the lowest order term is O(n), we mainly discuss the calculation \nof relaxed partition sums in the following paragraphs. \nTo reduce the computational cost, we develop a greedy graph search algorithm. For \ndemonstration, we use the following example. 
 \n\n$$w^*_{\\lambda=[(1,1),(1,2),(1,2),(1,3),(2,3),(1,4)]} = \\sum_{i,j,k,l} w(i,i) w(i,j) w(i,j) w(i,k) w(j,k) w(i,l).$$ \n\nThe permutation \nequivalent index subset is represented by an undirected graph. Every node denotes an index \nnumber. We connect two different nodes if these two corresponding index numbers are in the \nsame index element, i.e., in the same small bracket. In figure 2, the number 2 on the edge ij \ndenotes that the pair (i, j) is used twice. The self-connected node is also allowed. We assume there \nis no isolated subgraph in the following discussion. If any isolated subgraph exists, we only need \nto repeat the same procedure for all isolated subgraphs. \n\nNow we shall discuss the steps to compute $w^*_{\\lambda=[(1,1),(1,2),(1,2),(1,3),(2,3),(1,4)]}$. Firstly, we get rid of \nthe weights of edges and self-connections, i.e., \n\n$$\\sum_{i,j,k,l} w(i,i) w(i,j) w(i,j) w(i,k) w(j,k) w(i,l) = \\sum_{i,j,k,l} a(i,j) w(i,k) w(j,k) w(i,l),$$ \n\nas $a(i,j) = w(i,i) w(i,j) w(i,j)$. Then we search a node with the \nlowest degree and do summation for all indices connected with respect to the chosen node, i.e., \n\n$$\\sum_{i,j,k,l} a(i,j) w(i,k) w(j,k) w(i,l) = \\sum_{i,j,k} b(i,j) w(i,k) w(j,k),$$ \n\nas $b(i,j) = a(i,j) \\sum_{l} w(i,l)$. The chosen \nnodes and connected edges are deleted after the above computation. We repeat the same step until a \nsymmetric graph occurs. Since every node in the symmetric graph has the same degree, we randomly choose \nany node; for example, k for summation, then \n\n$$\\sum_{i,j,k} b(i,j) w(i,k) w(j,k) = \\sum_{i,j} b(i,j) c(i,j),$$ \n\nas $c(i,j) = \\sum_{k} w(i,k) w(j,k)$. Finally, we clear the whole graph and obtain the relaxed index partition sum. \n\n[Figure 2 diagram: successive graphs on nodes i, j, k, l, with the edge ij labeled 2] \n\nFigure 2: Greedy Search Algorithm for computing $w^*_\\lambda$. \n\nThe most computationally expensive case is the complete graph in which every pair of nodes is \nconnected. Hence, the computational cost of $w^*_\\lambda$ is determined by the subtotal that has the largest \nsymmetric subgraph in its graph representation. For example, the most expensive relaxed index \npartition sum for d=2 and r=3 is $\\sum_{i,j,k} w(i,j) w(i,k) w(j,k)$, which is a triangle in the graph \nrepresentation. \nProposition 4. For d>=2, let $m(m-1)/2 \\leq r(d-1)d/2 < (m+1)m/2$, where r is the order of \nmoment and m is an integer. For a d-th order test statistic, the computational cost of the partition \nsum for the r-th moment is bounded by O(n^m). When d = 1 the computational complexity of the \npartition sum is O(n). \nSpecifically, the computational cost of the 3rd and 4th moments for a second order test statistic is \nO(n^3). The computational cost for the 1st and 2nd moments is O(n^2). \n\n2.4 Fitting \nThe Pearson distribution series (Pearson I ~ VII) is a family of probability distributions that are \nmore general than the normal distribution [12]. It covers all distributions in the $(\\beta_1, \\beta_2)$ plane \nincluding normal, beta, gamma, log-normal, etc., where the distribution shape parameters $\\beta_1$, $\\beta_2$ \nare the square of the standardized skewness and the kurtosis measurements, respectively. Given the first \nfour moments, the Pearson distribution series can be utilized to approximate the permutation \ndistribution of the test statistic without conducting real permutation. \n \n3 Experimental results \nTo evaluate the accuracy and efficiency of our moments-based permutation tests, we generate \nsimulated data and conduct permutation tests for both linear and quadratic test statistics. We \nconsider six simulated cases in the first experiment for testing the difference between two groups, \nA and B. We use mean difference statistics here. For group A, n1 observations are generated \nindependently from Normal(0,1) in Cases 1-2, from Gamma(3,3) in Cases 3-4, and from Beta(0.8, \n0.8) in Cases 5-6. For group B, n2 independent observations are generated from Normal(1, 0.5) in \nCases 1-2, from Gamma (3,2) in Cases 3-4, and from Beta(0.1, 0.1) in Cases 5-6. The design is \nbalanced in Cases 1, 3, and 5 with n1 = n2 = 10, and unbalanced in Cases 2, 4, and 6 with n1 = 6, n2 \n= 18. \nTable 1 illustrates the high accuracy of our moments-based permutation technique. Furthermore, \ncompared with exact permutation or 10,000 random permutations, the moments-based \npermutation tests reduce more than 99.8% of the computation cost, and this efficiency gain \nincreases with sample size. Table 1 shows the computation time and p-values of three permutation \nmethods from one simulation. In order to demonstrate the robustness of our method, we repeated \nthe simulation 10 times in each case, and calculated the mean and variance of the absolute \nbiases of p-values of both moments-based permutation and random permutation, treating the \np-values of exact permutation as gold standard. 
In most cases, our moments-based permutation is \nless biased and more stable than random permutation (Table 2), which demonstrates the \nrobustness and accuracy of our method. \nTable 1: Comparison of computation costs and p-values of three permutation methods: Moments-\nbased permutation (MP), random permutation (RP), and exact permutation (EP). The t_MP, t_RP, \nand t_EP denote the computation time (in seconds), and p_MP, p_RP, and p_EP are the p-values \nof the three permutation methods. \n\n        Case 1   Case 2   Case 3   Case 4   Case 5   Case 6 \nt_MP    6.79e-4  5.37e-4  5.54e-4  5.16e-4  5.79e-4  6.53e-4 \nt_RP    5.07e-1  5.15e-1  5.06e-1  1.30e-1  2.78e-1  5.99e-1 \nt_EP    3.99e-0  1.21e-0  3.71e-0  1.21e-0  3.71e-0  1.22e-0 \np_MP    1.19e-1  2.45e-2  1.34e-1  1.19e-1  3.58e-2  5.07e-5 \np_RP    1.21e-1  2.56e-2  1.36e-1  1.20e-1  3.53e-2  5.09e-2 \np_EP    1.19e-1  2.39e-2  1.34e-1  1.15e-1  3.55e-2  5.11e-2 \n\nWe consider three simulated cases in the second experiment for testing the difference among three \ngroups D, E, and F. We use modified F statistics [7] here. For group D, n1 observations are \ngenerated independently from Normal(0,1) in Case 7, from Gamma(3,2) in Case 8, and from \nBeta(0.8, 0.8) in Case 9. For group E, n2 independent observations are generated from Normal(0,1) \nin Case 7, from Gamma(3,2) in Case 8, and from Beta(0.8, 0.8) in Case 9. For group F, n3 \nindependent observations are generated from Normal(0.1,1) in Case 7, from Gamma(3,1) in Case \n8, and from Beta(0.1, 0.1) in Case 9. The design is unbalanced with n1 = 6, n2 = 8, and n3 = 12. \nSince the exact permutation is too expensive here, we consider the p-values of 200,000 random \npermutations (EP) as gold standard. Our methods are more than one hundred times faster than \n2,000 random permutations (RP) and also more accurate and robust (Table 3). \nWe applied the method to the MRI hippocampi belonging to 2 groups, with 21 subjects in \ngroup A and 15 in group B. 
The surface shapes of different objects are represented by the \nsame number of location vectors (with each location vector consisting of the spatial x, y, and \nz coordinates of the corresponding vertex) for our subsequent statistical shape analysis. \nThere is no shape difference at a location if the corresponding location vector has an equal \nmean between the two groups. Evaluation of the hypothesis test using our moments-based \npermutation with the modified Hotelling's T2 test statistic [8] is shown in Fig. 3(a) and 3(b). \nIt can be seen that the Pearson distribution approximation leads to a negligible discrepancy \nfrom the raw p-value map obtained by real permutation. The false positive error control results are \nshown in Fig. 3(c). \nTable 2: Robustness and accuracy comparison of moments-based permutation and random \npermutation across 10 simulations, considering the p-values of exact permutation as the gold standard. \nMean_ABias_MP and VAR_MP are the mean of the absolute biases and the variance of the biases \nof the p-values of moments-based permutation; Mean_ABias_RP and VAR_RP are the mean of the \nabsolute biases and the variance of the biases of the p-values of random permutation. The mean difference \nstatistic is used. \n\n               Case 1   Case 2   Case 3   Case 4   Case 5   Case 6 \nMean_ABias_MP  1.62e-4  3.04e-4  6.36e-4  8.41e-4  1.30e-3  3.50e-3 \nMean_ABias_RP  7.54e-4  3.39e-4  9.59e-4  8.39e-4  1.30e-3  2.00e-3 \nVAR_MP         6.42e-8  2.74e-7  1.54e-6  1.90e-6  3.76e-6  2.77e-5 \nVAR_RP         7.85e-7  1.86e-7  1.69e-6  3.03e-6  4.24e-5  1.88e-5 \n\nTable 3: Computation cost, robustness, and accuracy comparison of moments-based permutation \nand random permutation across 10 simulations. The modified F statistic is used. 
\n\n               Case 7   Case 8   Case 9 \nt_MP           1.03e-3  1.42e-3  1.64e-3 \nt_RP           1.51e-1  1.48e-1  1.38e-1 \nt_EP           1.76e+1  1.86e+1  2.37e+1 \nMean_ABias_MP  9.23e-4  2.37e-4  2.11e-3 \nMean_ABias_RP  3.94e-3  2.79e-3  3.42e-3 \nVAR_MP         1.10e-6  8.74e-8  1.23e-5 \nVAR_RP         2.27e-5  1.48e-5  1.85e-5 \n\nFigure 3. (a) and (b): Comparison of techniques in raw p-value measurement at α = 0.05 \n(without correction), through real permutation ((a); number of permutations = \n10,000) and using the present moments-based permutation (b). (c) p-map after BH's FDR \ncorrection of (b). (e) Facial differences between Asian males and white males. Locations in red \non the 3D surface denote significant face shape differences (significance level α = 0.01 with \nfalse discovery rate control). \nWe also applied our method to the 3D face comparison between Asian males and white \nmales. We chose 10 Asian males and 10 white males from the USF face database and \ncalculated their differences with the modified Hotelling's T2 test statistic. Each face surface \nis represented by 4,000 voxels. All surfaces are well aligned. Results from our algorithm in \nFig. 3(e) show that significant differences occur at the eye edges, nose, lip corners, and cheeks. \nThey are consistent with anthropological findings and suggest the discriminant surface regions \nfor ethnic group recognition. \n \n4 Conclusion \nWe present novel moments-based permutation tests in which the permutation \ndistributions are accurately approximated by Pearson distributions. Compared with regular \nrandom permutation, the proposed method \nconsiderably reduces computation cost without loss of accuracy. General and analytical \nformulations for the moments of the permutation distribution are derived for weighted v-test statistics. \nThe proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson \ndistribution approximation to achieve both accuracy/flexibility and efficiency. \n\nReferences \n \n[1] Nichols, T. E., and A. P. Holmes (2001), Nonparametric permutation tests for functional neuroimaging: A primer with examples, Human Brain Mapping, 15, 1-25. \n[2] Zhou, C., D. C. Park, M. Styner, and Y. M. Wang (2007), ROI constrained statistical surface morphometry, IEEE International Symposium on Biomedical Imaging, Washington, D. C., 1212-1215. \n[3] Zhou, C., and Y. M. Wang (2008), Hybrid permutation test with application to surface shape analysis, Statistica Sinica, 18, 1553-1568. \n[4] Pantazis, D., R. M. Leahy, T. E. Nichols, and M. Styner (2004), Statistical surface-based morphometry using a non-parametric approach, IEEE International Symposium on Biomedical Imaging, 2, 1283-1286. \n[5] Zhou, C., Y. Hu, Y. Fu, H. Wang, Y. M. Wang, and T. S. Huang (2008), 3D face analysis for distinct features using statistical randomization, IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, 981-984. \n[6] Hubert, L. (1987), Assignment Methods in Combinatorial Data Analysis, Marcel Dekker, New York. \n[7] Mielke, P. W., and K. J. Berry (2001), Permutation Methods: A Distance Function Approach, Springer, New York. \n[8] Good, P. (2005), Permutation, Parametric and Bootstrap Tests of Hypotheses, 3rd ed., Springer, New York. \n[9] Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, Wiley, New York. \n[10] Edgington, E., and P. Onghena (2007), Randomization Tests, 4th ed., Chapman & Hall, London. \n[11] Nicholson, W. K. (2006), Introduction to Abstract Algebra, 3rd ed., Wiley, New York. 
\n[12] Hahn, G. J., and S. S. Shapiro (1967), Statistical Models in Engineering, John Wiley and Sons, Chichester, England. \n", "award": [], "sourceid": 699, "authors": [{"given_name": "Chunxiao", "family_name": "Zhou", "institution": null}, {"given_name": "Huixia", "family_name": "Wang", "institution": null}, {"given_name": "Yongmei", "family_name": "Wang", "institution": null}]}