{"title": "Intrinsic Dimension Estimation Using Packing Numbers", "book": "Advances in Neural Information Processing Systems", "page_first": 697, "page_last": 704, "abstract": null, "full_text": "Intrinsic Dimension Estimation Using Packing\n\nNumbers\n\nDepartment of Computer Science and Operations Research\n\nBal\u00b4azs K\u00b4egl\n\nCP 6128 succ. Centre-Ville, Montr\u00b4eal, Canada H3C 3J7\n\nUniversity of Montreal\n\nkegl@iro.umontreal.ca\n\nAbstract\n\nWe propose a new algorithm to estimate the intrinsic dimension of data\nsets. The method is based on geometric properties of the data and re-\nquires neither parametric assumptions on the data generating model nor\ninput parameters to set. The method is compared to a similar, widely-\nused algorithm from the same family of geometric techniques. Experi-\nments show that our method is more robust in terms of the data generating\ndistribution and more reliable in the presence of noise.\n\n1\n\nIntroduction\n\nHigh-dimensional data sets have several unfortunate properties that make them hard to an-\nalyze. The phenomenon that the computational and statistical ef\ufb01ciency of statistical tech-\nniques degrade rapidly with the dimension is often referred to as the \u201ccurse of dimension-\nality\u201d. One particular characteristic of high-dimensional spaces is that as the volumes of\nconstant diameter neighborhoods become large, exponentially many points are needed for\nreliable density estimation. Another important problem is that as the data dimension grows,\nsophisticated data structures constructed to speed up nearest neighbor searches rapidly be-\ncome inef\ufb01cient.\n\nFortunately, most meaningful, real life data do not uniformly \ufb01ll the spaces in which\nthey are represented. Rather, the data distributions are observed to concentrate to non-\nlinear manifolds of low intrinsic dimension. 
Several methods have been developed to find low-dimensional representations of high-dimensional data, including Principal Component Analysis (PCA), Self-Organizing Maps (SOM) [1], Multidimensional Scaling (MDS) [2], and, more recently, Local Linear Embedding (LLE) [3] and the ISOMAP algorithm [4]. Although most of these algorithms require that the intrinsic dimension of the manifold be explicitly set, little effort has been devoted to designing and analyzing techniques that estimate the intrinsic dimension of data in this context.

There are two principal areas where a good estimate of the intrinsic dimension can be useful. First, as mentioned before, the estimate can be used to set input parameters of dimension reduction algorithms. Certain methods (e.g., LLE and the ISOMAP algorithm) also require a scale parameter that determines the size of the local neighborhoods used in the algorithms. In this case, it is useful if the dimension estimate is provided as a function of the scale (see Figure 1 for an intuitive example where the intrinsic dimension of the data depends on the resolution). Nearest neighbor searching algorithms can also profit from a good dimension estimate. The complexity of search data structures (e.g., kd-trees and R-trees) increases exponentially with the dimension, and these methods become inefficient if the dimension is more than about 20. Nevertheless, it was shown by Chávez et al. [5] that the complexity increases with the intrinsic dimension of the data rather than with the dimension of the embedding space.

Figure 1: Intrinsic dimension D at different resolutions. (a) At very small scale the data looks zero-dimensional (D ≈ 0). (b) If the scale is comparable to the noise level, the intrinsic dimension seems larger than expected (D ≈ 2). (c) The “right” scale in terms of noise and curvature (D ≈ 1). 
(d) At very large scale the global dimension dominates (D ≈ 2).

In this paper we present a novel method for intrinsic dimension estimation. The estimate is based on geometric properties of the data, and requires no parameters to set. Experimental results on both artificial and real data show that the algorithm is able to capture the scale dependence of the intrinsic dimension. The main advantage of the method over existing techniques is its robustness in terms of the generating distribution. The paper is organized as follows. In Section 2 we introduce the field of intrinsic dimension estimation, and give a short overview of existing approaches. The proposed algorithm is described in Section 3. Experimental results are given in Section 4.

2 Intrinsic dimension estimation

Informally, the intrinsic dimension of a random vector X is usually defined as the number of “independent” parameters needed to represent X. Although in practice this informal notion seems to have a well-defined meaning, formally it is ambiguous due to the existence of space-filling curves. So, instead of this informal notion, we turn to the classical concept of topological dimension, and define the intrinsic dimension of X as the topological dimension of the support of the distribution of X. For the definition, we need to introduce some notions. Given a topological space X, a covering of a subset S is a collection C of open subsets in X whose union contains S. A refinement of a covering C of S is another covering C′ such that each set in C′ is contained in some set in C. 
The following definition is based on the observation that a d-dimensional set can be covered by open balls such that each point belongs to at most (d + 1) open balls.

Definition 1 A subset S of a topological space X has topological dimension Dtop (also known as Lebesgue covering dimension) if every covering C of S has a refinement C′ in which every point of S belongs to at most (Dtop + 1) sets in C′, and Dtop is the smallest such integer.

The main technical difficulty with the topological dimension is that it is computationally difficult to estimate on a finite sample. Hence, practical methods use various other definitions of the intrinsic dimension. It is common to categorize intrinsic dimension estimating methods into two classes, projection techniques and geometric approaches.

Projection techniques explicitly construct a mapping, and usually measure the dimension by using some variant of principal component analysis. Indeed, given a set Sn = {X1, …, Xn}, Xi ∈ X, i = 1, …, n, of data points drawn independently from the distribution of X, probably the most obvious way to estimate the intrinsic dimension is by looking at the eigenstructure of the covariance matrix C of Sn. In this approach, D̂pca is defined as the number of eigenvalues of C that are larger than a given threshold. The first disadvantage of the technique is the requirement of a threshold parameter that determines which eigenvalues are to be discarded. In addition, if the manifold is highly nonlinear, D̂pca will characterize the global (intrinsic) dimension of the data rather than the local dimension of the manifold. D̂pca will always overestimate Dtop; the difference depends on the level of nonlinearity of the manifold. Finally, D̂pca can only be used if the covariance matrix of Sn can be calculated (e.g., when X = R^d). 
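Such a PCA-based estimate can be sketched in a few lines; the eigenvalue threshold (here an illustrative fraction of the largest eigenvalue, not a value prescribed by any of the cited methods) is exactly the user-set parameter criticized above:

```python
import numpy as np

def pca_dimension(X, threshold=0.05):
    # D_pca: number of covariance eigenvalues exceeding `threshold`
    # times the largest eigenvalue.  The 5% threshold is an
    # illustrative, hypothetical choice.
    C = np.cov(X, rowvar=False)      # covariance matrix of the sample
    eigvals = np.linalg.eigvalsh(C)  # eigenvalues in ascending order
    return int(np.sum(eigvals > threshold * eigvals[-1]))

# A (nearly) planar data set embedded in R^3: D_pca should be 2.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])       # fixed rank-2 embedding
X = rng.uniform(-1, 1, size=(1000, 2)) @ A
X += 0.001 * rng.normal(size=X.shape)  # tiny off-plane noise
print(pca_dimension(X))                # expect 2 for this sample
```

Note that the answer depends entirely on the threshold: lowering it far enough makes the noise direction count, which is the first disadvantage discussed in the text.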
Although in Section 4 we will only consider Euclidean data sets, there are certain applications where only a distance metric d : X × X → R+ ∪ {0} and the matrix of pairwise distances D = [dij] = [d(xi, xj)] are given.

Bruske and Sommer [6] present an approach to circumvent the second problem. Instead of doing PCA on the original data, they first cluster the data, then construct an optimally topology preserving map (OPTM) on the cluster centers, and finally, carry out PCA locally on the OPTM nodes. The advantages of the method are that it works well on nonlinear data, and that it can produce dimension estimates at different resolutions. At the same time, the threshold parameter must still be set as in PCA; moreover, other parameters, such as the number of OPTM nodes, must also be decided by the user. The technique is similar in spirit to the way the dimension parameter of LLE is set in [3]. The algorithm runs in O(n²d) time (where n is the number of points and d is the embedding dimension), which is slightly worse than the O(nd·D̂pca) complexity of the fast PCA algorithm of Roweis [7] when computing D̂pca.

Another general scheme in the family of projection techniques is to turn the dimensionality reduction algorithm from an embedding technique into a probabilistic, generative model [8], and optimize the dimension as any other parameter by using cross-validation in a maximum likelihood setting. The main disadvantage of this approach is that the dimension estimate depends on the generative model and the particular algorithm, so if the model does not fit the data or if the algorithm does not work well on the particular problem, the estimate can be invalid.

The second basic approach to intrinsic dimension estimation is based on geometric properties of the data rather than on projection techniques. 
Methods from this family usually require neither any explicit assumption on the underlying data model, nor input parameters to set. Most of the geometric methods use the correlation dimension from the family of fractal dimensions due to the computational simplicity of its estimation. The formal definition is based on the observation that in a D-dimensional set the number of pairs of points closer to each other than r is proportional to r^D.

Definition 2 Given a finite set Sn = {x1, …, xn} of a metric space X, let

    Cn(r) = (2 / (n(n−1))) ∑_{i=1}^{n} ∑_{j=i+1}^{n} I{‖xi − xj‖ < r}

where I_A is the indicator function of the event A. For a countable set S = {x1, x2, …} ⊂ X, the correlation integral is defined as C(r) = lim_{n→∞} Cn(r). If the limit exists, the correlation dimension of S is defined as

    Dcorr = lim_{r→0} log C(r) / log r.

For a finite sample, the zero limit cannot be achieved, so the estimation procedure usually consists of plotting log C(r) versus log r and measuring the slope ∂ log C(r) / ∂ log r of the linear part of the curve [9, 10, 11]. To formalize this intuitive procedure, we present the following definition.

Definition 3 The scale-dependent correlation dimension of a finite set Sn = {x1, …, xn} is

    D̂corr(r1, r2) = (log C(r2) − log C(r1)) / (log r2 − log r1).

It is known that Dcorr ≤ Dtop and that Dcorr approximates Dtop well if the data distribution on the manifold is nearly uniform. However, using a non-uniform distribution on the same manifold, the correlation dimension can severely underestimate the topological dimension. To overcome this problem, we turn to the capacity dimension, which is another member of the fractal dimension family. For the formal definition, we need to introduce some more concepts. 
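Definitions 2 and 3 translate directly into code. A minimal sketch for Euclidean data (the radii r1, r2 and the brute-force O(n²) distance computation are illustrative choices, not the optimized estimators of [9, 10, 11]):

```python
import numpy as np

def correlation_integral(X, r):
    # C_n(r): fraction of point pairs closer than r (Definition 2)
    n = len(X)
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(n, k=1)          # each pair counted once
    return 2.0 * np.sum(dist[iu] < r) / (n * (n - 1))

def correlation_dimension(X, r1, r2):
    # Scale-dependent correlation dimension (Definition 3)
    c1 = correlation_integral(X, r1)
    c2 = correlation_integral(X, r2)
    return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))

# Points uniform on a unit square embedded in R^3: estimate near 2
rng = np.random.default_rng(1)
X = np.zeros((1000, 3))
X[:, :2] = rng.uniform(0.0, 1.0, size=(1000, 2))
print(correlation_dimension(X, 0.05, 0.2))   # close to 2
```

On a uniform sample the slope is close to the manifold dimension; making the square's distribution uneven is exactly what drives this estimate down, as the text argues next.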
Given a metric space X with distance metric d(·,·), the r-covering number N(r) of a set S ⊂ X is the minimum number of open balls B(x0, r) = {x ∈ X | d(x0, x) < r} whose union is a covering of S. The following definition is based on the observation that the covering number N(r) of a D-dimensional set is proportional to r^{−D}.

Definition 4 The capacity dimension of a subset S of a metric space X is

    Dcap = − lim_{r→0} log N(r) / log r.

The principal advantage of Dcap over Dcorr is that Dcap does not depend on the data distribution on the manifold. Moreover, if both Dcap and Dtop exist (which is certainly the case in machine learning applications), it is known that the two dimensions agree. In spite of that, Dcap is usually discarded in practical approaches due to the high computational cost of its estimation. The main contribution of this paper is an efficient intrinsic dimension estimating method that is based on the capacity dimension. Experiments on both synthetic and real data confirm that our method is much more robust in terms of the data distribution than methods based on the correlation dimension.

3 Algorithm

Finding the covering number even of a finite set of data points is computationally difficult. To tackle this problem, we first redefine Dcap by using packing numbers rather than covering numbers. Given a metric space X with distance metric d(·,·), a set V ⊂ X is said to be r-separated if d(x, y) ≥ r for all distinct x, y ∈ V. The r-packing number M(r) of a set S ⊂ X is defined as the maximum cardinality of an r-separated subset of S. 
The following proposition follows from the basic inequality between packing and covering numbers, N(r) ≤ M(r) ≤ N(r/2).

Proposition 1 Dcap = − lim_{r→0} log M(r) / log r.

For a finite sample, the zero limit cannot be achieved so, similarly to the correlation dimension, we need to redefine the capacity dimension in a scale-dependent manner.

Definition 5 The scale-dependent capacity dimension of a finite set Sn = {x1, …, xn} is

    D̂cap(r1, r2) = − (log M(r2) − log M(r1)) / (log r2 − log r1).

Finding M(r) for a data set Sn = {x1, …, xn} is equivalent to finding the cardinality of a maximum independent vertex set MI(Gr) of the graph Gr(V, E) with vertex set V = Sn and edge set E = {(xi, xj) | d(xi, xj) < r}. This problem is known to be NP-hard. There are results that show that for a general graph, even the approximation of MI(G) within a factor of n^{1−ε}, for any ε > 0, is NP-hard [12]. On the positive side, it was shown that for geometric graphs such as Gr, MI(G) can be approximated arbitrarily well by polynomial time algorithms [13]. However, approximating algorithms of this kind scale exponentially with the data dimension both in terms of the quality of the approximation and the running time¹, so they are of little practical use for d > 2. Hence, instead of using one of these algorithms, we apply the following greedy approximation technique. Given a data set Sn, we start with an empty set of centers C, and in an iteration over Sn we add to C data points that are at a distance of at least r from all the centers in C (lines 4 to 10 in Figure 2). 
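The greedy scan just described can be sketched as follows (a minimal version for Euclidean data; the toy points are illustrative):

```python
import numpy as np

def greedy_packing_number(points, r):
    # Greedy estimate of M(r): scan the points once, keeping a point
    # as a new center iff it lies at distance >= r from every center
    # kept so far.  The kept centers are r-separated by construction.
    centers = []
    for x in points:
        if all(np.linalg.norm(x - c) >= r for c in centers):
            centers.append(x)
    return len(centers)

# Five points on a line with spacing 1: with r = 1 every point is
# kept; with r = 2.5 only points 0 and 3 survive the scan.
pts = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
print(greedy_packing_number(pts, 1.0))   # 5
print(greedy_packing_number(pts, 2.5))   # 2
```

The result depends on the scan order, which is the source of variance addressed below by averaging over random permutations.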
The estimate M̂(r) is the cardinality of C after every point in Sn has been visited.

The procedure is designed to produce an r-packing but certainly underestimates the packing number of the manifold, first, because we are using a finite sample, and second, because in general M̂(r) < M(r). Nevertheless, we can still obtain a good estimate for D̂cap by using M̂(r) in the place of M(r) in Definition 5. To see why, observe that, for a good estimate of D̂cap, it is enough if we can estimate M(r) with a constant multiplicative bias independent of r. Although we have no formal proof that the bias of M̂(r) does not change with r, the simple greedy procedure described above seems to work well in practice.

Even though the bias of M̂(r) does not affect the estimation of D̂cap as long as it does not change with r, the variance of M̂(r) can distort the dimension estimate. The main source of the variance is the dependence of M̂(r) on the order in which the data points are visited. To eliminate this variance, we repeat the procedure several times on random permutations of the data, and compute the estimate D̂pack by using the average of the logarithms of the packing numbers. The number of repetitions depends on r1, r2, and a preset parameter that determines the accuracy of the final estimate (set to 99% in all experiments). The complete algorithm is given formally in Figure 2.

The running time of the algorithm is O(nM(r)d) where r = min(r1, r2). At smaller scales, where M(r) is comparable with n, it is O(n²d). 
On the other hand, since the variance of the estimate also tends to be smaller at smaller scales, the algorithm iterates less for the same accuracy.

PACKINGDIMENSION(Sn, r1, r2, ε)
 1  for ℓ ← 1 to ∞ do
 2      Permute Sn randomly
 3      for k ← 1 to 2 do
 4          C ← ∅
 5          for i ← 1 to n do
 6              for j ← 1 to |C| do
 7                  if d(Sn[i], C[j]) < rk then
 8                      j ← n + 1
 9              if j < n + 1 then
10                  C ← C ∪ {Sn[i]}
11          L̂k[ℓ] ← log |C|
12      D̂pack ← − (μ(L̂2) − μ(L̂1)) / (log r2 − log r1)
13      if ℓ > 10 and 1.65·√(σ²(L̂1) + σ²(L̂2)) / (√ℓ·(log r2 − log r1)) < D̂pack·(1 − ε)/2 then
14          return D̂pack

Figure 2: The algorithm returns the packing dimension estimate D̂pack(r1, r2) of a data set Sn with ε accuracy nine times out of ten.

¹Typically, the computation of an independent vertex set of G of size at least (1 − 1/k)^d MI(G) requires O(n^{k^d}) time.

4 Experiments

The two main objectives of the four experiments described here are to demonstrate the ability of the method to capture the scale-dependent behavior of the intrinsic dimension, and to underline its robustness in terms of the data generating distribution. In all experiments, the estimate D̂pack is compared to the correlation dimension estimate D̂corr. Both dimensions are measured on consecutive pairs of a sequence r1, …, rm of resolutions, and the estimate is plotted halfway between the two parameters (i.e., D̂(ri, ri+1) is plotted at (ri + ri+1)/2).

In the first three experiments the manifold is either known or can be approximated easily. In these experiments we use a two-sided multivariate power distribution with density

    p(x) = I{x ∈ [−1, 1]^d} (p/2)^d ∏_{i=1}^{d} (1 − |x(i)|)^{p−1}    (1)

with different exponents p to generate uniform (p = 1) and non-uniform data sets on the manifold. 
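The PACKINGDIMENSION procedure can be sketched in Python for Euclidean data. The 1.65 constant and the stopping rule follow Figure 2; the iteration cap is an added safety device for the illustration, not part of the paper:

```python
import numpy as np

def packing_dimension(Sn, r1, r2, eps=0.01, max_iters=100):
    # Scale-dependent packing-dimension estimate: average the log of
    # greedy packing numbers over random permutations until a 90%
    # confidence criterion on the estimate is met (after Figure 2).
    # `max_iters` is an illustrative cap, not part of the paper.
    rng = np.random.default_rng(0)
    logs = {r1: [], r2: []}
    d_pack = None
    for it in range(1, max_iters + 1):
        perm = rng.permutation(len(Sn))
        for r in (r1, r2):
            centers = []
            for x in Sn[perm]:
                # greedy r-packing: keep x iff >= r from all centers
                if all(np.dot(x - c, x - c) >= r * r for c in centers):
                    centers.append(x)
            logs[r].append(np.log(len(centers)))
        d_pack = -(np.mean(logs[r2]) - np.mean(logs[r1])) \
                 / (np.log(r2) - np.log(r1))
        spread = 1.65 * np.sqrt(np.var(logs[r1]) + np.var(logs[r2]))
        if it > 10 and spread / (np.sqrt(it) * (np.log(r2) - np.log(r1))) \
                < d_pack * (1 - eps) / 2:
            break
    return d_pack

# 1000 points on a unit square embedded in R^3: estimate near 2
# (somewhat below 2, reflecting the finite-sample bias noted below)
rng = np.random.default_rng(2)
S = np.zeros((1000, 3))
S[:, :2] = rng.uniform(0.0, 1.0, size=(1000, 2))
print(packing_dimension(S, 0.05, 0.15))
```

Note how the constant multiplicative bias of the greedy packing numbers cancels in the difference of logarithms, which is exactly the argument made in the text.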
The first synthetic data set is that of Figure 1. We generated 5000 points on a spiral-shaped manifold with a small uniform perpendicular noise. The curves in Figure 3(a) reflect the scale-dependency observed in Figure 1. As the distribution becomes uneven, D̂corr severely underestimates Dtop while D̂pack remains stable.

Figure 3: Intrinsic dimension of (a) a spiral-shaped manifold and (b) hypercubes of different dimensions. The curves reflect the scale-dependency observed in Figure 1. The more uneven the distribution, the more D̂corr underestimates Dtop while D̂pack remains relatively stable.

The second set of experiments was designed to test how well the methods estimate the dimension of 5000 data points generated in hypercubes of dimensions two to six (Figure 3(b)). In general, both D̂corr and D̂pack underestimate Dtop. 
The negative bias grows with the dimension, probably due to the fact that data sets of equal cardinality become sparser in a higher dimensional space. 
To compensate for this bias on a general data set, Camastra and Vinciarelli [10] propose to correct the estimate by the bias observed on a uniformly generated data set of the same cardinality. Our experiment shows that, in the case of D̂corr, this calibrating procedure can fail if the distribution is highly non-uniform. On the other hand, the technique seems more reliable for D̂pack due to the relative stability of D̂pack.

We also tested the methods on two sets of image data. Both sets contained 64 × 64 images with 256 gray levels. The images were normalized so that the distance between a black and a white image is 1. The first set is a sequence of 481 snapshots of a hand turning a cup from the CMU database² (Figure 4(a)). The sequence of images sweeps a curve in a 4096-dimensional space so its informal intrinsic dimension is one. Figure 5(a) shows that at a small scale, both methods find a local dimension between 1 and 2. At a slightly higher scale the intrinsic dimension increases, indicating a relatively high curvature of the image sequence curve. To test the distribution dependence of the estimates, we constructed a polygonal curve by connecting consecutive points of the sequence, and resampled 481 points by using the power distribution (1) with p = 2, 3. We also constructed a highly uniform, lattice-like data set by drawing approximately equidistant consecutive points from the polygonal curve. 
Our results in Figure 5(a) confirm again that D̂corr varies extensively with the generating distribution on the manifold while D̂pack remains remarkably stable. 
Figure 4: The real datasets. (a) Sequence of snapshots of a hand turning a cup. (b) Faces database from ISOMAP [4].

The final experiment was conducted on the “faces” database from the ISOMAP paper [4] (Figure 4(b)). The data set contained 698 images of faces generated by using three free parameters: vertical and horizontal orientation, and light direction. 
Figure 5(b) indicates that both estimates are reasonably close to the informal intrinsic dimension.

Figure 5: The intrinsic dimension of image data sets: (a) turning cup, (b) ISOMAP faces.

We found in all experiments that at a very small scale D̂corr tends to be higher than D̂pack, while D̂pack tends to be more stable as the scale grows. Hence, if the data contains very little noise and it is generated uniformly on the manifold, D̂corr seems to be closer to the “real” intrinsic dimension. On the other hand, if the data contains noise (in which case at a very small scale we are estimating the dimension of the noise rather than the dimension of the manifold), or the distribution on the manifold is non-uniform, D̂pack seems more reliable than D̂corr.

5 Conclusion

We have presented a new algorithm to estimate the intrinsic dimension of data sets. The method estimates the packing dimension of the data and requires neither parametric assumptions on the data generating model nor input parameters to set. The method is compared to a widely-used technique based on the correlation dimension. Experiments show that our method is more robust in terms of the data generating distribution and more reliable in the presence of noise.

²http://vasc.ri.cmu.edu/idb/html/motion/hand/index.html

References

[1] T. Kohonen, The Self-Organizing Map, Springer-Verlag, 2nd edition, 1997.
[2] T. F. 
Cox and M. A. Cox, Multidimensional Scaling, Chapman & Hall, 1994.
[3] S. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, pp. 2323–2326, 2000.
[4] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–2323, 2000.
[5] E. Chávez, G. Navarro, R. Baeza-Yates, and J. Marroquín, “Searching in metric spaces,” ACM Computing Surveys, to appear, 2001.
[6] J. Bruske and G. Sommer, “Intrinsic dimensionality estimation with optimally topology preserving maps,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 5, pp. 572–575, 1998.
[7] S. Roweis, “EM algorithms for PCA and SPCA,” in Advances in Neural Information Processing Systems, 1998, vol. 10, pp. 626–632, The MIT Press.
[8] C. M. Bishop, M. Svensén, and C. K. I. Williams, “GTM: The generative topographic mapping,” Neural Computation, vol. 10, no. 1, pp. 215–235, 1998.
[9] P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” Physica, vol. D9, pp. 189–208, 1983.
[10] F. Camastra and A. Vinciarelli, “Estimating intrinsic dimension of data with a fractal-based approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, to appear.
[11] A. Belussi and C. Faloutsos, “Spatial join selectivity estimation using fractal concepts,” ACM Transactions on Information Systems, vol. 16, no. 2, pp. 161–201, 1998.
[12] J. Håstad, “Clique is hard to approximate within n^{1−ε},” in Proceedings of the 37th Annual Symposium on Foundations of Computer Science FOCS'96, 1996, pp. 627–636.
[13] T. Erlebach, K. Jansen, and E. 
Seidel, “Polynomial-time approximation schemes for geometric graphs,” in Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms SODA'01, 2001, pp. 671–679.", "award": [], "sourceid": 2290, "authors": [{"given_name": "Bal\u00e1zs", "family_name": "K\u00e9gl", "institution": null}]}