{"title": "Large scale networks fingerprinting and visualization using the k-core decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 41, "page_last": 50, "abstract": null, "full_text": "Large scale networks fingerprinting and visualization using the k-core decomposition\n\nJ. Ignacio Alvarez-Hamelin\u2217\nLPT (UMR du CNRS 8627),\nUniversit\u00e9 de Paris-Sud,\n91405 ORSAY Cedex France\nIgnacio.Alvarez-Hamelin@lri.fr\n\nLuca Dall\u2019Asta\nLPT (UMR du CNRS 8627),\nUniversit\u00e9 de Paris-Sud,\n91405 ORSAY Cedex France\nLuca.Dallasta@th.u-psud.fr\n\nAlain Barrat\nLPT (UMR du CNRS 8627),\nUniversit\u00e9 de Paris-Sud,\n91405 ORSAY Cedex France\nAlain.Barrat@th.u-psud.fr\n\nAlessandro Vespignani\nSchool of Informatics,\nIndiana University,\nBloomington, IN 47408, USA\nalexv@indiana.edu\n\nAbstract\n\nWe use the k-core decomposition to develop algorithms for the analysis of large scale complex networks. This decomposition, based on a recursive pruning of the least connected vertices, makes it possible to disentangle the hierarchical structure of networks by progressively focusing on their central cores. Using this strategy, we develop a general visualization algorithm that can be used to compare the structural properties of various networks and to highlight their hierarchical structure. The low computational complexity of the algorithm, O(n + e), where n is the number of vertices of the network and e is the number of edges, makes it suitable for the visualization of very large sparse networks. We show how the proposed visualization tool makes it possible to find specific structural fingerprints of networks.\n\n1 Introduction\n\nIn recent times, the possibility of accessing, handling and mining large-scale network datasets has revived the interest in their investigation and theoretical characterization, along with the definition of new modeling frameworks. 
In particular, mapping projects of the World Wide Web and the physical Internet offered the first chance to study the topology and traffic of large-scale networks. Other studies followed, describing population networks of practical interest in social science, critical infrastructures and epidemiology [1, 2, 3]. The study of large scale networks, however, confronts us with an array of new challenges. The definitions of centrality, hierarchies and structural organization are hindered by the large size of these networks and by the complex interplay of connectivity patterns, traffic flows and the geographical, social and economic attributes characterizing their basic elements. In this context, a large research effort is devoted to providing effective visualization and analysis tools able to cope with graphs whose size may easily reach millions of vertices.\n\n\u2217Further author information: J.I.A-H. is also with Facultad de Ingenier\u00eda, Universidad de Buenos Aires, Paseo Col\u00f3n 850, C 1063 ACV Buenos Aires, Argentina.\n\nIn this paper, we propose a visualization algorithm based on the k-core decomposition, able to uncover in a two-dimensional layout several topological and hierarchical properties of large scale networks. The k-core decomposition [4] consists in identifying particular subsets of the graph, called k-cores, each one obtained by recursively removing all the vertices of degree smaller than k, until the degree of all remaining vertices is larger than or equal to k. Larger values of the index k thus select vertices with larger degree (at least k) and a more central position in the network\u2019s structure.\n\nThis visualization tool allows the identification of fingerprints of real or computer-generated networks, according to properties such as hierarchical arrangement, degree correlations and centrality. 
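The recursive pruning that defines a k-core can be made concrete with a short Python sketch. It assumes the graph is stored as a dictionary mapping each vertex to the set of its neighbours; the function name k_core and this input format are illustrative choices, not the authors' implementation.

```python
from collections import deque


def k_core(adj, k):
    # adj: dict mapping each vertex to the set of its neighbours
    # (undirected graph, no self-loops). Returns the vertex set of the k-core.
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # Every vertex whose degree is already below k is removed first.
    removed = {v for v, d in degree.items() if d < k}
    queue = deque(removed)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in removed:
                continue
            # u loses the edge to v; if its degree drops below k it is
            # pruned as well (this is the recursive part of the definition).
            degree[u] -= 1
            if degree[u] < k:
                removed.add(u)
                queue.append(u)
    return set(adj) - removed


# A triangle {1, 2, 3} with a short tail 1 - 4 - 5.
graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5}, 5: {4}}
print(sorted(k_core(graph, 2)))  # [1, 2, 3]
```

Removing vertex 5 lowers the degree of vertex 4 below 2, so the whole tail disappears from the 2-core while the triangle survives.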
The distinction between networks with seemingly similar properties is achieved by inspecting the different layouts generated by the visualization algorithm. In addition, the running time of the algorithm grows only linearly with the number of vertices and edges of the network, granting the scalability needed for the visualization of very large sparse networks. The proposed (publicly available [5]) algorithm therefore appears to be a convenient method for the general analysis of large scale complex networks and the study of their architecture.\n\nThe paper is organized as follows: after a brief survey of k-core studies (section 2), we present the basic definitions and the graphical algorithms in section 3, along with the basic features of the visualization layout. Section 4 shows how the visualizations obtained with the present algorithm may be used for network fingerprinting, and presents two examples of visualization of real networks.\n\n2 Related work\n\nWhile a large number of algorithms aimed at the visualization of large scale networks have been developed (e.g., see [6]), only a few explicitly consider the k-core decomposition. Vladimir Batagelj et al. [7] studied the k-core decomposition applied to visualization problems, introducing some graphical tools to analyse the cores, mainly based on the visualization of the adjacency matrix of certain k-cores. To the best of our knowledge, the algorithm presented by Baur et al. in [8] is the only one completely based on a k-core analysis and directly targeted at the study of large information networks. This algorithm uses a spectral layout to place the vertices having the largest shell index. A combination of barycentric and iterative force-directed placement is then used to place the vertices of each k-shell, in decreasing order. Finally, the network is drawn in three dimensions, using the z axis to place each shell in a distinct horizontal layer. 
Note that the spectral layout is not able to distinguish two or more disconnected components. The algorithm by Baur et al. is also tuned for representing AS graphs, and its total complexity depends on the size of the highest k-core (see [9] for more details on spectral layout), making the computation time of this proposal largely variable. In this respect, the algorithm presented here differs in that it can represent networks in which k-cores are composed of several connected components. Another difference is that 2D representations are better suited to information visualization than other representations (see [10] and references therein). Finally, the algorithm parameters can be universally defined, yielding a fast and general tool for analyzing all types of networks.\n\nIt is interesting to note that the notion of k-cores has recently been used in biologically related contexts, where it was applied to the analysis of protein interaction networks [11] or to the prediction of protein functions [12, 13]. Further applications in Internet-related areas can be found in [14], where the k-core decomposition is used for filtering out peripheral Autonomous Systems (ASes), and in [15], where the scale invariant structure of degree correlations and mapping biases in AS maps is shown. 
Finally, in [16, 17], an interesting approach based on the k-core decomposition has been used to provide a conceptual and structural model of the Internet: the so-called medusa model for the Internet.\n\n3 Graphical representation\n\nLet us consider a graph G = (V, E) of |V| = n vertices and |E| = e edges; a k-core is defined as follows [4]:\n- A subgraph H = (C, E|C) induced by the set C \u2286 V is a k-core, or a core of order k, iff \u2200v \u2208 C : degree_H(v) \u2265 k, and H is the maximum subgraph with this property.\nA k-core of G can therefore be obtained by recursively removing all the vertices of degree less than k, until all vertices in the remaining graph have degree at least k. Furthermore, we will use the following definitions:\n- A vertex i has shell index c if it belongs to the c-core but not to the (c + 1)-core. We denote by c_i the shell index of vertex i.\n- A shell C_c is composed of all the vertices whose shell index is c. The maximum value c such that C_c is not empty is denoted c_max. The k-core is thus the union of all shells C_c with c \u2265 k.\n- Each connected set of vertices having the same shell index c is a cluster Q^c. Each shell C_c is thus composed of clusters Q^c_m, such that C_c = \u222a_{1\u2264m\u2264q^c_max} Q^c_m, where q^c_max is the number of clusters in C_c.\n\nThe visualization algorithm we propose places vertices in two dimensions, the position of each vertex depending on its shell index and on the shell indices of its neighbors. A color code allows for the identification of shell indices, while each vertex\u2019s original degree is represented by its size, which depends logarithmically on the degree. For the sake of clarity, our algorithm draws only a small percentage of the edges, chosen uniformly at random. As mentioned, a central role in our visualization method is played by the multi-component representation of k-cores. In the most general situation, indeed, the recursive removal of vertices having degree less than a given k can break the original network into various connected components, each of which might be broken again by the subsequent decomposition. Our method takes this possibility into account; however, we will first present the algorithm in the simplified case in which none of the k-cores is fragmented. Then, this algorithm will be used as a subroutine for treating the general case (Table 1).\n\n3.1 Drawing algorithm for k-cores with single connected component\n\nk-core decomposition. The shell index of each vertex is computed and stored in a vector C, along with the shells C_c and the maximum index c_max. Each shell is then decomposed into clusters Q^c_m of connected vertices, and each vertex i is labeled by its shell index c_i and by a number q_i representing the cluster it belongs to.\nThe two dimensional graphical layout. The visualization is obtained by assigning to each vertex i a pair of polar coordinates (\u03c1_i, \u03b1_i): the radius \u03c1_i is a function of the shell index of the vertex i and of its neighbors; the angle \u03b1_i depends on the cluster number q_i. In this way, k-shells are displayed as layers with the form of circular shells, the innermost one corresponding to the set of vertices with the highest shell index. A vertex i belongs to the (c_max \u2212 c_i)-th layer from the center.\nMore precisely, \u03c1_i is computed according to the following formula:\n\n$$\\rho_i = (1-\\epsilon)(c_{max} - c_i) + \\frac{\\epsilon}{|V_{c_j \\ge c_i}(i)|} \\sum_{j \\in V_{c_j \\ge c_i}(i)} (c_{max} - c_j), \\qquad (1)$$\n\nwhere V_{c_j\u2265c_i}(i) is the set of neighbors of i having shell index c_j larger than or equal to c_i. The parameter \u03b5 controls the possibility of rings overlapping, and is one of only three external parameters required to tune the image\u2019s rendering.\nInside a given shell, the angle \u03b1_i of a vertex i is computed as follows:\n\nX\n\n1\u2264m 1\nS \u2190 compute components k\n(X, Y) \u2190 compute origin coordinates cmp k (Eqs. 3 and 4)\nU \u2190 compute unit size cmp k (Eq. 5)\nk := k + 1\n\nset \u03c1_i and \u03b1_i according to a uniform distribution in the disk of radius u (u is the core representation unit size)\nelse\nset \u03c1_i and \u03b1_i according to Eqs. 1 and 2\n(X, Y) \u2190 compute final coordinates \u03c1 \u03b1 U X Y (Eq. 6)\n\nTable 1: Algorithm for the representation of networks using the k-core decomposition\n\nFinally, the unit length u_h of a component h is computed as\n\n$$u_h = \\frac{|S_h|}{\\sum_{1 \\le j \\le h_{max}} |S_j|} \\cdot u_p, \\qquad (5)$$\n\nwhere u_p is the unit length of its parent component. Larger unit lengths and sizes are therefore attributed to larger components.\n\nFor each vertex i, radial and angular coordinates are computed by Eqs. 1 and 2 as in the previous algorithm. These coordinates are then considered as relative to the center (X_h, Y_h) of the component to which i belongs. The position of i is thus given by\n\n$$x_i = X_h + \\gamma \\cdot u_h \\cdot \\rho_i \\cdot \\cos(\\alpha_i); \\quad y_i = Y_h + \\gamma \\cdot u_h \\cdot \\rho_i \\cdot \\sin(\\alpha_i), \\qquad (6)$$\n\nwhere \u03b3 is a parameter controlling the component\u2019s diameter.\n\nThe global algorithm is formally presented in Table 1. The main loop is composed of the following functions. First, the function {(end, C) \u2190 make core k} recursively removes from the (k \u2212 1)-core all vertices of degree smaller than k, obtaining the k-core, and stores into C the shell index k \u2212 1 of the removed vertices. The boolean variable end is set to true if the k-core is empty; otherwise it is set to false. 
The function {(Q, T) \u2190 compute clusters k \u2212 1} decomposes the (k \u2212 1)-shell into clusters, storing the cluster label of each vertex in the vector Q and filling the table T, which is indexed by the shell index c and the cluster label q: T(c, q) =\n1\u2264m
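The cluster decomposition performed by compute clusters amounts to finding the connected components of the subgraph induced by each shell. Below is a minimal Python sketch, assuming the shell indices have already been computed and are passed in as a dictionary; compute_clusters is a hypothetical name, and the sketch only records each cluster's size rather than reproducing the table T, whose full definition is not given here.

```python
from collections import deque


def compute_clusters(adj, shell):
    # adj: dict vertex -> set of neighbours; shell: dict vertex -> shell index c_i.
    # A cluster is a connected set of vertices sharing the same shell index, so we
    # run a breadth-first search that only follows edges inside one shell.
    # Returns (label, size): label[v] is the cluster number q_v (restarting at 1
    # in every shell) and size[(c, q)] is the number of vertices in cluster q of shell c.
    label = {}
    size = {}
    next_label = {}  # next unused cluster number for each shell
    for start in adj:
        if start in label:
            continue
        c = shell[start]
        q = next_label.get(c, 1)
        next_label[c] = q + 1
        label[start] = q
        count = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            count += 1
            for u in adj[v]:
                if shell[u] == c and u not in label:
                    label[u] = q
                    queue.append(u)
        size[(c, q)] = count
    return label, size


# Triangle {1, 2, 3} (shell 2) with two separate leaves 4 and 5 (shell 1).
graph = {1: {2, 3, 4}, 2: {1, 3, 5}, 3: {1, 2}, 4: {1}, 5: {2}}
shells = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
labels, sizes = compute_clusters(graph, shells)
print(sorted(labels.items()))  # [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2)]
```

The two leaves share shell 1 but are not adjacent, so shell 1 splits into two clusters of one vertex each, while the triangle is a single cluster of shell 2.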
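The per-vertex geometry of Eqs. 1, 5 and 6 can likewise be sketched in Python. The function names, the adjacency-dictionary input and the values chosen for the parameters ε and γ are illustrative assumptions; the angle α_i is passed in as an input rather than computed from Eq. 2, and components are indexed from 0 rather than from 1.

```python
import math


def radius(i, adj, shell, c_max, eps):
    # Eq. (1): a weighted mix of the vertex's own layer (c_max - c_i) and the
    # mean layer of its neighbours j with c_j >= c_i.
    c_i = shell[i]
    higher = [j for j in adj[i] if shell[j] >= c_i]
    rho = (1 - eps) * (c_max - c_i)
    # higher is empty only for isolated vertices (c_i = 0); every other vertex
    # has at least c_i neighbours inside the c_i-core.
    if higher:
        rho += eps / len(higher) * sum(c_max - shell[j] for j in higher)
    return rho


def unit_length(component_sizes, h, u_parent):
    # Eq. (5): component h gets a share of its parent's unit length proportional
    # to its number of vertices |S_h|.
    return component_sizes[h] / sum(component_sizes) * u_parent


def position(rho, alpha, center, u_h, gamma):
    # Eq. (6): polar coordinates (rho, alpha) taken relative to the component
    # centre (X_h, Y_h) and scaled by gamma * u_h.
    x_h, y_h = center
    return (x_h + gamma * u_h * rho * math.cos(alpha),
            y_h + gamma * u_h * rho * math.sin(alpha))


graph = {1: {2, 3, 4}, 2: {1, 3, 5}, 3: {1, 2}, 4: {1}, 5: {2}}
shells = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
rho_4 = radius(4, graph, shells, c_max=2, eps=0.2)
u = unit_length([6, 3, 1], 0, u_parent=1.0)
print(rho_4, u, position(rho_4, 0.0, (0.0, 0.0), u, gamma=1.0))
```

With ε = 0.2, vertex 4 (shell 1) gets ρ = 0.8 instead of the 1.0 it would get if all its neighbours were also in shell 1: its only neighbour lies in the innermost shell, which pulls it slightly toward the center.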