{"title": "On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2132, "page_last": 2140, "abstract": "In this paper, we argue for representing networks as a bag of {\\it triangular motifs}, particularly for important network problems that current model-based approaches handle poorly due to computational bottlenecks incurred by using edge representations. Such approaches require both 1-edges and 0-edges (missing edges) to be provided as input, and as a consequence, approximate inference algorithms for these models usually require $\\Omega(N^2)$ time per iteration, precluding their application to larger real-world networks. In contrast, triangular modeling requires less computation, while providing equivalent or better inference quality. A triangular motif is a vertex triple containing 2 or 3 edges, and the number of such motifs is $\\Theta(\\sum_{i}D_{i}^{2})$ (where $D_i$ is the degree of vertex $i$), which is much smaller than $N^2$ for low-maximum-degree networks. Using this representation, we develop a novel mixed-membership network model and approximate inference algorithm suitable for large networks with low max-degree. For networks with high maximum degree, the triangular motifs can be naturally subsampled in a {\\it node-centric} fashion, allowing for much faster inference at a small cost in accuracy. Empirically, we demonstrate that our approach, when compared to that of an edge-based model, has faster runtime and improved accuracy for mixed-membership community detection. 
We conclude with a large-scale demonstration on an $N\\approx 280,000$-node network, which is infeasible for network models with $\\Omega(N^2)$ inference cost.", "full_text": "On Triangular versus Edge Representations \u2014\n\nTowards Scalable Modeling of Networks\n\nQirong Ho\n\nSchool of Computer Science\nCarnegie Mellon University\n\nPittsburgh, PA 15213\nqho@cs.cmu.edu\n\nJunming Yin\n\nSchool of Computer Science\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\njunmingy@cs.cmu.edu\n\nEric P. Xing\n\nSchool of Computer Science\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nepxing@cs.cmu.edu\n\nAbstract\n\nIn this paper, we argue for representing networks as a bag of triangular motifs,\nparticularly for important network problems that current model-based approaches\nhandle poorly due to computational bottlenecks incurred by using edge representations.\nSuch approaches require both 1-edges and 0-edges (missing edges) to be\nprovided as input, and as a consequence, approximate inference algorithms for\nthese models usually require $\\Omega(N^2)$ time per iteration, precluding their application\nto larger real-world networks. In contrast, triangular modeling requires less\ncomputation, while providing equivalent or better inference quality. A triangular\nmotif is a vertex triple containing 2 or 3 edges, and the number of such motifs is\n$\\Theta(\\sum_{i}D_{i}^{2})$ (where $D_i$ is the degree of vertex $i$), which is much smaller than $N^2$\nfor low-maximum-degree networks. Using this representation, we develop a novel\nmixed-membership network model and approximate inference algorithm suitable\nfor large networks with low max-degree. For networks with high maximum degree,\nthe triangular motifs can be naturally subsampled in a node-centric fashion,\nallowing for much faster inference at a small cost in accuracy. 
Empirically, we\ndemonstrate that our approach, when compared to that of an edge-based model,\nhas faster runtime and improved accuracy for mixed-membership community detection.\nWe conclude with a large-scale demonstration on an $N\\approx 280,000$-node\nnetwork, which is infeasible for network models with $\\Omega(N^2)$ inference cost.\n\n1 Introduction\nNetwork analysis methods such as MMSB [1], ERGMs [20], spectral clustering [17] and latent\nfeature models [12] require the adjacency matrix A of the network as input, reflecting the natural\nassumption that networks are best represented as a set of edges taking on the values 0 (absent)\nor 1 (present). This assumption is intuitive, reasonable, and often necessary for some tasks, such\nas link prediction, but it comes at a cost (which is not always necessary, as we will discuss later)\nfor other tasks, such as community detection in both the single-membership and admixture\n(mixed-membership) settings. The fundamental difference between link prediction and community detection\nis that the first is concerned with link outcomes on pairs of vertices, for which providing links\nas input is intuitive. However, the second task is about discovering the community memberships of\nindividual vertices, and links are in fact no longer the only sensible representation. By representing\nthe input network as a bag of triangular motifs \u2014 by which we mean vertex triples with 2 or 3\nedges \u2014 one can design novel models for mixed-membership community detection that outperform\nmodels based on the adjacency matrix representation.\nThe main advantage of the bag-of-triangles representation lies in its huge reduction of computational\ncost for certain network analysis problems, with little or no loss of outcome quality. 
In the\ntraditional edge representation, if N is the number of vertices, then the adjacency matrix has size\n$\\Theta(N^2)$ \u2014 thus, any network analysis algorithm that touches every element must have $\\Omega(N^2)$ runtime\ncomplexity. For probabilistic network models, this statement applies to the cost of approximate\ninference. For example, the Mixed Membership Stochastic Blockmodel (MMSB) [1] has $\\Theta(N^2)$\nlatent variables, implying an inference cost of $\\Omega(N^2)$ per iteration. Looking beyond, the popular $p^*$\nor Exponential Random Graph models [20] are normally estimated via MCMC-MLE, which entails\ndrawing network samples (each of size $\\Theta(N^2)$) from some importance distribution. Finally, latent\nfactor models such as [12] only have $\\Theta(N)$ latent variables, but the Markov blanket of each variable\ndepends on $\\Theta(N)$ observed variables, resulting in $\\Omega(N^2)$ computation per sweep over all variables.\nWith an inference cost of $\\Omega(N^2)$, even modestly large networks with only $\\sim 10,000$ vertices are\ninfeasible, to say nothing of modern social networks with millions of vertices or more.\n\nFigure 1: Four types of triangular motifs: (a) full-triangle; (b) 2-triangle; (c) 1-triangle; (d) empty-triangle.\nFor mixed-membership community detection, we only focus on full-triangles and 2-triangles.\n\nOn the other hand, it can be shown that the number of 2- and 3-edge triangular motifs is upper-bounded\nby $\\Theta(\\sum_{i}D_{i}^{2})$, where $D_i$ is the degree of vertex $i$. For networks with low maximum degree,\nthis quantity is $\\ll N^2$, allowing us to construct more parsimonious models with faster inference\nalgorithms. Moreover, for networks with high maximum degree, one can subsample $\\Theta(N\\delta^2)$ of\nthese triangular motifs in a node-centric fashion, where $\\delta$ is a user-chosen parameter. 
Speci\ufb01cally,\nwe assign triangular motifs to nodes in a natural manner, and then subsample motifs only from nodes\nwith too many of them. In contrast, MMSB and latent factor models rely on distributions over 0/1-\nedges (i.e. edge probabilities), and for real-world networks, these distributions cannot be preserved\nwith small (i.e. o(N 2)) sample sizes because the 0-edges asymptotically outnumber the 1-edges.\nAs we will show, a triangular representation does not preserve all information found in an edge repre-\nsentation. Nevertheless, we argue that one should represent complex data objects in a task-dependent\nmanner, especially since computational cost is becoming a bottleneck for real-world problems like\nanalyzing web-scale network data. The idea of transforming the input representation (e.g. from\nnetwork to bag-of-triangles) for better task-speci\ufb01c performance is not new. A classic example is\nthe bag-of-words representation of a document, in which the ordering of words is discarded. This\nrepresentation has proven effective in natural language processing tasks such as topic modeling [2],\neven though it eliminates practically all grammatical information. Another example from computer\nvision is the use of superpixels to represent images [3, 4]. By grouping adjacent pixels into larger\nsuperpixels, one obtains a more compact image representation, in turn leading to faster and more\nmeaningful algorithms. When it comes to networks, triangular motifs (Figure 1) are already of\nsigni\ufb01cant interest in biology [13], social science [19, 9, 10, 16], and data mining [21, 18, 8]. In\nparticular, 2- and 3-edge triangular motifs are central to the notion of transitivity in the social sci-\nences \u2014 if we observe edges A-B and B-C, does A have an edge to C as well? Transitivity is of\nspecial importance, because high transitivity (i.e. we frequently observe the third edge A-C) intu-\nitively leads to stronger clusters with more within-cluster edges. 
In fact, the ratio of 3-edge triangles\nto connected vertex triples (i.e. 2- and 3-edge triangular motifs) is precisely the de\ufb01nition of the\nnetwork clustering coef\ufb01cient [16], which is a popular measure of cluster strength.\nIn the following sections, we begin by characterizing the triangular motifs, following which we de-\nvelop a mixed-membership model and inference algorithm based on these motifs. Our model, which\nwe call MMTM or the Mixed-Membership Triangular Model, performs mixed-membership commu-\nnity detection, assigning each vertex i to a mixture of communities. This allows for better outlier de-\ntection and more informative visualization compared to single-membership modeling. In addition,\nmixed-membership modeling has two key advantages: \ufb01rst, MM models such as MMSB, Latent\nDirichlet Allocation and our MMTM are easily modi\ufb01ed for specialized tasks \u2014 as evidenced by\nthe rich literature on topic models [2, 1, 14, 5]. Second, MM models over disparate data types (text,\nnetwork, etc.) can be combined by fusing their latent spaces, resulting in a multi-view model \u2014 for\nexample, [14, 5] model both text and network data from the same mixed-membership vectors. Thus,\nour MMTM can serve as a basic modeling component for massive real-world networks with copious\nside information. After developing our model and inference algorithm, we present simulated exper-\niments comparing them on a variety of network types to an adjacency-matrix-based model (MMSB)\nand its inference algorithm. These experiments will show that triangular mixed-membership mod-\neling results in both faster inference and more accurate mixed-membership recovery. 
We conclude\nby demonstrating our model/algorithm on a network with $N\\approx 280,000$ nodes and $\\sim 2,300,000$\nedges, which is far too large for $\\Omega(N^2)$ inference algorithms such as variational MMSB [1] and the\nGibbs sampling MMSB inference algorithm we developed for our experiments.\n\n2 Triangular Motif Representation of a Network\nIn this work, we consider undirected networks over N vertices, such as social networks. Most of\nthe ideas presented here also generalize to directed networks, though the analysis is more involved\nsince directed networks can generate more motifs than undirected ones. To prevent confusion, we\nshall use the term \u201c1-edge\u201d to refer to edges that exist between two vertices, and the term\n\u201c0-edge\u201d to refer to missing edges. Now, define a triangular motif Eijk involving vertices i < j < k\nto be the type of subgraph over these 3 vertices. There are 4 basic classes of triangular motifs\n(Figure 1), distinguished by their number of 1-edges: full-triangle \u22063 (three 1-edges), 2-triangle \u22062\n(two 1-edges), 1-triangle \u22061 (one 1-edge), and empty-triangle \u22060 (no 1-edges). The total number of\ntriangles, over all 4 classes, is $\\Theta(N^3)$. However, our goal is not to account for all 4 classes; instead,\nwe will focus on \u22063 and \u22062 while ignoring \u22061 and \u22060. We have three primary motivations for this:\n1. In the network literature, the most commonly studied \u201cnetwork motifs\u201d [13], defined as\npatterns of significantly recurring inter-connections in complex networks, are the three-node\nconnected subgraphs (namely \u22063 and \u22062) [13, 19, 9, 10, 16].\n\n2. Since the full-triangle and 2-triangle classes are regarded as the basic structural elements\nof most networks [19, 13, 9, 10, 16], we naturally expect them to characterize most of the\ncommunity structure in networks (cf. 
network clustering coefficient, as explained in the\nintroduction). In particular, the \u22063 and \u22062 triangular motifs preserve almost all 1-edges\nfrom the original network: every 1-edge appears in some triangular motif \u22062, \u22063, except\nfor isolated 1-edges (i.e. connected components of size 2), which are less interesting from\na large-scale community detection perspective.\n\n3. For real networks, which have far more 0- than 1-edges, focusing only on \u22063 and \u22062\ngreatly reduces the number of triangular motifs, via the following lemma:\n\nLemma 1. The total number of \u22063\u2019s and \u22062\u2019s is upper bounded by\n$\\sum_{i} \\frac{1}{2} D_i (D_i - 1) = \\Theta(\\sum_{i} D_i^2)$, where $D_i$ is the degree of vertex $i$.\n\nProof. Let Ni be the neighbor set of vertex i. For each vertex i, form the set Ti of tuples (i, j, k)\nwhere j < k and j, k \u2208 Ni, which represents the set of all pairs of neighbors of i. Because j and\nk are neighbors of i, for every tuple (i, j, k) \u2208 Ti, Eijk is either a \u22063 or a \u22062. It is easy to see\nthat each \u22062 is accounted for by exactly one Ti, where i is the center vertex of the \u22062, and that\neach \u22063 is accounted for by three sets Ti, Tj and Tk, one for each vertex in the full-triangle. Thus,\n$\\sum_{i} |T_i| = \\sum_{i} \\frac{1}{2} D_i (D_i - 1)$ is an upper bound of the total number of \u22063\u2019s and \u22062\u2019s.\n\nFor networks with low maximum degree D, $\\Theta(\\sum_{i} D_i^2) = \\Theta(N D^2)$ is typically much smaller than\n$\\Theta(N^2)$, allowing triangular models to scale to larger networks than edge-based models. 
As for networks\nwith high maximum degree, we suggest the following node-centric subsampling procedure,\nwhich we call \u03b4-subsampling: for each vertex i with degree Di > \u03b4 for some threshold \u03b4, sample\n$\\frac{1}{2}\\delta(\\delta - 1)$ triangles without replacement and uniformly at random from Ti; intuitively, this is similar\nto capping the network\u2019s maximum degree at Ds = \u03b4. A full-triangle \u22063 associated with vertices\ni, j and k shall appear in the final subsample only if it has been subsampled from at least one of\nTi, Tj and Tk. To obtain the set of all subsampled triangles \u22062 and \u22063, we simply take the union of\nsubsampled triangles from each Ti, discarding those full-triangles duplicated in the subsamples.\nAlthough this node-centric subsampling does not preserve all properties of a network, such as the\ndistribution of node degrees, it approximately preserves the local cluster properties of each vertex,\nthus capturing most of the community structure in networks. Specifically, the \u201clocal\u201d clustering\ncoefficient (LCC) of each vertex i, defined as the ratio of #(\u22063) touching i to #(\u22063, \u22062) touching\ni, is well-preserved. This follows from subsampling the \u22063 and \u22062\u2019s at i uniformly at random,\nthough the LCC has a small upwards bias since each \u22063 may also be sampled by the other two\nvertices j and k. Hence, we expect community detection based on the subsampled triangles to be\nnearly as accurate as with the original set of triangles \u2014 which our experiments will show.\nWe note that other subsampling strategies [11, 22] preserve various network properties, such as\ndegree distribution, diameter, and inter-node random walk times. In our triangular model, the main\nproperty of interest is the distribution over \u22063 and \u22062, analogous to how latent factor models and\nMMSB model distributions over 0- and 1-edges. 
Thus, subsampling strategies that preserve \u22063/\u22062\ndistributions (e.g. our \u03b4-subsampling) would be appropriate for our model. In contrast, 0/1-edge\nsubsampling for MMSB and latent factor models is dif\ufb01cult: most networks have \u0398(N 2) 0-edges\nbut only o(N 2) 1-edges, thus sampling o(N 2) 0/1-edges leads to high variance in their distribution.\n\n3\n\n\f3 Mixed-Membership Triangular Model\nGiven a network, now represented by triangular motifs \u22063 and \u22062, our goal is to perform community\ndetection for each network vertex i, in the same sense as what an MMSB model would enable. Under\nan MMSB, each vertex i is assigned to a mixture over communities, as opposed to traditional single-\nmembership community detection, which assigns each vertex to exactly one community. By taking\na mixed-membership approach, one gains many bene\ufb01ts over single-membership models, such as\noutlier detection, improved visualization, and better interpretability [2, 1].\nFollowing a design principle similar to the one underly-\ning the MMSB models, we now present a new mixed-\nmembership network model built on the more parsimo-\nnious triangular representation. For each triplet of ver-\ntices i, j, k \u2208 {1, . . . , N} , i < j < k, if the subgraph on\ni, j, k is a 2-triangle with i, j, or k at the center, then let\nEijk = 1, 2 or 3 respectively, and if the subgraph is a full-\ntriangle, then let Eijk = 4. Whenever i, j, k corresponds\nto a 1- or an empty-triangle, we do not model Eijk. We\nassume K latent communities, and that each vertex takes\na distribution (i.e. mixed-membership) over them. 
The\nobserved bag-of-triangles {Eijk} is generated according\nto (1) the distribution over community-memberships at\neach vertex, and (2) a tensor of triangle generation proba-\nbilities, containing different triangle probabilities for dif-\nferent combinations of communities.\nMore speci\ufb01cally, each vertex i is associated with a community mixed-membership vector \u03b8i \u2208\n\u2206K\u22121 restricted to the (K \u2212 1)-simplex \u2206K\u22121. This mixed-membership vector \u03b8i is used to gen-\nerate community indicators si,jk \u2208 {1, . . . , K}, each of which represents the community chosen\nby vertex i when it is forming a triangle with vertices j and k. The probability of observing a tri-\nangular motif Eijk depends on the community-triplet si,jk, sj,ik, sk,ij, and a tensor of multinomial\nparameters B. Let x, y, z \u2208 {1, . . . , K} be the values of si,jk, sj,ik, sk,ij, and assume WLOG that\nx < y < z1. Then, Bxyz \u2208 \u22063 represents the probabilities of generating the 4 triangular motifs2\namong vertices i, j and k. In detail, Bxyz,1 is the probability of the 2-triangle whose center vertex\nhas community x, and analogously for Bxyz,2 and community y, and for Bxyz,3 and community z;\nBxyz,4 is the probability of the full-triangle.\nThe MMTM generative model is summarized below; see Figure 2 for a graphical model illustration.\n\nFigure 2: Graphical model representation\nfor MMTM, our mixed-membership model\nover triangular motifs.\n\n\u2022 Triangle tensor Bxyz \u223c Dirichlet (\u03bb) for all x, y, z \u2208 {1, . . . , K}, where x < y < z\n\u2022 Community mixed-membership vectors \u03b8i \u223c Dirichlet (\u03b1) for all i \u2208 {1, . . . 
, N}\n\u2022 For each triplet (i, j, k) where i < j < k,\n\n\u2013 Community indices si,jk \u223c Discrete (\u03b8i), sj,ik \u223c Discrete (\u03b8j), sk,ij \u223c Discrete (\u03b8k).\n\u2013 Generate the triangular motif Eijk based on Bxyz and the ordered values of\nsi,jk, sj,ik, sk,ij; see Table 1 for the exact conditional probabilities. There are 6 entries\nin Table 1, corresponding to the 6 possible orderings of si,jk, sj,ik, sk,ij.\n\n1 The cases x = y = z, x = y < z and x < y = z require special treatment, due to ambiguity caused by\nhaving identical communities. In the interest of keeping our discussion at a high level, we shall refer the reader\nto the appendix for these cases.\n\n2 It is possible to generate a set of triangles that does not correspond to a network, e.g. a 2-triangle centered\non i for (i, j, k) followed by a 3-triangle for (j, k, \u2113), which produces a mismatch on the edge (j, k). This is a\nconsequence of using a bag-of-triangles model, just as the bag-of-words model in Latent Dirichlet Allocation\ncan generate sets of words that do not correspond to grammatical sentences. In practice, this is not an issue for\neither our model or LDA, as both models are used for mixed-membership recovery, rather than data simulation.\n\nOrder | Conditional probability of Eijk \u2208 {1, 2, 3, 4}\nsi,jk < sj,ik < sk,ij | Discrete([Bxyz,1, Bxyz,2, Bxyz,3, Bxyz,4])\nsi,jk < sk,ij < sj,ik | Discrete([Bxyz,1, Bxyz,3, Bxyz,2, Bxyz,4])\nsj,ik < si,jk < sk,ij | Discrete([Bxyz,2, Bxyz,1, Bxyz,3, Bxyz,4])\nsj,ik < sk,ij < si,jk | Discrete([Bxyz,3, Bxyz,1, Bxyz,2, Bxyz,4])\nsk,ij < si,jk < sj,ik | Discrete([Bxyz,2, Bxyz,3, Bxyz,1, Bxyz,4])\nsk,ij < sj,ik < si,jk | Discrete([Bxyz,3, Bxyz,2, Bxyz,1, Bxyz,4])\n\nTable 1: Conditional probabilities of Eijk given si,jk, sj,ik and sk,ij. We define x, y, z to be the ordered (i.e.\nsorted) values of si,jk, sj,ik, sk,ij.\n\n4 Inference\nWe adopt a collapsed, blocked Gibbs sampling approach, where \u03b8 and B have been integrated out.\nThus, only the community indices s need to be sampled. For each triplet (i, j, k) where i < j < k,\n\n$P(s_{i,jk}, s_{j,ik}, s_{k,ij} \\mid s_{-ijk}, E, \\alpha, \\lambda) \\propto P(E_{ijk} \\mid E_{-ijk}, s, \\lambda)\\, P(s_{i,jk} \\mid s_{i,-jk}, \\alpha)\\, P(s_{j,ik} \\mid s_{j,-ik}, \\alpha)\\, P(s_{k,ij} \\mid s_{k,-ij}, \\alpha)$,\n\nwhere s\u2212ijk is the set of all community memberships except for si,jk, sj,ik, sk,ij, and si,\u2212jk is the\nset of all community memberships of vertex i except for si,jk. The last three terms are predictive\ndistributions of a multinomial-Dirichlet model, with the multinomial parameter \u03b8 marginalized out:\n\n$P(s_{i,jk} \\mid s_{i,-jk}, \\alpha) = \\frac{\\#[s_{i,-jk} = s_{i,jk}] + \\alpha}{\\#[s_{i,-jk}] + K\\alpha}$.\n\nThe first term is also a multinomial-Dirichlet predictive distribution (refer to appendix for details).\n
5 Comparing Mixed-Membership Network Models on Synthetic Networks\nFor a mixed-membership network model to be useful, it must recover some meaningful notion of\nmixed community membership for each vertex. The precise definition of network community has\nbeen a subject of much debate, and various notions of community [1, 15, 17, 12, 6] have been\nproposed under different motivations. Our MMTM, too, conveys another notion of community\nbased on membership in full triangles \u22063 and 2-triangles \u22062, which are key aspects of network\nclustering coefficients. 
In our simulations, we shall compare our MMTM against an\nadjacency-matrix-based model (MMSB), in terms of how well they recover mixed-memberships from networks\ngenerated under a range of assumptions. Note that some of these synthetic networks will not match\nthe generative assumptions of either our model or MMSB; this is intentional, as we want to compare\nthe performance of both models under model misspecification.\nWe shall also demonstrate that MMTM leads to faster inference, particularly when \u03b4-subsampling\ntriangles (as described in Section 2). Intuitively, we expect the mixed-membership recovery of our\ninference algorithm to depend on (a) the degree distribution of the network, and (b) the \u201cdegree\nlimit\u201d \u03b4 used in subsampling the network; performance should increase as the number of vertices i\nhaving degree Di \u2264 \u03b4 goes up. In particular, our experiments will demonstrate that subsampling\nyields good performance even when the network contains a few vertices with very large degree Di\n(a characteristic of many real-world networks).\nSynthetic networks We compared our MMTM to MMSB3 [1] on multiple synthetic networks,\nevaluating them according to how well their inference algorithms recovered the vertex\nmixed-membership vectors \u03b8i. Each network was generated from N = 4,000 mixed-membership vectors\n\u03b8i of dimensionality K = 5 (i.e. 5 possible communities), according to one of several models:\n\n1. The Mixed Membership Stochastic Blockmodel [1], an admixture generalization of the\nstochastic blockmodel. The probability of a link from i to j is \u03b8iB\u03b8j for some block matrix\nB, and we convert all directed edges into undirected edges. In our experiments, we use a\nB with on-diagonal elements Baa = 1/80, and off-diagonal elements Bab = 1/800. 
Our\nvalues of B are lower than typically seen in the literature, because they are intended to\nreplicate the 1-edge density of real-world networks with size around N = 4,000.\n2. A simplex Latent position model, where the probability of a link between i, j is\n$\\gamma(1 - \\frac{1}{2}\\|\\theta_i - \\theta_j\\|_1)$ for some scaling parameter \u03b3. In other words, the closer that \u03b8i and \u03b8j are,\nthe higher the link probability. Note that 0 \u2264 ||\u03b8i \u2212 \u03b8j||1 \u2264 2, because \u03b8i and \u03b8j lie in the\nsimplex. We choose \u03b3 = 1/40, again to reproduce the 1-edge density seen in real networks.\n3. A \u201cBiased\u201d scale-free model that combines the preferential attachment model [7] with a\nmixed-membership model. Specifically, we generated M = 60,000 1-edges as follows: (a)\npick a vertex i with probability proportional to its degree; (b) randomly pick a destination\ncommunity k from \u03b8i; (c) find the set Vk of all vertices v such that \u03b8vk is the largest\nelement of \u03b8v (i.e. the vertices that mostly belong to community k); (d) within Vk, pick\nthe destination vertex j with probability proportional to its degree. The resulting network\nexhibits both a block diagonal structure, as well as a power-law degree distribution. In\ncontrast, the other two models have binomial (i.e. Gaussian-like) degree distributions.\n\n3 MMSB is applicable to both directed and undirected networks; our experiments use the latter.\n\nNetwork | #0,1-edges | #1-edges | max(Di) | #\u22063, \u22062 | \u03b4 = 20 | \u03b4 = 15 | \u03b4 = 10 | \u03b4 = 5\nMMSB | 7,998,000 | 55,696 | 51 | 1,541,085 | 749,018 | 418,764 | 179,841 | 39,996\nLatent position | 7,998,000 | 56,077 | 51 | 1,562,710 | 746,979 | 418,448 | 179,757 | 39,988\nBiased scale-free | 7,998,000 | 60,000 | 231 | 3,176,927 | 497,737 | 304,866 | 144,206 | 35,470\nPure membership | 7,998,000 | 55,651 | 44 | 1,533,365 | 746,796 | 418,222 | 179,693 | 39,986\n\nTable 2: Number of edges, maximum degree, and number of 3- and 2-edge triangles \u22063, \u22062 for each N = 4,000\nsynthetic network, as well as #triangles when subsampling at various degree thresholds \u03b4. MMSB\ninference is linear in #0,1-edges, while our MMTM\u2019s inference is linear in #\u22063, \u22062.\n\n
To use these models, we must input mixed-memberships \u03b8i. These were generated as follows:\n\n1. Divide the N = 4,000 vertices into 5 groups of size 800. Assign each group to a (different)\ndominant community k \u2208 {1, . . . , 5}.\n\n2. Within each group:\n\n(a) Pick 160 vertices to have mixed-membership in 3 communities: 0.8 in the dominant\ncommunity k, and 0.1 in each of two other randomly chosen communities.\n\n(b) The remaining 640 vertices have mixed-membership in 2 communities: 0.8 in the\ndominant community k, and 0.2 in one other randomly chosen community.\n\nIn other words, every vertex has a dominant community, and one or two other minor communities.\nUsing these \u03b8i\u2019s, we generated one synthetic network for each of the three models described. In\naddition, we generated a fourth \u201cpure membership\u201d network under the MMSB model, using pure\n\u03b8i\u2019s with full membership in the dominant community. This network represents the special case of\nsingle-community membership. Statistics for all 4 networks can be found in Table 2.\nInference and Evaluation For our MMTM4, we used our collapsed, blocked Gibbs sampler for\ninference. The hyperparameters were fixed at \u03b1, \u03bb = 0.1 and K = 5, and we ran each experiment\nfor 2,000 iterations. For evaluation, we estimated all \u03b8i\u2019s using the last sample, and scored the\nestimates according to $\\sum_{i} \\|\\hat{\\theta}_i - \\theta_i\\|_2$, the sum of \u21132 distances of each estimate $\\hat{\\theta}_i$ from its true value\n\u03b8i. These results were taken under the most favorable permutation for the $\\hat{\\theta}_i$\u2019s, in order to avoid the\npermutation non-identifiability issue. 
We repeated every experiment 5 times.\nTo investigate the effect of \u03b4-subsampling triangles (Section 2), we repeated every MMTM experiment\nunder four different values of \u03b4: 20, 15, 10 and 5. The triangles were subsampled prior to\nrunning the Gibbs sampler, and they remained fixed during inference.\nWith MMSB, we opted not to use the variational inference algorithm of [1], because we wanted our\nexperiments to be, as far as possible, a comparison of models rather than inference techniques. To\naccomplish this, we derived a collapsed, blocked Gibbs sampler for the MMSB model, with added\nBeta hyperparameters \u03bb1, \u03bb2 on each element of the block matrix B. The mixed-membership vectors\n\u03b8i (\u03c0i in the original paper) and block matrix B were integrated out, and we Gibbs sampled each edge\n(i, j)\u2019s associated community indicators zi\u2192j, zi\u2190j in a block fashion. Hence, this MMSB sampler\nuses the exact same techniques as our MMTM sampler, ensuring that we are comparing models\nrather than inference strategies. Furthermore, its per-iteration runtime is still $\\Theta(N^2)$, equal to that of the\noriginal MMSB variational algorithm. All experiments were conducted in exactly the same manner\nas with MMTM, with the MMSB hyperparameters fixed at \u03b1, \u03bb1, \u03bb2 = 0.1 and K = 5.\nResults Figure 3 plots the cumulative \u21132 error for each experiment, as well as the time taken per\ntrial. On all 4 networks, the full MMTM model performs better than MMSB \u2014 even on the\nMMSB-generated network! MMTM also requires less runtime for all but the biased scale-free network,\nwhich has a much larger maximum degree than the others (Table 2). Furthermore, \u03b4-subsampling\nis effective: MMTM with \u03b4 = 20 runs faster than full MMTM, and still outperforms MMSB while\napproaching full MMTM in accuracy. 
The runtime benefit is most noticeable on the biased scale-free\nnetwork, underscoring the need to subsample real-world networks with high maximum degree.\nWe hypothesize MMSB\u2019s poorer performance on networks of this size (N = 4,000) results from\nhaving $\\Theta(N^2)$ latent variables, while noting that the literature has only considered smaller\nN < 1,000 networks [1]. Compared to MMTM, having many latent variables not only increases runtime\nper iteration, but also the number of iterations required for convergence, since the latent variable state\nspace grows exponentially with the number of latent variables. In support of this, we have observed\nthat the MMSB sampler\u2019s complete log-likelihood fluctuates greatly across all 2,000 iterations; in\ncontrast, the MMTM sampler plateaus within 500 iterations, and remains stable.\n\n4 As explained in Section 2, we first need to preprocess the network adjacency list into the \u22063, \u22062 triangle\nrepresentation. The time required is linear in the number of \u22063, \u22062 triangles, and is insignificant compared to\nthe actual cost of MMTM inference.\n\nFigure 3: Mixed-membership community recovery task: Cumulative \u21132 errors and runtime per trial for MMSB,\nMMTM and MMTM with \u03b4-subsampling, on N = 4,000 synthetic networks.\n\nScalability Experiments Although the preceding N = 4,000 experiments appear fairly small, in\nactual fact, they are close to the feasible limit for adjacency-matrix-based models like MMSB. To\ndemonstrate this, we generated four networks with sizes N \u2208 {1000, 4000, 10000, 40000} from the\nMMSB generative model. The generative parameters for the N = 4,000 network are identical to our\nearlier experiment, while the parameters for the other three network sizes were adjusted to maintain\nthe same average degree5. 
We then ran the MMSB, MMTM, and MMTM with \u03b4-subsampling\ninference algorithms on all 4 networks, and plotted the average per-iteration runtime in Figure 4.\nThe \ufb01gure clearly exposes the scalability differences between MMSB and MMTM. The \u03b4-subsampled MMTM experiments show linear runtime dependence on N, which is expected since\nthe number of subsampled triangles is O(N \u03b4^2). The full MMTM experiment is also roughly linear,\nthough we caution that this is not necessarily true for all networks, particularly high maximum\ndegree ones such as scale-free networks. Conversely, MMSB shows a clear quadratic dependence on\nN. In fact, we had to omit the MMSB N = 40,000 experiment because the latent variables would\nnot \ufb01t in memory, and even if they did, the extrapolated runtime would have been unreasonably long.\n6 A Larger Network Demonstration\nThe MMTM model with \u03b4-subsampling scales to even larger networks than the ones we have been\ndiscussing. To demonstrate this, we ran the MMTM Gibbs sampler with \u03b4 = 20 on the SNAP\nStanford Web Graph6, containing N = 281,903 vertices (webpages), 2,312,497 1-edges, and approximately 4 billion 2- and 3-edge triangles \u22063, \u22062, which we reduced to 11,353,778 via \u03b4 = 20-subsampling. Note that the vast majority of triangles are associated with exceptionally high-degree\nvertices, which make up a small fraction of the network. By using \u03b4-subsampling, we limited the\nnumber of triangles that come from such vertices, thus making the network feasible for MMTM.\nWe ran the MMTM sampler with settings identical to our synthetic experiments: 2,000 sampling\niterations, hyperparameters \ufb01xed to \u03b1, \u03bb = 0.1. The experiment took 74 hours, and we observed\nlog-likelihood convergence within 500 iterations.\nThe recovered mixed-membership vectors \u03b8i are visualized in Figure 5. 
A key challenge is that\nthe \u03b8i exist in the 4-simplex \u22064, which is dif\ufb01cult to visualize in two dimensions. To overcome\nthis, Figure 5 uses both position and color to communicate the values of \u03b8i. Every vertex i is\ndisplayed as a circle ci, whose size is proportional to the network degree of i. The position of ci is\nequal to a convex combination of the 5 pentagon corners\u2019 (x, y) coordinates, where the coordinates\nare weighted by the elements of \u03b8i. In particular, circles ci at the pentagon\u2019s corners represent\nsingle-membership \u03b8i\u2019s, while circles on the lines connecting the corners represent \u03b8i\u2019s with mixed-membership in 2 communities. All other circles represent \u03b8i\u2019s with mixed-membership in \u2265 3\ncommunities. Furthermore, each circle ci\u2019s color is also a \u03b8i-weighted convex combination, this\ntime of the RGB values of 5 colors: blue, green, red, cyan and purple. This use of color helps\ndistinguish between vertices with 2 versus 3 or more communities: for example, even though the\nlargest circle sits on the blue-red line (which initially suggests mixed-membership in 2 communities),\nits dark green color actually comes from mixed-membership in 3 communities: green, red and cyan.\n\n5 Note that the maximum degree still increases with N, because MMSB has a binomial degree distribution.\n6 Available at http://snap.stanford.edu/data/web-Stanford.html\n\n[Figure panels: Mixed-membership community recovery: Accuracy (cumulative L2 error) and Mixed-membership community recovery: Total runtime (total runtime in s); networks: MMSB, Latent position, Biased scale-free, Pure membership; series: MMSB, MMTM, MMTM \u03b4=20, MMTM \u03b4=15, MMTM \u03b4=10, MMTM \u03b4=5.]\n\nFigure 4: Per-iteration runtimes for MMSB, MMTM and MMTM with \u03b4-subsampling, on synthetic 
networks with N ranging from 1,000 to 40,000, but with constant average degree.\n\nFigure 5: N = 281,903 Stanford web graph, MMTM mixed-membership visualization.\n\nMost high-degree vertices (large circles) are found at the pentagon\u2019s corners, leading to the intuitive\nconclusion that the \ufb01ve communities are centered on hub webpages with many links. Interestingly,\nthe highest-degree vertices are all mixed-membership, suggesting that these webpages (which are\nmostly frontpages) lie on the boundaries between the communities. Finally, if we focus on the sets\nof vertices near each corner, we see that the green and red sets have distinct degree (i.e., circle size)\ndistributions, suggesting that those communities may be functionally different from the other three.\n7 Future Work and Conclusion\nWe have focused exclusively on triangular motifs because of their popularity in the literature, their\nrelationship to community structure through the network clustering coef\ufb01cient, and the ability to\nsubsample them in a natural, node-centric fashion with minor impact on accuracy. However, the\nbag-of-network-motifs idea extends beyond triangles: one could easily consider subgraphs over 4\nor more vertices, as in [13]. As with triangular motifs, it is algorithmically infeasible to consider all\npossible subgraphs; rather, we must focus our attention on a meaningful subset of them. Nevertheless, higher-order motifs could be better suited to particular tasks, thus meriting their investigation.\nIn modeling terms, we have applied triangular motifs to a generative mixed-membership setting,\nwhich is suitable for visualization but not necessarily for attribute prediction. 
Recent developments\nin constrained learning of generative models [23, 24] have yielded signi\ufb01cant improvements in predictive accuracy, and these techniques are also applicable to mixed-membership triangular modeling.\nAlso, given how well \u03b4 = 20-subsampling works for MMTM at N = 4,000, the next step would be\ninvestigating how to adaptively choose \u03b4 as N increases, in order to achieve good performance.\nTo summarize, we have introduced the bag-of-triangles representation as a parsimonious alternative to\nthe network adjacency matrix, and developed a model (MMTM) and inference algorithm for mixed-membership community detection in networks. Compared to mixed-membership models that use\nthe adjacency matrix (exempli\ufb01ed by MMSB), our model features a much smaller latent variable\nspace, leading to faster inference and better performance at mixed-membership recovery. When\ncombined with triangle subsampling, our model and inference algorithm scale easily to networks\nwith hundreds of thousands of vertices, which are completely infeasible for \u0398(N^2) adjacency-matrix-based models: the adjacency matrix might not even \ufb01t in memory, to say nothing of runtime.\nAs a \ufb01nal note, we speculate that the local nature of the triangles lends itself better to parallel inference than the adjacency matrix representation; it may be possible to \ufb01nd good \u201ctriangle separators\u201d,\nsmall subsets of triangles that divide the remaining triangles into large, non-vertex-overlapping subsets, which can then be inferred in parallel. This is similar to classical 1-edge separators that divide networks into non-overlapping subgraphs, which are unfortunately inapplicable to adjacency-matrix-based models, as they require separators over both the 0- and 1-edges. 
With triangle separators, we expect triangle models to scale to networks with millions of vertices and more.\nAcknowledgments\nThis work was supported by AFOSR FA9550010247, NIH 1R01GM093156 to Eric P. Xing. Qirong\nHo is supported by an Agency for Science, Research and Technology, Singapore fellowship. Junming Yin is a Lane Fellow under the Ray and Stephanie Lane Center for Computational Biology.\n\n[Figure panel: Per-iteration runtime for MMSB and MMTM Gibbs samplers; x-axis: Number of vertices; y-axis: Time per iteration (s); series: MMSB, MMTM, MMTM d=20, MMTM d=15, MMTM d=10, MMTM d=5.]\n\nReferences\n[1] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed membership stochastic blockmodels. The Journal of Machine Learning Research, 9:1981\u20132014, 2008.\n\n[2] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993\u20131022, 2003.\n\n[3] L. Cao and L. Fei-Fei. Spatially coherent latent topic model for concurrent segmentation and classi\ufb01cation of objects and scenes. In ICCV 2007, pages 1\u20138. IEEE, 2007.\n\n[4] B. Fulkerson, A. Vedaldi, and S. Soatto. Class segmentation and object localization with superpixel neighborhoods. In ICCV 2009, pages 670\u2013677. IEEE, 2009.\n\n[5] Q. Ho, J. Eisenstein, and E.P. Xing. Document hierarchies from text and links. In Proceedings of the 21st international conference on World Wide Web, pages 739\u2013748. ACM, 2012.\n\n[6] Q. Ho, A. Parikh, L. Song, and E.P. Xing. Multiscale community blockmodel for network exploration. In Proceedings of the 14th International Conference on Arti\ufb01cial Intelligence and Statistics, 2011.\n\n[7] M.J. Keeling and K.T.D. Eames. Networks and epidemic models. Journal of the Royal Society Interface, 2(4):295\u2013307, 2005.\n\n[8] R. Kondor, N. Shervashidze, and K.M. Borgwardt. The graphlet spectrum. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 529\u2013536. 
ACM, 2009.\n\n[9] D. Krackhardt and M. Handcock. Heider vs Simmel: Emergent features in dynamic structures. Statistical Network Analysis: Models, Issues, and New Directions, pages 14\u201327, 2007.\n\n[10] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 462\u2013470. ACM, 2008.\n\n[11] J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 631\u2013636. ACM, 2006.\n\n[12] K.T. Miller, T.L. Grif\ufb01ths, and M.I. Jordan. Nonparametric latent feature models for link prediction. Advances in Neural Information Processing Systems (NIPS), pages 1276\u20131284, 2009.\n\n[13] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824\u2013827, 2002.\n\n[14] R.M. Nallapati, A. Ahmed, E.P. Xing, and W.W. Cohen. Joint latent topic models for text and citations. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 542\u2013550. ACM, 2008.\n\n[15] M.E.J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577\u20138582, 2006.\n\n[16] M.E.J. Newman and J. Park. Why social networks are different from other types of networks. arXiv preprint cond-mat/0305612, 2003.\n\n[17] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849\u2013856, 2002.\n\n[18] N. Shervashidze, S.V.N. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt. Ef\ufb01cient graphlet kernels for large graph comparison. In Proceedings of the International Workshop on Arti\ufb01cial Intelligence and Statistics. 
Society for Arti\ufb01cial Intelligence and Statistics, 2009.\n\n[19] G. Simmel and K.H. Wolff. The Sociology of Georg Simmel. Free Press, 1950.\n\n[20] T.A.B. Snijders. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2):1\u201340, 2002.\n\n[21] C.E. Tsourakakis. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Data Mining, 2008. ICDM\u201908. Eighth IEEE International Conference on, pages 608\u2013617. IEEE, 2008.\n\n[22] A. Vattani, D. Chakrabarti, and M. Gurevich. Preserving personalized pagerank in subgraphs. In ICML 2011, 2011.\n\n[23] J. Zhu, A. Ahmed, and E.P. Xing. MedLDA: maximum margin supervised topic models for regression and classi\ufb01cation. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1257\u20131264. ACM, 2009.\n\n[24] J. Zhu, N. Chen, and E.P. Xing. In\ufb01nite latent SVM for classi\ufb01cation and multi-task learning. Advances in Neural Information Processing Systems, 25.\n", "award": [], "sourceid": 1048, "authors": [{"given_name": "Qirong", "family_name": "Ho", "institution": null}, {"given_name": "Junming", "family_name": "Yin", "institution": null}, {"given_name": "Eric", "family_name": "Xing", "institution": null}]}