Part of Advances in Neural Information Processing Systems 21 (NIPS 2008)
Deli Zhao, Xiaoou Tang
Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle clustering problems for complex data drawn from multiple distributions at widely varying scales. To this end, we develop an algorithm named Zeta $l$-links, or Zell, which consists of two parts: Zeta merging with a similarity graph, and an initial set of small clusters derived from local $l$-links of the graph. More specifically, we propose to structure a cluster using cycles in the associated subgraph. A mathematical tool, the Zeta function of a graph, is introduced to integrate all cycles, leading to a structural descriptor of the cluster in determinantal form. The popularity character of the cluster is conceptualized as the global fusion of variations of the structural descriptor, obtained by a leave-one-out strategy within the cluster. Zeta merging proceeds in agglomerative fashion, according to the maximum incremental popularity among all pairs of clusters. Experiments on toy data, real imagery data, and real sensory data show the promising performance of Zell. An accuracy of $98.1\%$, measured by normalized mutual information, is obtained on the FRGC face data of 16028 samples and 466 facial clusters. The MATLAB code of Zell will be made publicly available for peer evaluation.
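
To illustrate how integrating all cycles of a graph can lead to a determinantal form, the following minimal sketch checks a standard identity numerically. It rests on an assumption: the abstract does not state which Zeta function is used, so the sketch takes the common form $\zeta(z) = \exp\big(\sum_{m \ge 1} z^m \, \mathrm{tr}(W^m) / m\big)$, where $\mathrm{tr}(W^m)$ sums the weights of closed walks of length $m$, and compares it with $1 / \det(I - zW)$. The names `zeta_det`, `zeta_series`, `W`, and `z` are hypothetical and are not taken from the Zell code.

```python
import numpy as np


def zeta_det(W, z):
    """Closed form 1 / det(I - z W) (assumed form of the graph zeta function)."""
    n = W.shape[0]
    return 1.0 / np.linalg.det(np.eye(n) - z * W)


def zeta_series(W, z, terms=200):
    """Truncated series exp(sum_{m=1}^{terms} z^m tr(W^m) / m).

    tr(W^m) sums the weights of all closed walks of length m, so the
    series aggregates cycles of every length. It converges only when
    |z| times the spectral radius of W is below 1.
    """
    total = 0.0
    power = np.eye(W.shape[0])
    for m in range(1, terms + 1):
        power = power @ W  # power now holds W^m
        total += (z ** m) * np.trace(power) / m
    return np.exp(total)


# Toy example: adjacency matrix of a triangle (spectral radius 2).
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
z = 0.2  # chosen so that z * spectral_radius = 0.4 < 1

print(zeta_det(W, z), zeta_series(W, z))
```

On this toy graph the two values agree to numerical precision, which is why the cycle sum never needs to be enumerated explicitly. How the paper turns such a quantity into a structural descriptor and a popularity measure is not specified in the abstract and is not reproduced here.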