{"title": "Learning to Classify Galaxy Shapes Using the EM Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 1521, "page_last": 1528, "abstract": null, "full_text": "Learning to Classify Galaxy Shapes Using the\n\nEM Algorithm\n\nSergey Kirshner\n\nInformation and Computer Science\n\nUniversity of California\nIrvine, CA 92697-3425\n\nIgor V. Cadez\n\nSparta Inc.,\n\n23382 Mill Creek Drive #100,\n\nLaguna Hills, CA 92653\n\nskirshne@ics.uci.edu\n\nigor cadez@sparta.com\n\nPadhraic Smyth\n\nInformation and Computer Science\n\nUniversity of California\nIrvine, CA 92697-3425\nsmyth@ics.uci.edu\n\nChandrika Kamath\n\nCenter for Applied Scienti\u00a3c Computing\nLawrence Livermore National Laboratory\n\nLivermore, CA 94551\nkamath2@llnl.gov\n\nAbstract\n\nWe describe the application of probabilistic model-based learning to the\nproblem of automatically identifying classes of galaxies, based on both\nmorphological and pixel intensity characteristics. The EM algorithm can\nbe used to learn how to spatially orient a set of galaxies so that they\nare geometrically aligned. We augment this \u201cordering-model\u201d with a\nmixture model on objects, and demonstrate how classes of galaxies can\nbe learned in an unsupervised manner using a two-level EM algorithm.\nThe resulting models provide highly accurate classi\u00a3cation of galaxies in\ncross-validation experiments.\n\n1\n\nIntroduction and Background\n\nThe \u00a3eld of astronomy is increasingly data-driven as new observing instruments permit the\nrapid collection of massive archives of sky image data. In this paper we investigate the\nproblem of identifying bent-double radio galaxies in the FIRST (Faint Images of the Radio\nSky at Twenty-cm) Survey data set [1]. FIRST produces large numbers of radio images of\nthe deep sky using the Very Large Array at the National Radio Astronomy Observatory. 
It is scheduled to cover more than 10,000 square degrees of the northern and southern caps (skies). Of particular scientific interest to astronomers is the identification and cataloging of sky objects with a “bent-double” morphology, indicating clusters of galaxies ([8], see Figure 1). Due to the very large number of observed deep-sky radio sources (on the order of 10^6 so far), it is infeasible for the astronomers to label all of them manually.

The data from the FIRST Survey (http://sundog.stsci.edu/) is available in both raw image format and in the form of a catalog of features that have been automatically derived from the raw images by an image analysis program [8]. Each entry corresponds to a single detectable “blob” of bright intensity relative to the sky background: these entries are called components.

Figure 1: 4 examples of radio-source galaxy images. The two on the left are labelled as “bent-doubles” and the two on the right are not. The configurations on the left have more “bend” and symmetry than the two non-bent-doubles on the right.

The “blob” of intensities for each component is fitted with an ellipse. The ellipses and intensities for each component are described by a set of estimated features such as sky position of the centers (RA (right ascension) and Dec (declination)), peak density flux and integrated flux, root mean square noise in pixel intensities, lengths of the major and minor axes, and the position angle of the major axis of the ellipse counterclockwise from the north. The goal is to find sets of components that are spatially close and that resemble a bent-double. In the results in this paper we focus on candidate sets of components that have been detected by an existing spatial clustering algorithm [3], where each set consists of three components from the catalog (three ellipses).
As of the year 2000, the catalog contained over 15,000 three-component configurations and over 600,000 configurations in total. The set which we use to build and evaluate our models consists of a total of 128 examples of bent-double galaxies and 22 examples of non-bent-double configurations. A configuration is labelled as a bent-double if two out of three astronomers agree to label it as such. Note that this visual identification is the bottleneck in the process, since it requires significant time and effort from the scientists and is subjective and error-prone, motivating the creation of automated methods for identifying bent-doubles.

Three-component bent-double configurations typically consist of a center or “core” component and two other side components called “lobes”. Previous work on automated classification of three-component candidate sets has focused on the use of decision-tree classifiers using a variety of geometric and image intensity features [3]. One limitation of the decision-tree approach is its relative inflexibility in handling uncertainty about the object being classified, e.g., the identification of which of the three components should be treated as the core of a candidate object. A bigger limitation is the fixed size of the feature vector. A primary motivation for the development of a probabilistic approach is to provide a framework that can handle uncertainties in a flexible, coherent manner.

2 Learning to Match Orderings using the EM Algorithm

We denote a three-component configuration by C = (c1, c2, c3), where the ci's are the components (or “blobs”) described in the previous section. Each component cx is represented as a feature vector, where the specific features will be defined later.
Our approach focuses on building a probabilistic model for bent-doubles: p(C) = p(c1, c2, c3), the likelihood of the observed ci under a bent-double model, where we implicitly condition (for now) on the class “bent-double.”

By looking at examples of bent-double galaxies and by talking to the scientists studying them, we have been able to establish a number of potentially useful characteristics of the components, the primary one being geometric symmetry. In bent-doubles, two of the components will look close to being mirror images of one another with respect to a line through the third component. We will call the mirror-image components lobe components, and the other one the core component.

Figure 2: Possible orderings for a hypothetical bent-double, mapping components 1, 2, and 3 to the labels “core”, “lobe 1”, and “lobe 2”. A good choice of ordering would be either 1 or 2.

It also appears that non-bent-doubles either don't exhibit such symmetry, or the angle formed at the core component is too straight: the configuration is not “bent” enough. Once the core component is identified, we can calculate symmetry-based features. However, identifying the most plausible core component requires either an additional algorithm or human expertise. In our approach we use a probabilistic framework that averages over different possible orderings weighted by their probability given the data.

In order to define the features, we first need to determine the mapping of the components to the labels “core”, “lobe 1”, and “lobe 2” (c, l1, and l2 for short). We will call such a mapping an ordering. Figure 2 shows an example of possible orderings for a configuration. We can number the orderings 1, ..., 6.
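The six orderings can be enumerated directly as the permutations of the three components; a minimal sketch (function and variable names are ours, not from the paper):

```python
from itertools import permutations

# Enumerate the 3! = 6 orderings that map the three components of a
# configuration to the roles (core, lobe 1, lobe 2).
def enumerate_orderings(components):
    # components: any 3-element sequence; returns a list of 6 tuples,
    # each interpreted as (core, lobe1, lobe2)
    return list(permutations(components))

orderings = enumerate_orderings(['c1', 'c2', 'c3'])
print(len(orderings))  # 6 candidate (core, lobe1, lobe2) assignments
```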
We can then write

p(C) = ∑_{k=1}^{6} p(c_c, c_{l1}, c_{l2} | ω = k) p(ω = k),    (1)

i.e., a mixture over all possible orderings. Each ordering is assumed a priori to be equally likely, i.e., p(ω = k) = 1/6. Intuitively, for a configuration that clearly looks like a bent-double, the terms in the mixture corresponding to the correct ordering would dominate, while the other orderings would have much lower probability.

We represent each component cx by M features (we used M = 3). Note that the features can only be calculated conditioned on a particular mapping, since they rely on properties of the (assumed) core and lobe components. We denote by f_mk(C) the values corresponding to the mth feature for configuration C under the ordering ω = k, and by f_mkj(C) we denote the feature value of component j: f_mk(C) = (f_mk1(C), ..., f_mkBm(C)) (in our case, Bm = 3 is the number of components). Conditioned on a particular mapping ω = k, where x ∈ {c, l1, l2} and c, l1, l2 are defined in a cyclical order, our features are defined as:

• f_1k(C): log-transformed angle, the angle formed at the center of the component (a vertex of the configuration) mapped to label x;
• f_2k(C): logarithms of side ratios, |center of x to center of next(x)| / |center of x to center of prev(x)|;
• f_3k(C): logarithms of intensity ratios, (peak flux of next(x)) / (peak flux of prev(x));

and so (C | ω = k) = (f_1k(C), f_2k(C), f_3k(C)) for a 9-dimensional feature vector in total. Other features are of course also possible. For our purposes in this paper this particular set appears to capture the more obvious visual properties of bent-double galaxies.

For a set D = {d1, ..., dN} of configurations, under an i.i.d.
assumption for configurations, we can write the likelihood as

P(D) = ∏_{i=1}^{N} ∑_{k=1}^{K} P(ω_i = k) P(f_1k(d_i), ..., f_Mk(d_i)),

where ω_i is the ordering for configuration d_i. While in the general case one can model P(f_1k(d_i), ..., f_Mk(d_i)) as a full joint distribution, for the results reported in this paper we make a number of simplifying assumptions, motivated by the fact that we have relatively little labelled training data available for model building. First, we assume that the f_mk(d_i) are conditionally independent. Second, we are also able to reduce the number of components for each f_mk(d_i) by noting functional dependencies. For example, given two angles of a triangle, we can uniquely determine the third one. We also assume that the remaining components for each feature are conditionally independent. Under these assumptions the multivariate joint distribution P(f_1k(d_i), ..., f_Mk(d_i)) is factored into a product of simple distributions, which (for the purposes of this paper) we model using Gaussians. If we know for every training example which component should be mapped to label c, we can then unambiguously estimate the parameters for each of these distributions.

In practice, however, the identity of the core component is unknown for each object. Thus, we use the EM algorithm to automatically estimate the parameters of the above model. We begin by randomly assigning an ordering to each object. For each subsequent iteration, the E-step consists of estimating a probability distribution over possible orderings for each object, and the M-step estimates the parameters of the feature distributions using the probabilistic ordering information from the E-step. In practice we have found that the algorithm converges relatively quickly (in 20 to 30 iterations) on both simulated and real data.
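The two EM steps just described can be illustrated with a self-contained sketch, under further simplifying assumptions of our own (a single diagonal Gaussian over each ordering's stacked feature vector and a uniform 1/K prior over orderings; all names are ours, not the paper's code):

```python
import numpy as np

# Minimal ordering-EM sketch: E-step computes a posterior over the K
# candidate orderings of each object; M-step re-estimates the Gaussian
# feature model from those responsibilities.
def ordering_em(X, n_iter=30, seed=0):
    # X: (N, K, F) array -- N objects, K candidate orderings, F features
    N, K, F = X.shape
    rng = np.random.default_rng(seed)
    R = rng.dirichlet(np.ones(K), size=N)  # random initial responsibilities
    for _ in range(n_iter):
        # M-step: responsibility-weighted Gaussian mean and variance
        mu = np.einsum('nk,nkf->f', R, X) / N
        var = np.einsum('nk,nkf->f', R, (X - mu) ** 2) / N + 1e-6
        # E-step: posterior over orderings (the uniform prior cancels)
        ll = -0.5 * (((X - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(-1)
        ll -= ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
        R = np.exp(ll)
        R /= R.sum(axis=1, keepdims=True)
    return mu, var, R
```

On synthetic data in which exactly one ordering per object is drawn from a common tight Gaussian while the remaining orderings are broad noise, the returned responsibilities concentrate on the generating ordering, mirroring the self-consistency effect described in the text.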
It is somewhat surprising that this algorithm can reliably “learn” how to align a set of objects without using any explicit objective function for alignment, but instead based on the fact that feature values for certain orderings exhibit a certain self-consistency relative to the model. Intuitively, it is this self-consistency that leads to higher-likelihood solutions and that allows EM to effectively align the objects by maximizing the likelihood.

After the model has been estimated, the likelihood of new objects can also be calculated under the model, where the likelihood now averages over all possible orderings weighted by their probability given the observed features.

The problem described above is a specific instance of a more general feature-unscrambling problem. In our case, we assume that configurations of three 3-dimensional components (i.e., 3 features each) are generated by some distribution. Once the objects are generated, the orders of their components are permuted or scrambled. The task is then to simultaneously learn the parameters of the original distributions and the scrambling for each object. In the more general form, each configuration consists of L M-dimensional components. Since there are L! possible orderings of L components, the problem becomes computationally intractable if L is large. One solution is to restrict the types of possible scrambles (to cyclic shifts, for example).

3 Automatic Galaxy Classification

We used the algorithm described in the previous section to estimate the parameters of features and orderings of the bent-double class from labelled training data, and then to rank candidate objects according to their likelihood under the model.
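Scoring a candidate by its likelihood, averaged over orderings as described above, can be sketched as follows (a minimal version assuming a single diagonal Gaussian over the stacked features; names are ours):

```python
import numpy as np

# Average the Gaussian likelihood of a configuration's per-ordering
# feature vectors over all K orderings with a uniform 1/K prior,
# returning a log-likelihood score.
def log_score(x, mu, var):
    # x: (K, F) feature vectors, one row per candidate ordering
    # mu, var: (F,) diagonal-Gaussian parameters from training
    ll = -0.5 * (((x - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)
    m = ll.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.exp(ll - m).mean())
```

Configurations for which some ordering places the features near the learned bent-double model score high; configurations with no such ordering score low, which is what makes the likelihood usable as a ranking statistic.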
We used leave-one-out cross-validation to test the classification ability of this supervised model, where for each of the 150 examples we build a model using the positive examples from the set of 149 “other” examples, and then score the “left-out” example with this model. The examples are then sorted in decreasing order by their likelihood score (averaging over different possible orderings) and the results are analyzed using a receiver operating characteristic (ROC) methodology.

Figure 3: ROC plot for a model using angle, ratio of sides, and ratio of intensities as features, learned using ordering-EM with labelled data.

We use A_ROC, the area under the curve, as a measure of goodness of the model, where a perfect model would have A_ROC = 1 and random performance corresponds to A_ROC = 0.5. The supervised model, using EM for learning ordering models, has a cross-validated A_ROC score of 0.9336 (Figure 3) and appears to be quite useful at detecting bent-double galaxies.

4 Model-Based Galaxy Clustering

A useful technique in understanding astronomical image data is to cluster image objects based on their morphological and intensity properties. For example, consider how one might group the image objects in Figure 1 into clusters, where we have features on angles, intensities, and so forth. Just as with classification, clustering of the objects is impeded by not knowing which of the “blobs” corresponds to the true “core” component.

From a probabilistic viewpoint, clustering can be treated as introducing another level of hidden variables, namely the unknown class (or cluster) identity of each object.
We can generalize the EM algorithm for orderings (Section 2) to handle this additional hidden level. The model is now a mixture of clusters where each cluster is modelled as a mixture of orderings. This leads to a more complex two-level EM algorithm than that presented in Section 2, where at the inner level the algorithm is learning how to orient the objects, and at the outer level the algorithm is learning how to group the objects into C classes. Space does not permit a detailed presentation of this algorithm; however, the derivation is straightforward and produces intuitive update rules such as:

μ̂_cmj = (1 / (N P̂(cl = c | Θ))) ∑_{i=1}^{N} ∑_{k=1}^{K} P(cl_i = c | ω_i = k, D, Θ) P(ω_i = k | D, Θ) f_mkj(d_i),

where μ_cmj is the mean for the cth cluster (1 ≤ c ≤ C), the mth feature (1 ≤ m ≤ M), and the jth component of f_mk(d_i), and ω_i = k corresponds to ordering k for the ith object.

We applied this algorithm to the data set of 150 sky objects, where unlike the results in Section 3, the algorithm now had no access to the class labels.
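The mean-update rule displayed above reduces to a doubly weighted average; a small sketch (notation ours):

```python
import numpy as np

# Outer-level mean update: the cluster mean for one (feature m,
# component j) pair is the average of f_mkj(d_i) weighted jointly by the
# cluster and ordering responsibilities of each object.
def cluster_feature_mean(p_cluster, p_ordering, f):
    # p_cluster:  (N, K) array, P(cl_i = c | omega_i = k, D, Theta)
    # p_ordering: (N, K) array, P(omega_i = k | D, Theta)
    # f:          (N, K) array, f_mkj(d_i) for each object and ordering
    w = p_cluster * p_ordering
    # the normalizer N * P_hat(cl = c | Theta) equals the total weight w.sum()
    return (w * f).sum() / w.sum()

p_cluster = np.array([[1.0, 0.0], [0.0, 1.0]])
p_ordering = np.array([[0.5, 0.5], [0.5, 0.5]])
f = np.array([[2.0, 4.0], [6.0, 8.0]])
print(cluster_feature_mean(p_cluster, p_ordering, f))  # 5.0
```

Objects whose posterior favors neither this cluster nor a given ordering contribute little weight, so each cluster's Gaussian is fit mainly from the objects and orientations it explains well.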
We used the Gaussian conditional-independence model as before, and grouped the data into K = 2 clusters. Figures 4 and 5 show the highest-likelihood objects, out of 150 total, under the models for the larger cluster and the smaller cluster respectively.

Figure 4: The 8 objects with the highest likelihood conditioned on the model for the larger of the two clusters learned by the unsupervised algorithm (all 8 are labelled bent-double).

Figure 5: The 8 objects with the highest likelihood conditioned on the model for the smaller of the two clusters learned by the unsupervised algorithm (6 of the 8 are labelled non-bent-double).

Figure 6: A scatter plot of the ranking from the unsupervised model versus that of the supervised model.

The larger cluster is clearly a bent-double cluster: 89 of the 150 objects are more likely to belong to this cluster under the model, and 88 of those 89 objects have the bent-double label. In other words, the unsupervised algorithm has discovered a cluster that corresponds to “strong examples” of bent-doubles relative to the particular feature space and model. In fact, the non-bent-double that is assigned to this group may well have been mislabelled (image not shown here). The objects in Figure 5 are clearly inconsistent with the general visual pattern of bent-doubles, and this cluster consists of a mixture of non-bent-double and “weaker” bent-double galaxies.
The objects in Figure 5 that are labelled as bent-doubles seem quite atypical compared to the bent-doubles in Figure 4.

A natural hypothesis is that cluster 1 (88 bent-doubles) in the unsupervised model is in fact very similar to the supervised model learned using the labelled set of 128 bent-doubles in Section 3. Indeed, the parameters of the two Gaussian models agree quite closely, and the similarity of the two models is illustrated clearly in Figure 6, where we plot the likelihood-based ranks of the unsupervised model versus those of the supervised model. The two rankings are in close agreement, and both models are clearly performing well in terms of separating the objects by their class labels.

5 Related Work and Future Directions

A related earlier paper is Kirshner et al. [6], where we presented a heuristic algorithm for solving the orientation problem for galaxies. The generalization to an EM framework in this paper is new, as is the two-level EM algorithm for clustering objects in an unsupervised manner.

There is a substantial body of work in computer vision on solving a variety of different object-matching problems using probabilistic techniques; see Mjolsness [7] for early ideas and Chui et al. [2] for a recent application in medical imaging. Our work here differs in a number of respects. One important difference is that we use EM to learn a model for the simultaneous correspondence of N objects, using both geometric and intensity-based features, whereas prior work in vision has primarily focused on matching one object to another (essentially the N = 2 case). An exception is the recent work of Frey and Jojic [4, 5], who used a similar EM-based approach to simultaneously cluster images and estimate a variety of local spatial deformations.
The work described in this paper can be viewed as an extension and application of this general methodology to a real-world problem in galaxy classification.

Earlier work on bent-double galaxy classification used decision-tree classifiers based on a variety of geometric and intensity-based features [3]. In future work we plan to compare the performance of this decision-tree approach with the probabilistic model-based approach proposed in this paper. The model-based approach has some inherent advantages over a decision-tree model for these types of problems. For example, it can directly handle objects in the catalog with only 2 blobs or with 4 or more blobs, by integrating over missing intensities and over missing correspondence information using mixture models that allow for missing or extra “blobs”. Being able to classify such configurations automatically is of significant interest to the astronomers.

Acknowledgments

This work was performed under a sub-contract from the ASCI Scientific Data Management Project of the Lawrence Livermore National Laboratory. The work of S. Kirshner and P. Smyth was also supported by research grants from NSF (award IRI-9703120), the Jet Propulsion Laboratory, IBM Research, and Microsoft Research. I. Cadez was supported by a Microsoft Graduate Fellowship. The work of C. Kamath was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. We gratefully acknowledge our FIRST collaborators, in particular Robert H. Becker, for sharing his expertise on the subject.

References

[1] R. H. Becker, R. L. White, and D. J. Helfand. The FIRST Survey: Faint Images of the Radio Sky at Twenty-cm. Astrophysical Journal, 450:559, 1995.

[2] H. Chui, L. Win, R. Schultz, J. S. Duncan, and A. Rangarajan. A unified feature registration method for brain mapping.
In Proceedings of Information Processing in Medical Imaging, pages 300–314. Springer-Verlag, 2001.

[3] I. K. Fodor, E. Cantú-Paz, C. Kamath, and N. A. Tang. Finding bent-double radio galaxies: A case study in data mining. In Proceedings of the Interface: Computer Science and Statistics Symposium, volume 33, 2000.

[4] B. J. Frey and N. Jojic. Estimating mixture models of images and inferring spatial transformations using the EM algorithm. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999.

[5] N. Jojic and B. J. Frey. Topographic transformation as a discrete latent variable. In Advances in Neural Information Processing Systems 12. MIT Press, 2000.

[6] S. Kirshner, I. V. Cadez, P. Smyth, C. Kamath, and E. Cantú-Paz. Probabilistic model-based detection of bent-double radio galaxies. In Proceedings of the 16th International Conference on Pattern Recognition, volume 2, pages 499–502, 2002.

[7] E. Mjolsness. Bayesian inference on visual grammars by neural networks that optimize. Technical Report YALEU/DCS/TR-854, Department of Computer Science, Yale University, May 1991.

[8] R. L. White, R. H. Becker, D. J. Helfand, and M. D. Gregg. A catalog of 1.4 GHz radio sources from the FIRST Survey. Astrophysical Journal, 475:479, 1997.
", "award": [], "sourceid": 2197, "authors": [{"given_name": "Sergey", "family_name": "Kirshner", "institution": null}, {"given_name": "Igor", "family_name": "Cadez", "institution": null}, {"given_name": "Padhraic", "family_name": "Smyth", "institution": null}, {"given_name": "Chandrika", "family_name": "Kamath", "institution": null}]}