{"title": "Exploratory Data Analysis Using Radial Basis Function Latent Variable Models", "book": "Advances in Neural Information Processing Systems", "page_first": 529, "page_last": 535, "abstract": null, "full_text": "Exploratory Data Analysis Using Radial Basis \n\nFunction Latent Variable Models \n\nAlan D. Marrs and Andrew R. Webb \n\nDERA \n\nSt Andrews Road, Malvern \n\nWorcestershire U.K. WR14 3PS \n\n{marrs,webb}@signal.dera.gov.uk \n\n© British Crown Copyright 1998 \n\nAbstract \n\nTwo developments of nonlinear latent variable models based on radial \nbasis functions are discussed: in the first, the use of priors or constraints \non allowable models is considered as a means of preserving data structure \nin low-dimensional representations for visualisation purposes; in the second, a \nresampling approach is introduced which makes more effective use of \nthe latent samples in evaluating the likelihood. \n\n1 INTRODUCTION \n\nRadial basis functions (RBFs) have been used extensively for problems in discrimination \nand regression. Here we consider their application for obtaining low-dimensional representations of high-dimensional data as part of the exploratory data analysis process. There \nhas been a great deal of research over the years into linear and nonlinear techniques for \ndimensionality reduction. The technique most commonly used is principal components \nanalysis (PCA), and there have been several nonlinear generalisations, each taking a particular definition of PCA and generalising it to the nonlinear situation. \n\nOne approach is to find surfaces of closest fit (as a generalisation of the PCA definition \ndue to the work of Pearson (1901) for finding lines and planes of closest fit). This has \nbeen explored by Hastie and Stuetzle (1989), Tibshirani (1992) (and further by LeBlanc \nand Tibshirani, 1994), and by various authors using a neural network approach (for example, \nKramer, 1991). 
Another approach is one of variance maximisation subject to constraints \non the transformation (Hotelling, 1933). This has been investigated by Webb (1996), using \na transformation modelled as an RBF network, and in a supervised context in Webb (1998). \n\nAn alternative strategy, also using RBFs and based on metric multidimensional scaling, is described by Webb (1995) and Lowe and Tipping (1996). Here, an optimisation criterion, \ntermed stress, is defined in the transformed space and the weights in an RBF model determined by minimising the stress. \n\nThe above methods use a radial basis function to model a transformation from the high-dimensional data space to a low-dimensional representation space. A complementary approach is provided by Bishop et al. (1998) in which the structure of the data is modelled as \na function of hidden or latent variables. Termed the generative topographic mapping (GTM), \nthe model may be regarded as a nonlinear generalisation of factor analysis in which the \nmapping from latent space to data space is characterised by an RBF. \n\nSuch generative models are relevant to a wide range of applications including radar target \nmodelling, speech recognition and handwritten character recognition. \n\nHowever, one of the problems with GTM that limits its practical use for visualising data on \nmanifolds in high-dimensional space arises from distortions in the structure that it imposes. \nThis is acknowledged in Bishop et al. (1997), where 'magnification factors' are introduced \nto correct for the GTM's deficiency as a means of data visualisation. \n\nThis paper considers two developments: constraints on the permissible models and resampling of the latent space. Section 2 presents the background to latent variable models; \nmodel constraints are discussed in Section 3. 
Section 4 describes a resampling approach \nto estimation of the posterior pdf over the latent samples. An illustration is provided in Section 5. \n\n2 BACKGROUND \n\nBriefly, we shall re-state the basic GTM model, retaining the notation of Bishop et al. \n(1998). Let {t_i, i = 1, ..., N}, t_i ∈ R^D, represent measurements on the data-space variables, and let z ∈ R^L represent the latent variables. \nLet t be normally distributed with mean y(z; W) and covariance matrix β⁻¹I, where y(z; W) \nis a nonlinear transformation that depends on a set of parameters W. Specifically, we shall \nassume a basis function model \n\ny(z; W) = Σᵢ wᵢ φ(‖z − cᵢ‖), \n\nwhere the wᵢ are weight vectors and the cᵢ are the basis-function centres. \n\nFigure 2: Results for standard GTM model. \n\nFigure 3: Results for regularised/resampled model. \n\nFigure 2 shows results for the standard GTM (uniform grid of latent samples) projection of \nthe data to two dimensions. The central figure shows the projection onto the latent space, \nexhibiting significant distortion. The left figure shows the projection of the regular grid of \nlatent samples (red points) into the data space. Distortion of this grid can easily be seen. \nThe right figure is a plot of the magnification factor as defined in Section 3, with a mean value \nof 4.577. For this data set most stretching occurs at the edges of the latent variable space. \n\nFigure 3 shows results for the regularised/resampled version of the latent variable model \nfor λ = 1.0. Again, the central figure shows the projection onto the latent space after 2 \niterations of the resampling procedure. The left-hand figure shows the projection of the \ninitial regular grid of latent samples into the data space. The effect of regularisation is \nevident from the lack of severe distortions. Finally, the magnification factors can be seen in \nthe right-hand figure to be lower, with a mean value of 0.976. 
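The generative model re-stated above can be illustrated with a minimal Python sketch (this is not the authors' implementation): a regular grid of one-dimensional latent samples is mapped into data space through an assumed Gaussian RBF, y(z; W) = Σᵢ wᵢ φ(‖z − cᵢ‖), and posterior responsibilities of the latent samples for a data point are computed under the isotropic noise model t ~ N(y(z; W), β⁻¹I) with a uniform prior over the grid. The centres, basis-function width, weights and data point below are invented for illustration only. \n\n```python\nimport math\n\ndef phi(r, s=1.0):\n    """Gaussian basis function with (assumed) width s."""\n    return math.exp(-r * r / (2.0 * s * s))\n\ndef y(z, centres, W):\n    """Map a 1-D latent sample z into data space: y(z; W) = sum_i w_i * phi(|z - c_i|)."""\n    acts = [phi(abs(z - c)) for c in centres]\n    D = len(W[0])\n    return [sum(W[i][d] * acts[i] for i in range(len(centres))) for d in range(D)]\n\ndef responsibilities(t, latent_grid, centres, W, beta=1.0):\n    """Posterior p(z_k | t) over the latent grid (uniform prior), as in the GTM E-step."""\n    logps = []\n    for z in latent_grid:\n        yz = y(z, centres, W)\n        sq = sum((td - yd) ** 2 for td, yd in zip(t, yz))\n        logps.append(-0.5 * beta * sq)  # log N(t | y(z;W), beta^{-1} I) up to a constant\n    m = max(logps)                      # subtract the max for numerical stability\n    ws = [math.exp(lp - m) for lp in logps]\n    total = sum(ws)\n    return [w / total for w in ws]\n\n# Toy usage: 2 illustrative basis centres mapping a 1-D latent space into 2-D data space.\ncentres = [-1.0, 1.0]                      # invented RBF centres\nW = [[1.0, 0.0], [0.0, 1.0]]               # invented weights (2 centres x 2 data dims)\ngrid = [k / 5.0 - 1.0 for k in range(11)]  # regular grid of latent samples on [-1, 1]\nr = responsibilities([1.0, 0.2], grid, centres, W)\n# r sums to one; most mass falls on latent samples near z = -1, whose image lies\n# closest to the data point.\n```\n\nThe resampling development of Section 4 would then replace the fixed uniform grid with samples drawn according to these responsibilities, so that latent samples concentrate where the likelihood is appreciable. \n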
\n\n6 DISCUSSION \n\nWe have considered two developments of the GTM latent variable model: the incorporation \nof priors on the allowable model, and a resampling approach to maximum likelihood parameter estimation. Results have been presented for this regularised/resampling approach, \nand magnification factors lower than those of the standard model were achieved using the same RBF \nmodel. Further reduction in the magnification factor is possible with different RBF \nmodels, but the example illustrates that resampling offers a more robust approach. Current \nwork is aimed at assessing the approach on realistic data sets. \n\nReferences \n\nBishop, C.M., Svensen, M. and Williams, C.K.I. (1997). Magnification factors for the \nGTM algorithm. IEE International Conference on Artificial Neural Networks, 465-471. \n\nBishop, C.M., Svensen, M. and Williams, C.K.I. (1998). GTM: the generative topographic mapping. Neural Computation, 10, 215-234. \n\nHastie, T. and Stuetzle, W. (1989). Principal curves. Journal of the American Statistical \nAssociation, 84, 502-516. \n\nHotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441, 498-520. \n\nKramer, M.A. (1991). Nonlinear principal component analysis using autoassociative neural \nnetworks. American Institute of Chemical Engineers Journal, 37(2), 233-243. \n\nLeBlanc, M. and Tibshirani, R. (1994). Adaptive principal surfaces. Journal of the American Statistical Association, 89(425), 53-64. \n\nLowe, D. and Tipping, M. (1996). Feed-forward neural networks and topographic mappings for exploratory data analysis. Neural Computing and Applications, 4, 83-95. \n\nPearson, K. (1901). On lines and planes of closest fit. Philosophical Magazine, 6, 559-572. \n\nSalmond, D.J. (1990). 
Mixture reduction algorithms for target tracking in clutter. Signal and \nData Processing of Small Targets, edited by O. Drummond, SPIE, 1305. \n\nSilverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & \nHall. \n\nTibshirani, R. (1992). Principal curves revisited. Statistics and Computing, 2(4), 183-190. \n\nWebb, A.R. (1995). Multidimensional scaling by iterative majorisation using radial basis \nfunctions. Pattern Recognition, 28(5), 753-759. \n\nWebb, A.R. (1996). An approach to nonlinear principal components analysis using \nradially-symmetric kernel functions. Statistics and Computing, 6, 159-168. \n\nWebb, A.R. (1997). Radial basis functions for exploratory data analysis: an iterative majorisation approach for Minkowski distances based on multidimensional scaling. Journal \nof Classification, 14(2), 249-267. \n\nWebb, A.R. (1998). Supervised nonlinear principal components analysis. (Submitted for \npublication.) \n\nWest, M. (1993). Approximating posterior distributions by mixtures. J. R. Statist. Soc. B, \n55(2), 409-422. \n\n\f", "award": [], "sourceid": 1544, "authors": [{"given_name": "Alan", "family_name": "Marrs", "institution": null}, {"given_name": "Andrew", "family_name": "Webb", "institution": null}]}