{"title": "Geometric entropy minimization (GEM) for anomaly detection and localization", "book": "Advances in Neural Information Processing Systems", "page_first": 585, "page_last": 592, "abstract": null, "full_text": "Geometric entropy minimization (GEM) for anomaly detection and localization\nAlfred O Hero, III University of Michigan Ann Arbor, MI 48109-2122 hero@umich.edu\n\nAbstract\nWe introduce a novel adaptive non-parametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N → ∞ their span recovers the entropy minimizing set that supports at least a fraction ρ = K/N (i.e., 100ρ%) of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α = 1 - ρ. A method for implementing this non-parametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.\n\n1 Introduction\n\nAnomaly detection and localization are important but notoriously difficult problems. In such problems it is crucial to identify a nominal or baseline feature distribution with respect to which statistically significant deviations can be reliably detected. However, in most applications there is seldom enough information to specify the nominal density accurately, especially in high dimensional feature spaces for which the baseline shifts over time. 
In such cases standard methods that involve estimation of the multivariate feature density from a fixed training sample are inapplicable (high dimension) or unreliable (shifting baseline). In this paper we propose an adaptive non-parametric method that is based on a class of entropic graphs [1] called K-point minimal spanning trees [2] and overcomes the limitations of high dimensional feature spaces and baseline shift. This method detects outliers by comparing them to the most concentrated subset of points in the training sample. It follows from [2] that this most concentrated set converges to the minimum entropy set of probability ρ as N → ∞ and K/N → ρ. Thus we call this approach to anomaly detection the geometric entropy minimization (GEM) method.\n\nSeveral approaches to anomaly detection have been previously proposed. Parametric approaches such as the generalized likelihood ratio test lead to simple and classical algorithms such as the Student t-test for testing deviation of a Gaussian test sample from a nominal mean value and the Fisher F-test for testing deviation of a Gaussian test sample from a nominal variance. These methods fall under the statistical nomenclature of the classical slippage problem [3] and have been applied to detecting abrupt changes in dynamical systems, image segmentation, and general fault detection applications [4]. The main drawback of these algorithms is that they rely on a family of parametrically defined nominal (no-fault) distributions.\n\nAn alternative to parametric methods of anomaly detection is the class of novelty detection algorithms, which includes the GEM approach described herein. Schölkopf and Smola introduced a kernel-based novelty detection scheme that relies on unsupervised support vector machines (SVM) [5]. The single class minimax probability machine (MPM) of Lanckriet et al. [6] derives minimax linear decision regions that are robust to unknown anomalous densities. 
More closely related to our GEM approach is that of Scott and Nowak [7] who derive multiscale approximations of minimum-volume-sets to estimate a particular level set of the unknown nominal multivariate density from training samples. For a simple comparative study of several of these methods in the context of detecting network intrusions the reader is referred to [8].\n\nThe GEM method introduced here has several features that are summarized below. (1) Unlike the MPM method of Lanckriet et al. [6] the GEM anomaly detector is not restricted to linear or even convex decision regions. This translates to higher power for a specified false alarm level. (2) GEM's computational complexity scales linearly in dimension and can be applied to level set estimation in feature spaces of unprecedented (high) dimensionality. (3) GEM has no complicated tuning parameters or function approximation classes that must be chosen by the user. (4) Like the method of Scott and Nowak [7] GEM is completely non-parametric, learning the structure of the nominal distribution without assumptions of linearity, smoothness or continuity of the level set boundaries. (5) Like Scott and Nowak's method, GEM is provably optimal - indeed uniformly most powerful of specified level - for the case that the anomaly density is a mixture of the nominal and a uniform density. (6) GEM easily adapts to local structure, e.g. changes in local dimensionality of the support of the nominal density.\n\nWe introduce an incremental leave-one-out (L1O) k-nearest neighbor graph (kNNG) as a particularly versatile and fast anomaly detector in the GEM class. Despite the similarity in nomenclature, the L1O kNNG is different from the k nearest neighbor (kNN) anomaly detection of [9]. The kNN anomaly detector is based on thresholding the distance from the test point to the k-th nearest neighbor. The L1O kNNG detector computes the change in the topology of the entire kNN graph due to the addition of a test sample and does not use a decision threshold. 
Furthermore, the parent GEM anomaly detection methodology has proven theoretical properties, e.g. the (restricted) optimality property for uniform mixtures and general consistency properties. We introduce the statistical framework for anomaly detection in the next section. We then describe the GEM approach in Section 3. Several simulations are presented in Section 4.\n\n2 Statistical framework\n\nThe setup is the following. Assume that a training sample Xn = {X1 , . . . , Xn } of d-dimensional vectors Xi is available. Given a new sample X the objective is to declare X to be a \"nominal\" sample consistent with Xn or an \"anomalous\" sample that is significantly different from Xn . This declaration is to be constrained to give as few false positives as possible. To formulate this problem we adopt the standard statistical framework for testing composite hypotheses. Assume that Xn is an independent identically distributed (i.i.d.) sample from a multivariate density f0 (x) supported on the unit d-dimensional cube [0, 1]^d . Let X have density f (x). Anomaly detection can be formulated as testing the hypotheses H0 : f = f0 versus H1 : f ≠ f0 at a prescribed level α of significance, P (declare H1 |H0 ) ≤ α. The minimum-volume-set of level α is defined as a set $\\Omega_\\alpha$ in $\\mathbb{R}^d$ which minimizes the volume $|\\Omega_\\alpha| = \\int_{\\Omega_\\alpha} dx$ subject to the constraint $\\int_{\\Omega_\\alpha} f_0(x)\\,dx \\ge 1 - \\alpha$. The minimum-entropy-set of level α is defined as a set $\\Lambda_\\alpha$ in $\\mathbb{R}^d$ which minimizes the Rényi entropy $H_\\nu(\\Lambda_\\alpha) = \\frac{1}{1-\\nu} \\ln \\int_{\\Lambda_\\alpha} f^\\nu(x)\\,dx$ subject to the constraint $\\int_{\\Lambda_\\alpha} f_0(x)\\,dx \\ge 1 - \\alpha$. Here ν is any real valued parameter in the range 0 < ν < 1. When f is a Lebesgue density in $\\mathbb{R}^d$ it is easy to show that these three sets are identical almost everywhere. The test \"decide anomaly if $X \\notin \\Omega_\\alpha$\" is equivalent to implementing the test function $\\phi(x) = 1$ if $x \\notin \\Omega_\\alpha$ and $\\phi(x) = 0$ otherwise. This test has a strong optimality property: when f0 is Lebesgue continuous it is a uniformly most powerful (UMP) level α test for anomalies that follow a uniform mixture distribution. 
Specifically, let X have density f (x) = (1 - ε)f0 (x) + εU (x) where U (x) is the uniform density over [0, 1]^d and ε ∈ [0, 1]. Consider testing the hypotheses\n\n$H_0: \\epsilon = 0$ (1)\n$H_1: \\epsilon > 0$ (2)\n\nProposition 1 Assume that under H0 the random vector X has a Lebesgue continuous density f0 and that Z = f0 (X ) is also a continuous random variable. Then the level-set test of level α is uniformly most powerful for testing (2). Furthermore, its power function $\\beta = P(X \\notin \\Omega_\\alpha \\mid H_1)$ is given by $\\beta = (1 - \\epsilon)\\alpha + \\epsilon(1 - |\\Omega_\\alpha|)$.\n\nA sufficient condition for the random variable Z above to be continuous is that the density f0 (x) have no flat spots over its support set {f0 (x) > 0}. The proof of this proposition is omitted.\n\nThere are two difficulties with implementing the level set test. First, for known f0 the level set may be very difficult if not impossible to determine in high dimensions d ≫ 2. Second, when only a training sample from f0 is available and f0 is unknown the level sets have to be learned from the training data. There are many approaches to doing this for minimum volume tests and these are reviewed in [7]. These methods can be divided into two main approaches: (1) density estimation followed by plug-in estimation of $\\Omega_\\alpha$ via variational methods; and (2) direct estimation of the level set using function approximation and non-parametric estimation. Since both approaches involve explicit approximation of high dimensional quantities, e.g. the multivariate density or the boundary of the set $\\Omega_\\alpha$, these methods are difficult to apply in high dimensional problems, i.e. d > 2. The GEM method we propose in the next section overcomes these difficulties.\n\n3 GEM and entropic graphs\n\nGEM is a method that directly estimates the critical region for detecting anomalies using minimum coverings of subsets of points in a nominal training sample. These coverings are obtained by constructing minimal graphs, e.g. a minimal spanning tree (MST) or kNNG, covering a K-point subset that is a given proportion of the training sample. 
Points not covered by these K-point minimal graphs are identified as tail events and allow one to adaptively set a p-value for the detector.\n\nFor a set of n points Xn in $\\mathbb{R}^d$ a graph G over Xn is a pair (V , E ) where V = Xn is the set of vertices and E = {e} is the set of edges of the graph. The total power weighted length, or, more simply, the length, of G is $L(X_n) = \\sum_{e \\in E} |e|^\\gamma$ where γ > 0 is a specified edge exponent parameter.\n\n3.1 K-point MST\n\nThe MST with power weighting γ is defined as the graph that spans Xn with minimum total length:\n\n$L_{MST}(X_n) = \\min_{T \\in \\mathcal{T}} \\sum_{e \\in T} |e|^\\gamma$,\n\nwhere $\\mathcal{T}$ is the set of all trees spanning Xn .\n\nDefinition 1 K-point MST: Let $X_{n,K}$ denote one of the $\\binom{n}{K}$ subsets of K distinct points from Xn . Among all of the MSTs spanning these sets, the K-MST is defined as the one having minimal length $\\min_{X_{n,K} \\subset X_n} L_{MST}(X_{n,K})$.\n\nThe K-MST thus specifies the minimal subset of K points in addition to specifying the minimum length. This subset of points, which we call a minimal graph covering of Xn of size K , can be viewed as capturing the densest region of Xn . Furthermore, if Xn is an i.i.d. sample from a multivariate density f (x), if $\\lim_{K,n \\to \\infty} K/n = \\rho$, and if a greedy version of the K-MST is implemented, this set converges a.s. to the minimum ν-entropy set containing a proportion of at least ρ = K/n of the mass of the (Lebesgue component of) f (x), where ν = (d - γ)/d. This fact was used in [2] to motivate the greedy K-MST as an outlier resistant estimator of entropy for finite n, K . Define the K-point subset\n\n$X^*_{n,K} = \\mathrm{argmin}_{X_{n,K} \\subset X_n} L_{MST}(X_{n,K})$\n\nselected by the greedy K-MST. As the minimum entropy set and minimum volume set are identical, this suggests the following minimal-volume-set anomaly detection algorithm, which we call the \"K-MST anomaly detector.\"\n\nK-MST anomaly detection algorithm\n[1] Process training sample: Given a level of significance α and a training sample Xn = {X1 , . . . , Xn }, construct the greedy K-MST and retain its vertex set $X^*_{n,K}$.\n[2] Process test sample: Given a test sample X run the K-MST on the merged training-test sample $X_{n+1} = X_n \\cup \\{X\\}$ and store the minimal set of points $X^*_{n+1,K}$.\n[3] Make decision: Using the test function φ defined below decide H1 if φ(X ) = 1 and decide H0 if φ(X ) = 0:\n\n$\\phi(X) = 1$ if $X \\notin X^*_{n+1,K}$ and $\\phi(X) = 0$ otherwise.\n\n
When the density f0 generating the training sample is Lebesgue continuous, it follows from [2, Theorem 2] that as K, n → ∞ the K-MST anomaly detector has false alarm probability that converges to α = 1 - K/n and power that converges to that of the minimum-volume-set test of level α. When the density f0 is not Lebesgue continuous some optimality properties of the K-MST anomaly detector still hold. Let this nominal density have the decomposition f0 = λ0 + δ0 , where λ0 is Lebesgue continuous and δ0 is singular. Then, according to [2, Theorem 2], the K-MST anomaly detector will have false alarm probability that converges to (1 - ψ)α, where ψ is the mass of the singular component of f0 , and it is a uniformly most powerful test for anomalies in the continuous component, i.e. for the test of H0 : λ = λ0 , δ = δ0 against H1 : λ = (1 - ε)λ0 + εU (x), δ = δ0 .\n\nIt is well known that the K-MST construction is of exponential complexity in n [10]. In fact, even for K = n - 1, a case one can call the leave-one-out MST, there is no simple fast algorithm for computation. However, the leave-one-out kNNG, described below, admits a fast incremental algorithm.\n\n3.2 K-point kNNG\n\nLet Xn = {X1 , . . . , Xn } be a set of n points. The k nearest neighbors (kNN) $\\{X_{i(1)}, \\ldots, X_{i(k)}\\}$ of a point Xi ∈ Xn are the k points in Xn - {Xi } closest to Xi . Here the measure of closeness is the Euclidean distance. Let $\\{e_{i(1)}, \\ldots, e_{i(k)}\\}$ be the set of edges between Xi and its k nearest neighbors. The kNN graph (kNNG) over Xn is defined as the union of all of the kNN edges $\\{e_{i(1)}, \\ldots, e_{i(k)}\\}_{i=1}^{n}$ and the total power weighted edge length of the kNN graph is\n\n$L_{kNN}(X_n) = \\sum_{i=1}^{n} \\sum_{l=1}^{k} |e_{i(l)}|^\\gamma$.\n\n
Definition 2 K-point kNNG: Let $X_{n,K}$ denote one of the $\\binom{n}{K}$ subsets of K distinct points from Xn . Among all of the kNNGs over each of these sets, the K-kNNG is defined as the one having minimal length $\\min_{X_{n,K} \\subset X_n} L_{kNN}(X_{n,K})$.\n\nAs the kNNG length is also a quasi additive continuous functional [11], the asymptotic K-MST theory of [2] extends to the K-point kNNG. Of course, computation of the K-point kNNG also has exponential complexity. However, the same type of greedy approximation introduced by Ravi et al. [10] for the K-MST can be implemented to reduce the complexity of the K-point kNNG. This approximation to the K-point kNNG will satisfy the tightly coverable graph property of [2, Defn. 2]. We have the following result that justifies the use of such an approximation as an anomaly detector of level α = 1 - ρ, where ρ = K/n:\n\nProposition 2 Let $X^*_{n,K}$ be the set of points in Xn that results from any approximation to the K-point kNNG that satisfies the property [2, Defn. 2]. Then $\\lim_{n \\to \\infty} P_0(X^*_{n,K} \\subset \\Omega_\\alpha) = 1$ and $\\lim_{n \\to \\infty} P_0(X^*_{n,K} \\cap \\bar{\\Omega}_\\alpha \\neq \\emptyset) = 0$, where K = K (n) = floor(ρn), $\\Omega_\\alpha$ is a minimum-volume-set of level α = 1 - ρ and $\\bar{\\Omega}_\\alpha = [0, 1]^d - \\Omega_\\alpha$.\n\nProof: We provide a rough sketch using the terminology of [2]. Recall that a set $B^m \\subset [0, 1]^d$ of resolution 1/m is representable by a union of elements of the uniform partition of [0, 1]^d into hypercubes of volume $1/m^d$. Lemma 3 of [2] asserts that there exists an M such that for m > M the limits claimed in Proposition 2 hold with $\\Omega_\\alpha$ replaced by $A^m_\\alpha$, a minimum volume set of resolution 1/m that contains $\\Omega_\\alpha$. As $\\lim_{m \\to \\infty} A^m_\\alpha = \\Omega_\\alpha$ this establishes the proposition. 
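As a minimal illustrative sketch (not the author's implementation), the kNNG length and Definition 2 can be evaluated by brute force on a toy point set. The sketch uses only the Python standard library; the function names knn_graph_length and k_point_knng and the toy data are hypothetical, and the exhaustive subset search has exactly the exponential cost noted above, so it is only usable for very small n.

```python
import itertools
import math


def knn_graph_length(points, k, gamma=1.0):
    # Total power-weighted kNN graph length: for every point, add the
    # gamma-th powers of the Euclidean distances to its k nearest neighbors.
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(d ** gamma for d in dists[:k])
    return total


def k_point_knng(points, K, k, gamma=1.0):
    # Exhaustive search over all K-point subsets (hypothetical helper;
    # exponential cost, so only for very small n). Returns the index tuple
    # of the subset with the shortest kNN graph and that length.
    best_subset, best_length = None, math.inf
    for subset in itertools.combinations(range(len(points)), K):
        length = knn_graph_length([points[i] for i in subset], k, gamma)
        if length < best_length:
            best_subset, best_length = subset, length
    return best_subset, best_length


# Toy data (assumed for illustration): four clustered points, one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
subset, length = k_point_knng(points, K=4, k=1)
covered = set(subset)
print(sorted(covered), length)
```

On this toy set the distant point (index 4) is left out of the selected 4-point subset, which is the covering behavior the K-point kNNG detector relies on.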
Figures 1-2 illustrate the use of the K-point kNNG as an anomaly detection algorithm.\n\n[Figure 1 panels: \"Bivariate Gaussian mixture density\" (left); \"K-point kNNG, k=5, N=200, ρ=0.9, K=180\" (right)]\n\nFigure 1: Left: level sets of the nominal bivariate mixture density used to illustrate the K-point kNNG anomaly detection algorithms. Right: K-point kNNG over N=200 random training samples drawn from the nominal bivariate mixture at left. Here k=5 and K=180, corresponding to a significance level of α = 0.1.\n\n[Figure 2 panels: \"K-point kNNG, k=5, N=200, ρ=0.9, K=180\" (left and right)]\n\nFigure 2: Left: The test point '*' is declared anomalous at level α = 0.1 as it is not captured by the K-point kNNG (K=180) constructed over the combined test sample and the training samples drawn from the nominal bivariate mixture shown in Fig. 1. Right: A different test point '*' is declared non-anomalous as it is captured by this K-point kNNG.\n\n3.3 Leave-one-out kNNG (L1O-kNNG)\n\nThe theoretical equivalence between the K-point kNNG and the level set anomaly detector motivates a low complexity anomaly detection scheme, which we call the leave-one-out kNNG, discussed in this section and adopted for the experiments below. As before, assume a single test sample X = Xn+1 and a training sample Xn . Fix k and assume that the kNNG over the set Xn has been computed. To determine the kNNG over the combined sample $X_{n+1} = X_n \\cup \\{X_{n+1}\\}$ one can execute the following algorithm:\n\nL1O kNNG anomaly detection algorithm\n1. For each Xi ∈ Xn+1 , i = 1, . . . , n + 1, compute the kNNG total length difference $\\Delta_i L_{kNN} = L_{kNN}(X_{n+1}) - L_{kNN}(X_{n+1} - \\{X_i\\})$ by the following steps. For each i:\n(a) Find the k edges $E^k_{i \\to *}$ of all of the kNNs of Xi .\n
(b) Find the edges $E^k_{* \\to i}$ of other points in Xn+1 - {Xi } that have Xi as one of their kNNs. For these points find the edges $E^{k+1}_{*}$ to their respective (k + 1)st NN point.\n(c) Compute $\\Delta_i L_{kNN} = \\sum_{e \\in E^k_{i \\to *}} |e|^\\gamma + \\sum_{e \\in E^k_{* \\to i}} |e|^\\gamma - \\sum_{e \\in E^{k+1}_{*}} |e|^\\gamma$.\n2. Define the kNNG most \"outlying point\" as $X_o = \\mathrm{argmax}_{i=1,\\ldots,n+1} \\Delta_i L_{kNN}$.\n3. Declare the test sample Xn+1 an anomaly if Xn+1 = Xo .\n\nThis algorithm will detect anomalies with a false alarm level of approximately 1/(n + 1). Thus larger sizes n of the training sample will correspond to more stringent false alarm constraints. Furthermore, the p-value of each test point Xi is easily computed by recursing over the size n of the training sample. In particular, let n' vary from k to n and define n* as the minimum value of n' for which Xi is declared an anomaly. Then the p-value of Xi is approximately 1/(n* + 1). A useful relative influence coefficient η can be defined for each point Xi in the combined sample Xn+1 :\n\n$\\eta(X_i) = \\Delta_i L_{kNN} / \\max_i \\Delta_i L_{kNN}$. (3)\n\nThe coefficient η(Xn+1 ) = 1 when the test point Xn+1 is declared an anomaly. Using Matlab's matrix sort algorithm step 1 of this algorithm can be computed an order of magnitude faster than the K-point MST ($N^2 \\log N$ vs $N^3 \\log N$). For example, the experiments below have shown that the above algorithm can find and determine the p-value of 10 outliers among 1000 test samples in a few seconds on a Dell 2GHz processor running Matlab 7.1.\n\n4 Illustrative examples\n\nHere we focus on the L1O kNNG algorithm due to its computational speed. We show a few representative experiments for simple Gaussian and Gaussian mixture nominal densities f0 .\n\n
[Figure 3 panels: left, \"L1O kNN scores. rho=0.998, Mmin=500, detection rate=0.009\", plotting the normalized score against sample number (1 to 1000); right, nine scatter plots of the detections at iterations 20, 203, 246, 294, 307, 334, 574, 712 and 791, with p-values 0.001, 0.001, 0.001, 0.001443, 0.001, 0.001, 0.001, 0.0011614 and 0.0011682]\n\nFigure 3: Left: The plot of the anomaly curve for the L1O kNNG anomaly detector for detecting deviations from a nominal 2D Gaussian density with mean (0,0) and correlation coefficient -0.5. The boxes on the peaks of the curve correspond to positions of detected anomalies and the heights of the boxes are equal to one minus the computed p-value. Anomalies were generated (on the average) every 100 samples and drawn from a 2D Gaussian with correlation coefficient 0.8. The parameter ρ is equal to 1 - α, where α is the user defined false alarm rate. Right: the resampled nominal distribution (\".\") and anomalous points detected (\"*\") at the iterations indicated at left.\n\nFirst we illustrate the L1O kNNG algorithm for detection of non-uniformly distributed anomalies from training samples following a bivariate Gaussian nominal density. Specifically, a 2D Gaussian density with mean (0,0) and correlation coefficient -0.5 was generated to train the L1O kNNG detector. The test sample consisted of a mixture of this nominal and a zero mean 2D Gaussian with correlation coefficient 0.8 with mixture coefficient ε = 0.01. In Fig. 
3 the results of a simulation with a training sample of 2000 samples and 1000 test samples are shown. Fig. 3 is a plot of the relative influence curve (3) over the test samples as compared to the most outlying point in the (resampled) training sample. When the relative influence curve is equal to 1 the corresponding test sample is the most outlying point and is declared an anomaly. The 9 detected anomalies in Fig. 3 have p-values less than 0.001 and therefore one would expect an average of only one false alarm at this level of significance. In the right panel of Fig. 3 the detected anomalies (asterisks) are shown along with the training sample (dots) used to grow the L1O kNNG for that particular iteration - note that to protect against bias the training sample is resampled at each iteration.\n\nNext we compare the performance of the L1O kNNG detector to that of the UMP test for the hypotheses (2). We again trained on a bivariate Gaussian f0 with mean zero, but this time with identical component standard deviations of σ = 0.1. This distribution has essential support on the unit square. For this simple case the minimum volume set of level α is a disk centered at the origin with radius $\\sqrt{2\\sigma^2 \\ln(1/\\alpha)}$ and the power of the UMP test can be computed in closed form: $\\beta = (1 - \\epsilon)\\alpha + \\epsilon(1 - 2\\pi\\sigma^2 \\ln(1/\\alpha))$. We implemented the GEM anomaly detector with the incremental leave-one-out kNNG using k = 5. The training set consisted of 1000 samples from f0 and the test set consisted of 1000 samples from the mixture of a uniform density and f0 with parameter ε ranging from 0 to 0.2. Figure 4 shows the empirical ROC curves obtained using the GEM test vs the theoretical curves (labeled \"clairvoyant\") for several different values of the mixing parameter. Note the good agreement between theoretical prediction and the GEM implementation of the UMP using the kNNG.\n\n
[Figure 4 panel: \"ROC curves for Gaussian+uniform mixture. k=5, N=1000, Nrep=10\"; β against α (0 to 0.1) for the L1O-kNN and Clairvoyant curves, labeled ε=0, 0.1, 0.3 and 0.5]\n\nFigure 4: ROC curves for the leave-one-out kNNG anomaly detector described in Sec. 3.3. The curve labeled \"clairvoyant\" is the ROC of the UMP anomaly detector. The training sample is drawn from a zero mean 2D spherical Gaussian distribution with standard deviation 0.1 and the test sample is drawn from a mixture of this 2D Gaussian and a 2D uniform density on [0, 1]^2. The plot is for various values of the mixture parameter ε.\n\n5 Conclusions\n\nA new and versatile anomaly detection method has been introduced that uses geometric entropy minimization (GEM) to extract minimal set coverings that can be used to detect anomalies from a set of training samples. This method can be implemented through the K-point minimal spanning tree (MST) or the K-point nearest neighbor graph (kNNG). The L1O kNNG is significantly less computationally demanding than the K-point MST. We illustrated the L1O kNNG method on simulated data containing anomalies and showed that it comes close to achieving the optimal performance of the UMP detector for testing the nominal against a uniform mixture with unknown mixing parameter. As the L1O kNNG computes p-values on detected anomalies it can be easily extended to account for false discovery rate constraints. By using a sliding window, the methodology derived in this paper is easily extendible to on-line applications and has been applied to non-parametric intruder detection using our Crossbow sensor network testbed (reported elsewhere).\n\nAcknowledgments\nThis work was partially supported by NSF under Collaborative ITR grant CCR-0325571.\n\nReferences\n[1] A. Hero, B. Ma, O. Michel, and J. Gorman, \"Applications of entropic spanning graphs,\" IEEE Signal Processing Magazine, vol. 19, pp. 85-95, Sept. 2002. 
www.eecs.umich.edu/~hero/imag_proc.html.\n[2] A. Hero and O. Michel, \"Asymptotic theory of greedy approximations to minimal k-point random graphs,\" IEEE Trans. on Inform. Theory, vol. IT-45, no. 6, pp. 1921-1939, Sept. 1999.\n[3] T. S. Ferguson, Mathematical Statistics - A Decision Theoretic Approach. Academic Press, Orlando FL, 1967.\n[4] I. V. Nikiforov and M. Basseville, Detection of abrupt changes: theory and applications. Prentice-Hall, Englewood-Cliffs, NJ, 1993.\n[5] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt, \"Support vector method for novelty detection,\" in Advances in Neural Information Processing Systems (NIPS), vol. 13, 2000.\n[6] G. R. G. Lanckriet, L. El Ghaoui, and M. I. Jordan, \"Robust novelty detection with single-class MPM,\" in Advances in Neural Information Processing Systems (NIPS), vol. 15, 2002.\n[7] C. Scott and R. Nowak, \"Learning minimum volume sets,\" Journal of Machine Learning Research, vol. 7, pp. 665-704, April 2006.\n[8] A. Lazarevic, A. Ozgur, L. Ertoz, J. Srivastava, and V. Kumar, \"A comparative study of anomaly detection schemes in network intrusion detection,\" in SIAM Conference on data mining, 2003.\n[9] S. Ramaswamy, R. Rastogi, and K. Shim, \"Efficient algorithms for mining outliers from large data sets,\" in Proceedings of the ACM SIGMOD Conference, 2000.\n[10] R. Ravi, M. Marathe, D. Rosenkrantz, and S. Ravi, \"Spanning trees short or small,\" in Proc. 5th Annual ACM-SIAM Symposium on Discrete Algorithms, (Arlington, VA), pp. 546-555, 1994.\n[11] J. E. Yukich, Probability theory of classical Euclidean optimization, vol. 1675 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1998.\n", "award": [], "sourceid": 3145, "authors": [{"given_name": "Alfred", "family_name": "Hero", "institution": null}]}