{"title": "Joint MRI Bias Removal Using Entropy Minimization Across Images", "book": "Advances in Neural Information Processing Systems", "page_first": 761, "page_last": 768, "abstract": null, "full_text": "        Joint MRI Bias Removal Using Entropy\n                   Minimization Across Images\n\n\n\n            Erik G. Learned-Miller                         Parvez Ahammad\n       Department of Computer Science               Division of Electrical Engineering\n     University of Massachusetts, Amherst           University of California, Berkeley\n              Amherst, MA 01003                            Berkeley, CA 94720\n\n\n\n                                        Abstract\n\n         The correction of bias in magnetic resonance images is an important\n         problem in medical image processing. Most previous approaches have\n         used a maximum likelihood method to increase the likelihood of the pix-\n         els in a single image by adaptively estimating a correction to the unknown\n         image bias field. The pixel likelihoods are defined either in terms of a\n         pre-existing tissue model, or non-parametrically in terms of the image's\n         own pixel values. In both cases, the specific location of a pixel in the im-\n         age is not used to calculate the likelihoods. We suggest a new approach\n         in which we simultaneously eliminate the bias from a set of images of\n         the same anatomy, but from different patients. We use the statistics from\n         the same location across different images, rather than within an image, to\n         eliminate bias fields from all of the images simultaneously. The method\n         builds a \"multi-resolution\" non-parametric tissue model conditioned on\n         image location while eliminating the bias fields associated with the orig-\n         inal image set. 
We present experiments on both synthetic and real MR\n         data sets, and present comparisons with other methods.\n\n\n\n1    Introduction\n\nThe presence of bias fields in magnetic resonance (MR) images is an important problem\nin medical imaging. This problem is illustrated in Figure 1. When a patient is imaged in\nthe MR scanner, the goal is to obtain an image which is a function solely of the underlying\ntissue (left of Figure 1). However, typically the desired anatomical image is corrupted by a\nmultiplicative bias field (2nd image of Figure 1) that is caused by engineering issues such\nas imperfections in the radio frequency coils used to record the MR signal. The result is a\ncorrupted image (3rd image of Figure 1). (See [1] for background information.) The goal\nof MR bias correction is to estimate the uncorrupted image from the corrupted image.\n\nA variety of statistical methods have been proposed to address this problem. Wells et\nal. [7] developed a statistical model using a discrete set of tissues, with the brightness\ndistribution for each tissue type (in a bias-free image) represented by a one-dimensional\nGaussian distribution. An expectation-maximization (EM) procedure was then used to\nsimultaneously estimate the bias field, the tissue type, and the residual noise. While this\nmethod works well in many cases, it has several drawbacks: (1) Models must be developed\na priori for each type of acquisition (for each different setting of the MR scanner), for each\nnew area of the body, and for different patient populations (like infants and adults). (2)\nModels must be developed from \"bias-free\" images, which may be difficult or impossible\nto obtain in many cases. (3) The model assumes a fixed number of tissues, which may\nbe inaccurate. For example, during development of the human brain, there is continuous\nvariability between gray matter and white matter. In addition, a discrete tissue model does\nnot handle so-called partial volume effects in which a pixel represents a combination of\nseveral tissue types. This occurs frequently since many pixels occur at tissue boundaries.\n\nFigure 1: On the left is an idealized mid-axial MR image of the human brain with little\nor no bias field. The second image is a simulated low-frequency bias field. It has been\nexaggerated for ease of viewing. The third image is the result of pixelwise multiplication\nof the image by the bias field. The goal of MR bias correction is to recover the low-bias\nimage on the left from the biased (third) image. On the far right is the sine/cosine basis,\nused to construct band-limited bias fields (see text).\n\nNon-parametric approaches have also been suggested, as for example by Viola [10]. In\nthat work, a non-parametric model of the tissue was developed from a single image. Using\nthe observation that the entropy of the pixel brightness distribution for a single image is\nlikely to increase when a bias field is added, Viola's method postulates a bias-correction\nfield by minimizing the entropy of the resulting pixel brightness distribution. This approach\naddresses several of the problems of fixed-tissue parametric models, but has its own\ndrawbacks: (1) The statistical model may be weak, since it is based on data from only a single\nimage. (2) There is no mechanism for distinguishing between certain low-frequency image\ncomponents and a bias field. That is, the method may mistake signal for noise in certain\ncases when removal of the true signal reduces the entropy of the brightness distribution.\nWe shall show that this is a problem in real medical images.\n\nThe method we present overcomes or improves upon problems associated with both of\nthese methods and their many variations (see, e.g., [1] for recent techniques). 
It models tissue\nbrightness non-parametrically, but uses data from multiple images to provide improved\ndistribution estimates and alleviate the need for bias-free images for making a model. It\nalso conditions on spatial location, taking advantage of a rich information source ignored\nin other methods. Experimental results demonstrate the effectiveness of our method.\n\n\n2    The Image Model and Problem Formulation\n\nWe assume we are given a set $I$ of observed images $I_i$ with $1 \\le i \\le N$, as shown on the\nleft side of Figure 2. Each of these images is assumed to be the product of some bias-free\nimage $L_i$ and a smooth bias field $B_i \\in \\mathcal{B}$. We shall refer to the bias-free images as latent\nimages (also called intrinsic images by some authors). The set of all latent images shall be\ndenoted $L$ and the set of unknown bias fields $B$. Then each observed image can be written\nas the product $I_i(x, y) = L_i(x, y) \\cdot B_i(x, y)$, where $(x, y)$ gives the pixel coordinates of each\npoint, with $P$ pixels per image.\n\nConsider again Figure 2. A pixel-stack through each image set is shown as the set of pixels\ncorresponding to a particular location in each image (not necessarily the same tissue type).\nOur method relies on the principle that the pixel-stack values will have lower entropy when\nthe bias fields have been removed. Figure 3 shows the simulated effect, on the distribution\nof values in a pixel-stack, of adding different bias fields to each image.\n\nThe latent image generation model assumes that each pixel is drawn from a fixed distribution\n$p_{x,y}(\\cdot)$ which gives the probability of each gray value at the location $(x, y)$ in the\nimage. Furthermore, we assume that all pixels in the latent image are independent, given\nthe distributions from which they are drawn. It is also assumed that the bias fields for each\nimage are chosen independently from some fixed distribution over bias fields. 
Unlike most\nmodels for this problem which rely on statistical regularities within an image, we take a\ncompletely orthogonal approach by assuming that pixel values are independent given their\nimage locations, but that pixel-stacks in general have low entropy when bias fields are\nremoved.\n\nWe formulate the problem as a maximum a posteriori (MAP) problem, searching for the\nmost probable bias fields given the set of observed images. Letting $\\mathcal{B}$ represent the 25-dimensional\nproduct space of smooth bias fields (corresponding to the 25 basis images of\nFigure 1), we wish to find\n\n\\begin{align}\n\\arg\\max_{B \\in \\mathcal{B}} P(B \\mid I) &\\overset{(a)}{=} \\arg\\max_{B \\in \\mathcal{B}} P(I \\mid B)\\, P(B) \\tag{1} \\\\\n&\\overset{(b)}{=} \\arg\\max_{B \\in \\mathcal{B}} P(I \\mid B) \\tag{2} \\\\\n&\\overset{(c)}{=} \\arg\\max_{B \\in \\mathcal{B}} P(L(I, B)) \\tag{3} \\\\\n&= \\arg\\max_{B \\in \\mathcal{B}} \\prod_{x,y} \\prod_{i=1}^{N} p_{x,y}(L_i(x,y)) \\tag{4} \\\\\n&= \\arg\\max_{B \\in \\mathcal{B}} \\sum_{x,y} \\sum_{i=1}^{N} \\log p_{x,y}(L_i(x,y)) \\tag{5} \\\\\n&\\overset{(d)}{\\approx} \\arg\\min_{B \\in \\mathcal{B}} \\sum_{x,y} H(p_{x,y}) \\tag{6} \\\\\n&\\overset{(e)}{\\approx} \\arg\\min_{B \\in \\mathcal{B}} \\sum_{x,y} \\hat{H}_{\\mathrm{Vasicek}}(L_1(x,y), \\ldots, L_N(x,y)) \\tag{7} \\\\\n&= \\arg\\min_{B \\in \\mathcal{B}} \\sum_{x,y} \\hat{H}_{\\mathrm{Vasicek}}\\left(\\frac{I_1(x,y)}{B_1(x,y)}, \\ldots, \\frac{I_N(x,y)}{B_N(x,y)}\\right). \\tag{8}\n\\end{align}\n\nHere $H$ is the Shannon entropy ($-E(\\log P(x))$) and $\\hat{H}_{\\mathrm{Vasicek}}$ is a sample-based entropy\nestimator (see footnote 1). (a) is just an application of Bayes rule. (b) assumes a uniform prior over the\nallowed bias fields. The method can easily be altered to incorporate a non-uniform prior.\n\n   Footnote 1: The entropy estimator used is similar to Vasicek's estimator [6], given (up to minor details) by\n\n\\begin{equation}\n\\hat{H}_{\\mathrm{Vasicek}}(Z_1, \\ldots, Z_N) = \\frac{1}{N-m} \\sum_{i=1}^{N-m} \\log\\left(\\frac{N}{m}\\left(Z_{(i+m)} - Z_{(i)}\\right)\\right), \\tag{9}\n\\end{equation}\n\nwhere the $Z_i$'s represent the values in a pixel-stack, the $Z_{(i)}$'s represent those same values in rank order, $N$ is\nthe number of values in the pixel-stack and $m$ is a function of $N$ (like $N^{0.5}$) such that $m/N$ goes to 0\nas $m$ and $N$ go to infinity. These entropy estimators are discussed at length elsewhere [3].\n\nFigure 2: On the left are a set of mid-coronal brain images from eight different infants,\nshowing clear signs of bias fields. 
A pixel-stack, a collection of pixels at the same point\nin each image, is represented by the small square near the top of each image. Although\nthere are probably no more than two or three tissue types represented by the pixel-stack,\nthe brightness distribution through the pixel-stack has high empirical entropy due to the\npresence of different bias fields in each image. On the right are a set of images that have\nbeen corrected using our bias field removal algorithm. While the images are still far from\nidentical, the pixel-stack entropies have been reduced by mapping similar tissues to similar\nvalues in an \"unsupervised\" fashion, i.e. without knowing or estimating the tissue types.\n\n\n\n(c) expresses the fact that the probability of the observed image given a particular bias field\nis the same as the probability of the latent image associated with that observed image and\nbias field. The approximation (d) replaces the empirical mean of the log probability at each\npixel with the negative entropy of the underlying distribution at that pixel. This entropy is\nin turn estimated (e) using the entropy estimator of Vasicek [6] directly from the samples in\nthe pixel-stack, without ever estimating the distributions $p_{x,y}$ explicitly. The approximation (d)\nbecomes an equality as N grows large by the law of large numbers, while the consistency\nof Vasicek's entropy estimator [2] implies that (e) also goes to equality with large N. (See\n[2] for a review of entropy estimators.)\n\n\n3    The Algorithm\n\nUsing these ideas, it is straightforward to construct algorithms for joint bias field removal.\nAs mentioned above, we chose to optimize Equation (8) over the set of band-limited bias\nfields. 
To do this, we parameterize the set of bias fields using the sine/cosine basis images\nshown on the right of Figure 1:\n\n\\begin{equation}\nB_i = \\sum_{j=1}^{25} \\alpha_j \\phi_j(x, y).\n\\end{equation}\n\nWe optimize Equation (8) by simultaneously updating the bias field estimates (taking a step\nalong the numerical gradient) for each image to reduce the overall entropy. That is, at time\nstep $t$, the coefficients $\\alpha_j$ for each bias field are updated using the latent image estimates\nand entropy estimates from time step $t - 1$. After all $\\alpha$'s have been updated, a new set of\nlatent images and pixel-stack entropies are calculated, and another gradient step is taken.\nThough it is possible to do a full gradient descent to convergence by optimizing one image\nat a time, the optimization landscape tends to have more local minima for the last few\nimages in the process. The appeal of our joint gradient descent method, on the other hand,\nis that the ensemble of images provides a natural smoothing of the optimization landscape\nin the joint process. It is in this sense that our method is \"multi-resolution\", proceeding\nfrom a smooth optimization in the beginning to a sharper one near the end of the process.\n\nFigure 3: On the left is a simulated distribution from a pixel-stack taken through a particular\nset of bias-free mid-axial MR images. The two sharp peaks in the brightness distribution\nrepresent two tissues which are commonly found at that particular pixel location. On the\nright is the result of adding an independent bias field to each image. In particular, the\nspread, or entropy, of the pixel distribution increases. In this work, we seek to remove bias\nfields by reducing the entropy of the pixel-stack distribution to its original state.\n\nWe now summarize the algorithm:\n\n      1. Initialize the bias field coefficients for each image to 0, with the exception of\n          the coefficient for the DC-offset (the constant bias field component), which is\n          initialized to 1. Initialize the gradient descent step size $\\delta$ to some value.\n\n      2. Compute the summed pixelwise entropies for the set of images with initial \"neutral\"\n          bias field corrections. (See below for method of computation.)\n\n      3. Iterate the following loop until no further changes occur in the images.\n\n           (a) For each image:\n\n                 i. Calculate the numerical gradient $\\nabla_{\\alpha} \\hat{H}_{\\mathrm{Vasicek}}$ of (8) with respect to the bias\n                    field coefficients ($\\alpha_j$'s) for the current image.\n                ii. Set $\\alpha = \\alpha - \\delta \\nabla_{\\alpha} \\hat{H}_{\\mathrm{Vasicek}}$.\n           (b) Update $\\delta$ (reduce its value according to some schedule).\n\nUpon convergence, it is assumed that the entropy has been reduced as much as possible by\nchanging the bias fields, unless one or more of the gradient descents is stuck in a local\nminimum. 
Empirically, the likelihood of getting stuck in local minima is dramatically reduced by\nincreasing the number of images (N) in the optimization. In our experiments described below\nwith only 21 real infant brains, the algorithm appears to have found a global minimum\nof all bias fields, at least to the extent that this can be discerned visually.\n\nNote that for a set of identical images, the pixel-stack entropies are not increased by\nmultiplying each image by the same bias field (since all images will still be the same). More\ngenerally, when images are approximately equivalent, their pixel-stack entropies are not\nsignificantly affected by a \"common\" bias field, i.e. one that occurs in all of the images (see footnote 2).\nThis means that the algorithm cannot, in general, eliminate all bias fields from a set of\nimages, but can only set all of the bias fields to be equivalent. We refer to any common bias\nfield remaining in all of the images after convergence as the residual bias field.\n\n   Footnote 2: Actually, multiplying each image by a bias field of small magnitude can artificially reduce the\nentropy of a pixel-stack, but this is only the result of the brightness values shrinking towards zero.\nSuch artificial reductions in entropy can be avoided by normalizing a distribution to unit variance\nbetween iterations of computing its entropy, as is done in this work.\n\nFortunately, there is an effect that tends to minimize the impact of the residual bias field\nin many test cases. In particular, the residual bias field tends to consist of components for\neach $\\alpha_j$ that approximate the mean of that component across images. For example, if half\nof the observed images have a positive value for a particular component's coefficient, and\nhalf have a negative coefficient for that component, the residual bias field will tend to have\na coefficient near zero for that component. Hence, the algorithm naturally eliminates bias\nfield effects that are non-systematic, i.e. 
that are not shared across images.\n\nIf the same type of bias field component occurs in a majority of the images, then the\nalgorithm will not remove it, as the component is indistinguishable, under our model, from the\nunderlying anatomy. In such a case, one could resort to within-image methods to further\nreduce the entropy. However, there is a risk that such methods will remove components\nthat actually represent smooth gradations in the anatomy. This can be seen in the bottom\nthird of Figure 4, and will be discussed in more detail below.\n\n\n4     Experiments\n\nTo test our algorithm, we ran two sets of experiments, the first on synthetic images for\nvalidation, and the second on real brain images. We obtained synthetic brain images from\nthe BrainWeb project [8, 9] such as the one shown on the left of Figure 1. These images can\nbe considered \"idealized\" MR images in the sense that the brightness values for each tissue\nare constant (up to a small amount of manually added isotropic noise). That is, they contain\nno bias fields. The initial goal was to ensure that our algorithm could remove synthetically\nadded bias fields, in which the bias field coefficients were known. Using K copies of a\nsingle \"latent\" image, we added known but different bias fields to each one. For as few as\nfive images, we could reliably recover the known bias field coefficients, up to a fixed offset\nfor each image, to within 1% of the power of the original bias coefficients.\n\nMore interesting are the results on real images, in which the latent images come from\ndifferent patients. We obtained 21 pre-registered (see footnote 3) infant brain images (top of Figure 4)\nfrom Brigham and Women's Hospital in Boston, Massachusetts. Large bias fields can be\nseen in many of the images. Probably the most striking is a \"ramp-like\" bias field in the\nsixth image of the second row. (The top of the brain is too bright, while the bottom is too\ndark.) 
Because the brain's white matter is not fully developed in these infant scans, it is\ndifficult to categorize tissues into a fixed number of classes as is typically done for adult\nbrain images; hence, these images are not amenable to methods based on specific tissue\nmodels developed for adults (e.g. [7]).\n\nThe middle third of Figure 4 shows the results of our algorithm on the infant brain images.\n(These images must be viewed in color on a good monitor to be fully appreciated.)\nWhile a trained technician can see small imperfections in these images, the results are\nremarkably good. All major bias artifacts have been removed.\n\n     Footnote 3: It is interesting to note that registration is not strictly necessary for this algorithm to work. The\nproposed MAP method works under very broad conditions, the main condition being that the bias\nfields do not span the same space as parts of the actual medical images. It is true, however, that as the\nlatent images become less registered or differ in other ways, a much larger number of images is\nneeded to get good estimates of the pixel-stack distributions.\n\nIt is interesting to compare these results to a method that reduces the entropy of each image\nindividually, without using constraints between images. Using the results of our algorithm\nas a starting point, we continued to reduce the entropy of the pixels within each image\n(using a method akin to Viola's [10]), rather than across images. These results are shown\nin the bottom third of Figure 4. Carefully comparing the central brain regions in the middle\nsection of the figure and the bottom section of the figure, one can see that the\nbutterfly-shaped region in the middle of the brain, which represents developing white matter, has\nbeen suppressed in the lower images. This is most likely because the entropy of the pixels\nwithin a particular image can be reduced by increasing the bias field \"correction\" in the\ncentral part of the image. 
In other words, the within-image algorithm strives to make the image more\nuniform by removing the bright part in the middle of the image. However, our algorithm,\nwhich compares pixels across images, does not suppress these real structures, since they\noccur across images. Hence coupling across images can produce superior results.\n\n\n5    Discussion\n\nThe idea of minimizing pixelwise entropies to remove nuisance variables from a set of\nimages is not new. In particular, Miller et al. [4, 5] presented an approach they call congealing\nin which the sum of pixelwise entropies is minimized by separate affine transforms applied\nto each image. Our method can thus be considered an extension of the congealing process\nto non-spatial transformations. Combining such approaches to do registration and bias\nremoval simultaneously, or registration and lighting rectification of faces, for example, is an\nobvious direction for future work.\n\nThis work uses information unused in other methods, i.e. information across images. This\nsuggests an iterative scheme in which both types of information, within and across\nimages, are used. Local models could be based on weighted neighborhoods of pixels (pixel\ncylinders), rather than single pixel-stacks, in sparse data scenarios. For \"easy\" bias correction\nproblems, such an approach may be overkill, but for difficult problems in bias correction,\nwhere the bias field is difficult to separate from the underlying tissue, as discussed in\n[1], such an approach could produce critical extra leverage.\n\nWe would like to thank Dr. Terrie Inder and Dr. Simon Warfield for graciously providing\nthe infant brain images for this work. The images were obtained under NIH grant P41\nRR13218. Also, we thank Neil Weisenfeld and Sandy Wells for helpful discussions. 
This\nwork was partially supported by Army Research Office grant DAAD 19-02-1-0383.\n\n\nReferences\n\n[1] Fan, A., Wells, W., Fisher, J., Cetin, M., Haker, S., Mulkern, C., Tempany, C., Willsky, A. A\n     unified variational approach to denoising and bias correction in MR. Proceedings of IPMI, 2003.\n\n[2] Beirlant, J., Dudewicz, E., Gyorfi, L. and van der Meulen, E. Nonparametric entropy estimation:\n     An overview. International Journal of Mathematical and Statistical Sciences, 6. pp.17-39. 1997.\n\n[3] Learned-Miller, E. G. and Fisher, J. ICA using spacings estimates of entropy. Journal of Machine\n     Learning Research, Volume 4, pp. 1271-1295, 2003.\n\n[4] Miller, E. G., Matsakis, N., Viola, P. A. Learning from one example through shared densities on\n     transforms. IEEE Conference on Computer Vision and Pattern Recognition. 2000.\n\n[5] Miller, E. G. Learning from one example in machine vision by sharing probability densities.\n     Ph.D. thesis. Massachusetts Institute of Technology. 2002.\n\n[6] Vasicek, O. A test for normality based on sample entropy. Journal of the Royal Statistical Society\n     Series B, 31. pp. 632-636, 1976.\n\n[7] Wells, W. M., Grimson, W. E. L., Kikinis, R., Jolesz, F. Adaptive segmentation of MRI data.\n     IEEE Transactions on Medical Imaging, 15. pp. 429-442, 1996.\n\n[8] Collins, D.L., Zijdenbos, A.P., Kollokian, J.G., Sled, N.J., Kabani, C.J., Holmes, C.J., Evans,\n     A.C. Design and Construction of a realistic digital brain phantom. IEEE Transactions on Medical\n     Imaging, 17. pp. 463-468, 1998.\n\n[9] http://www.bic.mni.mcgill.ca/brainweb/\n\n[10] Viola, P.A. Alignment by maximization of mutual information. Ph.D. Thesis. Massachusetts\n     Institute of Technology. 1995.\n\n\f\nFigure 4: NOTE: This image must be viewed in color (preferably on a bright display) for\nfull effect. Top. Original infant brain images. Middle. The same images after bias removal\nwith our algorithm. 
Note that developing white matter (butterfly-like structures in middle\nbrain) is well-preserved. Bottom. Bias removal using a single image based algorithm.\nNotice that white matter structures are repressed.\n\n\f\n", "award": [], "sourceid": 2684, "authors": [{"given_name": "Erik", "family_name": "Learned-miller", "institution": null}, {"given_name": "Parvez", "family_name": "Ahammad", "institution": null}]}