{"title": "A Neural Network that Learns to Interpret Myocardial Planar Thallium Scintigrams", "book": "Advances in Neural Information Processing Systems", "page_first": 755, "page_last": 762, "abstract": null, "full_text": "A Neural Network that Learns to Interpret \nMyocardial Planar Thallium Scintigrams \n\nCharles Rosenberg, Ph.D.* \n\nDepartment of Computer Science \n\nHebrew University \nJerusalem, Israel \n\nJacob Erel, M.D. \n\nDepartment of Cardiology \n\nSapir Medical Center \nMeir General Hospital \n\nKfar Saba, Israel \n\nHenri Atlan, M.D., Ph.D. \n\nDepartment of Biophysics and Nuclear Medicine \n\nHadassah Medical Center \n\nJerusalem, Israel \n\n* Current address: Geriatrics, Research, Educational and Clinical Center, VA Medical Center, Salt \nLake City, Utah. \n\nAbstract \n\nThe planar thallium-201 myocardial perfusion scintigram is a widely used \ndiagnostic technique for detecting and estimating the risk of coronary \nartery disease. Neural networks learned to interpret 100 thallium scintigrams \nas determined by individual expert ratings. Standard error backpropagation \nwas compared to standard LMS, and LMS combined with \none layer of RBF units. Using the \"leave-one-out\" method, generalization \nwas tested on all 100 cases. Training time was determined automatically \nfrom cross-validation performance. Best performance was attained \nby the RBF/LMS network with three hidden units per view and compares \nfavorably with human experts. \n\n1 Introduction \n\nCoronary artery disease (CAD) is one of the leading causes of death in the Western World. \nThe planar thallium-201 scintigram is considered to be a reliable diagnostic tool in the detection of \nCAD. Thallium is a radioactive isotope that distributes in mammalian tissues after intravenous \nadministration and is imaged by a gamma camera. 
The resulting scintigram is visually \ninterpreted by the physician for the presence or absence of defects, that is, areas with relatively \nlower perfusion levels. In myocardial applications, thallium is used to measure myocardial \nischemia and to differentiate between viable and non-viable (infarcted) heart muscle (Pohost \nand Henzlova, 1990). \n\nDiagnosis of CAD is based on the comparison of two sets of images, one set acquired \nimmediately after a standard effort test (Bruce protocol), and the second following a \ndelay period of four hours. During this delay, the thallium redistributes in the heart muscle \nand spontaneously decays. Defects caused by scar tissue are relatively unchanged over \nthe delay period (fixed defect), while those caused by ischemia are partially or completely \nfilled in (reversible defect) (Beller, 1991; Datz et al., 1992). \n\nImage interpretation is difficult for a number of reasons: the inherent variability in biological \nsystems, which makes each case essentially unique; the vast amount of irrelevant and \nnoisy information in an image; and the \"context-dependency\" of the interpretation on data \nfrom many other tests and clinical history. Interpretation can also be significantly affected \nby attentional shifts, perceptual abilities, and mental state (Franken Jr. and Berbaum, 1991; \nCuar\u00f3n et al., 1980). \n\nWhile neural networks have found considerable application in ECG processing (e.g., Artis et al., \n1991) and clinical decision-making (Baxt, 1991b; Baxt, 1991a), they have thus far found \nlimited application in the field of nuclear medicine. Non-cardiac imaging applications include \nthe grading of breast carcinomas (Dawson et al., 1991) and the discrimination of normal \nvs. Alzheimer's PET scans (Kippenhan et al., 1990). 
In cardiac imaging specifically, neural networks have been applied to several problems, \nincluding the identification of stenosis (Porenta et al., 1990; Cios et al., 1989; Cios et al., \n1991; Cianflone et al., 1990; Fujita et al., 1992). These studies encouraged us to explore \nthe use of neural networks in the interpretation of cardiac scintigraphy. \n\n2 Methods \n\nWe trained one network consisting of a layer of gaussian RBF units in an unsupervised fashion \nto discover features in circumferential profiles in planar thallium scintigraphy. Then a \nsecond network was trained in a supervised way to map these features to the physician's visual \ninterpretations of those images using the delta rule (Widrow and Hoff, 1960). This architecture \nwas previously found to compare favorably to other network learning algorithms \n(2-layer backpropagation and single-layer networks) on this task (Rosenberg et al., 1993; \nErel et al., 1993). \n\nIn our experiments, all of the input vectors representing single views, $f$, were first normalized \nto unit length, $v = f / ||f||$. The activation value of a gaussian unit, $o_j$, is then given by: \n\n[Equation (1), which defines $net_j$, is missing from the extracted text.] \n\n$$ o_j = \\exp(-net_j / w) \\qquad (2) $$ \n\nwhere $j$ is an index to a gaussian unit and $i$ is an input unit index. The width of the gaussian, \ngiven by $w$, was fixed at 0.25 for all units.^1 \n\nThe gaussian units were trained using a competitive learning rule which moves the center \nof the unit closest to the current input pattern ($o_{max}$, i.e. the \"winner\") closer to the input \npattern:^2 \n\n$$ \\Delta w_{i,winner} = \\eta (v_i - w_{i,winner}) \\qquad (3) $$ \n\n[Figure 1 diagram: layers Input (views ANT, LAO 45, LAT), RBF, and Output (Regional Scores); legend: Severe, Moderate, Mild, Normal.] \n\nFigure 1: The network architecture. The first layer (Input) encoded the three circumferential \nprofiles representing the three views, anterior (ANT), left anterior oblique (LAO), and \nleft lateral (LAT). The second layer consisted of radial basis function (RBF) units, and the third \nlayer of semi-linear units trained in a supervised fashion. The outputs of the network corresponded \nto the visual scores as given by the expert observer. An additional unit per view \nencoded the scaling factor of the input patterns lost as a result of input normalization. \n\n2.1 Data Acquisition and Selection \n\nScintigraphic images were acquired for each of three views: anterior (ANT), left anterior \noblique (LAO 45), and left lateral (LAT) for each patient case. Acquisition was performed \ntwice, once immediately following a standard effort test and once following a delay period \nof four hours. Each image was pre-processed to produce a circumferential profile (Garcia \net al., 1981; Francisco et al., 1982), in which maximum pixel counts within each of 60 \ncontiguous 6\u00b0 segmental regions are plotted as a function of angle (Garcia, 1991). Pre-processing \ninvolved positioning of the region of interest (ROI), interpolative background \nsubtraction, smoothing and rotational alignment to the heart's apex (Garcia, 1991). \n\n^1 We have considered applying the learning rule to the unit widths ($w$) as well as the RBF weights; \nhowever, we have not as yet pursued this possibility. \n\n^2 Following Rumelhart and Zipser (1986), the other units were also pulled \ntowards the input vector, although to a much smaller extent than the winner. We used a ratio of 1 to \n100. 
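The competitive update of Eq. (3), together with the 1-to-100 loser-to-winner ratio of footnote 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the squared-distance form of net_j, the function names, and the random stand-in data are hypothetical; only the width of 0.25, the learning rate of 0.1, the three units per view, and the 1:100 ratio come from the text.

```python
import numpy as np

def rbf_activations(v, centers, width=0.25):
    # ASSUMPTION: net_j is the squared Euclidean distance between the input
    # and the center of unit j; the source text does not spell this out.
    net = np.sum((centers - v) ** 2, axis=1)
    return np.exp(-net / width)

def competitive_step(v, centers, lr=0.1, loser_ratio=0.01):
    # The unit with the largest activation is the winner.
    winner = int(np.argmax(rbf_activations(v, centers)))
    # Every unit moves toward the input; losers use 1/100 of the winner's rate.
    rates = np.full(len(centers), lr * loser_ratio)
    rates[winner] = lr
    return centers + rates[:, None] * (v - centers), winner

rng = np.random.default_rng(0)
profile = rng.random(60)                    # hypothetical stand-in for one 60-point profile
v = profile / np.linalg.norm(profile)       # normalize the input to unit length
centers = rng.normal(size=(3, 60))          # 3 RBF units for one view
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
new_centers, winner = competitive_step(v, centers)
```

Applying all rates in one vectorized step keeps the winner and loser updates in a single expression, so the winner-only rule of Eq. (3) is recovered by setting loser_ratio to 0.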
\n\n^3 The profiles were generated using the Elscint CTL software package for planar quantitative \nthallium-201 based on the Cedars-Sinai technique (Garcia et al., 1981; Maddahi et al., 1981; Areeda \net al., 1982). \n\nLesion     mild  moderate  severe  Total \nsingle       12         5       0     17 \nmultiple     16        16      11     43 \nTotal        28        21      11     60 \n\nTable 1: Distribution of Abnormal Cases as Scored by the Expert Observer. Defects occurring \nin any combination of two or more regions (even the proximal and distal subregions \nof a single area) were treated as one multiple defect. The severity level of multiple lesions \nwas based on the most severe lesion present. \n\nCases were pre-selected by excluding those that met any of the following criteria (Beller, 1991): \n\n\u2022 Insufficient exercise. Cases in which the heart rate was less than 130 b.p.m. were \neliminated, as this level of stress is generally deemed insufficient to accurately \ndistinguish normal from abnormal conditions. \n\n\u2022 Positional abnormalities. In a few cases, the \"region of interest\" was not positioned \nor aligned correctly by the technician. \n\n\u2022 Increased lung uptake. Typically in cases of multi-vessel disease, a significant \nproportion of the perfusion occurs in the lungs as well as in the heart, making it \nmore difficult to determine the condition of the heart due to the partially overlapping \npositions of the heart and lungs. \n\n\u2022 Breast artifacts. \n\nCases were selected at random between August 1989 and March 1992. Approximately a \nthird of the cases were eliminated due to insufficient heart rate, 4-5% due to breast artifacts, \n4% due to lung uptake, and 1-2% due to positional abnormalities. A set of one hundred \nusable cases remained. 
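The circumferential profile described in Section 2.1 (the maximum pixel count within each of 60 contiguous 6-degree segments, plotted against angle) can be sketched as follows. This is a simplified illustration, not the Elscint or Cedars-Sinai implementation: the function name, the fixed center argument, and the synthetic image are assumptions, and ROI positioning, background subtraction, smoothing, and apex alignment are all omitted.

```python
import numpy as np

def circumferential_profile(image, center, n_segments=60):
    # Angle of every pixel around the assumed center, mapped to [0, 2*pi).
    rows, cols = np.indices(image.shape)
    angles = np.mod(np.arctan2(rows - center[0], cols - center[1]), 2 * np.pi)
    # Index of the angular segment (6 degrees wide for n_segments=60) of each pixel.
    segment = np.minimum((angles / (2 * np.pi) * n_segments).astype(int), n_segments - 1)
    # Keep the maximum count per segment; counts are assumed non-negative.
    profile = np.zeros(n_segments)
    np.maximum.at(profile, segment.ravel(), image.ravel().astype(float))
    return profile

# Hypothetical 64 x 64 image with two hot pixels.
image = np.zeros((64, 64))
image[32, 50] = 7.0    # right of the center: angle 0, segment 0
image[42, 42] = 9.0    # down-right diagonal: 45 degrees in image coordinates, segment 7
profile = circumferential_profile(image, center=(32, 32))
```

The resulting 60-element vector per view is the kind of input that the network groups of 60 units would receive, here before any normalization.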
\n\n2.2 Visual Interpretation \n\nEach case was visually scored by a single expert observer for each of nine anatomical regions \ngenerally accepted as those that best relate to the coronary circulation: Septal (proximal \nand distal), Anterior (proximal and distal), Apex, Inferior (proximal and distal), and \nPosterior-Lateral (proximal and distal). Scoring for each region was from normal (1) to \nsevere (4), indicating the level of the observed perfusion deficit. \n\nIntra-observer variability was examined by having the observer re-interpret 17 of the cases. \nThe observer was unable to remember the cases from the first reading and \ncould not refer to the previous scores. \n\nExact matches were obtained on 91.5% of the regions; only 8 of the 153 total regions (5%) \nwere labeled as a defect (mild, moderate or severe) on one occasion and not on the other. \nAll differences, when they occurred, were of a single rating level.^4 \n\n^4 In contrast, measured inter-observer variability was much higher. A set of 13 cases was individually \ninterpreted by 3 expert observers in a previous experiment (Rosenberg et al., 1993). Percent \nagreement (exact matches) between the observers was 82% (288/351). Of the 63 mismatches, 5 \n(about 8%) differed by 2 levels of severity. There were no differences of 3 levels of severity. \nApproximately two-thirds of the disagreements were between normal and mild regions. These results \nindicate that the single observer data employed in the present study are more reliable than the mixed \nconsensus and individual scores used previously. \n\n2.3 The Network Model \n\nThe input units of the network were divided into 3 groups of 60 units each, each group \nrepresenting the circumferential profile for a single view. A set of 3 RBF units was assigned \nto each input group. Then a second layer of weights was trained using the delta rule to \nreproduce the target visual scores assigned by the expert observer. The categorical visual \nscores were translated to numerical values to make the data suitable for network learning: \nnormal = 0.0, mild defect = 0.3, moderate defect = 0.7, and severe defect = 1.0. \n\nIn order to make efficient use of the available data, we actually trained 100 identical networks; \neach network was trained on a subset of 99 of the 100 cases and tested on the remaining \none. This procedure, sometimes referred to as the \"leave-one-out\" or \"jack-knife\" \nmethod, enabled us to determine the generalization performance for each case. This procedure \nwas followed for both the RBF and the delta rule training.^5 Training of a single \nnetwork took only a few minutes of Sun 4 computer time. \n\n^5 Details of network learning were as follows: Each of the 100 networks was initialized and trained \nin the same way. RBF-to-output unit weights were initialized to small random values between -0.5 and \n0.5. Input-to-RBF unit weights were first randomized and then normalized so that the weight vectors \nto each RBF unit were of unit length. Unsupervised, competitive training of the RBF units continued \nfor 100 \"epochs\" or complete sweeps through the set of 99 cases: 20 epochs with a learning rate ($\\eta$) \nof 0.1 followed by 80 epochs at 0.01 without momentum ($\\alpha$). Supervised training, using a learning \nrate of 0.05 and momentum 0.9, was terminated based on cross-validation testing after 200 epochs. \nFurther training led to over-training and poorer generalization. \n\n3 Results \n\nBecause of the large number of confusions between normal and mild regions in both the \ninter- and intra-observer scores, disease was defined as a moderate or severe defect. The \nthreshold value dividing the output values of the network into these two sets was varied \nfrom 0 to 1 in increments of 0.01. The number of agreements between the expert observer \nand the network was computed for each threshold value. The resulting scores, accumulated \nover all threshold values, were plotted as a Receiver Operating Characteristic (ROC) curve. \n\nBest performance (percent correct) was achieved with a threshold value of 0.28, which \nyielded an overall accuracy of 88.7% (798/900 regions) on the stress data. However, this \nvalue of the threshold heavily favored specificity over sensitivity due to the preponderance \nof normal regions in the data. Using the decision threshold which maximized the sum \nof sensitivity and specificity, 0.10, accuracy dropped to 84.9% (764/900) but sensitivity \nimproved to 0.771 (121/157), and specificity was 0.865 (643/743). \n\n3.1 Distinguishing Fixed vs. Reversible Defects \n\nIn order to take into account the delayed (redistribution) images as well as the stress images, the \nnetwork was essentially duplicated: one network processed the stress data, and the other, \nthe redistribution data. (For details, see Erel et al., 1993.) \n\nThe combined network exhibited only a limited ability to distinguish between scar and \nischemia. Performance on scar detection was good (sens. 0.728 (75/103), spec. 0.878 \n(700/797)), but the sensitivity of the network on ischemia detection was only 0.185 (10/54). \nThis result may be explained, at least in part, by the much smaller number of ischemic regions \nincluded in the data set as compared with scars (54 versus 103). \n\n4 Conclusions and Future Directions \n\nWe suspect that our major limitation is in defect sampling. In order that a statistical system \n(networks or otherwise) generalize well to new cases, the data used in training must be \nrepresentative of the full population of data likely to be sampled. 
This is unlikely to happen \nwhen the number of positive cases is on the order of 50, as was the case with ischemia, \nsince each possible defect location, plus all the possible combinations of locations, must be \nincluded. \n\nA variant of backpropagation, called competitive backpropagation, has recently been developed \nwhich is claimed to generalize appropriately in the presence of multiple defects (Cho \nand Reggia, 1993). Weights in this network are constrained to take on positive values, \nso that diagnoses made by the system add constructively. In a standard backpropagation \nnetwork, multiple diseases can cancel each other out, due to complex interactions of both \npositive and negative connection strengths. We are currently planning to investigate the \napplication of this learning algorithm to the problem of ischemia detection. \n\nOther improvements and extensions include: \n\n\u2022 Elicit confidence ratings. Expert visual interpretations could be augmented by \ndegree of confidence ratings. Highly ambiguous cases could be reduced in importance \nor eliminated. The ratings could also be used as additional targets for \nthe network:^6 cases indicated by the network with low levels of confidence would \nrequire closer inspection by a physician. Initial results are promising in this regard. \n\n\u2022 Provide additional information. We have not yet incorporated clinical history, \ngender, and examination EKG. Clinical history has been found to have a profound \nimpact on interpretation of radiographs (Doubilet and Herman, 1981). The inclusion \nof these variables should allow the network to approximate more closely a \ncomplete diagnosis, and boost the utility of the network in the clinical setting. \n\n\u2022 Add constraints. Currently we do not utilize the angles that relate the three views. \nIt may be possible to build these angles in as constraints and thereby cut down on \nthe number of free network parameters. 
\n\n\u2022 Expand application. Besides planar thallium, our approach may also be applied \nto non-planar 3-D imaging technologies such as SPECT, and to other nuclear agents or \nstress-inducing modalities such as dipyridamole. Preliminary results are promising \nin this regard. \n\n^6 See (Tesauro and Sejnowski, 1988) for a related idea. \n\nAcknowledgements \n\nThe authors wish to thank Mr. Haim Karger for technical assistance, and the Departments \nof Computer Science and Psychology at the Hebrew University for computational support. \nWe would also like to thank Drs. David Shechter, Moshe Bocher, Roland Chisin and the \nstaff of the Department of Medical Biophysics and Nuclear Medicine for their help, both \nlarge and small, and two anonymous reviewers. Terry Sejnowski suggested our use of RBF \nunits. \n\nReferences \n\nAreeda, J., Train, K. v., Garcia, E. V., Maddahi, J., Rosanki, A., Waxman, A., and Berman, \nD. (1982). Improved analysis of segmental thallium-201 myocardial scintigrams: \nQuantitation of distribution, washout, and redistribution. In Esser, P. D., editor, Digital \nImaging. Society of Nuclear Medicine, New York. \n\nArtis, S., Mark, R., and Moody, G. (1991). Detection of atrial fibrillation using artificial \nneural networks. In Computers in Cardiology, pages 173-176, Venice, Italy. IEEE, \nIEEE Computer Society Press. \n\nBaxt, W. (1991a). Use of an artificial neural network for data analysis in clinical decision-making: \nThe diagnosis of acute coronary occlusion. Neural Computation, 2:480-489. \n\nBaxt, W. (1991b). Use of an artificial neural network for the diagnosis of myocardial infarction. \nAnnals of Internal Medicine, 115:843-848. \n\nBeller, G. A. (1991). Myocardial perfusion imaging with thallium-201. In Marcus, M. L., \nSchelbert, H. R., Skorton, D. J., and Wolf, G. L., editors, Cardiac Imaging. W. B. \nSaunders. 
\n\nCho, S. and Reggia, J. (1993). Multiple disorder diagnosis with adaptive competitive neural \nnetworks. Artificial Intelligence in Medicine. To appear. \n\nCianflone, D., Carandente, O., Fragasso, G., Margononato, A., Meloni, C., Rossetti, E., \nGerundini, P., and Chiechia, S. L. (1990). A neural network based model of predicting \nthe probability of coronary lesion from myocardial perfusion SPECT data. In Proceedings \nof the 37th Annual Meeting of the Society of Nuclear Medicine, page 797. \n\nCios, K. J., Goodenday, L. S., Merhi, M., and Langenderfer, R. (1989). Neural networks in \ndetection of coronary artery disease. In Computers in Cardiology Conference, pages \n33-37, Jerusalem, Israel. IEEE, IEEE Computer Society Press. \n\nCios, K. J., Shin, I., and Goodenday, L. S. (1991). Using fuzzy sets to diagnose coronary \nartery stenosis. Computer, pages 57-63. \n\nCuar\u00f3n, A., Acero, A., Cardena, M., Huerta, D., Rodriguez, A., and de Garay, R. (1980). Interobserver \nvariability in the interpretation of myocardial images with Tc-99m-labeled \ndiphosphonate and pyrophosphate. Journal of Nuclear Medicine, 21(1):1-9. \n\nDatz, F., Gabor, F., Christian, P., Gullberg, G., Menzel, C., and Morton, K. (1992). The use of \ncomputer-assisted diagnosis in cardiac-perfusion nuclear medicine studies: A review. \nJournal of Digital Imaging, 5(4):1-14. \n\nDawson, A., Austin, R., and Weinberg, D. (1991). Nuclear grading of breast carcinoma by \nimage analysis. American Journal of Clinical Pathology, 95(4):S29-S37. \n\nDoubilet, P. and Herman, P. (1981). Interpretation of radiographs: Effect of clinical history. \nAmerican Journal of Roentgenology, 137:1055-1058. \n\nErel, J., Rosenberg, C., and Atlan, H. (1993). Neural network for automatic interpretation \nof thallium scintigrams. In preparation. \n\nFrancisco, D. A., Collins, S. M., and et al., R. T. G. (1982). 
Tomographic thallium-201 \nmyocardial perfusion scintigrams after maximal coronary artery vasodilation with intravenous \ndipyridamole: Comparison of qualitative and quantitative approaches. Circulation, \n66(2). \n\nFranken Jr., E. A. and Berbaum, K. S. (1991). Perceptual aspects of cardiac imaging. In \nMarcus, M. L., Schelbert, H. R., Skorton, D. J., and Wolf, G. L., editors, Cardiac \nImaging. W. B. Saunders. \n\nFujita, H., Katafuchi, T., Uehara, T., and Nishimura, T. (1992). Application of artificial \nneural network to computer-aided diagnosis of coronary artery disease in myocardial \nSPECT bull's-eye images. The Journal of Nuclear Medicine, 33(2):272-276. \n\nGarcia, E. V. (1991). Physics and instrumentation of radionuclide imaging. In Marcus, \nM. L., Schelbert, H. R., Skorton, D. J., and Wolf, G. L., editors, Cardiac Imaging. W. \nB. Saunders. \n\nGarcia, E. V., Maddahi, J., Berman, D. S., and Waxman, A. (1981). Space-time quantitation \nof thallium-201 myocardial scintigraphy. Journal of Nuclear Medicine, 22:309-317. \n\nKippenhan, J., Barker, W., Pascal, S., and Duara, R. (1990). A neural-network classifier \napplied to PET scans of normal and Alzheimer's disease (AD) patients. In The Proceedings \nof the 37th Annual Meeting of the Society of Nuclear Medicine, volume 31, \nWashington, D.C. \n\nMaddahi, J., Garcia, E. V., Berman, D. S., Waxman, A., Swan, H. J. C., and Forrester, \nJ. (1981). Improved noninvasive assessment of coronary artery disease by quantitative \nanalysis of regional stress myocardial distribution and washout of thallium-201. \nCirculation, 64:924-935. \n\nPohost, G. M. and Henzlova, M. J. (1990). The value of thallium-201 imaging. New \nEngland Journal of Medicine, 323(3):190-192. \n\nPorenta, G., Kundrat, S., Dorffner, G., Petta, P., Duit, J., and r, H. S. (1990). 
Computer based \nimage interpretations of thallium-201 scintigrams: Assessment of coronary artery \ndisease using the parallel distributed processing approach. In Proceedings of the 37th \nAnnual Meeting of the Society of Nuclear Medicine, page 825. \n\nRosenberg, C., Erel, J., and Atlan, H. (1993). A neural network that learns to interpret \nmyocardial planar thallium scintigrams. Neural Computation. To appear. \n\nRumelhart, D. and Zipser, D. (1986). Feature discovery by competitive learning. In Rumelhart, \nD. and McClelland, J., editors, Parallel Distributed Processing, volume 1, chapter 5, \npages 151-193. MIT Press, Cambridge, Mass. \n\nTesauro, G. and Sejnowski, T. J. (1988). A parallel network that learns to play backgammon. \nTechnical Report CCSR-88-2, University of Illinois at Urbana-Champaign Center for \nComplex Systems Research. \n\nWidrow, B. and Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON \nConvention Record, volume 4, pages 96-104. IRE, New York. \n", "award": [], "sourceid": 600, "authors": [{"given_name": "Charles", "family_name": "Rosenberg", "institution": null}, {"given_name": "Jacob", "family_name": "Erel", "institution": null}, {"given_name": "Henri", "family_name": "Atlan", "institution": null}]}