{"title": "CCD Neural Network Processors for Pattern Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 741, "page_last": 747, "abstract": null, "full_text": "CCD Neural Network Processors for Pattern \n\nRecognition \n\nAlice M. Chiang \n\nMichael L. Chuang \n\nJeffrey R. LaFranchise \n\nMIT Lincoln Laboratory \n\n244 Wood Street \n\nLexington, MA 02173 \n\nAbstract \n\nA CCD-based processor that we call the NNC2 is presented. The NNC2 \nimplements a fully connected 192-input, 32-output two-layer network and \ncan be cascaded to form multilayer networks or used in parallel for ad(cid:173)\nditional input or output nodes. The device computes 1.92 x 109 connec(cid:173)\ntions/sec when clocked at 10 MHz. Network weights can be specified to six \nbits of accuracy and are stored on-chip in programmable digital memories. \nA neural network pattern recognition system using NNC2 and CCD im(cid:173)\nage feature extractor (IFE) devices is described. Additionally, we report \na CCD output circuit that exploits inherent nonlinearities in the charge \ninjection process to realize an adjustable-threshold sigmoid in a chip area \nof 40 x 80 J.tlU2 . \n\n1 \n\nINTRODUCTION \n\nA neural network chip based on charge-coupled device (CCD) technology, the NNC2, \nis presented. The NNC2 implements a fully connected two-layer net and can be cas(cid:173)\ncaded to form multilayer networks. An image feature extractor (IFE) device (Chiang \nand Chuang, 1991) is briefly l\u00b7eviewed. The IFE is suited for neural networks with \nlocal connections and shared weights and can also be used for image preprocessing \ntasks. A neural network pattern recognition system based on feature extraction \nusing IFEs and classification using NNC2s is proposed. 
The efficacy of neural networks with local connections and shared weights for feature extraction in character recognition and phoneme recognition tasks has been demonstrated by researchers such as (Le Cun et al., 1989) and (Waibel et al., 1989), respectively. More complex recognition tasks are likely to prove amenable to a system using locally connected networks as a front end with outputs generated by a highly-connected classifier. Both the IFE and the NNC2 are hybrids composed of analog and digital components. Network weights are stored digitally while neuron states and computation results are represented in analog form. Data enter and leave the devices in digital form for ease of integration into digital systems. \n\nThe sigmoid is used in many network models as the nonlinear neuron output function. We have designed, fabricated and tested a compact CCD sigmoidal output circuit that is described below. The paper concludes with a discussion of strategies for implementing networks with particularly high or low fan-in to fan-out ratios. \n\n2 THE NNC2 AND IFE DEVICES \n\nThe NNC2 is a neural network processor that implements a fully connected two-layer net with 192 input nodes and 32 output nodes. The device is an expanded version of a previous neural network classifier (NNC) chip (Chiang, 1990), hence the appellation \"NNC2.\" The NNC2 consists of a 192-stage CCD tapped delay line for holding and shifting input values, 192 four-quadrant multipliers, and 192 32-word local memories for weight storage. When clocked at 10 MHz, the NNC2 performs 1.92 x 10^9 connections/sec. The device was fabricated using a 2-µm minimum feature size double-metal, double-polysilicon CCD/CMOS process. The NNC2 measures 8.8 x 9.2 mm^2 and is depicted in Figure 1. 
Figure 1: Photomicrograph of the NNC2, showing the digital memory, the MDAC array, and the CCD tapped delay line \n\nTests indicate that the NNC2 has an output dynamic range exceeding 42 dB. Figure 2 shows the output of the NNC2 when the input consists of the cosine waveforms f_n = 0.2 cos(2*pi*2n/192) + 0.4 cos(2*pi*3n/192) and the weights are set to cos(2*pi*nk/192), k = ±1, ±2, ..., ±16. Due to the orthogonality of sinusoids of different frequencies, the output correlations g_k = sum_{n=0}^{191} f_n cos(2*pi*nk/192) should yield scaled impulses with amplitudes of ±0.2 and ±0.4 for k = ±2 and ±3 only; this is indeed the case, as the output (lower trace) in Figure 2 shows. This test demonstrates the linearity of the weighted sum (inner product) computed by the NNC2. \n\nFigure 2: Response of the NNC2 to input cosine waveforms \n\nLocally connected, shared-weight networks can be implemented using the IFE, which raster scans up to 20 sets of 7x7 weights over an input image. At every window position the inner product of the windowed pixels and each of the 20 sets of weights is computed. For additional details, see (Chiang and Chuang, 1991). The IFE and the NNC2 share a number of common features that are described below. \n\n2.1 MDACS \n\nThe multiplications of the inner product are performed in parallel by multiplying D/A converters (MDACs), of which there are 192 in the NNC2 and 49 in the IFE. Each MDAC produces a charge packet proportional to the product of an input and a digital weight. The partial products are summed on an output line common to all the MDACs, yielding a complete inner product every clock cycle. 
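The orthogonality argument behind the Figure 2 linearity test is easy to check numerically. The sketch below recomputes the correlations g_k in software; the normalization by 2/N is our choice, made so that a matched cosine of amplitude A yields A, matching the ±0.2 and ±0.4 impulse amplitudes quoted in the text.

```python
import math

N = 192
# Input waveform used in the Figure 2 linearity test
f = [0.2 * math.cos(2 * math.pi * 2 * n / N)
     + 0.4 * math.cos(2 * math.pi * 3 * n / N)
     for n in range(N)]

def correlation(k):
    """Inner product with cos(2*pi*n*k/N), normalized by N/2 so that a
    matched cosine of amplitude A yields A (the 'scaled impulse')."""
    return (2 / N) * sum(f[n] * math.cos(2 * math.pi * n * k / N)
                         for n in range(N))

# Up to floating-point noise, correlation(k) vanishes for every k in
# 1..16 except k = 2 (giving 0.2) and k = 3 (giving 0.4).
peaks = {k: correlation(k) for k in range(1, 17)}
```

This is the software analogue of what the MDAC array computes in one clock cycle per weight set.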
The design and operation of an MDAC are described in detail in (Chiang, 1990). Using a 2-µm design rule, a four-quadrant MDAC with 8-bit weights occupies an area of 200 x 200 µm^2. \n\n2.2 WEIGHT STORAGE \n\nThe NNC2 and IFE feature on-chip digital storage of programmable network weights, specified to 6 and 8 bits, respectively. The NNC2 contains 192 local memories of 32 words each, while the IFE has forty-nine 20-word memories. Individual words can be addressed by means of a row pointer and a column pointer. Each bit of the CCD shift register memories is equipped with a feedback enable switch that obviates the need to refresh the volatile CCD storage medium explicitly; words are rewritten as they are read for use in computation, so that no cycles need be devoted to memory refresh. \n\n2.3 INPUT BUFFER \n\nInputs to the NNC2 are held in a 192-stage CCD analog floating-gate tapped delay line. At each stage the floating gate is coupled to the input of the corresponding MDAC, permitting inputs to be sensed nondestructively for computation. The NNC2 delay line is composed of three 64-stage subsections (see Figure 1). This partitioning allows the NNC2 to compute either the weighted sum of 192 inputs or three 64-point inner products. The latter capability is well-matched to Time-Delay Neural Networks (TDNNs) that implement a moving temporal window for phoneme recognition (Waibel et al., 1989). The IFE contains a similar 775-stage delay line that holds six lines of a 128-pixel input image plus an additional seven pixels. Taps are placed on the first seven of every 128 stages in the IFE delay line so that the 1-dimensional line emulates a 2-dimensional window. \n\n3 CCD SIGMOIDAL OUTPUT CIRCUIT \n\nA sigmoidal charge-domain nonlinear detection circuit is shown in Figure 3. 
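The circuit's behavior, described in detail below, can be summarized to first order as a thresholded, saturating transfer characteristic. The following sketch is our simplified model of that characteristic, not the device equations: charge below the VTG-set threshold is ignored, the excess is transferred, and the receiving well caps the output.

```python
def sigmoid_output(q_in, q_threshold, q_max):
    """First-order model (an assumption, not the device physics) of the
    CCD sigmoidal output circuit: signal charge below the threshold set
    by the transfer gate voltage VTG is ignored; above threshold, the
    excess charge transfers, bounded by the receiving well capacity."""
    transferred = max(0.0, q_in - q_threshold)   # below threshold: no output
    return min(transferred, q_max)               # well size bounds the output
```

Sweeping `q_in` reproduces the flat-ramp-saturate shape measured in Figure 3; in the real device the corners are softened by the nonlinear charge-transfer efficiency (Thornber, 1971).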
The circuit has a programmable input threshold controlled by the amplitude of the transfer gate voltage, VTG. If the incoming signal charge is below the threshold set by VTG, no charge is transferred to the output port and the incoming signal is ignored. If the input is above threshold, the amount of charge transferred to the output port is the difference between the charge input and the threshold level. The circuit design is based on the ability to calculate the charge transfer efficiency from an n+ diffusion region over a bias gate to a receiving well as a function of device parameters, and exploits the fact that under certain operating conditions a nonlinear dependence exists between the input and output charge (Thornber, 1971). The maximum output produced can be bounded by the size and gate voltage of the receiving well. The predicted and measured responses of the circuit for two different threshold levels are shown in the bottom of Figure 3. The circuit has an area of 40 x 80 µm^2 and can be integrated with the NNC2 or IFE chips to perform both the weighted-sum and output-nonlinearity computations on a single device. \n\n4 DESIGN STRATEGIES \n\nThe NNC2 uses a time-multiplexed output (TMO) structure (Figure 4a), where the number of multipliers and the number of local memories is equal to the number of inputs, N. The depth of each local memory is equal to the number of output nodes, M, and the outputs are computed serially as each set of weights is read in sequence from the memories. A 256-input, 256-output device with 64k 8-bit weights has been designed and can be realized in a chip area of 14 x 14 mm^2. This chip is reconfigurable so that a single such device can be used to implement multilayer networks. If a network with a large (>1000) number of input nodes is required, then a time-multiplexed input (TMI) architecture with M multipliers may be more suitable (Figure 4b). 
In contrast to a TMO system that computes the M inner products sequentially (the multiplications of each inner product are performed in parallel), a TMI structure performs N sets of M multiplications (all M inner products are accumulated in parallel). As each input element arrives it is broadcast to all M multipliers. Each multiplier multiplies the input by an appropriate weight from its N-word-deep local memory and places the result in an accumulator. The M inner products appear in the accumulators one cycle after receipt of the final, Nth input. \n\nFigure 3: Schematic, micrograph, and test results of the sigmoid circuit (calculated and measured output charge versus input voltage for VTG = 2.5 V and VTG = 0.5 V) \n\nFigure 4: (a) Time-multiplexed output (TMO), (b) time-multiplexed input (TMI) \n\n5 SUMMARY \n\nWe have presented the NNC2, a CCD chip that implements a fully connected two-layer network at the rate of 1.92 x 10^9 connections/second. The NNC2 may be used in concert with IFE devices to form a CCD-based neural network pattern recognition system or as a co-processor to speed up neural network simulations on conventional computers. A VME-bus board for the NNC2 is presently being constructed. 
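The TMO/TMI contrast of Section 4 can be sketched in software. Both orderings compute the same M inner products; only the loop structure differs, mirroring which resource (multipliers or inputs) is time-multiplexed. The function names are ours, and the sequential Python loops stand in for operations the hardware performs in parallel.

```python
def tmo(weights, inputs):
    """Time-multiplexed output (NNC2 style): the N multipliers work in
    parallel on one output node per cycle; M outputs emerge serially."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def tmi(weights, inputs):
    """Time-multiplexed input: each arriving input is broadcast to M
    multiplier/accumulator pairs; all M sums are complete one cycle
    after the final, Nth input arrives."""
    M = len(weights)
    acc = [0.0] * M
    for n, x in enumerate(inputs):      # inputs arrive serially
        for m in range(M):              # M multiplies, parallel in hardware
            acc[m] += weights[m][n] * x
    return acc
```

For any weight matrix `W` (M rows of N weights) and input vector `x`, `tmo(W, x)` and `tmi(W, x)` return the same values.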
A compact CCD circuit that generates a sigmoidal output function was described, and finally, the relative merits of time-multiplexing input or output nodes in neural network devices were enumerated. Table 1 below is a comparison of recent neural network chips. \n\nTable 1: Selected neural network chips \n\nCHIP | OUTPUT NODES | INPUT NODES | SYNAPSE ACCURACY | PROGRAMMABLE SYNAPSES | THROUGHPUT (10^9 conn/s) | CHIP AREA (mm^2) | CLOCK RATE | WEIGHT STORAGE | ON-CHIP LEARNING | DESIGN RULE | REPORTED AT \nMIT LINCOLN LAB NNC2 | 32 | 192 | 6b · ANALOG | 6k | 1.92 | 8.8 x 9.2 | 10 MHz | DIGITAL(b) | NO | 2 µm CCD/CMOS | NIPS 91 \nCIT NN | 256 | 256 | 1b · ANALOG | 64k | 0.5 | ? | ? | ANALOG | NO | 2 µm CCD | IJCNN 90 \nINTEL ETANN | two 64 | two 64 | ANALOG · ANALOG | 10k | 2 | 11.2 x 7.5 | 1.5 MHz | ANALOG | NO | 1 µm CMOS | IJCNN 89 \nMITSUBISHI NN | 168 | 168 | ANALOG · ANALOG | 28k | ? | 14.5 x 14.5 | 400 kHz | ANALOG | YES(c) | 1 µm CMOS | ISSCC 91 \nAT&T NN | 16 (or 256) | 256 (or 16) | 3b · 6b | 4k | 5.1 | 4.5 x 7 | 20 MHz | ANALOG | NO | 0.9 µm CMOS | ISSCC 91 \nHITACHI WSINN | 576 | 64 | 8b · 9b | 37k | 1.2 | 125 x 125 | 2.1 MHz(a) | DIGITAL | NO | 0.8 µm CMOS | IJCNN 90 \nADAPT. SOL. X1 | 64 | 4k | 9b · 16b | 256k | 1.6 | 26.2 x 27.5 | 25 MHz | DIGITAL | YES | 0.8 µm CMOS | ISSCC 91 \n\nNOTE: \na - CLOCK RATE FOR WSINN IS EXTRAPOLATED BASED ON 1/STEP TIME. \nb - NO DEGRADATION OBSERVED ON DIGITALLY STORED AND REFRESHED WEIGHTS. \nc - A SIMPLIFIED BOLTZMANN MACHINE LEARNING ALGORITHM IS USED. \n\nAcknowledgements \n\nThis work was supported by DARPA, the Office of Naval Research, and the Department of the Air Force. The IFE and NNC2 were fabricated by Orbit Semiconductor. 
References \n\nA. J. Agranat, C. F. Neugebauer and A. Yariv, \"A CCD Based Neural Network Integrated Circuit with 64k Analog Programmable Synapses,\" IJCNN, 1990 Proceedings, pp. II-551-II-555. \n\nY. Arima et al., \"A 336-Neuron 28-k Synapse Self-Learning Neural Network Chip with Branch-Neuron-Unit Architecture,\" in ISSCC Dig. of Tech. Papers, pp. 182-183, Feb. 1991. \n\nB. E. Boser and E. Sackinger, \"An Analog Neural Network Processor with Programmable Network Topology,\" in ISSCC Dig. of Tech. Papers, pp. 184-185, Feb. 1991. \n\nA. M. Chiang, \"A CCD Programmable Signal Processor,\" IEEE Jour. Solid-State Circ., vol. 25, no. 6, pp. 1510-1517, Dec. 1990. \n\nA. M. Chiang and M. L. Chuang, \"A CCD Programmable Image Processor and its Neural Network Applications,\" IEEE Jour. Solid-State Circ., vol. 26, no. 12, pp. 1894-1901, Dec. 1991. \n\nD. Hammerstrom, \"A VLSI Architecture for High-Performance, Low-Cost On-chip Learning,\" IJCNN, 1990 Proceedings, pp. II-537-II-543. \n\nM. Holler et al., \"An Electrically Trainable Artificial Neural Network (ETANN) with 10240 \"Floating Gate\" Synapses,\" IJCNN, 1989 Proceedings, pp. II-191-II-196. \n\nY. Le Cun et al., \"Handwritten Digit Recognition with a Back-Propagation Network,\" in D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. 396-404, San Mateo, CA: Morgan Kaufmann, 1989. \n\nK. K. Thornber, \"Incomplete Charge Transfer in IGFET Bucket-Brigade Shift Registers,\" IEEE Trans. Elect. Dev., vol. ED-18, no. 10, pp. 941-950, 1971. \n\nA. Waibel et al., \"Phoneme Recognition Using Time-Delay Neural Networks,\" IEEE Trans. on Acoust., Speech, Sig. Proc., vol. 37, no. 3, pp. 329-339, March 1989. \n\nM. Yasunaga et 
al., \"Design, Fabrication and Evaluation of a 5-Inch Wafer Scale \nNeural Network LSI Composed of 576 Digital Neurons,\" IlCNN, 1990 Proceedings, \npp. 11-527-11-535. \n\n\f", "award": [], "sourceid": 471, "authors": [{"given_name": "Alice", "family_name": "Chiang", "institution": null}, {"given_name": "Michael", "family_name": "Chuang", "institution": null}, {"given_name": "Jeffrey", "family_name": "LaFranchise", "institution": null}]}