{"title": "A Charge-Based CMOS Parallel Analog Vector Quantizer", "book": "Advances in Neural Information Processing Systems", "page_first": 779, "page_last": 786, "abstract": "", "full_text": "A Charge-Based CMOS Parallel Analog Vector Quantizer\n\nGert Cauwenberghs\nJohns Hopkins University\nECE Department\n3400 N. Charles St.\nBaltimore, MD 21218-2686\ngert@jhunix.hcf.jhu.edu\n\nVolnei Pedroni\nCalifornia Institute of Technology\nEE Department\nMail Code 128-95\nPasadena, CA 91125\npedroni@romeo.caltech.edu\n\nAbstract\n\nWe present an analog VLSI chip for parallel analog vector quantization. The MOSIS 2.0 \u00b5m double-poly CMOS Tiny chip contains an array of 16 x 16 charge-based distance estimation cells, implementing a mean absolute difference (MAD) metric operating on a 16-input analog vector field and 16 analog template vectors. The distance cell, including dynamic template storage, measures 60 x 78 \u00b5m\u00b2. Additionally, the chip features a winner-take-all (WTA) output circuit of linear complexity, with global positive feedback for fast and decisive settling of a single winner output. Experimental results on the complete 16 x 16 VQ system demonstrate correct operation with 34 dB analog input dynamic range and 3 \u00b5s cycle time at 0.7 mW power dissipation.\n\n1 Introduction\n\nVector quantization (VQ) [1] is a common ingredient in signal processing, for applications of pattern recognition and data compression in vision, speech and beyond. Certain neural network models for pattern recognition, such as Kohonen feature map classifiers [2], are closely related to VQ as well. The implementation of VQ, in its basic form, involves a search among a set of vector templates for the one which best matches the input vector, whereby the degree of matching is quantified by a given vector distance metric. 
Efficient hardware implementation requires a parallel search over the template set and a fast selection and encoding of the \"winning\" template. The chip presented here implements a parallel synchronous analog vector quantizer with 16 analog input vector components and 16 dynamically stored analog template vectors, producing a 4-bit digital output word encoding the winning template upon presentation of an input vector. The architecture is fully scalable as in previous implementations of analog vector quantizers, e.g. [3,4,5,6], and can be readily expanded toward a larger number of vector components and template vectors without structural modification of the layout. Distinct features of the present implementation include a linear winner-take-all (WTA) structure with globalized positive feedback for fast selection of the winning template, and a mean absolute difference (MAD) metric for the distance estimations, both realized with a minimum amount of circuitry. Using a linear charge-based circuit topology for MAD distance accumulation, a wide voltage range for the analog inputs and templates is achieved at relatively low energy consumption per computation cycle.\n\n2 System Architecture\n\nThe core of the VQ consists of a 16 x 16 2-D array of distance estimation cells, configured to interconnect columns and rows according to the vector input components and template outputs. Each cell computes in parallel the absolute difference distance between one component $x_j$ of the input vector $\\mathbf{x}$ and the corresponding component $y^i_j$ of one of the template vectors $\\mathbf{y}^i$,\n\n$$d(x_j, y^i_j) = |x_j - y^i_j|. \\qquad (1)$$\n\nThe mean absolute difference (MAD) distance between input and template vectors is accumulated along rows\n\n$$d(\\mathbf{x}, \\mathbf{y}^i) = \\frac{1}{16} \\sum_{j=1}^{16} |x_j - y^i_j|, \\quad i = 1 \\ldots 16 \\qquad (2)$$\n\nand presented to the WTA, which selects the single winner\n\n$$k_{\\mathrm{WTA}} = \\arg\\min_i \\, d(\\mathbf{x}, \\mathbf{y}^i). \\qquad (3)$$\n\nAdditional parts are included in the architecture for binary encoding of the winning output, and for address selection to write and refresh the template vectors.\n\n3 VLSI Circuit Implementation\n\nThe circuit implementation of the major components of the VQ, for MAD distance estimation and WTA selection, is described below. Both MAD distance and WTA cells operate in clocked synchronous mode using a precharge/evaluate scheme in the voltage domain. The approach followed here offers a wide analog voltage range of inputs and templates at low power weak-inversion MOS operation, and a fast and decisive settling of the winning output using a single communication line for global positive feedback. The output encoding and address decoding circuitry are implemented using standard CMOS logic.\n\nFigure 1: Schematic of distance estimation circuitry. (a) Absolute distance cell. (b) Output precharge circuitry.\n\n3.1 Distance Estimation Cell\n\nThe schematic of the distance estimation cell, replicated along rows and columns of the VQ array, is shown in Figure 1 (a). The cell contains two source followers, which buffer the input voltage $x_j$ and the template voltage $y^i_j$. The template voltage is stored dynamically onto $C_{store}$, written or refreshed by activating $WR_i$ while the $y^i_j$ value is presented on the $x_j$ input line. The $WR_i$ and complementary $\\overline{WR}_i$ signal levels along rows of the VQ array are driven by the address decoder, which selects a single template vector $\\mathbf{y}^i$ to be written to with data presented at the input $\\mathbf{x}$ when WR is active. 
\n\nh I \n\nAdditional lateral transistors connect symmetrically to the source follower outputs x / \nand yi /- By means of resistive division, the lateral transistors construct the maximum \nand minimum of x l' and yi / on Zi i HI and Zi j LO, respectively. In particular, when x j is \nmuc \napproaches \nyi /- By symmetry, the complementary argument holds in case Xj is much smaller than \nyi j. Therefore, the differential compo~ent of Zi j HI and Zi J LO approximately represents the \nabsolute difference value of x J and y' j : \n\napproaches x j and the voltage Z' j \n\narger than y' j' the vo tage Z' j \n\n. LO \n\nI \n\n. \n\nHI \n\n, \n\ni HI \n\nZ j \n\n-\n\ni LO \n\nz j \n\n, \n\ni \n~ max(xj, y j \ni 'I \n\n, \n\n-\n\nI\nXj - Y j ~ K Xj - Y j \n\n. \n\n' i ') \n- mm(xj ,y j \ni I \n, \n\nI \n\n') \n\n(4) \n\nwith K the MOS back gate effect coefficient [7] . \n\nThe mean absolute difference (MAD) distances (2) are obtained by accumulating con-\n\n\f782 \n\nGert Cauwenberghs, Volnei Pedroni \n\ntributions (4) along rows of cells through capacitive coupling, using the well known \ntechnique of correlated double sampling. To this purpose, a coupling capacitor Cc is \nprovided in every cell, coupling its differential output to the corresponding output row \nline. In the precharge phase, the maximum values Zi j HI are coupled to the output by \nactivating HI, and the output lines are preset to reference voltage Vref by activating PRE, \nFigure 1 (b). In the evaluate phase, PRE is de-activated, and the minimum values Zi j LO \nare coupled to the output by activating LO. From (4), the resulting voltage outputs on \nthe floating row lines are given by \n\nV, f -\nre \n\n. LO \n- Zl . \n) \nJ \n\n(5) \n\n1 L16 \n-\n16. \n\n. HI \n(Zl . \nJ \n\nJ=1 \n1 16 \n\nVref- K 16~ IXj-/jl. \n\nJ=1 \n\nThe last term in (5) corresponds directly to the distance measure d(x, yi) in (2). 
Notice that the negative sign in (5) could be reversed by interchanging clocks HI and LO, if needed. Since the subsequent WTA stage searches for maximum $z^i$, the inverted distance metric is in the form needed for VQ.\n\nCharacteristics of the MAD distance estimation (5), measured directly on the VQ array with uniform inputs $x_j$ and templates $y^i_j$, are shown in Figure 2. The magnified view in Figure 2 (b) clearly illustrates the effective smoothing of the absolute difference function (4) near the origin, $x_j \\approx y^i_j$. The smoothing is caused by the shift in $x_j'$ and ${y^i_j}'$ due to the conductance of the lateral coupling transistors connected to the source follower outputs in Figure 1 (a), and extends over a voltage range comparable to the thermal voltage $kT/q$, depending on the relative geometry of the transistors and current bias level of the source followers. The observed width of the flat region in Figure 2 spans roughly 60 mV, and shows little variation for bias current settings below 0.5 \u00b5A. Tuning of the bias current allows speed and power dissipation requirements to be balanced, since the output response is slew-rate limited by the source followers.\n\n3.2 Winner-Take-All Circuitry\n\nThe circuit implementation of the winner-take-all (WTA) function combines the compact sizing and modularity of a linear architecture as in [4,8,9] with positive feedback for fast and decisive output settling independent of signal levels, as in [6,3]. Typical positive feedback structures for WTA operation use a logarithmic tree [6] or a fully interconnected network [3], with implementation complexities of order $O(n \\log n)$ and $O(n^2)$ respectively, $n$ being the number of WTA inputs. The present implementation features an $O(n)$ complexity in a linear structure by means of globalized positive feedback, communicated over a single line. 
\n\nThe schematic of the WTA cell, receiving the input $z^i$ and constructing the digital output $d_i$ through global competition communicated over the COMM line, is shown in Figure 3. The global COMM line is source connected to input transistor Mi and positive feedback transistor Mf, and receives a constant bias current $I_b^{\\mathrm{WTA}}$ from Mb1. Locally, the WTA operation is governed by the dynamics of $d_i'$ on (parasitic) capacitor $C_p$. A high pulse on RST, resetting $d_i'$ to zero, marks the beginning of the WTA cycle. With Mf initially inactive, the total bias current $n \\, I_b^{\\mathrm{WTA}}$ through COMM is divided over all competing WTA cells, according to the relative $z^i$ voltage levels, and each cell fraction is locally mirrored by the Mm1-Mm2 pair onto $d_i'$, charging $C_p$. The cell with the highest $z^i$ input voltage receives the largest fraction of bias current, and charges $C_p$ at the highest rate. The winning output is determined by the first $d_i'$ reaching the threshold to turn on the corresponding Mf feedback transistor, say $i = k$. This threshold voltage is given by the source voltage on COMM, common for all cells. The positive feedback of the state $d_k'$ through Mf, which eventually claims the entire fraction of the bias current, enhances and latches the winning output level $d_k'$ to the positive supply and shuts off the remaining losing outputs $d_i'$ to zero, $i \\neq k$. The additional circuitry at the output stage of the cell serves to buffer the binary $d_i'$ value at the $d_i$ output terminal.\n\nFigure 2: Distance estimation characteristics, (a) for various values of $y^i_j$; (b) magnified view. 
\n\nNo more than one winner can practically co-exist at equilibrium, by nature of the combined positive feedback and global renormalization in the WTA competition. Moreover, the output settling times of the winner and losers are fairly independent of the input signal levels, and are given mainly by the bias current level $I_b^{\\mathrm{WTA}}$ and the parasitic capacitance $C_p$. Tests conducted on a separate 16-element WTA array, identical to the one used on the VQ chip, have demonstrated single-winner WTA operation with response time below 0.5 \u00b5s at less than 2 \u00b5W power dissipation per cell.\n\nFigure 3: Circuit schematic of winner-take-all cell.\n\n4 Functionality Test\n\nTo characterize the performance of the entire VQ system under typical real-time conditions, the chip was presented with a periodic sequence of 16 distinct input vectors $\\mathbf{x}(i)$, stored and refreshed dynamically in the 16 template locations $\\mathbf{y}^i$ by circularly incrementing the template address and activating WR at the beginning of every cycle. The test vectors represent a single triangular pattern rotated over the 16 component indices with single index increments in sequence. The fundamental component $x_0(i)$ is illustrated on the top trace of the scope plot in Figure 4. The other components are uniformly displaced in time over one period, by a number of cycles equal to the index, $x_j(i) = x_0(i - j \\bmod 16)$. Figure 4 also displays the VQ output waveforms in response to the triangular input sequence, with the desired parabolic profile for the analog distance output $z^0$ and the expected alternating bit pattern of the WTA least significant output bit. 
(The voltages on the scope plot are inverted as a consequence of the chip test setup.) The triangle test performed correctly at speeds limited by the instrumentation equipment, and the dissipated power on the chip measures 0.7 mW at 3 \u00b5s cycle time (including template write operations) and 5 V supply voltage.\n\nAn estimate for the dynamic range of analog input and template voltages was obtained directly by observing the smallest and largest absolute voltage difference still resolved correctly by the VQ output, uniformly over all components. By tuning the voltage range of the triangular test vectors, the recorded minimum and maximum voltage amplitudes for 5 V supply voltage are $Y_{\\min}$ = 87.5 mV and $Y_{\\max}$ = 4 V, respectively. The estimated analog dynamic range $Y_{\\max} / Y_{\\min}$ is thus 45.7, or roughly 34 dB, per cell. The value obtained for $Y_{\\min}$ indicates that the dynamic range is limited mainly by the smoothing of the absolute distance measure characteristic (1) near the origin in Figure 2 (b). We notice that a similar limitation of dynamic range applies to other distance metrics with vanishing slope near the origin as well, the popular mean square error (MSE) formulation in particular. The MSE metric is frequently adopted in VQ implementations using strong inversion MOS circuitry, and offers a dynamic range typically worse than that obtained here regardless of implementation accuracy, due to the relatively wide flat region of the MSE distance function near the origin.\n\nFigure 4: Scope plot of VQ waveforms. Top: Analog input $x_0$. Center: Analog distance output $z^0$. Bottom: Least significant bit of encoded output.\n\nTable 1: Features of the VQ chip\n\nTechnology: 2 \u00b5m p-well double-poly CMOS\nSupply voltage: +5 V\nPower dissipation (VQ chip): 0.7 mW (3 \u00b5s cycle time)\nDynamic range (inputs, templates): 34 dB\nArea (VQ chip): 2.2 mm x 2.25 mm\nArea (distance cell): 60 \u00b5m x 78 \u00b5m\nArea (WTA cell): 76 \u00b5m x 80 \u00b5m\n\n5 Conclusion\n\nWe proposed and demonstrated a synchronous charge-based CMOS VLSI system for parallel analog vector quantization, featuring a mean absolute difference (MAD) metric, and a linear winner-take-all (WTA) structure with globalized positive feedback. By virtue of the MAD metric, a fairly large (34 dB) analog dynamic range of inputs and templates has been obtained in the distance computations through simple charge-based circuitry. Likewise, fast and unambiguous settling of the WTA outputs, using global competition communicated over a single wire, has been obtained by adopting a compact linear circuit structure to implement the positive feedback WTA function. The resulting structure of the VQ chip is highly modular, and the functional characteristics are fairly consistent over a wide range of bias levels, including the MOS weak inversion and subthreshold regions. This allows the circuitry to be tuned to accommodate various speed and power requirements. A summary of the chip features of the 16 x 16 vector quantizer is presented in Table 1.\n\nAcknowledgments\n\nFabrication of the CMOS chip was provided through the DARPA/NSF MOSIS service. The authors thank Amnon Yariv for stimulating discussions and encouragement.\n\nReferences\n\n[1] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Norwell, MA: Kluwer, 1992.\n[2] T. Kohonen, Self-Organisation and Associative Memory, Berlin: Springer-Verlag, 1984.\n[3] Y. He and U. 
Cilingiroglu, \"A Charge-Based On-Chip Adaptation Kohonen Neural Network,\" IEEE Transactions on Neural Networks, vol. 4 (3), pp 462-469, 1993.\n[4] J.C. Lee, B.J. Sheu, and W.C. Fang, \"VLSI Neuroprocessors for Video Motion Detection,\" IEEE Transactions on Neural Networks, vol. 4 (2), pp 178-191, 1993.\n[5] R. Tawel, \"Real-Time Focal-Plane Image Compression,\" in Proceedings Data Compression Conference, Snowbird, Utah, IEEE Computer Society Press, pp 401-409, 1993.\n[6] G.T. Tuttle, S. Fallahi, and A.A. Abidi, \"An 8b CMOS Vector A/D Converter,\" in ISSCC Technical Digest, IEEE Press, vol. 36, pp 38-39, 1993.\n[7] C.A. Mead, Analog VLSI and Neural Systems, Reading, MA: Addison-Wesley, 1989.\n[8] J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead, \"Winner-Take-All Networks of O(n) Complexity,\" in Advances in Neural Information Processing Systems, San Mateo, CA: Morgan Kaufmann, vol. 1, pp 703-711, 1989.\n[9] A.G. Andreou, K.A. Boahen, P.O. Pouliquen, A. Pavasovic, R.E. Jenkins, and K. Strohbehn, \"Current-Mode Subthreshold MOS Circuits for Analog VLSI Neural Systems,\" IEEE Transactions on Neural Networks, vol. 2 (2), pp 205-213, 1991.\n", "award": [], "sourceid": 947, "authors": [{"given_name": "Gert", "family_name": "Cauwenberghs", "institution": null}, {"given_name": "Volnei", "family_name": "Pedroni", "institution": null}]}