{"title": "Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 249, "page_last": 256, "abstract": null, "full_text": " Sub-Microwatt Analog VLSI\n Support Vector Machine for\n Pattern Classification and Sequence Estimation\n\n\n\n Shantanu Chakrabartty and Gert Cauwenberghs\n Department of Electrical and Computer Engineering\n Johns Hopkins University, Baltimore, MD 21218\n {shantanu,gert}@jhu.edu\n\n\n Abstract\n\n An analog system-on-chip for kernel-based pattern classification and se-\n quence estimation is presented. State transition probabilities conditioned\n on input data are generated by an integrated support vector machine. Dot\n product based kernels and support vector coefficients are implemented\n in analog programmable floating gate translinear circuits, and probabil-\n ities are propagated and normalized using sub-threshold current-mode\n circuits. A 14-input, 24-state, and 720-support vector forward decod-\n ing kernel machine is integrated on a 3mm3mm chip in 0.5m CMOS\n technology. Experiments with the processor trained for speaker verifica-\n tion and phoneme sequence estimation demonstrate real-time recognition\n accuracy at par with floating-point software, at sub-microwatt power.\n\n\n\n1 Introduction\n\nThe key to attaining autonomy in wireless sensory systems is to embed pattern recognition\nintelligence directly at the sensor interface. Severe power constraints in wireless integrated\nsystems incur design optimization across device, circuit, architecture and system levels [1].\nAlthough system-on-chip methodologies have been primarily digital, analog integrated sys-\ntems are emerging as promising alternatives with higher energy efficiency and integration\ndensity, exploiting the analog sensory interface and computational primitives inherent in\ndevice physics [2]. 
Analog VLSI has been chosen, for instance, to implement Viterbi [3] and HMM-based [4] sequence decoding in communications and speech processing.\nForward-Decoding Kernel Machines (FDKM) [5] provide an adaptive framework for general maximum a posteriori (MAP) sequence decoding that avoids the need for backward recursion over the data in Viterbi and HMM-based sequence decoding [6]. At the core of FDKM is a support vector machine (SVM) [7] for large-margin trainable pattern classification, performing noise-robust regression of transition probabilities in forward sequence estimation. The achievable limits of FDKM power consumption are determined by the number of support vectors (i.e., regression templates), which in turn is determined by the complexity of the discrimination task and the signal-to-noise ratio of the sensor interface [8].\n\nFigure 1: FDKM system architecture.\n\nIn this paper we describe an implementation of FDKM in silicon, for use in adaptive sequence detection and pattern recognition. The chip is fully configurable, with parameters directly downloadable onto an array of floating-gate CMOS computational memory cells. By means of calibration and chip-in-the-loop training, the effect of mismatch and non-linearity in the analog implementation is significantly reduced.\nSection 2 reviews the FDKM formulation and notation. Section 3 describes the schematic details of the FDKM hardware implementation.
Section 4 presents results from experiments conducted with the fabricated chip, and Section 5 concludes with future directions.\n\n2 FDKM Sequence Decoding\n\nFDKM recognition and sequence decoding are formulated in the framework of MAP (maximum a posteriori) estimation, combining Markovian dynamics with kernel machines. The MAP forward decoder receives the sequence X[n] = {x[1], x[2], . . . , x[n]} and produces an estimate of the conditional probability measure of the state variable q[n] over all classes i ∈ {1, .., S}: αi[n] = P(q[n] = i | X[n]). Unlike hidden Markov models, the states directly encode the symbols, and the observations x modulate transition probabilities between states [6]. Estimates of the posterior probability αi[n] are obtained from estimates of local transition probabilities using the forward-decoding procedure [6]\n\n    αi[n] = Σ_{j=1}^{S} Pij[n] αj[n-1]    (1)\n\nwhere Pij[n] = P(q[n] = i | q[n-1] = j, x[n]) denotes the probability of making a transition from class j at time n-1 to class i at time n, given the current observation vector x[n]. Forward decoding (1) expresses first-order Markovian sequential dependence of state probabilities conditioned on the data.\nThe transition probabilities Pij[n] in (1) attached to each outgoing state j are obtained by normalizing the SVM regression outputs fij(x):\n\n    Pij[n] = [fij(x[n]) - zj[n]]+    (2)\n\nwhere [·]+ = max(·, 0).\n\nFigure 2: Schematic of the SVM stage. (a) Multiply-accumulate cell and reference cell for the MVM blocks in Figure 1. (b) Combined input, kernel and MVM modules.
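For illustration, the forward step (1) with the rectified transition probabilities (2) can be sketched numerically. The following is a minimal NumPy sketch, not the chip's implementation: the function names, the bisection solver for the offset zj, and the choice of margin gamma = 1 (so columns of P sum to one) are our own assumptions.

```python
import numpy as np

def offset(f, gamma=1.0, iters=60):
    # Find z such that sum(max(f - z, 0)) == gamma (reverse water-filling).
    lo, hi = f.min() - gamma, f.max()
    for _ in range(iters):  # bisection: the rectified sum is decreasing in z
        z = 0.5 * (lo + hi)
        if np.maximum(f - z, 0.0).sum() > gamma:
            lo = z
        else:
            hi = z
    return 0.5 * (lo + hi)

def forward_step(alpha_prev, f, gamma=1.0):
    # One step of recursion (1): alpha_i[n] = sum_j P_ij[n] * alpha_j[n-1],
    # with P_ij = [f_ij - z_j]+ normalized per outgoing state j (eq. 2).
    S = len(alpha_prev)
    P = np.zeros((S, S))
    for j in range(S):
        z = offset(f[:, j], gamma)
        P[:, j] = np.maximum(f[:, j] - z, 0.0) / gamma
    return P @ alpha_prev
```

With gamma = 1, alpha remains a proper probability vector without any division, which is the property the subtractive normalization exploits.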
The normalization mechanism is subtractive rather than divisive, with the normalization offset zj[n] obtained using a reverse-waterfilling criterion with respect to a probability margin γ [10]:\n\n    Σ_i [fij(x[n]) - zj[n]]+ = γ.    (3)\n\nBesides improved robustness [8], the advantage of the subtractive normalization (3) is its amenability to current-mode implementation, as opposed to logistic normalization [11], which requires exponentiation of currents. The SVM outputs (margin variables) fij(x) are given by\n\n    fij(x) = Σ_{s=1}^{N} λij^s K(x, xs) + bij    (4)\n\nwhere K(·, ·) denotes a symmetric positive-definite kernel1 satisfying the Mercer condition, such as a Gaussian radial basis function or a polynomial spline [7], and xs, s = 1, .., N denote the support vectors. The parameters λij^s in (4) and the support vectors xs are determined by training on a labeled training set using the recursive FDKM procedure described in [5].\n\n3 Hardware Implementation\n\nA second-order polynomial kernel K(x, y) = (x · y)^2 was chosen for convenience of implementation. This inner-product based architecture maps directly onto an analog computational array, where storage and computation share common circuit elements.\n\n    1 K(x, y) = Φ(x) · Φ(y). The map Φ(·) need not be computed explicitly, as it only appears in inner-product form.\n\nFigure 3: Schematic of the margin propagation block.\n\nThe FDKM system architecture is shown in Figure 1. It consists of several SVM stages that generate state transition probabilities Pij[n] modulated by the input data x[n], and a forward-decoding block that performs maximum a posteriori (MAP) estimation of the state sequence αi[n].\n\n3.1 SVM Stage\n\nThe SVM stage implements (4) to generate unnormalized probabilities.
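In numeric terms, the stage evaluates (4) with the second-order polynomial kernel: a kernel matrix-vector multiply followed by squaring, then a coefficient matrix-vector multiply. The sketch below is our own reconstruction, not the chip's circuit; the array shapes and names are assumptions, and single-quadrant operation is reflected in the non-negative inputs and coefficients.

```python
import numpy as np

def svm_margins(x, xs, lam, b):
    # f_ij(x) = sum_s lam[i, j, s] * K(x, xs[s]) + b[i, j]       (eq. 4)
    # with the second-order polynomial kernel K(x, y) = (x . y)^2.
    # x: (D,) unsigned input; xs: (N, D) unsigned support vectors;
    # lam: (S, S, N) non-negative coefficients; b: (S, S) biases.
    k = (xs @ x) ** 2      # kernel MVM followed by squaring
    return lam @ k + b     # coefficient MVM for every state pair (i, j)
```

A usage example: with `xs` the identity in two dimensions, `k` is just the element-wise square of `x`, so the margins reduce to a weighted sum of squared inputs.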
It consists of a kernel stage computing kernels K(xs, x) between the input vector x and the stored support vectors xs, and a coefficient stage linearly combining kernels using the stored training parameters λij^s. Both kernel and coefficient blocks incorporate an analog matrix-vector multiplier (MVM) with embedded storage of support vectors and coefficients. A single multiply-accumulate cell, using floating-gate CMOS non-volatile analog storage, is shown in Figure 2(a). The floating-gate node voltage Vg of transistor M2 is programmed using hot-electron injection and tunneling [12]. The input stage comprising transistors M1, M3 and M4 forms a key component in the design of the array and sets the voltage at node A as a function of the input current. By operating the array in weak inversion, the output current through the floating-gate element M2, in terms of the input-stage floating-gate potential Vgref and the memory-element floating-gate potential Vg, is given by\n\n    Iout = Iin e^{-(Vg - Vgref)/UT}    (5)\n\nas a product of two pseudo-currents, leading to a single-quadrant multiplier. Two observations can be made directly from Eqn. (5):\n\n    1. The input stage eliminates the effect of the bulk on the output current, making it a function of the reference floating-gate voltage, which can easily be programmed for the entire row.\n\n    2. The weight is differential in the floating-gate voltages Vg - Vgref, allowing the weight to be increased or decreased by hot-electron injection alone, without the need for repeated high-voltage tunneling. For instance, the leakage current in unused rows can be reduced significantly by programming the reference gate voltage to a high value, leading to power savings.\n\nThe feedback transistor M3 in the input stage reduces the output impedance of node A to ro ≈ gd1/(gm1 gm2). This makes the array scalable, as additional memory elements can be added to the node without pulling the voltage down.
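The cell response (5) is easy to emulate numerically. In the sketch below (the function names and the room-temperature value of UT are our assumptions), the differential floating-gate voltage Vg - Vgref sets a single-quadrant weight on the input current, and row currents sum on the shared node:

```python
import math

UT = 0.0259  # thermal voltage kT/q at room temperature, in volts

def cell_current(i_in, vg, vg_ref, ut=UT):
    # Eq. (5): Iout = Iin * exp(-(Vg - Vgref)/UT); Vg = Vgref gives unity weight.
    return i_in * math.exp(-(vg - vg_ref) / ut)

def row_current(i_inputs, vgs, vg_ref):
    # One MVM row: per-cell currents summed on the shared low-impedance node.
    return sum(cell_current(i, vg, vg_ref) for i, vg in zip(i_inputs, vgs))
```

Note the exponential sensitivity: lowering Vg by UT · ln 10 ≈ 60 mV below Vgref scales the weight by 10×, so a wide programmable weight range fits into a small gate-voltage swing.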
An added benefit of keeping the voltage at node A fixed is reduced variation in the back-gate parameter of the floating-gate elements. The current from each memory element is summed on a low-impedance node established by two diode-connected transistors M7-M10. This partially compensates for the large Early-voltage effects implicit in floating-gate transistors.\n\nFigure 4: Single input-output response of the SVM stage illustrating the square transfer function of the kernel block (log(Iout) vs. log(Iin)), where all the MVM elements are programmed for unity gain. (a) Before calibration, showing mismatch between rows. (b) After pre-distortion compensation of input and output coefficients.\n\nThe array of elements M2 with the peripheral circuits shown in Figure 2(a) thus implements a simple single-quadrant matrix-vector multiplication module. Single-quadrant operation is adequate for unsigned inputs, and hence unsigned support vectors. A simple squaring circuit M7-M10 is used to implement the non-linear kernel, as shown in Figure 2(b). The requirement on the type of non-linearity is not stringent, and the actual non-linearity can easily be incorporated into the kernel in the SVM training procedure [5]. The coefficient block consists of the same matrix-vector multiplier given in Figure 2(a). For the general probability model given by (2), a single-quadrant multiplication is sufficient to model any distribution. This can easily be verified by observing that the distribution (2) is invariant to a uniform offset in the coefficients λij^s.\n\n3.2 Forward Decoding Stage\n\nThe forward recursion decoding is implemented by a modified version of the sum-product probability propagation circuit in [13], performing margin-based probability propagation according to (1).
In contrast to divisive normalization, which relies on the translinear principle using sub-threshold MOS or bipolar circuits [13], the implementation of margin-based subtractive normalization shown in Figure 3 [10] is independent of the device operating regime. The circuit consists of several normalization cells Aij along columns, each computing Pij = [fij - z]+ using transistors M1-M4. Transistors M5-M9 form a feedback loop that compares and stabilizes the circuit to the normalization criterion (3). The currents through transistors M4 are auto-normalized to the previous state value αj[n-1] to produce a new estimate of αi[n] based on recursion (1). The delay in equation (1) is implemented using a log-domain filter, and a fixed normalization current ensures that all output currents are properly scaled to stabilize the continuous-time feedback loop.\n\n4 Experimental Results\n\nA 14-input, 24-state, 24 × 30-support vector FDKM was integrated on a 3 mm × 3 mm chip, fabricated in a 0.5 µm CMOS process, and fully tested. Figure 5(c) shows the micrograph of the fabricated chip. Labeled training data pertaining to a given task were used to train an SVM, and the training coefficients thus obtained were programmed onto the chip.\n\nTable 1: FDKM Chip Summary\n\n    Parameter                   Value\n    Area                        3 mm × 3 mm\n    Technology                  0.5 µm CMOS\n    Supply Voltage              4 V\n    System Parameters\n    Floating-Gate Cell Count    28814\n    Number of Support Vectors   720\n    Input Dimension             14\n    Number of States            24\n    Power Consumption           80 nW - 840 nW\n    Energy Efficiency           1.6 pJ/MAC\n\nFigure 5: (a) Transition-based sequence detection in a 13-state Markov model. (b) Experimental recording of α7 = P(q7), detecting one of two recurring sequences in inputs x1 ... x6 (x1, x3 and x5 shown).
(c) Micrograph of the FDKM chip.\n\nProgramming of the trained coefficients was performed by programming each cell M2 together with its corresponding input stage M1, so as to establish the desired ratio of currents. The values were set by continuing hot-electron injection until the desired current was attained. During hot-electron injection, the control gate Vc was adjusted to hold the injection current at a constant level for stable injection. All cells in the kernel and coefficient modules of the SVM stage are randomly accessible for read, write and calibrate operations. The calibration procedure compensates for mismatch between different input/output paths by adapting the floating-gate elements in the MVM cells. This is illustrated in Figure 4, where the measured square-kernel transfer function is shown before and after calibration.\nThe chip is fully reconfigurable and can perform different recognition tasks by programming different training parameters, as demonstrated through the three examples below. Depending on the number of active support vectors and the absolute level of currents (in relation to decoding bandwidth), power dissipation is in the lower-nanowatt to microwatt range.\n\nFigure 6: (a) Measured and simulated ROC curve for the speaker verification experiment. (b) Experimental phoneme recognition by the FDKM chip.
The state probability shown is for the consonant /t/ in the words \"torn,\" \"rat,\" and \"error.\" Two peaks are located as expected from the input sequence, shown on top.\n\nFor the first set of experiments, parameters corresponding to the simple Markov chain shown in Figure 5(a) were programmed onto the chip to differentiate between two given sequences of input features: one a sweep of active input components in rising order (x1 through x6), and the other in descending order (x6 through x1). The output of state q7 in the Markov chain is shown in Figure 5(b). It can be clearly observed that state q7 \"fires\" only when a rising sequence of pulse trains arrives. The FDKM chip thereby demonstrates probability propagation similar to that in the architecture of [4]. The main difference is that the present architecture can be configured to detect other, more complex sequences through programming and training.\nFor the second set of experiments, the FDKM chip was programmed to perform speaker verification using speech data from the YOHO corpus. For training, we chose 480 utterances corresponding to 10 separate speakers (101-110). For each of these utterances, 12 mel-cepstral coefficients were computed for every 25 ms frame. These coefficients were clustered using k-means clustering to obtain 50 clusters per speaker, which were then used to train the SVM. For testing, 480 utterances from the same speakers were chosen, and the confidence scores returned by the SVMs were integrated over all frames of an utterance to obtain a final decision.
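The training and scoring pipeline just described can be sketched as follows. This is an illustrative NumPy reconstruction (Lloyd's k-means and mean score integration are our assumptions), not the exact procedure used in the experiments:

```python
import numpy as np

def kmeans(frames, k=50, iters=20, seed=0):
    # Lloyd's k-means: reduce a speaker's cepstral frames to k cluster
    # centers (the experiment used 50 clusters per speaker).
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        d = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def utterance_score(frames, frame_scorer):
    # Integrate per-frame SVM confidence scores over a whole utterance
    # to produce the final accept/reject statistic.
    return float(np.mean([frame_scorer(f) for f in frames]))
```

The clustering step keeps the number of support vectors, and hence the chip's power consumption, bounded regardless of how many training frames each speaker provides.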
Verification results obtained from the chip demonstrate 97% true acceptance at a 1% false-positive rate, identical to the performance obtained with floating-point software simulations, as shown by the receiver operating characteristic in Figure 6(a). The total power consumption for this task is only 840 nW, demonstrating the chip's suitability for autonomous sensor applications.\nA third set of experiments aimed at detecting phone utterances in human speech. Mel-cepstral coefficients of six phone utterances (/t/, /n/, /r/, /ow/, /ah/, /eh/) selected from the TIMIT corpus were transformed using singular value decomposition and thresholding. Even though recognition was demonstrated for this reduced set of features, the chip operates internally on analog inputs. Figure 6(b) illustrates correct detection of phonemes, as identified by the presence of phone /t/ at the expected time instances in the input sequence.\n\n5 Discussion and Conclusion\n\nWe designed an FDKM-based sequence recognition system on silicon and demonstrated its performance on simple but general tasks. The chip is fully reconfigurable, and different sequence recognition engines can be programmed using parameters obtained through SVM training. FDKM decoding is performed in real time and is ideally suited for sequence recognition and verification problems involving speech features. All analog processing in the chip is performed by transistors operating in weak inversion, resulting in power dissipation in the nanowatt to microwatt range. Non-volatile storage of the training parameters further reduces standby power dissipation.\nWe also note that while low power dissipation is a virtue in many applications, increased power can be traded for increased bandwidth.
For instance, the presented circuits could be adapted using heterojunction bipolar transistors in a SiGe process for ultra-high-speed MAP decoding applications in digital communication, using essentially the same FDKM architecture presented here.\n\nAcknowledgement: This work is supported by a grant from The Catalyst Foundation (http://www.catalyst-foundation.org), NSF IIS-0209289, ONR/DARPA N00014-00-C-0315, and ONR N00014-99-1-0612. The chip was fabricated through the MOSIS service.\n\nReferences\n\n[1] Wang, A. and Chandrakasan, A.P., \"Energy-Efficient DSPs for Wireless Sensor Networks,\" IEEE Signal Proc. Mag., vol. 19 (4), pp. 68-78, July 2002.\n\n[2] Vittoz, E.A., \"Low-Power Design: Ways to Approach the Limits,\" Dig. 41st IEEE Int. Solid-State Circuits Conf. (ISSCC), San Francisco, CA, 1994.\n\n[3] Shakiba, M.S., Johns, D.A., and Martin, K.W., \"BiCMOS Circuits for Analog Viterbi Decoders,\" IEEE Trans. Circuits and Systems II, vol. 45 (12), Dec. 1998.\n\n[4] Lazzaro, J., Wawrzynek, J., and Lippmann, R.P., \"A Micropower Analog Circuit Implementation of Hidden Markov Model State Decoding,\" IEEE J. Solid-State Circuits, vol. 32 (8), Aug. 1997.\n\n[5] Chakrabartty, S. and Cauwenberghs, G., \"Forward Decoding Kernel Machines: A Hybrid HMM/SVM Approach to Sequence Recognition,\" IEEE Int. Conf. on Pattern Recognition: SVM Workshop (ICPR'2002), Niagara Falls, 2002.\n\n[6] Bourlard, H. and Morgan, N., Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic, 1994.\n\n[7] Vapnik, V., The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.\n\n[8] Chakrabartty, S. and Cauwenberghs, G., \"Power Dissipation Limits and Large Margin in Wireless Sensors,\" Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'2003), vol. 4, pp. 25-28, May 2003.\n\n[9] Bahl, L.R., Cocke, J., Jelinek, F. and Raviv, J., \"Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,\" IEEE Transactions on Inform.
Theory, vol. IT-20, pp. 284-287, 1974.\n\n[10] Chakrabartty, S. and Cauwenberghs, G., \"Margin Propagation and Forward Decoding in Analog VLSI,\" Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'2004), Vancouver, Canada, May 23-26, 2004.\n\n[11] Jaakkola, T. and Haussler, D., \"Probabilistic Kernel Regression Models,\" Proc. Seventh Int. Workshop on Artificial Intelligence and Statistics, 1999.\n\n[12] Diorio, C., Hasler, P., Minch, B.A. and Mead, C.A., \"A Single-Transistor Silicon Synapse,\" IEEE Trans. Electron Devices, vol. 43 (11), Nov. 1996.\n\n[13] Loeliger, H.-A., Lustenberger, F., Helfenstein, M. and Tarkoy, F., \"Probability Propagation and Decoding in Analog VLSI,\" Proc. IEEE ISIT, 1998.\n", "award": [], "sourceid": 2573, "authors": [{"given_name": "Shantanu", "family_name": "Chakrabartty", "institution": null}, {"given_name": "Gert", "family_name": "Cauwenberghs", "institution": null}]}