{"title": "Compact EEPROM-based Weight Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 1001, "page_last": 1007, "abstract": null, "full_text": "Compact EEPROM-based Weight Functions\n\nA. Kramer, C. K. Sin, R. Chu, and P. K. Ko\nDepartment of Electrical Engineering and Computer Science\nUniversity of California at Berkeley\nBerkeley, CA 94720\n\nAbstract\n\nWe are developing a highly compact neural net weight function based on EEPROM devices. These devices have already proven useful for analog weight storage, but existing designs rely on conventional voltage multiplication as the weight function, which requires additional transistors per synapse. A parasitic capacitance between the floating gate and the drain of the EEPROM structure leads to an unusual I-V characteristic which can be used to advantage in designing a compact synapse. This novel behavior is well characterized by a model we have developed. A single-device circuit results in a one-quadrant synapse function which is nonlinear, though monotonic. A simple extension employing two EEPROMs results in a two-quadrant function which is much more linear. This approach offers the potential for more than a ten-fold increase in the density of neural net implementations.\n\n1 INTRODUCTION - ANALOG WEIGHTING\n\nThe recent surge of interest in neural networks and parallel analog computation has created a need for compact analog computing blocks. Analog weighting is an important computational function of this class. Analog weighting is the combining of two analog values, one of which is typically varying (the input) and one of which is typically fixed (the weight) or at least varying more slowly. 
The varying value is \"weighted\" by the fixed value through the \"weighting function\", typically multiplication. Analog weighting is most interesting when the overall computational task involves computing the \"weighted sum of the inputs\", that is, computing $\\sum_{i=1}^{n} f(w_i, v_i)$, where $f()$ is the weighting function and $w = \\{w_1, w_2, \\ldots, w_n\\}$ and $v = \\{v_1, v_2, \\ldots, v_n\\}$ are the n-dimensional analog-valued weight and input vectors. This weighted sum is simply the dot product in the case where the weighting function is multiplication.\n\nFor large n, the only way to perform this computation efficiently is to use compact weighting functions and to take advantage of current summing. Using \"conductive multiplication\" as the weighting function (weights stored as conductances of single devices) results in an efficient implementation such as that shown in figure 1a. This implementation is probably optimal, but in practice it is not possible to implement small single-device programmable conductances which are linear.\n\nFigure 1: Weighting function implementations: (a) ideal, (b) conventional, (c) EEPROM-based storage, (d) compact EEPROM-based nonlinear weight function\n\n1.1 CONVENTIONAL APPROACHES\n\nThe problem of implementing analog weighting is often divided into the separate tasks of storing the fixed value (the weight) and combining the two analog values through the weighting function (figure 1b). Conventional approaches to storing a fixed analog weight value are to use either digital storage with some form of D/A conversion or volatile analog storage, which requires a large capacitor. 
Both of these storage technologies require a large area.\n\nThe simplest and most widespread weighting function is multiplication [f(w, v) = wv]. Multiplication is attractive because of its mathematical and computational simplicity. Multiplication is also a fairly straightforward operation to implement in analog circuitry. When conventional technologies are used for weight storage, the additional area required to provide a multiplication function is not significant. Of course, the problem with this approach is that since a large area is required for weight storage, the result is not sufficiently compact.\n\n2 EEPROMS\n\nEEPROMs are \"electrically erasable, programmable, read-only memories\". They are essentially a MOSFET with a floating gate and a thin-oxide tunneling region between the floating gate and the drain (figure 2). A sufficiently high field across the tunneling oxide will cause electrons to tunnel into or out of the floating gate, effectively altering the threshold voltage of the device as seen from the top gate. Normal operating (reading) voltages are sufficiently small to cause only insignificant \"disturbance programming\" of the charge on the floating gate, so an EEPROM can be viewed as a compact storage capacitor with a very long storage lifetime.\n\nFigure 2: EEPROM layout and cross section\n\nSeveral groups have found that charge leakage on EEPROMs is sufficiently small to guarantee that the threshold of a device can be retained with 4-8 bits of precision for a period of years [Kramer, 1989][Holler, 1989]. There are several drawbacks to the use of EEPROMs. 
Correct programming of these devices to the desired value is hard to control and requires feedback. While the programming time for a single device is less than a millisecond, devices must be programmed one at a time, so the time to program all the devices on a chip can be prohibitive. In addition, fabrication of EEPROMs is a non-standard process requiring several additional masks and the ability to make a thin tunneling oxide.\n\n2.1 EEPROM-BASED WEIGHT STORAGE\n\nThe most straightforward way to use an EEPROM in a weighting function is to store the weight with the device. For example, the threshold of an EEPROM device could be programmed to produce the desired bias current for an analog amplifier (figure 1c). There are two advantages to this approach. First, the weight storage mechanism is divorced from the actual weight function computation and hence places few constraints on it. Second, if the EEPROM is used in a static mode (all applied voltages are constant), the exact I-V characteristics of the EEPROM device are inconsequential.\n\nThe major disadvantage of this approach is inefficiency, as additional circuitry is needed to perform the weight function computation. An example of this can be seen in a recent EEPROM-based neural net implementation developed by the Intel Corporation [Holler, 1989]. Though the weight value in this implementation is stored on only two EEPROMs, an additional 4 transistors are needed for the multiplication function. 
In addition, though the circuit was designed to perform multiplication, the output is not quite linear under the best of conditions and, under certain conditions, exhibits severe nonlinearity. Despite these limitations, this design demonstrates the advantage of EEPROM storage technology over conventional approaches, as it is the most dense neural network implementation to date.\n\n3 EEPROM I-V CHARACTERISTICS\n\nSince linearity is difficult to implement and not a strict requirement of the weighting function, we have investigated the possibility of using the I-V characteristics of an EEPROM as the weight function. This approach has the advantage that a single device could be used for both weight storage and weight function computation, providing a very compact implementation. It is our hope that this approach will lead to useful synapses of less than 200 um^2 in area, less than a tenth the area used by the Intel synapse.\n\nThough an EEPROM is a MOSFET device, a parasitic capacitance of the structure results in an I-V characteristic which is unique. Conventional use of EEPROM devices in digital circuitry does not make use of this fact, so this effect has not previously been characterized or modeled. The floating gate of an EEPROM is controlled via capacitive coupling by the top gate. In addition, the thin-oxide tunneling region between the floating gate and the drain creates a parasitic capacitor between these two nodes. Though the area of this drain capacitor is small relative to the overlap area between the top gate and the floating gate, the tunneling oxide is much thinner than the insulating oxide between the two gates, resulting in a significant drain capacitance (figure 3). 
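
The argument above, that a small but thin capacitor can rival a larger and thicker one, follows from the parallel-plate relation C = eps * A / t. The following is a minimal sketch of that arithmetic. The areas and oxide thicknesses are hypothetical values chosen only to illustrate the trend; they are not taken from the fabricated devices, and plate_capacitance is a name introduced here.

```python
# Parallel-plate estimate of the two capacitances seen by the floating gate.
# All dimensions are hypothetical and serve only to illustrate the trend.
EPS_SIO2 = 3.9 * 8.854e-12  # permittivity of silicon dioxide, in F/m


def plate_capacitance(area_um2, thickness_nm):
    # C = eps * A / t, with the area given in um^2 and the thickness in nm
    return EPS_SIO2 * (area_um2 * 1e-12) / (thickness_nm * 1e-9)


# Assumed: large top-gate overlap across a thick inter-gate oxide
c_gate = plate_capacitance(area_um2=20.0, thickness_nm=50.0)
# Assumed: small tunneling window across a thin tunneling oxide
c_drain = plate_capacitance(area_um2=2.0, thickness_nm=10.0)

ratio = c_drain / c_gate
print('Cd / Cg = %.2f' % ratio)
```

With these assumed numbers the drain capacitor has one tenth of the area of the gate capacitor but half of its capacitance, because its oxide is five times thinner.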
\n\nWe have developed a model for an EEPROM which includes this parasitic drain capacitance (figure 3). The basic contribution of this capacitance is to couple the floating-gate voltage to the drain voltage. This is most obvious when the device is saturated: while the current through a standard MOSFET is to first order independent of drain voltage in this region, the current through an EEPROM has a square-law dependence on the drain voltage (equation 3). While this artifact of EEPROMs makes them behave poorly as current sources, it may make them more useful as single-device weighting functions.\n\nFigure 3: EEPROM model and capacitor areas\n\nThere are several ways to analyze our model depending on the level of accuracy desired [Sin, 1990]. We present here the results of the simplest of these which captures the essential behavior of an EEPROM. This analysis is based on a linear channel approximation, and the equations which result are similar in form to those for a normal MOSFET, with the addition of the dependence between the floating gate voltage and the drain voltage and all capacitive coupling factors. 
The equations for drain saturation voltage ($V_{dssat}$), nonsaturated drain current ($I_{dslin}$) and saturated drain current ($I_{dssat}$) are:\n\n$V_{dssat} = [C_g V_g - V_t (C_{ox} + C_g + C_d)] / (0.5 C_{ox} + C_g)$ (1)\n\n$I_{dslin} = K_p [(C_g V_g / (C_{ox} + C_g + C_d) - V_t) V_{ds} - (V_{ds}^2 / 2) (C_g - C_d) / (C_{ox} + C_g + C_d)]$ (2)\n\n$I_{dssat} = K_p [C_g V_g + C_d V_{ds} - V_t (C_{ox} + C_g + C_d)]^2 / (0.5 C_{ox} + C_g + C_d)$ (3)\n\nOn EEPROM devices we have fabricated in house, our model matches measured I-V data well, especially in capturing the dependence of saturated drain current on drain voltage (figure 4).\n\nFigure 4: EEPROM I-V, measured and simulated (Ids in uA versus Vds in V, for several values of Vg).\n\n4 EEPROM-BASED WEIGHTING FUNCTIONS\n\nOne way to make a compact weight function using an EEPROM is to use the device I-V characteristics directly. This could be accomplished by storing the weight as the device threshold voltage (Vt), applying the input value as the drain-source voltage (Vds) and setting the top gate voltage to a constant reference value (figure 1d). In this case the synapse would look exactly like the I-V measuring circuit and the weighting function would be exactly the EEPROM I-V shown in figure 4, except that rather than leaving the threshold voltage fixed and varying the gate voltage, as was done to generate the curves shown, the gate voltage would be fixed to a constant value and different curves would be generated by programming the device threshold to different values. 
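
The single-device scheme described above can be sketched numerically from equations 1 and 2. This is a minimal illustration and not the authors' simulator: the function names, the normalized capacitance values, KP = 1 and the 2.5 V reference are all assumptions, and only the nonsaturated region (0 <= Vds <= Vdssat) is covered.

```python
# Single-device EEPROM synapse in the nonsaturated region (equations 1 and 2).
# Capacitances are normalized and hypothetical; KP and VREF are assumed values.
COX, CG, CD = 1.0, 1.0, 0.2
CT = COX + CG + CD  # sum of the three capacitances in the model
KP = 1.0            # assumed scaling factor
VREF = 2.5          # assumed constant top-gate voltage, in volts


def vds_sat(vt, vg=VREF):
    # Equation 1: drain saturation voltage
    return (CG * vg - vt * CT) / (0.5 * COX + CG)


def ids_lin(vds, vt, vg=VREF):
    # Equation 2: nonsaturated drain current, valid for 0 <= vds <= vds_sat
    if not 0.0 <= vds <= vds_sat(vt, vg):
        raise ValueError('vds is outside the nonsaturated region')
    return KP * ((CG * vg / CT - vt) * vds
                 - 0.5 * vds ** 2 * (CG - CD) / CT)


# The weight is the programmed threshold; the input is the drain voltage.
for vt in (0.0, 0.3, 0.6):
    currents = ['%.3f' % ids_lin(vds, vt) for vds in (0.0, 0.2, 0.4)]
    print('Vt = %.1f V:' % vt, currents)
```

In this sketch a lower programmed threshold gives a larger current at the same drain voltage, which is how the stored threshold acts as the weight.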
\n\nWhile extremely compact (a single device), this function is only a one-quadrant function (both weight and input values must be positive or the output is zero), and for many applications this is not sufficient. An easy way to provide a two-quadrant function based on a similar approach is to use two EEPROMs configured in a common-input, differential-output (Iout = Ids+ - Ids-) scheme, as in the circuit depicted in figure 5. By programming the EEPROMs so that one is always active and one is always inactive, the output of the weight function can now be a \"positive\" or a \"negative\" current, depending on which device is chosen. Again, the weighting function is exactly the EEPROM I-V in this case.\n\nIn addition to providing a two-quadrant function, this two-device circuit offers another interesting possibility. The same differential output scheme can be made to provide a much more linear two-quadrant function if both \"positive\" and \"negative\" devices are programmed to be active (negative thresholds). The \"weight\" in this case is the difference in threshold values between the two devices (W = Vt- - Vt+). This scheme \"subtracts\" one device curve from the other. The model we have developed indicates that this has the effect of canceling out much of the nonlinearity and results in a function which has three distinct regions, two of which are linear in the input voltage and the weight value.\n\nFigure 5: 2-quadrant, 2-EEPROM weighting function, measured (Iout = Ids+ - Ids- in uA versus Vin in V, with Vds = Vin and Vref = 2.5 V, for weights W from -3 to 3). 
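
The cancellation claimed for the two-device scheme can be checked directly for the case where both devices are nonsaturated and follow equation 2. The sketch below is illustrative only: COX, CG, CD, KP and VREF are hypothetical normalized values, ids_lin and iout are names introduced here, and the inputs are kept small enough that neither device saturates.

```python
# Two-EEPROM differential synapse with both devices nonsaturated (equation 2).
# All parameter values are hypothetical and normalized.
COX, CG, CD = 1.0, 1.0, 0.2
CT = COX + CG + CD
KP, VREF = 1.0, 2.5  # assumed scaling factor and top-gate reference voltage


def ids_lin(vds, vt, vg=VREF):
    # Equation 2: nonsaturated drain current of one device
    return KP * ((CG * vg / CT - vt) * vds
                 - 0.5 * vds ** 2 * (CG - CD) / CT)


def iout(vin, vt_plus, vt_minus):
    # Common input on both drains, differential output Ids+ - Ids-
    return ids_lin(vin, vt_plus) - ids_lin(vin, vt_minus)


# Negative thresholds keep both devices active.
for w in (-0.3, 0.0, 0.3):
    vt_plus, vt_minus = -0.5 - w / 2, -0.5 + w / 2  # so that W = Vt- - Vt+
    for vin in (0.25, 0.5, 1.0):
        product = KP * vin * (vt_minus - vt_plus)
        print('W = %+.1f  Vin = %.2f  Iout = %+.4f  Kp*Vin*W = %+.4f'
              % (w, vin, iout(vin, vt_plus, vt_minus), product))
```

The gate-voltage term and the quadratic Vds term are identical in both devices, so they drop out of the difference and only the product of the input and the threshold difference remains.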
\n\nThe first of these linear regions occurs when both devices are active and neither is saturated (both devices modeled by equation 2). In this case, subtracting Ids- from Ids+ cancels all nonlinearities and the differential is exactly the product of the input value (Vds) and the weight (Vt- - Vt+), with a scaling factor of Kp:\n\n$I_{out} = I_{ds+} - I_{ds-} = K_p V_{ds} (V_{t-} - V_{t+})$ (4)\n\nThe other linear region occurs when both devices are saturated (both modeled by equation 3). All nonlinearities also cancel in this case, but there is an offset remaining and the scaling factor is modified:\n\n$I_{out} = K_p [C_d / (0.5 C_{ox} + C_g + C_d)] V_{ds} (V_{t-} - V_{t+}) + K_p (V_{t-} - V_{t+}) [C_g V_g / (0.5 C_{ox} + C_g + C_d) - (V_{t+} + V_{t-})]$ (5)\n\nWe have fabricated structures of this type and have measured as well as simulated their function characteristics. Measured data again agreed with our model (figure 5). Note that the slope in this last region [scaling factor of $K_p C_d / (0.5 C_{ox} + C_g + C_d)$] will be strictly less than in the first region [scaling factor $K_p$]. The model indicates that one way to minimize this difference in slopes is to increase the size of the parasitic drain capacitance (Cd) relative to the gate capacitance (Cg).\n\n5 CONCLUSIONS\n\nWhile EEPROM devices have already proven useful for nonvolatile analog storage, we have discovered and characterized novel functional characteristics of the EEPROM device which should make them useful as analog weighting functions. A parasitic drain-floating gate capacitance has been included in a model which accurately captures this behavior. 
Several compact nonlinear EEPROM-based weight functions have been proposed, including a single-device one-quadrant function and a more linear two-device two-quadrant function. Problems such as the usability of nonlinear weighting functions, selection of optimal EEPROM device parameters and potential fanout limitations of feeding the input into a low impedance node (drain) must all be resolved before this technology can be used for a full-blown implementation. Our model will be helpful in this work. The approach of using inherent device characteristics to build highly compact weighting functions promises to greatly improve the density and efficiency of massively parallel analog computation such as that performed by neural networks.\n\nAcknowledgements\n\nResearch sponsored by the Air Force Office of Scientific Research (AFOSR/JSEP) under Contract Number F49620-90-C-0029.\n\nReferences\n\nM. Holler, et al., (1989) \"An Electrically Trainable Artificial Neural Network (ETANN) with 10240 'Floating Gate' Synapses,\" Proceedings of the IJCNN-89, Washington D. C., 1989.\n\nA. Kramer, et al., (1989) \"EEPROM Device as a Reconfigurable Analog Element for Neural Networks,\" 1989 IEDM Technical Digest, Beaver Press, Alexandria, VA, Dec. 1989.\n\nC. K. Sin, (1990) EEPROM as an Analog Storage Element, Master's Thesis, Dept. of EECS, University of California at Berkeley, Berkeley, CA, Sept. 1990.\n", "award": [], "sourceid": 426, "authors": [{"given_name": "A.", "family_name": "Kramer", "institution": null}, {"given_name": "C.", "family_name": "Sin", "institution": null}, {"given_name": "R.", "family_name": "Chu", "institution": null}, {"given_name": "P.", "family_name": "Ko", "institution": null}]}