{"title": "Noisy Neural Networks and Generalizations", "book": "Advances in Neural Information Processing Systems", "page_first": 335, "page_last": 341, "abstract": null, "full_text": "Noisy Neural Networks and Generalizations \n\nHava T. Siegelmann \nIndustrial Eng. and Management, Mathematics \nTechnion - IIT \nHaifa 32000, Israel \niehava@ie.technion.ac.il \n\nAlexander Roitershtein \nMathematics \nTechnion - IIT \nHaifa 32000, Israel \nroiterst@math.technion.ac.il \n\nAsa Ben-Hur \nIndustrial Eng. and Management \nTechnion - IIT \nHaifa 32000, Israel \nasa@tx.technion.ac.il \n\nAbstract \n\nIn this paper we define a probabilistic computational model which \ngeneralizes many noisy neural network models, including the recent \nwork of Maass and Sontag [5]. We identify weak ergodicity as the \nmechanism responsible for restricting the computational power \nof probabilistic models to definite languages, independent of the \ncharacteristics of the noise: whether it is discrete or analog, or if \nit depends on the input or not, and independent of whether the \nvariables are discrete or continuous. We give examples of weakly \nergodic models including noisy computational systems with noise \ndepending on the current state and inputs, aggregate models, and \ncomputational systems which update in continuous time. \n\n1  Introduction \n\nNoisy neural networks were recently examined, e.g., in [1, 4, 5]. It was shown in [5] \nthat Gaussian-like noise reduces the power of analog recurrent neural networks to \nthe class of definite languages, which are a strict subset of regular languages. Let \n$\\Sigma$ be an arbitrary alphabet. A language $L \\subseteq \\Sigma^*$ is called a definite language if for some integer \n$r$ any two words coinciding on the last $r$ symbols are either both in $L$ or neither in \n$L$.  
The ability of a computational system to recognize only definite languages can \nbe interpreted as saying that the system forgets all its input signals, except for the \nmost recent ones. This property is reminiscent of human short-term memory. \n\n\"Definite probabilistic computational models\" have their roots in Rabin's pioneering \nwork on probabilistic automata [9]. He identified a condition on probabilistic \nautomata with a finite state space which restricts them to definite languages. Paz \n[8] generalized Rabin's condition, applying it to automata with a countable state \nspace, and calling it weak ergodicity [7, 8]. In their ground-breaking paper [5], \nMaass and Sontag extended the principle leading to definite languages to a finite \ninterconnection of continuous-valued neurons. They proved that in the presence \nof \"analog noise\" (e.g., Gaussian), recurrent neural networks are limited in their \ncomputational power to definite languages. Under a different noise model, Maass \nand Orponen [4] and Casey [1] showed that such neural networks are reduced in \ntheir power to regular languages. \n\nIn this paper we generalize the condition of weak ergodicity, making it applicable \nto numerous probabilistic computational machines. In our general probabilistic \nmodel, the state space can be arbitrary: it is not constrained to be a finite or \ninfinite set, to be a discrete or non-discrete subset of some Euclidean space, or \neven to be a metric or topological space. The input alphabet is arbitrary as well \n(e.g., bits, rationals, reals, etc.).  
The stochasticity is not necessarily defined via a \ntransition probability function (TPF) as in all the aforementioned probabilistic and \nnoisy models, but through the more general Markov operators acting on measures. \nOur Markov Computational Systems (MCS's) include as special cases Rabin's actual \nprobabilistic automata with cut-point [9], the quasi-definite automata by Paz \n[8], and the noisy analog neural network by Maass and Sontag [5]. Interestingly, \nour model also includes: analog dynamical systems and neural models, which have \nno underlying deterministic rule but rather update probabilistically by using finite \nmemory; neural networks with an unbounded number of components; networks of \nvariable dimension (e.g., \"recruiting networks\"); hybrid systems that combine discrete \nand continuous variables; stochastic cellular automata; and stochastic coupled \nmap lattices. \n\nWe prove that all weakly ergodic Markov systems are stable, i.e., are robust with \nrespect to architectural imprecisions and environmental noise. This property is desirable \nfor both biological and artificial neural networks. This robustness was known \nup to now only for the classical discrete probabilistic automata [8, 9]. To make it \npractical and easy to decide weak ergodicity for given systems, we provide two \nconditions on the transition probability functions under which the associated computational \nsystem becomes weakly ergodic. One condition is based on a version \nof Doeblin's condition [5] while the second is motivated by the theory of scrambling \nmatrices [7, 8]. In addition we construct various examples of weakly ergodic \nsystems which include synchronous or asynchronous computational systems, and \nhybrid continuous and discrete time systems. 
\n\n2  Markov Computational System (MCS) \n\nInstead of describing various types of noisy neural network models or stochastic \ndynamical systems we define a general abstract probabilistic model. When dealing \nwith systems containing inherent elements of uncertainty (e.g., noise) we abandon \nthe study of individual trajectories in favor of an examination of the flow of state \ndistributions. The noise models we consider are homogeneous in time, in that they \nmay depend on the input, but do not depend on time. The dynamics we consider \nare defined by operators acting in the space of measures, called Markov \noperators [6]. In the following we define the concepts which are required for such \nan approach. \n\nLet $\\Sigma$ be an arbitrary alphabet and $\\Omega$ be an abstract state space. We assume that \na $\\sigma$-algebra $\\mathcal{B}$ (not necessarily Borel sets) of subsets of $\\Omega$ is given, thus $(\\Omega, \\mathcal{B})$ is a \nmeasurable space. Let us denote by $\\mathcal{P}$ the set of probability measures on $(\\Omega, \\mathcal{B})$. \nThis set is called a distribution space. \n\nLet $\\mathcal{E}$ be a space of finite measures on $(\\Omega, \\mathcal{B})$ with the total variation norm defined by \n\n$$\\|\\mu\\|_1 = |\\mu|(\\Omega) = \\sup_{A \\in \\mathcal{B}} \\mu(A) - \\inf_{A \\in \\mathcal{B}} \\mu(A). \\qquad (1)$$ \n\nDenote by $\\mathcal{L}$ the set of all bounded linear operators acting from $\\mathcal{E}$ to itself. The \n$\\|\\cdot\\|_1$-norm on $\\mathcal{E}$ induces a norm $\\|P\\|_1 = \\sup_{\\mu \\in \\mathcal{P}} \\|P\\mu\\|_1$ in $\\mathcal{L}$. An operator $P \\in \\mathcal{L}$ \nis said to be a Markov operator if for any probability measure $\\mu \\in \\mathcal{P}$, the image $P\\mu$ \nis again a probability measure. For a Markov operator, $\\|P\\|_1 = 1$. \n\nDefinition 2.1  A Markov system is a set of Markov operators $T = \\{P_u : u \\in \\Sigma\\}$. \n\nWith any Markov system $T$, one can associate a probabilistic computational system.  
If the probability distribution on the initial states is given by the probability \nmeasure $\\mu_0$, then the distribution of states after $n$ computational steps on inputs \n$w = w_0, w_1, \\ldots, w_n$ is defined as in [5, 8] \n\n$$P_w \\mu_0(A) = P_{w_n} \\cdots P_{w_1} P_{w_0} \\mu_0(A). \\qquad (2)$$ \n\nLet $\\mathcal{A}$ and $\\mathcal{R}$ be two subsets of $\\mathcal{P}$ with the property of having a $\\rho$-gap \n\n$$\\mathrm{dist}(\\mathcal{A}, \\mathcal{R}) = \\inf_{\\mu \\in \\mathcal{A}, \\nu \\in \\mathcal{R}} \\|\\mu - \\nu\\|_1 = \\rho > 0. \\qquad (3)$$ \n\nThe first set is called a set of accepting distributions and the second is called a set \nof rejecting distributions. A language $L \\subseteq \\Sigma^*$ is said to be recognized by the Markov \ncomputational system $M = (\\mathcal{E}, \\mathcal{A}, \\mathcal{R}, \\Sigma, \\mu_0, T)$ if \n\n$$w \\in L \\Leftrightarrow P_w \\mu_0 \\in \\mathcal{A}, \\qquad w \\notin L \\Leftrightarrow P_w \\mu_0 \\in \\mathcal{R}.$$ \n\nThis model of language recognition with a gap between accepting and rejecting \nspaces agrees with Rabin's model of probabilistic automata with isolated cut-point \n[9] and the model of analog probabilistic computation [4, 5]. \n\nAn example of a Markov system is a system of operators defined by TPF on $(\\Omega, \\mathcal{B})$. \nLet $P_u(x, A)$ be the probability of moving from a state $x$ to the set of states $A$ upon \nreceiving the input signal $u \\in \\Sigma$. The function $P_u(x, \\cdot)$ is a probability measure for \nall $x \\in \\Omega$ and $P_u(\\cdot, A)$ is a measurable function of $x$ for any $A \\in \\mathcal{B}$. In this case, \nthe operators $P_u\\mu(A)$ are defined by \n\n$$P_u \\mu(A) = \\int_{\\Omega} P_u(x, A) \\, \\mu(dx). \\qquad (4)$$ \n\n3  Weakly Ergodic MCS \n\nLet $P \\in \\mathcal{L}$ be a Markov operator. The real number $\\bar{\\delta}(P) = 1 - \\frac{1}{2} \\sup_{\\mu, \\nu \\in \\mathcal{P}} \\|P\\mu - P\\nu\\|_1$ \nis called the ergodicity coefficient of the Markov operator. We denote \n$\\delta(P) = 1 - \\bar{\\delta}(P)$. It can be proven that for any two Markov operators $P_1, P_2$, \n$\\delta(P_1 P_2) \\leq \\delta(P_1)\\delta(P_2)$. The ergodicity coefficient was introduced by Dobrushin [2] \nfor the particular case of Markov operators induced by TPF $P(x, A)$.  
In this special \ncase $\\bar{\\delta}(P) = 1 - \\sup_{x,y} \\sup_{A} |P(x, A) - P(y, A)|$. \n\nWeakly ergodic systems were introduced and studied by Paz in the particular case \nof a denumerable state space $\\Omega$, where Markov operators are represented by infinite \ndimensional matrices. The following definition makes no assumption on the \nassociated measurable space. \n\nDefinition 3.1  A Markov system $\\{P_u, u \\in \\Sigma\\}$ is called weakly ergodic if for any \n$\\alpha > 0$, there is an integer $r = r(\\alpha)$ such that for any $w \\in \\Sigma^{\\geq r}$ and any $\\mu, \\nu \\in \\mathcal{P}$, \n\n$$\\delta(P_w) = \\frac{1}{2} \\|P_w \\mu - P_w \\nu\\|_1 \\leq \\alpha. \\qquad (5)$$ \n\nAn MCS $M$ is called weakly ergodic if its associated Markov system $\\{P_u, u \\in \\Sigma\\}$ \nis weakly ergodic. \n\u2022 \n\nAn MCS $M$ is weakly ergodic if and only if there is an integer $r$ and a real number \n$\\alpha < 1$, such that $\\|P_w \\mu - P_w \\nu\\|_1 \\leq \\alpha$ for any word $w$ of length $r$. Our most general \ncharacterization of weak ergodicity is as follows [11]: \n\nTheorem 1  An abstract MCS $M$ is weakly ergodic if and only if there exists \na multiplicative operator norm $\\|\\cdot\\|_{**}$ on $\\mathcal{L}$ equivalent to the norm $\\|P\\|_* = \\sup_{\\{\\mu : \\mu(\\Omega) = 0\\}} \\frac{\\|P\\mu\\|_1}{\\|\\mu\\|_1}$, \nand such that $\\sup_{u \\in \\Sigma} \\|P_u\\|_{**} \\leq \\epsilon$ for some number $\\epsilon < 1$. \n\u2022 \n\nThe next theorem connects the computational power of weakly ergodic MCS's with \nthe class of definite languages, generalizing the results by Rabin [9], Paz [8, p. 175], \nand Maass and Sontag [5]. \n\nTheorem 2  Let $M$ be a weakly ergodic MCS. If a language $L$ can be recognized by \n$M$, then it is definite. \n\u2022 \n\n4  The Stability Theorem of Weakly Ergodic MCS \n\nAn important issue for any computational system is whether the machine is robust \nwith respect to small perturbations of the system's parameters or under some external \nnoise.  
The stability of language recognition by weakly ergodic MCS's under \nperturbations of their Markov operators was previously considered by Rabin [9] and \nPaz [7, 8]. We next state a general version of the stability theorem that is applicable \nto our wide notion of weakly ergodic systems. \n\nWe first define two MCS's $M$ and $\\tilde{M}$ to be similar if they share the same measurable \nspace $(\\Omega, \\mathcal{B})$, alphabet $\\Sigma$, and sets $\\mathcal{A}$ and $\\mathcal{R}$, and if they differ only by their \nassociated Markov operators. \n\nTheorem 3  Let $M$ and $\\tilde{M}$ be two similar MCS's such that the first is weakly \nergodic. Then there is $\\alpha > 0$, such that if $\\|P_u - \\tilde{P}_u\\|_1 \\leq \\alpha$ for all $u \\in \\Sigma$, then \nthe second is also weakly ergodic. Moreover, these two MCS's recognize exactly the \nsame class of languages. \n\u2022 \n\nCorollary 3.1  Let $M$ and $\\tilde{M}$ be two similar MCS's. Suppose that the first is \nweakly ergodic. Then there exists $\\beta > 0$, such that if $\\sup_{A \\in \\mathcal{B}} |P_u(x, A) - \\tilde{P}_u(x, A)| \\leq \\beta$ \nfor all $u \\in \\Sigma$, $x \\in \\Omega$, the second is also weakly ergodic. Moreover, these two MCS's \nrecognize exactly the same class of languages. \n\u2022 \n\nA mathematically deeper result which implies Theorem 3 was proven in [11]: \n\nTheorem 4  Let $M$ and $\\tilde{M}$ be two similar MCS's, such that the first is weakly \nergodic and the second is arbitrary. Then, for any $\\alpha > 0$ there exists $\\epsilon > 0$ such \nthat $\\|P_u - \\tilde{P}_u\\|_1 \\leq \\epsilon$ for all $u \\in \\Sigma$ implies $\\|P_w - \\tilde{P}_w\\|_1 \\leq \\alpha$ for all words $w \\in \\Sigma^*$. \n\u2022 \n\nTheorem 3 follows from Theorem 4. To see this, one can choose any $\\alpha < \\rho$ in Theorem \n4 and observe that $\\|P_w - \\tilde{P}_w\\|_1 \\leq \\alpha < \\rho$ implies that the word $w$ is accepted or \nrejected by $\\tilde{M}$ in accordance with whether it is accepted or rejected by $M$. 
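\n\nThe step from Theorem 4 to Theorem 3 can be spelled out with the triangle inequality. The following is only a sketch, not the argument of [11]; it assumes that $\\mu_0$ is a probability measure, writes $\\tilde{P}_w$ for the operator that the perturbed system associates with the word $w$, writes $\\mathcal{A}$ and $\\mathcal{R}$ for the accepting and rejecting sets, and uses the $\\rho$-gap of (3). Suppose $w \\in L$, so that $P_w\\mu_0 \\in \\mathcal{A}$. Then for every $\\nu \\in \\mathcal{R}$ \n\n$$\\|\\tilde{P}_w\\mu_0 - \\nu\\|_1 \\geq \\|P_w\\mu_0 - \\nu\\|_1 - \\|P_w\\mu_0 - \\tilde{P}_w\\mu_0\\|_1 \\geq \\rho - \\|P_w - \\tilde{P}_w\\|_1 \\geq \\rho - \\alpha > 0.$$ \n\nHence the perturbed distribution stays within distance $\\alpha$ of an accepting distribution and at distance at least $\\rho - \\alpha$ from every rejecting one. The case $w \\notin L$ is symmetric, with the roles of $\\mathcal{A}$ and $\\mathcal{R}$ exchanged. 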
\n\n5  Conditions on the Transition Probabilities \n\nThis section discusses practical conditions for weakly ergodic MCS's in which the \nMarkov operators $P_u$ are induced by transition probability functions as in (4). \nClearly, a simple sufficient condition for an MCS to be weakly ergodic is given \nby $\\sup_{u \\in \\Sigma} \\delta(P_u) \\leq 1 - c$, for some $c > 0$. \n\nMaass and Sontag used Doeblin's condition to bound the computational power of \nnoisy neural networks [5]. Although the networks in [5] constitute a very particular \ncase of weakly ergodic MCS's, Doeblin's condition is applicable also to our general \nmodel. The following version of Doeblin's condition was given by Doob [3]: \n\nDefinition 5.1  [3] Let $P(x, A)$ be a TPF on $(\\Omega, \\mathcal{B})$. We say that it satisfies Doeblin's \ncondition, $D_0^n$, if there exists a constant $c$ and a probability measure $\\rho$ on $(\\Omega, \\mathcal{B})$ \nsuch that $P^n(x, A) \\geq c\\rho(A)$ for any set $A \\in \\mathcal{B}$. \n\u2022 \n\nIf an MCS $M$ is weakly ergodic, then all its associated TPF $P_w(x, A)$, $w \\in \\Sigma$, must \nsatisfy $D_0^n$ for some $n = n(w)$. Doob proved [3, p. 197] that if $P(x, A)$ satisfies \nDoeblin's condition $D_0^1$ with constant $c$, then for any $\\mu, \\nu \\in \\mathcal{P}$, $\\|P\\mu - P\\nu\\|_1 \\leq (1 - c)\\|\\mu - \\nu\\|_1$, \ni.e., $\\delta(P) \\leq 1 - c$. This leads us to the following definition. \n\nDefinition 5.2  Let $M$ be an MCS. We say that the space $\\Omega$ is small with respect \nto $M$ if there exists an $m > 0$ such that all associated TPF $P_w(x, A)$, $w \\in \\Sigma^m$, \nsatisfy Doeblin's condition $D_0^1$ uniformly with the same constant $c$, i.e., $P_w(x, A) \\geq c\\rho_w(A)$, $w \\in \\Sigma^m$. \n\u2022 \n\nThe following theorem strengthens the result by Maass and Sontag [5]. \n\nTheorem 5  Let $M$ be an MCS. If the space $\\Omega$ is small with respect to $M$, then \n$M$ is weakly ergodic, and it can recognize only definite languages. \n\u2022 
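\n\nTheorem 5 can be illustrated on a toy system. The numbers below are made up for illustration and do not come from the paper. Take a two-state space $\\Omega = \\{1, 2\\}$, the alphabet $\\Sigma = \\{0, 1\\}$, and transition matrices whose rows are the measures $P_u(x, \\cdot)$: \n\n$$P_0 = \\begin{pmatrix} 0.9 & 0.1 \\\\ 0.2 & 0.8 \\end{pmatrix}, \\qquad P_1 = \\begin{pmatrix} 0.6 & 0.4 \\\\ 0.3 & 0.7 \\end{pmatrix}.$$ \n\nThe column minima of $P_0$ are $(0.2, 0.1)$, so $P_0(x, A) \\geq 0.3\\,\\rho_0(A)$ with $\\rho_0 = (2/3, 1/3)$; the column minima of $P_1$ are $(0.3, 0.4)$, so $P_1(x, A) \\geq 0.7\\,\\rho_1(A)$ with $\\rho_1 = (3/7, 4/7)$. Both letters therefore satisfy Doeblin's condition with the common constant $c = 0.3$, and $\\Omega$ is small with $m = 1$. Writing $\\delta(P) = \\sup_{x,y} \\sup_{A} |P(x, A) - P(y, A)|$ for a TPF, \n\n$$\\delta(P_0) = |0.9 - 0.2| = 0.7, \\qquad \\delta(P_1) = |0.6 - 0.3| = 0.3, \\qquad \\sup_{u \\in \\Sigma} \\delta(P_u) = 0.7 = 1 - c.$$ \n\nBy submultiplicativity, $\\delta(P_w) \\leq 0.7^{|w|}$ for every word $w$. For instance $0.7^{9} \\approx 0.040 \\leq 0.05$, so with $\\alpha = 0.05$ one may take $r = 9$ in Definition 3.1: after nine or more letters the state distribution of this toy system is almost independent of where it started, which is the forgetting behind definite languages. 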
\n\nThis theorem provides a convenient method for checking weak ergodicity of a given \nTPF. The theorem implies that it is sufficient to execute the following simple check: \nchoose any integer $n$, and then verify that for every state $x$ and all input strings \n$w \\in \\Sigma^n$, the \"absolutely continuous\" part of all TPF $P_w$, $w \\in \\Sigma^n$, is uniformly \nbounded from below: \n\n(6) \n\nwhere $p_w(x, y)$ is the density of the absolutely continuous component of $P_w(x, \\cdot)$ \nwith respect to $\\psi_w$, and $c_1, c_2$ are positive numbers. \n\nMost practical systems can be defined by null preserving TPF (including for example \nthe systems in [5]). For these systems we provide (Theorem 6) a necessary and sufficient \ncondition in terms of density kernels. A TPF $P_u(x, A)$, $u \\in \\Sigma$, is called null preserving \nwith respect to a probability measure $\\rho \\in \\mathcal{P}$ if it has a density with respect \nto $\\rho$, i.e., $P_u(x, A) = \\int_A p_u(x, z) \\rho(dz)$. It is not hard to see that the property of null \npreserving per letter $u \\in \\Sigma$ implies that all TPF $P_w(x, A)$ of words $w \\in \\Sigma^*$ are null \npreserving as well. In this case $\\delta(P_u) = 1 - \\inf_{x,y} \\int_{\\Omega} \\min\\{p_u(x, z), p_u(y, z)\\} \\rho(dz)$ \nand we have: \n\nTheorem 6  Let $M$ be an MCS defined by null preserving transition probability \nfunctions $P_u$, $u \\in \\Sigma$. Then, $M$ is weakly ergodic if and only if there exists $n$ such \nthat $\\inf_{w \\in \\Sigma^n} \\inf_{x,y} \\int_{\\Omega} \\min\\{p_w(x, z), p_w(y, z)\\} \\rho(dz) > 0$. \n\u2022 \n\nA similar result was previously established by Paz [7, 8] for the case of a denumerable \nstate space $\\Omega$. This theorem allows us to treat examples which are not covered by \nTheorem 5.  
For example, suppose that the space $\\Omega$ is not small with respect to an \nMCS $M$, but for some $n$ and any $w \\in \\Sigma^n$ there exists a measure $\\psi_w$ on $(\\Omega, \\mathcal{B})$ with \nthe property that for any couple of states $x, y \\in \\Omega$ \n\n$$\\psi_w(\\{z : \\min\\{p_w(x, z), p_w(y, z)\\} \\geq c_1\\}) \\geq c_2, \\qquad (7)$$ \n\nwhere $p_w(x, y)$ is the density of $P_w(x, \\cdot)$ with respect to $\\psi_w$, and $c_1, c_2$ are positive \nnumbers. This condition may occur even if there is no $y$ such that $p_u(x, y) \\geq c_1$ for \nall $x \\in \\Omega$. \n\n6  Examples of Weakly Ergodic Systems \n\n1.  The Synchronous Parallel Model \n\nLet $(\\Omega_i, \\mathcal{B}_i)$, $i = 1, 2, \\ldots, N$, be a collection of measurable spaces. Define $\\Omega^i = \\prod_{j \\neq i} \\Omega_j$ \nand $\\mathcal{B}^i = \\prod_{j \\neq i} \\mathcal{B}_j$. Then $(\\Omega^i, \\mathcal{B}^i)$ are measurable spaces. Define also $\\Sigma_i = \\Sigma \\times \\Omega^i$, \nand let $T_i = \\{P^i_{x^i,u}(x_i, A_i) : (x^i, u) \\in \\Sigma_i\\}$ be given stochastic kernels. Each set $T_i$ \ndefines an MCS $M_i$. We can define an aggregate MCS by setting $\\Omega = \\prod_i \\Omega_i$, \n$\\mathcal{B} = \\prod_i \\mathcal{B}_i$, $S = \\prod_i S_i$, $R = \\prod_i R_i$, and \n\n(8) \n\nThis describes a model of $N$ noisy computational systems that update in synchronous \nparallelism. The state of the whole aggregate is a vector of states of the \nindividual components, and each receives the states of all other components as part \nof its input. \n\nTheorem 7  [12] Let $M$ be an MCS defined by equation (8). It is weakly ergodic if \nat least one set of operators $T_i$ is such that $\\delta(P^i_{x^i,u}) \\leq 1 - c$ for any $u \\in \\Sigma$, $x^i \\in \\Omega^i$ \nand some positive number $c$. \n\u2022 \n\n2.  The Asynchronous Parallel Model \n\nIn this model, at every step only one component is activated. Suppose that a collection \nof $N$ similar MCS's $M_i$, $i = 1, \\ldots, N$, is given. Consider a probability measure \n$e = \\{e_1, \\ldots, e_N\\}$ on the set $K = \\{1, \\ldots, N\\}$. Assume that in each computational \nstep only one MCS is activated.  
The current state of the whole aggregate is represented \nby the state of its active component. Assume also that the probability of \na computational system $M_i$ to be activated is time-independent and is given by \n$\\mathrm{Prob}(M_i) = e_i$. The aggregate system is then described by stochastic kernels \n\n$$P_u(x, A) = \\sum_{i=1}^{N} e_i P_u^i(x, A). \\qquad (9)$$ \n\nTheorem 8  [12] Let $M$ be an MCS defined by formula (9). It is weakly ergodic if \nat least one set of operators $\\{P_u^1\\}, \\ldots, \\{P_u^N\\}$ is weakly ergodic. \n\u2022 \n\n3.  Hybrid Weakly Ergodic Systems \n\nWe now present a hybrid weakly ergodic computational system consisting of both \ncontinuous and discrete elements. The evolution of the system is governed by a \ndifferential equation, while its input arrives at discrete times. Let $\\Omega = \\mathbb{R}^n$, and \nconsider a collection of differential equations \n\n$$\\dot{x}_u(s) = \\psi_u(x_u(s)), \\quad u \\in \\Sigma, \\quad s \\in [0, \\infty). \\qquad (10)$$ \n\nSuppose that $\\psi_u(x)$ is sufficiently smooth to ensure the existence and uniqueness of \nsolutions of Equation (10) for $s \\in [0, 1]$ and for any initial condition. \n\nConsider a computational system which receives an input $u(t)$ at discrete times \n$t_0, t_1, t_2, \\ldots$. In the interval $t \\in [t_i, t_{i+1})$ the behavior of the system is described by \nEquation (10), where $s = t - t_i$. A random initial condition for the time $t_n$ is defined \nby \n\n(11) \n\nwhere $x_{u(t_{n-1})}(1)$ is the state of the system after previously completed computations, \nand $P_u(x, A)$, $u \\in \\Sigma$, is a family of stochastic kernels on $\\Omega \\times \\mathcal{B}$. This describes a system \nwhich receives inputs at discrete instants of time; the input letters $u \\in \\Sigma$ cause \nrandom perturbations of the state $x_{u(t-1)}(1)$ governed by the transition probability \nfunctions $P_{u(t)}(x_{u(t-1)}, A)$.  
At all other times the system is a noise-free continuous \ncomputational system which evolves according to Equation (10). \n\nLet $\\Omega = \\mathbb{R}^n$, let $x_0 \\in \\Omega$ be a distinguished initial state, and let $S$ and $R$ be two subsets \nof $\\Omega$ with the property of having a $\\rho$-gap: $\\mathrm{dist}(S, R) = \\inf_{x \\in S, y \\in R} \\|x - y\\| = \\rho > 0$. \nThe first set is called a set of accepting final states and the second is called a \nset of rejecting final states. We say that the hybrid computational system \n$M = (\\Omega, \\Sigma, x_0, \\psi_u, S, R)$ recognizes $L \\subseteq \\Sigma^*$ if for all $w = w_0 \\ldots w_n \\in \\Sigma^*$ and the end \nletter $\\$ \\notin \\Sigma$ the following holds: $w \\in L \\Leftrightarrow \\mathrm{Prob}(x_{w\\$}(1) \\in S) > \\frac{1}{2} + c$, and \n$w \\notin L \\Leftrightarrow \\mathrm{Prob}(x_{w\\$}(1) \\in R) > \\frac{1}{2} + c$. \n\nTheorem 9  [12] Let $M$ be a hybrid computational system. It is weakly ergodic if \nits set of evolution operators $T = \\{P_u : u \\in \\Sigma\\}$ is weakly ergodic. \n\u2022 \n\nReferences \n\n[1] Casey, M., The Dynamics of Discrete-Time Computation, With Application to Recurrent \nNeural Networks and Finite State Machine Extraction, Neural Computation \n8, 1135-1178, 1996. \n\n[2] Dobrushin, R. L., Central limit theorem for nonstationary Markov chains I, II. \nTheor. Probability Appl. vol. 1, 1956, pp. 65-80, 298-383. \n\n[3] Doob J. L., Stochastic Processes. John Wiley and Sons, Inc., 1953. \n\n[4] W. Maass and Orponen, P., On the effect of analog noise in discrete time computation, \nNeural Computation, 10(5), 1998, pp. 1071-1095. \n\n[5] W. Maass and Sontag, E., Analog neural nets with Gaussian or other common noise \ndistribution cannot recognize arbitrary regular languages, Neural Computation, 11, \n1999, pp. 771-782. \n\n[6] Neveu J., Mathematical Foundations of the Calculus of Probability. Holden Day, San \nFrancisco, 1964. \n\n[7] Paz A., Ergodic theorems for infinite probabilistic tables. Ann. Math. Statist.  
vol. 41, 1970, pp. 539-550. \n\n[8] Paz A., Introduction to Probabilistic Automata. Academic Press, Inc., London, 1971. \n\n[9] Rabin, M., Probabilistic automata, Information and Control, vol. 6, 1963, pp. 230-245. \n\n[10] Siegelmann H. T., Neural Networks and Analog Computation: Beyond the Turing \nLimit. Birkhauser, Boston, 1999. \n\n[11] Siegelmann H. T. and Roitershtein A., On weakly ergodic computational systems, \n1999, submitted. \n\n[12] Siegelmann H. T., Roitershtein A., and Ben-Hur, A., On noisy computational systems, \n1999, Discrete Applied Mathematics, accepted. \n", "award": [], "sourceid": 1764, "authors": [{"given_name": "Hava", "family_name": "Siegelmann", "institution": null}, {"given_name": "Alexander", "family_name": "Roitershtein", "institution": null}, {"given_name": "Asa", "family_name": "Ben-Hur", "institution": null}]}