{"title": "A Trellis-Structured Neural Network", "book": "Neural Information Processing Systems", "page_first": 592, "page_last": 601, "abstract": null, "full_text": "A Trellis-Structured Neural Network*\n\nThomas Petsche† and Bradley W. Dickinson\n\nPrinceton University, Department of Electrical Engineering, Princeton, NJ 08544\n\nAbstract\n\nWe have developed a neural network which consists of cooperatively interconnected Grossberg on-center off-surround subnets and which can be used to optimize a function related to the log likelihood function for decoding convolutional codes or more general FIR signal deconvolution problems. Connections in the network are confined to neighboring subnets, and it is representative of the types of networks which lend themselves to VLSI implementation. Analytical and experimental results for convergence and stability of the network have been found. The structure of the network can be used for distributed representation of data items while allowing for fault tolerance and replacement of faulty units.\n\n1 Introduction\n\nIn order to study the behavior of locally interconnected networks, we have focused on a class of \"trellis-structured\" networks which are similar in structure to multilayer networks [5] but use symmetric connections and allow every neuron to be an output. We are studying such locally interconnected neural networks because they have the potential to be of great practical interest. Globally interconnected networks, e.g., Hopfield networks [3], are difficult to implement in VLSI because they require many long wires. Locally connected networks, however, can be designed to use fewer and shorter wires. 
\n\nIn this paper, we describe a subclass of trellis-structured networks which optimize a function that, near the global minimum, has the form of the log likelihood function for decoding convolutional codes or more general finite impulse response signals. Convolutional codes, defined in section 2, provide an alternative representation scheme which can avoid the need for global connections. Our network, described in section 3, can perform maximum likelihood sequence estimation of convolutionally coded sequences in the presence of noise. The performance of the system is optimal for low error rates.\n\nThe specific application for this network was inspired by a signal decomposition network described by Hopfield and Tank [6]. However, in our network, there is an emphasis on local interconnections, and a more complex neural model, the Grossberg on-center off-surround network [2], is used. A modified form of the Grossberg model is defined in section 4. Section 5 presents the main theoretical results of this paper. Although the deconvolution network is simply a set of cooperatively interconnected on-center off-surround subnetworks, and absolute stability for the individual subnetworks has been proven [1], the cooperative interconnections between these subnets make a similar proof difficult and unlikely. We have been able, however, to prove equiasymptotic stability in the Lyapunov sense for this network given that the gain of the nonlinearity in each neuron is large. Section 6 describes simulations of the network that were done to confirm the stability results.\n\n*Supported by the Office of Naval Research through grant N00014-83-K-0577 and by the National Science Foundation through grant ECS84-05460.\n\n†Permanent address: Siemens Corporate Research and Support, Inc., 105 College Road East, Princeton, NJ 08540.\n\n© American Institute of Physics 1988 
\n\n2 Convolutional Codes and MLSE\n\nIn an error correcting code, an input sequence is transformed from a b-dimensional input space to an M-dimensional output space, where M > b for error correction and/or detection. In general, for the b-bit input vector U = (u_1, ..., u_b) and the M-bit output vector V = (v_1, ..., v_M), we can write V = F(u_1, ..., u_b). A convolutional code, however, is designed so that relatively short subsequences of the input vector are used to determine subsequences of the output vector. For example, for a rate 1/3 convolutional code (where M ≈ 3b), with input subsequences of length 3, we can write the output, V = (v_1, ..., v_{b+2}) for v_i = (v_{i,1}, v_{i,2}, v_{i,3}), of the encoder as a convolution of the input vector U = (u_1, ..., u_b, 0, 0) and three generator sequences\n\ng_0 = (1 1 1)    g_1 = (1 1 0)    g_2 = (0 1 1).\n\nThis convolution can be written, using modulo-2 addition, as\n\nv_i = Σ_{k=max(1, i-2)}^{i} u_k g_{i-k}  (mod 2)    (1)\n\nIn this example, each 3-bit output subsequence, v_i, of V depends only on three bits of the input vector, i.e., v_i = f(u_{i-2}, u_{i-1}, u_i). In general, for a rate 1/n code, the constraint length, K, is the number of bits of the input vector that uniquely determine each n-bit output subsequence. In the absence of noise, any subsequences in the input vector separated by more than K bits (i.e., that do not overlap) will produce subsequences in the output vector that are independent of each other.\n\nIf we view a convolutional code as a special case of block coding, this rate 1/3, K = 3 code converts a b-bit input word into a codeword of length 3(b + 2), where the 2 is added by introducing two zeros at the end of every input to \"zero out\" the code. Equivalently, the coder can be viewed as embedding 2^b memories in a 2^{3(b+2)}-dimensional space. The minimum distance between valid memories or codewords in this space is the free distance of the code, which in this example is 7. 
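As a concrete check of the encoder just described, the modulo-2 convolution of equation 1 can be sketched in a few lines. This is an illustrative implementation only; the function name and the bit-ordering convention assumed for the generator sequences are ours, not the paper's.

```python
# Illustrative sketch of the rate 1/3, K = 3 encoder (eq. 1).
# Convention assumed here: output bit m of block i is
#   v_{i,m} = g_m[0]*u_i XOR g_m[1]*u_{i-1} XOR g_m[2]*u_{i-2}.
GENS = ((1, 1, 1), (1, 1, 0), (0, 1, 1))  # g0, g1, g2

def conv_encode(u, gens=GENS):
    u = list(u) + [0, 0]          # two zeros appended to zero out the code
    out = []
    for i in range(len(u)):       # one 3-bit block per (padded) input bit
        for g in gens:
            bit = 0
            for j, gj in enumerate(g):
                if i - j >= 0:
                    bit ^= gj & u[i - j]
            out.append(bit)
    return out
```

A b-bit input yields a codeword of length 3(b + 2), and encoding a single 1 followed by zeros yields a codeword of weight 7, matching the free distance quoted above.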
This implies that the code is able to correct a minimum of three errors in the received signal.\n\nFor a convolutional code with constraint length K, the encoder can be viewed as a finite state machine whose state at time i is determined by the K - 1 input bits u_{i-K+1}, ..., u_{i-1}. The encoder can also be represented as a trellis graph such as the one shown in figure 1 for a K = 3, rate 1/3 code. In this example, since the constraint length is three, the two bits u_{i-2} and u_{i-1} determine which of four possible states the encoder is in at time i. In the trellis graph, there is a set of four nodes arranged in a vertical column, which we call a stage, for each time step i. Each node is labeled with the associated values of u_{i-2} and u_{i-1}. In general, for a rate 1/n code, each stage of the trellis graph contains 2^{K-1} nodes, representing an equal number of possible states. A trellis graph which contains S stages therefore fully describes the operation of the encoder for time steps 1 through S. The graph is read from left to right, and the upper edge leaving the right side of a node in stage i is followed if u_i is a zero; the lower edge if u_i is a one. The label on the edge determined by u_i is v_i, the output of the encoder given by equation 1 for the subsequence u_{i-2}, u_{i-1}, u_i.\n\nFigure 1: Part of the trellis-code representation for a rate 1/3, K = 3 convolutional code.\n\nDecoding a noisy sequence that is the output of a convolutional coder plus noise is typically done using a maximum likelihood sequence estimation (MLSE) decoder, which is designed to accept as input a possibly noisy convolutional coded sequence, R, and produce as output the maximum likelihood estimate, V̂, of the original sequence, V. 
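The finite-state-machine view above can be made explicit by tabulating, for each state (u_{i-1}, u_{i-2}) and input bit, the successor state and the 3-bit edge label. A minimal sketch; the function name and the generator bit-ordering convention are our assumptions:

```python
# Next-state / output table for the K = 3, rate 1/3 encoder of figure 1.
# State = (u_{i-1}, u_{i-2}); each state has exactly two outgoing edges.
GENS = ((1, 1, 1), (1, 1, 0), (0, 1, 1))

def trellis_table(gens=GENS):
    table = {}
    for u2 in (0, 1):           # u_{i-2}
        for u1 in (0, 1):       # u_{i-1}
            for u in (0, 1):    # current input bit u_i
                label = tuple(g[0] * u ^ g[1] * u1 ^ g[2] * u2 for g in gens)
                table[((u1, u2), u)] = ((u, u1), label)
    return table
```

With 4 states and 2 input choices the table has 8 edges, one vertical slice of the trellis.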
If the set of possible n(b+2)-bit encoder output vectors is {Xm : m = 1, ... , 2n(b+2)} \nand Xm,i is the ith n-bit subsequence of Xm and ri is the ith n-bit subsequence of R \nthen \n\n(2) \n\nV = argmax II P(ri I Xm,i) \n\nb \n\nXm i=l \n\nThat is, the decoder chooses the Xm that maximizes the conditional probability, given . \nX m , of the received sequence. \n\nA binary symmetric channel (BSC) is an often used transmission channel model in \nwhich the decoder produces output sequences formed from an alphabet containing two \nsymbols and it is assumed that the probability of either of the symbols being affected \nby noise so that the other symbol is received is the same for both symbols. In the \ncase of a BSC, the log of the conditional probability, P( ri I Xm,i), is a linear function \nof the Hamming distance between ri and Xm,i so that maximizing the right side of \nequation 2 is equivalent to choosing the Xm that has the most bits in common with \nR. Therefore, equation 2 can be rewritten as \n\n(3) \n\nwhere Xm,i,l is the lth bit of the ith subsequence of Xm and fa (b) is the indicator \nfunction: fa(b) = 1 if and only if a equals b. \n\nFor the general case, maximum likelihood sequence estimation is very expensive \n\nsince the number of possible input sequences is exponential in b. The Viterbi algo(cid:173)\nrithm [7], fortunately, is able to take advantage of the structure of convolutional codes \nand their trellis graph representations to reduce the complexity of the decoder so that \n\n\f595 \n\nit is only exponential in I( (in general K ~ b). An optimum version of the Viterbi al(cid:173)\ngorithm examines all b stages in the trellis graph, but a more practical and very nearly \noptimum version typically examines approximately 5K stages, beginning at stage i, \nbefore making a decision about Ui. \n\n3 A Network for MLSE Decoding \n\nThe structure of the network that we have defined strongly reflects the structure of a \ntrellis graph. 
The network usually consists of 5K subnetworks, each containing 2^{K-1} neurons. Each subnetwork corresponds to a stage in the trellis graph and each neuron to a state. Each stage is implemented as an \"on-center off-surround\" competitive network [2], described in more detail in the next section, which produces as output a contrast enhanced version of the input. This contrast enhancement creates a \"winner take all\" situation in which, under normal circumstances, only one neuron in each stage, the neuron receiving the input with greatest magnitude, will be on. The activation pattern of the network after it reaches equilibrium indicates the decoded sequence as a sequence of \"on\" neurons in the network. If the jth neuron in subnet i, N_{i,j}, is on, then the node representing state j in stage i lies on the network's estimate of the most likely path.\n\nFor a rate 1/n code, there is a symmetric cooperative connection between neurons N_{i,j} and N_{i+1,k} if there is an edge between the corresponding nodes in the trellis graph. If (x_{i,j,k,1}, ..., x_{i,j,k,n}) are the encoder output bits for the transition between these two nodes and (r_{i,1}, ..., r_{i,n}) are the received bits, then the connection weight for the symmetric cooperative connection between N_{i,j} and N_{i+1,k} is\n\nm_{i,j,k} = (1/n) Σ_{l=1}^{n} f_{r_{i,l}}(x_{i,j,k,l})    (4)\n\nIf there is no edge between the nodes, then m_{i,j,k} = 0.\n\nIntuitively, it is easiest to understand the action of the entire network by examining one stage. Consider the nodes in stage i of the trellis graph and assume that the conditional probabilities of the nodes in stages i - 1 and i + 1 are known. (All probabilities are conditional on the received sequence.) Then the conditional probability of each node in stage i is simply the sum of the probabilities of each node in stages i - 1 and i + 1 weighted by the conditional transition probabilities. 
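Equation 4 makes each cooperative weight the fraction of bits on which the edge label agrees with the received subsequence. A small sketch of the weight computation for one stage; the state indexing, names, and generator convention are our assumptions:

```python
# Connection weights m_{i,j,k} of eq. 4 for the K = 3, rate 1/3 trellis.
GENS = ((1, 1, 1), (1, 1, 0), (0, 1, 1))

def stage_weights(r_block, gens=GENS):
    # r_block: the 3 received bits for the transitions out of stage i.
    # States indexed 0..3 as s = 2*u_{i-1} + u_{i-2} (one labeling choice).
    n = len(gens)
    m = [[0.0] * 4 for _ in range(4)]
    for s in range(4):
        u1, u2 = s >> 1, s & 1
        for u in (0, 1):
            label = [g[0] * u ^ g[1] * u1 ^ g[2] * u2 for g in gens]
            t = (u << 1) | u1          # successor state
            m[s][t] = sum(x == rb for x, rb in zip(label, r_block)) / n
    return m
```

State pairs with no trellis edge are simply left at 0, matching m_{i,j,k} = 0 in the text.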
If we look at stage i in the network, and let the outputs of the neighboring stages i - 1 and i + 1 be fixed, with the output of each neuron corresponding to the \"likelihood\" of the corresponding state at that stage, then the final outputs of the neurons N_{i,j} will correspond to the \"likelihood\" of each of the corresponding states. At equilibrium, the neuron corresponding to the most likely state will have the largest output.\n\n4 The Neural Model\n\nThe \"on-center off-surround\" network [2] is used to model each stage in our network. This model allows the output of each neuron to take on a range of values, in this case between zero and one, and is designed to support contrast enhancement and competition between neurons. The model also guarantees that the final output of each neuron is a function of the relative intensity of its input as a fraction of the total input provided to the network.\n\nUsing the \"on-center off-surround\" model for each stage and the interconnection weights, m_{i,j,k}, defined in equation 4, the differential equation that governs the instantaneous activity of the neurons in our deconvolution network with S stages and N states in each stage can be written as\n\ndu_{i,j}/dt = -A u_{i,j} + (B - u_{i,j}) (f(u_{i,j}) + Σ_{k=1}^{N} [m_{i-1,k,j} f(u_{i-1,k}) + m_{i,j,k} f(u_{i+1,k})]) - (C + u_{i,j}) Σ_{l≠j} (f(u_{i,l}) + Σ_{k=1}^{N} [m_{i-1,k,l} f(u_{i-1,k}) + m_{i,l,k} f(u_{i+1,k})])    (5)\n\nwhere f(x) = (1 + e^{-λx})^{-1}, λ is the gain of the nonlinearity, and A, B, and C are constants.\n\nFor the analysis to be presented in section 5, we note that equation 5 can be rewritten more compactly in a notation that is similar to the equation for additive analog neurons given in [4]:\n\ndu_{i,j}/dt = -A u_{i,j} - Σ_{k=1}^{S} Σ_{l=1}^{N} (u_{i,j} S_{i,j,k,l} f(u_{k,l}) - T_{i,j,k,l} f(u_{k,l}))    (6)\n\nwhere, for 1 ≤ l ≤ N,\n\nS_{i,j,i,l} = 1\nS_{i,j,i-1,l} = Σ_q m_{i-1,l,q}\nS_{i,j,i+1,l} = Σ_q m_{i,q,l}\nS_{i,j,k,l} = 0 for all k ∉ {i - 1, i, i + 1}\nT_{i,j,i,j} = B\nT_{i,j,i,l} = -C for all l ≠ j\nT_{i,j,i-1,l} = B m_{i-1,l,j} - C Σ_{q≠j} m_{i-1,l,q}\nT_{i,j,i+1,l} = B m_{i,j,l} - C Σ_{q≠j} m_{i,q,l}    (7)\n\nTo eliminate the need for global interconnections within a stage, we can add two summing elements to calculate\n\nX_i = Σ_{j=1}^{N} f(u_{i,j})   and   J_i = Σ_{j=1}^{N} Σ_{k=1}^{N} [m_{i-1,k,j} f(u_{i-1,k}) + m_{i,j,k} f(u_{i+1,k})]    (8)\n\nUsing these two sums allows us to rewrite equation 5 as\n\ndu_{i,j}/dt = -A u_{i,j} + (B + C)(f(u_{i,j}) + I_{i,j}) - (u_{i,j} + C)(X_i + J_i)    (9)\n\nwhere I_{i,j} = Σ_{k=1}^{N} [m_{i-1,k,j} f(u_{i-1,k}) + m_{i,j,k} f(u_{i+1,k})]. This form provides a more compact design for the network that is particularly suited to implementation as a digital filter or for use in simulations, since it greatly reduces the calculations required.\n\n5 Stability of the Network\n\nThe end of section 3 described the desired operation of a single stage, given that the outputs of the neighboring stages are fixed. It is possible to show that in this situation a single stage is stable. To do this, fix f(u_{k,l}) for k ∈ {i - 1, i + 1} so that equation 6 can be written in the form originally proposed by Grossberg [2]:\n\ndu_{i,j}/dt = -A u_{i,j} + (B - u_{i,j})(I_{i,j} + f(u_{i,j})) - (u_{i,j} + C)(Σ_{k≠j} I_{i,k} + Σ_{k≠j} f(u_{i,k}))    (10)\n\nwhere I_{i,j} = Σ_{k=1}^{N} [m_{i-1,k,j} f(u_{i-1,k}) + m_{i,j,k} f(u_{i+1,k})].\n\nEquation 10 is a special case of the more general nonlinear system\n\ndx_i/dt = a_i(x_i) (b_i(x_i) - Σ_{k=1}^{n} c_{i,k} d_k(x_k))    (11)\n\nwhere: (1) a_i(x_i) is continuous and a_i(x_i) > 0 for x_i ≥ 0; (2) b_i(x_i) is continuous for x_i ≥ 0; (3) c_{i,k} = c_{k,i}; and (4) d_i is differentiable with d_i′(x_i) ≥ 0 for all x_i ∈ (-∞, ∞). 
Cohen and Grossberg [1] showed that such a system has a global Lyapunov function:\n\nV = - Σ_i ∫_0^{x_i} b_i(ξ) d_i′(ξ) dξ + (1/2) Σ_{j,k} c_{j,k} d_j(x_j) d_k(x_k)    (12)\n\nand that, therefore, such a system is equiasymptotically stable for all constants and functions satisfying the four constraints above. In our case, this means that a single stage has the desired behavior when the neighboring stages are fixed. If we take the output of each neuron to correspond to the likelihood of the corresponding state then, if the two neighboring stages are fixed, stage i will converge to an equilibrium point where the neuron receiving the largest input will be on and the others will be off, just as it should according to section 2.\n\nIt does not seem possible to use the Cohen-Grossberg stability proof for the entire system in equation 5. In fact, Cohen and Grossberg note that networks which allow cooperative interactions define systems for which no stability proof exists [1].\n\nSince an exact stability proof seems unlikely, we have instead shown that in the limit as the gain, λ, of the nonlinearity gets large, the system is asymptotically stable. Using the notation in [4], define V_{i,j} = f(u_{i,j}) and a normalized nonlinearity f̄(·) such that f̄^{-1}(V_{i,j}) = λ u_{i,j}. Then we can define an energy function for the deconvolution network to be\n\nE = -(1/2) Σ_{i,j,k,l} T_{i,j,k,l} V_{i,j} V_{k,l} + (1/λ) Σ_{i,j} (A + Σ_{k,l} S_{i,j,k,l} V_{k,l}) ∫_{1/2}^{V_{i,j}} f̄^{-1}(ζ) dζ    (13)\n\nThe time derivative of E is\n\ndE/dt = - Σ_{i,j} (dV_{i,j}/dt) (-A u_{i,j} - u_{i,j} Σ_{k,l} S_{i,j,k,l} V_{k,l} + Σ_{k,l} T_{i,j,k,l} V_{k,l} - (1/λ) Σ_{k,l} S_{k,l,i,j} ∫_{1/2}^{V_{k,l}} f̄^{-1}(ζ) dζ)    (14)\n\nIt is difficult to prove that dE/dt is nonpositive because of the last term in the parentheses. 
\nHowever, for large gain, this term can be shown to have a negligible effect on the derivative. It can be shown that for f(u) = (1 + e^{-λu})^{-1}, ∫_{1/2}^{V} f̄^{-1}(ζ) dζ is bounded above by log(2). In this deconvolution network, there are no connections between neurons unless they are in the same or neighboring stages, i.e., S_{i,j,k,l} = 0 for |i - k| > 1, so there are no more than 3N non-zero terms in the problematical summation. Therefore, we can write that\n\nlim_{λ→∞} (1/λ) Σ_{k,l} S_{i,j,k,l} ∫_{1/2}^{V_{k,l}} f̄^{-1}(ζ) dζ = 0\n\nThen, in the limit as λ → ∞, the terms in parentheses in equation 14 converge to du_{i,j}/dt as in equation 6, so that, using the chain rule, we can rewrite dE/dt as\n\nlim_{λ→∞} dE/dt = - Σ_{i,j} (du_{i,j}/dt)(dV_{i,j}/dt) = -(1/λ) Σ_{i,j} (dV_{i,j}/dt)^2 (d/dV_{i,j}) f̄^{-1}(V_{i,j})\n\nIt can also be shown that, if f(·) is a monotonically increasing function, then (d/dV_i) f̄^{-1}(V_i) > 0 for all V_i. This implies that for all u = (u_{1,1}, ..., u_{N,S}), lim_{λ→∞} dE/dt ≤ 0, and, therefore, for large gains, E as defined in equation 13 is a Lyapunov function for the system described by equation 5 and the network is equiasymptotically stable.\n\nIf we apply a similar asymptotic argument to the energy function, equation 13 reduces to\n\nE = -(1/2) Σ_{i,j,k,l} T_{i,j,k,l} V_{i,j} V_{k,l}    (15)\n\nwhich is the Lyapunov function for a network of discontinuous on-off neurons with interconnection matrix T. For the binary neuron case, it is fairly straightforward to show that the energy function has minima at the desired decoder outputs if we assume that only one neuron in each stage may be on and that B and C are appropriately chosen to favor this. 
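The bound used in this large-gain argument, that ∫_{1/2}^{V} f̄^{-1}(ζ) dζ ≤ log 2 for the normalized sigmoid f̄(x) = (1 + e^{-x})^{-1}, is easy to confirm numerically. A small sketch (the midpoint-rule quadrature helper is our own, not from the paper):

```python
import math

def int_finv(V, steps=20000):
    # Integral of fbar^{-1}(z) = log(z / (1 - z)) from 1/2 to V, midpoint rule.
    h = (V - 0.5) / steps
    s = 0.0
    for k in range(steps):
        z = 0.5 + (k + 0.5) * h
        s += math.log(z / (1.0 - z))
    return s * h
```

The closed form is V log V + (1 - V) log(1 - V) + log 2, which is nonnegative and approaches log 2 only as V → 0 or V → 1, so the bound is tight but never attained on (0, 1).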
However, since there are O(S²N) terms in the disturbance summation in equation 15, convergence in this case is not as fast as for the derivative of the energy function in equation 13, which has only O(S) terms in the summation.\n\n6 Simulation Results\n\nThe simulations presented in this section are for the rate 1/3, K = 3 convolutional code illustrated in figure 1. Since this code has a constraint length of 3, there are 4 possible states in each stage, and since an MLSE decoder would normally examine a minimum of 5K stages before making a decision, we use a total of 16 stages. In these simulations, the first and last stage are fixed, since we assume that we have prior knowledge or a decision about the first stage and zero knowledge about the last stage. The transmitted codeword is assumed to be all zeros.\n\nThe simulation program reads the received sequence from standard input and uses it to define the interconnection matrix W according to equation 4. A relaxation subroutine is then called to simulate the performance of the network according to an Euler discretization of equation 5. Unit time is defined as one RC time constant of the unforced system. All variables were defined to be single precision (32 bit) floating point numbers.\n\nFigure 2a shows the evolution of the network over two unit time intervals with the sampling time T = 0.02 when the received codeword contains no noise. To interpret the figure, recall that there are 16 stages of 4 neurons each. The output of each stage is a vertical set of 4 curves. The upper-left set is the output of the first stage; the upper-most curve is the output of the first neuron in the stage. For the first stage, the first neuron has a fixed output of 1 and the other neurons have a fixed output of 0. The outputs of the neurons in the last stage are fixed at an intermediate value to represent zero a priori knowledge about these states. 
Notice that the network reaches an equilibrium point in which only the top neurons in each stage (representing the \"00\" node in figure 1) are on and all others are off. This case illustrates that the network can correctly decode an unerrored input and that it does so rapidly, i.e., in about one time constant. In this case, with no errors in the input, the network performs the same function as Hopfield and Tank's network and does so quite well. Although we have not been able to prove it analytically, all our simulations support the conjecture that if x_{i,j}(0) = 1/2 for all i and j then the network will always converge to the global minimum.\n\nFigure 2: Evolution of the trellis network for (a) unerrored input, (b) input with burst errors: R is 000 000 000 000 000 000 000 000 111 000 000 000 000 000 000. λ = 10, A = 1.0, B = 1.0, C = 0.75, T = 0.02. The initial conditions are x_{1,1} = 1.0, x_{1,j} = 0.0 for j ≠ 1, x_{16,j} = 0.2, and all other x_{i,j} = 0.0.\n\nOne of the more difficult decoding problems for this network is the correction of a burst of errors in a transition subsequence. Figure 2b shows the evolution of the network when three errors occur in the transition between stages 9 and 10. Note that 10 unit time intervals are shown, since complete convergence takes much longer than in the first example. However, the network has correctly decoded many of the stages far from the burst error in a much shorter time.\n\nIf the received codeword contains scattered errors, the convolutional decoder should be able to correct more than 3 errors. Such a case is shown in figure 3a, in which the received codeword contains 7 errors. The system takes longest to converge around two transitions, 5-6 and 11-12. The first is in the midst of consecutive subsequences which each contain a single bit error, and the second transition contains two errors. 
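The single-stage behavior underlying these simulations (equation 10, with the neighboring stages' contributions frozen into inputs I_{i,j}) can be reproduced with the same Euler scheme. A minimal sketch; the function names and the inputs chosen are our own illustrative assumptions, with constants near the figure 2 settings:

```python
import math

def f(x, lam=10.0):
    # neuron nonlinearity with gain lam
    return 1.0 / (1.0 + math.exp(-lam * x))

def relax_stage(I, A=1.0, B=1.0, C=0.75, lam=10.0, T=0.02, steps=500):
    # Euler discretization of eq. 10 for one on-center off-surround stage;
    # I[j] is the frozen input from the neighboring stages to neuron j.
    u = [0.0] * len(I)
    for _ in range(steps):
        fu = [f(x, lam) for x in u]
        sI, sf = sum(I), sum(fu)
        u = [x + T * (-A * x
                      + (B - x) * (I[j] + fu[j])                  # on-center
                      - (x + C) * ((sI - I[j]) + (sf - fu[j])))   # off-surround
             for j, x in enumerate(u)]
    return [f(x, lam) for x in u]
```

As described in section 3, the neuron receiving the largest input wins the competition: for example, relax_stage([0.9, 0.3, 0.2, 0.1]) drives the first output near 1 and the others near 0.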
\n\nTo illustrate that the energy function shown in equation 13 is a good candidate \nfor a Lyapunov function for this network, it is plotted in figure 3b for the three cases \ndescribed above. The nonlinearity used in these simulations has a gain of ten, and, as \npredicted by the large gain limit, the energy decreases monotonically. \n\nTo more thoroughly explore the behavior of the network, the simulation program \nwas modified to test many possible error patterns. For one and two errors, the program \nexhaustively tested each possible error pattern. For three or more errors, the errors \nwere generated randomly. For four or more errors, only those errored sequences for \nwhich the MLS estimate was the sequence of all zeros were tested. The results of \nthis simulation are summarized in the column labeled \"two-nearest\" in figure 4. The \nperformance of the network is optimum if no more than 3 errors are present in the \nreceived sequence, however for four or more errors, the network fails to correctly decode \nsome sequences that the MLSE decoder can correctly decode. \n\n\f600 \n\n60 \n\n~~~~ 80 \n~~~~ 40 \n~~~~ 20 \n~~~E 0 \n\n-20 \n\nE \n\n0 . 0 \n\n0 \n\n2 0 \n\n2 0 \n\n2 \n\n2 0 \n(a) \n\nerrors \n\n0.5 \n\n1.0 \ntime \n\n(b) \n\n1.5 \n\n2.0 \n\nFigure 3: (a) Evolution of the trellis network for input with distributed errors. The \ninput, R, is 000 010 010 010 100 001 000 000 000 000 110 000 000 000 000. The \nconstants and initial conditions are the same as in figure 2. (b) The energy function \ndefined in equation 13 evaulated for the three simulations discussed. \n\nerrored \n\nnumber of \n\nnumber of errors \n\nbits \n\ntest vectors \n\ntvo-nearest \n\nfour-nearest \n\n0 \n1 \n2 \n3 \n4 \n5 \n6 \n7 \n\nTotal \n\n1 \n39 \n500 \n500 \n500 \n500 \n500 \n500 \n2500 \n\n0 \n0 \n0 \n0 \n7 \n33 \n72 \n132 \n244 \n\n0 \n0 \n0 \n0 \n0 \n20 \n68 \n103 \n191 \n\nFigure 4: Simulation results for a deconvolution network for a K = 3, rate 1/3 code. 
\nThe network parameters were: λ = 15, A = 6, B = 1, C = 0.45, and T = 0.025.\n\nFor locally interconnected networks, the major concern is the flow of information through the network. In the simulations presented until now, the neurons in each stage are connected only to neurons in neighboring stages. A modified form of the network was also simulated in which the neurons in each stage are connected to the neurons in the four nearest neighboring stages. To implement this network, the subroutine that initializes the connection weights was modified to assign a non-zero value to w_{i,j,i+2,k}. This is straightforward since, for a code with a constraint length of three, there is a single path connecting two nodes a distance two apart.\n\nThe results of this simulation are shown in the column labeled \"four-nearest\" in figure 4. It is easy to see that the network with the extra connections performs better than the previous network. Most of the errors made by the nearest neighbor network occur for inputs in which the received subsequences r_i and r_{i+1} or r_{i+2} contain a total of four or more errors. It appears that the network with the additional connections is, in effect, able to communicate around subsequences containing errors that block communications for the two-nearest neighbor network.\n\n7 Summary and Conclusions\n\nWe have presented a locally interconnected network which minimizes a function that is analogous to the log likelihood function near the global minimum. The results of simulations demonstrate that the network can successfully decode input sequences containing no noise at least as well as the globally connected Hopfield-Tank [6] decomposition network. Simulations also strongly support the conjecture that in the noiseless case, the network can be guaranteed to converge to the global minimum. In addition, for low error rates, the network can also decode noisy received sequences. 
\nWe have been able to apply the Cohen-Grossberg proof of the stability of \"on-center off-surround\" networks to show that each stage will maximize the desired local \"likelihood\" for noisy received sequences. We have also shown that, in the large gain limit, the network as a whole is stable and that the equilibrium points correspond to the MLSE decoder output. Simulations have verified this proof of stability even for relatively small gains. Unfortunately, a proof of strict Lyapunov stability is very difficult, and may not be possible, because of the cooperative connections in the network.\n\nThis network demonstrates that it is possible to perform interesting functions even if only localized connections are allowed, although there may be some loss of performance. If we view the network as an associative memory, a trellis-structured network that contains NS neurons can correctly recall 2^S memories. Simulations of trellis networks strongly suggest that it is possible to guarantee a non-zero minimum radius of attraction for all memories. We are currently investigating the use of trellis-structured layers in multilayer networks to explicitly provide the networks with the ability to tolerate errors and replace faulty neurons.\n\nReferences\n\n[1] M. Cohen and S. Grossberg, \"Absolute stability of global pattern formation and parallel memory storage by competitive neural networks,\" IEEE Trans. Sys., Man, and Cyber., vol. 13, pp. 815-826, Sep.-Oct. 1983.\n\n[2] S. Grossberg, \"How does a brain build a cognitive code,\" in Studies of Mind and Brain, pp. 1-52, D. Reidel Pub. Co., 1982.\n\n[3] J. Hopfield, \"Neural networks and physical systems with emergent collective computational abilities,\" Proceedings of the National Academy of Sciences USA, vol. 79, pp. 2554-2558, 1982.\n\n[4] J. 
Hopfield, \"Neurons with graded response have collective computational properties like \nthose of two-state neurons,\" Proceeedings of the National Academy of Science, USA, vol. 81, \npp. 3088-3092, May 1984. \n\n[5] J. McClelland and D. Rumelhart, Parallel Distributed Processing, Vol. 1. The MIT Press, \n\n1986. \n\n[6] D. Tank and J. Hopfield, \"Simple 'neural' optimization networks: an AID converter, signal \ndecision circuit and a linear progra.mming circuit,\" IEEE Trans. on Circuits and Systems, \nvol. 33, pp. 533-541, May 1986. \n\n[7] A. Viterbi and J. Omura, Principles of Digital Communications and Coding. McGra.w-Hill, \n\n1979. \n\n\f", "award": [], "sourceid": 64, "authors": [{"given_name": "Thomas", "family_name": "Petsche", "institution": null}, {"given_name": "Bradley", "family_name": "Dickinson", "institution": null}]}