{"title": "Stagewise Processing in Error-correcting Codes and Image Restoration", "book": "Advances in Neural Information Processing Systems", "page_first": 343, "page_last": 349, "abstract": null, "full_text": "Stagewise processing in error-correcting codes and image restoration \n\nK. Y. Michael Wong \nDepartment of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong \nphkywong@ust.hk \n\nHidetoshi Nishimori \nDepartment of Physics, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152-8551, Japan \nnishi@stat.phys.titech.ac.jp \n\nAbstract \n\nWe introduce stagewise processing in error-correcting codes and image restoration, in which information extracted from the former stage is used selectively to improve the performance of the latter one. Both mean-field analysis using the cavity method and simulations show that it has the advantage of being robust against uncertainties in hyperparameter estimation. \n\n1 Introduction \n\nIn error-correcting codes [1] and image restoration [2], the choice of the so-called hyperparameters is an important factor in determining performance. Hyperparameters are the coefficients weighing the biases and variances of the tasks. In error correction, they determine the statistical significance given to the parity-checking terms and the received bits. Similarly, in image restoration, they determine the statistical weights given to the prior knowledge and the received data. It was shown, by the use of inequalities, that the choice of the hyperparameters is optimal when there is a match between the source and model priors [3]. Furthermore, from the analytic solution of the infinite-range model and the Monte Carlo simulation of finite-dimensional models, it was shown that an inappropriate choice of the hyperparameters can lead to a rapid degradation of task performance. 
\n\nHyperparameter estimation is the subject of many studies, such as the \"evidence framework\" [4]. However, if the prior models the source poorly, no hyperparameters can be reliable [5]. Even if they can be estimated accurately through steady-state statistical measurements, they may fluctuate when communication channels are disturbed by bursty noise sources. Hence it is equally important to devise decoding or restoration procedures which are robust against uncertainties in hyperparameter estimation. \n\nHere we introduce selective freezing to increase the tolerance to uncertainties in hyperparameter estimation. The technique has been studied for pattern reconstruction in neural networks, where it led to an improvement in the retrieval precision, a widening of the basin of attraction, and a boost in the storage capacity [6]. The idea is best illustrated for bits or pixels with binary states \u00b11, though it can be easily generalized to other cases. In a finite-temperature thermodynamic process, the binary variables keep moving under thermal agitation. Some of them have smaller thermal fluctuations than the others, implying that they are more certain to stay in one state than the other. This stability implies that they have a higher probability of staying in the correct state for error-correction or image restoration tasks, even when the hyperparameters are not optimally tuned. It may thus be interesting to separate the thermodynamic process into two stages. In the first stage we select those relatively stable bits or pixels whose time-averaged states have a magnitude exceeding a certain threshold. In the second stage we fix (or freeze) them in their most probable thermodynamic states. These selectively frozen bits or pixels are then able to provide more robust assistance to the less stable bits or pixels in their search for the most probable states. 
\n\nThe two-stage thermodynamic process can be studied analytically in the mean-field model using the cavity method. For the more realistic cases of finite dimensions in image restoration, simulation results illustrate the relevance of the infinite-range model in providing qualitative guidance. The detailed theory of selective freezing is presented in [7]. \n\n2 Formulation \n\nConsider an information source which generates data represented by a set of Ising spins $\\{\\xi_i\\}$, where $\\xi_i = \\pm 1$ and $i = 1, \\ldots, N$. The data is generated according to the source prior $P_s(\\{\\xi\\})$. For error-correcting codes transmitting unbiased messages, all sequences are equally probable and $P_s(\\{\\xi\\}) = 2^{-N}$. For images with smooth structures, the prior consists of ferromagnetic Boltzmann factors, which increase the tendency of neighboring spins to stay in the same state, that is, \n\n$$P_s(\\{\\xi\\}) \\propto \\exp\\left(\\frac{\\beta_s}{z} \\sum_{\\langle ij \\rangle} \\xi_i \\xi_j\\right). \\qquad (1)$$ \n\nHere $\\langle ij \\rangle$ represents pairs of neighboring spins, and $z$ is the valency of each site. The data is coded by constructing the codewords, which are the products of $p$ spins $J^0_{i_1 \\cdots i_p} = \\xi_{i_1} \\cdots \\xi_{i_p}$ for appropriately chosen sets of indices $\\{i_1, \\ldots, i_p\\}$. Each spin may appear in a number of $p$-spin codewords; the number of appearances is called the valency $z_p$. For conventional image restoration, only codewords with $p = 1$ are transmitted, corresponding to the pixels in the image. \n\nWhen the signal is transmitted through a noisy channel, the output consists of the sets $\\{J_{i_1 \\cdots i_p}\\}$ and $\\{\\tau_i\\}$, which are the corrupted versions of $\\{J^0_{i_1 \\cdots i_p}\\}$ and $\\{\\xi_i\\}$ respectively, and are described by the output probability \n\n$$P_{out}(\\{J\\}, \\{\\tau\\} | \\{\\xi\\}) \\propto \\exp\\left(\\beta_J \\sum J_{i_1 \\cdots i_p} \\xi_{i_1} \\cdots \\xi_{i_p} + \\beta_\\tau \\sum_i \\tau_i \\xi_i\\right). \\qquad (2)$$ \n\nAccording to Bayesian statistics, the posterior probability that the source sequence is $\\{\\sigma\\}$, given the outputs $\\{J\\}$ and $\\{\\tau\\}$, takes the form \n\n$$P(\\{\\sigma\\} | \\{J\\}, \\{\\tau\\}) \\propto \\exp\\left(\\beta_J \\sum J_{i_1 \\cdots i_p} \\sigma_{i_1} \\cdots \\sigma_{i_p} + \\beta_\\tau \\sum_i \\tau_i \\sigma_i + \\frac{\\beta_s}{z} \\sum_{\\langle ij \\rangle} \\sigma_i \\sigma_j\\right). \\qquad (3)$$ \n\nIf the receiver at the end of the noisy channel does not have precise information on $\\beta_J$, $\\beta_\\tau$ or $\\beta_s$, and estimates them as $\\beta$, $h$ and $\\beta_m$ respectively, then the $i$th bit of the decoded/restored information is given by $\\mathrm{sgn}\\langle\\sigma_i\\rangle$, where \n\n$$\\langle\\sigma_i\\rangle = \\frac{\\mathrm{Tr}\\, \\sigma_i e^{-H\\{\\sigma\\}}}{\\mathrm{Tr}\\, e^{-H\\{\\sigma\\}}}, \\qquad (4)$$ \n\nand the Hamiltonian is given by \n\n$$H\\{\\sigma\\} = -\\beta \\sum J_{i_1 \\cdots i_p} \\sigma_{i_1} \\cdots \\sigma_{i_p} - h \\sum_i \\tau_i \\sigma_i - \\frac{\\beta_m}{z} \\sum_{\\langle ij \\rangle} \\sigma_i \\sigma_j. \\qquad (5)$$ \n\nFor the two-stage process of selective freezing, the spins evolve thermodynamically as prescribed in Eq. (4) during the first stage, and the thermal averages $\\langle\\sigma_i\\rangle$ of the spins are monitored. Then we select those spins with $|\\langle\\sigma_i\\rangle|$ exceeding a given threshold $\\theta$, and freeze them in the second stage of the thermodynamics. The average of the spin $\\sigma_i$ in the second stage is then given by \n\n$$\\langle\\tilde{\\sigma}_i\\rangle = \\frac{\\mathrm{Tr}\\, \\sigma_i \\prod_j \\left[\\Theta(\\langle\\sigma_j\\rangle^2 - \\theta^2)\\, \\delta_{\\sigma_j, \\mathrm{sgn}\\langle\\sigma_j\\rangle} + \\Theta(\\theta^2 - \\langle\\sigma_j\\rangle^2)\\right] e^{-\\tilde{H}\\{\\sigma\\}}}{\\mathrm{Tr} \\prod_j \\left[\\Theta(\\langle\\sigma_j\\rangle^2 - \\theta^2)\\, \\delta_{\\sigma_j, \\mathrm{sgn}\\langle\\sigma_j\\rangle} + \\Theta(\\theta^2 - \\langle\\sigma_j\\rangle^2)\\right] e^{-\\tilde{H}\\{\\sigma\\}}}, \\qquad (6)$$ \n\nwhere $\\Theta$ is the step function, and $\\tilde{H}\\{\\sigma\\}$ is the Hamiltonian for the second stage, which has the same form as Eq. (5) in the first stage. One then regards $\\mathrm{sgn}\\langle\\tilde{\\sigma}_i\\rangle$ as the $i$th spin of the decoding/restoration process. \n\nThe most important quantity in selective freezing is the overlap of the decoded/restored bit $\\mathrm{sgn}\\langle\\tilde{\\sigma}_i\\rangle$ and the original bit $\\xi_i$, averaged over the output probability and the spin distribution. This is given by \n\n$$M_{sf} = \\sum_{\\{\\xi\\}} \\int \\prod dJ \\prod d\\tau \\, P_s(\\{\\xi\\}) P_{out}(\\{J\\}, \\{\\tau\\} | \\{\\xi\\}) \\, \\xi_i \\, \\mathrm{sgn}\\langle\\tilde{\\sigma}_i\\rangle. \\qquad (7)$$ \n\nFollowing [3], we can prove that selective freezing cannot outperform the single-stage process if the hyperparameters can be estimated precisely. However, the purpose of selective freezing is rather to provide a relatively stable performance when the hyperparameters cannot be estimated precisely. 
\n\n3 Modeling error-correcting codes \n\nLet us now suppose that the output of the transmission channel consists of only the set of $p$-spin interactions $\\{J_{i_1 \\cdots i_p}\\}$. Then $h = 0$ in the Hamiltonian (5), and we set $\\beta_m = 0$ for the case that all messages are equally probable. Analytical solutions are available for the infinite-range model in which the exchange interactions are present for all possible pairs of sites. Consider the noise model in which $J_{i_1 \\cdots i_p}$ is Gaussian with mean $p!\\, j_0 \\xi_{i_1} \\cdots \\xi_{i_p} / N^{p-1}$ and variance $p!\\, J^2 / 2N^{p-1}$. We can apply a gauge transformation $\\sigma_i \\to \\sigma_i \\xi_i$ and $J_{i_1 \\cdots i_p} \\to J_{i_1 \\cdots i_p} \\xi_{i_1} \\cdots \\xi_{i_p}$, and arrive at an equivalent $p$-spin model with a ferromagnetic bias, where \n\n$$P(J_{i_1 \\cdots i_p}) = \\left(\\frac{N^{p-1}}{\\pi J^2 p!}\\right)^{1/2} \\exp\\left[-\\frac{N^{p-1}}{J^2 p!}\\left(J_{i_1 \\cdots i_p} - \\frac{p!}{N^{p-1}} j_0\\right)^2\\right]. \\qquad (8)$$ \n\nThe infinite-range model is exactly solvable using the cavity method [8]. The method uses a self-consistency argument to consider what happens when a spin is added or removed from the system. The central quantity in this method is the cavity field, which is the local field of a spin when it is added to the system, assuming that the exchange couplings act only one-way from the system to the new spin (but not from the spin back to the system). Since the exchange couplings feeding the new spin have no correlations with the system, the cavity field becomes a Gaussian variable in the limit of large valency. \n\nThe thermal average of a spin, say spin 1, is given by \n\n$$\\langle\\sigma_1\\rangle = \\tanh\\beta h_1, \\qquad (9)$$ \n\nwhere $h_1$ is the cavity field obeying a Gaussian distribution, whose mean and variance are $p j_0 m^{p-1}$ and $p J^2 q^{p-1}/2$ respectively, where $m$ and $q$ are the magnetization and Edwards-Anderson order parameter respectively, given by \n\n$$m = \\frac{1}{N} \\sum_i \\langle\\sigma_i\\rangle \\quad \\mathrm{and} \\quad q = \\frac{1}{N} \\sum_i \\langle\\sigma_i\\rangle^2. \\qquad (10)$$ \n\nApplying self-consistently the cavity argument to all terms in Eq. (10), we can obtain self-consistent equations for $m$ and $q$. \n\nNow we consider selective freezing. If we introduce a freezing threshold $\\theta$ so that all spins with $\\langle\\sigma_i\\rangle^2 > \\theta^2$ are frozen, then the freezing fraction $f$ is given by \n\n$$f = \\frac{1}{N} \\sum_i \\Theta(\\langle\\sigma_i\\rangle^2 - \\theta^2). \\qquad (11)$$ \n\nThe thermal average of a dynamic spin in the second stage is related to the cavity fields in both stages, say, for spin 1, \n\n$$\\langle\\tilde{\\sigma}_1\\rangle = \\tanh\\beta\\left\\{\\tilde{h}_1 + \\frac{p}{2}(p-1) J^2 r^{p-2} \\chi_{tr} \\tanh\\beta h_1\\right\\}, \\qquad (12)$$ \n\nwhere $\\tilde{h}_1$ is the cavity field in the second stage, and $r$ is the order parameter describing the spin correlations of the two thermodynamic stages: \n\n$$r \\equiv \\frac{1}{N} \\sum_i \\langle\\sigma_i\\rangle \\left\\{\\langle\\tilde{\\sigma}_i\\rangle \\Theta[\\theta^2 - \\langle\\sigma_i\\rangle^2] + \\mathrm{sgn}\\langle\\sigma_i\\rangle\\, \\Theta[\\langle\\sigma_i\\rangle^2 - \\theta^2]\\right\\}. \\qquad (13)$$ \n\n$\\chi_{tr}$ is the trans-susceptibility which describes the response of a spin in the second stage to variations of the cavity field in the first stage, namely \n\n$$\\chi_{tr} \\equiv \\frac{1}{N} \\sum_i \\frac{\\partial\\langle\\tilde{\\sigma}_i\\rangle}{\\partial h_i}. \\qquad (14)$$ \n\nThe cavity field $\\tilde{h}_1$ is a Gaussian variable. Its mean and variance are $p j_0 \\tilde{m}^{p-1}$ and $p J^2 \\tilde{q}^{p-1}/2$ respectively, where $\\tilde{m}$ and $\\tilde{q}$ are the magnetization and Edwards-Anderson order parameter of the second stage respectively, given by \n\n$$\\tilde{m} = \\frac{1}{N} \\sum_i \\left[\\Theta(\\theta^2 - \\langle\\sigma_i\\rangle^2) \\langle\\tilde{\\sigma}_i\\rangle + \\Theta(\\langle\\sigma_i\\rangle^2 - \\theta^2)\\, \\mathrm{sgn}\\langle\\sigma_i\\rangle\\right], \\qquad (15)$$ \n\n$$\\tilde{q} = \\frac{1}{N} \\sum_i \\left[\\Theta(\\theta^2 - \\langle\\sigma_i\\rangle^2) \\langle\\tilde{\\sigma}_i\\rangle^2 + \\Theta(\\langle\\sigma_i\\rangle^2 - \\theta^2)\\right]. \\qquad (16)$$ \n\nFurthermore, the covariance between $h_1$ and $\\tilde{h}_1$ is $p J^2 r^{p-1}/2$, where $r$ is given in Eq. (13). Applying self-consistently the same cavity argument to all terms in Eqs. (15), (16), (13) and (14), we arrive at the self-consistent equations for $\\tilde{m}$, $\\tilde{q}$, $r$ and $\\chi_{tr}$. The performance of selective freezing is measured by \n\n$$M_{sf} \\equiv \\frac{1}{N} \\sum_i \\left[\\Theta(\\theta^2 - \\langle\\sigma_i\\rangle^2)\\, \\mathrm{sgn}\\langle\\tilde{\\sigma}_i\\rangle + \\Theta(\\langle\\sigma_i\\rangle^2 - \\theta^2)\\, \\mathrm{sgn}\\langle\\sigma_i\\rangle\\right]. \\qquad (17)$$ \n\n[Figure 2 plots omitted; recoverable labels: horizontal axis $T_m$; legend $f = 0.5$, $0.7$, $0.9$, $0.95$.] \n\nFigure 2: (a) The performance of selective freezing with 2 components of Gaussian noise at $\\beta_s = 1.05$, $f_1 = 4f_2 = 0.8$, $a_1 = 5a_2 = 1$ and $\\tau_1 = \\tau_2 = 0.3$. The restoration agent operates at the optimal ratio $\\beta_m/h$ which assumes a single noise component with the overall mean 0.84 and variance 0.4024. (b) Results of Monte Carlo simulations for the overlaps of selective freezing compared with that of the one-stage dynamics for two-dimensional images generated at the source prior temperature $T_s = 2.15$. \n\nSuppose the restoration agent operates at the optimal ratio of $\\beta_m/h$ which assumes a single noise component. Then there will be a degradation of the quality of the restored images. In the example in Fig. 2(a), the reduction of the overlap $M_{sf}$ for selective freezing is much more modest than for the one-stage process ($f = 0$). Other cases of interest, in which the restoration agent operates on other imprecise estimations, are discussed in [7]. All confirm the robustness of selective freezing. \n\nIt is interesting to study the more realistic case of two-dimensional images, since we have so far presented analytical results for the mean-field model only. As confirmed by the results of Monte Carlo simulations in Fig. 2(b), the overlaps of selective freezing are much steadier than that of the one-stage dynamics when the decoding temperature changes. This steadiness is most remarkable for a freezing fraction of $f = 0.9$. \n\n5 Discussions \n\nWe have introduced a multistage technique for error-correcting codes and image restoration, in which the information extracted from the former stage can be used selectively to improve the performance of the latter one. \n
While the overlap $M_{sf}$ of selective freezing is bounded by the optimal performance of the one-stage dynamics derived in [3], it has the advantage of being tolerant to uncertainties in hyperparameter estimation. This is confirmed by both analytical and simulational results for mean-field and finite-dimensional models. Improvement is observed both above and below the optimal decoding temperature, superseding the observations in [7]. As an example, we have illustrated its advantage of robustness when the noise distribution is composed of more than one Gaussian component, such as in the case of modern communication channels supporting multimedia applications. \n\nSelective freezing can be generalized to more than two stages, in which spins that remain relatively stable in one stage are progressively frozen in the following one. It is expected that the performance can be even more robust. \n\nOn the other hand, we have a remark about the basic assumption of the cavity method, namely that the addition or removal of a spin causes a small change in the system describable by a perturbative approach. In fact, adding or removing a spin may cause the thermal averages of other spins to change from below to above the thresholds $\\pm\\theta$ (or vice versa). This change, though often small, induces a non-negligible change of the thermal averages from fractional values to the frozen values of $\\pm 1$ (or vice versa) in the second stage. The perturbative analysis of these changes is only approximate. The situation is reminiscent of similar instabilities in other disordered systems such as the perceptron, and is equivalent to Almeida-Thouless instabilities in the replica method [9]. A full treatment of the problem would require the introduction of a rough energy landscape [9], or the replica symmetry breaking ansatz in the replica method [8]. \n
Nevertheless, previous experience with disordered systems showed that the corrections made by a more complete treatment may not be too large in the ordered phase. For example, simulational results in Fig. 1(b) are close to the corresponding analytical results in [7]. \n\nIn practical implementations of error-correcting codes, algorithms based on belief-propagation methods are often employed [10]. It has recently been shown that such decoded messages converge to the solutions of the TAP equations in the corresponding thermodynamic system [11]. Again, the performance of these algorithms is sensitive to the estimation of hyperparameters. We propose that the selective freezing procedure has the potential to make these algorithms more robust. \n\nAcknowledgments \n\nThis work was partially supported by the Research Grants Council of Hong Kong (HKUST6157/99P). \n\nReferences \n\n[1] R. J. McEliece, The Theory of Information and Coding, Encyclopedia of Mathematics and its Applications (Addison-Wesley, Reading, MA, 1977). \n[2] S. Geman and D. Geman, IEEE Trans. PAMI 6, 721 (1984). \n[3] H. Nishimori and K. Y. M. Wong, Phys. Rev. E 60, 132 (1999). \n[4] D. J. C. MacKay, Neural Computation 4, 415 (1992). \n[5] J. M. Pryce and A. D. Bruce, J. Phys. A 28, 511 (1995). \n[6] K. Y. M. Wong, Europhys. Lett. 36, 631 (1996). \n[7] K. Y. M. Wong and H. Nishimori, submitted to Phys. Rev. E (2000). \n[8] M. Mezard, G. Parisi, and M. A. Virasoro, Spin Glass Theory and Beyond (World Scientific, Singapore, 1987). \n[9] K. Y. M. Wong, Advances in Neural Information Processing Systems 9, 302 (1997). \n[10] B. J. Frey, Graphical Models for Machine Learning and Digital Communication (MIT Press, 1998). \n[11] Y. Kabashima and D. Saad, Europhys. Lett. 44, 668 (1998). \n", "award": [], "sourceid": 1858, "authors": [{"given_name": "K. Y. Michael", "family_name": "Wong", "institution": null}, {"given_name": "Hidetoshi", "family_name": "Nishimori", "institution": null}]}