{"title": "Neural Network Implementation of Admission Control", "book": "Advances in Neural Information Processing Systems", "page_first": 493, "page_last": 499, "abstract": null, "full_text": "Neural Network Implementation of Admission Control \n\nRodolfo A. Milito, Isabelle Guyon, and Sara A. SoDa \n\nAT&T Bell Laboratories, Crawfords Corner Rd., Holmdel, NJ 07733 \n\nAbstract \n\nA feedforward layered network implements a mapping required to control an \nunknown stochastic nonlinear dynamical system. Training is based on a \nnovel approach that combines stochastic approximation ideas with back(cid:173)\npropagation. The method is applied to control admission into a queueing sys(cid:173)\ntem operating in a time-varying environment. \n\n1 INTRODUCTION \n\nA controller for a discrete-time dynamical system must provide, at time tn, a value un for \nthe control variable. Information about the state of the system when such decision is \nmade is available through the observable Yn' The value un is determined on the basis of \nthe current observation Yn and the preceding control action Un-I' Given the information \nIn = (Yn' Un-I), the controllerimplements a mapping In -+ Un. \nOpen-loop controllers suffice in static situations which require a single-valued control \npolicy U : a constant mapping In -+ u\u00b7, regardless of In. Closed-loop controllers pro(cid:173)\nvide a dynamic control action un, determined by the available information In. This work \naddresses the question of training a neural network to implement a general mapping \nIn -+ Un' \nThe problem that arises is the lack of training patterns: the appropriate value un for a \ngiven input In is not known. The quality of a given control policy can only be assessed by \nusing it to control the system, and monitoring system perfonnance. The sensitivity of the \nperformance to variations in the control policy cannot be investigated analytically, since \nthe system is unknown. 
We show that such sensitivity can be estimated within the standard framework of stochastic approximation. The usual back-propagation algorithm is used to determine the sensitivity of the output u_n to variations in the parameters W of the network, which can thus be adjusted so as to improve system performance. \nThe advantage of a neural network as a closed-loop controller resides in its ability to accept inputs (I_n, I_{n-1}, ..., I_{n-p}). The additional p time steps into the past provide information about the history of the controlled system. As demonstrated here, neural network controllers can capture regularities in the structure of time-varying environments, and are particularly powerful for tracking time variations driven by stationary stochastic processes. \n\n493 \n\n494 Milito, Guyon, and Solla \n\n2 CONTROL OF STOCHASTIC DYNAMICAL SYSTEMS \n\nConsider a dynamical system for which the state x_n is updated at discrete times t_n = n δ. The control input u_n in effect at time t_n affects the dynamical evolution, and \n\nx_{n+1} = f(x_n, u_n, ξ_n).  (2.1) \n\nHere {ξ_n} is a stochastic process which models the intrinsic randomness of the system as well as external, unmeasurable disturbances. The variable x_n is not accessible to direct measurement, and knowledge about the state of the system is limited to the observable \n\ny_n = h(x_n).  (2.2) \n\nOur goal is to design a neural network controller which produces a specific value u_n for the control variable to be applied at time t_n, given the available information I_n = (y_n, u_{n-1}). \nIn order to design a controller which implements the appropriate control policy I_n -> u_n, a specification of the purpose of controlling the dynamical system is needed. There is typically a function of the observable, \n\nJ_n = H(y_n),  (2.3) \n\nwhich measures system performance. It follows from Eqs. (2.1)-(2.3) that the composition G = H ∘ h ∘ f determines \n\nJ_{n+1} = G(x_n, u_n, ξ_n),  (2.4) \n\na function of the state x of the system, the control variable u, and the stochastic variable ξ. The quantity of interest is the expectation value of the system performance, \n\n<J_n> = <H(y_n)>_ξ,  (2.5) \n\naveraged with respect to ξ. This expectation value can be estimated by the long-run average \n\nJ_N = (1/N) Σ_{n=1..N} J_n,  (2.6) \n\nsince for an ergodic system J_N -> <J_n> as N -> ∞. The goal of the controller is to generate a sequence {u_n}, 1 ≤ n ≤ N, of control values such that the average performance <J_n> stabilizes to a desired value J*. \nThe parameters W of the neural network are thus to be adapted so as to minimize a cost function \n\nE(W) = (1/2) (<J_n> - J*)^2.  (2.7) \n\nThe dependence of E(W) on W is implicit: the value of <J_n> depends on the controlling sequence {u_n}, which depends on the parameters W of the neural network. \nOn-line training proceeds through a gradient descent update \n\nW_{n+1} = W_n - η ∇_W E_n(W),  (2.8) \n\ntowards the minimization of the instantaneous deviation \n\nE_n(W) = (1/2) (J_{n+1} - J*)^2.  (2.9) \n\nThere is no specified target for the output u_n that the controller is expected to provide in response to the input I_n = (y_n, u_{n-1}). The output u_n can thus be considered as a variable u, which controls the subsequent performance: J_{n+1} = G(x_n, u, ξ_n), as follows from Eq. (2.4). Then \n\n∇_W E_n(W) = (J_{n+1} - J*) (dG/du) ∇_W u_n.  (2.10) \n\nThe factor ∇_W u measures the sensitivity of the output of the neural network controller to changes in the internal parameters W: at fixed input I_n, the output u_n is a function only of the network parameters W. The gradient of this scalar function is easily computed using the standard back-propagation algorithm (Rumelhart et al., 1986). 
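The gradient ∇_W u_n of the scalar network output with respect to the weights is exactly what back-propagation delivers for a feedforward network. The following sketch computes it for a tiny one-hidden-layer tanh network (the sizes are illustrative, not the 20-6-1 architecture used later in the paper) and checks one entry against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6                       # illustrative sizes
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
W2 = rng.normal(0.0, 0.5, n_hid)

def output(W1, W2, I):
    """Scalar control output u of a one-hidden-layer tanh network."""
    return np.tanh(W2 @ np.tanh(W1 @ I))

def grad_u(W1, W2, I):
    """Back-propagation for the gradient of the scalar output u
    with respect to (W1, W2), at fixed input I."""
    h = np.tanh(W1 @ I)
    u = np.tanh(W2 @ h)
    du = 1.0 - u**2                      # tanh'(x) = 1 - tanh(x)^2
    gW2 = du * h
    gW1 = np.outer(du * W2 * (1.0 - h**2), I)
    return gW1, gW2

# Finite-difference check of a single W1 entry.
I = rng.normal(size=n_in)
gW1, gW2 = grad_u(W1, W2, I)
eps = 1e-6
W1p = W1.copy(); W1p[2, 1] += eps
fd = (output(W1p, W2, I) - output(W1, W2, I)) / eps
assert abs(fd - gW1[2, 1]) < 1e-4
```

This is the only piece of the update that requires knowledge of the controller's internals; everything else in the rule below is a measurement of the controlled system.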
\nThe factor dG/du measures the sensitivity of the system performance J_{n+1} to changes in the control variable. The information about the system needed to evaluate this derivative is not available: unknown are the function f, which describes how x_{n+1} is affected by u_n at fixed x_n, and the function h, which describes how this dependence propagates to the observable y_{n+1}. The algorithm is rendered operational through the use of stochastic approximation (Kushner, 1971): assuming that the average system performance <J_n> is a monotonically increasing function of u, the sign of the partial derivative d<J_n>/du is positive. Stochastic approximation amounts to neglecting the unknown fluctuations of this derivative with u, and approximating it by a constant positive value, which is then absorbed in a redefinition of the step size η > 0. \nThe on-line update rule then becomes: \n\nW_{n+1} = W_n - η (J_{n+1} - J*) ∇_W u_n.  (2.11) \n\nAs with stochastic approximation, the on-line gradient update uses the instantaneous gradient based on the current measurement J_{n+1}, rather than the gradient of the expected value <J_n>, whose deviations with respect to the target J* are to be minimized. The combined use of back-propagation and stochastic approximation to evaluate ∇_W E_n(W), leading to the update rule of Eq. (2.11), provides a general and powerful learning rule for neural network controllers. The only requirement is that the average performance <J_n> be indeed a monotonic function of the control variable u. \nIn the following section we illustrate the application of the algorithm to an admission controller for a traffic queueing problem. The advantage of the neural network over a standard stochastic approximation approach becomes apparent when the mapping which produces u_n is used to track a time-varying environment generated by a stationary stochastic process. 
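The dynamics of the update rule of Eq. (2.11) can be sketched numerically. The plant below is a stand-in whose average performance increases monotonically with u (the paper's only requirement), and for clarity the controller is reduced to a single adjustable parameter w with u = w, so that ∇_W u = 1; all names and constants are illustrative:

```python
import random

rng = random.Random(0)

J_target = 0.2      # desired performance J*
eta = 0.05          # step size eta
w = 1.0             # single controller parameter; u_n = w, grad_W u = 1

def plant(u):
    """Stand-in for the unknown system: a noisy performance
    measurement J_{n+1} that increases monotonically with u."""
    return 0.5 * u + rng.gauss(0.0, 0.05)

for _ in range(2000):
    u = w                            # controller output u_n
    J_next = plant(u)                # observe instantaneous performance
    w -= eta * (J_next - J_target)   # Eq. (2.11) with grad_W u = 1
```

With these numbers w settles near the value where the average performance meets the target, i.e. where 0.5 w = 0.2; replacing the scalar w by network weights and the factor 1 by the back-propagated gradient ∇_W u_n gives the full rule.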
A straightforward extension of the approach discussed above is used to train a network to implement a mapping (I_n, I_{n-1}, ..., I_{n-p}) -> u_n. The additional p time steps into the past provide information on the history of the controlled system, and allow the network to capture regularities in the time variations of the environment. \n\n3 A TWO-TRAFFIC QUEUEING PROBLEM \n\nConsider an admission controller for a queueing system. As depicted in Fig. 1, the system includes a server, a queue, a call admission mechanism, and a controller. \n\nFigure 1: Admission controller for a two-traffic queueing problem. [Diagram: remote arrivals pass through the controller, which admits them to the queue or rejects them; local arrivals enter the queue directly; queued calls either reach the server or abandon.] \n\nThe need to serve two independent traffic streams with a single server arises often in telecommunication networks. In a typical situation, in addition to remote arrivals which can be monitored at the control node, there are local arrivals whose admission to the queue can be neither monitored nor regulated. Within this limited-information scenario, the controller must execute a policy that meets specified performance objectives. Such is the situation we now model. \nTwo streams are offered to the queueing system: remote traffic and local traffic. Both streams are Poisson, i.e., the interarrival times are independently and exponentially distributed, with mean 1/λ. Calls originated by the remote stream can be controlled, by denying admission to the queue. Local calls are neither controlled nor monitored. While the arrival rate λ_R of remote calls is fixed, the rate λ_L(t) of local calls is time-varying. It depends on the state of a stationary Markov chain to be described later (Kleinrock, 1975). The service time required by a call of any type is an exponentially distributed random variable, with mean 1/μ. 
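Both offered streams are Poisson, so either one can be simulated by accumulating i.i.d. exponential interarrival times, as in this minimal sketch (function and parameter names are illustrative):

```python
import random

def poisson_arrivals(rate, horizon, seed=0):
    """Sample arrival times of a Poisson stream of the given rate on
    [0, horizon] by summing i.i.d. exponential interarrival times
    with mean 1/rate (the standard construction)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)
```

For example, a remote stream of rate 100 over 10 time units yields close to 1000 arrivals on average.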
\nCalls that find an empty queue on arrival get immediately into service. Otherwise, they wait in queue. The service discipline is first-in first-out, non-idling. Every arrival is assigned a \"patience threshold\" τ, independently drawn from a fixed but unknown distribution that characterizes customer behavior. If the waiting time in queue exceeds its \"patience threshold\", the call abandons. \nIdeally, every incoming call should be admitted. The server, however, cannot process, on the average, more than μ calls per unit time. Whenever the offered load ρ = [λ_R + λ_L(t)]/μ approaches or exceeds 1, the queue starts to build up. Long queues result in long delays, which in turn induce heavy abandonments. To keep the abandonments within tolerable limits, it becomes necessary to reject some remote arrivals. \nThe call admission mechanism is implemented via a token-bank (not shown in the figure) rate control throttle (Berger, 1991). Tokens arrive at the token-bank at a deterministic rate λ_T. The token-bank is finite, and tokens that find a full bank are lost. A token is needed by a remote call to be admitted to the queue, and tokens are not reusable. Calls that find an empty token bank are rejected. Remote admissions are thus controlled through u = λ_T/λ_R. \nLocal calls are always admitted. The local arrival rate λ_L(t) is controlled by an underlying q-state Markov chain, a birth-death process (Kleinrock, 1975) with transition rate γ only between neighboring states. When the Markov chain is in state i, 1 ≤ i ≤ q, the local arrival rate is λ_L(i). \nComplete specification of the state x_n of the system at time t_n would require information about the number of arrivals, abandonments, and services for both remote and local traffic during the preceding time interval of duration δ = 1, as well as rejections for the controllable remote traffic, and the waiting time for every queued call. But the local traffic is not monitored, and information on arrivals and waiting times is not accessible. Thus y_n only contains information about the remote traffic: the number n_r of rejected calls, the number n_a of abandonments, and the number n_s of serviced calls since t_{n-1}. The information I_n available at time t_n also includes the preceding control action u_{n-1}. The controller uses (I_n, I_{n-1}, ..., I_{n-p}) to determine u_n. \nThe goal of the control policy is to admit as many calls as possible, compatible with a tolerable rate of abandonment n_a/λ_R ≤ Δ. The ratio n_a/λ_R thus plays the role of the performance measure J_n, and its target value is J* = Δ. Values in excess of Δ imply an excessive number of abandonments and require stricter admission control. Values smaller than Δ are penalized if obtained at the expense of avoidable rejections. \n\n4 RESULTS \n\nAll simulations reported here correspond to a server capable of handling calls at a rate of μ = 200 per unit time. The remote traffic arrival rate is λ_R = 100. The local traffic arrival rate is controlled by a q = 10 Markov chain with λ_L(i) = 20i for 1 ≤ i ≤ 10. The offered load thus spans the range 0.6 ≤ ρ ≤ 1.5, in steps of 0.1. Transition rates γ = 0.1, 1, and 10 in the Markov chain have been used to simulate slow, moderate, and rapid variations in the offered load. \nThe neural network controller receives inputs (I_n, I_{n-1}, ..., I_{n-4}) at time t_n through 20 input units. A hidden layer with 6 units transmits information to the single output unit, which provides u_n. The bound for the tolerable abandonment rate is set at Δ = 0.1. \nTo check whether the neural network controller is capable of correct generalization, a network trained under a time-varying scenario was subjected to a static one for testing. Training takes place under an offered load ρ varying at a rate of γ = 1. 
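The token-bank rate-control throttle described in Section 3 admits a remote call only when a token is available. A minimal sketch, with illustrative class and parameter names, following the mechanism of Berger (1991):

```python
class TokenBankThrottle:
    """Tokens arrive deterministically at rate lambda_T into a finite
    bank; excess tokens are lost. A remote call consumes one token to
    be admitted; a call arriving to an empty bank is rejected."""

    def __init__(self, token_rate, bank_size):
        self.token_rate = token_rate   # lambda_T, tokens per unit time
        self.bank_size = bank_size     # finite bank capacity
        self.tokens = 0.0
        self.last_t = 0.0

    def offer(self, t):
        """A remote call arrives at time t; return True if admitted."""
        # Deterministic replenishment since the last arrival, capped
        # at the bank size (tokens finding a full bank are lost).
        self.tokens = min(self.bank_size,
                          self.tokens + self.token_rate * (t - self.last_t))
        self.last_t = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0         # tokens are not reusable
            return True
        return False
```

The control variable is then u = λ_T/λ_R: for instance, `TokenBankThrottle(token_rate=80, bank_size=10)` facing a remote stream of rate 100 corresponds to u = 0.8 and admits at most about 80 calls per unit time.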
The network is tested at γ = 0: the underlying Markov chain is frozen and ρ is kept fixed for a long enough period to stabilize the control variable around a fixed value u*, and to obtain statistically meaningful values for n_a, n_r, and n_s. A careful numerical investigation of these quantities as a function of ρ reveals that the neural network has developed an adequate control policy: light loads ρ ≤ 0.8 spontaneously result in low values of n_a and require no control (u = 1.25 guarantees ample token supply, and n_r ≈ 0), but as ρ exceeds 1, the system is controlled by decreasing the value of u below 1, thus increasing n_r to satisfy the requirement n_a/λ_R ≤ Δ. Detailed results of the static performance in comparison with a standard stochastic approximation approach will be reported elsewhere. \nIt is in the tracking of a time-varying environment that the power of the neural network controller is revealed. A network trained under a varying offered load is tested dynamically by monitoring the distribution of abandonments and rejections as the network controls an environment varying at the same rate γ as used during training. The abandonment distribution F_a(x) = Prob{n_a/λ_R ≤ x}, shown in Fig. 2 (a) for γ = 1, indicates that the neural network (NN) controller outperforms both stochastic approximation [1] (SA) and the uncontrolled system (UN): the probability of keeping the abandonment rate n_a/λ_R bounded is larger for the NN controller for all values of the bound x. As for the goal of not exceeding x = Δ, it is achieved with probability F_a(Δ) = 0.88 by the NN, in comparison to only F_a(Δ) = 0.74 with SA or F_a(Δ) = 0.51 if uncontrolled. The rejection distribution F_r(x) = Prob{n_r/λ_R ≤ x}, shown in Fig. 2 (b) for γ = 1, illustrates the stricter control policy provided by the NN. Results for γ = 0.1 and γ = 10, not shown here, 
confirm the superiority of the control policy developed by the neural network. \n\n[1] Stochastic approximation with a fixed gain, to enable the controller to track time-varying environments. The gain was optimized numerically. \n\nFigure 2: (a) Abandonment distribution F_a(x), and (b) rejection distribution F_r(x), for the neural network controller, stochastic approximation, and the uncontrolled system. \n\n5 CONCLUSIONS \n\nThe control of an unknown stochastic system requires a mapping that is implemented here via a feedforward layered neural network. A novel learning rule, a blend of stochastic approximation and back-propagation, is proposed to overcome the lack of training patterns through the use of on-line performance information provided by the system under control. Satisfactorily tested for an admission control problem, the approach shows promise for a variety of applications to congestion control in telecommunication networks. \n\nReferences \n\nA.W. Berger, \"Overload control using a rate control throttle: selecting token capacity for robustness to arrival rates\", IEEE Transactions on Automatic Control 36, 216-219 (1991). \n\nH. Kushner, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag (1971). \n\nL. Kleinrock, Queueing Systems, Volume I: Theory, John Wiley & Sons (1975). \n\nD.E. Rumelhart, G.E. Hinton, and R.J. 
Williams, \"Learning representations by back(cid:173)\npropagating errors\", Nature 323, 533-536 (1986). \n\n\f", "award": [], "sourceid": 327, "authors": [{"given_name": "Rodolfo", "family_name": "Milito", "institution": null}, {"given_name": "Isabelle", "family_name": "Guyon", "institution": null}, {"given_name": "Sara", "family_name": "Solla", "institution": null}]}