{"title": "Adaptive Caching by Refetching", "book": "Advances in Neural Information Processing Systems", "page_first": 1489, "page_last": 1496, "abstract": null, "full_text": "Adaptive Caching by Refetching\n\nRobert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari\n\nDepartment of Computer Science, UCSC\n\nSanta Cruz, CA 95064\n\n{rbgramacy, manfred, scott, ari}@cs.ucsc.edu\n\nAbstract\n\nWe are constructing caching policies that have 13\u201320% lower miss rates\nthan the best of twelve baseline policies over a large variety of request\nstreams. This represents an improvement of 49\u201363% over Least Recently\nUsed, the most commonly implemented policy. We achieve this not by\ndesigning a specific new policy but by using on-line Machine Learning\nalgorithms to dynamically shift between the standard policies based on\ntheir observed miss rates. A thorough experimental evaluation of our\ntechniques is given, as well as a discussion of what makes caching an\ninteresting on-line learning problem.\n\n1 Introduction\n\nCaching is ubiquitous in operating systems. It is useful whenever we have a small, fast main\nmemory and a larger, slower secondary memory. In file system caching, the secondary\nmemory is a hard drive or a networked storage server while in web caching the secondary\nmemory is the Internet. The goal of caching is to keep within the smaller memory data\nobjects (files, web pages, etc.) from the larger memory which are likely to be accessed\nagain in the near future. Since the future request stream is not generally known, heuristics,\ncalled caching policies, are used to decide which objects should be discarded as new objects\nare retained. More precisely, if a requested object already resides in the cache then we\ncall it a hit, corresponding to a low-latency data access. 
Otherwise, we call it a miss,\ncorresponding to a high-latency data access as the data must be fetched from the slower\nsecondary memory into the faster cache memory. In the case of a miss, room must be made\nin the cache memory for the new object. To accomplish this a caching policy discards from\nthe cache objects which it thinks will cause the fewest or least expensive future misses.\n\nIn this work we consider twelve baseline policies including seven common policies\n(RAND, FIFO, LIFO, LRU, MRU, LFU, and MFU), and five more recently developed\nand very successful policies (SIZE and GDS [CI97], GD* [JB00], GDSF and\nLFUDA [ACD+99]). These algorithms employ a variety of directly observable criteria\nincluding recency of access, frequency of access, size of the objects, cost of fetching the\nobjects from secondary memory, and various combinations of these.\n\nAuthor footnotes: Partial support from NSF grant CCR 9821087. Supported by Hewlett Packard Labs, Storage Technologies Department.\n\nThe primary difficulty in selecting the best policy lies in the fact that each of these policies\nmay work well in different situations or at different times due to variations in workload,\nsystem architecture, request size, type of processing, CPU speed, relative speeds of the\ndifferent memories, load on the communication network, etc. Thus the difficult question\nis: In a given situation, which policy should govern the cache? For example, the request\nstream from disk accesses on a PC is quite different from the request stream produced by\nweb-proxy accesses via a browser, or that of a file server on a local network. The relative\nperformance of the twelve policies varies greatly depending on the application. 
Furthermore,\nthe characteristics of a single request stream can vary temporally for a fixed application.\nFor example, a file server can behave quite differently during the middle of the night while\nmaking tape archives in order to back up data, whereas during the day its purpose is to\nserve file requests to and from other machines and/or users. Because of their differing\ndecision criteria, different policies perform better given different workload characteristics.\nThe request streams become even more difficult to characterize when there is a hierarchy\nor a network of caches handling a variety of file-type requests. In these cases, choosing a\nfixed policy for each cache in advance is doomed to be sub-optimal.\n\n[Figure 1 plots: panels (a) and (b) show miss rate (0 to 0.8) against request number (roughly 205000 to 235000) for lru, fifo, mru, lifo, size, lfu, mfu, rand, gds, gdsf, lfuda, and gd; panels (c) and (d), titled Lowest miss rate policy switches between SIZE, GDS, GDSF, and GD*, show size, gds, gdsf, and gd over the same request range.]\n\nFigure 1: Miss rates (y axis) of a) the twelve fixed policies (calculated w.r.t. a window of 300 requests)\nover 30,000 requests (x axis), b) the same policies on a random permutation of the data set, c) and d)\nthe policies with the lowest miss rates in the figures above.\n\nThe usual answer to the question of which policy to employ is either to select one that works\nwell on average, or to select one that provides the best performance on some past workload\nthat is believed to be representative. 
However, these strategies have two inherent costs.\nFirst, the selection (and perhaps tuning) of the single policy to be used in any given situation\nis done by hand and may be both difficult and error-prone, especially in complex system\narchitectures with unknown and/or time-varying workloads. And second, the performance\nof the chosen policy with the best expected average case performance may in fact be worse\nthan that achievable by another policy at any particular moment. Figure 1 (a) shows the miss\nrate of the twelve policies described above on a representative portion of one of our data\nsets (described below in Section 3) and Figure 1 (b) shows the miss rate of the same policies\non a random permutation of the request stream. As can clearly be seen, the miss rates\non the permuted data set are quite different from those of the original data set, and it is this\ndifference that our algorithms aim to exploit. Figures 1 (c) and (d) show which policy is\nbest at each instant of time for the data segment and the permuted data segment. It is clear\nfrom these (representative) figures that the best policy changes over time.\n\n
To avoid the perils associated with trying to hand-pick a single policy, one would like to be\nable to automatically and dynamically select the best policy for any given situation. In other\nwords, one wants a cache replacement policy which is \u201cadaptive\u201d. In our Storage Systems\nResearch Group, we have identified the need for such a solution in the context of complex\nnetwork architectures and time-varying workloads and suggested a preliminary framework\nin which a solution could operate [AAG+ar], but without giving specific algorithmic solutions\nto the adaptation problem. This paper presents specific algorithmic solutions that\naddress the need identified in that work.\n\nIt is difficult to give a precise definition of \u201cadaptive\u201d when the data stream is continually\nchanging. We use the term \u201cadaptive\u201d only informally and when we want to be precise\nwe use off-line comparators to judge the performance of our on-line algorithms, as is\ncommonly done in on-line learning [LW94, CBFH+97, KW97]. An on-line algorithm\nis called adaptive if it performs well when measured up against off-line comparators.\n\n[Figure 2 plot: WWk, BestShifting(K); miss rates % (4.0 to 5.5) against K = Number of Shifts (0 to 600), with curves Best Fixed = SIZE, BestShift(K), and All Virtual Caches.]\n\nFigure 2: Optimal offline comparators.\n\nIn this paper we use two off-line comparators: BestFixed and BestShifting(K). BestFixed is\nthe a posteriori selected policy with the lowest miss rate on the entire request stream for our\ntwelve policies. BestShifting(K) considers all possible partitions of the request stream into\nat most K segments along with the best policy for each segment. BestShifting(K) chooses the\npartition with the lowest total miss rate over the entire dataset and can be computed using\ndynamic programming in time that depends on T, K, and N, where T is the total number of\nrequests, K a bound on the number of segments, and N the number of base-line policies. Figure 2\nshows graphically each of the comparators mentioned above. 
Notice that BestFixed =\nBestShifting(1), and that most of the advantage of shifting policies occurs with relatively\nfew shifts (a small number of shifts in roughly 300,000 requests).\n\nRather than developing a new caching policy (well-plowed ground, to say the least), this\npaper uses a master policy to dynamically determine the success rate of all the other policies\nand switch among them based on their relative performance on the current request\nstream. We show that with no additional fetches, the master policy works about as well as\nBestFixed. We define a refetch as a fetch of a previously seen object that was favored by the\ncurrent policy but discarded from the real cache by a previously active policy. With refetching,\nthe master policy can outperform BestFixed. In particular, when all required objects are refetched\ninstantly, this policy has a 13\u201320% lower miss rate than BestFixed, and almost the same\nperformance as BestShifting(K) for modest K. For reference, when compared with LRU,\nthis policy has a 49\u201363% lower miss rate. Disregarding misses on objects never seen before\n(compulsory misses), the performance improvements are even greater.\n\nBecause refetches are themselves potentially costly, it is important to note that they can be\ndone in the background. Our preliminary experiments show this to be both feasible and\neffective, capturing most of the advantage of instant refetching. A more detailed discussion\nof our results is given in Section 3.\n\n2 The Master Policy\n\nWe seek to develop an on-line master policy that determines which of a set of baseline\npolicies should govern the real cache at any time. Appropriate switch points need\nto be found and switches must be facilitated. Our key idea is \u201cvirtual caches\u201d. A virtual\ncache simulates the operation of\na single baseline policy. 
Each virtual\ncache records a few bytes of metadata\nabout each object in its cache:\nID, size, and calculated priority. Object\ndata is only kept in the real\ncache, making the cost of maintaining\nthe virtual caches negligible (footnote 1). Via the virtual caches, the master policy can observe the\nmiss rates of each policy on the actual request stream in order to determine their performance\non the current workload.\n\nFigure 3: Virtual caches embedded in the cache memory.\n\nTo be fair, virtual caches reside in the memory space which could have been used to cache\nreal objects, as is illustrated in Figure 3. Thus, the space used by the real cache is reduced by\nthe space occupied by the virtual caches. We set the virtual size of each virtual cache equal\nto the size of the full cache. The caches used for computing the comparators BestFixed and\nBestShifting(K) are based on caches of the full size.\n\nA simple heuristic the master policy can use to choose which caching policy should control\nat any given time is to continuously monitor the number of misses incurred by each policy\nin a past window of, for example, 300 requests (depicted in Figure 1 (a)). The master policy\nthen gives control of the real cache to the policy with the fewest misses in this window\n(shown in Figure 1 (c)). While this works well in practice, maintaining such a window for\nmany fixed policies is expensive, further reducing the space for the real cache. It is also\nhard to tune the window size. A better master policy keeps just one weight w_i for each\npolicy (non-negative and summing to one) which represents an estimate of its current relative\nperformance. The master policy always gives control to the policy with the maximum\nweight (footnote 2).\n\nWeights are updated by using the combined loss and share updates of Herbster and Warmuth\n[HW98] and Bousquet and Warmuth [BW02] from the expert framework [CBFH+97]\nfor on-line learning. Here the experts are the caching policies. 
This technique is preferred\nto the window-based master policy because it uses much less memory, and because the\nparameters of the weight updates are easier to tune than the window size. This also makes\nthe resulting master policy more robust (not shown).\n\nFootnote 1: As an additional optimization, we record the id and size of each object only once, regardless of\nthe number of virtual caches it appears in.\n\nFootnote 2: This can be sub-optimal in the worst case since it is always possible to construct a data stream\nwhere two policies switch back and forth after each request. However, real request streams appear\nto be divided into segments that favor one of the twelve policies for a substantial number of requests\n(see Figure 1).\n\n2.1 The Weight Updates\n\nUpdating the weight vector (w_1, ..., w_N) after each trial is a two-part process. First, the\nweights of all policies that missed the new request are multiplied by a factor \u03b2 between 0 and 1 and\nthen renormalized. We call this the loss update. Since the weights are renormalized, they\nremain unchanged if all policies miss the new request. As noticed by Herbster and Warmuth\n[HW98], multiplicative updates drive the weights of poor experts to zero so quickly\nthat it becomes difficult for them to recover if their experts subsequently start doing well.\nTherefore, the second update, the share update, prevents the weights of experts that did well in the\npast from becoming too small, allowing them to recover quickly, as shown in Figure 4.\nFigure 1(a) shows the current absolute performance of the policies in a rolling window\n(300 requests), whereas Figure 4 depicts relative performance and shows how the policies\ncompete over time. 
(Recall that the policy with the highest weight always controls the real\ncache).\n\nThere are a number of share updates [HW98, BW02] with various\nrecovery properties. We chose the\nFIXED SHARE TO UNIFORM PAST\n(FSUP) update because of its simplicity\nand efficiency. Note that the loss\nbounds proven in the expert framework\nfor the combined loss and share\nupdate do not apply in this context.\nThis is because we use the mixture\nweights only to select the best policy.\nHowever, our experimental results\nsuggest that we are exploiting the\nrecovery properties of the combined update that are discussed extensively by Bousquet\nand Warmuth [BW02].\n\n[Figure 4 plot: Weight History for Individual Policies; FSUP weight (0 to 1) against Requests Over Time (205000 to 235000) for lru, fifo, mru, lifo, size, lfu, mfu, rand, gds, gdsf, lfuda, and gd.]\n\nFigure 4: Weights of baseline policies.\n\nFormally, for each trial t, the loss update is\n\n  w^m_{t,i} = w_{t-1,i} \u03b2^{miss_{t,i}} / \u2211_{j=1}^{N} w_{t-1,j} \u03b2^{miss_{t,j}},  for i = 1, ..., N,\n\nwhere \u03b2 is a parameter between 0 and 1 and miss_{t,i} is 1 if the t-th object is missed by policy i\nand 0 otherwise. The initial distribution is uniform, i.e. w_{0,i} = 1/N. The Fixed-Share to\nUniform Past update mixes the current weight vector with the past average weight vector\nr_t, which is easy to maintain:\n\n  w_{t,i} = (1 - \u03b1) w^m_{t,i} + \u03b1 r_{t,i},\n\nwhere \u03b1 is a parameter between 0 and 1. A small \u03b2 parameter causes high weight to decay quickly if\nits corresponding policy starts incurring more misses than other policies with high weights.\nThe higher the \u03b1 the more quickly past good policies will recover. 
In our experiments we\nused fixed values of \u03b2 and \u03b1.\n\n2.2 Demand vs. Instantaneous Rollover\n\nWhen space is needed to cache a new request, the master policy discards objects not present\nin the governing policy\u2019s virtual cache (footnote 3). This causes the content of the real cache to \u201croll\nover\u201d to the content of the current governing virtual cache. We call this demand rollover\nbecause objects in the governing virtual cache are refetched into the real cache on demand.\nWhile this master policy works almost as well as BestFixed, we were not satisfied and\nwanted to do as well as BestShifting(K) (for a reasonably large bound K on the number\nof segments). We noticed that the content of the real cache lagged behind the content of\nthe governing virtual cache and had more misses, and conjectured that \u201cquicker\u201d rollover\nstrategies would improve overall performance.\n\nFootnote 3: We update the virtual caches before the real cache, so there are always objects in the real cache\nthat are not in the governing virtual cache when the master policy goes to find space for a new request.\n\nOur search for a better master policy began by considering an extreme and unrealistic\nrollover strategy that assures no lag time: After each switch instantaneously refetch all\nthe objects in the new governing virtual cache that were not retained in the real cache.\nWe call this refetching policy instantaneous rollover. 
By appropriate tuning of the update\nparameters \u03b2 and \u03b1, the number of instantaneous rollovers can be kept reasonably small and\nthe miss rates of our master policy are almost as good as BestShifting(K) for K much larger\nthan the actual number of shifts used on-line. Note that the comparator BestShifting(K)\nis also not penalized for its instantaneous rollovers. While this makes sense for defining a\ncomparator, we now give more realistic rollover strategies that reduce the lag time.\n\n2.3 Background Rollover\n\nBecause instantaneous rollover immediately refetches everything in the governing virtual\ncache that is not already in the real cache, it may cause a large number of refetches even\nwhen the number of policy switches is kept small. If all refetches are counted as misses,\nthen the miss rate of such a master policy is comparable to that of BestFixed. The same\nholds for BestShifting. However, from a user perspective, refetching is advantageous because\nof the latency advantage gained by having required objects in memory before they\nare needed. And from a system perspective, refetches can be \u201cfree\u201d if they are done when\nthe system is idle. To take advantage of these \u201cfree\u201d refetches, we introduce the concept\nof background rollover. The exact criteria for when to refetch each missing object will\ndepend heavily on the system, workload, and expected cost and benefit of each object. To\ncharacterize the performance of background rollover without addressing these architectural\ndetails, the following background refetching strategies were examined: 1 refetch for every\ncache miss; 1 for every hit; 1 for every request; 2 for every request; 1 for every hit and 5 for\nevery miss, etc. Each background technique gave fewer misses than BestFixed, approaching\nand nearly matching the performance obtained by the master policy using instantaneous\nrollover. 
Of course, techniques which reduce the number of policy switches (by tuning \u03b2\nand \u03b1) also reduce the number of refetches. Figure 5 compares the performance of each\nmaster policy with that of BestFixed and shows that the three master policies almost always\noutperform BestFixed.\n\n[Figure 5 plot: Miss Rate Differences; miss rate difference (-0.1 to 0.6) against Requests Over Time (205000 to 230000) for bestF - demd, bestF - back, and bestF - inst.]\n\nFigure 5: BestFixed - P, where P is one of Instantaneous, Demand, and Background Rollover 2. The\nbaseline is BestFixed. Deviations from the baseline show how the performance of\nour on-line shifting policies differs in miss rate. Above (Below) the baseline corresponds to fewer (more)\nmisses than BestFixed.\n\n3 Data and Results\n\nFigure 6 shows how the master policy with instantaneous rollover (labeled \u2018roll\u2019) \u201ctracks\u201d\nthe baseline policy with the lowest miss rate over the representative data segment used in\nprevious figures. Figure 7 shows the performance of our master policies with respect to\nBestFixed, BestShifting(K), and LRU. It shows that demand rollover does slightly worse\nthan BestFixed, while background 1 (1 refetch every request) and background 2 (1 refetch\nevery hit and 5 every miss) do better than BestFixed and almost as well as instantaneous,\nwhich itself does almost as well as BestShifting. All of the policies do significantly better\nthan LRU. Discounting the compulsory misses, our best policies have about\n1/3 fewer \u201creal\u201d\nmisses than BestFixed and about\n1/2 the \u201creal\u201d misses of LRU.\n\nFigure 8 summarizes the performance of our algorithms over three large datasets. 
These\nwere gathered using Carnegie Mellon University\u2019s DFSTrace system [MS96] and had durations\nranging from a single day to over a year. The traces we used represent a variety of\nworkloads including a personal workstation (Work-Week), a single user (User-Month), and\na remote storage system with a large number of clients, filtered by LRU on the clients\u2019 local\ncaches (Server-Month-LRU). For each data set, the table shows the number of requests, %\nof requests skipped (size larger than the cache size), number of compulsory misses of objects not previously\nseen, and the number of rollovers. For each policy (including BestShifting(K)), the\ntable shows miss rate, and % improvement over BestFixed (labeled \u2018% BestF\u2019) and LRU. In\neach case all 12 virtual caches consumed on average less than 2% of the real cache space.\nWe fixed \u03b2 and \u03b1 for all experiments. As already mentioned, BestShifting(K)\nis never penalized for rollovers.\n\n[Figure 6 plot: Miss Rates under FSUP with Master; miss rates (0 to 0.8) against Requests Over Time (205000 to 235000) for lru, fifo, mru, lifo, size, lfu, mfu, rand, gds, gdsf, lfuda, gd, and roll.]\n\nFigure 6: \u201cTracking\u201d the best policy.\n\n[Figure 7 plot: WWk Master and Comparator Missrates; miss rates % (2 to 9) against K = Number of Shifts (0 to 800), with levels marked LRU, Demand, BF=SIZE, Background 1, Background 2, Instantaneous, and K = 76, and curves LRU, Best Fixed = SIZE, BestShift(K), All Virtual Caches, and Compulsory Missrate.]\n\n
Figure 7: Online shifting policies against offline comparators and LRU for Work-Week dataset.\n\nDataset          Work-Week   User-Month   Server-Month-LRU\n#Requests        138k        382k         48k\nCache size       900KB       2MB          4MB\n%Skipped         6.5%        12.8%        15.7%\n# Compuls        0.020       0.015        0.152\n# Shifts         88          485          93\nLRU\n  Miss Rate      0.088       0.076        0.450\nBestFixed\n  Policy         SIZE        GDS          GDSF\n  Miss Rate      0.055       0.075        0.399\n  % LRU          36.8%       54.7%        54.2%\nDemand\n  Miss Rate      0.061       0.076        0.450\n  % BestF        -9.6%       -0.5%        -12.8%\n  % LRU          30.9%       54.4%        48.5%\nBackgrnd 1\n  Miss Rate      0.053       0.068        0.401\n  % BestF        5.1%        9.8%         -0.7%\n  % LRU          40.1%       59.4%        55.5%\nBackgrnd 2\n  Miss Rate      0.047       0.067        0.349\n  % BestF        15.4%       11.9%        12.4%\n  % LRU          46.6%       60.1%        60.3%\nInstant\n  Miss Rate      0.044       0.065        0.322\n  % BestF        19.7%       13.4%        19.3%\n  % LRU          49.2%       60.8%        63%\nBestShifting\n  Miss Rate      0.042       0.039        0.312\n  % BestF        23.6%       48.0%        21.8%\n  % LRU          52.2%       48.7%        30.1%\n\nFigure 8: Performance Summary.\n\n4 Conclusion\n\nOperating systems have many hidden parameter tweaking problems which are ideal applications\nfor on-line Machine Learning algorithms. These parameters are often set to values\nwhich provide good average case performance on a test workload. For example, we have\nidentified candidate parameters in device management, file systems, and network protocols.\nPreviously the on-line algorithms for predicting as well as the best shifting expert\nwere used to tune the time-out for spinning down the disk of a PC [HLSS00]. In this paper\nwe use the weight updates of these algorithms for dynamically determining the best\ncaching policy. 
This application is more elaborate because we needed to actively gather\nperformance information about the caching policies via virtual caches. In future work we\nplan to do a more thorough study of the feasibility of background rollover by building actual\nsystems.\n\nAcknowledgements: Thanks to David P. Helmbold for an efficient dynamic programming\napproach to BestShifting(K), Ahmed Amer for data, and Ethan Miller for many helpful insights.\n\nReferences\n\n[AAG+ar] Ismail Ari, Ahmed Amer, Robert Gramacy, Ethan Miller, Scott Brandt, and\nDarrell D. E. Long. ACME: Adaptive caching using multiple experts. In Proceedings\nof the 2002 Workshop on Distributed Data and Structures (WDAS\n2002). Carleton Scientific, (to appear).\n\n[ACD+99] Martin Arlitt, Ludmilla Cherkasova, John Dilley, Rich Friedrich, and Tai Jin.\nEvaluating content management techniques for Web proxy caches. In Proceedings\nof the Workshop on Internet Server Performance (WISP99), May\n1999.\n\n[BW02] O. Bousquet and M. K. Warmuth. Tracking a small set of experts by mixing\npast posteriors. J. of Machine Learning Research, 3(Nov):363\u2013396, 2002.\nSpecial issue for COLT01.\n\n[CBFH+97] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and\nM. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427\u2013485,\n1997.\n\n[CI97] Pei Cao and Sandy Irani. Cost-aware WWW proxy caching algorithms. In\nProceedings of the 1997 Usenix Symposium on Internet Technologies and\nSystems (USITS-97), 1997.\n\n[HLSS00] David P. Helmbold, Darrell D. E. Long, Tracey L. Sconyers, and Bruce Sherrod.\nAdaptive disk spin-down for mobile computers. ACM/Baltzer Mobile\nNetworks and Applications (MONET), pages 285\u2013297, 2000.\n\n[HW98] M. Herbster and M. K. Warmuth. Tracking the best expert. Journal of Machine\nLearning, 32(2):151\u2013178, August 1998. Special issue on concept drift.\n\n[JB00] Shudong Jin and Azer Bestavros. 
Greedydual* web caching algorithm: Exploiting\nthe two sources of temporal locality in web request streams. Technical\nReport 2000-011, 4, 2000.\n\n[KW97] J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates\nfor linear prediction. Information and Computation, 132(1):1\u201364, January\n1997.\n\n[LW94] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information\nand Computation, 108(2):212\u2013261, 1994.\n\n[MS96] Lily Mummert and Mahadev Satyanarayanan. Long term distributed file reference\ntracing: Implementation and experience. Software - Practice and Experience\n(SPE), 26(6):705\u2013736, June 1996.\n", "award": [], "sourceid": 2296, "authors": [{"given_name": "Robert", "family_name": "Gramacy", "institution": null}, {"given_name": "Manfred K.", "family_name": "Warmuth", "institution": null}, {"given_name": "Scott", "family_name": "Brandt", "institution": null}, {"given_name": "Ismail", "family_name": "Ari", "institution": null}]}