{"title": "Safe Active Learning for Time-Series Modeling with Gaussian Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 2730, "page_last": 2739, "abstract": "Learning time-series models is useful for many applications, such as simulation\nand forecasting. In this study, we consider the problem of actively learning time-series models while taking given safety constraints into account. For time-series modeling we employ a Gaussian process with a nonlinear exogenous input structure. The proposed approach generates data appropriate for time series model learning, i.e. input and output trajectories, by dynamically exploring the input space. The approach parametrizes the input trajectory as consecutive trajectory sections, which are determined stepwise given safety requirements and past observations. We analyze the proposed algorithm and evaluate it empirically on a technical application. The results show the effectiveness of our approach in a realistic technical use case.", "full_text": "Safe Active Learning for Time-Series Modeling\n\nwith Gaussian Processes\n\nChristoph Zimmer Mona Meister Duy Nguyen-Tuong\nBosch Center for Arti\ufb01cial Intelligence, Renningen, Germany\n\n{christoph.zimmer,mona.meister,duy.nguyen-tuong}@de.bosch.com\n\nAbstract\n\nLearning time-series models is useful for many applications, such as simulation and\nforecasting. In this study, we consider the problem of actively learning time-series\nmodels while taking given safety constraints into account. For time-series mod-\neling we employ a Gaussian process with a nonlinear exogenous input structure.\nThe proposed approach generates data appropriate for time series model learning,\ni.e. input and output trajectories, by dynamically exploring the input space. The\napproach parametrizes the input trajectory as consecutive trajectory sections, which\nare determined stepwise given safety requirements and past observations. 
We analyze the proposed algorithm and evaluate it empirically on a technical application. The results show the effectiveness of our approach in a realistic technical use case.\n\n1 Introduction\n\nActive model learning deals with the problem of sequential data labeling for learning an unknown function. Data points are sequentially selected for labeling such that the information required for approximating the unknown function is maximized, according to some measure. The overall goal is to create an accurate model without having to supply more data than necessary, thereby reducing the annotation effort and measurement costs. Active learning has been well studied for classification tasks, e.g. for image labeling [12], but in the field of regression, the active learning approach, related to the optimal experimental design problem [8], is not yet widespread.\nFor actively learning time-series models representing physical systems, the data has to be generated such that the relevant dynamics can be captured. In practice, the physical system needs to be excited by dynamically moving around in the input space using input trajectories, such that the collected data, i.e. input and output trajectories, contain as much information about the dynamics as possible. Commonly used input trajectories include sinusoidal functions, ramps and step functions, white noise, etc. [13, 17]. When employing input excitation on physical systems, however, additional aspects of safety need to be considered. The excitation must not damage the physical system while dynamically exploring the input space, making it crucial to identify safe regions where dynamic excitation can be performed.\nIn this paper we consider the problem of safe exploration for active learning of time-series models. The goal is to generate input trajectories and output measurements which are informative for learning time-series models. 
To do so, our input trajectories are parametrized in consecutive sections, e.g. as consecutive piecewise ramps or splines. These consecutive sections of the input trajectory are determined stepwise in an explorative approach. Given observations, the next trajectory sections are determined by maximizing an information gain criterion with respect to the model. In this paper, we employ a Gaussian process with a nonlinear exogenous structure as the time-series model for which an appropriate exploration criterion is desired. An additional Gaussian process model is simultaneously used for predicting safe input regions, given safety requirements. The sections of the input trajectory are determined by solving a constrained optimization problem, taking the safety prediction into account. The main contributions of the paper can be summarized as:\n\n• We formulate an active learning setting for learning time-series models with dynamic exploration, in the context of the Gaussian process framework.\n• We incorporate the safety aspect into the exploration mechanism and derive a criterion appropriate for the dynamic exploration of the input space with trajectories.\n• We provide a theoretical analysis of the algorithm, and empirically evaluate the proposed approach on a realistic technical use case.\n\nThe remainder of the paper is organized as follows. In Section 2, we provide an overview of related work. In Section 3, we introduce the algorithm for safe active learning of time-series models. Section 4 provides a theoretical analysis, and in Section 5, we highlight our empirical evaluations in learning time-series models in several settings.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. 
The Appendix contains the proofs of the theoretical analysis section and some further experimental investigations.\n\n2 Related Work\n\nMost existing work for safe exploration in unknown environments is in the reinforcement learning setting [16, 10, 9]. For example, the safe exploration in finite MDPs relies on the restriction to suitable policies, ensuring ergodicity at a user-defined safety level [16]. In [10], the ergodicity assumption for the MDPs is dropped by introducing fatal absorbing states. In [9], the authors consider the use of a multi-armed, risk-aware bandit setting to prevent hazards when exploring different tasks. Strategies for exploring unknown environments have also been reflected in the framework of global optimization with Gaussian processes [1, 23, 11]. For example, [11] propose an efficient submodular exploration criterion for near-optimal sensor placements, i.e. for discrete input spaces. In [1], a framework is presented which yields a compromise between exploration and exploitation through confidence bounds. In [23], the authors show that, under reasonable assumptions, strong exploration guarantees can be given for Bayesian optimization with Gaussian processes.\nSafe exploration using Gaussian processes (GPs) has also been considered in the past, such as for safe active learning [20] and safe Bayesian optimization [2, 24]. In [24], for example, a two-step process is proposed for a safe exploration and efficient exploitation of identified safe areas. In safe active learning, [20] proposes a method for safe exploration based on the GP variance for stationary, i.e. pointwise, measurements. In contrast to the work by [20], we consider the setting of safe dynamical exploration, i.e. using trajectory-wise measurements. 
This setting is especially useful when actively learning time-series models.\nThe problem of active learning for time-series models has not yet been considered extensively in the machine learning literature. Work on related topics is mostly in the field of online design of experiments, e.g. [6]. In [6], the authors employ a parametric model for learning dynamical processes, in which the data for model learning is obtained by exploring using the Fisher information matrix. In contrast to their work, we explore unknown environments by employing a criterion defined for the non-parametric GP model, while also taking into account safety requirements. Furthermore, our proposed exploration scheme is rigorously analyzed, providing further algorithmic insights.\n\n3 Safe Active Learning for Time-Series Modeling\nOur goal is to approximate an unknown function f : X ⊂ R^d → Y ⊂ R. In the case of time-series models, e.g. the well-established nonlinear exogenous (NX) model, the input space consists of discretized values of the so-called manipulated variables [3]. Thus, x_k at time k can be given as\n\nx_k = (u_k, u_{k−1}, . . . , u_{k−d_2+1}) ,\n\nwhere (u_k)_k, u_k ∈ R^{d_1}, represents the discretized manipulated trajectory. Here, d_1 is the dimension of the system's input space, d_2 the dimension of the NX structure, and d = d_2 · d_1. In practice, the elements u_k are measured from physical systems and need not be equidistant; however, for notational convenience we assume equidistance in this setting. In general, the manipulated trajectories are continuous signals and can be explicitly controlled. In the model learning setting, we observe data in the form of n consecutive piecewise trajectories D^f_n = {τ_i, ρ_i}_{i=1}^n, where the input trajectory τ_i is a matrix consisting of m input points of dimension d, i.e. τ_i = (x^i_1, . . . , x^i_m) ∈ R^{d×m}. The output trajectory ρ_i contains the m corresponding output measurements, i.e. ρ_i = (y^i_1, . . . , y^i_m) ∈ R^m.\nThe considered problem is to determine the next piecewise trajectory τ_{n+1} as input excitation to the physical system such that the information gain of D^f_{n+1} – with respect to modeling f – is increased. At the same time, τ_{n+1} should be determined subject to given safety constraints. In this section we elaborate on the setting and describe the algorithm. The definition of the considered information gain and corresponding analysis are provided in Section 4.\n\n3.1 Modeling Trajectories with Gaussian Processes\n\nWe employ a Gaussian Process (GP) model to approximate the function f (see [19] for more details). A GP is specified by its mean function μ(x) and covariance function k(x_i, x_j), i.e. f(x_i) ∼ GP(μ(x_i), k(x_i, x_j)). Given noisy observations of input and output trajectories, the joint distribution according to the GP prior is given as\n\np(P_n | T_n) = N(P_n | 0, K_n + σ^2 I) ,\n\nwhere P_n ∈ R^{n·m} is a vector concatenating output trajectories and T_n ∈ R^{n·m×d} a matrix containing input trajectories. The covariance matrix is represented by K_n ∈ R^{n·m×n·m}. In this paper, we employ the Gaussian kernel as the covariance function, i.e. k(x_i, x_j) = σ_f^2 exp(−(1/2)(x_i − x_j)^T Λ_f^2 (x_i − x_j)), which is parametrized by θ_f = (σ_f^2, Λ_f^2). Furthermore, we have a zero vector 0 ∈ R^{n·m} as mean, an n·m-dimensional identity matrix I, and σ^2 as output noise variance (see [19]). 
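As a concrete illustration of this construction, the joint covariance K_n + σ^2 I over the concatenated trajectory inputs can be assembled as follows (a minimal NumPy sketch; the helper name `gaussian_kernel`, the unit hyperparameters, and the toy dimensions are our own choices, not the paper's):

```python
import numpy as np

def gaussian_kernel(A, B, sigma_f=1.0, lambdas=None):
    """Gaussian kernel k(x_i, x_j) = sigma_f^2 exp(-0.5 (x_i-x_j)^T Lambda_f^2 (x_i-x_j))."""
    d = A.shape[1]
    lambdas = np.ones(d) if lambdas is None else lambdas
    diff = A[:, None, :] - B[None, :, :]              # pairwise differences
    sq = np.einsum('ijk,k,ijk->ij', diff, lambdas**2, diff)
    return sigma_f**2 * np.exp(-0.5 * sq)

# n trajectories of m points each, stacked into T_n in R^{(n*m) x d}
rng = np.random.default_rng(0)
n, m, d = 3, 5, 2
T_n = rng.normal(size=(n * m, d))
sigma = 0.1                                           # output noise std
K_n = gaussian_kernel(T_n, T_n)                       # dense K_n in R^{(n*m) x (n*m)}
C = K_n + sigma**2 * np.eye(n * m)                    # joint covariance of P_n
```

Each trajectory contributes an m×m diagonal block of C, and the off-diagonal blocks correlate input points across different trajectories, which is exactly why the matrix is fully occupied.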
Given the joint distribution, the predictive distribution p(ρ_* | τ_*, D^f_n) for a new piecewise trajectory τ_* can be expressed as\n\np(ρ_* | τ_*, D^f_n) = N(ρ_* | μ(τ_*), Σ(τ_*)) , (1)\n\nwith\n\nμ(τ_*) = k(T_n, τ_*)^T (K_n + σ^2 I)^{−1} P_n ,\nΣ(τ_*) = k^{**}(τ_*, τ_*) − k(T_n, τ_*)^T (K_n + σ^2 I)^{−1} k(T_n, τ_*) , (2)\n\nwhere k^{**} ∈ R^{m×m} is a matrix with k^{**}_{ij} = k(x_i, x_j). The matrix k(T_n, τ_*) ∈ R^{n·m×m} contains kernel evaluations relating τ_* to the previous n input trajectories. As the covariance matrix K_n is fully occupied, the input points x are fully correlated within a piecewise trajectory, as well as across different trajectories; this enables the exploitation of high capacity correlations. However, due to the potentially large dimension n·m, inverting the matrix K_n + σ^2 I can be infeasible. GP approximation techniques can be employed, e.g. using sparse inducing inputs or variational approaches [18, 22, 26].\n\n3.2 Modeling the Safety Condition\nThe safety status of the system is described by an unknown function g : X ⊂ R^d → Z ⊂ R, mapping an input point x to a safety value z, which acts as a safety indicator. The values z are computed using information from the system, and are designed such that all values equal to or greater than zero are considered safe for the corresponding input x. Example 1 shows a construction for computing z values. More examples can be found in the evaluation in Section 5.\nExample 1 (A safety indicator for a high-pressure fluid system). In a high-pressure fluid system, we can measure the pressure ψ for a given input state x. 
Additionally, we know the value of the maximal pressure ψ_max which can act on the physical system. Given the current pressure ψ, the safety values z can be computed as\n\nz(ψ) = 1 − exp((ψ − ψ_max)/λ_p) , (3)\n\nwhere λ_p describes the decline of z as ψ increases towards ψ_max.\n\nNote that z is continuous and, intuitively, indicates the distance of a given point x from the unknown safety boundary in the input space. Thus, given the function g – or an estimate of it – we can evaluate the level of safety for a trajectory τ. We consider a trajectory as safe, if the probability that its safety values z are greater than zero is sufficiently large, i.e.\n\n∫_{z_1,...,z_m ≥ 0} p(z_1, . . . , z_m | τ) dz_1 . . . dz_m > 1 − α ,\n\nwith α ∈ (0, 1] representing the threshold for considering τ unsafe. Given data D^g_n = {τ_i, ζ_i}_{i=1}^n, with ζ_i = (z^i_1, . . . , z^i_m) ∈ R^m, we employ a GP to approximate the function g. The predictive distribution p(ζ_* | τ_*, D^g_n) given a piecewise trajectory τ_* is then computed as\n\np(ζ_* | τ_*, D^g_n) = N(ζ_* | μ_g(τ_*), Σ_g(τ_*)) , (4)\n\nwith μ_g(τ_*) and Σ_g(τ_*) being the corresponding mean and covariance. The quantities μ_g and Σ_g are computed as shown in Eq. (2), then with Z_n ∈ R^{n·m} as the target vector concatenating all ζ_i. By employing a GP for approximating g, the safety condition ξ(τ) for a trajectory τ can be computed as\n\nξ(τ) = ∫_{z_1,...,z_m ≥ 0} N(ζ | μ_g(τ), Σ_g(τ)) dz_1 . . . dz_m > 1 − α . (5)\n\nIn general, the computation of ξ(τ) is analytically intractable, and thus needs to rely on some approximation, such as Monte-Carlo sampling or expectation propagation [15].\n\n3.3 The Algorithm\n\nIn the previous sections, we elaborated on the modeling of the predictive distribution and the safety condition for a given piecewise trajectory τ in the input space. For efficiently choosing an optimal τ, the trajectory needs to be appropriately parametrized. The most straightforward possibility is to parametrize in the input space. We illustrate the trajectory parametrization in the following Example 2, using a ramp parameterization.\nExample 2 (Consecutive ramps as piecewise trajectory). A ramp can be parametrized by its start and end point. As the start point is the last point of the previous trajectory, the end point η is the only free quantity, and therefore a ramp can be parametrized as\n\nτ(η) = (x_1(η), . . . , x_m(η)) with, for 1 ≤ k ≤ m: x_k(η) = ( u_0 + (k/m)(η − u_0), . . . , u_0 + ((k − d_2 + 1)/m)(η − u_0) ) , (6)\n\nwhere u_0 is the start point of the ramp. For k−i ≥ 0, the manipulated input variable is on the currently planned trajectory, and for k−i < 0 it can be read from the list of already executed trajectories.\n\nGiven a trajectory parametrization with its predictive distribution in Eq. (1) and safety condition in Eq. (5), the next piecewise trajectory τ_{n+1}(η_*) can be obtained by solving the following constrained optimization problem\n\nη_* = argmax_{η ∈ Π} I(Σ(η)) (7)\ns.t. ξ(η) > 1 − α , (8)\n\nwhere η represents our trajectory parametrization, Π is the domain of the manipulated variable, and I an optimality criterion we will discuss later. 
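To make the selection step concrete, the following sketch picks the next ramp end point by a simple random search over η, keeping only candidates whose Monte-Carlo safety estimate satisfies Eq. (8) and maximizing a determinant criterion for Eq. (7). The two posterior functions are toy stand-ins of our own (not the paper's learned GP models), and the paper itself uses gradient-based optimization rather than random search:

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, u0 = 5, 0.05, 0.0

def ramp(eta):
    """Ramp from u0 towards the end point eta with m points (1-D version of Example 2)."""
    return u0 + (np.arange(1, m + 1) / m) * (eta - u0)

def predictive_cov(tau):
    """Toy stand-in for Sigma(tau): predictive uncertainty grows away from the origin."""
    sq = (tau[:, None] - tau[None, :]) ** 2
    return (0.1 + 0.3 * np.abs(tau).mean()) * np.exp(-0.5 * sq) + 1e-6 * np.eye(m)

def safety_posterior(tau):
    """Toy stand-in for the safety GP posterior (mu_g, Sigma_g)."""
    return 1.0 - 0.4 * np.abs(tau), 0.01 * np.eye(m)

def xi(tau, n_samples=4000):
    """Monte-Carlo estimate of the safety condition xi(tau), Eq. (5)."""
    mu, cov = safety_posterior(tau)
    z = rng.multivariate_normal(mu, cov, size=n_samples)
    return np.mean(np.all(z >= 0.0, axis=1))

best_eta, best_det = None, -np.inf
for eta in rng.uniform(-5.0, 5.0, size=200):   # random search over candidate end points
    tau = ramp(eta)
    if xi(tau) > 1.0 - alpha:                  # safety constraint, Eq. (8)
        det = np.linalg.det(predictive_cov(tau))
        if det > best_det:                     # determinant (D) criterion for Eq. (7)
            best_eta, best_det = det and eta or eta, det
```

Under these stand-ins, uncertainty grows away from the already-explored origin while the safety margin shrinks, so the feasible maximizer lies near the boundary of the estimated safe region, which mirrors the intended exploration behavior.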
As shown in Eq. (7), we employ the predictive variance Σ from Eq. (1) for the exploration, which is common in the active learning setting, especially in combination with a GP model [20, 14]. In contrast to previous work, due to the nature of the considered trajectory, we have a covariance matrix Σ instead of the variance value usually employed in the active learning and Bayesian optimization setting [23]. The covariance matrix is mapped by an optimality criterion I to a real number, as indicated by Eq. (7). Various optimality criteria can be used for I, as discussed in the system identification literature [8]. For example, I can be the determinant, equivalent to maximizing the volume of the predictive confidence ellipsoid of the multi-normal distribution; the trace, equivalent to maximizing the average predictive variance; or the maximal eigenvalue, equivalent to maximizing the largest axis of the predictive confidence ellipsoid [8].\nThe constraint in Eq. (8) represents a probabilistic safety criterion, motivated by our probabilistic modeling approach for the safety. The probabilistic approach flexibly allows us to control the trade-off between exploration speed and safety consideration. For example, a 100% safe exploration would keep the algorithm from leaving the initial safe area and, hence, would not lead to an exploration of new areas. On the other hand, a 0% safe exploration will explore without any safety considerations, which will result in many safety violations. This trade-off provides the users an additional degree of freedom, depending on how much they “trust” the behavior of their physical systems.\n\nAlgorithm 1 Safe Active Learning for Time-Series Modeling\n1: Input: Safety threshold 0 ≤ α ≤ 1\n2: Initialization: Collect n_0 safe trajectories, i.e. D^{f,g}_0 = {τ_i, ρ_i, ζ_i}_{i=1}^n with n = n_0.\n3: for k = 1 to N do\n4: Update regression model approximating f using D^f_{k−1} = {τ_i, ρ_i}_{i=1}^n, according to Eq. (1)\n5: Update safety model approximating g using D^g_{k−1} = {τ_i, ζ_i}_{i=1}^n, according to Eq. (4)\n6: Determine new piecewise trajectory τ_{n+1}, by optimizing η according to Eq. (7) and (8)\n7: Execute τ_{n+1} on the physical system, while measuring ρ_{n+1} and ζ_{n+1}\n8: Include new trajectories into D^f_{k−1} and D^g_{k−1} with n = n+1.\n9: end for\n10: Update and return regression model and safety model\n\nAlgorithm 1 summarizes the basic steps of the proposed algorithm, which needs to be initialized by n_0 safe trajectories. In practice, the initial trajectories are located in a small, safe region chosen beforehand using prior knowledge. The incremental updates of the GP models for new data, i.e. steps 4 and 5 in Algorithm 1, can be efficiently performed, e.g. through rank-one updates [21]. The optimization problem in Eq. (7) can be solved using gradient-based optimization approaches, e.g. [4, 5]. In this paper, we employ the NX-structure in combination with the GP model for time-series modeling. However, this approach can also be extended to the general nonlinear auto-regressive exogenous case [3], i.e. a GP with NARX input structure x_k = (y_k, y_{k−1}, . . . , y_{k−q}, u_k, u_{k−1}, . . . , u_{k−d}). In this case, for optimization and planning of the next piecewise trajectories, one can use the predictive mean of p(ρ | τ, D^f_n) as a surrogate for y_k. Note that the input excitation is still performed through the manipulated variable u_k in the case of NARX.\n\n4 Theoretical Results\n\nIn this section, we provide some results on the theoretical analysis of the proposed approach. First, we investigate the safety aspect of the algorithm. 
In Section 4.2, we provide a bound on the decay rate of the predictive variances for the case when the criterion I is the determinant, i.e. I(Σ(η)) = det(Σ(η)). The proofs can be found in the Appendix.\n\n4.1 Safe Exploration\n\nTo satisfy the safety requirements, it is necessary to bound the probability of failures during exploration. Theorem 1 provides an upper bound on the probabilities for unsafe trajectories.\nTheorem 1. Let us assume that we have recorded n_0 initial safe trajectories, and that their observations are enough to model g well, in the sense that our GP quantifies the uncertainty of predictions for g correctly, i.e. P(μ_g − νσ_g ≤ z ≤ μ_g + νσ_g) = Erf(ν/√2) for all ν ≥ 0. Let δ ∈ [0, 1] be the desired failure probability when determining the next N consecutive piecewise trajectories. Set α = δ/N and let this α be the probability bound for a trajectory being unsafe (as in Eq. (5)). Then, the iterative exploration for the next N trajectories is unsafe with probability at most δ, i.e.\n\nP( ∪_{i=n_0+1}^{n_0+N} { g(x^i_j) < 0 for some 1 ≤ j ≤ m | ξ(τ_i) > 1−α } ) ≤ δ.\n\nTheorem 1 supplies us with a useful rule of thumb to select α when sequentially determining the next N trajectories.\n\n4.2 Decay of Predictive Variance\n\nThe remainder of the analysis is to show that the proposed exploration scheme makes the predictive uncertainty Σ decrease as n increases. 
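Before turning to the variance decay, the rule of thumb from Theorem 1 above can be stated in two lines (δ and N are example values of our own choosing; N = 250 matches the number of trajectories planned in Section 5.2):

```python
# Theorem 1's rule of thumb: to keep the probability of any unsafe trajectory
# among the next N below delta, run every step of Eq. (8) with alpha = delta / N.
delta, N = 0.05, 250
alpha = delta / N
# Union bound over the N steps: P(any step unsafe) <= N * alpha = delta.
```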
In this paper we use the determinant of Σ as an exploration criterion, which has been shown to have a close relationship to the information gain [14, 23], defined as the mutual information I.\nFirst, we point out that this relationship still holds true in the case of trajectories as observations. Subsequently, we introduce the maximum information gain as an upper bound, which can further be used to show the decrease of the predictive uncertainty. Lemma 1 clarifies the relationship between determinant and mutual information. Let us denote the predictive variance after recording i−1 trajectories as Σ_{i−1}(τ_i), with Σ_0(τ_1) = k^{**}(τ_1, τ_1), and set ρ̃_i = (f(x^i_1), . . . , f(x^i_m)).\nLemma 1. The mutual information I({ρ_i}_{i=1}^n; {ρ̃_i}_{i=1}^n) can be related to the predictive covariances Σ_{i−1}(τ_i) as follows:\n\nI({ρ_i}_{i=1}^n; {ρ̃_i}_{i=1}^n) = 1/2 ∑_{i=1}^n log |I_m + σ^{−2} Σ_{i−1}(τ_i)| .\n\nNext, we introduce the maximum information gain after observing n trajectories as γ_n := max_{{τ_i}_{i=1}^n ⊂ X^m} I({ρ_i}_{i=1}^n; {ρ̃_i}_{i=1}^n) (see Srinivas et al. [23] for more details). The maximum information gain is the information which could be gathered when exploring the system in a non-iterative way, by optimally designing all trajectories simultaneously (which is in practice hard, as it would require the solution of a high-dimensional optimization problem, and would not allow us to incorporate safety information from observations during the experiment). According to Srinivas et al. ([23], Theorem 5) the maximum information gain satisfies γ_n = O(log(n)^{d+1}), i.e. the maximum information grows slower than the number of additional trajectories. 
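Lemma 1 can be verified numerically on a toy example: the one-shot information gain of all n·m outputs decomposes exactly into the per-trajectory terms (a NumPy sketch; the inputs, kernel, and dimensions are our own toy choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma2 = 4, 3, 0.25                 # 4 trajectories of m = 3 points, noise sigma^2

X = rng.normal(size=(n * m, 2))           # stacked trajectory inputs in R^2
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)                     # Gaussian kernel, k(., .) <= 1

def logdet(A):
    return np.linalg.slogdet(A)[1]

# Left-hand side: information gain of all n*m observations at once.
total = 0.5 * logdet(np.eye(n * m) + K / sigma2)

# Right-hand side: sum of terms 1/2 log|I_m + sigma^-2 Sigma_{i-1}(tau_i)|,
# with Sigma_{i-1} the GP posterior covariance given trajectories 1, ..., i-1.
acc = 0.0
for i in range(n):
    new = slice(i * m, (i + 1) * m)
    if i == 0:
        S = K[new, new]
    else:
        old = slice(0, i * m)
        S = K[new, new] - K[old, new].T @ np.linalg.solve(
            K[old, old] + sigma2 * np.eye(i * m), K[old, new])
    acc += 0.5 * logdet(np.eye(m) + S / sigma2)
```

The equality follows from the block determinant identity det(K + σ²I) = ∏_i det(Σ_{i−1}(τ_i) + σ²I), i.e. the chain rule of mutual information applied trajectory-block by trajectory-block.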
This will be crucial later on, but first we investigate the relation between the determinant of the covariance and γ_n. Using Lemma 1, the determinant of the covariance can be bounded, as given in Lemma 2.\nLemma 2. After observing n trajectories {τ_i}_{i=1}^n (according to Eq. (7)), the determinant of the covariance is upper bounded by\n\n(1/n) ∑_{i=1}^n |Σ_{i−1}(τ_i)| ≤ C γ_n/n ,\n\nwhere Σ_{i−1} is the predictive variance computed using the previous i−1 trajectories, γ_n is the maximum information gain, and C = 2σ_f^{2m} / log(1 + σ^{−2m} σ_f^{2m}) is a constant.\n\nThe first step in proving Lemma 2 is to upper bound the predictive variance using the mutual information via Lemma 1. Subsequently, the mutual information is upper bounded by γ_n. Using Lemma 2 and Theorem 5 in [23], we can provide a decay rate on the average determinant of predictive variances.\nTheorem 2. Let {τ̃_i}_{i=1}^n be n arbitrary trajectories within a compact and convex domain X, and k be a kernel function such that k(·,·) ≤ 1. If the Σ_{i−1} come from our exploration scheme Eq. (7) (i.e. without safety considerations), then we have\n\n(1/n) ∑_{i=1}^n |Σ_{i−1}(τ̃_i)| = O(log(n)^{d+1}/n) .\n\nWe sketch the proof here: as our algorithm (without safety considerations) always chooses the trajectory with the highest determinant (D criterion), the average determinant of an actively learned scheme is always higher than or equal to the average determinant of an arbitrary scheme. Therefore, (1/n) ∑_{i=1}^n |Σ_{i−1}(τ̃_i)| ≤ (1/n) ∑_{i=1}^n |Σ_{i−1}(τ_i)|, which is O(log(n)^{d+1}/n) when employing Lemma 2 and Theorem 5 of [23].\nBy Theorem 2, for any sequence of trajectories, the average of the determinants of their predictive covariances tends to zero. 
As the determinant corresponds to the volume of the confidence ellipsoid, we can conclude that the average volume of confidence ellipsoids tends to zero as well, indicating that, on average, our predictions become precise. However, as the safety constraint in Eq. (8) changes at every iteration, we extend the statement of Theorem 2:\nTheorem 3. Let us assume that there exists a compact and convex domain X that covers the whole explorable area (independent of whether it is safe or not), and a kernel function k such that k(·,·) ≤ 1. Then, the statement of Theorem 2 still holds for our Algorithm 1 with iteration-dependent safe areas S_i.\n\nTheorem 3 guarantees the decay of averaged determinants of covariances during safe exploration.\n\nFigure 1: The columns show the progress of the approximation of f (inlay) and the identified safety region (main figure) at different iterations. Each iteration corresponds to a consecutive planning of a new piecewise trajectory (here: 2D ramp). As shown by the results, the current estimation of the safe region (green area) gradually covers the actual safe area (red line), and the approximation error gradually decreases (as shown in the subfigures). An illustrative video showing all iterations can be found in the Appendix.\n\n5 Evaluations\n\nIn Section 5.1 we illustrate the proposed approach using synthetic models, comparing our safe active learning approach (SAL-NX) with random selection using safety constraints. Subsequently, we employ the approach to learn a dynamics model of a physical, high-pressure fluid system in Section 5.2. For simplicity we employ ramps for the piecewise trajectory parametrization, but other curve parameterizations could also be used instead, e.g. spline parameterization. 
The form of the input trajectory has an impact on the excitation of the system, as comprehensively studied in the field of system identification [17].\n\n5.1 Simulated Experiments\n\nExperiment 1. In this experiment, a toy example is employed to illustrate the concept of input space exploration with piecewise trajectories and safe region detection. A function f : R^2 → R, f(x) = (x^(1)−2)^2 + (x^(1)−2)(x^(2)−2) + (x^(2)−2)^2 with x = (x^(1), x^(2)) is used as the ground-truth. An observation is given by y = f(x) + ε with ε ∼ N(0, 1). The safe region is characterized by g : R^2 → R with g(x) = (x^(1)−5)^2 + (x^(1)−5)(x^(2)−5) + (x^(2)−5)^2. The safety indicator z is given as z = −0.005 · g(x) + 1 + ς with ς ∼ N(0, 1). It is considered to be safe for z > 0, otherwise unsafe. We proceed as shown in Algorithm 1, where the piecewise trajectories are parametrized as 2D-ramps with 5 discretization points (i.e. m = 5, see Example 2). We start with 10 initial safe trajectories and consecutively determine new piecewise trajectories in the input space X, while also collecting outputs y and computing safety indicator values z. As the exploration progresses, the approximation of f and g becomes more and more accurate, as shown in Fig. 1. The current estimation of the safe region (green area) gradually covers the actual safe area, and the approximation error gradually decreases (as shown in the subfigures). An illustrative video showing all iterations can be found in the Appendix.\n\nExperiment 2. In this experiment, we learn a time-series model given as a GP with NX-structure. We have two manipulated variables u^(1)_k and u^(2)_k at time k. The NX-structure is determined to be x_k = (u^(1)_k, u^(2)_k, u^(1)_{k−1}, u^(2)_{k−1}), an input space with d = 4. The ground-truth models of f and g are provided in the Appendix. 
The piecewise trajectory is again parametrized as 4D-ramps with m = 5. We initialize the models using 10 collected piecewise ramps in a safe area, and start exploring in the input space. For a fair comparison, we benchmark the proposed algorithm against a random selection, with safety constraints, of the next piecewise trajectories. Instead of optimizing the ramp parameter η as shown in Eq. (7) and (8), we randomly select η and pick the first one which fulfills the safety constraint ξ(η) > 1−α. Fig. 2 shows the results of the comparison of the proposed approach (SAL-NX) with random selection.\n\nFigure 2: The first two pictures from the left show the comparison of the SAL-NX (red line) with random selection (blue line). SAL-NX yields faster convergence in model approximation (left picture) and coverage of safe regions (right picture), while having less variance and fewer outliers (indicated as small circles). The last two pictures show the impact of the safety threshold α. The left picture shows the RMSE of SAL-NX for 4 different values of α. The right picture shows the model approximation error as RMSE (red line) and the percentage of unsafe trajectories (blue line) as a function of α. All pictures show boxplots over 5 repetitions.\n\nThe results in Fig. 2 show that SAL-NX continuously improves the model approximation (shown as RMSE) and provides fast coverage of safe regions. The models for f and g are updated after every iteration by including new sample points. The hyperparameters can be estimated beforehand or updated after every iteration. For the required number of initial trajectories, we refer to the lower bound as given in [20]. 
For computing the safety condition \u03be(\u03c4 ) from Eq. (5), we employ\nMonte-Carlo sampling. Our experiments are performed on a desktop computer. The algorithm is\nsuf\ufb01ciently fast for real-time applications.\nIn this experiment, we also compare our exploration approach with the one proposed in [6], however,\nwithout safety requirements in order to cope with the setting from [6]. We adapt their criterion based\non the Fisher information for our GP model by employing the GP mean function. Additionally,\nwe also compare the decrease in RMSE of the Fisher information based criterion to the decrease\nin RMSE of our algorithm. The results can be found in the Appendix 7.4 and show a competitive\nperformance of our approach.\n\n5.2 Learning a Surrogate Model of the High-Pressure Fluid System\n\nRail Pressure \u03c8k\n\nThe Use Case As a realistic technical use case, we employ the approach to actively learn a surrogate\nmodel of a high-pressure \ufb02uid injection system, as shown in Figure 3. Such systems are widely\nused in industry, e.g. in the automotive domain for injection of fuel into the combustion engine [25].\nThe physical injection system is controlled by an actuation\nActuation vk\nsignal vk and the speed of an external engine nk, for every\ntime step k. The goal is to obtain a surrogate model pre-\ndicting the rail pressure \u03c8k, which determines the amount\nof \ufb02uid coming out of the outlets. 
Due to the nature of the fluid and the mechanical components, the dynamics of the whole system are nonlinear, and thus model learning is an appropriate alternative to analytical models. However, generating the data for learning a time-series surrogate model by varying the actuation signal and engine speed is not simple, as an inappropriate combination of the two can result in hazardously high rail pressures, damaging the physical system.

Figure 3: High-pressure fluid injection system with controllable inputs vk, nk and measured output ψk (picture taken from [25]).

Learning Time-Series Surrogate Models Due to the safety requirements and the fact that the safety boundary is not known beforehand, our safe active learning approach is well suited for approximating the dynamics model. The employed NX-structure is chosen to be xk = (nk, nk−1, nk−2, nk−3, vk, vk−1, vk−3). We again parameterize with piecewise ramps in this 7D input space. The safety indicator value z is computed as shown in Example 1, with ψmax = 18 MPa being the maximally allowed rail pressure. It should be noted that here z is computed as a function of the target output ψ, in contrast to the experiments in Section 5.1, where z is a function of the input.
We initialize the model with 25 trajectories sampled around a safe point chosen by a domain expert. Subsequently, we start exploring the input space dynamically, considering both the safety constraint and the model information gain, while measuring the actuation and speed signals as input and the rail pressure as output.

Figure 4: The first two pictures from the left show the comparison of SAL-NX (red line) with random selection with safety constraints (blue line), with respect to model approximation and coverage of safe regions. Here, α = 0.5 and 250 trajectories are planned. The last two pictures show the impact of the safety threshold α on the approximation error and on failures during exploration. The results are displayed as boxplots over 5 repetitions.

Figure 4 shows the results after exploring the input space with 250 consecutive ramp trajectories, each consisting of m = 5 discretization points. We update the hyperparameters after every iteration. We compare our SAL-NX approach with random selection, as described in the previous experiment. The figure also shows the impact of varying the threshold value α on both the model approximation error and the percentage of selected unsafe trajectories. In practice, the execution of a trajectory on the physical system is interrupted when the system notices a violation of the maximal pressure ψmax. The selected piecewise trajectory is then marked as unsafe. For the evaluation of the coverage (second picture from the left), the "ground-truth" safe region is estimated beforehand with an extensive procedure.

6 Conclusions

In this paper we present an approach for active learning of a time-series model, given as a GP model with NX-structure. In this setting, the exploration is performed while taking safety requirements into account. For the successful application of the algorithm, it is crucial that the system can be actively controlled by a set of inputs and that a safety signal can be observed during the system's operation. The proposed approach is evaluated on toy examples, as well as on a realistic technical use case. The results show that the approach is appropriate for real-world applications, especially in industrial settings where safety is a key requirement during operation.

References

[1] P. Auer. Using Confidence Bounds for Exploitation-Exploration Trade-Offs. Journal of Machine Learning Research, 2002.

[2] F. Berkenkamp, A. Krause, and A. P. Schoellig.
Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics. Technical report, arXiv, February 2016.

[3] S. Billings. Nonlinear System Identification: Narmax Methods in the Time, Frequency, and Spatio-Temporal Domains. John Wiley & Sons, 2013.

[4] R. H. Byrd, J. C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1):149–185, 2000.

[5] T. F. Coleman and Y. Li. On the convergence of reflective newton methods for large-scale nonlinear minimization subject to bounds. Technical report, Ithaca, NY, USA, 1992.

[6] M. Deflorian, F. Kloepper, and J. Rueckert. Online dynamic black box modelling and adaptive experiment design in combustion engine calibration. In 6th IFAC Symposium Advances in Automotive Control. Elsevier, 2010.

[7] M. Deflorian, F. Kloepper, and J. Rueckert. Design of experiments for nonlinear dynamic system identification. In Proceedings of the 18th World Congress, The International Federation of Automatic Control. IFAC, 2011.

[8] V. Fedorov and P. Hackl. Model-Oriented Design of Experiments. Lecture Notes in Statistics. Springer New York, 2012.

[9] N. Galichet, M. Sebag, and O. Teytaud. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits. In Proceedings of the 5th Asian Conference on Machine Learning, 2013.

[10] P. Geibel. Reinforcement Learning with Bounded Risk. In C. E. Brodley and A. P. Danyluk, editors, Proceedings of the 18th International Conference on Machine Learning, pages 162–169, 2001.

[11] C. Guestrin, A. Krause, and A. Singh. Near-Optimal Sensor Placements in Gaussian Processes. In Proceedings of the 22nd International Conference on Machine Learning, 2005.

[12] A. J. Joshi, F. Porikli, and N. Papanikolopoulos. Multiclass active learning for image classification. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 2372–2379, 2009.

[13] L. Ljung and T. Söderström. Theory and Practice of Recursive Identification. MIT Press series in signal processing, optimization, and control. MIT Press, 1985.

[14] D. J. C. MacKay. Information-Based Objective Functions for Active Data Selection. Neural Computation, 4(4):590–604, 1992.

[15] T. P. Minka. Expectation Propagation for Approximate Bayesian Inference. In Uncertainty in Artificial Intelligence. Morgan Kaufmann, 2001.

[16] T. M. Moldovan and P. Abbeel. Safe Exploration in Markov Decision Processes. In Proceedings of the 29th International Conference on Machine Learning, 2012.

[17] R. Pintelon and J. Schoukens. System Identification: A Frequency Domain Approach. Wiley, 2012.

[18] J. Quiñonero-Candela and C. E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 2005.

[19] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

[20] J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischoff, H. Markert, and M. Toussaint. Safe Exploration for Active Learning with Gaussian Processes. In ECML/PKDD, volume 9286, 2015.

[21] M. Seeger. Low rank updates for the Cholesky decomposition, 2007.

[22] E. L. Snelson and Z. Ghahramani. Sparse Gaussian Processes using Pseudo-inputs. In Advances in Neural Information Processing Systems, 2006.

[23] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. Transactions on Information Theory, 2012.

[24] Y. Sui, V. Zhuang, J. Burdick, and Y. Yue. Stagewise Safe Bayesian Optimization with Gaussian Processes. In 35th International Conference on Machine Learning, 2018.

[25] N. Tietze, U. Konigorski, C. Fleck, and D. Nguyen-Tuong. Model-based calibration of engine controller using automated transient design of experiment. In 14. Internationales Stuttgarter Symposium. Springer Fachmedien Wiesbaden, 2014.

[26] M. K. Titsias. Variational Learning of Inducing Variables in Sparse Gaussian Processes. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009.