{"title": "Deep Multi-State Dynamic Recurrent Neural Networks Operating on Wavelet Based Neural Features for Robust Brain Machine Interfaces", "book": "Advances in Neural Information Processing Systems", "page_first": 14514, "page_last": 14525, "abstract": "We present a new deep multi-state Dynamic Recurrent Neural Network (DRNN) architecture for Brain Machine Interface (BMI) applications. Our DRNN is used to predict Cartesian representation of a computer cursor movement kinematics from open-loop neural data recorded from the posterior parietal cortex (PPC) of a human subject in a BMI system. We design the algorithm to achieve a reasonable trade-off between performance and robustness, and we constrain memory usage in favor of future hardware implementation. We feed the predictions of the network back to the input to improve prediction performance and robustness. We apply a scheduled sampling approach to the model in order to solve a statistical distribution mismatch between the ground truth and predictions. Additionally, we configure a small DRNN to operate with a short history of input, reducing the required buffering of input data and number of memory accesses. This configuration lowers the expected power consumption in a neural network accelerator. Operating on wavelet-based neural features, we show that the average performance of DRNN surpasses other state-of-the-art methods in the literature on both single- and multi-day data recorded over 43 days. Results show that multi-state DRNN has the potential to model the nonlinear relationships between the neural data and kinematics for robust BMIs.", "full_text": "Deep Multi-State Dynamic Recurrent Neural\nNetworks Operating on Wavelet Based Neural\nFeatures for Robust Brain Machine Interfaces\n\nBenyamin Haghi1,*, Spencer Kellis2, Sahil Shah1, Maitreyi Ashok1, Luke Bashford2, Daniel\n\nKramer3, Brian Lee3, Charles Liu3, Richard A. 
Andersen2, Azita Emami1\n\n1 Electrical Engineering Department, Caltech, Pasadena, CA, USA\n\n2 Biology and Biological Engineering Department, Caltech, Pasadena, CA, USA\n\n3 Neurorestoration Center and Neurosurgery, USC Keck School of Medicine, L.A., CA, USA\n\n*benyamin.a.haghi@caltech.edu\n\nAbstract\n\nWe present a new deep multi-state Dynamic Recurrent Neural Network (DRNN)\narchitecture for Brain Machine Interface (BMI) applications. Our DRNN is used to\npredict Cartesian representation of a computer cursor movement kinematics from\nopen-loop neural data recorded from the posterior parietal cortex (PPC) of a human\nsubject in a BMI system. We design the algorithm to achieve a reasonable trade-off\nbetween performance and robustness, and we constrain memory usage in favor of\nfuture hardware implementation. We feed the predictions of the network back to\nthe input to improve prediction performance and robustness. We apply a scheduled\nsampling approach to the model in order to solve a statistical distribution mismatch\nbetween the ground truth and predictions. Additionally, we con\ufb01gure a small\nDRNN to operate with a short history of input, reducing the required buffering of\ninput data and number of memory accesses. This con\ufb01guration lowers the expected\npower consumption in a neural network accelerator. Operating on wavelet-based\nneural features, we show that the average performance of DRNN surpasses other\nstate-of-the-art methods in the literature on both single- and multi-day data recorded\nover 43 days. Results show that multi-state DRNN has the potential to model the\nnonlinear relationships between the neural data and kinematics for robust BMIs.\n\n1\n\nIntroduction\n\nBrain-machine interfaces (BMIs) can help spinal cord injury (SCI) patients by decoding neural\nactivity into useful control signals for guiding robotic limbs, computer cursors, or other assistive\ndevices [1]. 
BMI in its most basic form maps neural signals into movement control signals and then\ncloses the loop to enable direct neural control of movements. Such systems have shown promise in\nhelping SCI patients. However, improving performance and robustness of these systems remains\nchallenging. Even for simple movements, such as moving a computer cursor to a target on a computer\nscreen, decoding performance can be highly variable over time. Furthermore, most BMI systems\ncurrently run on high-power computer systems. Clinical translation of these systems will require\ndecoders that can adapt to changing neural conditions and which operate ef\ufb01ciently enough to run on\nmobile, even implantable, platforms.\nConventionally, linear decoders have been used to \ufb01nd the relationship between kinematics and neural\nsignals of the motor cortex. For instance, Wu et al. [2] use a linear model to decode the neural activity\nof two macaque monkeys. Orsborn et al. [3] apply a Kalman \ufb01lter, updating the model on batches of\nneural data of an adult monkey, to predict kinematics in a center-out task. Gilja et al. [4] propose\na Kalman Filter to predict hand movement velocities of a monkey in a center-out task. However,\nall of these algorithms can only predict piecewise linear relationships between the neural data and\nkinematics. Moreover, because of nonstationarity and low signal-to-noise ratio (SNR) in the neural\ndata, linear decoders need to be regularly re-calibrated [2].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fRecently, nonlinear machine learning algorithms have shown promise in attaining high performance\nand robustness in BMIs. For instance, Wessberg et al. [5] apply a fully-connected neural network to\nneural data recorded from a monkey. Shpigelman et al. 
[6] show that a Gaussian kernel outperforms\na linear kernel in a Kernel Auto-Regressive Moving Average (KARMA) algorithm when decoding\n3D kinematics from macaque neural activity. Sussillo et al. [7] apply a large FORCE Dynamic\nRecurrent Neural Network (F-DRNN) on neural data recorded from the primary motor cortex in\ntwo monkeys, and then they test the stability of the model over multiple days [8]. Zhang et al. [9]\nand Schwemmer et al. [10] extract wavelet based features of motor cortex neural data of a human\nsubject to classify intended hand movements by using a nonlinear support vector machine (SVM)\nand a large deep neural network, respectively. Hosman et al. [11] pass motor cortex neural \ufb01ring\nrates to an LSTM and a Kalman \ufb01lter to compare their performances for decoding intended cursor\nvelocity of a human subject. These nonlinear learning-based decoders have shown more stability over\nmultiple days and have improved performance compared to prior linear methods. However, they all\nhave been applied to motor cortex data by mostly using neural \ufb01ring rates as input features, which\nshow more variability over long periods [2]. Recent work has demonstrated that neural activity in the\nposterior parietal cortex (PPC) can be used to support BMIs [12, 13, 14, 15, 16, 17, 18], although\nthe encoding of movement kinematics appears to be complex. PPC processes a rich set of high-level\naspects of movement including sensory integration, planning, and execution [13] and may encode this\ninformation differently [15]. These characteristics of PPC differentiate it from other brain areas and,\nwhile providing a large amount of information to the decoder, also require new paradigms, such as\nthose discussed here, to extract useful information. 
Therefore, extracting appropriate neural features\nand designing a robust decoder that can model this relationship in an actual BMI setting is required.\nWe propose a new Deep Multi-State Dynamic Recurrent Neural Network (DRNN) decoder to address\nthe challenges of performance, robustness, and potential hardware implementation. We refer to two\ntheorems to show the stability, convergence, and potential of DRNNs for approximation of state-space\ntrajectories (see supplementary material). We train the DRNN by passing a history of input data to it\nand feeding the predictions of the system back to the input to improve performance and robustness for\nsequential data prediction. Moreover, we apply scheduled sampling to solve the statistical distribution\ndiscrepancy between the ground truth and predictions. By extracting different neural features, we\ncompare the performance and robustness of the DRNN with the existing methods in the literature to\npredict hand movement kinematics from open-loop neural data. Our BMI data are recorded from\nthe PPC of a human subject over 43 days. Finally, we discuss the potential for implementing our\nDRNN ef\ufb01ciently in hardware for implantable platforms. To the best of our knowledge, this is the \ufb01rst\ndemonstration of applying learning-based decoders to a human PPC activity. Our results indicate that\nthe Deep Multi-State DRNN operating on mid-band wavelet-based neural features has the potential\nto model the nonlinear relationships between the neural data and kinematics for robust BMIs.\n\n2 Deep multi-state dynamic recurrent neural network\n\nA DRNN is a nonlinear dynamic system described by a set of differential or difference equations.\nIt contains both feed-forward and feedback synaptic connections. In addition to the recurrent ar-\nchitecture, a nonlinear and dynamic structure enables it to capture time-varying spatiotemporal\nrelationships in the sequential data. 
Moreover, because of state feedback, a small recurrent network can be equivalent to a large feed-forward network. Therefore, a recurrent network is computationally efficient, especially for applications that require hardware implementation [19]. We define our deep multi-state DRNN at each time step k as below:

    s_k = Wss s_{k-1} + Wsr r_{k-1} + Wsi u_k + Wsf z_{k-1} + bs
    r_k = tanh(s_k)
    h(1)_k = tanh(Wh(1)h(1) h(1)_{k-1} + Wh(1)r r_k + bh(1))
    h(i)_k = tanh(Wh(i)h(i) h(i)_{k-1} + Wh(i)h(i-1) h(i-1)_k + bh(i))
    ŷ_k = Wyh(l) h(l)_k + by
    ŷ_k = tanh(ŷ_k), |ŷ_k| > 1
    z_k ← ŷ_k or y_k (Scheduled Sampling during Training)        (1)

s ∈ R^N is the activation variable, and r ∈ R^N is the vector of corresponding firing rates. These two internal states track the first- and zero-order differential features of the system, respectively. Unlike conventional DRNNs, Wss ∈ R^{N×N} generalizes the dynamic structure of our DRNN by letting the network learn the matrix relationship between present and past values of s. Wsr ∈ R^{N×N} describes the relationship between s and r. Wsi ∈ R^{N×I} relates s to the input vector u. z ∈ R^M models the added prediction feedback in our DRNN. Wsf ∈ R^{N×M} tracks the effect of z on s. i ∈ {2, ..., l}, where l is the number of layers; Ni is the number of hidden units in the ith layer; h(i) ∈ R^{Ni} is the hidden state of the ith hidden layer; and Wh(1)r ∈ R^{N1×N}, Wh(i)h(i) ∈ R^{Ni×Ni}, Wh(i)h(i-1) ∈ R^{Ni×Ni-1}, Wyh(l) ∈ R^{M×Nl}, bs ∈ R^N, bh(i) ∈ R^{Ni} are the weights and biases of the network. All the parameters are learnable in our DRNN. 
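As a concrete illustration, the single-layer variant of equation 1 (readout directly from r) can be sketched in NumPy. The dimensions, random initialization, and input sequence below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N, I, M = 10, 16, 2  # state size, input features, kinematic outputs (assumed)

# Illustrative random parameters; in the paper all of these are learned via BPTT.
Wss, Wsr = rng.normal(0, 0.1, (N, N)), rng.normal(0, 0.1, (N, N))
Wsi, Wsf = rng.normal(0, 0.1, (N, I)), rng.normal(0, 0.1, (N, M))
Wyr, bs, by = rng.normal(0, 0.1, (M, N)), np.zeros(N), np.zeros(M)

def drnn_step(s_prev, r_prev, u_k, z_prev):
    """One time step of the single-layer multi-state DRNN (equation 1)."""
    s_k = Wss @ s_prev + Wsr @ r_prev + Wsi @ u_k + Wsf @ z_prev + bs
    r_k = np.tanh(s_k)             # zero-order ("firing rate") state
    y_hat = Wyr @ r_k + by         # linear readout of the kinematics
    if np.any(np.abs(y_hat) > 1):  # squash out-of-range predictions
        y_hat = np.tanh(y_hat)
    return s_k, r_k, y_hat

s, r, z = np.zeros(N), np.tanh(np.zeros(N)), np.zeros(M)
for u in rng.normal(size=(5, I)):  # a short illustrative input sequence
    s, r, y_hat = drnn_step(s, r, u, z)
    z = y_hat                      # at inference, feed the prediction back
```

During training, z would instead be chosen between the ground truth y_k and the prediction ŷ_k according to the scheduled-sampling rule in equation 1.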
Although feed-forward neural networks usually require a\ndeep structure, DRNNs generally need fewer than three layers. Algorithm 1 shows the training\nprocedure1. Inference is performed by using equation 1. Figure 1 shows the schematic of a two layer\nDRNN operating on a sample sequence of input data with length \u2206k.\nDuring inference, since the ground truth values are unavailable, the feedback, zk, has to be replaced\nby the previous network predictions. However, the same approach cannot be applied during training\nsince the DRNN has not been trained yet and it may cause poor performance of the DRNN. On the\nother hand, statistical discrepancies between ground truth and predictions mean that prior ground\ntruth cannot be passed to the input. Because of this disparity between training and testing, the DRNN\nmay enter unseen regions of the state-space, leading to mistakes at the beginning of the sequence\nprediction process. Therefore, we should \ufb01nd a strategy to start from the ground truth distribution\nand move toward the predictions\u2019 distribution slowly as the DRNN learns.\nThere exist several approaches to address this issue. Beam search generates several target sequences\nfrom the ground truth distribution [20]. However, for continuous state-space models like recurrent\nnetworks, the effective number of generated sequences remains small. SEARN is a batch approach\nthat trains a new model according to the current policy at each iteration. Then, it applies the new\nmodel on the test set to generate a new policy which is a combination of the previous policy and\nthe actual system behavior [21]. 
In our implementation, we apply scheduled sampling which can be\nimplemented easily in the online case and has shown better performance than others [22].\nIn scheduled sampling, at the ith epoch of training, the model pseudorandomly decides whether to\nfeed ground truth (probability pi) or a sample from the predictions\u2019 distribution (probability (1 \u2212 pi))\nback to the network, with probability distribution modeled by P (yk\u22121|rk\u22121). When pi = 1, the\nalgorithm selects the ground truth, and when pi = 0, it works in Always-Sampling mode. Since the\nmodel is not well trained at the beginning of the training process, we adjust these probabilities during\ntraining to allow the model to learn the predictions\u2019 distribution. Among the various scheduling\noptions for pi [22], we select linear decay, in which pi is ramped down linearly from ps to pf at each\nepoch e for the total number of epochs, E:\n\npf \u2212 ps\n\nE\n\npi =\n\ne + ps\n\n(2)\n\n3 Pre-processing and feature engineering\n\nWe evaluate the performance of our DRNN on 12 neural features: High-frequency, Mid-frequency,\nand Low-frequency Wavelet features (HWT, MWT, LWT); High-frequency, Mid-frequency, and\nLow-frequency Fourier powers (HFT, MFT, LFT); Latent Factor Analysis via Dynamical Systems\n(LFADS) features [23]; High-Pass and Low-Pass Filtered (HPF, LPF) data; Threshold Crossings\n(TCs); Multi-Unit Activity (MUA); and combined MWT and TCs (MWT + TCs) (Table 1).\nTo extract wavelet features, we use \u2019db4\u2019 mother wavelet on 50ms moving windows of the voltage\ntime series recorded from each channel. Then, the mean of absolute-valued coef\ufb01cients for each scale\nis calculated to generate 11 time series for each channel. HWT is formed from the wavelet scales 1\nand 2 (effective frequency range \u2265 3.75KHz). MWT is made from the wavelet scales 3 to 6 (234Hz -\n3.75KHz). 
Finally, LWT shows the activity of scales 7 to 11 as the low-frequency scales (≤ 234Hz).
Fourier-based features are extracted by computing the Fourier transform with a sampling frequency of 30KHz on one-second moving windows for each channel. Then, the band-powers at the same 11 scales of the wavelet features are divided by the total power in the frequency band of 0Hz - 15KHz.

1Our code is available at: https://github.com/BenyaminHaghi/DRNN-NeurIPS2019

Algorithm 1 Training – DRNN with Feedback
1: Require: E, pf, ps
2: for e = 1 to E do
3:   pi = ((pf - ps)/E) e + ps
4:   for i = 1 to number of batches do
5:     Require: u, y: Input and ground truth
6:     s ← N(0, σs), r ← tanh(s)
7:     if number of layers = 2 then
8:       h ← 0
9:     end if
10:    if i = 1 then
11:      z = y
12:    end if
13:    for k = 2 to batch length do
14:      sk = Wss sk-1 + Wsr rk-1 + Wsi uk + Wsf zk-1 + bs
15:      rk = tanh(sk)
16:      if layers = 1 then
17:        ŷk = Wyr rk + by
18:      else if layers = 2 then
19:        hk = tanh(Whh hk-1 + Whr rk + bh)
20:        ŷk = Wyh hk + by
21:      end if
22:      if |ŷk| > 1 then
23:        ŷk = tanh(ŷk)
24:      end if
25:      zk ← ŷk or yk (Scheduled Sampling)
26:    end for
27:    Update weights and biases: BPTT
28:  end for
29: end for
30: Until validation loss increases

Figure 1: Training DRNN on a sample sequence of input data with length Δk.

Table 1: Frequency range of features
Features       | Frequency Range
HWT, HFT, HPF  | > 3.75KHz
TCs, LFADS     | 250Hz - 5KHz
MWT, MFT, BPF  | 234Hz - 3.75KHz
LWT, LFT, LPF  | < 234Hz

Figure 2: Average performance of decoders operating on MWT over single-day data

To generate TCs, we threshold bandpass-filtered (250Hz - 5KHz) neural data at -4 times the root-mean
We do not sort the action potential waveforms [24].\nThreshold crossing events were then binned at 50ms intervals.\nLFADS is a generalization of variational auto-encoders that can be used to model time-varying aspect\nof neural signals. Pandarinath et al. [23] shows that decoding performance improves when using\nLFADS to infer smoothed and denoised \ufb01ring rates. We use LFADS to generate LFADS features\nbased on the trial-by-trial threshold crossings from each center-out task.\nTo extract HPF, MUA, and LPF features, we apply high-pass, band-pass, and low-pass \ufb01lters to the\nbroadband data, respectively, by using second-order Chebyshev \ufb01lters with cut-off frequencies of\n234Hz and 3.75KHz. To infer MUA features, we calculate RMS of band-pass \ufb01lter output. Then,\nwe average the output signals to generate one feature per 50ms for each channel. Table 1 shows the\nfrequency range of features.\nWe smooth all features with a 1s minjerk smoothing kernel. Afterwards, the kinematics and the\nfeatures are centered and normalized by the mean and standard deviation of the training data. Then,\nto select the most informative features for regression, we use XGBoost, which provides a score that\nindicates how useful each feature is in the construction of its boosted decision trees [25, 26]. In our\nsingle-day analysis, we perform Principal Component Analysis (PCA) [27]. Figure 3 shows the block\ndiagram of our BMI system.\n\n4 Experimental Results\n\nWe conduct our FDA- and IRB-approved study of a BMI with a 32 year-old tetraplegic (C5-C6)\n\n4\n\nDRNNHL-DRNNNNLSTMGRURNNKARMAF-DRNNSVRXGBRFKFDTLM0.00.10.20.30.40.50.60.70.80.9Average R2\fFigure 3: Architecture of our BMI system. Recorded neural activities of Anterior Intraparietal Sulcus\n(AIP), and Broadman\u2019s Area 5 (BA5) are passed to a feature extractor. 
After pre-processing and\nfeature selection, the data is passed to the decoder to predict the kinematics in a center-out task.\n\nhuman research participant. This participant has Utah electrode arrays (NeuroPort, Blackrock\nMicrosystems, Salt Lake City, UT, USA) implanted in the medial bank of Anterior Intraparietal\nSulcus (AIP), and Broadman\u2019s Area 5 (BA5). In a center-out task, a cursor moves, in two dimensions\non a computer screen, from the center of a computer screen outward to one of eight target points\nlocated around a unit circle. A trial is one trajectory of the cursor from the center of the screen to one\nof the eight targets on a unit circle (Figure 3). During open-loop training, the participant observes\nthe cursor move under computer control for 3 minutes. We collected open-loop training data from\n66 blocks over 43 days for of\ufb02ine analysis of the DRNN. Broadband data were sampled at 30,000\nsamples/sec from the two implanted electrode arrays (96 channels each). Of the 43 total days, 42\ncontain 1 to 2 blocks of training data and 1 day contains 6 blocks, with about 50 trials per block.\nMoreover, these 43 days include 32, 5, 1, and 5 days of 2015, 2016, 2017, and 2018, respectively.\nSince the predictions and the ground truth should be close in both micro and macro scales, we report\nroot mean square error (RMSE) and R2 as measures of average point-wise error and the strength\nof the linear association between the predicted and the ground truth signals, respectively. Results\nreported in the body of this manuscript are R2 values for Y-axis position. R2 values for X-axis\nposition and velocities in X and Y directions and RMSE values for all the kinematics are all presented\nin supplementary material. All the curves and bar plots are shown by using 95% con\ufb01dence intervals\nand standard deviations, respectively.\nThe available data is split into train and validation sets for parameter tuning. 
Parameters are computed\non the training data and applied to the validation data. We perform 10 fold cross-validation by\nsplitting the training data to 10 sets. Every time, the decoder is trained on 9 sets for different set of\nparameters and validated on the last set. We \ufb01nd the set of optimum parameters by using random\nsearch, as it has shown better performance than grid search [28]. Finally, we test the decoder with\noptimized parameters on the test set. The performance on all the test sets is averaged to report the\noverall performance of the models in both single- and multi-day analysis.\nWe compare our DRNN with other decoders, ranging from linear and historical decoders to nonlinear\nand modern techniques. The linear and historical decoders with which we compare ours are the\nLinear Model (LM) [2] and Kalman Filter (KF) [3]. The nonlinear and modern techniques with which\nwe also compare ours include Support Vector Regression (SVR) [29], Gaussian KARMA [6], tree\nbased algorithms (e.g., XGBoost (XGB) [25, 26, 30], Random Forest (RF) [31], and Decision Tree\n(DT) [32]), and neural network based algorithms (e.g., Deep Neural Networks (NN) [5], Recurrent\nNeural networks with simple recurrent units (RNN) [33], Long-Short Term Memory units (LSTM)\n[34], Gated Recurrent Units (GRU) [35], and F-DRNN [7]). (See supplementary material).\nWe \ufb01rst present single-day performance of DRNN, which is a common practice in the \ufb01eld [7, 3, 36]\nand is applicable when the training data is limited to a single day. Moreover, there are aspects that\ndiffer between single- and multi-day decoding, which have not yet been well characterized (e.g.,\nvarying sources of signal instability) and remain challenging in neuroscience. 
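The tuning loop described above (10-fold cross-validation driven by random search) can be sketched as follows. The candidate parameter ranges and the scoring stub are illustrative placeholders; a real run would fit a decoder on the nine training splits and score R2 on the held-out split:

```python
import numpy as np

rng = np.random.default_rng(0)

def fold_score(params, train, val):
    # Placeholder for fitting a decoder on `train` with `params` and
    # returning its score on `val` (assumed behavior, not the paper's code).
    return -abs(np.log10(params["lr"]) + 3.0) - 0.001 * params["nodes"]

def cross_val_score(params, data, n_folds=10):
    """10-fold CV: train on 9 splits, validate on the held-out split."""
    folds = np.array_split(data, n_folds)
    scores = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(fold_score(params, train, val))
    return float(np.mean(scores))

data = rng.normal(size=(200, 4))                   # stand-in training data
best_params, best_score = None, -np.inf
for _ in range(20):                                # random search over settings
    params = {"lr": 10 ** rng.uniform(-5, -1),     # hypothetical learning rate
              "nodes": int(rng.integers(5, 100))}  # hypothetical hidden units
    score = cross_val_score(params, data)
    if score > best_score:
        best_params, best_score = params, score
```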
Furthermore, single-day decoding is important before considering multi-day decoding since our implantable hardware will be developed such that the decoder parameters can be updated at any time.

4.1 Single-day performance

We select the MWT as the input neural feature. The models are trained on the first 90% of a day and tested on the remaining 10%. Figure 2 shows the average performance of the decoders. The History-Less DRNN (HL-DRNN) uses the neural data at time k and the kinematics at time k-1 to make predictions at time k. As we see, the DRNN and HL-DRNN are more stable and have higher average performance.

Figure 4: Regression of different algorithms on test data from the same day 2018-04-23: true target motion (black) and reconstruction (red). Panels: (a) DRNN, (b) DRNN-10% Data, (c) F-DRNN, (d) LSTM, (e) GRU, (f) RNN, (g) NN, (h) SVR, (i) XGB, (j) RF, (k) KF, (l) KARMA, (m) DT, (n) Linear Model.

Figure 5: Cross-day analysis of the DRNN.

Figure 4 shows the regression of all the decoders on a sample day. We use only 10% of the single-day training data in Figure 4(b) to show the robustness of the DRNN to a limited amount of single-day training data. Other single-day analyses, including evaluation of the DRNN by changing the amount of single-day training data, the history of neural data, and the number of nodes, are presented in the supplementary material.
For cross-day analysis, we train the DRNN on a single day and test it on all the other days, and repeat this scenario for all the days. Figure 5 shows the performance of the DRNN over all the days. 
This figure shows that MWT is a more robust feature across single days.

4.2 Multi-day performance

To evaluate the effect of the selected feature on the stability and performance of the DRNN, we train the DRNN on the data from the first 20 days of 2015 and test it on the consecutive days by using different features. Figure 6 shows that the DRNN operating on the MWT results in superior performance compared to the other features. Black vertical lines show the year change. We show that the MWT is also the best feature for a range of decoders in the supplementary material.
Then, we evaluate the stability and performance of all the decoders over time.

Figure 6: The DRNN operating on different features.

Figure 7: Multi-day performance of the decoders.

Figure 7 shows that both the overall and the average performance of the DRNN exceed those of the other decoders. Moreover, the DRNN shows almost stable performance across 3 years. 
The drop in the performance of almost all the decoders is caused by neural signal variations on subsequent days [13].
To assess the sensitivity of the decoders to the number of training days, we change the number of training days from 1 to 20, starting from day 20. Figure 8 shows that the Deep-DRNN with 2 layers and the DRNN have higher performance than the other decoders, even when using a small number of training days. Moreover, Figure 8 shows that the performance of the DRNN with 1 layer, 10 nodes, and a history of 10 is comparable to the Deep-DRNN with 2 layers, 50 and 25 nodes in the first and second layers, and a history of 20. Therefore, a small DRNN with a short history has superior performance compared to the other decoders.
To evaluate the effect of re-training the DRNN, we consider four scenarios. First, we train the DRNN on the first 20 days of 2015 and test it on the subsequent days. Second, we re-train a DRNN, which has been trained on 20 days, with the first 5%, 10%, 50%, and 90% of the subsequent test days. Third, we re-train the trained DRNN annually with 5%, 10%, 50%, and 90% of the first days of 2016, 2017, and 2018. Finally, we train the DRNN only on the first 5% and 90% of the single test day. Figure 9 shows a general increase in the performance of the DRNN after the network is re-trained. The differences between the performances of the first three scenarios are small, which means that the DRNN does not necessarily need to be re-trained to perform well over multiple days. 
However, because of inherent nonstationarity of the recorded neural data over multiple days [13], training the DRNN on the first 90% of the same test day in the last scenario results in the highest average test performance.

Figure 8: Effect of the number of training days on the performance of the decoders.

Figure 9: The DRNN operating in different training scenarios.

Figure 10: (a) DRNN predictions for sample targets in all four quadrants; (b) DRNN predictions with no or short neural data. True target motion (black) and reconstructions (colored).

The DRNN relies on neural data inputs – not just the kinematic feedback or target information – based on the following evidence. First, target information is not explicitly provided to the DRNN. Any target information available to the DRNN is learned from the neural data and/or feedback components. Second, DRNN outputs change substantially based on different feature engineering approaches (Figures 5, 6) and over different trials (with the same features) (Figures 4, 10a). 
Finally, predictions fail when the DRNN uses only feedback (Feedback-Only), feedback with noise substituted for neural data (Feedback-Noise), or feedback with the neural data provided only at the beginning of the trials (Short-Neural) (Figure 10b).

5 Hardware implementation potential

BMIs are intended to operate as wireless, implantable systems that require low-power circuits, small physical size, wireless power delivery, and low temperature deltas (≤ 1°C) [37, 26, 38]. By choosing efficient algorithms that map well to CMOS technologies, Application Specific Integrated Circuit (ASIC) implementations could offer substantial power and mobility benefits. We are proposing a method that will not only have good performance on single- and multi-day data, but will also be optimal for hardware implementation. Since it is impractical to require powerful CPUs and GPUs for everyday usage of a BMI device, we need a device that is easily portable and does not require communication of the complete signals recorded by electrodes to an external computer for computation. 
Doing the computation in an ASIC would drastically reduce the latency of kinematics inference and eliminate a large power draw for the gigabytes of neural data that would otherwise have to be transferred. Thus, we plan to create an ASIC that can be implanted in the brain to perform inference of kinematics from neural signals. The main bottleneck in most neural network accelerators is the resources spent on fetching input history and weights from memory to the Multiplication and Accumulation (MAC) unit [39]. The DRNN helps mitigate this issue since it requires fewer nodes and less input history than standard recurrent neural networks. This eliminates the need for large input history storage and retrieval, reducing latency and control logic. Furthermore, by using 16-bit fixed-point values for the weights and inputs rather than floating-point values, we can reduce the power used by the off-chip memory [39, 40].

6 Discussion

We propose a Deep Multi-State DRNN with feedback and scheduled sampling to better model the nonlinearity between the neural data and kinematics in BMI applications. We show that feeding the DRNN output back recurrently results in better performance and more robust decoding. Feeding the output back to the input recurrently, in addition to the input neural data, provides more information to the DRNN to make predictions, which results in a smaller network with less history. Analogous to the gain term of the Kalman filter, the DRNN learns the relative importance of the neural data and feedback. Integrating both state and neural information in this way leads to smoother predictions (Figure 4a). In addition, we show that the added internal derivative state enables our DRNN to track first-order and more complicated patterns in the data. Our DRNN is unique since it learns a matrix that establishes a relationship between the past and present derivative states, unlike the conventional DRNN. 
Our DRNN, which learns all model parameters using backpropagation through time (BPTT), is also distinct from the F-DRNN, the most similar previous model in BMI, which learns only the output weights using the recursive least squares (RLS) algorithm. Moreover, its application differs from that of most existing decoders, which have been applied to motor cortex data of non-human primates. To the best of our knowledge, we present the first demonstration of applying feedback and scheduled sampling to a DRNN, and of comparing different learning-based decoders operating on different features to predict kinematics from open-loop neural data recorded from the PPC of a human subject in a real BMI setting. As a recurrent network, our DRNN also has the potential to be applied to data recorded from other brain areas.
To evaluate our DRNN, we analyze its single-day, cross-day, and multi-day behavior using 12 different extracted features. Moreover, we compare the performance and robustness of the DRNN with other linear and nonlinear decoders over 43 days. Results indicate that our proposed DRNN, as a nonlinear dynamical model operating on the MWT, is a powerful candidate for a robust BMI.
The focus of this work is to first evaluate different decoders using open-loop data, since the data presented were recorded from a subject who has completed participation in the clinical trial and has had the electrodes explanted. However, the principles learned from this analysis will be relevant to future subjects with electrodes in the same cortical area.
Future studies will evaluate the DRNN's performance in a closed-loop BMI, in which all the decoders use the brain's feedback.
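As a concrete illustration of the memory savings from the 16-bit fixed-point representation proposed in Section 5 (half the storage of 32-bit floats per weight), here is a minimal quantization sketch. The Q3.12 integer/fraction split is our assumption; the text specifies only a 16-bit fixed-point format.

```python
import numpy as np

def to_fixed16(x, frac_bits=12):
    """Quantize floats to signed 16-bit fixed point with `frac_bits`
    fractional bits (Q3.12 here, giving a representable range of about
    +/-8 with 2**-12 resolution)."""
    scale = 1 << frac_bits
    q = np.round(np.asarray(x, dtype=np.float64) * scale)
    q = np.clip(q, -(1 << 15), (1 << 15) - 1)  # saturate to the int16 range
    return q.astype(np.int16)

def from_fixed16(q, frac_bits=12):
    """Recover the float value a fixed-point word represents."""
    return q.astype(np.float64) / (1 << frac_bits)

w = np.array([0.7071, -1.5, 0.0003])
wq = to_fixed16(w)                 # stored as int16: 2 bytes per weight
print(from_fixed16(wq))            # within 2**-13 of each entry of w
```

With round-to-nearest, any in-range weight is recovered to within half an LSB (2**-13 for Q3.12), which is typically well below the noise floor of the decoded kinematics; the fraction-bit count trades range against resolution and would be chosen from the trained weight distribution.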
Next, since our small DRNN achieves higher efficiency and uses less memory by reducing the input history, the number of weights, and therefore the number of memory accesses, we plan to implement the DRNN in a field-programmable gate array (FPGA) system, where we can optimize for speed, area, and power usage. Then, we will build an ASIC of the DRNN for BMI applications. The implemented system must be optimized for real-time processing. The hardware will involve designing multiply-accumulate units with localized memory to reduce the power consumption associated with memory fetches and stores.

Acknowledgment: We thank the Tianqiao and Chrissy (T&C) Chen Institute for Neuroscience at the California Institute of Technology (Caltech) for supporting this IRB-approved research. We also thank Dr. Erin Burkett for reviewing this manuscript.

References

[1] Sam Musallam, BD Corneil, Bradley Greger, Hans Scherberger, and Richard A. Andersen. Cognitive control signals for neural prosthetics. Science, 305(5681):258–262, 2004.

[2] Wei Wu and Nicholas G. Hatsopoulos. Real-time decoding of nonstationary neural activity in motor cortex. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 16(3):213–222, 2008.

[3] Amy L. Orsborn, Siddharth Dangi, Helene G. Moorman, and Jose M. Carmena. Closed-loop decoder adaptation on intermediate time-scales facilitates rapid BMI performance improvements independent of decoder initialization conditions. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4):468–477, 2012.

[4] Vikash Gilja, Paul Nuyujukian, Cindy A. Chestek, John P. Cunningham, Byron M. Yu, Joline M. Fan, Mark M. Churchland, Matthew T. Kaufman, Jonathan C. Kao, Stephen I. Ryu, and Krishna V. Shenoy. A high-performance neural prosthesis enabled by control algorithm design. Nature Neuroscience, 15(12):1752–1757, 2012.

[5] Johan Wessberg, Christopher R. Stambaugh, Jerald D.
Kralik, Pamela D. Beck, Mark Laubach, John K. Chapin, Jung Kim, S. James Biggs, Mandayam A. Srinivasan, and Miguel A. L. Nicolelis. Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature, 408:361–365, 2000.

[6] Lavi Shpigelman, Hagai Lalazar, and Eilon Vaadia. Kernel-ARMA for hand tracking and brain-machine interfacing during 3D motor control. Advances in Neural Information Processing Systems, 21, 2009.

[7] David Sussillo, Paul Nuyujukian, Joline M. Fan, Jonathan C. Kao, Sergey D. Stavisky, Stephen Ryu, and Krishna Shenoy. A recurrent neural network for closed-loop intracortical brain–machine interface decoders. Journal of Neural Engineering, 9(2), 2012.

[8] David Sussillo, Sergey D. Stavisky, Jonathan C. Kao, Stephen I. Ryu, and Krishna V. Shenoy. Making brain–machine interfaces robust to future neural variability. Nature Communications, 7(13749), 2016.

[9] Mingming Zhang, Michael A. Schwemmer, Jordyn E. Ting, Connor E. Majstorovic, David A. Friedenberg, Marcia A. Bockbrader, W. Jerry Mysiw, Ali R. Rezai, Nicholas V. Annetta, Chad E. Bouton, Herbert S. Bresler, and Gaurav Sharma. Extracting wavelet based neural features from human intracortical recordings for neuroprosthetics applications. Bioelectronic Medicine, 4(11), 2018.

[10] Michael A. Schwemmer, Nicholas D. Skomrock, Per B. Sederberg, Jordyn E. Ting, Gaurav Sharma, Marcia A. Bockbrader, and David A. Friedenberg. Meeting brain–computer interface user performance expectations using a deep neural network decoding framework. Nature Medicine, 24(11):1669–1676, 2018.

[11] Tommy Hosman, Marco Vilela, Daniel Milstein, Jessica N. Kelemen, David M. Brandman, Leigh R. Hochberg, and John D. Simeral. BCI decoder performance comparison of an LSTM recurrent neural network and a Kalman filter in retrospective simulation.
9th International IEEE/EMBS Conference on Neural Engineering (NER), pages 1066–1071, 2019.

[12] Richard A. Andersen, Spencer Kellis, Christian Klaes, and Tyson Aflalo. Toward more versatile and intuitive cortical brain machine interfaces. Current Biology, 24(18):R885–R897, 2014.

[13] Tyson Aflalo, Spencer Kellis, Christian Klaes, Brian Lee, Ying Shi, Kelsie Pejsa, Kathleen Shanfield, Stephanie Hayes-Jackson, Mindy Aisen, Christi Heck, Charles Liu, and Richard A. Andersen. Decoding motor imagery from the posterior parietal cortex of a tetraplegic human. Science, 348(6237):906–910, 2015.

[14] Christian Klaes, Spencer Kellis, Tyson Aflalo, Brian Lee, Kelsie Pejsa, Kathleen Shanfield, Stephanie Hayes-Jackson, Mindy Aisen, Christi Heck, Charles Liu, and Richard A. Andersen. Hand shape representations in the human posterior parietal cortex. Journal of Neuroscience, 35(46):15466–15476, 2015.

[15] Carey Y. Zhang, Tyson Aflalo, Boris Revechkis, Emily R. Rosario, Debra Ouellette, Nader Pouratian, and Richard A. Andersen. Partially mixed selectivity in human posterior parietal association cortex. Neuron, 95(3):697–708, 2017.

[16] Sahil Shah, Benyamin Haghi, Spencer Kellis, Luke Bashford, Daniel Kramer, Brian Lee, Charles Liu, Richard Andersen, and Azita Emami. Decoding kinematics from human parietal cortex using neural networks. International IEEE/EMBS Conference on Neural Engineering (NER), 9, 2019.

[17] Benyamin Haghi, Spencer Kellis, Maitreyi Ashok, Sahil Shah, Luke Bashford, Daniel Kramer, Brian Lee, Charles Liu, Richard A. Andersen, and Azita Emami. Deep multi-state dynamic recurrent neural networks for robust brain machine interfaces. Society for Neuroscience Annual Meeting, 49, 2019.

[18] Benyamin Haghi, Spencer Kellis, Luke Bashford, Sahil Shah, Daniel Kramer, Brian Lee, Charles Liu, Richard A. Andersen, and Azita Emami.
Decoding kinematics from human parietal cortex using neural networks. IEEE Brain Initiative Workshop, 2018.

[19] Liang Jin, P. N. Nikiforuk, and M. M. Gupta. Approximation of discrete-time state-space trajectories using dynamic recurrent neural networks. IEEE Transactions on Automatic Control, 40(7):1266–1270, 1995.

[20] Peng S. Ow and Thomas E. Morton. Filtered beam search in scheduling. International Journal of Production Research, 26(1):35–62, 1988.

[21] Hal Daumé III, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 2009.

[22] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. Neural Information Processing Systems (NIPS), 2015.

[23] Chethan Pandarinath, Daniel J. O'Shea, Jasmine Collins, Rafal Jozefowicz, Sergey D. Stavisky, Jonathan C. Kao, Eric M. Trautmann, Matthew T. Kaufman, Stephen I. Ryu, Leigh R. Hochberg, Jaimie M. Henderson, Krishna V. Shenoy, L. F. Abbott, and David Sussillo. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods, 15:805–815, 2018.

[24] B. P. Christie, D. M. Tat, Z. T. Irwin, V. Gilja, P. Nuyujukian, J. D. Foster, S. I. Ryu, K. V. Shenoy, D. E. Thompson, and C. A. Chestek. Comparison of spike sorting and thresholding of voltage waveforms for intracortical brain-machine interface performance. Journal of Neural Engineering, 12(1), 2015.

[25] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. arXiv, 2016.

[26] Mahsa Shoaran, Benyamin A. Haghi, Milad Taghavi, Masoud Farivar, and Azita Emami. Energy-efficient classification for resource-constrained biomedical applications. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 8(4):693–707, 2018.

[27] Hao Nan, Benyamin Allahgholizadeh Haghi, and Amin Arbabian.
Interferogram-based breast tumor classification using microwave-induced thermoacoustic imaging. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 37:2717–2720, 2015.

[28] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281–305, 2012.

[29] Debasish Basak, Srimanta Pal, and Dipak Chandra Patranabis. Support vector regression. Neural Information Processing, 11(10):203–224, 2007.

[30] Mahsa Shoaran, Benyamin A. Haghi, Masoud Farivar, and Azita Emami. Efficient feature extraction and classification methods in neural interfaces. Frontiers of Engineering: Reports on Leading-Edge Engineering from the 2017 Symposium, 47(4):31–35, 2017.

[31] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[32] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

[33] Danilo P. Mandic and Jonathan A. Chambers. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. John Wiley & Sons, Inc., New York, NY, USA, 2001.

[34] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471, 2000.

[35] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv, 2014.

[36] Suraj Gowda, Amy L. Orsborn, Simon A. Overduin, Helene G. Moorman, and Jose M. Carmena. Designing dynamical properties of brain-machine interfaces to optimize task-specific performance. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(5):911–920, 2014.

[37] M. W. Dewhirst, B. L. Viglianti, M. Lora-Michiels, M. Hanson, and P. J.
Hoopes. Basic principles of thermal dosimetry and thermal thresholds for tissue damage from hyperthermia. International Journal of Hyperthermia, 19(3):267–294, 2003.

[38] Milad Taghavi, Benyamin A. Haghi, Masoud Farivar, Mahsa Shoaran, and Azita Emami. A 41.2 nJ/class, 32-channel on-chip classifier for epileptic seizure detection. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 40:3693–3696, 2018.

[39] Paul N. Whatmough, Sae Kyu Lee, David Brooks, and Gu-Yeon Wei. DNN Engine: A 28-nm timing-error tolerant sparse deep neural network processor for IoT applications. IEEE Journal of Solid-State Circuits, 53(9), 2018.

[40] Mohit Shah, Sairam Arunachalam, Jingcheng Wang, David Blaauw, Dennis Sylvester, Hun Seok Kim, Jae-sun Seo, and Chaitali Chakrabarti. A fixed-point neural network architecture for speech applications on resource constrained hardware. Journal of Signal Processing Systems, 90(9):725–741, 2018.