Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Advances in Neural Information Processing Systems 31 (2018), pages 8224-8234. Oral presentation.
Editors: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett
Publisher: Curran Associates, Inc.
Paper accepted and presented at the Neural Information Processing Systems Conference (http://nips.cc/).

Abstract

There is growing interest in combining model-free and model-based approaches in reinforcement learning with the goal of achieving the high performance of model-free algorithms with low sample complexity. This is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.