NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at Vancouver Convention Center
PyTorch is a useful and impactful package, and has made some interesting and influential design decisions. This paper explains the philosophy behind some of these decisions. Although the framework is not new this year, we still expect broad interest in this exposition, and an archival paper will be useful for the community to cite.

Given the subject of this paper, its style differs from a typical NeurIPS submission and it needed to be evaluated differently. In particular, many of the claims in the paper are not convincingly tested or demonstrated as we would normally require. Instead, the utility of the ideas has been demonstrated through widespread adoption and impact in the NeurIPS community and beyond. As in the JMLR open source software track guidelines, existing (rather than potential) community engagement was important for the acceptance decision here. Newer software projects may need to use more of the body of their papers to test their claims.

In addition to the reviews, the following critical comments were gathered from an area chair during the review process; they may help improve the paper or be useful to consider going forward:

* Claims about ease of implementation could be tested with a user study. Alternatively, a paper could provide examples of algorithms that are difficult to implement in competing packages. (These are also ways that a less-established framework could convince reviewers.)
* The goal of "being Pythonic" is broad and subjective, so not everyone will agree it has been met. Implementing each function as a "module" class, and having two different ways to do this, can be viewed as un-Pythonic by some users (a sketch contrasting the two styles appears after this list).
* The figures are low-resolution bitmaps and would appear more professional if included in a vector format, or at least as crisper bitmaps.
* Given the central role of derivative computations, there could be a deeper treatment:
  - There is no mention of the more general vector-Jacobian products that reverse-mode autodiff allows, or of the computational complexity of the different alternatives (a vector-Jacobian product sketch appears below).
  - There is no discussion of nested autodiff, which is a difficult-to-get-right feature (a nested-autodiff sketch appears below).
* There are naming choices in PyTorch that are questionable:
  - Calling arrays "tensors" (not unique to PyTorch).
  - "Autograd" isn't (or wasn't) an accepted generic term for autodiff, but the name of a competing package.
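To make the "two different ways" remark concrete, the following minimal sketch contrasts the module-based and functional styles of applying the same operation in PyTorch; the choice of ReLU as the operation is an illustrative assumption, not an example drawn from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)

# Style 1: the operation wrapped as a stateless "module" object.
relu_module = nn.ReLU()
y_module = relu_module(x)

# Style 2: the same operation as a plain function call.
y_functional = F.relu(x)

assert torch.equal(y_module, y_functional)
```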
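As a reminder of what the vector-Jacobian product comment refers to: a single reverse-mode pass computes v^T J for a chosen vector v, rather than the full Jacobian J. A minimal sketch using torch.autograd.grad, where the vector-valued function f and the vector v are arbitrary choices made only for illustration:

```python
import torch

x = torch.randn(3, requires_grad=True)
# A small vector-valued function y = f(x).
y = torch.stack([x[0] * x[1], x[1] ** 2, x.sum()])

# The "vector" in the vector-Jacobian product.
v = torch.tensor([1.0, 0.5, -2.0])

# One reverse pass yields v^T J without materialising the full 3x3 Jacobian.
(vjp,) = torch.autograd.grad(y, x, grad_outputs=v)
print(vjp)  # shape (3,)
```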
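Nested autodiff (differentiating through a derivative) is exposed in PyTorch via the create_graph flag of torch.autograd.grad; a minimal second-derivative sketch, with the test function chosen arbitrarily:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative, keeping the graph so it can itself be differentiated.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative: differentiate the derivative.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 12.0 and 12.0 (3x^2 and 6x at x = 2)
```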