This paper presents an interesting new perspective on the design of learning methods: the idea is to choose a classifier that minimizes the risk function uniformly over a family of distributions, constructed based on an iid data set, with the guarantee that (with high probability) the true data-generating distribution is contained in the family. This inherently supplies an upper bound on the risk of the chosen classifier. The family of distributions is generated by constraints on the expectation of a function Phi of (x,y), using data-dependent confidence bounds on its true expectation to set the constraints. Thus, the method is highly dependent on the choice of the function Phi. One significant concern noted by the reviewers is that the paper doesn't seem to explore this dependence in much depth, such as providing an array of illustrative examples and design principles for Phi, discussion of how choices of Phi for a given sample size may relate to notions of expressiveness and overfitting, or checking whether the technique can provide guarantees competitive with known results obtained by more traditional approaches (e.g., kernel methods, or ERM guarantees from uniform convergence). In other words, they do not flesh out this new theory by investigating its potential connections to, or advantages over, previous approaches in any concrete scenarios. As such, it seems hard to guess what advantages this approach might provide, other than simply providing a new perspective. In summary, the new approach is innovative and seems like a potentially promising new direction. However, the concrete implications and advantages of the new theory aren't argued very thoroughly or convincingly.