Paper ID: 799 Title: Predictive PAC Learning and Process Decompositions
Reviews

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper focuses on learning with dependent observations. Under various assumptions, it is possible to upper-bound the deviations of the empirical mean with respect to the expectation, thus allowing to study empirical risk minimization. These assumptions ranges from various mixing assumptions to so-called weak-dependence condition.

The authors remark that when P and Q are two probability distributions for the stochastic process X, when X is mixing under P and under Q, it is generally not mixing under aP + (1-a)Q. However, if E is a deviation undersirable event, and if P(E)\leq e and Q(E)\leq e, then (aP+(1-a)Q)(E)\leq e. So, it is possible to extend deviation inequalities to a wider class of probability distribution (that is, mixture of mixing disitributions). This remark might seem trivial, however, it allows some interesting generalization. For example, as exchangeable probability distributions can be written as mixtures of iid distributions, this remark extends results from iid setting to exhangeable distribution at no cost.

However:
1) to my opinion, this is a small increment to learning with dependent observations.
2) the paper is not well written. Unnecessary discussions hide the main message of the paper. On the other hand, Definition 1 is non standard and should be explained in more details. The bibliography is incomplete. A lot has been done on learning with dependent observations in Modha and Masry (IEEE Trans. Info. Theory, 1998), Meir (JMLR, 2000), a series of papers by Steinwart, e.g. Steinwart, Hush and Scovel (Journal of Multivariate Analysis, 2009), Alquier and Wintenberger (Bernoulli, 2012) among others.
An interesting remark, however:
1) this is a small increment to learning with dependent observations.
2) the paper is not well written.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
In this paper, a PAC bound is given for mixtures of processes satisfying mixing conditions, and
a similar result is shown to exist for exchangeable sequences.

I really like this paper.
It's interesting, well-written and theoretically sound.
I personally find the introduction a bit too short;
perhaps the authors could elaborate on the main results in the intro.

How would the results of this paper generalize to the case of ergodic processes that do not necessarily satisfy any mixing conditions?

I didn't understand what was meant by "of independent interest" in the last sentence of the abstract.
pg 4, second to last paragraph: "by contrast" -> By contrast
The paper is interesting, nicely written and theoretically sound.

Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper discusses an extension of generalization error results (more precisely, the GC type property) beyond i.i.d. data. The authors argue that one of the difficulties faced by previous attempts is in the goal itself: one should not consider the average of conditional expectations but rather the "ergodic counterpart" to the expectation.

This paper was a pleasure to read. I enjoyed the motivation and the argument in favor of the "ergodic" definition. There are no breakthroughs in this paper, but I would encourage its publication: it sets the stage for further work on non-i.i.d. extensions. The simple observation that one does not pay for the mixture of learnable processes and simply inherits their sample complexity -- is nice.

Minor:

* One missing reference that comes to mind is " Extension of the PAC Framework to Finite and Countable Markov Chains" by Gamarnik.

* line 153: beta(k) seems to be undefined at this point in the paper.
* line 180: "impose" -> "imposed"
* line 206: "by" -> "By"