
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
"Tractable Learning for Complex Probability Queries" recommends the use of Sentential Decision Diagrams, instead of arithmetic circuits, for learning and querying Markov networks.
For learning, they propose a greedy algorithm that adds features whenever doing so increases a penalized log-likelihood score.
In the main theorem, they demonstrate that when the resulting learned SDDs are small, it is possible to tractably query symmetric (permutation-invariant) functions.
The experimental results show that SDDs are more accurate than ACs of the same size and that the queries are significantly faster.
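To make the learning procedure summarized above concrete, here is a minimal sketch of a greedy "add the best-scoring feature" search with a size-penalized log-likelihood. Everything in it (the toy model as a set of features, the additive per-feature gains, the names) is a hypothetical stand-in for illustration, not the paper's actual algorithm or API.

```python
# Greedy structure search sketch: repeatedly score every candidate feature
# and permanently add the best one, where the score is a log-likelihood
# term minus a penalty proportional to model size. Toy stand-in only.

def learn_structure(candidates, gain, alpha, max_iters=100):
    """Greedy search over feature sets with a size-penalized score."""
    model = set()

    def score(features):
        # penalized log-likelihood: fit term minus alpha * model size
        return sum(gain[f] for f in features) - alpha * len(features)

    for _ in range(max_iters):
        remaining = [f for f in candidates if f not in model]
        if not remaining:
            break
        best = max(remaining, key=lambda f: score(model | {f}))
        if score(model | {best}) <= score(model):
            break                     # no feature improves the score; stop
        model.add(best)               # permanently add the winner
    return model

# Toy run: features with additive "log-likelihood gains"; only features
# whose gain exceeds the penalty alpha are worth adding.
gains = {"f1": 0.9, "f2": 0.5, "f3": 0.05}
learned = learn_structure(gains.keys(), gains, alpha=0.2)
print(sorted(learned))  # f3's gain (0.05) is below the penalty (0.2)
```

In the paper's setting the penalty is on circuit size rather than feature count, which is what makes the learned model cheap at query time.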
This paper is well written, and the point of the paper (which is the recommendation of using SDDs over ACs) is convincingly argued.
Because I am not an expert in this field, I cannot speak to the novelty or importance of this result.
But the main result seems correct and the experiments are very convincing.
To me it seems like an important result that is novel.
The only typo that I found was line 256 "with with".
Q2: Please summarize your review in 1-2 sentences
I am not an expert in this field so I cannot speak to the novelty or importance of this work, but the paper is very well written, the main result is correct, and the experiments are convincing.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s).
This paper presents a tractable learning method, called LearnSDD, for dealing with complex probability queries. Built on Sentential Decision Diagrams (SDDs), this method learns a Markov network and its circuit representation. The authors also prove that the proposed method takes polynomial time for a class of complex probability queries.
This paper may present some interesting ideas, but it is really hard to follow, perhaps due to organization issues and a lack of detail. For example, some background, such as what "complex queries" are, the connection of graphical models to "queries", and related studies, is not given. In Section 2.1, the authors present Markov networks, but what is the connection to the presented method?
Moreover, the motivation of the main algorithm is not clear. The proposed LearnSDD method, namely Algorithm 1, is built upon SDDs, but from the current paper I am not clear about the motivation. The validity of this algorithm is also not clear.
Moreover, the score function in (1) is confusing. Without the necessary details, I do not understand why it takes its current form. Also, how is Pr(T|M) computed?
For the experiments, I think the comparison so far is not sufficient: there is only one comparison method, which is not enough.
The first argument in the paper, i.e. "Tractable learning is a new machine learning paradigm", is not correct. We may instead say "tractable structure learning". In fact, "tractable" just implies that certain operations are efficient; it is not a machine learning paradigm.
Q2: Please summarize your review in 1-2 sentences
This paper may present some interesting ideas, but it is hard to follow. Moreover, the motivation of the main algorithm is not clear, and some essential details are not given.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s).
Abstract: the paper introduces LearnSDD, an algorithm that learns log-linear models for discrete random variables but adds a penalty term for models that are expensive at query time. Compared to earlier work in this direction, the paper studies a new way of describing models (SDDs instead of ACs) and is interested in "complex queries", e.g., counts over groups of binary variables instead of MAP or marginals. The computational complexity of complex queries is not directly addressed in the algorithm, but as it turns out, the choice of SDDs as the model space also yields good runtime performance for certain complex queries (Theorem 1).
Quality: there are no obvious errors, but some definitions in the proof are missing, and some key elements in the algorithm are not motivated/discussed (see comments below).
Clarity: the presentation is good enough, but can be improved. Computational complexity is key in the paper but is discussed late and only mentioned, not derived.
Originality: the paper provides an addition following papers in a workshop at ICML 2014.
Significance: the argument and examples that "complex queries" are important are mildly convincing; not many people will be impacted by this. Also, the proposed algorithm has this only as a byproduct.
Comments:
- Other than the empirical results, is there a motivation for the SDD being a good/suitable model class, together with the order of features and the score (i.e., the implied prior over models)?
- The order in which features are proposed is crucial in the greedy algorithm. Why is this order suitable?
- In the discussion of the score (Sec 3.1), it would be nice to discuss the computational complexity of the queries as a function of the SDD.
- The score currently is a weighted combination of the relative increase in computational cost but the absolute cost in log-likelihood. Is this deliberate?
- Given that Theorem 1 is one of the main claimed contributions of the paper, and the proof is only a few lines, it would be better to add it to the paper itself (you even have space for it). What is WMC in the proof?
Q2: Please summarize your review in 1-2 sentences
The paper adds to a recent series of papers on learning models while penalizing query time computational complexity. Not all key decisions in the algorithm are perfectly explained and the argument for looking at "complex queries" (e.g. counts of binary variables instead of MAP, marginals) is somewhat contrived.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s).
The paper presents an algorithm for learning a Markov network together with its underlying representation as Sentential Decision Diagrams. By using SDDs, the paper argues that queries more complex than the usual MPE, such as counting, threshold, and comparison queries, become tractable. The algorithm builds on a previous learning algorithm, the ACMN learner, which uses arithmetic circuits instead of SDDs to allow tractable queries in Markov networks. Experiments are conducted on four datasets, where the new LearnSDD is compared to ACMN.
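For readers unfamiliar with such counting and threshold queries, the following brute-force illustration shows what is being asked of the model. The weights and feature set are invented for this toy example; a tractable learner would answer the query via the compiled circuit, whereas here we simply enumerate all states, which is only feasible for tiny models.

```python
# Toy log-linear model over 3 binary variables; a "complex query" asks
# e.g. for Pr(at least k variables are true), a symmetric counting query.
from itertools import product
from math import exp

# hypothetical features: variable tuples mapped to weights
weights = {(0,): 0.5, (1,): -0.2, (2,): 0.3, (0, 1): 1.0}

def unnorm(state):
    # product of exp(weight) over features whose variables are all true
    return exp(sum(w for vars_, w in weights.items()
                   if all(state[v] for v in vars_)))

states = list(product([0, 1], repeat=3))
Z = sum(unnorm(s) for s in states)          # partition function

def prob_at_least(k):
    """Pr(sum of variables >= k): a symmetric (counting/threshold) query."""
    return sum(unnorm(s) for s in states if sum(s) >= k) / Z

print(prob_at_least(2))
```

Answering such a query naively requires summing over exponentially many states; the paper's point is that with a compact SDD the same symmetric query becomes polynomial-time.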
The paper addresses an important research topic, it is generally clear and well-written, and besides empirical results it also presents theoretical results concerning the complexity of performing queries in Markov networks represented as SDDs. The questions that arose about the paper are the following.
- How do the representation and answering mechanism addressed in the paper compare to probabilistic SDDs (D. Kisa et al., 2014)? Also, there are a number of previous representations that also deal with complex queries, such as those able to handle aggregation functions in SRL. Would it be possible to compare (theoretically and/or empirically) the answering machinery of MNs as SDDs with those types of representations? I believe the paper lacks a related work section, where those comparisons could be discussed.
- Is it possible to learn the weights efficiently even in the presence of missing data?
- The compilation of the SDD for a query together with the underlying SDD for the MN is a key step for answering complex queries without having to create multiple conjunctive queries (as is done in ACs). Therefore, a thorough explanation should also be given about this step in the paper, maybe with an example. Also, the relationship of this compilation step with mutex features should be discussed in the paper.
Minors:
- remove the second "it" in "after it was learned it";
- remove the second "with" in 4.1, second paragraph;
- the alpha parameter should be given as input to the score function in the algorithm;
- as Algorithm 1 also terminates when it runs out of time (according to the explanation in the text), this stopping requirement should also appear in the algorithm;
- the highest-scoring feature is only added to the current model if the score of the model plus the new feature is higher than before (according to the algorithm); this should also be said in the text (and not only that the feature with the highest score is permanently added to the model).
Q2: Please summarize your review in 1-2 sentences
Although this is a good paper, addressing the important topic of tractability of complex and (because of their complexity) unusual queries, it could have presented and compared the devised approach with other related work beyond ACMN, such as probabilistic SDDs. Also, the representation of complex queries within the learned model could be clearer in the paper, as this is the main goal of the approach.
Submitted by Assigned_Reviewer_5
Q1: Comments to author(s).
Summary of paper: The paper deals with learning SDDs (from previous work) that can be used to answer complex queries. Inference for these types of queries could not be done efficiently in previous models like ACMNs. The paper provides details about the learning algorithm (which is similar to the ACMN learning algorithm based on greedy selection), different types of complex queries (symmetric, and asymmetric queries that can be converted to symmetric ones), and an experimental evaluation on real data to answer how well SDDs do and how fast they are. Bounds on the inference complexity are also provided.
Quality: The paper is of high quality and is well written. I did not find any errors in the text or equations.
Clarity: Along with the supplemental text (description of SDDs and WMC), the paper is quite clear to follow. A suggestion, though not required, would be to provide similar summary text in the supplemental material for ACMN, especially since you have the figure in 1(c).
Originality: To the best of my knowledge, this is a novel method to learn SDDs from queries and trying to infer more difficult queries than was previously done using tractable models.
Significance: Given that the paper provides another step in the direction of use of tractable models to solve even more difficult queries, there could be other ideas that could build on what is provided here.
Questions, clarifications and suggestions to improve paper:
1) Are sum-product networks any better than ACMNs at solving these complex queries? If so, how would they compare against the use of SDDs?
2) You use the word "conjoin" in multiple places; it might help to define the operator.
3) In the supplementary text you mention additional unary parameters w_A and w_B. Is there any reason they are not included in the trees in 1b and 1c? Do they not need to be included in the SDD?
4) In Algorithm 1, you pass \alpha as a parameter. You should pass \alpha as a parameter to score(M, T) since score depends on it.
5) In what step of the algorithm do you perform dynamic SDD minimization? Is it done in the M.add(f) step?
6) Is it possible that the minimization or compression could reduce the size of the SDD more in step t+2 by selecting features f1 and f2 combined than when selecting f3 and f4 sequentially in a greedy way? I.e., does selecting features greedily generate an SDD that might not be optimally minimized?
7) For completeness, it would be good to have example queries where ACMN performs similarly to SDDs, as well as asymmetric queries that are not tractable for SDDs.
8) It might be better to use func() instead of # in Table 1.
9) Fix opening double quotes on page 7. ... vote "yes" on a bill ...
10) In the positive/negative movie-review question, is there any reason you don't use all characters instead of restricting to the first 750 characters? Does it affect performance, and how?
11) While answering the second question posed, using complex queries on the voting and movie datasets, is there any reason you do not look at accuracy as well as time, especially for queries that do not time out? It seemed like the Mutex queries and the complex queries were quite different, and you did not test the accuracy in terms of log-likelihood on the complex queries.
12) Complete the references with details (e.g., page numbers for references 3, 6, 8, 12, etc.)
Q2: Please summarize your review in 1-2 sentences
The paper provides an interesting new direction to learn a MN structure (SDD) and infer complex probability queries which could not be inferred efficiently using previous MN models.
Submitted by Assigned_Reviewer_6
Q1: Comments to author(s).
The paper proposes a new algorithm for structure learning of binary MRFs. The core idea is to use Sentential Decision Diagrams (SDDs) as a compact representation of the MRF; this allows more flexibility in controlling structure complexity and more efficient computation for hard probability queries than traditional arithmetic circuit (AC) representations. The work is interesting.
It is slightly disappointing that the idea is still embedded in a heuristic search procedure; an improvement could be to encode the SDD creation or operations in a stochastic search procedure, e.g., an MCMC framework, to obtain better learning results.
Q2: Please summarize your review in 1-2 sentences
The paper proposes a new algorithm for MRF structure learning by introducing a compact representation from propositional logic, i.e., Sentential Decision Diagrams (SDDs); it is more flexible and efficient than the arithmetic circuit representation; it is interesting work.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their valuable feedback. We will address the suggestions for a final version. We now respond to each reviewer.
REVIEWER 2
We will add a discussion on PSDDs and SRL. There does not currently exist a general structure learning algorithm for PSDDs. PSDD learning (Choi, IJCAI 2015) requires a structured probability space (logical constraints) that is compiled into an SDD and parametrized during learning. Such logical constraints are absent from the general learning setting in the present work. PSDDs do inherit from SDDs the ability to efficiently answer complex queries.
Many SRL approaches learn aggregates or combining functions (e.g., Getoor IJCAI 2001, Vens ILP 2004, Natarajan AMAI 2008). Our work differs in that we 1) learn propositionally, where aggregates are over a single row in the data (SRL aggregates are over multiple rows), and 2) guarantee tractable inference (SRL representations usually do not).
We will add an example of SDD compilation to the supplement.
REVIEWER 3
1. Since SPNs and ACs are interchangeable (Rooshenas and Lowd 2014, Propositions 1 & 2), SPNs are also not efficient for complex queries.
3. This is a typo. We refer to an old version of Fig 1.
5. Minimization is done after M' <- M.add(f) and after M <- M_best.
6. Yes. Both structure learning and SDD minimization are NP-hard, so both are done with a best-effort greedy-search approach.
7. SDDs and ACs perform similarly on conjunctive queries. Our models are intractable for sentences without a compact SDD, such as hard satisfiability problems. We will add a weighted k-threshold example.
10. We consider an active sensing setting to predict features of the review without reading it entirely. If the entire review is observed, there is no need to predict these features, as they can be computed directly.
11. We will add conditional log-likelihoods (CLL) of complex queries. A preliminary test for voting gives results in favor of SDDs: for 1-4% unknown votes (where ACs are tractable), the AC CLLs are -0.47, -0.71, -0.83, -0.73, while the SDD gets -0.47, -0.58, -0.79, -0.59.
REVIEWER 4
1. What are complex queries? This is explained in Section 4, lines 226-238.
2. Connection of graphical models to queries: answering queries (inference) is the main task in graphical models (line 94).
3. Connection to Markov networks: the paper is about learning MNs that allow efficient inference. SDDs and ACs are tractable representations of MNs (line 110).
4. Motivations of the main algorithm are unclear: using SDDs is motivated in lines 64-67, 158-161 and 191-193.
5. "Tractable learning is a new ... paradigm" is not correct: we use tractable learning to refer to the recent trend of learning general-purpose tractable probabilistic models with complex dependency structures. While there has been work in this direction since the '60s, the paradigm has resurged in recent years, with lots of activity and the first ICML workshop on the topic in 2014.
REVIEWER 6
1. MCMC vs heuristic search: this is an interesting future direction. Your suggestion highlights a benefit of SDDs: they allow an MCMC-style search because it is possible to efficiently add and remove features. In contrast, removing a feature from an AC would require recompiling the AC from scratch, which would be very computationally expensive.
We used a greedy top-down search to make the search as close as possible to the ACMN baseline. Thus we can see whether the benefit comes from using SDDs and not from a different search algorithm.
REVIEWER 7
1. Why complex queries: complex properties such as counts and thresholds appear naturally in related domains such as neural networks and relational learning (cf. Reviewer 2). We believe they have not been used to their full potential in the graphical-model world because of their intractability, which we aim to resolve.
2. Motivate SDDs as a good model class: SDDs are useful for tractable learning because 1) they give a direct measure of the complexity of inference, 2) they are more flexible than other representations (ACs and SPNs) because we can add any kind of feature and perform more queries efficiently, and 3) with minimization one can reduce the size of an SDD during learning.
3. Importance of feature order: the order is not crucial because in each iteration the algorithm considers all the features and then adds only the best one. If no minimization is performed along the way, the algorithm is even oblivious to the order in which the features are presented.
4. Structure of the score function: we take the relative increase in size because the worst-case size increase is linear in the current model size; the size increase thus always gets worse as the model grows. When using the absolute size increase, from a certain point the algorithm would only add features based on size increase, even though they no longer contribute to the likelihood.
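The trade-off described in point 4 can be sketched as follows. The function name, arguments, and alpha value are illustrative stand-ins, not the paper's exact formulation; the point is only the shape of the score: absolute likelihood gain versus relative size growth.

```python
# Sketch of a score combining absolute log-likelihood improvement with
# the *relative* growth in circuit size (hypothetical names and alpha).
def feature_score(ll_new, ll_old, size_new, size_old, alpha=1.0):
    """Higher is better: likelihood gain minus penalized relative growth."""
    return (ll_new - ll_old) - alpha * (size_new - size_old) / size_old

# The same absolute size increase (+100 nodes) is penalized less as the
# model grows, matching the point that worst-case growth is linear in
# the current model size.
small = feature_score(ll_new=-9.0, ll_old=-10.0, size_new=200, size_old=100)
large = feature_score(ll_new=-9.0, ll_old=-10.0, size_new=1100, size_old=1000)
print(small, large)
```

With an absolute size penalty instead, the +100-node feature would be penalized identically in both cases, which is the failure mode the rebuttal describes.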
5. Theorem 1 proof: we will add a proof outline to the paper.
