Review for NeurIPS paper: BoxE: A Box Embedding Model for Knowledge Base Completion

NeurIPS 2020


Review 1

Summary and Contributions: The paper applies previously developed box embeddings to the knowledge base / graph completion task. The paper explores which types of rules and relationships are expressible, capturable, and injectable in the box embedding model, finding this subset attractive. Experiments confirm that box embeddings outperform comparable approaches on a variety of tasks and provide a useful inductive bias when injecting new rules into a knowledge base. A box embedding represents each relation as two boxes (or more for n-ary relations) in an embedding space, and each entity as two vectors (a base point and a bump vector). For a relation r to apply to two entities, the first entity's base point (translated by the second entity's bump) has to be contained in the relation's first box, and the second entity's base point (translated by the first entity's bump) has to be contained in the relation's second box. This representation, though a little hard to reason about, allows expressing a large set of relevant constraints on possible relations.
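
The containment condition paraphrased in the summary above can be sketched in a few lines of Python. This is only an illustration of my reading of the binary case: the names `Box`, `Entity`, and `holds` are hypothetical, the coordinates are made up, and the hard inside/outside test stands in for whatever scoring function the paper actually trains with.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    lower: List[float]  # lower corner, one value per dimension
    upper: List[float]  # upper corner, one value per dimension

    def contains(self, point: List[float]) -> bool:
        # A point is inside the box if every coordinate lies within bounds.
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))


@dataclass
class Entity:
    base: List[float]  # base point of the entity
    bump: List[float]  # translation this entity applies to its co-argument


def translate(point: List[float], bump: List[float]) -> List[float]:
    return [p + b for p, b in zip(point, bump)]


def holds(first_box: Box, second_box: Box, first: Entity, second: Entity) -> bool:
    # Each entity's base point is shifted by the OTHER entity's bump,
    # then checked against the box for its own argument position.
    first_pos = translate(first.base, second.bump)
    second_pos = translate(second.base, first.bump)
    return first_box.contains(first_pos) and second_box.contains(second_pos)


# Made-up 2D example (assumed values, not from the paper).
a = Entity(base=[0.0, 0.0], bump=[1.0, 0.0])
b = Entity(base=[2.0, 2.0], bump=[0.5, 0.5])
first_box = Box(lower=[0.0, 0.0], upper=[1.0, 1.0])
second_box = Box(lower=[2.5, 1.5], upper=[3.5, 2.5])

r_ab = holds(first_box, second_box, a, b)  # a shifted to [0.5, 0.5], b to [3.0, 2.0]
r_ba = holds(first_box, second_box, b, a)  # b shifted to [3.0, 2.0], outside first_box
print(r_ab, r_ba)
```

With these values the fact r(a, b) holds while r(b, a) does not, which shows how two distinct boxes can encode a direction-sensitive relation.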

Strengths: The model is clearly very expressive (see table 5.1 for the types of patterns expressible in it) and seems relatively straightforward to train. The paper also very clearly specifies the distinctions between expressing a relation, capturing an inference pattern, and injecting a rule into a model (where some amount of nondeterminism and inductive bias is brought in). The experiments exploring the effects of these actions are illuminating.

Weaknesses: It's not easy to see how to compress these models. The proof for theorem 5.1 relies on an induction step adding dimensions for every rule you want to encode in the model. The definition of the box embedding can be a little hard to grok, and the base explanation is quite short. In writing this review I had to correct my paraphrase multiple times, and refer to the proofs for theorems 5.1 and 5.2 in the appendix to fully understand the implications. The main text would be improved with some examples of how the general constraints are encodable in terms of the boxes (like how symmetric relations have two identical boxes, etc).
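
As an example of the kind of illustration I have in mind, the symmetric case can be checked mechanically. The sketch below assumes the hard containment reading of the model (each entity's base point, shifted by the other entity's bump, must fall in the box for its position); the helper names and the random test data are my own and are not taken from the paper.

```python
import random


def in_box(point, lower, upper):
    # True if the point lies within the axis-aligned box [lower, upper].
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


def holds(box1, box2, first, second):
    # Entities are (base, bump) pairs; boxes are (lower, upper) pairs.
    p1 = [x + b for x, b in zip(first[0], second[1])]   # first base + second bump
    p2 = [x + b for x, b in zip(second[0], first[1])]   # second base + first bump
    return in_box(p1, *box1) and in_box(p2, *box2)


random.seed(0)
dim = 2
shared_box = ([-1.0] * dim, [1.0] * dim)  # the same box used for both positions


def random_entity():
    base = [random.uniform(-2, 2) for _ in range(dim)]
    bump = [random.uniform(-1, 1) for _ in range(dim)]
    return (base, bump)


pairs = [(random_entity(), random_entity()) for _ in range(1000)]

# With identical boxes, swapping the arguments checks the same two
# containment conditions in the opposite order, so the truth value
# cannot change.
symmetric = all(
    holds(shared_box, shared_box, a, b) == holds(shared_box, shared_box, b, a)
    for a, b in pairs
)
print(symmetric)
```

The check passes for every sampled pair because r(a, b) and r(b, a) reduce to the same conjunction when the two boxes coincide. A sentence to this effect in the main text would have saved me a trip to the appendix.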

Correctness: I read through the proofs in the appendix superficially and found no issues. The experimental results also show no problems.

Clarity: Yes, the paper is reasonably easy to read, though it took a few tries to really grok the definition of box relations as used here.

Relation to Prior Work: Yes, all relevant prior work I am aware of has been cited.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper proposes a novel link prediction model based on box embeddings. The proposed model seems to avoid existing shortcomings of KGE models, such as theoretical inexpressivity, the inability to properly capture prominent inference patterns, and the inability to extend to higher-arity relations. The authors validate their approach through several experiments. The contributions are as follows: 1) Providing a novel KGE model using box embeddings. 2) Analyzing the performance of the proposed method in detailed experiments, achieving several SOTA results on the link prediction task. 3) Studying rule injection and higher-arity relations in KGs as extra features of the BoxE model.

Strengths: This paper reads well and the results appear sound. I personally find BoxE's ability to learn higher-arity relations and to support rule injection very interesting. Furthermore, the proposed approach outperforms previous methods on several benchmarks. Finally, the paper does a great job of thoroughly analyzing the proposed model both theoretically and experimentally.

Weaknesses: The only drawbacks of this paper are its limited novelty, given the similarity of BoxE to previously introduced methods that use box embeddings, and an inadequate explanation of the model's learning procedure, such as the loss function and the number of negative samples.

Correctness: The claims and methodology appear sound.

Clarity: I found the paper very well-written.

Relation to Prior Work: The authors discussed the connection to previous works clearly.

Reproducibility: Yes

Additional Feedback: My suggestions and questions are as follows: 1) It would be interesting to see the results of injecting rules on other KGs, such as WN18RR and FB15K-237. 2) I am curious about the authors' opinion on the interpretability of BoxE representations in comparison to other existing KGE models. 3) To assess how well BoxE captures prominent inference patterns, it would be interesting to study a per-relation performance breakdown on different KGs.

Review 3

Summary and Contributions: Significant contribution to unifying relations of any arity for KBC, using per-entity displacements and modeling relations as box regions.

Strengths: + Rather creative box model for relations of any arity, unifying symmetry, asymmetry, anti-symmetry, and transitivity into a single framework. + Thorough study of theoretical properties. + Comprehensive experiments and results.

Weaknesses: - Some minor notation malfunction, easily fixable. - Some material from appendix should be pushed into main paper.

Correctness: I did not check Thm 5.3 and 5.4 closely. The rest seem sound. Experimental methodology is standard.

Clarity: Very well written for the most part, minor fixes suggested.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Please number ALL equations for easy reference, at least in the preliminary submission.

L139: Translational bumps are certainly very expressive, but a likely first reaction is that they are too expressive. Perhaps you need a couple of sentences right here on how you control their power.

L153: "for the sample KG, there are $4^2$ potential configurations". There are four entities and two binary relations. For each relation, each slot can be occupied by any one of the four entities (assuming selectively reflexive and symmetric relations are allowed). That is 16 possible worlds for each relation, so $16^2$ potential configurations. Is that wrong?

L163: The notation can be improved and simplified. For relation $r$, you can use $\ell_r$ for the lower corner, $u_r$ for the upper corner, and $c_r$ for the center. In your notation, the center and width lose track of which relation they pertain to. Add notes on elementwise operations (particularly elementwise division and reciprocal) around the equations between L165 and L166 to make them clearer. There is also no explanation for the form of the expression in the second line; the extreme-case consistency mentioned in L166 is not enough. (The charts in the appendix make the intention quite clear, but I could not 'read' the equations as the charts on a rapid pass through the main paper. After seeing the charts, it seems like a two-sided adaptation of ReLU variants that retain some guiding gradient.)

The bump given to an entity by comrades in a tuple is itself independent of any parameterization of the relation. The relation parameters (center and 'radius') encourage all bumped entities in a tuple to stay within its boundaries. In the proof of Theorem 5.1 as well as in the experiments, it seems you are depending on negative examples to control the extent of the boxes, rather than regularizing their widths or volumes. Are you convinced that uniform negative sampling is the best possible or most efficient choice?
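
To make the L153 counting question concrete, here is a small Python sketch of the arithmetic under the assumptions I stated (four entities, two binary relations, every ordered pair allowed). The entity and relation names are placeholders, not taken from the paper. The sketch only reproduces the two numbers being compared; it does not settle which count the authors intend.

```python
from itertools import product

entities = ["e1", "e2", "e3", "e4"]  # placeholder names for the four entities
relations = ["r1", "r2"]             # placeholder names for the two relations

# Each slot of a binary relation can hold any of the four entities,
# so one relation admits 4 * 4 ordered (head, tail) pairs.
pairs_per_relation = list(product(entities, repeat=2))
n_pairs = len(pairs_per_relation)

# My count: one of the 16 pairs chosen independently for each relation.
reviewer_count = n_pairs ** len(relations)

# The figure quoted from the paper at L153.
quoted_count = len(entities) ** 2

print(n_pairs, reviewer_count, quoted_count)  # 16 256 16
```

The quoted $4^2$ matches the number of ordered entity pairs for a single relation, so it may be that the text counts something other than what I assumed; a clarifying sentence would help.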