NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 1018 Embedding Logical Queries on Knowledge Graphs

Reviewer 1

This paper proposes a system to query a knowledge graph combined with a knowledge base completion (KBC) system. Nodes (entities) and edge (relation) types are associated with suitable embeddings. The system described here can handle specific query structures. (The query is structured, not uncontrolled text.) Depending critically on the Deep Sets paper of Smola et al., this paper proposes compositional expressions to build up an embedding for a query in this class. Once the query is thus compiled' into a suitable embedding vector, entities embedded to nearby vectors are presented as responses. Composing a query embedding involves definiting two operators in the query language' and their denotations in the embedding domain. Given a set of nodes, the first operator (curiously called projection') expands along edges of specified types to collect another set of nodes. The second operator (called intersection') computes a query embedding from a set of nodes, with the underlying semantics of estimating the embedding of a node that is close to all the nodes. Although the query class is adequately motivated and the design decisions may be reasonable for building a practical system, I am not sure what the theoretical analysis means. Here are two specific issues. In equation (4), we get the impression that irrespective of how many nodes may be implicit in \mathbf{q}, it is represented by a vector of a fixed width/capacity d. This implies some form of symmetric aggregation to fixed capacity, which is troubling. For example, the centroid of a set of (node) vectors may be quite far from any node embedding. The problem persists when, in (4), we simply multiply this aggregate with a matrix and expect the resulting vector to continue to represent a different set of nodes (i.e. their embeddings). These operators can be cascaded, with potentially increasingly deleterious effects on the aggregate representation. In case of intersection, suppose we are looking for a vector v that is close to each of u_1, \ldots, u_m. Then, given a candidate v, we might create a network N(v, u_i) \in [0,1] that measures how close v and u_i are, and then we need to AND them, which would generally be expressed as a product of sigmoids, but perhaps min is OK. Again, for unclear reasons (except denotational efficiency), rather than judge the pairwise closeness first and then attempt a smooth AND, the paper retains (\mathbf{B}_\gamma transformed) node embeddings, does a symmetric aggregation, and only then applies a matrix (unclear why). Given the nonlinearities, these are not algebraically equivalent. It seems in both cases the authors are falling back heavily on the deep sets paper, but in practical applications, d = O(|V|) is rather unsatisfactory. If in reality you need d \ll O(|V|), you need to present intution about why this is possible, at least for the data sets you consider.