{"title": "Bridging Machine Learning and Logical Reasoning by Abductive Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 2815, "page_last": 2826, "abstract": "Perception and reasoning are two representative abilities of intelligence that are integrated seamlessly during human problem-solving processes. In the area of artificial intelligence (AI), the two abilities are usually realised by machine learning and logic programming, respectively. However, the two categories of techniques were developed separately throughout most of the history of AI. In this paper, we present the abductive learning targeted at unifying the two AI paradigms in a mutually beneficial way, where the machine learning model learns to perceive primitive logic facts from data, while logical reasoning can exploit symbolic domain knowledge and correct the wrongly perceived facts for improving the machine learning models. Furthermore, we propose a novel approach to optimise the machine learning model and the logical reasoning model jointly. We demonstrate that by using abductive learning, machines can learn to recognise numbers and resolve unknown mathematical operations simultaneously from images of simple hand-written equations. Moreover, the learned models can be generalised to longer equations and adapted to different tasks, which is beyond the capability of state-of-the-art deep learning models.", "full_text": "Bridging Machine Learning and Logical Reasoning\n\nby Abductive Learning\u2217\n\nWang-Zhou Dai\u2020\n\nQiuling Xu\u2020\n\nYang Yu\u2020\n\nZhi-Hua Zhou\n\nNational Key Laboratory for Novel Software Technology\n\nNanjing University, Nanjing 210023, China\n\n{daiwz, xuql, yuy, zhouzh}@lamda.nju.edu.cn\n\nAbstract\n\nPerception and reasoning are two representative abilities of intelligence that are\nintegrated seamlessly during human problem-solving processes. 
In the area of artificial intelligence (AI), the two abilities are usually realised by machine learning and logic programming, respectively. However, the two categories of techniques were developed separately throughout most of the history of AI. In this paper, we present abductive learning, targeted at unifying the two AI paradigms in a mutually beneficial way: the machine learning model learns to perceive primitive logic facts from data, while logical reasoning can exploit symbolic domain knowledge and correct the wrongly perceived facts to improve the machine learning model. Furthermore, we propose a novel approach to optimise the machine learning model and the logical reasoning model jointly. We demonstrate that by using abductive learning, machines can learn to recognise numbers and resolve unknown mathematical operations simultaneously from images of simple hand-written equations. Moreover, the learned models can be generalised to longer equations and adapted to different tasks, which is beyond the capability of state-of-the-art deep learning models.\n\n1 Introduction\n\nHuman cognition [34] consists of two remarkable capabilities: perception and reasoning, where the former processes sensory information and the latter works mainly with symbols. These two abilities function at the same time and affect each other, and they are often joined subconsciously by humans, which is essential in many real-life learning and problem-solving procedures [34].\nModern artificial intelligence (AI) systems exhibit both of these abilities. 
Machine learning techniques such as deep neural networks have achieved extraordinary performance in solving perception tasks [19]; meanwhile, logic-based AI systems have achieved human-level reasoning abilities in proving mathematical theorems [27] and in performing inductive reasoning concerning relations [25]. However, popular machine learning techniques can hardly exploit sophisticated domain knowledge in symbolic forms, and perceived information is hard to include in reasoning systems. Even in recent neural networks with the ability to focus on relations [31], enhanced memories and differentiable knowledge representations [13], full logical reasoning ability is still missing\u2014consider, for example, the difficulties of understanding natural language [17]. On the other hand, Probabilistic Logic Programming (PLP) [5] and Statistical Relational Learning (SRL) [12] aim to integrate learning and logical reasoning while preserving symbolic representations. However, they usually require semantic-level input, which involves pre-processing sub-symbolic data into logic facts [30].\n\n\u2020These authors contributed equally to this work.\n\u2217W.-Z. Dai (w.dai@imperial.ac.uk) and Q. Xu (simpleword2014@gmail.com) are now at Imperial College London and Purdue University, respectively.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nTo leverage learning and reasoning more naturally, it is crucial to understand how perception and reasoning affect each other in a single system. A possible answer is abduction [28], also termed retro-production [33]. 
It refers to the process of selectively inferring specific facts and hypotheses that give the best explanation of observations based on background knowledge [23, 14], where the \u201cobservations\u201d are mostly sensory information, and the \u201cknowledge\u201d is usually symbolic and structural.\nAn example of human abductive problem-solving is the decipherment of Mayan hieroglyphs [15], which reflects two remarkable human intelligence capabilities: 1) visually perceiving individual numbers from hieroglyphs and 2) reasoning symbolically based on background knowledge about mathematics and calendars. Fig. 1 shows a Mayan calendar discovered at the Palenque Temple of the Cross Complex; it starts with the mythical creation date, followed by a time period written in long count, and finishes with a specific date encoded by the Tzolk\u2019in and Haab\u2019 calendars. Fig. 2 depicts Charles P. Bowditch\u2019s records of deciphering Fig. 1 [2]. He first identified some known numbers, and confirmed that the first and sixth hieroglyphs are the same. Then, Bowditch tried substituting the unknown hieroglyphs with visually similar numbers, as shown in \u201cColumn 1\u201d in Fig. 2. Meanwhile, he calculated the Tzolk\u2019in and Haab\u2019 values according to his conjectures and his background knowledge of Mayan calendars, as shown in \u201cColumn 2\u201d in Fig. 2. Finally, he got the correct answer \u201c1.18.5.4.0, 1 Ahau 13 Mak\u201d by observing the consistency between his conjecture and his calculation [2].\nInspired by abductive problem-solving, we present Abductive Learning (ABL), a new approach towards bridging machine learning and logical reasoning. 
In abductive learning, a machine learning model is responsible for interpreting sub-symbolic data into primitive logical facts, and a logical model can reason about the interpreted facts based on first-order logical background knowledge to obtain the final output. The primary difficulty lies in the fact that the sub-symbolic and symbolic models can hardly be trained together. More concretely: 1) there is no ground truth of the primitive logic facts \u2014 e.g., the correct numbers in Fig. 1 \u2014 for training the machine learning model; 2) without accurate primitive logic facts, the reasoning model can hardly deduce the correct output or learn the right logical theory.\n\nFigure 1: A Mayan calendar. The coloured boxes and \u201c?\u201d correspond to unknown numbers.\n\nThe presented Abductive Learning (ABL) approach tries to address these challenges with logical abduction [18, 7] and consistency optimisation. Given a training sample associated with a final output, logical abduction can conjecture the missing information \u2014 e.g., candidate primitive facts in the sample, or logic clauses that can complete the background knowledge \u2014 to establish a consistent proof from the sample to its final output. The abduced primitive facts and logic clauses are then used for training the machine learning model and stored as symbolic knowledge, respectively. Consistency optimisation is used to maximise the consistency between the conjectures and the background knowledge. To solve this highly complex problem, we transform it into a task that searches for a function guessing the possibly mistaken primitive facts.\nBecause of the difficulty of collecting Mayan hieroglyph data, we designed a similar task \u2014 handwritten equation decipherment puzzles \u2014 for our experiments. 
The task is to learn image recognition (perception) and the mathematical operations for calculating the equations (reasoning) simultaneously. Experimental results show that ABL generalises better than state-of-the-art deep learning models and can leverage learning and reasoning in a mutually beneficial way. Further experiments on a visual n-queens task show that the ABL framework is flexible and can improve the performance of machine learning by taking advantage of classical symbolic AI systems such as Constraint Logic Programming [16].\n\nFigure 2: Bowditch\u2019s decipherment of Fig. 1 (he wrote \u201cMak\u201d as \u201cMac\u201d) [2]. Numbers in the vertical boxes are his guesses (Column 1) for the unknown hieroglyphs in Fig. 1. The dashed yellow box marks the consistent result according to his calculation (Column 2).\n\n2 Related Work\n\nAs one of the holy-grail problems in AI, combining machine learning and logical reasoning has drawn much attention. Most existing methods try to combine the two different systems by making one side subsume the other. For example, Fuzzy logic [41], Probabilistic Logic Programming [5] and Statistical Relational Learning [12] have been presented to empower traditional logic-based methods to handle uncertainty; however, most of them still require human-defined symbols as input [30]. Probabilistic programming [35, 21, 20] is presented as an analogy to human cognition to enable probabilistic reasoning with sub-symbolic primitives, yet the correspondence between the sub-symbolic primitives and their symbolic representations used in programming is assumed to already exist rather than being learned.\nAnother typical approach is to use deep neural networks or other differentiable functional calculations to approximate symbolic calculi. Some of them try to translate logical programs into neural networks, e.g. 
KBANN [38] and Artur Garcez\u2019s works on neural-symbolic learning [10, 9]; others directly replace symbolic computing with differentiable functions, e.g., differentiable programming methods such as DNC, which attempt to emulate symbolic computing using differentiable functional calculations [13, 11, 1, 6]. However, few of them can make full-featured logical inferences, and they usually require large amounts of training data.\nDifferent from the previous works, ABL tries to bridge machine learning and logical reasoning in a mutually beneficial way [42]. The two components perceive sub-symbolic information and perform symbolic reasoning separately but interactively. Logical abduction with consistency optimisation enables ABL to improve the machine learning model and learn a logical theory in a single framework.\n\n3 Abductive Learning\n\nIn this section, we present the ABL approach. Notation and the problem formulation are introduced first, followed by a detailed description and the presented optimisation approach.\n\n3.1 Problem Setting\n\nThe task of abductive learning can be formalised as follows. The input of abductive learning consists of a set of labelled training data D = {\u27e8x1, y1\u27e9, . . . , \u27e8xn, yn\u27e9} about a target concept C and a domain knowledge base B, where xi \u2208 X is the input data, yi \u2208 {0, 1} is the label of xi for the target concept C, and B is a set of first-order logical clauses. The target concept C is defined by unknown relationships amongst a set of primitive concept symbols P = {p1, . . . , pr} in the domain, where each pk is a defined symbol in B. 
The target of abductive learning is to output a hypothesis model H = p \u222a \u2206C, in which:\n\n\u2022 p : X \u2192 P is a mapping from the feature space to the primitive symbols, i.e., it is a perception model formulated as a conventional machine learning model;\n\u2022 \u2206C is a set of first-order logical clauses that, together with B, define the target concept C; it is called the knowledge model.\n\nThe hypothesis model should satisfy:\n\n\u2200\u27e8x, y\u27e9 \u2208 D (B \u222a \u2206C \u222a p(x) |= y),  (1)\n\nwhere \u201c|=\u201d stands for logical entailment.\nAs we can observe from Eq. 1, the major challenge for abductive learning is that the perception model p and the knowledge model \u2206C are mutually dependent: 1) to learn \u2206C, the perception results p(x) \u2014 the set of groundings of the primitive concepts in x \u2014 are required; 2) to obtain p, we need the ground-truth labels p(x) for training, which can only be logically derived from B \u222a \u2206C and y. When the machine learning model is under-trained, the perceived primitive symbols p(x) are highly likely to be incorrect; therefore, we name them pseudo-groundings or pseudo-labels. As a consequence, the inference of \u2206C based on Eq. 1 would be inconsistent; when the knowledge model \u2206C is inaccurate, the logically derived pseudo-labels p(x) might also be wrong, which harms the training of p. Either way, this interrupts the learning process.\n\nFigure 3: The structure of the ABL framework.\n\n3.2 Framework\n\nThe ABL framework [42] tries to address these challenges by connecting machine learning with an abductive logical reasoning module and bridging them with consistency optimisation. Fig. 3 shows the outline of the framework.\nMachine learning is used for learning the perception model p. Given an input instance x, p can predict the pseudo-labels p(x) as groundings of possible primitive concepts in x. 
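To make the entailment requirement in Eq. 1 concrete, here is a toy sketch (all names are illustrative, not the paper's code): the perception model is reduced to a lookup table and the knowledge model to a Python predicate, and a hypothesis is checked by counting how many examples it explains.

```python
# Toy sketch of the consistency requirement in Eq. 1 (illustrative names,
# not the paper's implementation).

def perceive(p, x):
    """Apply the perception model p (here a lookup table) to raw input,
    yielding pseudo-labels, i.e. groundings of primitive concepts."""
    return [p[token] for token in x]

def entailed(delta_C, facts, y):
    """Stand-in for 'B and delta_C and p(x) entail y', with delta_C
    modelled as a Python predicate over the perceived facts."""
    return delta_C(facts) == y

def consistency(p, delta_C, data):
    """Count the examples the hypothesis explains."""
    return sum(entailed(delta_C, perceive(p, x), y) for x, y in data)

# A toy domain: raw tokens "a"/"b" are perceived as bits, and the target
# concept is "the number of 1-bits is even".
p = {"a": 0, "b": 1}
delta_C = lambda facts: sum(facts) % 2 == 0
data = [(["a", "b", "b"], True), (["a", "a", "b"], False)]
print(consistency(p, delta_C, data))  # -> 2: both examples are explained
```

The mutual dependency described above is visible here: changing either the lookup table `p` or the predicate `delta_C` changes which examples are explained.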
When the pseudo-labels contain mistakes, the perception model needs to be re-trained, where the labels are the revised pseudo-labels r(x) returned from logical abduction.\nLogical abduction is the logical formalisation of abductive reasoning. Given observed facts and background knowledge expressed as first-order logical clauses, logical abduction can abduce ground hypotheses as possible explanations of the observed facts. A declarative framework in Logic Programming that formalises this process is Abductive Logic Programming (ALP) [18]. Formally, an abductive logic program can be defined as follows:\nDefinition 1 [18] An abductive logic program is a triplet (B, A, IC), where B is background knowledge, A is a set of abducible predicates, and IC is the integrity constraints. Given some observed facts O, the program outputs a set \u2206 of ground abducibles of A, such that:\n\n\u2022 B \u222a \u2206 |= O,\n\u2022 B \u222a \u2206 |= IC,\n\u2022 B \u222a \u2206 is consistent.\n\nIntuitively, the abductive explanation \u2206 serves as a hypothesis that explains how an observation O could hold according to the background knowledge B and the constraint IC.\nConsidering the formulation in Eq. 1, ABL takes the instance labels about the final concept as observed facts, and takes the hypothesis model H = \u2206C \u222a p as abducibles. Given a fixed \u2206C, ABL can abduce p(X) according to B and Y; when the perception model p has been determined, ALP is able to abduce the knowledge model \u2206C according to B \u222a p(X) \u222a Y. Here we use p(X) = \u222a_{i=1}^{n} {p(xi)} to represent the pseudo-labels of all the instances X = \u222a_{i=1}^{n} {xi}, and Y = \u222a_{i=1}^{n} {yi} are the final concept labels corresponding to X. 
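A minimal propositional sketch of Definition 1 (not the paper's Prolog system; names and the toy domain are invented for illustration, and the integrity constraint is simplified to a Python predicate over \u2206 rather than an entailment check):

```python
from itertools import combinations

def entails(rules, facts, goal):
    """Forward chaining over propositional Horn rules (head, [body...]):
    does rules together with facts entail goal?"""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return goal in known

def abduce(rules, abducibles, observation, integrity=lambda delta: True):
    """Return a smallest set of ground abducibles that, with the rules,
    entails the observation and satisfies the integrity constraint."""
    for k in range(len(abducibles) + 1):
        for delta in combinations(abducibles, k):
            if integrity(delta) and entails(rules, delta, observation):
                return set(delta)
    return None

rules = [("wet_grass", ["rained"]), ("wet_grass", ["sprinkler"])]
print(abduce(rules, ["rained", "sprinkler"], "wet_grass"))
# abduces {'rained'}: the first minimal explanation of the observation
```

In ABL the abducibles are not toy atoms but the pseudo-labels p(X) and the clauses of \u2206C, and the search is performed by a full ALP system rather than brute-force enumeration.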
Therefore, we can denote the abduced knowledge model conditioned on B \u222a p(X) and Y as \u2206C(B \u222a p(X), Y).\n\n3.3 Optimisation\n\nThe objective of ABL is to learn a hypothesis consistent with the background knowledge and the training examples. More concretely, ABL tries to maximise the consistency of the abduced hypothesis H with the training data D = {\u27e8xi, yi\u27e9}_{i=1}^{n}, given the background knowledge B:\n\nmax_{H = p \u222a \u2206C} Con(H \u222a D; B),  (2)\n\nwhere Con(H \u222a D; B) stands for the size of the largest subset Dc \u2286 D which is consistent with H = p \u222a \u2206C given B. It can be defined as follows:\n\nCon(H \u222a D; B) = max_{Dc \u2286 D} |Dc|,\ns.t. \u2200\u27e8xi, yi\u27e9 \u2208 Dc (B \u222a \u2206C \u222a p(xi) |= yi).  (3)\n\nTo solve Eq. 2, ABL tries to optimise \u2206C and p alternately.\nDuring the t-th epoch, when the perception model pt is under-trained, the pseudo-labels pt(X) could be incorrect and make logical abduction fail to abduce any consistent \u2206C satisfying Eq. 1, resulting in Con(H \u222a D; B) = 0.\nTherefore, ABL needs to correct the wrongly perceived pseudo-labels to achieve consistent abductions, such that \u2206tC can be consistent with as many examples in D as possible. Here we denote the pseudo-labels to be revised as \u03b4[pt(X)] \u2286 pt(X), where \u03b4 is a heuristic function that estimates which pseudo-labels are perceived incorrectly by the current machine learning model pt \u2014 in analogy to Bowditch\u2019s ability to identify the misinterpreted hieroglyphs (see Fig. 2).\nAfter removing the incorrect pseudo-labels marked by the \u03b4 function, ABL can apply logical abduction to abduce candidate pseudo-labels to revise \u03b4[pt(X)] together with \u2206tC by considering:\n\nB \u222a (pt(X) \u2212 \u03b4[pt(X)]) \u222a \u2206\u03b4[pt(X)] \u222a \u2206tC |= Y,  (4)\n\nwhere pt(X) \u2212 \u03b4[pt(X)] are the remaining \u201ccorrect\u201d pseudo-labels determined by \u03b4, and \u2206\u03b4[pt(X)] are the abduced pseudo-labels for revising \u03b4[pt(X)].\nTheoretically, \u03b4 could simply mark all pseudo-labels as \u201cwrong\u201d, i.e., let \u03b4[pt(X)] = pt(X) and ask logical abduction to do all the learning. In this case, ABL can always abduce a consistent \u2206\u03b4[pt(X)] \u222a \u2206tC satisfying Eq. 4. However, this means that logical abduction has to learn the knowledge model \u2206C without any influence from the perception model p and the raw data X. This not only results in an exponentially larger search space for the abduction, but also breaks the link between logical reasoning and the actual data. Consequently, ABL restricts the revision to stay close to the perceived results by limiting |\u03b4[pt(X)]| \u2264 M, where M bounds the step-wise search space of the abduction; it suffices to set M to a small number.\nTherefore, when pt is fixed, we can transform the optimisation problem of \u2206C into an optimisation problem over the function \u03b4, and reformulate Eq. 2 as follows:\n\nmax_{\u03b4} Con(H\u03b4 \u222a D),\ns.t. |\u03b4[pt(X)]| \u2264 M,  (5)\n\nwhere H\u03b4 = (pt(X) \u2212 \u03b4[pt(X)]) \u222a \u2206\u03b4[pt(X)] \u222a \u2206tC is the abduced hypothesis defined by Eq. 4. Although this objective is still non-convex, optimising \u03b4 instead of \u2206C allows ABL to revise and improve the hypothesis even when pt is not optimal.\nThe heuristic function \u03b4 could take any form as long as it can be easily learned. 
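As a minimal illustration of the search in Eq. 5, the sketch below revises at most M pseudo-label positions so that a perceived symbol sequence becomes consistent with its label. The paper learns \u03b4 with derivative-free optimisation and abduces revisions jointly with \u2206C via ALP; here, for clarity only, plain enumeration and a fixed knowledge model (ordinary binary addition) stand in, and all names are illustrative.

```python
from itertools import combinations, product

SYMBOLS = ["0", "1", "+", "="]

def consistent(seq, label):
    """Stand-in for the knowledge model: 'X+Y=Z holds under integer
    addition'. Malformed sequences are treated as inconsistent."""
    try:
        lhs, rhs = "".join(seq).split("=")
        x, y = lhs.split("+")
        holds = int(x, 2) + int(y, 2) == int(rhs, 2)
    except ValueError:  # wrong number of '+'/'=' or empty digit string
        return False
    return holds == label

def revise(pseudo, label, M=2):
    """Search over revision sets delta with |delta| <= M (cf. Eq. 5):
    try replacement symbols until the sequence entails its label."""
    for k in range(M + 1):
        for positions in combinations(range(len(pseudo)), k):
            for repl in product(SYMBOLS, repeat=k):
                seq = list(pseudo)
                for pos, sym in zip(positions, repl):
                    seq[pos] = sym
                if consistent(seq, label):
                    return seq
    return None

# One wrongly perceived symbol ('1' instead of '+') is repaired:
print(revise(["1", "1", "1", "=", "1", "0"], True))
# -> ['1', '+', '1', '=', '1', '0']
```

The cap M plays the same role as in the text: it keeps the revised hypothesis close to the perception model's output instead of letting abduction rewrite the whole sequence.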
We propose to solve it with derivative-free optimisation [40], which is a flexible framework for optimising non-convex objectives. As for the subset selection problem in Eq. 5, we propose to solve it with a greedy algorithm.\nAfter obtaining \u03b4 and \u2206tC, ABL can directly apply logical abduction to obtain the revised pseudo-labels r(X) = (pt(X) \u2212 \u03b4[pt(X)]) \u222a \u2206\u03b4[pt(X)], which are used for re-training the machine learning model. This procedure can be formulated as follows:\n\npt+1 = arg min_{p} \u2211_{i=1}^{m} Loss(p(xi), r(xi)),  (6)\n\nwhere Loss stands for the loss function of the machine learning model, and r(xi) \u2208 r(X) is the set of revised pseudo-labels for instance xi \u2208 X.\nIn short, ABL works as follows: given the training data, an initialised machine learning model is used to obtain the pseudo-labels, which are then treated as groundings of the primitive concepts for logical reasoning to abduce \u2206C. If the abduction fails due to inconsistency, the consistency optimisation procedure in Eq. 5 is called to revise the pseudo-labels, which are then used for re-training the machine learning model.\n\n4 Implementation\n\nTo verify the effectiveness of the presented approach, we designed the handwritten equation decipherment tasks shown in Fig. 4 and applied ABL to solve them.\nThe equations for the decipherment tasks consist of sequential pictures of characters. 
The equations are constructed from images of symbols (\u201c0\u201d, \u201c1\u201d, \u201c+\u201d and \u201c=\u201d), and they are generated with unknown operation rules; each example is associated with a label that indicates whether the equation is correct. A machine is tasked with learning from a training set of labelled equations, and the trained model is expected to predict unseen equations correctly. Thus, this task demands the same joint use of perceptual and reasoning abilities as the human decipherment in Fig. 1.\n\nFigure 4: Handwritten equation decipherment puzzle: a computer should learn to recognise the symbols and figure out the unknown operation rules (\u201cxnor\u201d in this example) simultaneously.\n\nFigure 5: The structure of our ABL implementation.\n\nFig. 5 shows the architecture of our ABL implementation, which employs a convolutional neural network (CNN) [22] as the perception machine learning model. The CNN takes image pixels as input and is expected to output the symbols in the image. The symbol output forms the pseudo-labels. The logical abduction is realised by an Abductive Logic Program implemented in Prolog. The consistency optimisation problem in Eq. 5 is solved by the derivative-free optimisation tool RACOS [40].\nBefore training, the domain knowledge\u2014written as a logic program\u2014is provided to the ALP as background knowledge B. In our implementation, B involves only the structure of the equations and a recursive definition of bit-wise operations. The background knowledge about equation structures is a set of definite clause grammar (DCG) rules that recursively define a digit as a sequence of \u201c0\u201d and \u201c1\u201d; each equation shares the structure X+Y=Z, although the lengths of X, Y and Z may vary. 
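The structural part of this background knowledge can be sketched in a few lines (a stand-in for the DCG rules, not the paper's Prolog program; the regular expression is an assumption about what "X+Y=Z over binary digits" means):

```python
import re

# Equation-structure knowledge as a sketch: an equation is X+Y=Z where
# X, Y, Z are non-empty strings over {0, 1} of possibly different lengths.
EQUATION = re.compile(r"^[01]+\+[01]+=[01]+$")

def well_formed(symbols):
    """Check that a sequence of pseudo-label symbols parses as X+Y=Z."""
    return bool(EQUATION.match("".join(symbols)))

print(well_formed(list("11+10=101")))  # -> True
print(well_formed(list("11111")))      # -> False (no operator symbols)
```

A sequence that fails this check, such as the all-ones pseudo-grounding discussed below, can never be consistent with B, which is exactly what triggers the pseudo-label revision step.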
The knowledge about bit-wise operations is a recursive logic program that calculates X+Y in reverse, i.e., it operates on X and Y digit-by-digit, from the last digit to the first. The logic programs defining this background knowledge are shown in the supplementary.\nRemark Note that the specific rules for calculating the operations are undefined in B, i.e., the results of \u201c0+0\u201d, \u201c0+1\u201d and \u201c1+1\u201d could be \u201c0\u201d, \u201c1\u201d, \u201c00\u201d, \u201c01\u201d or even \u201c10\u201d. The missing calculation rules form the knowledge model \u2206C, which is required to be learned from the data.\nAfter training starts, the CNN interprets the images into symbolic equations constructed from the pseudo-labels \u201c0\u201d, \u201c1\u201d, \u201c+\u201d and \u201c=\u201d. Because the CNN is untrained, the perceived symbols are typically wrong. In this case, ALP cannot abduce any \u2206C that is consistent with the training data according to the domain knowledge, i.e., no calculation rules can reconcile the perceived pseudo-labels with the associated labels. To abduce the most consistent \u2206C, ABL learns the heuristic function \u03b4 for marking possibly incorrect pseudo-labels.\nFor example, in the beginning, the under-trained CNN is highly likely to interpret the images as a pseudo-grounding eq0=[1,1,1,1,1], which is inconsistent with any binary operations since it has no operator symbol. Observing that ALP cannot abduce a consistent hypothesis, RACOS will learn a \u03b4 that substitutes the \u201cpossibly incorrect\u201d pseudo-labels in eq0 with blank Prolog variables, e.g., eq1=[1,_,1,_,1]. 
Then, ALP can abduce a consistent hypothesis involving the operation rule op(1,1,[1]) and a list of revised pseudo-labels eq1\u2019=[1,+,1,=,1]; the latter is used to re-train the CNN, helping it distinguish images of \u201c+\u201d and \u201c=\u201d from other symbols.\nThe complexity of the optimisation objective in Eq. 5 is very high, which usually makes it infeasible to evaluate the entire training set D during optimisation. Therefore, ABL performs abduction and optimisation T times, each time using a subsample Dt \u2286 D for training. The locally consistent reasoning model \u2206tC abduced in each iteration is kept as a relational feature.\n\nFigure 6: Data examples for the handwritten equation decipherment tasks: (a) training examples; (b) test examples.\n\nFigure 7: Experimental results of the DBA (left) and RBA (right) tasks.\n\nAfter the CNN converges or the algorithm meets the iteration limit, all \u27e8xi, yi\u27e9 \u2208 D are propositionalised into binary feature vectors by the relational features. For every input equation xi, its pseudo-labels are evaluated by all the relational features to produce a binary vector ui = [ui1, . . . , uiT], where\n\nuij = 1 if B \u222a \u2206jC \u222a p(xi) |= yi, and uij = 0 otherwise.  (7)\n\nTherefore, the original dataset D = {\u27e8xi, yi\u27e9} can be transformed into a new dataset D\u2032 = {\u27e8ui, yi\u27e9}, from which a decision model is learned to handle the noise introduced by subsampling.\n\n5 Experiments\n\nDataset We constructed two image sets of symbols to build the equations shown in Fig. 6. 
The Digital Binary Additive (DBA) equations were created with images from benchmark handwritten character datasets [22, 36], while the Random Symbol Binary Additive (RBA) equations were constructed from randomly selected character sets of the Omniglot dataset [21] and share an isomorphic structure with the equations in the DBA tasks. In order to evaluate the perceptual generalisation ability of the compared methods, the images for generating the training and test equations are disjoint. Each equation is input as a sequence of raw images of digits and operators. The training and testing data contain equations with lengths from 5 to 26. For each length there are 300 randomly generated equations, for a total of 6,600 training examples. This task has 4! = 24 possible mappings from the CNN outputs to the pseudo-label symbols, and 4^3 = 64 possible operation rule sets (with the commutative law), so the search space of logical abduction contains 1536 different possible \u2206C. Furthermore, the abduction for revising pseudo-labels introduces 2^M more candidates. Considering the small amount of training data (especially for the ABL-short setting with only 1,200 training examples), this task is not trivial.\nCompared methods\n\n\u2022 ABL: The machine learning model of ABL consists of a two-layer CNN and a two-layer multilayer perceptron (MLP) followed by a softmax layer; the logical abduction keeps 50 calculation rule sets of bit-wise operations as relational features; the decision model is a two-layer MLP. 
Two different settings have been tried: ABL-all, which uses all the training data, and ABL-short, which only uses training equations of lengths 5-8.\n\u2022 Differentiable Neural Computer (DNC) [13]: This is a deep neural network associated with memory, and has shown its potential on symbolic computing tasks [13].\n\u2022 Transformer networks [39]: This is a deep neural network enhanced with attention, and has been verified to be effective on many natural language processing tasks.\n\u2022 Bidirectional Long Short-Term Memory Network (BiLSTM) [32]: This is the most widely used neural network for learning from sequential data.\n\nFigure 8: Training accuracy and results of logical abductions: (a) CNN training accuracy; (b) results of logical abduction in RBA tasks.\n\nFigure 9: Results of the cross-task transfer experiments.\n\nTo handle image inputs, the BiLSTM, DNC and Transformer networks also use the same CNN structure as the ABLs for their input layers. All the neural networks are tuned with a held-out validation set randomly sampled from the training data. All the experiments are repeated 10 times and performed on a workstation with a 16-core Intel Xeon CPU @ 2.10GHz, 32 GB memory and an Nvidia Titan Xp GPU.\nWe also carried out a human experiment. Forty volunteers were asked to classify images of equations sampled from the same datasets. Before taking the quiz, the domain knowledge about the bit-wise operation was provided as a hint, but the specific calculation rules were not \u2014 just like the setting for ABL. Instead of using precisely the same setting as the machine learning experiments, we gave the human volunteers a simplified version, which only contains 5 positive and 5 negative equations with lengths ranging from 5 to 14.\nResults Fig. 
7 shows that on both tasks, the ABL-based approaches significantly outperform the compared methods, and ABL correctly learned the symbolic rules defining the unknown operations. All the methods performed better on the DBA tasks than on RBA, because the symbol images in the DBA task are more easily distinguished. The performances of ABL-all and ABL-short show no significant difference; the performance of the compared approaches degenerates quickly toward the random-guess line as the length of the testing equations grows, while the ABL-based approaches extrapolate better to the unseen data. An interesting result is that the human performance on the two tasks is very close, and both are worse than that of ABL. According to the volunteers, they have no trouble distinguishing different symbols, but machines are better at checking the consistency of logical theories \u2014 a task in which people are prone to make mistakes. Therefore, machine learning systems should exploit their advantage in logical reasoning.\nInside the learning process of ABL, although no ground-truth labels exist for the images of digits and operators, the CNN training accuracy did increase during the learning process, as shown by Fig. 8a. On the other hand, Fig. 8b shows the relationship between ABL\u2019s overall equation classification accuracy, image perception accuracy and the results of logical abductions on the RBA tasks, where red dots indicate successful abductions and blue dots signify failures. 
This result shows that the training of the CNN and the logic-based learning of unknown operation rules indeed mutually benefited each other during the training process.\n\nCross-task Transfer We also carried out experiments on transferring the learned CNN and knowledge model (i.e., the relational features \u2206tC together with the decision MLP) to different tasks. The first task transfers the CNN learned from the DBA task to logical exclusive-or equations constructed from the same characters. As shown in Fig. 9, although the final performances of ABL with and without perception transfer are comparable, the convergence of the ABL with perception transfer is much faster. The second task transfers the learned knowledge model from the RBA to the DBA domain. As depicted in the right side of the same figure, ABL with knowledge transfer converged significantly faster than the compared method. However, comparing the results of knowledge transfer and perception transfer, we can see that machine learning from sub-symbolic data without explicitly provided labels is considerably more difficult.\n\n6 Discussion\n\nAs an important cognitive model in psychology, abduction has already attracted some attention in artificial intelligence [14, 10], while most existing works combining abduction and induction only consider symbolic domains [37, 7, 29]. There are also works that use abduction to enhance machine learning [4, 24]; however, they need to adapt logical background knowledge into functional constraints or use specially designed operators to support gradient descent during learning and reasoning, which relaxes logical inference into a different continuous optimisation problem.\nOn the other hand, ABL utilises logical abduction and trial-and-error search to bridge machine learning with original first-order logic, without using gradients. 
As a result, ABL inherits the full power of first-order logical reasoning; e.g., it has the potential to abduce new first-order logical theories that are not in the background knowledge [26]. Consequently, many existing symbolic AI techniques can be incorporated directly without any modification.

In order to verify the flexibility of the ABL framework, a further experiment on the extended n-queens task is shown in Fig. 10, whose inputs are images of randomly generated chessboards consisting of blanks, queens, castles and bishops represented by randomly sampled MNIST images, and whose labels are the validity of each board. In this task, we implemented logical abduction with Prolog-based ALP and two popular constraint logic programming [16] systems: Constraint Handling Rules [8] and CLP(FD) [3]. Given recursive first-order logical background knowledge about chess moves, the ABL-based approaches achieved better results compared with CNN and Bi-LSTM.

Figure 10: The extended n-queens experiments, n ∈ {2..10}. (a) Input image. (b) Experimental results:

Method         Acc.
ABL-ALP        0.976 ± 0.008
ABL-CHR        0.979 ± 0.007
ABL-CLP(FD)    0.981 ± 0.006
CNN            0.736 ± 0.016
Bi-LSTM        0.513 ± 0.013

7 Conclusion

In this paper, we present abductive learning, where machine learning and logical reasoning can be entangled and mutually beneficial. Our initial implementation of the ABL framework shows that it is possible to simultaneously perform sub-symbolic machine learning and full-featured first-order logical reasoning that allows recursion.
This framework is general and flexible.
For example, the perception machine learning model could be a pre-trained model rather than being learned from scratch; the machine learning task could be semi-supervised rather than having no labels at all; and the logical abduction could involve second-order logic clauses to enable abducing recursive clauses and automatically inventing predicates [26]. We hope that the exploration of abductive learning will help pave the way to a unified framework accommodating learning and reasoning.

Acknowledgement This research was supported by the National Key R&D Program of China (2018YFB1004300), NSFC (61751306), and the Collaborative Innovation Centre of Novel Software Technology and Industrialisation. The authors would like to thank Yu-Xuan Huang and Le-Wen Cai for their help with the experiments, and the reviewers for their insightful comments.

References

[1] M. Bošnjak, T. Rocktäschel, J. Naradowsky, and S. Riedel. Programming with a differentiable forth interpreter. In Proceedings of the 34th International Conference on Machine Learning, pages 547–556, Sydney, Australia, 2017.

[2] C. P. Bowditch. The numeration, calendar systems and astronomical knowledge of the Mayas. Cambridge University Press, Cambridge, UK, 1910.

[3] M. Carlsson, G. Ottosson, and B. Carlson. An open-ended finite domain constraint solver. In Proceedings of the International Symposium on Programming Language Implementation and Logic Programming, pages 191–206, Southampton, UK, 1997. Springer.

[4] W.-Z. Dai and Z.-H. Zhou. Combining logical abduction and statistical induction: Discovering written primitives with human knowledge. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 4392–4398, San Francisco, CA, 2017. AAAI.

[5] L. De Raedt and A. Kimmig. Probabilistic (logic) programming concepts. Machine Learning, 100(1):5–47, 2015.

[6] R.
Evans and E. Grefenstette. Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61:1–64, 2018.

[7] P. A. Flach and A. C. Kakas, editors. Abduction and Induction. Springer, Dordrecht, Netherlands, 2000.

[8] T. Frühwirth. Theory and practice of constraint handling rules. The Journal of Logic Programming, 37(1):95–138, 1998.

[9] A. S. d. Garcez, K. B. Broda, and D. M. Gabbay. Neural-symbolic learning systems: foundations and applications. Springer-Verlag, London, UK, 2012.

[10] A. S. d. Garcez, D. M. Gabbay, O. Ray, and J. Woods. Abductive reasoning in neural-symbolic systems. Topoi, 26(1):37–49, Mar 2007.

[11] A. L. Gaunt, M. Brockschmidt, N. Kushman, and D. Tarlow. Differentiable programs with neural libraries. In Proceedings of the 34th International Conference on Machine Learning, pages 1213–1222, Sydney, Australia, 2017. ACM.

[12] L. Getoor and B. Taskar, editors. Introduction to statistical relational learning. MIT Press, Cambridge, MA, 2007.

[13] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwińska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, and J. Agapiou. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016.

[14] S. Hölldobler, T. Philipp, and C. Wernhard. An abductive model for human reasoning. In Papers from the 2011 AAAI Spring Symposium, Technical Report SS-11, Stanford, CA, 2011. AAAI.

[15] S. D. Houston, O. C. Mazariegos, and D. Stuart, editors. The decipherment of ancient Maya writing. University of Oklahoma Press, Norman, OK, 2001.

[16] J. Jaffar and M. J. Maher. Constraint logic programming: a survey. The Journal of Logic Programming, 19-20:503–581, 1994. Special Issue: Ten Years of Logic Programming.

[17] R. Jia and P. Liang. Adversarial examples for evaluating reading comprehension systems.
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2011–2021, Copenhagen, Denmark, 2017. ACL.

[18] A. C. Kakas, R. A. Kowalski, and F. Toni. Abductive logic programming. Journal of Logic and Computation, 2(6):719–770, 1992.

[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates Inc., 2012.

[20] T. D. Kulkarni, P. Kohli, J. B. Tenenbaum, and V. Mansinghka. Picture: A probabilistic programming language for scene perception. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4399, Boston, MA, 2015. IEEE.

[21] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.

[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[23] L. Magnani. Abductive Cognition: The Epistemological and Eco-Cognitive Dimensions of Hypothetical Reasoning. Springer-Verlag, Berlin, Germany, 1st edition, 2009.

[24] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt. DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems 31, pages 3749–3759. Curran Associates, Inc., 2018.

[25] S. H. Muggleton. Inductive logic programming. New Generation Computing, 8(4):295–318, 1991.

[26] S. H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 2015. Published online: DOI 10.1007/s10994-014-5471-y.

[27] A. Newell and H. A. Simon. The logic theory machine – A complex information processing system.
IRE Transactions on Information Theory, 2(3):61–79, 1956.

[28] C. S. Peirce. Abduction and induction. In J. Buchler, editor, Philosophical Writings of Peirce. Dover Publications, New York, NY, 1955.

[29] O. Ray. Nonmonotonic abductive inductive learning. Journal of Applied Logic, 7(3):329–340, 2009. Special Issue: Abduction and Induction in Artificial Intelligence.

[30] S. J. Russell. Unifying logic and probability. Communications of the ACM, 58(7):88–97, 2015.

[31] A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. P. Lillicrap. A simple neural network module for relational reasoning. CoRR, abs/1706.01427, 2017.

[32] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

[33] H. A. Simon and A. Newell. Human problem solving: The state of the theory in 1970. American Psychologist, 26(2):145, 1971.

[34] R. L. Solso, O. H. MacLin, and M. K. MacLin. Cognitive Psychology. Pearson/Allyn and Bacon, New York, NY, 8th edition, 2008.

[35] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011.

[36] M. Thoma. The HASYv2 dataset. CoRR, abs/1701.08380, 2017.

[37] C. A. Thompson and R. J. Mooney. Inductive learning for abductive diagnosis. In Proceedings of the 12th National Conference on Artificial Intelligence, pages 664–669, Seattle, WA, 1994. AAAI.

[38] G. G. Towell and J. W. Shavlik. Knowledge-based artificial neural networks. Artificial Intelligence, 70(1):119–165, 1994.

[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc., 2017.

[40] Y.
Yu, H. Qian, and Y.-Q. Hu. Derivative-free optimization via classification. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 2286–2292, Phoenix, AZ, 2016. AAAI.

[41] L. A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.

[42] Z.-H. Zhou. Abductive learning: towards bridging machine learning and logical reasoning. Science China Information Sciences, 62(7), 2019.