{"title": "End-to-End Goal-Driven Web Navigation", "book": "Advances in Neural Information Processing Systems", "page_first": 1903, "page_last": 1911, "abstract": "We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a website, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is required to have sophisticated high-level reasoning based on natural languages and efficient sequential decision-making capability to succeed. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we make WikiNav, a dataset constructed from the English Wikipedia. We extensively evaluate different variants of neural net based artificial agents on WikiNav and observe that the proposed goal-driven web navigation well reflects the advances in models, making it a suitable benchmark for evaluating future progress. Furthermore, we extend the WikiNav with question-answer pairs from Jeopardy! and test the proposed agent based on recurrent neural networks against strong inverted index based search engines. 
The artificial agents trained on WikiNav outperform the engine-based approaches, demonstrating the capability of the proposed goal-driven navigation as a good proxy for measuring progress on real-world tasks such as focused crawling and question-answering.", "full_text": "End-to-End Goal-Driven Web Navigation\n\nRodrigo Nogueira\nTandon School of Engineering\nNew York University\nrodrigonogueira@nyu.edu\n\nKyunghyun Cho\nCourant Institute of Mathematical Sciences\nNew York University\nkyunghyun.cho@nyu.edu\n\nAbstract\n\nWe propose goal-driven web navigation as a benchmark task for evaluating an agent's ability to understand natural language and plan in partially observed environments. In this challenging task, an agent navigates through a website, represented as a graph with web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. To succeed, the agent needs sophisticated high-level reasoning over natural language and efficient sequential decision-making. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we build WikiNav, a dataset constructed from the English Wikipedia. We extensively evaluate different variants of neural-network-based artificial agents on WikiNav and observe that the proposed goal-driven web navigation well reflects advances in models, making it a suitable benchmark for evaluating future progress. Furthermore, we extend WikiNav with question-answer pairs from Jeopardy! and test the proposed agent, based on recurrent neural networks, against strong inverted-index-based search engines. 
The artificial agents trained on WikiNav outperform the engine-based approaches, demonstrating the capability of the proposed goal-driven navigation as a good proxy for measuring progress on real-world tasks such as focused crawling and question-answering.\n\n1 Introduction\n\nIn recent years, there have been many exciting advances in building artificial agents that can be trained with a single learning algorithm to solve relatively large-scale, complicated tasks (see, e.g., [8, 10, 6]). In many of these works, the target tasks were computer games, such as Atari games [8] and a racing car game [6].\nThese successes have stimulated researchers to apply similar learning mechanisms to language-based tasks, such as multi-user dungeon (MUD) games [9, 4]. Instead of visual perception, an agent perceives the state of the world through its written description. The set of actions allowed to the agent is either fixed or dependent on the current state. This type of task can efficiently evaluate the agent's ability in not only planning but also language understanding.\nWe notice, however, that these MUD games do not exhibit the complex nature of natural language to the full extent. For instance, the largest game world tested by Narasimhan et al. [9] uses a vocabulary of only 1,340 unique words, and the largest game tested by He et al. [4] uses only 2,258 words. Furthermore, the description of a state at each time step is almost always limited to the visual description of the current scene, lacking any use of the higher-level concepts present in natural language.\nIn this paper, we propose goal-driven web navigation as a large-scale alternative to text-based games for evaluating artificial agents with natural language understanding and planning capability. The proposed task treats a whole website as a graph, in which the web pages are nodes and hyperlinks are directed edges. 
An agent is given a query, which consists of one or more sentences taken from a randomly selected web page in the graph, and navigates the network, starting from a predefined starting node, to find a target node in which the query appears. Unlike the text-based games, this task uses the existing text as it is, resulting in a large vocabulary and a truly natural language description of the state. Furthermore, the task is more challenging because the action space changes greatly with the state the agent is in.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nWe release a software tool, called WebNav, that converts a given website into a goal-driven web navigation task. As an example of its use, we provide WikiNav, which was built from the English Wikipedia. We design artificial agents based on neural networks (called NeuAgents) trained with supervised learning, and report their respective performances on the benchmark task as well as the performance of human volunteers. We observe that the difficulty of a task generated by WebNav is well controlled by two parameters: (1) the maximum number of hops N_h from a starting node to a target node and (2) the query length N_q.\nFurthermore, we extend WikiNav with an additional set of queries constructed from Jeopardy! questions, to which we refer as WikiNav-Jeopardy. We evaluate the proposed NeuAgents against three search-based strategies: (1) SimpleSearch, (2) Apache Lucene and (3) the Google Search API. 
The result in terms of document recall indicates that the NeuAgents outperform these search-based strategies, implying the potential of the proposed task as a good proxy for practical applications such as question-answering and focused crawling.\n\n2 Goal-driven Web Navigation\n\nA task T of goal-driven web navigation is characterized by\n\nT = (A, s_S, G, q, R, Ω).  (1)\n\nThe world in which an agent A navigates is represented as a graph G = (N, E). The graph consists of a set of nodes N = {s_i}_{i=1}^{N_N} and a set of directed edges E = {e_{i,j}} connecting those nodes. Each node represents a page of the website, which, in turn, is represented by the natural language text D(s_i) it contains. There is an edge from page s_i to page s_j if and only if there is a hyperlink in D(s_i) that points to s_j. One node is designated as the starting node s_S, from which every navigation begins. A target node is one whose natural language description contains the query q; there may be more than one target node.\nAt each time step, the agent A reads the natural language description D(s_t) of the node on which it has landed. At no point is the whole world, consisting of the nodes and edges, or its structure or map (the graph structure without any natural language description), visible to the agent, making this task partially observed.\nOnce the agent A reads the description D(s_i) of the current node s_i, it can take one of the available actions. The set of possible actions is defined as the union of all outgoing edges e_{i,·} and the stop action, giving the agent a state-dependent action space.\nEach edge e_{i,k} corresponds to the agent jumping to the next node s_k, while the stop action corresponds to the agent declaring that the current node s_i is one of the target nodes. Each edge e_{i,k} is represented by the description of the next node, D(s_k). 
In other words, deciding which action to take is equivalent to taking a peek at each neighboring node and judging whether that node is likely to lead ultimately to a target node.\nThe agent A receives a reward R(s_i, q) when it chooses the stop action. This task uses a simple binary reward:\n\nR(s_i, q) = 1 if q ⊆ D(s_i), and 0 otherwise.\n\nConstraints  It is clear that there exists a trivial policy with which the agent succeeds at every trial: traverse the graph breadth-first until a node in which the query appears is found. To rule out this kind of degenerate policy, the task includes a set of four rules/constraints Ω:\n\n1. An agent can follow at most N_n edges at each node.\n2. An agent has a finite memory of size smaller than T.\n3. An agent moves up to N_h hops away from s_S.\n4. A query of size N_q comes from a node at least two hops away from the starting node.\n\nThe first constraint alone prevents degenerate policies such as breadth-first search, forcing the agent to make decisions as good as possible at each node. The second constraint further ensures that the agent does not cheat by using earlier trials to reconstruct the whole graph structure (during test time) or by storing the entire world in its memory (during training). The third constraint, which is optional, is there for computational reasons. The fourth constraint is included because the agent is allowed to read the content of each next node.\n\nTable 1: Dataset statistics of WikiNav-4-*, WikiNav-8-*, WikiNav-16-* and WikiNav-Jeopardy.\n\n      | WikiNav-4-* | WikiNav-8-* | WikiNav-16-* | WikiNav-Jeopardy\nTrain |        6.0k |          1M |          12M |             113k\nValid |          1k |         20k |          20k |              10k\nTest  |          1k |         20k |          20k |              10k\n\n3 WebNav: Software\n\nAs a part of this work, we build and release a software tool that turns a website into a goal-driven web navigation task.1 We call this tool WebNav. 
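The world, state-dependent action space, and binary reward of Section 2 can be sketched as a minimal environment; the graph, page texts, and query below are toy stand-ins, not WebNav output:

```python
# Minimal sketch of the goal-driven navigation world of Section 2.
# Pages, links, and queries here are hypothetical illustrations.

class NavigationWorld:
    def __init__(self, pages, links, start):
        self.pages = pages      # node id -> page text D(s_i)
        self.links = links      # node id -> list of outgoing node ids
        self.current = start    # every navigation begins at s_S

    def actions(self):
        # State-dependent action space: outgoing edges plus the stop action.
        return self.links[self.current] + ["STOP"]

    def peek(self, node):
        # The agent may read the description D(s_k) of a neighboring node.
        return self.pages[node]

    def step(self, action, query):
        if action == "STOP":
            # Binary reward: 1 iff the query appears in the current page.
            return 1 if query in self.pages[self.current] else 0
        self.current = action
        return None  # navigation continues

pages = {"A": "start page", "B": "page about apples", "C": "oranges are citrus"}
links = {"A": ["B", "C"], "B": ["C"], "C": []}
world = NavigationWorld(pages, links, start="A")
world.step("C", query="citrus")
reward = world.step("STOP", query="citrus")  # reward == 1
```

Note that the agent in this sketch never sees `links` as a whole; it only queries `actions()` and `peek()` locally, mirroring the partial observability of the task.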
Given a starting URL, WebNav reads the whole website and constructs a graph with the web pages as nodes. Each node is assigned a unique identifier s_i. The text content D(s_i) of each node is a cleaned version of the actual HTML content of the corresponding web page. WebNav turns intra-site hyperlinks into the set of edges e_{i,j}.\nIn addition to transforming a website into a graph G from Eq. (1), WebNav automatically selects queries from the nodes' texts and divides them into training, validation, and test sets. We ensure that there is no overlap among the three sets by making each target node, from which a query is selected, belong to only one of them.\nEach generated example is defined as a tuple\n\nX = (q, s*, p*),  (2)\n\nwhere q is a query from a web page s*, which was found by following a randomly selected path p* = (s_S, ..., s*). In other words, WebNav starts from a starting page s_S, random-walks the graph for a predefined number of steps (N_h/2, in our case), reaches a target node s* and selects a query q from D(s*). A query consists of N_q sentences and is selected among the top-5 candidate sentences in the target node with the highest average TF-IDF, thus discouraging WebNav from choosing a trivial query.\nFor evaluation purposes alone, it would be enough to use only the query q itself as an example. However, we include both one target node (among potentially many target nodes) and one path from the starting node to this target node (again, among many possible connecting paths) so that they can be exploited when training an agent. They are not to be used when evaluating a trained agent.\n\n4 WikiNav: A Benchmark Task\n\nWith WebNav, we built a benchmark goal-driven navigation task using Wikipedia as the target website. We used the dump file of the English Wikipedia from September 2015, which consists of more than five million web pages. 
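The query-selection heuristic of Section 3, ranking a target page's sentences by average TF-IDF and keeping the top 5, can be sketched as follows; the whitespace tokenization and the toy corpus statistics are illustrative assumptions, not WebNav's actual implementation:

```python
import math
from collections import Counter

def average_tfidf(sentence_tokens, df, n_docs):
    # Average TF-IDF over the tokens of one candidate sentence.
    tf = Counter(sentence_tokens)
    scores = [
        (tf[w] / len(sentence_tokens)) * math.log(n_docs / (1 + df.get(w, 0)))
        for w in sentence_tokens
    ]
    return sum(scores) / len(scores)

def top_query_candidates(sentences, df, n_docs, k=5):
    # Rank a target page's sentences and keep the top-k as query candidates.
    ranked = sorted(
        sentences,
        key=lambda s: average_tfidf(s.lower().split(), df, n_docs),
        reverse=True,
    )
    return ranked[:k]

# Toy document frequencies over a hypothetical 1000-page corpus.
df = {"the": 990, "a": 980, "derby": 3, "horse": 40, "race": 55}
sentences = [
    "the derby is a horse race",
    "the the the a a a",   # trivial sentence: low average TF-IDF
]
best = top_query_candidates(sentences, df, n_docs=1000, k=1)
```

Because common words carry near-zero IDF, a sentence made of stopwords scores low, which is exactly the property that discourages trivial queries.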
We built a set of separate tasks with different levels of difficulty by varying the maximum number of allowed hops N_h ∈ {4, 8, 16} and the query size N_q ∈ {1, 2, 4}. We refer to each task as WikiNav-N_h-N_q.\nFor each task, we generate training, validation and test examples from the pages half as many hops away from the starting page as the maximum number of hops allowed.2 We use "Category:Main topic classifications" as the starting node s_S.\n\n1 The source code and datasets are publicly available at github.com/nyu-dl/WebNav.\n2 This limit is an artificial one we chose for computational reasons.\n\nTable 3: Sample query-answer pairs from WikiNav-Jeopardy.\n\nQuery                                                                                                          | Answer\nFor the last 8 years of his life, Galileo was under house arrest for espousing this man's theory.              | Copernicus\nIn the winter of 1971-72, a record 1,122 inches of snow fell at Rainier Paradise Ranger Station in this state. | Washington\nThis company's Accutron watch, introduced in 1960, had a guarantee of accuracy to within one minute a month.   | Bulova\n\nAs a minimal cleanup procedure, we excluded meta articles whose titles start with "Wikipedia". Any hyperlink that leads to a web page outside Wikipedia is removed in advance, together with the following sections: "References", "External Links", "Bibliography" and "Partial Bibliography".\nIn Table 2, we present basic per-article statistics of the English Wikipedia. It is evident from these statistics that the world of WikiNav-N_h-N_q is large and complicated, even after the cleanup procedure.\nWe ended up with a fairly small dataset for WikiNav-4-*, but large ones for WikiNav-8-* and WikiNav-16-*. 
See Table 1 for details.\n\nTable 2: Per-page statistics of the English Wikipedia.\n\n     | Hyperlinks |  Words\nAvg. |       4.29 |  462.5\n√Var |      13.85 |  990.2\nMax  |        300 | 132881\nMin  |          0 |      1\n\n4.1 Related Work: Wikispeedia\n\nThis work is not the first to view a website, or possibly the whole web, as a world in which intelligent agents explore to achieve a certain goal. The most relevant recent work to ours is perhaps Wikispeedia [14, 12, 13].\nWest et al. [14, 12, 13] proposed the following game, called Wikispeedia. The game's world is nearly identical to the goal-driven navigation task proposed in this work. More specifically, they converted "Wikipedia for Schools", which contained approximately 4,000 articles as of 2008, into a graph whose nodes are articles and whose directed edges are hyperlinks. From this graph, a pair of nodes is randomly selected and provided to an agent.\nThe agent's goal is to start from the first node, navigate the graph and reach the second node. Similarly to WikiNav, the agent has access to the text content of the current node and all the immediate neighboring nodes. One major difference is that the target is given as a whole article, meaning that there is a single target node in Wikispeedia, while there may be multiple target nodes in the proposed WikiNav.\nFrom this description, we see that goal-driven web navigation is a generalization and re-framing of Wikispeedia. First, we constrain a query to contain less information, making it much more difficult for an agent to navigate to a target node. 
Furthermore, a major research question for West and Leskovec [13] was to "understand how humans navigate and find the information they are looking for," whereas in this work we are fully focused on providing an automatic tool for building challenging goal-driven tasks for designing and evaluating artificial intelligent agents.\n\n5 WikiNav-Jeopardy: Jeopardy! on WikiNav\n\nOne potential practical application of goal-driven navigation is question-answering based on world knowledge. In this Q&A task, a query is a question, and an agent navigates a given information network, e.g., a website, to retrieve an answer. In this section, we propose and describe an extension of WikiNav in which query-target pairs are constructed from actual Jeopardy! question-answer pairs. We refer to this extension of WikiNav as WikiNav-Jeopardy.\nWe first extract all the question-answer pairs from J! Archive3, which has more than 300k such pairs. We keep only those pairs whose answers are titles of Wikipedia articles, leaving us with 133k pairs. We divide those pairs into 113k training, 10k validation, and 10k test examples while carefully ensuring that no article appears in more than one partition. Additionally, we do not shuffle the original pairs, ensuring that the train and test examples come from different episodes.\n\n3 www.j-archive.com\n\nFor each training pair, we find one path from the starting node "Main Topic Classification" to the target node and include it for supervised learning. For reference, the average number of hops to the target node is 5.8, the standard deviation is 1.2, and the minimum and maximum are 2 and 10, respectively. 
See Table 3 for sample query-answer pairs.\n\n6 NeuAgent: Neural Network based Agent\n\n6.1 Model Description\n\nCore Function  The core of the NeuAgent is a parametric function f_core that takes as input the content of the current node φ_c(s_i) and a query φ_q(q), and returns the hidden state of the agent. This parametric function f_core can be implemented either as a feedforward neural network f_ff:\n\nh_t = f_ff(φ_c(s_i), φ_q(q)),\n\nwhich does not take into account the previous hidden state of the agent, or as a recurrent neural network f_rec:\n\nh_t = f_rec(h_{t-1}, φ_c(s_i), φ_q(q)).\n\nWe refer to these two types of agents as NeuAgent-FF and NeuAgent-Rec, respectively. For the NeuAgent-FF, we use a single tanh layer, while for the NeuAgent-Rec we use long short-term memory (LSTM) units [5], which have recently become a de facto standard.\nBased on the new hidden state h_t, the NeuAgent computes the probability distribution over all the outgoing edges e_i. The probability of each outgoing edge is proportional to the similarity between the hidden state h_t and the vector representation of the next node:\n\np(e_{i,j} | p̃) ∝ exp(φ_c(s_j)ᵀ h_t).  (3)\n\nNote that the NeuAgent peeks at the content of the next node s_j by considering its vector representation φ_c(s_j). In addition to all the outgoing edges, we also allow the agent to stop with probability\n\np(∅ | p̃) ∝ exp(v_∅ᵀ h_t),  (4)\n\nwhere the stop action vector v_∅ is a trainable parameter.\n\nFigure 1: Graphical illustration of a single step performed by the baseline model, NeuAgent.\n\nIn the case of NeuAgent-Rec, all these (unnormalized) probabilities are conditioned on the history p̃, which is the sequence of actions (nodes) selected by the agent so far. 
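Concretely, Eqs. (3) and (4) score each candidate, the neighbors' content vectors plus a trainable stop vector, against the hidden state with a dot product and then normalize with a softmax. A minimal pure-Python sketch, with toy dimensions and random vectors standing in for φ_c and h_t:

```python
import math
import random

random.seed(0)
d = 8                                     # hidden/content dimensionality (toy value)

def vec():
    return [random.gauss(0, 1) for _ in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

h_t = vec()                               # agent's hidden state
neighbors = [vec() for _ in range(3)]     # phi_c(s_j) for three outgoing edges
v_stop = vec()                            # trainable stop-action vector

# Unnormalized scores: dot product of each candidate with h_t (Eqs. 3-4).
logits = [dot(phi, h_t) for phi in neighbors] + [dot(v_stop, h_t)]

# Softmax normalization over the state-dependent action space.
m = max(logits)
exps = [math.exp(z - m) for z in logits]
probs = [e / sum(exps) for e in exps]

action = probs.index(max(probs))          # index 3 means the stop action
```

Because the number of neighbors varies from node to node, the softmax is taken over a different-sized set at every step, which is exactly what makes the action space state-dependent.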
We apply a softmax normalization to the unnormalized probabilities to obtain the probability distribution over all the possible actions at the current node s_i.\nThe NeuAgent then selects its next action based on this action probability distribution (Eqs. (3) and (4)). If the stop action is chosen, the NeuAgent returns the current node as an answer and receives a reward R(s_i, q), which is one if correct and zero otherwise. If the agent selects one of the outgoing edges, it moves to the selected node and repeats this process of reading and acting.\nSee Fig. 1 for a single step of the described NeuAgent.\nContent Representation  The NeuAgent represents the content of a node s_i as a vector φ_c(s_i) ∈ R^d. In this work, we use a continuous bag-of-words vector for each document:\n\nφ_c(s_i) = (1 / |D(s_i)|) Σ_{k=1}^{|D(s_i)|} e_k.\n\nEach word vector e_k comes from a pretrained continuous bag-of-words model [7]. These word vectors are fixed throughout training.\nQuery Representation  For a query, we consider two types of representation. The first is a continuous bag-of-words (BoW) vector, just as used for representing the content of a node. The other is a dynamic representation based on the attention mechanism [2].\nIn the attention-based query representation, the query is first projected into a set of context vectors. The context vector of the k-th query word is\n\nc_k = Σ_{k'=k-u/2}^{k+u/2} W_{k'} e_{k'},\n\nwhere W_{k'} ∈ R^{d×d} and e_{k'} are respectively a trainable weight matrix and a pretrained word vector, and u is the window size. Each context vector is scored at each time step t by β^t_k = f_att(h_{t-1}, c_k) w.r.t. the previous hidden state of the NeuAgent, and all the scores are normalized to be positive and sum to one, i.e.,\n\nα^t_k = exp(β^t_k) / Σ_{l=1}^{|q|} exp(β^t_l).\n\nThese normalized scores are used as the coefficients in computing the weighted sum of the context vectors, resulting in the query representation at time t:\n\nφ_q(q) = (1 / |q|) Σ_{k=1}^{|q|} α^t_k c_k.\n\nLater, we empirically compare these two query representations.\n\n6.2 Inference: Beam Search\n\nOnce the NeuAgent is trained, there are a number of ways to use it for solving the proposed task. The most naive approach is simply to let the agent make a greedy decision at each time step, i.e., follow the outgoing edge with the highest probability arg max_k log p(e_{i,k} | ...). A better approach is to exploit the fact that the agent is allowed to explore up to N_n outgoing edges per node. We use a simple, forward-only beam search with the beam width capped at N_n. The beam search simply keeps the N_n most likely traces, in terms of cumulative log p(e_{i,k} | ...), at each time step.\n\n6.3 Training: Supervised Learning\n\nIn this paper, we investigate supervised learning, where we train the agent to follow, at each step, an example trace p* = (s_S, ..., s*) included in the training set (see Eq. (2)). In this case, the cost per training example is\n\nC_sup = − log p(∅ | p*, q) − Σ_{k=1}^{|p*|} log p(p*_k | p*_{<k}, q).  (5)\n\nThis per-example training cost is fully differentiable with respect to all the parameters of the neural network, and we use the stochastic gradient descent (SGD) algorithm to minimize it over the whole training set, where the gradients are computed by backpropagation [11]. 
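The per-example cost of Eq. (5) is just the negative log-likelihood of the demonstrated trace plus the final stop action. A toy sketch with hand-written per-step probabilities (hypothetical numbers, not a trained model):

```python
import math

def trace_cost(step_probs, stop_prob):
    # Eq. (5): negative log-probability of each demonstrated step p*_k,
    # plus the negative log-probability of stopping at the target node.
    return -math.log(stop_prob) - sum(math.log(p) for p in step_probs)

# Hypothetical probabilities the model assigned to the correct actions
# along one example trace p* = (s_S, s_1, s_2, s*).
step_probs = [0.7, 0.5, 0.9]   # p(p*_k | p*_<k, q) for each hop
stop_prob = 0.8                # p(stop | p*, q) at the target node

cost = trace_cost(step_probs, stop_prob)
```

In a real implementation each probability would come from the model's softmax over actions, so the cost is differentiable in the network parameters.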
This allows the entire model to be trained in an end-to-end fashion, in which the query-to-target performance is optimized directly.\n\n7 Human Evaluation\n\nOne unique aspect of the proposed task is that it is very difficult for an average person who has not been trained specifically in finding information by navigating an information network. There are a number of reasons behind this difficulty. First, the person must be familiar with the graph structure of the network, which often requires many months, if not years, of training. Second, the person must have in-depth knowledge of a broad range of topics in order to connect, via different concepts, the themes and topics of a query to a target node. Third, each trial requires the person to carefully read the whole content of the nodes as she navigates, which is a time-consuming and exhausting job.\nWe asked five volunteers to try up to 20 four-sentence-long queries4 randomly selected from the test sets of the WikiNav-{4, 8, 16}-4 datasets. 
They were given up to two hours, and they were allowed to choose up to the same maximum number of explored edges per node N_n as the NeuAgents (that is, N_n = 4), and were also given the option to give up. The average reward was computed as the fraction of correct trials over all the queries presented.\n\n4 In a preliminary study with other volunteers, we found that, when the queries were shorter than 4 sentences, the volunteers were not able to solve enough trials for us to gather meaningful statistics.\n\nTable 4: The average reward of the NeuAgents and humans on the test sets of WikiNav-N_h-N_q.\n\n                                  |     N_q = 1      |     N_q = 2      |     N_q = 4\n    f_core  Layers×Units  φ_q     | N_h=4    8    16 | N_h=4    8    16 | N_h=4    8    16\n(a) f_ff    1 × 512       BoW     |  21.5  4.7   1.2 |  40.0  9.2   1.9 |  45.1 12.9   2.9\n(b) f_rec   1 × 512       BoW     |  22.0  5.1   1.7 |  41.1  9.2   2.1 |  44.8 13.3   3.6\n(c) f_rec   8 × 2048      BoW     |  17.7 10.9   8.0 |  35.8 19.9  13.9 |  39.5 28.1  21.9\n(d) f_rec   8 × 2048      Att     |  22.9 15.8  12.5 |  41.7 24.5  17.8 |  46.8 34.2  28.2\n(e) Humans                        |     -    -     - |     -    -     - |  14.5  8.8   5.0\n\n8 Results and Analysis\n\n8.1 WikiNav\n\nWe report in Table 4 the performance of the NeuAgent-FF and NeuAgent-Rec models on the test sets of all nine WikiNav-{4, 8, 16}-{1, 2, 4} datasets. In addition to the proposed NeuAgents, we also report the results of the human evaluation.\nWe clearly observe that the level of difficulty is indeed negatively correlated with the query length N_q but positively correlated with the maximum number of allowed hops N_h. The latter may be considered trivial, as the size of the search space grows exponentially with respect to N_h, but the former is not. The negative correlation confirms that it is indeed easier to solve the task with more information in a query. 
We conjecture that the agent requires a more in-depth understanding of natural language and planning to overcome the lack of information in the query and find a path toward a target node.\nThe NeuAgent-FF and NeuAgent-Rec share similar performance when the maximum number of allowed hops is small (N_h = 4), but NeuAgent-Rec ((a) vs. (b)) performs consistently better for higher N_h, which indicates that having access to history helps in long-term planning tasks. We also observe that the larger and deeper NeuAgent-Rec ((b) vs. (c)) significantly outperforms the smaller one when a target node is further away from the starting node s_S.\nThe best-performing model, (d), used the attention-based query representation, whose advantage grew as the difficulty of the task increased (N_q ↓ and N_h ↑); this supports our claim that the proposed task of goal-driven web navigation is a challenging benchmark for evaluating future progress. In Fig. 2, we present an example of how the attention weights over the query words dynamically evolve as the model navigates toward a target node.\nThe human participants generally performed worse than the NeuAgents. We attribute this to a number of reasons. First, the NeuAgents were trained specifically on the target domain (Wikipedia), while the human participants were not. Second, we observed that the volunteers rapidly became exhausted from reading multiple articles in sequence. In other words, we find the proposed benchmark, WebNav, to be a good benchmark for machine intelligence, but not for comparing it against human intelligence.\n\nFigure 2: Visualization of the attention weights over a test query. 
The horizontal axis corresponds to the query words, and the vertical axis to the titles of the articles visited.\n\n[Figure 2 shows the visited article titles, from "Category: Main Topic Classifications" through the sports-category pages down to "1918 Kentucky Derby", against the words of a query about the 1918 Kentucky Derby.]\n\n8.2 WikiNav-Jeopardy\n\nSettings  We test the best model from the previous experiment (NeuAgent-Rec with 8 layers of 2048 LSTM units and the attention-based query representation) on WikiNav-Jeopardy. We evaluate two training strategies. The first strategy is straightforward supervised learning, in which we train a NeuAgent-Rec on WikiNav-Jeopardy from scratch. In the other strategy, we first pretrain a NeuAgent-Rec on WikiNav-16-4 and finetune it on WikiNav-Jeopardy.\nWe compare the proposed NeuAgent against three search strategies. The first one, SimpleSearch, is a simple inverted-index-based strategy. SimpleSearch scores each Wikipedia article by the TF-IDF-weighted sum of the words that co-occur in the article and a query, and returns the top-K articles. Second, we use Lucene, a popular open-source information retrieval library, in its default configuration on the whole Wikipedia dump. Lastly, we use the Google Search API5, restricting the domain to wikipedia.org.\nEach system is evaluated by document recall at K (Recall@K). We vary K over 1, 4 and 40. In the case of the NeuAgent, we run beam search with the width set to K and return all K final nodes to compute the document recall. Since there is only one correct document/answer per query, Precision@K = Recall@K / K, and therefore we do not show this measure in the results.\n\nTable 5: Recall on WikiNav-Jeopardy. 
(*) Pretrained on WikiNav-16-4.\n\nModel        | Pre* | Recall@1 | Recall@4 | Recall@40\nNeuAgent     |      |     13.9 |     20.2 |      33.2\nNeuAgent     |  ✓   |     18.9 |     23.6 |      38.3\nSimpleSearch |      |      5.4 |     12.6 |      28.4\nLucene       |      |      6.3 |     14.7 |      36.3\nGoogle       |      |     14.0 |     22.1 |      25.9\n\nResult and Analysis  In Table 5, we report the results on WikiNav-Jeopardy. The proposed NeuAgent clearly outperforms all three search-based strategies when it is pretrained on WikiNav-16-4. The superiority of the pretrained NeuAgent is more apparent when the number of candidate documents is constrained to be small, implying that the NeuAgent is able to accurately rank a correct target article. Although the NeuAgent performs comparably to the other search-based strategies even without pretraining, the benefit of pretraining on the much larger WikiNav is clear.\nWe emphasize that these search-based strategies have access to all the nodes for each input query. The NeuAgent, on the other hand, only observes the nodes it visits during navigation. This success clearly demonstrates the potential of the proposed NeuAgent, pretrained with a dataset compiled by the proposed WebNav, for the task of focused crawling [3, 1], which is an interesting problem in its own right, as much of the content available on the Internet is either hidden or dynamically generated [1].\n\n9 Conclusion\n\nIn this work, we describe a large-scale goal-driven web navigation task and argue that it serves as a useful test bed for evaluating the capabilities of artificial agents in natural language understanding and planning. We release a software tool, called WebNav, that compiles a given website into a goal-driven web navigation task. As an example, we construct WikiNav from Wikipedia using WebNav. We extend WikiNav with Jeopardy! questions, thus creating WikiNav-Jeopardy. We evaluate various neural-net-based agents on WikiNav and WikiNav-Jeopardy. 
Our results show that more sophisticated agents achieve better performance, supporting our claim that this task is well suited to evaluating future progress in natural language understanding and planning. Furthermore, we show that our agent pretrained on WikiNav outperforms two strong inverted-index-based search engines on WikiNav-Jeopardy. These empirical results support our claim about the usefulness of the proposed task and agents for challenging applications such as focused crawling and question-answering.\n\n5 https://cse.google.com/cse\n\nReferences\n\n[1] Manuel Álvarez, Juan Raposo, Alberto Pan, Fidel Cacheda, Fernando Bellas, and Víctor Carneiro. DeepBot: a focused crawler for accessing hidden web content. In Proceedings of the 3rd International Workshop on Data Engineering Issues in E-commerce and Services, in conjunction with the ACM Conference on Electronic Commerce (EC'07), pages 18–25. ACM, 2007.\n\n[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.\n\n[3] Soumen Chakrabarti, Martin Van den Berg, and Byron Dom. Focused crawling: a new approach to topic-specific web resource discovery. Computer Networks, 31(11):1623–1640, 1999.\n\n[4] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, and Mari Ostendorf. Deep reinforcement learning with an unbounded action space. arXiv preprint arXiv:1511.04636, 2015.\n\n[5] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.\n\n[6] Jan Koutník, Jürgen Schmidhuber, and Faustino Gomez. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pages 541–548. ACM, 2014.\n\n[7] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 
Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.\n\n[8] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.\n\n[9] Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. Language understanding for text-based games using deep reinforcement learning. arXiv preprint arXiv:1506.08941, 2015.\n\n[10] Sebastian Risi and Julian Togelius. Neuroevolution in games: State of the art and open challenges. arXiv preprint arXiv:1410.7326, 2014.\n\n[11] David Rumelhart, Geoffrey Hinton, and Ronald Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.\n\n[12] Robert West and Jure Leskovec. Automatic versus human navigation in information networks. In ICWSM, 2012.\n\n[13] Robert West and Jure Leskovec. Human wayfinding in information networks. In 21st International World Wide Web Conference, pages 619–628. ACM, 2012.\n\n[14] Robert West, Joelle Pineau, and Doina Precup. Wikispeedia: An online game for inferring semantic distances between concepts. In IJCAI, pages 1598–1603, 2009.", "award": [], "sourceid": 1046, "authors": [{"given_name": "Rodrigo", "family_name": "Nogueira", "institution": "New York University"}, {"given_name": "Kyunghyun", "family_name": "Cho", "institution": "University of Montreal"}]}