{"title": "Effects of Spatial and Temporal Contiguity on the Acquisition of Spatial Information", "book": "Advances in Neural Information Processing Systems", "page_first": 17, "page_last": 23, "abstract": null, "full_text": "Effects of Spatial and Temporal Contiguity on \n\nthe Acquisition of Spatial Information \n\nThea B. Ghiselli-Crippa and Paul W. Munro \n\nDepartment of Information Science and Telecommunications \n\nUniversity of Pittsburgh \nPittsburgh, PA 15260 \n\ntbgst@sis.pitt.edu, munro@sis.pitt.edu \n\nAbstract \n\nSpatial information comes in two forms: direct spatial information (for \nexample, retinal position) and indirect temporal contiguity information, \nsince objects encountered sequentially are in general spatially close. The \nacquisition of spatial information by a neural network is investigated \nhere. Given a spatial layout of several objects, networks are trained on a \nprediction task. Networks using temporal sequences with no direct spatial \ninformation are found to develop internal representations that show \ndistances correlated with distances in the external layout. The influence \nof spatial information is analyzed by providing direct spatial information \nto the system during training that is either consistent with the layout or \ninconsistent with it. This approach allows examination of the relative \ncontributions of spatial and temporal contiguity. \n\n1 \n\nIntroduction \n\nSpatial information is acquired by a process of exploration that is fundamentally temporal, \nwhether it be on a small scale, such as scanning a picture, or on a larger one, such as \nphysically navigating through a building, a neighborhood, or a city. Continuous scanning \nof an environment causes locations that are spatially close to have a tendency to occur in \ntemporal proximity to one another. 
Thus, a temporal associative mechanism (such as a \nHebb rule) can be used in conjunction with continuous exploration to capture the spatial \nstructure of the environment [1]. However, the actual process of building a cognitive map \nneed not rely solely on temporal associations, since some spatial information is encoded in \nthe sensory array (position on the retina and proprioceptive feedback). Laboratory studies \nshow different types of interaction between the relative contributions of temporal and spatial \ncontiguities to the formation of an internal representation of space. While Clayton and \nHabibi's [2] series of recognition priming experiments indicates that priming is controlled \nonly by temporal associations, in the work of McNamara et al. [3] priming in recognition \nis observed only when space and time are both contiguous. In addition, Curiel and \nRadvansky's [4] work shows that the effects of spatial and temporal contiguity depend on \nwhether location or identity information is emphasized during learning. Moreover, other \nexperiments ([3]) also show how the effects clearly depend on the task and can be quite \ndifferent if an explicitly spatial task is used (e.g., additive effects in location judgments). \n\nFigure 1: Network architectures: temporal-only network (left); spatio-temporal network \nwith spatial units part of the input representation (center); spatio-temporal network with \nspatial units part of the output representation (right). \n\n2 Network architectures \n\nThe goal of the work presented in this paper is to study the structure of the internal representations \nthat emerge from the integration of temporal and spatial associations. 
An \nencoder-like network architecture is used (see Figure 1), with a set of N input units and a \nset of N output units representing N nodes on a 2-dimensional graph. A set of H units is \nused for the hidden layer. To include space in the learning process, additional spatial units \nare included in the network architecture. These units provide a representation of the spatial \ninformation directly available during the learning/scanning process. In the simulations described \nin this paper, two units are used and are chosen to represent the (x, y) coordinates of \nthe nodes in the graph. The spatial units can be included as part of the input representation \nor as part of the output representation (see Figure 1, center and right panels): both choices \nare used in the experiments, to investigate whether the spatial information could better benefit \ntraining as an input or as an output [5]. In the second case, the relative contribution of \nthe spatial information can be directly manipulated by introducing weighting factors in the \ncost function being minimized. A two-term cost function is used, with a cross-entropy term \nfor the N label units and a squared error term for the 2 coordinate units, \n\nE = -A Sum_{i=1}^{N} [t_i ln r_i + (1 - t_i) ln(1 - r_i)] + B Sum_{i=N+1}^{N+2} (t_i - r_i)^2 \n\n(1) \n\nwhere r_i indicates the actual output of unit i and t_i its desired output. The relative influence of \nthe spatial information is controlled by the coefficients A and B. \n\n3 Learning tasks \n\nThe left panel of Figure 2 shows an example of the type of layout used; the effective \nlayout used in the study consists of N = 28 nodes. For each node, a set of neighboring \nnodes is defined, chosen on the basis of how an observer might scan the layout to learn the \nnode labels and their (spatial) relationships; in Figure 2, the neighborhood relationships are \nrepresented by lines connecting neighboring nodes. 
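As a concrete illustration, the two-term cost function of Section 2 can be sketched in NumPy. This is a minimal sketch, not the authors' code: it assumes sigmoid-valued label outputs, and the function name is illustrative (the default coefficient values A = 0.625 and B = 6.25 are those reported later in the paper).

```python
import numpy as np

def two_term_cost(r_labels, t_labels, r_coords, t_coords, A=0.625, B=6.25):
    """Cross-entropy over the N label units plus squared error over the
    2 coordinate units, weighted by coefficients A and B respectively."""
    eps = 1e-12  # guard against log(0)
    ce = -np.sum(t_labels * np.log(r_labels + eps)
                 + (1.0 - t_labels) * np.log(1.0 - r_labels + eps))
    se = np.sum((t_coords - r_coords) ** 2)
    return A * ce + B * se
```

Setting B = 0 recovers the temporal-only objective, so the same routine covers both the T-only and S-T conditions.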
From any node in the layout, the only \nallowed transitions are those to a neighbor, thus defining the set of node pairs used to train \nthe network (66 pairs out of C(28, 2) = 378 possible pairs). In addition, the probability \nof occurrence of a particular transition is computed as a function of the distance to the \ncorresponding neighbor. It is then possible to generate a sequence of visits to the network \nnodes, aimed at replicating the scanning process of a human observer studying the layout. \n\nFigure 2: Example of a layout (left) and its permuted version (right). Links represent \nallowed transitions. A larger layout of 28 units was used in the simulations. \n\nThe basic learning task is similar to the grammar learning task of Servan-Schreiber et al. \n[6] and to the neighborhood mapping task described in [1] and is used to associate each of \nthe N nodes on the graph and its (x, y) coordinates with the probability distribution of the \ntransitions to its neighboring nodes. The mapping can be learned directly, by associating \neach node with the probability distribution of the transitions to all its neighbors: in this \ncase, batch learning is used as the method of choice for learning the mapping. On the \nother hand, the mapping can be learned indirectly, by associating each node with itself \nand one of its neighbors, with online learning being the method of choice in this case; \nthe neighbor chosen at each iteration is defined by the sequence of visits generated on \nthe basis of the transition probabilities. Batch learning was chosen because it generally \nconverges more smoothly and more quickly than online learning and gives qualitatively \nsimilar results. 
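The sequence-generation step above can be sketched as a distance-weighted random walk on the neighbor graph. The paper does not specify the exact probability function, so the inverse-distance weighting below is an assumed choice, and all names are illustrative.

```python
import numpy as np

def scan_sequence(coords, neighbors, steps, rng=None):
    """Generate a sequence of node visits: from each node, move to one of
    its neighbors, with closer neighbors more likely (here p proportional
    to 1/distance, an assumed weighting)."""
    rng = np.random.default_rng(0) if rng is None else rng
    node = int(rng.integers(len(coords)))
    seq = [node]
    for _ in range(steps):
        nbrs = np.asarray(neighbors[node])
        d = np.linalg.norm(coords[nbrs] - coords[node], axis=1)
        p = 1.0 / d
        p /= p.sum()  # normalize into a transition distribution
        node = int(rng.choice(nbrs, p=p))
        seq.append(node)
    return seq
```

Consecutive entries of the returned sequence form the (node, neighbor) training pairs used in the online-learning variant.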
While the task and network architecture described in [1] allowed only \nfor temporal association learning, in this study both temporal and spatial associations are \nlearned simultaneously, thanks to the presence of the spatial units. However, the temporal-only \n(T-only) case, which has no spatial units, is included in the simulations performed \nfor this study, to provide a benchmark for the evaluation of the results obtained with the \nspatio-temporal (S-T) networks. \n\nThe task described above allows the network to learn neighborhood relationships for which \nspatial and temporal associations provide consistent information, that is, nodes experienced \ncontiguously in time (as defined by the sequence) are also contiguous in space (being spatial \nneighbors). To tease apart the relative contributions of space and time, the task is kept \nthe same, but the data employed for training the network is modified: the same layout is \nused to generate the temporal sequence, but the x, y coordinates of the nodes are randomly \npermuted (see right panel of Figure 2). If the permuted layout is then scanned following the \nsame sequence of node visits used in the original version, the net effect is that the temporal \nassociations remain the same, but the spatial associations change so that temporally neighboring \nnodes can now be spatially close or distant: the spatial associations are no longer \nconsistent with the temporal associations. As Figure 4 illustrates, the training pairs (filled \ncircles) all correspond to short distances in the original layout, but can have a distance \nanywhere in the allowable range in the permuted layout. Since the temporal and spatial \ndistances were consistent in the original layout, the original spatial distance can be used \nas an indicator of temporal distance and Figure 4 can be interpreted as a plot of temporal \ndistance vs. spatial distance for the permuted layout. 
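The permutation manipulation can be sketched directly: the temporal sequence is left untouched while the assignment of coordinate pairs to nodes is shuffled (the function name below is illustrative).

```python
import numpy as np

def permute_layout(coords, rng=None):
    """Randomly reassign the (x, y) coordinate pairs to the nodes, so that
    temporally neighboring nodes are no longer reliably close in space."""
    rng = np.random.default_rng(0) if rng is None else rng
    return coords[rng.permutation(len(coords))]
```

Training the S-T network with the original sequence but these permuted coordinates yields the space-and-time-inconsistent condition.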
\n\nThe simulations described in the following include three experimental conditions: temporal \nonly (no direct spatial information available); space and time consistent (the spatial coordinates \nand the temporal sequence are from the same layout); space and time inconsistent \n(the spatial coordinates and the temporal sequence are from different layouts). \n\nHidden unit representations are compared using Euclidean distance (cosine and inner product \nmeasures give consistent results); the internal representation distances are also used to \ncompute their correlation with Euclidean distances between nodes in the layout (original \nand permuted). The correlations increase with the number of hidden units for values of \nH between 5 and 10 and then gradually taper off for values greater than 10. The results \npresented in the remainder of the paper all pertain to networks trained with H = 20 and \nwith hidden units using a tanh transfer function; all the results pertaining to S-T networks \nrefer to networks with 2 spatial output units and cost function coefficients A = 0.625 and \nB = 6.25. \n\n4 Results \n\nFigure 3 provides a combined view of the results from all three experiments. The left panel \nillustrates the evolution of the correlation between internal representation distances and \nlayout (original and permuted) distances. The right panel shows the distributions of the \ncorrelations at the end of training (1000 epochs). The first general result is that, when spatial \ninformation is available and consistent with the temporal information (original layout), \nthe correlation between hidden unit distances and layout distances is consistently better \nthan the correlation obtained in the case of temporal associations alone. 
The second general \nresult is that, when spatial information is available but not consistent with the temporal \ninformation (permuted layout), the correlation between hidden unit distances and original \nlayout distances (which represent temporal distances) is similar to that obtained in the case \nof temporal associations alone, except for the initial transient. When the correlation is computed \nwith respect to the permuted layout distances, its value peaks early during training \nand then decreases rapidly, to reach an asymptotic value well below the other three cases. \nThis behavior is illustrated in the box plots in the right panel of Figure 3, which report the \ndistribution of correlation values at the end of training. \n\n4.1 Temporal-only vs. spatio-temporal \n\nAs a first step in this study, the effects of adding spatial information to the basic temporal \nassociations used to train the network can be examined. Since the learning task is the same \nfor both the T-only and the S-T networks except for the absence or presence of spatial \ninformation during training, the differences observed can be attributed to the additional \nspatial information available to the S-T networks. The higher correlation between internal \nrepresentation distances and original layout distances obtained when spatial information is \n\nFigure 3: Evolution of correlation during training (0 - 1000 epochs) (left). 
Distributions of \ncorrelations at the end of training (1000 epochs) (right). \n\nFigure 4: Distances in the original layout \n(x) vs. distances in the permuted layout \n(y). The 66 training pairs are identified by \nfilled circles. \n\nFigure 5: Similarities (Euclidean distances) \nbetween internal representations developed \nby an S-T network (after 300 epochs). Figure \n4 projects the data points onto the x, y plane. \n\navailable (see Figure 3) is apparent also when the evolution of the internal representations \nis examined. As Figure 6 illustrates, the presence of spatial information results in better \ngeneralization for the pattern pairs outside the training set. While the distances between \ntraining pairs are mapped to similar distances in hidden unit space for both the T-only and \nthe S-T networks, the T-only network tends to cluster the non-training pairs into a narrow \nband of distances in hidden unit space. In the case of the S-T network, instead, the hidden \nunit distances between non-training pairs are spread out over a wider range and tend to \nreflect the original layout distances. \n\n4.2 Permuted layout \n\nAs described above, with the permuted layout it is possible to decouple the spatial and \ntemporal contributions and therefore study the effects of each. A comprehensive view of \nthe results at a particular point during training (300 epochs) is presented in Figure 5, where \nthe x, y plane represents temporal distance vs. spatial distance (see also Figure 4) and the z \naxis represents the similarity between hidden unit representations. The figure also includes \na quadratic regression surface fitted to the data points. 
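A quadratic surface of this form can be fitted by ordinary least squares on the six monomial terms. The sketch below uses NumPy's lstsq and is only an illustration of the technique, not the authors' fitting procedure; all names are assumptions.

```python
import numpy as np

def fit_quadratic_surface(d_T, d_S, d_HU):
    """Least-squares fit of d_HU = b0 + b1*d_T + b2*d_S + b3*d_T^2
    + b4*d_S^2 + b5*d_T*d_S; returns the six coefficients b0..b5."""
    X = np.column_stack([np.ones_like(d_T), d_T, d_S,
                         d_T ** 2, d_S ** 2, d_T * d_S])
    coef, *_ = np.linalg.lstsq(X, d_HU, rcond=None)
    return coef
```

The fitted coefficients then quantify how strongly the hidden-unit similarities depend on temporal versus spatial distance.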
The coefficients in the equation of \nthe surface provide a quantitative measure of the relative contributions of spatial (d_S) and \ntemporal (d_T) distances to the similarity between hidden unit representations (d_HU): \n\nd_HU = 0.6 + 3.4 d_T + 0.3 d_S - 2.1 (d_T)^2 + 0.4 (d_S)^2 - 0.4 d_T d_S \n\n(2) \n\nIn general, after the transient observed in early training (see Figure 3), the largest and most \nsignificant coefficients are found for d_T and (d_T)^2, indicating a stronger dependence of \nd_HU on temporal distance than on spatial distance. \n\nThe results illustrated in Figure 5 represent the situation at a particular point during training \n(300 epochs). Similar plots can be generated for different points during training, to study \nthe evolution of the internal representations. A different view of the evolution process is \nprovided by Figure 7, in which the data points are projected onto the x, z plane (top panel) \nand the y, z plane (bottom panel) at four different times during training. In the top panel, 