{"title": "From Algorithmic to Subjective Randomness", "book": "Advances in Neural Information Processing Systems", "page_first": 953, "page_last": 960, "abstract": "", "full_text": "From Algorithmic to Subjective Randomness\n\nThomas L. Griffiths & Joshua B. Tenenbaum\n\n{gruffydd,jbt}@mit.edu\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n\nAbstract\n\nWe explore the phenomenon of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats.\n\n1 Introduction\n\nPeople are extremely good at finding structure embedded in noise. This sensitivity to patterns and regularities is at the heart of many of the inductive leaps characteristic of human cognition, such as identifying the words in a stream of sounds, or discovering the presence of a common cause underlying a set of events. 
These acts of everyday induction are quite different from the kind of inferences normally considered in machine learning and statistics: human cognition usually involves reaching strong conclusions on the basis of limited data, while many statistical analyses focus on the asymptotics of large samples.\n\nThe ability to detect structure embedded in noise has a paradoxical character: while it is an excellent example of the kind of inference at which people excel but machines fail, it also seems to be the source of errors in tasks at which machines regularly succeed. For example, a common demonstration conducted in introductory psychology classes involves presenting students with two binary sequences of the same length, such as HHTHTHTT and HHHHHHHH, and asking them to judge which one seems more random. When students select the former, they are told that their judgments are irrational: the two sequences are equally random, since they have the same probability of being produced by a fair coin. In the real world, the sense that some random sequences seem more structured than others can lead people to a variety of erroneous inferences, whether in a casino or thinking about patterns of births and deaths in a hospital [1].\n\nHere we show how this paradox can be resolved through a proper understanding of what our sense of randomness is designed to compute. We will argue that our sense of randomness is actually extremely well-calibrated with a rational statistical computation, just not the one to which it is usually compared. While previous accounts criticize people\u2019s randomness judgments as poor estimates of the probability of an outcome, we claim that subjective randomness, together with other everyday inductive leaps, can be understood in terms of the statistical problem of model selection: given a set of data, evaluating hypotheses about the process that generated it. 
Solving this model selection problem for small datasets requires two ingredients: a set of hypotheses about the processes by which the data could have been generated, and a rational statistical inference by which these hypotheses are evaluated.\n\nWe will model subjective randomness as an inference comparing the probability of a sequence under a random process, P(X|random), with the probability of that sequence under a regular process, P(X|regular). In previous work we have shown that defining P(X|regular) using a restricted form of Kolmogorov complexity, in which regularity is characterized in terms of a simple computing machine, can provide a good account of human randomness judgments for binary sequences [2]. Here, we explore the consequences of manipulating the conditions under which these sequences are presented. We will show that the kinds of regularity to which people are sensitive depend upon whether the full sequence is presented simultaneously, or its elements are presented sequentially. By exploring how these regularities can be captured by different kinds of automata, we extend our rational analysis of the inference involved in subjective randomness to a rational characterization of the processes underlying it: certain regularities can only be detected by automata with a particular form of memory access, and identifying the conditions under which regularities are detectable provides insight into how characteristics of human memory interact with rational statistical inference.\n\n2 Kolmogorov complexity and randomness\n\nA natural starting point for a formal account of subjective randomness is Kolmogorov complexity, which provides a mathematical definition of the randomness of a sequence in terms of the length of the shortest computer program that would produce that sequence. 
The idea of using a code based upon the length of computer programs was independently proposed in [3], [4] and [5], although it has come to be associated with Kolmogorov. A sequence X has Kolmogorov complexity K(X) equal to the length of the shortest program p for a (prefix) universal Turing machine U that produces X and then halts,\n\nK(X) = min_{p : U(p) = X} \u2113(p),   (1)\n\nwhere \u2113(p) is the length of p in bits. Kolmogorov complexity identifies a sequence X as random if \u2113(X) \u2212 K(X) is small: random sequences are those that are irreducibly complex [4]. While not necessarily following the form of this definition, psychologists have preserved its spirit in proposing that the perceived randomness of a sequence increases with its complexity (e.g., [6]). Kolmogorov complexity can also be used to define a variety of probability distributions, assigning probability to events based upon their complexity. One such distribution is algorithmic probability, in which the probability of X is\n\nR(X) = 2^{\u2212K(X)} = max_{p : U(p) = X} 2^{\u2212\u2113(p)}.   (2)\n\nThere is no requirement that R(X) sum to one over all sequences; many probability distributions that correspond to codes are unnormalized, assigning the missing probability to an undefined sequence.\n\nThere are three problems with using Kolmogorov complexity as the basis for a computational model of subjective randomness. Firstly, the Kolmogorov complexity of any particular sequence X is not computable [4], presenting a practical challenge for any modelling effort. Secondly, while the universality of an encoding scheme based on Turing machines is attractive, many of the interesting questions in cognition come from the details: issues of representation and processing are lost in the asymptotic equivalence of coding schemes, but play a key role in people\u2019s judgments. Finally, Kolmogorov complexity is too permissive in what it considers a regularity. 
The set of regularities identified by people is a strict subset of those that might be expressed in short computer programs. For example, people are very unlikely to be able to tell the difference between a binary sequence produced by a linear congruential random number generator (a very short program) and a sequence produced by flipping a coin, but these sequences should differ significantly in Kolmogorov complexity. Restricting the set of regularities does not imply that people are worse than machines at recognizing patterns: reducing the size of the set of hypotheses increases inductive bias, making it possible to identify the presence of structure from smaller samples.\n\n3 A statistical account of subjective randomness\n\nWhile there are problems with using Kolmogorov complexity as the basis for a rational theory of subjective randomness, it provides a clear definition of regularity. In this section we will present a statistical account of subjective randomness in terms of a comparison between random and regular sources, where regularity is defined by analogues of Kolmogorov complexity for simpler computing machines.\n\n3.1 Subjective randomness as model selection\n\nOne of the most basic problems that arises in statistical inference is identifying the source of a set of observations, based upon a set of hypotheses. This is the problem of model selection. Model selection provides a natural basis for a statistical theory of subjective randomness, viewing these judgments as the consequence of an inference to the process that produced a set of observations. On seeing a stimulus X, we consider two hypotheses: X was produced by a random process, or X was produced by a regular process. 
The decision about the source of X can be formalized as a Bayesian inference,\n\nP(random|X) / P(regular|X) = [P(X|random) / P(X|regular)] \u00d7 [P(random) / P(regular)],   (3)\n\nin which the posterior odds in favor of a random generating process are obtained from the likelihood ratio and the prior odds. The only part of the right hand side of the equation affected by X is the likelihood ratio, so we define the subjective randomness of X as\n\nrandom(X) = log [P(X|random) / P(X|regular)],   (4)\n\nbeing the evidence that X provides towards the conclusion that it was produced by a random process.\n\n3.2 The nature of regularity\n\nIn order to define random(X), we need to specify P(X|random) and P(X|regular). When evaluating binary sequences, it is natural to set P(X|random) = (1/2)^{\u2113(X)}. Taking the logarithm in base 2, random(X) is \u2212\u2113(X) \u2212 log_2 P(X|regular), depending entirely on P(X|regular). We obtain random(X) = K(X) \u2212 \u2113(X), the difference between the complexity of a sequence and its length, if we choose P(X|regular) = R(X), the algorithmic probability defined in Equation 2. This is identical to the mathematical definition of randomness given by Kolmogorov complexity. However, the key point of this statistical approach is that we are not restricted to using R(X): we have a measure of the randomness of X for any choice of P(X|regular).\n\nThe choice of P(X|regular) will reflect the stimulus domain, and express the kinds of regularity which people can detect in that domain. For binary sequences, a good candidate for specifying P(X|regular) is a hidden Markov model (HMM), a probabilistic finite state automaton. In fact, specifying P(X|regular) in terms of a particular HMM results in random(X) being equivalent to the \u201cDifficulty Predictor\u201d (DP) [6], a measure of sequence complexity that has been extremely successful in modelling subjective randomness judgments. DP measures the complexity of a sequence in terms of the number of repeating (e.g., HHHH) and alternating (e.g., HTHT) subsequences it contains, adding one point for each repeating subsequence and two points for each alternating subsequence. For example, the sequence TTTHHHTHTH is a run of tails, a run of heads, and an alternating subsequence, giving DP = 1 + 1 + 2 = 4. If there are several partitions into runs and alternations, DP is calculated on the partition that results in the lowest score.\n\nFigure 1: Finite state automaton used to define P(X|regular) to give random(X) \u221d DP. Solid arrows are transitions consistent with repeating a motif, which are taken with probability \u03b4. Dashed arrows are motif changes, using the prior determined by \u03b1.\n\nIn [2], we showed that random(X) \u221d DP if P(X|regular) is specified by a particular HMM. This HMM produces sequences by motif repetition, using the transition graph shown in Figure 1. The model emits sequences by choosing a motif, a sequence of symbols of length k, with probability proportional to \u03b1^k, and emitting symbols consistent with that motif with probability \u03b4, switching to a new motif with probability 1 \u2212 \u03b4. In Figure 1, state 1 repeats the motif H, state 2 repeats T, and the remaining states repeat the alternating motifs HT and TH. The randomness of a sequence under this definition of regularity depends on \u03b4 and \u03b1, but is generally affected by the number of repeating and alternating subsequences. 
The equivalence to DP, in which a sequence scores a single point for each repeating subsequence and two points for each alternating subsequence, results from taking \u03b4 = 0.5 and \u03b1 = (\u221a3 \u2212 1)/2, and choosing the state sequence for the HMM that maximizes the probability of the sequence.\n\nJust as the algorithmic probability R(X) is a probability distribution defined by the length of programs for a universal Turing machine, this choice of P(X|regular) can be seen as specifying the length of \u201cprograms\u201d for a particular finite state automaton. The output of a finite state automaton is determined by its state sequence, just as the output of a universal Turing machine is determined by its program. However, since the state sequence is the same length as the sequence itself, this alone does not provide a meaningful measure of complexity. In our model, probability imposes a metric on state sequences, dictating a greater cost for moves between certain states, which translates into a code length through the logarithm. Since we find the state sequence most likely to have produced X, and thus the shortest code length, we have an analogue of Kolmogorov complexity defined on a finite state automaton.\n\n3.3 Regularities and automata\n\nUsing a hidden Markov model to specify P(X|regular) provides a measure of complexity defined in terms of a finite state automaton. However, the kinds of regularities people can detect in binary sequences go beyond the capacity of a finite state automaton. Here, we consider three additional regularities: symmetry (e.g., THTHHTHT), symmetry in the complement (e.g., TTTTHHHH), and the perfect duplication of subsequences (e.g., HHHTHHHT vs. HHHTHHHTH). These regularities identify formal languages that cannot be recognized by a finite state automaton, suggesting that we might be able to develop better models of subjective randomness by defining P(X|regular) in terms of more sophisticated automata.\n\nFigure 2: Hierarchy of automata used to define measures of complexity: finite state automaton (motif repetition), queue automaton (duplication), pushdown automaton (symmetry), stack automaton, and Turing machine (all computable). Of the regularities discussed in this paper, each automaton can identify all regularities identified by those automata to its left as well as those stated in parentheses beneath its name.\n\nThe automata we will consider in this paper form a hierarchy, shown in Figure 2. This hierarchy expresses the same content as Chomsky\u2019s [7] hierarchy of computing machines (the regularities identifiable by each machine are a strict superset of those identifiable by the machine to its left), although it features a different set of automata. The most restricted set of regularities are those associated with the finite state automaton, and the least restricted are those associated with the Turing machine. In between are the pushdown automaton, which augments a finite controller with a stack memory, in which the last item added is the first to be accessed; the queue automaton (footnote 1), in which the memory is a queue, in which the first item added is the first to be accessed; and the stack automaton, in which the memory is a stack but any item in the stack can be read by the controller [9, 10]. The key difference between these kinds of automata is the memory available to the finite controller, and exploring measures of complexity defined in terms of these automata thus involves assessing the kind of memory required to identify regularities.\n\nEach of the automata shown in Figure 2 can identify a different set of regularities. The finite state automaton is only capable of identifying motif repetition, while the pushdown automaton can identify both kinds of symmetry, and the queue automaton can identify duplication. 
The stack automaton can identify all of these regularities, and the Turing machine can identify all computable regularities. For each of the sub-Turing automata, we can use these constraints to specify a probabilistic model for P(X|regular). For example, the probabilistic model corresponding to the pushdown automaton generates regular sequences by three methods: repetition, producing sequences with probabilities determined by the HMM introduced above; symmetry, where the first half of the sequence is produced by the HMM and the second half is produced by reflection; and complement symmetry, where the second half is produced by reflection and exchanging H and T. We then take P(X|regular) = max_{Z,M} P(X, Z|M) P(M), where M is the method of production and Z is the state sequence for the HMM. Similar models can be defined for the queue and stack automata, with the queue automaton allowing generation by repetition or duplication, and the stack automaton allowing any of these four methods. Each regularity introduced into the model requires a further parameter in specifying P(M), so the hierarchy shown in Figure 2 also expresses the statistical structure of this set of models: each model is a special case of the model to its right, in which some regularities are eliminated by setting P(M) to zero. We can use this structure to perform model selection with likelihood ratio tests, determining which model gives the best account of a particular dataset using just the difference in the log-likelihoods. We apply this method in the next section.\n\nFootnote 1: An unrestricted queue automaton is equivalent to a Turing machine. 
We will use the phrase \u201cqueue automaton\u201d to refer to an automaton in which the number of queue operations that can be performed for each input symbol is limited, which is generally termed a quasi real time queue automaton [8].\n\n4 Testing the models\n\nThe models introduced in the previous section differ in the memory systems with which they augment the finite controller. The appropriateness of any one measure of complexity to a particular task may thus depend upon the memory demands placed upon the participant. To explore this hypothesis, we conducted an experiment in which participants made randomness judgments after either seeing a sequence in its entirety, or seeing each element one after another. We then used model selection to determine which measure of complexity gave the best account of each condition, illustrating how the strategy of defining more restricted forms of complexity can shed light on the cognitive processes underlying regularity detection.\n\n4.1 Experimental methods\n\nThere were two conditions in the experiment, corresponding to Simultaneous and Sequential presentation of stimuli. The stimuli were sequences of heads (H) and tails (T) presented in 130 point fixed width sans-serif font on a 19-inch monitor at 1280 \u00d7 1024 pixel resolution. In the Simultaneous condition, all eight elements of the sequence appeared on the display simultaneously. In the Sequential condition, the elements appeared one by one, being displayed for 300 ms with a 300 ms inter-stimulus interval.\n\nThe participants were 40 MIT undergraduates, randomly assigned to the two conditions. Participants were instructed that they were about to see sequences which had either been produced by a random process (flipping a fair coin) or by other processes in which the choice of heads and tails was not random, and had to classify these sequences according to their source. 
After a practice session, each participant classified all 128 sequences of length 8 (the 256 binary sequences of that length, counted up to exchanging H and T), in random order, with each sequence randomly starting with either a head or a tail. Participants took breaks at intervals of 32 sequences.\n\n4.2 Results and Discussion\n\nWe analyzed the results by fitting the models corresponding to the four automata described above, using all motifs up to length 4 to specify the basic model. We computed random(X) for each stimulus as in Eq. (4), with P(X|regular) specified by the probabilistic model corresponding to each of the automata. We then converted this log-likelihood ratio into the posterior probability of a random generating process, using\n\nP(random|X) = 1 / (1 + exp{\u2212\u03bb random(X) \u2212 \u03c8}),\n\nwhere \u03bb and \u03c8 are parameters weighting the contribution of the likelihoods and the priors respectively. We then optimized \u03bb, \u03c8, \u03b4, \u03b1 and the parameters contributing to P(M) for each model, maximizing the likelihood of the classifications of the sequences by the 20 participants in each of the 2 conditions. The results of the model-fitting are shown in Figure 3(a) and (b), which indicate the relationship between the posterior probabilities predicted by the model and the proportion of participants who classified a sequence as random. The correlation coefficients shown in the figure provide a relatively good indicator of the fit of the models, and each sequence is labelled according to the regularity it expresses, showing how accommodating particular regularities contributes to the fit.\n\nThe log-likelihood scores obtained from fitting the models can be used for model selection, testing whether any of the parameters involved in the models are unnecessary. 
Since the models form a nested hierarchy, we can use likelihood ratio tests to evaluate whether introducing a particular regularity (and the parameters associated with it) results in a statistically significant improvement in fit. Specifically, if model 1 has log-likelihood L_1 and df_1 parameters, and model 2 has log-likelihood L_2 and df_2 > df_1 parameters, 2(L_2 \u2212 L_1) should have a \u03c7\u00b2(df_2 \u2212 df_1) distribution under the null hypothesis of no improvement in fit. We evaluated the pairwise likelihood ratio tests for the four models in each condition, with the results shown in Figure 3(c) and (d). Additional regularities always improved the fit for the Simultaneous condition, while adding duplication, but not symmetry, resulted in a statistically significant improvement in the Sequential condition.\n\nFigure 3: Experimental results for (a) the Simultaneous and (b) the Sequential condition, showing the proportion of participants classifying a sequence as \u201crandom\u201d (horizontal axis) and P(random|X) (vertical axis) as assessed by the four models. Points are labelled according to their parse under the Stack model (Repetition, Symmetry, Complement, Duplication). (c) and (d) show the model selection results for the Simultaneous and Sequential conditions respectively, showing the four automata with edges between them labelled with \u03c7\u00b2 score (df, p-value) for improvement in fit.\n\nFigure 3 values. Panel (a), Simultaneous data, correlations: Finite state r = 0.69, Pushdown r = 0.79, Queue r = 0.76, Stack r = 0.83. Panel (b), Sequential data, correlations: Finite state r = 0.70, Pushdown r = 0.70, Queue r = 0.76, Stack r = 0.77. Panel (c), Simultaneous: Finite state to Queue 57.43 (1 df, p < 0.0001); Queue to Stack 75.41 (2 df, p < 0.0001); Finite state to Pushdown 87.76 (2 df, p < 0.0001); Pushdown to Stack 45.08 (1 df, p < 0.0001). Panel (d), Sequential: Finite state to Queue 33.24 (1 df, p < 0.0001); Queue to Stack 5.69 (2 df, p = 0.0582); Finite state to Pushdown 1.82 (2 df, p = 0.4025); Pushdown to Stack 31.42 (1 df, p < 0.0001).\n\nThe model selection results suggest that the best model for the Simultaneous condition is the stack automaton, while the best model for the Sequential condition is the queue automaton. These results indicate the importance of presentation format in determining subjective randomness, as well as the benefits of exploring measures of complexity defined in terms of a range of computing machines. The stack automaton can evaluate regularities that require checking information in arbitrary positions in a sequence, something that is facilitated by a display in which the entire sequence is available. In contrast, the queue automaton can only access information in the order that it enters memory, and gives a better match to the task in which working memory is required. This illustrates an important fact about cognition, that human working memory operates like a queue rather than a stack, which is highlighted by this approach.\n\nThe final parameters of the best-fitting models provide some insight into the relative importance of the different kinds of regularities under different presentation conditions. For the Simultaneous condition, \u03b4 = 0.66, \u03b1 = 0.12, \u03bb = 0.26, \u03c8 = \u22121.98, and motif repetition, symmetry, symmetry in the complement, and duplication were given probabilities of 0.748, 0.208, 0.005, and 0.039 respectively. Symmetry is thus a far stronger characteristic of regularity than either symmetry in the complement or duplication, when entire sequences are viewed simultaneously. 
For the Sequential condition, \u03b4 = 0.70, \u03b1 = 0.11, \u03bb = 0.38, \u03c8 = \u22121.24, and motif repetition was given a probability of 0.962 while duplication had a probability of 0.038, with both forms of symmetry being given zero probability since the queue model provided the best fit. Values of \u03b4 > 0.5 for both models indicate that regular sequences tend to repeat motifs, rather than rapidly switching between them, and the low \u03b1 values reflect a preference for short motifs.\n\n5 Conclusion\n\nWe have outlined a framework for understanding the rational basis of the human ability to find structure embedded in noise, viewing this inference in terms of the statistical problem of model selection. Solving this problem for small datasets requires two ingredients: strong prior beliefs about the hypothetical mechanisms by which the data could have been generated, and a rational statistical inference by which these hypotheses are evaluated. When assessing the randomness of binary sequences, which involves comparing random and regular sources, people\u2019s beliefs about the nature of regularity can be expressed in terms of probabilistic versions of simple computing machines. Different machines capture regularity when sequences are presented simultaneously and when their elements are presented sequentially, and the differences between these machines provide insight into the cognitive processes involved in the task. Analyses of the rational basis of human inference typically either ignore questions about processing or introduce them as relatively arbitrary constraints. Here, we are able to give a rational characterization of process as well as inference, evaluating a set of alternatives that all correspond to restrictions of Kolmogorov complexity to simple general-purpose automata.\n\nAcknowledgments. 
This work was supported by a Stanford Graduate Fellowship to the first author. We thank Charles Kemp and Michael Lee for useful comments.\n\nReferences\n\n[1] D. Kahneman and A. Tversky. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3:430\u2013454, 1972.\n\n[2] T. L. Griffiths and J. B. Tenenbaum. Probability, algorithmic complexity and subjective randomness. In Proceedings of the 25th Annual Conference of the Cognitive Science Society, Hillsdale, NJ, 2003. Erlbaum.\n\n[3] R. J. Solomonoff. A formal theory of inductive inference. Part I. Information and Control, 7:1\u201322, 1964.\n\n[4] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:1\u20137, 1965.\n\n[5] G. J. Chaitin. On the length of programs for computing finite binary sequences: statistical considerations. Journal of the ACM, 16:145\u2013159, 1969.\n\n[6] R. Falk and C. Konold. Making sense of randomness: Implicit encoding as a bias for judgment. Psychological Review, 104:301\u2013318, 1997.\n\n[7] N. Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2:113\u2013124, 1956.\n\n[8] A. Cherubini, C. Citrini, S. C. Reghizzi, and D. Mandrioli. QRT FIFO automata, breadth-first grammars and their relations. Theoretical Computer Science, 85:171\u2013203, 1991.\n\n[9] S. Ginsburg, S. A. Greibach, and M. A. Harrison. Stack automata and compiling. Journal of the ACM, 14:172\u2013201, 1967.\n\n[10] A. V. Aho. Indexed grammars \u2013 an extension of context-free grammars. Journal of the ACM, 15:647\u2013671, 1968.\n", "award": [], "sourceid": 2480, "authors": [{"given_name": "Thomas", "family_name": "Griffiths", "institution": null}, {"given_name": "Joshua", "family_name": "Tenenbaum", "institution": null}]}