{"title": "Markov Networks for Detecting Overlapping Elements in Sequence Data", "book": "Advances in Neural Information Processing Systems", "page_first": 193, "page_last": 200, "abstract": null, "full_text": "Markov Networks for Detecting Overlapping Elements in Sequence Data\n\nJoseph Bockhorst\nDept. of Computer Sciences\nUniversity of Wisconsin\nMadison, WI 53706\njoebock@cs.wisc.edu\n\nMark Craven\nDept. of Biostatistics and Medical Informatics\nUniversity of Wisconsin\nMadison, WI 53706\ncraven@biostat.wisc.edu\n\nAbstract\n\nMany sequential prediction tasks involve locating instances of patterns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models, however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental results from a genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.\n\n1 Introduction\n\nHidden Markov models (HMMs) and related probabilistic sequence models have been among the most accurate methods used for sequence-based prediction tasks in genomics, natural language processing and other problem domains. 
One key limitation of these models, however, is that they cannot represent general overlaps among sequence elements in a concise and natural manner. We present a novel approach to modeling and predicting overlapping sequence elements that is based on undirected Markov networks. Our work is motivated by the task of predicting DNA sequence elements involved in the regulation of gene expression in bacteria. Like HMM-based methods, our approach is able to represent and exploit relationships among different sequence elements of interest. In contrast to HMMs, however, our approach can naturally represent sequence elements that overlap in arbitrary ways.\n\nWe describe and evaluate our approach in the context of predicting a bacterial genome's genes and regulatory \"signals\" (collectively, its regulatory elements). Part of the process of understanding a given genome is to assemble, often using computational methods, a \"parts list\" of its regulatory elements. Predictions, in this case, entail specifying the start and end coordinates of subsequences of interest. It is common in bacterial genomes for these important sequence elements to overlap.\n\n[Figure 1: (a) a DNA sequence containing prom1, prom2, prom3, gene1, gene2 and term1; (b) an HMM topology with START, prom, gene, term and END.]\n\nFigure 1: (a) Example arrangement of two genes, three promoters and one terminator in a DNA sequence. (b) Topology of an HMM for predicting these elements. Large circles represent element-specific sub-models and small gray circles represent inter-element sub-models, one for each allowed pair of adjacent elements. 
Due to the overlapping elements, there is no path through the HMM consistent with the configuration in (a).\n\nOur approach to predicting overlapping sequence elements, which is based on discriminatively trained undirected graphical models called conditional Markov networks [5, 10] (also called conditional random fields), uses two key steps to make a set of predictions. In the first step, candidate elements are generated by having a set of models independently make predictions. In the second step, a Markov network is constructed to decide which candidate predictions to accept.\n\nConsider the task of predicting gene, promoter, and terminator elements encoded in bacterial DNA. Figure 1(a) shows an example arrangement of these elements in a DNA sequence. Genes are DNA sequences that encode information for constructing proteins. Promoters and terminators are DNA sequences that regulate transcription, the first step in the synthesis of a protein from a gene. Transcription begins at a promoter, proceeds downstream (left-to-right in Figure 1(a)), and ends at a terminator. Regulatory elements often overlap each other, for example prom2 and prom3, or gene1 and prom2, in Figure 1(a).\n\nOne technique for predicting these elements is first to train a probabilistic sequence model for each element type (e.g. [2, 9]) and then to \"scan\" an input sequence with each model in turn. Although this approach can predict overlapping elements, it is limited because it ignores inter-element dependencies. Other methods, based on HMMs (e.g. [11, 1]), explicitly consider these dependencies. Figure 1(b) shows an example topology of such an HMM. Given an input sequence, this HMM defines a probability distribution over parses, that is, partitionings of the sequence into subsequences corresponding to elements and the regions between them. These models are not naturally suited to representing overlapping elements. 
For the case shown in Figure 1(a), for example, even if the subsequences for gene1 and prom2 match their respective sub-models very well, both elements cannot be in the same parse, so the predictions of gene1 and prom2 compete with each other. One could expand the state set to include states for specific overlap situations; however, the number of states increases exponentially with the number of overlap configurations. Alternatively, one could use the factorized state representation of factorial HMMs [4]. These models, however, assume a fixed number of loosely connected processes evolving in parallel, which is not a good match to our genomics domain.\n\nLike HMMs, our method, called CMN-OP (conditional Markov networks for overlapping patterns), employs element-specific sub-models and probabilistic constraints on neighboring elements qualitatively expressed in a graph. The key difference between CMN-OP and HMMs is the probability distributions they define for an input sequence. While, as mentioned above, an HMM defines a probability distribution over partitions of the sequence, a CMN-OP defines a probability distribution over all possible joint arrangements of elements in an input sequence. 
Figure 2 illustrates this distinction.\n\n[Figure 2: (a) HMM: predicted labels on positions 1-8 and the corresponding event in the HMM sample space; (b) CMN-OP: predicted signals on positions 1-8 and the corresponding event in the CMN-OP sample space, drawn as a grid indexed by start position and end position (1-8).]\n\nFigure 2: An illustration of the 
difference in the sample spaces on which probability distributions over labelings are defined by (a) HMMs and (b) CMN-OP models. The left side of (a) shows a sequence of length eight for which an HMM has predicted that an element of interest occupies two subsequences, [1:3] and [6:7]. The darker subsequences, [4:5] and [8:8], represent sequence regions between predicted elements. The right side of (a) shows the corresponding event in the sample space of the HMM, which associates one label with each position. The left side of (b) shows four predicted elements made by a CMN-OP model. The right side of (b) illustrates the corresponding event in the CMN-OP sample space. Each square corresponds to a subsequence, and an event in this sample space assigns a (possibly empty) label to each subsequence.\n\n2 Models\n\nA conditional Markov network [5, 10] (CMN) defines the conditional probability distribution $\\Pr(Y \\mid X)$, where $X$ is a set of observable input random variables and $Y$ is a set of output random variables. As with standard Markov networks, a CMN consists of a qualitative graphical component $G = (V, E)$ with vertex set $V$ and edge set $E$ that encodes a set of conditional independence assertions, along with a quantitative component in the form of a set of potentials $\\Phi$ over the cliques of $G$. In CMNs, $V = X \\cup Y$. We denote an assignment of values to a set of random variables $U$ by $u$. Each clique $q = (X_q, Y_q)$ in the clique set $Q(G)$ has a potential function $\\phi_q(x_q, y_q) \\in \\Phi$ that assigns a non-negative number to each of the joint settings of $(X_q, Y_q)$. 
A CMN $(G, \\Phi)$ defines the conditional probability distribution\n\n$$\\Pr(y \\mid x) = \\frac{1}{Z(x)} \\prod_{q \\in Q(G)} \\phi_q(x_q, y_q), \\qquad Z(x) = \\sum_{y'} \\prod_{q \\in Q(G)} \\phi_q(x_q, y'_q),$$\n\nwhere the $x$-dependent normalization factor $Z(x)$ is called the partition function. One benefit of CMNs for classification tasks is that they are typically trained discriminatively, by maximizing a function based on the conditional likelihood $\\Pr(Y \\mid X)$ over a training set rather than the joint likelihood $\\Pr(Y, X)$.\n\nA common representation for the potentials $\\phi_q(y_q, x_q)$ is a log-linear model:\n\n$$\\phi_q(y_q, x_q) = \\exp\\Big\\{\\sum_b w_q^b f_q^b(y_q, x_q)\\Big\\} = \\exp\\{\\mathbf{w}_q^T \\mathbf{f}_q(y_q, x_q)\\}.$$\n\nHere $w_q^b$ is the weight of feature $f_q^b$, and $\\mathbf{w}_q$ and $\\mathbf{f}_q$ are column vectors of the weights and features of $q$.\n\nWe now show how we use CMNs to predict elements in observation sequences. Given a sequence $x$ of length $L$, our task is to identify the types and locations of all instances of patterns in $P = \\{P_1, \\ldots, P_N\\}$ that are present in $x$, where $P$ is a set of pattern types. In the genomics domain, $x$ is a DNA sequence and $P$ is a set of regulatory elements such as {gene, promoter, terminator}.\n\nA match $m$ of a pattern to $x$ specifies a subsequence $x_{i:j}$ and a pattern type $P_k \\in P$. We denote the set of all matches of pattern types in $P$ to $x$ by $M(P, x)$. We call a subset $C = (m_1, m_2, \\ldots, m_M)$ of $M(P, x)$ a configuration. 
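\n\nThe definition of $\\Pr(y \\mid x)$ with log-linear clique potentials can be made concrete with a small brute-force sketch. Everything in it is a hypothetical toy (a chain of three binary output variables, two made-up features per clique, hand-picked weights `w`), and computing $Z(x)$ by enumerating every output vector is only feasible for tiny problems; it is not the inference procedure used for CMN-OP models.\n\n```python\nimport itertools\nimport math\n\n# Hypothetical toy CMN: binary outputs y_0..y_{n-1} conditioned on a binary\n# input vector x. Each clique covers (x, y_a, y_{a+1}).\n\n\ndef clique_features(x, a, ya, yb):\n    # f_q: [y_a agrees with x_a, adjacent labels are equal]\n    return [1.0 if ya == x[a] else 0.0, 1.0 if ya == yb else 0.0]\n\n\ndef log_potential(w, x, a, ya, yb):\n    # log phi_q = w^T f_q; the same weights are shared by every clique\n    return sum(wi * fi for wi, fi in zip(w, clique_features(x, a, ya, yb)))\n\n\ndef unnormalized(w, x, y):\n    # Product of clique potentials, accumulated in log space\n    return math.exp(sum(log_potential(w, x, a, y[a], y[a + 1])\n                        for a in range(len(y) - 1)))\n\n\ndef conditional(w, x, y):\n    # Pr(y | x) = unnormalized(y) / Z(x); Z(x) sums over all label vectors\n    z = sum(unnormalized(w, x, yp)\n            for yp in itertools.product([0, 1], repeat=len(y)))\n    return unnormalized(w, x, y) / z\n\n\nw = [1.5, 0.5]\nx = [1, 0, 0]\nprobs = {y: conditional(w, x, y) for y in itertools.product([0, 1], repeat=3)}\nprint(max(probs, key=probs.get))  # (1, 0, 0)\n```\n\nBecause every probability shares the same $Z(x)$, the values in `probs` sum to 1, and the most probable vector is the one that agrees with `x` at both positions carrying an agreement feature while keeping its last two labels equal.\n\n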
Matches in $C$ are allowed to overlap; however, we assume that no two matches in $C$ have the same start index (see footnote 1). Thus, the maximum size of a configuration $C$ is $L$, and the elements of $C$ may be ordered by start position so that each match $m_a$ starts before $m_{a+1}$. Our models define a conditional probability distribution over configurations given an input sequence $x$.\n\nGiven a sequence $x$ of length $L$, the output random variables of our models are $Y = (Y_1, Y_2, \\ldots, Y_L, Y_{L+1})$. We represent a configuration $C = (m_1, m_2, \\ldots, m_M)$ with $Y$ in the following way. If $a$ is less than or equal to the configuration size $M$, we assign $Y_a$ to the $a$-th match in $C$ ($Y_a = m_a$); otherwise we set $Y_a$ equal to a special value null. Note that $Y_{L+1}$ will always be null; it is included for notational convenience. Our models define the conditional distribution $\\Pr(Y \\mid X)$.\n\n[Figure 3: (a) a chain of output variables $Y_1, Y_2, \\ldots, Y_{L+1}$, each connected to the input $X$; (b) an interaction graph over START, PROM, GENE, TERM and END.]\n\nFigure 3: (a) The structure of the CMN-OP induced for the sequence $x$ of length $L$. The $a$-th pattern match $Y_a$ is conditionally independent of its non-neighbors given its neighbors $X$, $Y_{a-1}$ and $Y_{a+1}$. (b) The interaction graph we use in the regulatory element prediction task. Vertices are the pattern types along with START and END. Edges connect pattern types that may be adjacent. Edges from START connect to pattern types that may be the first matches. Edges into END come from pattern types that may be the last matches.\n\nOur models assume that a pattern match is independent of other matches given its neighbors. That is, $Y_a$ is independent of $Y_{a'}$ for $a' < a - 1$ or $a' > a + 1$ given $X$, $Y_{a-1}$ and $Y_{a+1}$. 
This is analogous to the HMM assumption that the next state depends only on the current state. The conditional Markov network structure associated with this assumption is shown in Figure 3(a). The cliques in this graph are $\\{Y_a, Y_{a+1}, X\\}$ for $1 \\le a \\le L$. We denote the clique $\\{Y_a, Y_{a+1}, X\\}$ by $q_a$.\n\nWe define the clique potential of $q_a$ for $a \\neq 1$ as the product of a pattern match term $g(y_a, x)$ and a pattern interaction term $h(y_a, y_{a+1}, x)$. The functions $g()$ and $h()$ are shared among all cliques, so $\\phi_{q_a}(y_a, y_{a+1}, x) = g(y_a, x) \\times h(y_a, y_{a+1}, x)$ for $2 \\le a \\le L$. The first clique $q_1$ includes an additional start placement term $\\pi(y_1, x)$ that scores the type and position of the first match $y_1$. To ensure that real matches come before any null settings and that additional null settings do not affect $\\Pr(y \\mid x)$, we require that $g(\\text{null}, x) = 1$, $h(\\text{null}, \\text{null}, x) = 1$ and $h(\\text{null}, y_a, x) = 0$ for all $x$ and $y_a \\neq \\text{null}$. The pattern match term measures the agreement between the matched subsequence and the pattern type associated with $y_a$. In the genomics domain, our representation of the pattern match term is based on HMMs specific to each regulatory element type. The pattern interaction term measures the compatibility between the types and spacing (or overlap) of adjacent matches.\n\nA conditional Markov network for overlapping patterns (CMN-OP), $(g, h, \\pi)$, specifies a pattern match function $g$, a pattern interaction function $h$ and a start placement function $\\pi$ that define the conditional distribution\n\n$$\\Pr(y \\mid x) = \\frac{1}{Z(x)} \\prod_{a=1}^{L} \\phi_{q_a}(y_a, y_{a+1}, x) = \\frac{\\pi(y_1, x)}{Z(x)} \\prod_{a=1}^{L} g(y_a, x)\\, h(y_a, y_{a+1}, x),$$\n\nwhere $Z(x)$ is the normalizing partition function. 
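\n\nThe CMN-OP distribution can be illustrated by enumerating every configuration of a very short sequence and scoring it as $\\pi(y_1, x) \\prod_a g(y_a, x)\\, h(y_a, y_{a+1}, x)$. The functions `g`, `h` and `pi` below are hypothetical stand-ins for the paper's HMM-based and feature-based terms, and two further assumptions are flagged in the comments: a final real match followed by null contributes a factor of 1, and the empty configuration has score 1.\n\n```python\nimport itertools\nimport math\n\n# Toy CMN-OP (hypothetical sizes): sequence length L, pattern types 0..N-1.\n# A match is (i, j, k): start i, end j (0-based, inclusive) and type k.\nL, N = 4, 2\nMATCHES = [(i, j, k) for i in range(L) for j in range(i, L) for k in range(N)]\n\n\ndef g(m):\n    # Hypothetical pattern match term (a stand-in for the HMM log-odds)\n    i, j, k = m\n    return 1.0 + 0.1 * (j - i) + 0.2 * k\n\n\ndef h(prev, nxt):\n    # Hypothetical interaction term; here it depends only on the signed gap\n    gap = nxt[0] - prev[1]  # start of next match minus end of previous one\n    return math.exp(0.3 if gap > 0 else -0.3)\n\n\ndef pi(m):\n    # Hypothetical start placement term for the first match\n    return 1.0 / (1 + m[0])\n\n\ndef score(config):\n    if not config:\n        return 1.0  # assumption: the all-null configuration scores 1\n    s = pi(config[0])\n    for a, m in enumerate(config):\n        s *= g(m)\n        if a + 1 < len(config):\n            s *= h(m, config[a + 1])\n    return s  # assumption: h(last match, null) = 1; null g and h terms are 1\n\n\ndef configurations():\n    # Matches may overlap, but their start positions must strictly increase\n    for size in range(L + 1):\n        for combo in itertools.combinations(MATCHES, size):\n            starts = [m[0] for m in combo]\n            if all(s < t for s, t in zip(starts, starts[1:])):\n                yield combo\n\n\nconfigs = list(configurations())\nZ = sum(score(c) for c in configs)\nprint(len(configs))  # 945\n```\n\nDividing `score(c)` by `Z` gives $\\Pr(y \\mid x)$ for each configuration `c`. Even a length-4 sequence with two pattern types already has 945 configurations, which is why exact inference needs the forward pass of Section 3 rather than enumeration.\n\n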
Using the log-linear representation for $g()$ and $h()$, we have\n\n$$\\Pr(y \\mid x) = \\frac{\\pi(y_1, x)}{Z(x)} \\exp\\Big\\{\\sum_{a=1}^{L} \\big(\\mathbf{w}_g^T \\mathbf{f}_g(y_a, x) + \\mathbf{w}_h^T \\mathbf{f}_h(y_a, y_{a+1}, x)\\big)\\Big\\}.$$\n\nHere $\\mathbf{w}_g$, $\\mathbf{f}_g$, $\\mathbf{w}_h$ and $\\mathbf{f}_h$ are the weights and features of $g()$ and $h()$.\n\n¹ We only need to require configurations to be ordered sets. We make this slightly more stringent assumption to simplify the description of the model.\n\n2.1 Representation\n\nOur representation of the pattern match function $g()$ is based on HMMs. We construct an HMM with parameters $\\theta_k$ for each pattern type $P_k$, along with a single background HMM with parameters $\\theta_B$. The pattern match score of $y_a \\neq \\text{null}$ with subsequence $x_{i:j}$ and pattern type $P_k$ is the odds $\\Pr(x_{i:j} \\mid \\theta_k) / \\Pr(x_{i:j} \\mid \\theta_B)$. We have a feature $f_g^k(y_a, x)$ for each pattern type $P_k$ whose value is the logarithm of the odds if the pattern associated with $y_a$ is $P_k$ and zero otherwise. Currently, the weights $\\mathbf{w}_g$ are not trained and are fixed at 1. So, $\\mathbf{w}_g^T \\mathbf{f}_g(y_a, x) = f_g^k(y_a, x) = \\log\\big(\\Pr(x_{i:j} \\mid \\theta_k) / \\Pr(x_{i:j} \\mid \\theta_B)\\big)$, where $P_k$ is the pattern of $y_a$.\n\nOur representation of the pattern interaction function $h()$ consists of two components: (i) a directed graph $I$, called the interaction graph, that contains a vertex for each pattern type in $P$ along with special vertices START and END, and (ii) a set of weighted features for each edge in $I$. The interaction graph encodes qualitative domain knowledge about allowable orderings of pattern types. The value of $h(y_a, y_{a+1}, x)$ is non-zero only if there is an edge in $I$ from the pattern type associated with $y_a$ to the pattern type associated with $y_{a+1}$; in that case $h(y_a, y_{a+1}, x) = \\exp\\{\\mathbf{w}_h^T \\mathbf{f}_h(y_a, y_{a+1}, x)\\}$. Thus, any configuration with non-zero probability corresponds to a path through $I$. 
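\n\nThe log-odds feature $f_g^k$ can be sketched with the standard forward algorithm for HMMs. The models below are assumptions for illustration only (a one-state A/T-rich model standing in for $\\theta_k$ and a uniform background standing in for $\\theta_B$), not the promoter or terminator models from the paper; `forward_log_prob` itself accepts HMMs with any number of states, and positions are 0-based and inclusive.\n\n```python\nimport math\n\n\ndef forward_log_prob(seq, start, trans, emit):\n    # Forward algorithm: log Pr(seq | HMM), summing over all state paths.\n    # start[s], trans[s][t] and emit[s][symbol] are probabilities.\n    states = range(len(start))\n    alpha = [start[s] * emit[s][seq[0]] for s in states]\n    log_scale = 0.0\n    for sym in seq[1:]:\n        alpha = [sum(alpha[s] * trans[s][t] for s in states) * emit[t][sym]\n                 for t in states]\n        c = sum(alpha)  # rescale at each step to avoid underflow\n        log_scale += math.log(c)\n        alpha = [a / c for a in alpha]\n    return log_scale + math.log(sum(alpha))\n\n\ndef log_odds(x, i, j, pattern, background):\n    # f_g^k = log Pr(x_{i:j} | theta_k) - log Pr(x_{i:j} | theta_B)\n    sub = x[i:j + 1]\n    return forward_log_prob(sub, *pattern) - forward_log_prob(sub, *background)\n\n\nUNIFORM = {c: 0.25 for c in 'ACGT'}\nAT_RICH = {'A': 0.4, 'T': 0.4, 'C': 0.1, 'G': 0.1}\nBACKGROUND = ([1.0], [[1.0]], [UNIFORM])  # hypothetical theta_B\nPATTERN = ([1.0], [[1.0]], [AT_RICH])  # hypothetical one-state theta_k\n\nprint(round(log_odds('GATTACA', 1, 4, PATTERN, BACKGROUND), 4))  # 1.88\n```\n\nFor these one-state models the score of the subsequence ATTA is simply $4 \\log(0.4 / 0.25) = 4 \\log 1.6$; richer pattern HMMs change only the tuples passed in.\n\n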
Figure 3(b) shows the interaction graph we use to predict bacterial regulatory elements. It asserts that between the start positions of two genes there may be no element starts, a single terminator start, or zero or more promoter starts, with the requirement that all promoters start after the start of the terminator. Note that in CMN-OP models the interaction graph indicates legal orderings over the start positions of matches, not over complete matches as in an HMM.\n\nEach of the pattern interaction features $f \\in \\mathbf{f}_h$ is associated with an edge in the interaction graph $I$. Each edge $e$ in $I$ has a single bias feature $f_e^b$ and a set of distance features $\\mathbf{f}_e^D$. The value of $f_e^b(y_a, y_{a+1}, x)$ is 1 if the pattern types connected by $e$ correspond to the types associated with $y_a$ and $y_{a+1}$, and 0 otherwise. The distance features for edge $e$ provide a discretized representation of the distance between (or amount of overlap of) two adjacent matches of types consistent with $e$. We associate each distance feature $f_e^r \\in \\mathbf{f}_e^D$ with a range $r$. The value of $f_e^r(y_a, y_{a+1}, x)$ is 1 if the (possibly negative) difference between the start position of $y_{a+1}$ and the end position of $y_a$ is in $r$, and 0 otherwise. The ranges for a given edge are non-overlapping. 
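\n\nThe bias and distance features just described can be sketched as follows. The graph edges, distance ranges and weights are hypothetical placeholders (the paper's interaction graph appears only in Figure 3(b), and its learned weights are not given); the sketch shows how a pair of adjacent matches selects one edge and one distance range, and hence one bias weight and one distance weight.\n\n```python\nimport math\n\n# Hypothetical interaction graph I over pattern types (START and END omitted)\nEDGES = {('gene', 'gene'), ('gene', 'term'), ('gene', 'prom'), ('term', 'prom'),\n         ('term', 'gene'), ('prom', 'prom'), ('prom', 'gene')}\n# Non-overlapping half-open ranges [lo, hi) for the signed gap d\nBINS = [(-10**9, -50), (-50, 0), (0, 50), (50, 10**9)]\nBIAS_W = {e: 0.0 for e in EDGES}  # w_e^b, illustrative values only\nBIAS_W[('term', 'prom')] = 0.5\nDIST_W = {e: [0.0, 0.2, 0.4, -0.1] for e in EDGES}  # w_e^r, one per range\n\n\ndef h(prev, nxt):\n    # prev, nxt: (start, end, type) of adjacent matches y_a and y_{a+1}\n    e = (prev[2], nxt[2])\n    if e not in EDGES:\n        return 0.0  # no edge in I: this ordering of types is disallowed\n    d = nxt[0] - prev[1]  # negative when y_{a+1} starts before y_a ends\n    r = next(b for b, (lo, hi) in enumerate(BINS) if lo <= d < hi)\n    return math.exp(BIAS_W[e] + DIST_W[e][r])  # exp(w_e^b + w_e^r)\n\n\nprint(h((100, 140, 'term'), (130, 170, 'prom')))  # d = -10, second range\nprint(h((100, 140, 'term'), (200, 240, 'term')))  # 0.0\n```\n\n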
So, $h(y_a, y_{a+1}, x) = \\exp\\{\\mathbf{w}_h^T \\mathbf{f}_h(y_a, y_{a+1}, x)\\} = \\exp(w_e^b + w_e^r)$, where $e$ is the edge for $y_a$ and $y_{a+1}$, $w_e^b$ is the weight of the bias feature $f_e^b$, and $w_e^r$ is the weight of the single distance feature $f_e^r$ whose range contains the spacing between the matches of $y_a$ and $y_{a+1}$.\n\n3 Inference and Training\n\nGiven a trained model with weights $\\mathbf{w}$ and an input sequence $x$, the inference task is to determine properties of the distribution $\\Pr(y \\mid x)$. Since the cliques of a CMN-OP form a chain, we could perform exact inference with the belief propagation (BP) algorithm [8]. However, the number of joint settings in one clique grows as $O(L^4)$, giving BP a running time of $O(L^5)$, which is impractical for longer sequences. The exact inference procedure we use, which is inspired by the energy minimization algorithm for pictorial structures [3], runs in $O(L^2)$ time.\n\nOur inference procedure exploits two properties of our representation of the pattern interaction function $h()$. First, we use the invariance of $h(y_a, y_{a+1}, x)$ to the start position of $y_a$ and the end position of $y_{a+1}$. In this section, we make this explicit by writing $h(y_a, y_{a+1}, x)$ as $h(k, k', d)$, where $k$ and $k'$ are the pattern types of $y_a$ and $y_{a+1}$ respectively and $d$ is the distance between (or overlap of, if negative) $y_a$ and $y_{a+1}$. The second property we use is the fact that the difference between $h(k, k', d)$ and $h(k, k', d + 1)$ is non-zero only if $d$ is the maximum value of the range of one of the distance features $f_e^r \\in \\mathbf{f}_e^D$ associated with the edge $e = k \\rightarrow k'$.\n\nThe inference procedure we use for our CMN-OP models consists of a forward pass and a backward pass. Due to space limitations, we only describe the key aspects of the forward pass. 
The forward pass fills an $L \\times L \\times N$ matrix $F$, where we define $F(i, j, k)$ to be the sum of the scores of all partial configurations $\\tilde{y}$ that end with $y^*$, where $y^*$ is the match of $x_{i:j}$ to $P_k$:\n\n$$F(i, j, k) \\equiv g(y^*, x) \\sum_{\\tilde{y}} \\pi(y_1, x) \\prod_{y_a \\in (\\tilde{y} \\setminus y^*)} g(y_a, x)\\, h(y_a, y_{a+1}, x).$$\n\nHere $\\tilde{y} = (y_1, y_2, \\ldots, y^*)$ and $\\setminus$ denotes set difference.\n\n$F$ has a recursive formulation:\n\n$$F(i, j, k) = g_k(y^*, x) \\Big[\\pi_k(i) + \\sum_{i'=1}^{i-1} \\sum_{j'=i'}^{L} \\sum_{k'=1}^{N} F(i', j', k')\\, h(k', k, i - j')\\Big].$$\n\nThe triple sum is over all possible adjacent previous matches. Due to the first property of $h$ just discussed, the value of the triple sum for setting $F(i, j, k)$ and $F(i, j', k)$ is the same for any $j'$. We cache the value of the triple sum in the $L \\times N$ matrix $F_{in}$, where $F_{in}(i, k)$ holds the value needed for setting $F(i, j', k)$ for any $j'$.\n\nWe begin the forward pass with $i = 1$ and set the values of $F(1, j, k)$ for all $j$ and $k$ before incrementing $i$. After $i$ is incremented, we use the second property of $h$ to update $F_{in}$ in time $O(N^2 B)$, which is independent of the sequence length $L$, where $B$ is the number of \"bins\" used in our discretized representation of distance. The overall time complexity of the forward pass is $O(LN^2B + L^2N)$. The first term is for updating $F_{in}$ and the second term is for the constant-time setting of the $O(L^2N)$ elements of $F$. 
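\n\nA minimal sketch of the forward recursion, under stated assumptions: `g`, `h` and `pi` are hypothetical toy terms, positions are 0-based, a final match followed by null contributes a factor of 1, and the cache `fin` is recomputed from scratch for each start position rather than updated in $O(N^2 B)$ time using the second property of $h$. It therefore does not reach the paper's running time, but it shows the recursion, the reuse of one cached sum for every end position $j$, and a check against brute-force enumeration.\n\n```python\nimport itertools\nimport math\n\n# Toy sizes and terms (hypothetical); h depends only on (k', k, d) with d = i - j'.\nL, N = 5, 2\n\n\ndef g(i, j, k):\n    return 1.0 + 0.1 * (j - i) + 0.3 * k\n\n\ndef h(kp, k, d):\n    return math.exp(0.2 * kp - 0.1 * k + (0.25 if d > 0 else -0.25))\n\n\ndef pi(i, k):\n    return 0.5 ** i\n\n\ndef forward():\n    F = {}\n    for i in range(L):\n        # fin[k]: triple sum over all earlier matches (i' < i, any j' >= i', any k').\n        # It does not depend on the end j of the new match, so it is computed once.\n        fin = [sum(F[(ip, jp, kp)] * h(kp, k, i - jp)\n                   for ip in range(i) for jp in range(ip, L) for kp in range(N))\n               for k in range(N)]\n        for j in range(i, L):\n            for k in range(N):\n                F[(i, j, k)] = g(i, j, k) * (pi(i, k) + fin[k])\n    return F\n\n\ndef brute_force_total():\n    # Sum of the scores of all non-empty configurations, by enumeration\n    matches = [(i, j, k) for i in range(L) for j in range(i, L) for k in range(N)]\n    total = 0.0\n    for size in range(1, L + 1):\n        for combo in itertools.combinations(matches, size):\n            if any(a[0] >= b[0] for a, b in zip(combo, combo[1:])):\n                continue  # start positions must strictly increase\n            s = pi(combo[0][0], combo[0][2])\n            for a, m in enumerate(combo):\n                s *= g(*m)\n                if a + 1 < size:\n                    nxt = combo[a + 1]\n                    s *= h(m[2], nxt[2], nxt[0] - m[1])\n            total += s\n    return total\n\n\nF = forward()\n# Each non-empty configuration has exactly one last match, so summing F over all\n# (i, j, k) counts every non-empty configuration's score exactly once.\nprint(math.isclose(sum(F.values()), brute_force_total()))  # True\n```\n\n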
If the sequence length $L$ dominates $N$ and $B$, as it does in the gene regulation domain, the effective running time is $O(L^2)$.\n\nTraining involves estimating the weights $\\mathbf{w}$ from a training set $D$. An element $d$ of $D$ is a pair $(x_d, \\hat{y}_d)$, where $x_d$ is a fully observable sequence and $\\hat{y}_d$ is a partially observable configuration for $x_d$. To help avoid overfitting, we assume a zero-mean Gaussian prior over the weights and optimize the log of the MAP objective function following Taskar et al. [10]:\n\n$$\\mathcal{L}(\\mathbf{w}, D) = \\sum_{d \\in D} \\log \\Pr(\\hat{y}_d \\mid x_d) - \\frac{\\mathbf{w}^T \\mathbf{w}}{2\\sigma^2}.$$\n\nThe value of the gradient $\\nabla \\mathcal{L}(\\mathbf{w}, D)$ in the direction of weight $w \\in \\mathbf{w}$ is\n\n$$\\frac{\\partial \\mathcal{L}(\\mathbf{w}, D)}{\\partial w} = \\sum_{d \\in D} \\big(E[C_w \\mid x_d, \\hat{y}_d] - E[C_w \\mid x_d]\\big) - \\frac{w}{\\sigma^2},$$\n\nwhere $C_w$ is a random variable representing the number of times the binary feature of $w$ is 1. The expectation is relative to $\\Pr(y \\mid x)$ defined by the current setting of $\\mathbf{w}$. The value in the summation is the difference between the expected number of times $w$ is used given both $x$ and $\\hat{y}$ and the expected number of times $w$ is used given just $x$. The last term is the shrinking effect of the prior. 
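\n\nThe gradient formula can be checked numerically on a toy problem. This is an illustrative sketch, not the CMN-OP training code: it assumes a hypothetical log-linear chain over binary labels with two made-up features, marks unobserved positions of a partially observed $\\hat{y}_d$ with `None`, computes both expectations by enumeration, and compares the analytic gradient with central finite differences of $\\mathcal{L}(\\mathbf{w}, D)$.\n\n```python\nimport itertools\nimport math\n\n\ndef counts(x, y):\n    # C_w for each weight: [#positions with y_a == x_a, #adjacent equal labels]\n    c0 = sum(1.0 for a in range(len(y)) if y[a] == x[a])\n    c1 = sum(1.0 for a in range(len(y) - 1) if y[a] == y[a + 1])\n    return [c0, c1]\n\n\ndef consistent(y_hat):\n    # Label vectors that agree with y_hat wherever it is observed (not None)\n    return lambda y: all(o is None or o == v for o, v in zip(y_hat, y))\n\n\ndef stats(w, x, allowed):\n    # (E[C | x, allowed], log of the partition function restricted to allowed)\n    ys = [y for y in itertools.product([0, 1], repeat=len(x)) if allowed(y)]\n    ps = [math.exp(sum(wi * ci for wi, ci in zip(w, counts(x, y)))) for y in ys]\n    z = sum(ps)\n    exp_c = [sum(p * counts(x, y)[b] for p, y in zip(ps, ys)) / z\n             for b in range(len(w))]\n    return exp_c, math.log(z)\n\n\ndef objective(w, data, sigma2):\n    # sum_d log Pr(y_hat_d | x_d) - w^T w / (2 sigma^2)\n    total = 0.0\n    for x, y_hat in data:\n        total += stats(w, x, consistent(y_hat))[1] - stats(w, x, lambda y: True)[1]\n    return total - sum(wi * wi for wi in w) / (2 * sigma2)\n\n\ndef gradient(w, data, sigma2):\n    grad = [-wi / sigma2 for wi in w]  # shrinkage term from the Gaussian prior\n    for x, y_hat in data:\n        e_obs = stats(w, x, consistent(y_hat))[0]\n        e_all = stats(w, x, lambda y: True)[0]\n        grad = [gb + o - a for gb, o, a in zip(grad, e_obs, e_all)]\n    return grad\n\n\ndef bump(w, b, delta):\n    # Copy of w with weight b shifted by delta\n    return [wi + (delta if c == b else 0.0) for c, wi in enumerate(w)]\n\n\ndata = [([1, 0, 0], [1, None, 0]), ([0, 1, 1], [0, 1, 1])]\nw, sigma2, eps = [0.4, -0.2], 2.0, 1e-6\nnumeric = [(objective(bump(w, b, eps), data, sigma2)\n            - objective(bump(w, b, -eps), data, sigma2)) / (2 * eps)\n           for b in range(len(w))]\nprint(all(abs(n - a) < 1e-5 for n, a in zip(numeric, gradient(w, data, sigma2))))  # True\n```\n\nFor real CMN-OP models the two expectations would come from forward and backward passes rather than enumeration, with the resulting gradient handed to an optimizer such as BFGS.\n\n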
With the gradient in hand, we can use any of a number of optimization procedures to set $\\mathbf{w}$. We use the quasi-Newton method BFGS [6].\n\n4 Empirical Evaluation\n\nIn this section we evaluate our Markov network approach by applying it to recognize regulatory signals in the E. coli genome. Our hypothesis is that the CMN-OP models will provide more accurate predictions than either of two baselines: (i) predicting the signals independently, and (ii) predicting the signals using an HMM.\n\n[Figure 4: precision-recall plots for (a) Promoters, (b) Terminators and (c) Overlapping Terminators; each plots Precision (0 to 1) against Recall (0 to 1) for CMN-OP, HMM and SCAN.]\n\nFigure 4: Precision-recall curves for the CMN-OP, HMM and SCAN models on (a) the promoter localization task, (b) the terminator localization task and (c) the terminator localization task for terminators known to overlap genes or promoters.\n\nAll three approaches we evaluate (the Markov networks and the two baselines) employ two submodels [1]. The first submodel is an HMM that is used to predict candidate promoters and the second submodel is a stochastic context-free grammar (SCFG) that is used to predict candidate terminators. The first baseline approach, which we refer to as SCAN, involves \"scanning\" a promoter model and a terminator model along each sequence being processed and, at each position, producing a score indicating the likelihood that a promoter or terminator starts at that position. With this baseline, each prediction is made independently of all other predictions. The second baseline is an HMM similar to the one depicted in Figure 1(b). The HMM that we use here does not contain the gene submodel shown in Figure 1(b) because the sequences we use in our experiments do not contain entire genes. We have the HMM and CMN-OP models make terminator and promoter predictions for each position in each test sequence. 
We do this using posterior decoding, which involves having a model compute the posterior probability, given the whole sequence, that a promoter (terminator) ends at a specified position.\n\nThe data set we use consists of 2,876 subsequences of the E. coli genome that collectively contain 471 known promoters and 211 known terminators. Using tenfold cross-validation, we evaluate the three methods by considering how well each method is able to localize predicted promoters and terminators in the test sequences. Under this evaluation criterion, a correct prediction predicts a promoter (terminator) within $k$ bases of an actual promoter (terminator). We set $k$ to 10 for promoters and to 25 for terminators. For all methods, we plot precision-recall (PR) curves by varying a threshold on the prediction confidences. Recall is defined as $\\frac{TP}{TP + FN}$ and precision is defined as $\\frac{TP}{TP + FP}$, where $TP$ is the number of true positive predictions, $FN$ is the number of false negatives, and $FP$ is the number of false positives.\n\nFigures 4(a) and 4(b) show PR curves for the promoter and terminator localization tasks, respectively. In both cases, the HMM and CMN-OP models are clearly superior to the SCAN models. This result indicates the value of taking the regularities of relationships among these signals into account when making predictions. For the case of localizing terminators, the CMN-OP PR curve dominates the curve for the HMMs. 
The difference is not so marked for promoter localization, however. Although the CMN-OP curve is better at high recall levels, the HMM curve is somewhat better at low recall levels. Overall, we conclude that these results show the benefits of representing relationships among predicted signals (as is done in the HMMs and CMN-OP models) and of being able to represent and predict overlapping signals. Figure 4(c) shows the PR curves specifically for filtered test sets in which each actual terminator overlaps either a gene or a promoter. These curves indicate that the CMN-OP models have a particular advantage in these cases.\n\n5 Conclusion\n\nWe have presented an approach, based on Markov networks, that is able to naturally represent and predict overlapping sequence elements. Our approach first generates a set of candidate elements by having a set of models independently make predictions. Then, we construct a Markov network to decide which candidate predictions to accept. We have empirically validated our approach by using it to recognize promoter and terminator \"signals\" in a bacterial genome. Our experiments demonstrate that our approach provides more accurate predictions than baseline HMM models.\n\nAlthough we describe and evaluate our approach in the context of genomics, we believe that it has other applications as well. Consider, for example, the task of segmenting and indexing audio and video streams [7]. We might want to annotate segments of a stream that correspond to specific types of events or to particular individuals who appear or are speaking. There may well be overlapping events and appearances of people, and there are likely to be dependencies among events and appearances. 
Any problem with these two properties is a good candidate for our Markov-network approach.\n\nAcknowledgments\n\nThis research was supported in part by NSF grant IIS-0093016, and NIH grants T15-LM07359-01 and R01-LM07050-01.\n\nReferences\n\n[1] J. Bockhorst, Y. Qiu, J. Glasner, M. Liu, F. Blattner, and M. Craven. Predicting bacterial transcription units using sequence and expression data. Bioinformatics, 19(Suppl. 1):i34-i43, 2003.\n\n[2] M. Ermolaeva, H. Khalak, O. White, H. Smith, and S. Salzberg. Prediction of transcription terminators in bacterial genomes. J. of Molecular Biology, 301:27-33, 2000.\n\n[3] P. Felzenszwalb and D. Huttenlocher. Efficient matching of pictorial structures. In Proc. of the 2000 IEEE Conf. on Computer Vision and Pattern Recognition, pages 66-75.\n\n[4] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245-273, 1997.\n\n[5] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of the 18th Internat. Conf. on Machine Learning, pages 282-289, Williamstown, MA, 2001. Morgan Kaufmann.\n\n[6] R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. Sixth workshop on computational language learning (CoNLL), 2002.\n\n[7] National Institute of Standards and Technology. TREC video retrieval evaluation (TRECVID), 2004. http://www-nlpir.nist.gov/projects/t01v/.\n\n[8] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.\n\n[9] A. Pedersen, P. Baldi, S. Brunak, and Y. Chauvin. Characterization of prokaryotic and eukaryotic promoters using hidden Markov models. In Proc. of the 4th International Conf. on Intelligent Systems for Molecular Biology, pages 182-191, St. Louis, MO, 1996. AAAI Press.\n\n[10] B. Taskar, P. Abbeel, and D. Koller. 
Discriminative probabilistic models for relational data. In Proc. of the 18th International Conf. on Uncertainty in Artificial Intelligence, Edmonton, Alberta, 2002. Morgan Kaufmann.\n\n[11] T. Yada, Y. Totoki, T. Takagi, and K. Nakai. A novel bacterial gene-finding system with improved accuracy in locating start codons. DNA Research, 8(3):97-106, 2001.\n", "award": [], "sourceid": 2546, "authors": [{"given_name": "Mark", "family_name": "Craven", "institution": null}, {"given_name": "Joseph", "family_name": "Bockhorst", "institution": null}]}