{"title": "Pictorial Structures for Molecular Modeling: Interpreting Density Maps", "book": "Advances in Neural Information Processing Systems", "page_first": 369, "page_last": 376, "abstract": null, "full_text": "Pictorial Structures for Molecular Modeling: Interpreting Density Maps

Frank DiMaio, Jude Shavlik
Department of Computer Sciences
University of Wisconsin-Madison
{dimaio,shavlik}@cs.wisc.edu

George Phillips
Department of Biochemistry
University of Wisconsin-Madison
phillips@biochem.wisc.edu

Abstract

X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing root-mean-squared errors ranging from 1.38 to 1.84Å.

1 Introduction

An important question in molecular biology is: what is the structure of a particular protein? Knowledge of a protein's unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use "wet" laboratory methods to elucidate the structure of proteins.
The most common such method is x-ray crystallography, a rather tedious process in which x-rays are shot through a crystal of purified protein, producing a pattern of spots (or reflections) which is processed, yielding an electron density map. The density map is analogous to a three-dimensional image of the protein. The final step of x-ray crystallography, referred to as interpreting the map, involves fitting a complete molecular model (that is, the position of each atom) of the protein into the map. Interpretation is typically performed by a crystallographer using a time-consuming manual process. With large research efforts being put into high-throughput structural genomics, accelerating this process is important. We investigate speeding up x-ray crystallography by automating this time-consuming step.

When interpreting a density map, the amino-acid sequence of the protein is known in advance, giving the complete topology of the protein. However, the intractably large conformational space of a protein, with hundreds of amino acids and thousands of atoms, makes automated map interpretation challenging. A few groups have attempted automatic interpretation, with varying success [1,2,3,4].

Figure 1: This graphic illustrates density map quality at various resolutions (panels at 1Å, 2Å, 3Å, 4Å, and 5Å). All resolutions depict the same alpha helix structure.

Confounding the problem are several sources of error that make automated interpretation extremely difficult. The primary difficulty is that the crystal diffracts only to a limited extent, which eliminates the higher-frequency components of the density map and produces an overall blurring of the map. This blurring is quantified as the resolution of the density map and is illustrated in Figure 1. Noise inherent in data collection further complicates interpretation.
Given minimal noise and sufficiently good resolution (about 2.3Å or less), automated density map interpretation is essentially solved [1]. However, in poorer-quality maps, interpretation is difficult and inaccurate, and other automated approaches have failed.

The remainder of the paper describes DEFT (DEFormable Template), our computational framework for building a flexible three-dimensional model of a molecule, which is then used to locate patterns in the electron density map.

2 Pictorial structures

Pictorial structures model classes of objects as a single flexible template. The template represents the object class as a collection of parts linked in a graph structure. Each edge defines a relationship between the two parts it connects. For example, a pictorial structure for a face may include the parts "left eye" and "right eye." Edges connecting these parts could enforce the constraint that the left eye is adjacent to the right eye. A dynamic programming (DP) matching algorithm of Felzenszwalb and Huttenlocher (hereafter referred to as the F-H matching algorithm) [5] allows pictorial structures to be quickly matched into a two-dimensional image. The matching algorithm finds the globally optimal position and orientation of each part in the pictorial structure, assuming that the position of each part is conditionally independent of the rest of the structure given its neighbors.

Formally, we represent the pictorial structure as a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of parts, and an edge $e_{ij} \in E$ connects neighboring parts $v_i$ and $v_j$ if an explicit dependency exists between the configurations of the corresponding parts. Each part $v_i$ is assigned a configuration $l_i$ describing the part's position and orientation in the image.
We assume Markov independence: the probability distribution over a part's configurations is conditionally independent of every other part's configuration, given the configurations of all the part's neighbors in the graph. We assign each edge a deformation cost $d_{ij}(l_i, l_j)$, and each part a "mismatch" cost $m_i(l_i, I)$. These functions are the negative log likelihoods of a part (or pair of parts) taking a specified configuration, given the pictorial structure model.

The matching algorithm places the model into the image using maximum likelihood. That is, it finds the configuration $L$ of parts in model $\Theta$ in image $I$ maximizing

$$P(L \mid I, \Theta) \propto P(I \mid L, \Theta)\,P(L \mid \Theta) = \frac{1}{Z}\exp\Big(-\sum_{v_i \in V} m_i(l_i, I)\Big) \cdot \exp\Big(-\sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)\Big). \tag{1}$$

By monotonicity of exponentiation, this minimizes

$$\sum_{v_i \in V} m_i(l_i, I) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j).$$

The F-H matching algorithm places several additional limitations on the pictorial structure. The object's graph must be tree structured (cyclic constraints are not allowed), and the deformation cost function must take the form $d_{ij}(l_i, l_j) = \lVert T_{ij}(l_i) - T_{ji}(l_j) \rVert$, where $T_{ij}$ and $T_{ji}$ are arbitrary functions and $\lVert\cdot\rVert$ is some norm (e.g., Euclidean distance).

Figure 2. An "interpreted" density map. The right figure shows the arrangement of atoms that generated the observed density.

Figure 3. An example of the construction of a pictorial structure model given an amino acid (atoms shown: N, Cα, Cβ, C, O).

3 Building a flexible atomic model

Given a three-dimensional map containing a large molecule and the topology (i.e., for proteins, the amino-acid sequence) of that molecule, our task is to determine the Cartesian coordinates in the 3D density map of each atom in the molecule. Figure 2 shows a sample interpreted density map. DEFT finds the coordinates of all atoms simultaneously by first building a pictorial structure corresponding to the protein, then using F-H matching to optimally place the model into the density map. This section describes DEFT's deformation cost function and matching cost function.

DEFT's deformation cost is related to the probability of observing a particular configuration of a molecule. Ideally, this cost would be derived from the molecule's potential energy function, since configurations with lower potential energy are more likely to be observed in nature. However, this potential is quite complicated and cannot be accurately approximated in a tree-structured pictorial structure graph. Our solution is to consider only the relationships between covalently bonded atoms. DEFT constructs a pictorial structure graph where vertices correspond to non-hydrogen atoms, and edges correspond to the covalent bonds joining atoms. The cost function each edge defines maintains invariants (interatomic distance and bond angles) while allowing free rotation around the bond. Given the protein's amino-acid sequence, model construction, illustrated in Figure 3, is trivial. Each part's configuration is defined by six parameters: three translational and three rotational (Euler angles α, β, and γ).
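To make the matching step concrete (DEFT places the whole covalent-bond tree at once with F-H matching), here is a minimal sketch of min-sum dynamic programming on a tree-structured model. The simplifying assumptions are ours, not the paper's: each part has a small, already-discretized set of candidate configurations; mismatch costs $m_i$ arrive as arrays; and deformation costs $d_{ij}$ arrive as dense parent-by-child matrices. The name `match_tree` is hypothetical. The inner minimization here is brute force, O(h²) per edge for h candidates, whereas the F-H algorithm exploits the $\lVert T_{ij}(l_i) - T_{ji}(l_j) \rVert$ form to carry it out much faster with generalized distance transforms.

```python
from collections import deque

import numpy as np


def match_tree(parent, mismatch, deform):
    # Min-sum dynamic programming on a rooted tree of parts.
    # parent[k]   : index of part k's parent, or -1 for the root.
    # mismatch[k] : array (h_k,), mismatch cost m_k of each candidate configuration.
    # deform[k]   : array (h_parent, h_k), deformation cost between parent and part k
    #               (unused for the root).
    # Returns (minimum total cost, chosen configuration index for every part).
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for k, p in enumerate(parent):
        if p < 0:
            root = k
        else:
            children[p].append(k)

    # Breadth-first order from the root; reversed, every child precedes its parent.
    order, queue = [], deque([root])
    while queue:
        k = queue.popleft()
        order.append(k)
        queue.extend(children[k])

    cost = [np.array(m, dtype=float) for m in mismatch]  # subtree costs, filled bottom-up
    best = [None] * n  # best[k][a]: best configuration of part k if its parent takes a
    for k in reversed(order):
        if k == root:
            continue
        # total[a, b] = d(parent at a, part k at b) + cost of k's subtree with k at b
        total = np.asarray(deform[k], dtype=float) + cost[k][None, :]
        best[k] = total.argmin(axis=1)
        cost[parent[k]] += total.min(axis=1)

    # Backtrack top-down to read off the optimal configuration of every part.
    choice = [0] * n
    choice[root] = int(cost[root].argmin())
    for k in order:
        if k != root:
            choice[k] = int(best[k][choice[parent[k]]])
    return float(cost[root].min()), choice


# Toy usage: a three-atom chain (part 0 is the root) with two candidate placements each.
m = [np.array([0.0, 1.0]), np.array([2.0, 0.0]), np.array([0.5, 0.5])]
d = [None, np.array([[0.0, 3.0], [3.0, 0.0]]), np.array([[0.0, 1.0], [1.0, 0.0]])]
print(match_tree([-1, 0, 1], m, d))  # (1.5, [1, 1, 1])
```

For a covalent-bond tree, `parent` would come from pointing every bond toward an arbitrarily chosen root atom, which is the same directed form the screw-joint definition relies on.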
For the cost function, we define a new connection type in the pictorial structure framework, the screw-joint, shown in Figure 4. The screw-joint's cost function is mathematically specified in terms of a directed version of the pictorial structure's undirected graph. Since the fast matching algorithm constrains the graph to be a tree, we arbitrarily pick a root node and point every edge toward this root. We now define the screw-joint in terms of a parent and a child. We rotate the child such that its z-axis is coincident with the vector from child to parent, and allow each part in the model (that is, each atom) to rotate freely about its local z-axis. The ideal geometry between child and parent is then described by three parameters stored at each edge, $\mathbf{x}_{ij} = (x_{ij}, y_{ij}, z_{ij})$. These three parameters define the optimal translation between parent and child, in the coordinate system of the parent (which in turn is defined such that its z-axis corresponds to the axis connecting it to its parent).

In using these to construct the cost function $d_{ij}$, we define the function $T_{ij}$, which maps a parent $v_i$'s configuration $l_i$ into the configuration $l_j$ of that parent's ideal child $v_j$.
Given parameters $\mathbf{x}_{ij}$ on the edge between $v_i$ and $v_j$, the function is defined as

$$T_{ij}(x_i, y_i, z_i, \alpha_i, \beta_i, \gamma_i) = (x_j, y_j, z_j, \alpha_j, \beta_j, \gamma_j) \tag{2}$$

with

$$\langle x_j, y_j, z_j \rangle = \langle x_i, y_i, z_i \rangle + \langle x', y', z' \rangle, \qquad \alpha_j = \alpha_i,$$
$$\beta_j = \operatorname{atan2}\!\left(\sqrt{x'^2 + y'^2},\; -z'\right), \qquad \gamma_j = \operatorname{atan2}(y', x') + \frac{\pi}{2},$$

and

$$(x', y', z')^T = R_{\alpha_i, \beta_i, \gamma_i}\,(x_{ij}, y_{ij}, z_{ij})^T,$$

where $(x', y', z')$ is the rotation of the bond parameters $(x_{ij}, y_{ij}, z_{ij})$ into world coordinates. That is, $R_{\alpha_i, \beta_i, \gamma_i}$ is the rotation matrix corresponding to Euler angles $(\alpha_i, \beta_i, \gamma_i)$. The expressions for $\beta_j$ and $\gamma_j$ define the optimal orientation of each child: +z coincident with the axis that connects child and parent.

The F-H matching algorithm requires our cost function to take a particular form; specifically, it must be some norm. The screw-joint model sets the deformation cost between parent $v_i$ and child $v_j$ to the distance between child configuration $l_j$ and $T_{ij}(l_i)$, the ideal child configuration given parent configuration $l_i$ ($T_{ji}$ in the F-H cost form is simply the identity function).
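Equation (2) and the resulting screw-joint distance can be checked numerically with the sketch below. It assumes a specific Euler-angle convention, $R_{\alpha,\beta,\gamma} = R_z(\gamma)\,R_x(-\beta)\,R_z(\alpha)$, chosen because under it the $\beta_j$ and $\gamma_j$ expressions of equation (2) do point the child's local z-axis at its parent; the paper does not state its convention, so treat this as an assumption. All function names are hypothetical. The default weights follow the values DEFT uses for its weighted 1-norm ($w^{\mathrm{rotate}} = 0$, $w^{\mathrm{orient}} = w^{\mathrm{translate}} = 100$, equation (3)).

```python
import numpy as np


def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def euler_matrix(alpha, beta, gamma):
    # ASSUMED convention (not stated in the paper): spin by alpha about the part's
    # local z-axis (its bond), tilt by beta, then turn by gamma about world z.
    # Under this choice, equation (2) aims the child's local z-axis at its parent.
    return rot_z(gamma) @ rot_x(-beta) @ rot_z(alpha)


def ideal_child(parent_conf, bond):
    # T_ij from equation (2): parent configuration (x, y, z, alpha, beta, gamma) plus
    # learned bond parameters (x_ij, y_ij, z_ij) -> ideal child configuration.
    x, y, z, alpha, beta, gamma = parent_conf
    xp, yp, zp = euler_matrix(alpha, beta, gamma) @ np.asarray(bond, dtype=float)
    beta_j = np.arctan2(np.hypot(xp, yp), -zp)
    gamma_j = np.arctan2(yp, xp) + np.pi / 2
    return np.array([x + xp, y + yp, z + zp, alpha, beta_j, gamma_j])


def screw_joint_cost(parent_conf, child_conf, bond,
                     w_rotate=0.0, w_orient=100.0, w_translate=100.0):
    # d_ij: weighted 1-norm between the child's configuration and T_ij(parent).
    # Angle differences are not wrapped to [-pi, pi], mirroring the plain 1-norm.
    diff = np.abs(ideal_child(parent_conf, bond) - np.asarray(child_conf, dtype=float))
    return float(w_translate * diff[:3].sum()
                 + w_rotate * diff[3]
                 + w_orient * (diff[4] + diff[5]))
```

For instance, `ideal_child(np.zeros(6), (0.0, 0.0, 1.5))` places the child 1.5 units along +z with $\beta_j = \pi$, so its local z-axis points back down at the parent. Because `w_rotate` is 0, spinning the child about its bond leaves the cost unchanged, which is exactly the free rotation the screw-joint is meant to allow.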
We use the 1-norm weighted in each dimension,

$$\begin{aligned} d_{ij}(l_i, l_j) &= \lVert T_{ij}(l_i) - l_j \rVert \\ &= w^{\mathrm{rotate}}_{ij}\,\lvert \alpha_j - \alpha_i \rvert + w^{\mathrm{orient}}_{ij}\left( \left\lvert \beta_j - \operatorname{atan2}\!\left(\sqrt{x'^2 + y'^2}, -z'\right) \right\rvert + \left\lvert \gamma_j - \operatorname{atan2}(y', x') - \frac{\pi}{2} \right\rvert \right) \\ &\quad + w^{\mathrm{translate}}_{ij}\left( \lvert x_j - x_i - x' \rvert + \lvert y_j - y_i - y' \rvert + \lvert z_j - z_i - z' \rvert \right). \end{aligned} \tag{3}$$

In the above equation, $w^{\mathrm{rotate}}_{ij}$ is the cost of rotating about a bond, $w^{\mathrm{orient}}_{ij}$ is the cost of rotating around any other axis, and $w^{\mathrm{translate}}_{ij}$ is the cost of translating in x, y, or z. DEFT's screw-joint model sets $w^{\mathrm{rotate}}_{ij}$ to 0, and $w^{\mathrm{orient}}_{ij}$ and $w^{\mathrm{translate}}_{ij}$ to +100.

DEFT's match-cost function implementation is based upon Cowtan's fffear algorithm [4]. This algorithm quickly and efficiently calculates the mean squared distance between a weighted 3D template of density and a region in a density map. Given a learned template and a corresponding weight function, fffear uses a Fourier convolution to determine the maximum likelihood that the weighted template generated a region of density in the density map.

For each non-hydrogen atom in the protein, we create a target template corresponding to a neighborhood around that particular atom, using a training set of crystallographer-solved structures. We build a separate template for each atom type (e.g., the β-carbon of leucine, which is the first sidechain carbon, and the backbone oxygen of serine), producing 171 different templates in total. A part's $m$ function is the fffear-computed mismatch score of that part's template over all positions and orientations.

Once we construct the model, parameters (including the optimal orientation $\mathbf{x}_{ij}$ corresponding to each edge, and the template for each part) are learned by training the model on a set of crystallographer-determined structures. Learning the orientation parameters is fairly simple: for each atom we define canonic coordinates (where +z corresponds to the axis of rotation). For each child, we record the distance $r$ and orientation $(\theta, \phi)$ in the canonic coordinate frame. We average over all atoms of a given type in our training set (e.g., over all leucine β-carbons) to determine average parameters $r_{avg}$, $\theta_{avg}$, and $\phi_{avg}$. Converting these averages from spherical to Cartesian coordinates gives the ideal orientation parameters $\mathbf{x}_{ij}$.

Figure 4: The screw-joint connection between two parts in the model. In the directed version of the MRF, $v_i$ is the parent of $v_j$. By definition, $v_j$ is oriented such that its local z-axis is coincident with its ideal bond orientation $\mathbf{x}_{ij} = (x_{ij}, y_{ij}, z_{ij})^T$ in $v_i$. Bond parameters $\mathbf{x}_{ij}$ are learned by DEFT. (Diagram labels: $v_i$ with $\alpha_i$, $(\beta_i, \gamma_i)$, $(x_i, y_i, z_i)$; $v_j$ with $\alpha_j$, $(\beta_j, \gamma_j)$, $(x_j, y_j, z_j)$; offset $(x', y', z')$.)

A similarly defined canonic coordinate frame is employed when learning the model templates; in this case, DEFT's learning algorithm computes target and weight templates based on the average and inverse variance over the training set, respectively. Figure 5 shows an overview of the learning process. The implementation used Cowtan's Clipper library.

For each part in the model, DEFT searches through a six-dimensional conformation space (x, y, z, α, β, γ), breaking each dimension into a number of discrete bins. The translational parameters x, y, and z are sampled over a region in the unit cell.
Rotational space is uniformly sampled using an algorithm described by Mitchell [6].

4 Model Enhancements

Upon initial testing, the pictorial-structure matching algorithm performed rather poorly at the density-map interpretation task. Consequently, we added two routines, a collision-detection routine and an improved template-matching routine, to DEFT's pictorial-structure matching implementation. Both enhancements can be applied to the general pictorial structure algorithm, and are not specific to DEFT.

4.1 Collision Detection

Closer investigation revealed that much of the algorithm's poor performance is due to distant chains colliding. Since DEFT models only covalent bonds, the matching algorithm sometimes returns a structure with non-bonded atoms impossibly close together. These collisions were a problem in DEFT's initial implementation. Figure 6 shows such a collision (later corrected by the algorithm).

Given a candidate solution, it is straightforward to test for spatial collisions: we simply test whether any two atoms in the structure are physically impossibly close together. If a collision occurs in a candidate, DEFT perturbs the structure. Though the optimal match is no longer returned, this approach works well in practice. If two atoms are both aligned to the same space in the most probable conformation, it seems quite likely that one of the atoms belongs there. Thus, DEFT handles collisions by assuming that at least one of the two colliding branches is correct.

When a collision occurs, DEFT finds the closest branch point above the colliding nodes, that is, the root $y$ of the minimum subtree containing all colliding nodes. DEFT considers each child $x_i$ of this root, matching the subtree rooted at $x_i$ while keeping the remainder of the tree fixed. The change in score for each perturbed branch is recorded, and DEFT keeps the perturbation with the smallest score increase. Table 1 describes the collision-avoidance algorithm. In the case that the collision is due to a chain wrapping around on itself (and not two branches running into one another), the root $y$ is defined as the colliding node nearest to the top of the tree. Everything below $y$ is matched anew while the remainder of the structure is fixed.

Figure 5: An overview of the parameter-learning process. For each atom of a given type (here alanine Cα) we rotate the atom into a canonic orientation. We then average over every atom of that type to get a template and average bond geometry. (Panels: standard orientation; fffear target template map; averaged bond geometry, with the two bond parameter sets shown as r = 1.53, θ = 0.0°, φ = -19.3° and r = 1.51, θ = 118.4°, φ = -19.7°.)

Figure 6. This illustrates the collision avoidance algorithm. On the left is a collision (the predicted molecule is in the darker color): the amino acid's sidechain is placed coincident with the backbone. On the right, collision avoidance finds the right structure.

4.2 Improved template matching

In our original implementation, DEFT learned a template by averaging over each of the 171 atom types. For example, for each of the 12 (non-hydrogen) atoms in the amino acid tyrosine we build a single template, producing 12 tyrosine templates in total. Not only is this inefficient, requiring DEFT to match redundant templates against the unsolved density map, but also, for some atoms in flexible sidechains, averaging blurs density contributions from atoms more than a bond away from the target, losing valuable information about an atom's neighborhood.
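As a reference point for the per-atom-type templates just described, here is a minimal sketch of how a target and weight template could be computed (voxelwise mean and inverse variance of training neighborhoods that are assumed to be already rotated into the atom type's canonic frame) and how one map region could be scored against them. The `eps` regularizer and both function names are assumptions. The score is a simplified weighted mean squared difference evaluated directly at a single position; it is not fffear's exact formulation, which evaluates a score of this kind at every map position using FFTs.

```python
import numpy as np


def learn_template(examples, eps=1e-6):
    # examples: array (n_examples, nx, ny, nz) of density neighborhoods, one per
    # training atom of a single type, each already in the canonic orientation.
    # Returns (target, weight): voxelwise mean and voxelwise inverse variance.
    examples = np.asarray(examples, dtype=float)
    target = examples.mean(axis=0)
    weight = 1.0 / (examples.var(axis=0) + eps)  # eps guards against zero variance
    return target, weight


def weighted_mismatch(region, target, weight):
    # Simplified weighted mean squared difference, sum(w * (rho - t)^2) / sum(w);
    # consistent voxels (high weight) dominate, highly variable voxels count little.
    region = np.asarray(region, dtype=float)
    return float(np.sum(weight * (region - target) ** 2) / np.sum(weight))
```

The inverse-variance weight is what makes a single averaged template usable at all: voxels far from the atom, where flexible sidechains make the density vary across training examples, receive low weight. It is also why averaging discards information, which motivates the clustering approach that follows.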
DEFT improves the template-matching algorithm by modeling the templates using a mixture of Gaussians, a generative model where each template is modeled using a mixture of basis templates. Each basis template is simply the mean of a cluster of templates. Cluster assignments are learned iteratively using the EM algorithm. In each iteration of the algorithm we compute the posterior probability that each training template was generated by each cluster mean (the E step). Then we use these probabilities to update the cluster means (the M step). After convergence, we use each cluster mean (and weight) as an fffear search target.

Table 1. DEFT's collision handling routine.

    Given:  An illegal pictorial structure configuration L = {l1, l2, ..., ln}
    Return: A legal perturbation L'
    Algorithm:
        X ← all nodes in L illegally close to some other node
        y ← root of smallest subtree containing all nodes in X
        for each child xi of y
            Li ← optimal position of subtree rooted at xi, fixing remainder of tree
            scorei ← score(Li) - score(subtree of L rooted at xi)
        imin ← arg min_i (scorei)
        L' ← replace subtree rooted at x_imin in L with L_imin
        return L'

5 Experimental Studies

We tested DEFT on a set of proteins provided by the Phillips lab at the University of Wisconsin. The set consists of four different proteins, all at around 2.0Å resolution. For all four proteins, reflections and experimentally determined initial phases were provided, allowing us to build four relatively poor-quality density maps. To test our algorithm with poor-quality data, we down-sampled each of the maps to 2.5, 3 and 4Å by removing higher-resolution reflections and recomputing the density. These down-sampled maps are physically identical to maps natively constructed at these resolutions. Each structure had been solved by crystallographers.
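The down-sampling step (removing reflections beyond a chosen resolution and recomputing the density) can be sketched with NumPy FFTs. This sketch assumes the density is stored on an orthogonal grid with a known voxel spacing in Å, treats each Fourier coefficient of the gridded map as a reflection, and keeps only those with spatial frequency $|s| \le 1/d_{min}$. Real unit cells can be oblique, in which case $|s|$ must be computed with the reciprocal-lattice metric; the function name is hypothetical.

```python
import numpy as np


def truncate_resolution(density, spacing, d_min):
    # density: 3D map sampled on an orthogonal grid with voxel size `spacing` (Å).
    # Zero every Fourier coefficient whose spatial frequency |s| (in 1/Å) exceeds
    # 1/d_min, i.e. drop reflections finer than d_min Å, then transform back.
    # ASSUMPTION: orthogonal cell axes, so |s| is the plain Euclidean norm.
    coeffs = np.fft.fftn(density)
    freqs = [np.fft.fftfreq(n, d=spacing) for n in density.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing='ij')
    s = np.sqrt(sx ** 2 + sy ** 2 + sz ** 2)
    coeffs[s > 1.0 / d_min] = 0.0
    # The cutoff depends only on |s|, so Friedel pairs are kept or dropped together
    # and the result is real up to round-off.
    return np.fft.ifftn(coeffs).real
```

Because the cutoff is a sharp sphere in reciprocal space, this reproduces the loss of high-frequency detail that blurs the maps in Figure 1, without adding the noise present in real low-resolution data.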
For this paper, our experiments are conducted under the assumption that the mainchain atoms of the protein are known to within some error factor. This assumption is fair, since approaches exist for mainchain tracing in density maps [7]. DEFT simply walks along the mainchain, placing atoms one residue at a time (considering each residue independently).

We split our dataset into a training set of about 1000 residues and a test set of about 100 residues (from a protein not in the training set). Using the training set we built a set of templates for matching using fffear. The templates extended to a 6Å radius around each atom at 0.5Å sampling. Two sets of templates were built and subsequently matched: a large set of 171 produced by averaging all training-set templates for each atom type, and a smaller set of 24 learned by the EM algorithm. We ran DEFT's pictorial structure matching algorithm using both sets of templates, with and without the collision detection code.

Although placing individual atoms into the sidechain is fairly quick, taking less than six hours for a 200-residue protein, computing fffear match scores is very CPU-demanding. For each of our 171 templates, fffear takes 3-5 CPU-hours to compute the match score at each location in the image, for a total of about one CPU-month to match templates into each protein. Fortunately, the task is trivially parallelized; we regularly do computations on over 100 computers simultaneously.

The results of all tests are summarized in Figure 7. Using individual-atom templates and the collision detection code, the all-atom RMS deviation varied from 1.38Å at 2Å resolution to 1.84Å at 4Å. Using the EM-based clusters as templates produced slight or no improvement. However, much less work is required: only 24 templates need to be matched to the image instead of 171 individual-atom templates.
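The all-atom RMS deviation reported above is the square root of the mean squared distance between corresponding atoms, $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} \lVert \mathbf{p}_k - \mathbf{r}_k \rVert^2}$. A minimal sketch, assuming the predicted and crystallographer-determined atoms are already paired one-to-one and expressed in the same map frame, so no superposition step is applied; the function name is hypothetical.

```python
import numpy as np


def rms_deviation(predicted, reference):
    # All-atom RMS deviation between paired atom coordinates of shape (n_atoms, 3).
    # ASSUMPTION: atoms are matched one-to-one and already share a coordinate frame
    # (the density map's), so no optimal superposition is computed.
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError('atom lists must have the same shape')
    squared = np.sum((predicted - reference) ** 2, axis=1)  # per-atom squared error
    return float(np.sqrt(np.mean(squared)))
```

Because the errors are squared before averaging, a few badly misplaced sidechain atoms raise the RMSD more than many small offsets do.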
Finally, it was promising that collision detection leads to significant error reduction. It is interesting to note that individually using the improved templates and using the collision avoidance both improved the search results; however, using both together was a bit worse than with collision detection alone. More research is needed to get a synergy between the two enhancements. Further investigation is also needed into balancing the number of templates against template size. The match cost function is a critically important part of DEFT, and improvements there will have the most profound impact on the overall error.

Figure 7. Test-set error under four strategies: base, improved templates only, collision detection + improved templates, and collision detection only. (x-axis: density map resolution, 2Å, 2.5Å, 3Å, 4Å; y-axis: test protein RMS deviation, 0.0 to 4.0.)

6 Conclusions and future work

DEFT has applied the F-H pictorial structure matching algorithm to the task of interpreting electron density maps. In the process, we extended the F-H algorithm in three key ways. In order to model atoms rotating in 3D, we designed another joint type, the screw joint. We also developed extensions to deal with spatial collisions of parts in the model, and implemented a slightly improved template construction routine. These last two enhancements can be applied to pictorial-structure matching in general, and are not specific to the task presented here.

DEFT attempts to bridge the gap between two types of model-fitting approaches for interpreting electron density maps. Several techniques [1,2,3] do a good job placing individual atoms, but all fail around 2.5-3Å resolution.
On the other hand, fffear [4] has had success finding rigid elements in very poor resolution maps, but is unable to locate highly flexible “loops”. Our work extends the resolution threshold at which individual atoms can be identified in electron density maps. DEFT's flexible model combines weakly matching image templates to locate individual atoms in maps where individual atoms have been blurred away. No other approach has investigated sidechain refinement in structures of such poor resolution.

We next plan to use DEFT as the refinement phase complementing a coarser method. Rather than modeling the configuration of each individual atom, this coarser method would treat each amino acid as a single part in the flexible template, modeling only rotations along the backbone. Then, our current algorithm could place each individual atom.

A different optimization algorithm that handles cycles in the pictorial structure graph would better handle collisions (allowing edges between non-bonded atoms). In recent work [8], loopy belief propagation [9] has been used with some success (though with no optimality guarantee). We plan to explore the use of belief propagation in pictorial-structure matching, adding edges in the graph to avoid collisions.

Finally, the pictorial-structure framework upon which DEFT is built seems quite robust; we believe the accuracy of our approach can be substantially improved through implementation improvements, allowing finer grid spacing and larger fffear ML templates. The flexible molecular template we have described has the potential to produce an atomic model in a map where individual atoms may not be visible, through the power of combining weakly matching image templates. DEFT could prove important in high-throughput protein-structure determination.

Acknowledgments

This work was supported by NLM Grant 1T15 LM007359-01, NLM Grant 1R01 LM07050-01, and NIH Grant P50 GM64598.

References

[1] A. Perrakis, T. Sixma, K. Wilson, & V. Lamzin (1997). wARP: improvement and extension of crystallographic phases. Acta Cryst. D53:448-455.
[2] D. Levitt (2001). A new software routine that automates the fitting of protein X-ray crystallographic electron density maps. Acta Cryst. D57:1013-1019.
[3] T. Ioerger, T. Holton, J. Christopher, & J. Sacchettini (1999). TEXTAL: a pattern recognition system for interpreting electron density maps. Proc. ISMB:130-137.
[4] K. Cowtan (2001). Fast Fourier feature recognition. Acta Cryst. D57:1435-1444.
[5] P. Felzenszwalb & D. Huttenlocher (2000). Efficient matching of pictorial structures. Proc. CVPR, pp. 66-73.
[6] J. Mitchell (2002). Uniform distributions of 3D rotations. Unpublished document.
[7] J. Greer (1974). Three-dimensional pattern recognition. J. Mol. Biol. 82:279-301.
[8] E. Sudderth, M. Mandel, W. Freeman, & A. Willsky (2005). Distributed occlusion reasoning for tracking with nonparametric belief propagation. NIPS.
[9] D. Koller, U. Lerner, & D. Angelov (1999). A general algorithm for approximate inference and its application to hybrid Bayes nets. UAI. 15:324-333.", "award": [], "sourceid": 2558, "authors": [{"given_name": "Frank", "family_name": "Dimaio", "institution": null}, {"given_name": "George", "family_name": "Phillips", "institution": null}, {"given_name": "Jude", "family_name": "Shavlik", "institution": null}]}