An Error Detection and Correction Framework for Connectomics

Advances in Neural Information Processing Systems, pages 6818-6829.

Jonathan Zung
Princeton University
jzung@princeton.edu

Ignacio Tartavull
Princeton University
tartavull@princeton.edu

Kisuk Lee
Princeton University and MIT
kisuklee@mit.edu

H. Sebastian Seung
Princeton University
sseung@princeton.edu

Abstract

We define and study error detection and correction tasks that are useful for 3D reconstruction of neurons from electron microscopic imagery, and for image segmentation more generally. Both tasks take as input the raw image and a binary mask representing a candidate object. For the error detection task, the desired output is a map of split and merge errors in the object. For the error correction task, the desired output is the true object. We call this object mask pruning, because the candidate object mask is assumed to be a superset of the true object.
We train multiscale 3D convolutional networks to perform both tasks. We find that the error-detecting net can achieve high accuracy. The accuracy of the error-correcting net is enhanced if its input object mask is "advice" (union of erroneous objects) from the error-detecting net.

1 Introduction

While neuronal circuits can be reconstructed from volumetric electron microscopic imagery, the process has historically [30] and even recently [28] been highly laborious. One of the most time-consuming reconstruction tasks is the tracing of the brain's "wires," or neuronal branches. This task is an example of instance segmentation, and can be automated through computer detection of the boundaries between neurons. Convolutional nets were first applied to neuronal boundary detection a decade ago [10, 29]. Since then convolutional nets have become the standard approach, and the accuracy of boundary detection has become impressively high [31, 1, 15, 6].

Given the low error rates, it becomes helpful to think of subsequent processing steps in terms of modules that detect and correct errors. In the error detection task (Figure 1a), the input is the raw image and a binary mask that represents a candidate object. The desired output is a map containing the locations of split and merge errors in the candidate object. Related work on this problem has been restricted to detection of merge errors only, by either hand-designed [18] or learned [24] computations. However, a typical segmentation contains both split and merge errors, so it is desirable to include both in the error detection task.

In the error correction task (Figure 1b), the input is again the raw image and a binary mask that represents a candidate object. The candidate object mask is assumed to be a superset of a true object, which is the desired output. With this assumption, error correction is formulated as object mask pruning.
Object mask pruning can be regarded as the splitting of undersegmented objects to create true objects. In this sense, it is the opposite of agglomeration, which merges oversegmented objects to create true objects [11, 21]. Object mask pruning can also be viewed as the subtraction of voxels from an object to create a true object. In this sense, it is the opposite of a flood-filling net [13, 12] or MaskExtend [18], each iteration of which is the addition of voxels to an object to create a true object. Iterative mask extension has been studied in other work on instance segmentation in computer vision [25, 23]. The task of generating an object mask de novo from an image has also been studied in computer vision [22].

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Figure 1: Error detection and correction tasks. For both tasks, the inputs are a candidate object mask (blue) and the original image (grayscale). Note that diagrams are 2D for illustrative purposes, but in reality the inputs and outputs are 3D. (a) Error detection task for split (top) and merge (bottom) errors. The desired output is an error map (red). A voxel in the error map is red if and only if a window centered on it contains a split or merge error. We also consider a variant of the task in which the object mask is the sole input; the grayscale image is not used. (b) The object mask pruning task. The input mask is assumed to be a superset of a true object. The desired output (right) is the true object containing the central voxel (black dot). In the first case there is nothing to prune, while in the second case the object not overlapping the central voxel is erased.

We implement both error detection and error correction using 3D multiscale convolutional networks. One can imagine multiple uses for these nets in a connectomics pipeline.
For example, the error-detecting net could be used to reduce the amount of labor required for proofreading by directing human attention to locations in the image where errors are likely. This labor reduction could be substantial because the declining error rate of automated segmentation has made it more time-consuming for a human to find an error.

We show that the error-detecting net can provide "advice" to the error-correcting net in the following way. To create the candidate object mask for the error-correcting net from a baseline segmentation, one can simply take the union of all erroneous segments as found by the error-detecting net. Since the error rate in the baseline segmentation is already low, this union is small, and it is easy to select out a single object. The idea of using the error detector to choose locations for the error corrector was proposed previously, though not actually implemented [18]. Furthermore, the idea of using the error detector to not only choose locations but provide "advice" is novel as far as we know.

We contend that our approach decomposes the neuron segmentation problem into two strictly easier pieces. First, we hypothesize that recognizing an error is much easier than producing the correct answer. Indeed, humans are often able to detect errors using only morphological cues, such as abrupt terminations of axons, but may have difficulty actually finding the correct extension.

On the other hand, if the error-detection network has high accuracy and the initial set of errors is sparse, then the error correction module only needs to prune away a small number of irrelevant parts from the candidate mask described above. This contrasts with the flood-filling task, which involves an unconstrained search for new parts to add. Given that most voxels are not part of the object to be reconstructed, an upper bound on the object is usually more informative than a lower bound.
As an added benefit, selective application of the error correction module near likely errors makes efficient use of our computational budget [18].

In this paper, we support the intuition above by demonstrating high-accuracy detection of both split and merge errors. We also demonstrate a complete implementation of the stated error detection-correction framework, and report significant improvements upon our baseline segmentation.

Some of the design choices we made in our neural networks may be of interest to other researchers. Our error-correcting net is trained to produce a vector field via metric learning instead of directly producing an object mask. The vector field resembles a semantic labeling of the image, so this approach blurs the distinction between instance and semantic segmentation. This idea is relatively new in computer vision [7, 4, 3]. Our multiscale convolutional net architecture, while similar in spirit to the popular U-Net [26], has some novelty. With proper weight sharing, our model can be viewed as a feedback recurrent convolutional net unrolled in time (see the appendix for details). Although our model architecture is closely related to the independent works of [27, 9, 5], we contribute a feedback recurrent convolutional net interpretation.

2 Error detection

2.1 Task specification: detecting split and merge errors

Given a single segment in a proposed segmentation presented as an object mask Obj, the error detection task is to produce a binary image called the error map, denoted Err_{px×py×pz}(Obj). The definition of the error map depends on a choice of window size px × py × pz. A voxel i in the error map is 0 if and only if the restriction of the input mask to a window of size px × py × pz centred at i is voxel-wise equal to the restriction of some object in the ground truth.
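The windowed definition above can be made concrete with a small sketch. The brute-force loop below is purely illustrative (the function name, arguments, and the handling of the empty window are our assumptions, not the paper's code): a voxel is flagged unless the candidate mask, restricted to the surrounding window, exactly matches the restriction of some single ground-truth object.

```python
import numpy as np

def error_map(obj_mask, gt_seg, window=(3, 3, 3)):
    """Sketch of the windowed error map (hypothetical implementation).
    obj_mask: binary volume for one candidate object.
    gt_seg:   integer label volume of the ground truth (0 = background).
    A voxel is 0 iff the mask restricted to the window equals the
    restriction of some ground-truth object (an all-empty window
    trivially matches an object absent from the window)."""
    err = np.zeros(obj_mask.shape, dtype=np.uint8)
    rx, ry, rz = (w // 2 for w in window)
    for x in range(rx, obj_mask.shape[0] - rx):
        for y in range(ry, obj_mask.shape[1] - ry):
            for z in range(rz, obj_mask.shape[2] - rz):
                sl = np.s_[x - rx:x + rx + 1,
                           y - ry:y + ry + 1,
                           z - rz:z + rz + 1]
                win_mask = obj_mask[sl].astype(bool)
                ok = not win_mask.any()  # empty restriction matches trivially
                # Compare against the restriction of every ground-truth
                # object that appears in the window.
                for lab in np.unique(gt_seg[sl]):
                    if lab != 0 and np.array_equal(win_mask, gt_seg[sl] == lab):
                        ok = True
                        break
                err[x, y, z] = 0 if ok else 1
    return err
```

A mask that coincides with one ground-truth object yields an all-zero error map, while the union of two objects is flagged near their interface, which is the split/merge sensitivity discussed next.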
Observe that the error map is sensitive to both split and merge errors.

A smaller window size allows us to localize errors more precisely. On the other hand, if the window radius is less than the width of a typical boundary between objects, it is possible that two objects participating in a merge error never appear in the same window. Such merge errors would not be classified as an error in any window.

We could use a less stringent measure than voxel-wise equality that disregards small perturbations of the boundaries of objects. However, our proposed segmentations are all composed of the same building blocks (supervoxels) as the ground truth segmentation, so this is not an issue for us.

We define the combined error map as Σ_Obj Err(Obj) ∗ Obj, where ∗ represents pointwise multiplication. In other words, we restrict the error map for each object to the object itself, and then sum the results. The figures in this paper show the combined error map.

2.2 Architecture of the error-detecting net

We take a fully supervised approach to error detection. We implement error detection using a multiscale 3D convolutional network. The architecture is detailed in Figure 2. Its design is informed by experience with convolutional networks for neuronal boundary detection (see [15]) and reflects recent trends in neural network design [26, 8]. Its field of view is Px × Py × Pz = 318 × 318 × 33 (which is roughly cubic in physical size given the anisotropic resolution of our dataset). The network computes (a downsampling of) Err_{46×46×7}.
At test time, we perform inference in overlapping windows and conservatively blend the output from overlapping windows using a maximum operation. We trained two variants, one of which takes as input only Obj, and another which additionally receives as input the raw image.

3 Error correction

3.1 Task specification: object mask pruning

Given an image patch of size Px × Py × Pz and a candidate object mask of the same dimensions, the object mask pruning task is to erase all voxels which do not belong to the true object overlapping the central voxel. The candidate object mask is assumed to be a superset of the true object.

Figure 2: Architectures for the error-detecting and error-correcting nets, respectively. Each node represents a layer and the number inside represents the number of feature maps. The layers closer to the top of the diagram have lower resolution than the layers near the bottom. We make savings in computation by minimizing the number of high-resolution feature maps. The diagonal arrows represent strided convolutions, while the horizontal arrows represent skip connections. Associated with the diagonal arrows, black numbers indicate filter size and red numbers indicate strides in x × y × z. Due to the anisotropy of the resolution of the images in our dataset, we design our nets so that the first convolutions are exclusively 2D while later convolutions are 3D. The field of view of a unit in the higher layers is therefore roughly cubic. To limit the number of parameters in our model, we factorize all 3D convolutions into a 2D convolution followed by a 1D convolution in the z-dimension. We also use weight sharing between some convolutions at the same height. Note that the error-correcting net is a prolonged, symmetric version of the error-detecting net.
For more detail of the error corrector, see the appendix.

3.2 Architecture of the error-correcting net

Yet again, we implement error correction using a multiscale 3D convolutional network. The architecture is detailed in Figure 2. One difficulty with training a neural network to reconstruct the object containing the central voxel is that the desired output can change drastically as the central voxel moves between objects. We use an intermediate representation whose role is to soften this dependence on the location of the central voxel. The desired intermediate representation is a k = 6 dimensional vector v(x, y, z) at each point (x, y, z) such that points within the same object have similar vectors and points in different objects have different vectors. We transform this vector field into a binary image M representing the object overlapping the central voxel as follows:

M(x, y, z) = exp(−‖v(x, y, z) − v(0, 0, 0)‖²),

where (0, 0, 0) is the central voxel. When an over-segmentation is available, we replace v(0, 0, 0) with the average of v over the supervoxel containing the central voxel. This trick makes it unnecessary to centre our windows far away from a boundary, as was necessary in [13]. Note that we backpropagate through the transform M, so the vector representation may be seen as an implementation detail and the final output of the network is just a (soft) binary image.

Figure 3: An example of a mistake in the initial segmentation. The dendrite is missing a spine. The red overlay on the left shows the combined error map (defined in Section 2.1); the stump in the centre of the image was clearly marked as an error.

Figure 4: The right shows all objects which contained a detected error in the vicinity. For clarity, each supervoxel was drawn with a different colour. The union of these objects is the binary mask which is provided as input to the error correction network. These objects were clipped to lie within the white box representing the field of view of our error correction network. The output of the error correction network is overlaid in blue on the left.

Figure 5: The supervoxels assembled in accordance with the output of the error correction network.

4 How the error detector can "advise" the error corrector

Suppose that we would like to correct the errors in a baseline segmentation. Obviously, the error-detecting net can be used to find locations where the error-correcting net can be applied [18]. Less obviously, the error-detecting net can be used to construct the object mask that is the input to the error-correcting net. We refer to this object mask as the "advice mask", and its construction is important because the baseline object to be corrected might contain split as well as merge errors, while the object mask pruning task can correct only merge errors.

The advice mask is defined as the union of the baseline object at the central pixel with all other baseline objects in the window that contain errors as judged by the error-detecting net. The advice mask is a superset of the true object overlapping the central voxel, assuming that the error-detecting net makes no mistakes. Therefore advice is suitable as an input to the object mask pruning task.

The details of the above procedure are as follows. We begin with an initial baseline segmentation whose remaining errors are assumed to be sparsely distributed.
During the error correction phase, we iteratively update a segmentation represented as the connected components of a graph G whose vertices are segments in a strict over-segmentation (henceforth called supervoxels). We also maintain the combined error map associated with the current segmentation. We binarize the error map by thresholding it at 0.25.

Now we iteratively choose a location ℓ = (x, y, z) which has value 1 in the binarized combined error map. In a Px × Py × Pz window centred on ℓ, we prepare an input for the error corrector by taking the union of all segments containing at least one white voxel in the error map. The error correction network produces from this input a binary image M representing the object containing the central voxel. For each supervoxel S touching the central Px/2 × Py/2 × Pz/2 window, let M(S) denote the average value of M inside S. If M(S) ∉ [0.1, 0.9] for all S in the relevant window (i.e. the error corrector is confident in its prediction for each supervoxel), we add to G a clique on {S | M(S) > 0.9} and delete from G all edges between {S | M(S) < 0.1} and {S | M(S) > 0.9}. The effect of these updates is to change G to locally agree with M. Finally, we update the combined error map by applying the error detector at all locations where its decision could have changed.

We iterate until every location is zero in the error map or has been covered by a window at least t = 2 times by the error corrector. This stopping criterion guarantees that the algorithm terminates. In practice, the segmentation converges without this auxiliary stopping condition to a state in which the error corrector fails the confidence threshold everywhere.
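The graph update at a single location can be sketched as follows. This is a minimal illustration of the confidence-gated clique/edge-deletion step, not the paper's implementation: the graph is a plain adjacency-set dict, and `M_avg` is assumed to already hold the supervoxel averages M(S) for the window.

```python
from itertools import combinations

def apply_correction(G, M_avg, hi=0.9, lo=0.1):
    """Sketch of one graph update from Section 4 (names are ours).
    G:     dict mapping each supervoxel id to a set of neighbour ids.
    M_avg: dict mapping each supervoxel S in the window to M(S), the
           average corrector output inside S.
    The update is applied only if the corrector is confident for every
    supervoxel, i.e. no M(S) lies in [lo, hi]."""
    if any(lo <= m <= hi for m in M_avg.values()):
        return False  # not confident; leave the graph unchanged
    keep = {s for s, m in M_avg.items() if m > hi}
    drop = {s for s, m in M_avg.items() if m < lo}
    # Add a clique on the supervoxels assigned to the central object...
    for a, b in combinations(keep, 2):
        G[a].add(b)
        G[b].add(a)
    # ...and delete all edges between kept and erased supervoxels.
    for a in keep:
        for b in drop:
            G[a].discard(b)
            G[b].discard(a)
    return True
```

After the update, the connected components of G locally agree with M, which is exactly the effect described above.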
However, it is hard to certify convergence, since it is possible that the error corrector could give different outputs on slightly shifted windows. Based on our validation set, increasing t beyond 2 did not measurably improve performance.

Note that this algorithm deals with split and merge errors, but cannot fix errors already present at the supervoxel level.

5 Experiments

5.1 Dataset

Our dataset is a sample of mouse primary visual cortex (V1) acquired using serial section transmission electron microscopy at the Allen Institute for Brain Science. The voxel resolution is 3.6 nm × 3.6 nm × 40 nm.

Human experts used the VAST software tool [14, 2] to densely reconstruct multiple volumes that amounted to 530 Mvoxels of ground truth annotation. These volumes were used to train a neuronal boundary detection network (see the appendix for the architecture). We applied the resulting boundary detector to a larger volume of size 4800 Mvoxels to produce a preliminary segmentation, which was then proofread by the tracers. This bootstrapped ground truth was used to train the error detector and corrector. A subvolume of size 910 Mvoxels was reserved for validation, and a subvolume of size 910 Mvoxels was reserved for testing.

Producing the gold standard segmentation required a total of ~560 tracer hours, while producing the bootstrapped ground truth required ~670 tracer hours.

5.2 Baseline segmentation

Our baseline segmentation was produced using a pipeline of multiscale convolutional networks for neuronal boundary detection, watershed, and mean affinity agglomeration [15]. We describe the pipeline in detail in the appendix.
The segmentation performance values reported for the baseline are taken at a mean affinity agglomeration threshold of 0.23, which minimizes the variation of information error metric [17, 20] on the test volumes.

5.3 Training procedures

Sampling procedure. Here we describe our procedure for choosing a random point location in a segmentation. Uniformly random sampling is unsatisfactory, since large objects such as dendritic shafts will be overrepresented. Instead, given a segmentation, we sample a location (x, y, z) with probability inversely proportional to the fraction of a window of size 128 × 128 × 16 centred at (x, y, z) which is occupied by the object containing the central voxel.

Training of error detector. An initial segmentation containing errors was produced using our baseline neuronal boundary detector combined with mean affinity agglomeration at a threshold of 0.3. Point locations were sampled according to the sampling procedure specified in Section 5.3. We augmented all of our data with rotations and reflections. We used a pixelwise cross-entropy loss.

Training of error corrector. We sampled locations in the ground truth segmentation as in Section 5.3. At each location ℓ = (x, y, z), we generated a training example as follows. Let Obj_ℓ be the ground truth object touching ℓ. We selected a random subset of the objects in the window centred on ℓ, including Obj_ℓ. To be specific, we chose a number p uniformly at random from [0, 1], and then selected each segment in the window with probability p, in addition to Obj_ℓ. The input at ℓ was then a binary mask representing the union of the selected objects, along with the raw EM image, and the desired output was a binary mask representing only Obj_ℓ. The dataset was augmented with rotations, reflections, simulated misalignments and missing sections [15].
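The corrector's training-example generation can be sketched in a few lines. This is a hypothetical rendering of the sampling rule just described (function and argument names are ours); it takes a ground-truth label window centred on ℓ plus the matching raw EM window, and builds the input mask as the target object united with each other object chosen independently with a per-example probability p.

```python
import numpy as np

def make_training_example(gt_seg, image, centre, rng=None):
    """Sketch of corrector training-example generation (Section 5.3).
    gt_seg: integer ground-truth label window centred on `centre`.
    image:  raw EM window of the same shape.
    Returns (net_input, desired): the 2-channel input (mask + image)
    and the binary mask of the target object alone."""
    rng = rng or np.random.default_rng()
    target = gt_seg[centre]
    assert target != 0, "centre voxel must lie inside a ground-truth object"
    p = rng.uniform()  # one inclusion probability per example
    others = [l for l in np.unique(gt_seg) if l != 0 and l != target]
    chosen = {target} | {l for l in others if rng.uniform() < p}
    input_mask = np.isin(gt_seg, list(chosen)).astype(np.float32)
    desired = (gt_seg == target).astype(np.float32)
    # The net's input is the candidate mask stacked with the raw image.
    net_input = np.stack([input_mask, image.astype(np.float32)], axis=0)
    return net_input, desired
```

By construction the input mask is always a union of complete ground-truth objects containing the target, which matches the object mask pruning assumption.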
We used a pixelwise cross-entropy loss.

Note that this training procedure uses only the ground truth segmentation and is completely independent of the error detector and the baseline segmentation. This convenient property is justified by the fact that if the error detector is perfect, the error corrector only ever receives as input unions of complete objects.

5.4 Error detection results

To measure the quality of error detection, we densely sampled points in our test volume as in Section 5.3. In order to remove ambiguity over the precise location of errors, we filtered out points which contained an error within a surrounding window of size 80 × 80 × 8 but not within a window of size 40 × 40 × 4. These locations were all unique, in that two locations in the same object were separated by at least 80, 80, 8 in x, y, z, respectively. Precision and recall simultaneously exceed 90% (see Figure 6). Empirically, many of the false positive examples occur where a dendritic spine head curls back and touches its trunk. These examples locally appear to be incorrectly merged objects.

We trained one error detector with access to the raw image and one without.
The network's admirable performance even without access to the image, as seen in Figure 6, supports our hypothesis that error detection is a relatively easy task and can be performed using only shape cues. Merge errors qualitatively appear to be especially easy for the network to detect; an example is shown in Figure 7.

5.5 Error correction results

Table 1: Comparing segmentation performance

Method           VI_merge  VI_split  Rand Recall  Rand Precision
Baseline         0.162     0.142     0.952        0.954
Without Advice   0.130     0.057     0.956        0.979
With Advice      0.088     0.052     0.974        0.980

In order to demonstrate the importance of error detection to error correction, we ran two experiments: one in which the binary mask input to the error corrector was simply the union of all segments in the window ("without advice"), and one in which the binary mask was the union of all segments with a detected error ("with advice"). In the "without advice" mode, the network is essentially asked to reconstruct the object overlapping the central voxel in one shot. Table 1 shows that advice confers a considerable advantage in performance on the error corrector.

Figure 6: Precision and recall for error detection, both with and without access to the raw image. In the test volume, there are 8248 error-free locations and 944 locations with errors. In practice, we use a threshold which guarantees > 95% recall and > 85% precision.

Figure 7: An example of a detected error. The right shows two incorrectly merged axons, and the left shows the predicted combined error map (defined in Section 2.1) overlaid in red on the corresponding 2D image.

Figure 8: A difficult location with missing data in one section combined with a misalignment between sections.
The error-correcting net was able to trace across the missing data.

It is sometimes difficult to assess the significance of an improvement in variation of information or Rand score, since changes can be dominated by modifications to a few large objects. Therefore, we decomposed the variation of information into a score for each object in the ground truth. Figure 9 summarizes the cumulative distribution of the values of VI(i) = VI_merge(i) + VI_split(i) for all segments i in the ground truth. See the appendix for a precise definition of VI(i).

The number of errors from the set in Sec. 5.4 that were fixed or introduced by our iterative refinement procedure is shown in Table 2. These numbers should be taken with a grain of salt, since topologically insignificant changes could count as errors. Regardless, it is clear that our iterative refinement procedure fixed a significant fraction of the remaining errors and that "advice" improves the error corrector.

The results are qualitatively impressive as well. The error-correcting network is sometimes able to correctly merge disconnected objects, for example in Figure 8.

Table 2: Number of errors fixed and introduced relative to the baseline

Method           # Errors  # Errors fixed  # Errors introduced
Baseline         944       -               -
Without Advice   474       547             77
With Advice      305       707             68

Figure 9: Per-object VI scores for the 940 reconstructed objects in our test volume. Almost 800 objects are completely error free in our segmentation.
These objects are likely all axons; almost every dendrite is missing a few spines.

5.6 Computational cost analysis

Table 3 shows the computational cost of the most expensive parts of our segmentation pipeline. Boundary detection and error detection are run on the entire image, while error correction is run on roughly 10% of the possible locations in the image. Error correction is still the most costly step, but it would be 10× more costly without restricting to the locations found by the error detection network. Therefore, the cost of error detection is more than justified by the subsequent savings during the error correction phase. The number of locations requiring error correction will fall even further if the precision of the error detector increases or the error rate of the initial segmentation decreases.

Table 3: Computation time for a 2048 × 2048 × 256 volume using a single Titan X Pascal GPU

Boundary Detection   18 mins
Error Detection      25 mins
Error Correction     55 mins

6 Conclusion and future directions

We have developed an error detector for the neuronal segmentation problem and combined it with an error correction module. In particular, we have shown that our error detectors are able to exploit priors on neuron shape, having reasonable performance even without access to the raw image. We have made significant savings in computation by applying expensive error correction procedures only where predicted necessary by the error detector. Finally, we have demonstrated that the "advice" of error detection improves an error correction module, improving segmentation performance upon our baseline.

We expect that significant improvements in the accuracy of error detection could come from aggressive data augmentation.
We can mutilate a ground truth segmentation in arbitrary (or even adversarial) ways to produce unlimited examples of errors.

An error detection module has many potential uses beyond the ones presented here. For example, we could use error detection to direct ground truth annotation effort toward mistakes. If sufficiently accurate, it could also be used directly as a learning signal for segmentation algorithms on unlabelled data. The idea of co-training our error-correction and error-detection networks is natural in view of recent work on generative adversarial networks [19, 16].

Author contributions and acknowledgements

JZ conceptualized the study and conducted most of the experiments and evaluation. IT (along with Will Silversmith) created much of the infrastructure necessary for visualization and running our algorithms at scale. KL produced the baseline segmentation. HSS helped with the writing.

We are grateful to Clay Reid, Nuno da Costa, Agnes Bodor, Adam Bleckert, Dan Bumbarger, Derrick Britain, JoAnn Buchannan, and Marc Takeno for acquiring the TEM dataset at the Allen Institute for Brain Science. The ground truth annotation was created by Ben Silverman, Merlin Moore, Sarah Morejohn, Selden Koolman, Ryan Willie, Kyle Willie, and Harrison MacGowan. We thank Nico Kemnitz for proofreading a draft of this paper. We thank Jeremy Maitin-Shepard at Google and the other contributors to the neuroglancer project for creating an invaluable visualization tool.

We acknowledge NVIDIA Corporation for providing us with early access to the Titan X Pascal GPU used in this research, and Amazon for assistance through an AWS Research Grant.
This research was supported by the Mathers Foundation, the Samsung Scholarship, and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior / Interior Business Center (DoI/IBC) contract number D16PC0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.

References

[1] Thorsten Beier, Constantin Pape, Nasim Rahaman, Timo Prange, Stuart Berg, Davi D Bock, Albert Cardona, Graham W Knott, Stephen M Plaza, Louis K Scheffer, et al. Multicut brings automated neurite segmentation closer to human performance. Nature Methods, 14(2):101-102, 2017.

[2] Daniel Berger. VAST Lite. URL https://software.rc.fas.harvard.edu/lichtman/vast/.

[3] Bert De Brabandere, Davy Neven, and Luc Van Gool. Semantic instance segmentation with a discriminative loss function. CoRR, abs/1708.02551, 2017. URL http://arxiv.org/abs/1708.02551.

[4] Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, and Kevin P. Murphy. Semantic instance segmentation via deep metric learning. CoRR, abs/1703.10277, 2017. URL http://arxiv.org/abs/1703.10277.

[5] Damien Fourure, Rémi Emonet, Élisa Fromont, Damien Muselet, Alain Trémeau, and Christian Wolf. Residual conv-deconv grid network for semantic segmentation. CoRR, abs/1707.07958, 2017. URL http://arxiv.org/abs/1707.07958.

[6] Jan Funke, Fabian David Tschopp, William Grisaitis, Chandan Singh, Stephan Saalfeld, and Srinivas C Turaga. A deep structured learning approach towards automating connectome reconstruction from 3d electron micrographs. arXiv preprint arXiv:1709.02974, 2017.

[7] Adam W.
Harley, Konstantinos G. Derpanis, and Iasonas Kokkinos. Learning dense convolutional embeddings for semantic segmentation. CoRR, abs/1511.04377, 2015. URL http://arxiv.org/abs/1511.04377.

[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.

[9] Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. Multi-scale dense convolutional networks for efficient prediction. CoRR, abs/1703.09844, 2017. URL http://arxiv.org/abs/1703.09844.

[10] Viren Jain, Joseph F Murray, Fabian Roth, Srinivas Turaga, Valentin Zhigulin, Kevin L Briggman, Moritz N Helmstaedter, Winfried Denk, and H Sebastian Seung. Supervised learning of image restoration with convolutional networks. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.

[11] Viren Jain, Srinivas C. Turaga, Kevin L. Briggman, Moritz Helmstaedter, Winfried Denk, and H. Sebastian Seung. Learning to agglomerate superpixel hierarchies. In NIPS, 2011.

[12] Michał Januszewski, Jörgen Kornfeld, Peter H Li, Art Pope, Tim Blakely, Larry Lindsey, Jeremy B Maitin-Shepard, Mike Tyka, Winfried Denk, and Viren Jain. High-precision automated reconstruction of neurons with flood-filling networks. bioRxiv, page 200675, 2017.

[13] Michał Januszewski, Jeremy Maitin-Shepard, Peter Li, Jörgen Kornfeld, Winfried Denk, and Viren Jain. Flood-filling networks, Nov 2016. URL https://arxiv.org/abs/1611.00421.

[14] Narayanan Kasthuri, Kenneth Jeffrey Hayworth, Daniel Raimund Berger, Richard Lee Schalek, José Angel Conchello, Seymour Knowles-Barley, Dongil Lee, Amelio Vázquez-Reina, Verena Kaynig, Thouis Raymond Jones, et al. Saturated reconstruction of a volume of neocortex.
Cell, 162(3):648–661, 2015.

[15] Kisuk Lee, Jonathan Zung, Peter Li, Viren Jain, and H. Sebastian Seung. Superhuman accuracy on the SNEMI3D connectomics challenge. CoRR, abs/1706.00120, 2017. URL http://arxiv.org/abs/1706.00120.

[16] Pauline Luc, Camille Couprie, Soumith Chintala, and Jakob Verbeek. Semantic segmentation using adversarial networks. CoRR, abs/1611.08408, 2016. URL http://arxiv.org/abs/1611.08408.

[17] Marina Meilă. Comparing clusterings—an information based distance. Journal of Multivariate Analysis, 98(5):873–895, 2007. ISSN 0047-259X. doi: 10.1016/j.jmva.2006.11.013. URL http://www.sciencedirect.com/science/article/pii/S0047259X06002016.

[18] Yaron Meirovitch, Alexander Matveev, Hayk Saribekyan, David Budden, David Rolnick, Gergely Odor, Seymour Knowles-Barley, Thouis Raymond Jones, Hanspeter Pfister, Jeff William Lichtman, and Nir Shavit. A multi-pass approach to large-scale connectomics, 2016.

[19] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014. URL http://arxiv.org/abs/1411.1784.

[20] Juan Nunez-Iglesias, Ryan Kennedy, Toufiq Parag, Jianbo Shi, and Dmitri B. Chklovskii. Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images. PLOS ONE, 8(8):1–11, 08 2013. doi: 10.1371/journal.pone.0071715. URL https://doi.org/10.1371/journal.pone.0071715.

[21] Juan Nunez-Iglesias, Ryan Kennedy, Stephen M. Plaza, Anirban Chakraborty, and William T. Katz. Graph-based active learning of agglomeration (gala): a python library to segment 2d and 3d neuroimages, 2014. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3983515/.

[22] Pedro O. Pinheiro, Ronan Collobert, and Piotr Dollár. Learning to segment object candidates. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS'15, pages 1990–1998, Cambridge, MA, USA, 2015.
MIT Press. URL http://dl.acm.org/citation.cfm?id=2969442.2969462.

[23] Mengye Ren and Richard S. Zemel. End-to-end instance segmentation and counting with recurrent attention. CoRR, abs/1605.09410, 2016. URL http://arxiv.org/abs/1605.09410.

[24] David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, and Nir Shavit. Morphological error detection in 3d segmentations. CoRR, abs/1705.10882, 2017. URL http://arxiv.org/abs/1705.10882.

[25] Bernardino Romera-Paredes and Philip H. S. Torr. Recurrent instance segmentation. CoRR, abs/1511.08250, 2015. URL http://arxiv.org/abs/1511.08250.

[26] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation, May 2015. URL https://arxiv.org/abs/1505.04597.

[27] Shreyas Saxena and Jakob Verbeek. Convolutional Neural Fabrics. CoRR, abs/1606.02492, 2016. URL http://arxiv.org/abs/1606.02492.

[28] Helene Schmidt, Anjali Gour, Jakob Straehle, Kevin M Boergens, Michael Brecht, and Moritz Helmstaedter. Axonal synapse sorting in medial entorhinal cortex. Nature, 549(7673):469, 2017.

[29] S C Turaga, J F Murray, V Jain, F Roth, M Helmstaedter, K Briggman, W Denk, and H S Seung. Convolutional networks can learn to generate affinity graphs for image segmentation, Feb 2010. URL https://www.ncbi.nlm.nih.gov/pubmed/19922289.

[30] John G White, Eileen Southgate, J Nichol Thomson, and Sydney Brenner. The structure of the nervous system of the nematode caenorhabditis elegans: the mind of a worm. Phil. Trans. R. Soc. Lond, 314:1–340, 1986.

[31] Tao Zeng, Bian Wu, and Shuiwang Ji. Deepem3d: approaching human-level performance on 3d anisotropic em image segmentation. Bioinformatics, 33(16):2555–2562, 2017. doi: 10.1093/bioinformatics/btx188.
URL http://dx.doi.org/10.1093/bioinformatics/btx188.