{"title": "Information Regularization with Partially Labeled Data", "book": "Advances in Neural Information Processing Systems", "page_first": 1049, "page_last": 1056, "abstract": null, "full_text": "Information Regularization with Partially Labeled Data\n\nMartin Szummer\nMIT AI Lab & CBCL\nCambridge, MA 02139\nszummer@ai.mit.edu\n\nTommi Jaakkola\nMIT AI Lab\nCambridge, MA 02139\ntommi@ai.mit.edu\n\nAbstract\n\nClassification with partially labeled data requires using a large number\nof unlabeled examples (or an estimated marginal P(x)) to further\nconstrain the conditional P(y|x) beyond a few available labeled examples.\nWe formulate a regularization approach to linking the marginal and the\nconditional in a general way. The regularization penalty measures the\ninformation that is implied about the labels over covering regions. No\nparametric assumptions are required and the approach remains tractable\neven for continuous marginal densities P(x). We develop algorithms for\nsolving the regularization problem for finite covers, establish a limiting\ndifferential equation, and exemplify the behavior of the new regularization\napproach in simple cases.\n\n1 Introduction\n\nMany modern classification problems are rife with unlabeled examples. To benefit from\nsuch examples, we must exploit either implicitly or explicitly the link between the marginal\ndensity P(x) over examples x and the conditional P(y|x) representing the decision boundary\nfor the labels y. High density regions or clusters in the data, for example, can be\nexpected to fall solely in one or another class.\n\nMost discriminative methods do not attempt to explicitly model or incorporate information\nfrom the marginal density P(x). 
However, many discriminative algorithms such as SVMs\nexploit the notion of margin that effectively relates P(x) to P(y|x); the decision boundary\nis biased to fall preferentially in low density regions of P(x) so that only a few points fall\nwithin the margin band.\n\nThe assumptions relating P(x) to P(y|x) are seldom made explicit. In this paper we appeal\nto information theory to explicitly constrain P(y|x) on the basis of P(x) in a regularization\nframework. The idea is in broad terms related to a number of previous approaches including\nmaximum entropy discrimination [1], data clustering by information bottleneck [2], and\nminimum entropy data partitioning [3]. See also [4].\n\n[Figure 1 image: four regions of points labeled + or -; I(x; y) = 0, 0.65, 1, and 1 bits from left to right.]\n\nFigure 1: Mutual information I(x; y) measured in bits for four regions with different\nconfigurations of labels y = {+, -}. The marginal P(x) is discrete and uniform across the points.\nThe mutual information is low when the labels are homogeneous in the region, and high\nwhen labels vary. The mutual information is invariant to the spatial configuration of points\nwithin the neighborhood.\n\n2 Information Regularization\n\nWe begin by showing how to regularize a small region of the domain X. We will\nsubsequently cover the domain (or any chosen subset) with multiple small regions, and describe\ncriteria that ensure regularization of the whole domain on the basis of the individual\nregions.\n\n2.1 Regularizing a Single Region\n\nConsider a small contiguous region Q in the domain X (e.g., an \u03b5-ball). 
We will regularize\nthe conditional probability P(y|x) by penalizing the amount of information the conditionals\nimply about the labels within the region.\n\nThe regularizer is a function of both P(y|x) and P(x), and will penalize changes in P(y|x)\nmore in regions with high P(x). Let L be the set of labeled points (size N_L) and L \u222a U\nbe the set of labeled and unlabeled points (size N_LU). The marginal P(x) is assumed to\nbe given, and may be available directly in terms of a continuous density, or as an empirical\ndensity P(x) = (1/N_LU) \u00b7 \u2211_{i \u2208 L \u222a U} \u03b4(x - x_i) corresponding to a set of points {x_i} that\nmay not have labels (\u03b4(\u00b7) is the Dirac delta function integrating to 1).\n\nAs a measure of information, we employ mutual information [5], which is the average\nnumber of bits that x contains about the label in region Q (see Figure 1). The measure\ndepends both on the marginal density P(x) (specifically its restriction to x \u2208 Q, namely\nP(x|Q) = P(x) / \u222b_Q P(x) dx) and the conditional P(y|x). Equivalently, we can interpret\nmutual information as a measure of disagreement among P(y|x), x \u2208 Q. The measure is\nzero for any constant P(y|x). More precisely, the mutual information in region Q is\n\nI_Q(x; y) = \\sum_y \\int_{x \\in Q} P(x|Q) P(y|x) \\log \\frac{P(y|x)}{P(y|Q)} \\, dx,    (1)\n\nwhere P(y|Q) = \u222b_{x \u2208 Q} P(x|Q) P(y|x) dx. The densities conditioned on Q are normalized\nto integrate to 1 within the region Q. Note that the mutual information is invariant to\npermutations of the elements of X within Q, which suggests that the regions must be small\nenough to preserve locality.\n\nThe regularization penalty has to further scale with the number of points in the region (or\nthe probability mass). 
We introduce the following regularization principle:\n\nInformation regularization\npenalize (M_Q/V_Q) \u00b7 I_Q(x; y), which is the information about the labels\nwithin a local region Q, weighted by the overall probability mass M_Q in\nthe region, and normalized by a measure of variability V_Q (variance) of\nx in the region.\n\nHere M_Q = \u222b_{x \u2208 Q} P(x) dx. The mutual information I_Q(x; y) measures the information\nper point, and to obtain the total mutual information contained in a region, we must multiply\nby the probability mass M_Q. The regularization will be stronger in regions with high\nP(x).\n\nV_Q is a measure of variance of x restricted to the region, and is introduced to remove\noverall dependence on the size of the region. In one dimension, V_Q = var(x|Q). When the\nregion is small, then the marginal will be close to uniform over the region and V_Q \u221d R^2,\nwhere R is, e.g., the radius for spherical regions. We omit here the analysis of the\nd-dimensional case and only note that we may choose V_Q = tr \u03a3_Q, where the covariance\n\u03a3_Q = \u222b_{x \u2208 Q} (x - E_Q(x)) (x - E_Q(x))^T P(x|Q) dx. The choice of V_Q is based on the\nlimiting argument discussed next.\n\n2.2 Limiting Behavior for Vanishing Size Regions\n\nWhen the size of the region is scaled down, the mutual information will go to zero for any\ncontinuous P(y|x). We derive here the appropriate regularization penalty in the limit of\nvanishing regions. For simplicity, we only consider the one-dimensional case.\n\nWithin a small region Q we can (under mild continuity assumptions) approximate P(y|x)\nby a Taylor expansion around the mean point x_0 \u2208 Q, obtaining P(y|Q) \u2248 P(y|x_0) to\nfirst order. 
By using log(1 + z) \u2248 z - z^2/2 and substituting the approximate P(y|x) and\nP(y|Q) into I_Q(x; y), we get the following first order expression for mutual information:\n\nI_Q(x; y) = \\frac{1}{2} \\, \\underbrace{\\mathrm{var}(x|Q)}_{\\text{size-dependent}} \\; \\underbrace{\\sum_y P(y|x_0) \\left. \\frac{d \\log P(y|x)}{dx} \\right|_{x_0}^{2}}_{\\text{size-independent}}    (2)\n\nvar(x|Q) is dependent on the size (and more generally shape) of region Q while the\nremaining parts are independent of the size (and shape). The regularization penalty should\nnot scale with the resolution at which we penalize information and we thus divide out the\nsize-dependent part.\n\nThe size-independent part is the Fisher information [5], where we think of P(y|x) as\nparameterized by x. The expression d log P(y|x)/dx is known as the Fisher score.\n\n2.3 Regularizing the Domain\n\nWe want to regularize the conditional P(y|x) across the domain X (or any subset of interest).\nSince individual regions must be relatively small to preserve locality, we need multiple\nregions to cover the domain. The cover is the set C of these regions. Since the regularization\npenalty is assigned to each region, the regions must overlap to ensure that the conditionals\nin different regions become functionally dependent. See Figure 2.\n\nIn general all areas with significant marginal density P(x) should be included in the cover\nor will not be regularized (areas of zero marginal need not be considered). The cover should\ngenerally be connected (with respect to neighborhood relations of the regions) so that\nlabeled points have potential to influence all conditionals. The amount of overlap between\nany two regions in the cover determines how strongly the corresponding conditionals are\ntied to each other. 
On the other hand, the regions should be small to preserve locality.\nThe limit of a large number of small overlapping regions can be defined, and we ensure\ncontinuity of P(y|x) when the offset between regions vanishes relative to their size (in all\ndimensions).\n\n3 Classification with Information Regularization\n\nInformation regularization across multiple regions can be performed, for example, by\nminimizing the maximum information per region, subject to correct classification of the\nlabeled points. Specifically, we constrain each region in the cover (Q \u2208 C) to carry at most\n\u03b3 units of information.\n\n\\min_{P(y|x_k),\\, \\gamma} \\; \\gamma    (3a)\n\\text{s.t.} \\quad (M_Q / V_Q) \\cdot I_Q(x; y) \\le \\gamma \\quad \\forall Q \\in C    (3b)\nP(y|x_k) = \\delta(y, \\tilde{y}_k) \\quad \\forall k \\in L    (3c)\n0 \\le P(y|x_k) \\le 1, \\quad \\sum_y P(y|x_k) = 1 \\quad \\forall k \\in L \\cup U, \\; \\forall y.    (3d)\n\nWe have incorporated the labeled points by constraining their conditionals to the observed\nvalues (eq. 3c) (see below for other ways of incorporating labeled information). The\nsolution P(y|x) to this optimization problem is unique in regions that achieve the\ninformation constraint with equality (as long as P(x) > 0). (Uniqueness follows from the\nstrict convexity of mutual information as a function of P(y|x) for nonzero P(x).)\n\nDefine an atomic subregion as a non-empty intersection of regions that cannot be further\nintersected by any region (Figure 2). All unlabeled points in an atomic subregion belong\nto the same set of regions, and therefore participate in exactly the same constraints. They\nwill be regularized the same way, and since mutual information is a convex function, it will\nbe minimized when the conditionals P(y|x) are equal in the atomic subregion. 
We can\ntherefore parsimoniously represent conditionals of atomic subregions, instead of individual\npoints, merely by treating such atomic subregions as \u201cmerged points\u201d and weighting the\nassociated constraint by the probability mass contained in the subregion.\n\n3.1 Incorporating Noisy Labels\n\nLabeled points participate in the information regularization in the same way as unlabeled\npoints. However, their conditionals have additional constraints, which incorporate the label\ninformation. In equation 3c we used the constraint P(y|x_k) = \u03b4(y, \u1ef9_k) for all labeled\npoints. This constraint does not permit noise in the labels (and cannot be used when two\npoints at the same location have disagreeing labels). Alternatively, we can apply either of\nthe constraints\n\n(fix-lbl): P(y|x_i) = (1 - b)^{\\delta(y, \\tilde{y}_i)} \\, b^{1 - \\delta(y, \\tilde{y}_i)}, \\quad \\forall i \\in L\n(exp-lbl): E_{P(i)}[P(\\tilde{y}_i|x_i)] \\ge 1 - b.\n\nIn (exp-lbl) the expectation is over the labeled set L, where P(i) = 1/N_L.\n\nThe parameter b \u2208 [0, 0.5) models the amount of label noise, and is determined from prior\nknowledge or can be optimized via cross-validation.\n\nConstraint (fix-lbl) is written out for the binary case for simplicity. The conditionals\nof the labeled points are directly determined by their labels, and are treated as fixed\nconstants. Since b < 0.5, the thresholded conditional classifies labeled points in the observed\nclass. In constraint (exp-lbl), the conditionals for labeled points can have an average\nerror of at most b, where the average is over all labeled points. Thus, a few points may have\nconditionals that deviate significantly from their observed labels, giving robustness against\nmislabeled points and outliers.\n\nTo obtain classification decisions, we simply choose the class with the maximum posterior\ny_k = argmax_y P(y|x_k). 
Working with binary valued P(y|x) \u2208 {0, 1} directly would yield a\nmore difficult combinatorial optimization problem.\n\n3.2 Continuous Densities\n\nInformation regularization is also computationally feasible for continuous marginal\ndensities, known or estimated. For example, we may be given a continuous unlabeled data\ndistribution P(x) and a few discrete labeled points, and regularize across a finite set of\ncovering regions. The conditionals are uniform inside atomic subregions (except at labeled\npoints), requiring estimates of only a finite number of conditionals.\n\n3.3 Implementation\n\nFirstly, we choose appropriate regions forming a cover, and find the atomic subregions.\nThe choices differ depending on whether the data is all discrete or whether continuous\nmarginals P(x) are given. Secondly, we perform a constrained optimization to find the\nconditionals.\n\nIf the data is all discrete, create a spherical region centered at every labeled and unlabeled\npoint (or over some reduced set still covering all the points). We have used regions of fixed\nradius R, but the radius could also be set adaptively at each point to the distance to its\nK-th nearest neighbor. The union of such regions is our cover, and we choose the radius R (or\nK) large enough to create a connected cover. The cover induces a set of atomic subregions,\nand we merge the parameters P(y|x) of points inside individual atomic subregions (atomic\nsubregions with no observed points can be ignored). The marginal of each atomic subregion\nis proportional to the number of (merged) points it contains.\n\nIf continuous marginals are given, they will put probability mass in all atomic subregions\nwhere the marginal is non-zero. To avoid considering an exponential number of subregions,\nwe can limit the overlap between the regions by creating a sparser cover.\n\nGiven the cover, we now regularize the conditionals P(y|x) in the regions, according to\neq. 3a. 
This is a convex minimization problem with a global minimum, since mutual information\nis convex in P(y|x). It can be solved directly in the given primal form, using a\nquasi-Newton BFGS method. For eq. 3a, the required gradients of the constraints for the\nbinary class (y = {\u00b11}) case (region Q, atomic subregion r) are:\n\n\\frac{M_Q}{V_Q} \\frac{d I_Q(x; y)}{d P(y = 1|x_r)} = \\frac{M_Q}{V_Q} P(x_r|Q) \\left( \\log \\frac{P(y = 1|x_r)}{P(y = -1|x_r)} \\frac{P(y = -1|Q)}{P(y = 1|Q)} \\right).    (4)\n\nThe Matlab BFGS implementation fmincon can solve 100 subregion problems in a few\nminutes.\n\n3.4 Minimize Average Information\n\nAn alternative regularization criterion minimizes the average mutual information across\nregions. When calculating the average, we must correct for the overlaps of intersecting\nregions to avoid double counting (in contrast, the previous regularization criterion (eq. 3b)\navoided double counting by restricting information in each region individually). The influence\nof a region is proportional to the probability mass M_Q contained in it. However, a\npoint x may belong to N(x) regions. We define an adjusted density P*(x) = P(x)/N(x)\nto calculate an adjusted probability mass M*_Q which discounts overlap. We can then\nminimize average mutual information according to\n\n\\min_{P(y|x_k)} \\sum_Q \\frac{M^*_Q}{V_Q} I_Q(x; y)    (5a)\n\\text{s.t.} \\quad P(y|x_k) = \\delta(y, \\tilde{y}_k) \\quad \\forall k \\in L    (5b)\n0 \\le P(y|x_k) \\le 1, \\quad \\sum_y P(y|x_k) = 1 \\quad \\forall k \\in L \\cup U, \\; \\forall y    (5c)\n\nwith similar necessary adjustments to incorporate noisy labels.\n\n3.4.1 Limiting Behavior\n\nThe above average information criterion is a discrete version of a continuous regularization\ncriterion. 
In the limit of a large number of small regions in the cover (where the spacing of\nthe regions vanishes relative to their size), we obtain a well-defined regularization criterion\nresulting in continuous P(y|x):\n\n\\min_{P(y|x) \\; \\text{s.t.} \\; P(\\tilde{y}_k|x_k) = \\delta(y, \\tilde{y}_k) \\; \\forall k \\in L} \\int \\sum_y P(x_0) P(y|x_0) \\left. \\frac{d \\log P(y|x)}{dx} \\right|_{x_0}^{2} dx_0.    (6)\n\nThe regularizer can also be seen as the average Fisher information (see section 2.2). More\ngenerally, we can formulate the regularization problem as a Tikhonov regularization, where\nthe loss is the negative log-probability of labels:\n\n\\min_{P(y|x)} \\frac{1}{N_L} \\sum_{k \\in L} -\\log P(\\tilde{y}_k|x_k) + \\lambda \\int \\sum_y P(x_0) P(y|x_0) \\left. \\frac{d \\log P(y|x)}{dx} \\right|_{x_0}^{2} dx_0.    (7)\n\n3.4.2 Differential Equation Characterizing the Solution\n\nThe optimization problem (eq. 6) can be solved using calculus of variations. Consider the\none-dimensional binary class case and write the problem as\n\n\\min_{P(y=1|x)} \\int f(x, P(y=1|x), P'(y=1|x)) \\, dx,\n\nwhere f(\u00b7) = P(x) P'(y=1|x)^2 / [P(y=1|x) (1 - P(y=1|x))]. Necessary conditions for the\nsolution P(y=1|x) are provided by the Euler-Lagrange equations [6]\n\n\\frac{\\partial f}{\\partial P(y=1|x)} - \\frac{d}{dx} \\frac{\\partial f}{\\partial P'(y=1|x)} = 0 \\quad \\forall x    (8)\n\n(natural boundary conditions apply since we can assume P(x) = 0 and P'(y|x) = 0 at the\nboundary of the domain X). After substituting f and simplifying we have\n\nP''(y=1|x) = \\frac{P'(y=1|x)^2 (1 - 2 P(y=1|x))}{2 P(y=1|x) (1 - P(y=1|x))} - \\frac{P'(x) P'(y=1|x)}{P(x)}.    (9)\n\nThis differential equation governs the solution and we solve it numerically. The labeled\npoints provide boundary conditions, e.g. P(y = \u1ef9_k|x_k) = 1 - b for some small fixed\nb \u2265 0. We must search for initial values of P'(\u1ef9_k|x_k) to match the boundary conditions of\nP(\u1ef9_k|x_k). 
The solution is continuous and piecewise differentiable.\n\n4 Results and Discussion\n\n[Figure 2 image: three intersecting regions with atomic subregions numbered 1 to 7.]\n\nFigure 2: (Left) Three intersecting regions, and their atomic subregions (numbered).\nP(y|x) for unlabeled points will be constant in atomic subregions.\n\n[Figure 3 plot: Posterior P(y|x) versus x on [-1, 1]; legend: P(y|x), P(x), labeled points.]\n\nFigure 3: (Right) The conditional (solid line) for a continuous marginal P(x) (dotted line)\nconsisting of a mixture of two continuous Gaussians and two labeled points at (x = -0.8, y = -1)\nand (x = 0.8, y = 1). The row of circles at the top depicts the region structure used (a rendering\nof overlapping one-dimensional intervals).\n\n[Figure 4 plots: two panels of Posterior P(y|x) versus x on [-1, 1]; legend: P(y|x), P(x), labeled points.]\n\nFigure 4: Conditionals (solid lines) for two continuous marginals (dotted lines) plus two\nlabeled points. Left: the marginal is uniform, and the conditional approaches a straight\nline. Right: the marginal is a mixture of two Gaussians (with lower variance and shifted\ncompared to Figure 3). The conditional changes slowly in regions of high density.\n\nWe have experimentally studied the behavior of the regularizer with different marginal\ndensities P(x). Figure 3 shows the one-dimensional case with a continuous marginal density\n(mixture of two Gaussians), and two discrete labeled points. We choose N_Q = 40 regions\ncentered at uniform intervals of [-1, 1], overlapping each other half-way, creating N_Q + 1\natomic subregions. There are two labeled points. We show the solution attained by\nminimizing the maximum information (eq. 
3a), and using the (fix-lbl) constraint with\nlabel noise b = 0.05.\n\nThe conditional varies smoothly between the labeled points of opposite classes. Note the\ndependence on the marginal density P(x). The conditional is smoother in high-density\nregions, and changes more rapidly in low-density regions, as expected. Figure 4 shows\nmore examples, and Figure 5 illustrates solutions obtained via the differential equation\n(eq. 6).\n\n[Figure 5 plots: two panels showing P(y|x) and p(x) versus x on [-2, 2], vertical axis from 0 to 1.]\n\nFigure 5: Conditionals for two other continuous marginals plus two labeled points (marked\nas crosses and located at x = -1, 2 in the left figure and x = -2, 2 in the right), solved via the\ndifferential equation (eq. 6). The conditionals are continuous but non-differentiable at the\ntwo labeled points (marked as crosses).\n\n5 Conclusion\n\nWe have presented an information theoretic regularization framework for combining\nconditional and marginal densities in a semi-supervised estimation setting. The framework\nadmits both discrete and continuous (known or estimated) densities. The tractability is\nlargely a function of the number of nonempty intersections of chosen covering regions.\n\nThe principle extends beyond the presented scope. It provides flexible means of tailoring\nthe regularizer to particular needs. The shape and structure of the regions give direct ways\nof imposing relations between particular variables or values of those variables. 
The regions\ncan be easily defined on low-dimensional data manifolds.\n\nIn future work we will try the regularizer on large high-dimensional datasets and explore\ntheoretical connections to network information theory.\n\nAcknowledgements\n\nThe authors gratefully acknowledge support from Nippon Telegraph & Telephone (NTT) and NSF\nITR grant IIS-0085836. Tommi Jaakkola also acknowledges support from the Sloan Foundation in\nthe form of the Sloan Research Fellowship. Martin Szummer would like to thank Thomas Minka for\nvaluable comments.\n\nReferences\n\n[1] Tommi Jaakkola, Marina Meila, and Tony Jebara. Maximum entropy discrimination. Technical\nReport AITR-1668, Mass. Inst. of Technology AI lab, 1999. http://www.ai.mit.edu/.\n\n[2] Naftali Tishby and Noam Slonim. Data clustering by Markovian relaxation and the information\nbottleneck method. In Advances in Neural Information Processing Systems (NIPS), volume 13,\npages 640\u2013646. MIT Press, 2001.\n\n[3] Stephen Roberts, C. Holmes, and D. Denison. Minimum-entropy data partitioning using\nreversible jump Markov chain Monte Carlo. IEEE Trans. Pattern Analysis and Mach. Intell.\n(PAMI), 23(8):909\u2013914, 2001.\n\n[4] Matthias Seeger. Input-dependent regularization of conditional density models. Unpublished.\nhttp://www.dai.ed.ac.uk/homes/seeger/, 2001.\n\n[5] Thomas Cover and Joy Thomas. Elements of Information Theory. Wiley, 1991.\n\n[6] Robert Weinstock. Calculus of Variations. Dover, 1974.\n", "award": [], "sourceid": 2199, "authors": [{"given_name": "Martin", "family_name": "Szummer", "institution": null}, {"given_name": "Tommi", "family_name": "Jaakkola", "institution": null}]}