{"title": "Co-Generation with GANs using AIS based HMC", "book": "Advances in Neural Information Processing Systems", "page_first": 5808, "page_last": 5819, "abstract": "Inferring the most likely configuration for a subset of variables of a joint distribution given the remaining ones -- which we refer to as co-generation -- is an important challenge that is computationally demanding for all but the simplest settings. This task has received a considerable amount of attention, particularly for classical ways of modeling distributions like structured prediction. In contrast, almost nothing is known about this task when considering recently proposed techniques for modeling high-dimensional distributions, particularly generative adversarial nets (GANs). Therefore, in this paper, we study the challenges that arise for co-generation with GANs. To address those challenges we develop an annealed importance sampling based Hamiltonian Monte Carlo co-generation algorithm. The presented approach significantly outperforms classical gradient based methods on a synthetic dataset and on the CelebA and LSUN datasets.", "full_text": "Co-Generation with GANs using AIS based HMC\n\nTiantian Fang\nUniversity of Illinois at Urbana-Champaign\ntf6@illinois.edu\n\nAlexander G. Schwing\nUniversity of Illinois at Urbana-Champaign\naschwing@illinois.edu\n\nAbstract\n\nInferring the most likely configuration for a subset of variables of a joint distribution given the remaining ones \u2013 which we refer to as co-generation \u2013 is an important challenge that is computationally demanding for all but the simplest settings. This task has received a considerable amount of attention, particularly for classical ways of modeling distributions like structured prediction. In contrast, almost nothing is known about this task when considering recently proposed techniques for modeling high-dimensional distributions, particularly generative adversarial nets (GANs). 
Therefore, in this paper, we study the challenges that arise for co-generation with GANs. To address those challenges we develop an annealed importance sampling based Hamiltonian Monte Carlo co-generation algorithm. The presented approach significantly outperforms classical gradient based methods on a synthetic dataset and on the CelebA and LSUN datasets. The code is available at https://github.com/AilsaF/cogen_by_ais.\n\n1 Introduction\n\nFinding a likely configuration for part of the variables of a joint distribution given the remaining ones is a computationally challenging problem with many applications in machine learning, computer vision and natural language processing.\nClassical structured prediction approaches [39, 72, 75], which explicitly capture correlations over an output space of multiple discrete random variables, permit formulating an energy function restricted to the unobserved variables when conditioned on partly observed data. However, in many cases, it remains computationally demanding to find the most likely configuration or to sample from the energy restricted to the unobserved variables [66, 78].\nAlternatively, to model a joint probability distribution which implicitly captures the correlations, generative adversarial nets (GANs) [25] and variational auto-encoders (VAEs) [36] evolved as compelling tools which exploit the underlying manifold assumption: a latent \u2018perturbation\u2019 is drawn from a simple distribution which is subsequently transformed via a deep net (generator/decoder) to the output space. Those methods have been used for a plethora of tasks, e.g., for domain transfer [3, 10], inpainting [61, 83], image-to-image translation [32, 42, 31, 51, 63, 84, 87, 88], machine translation [11] and health care [67].\nWhile GANs and VAEs permit easy sampling from the entire output space domain, it remains an open question how to sample part of the domain given the remainder. 
We subsequently refer to this task as co-generation.\nCo-generation has been addressed in numerous works. For instance, for image-to-image translation [32, 42, 31, 51, 63, 84, 87, 88], mappings between domains are learned directly via an encoder-decoder structure. While such a formulation is convenient if we have two clearly separate domains, this mechanism does not scale if the number of output space partitionings grows, e.g., for image inpainting where missing regions are only specified at test time.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nTo enable co-generation for a domain unknown at training time, for GANs, optimization based algorithms have been proposed [83, 50]. Intuitively, they aim at finding the latent sample that accurately matches the observed part. Dinh et al. [16] maximize the log-likelihood of the missing part given the observed one. However, we find that successful training of a GAN leads to an increasingly ragged energy landscape, making the search for an appropriate latent variable via back-propagation through the generator harder and harder until it eventually fails.\nTo deal with this ragged energy landscape for co-generation, we develop an annealed importance sampling (AIS) [58] based Hamiltonian Monte Carlo (HMC) algorithm [19, 59]. The proposed approach leverages the benefits of AIS, i.e., gradually annealing a complex probability distribution, and HMC, i.e., avoiding a localized random walk.\nWe evaluate the proposed approach on synthetic data and imaging data (CelebA and LSUN), showing compelling results via MSE and MSSIM metrics.\n\n2 Related Work\nIn the following, we briefly discuss generative adversarial nets before providing background on co-generation with adversarial nets.\nGenerative adversarial nets (GANs) [24] were originally proposed as a non-cooperative two-player game, pitting a generator against a discriminator. 
The discriminator is tasked to tell apart real data from samples produced by the generator, while the generator is asked to make this differentiation as hard as possible for the discriminator. For a dataset of samples x \u2208 X and random perturbations z drawn from a simple distribution, this intuitive formulation results in the saddle-point objective\n\nmax_\u03b8 min_w \u2212Ex[ln Dw(x)] \u2212 Ez[ln(1 \u2212 Dw(G\u03b8(z)))],\n\nwhere G\u03b8 denotes the generator parameterized by \u03b8 and Dw refers to the discriminator parameterized by w. The discriminator assesses the probability of its input argument being real data. We let X denote the output space. Subsequently, we refer to this formulation as the \u2018Vanilla GAN,\u2019 and note that its loss is related to the Jensen-Shannon divergence. Many other divergences and distances have been proposed recently [4, 44, 26, 38, 14, 12, 56, 6, 55, 49, 28, 64] to improve the stability of the saddle-point objective optimization during training and to address mode-collapse, some theoretically founded and others empirically motivated. A review of all those variants is beyond the scope of this paper.\nCo-generation is the task of obtaining a sample for a subset of the output space domain, given the remainder as input. This task is useful for applications like image inpainting [61, 83] or image-to-image translation [32, 42, 31, 51, 63, 84, 87, 88]. Many formulations for co-generation have been considered in the past. However, few meet the criterion that any subset of the output space can be provided as input to generate the remainder.\nConditional GANs [54] have been used to generate output space objects based on a given input signal [80]. 
The output space object is typically generated as a whole and, to the best of our knowledge, no decomposition into multiple subsets is considered.\nCo-generation is related to multi-modal Boltzmann machines [70, 60], which learn a shared representation for video and audio [60] or image and text [70]. Restricted Boltzmann Machine based encoder-decoder architectures are used to reconstruct either video/audio or image/text given one of the representations. Co-generation is also related to deep net based joint embedding space learning [37]. Specifically, a joint embedding of images and text into a single vector space is demonstrated using deep net encoders. After performing vector operations in the embedding space, a new sentence can be constructed using a decoder. Co-generation is also related to cross-domain image generation [85, 62, 18, 3]. Those techniques use an encoder-decoder style deep net to transform rotation of faces, to learn the transfer of style properties like rotation and translation to other objects, or to encode class, view and transformation parameters into images.\nImage-to-image translation is related in that a transformation between two domains is learned either via an Image Transformation Net or an Encoder-Decoder architecture. Early works in this direction tackled supervised image-to-image translation [33, 32, 41, 9, 48, 10] followed by unsupervised variants [71, 68, 84, 63, 8, 81, 73, 29]. Cycle-consistency was discovered as a convenient regularization mechanism in [35, 87, 51, 2] and a distance preserving regularization was shown by Benaim and Wolf [5].\n\n500th iteration    1500th iteration    2500th iteration    15000th iteration\n\nFigure 1: Vanilla GAN loss in Z space (top) and gradient descent (GD) reconstruction error for 500, 1.5k, 2.5k and 15k generator training epochs.\n\nDisentangling of image representations was investigated recently [31, 42] and ambiguity in the task was considered by Zhu et al. 
[88]. Other losses such as a \u2018triangle formulation\u2019 have been investigated in [21, 43].\nAttribute transfer [40], analogy learning [27, 62] and many style transfer approaches [74, 7, 79, 13, 52, 77, 30, 45, 47, 76, 17, 65, 20, 23, 86], just like feature learning via inpainting [61], also use an encoder-decoder formulation, which maps entire samples from one domain to entire samples in another domain.\nCo-generation is challenging, if not impossible, for all the aforementioned works since decoders need to be trained for every subset of the output space domain. This is not scalable unless we know ahead of time the few distinct subsets of interest.\nHence, to generate arbitrary sub-spaces, other techniques need to be considered. Some applicable exceptions to encoder-decoder style training are the work on style transfer by Gatys et al. [22], the work on image inpainting by Yeh et al. [83], and coupled generative adversarial nets (CoGANs) by Liu and Tuzel [50]. In all three formulations, a loss is optimized to match observations to parts of the generated data by iteratively computing gradient updates for a latent space sample. In particular, Liu and Tuzel [50] learn a joint distribution over multiple domains by coupling multiple generators and possibly discriminators via weight-sharing. Liu and Tuzel [50] briefly discuss co-generation when talking about \u201ccross-domain image transformation,\u201d report observing coverage issues and state that they leave a detailed study to \u201cfuture work.\u201d Instead of an optimization based procedure, we propose to use an annealed importance sampling based Hamiltonian Monte Carlo approach. We briefly review both techniques subsequently.\nAnnealed importance sampling (AIS) [58] is an algorithm typically used to estimate (ratios of) the partition function [82, 69]. 
Specifically, it gradually approaches the partition function for a distribution of interest by successively refining samples which were initially obtained from a \u2018simple\u2019 distribution, e.g., a multivariate Gaussian. Here, we are not interested in the partition function itself, but rather in the ability of AIS to accurately draw samples from complex distributions, which makes AIS a great tool for co-generation.\nHamiltonian Monte Carlo (HMC) [19], originally referred to as \u201cHybrid Monte Carlo,\u201d united the Markov Chain Monte Carlo [53] technique with molecular dynamics approaches [1]. Early on, such methods were used for neural net models [57], and a seminal review by Neal [59] provides a detailed account.\nIn short, a Hamiltonian combines the potential energy, i.e., the negative log probability we are interested in sampling from, with an auxiliary kinetic energy. The latter typically follows a Gaussian distribution.\nHMC alternates resampling of the kinetic energy with Metropolis updates, where a new proposal is computed by following a trajectory of (approximately) constant Hamiltonian value. 
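Such a constant-Hamiltonian trajectory is typically computed with leapfrog integration. The following minimal sketch (ours, not code from the paper) integrates a 1-D standard-Gaussian potential and illustrates that the Hamiltonian is nearly conserved even though the proposal moves far from the starting point:

```python
def leapfrog(z, v, grad_U, step=0.1, n_steps=20):
    """Leapfrog integration: half steps for the momentum, full steps for the position."""
    v -= 0.5 * step * grad_U(z)        # initial half step for momentum
    for _ in range(n_steps - 1):
        z += step * v                  # full step for position
        v -= step * grad_U(z)          # full step for momentum
    z += step * v
    v -= 0.5 * step * grad_U(z)        # final half step for momentum
    return z, v

grad_U = lambda z: z                       # potential U(z) = z**2 / 2 (1-D standard Gaussian)
H = lambda z, v: 0.5 * z**2 + 0.5 * v**2   # Hamiltonian: potential + Gaussian kinetic energy

z0, v0 = 1.0, 0.5
z1, v1 = leapfrog(z0, v0, grad_U)
# The proposal (z1, v1) is far from (z0, v0), yet H changes only by O(step**2),
# so the Metropolis test accepts such distant proposals with high probability.
```

This is exactly why HMC avoids the localized random walk of classical Metropolis updates: the proposal can travel a long distance while the acceptance probability, which depends only on the change in H, stays close to one.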
HMC is useful for co-generation because of its reduced random-walk behavior, as we will explain next.\n\n3 AIS based HMC for Co-Generation\n\nIn the following we first motivate the problem of co-generation before we present an overview of our proposed approach and discuss the details of the employed Hamiltonian Monte Carlo method.\n\n3.1 Motivation\nAssume we are given a well trained generator \u02c6x = G\u03b8(z), parameterized by \u03b8, which is able to produce samples \u02c6x from an implicitly modeled distribution pG(x|z) via a transformation of embeddings z [25, 4, 46, 14, 15]. Further assume we are given partially observed data xo while the remaining part xh of the data x = (xo, xh) is latent, i.e., hidden. Note that during training of the generator parameters \u03b8 we don\u2019t assume information about which part of the data is missing to be available.\nTo reconstruct the latent parts of the data xh from available observations xo, a program is often formulated as follows:\n\nz* = arg min_z ||xo \u2212 G\u03b8(z)o||_2^2,    (1)\n\nwhere G\u03b8(z)o denotes the restriction of the generated sample G\u03b8(z) to the observed part. We focus on the \u21132 loss here but note that any other function measuring the fit of xo and G\u03b8(z)o is equally applicable. Upon solving the program given in Eq. (1), we easily obtain an estimate for the missing data \u02c6xh = G\u03b8(z*)h.\nAlthough the program given in Eq. 
(1) seems rather straightforward, it turns out to be surprisingly hard to solve, particularly if the generator G\u03b8(z) is very well trained. To see this, consider as an example a generator operating on a 2-dimensional latent space z = (z1, z2) and 2-dimensional data x = (x1, x2) drawn from a mixture of five equally weighted Gaussians with a variance of 0.02, the means of which are spaced equally on the unit circle. For this example we use h = 1 and let xo = x2 = 0. In the first row of Fig. 1 we illustrate the loss surface of the objective given in Eq. (1) obtained when using a generator G\u03b8(z) trained on the original 2-dimensional data for 500, 1.5k, 2.5k, and 15k iterations (columns in Fig. 1).\nEven in this simple 2-dimensional setting, we observe the latent space to become increasingly ragged, exhibiting folds that clearly separate different data regimes. First or second order optimization techniques cannot cope easily with such a loss landscape and likely get trapped in local optima. To illustrate this we highlight in Fig. 1 (first row) the trajectory of a sample z optimized via gradient descent (GD) using red color and provide the corresponding loss over the number of GD updates for the objective given in Eq. (1) in Fig. 1 (second row). We observe optimization to get stuck in a local optimum as the loss fails to decrease to zero once the generator better captures the data.\nTo avoid those local-optima issues for co-generation, we propose an annealed importance-sampling (AIS) based Hamiltonian Monte Carlo (HMC) method in the following.\n\n3.2 Overview\nIn order to reconstruct the hidden portion xh of the data x = (xo, xh) we are interested in drawing samples \u02c6z such that \u02c6xo = G\u03b8(\u02c6z)o has a high probability under log p(z|xo) \u221d \u2212||xo \u2212 G\u03b8(z)o||_2^2. Note that the proposed approach is not restricted to this log-quadratic posterior p(z|xo) just like the objective in Eq. 
(1) is not restricted to the \u21132 norm.\nTo obtain samples \u02c6z following the posterior distribution p(z|xo), the sampling-importance-resampling framework provides a mechanism which only requires access to samples and doesn\u2019t need computation of a normalization constant. Specifically, for sampling-importance-resampling, we first draw latent points z \u223c p(z) from a simple prior distribution p(z), e.g., a Gaussian. We then compute weights according to p(z|xo) in a second step and finally resample in a third step from the originally drawn set according to the computed weights.\nHowever, sampling-importance-resampling is particularly challenging in even modestly high-dimensional settings since many samples are required to adequately cover the space to a reasonable degree. As expected, empirically, we found this procedure to not work very well. To address this concern, here, we propose an annealed importance sampling (AIS) based Hamiltonian Monte Carlo (HMC) procedure. Just like sampling-importance-resampling, the proposed approach only requires access to samples and no normalization constant needs to be computed.\nMore specifically, we use annealed importance sampling to gradually approach the complex and often high-dimensional posterior distribution p(z|xo) by simulating a Markov Chain starting from the prior distribution p(z) = N(z|0, I), a standard normal distribution with zero mean and unit variance. With an increasing number of updates, we gradually approach the true posterior. Formally, we define an annealing schedule for the parameter \u03b2t from \u03b20 = 0 to \u03b2T = 1. At every time step t \u2208 {1, . . . , T} we refine the samples drawn at the previous timestep t \u2212 1 so as to represent the distribution \u02c6pt(z|xo) = p(z|xo)^{\u03b2t} p(z)^{1\u2212\u03b2t}.\n\nAlgorithm 1 AIS based HMC\n1: Input: p(z|xo), \u03b2t \u2200t \u2208 {1, . . . , T}\n2: Draw set of samples z \u2208 Z from prior distribution p(z)\n3: for t = 1, . . . , T do    // AIS loop\n4:    Define \u02c6pt(z|xo) = p(z|xo)^{\u03b2t} p(z)^{1\u2212\u03b2t}\n5:    for m = 1, . . . , M do    // HMC loop\n6:        \u2200z \u2208 Z initialize Hamiltonian and momentum variables v \u223c N(0, I)\n7:        \u2200z \u2208 Z compute new proposal sample using leapfrog integration on the Hamiltonian\n8:        \u2200z \u2208 Z use Metropolis-Hastings to check whether to accept the proposal and update Z\n9:    end for\n10: end for\n11: Return: Z\n\nIntuitively and following the spirit of annealed importance sampling, it is easier to gradually approach sampling from p(z|xo) = \u02c6pT(z|xo) by successively refining the samples. Note the notational difference between the posterior of interest p(z|xo) and the annealed posterior \u02c6pt(z|xo).\nTo successively refine the samples we use Hamiltonian Monte Carlo (HMC) sampling because a proposed update can be far from the current sample while still having a high acceptance probability. Specifically, HMC bypasses, to some extent, the slow exploration of the space observed when using classical Metropolis updates based on a random walk proposal distribution.\nCombining both AIS and HMC, the developed approach summarized in Alg. 1 iteratively proceeds as follows after having drawn initial samples from p(z) = \u02c6p0(z|xo): (1) define the annealed proposal distribution; and (2) for M iterations compute new proposals using leapfrog integration and check whether to replace the previous sample with the new proposal. Subsequently, we discuss how to compute proposals and how to check acceptance.\n\n3.3 Hamiltonian Monte Carlo\nHamiltonian Monte Carlo (HMC) explores the latent space much more quickly than a classical random walk algorithm. Moreover, HMC methods are particularly suitable for co-generation because they are capable of traversing folds in an energy landscape. 
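Alg. 1 can be sketched end-to-end in a few dozen lines. The sketch below is a minimal NumPy illustration under toy assumptions of ours (a fixed tanh "generator", a single observed coordinate, a finite-difference gradient, and linear \u03b2t spacing rather than the sigmoid schedule discussed later); it is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (our assumptions): "generator" G(z) = tanh(A z), observation xo
# of the first output coordinate only.
A = np.array([[1.5, -0.3], [0.4, 1.1]])
xo = 0.7

def log_post(z):   # log p(z|xo) proportional to -||xo - G(z)_o||^2 (sharpened by 10)
    return -10.0 * (xo - np.tanh(A @ z)[0]) ** 2

def log_prior(z):  # p(z) = N(0, I)
    return -0.5 * z @ z

def grad(logp, z, eps=1e-4):
    """Central finite-difference gradient, to keep the sketch self-contained."""
    g = np.zeros_like(z)
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = eps
        g[j] = (logp(z + e) - logp(z - e)) / (2 * eps)
    return g

def ais_hmc(n_chains=20, T=60, M=3, step=0.05, L=10):
    Z = rng.standard_normal((n_chains, 2))            # line 2: draw from prior
    for beta in np.linspace(0.0, 1.0, T + 1)[1:]:     # line 3: AIS loop
        logp = lambda z, b=beta: b * log_post(z) + (1 - b) * log_prior(z)
        for _ in range(M):                            # line 5: HMC loop
            for i in range(n_chains):
                z = Z[i]
                v = rng.standard_normal(2)            # line 6: fresh momentum
                h_old = -logp(z) + 0.5 * v @ v        # H = U + K with U = -log p_t
                z_new, v_new = z.copy(), v.copy()     # line 7: leapfrog proposal
                v_new += 0.5 * step * grad(logp, z_new)
                for _ in range(L - 1):
                    z_new += step * v_new
                    v_new += step * grad(logp, z_new)
                z_new += step * v_new
                v_new += 0.5 * step * grad(logp, z_new)
                h_new = -logp(z_new) + 0.5 * v_new @ v_new
                if np.log(rng.uniform()) < h_old - h_new:  # line 8: MH check
                    Z[i] = z_new
    return Z

Z = ais_hmc()
fit = np.mean([(xo - np.tanh(A @ z)[0]) ** 2 for z in Z])  # small after annealing
```

Each annealing step reuses the chains from the previous step, so by \u03b2T = 1 the samples represent p(z|xo) without a normalization constant ever being computed.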
To this end, HMC methods trade potential energy Ut(z) = \u2212log \u02c6pt(z|xo) with kinetic energy Kt(v). The dimension d of the momentum variable v \u2208 Rd is identical to that of the latent samples z \u2208 Rd. For readability, we drop the dependence on the time index t from here on.\nSpecifically, HMC defines a Hamiltonian H(z, v) = U(z) + K(v) or, equivalently, a joint probability distribution log p(z, v) \u221d \u2212H(z, v), and proceeds by iterating three steps M times.\nIn a first step, the Hamiltonian is initialized by randomly sampling the momentum variable v, typically using a standard Gaussian. Note that this step leaves the joint distribution p(z, v) corresponding to the Hamiltonian invariant, as the momentum v is independent of the samples z and as we sample from the correct pre-defined distribution for the momentum variables.\nIn a second step, we compute proposals (z*, v*) via leapfrog integration to move along a hypersurface of the Hamiltonian, i.e., the value of the Hamiltonian does not change. Note, however, that within this step kinetic energy K(v) can be traded for potential energy U(z) and vice versa.\nIn the final third step we decide whether to accept the proposal (z*, v*) computed via leapfrog integration. Formally, we accept the proposal with probability\n\nmin{1, exp(\u2212H(z*, v*) + H(z, v))}.    (2)\n\nIf the proposed state (z*, v*) is rejected, the (m+1)-th iteration reuses z; otherwise z is replaced with z* in the (m+1)-th iteration.\nNote that points (z, v) with different probability density are only obtained during the first step, i.e., the sampling of the momentum variables v. Importantly, resampling of v can change the probability density by a large amount. As evident from Eq. 
(2), a low value for the Hamiltonian obtained after resampling v increases the chances of accepting this proposal, i.e., we gradually increase the number of samples with a low value for the Hamiltonian, equivalently a high probability.\n\n500    1500    2500    15000\n\nFigure 2: Rows correspond to generators trained for a different number of epochs as indicated (left). The columns illustrate: (a) Samples generated with a vanilla GAN (black); (b) GD reconstructions from 100 random initializations; (c) Reconstruction error bar plot for the result in column (b); (d) Reconstructions recovered with Alg. 1; (e) Reconstruction error bar plot for the results in column (d).\n\nt = 100    t = 2000    t = 3000    t = 4000    t = 6000\n\nFigure 3: Samples z in Z space during the AIS procedure: after 100, 2k, 3k, 4k and 6k AIS loops.\n\n3.4 Implementation Details\nWe use a sigmoid schedule for the parameter \u03b2t, i.e., we linearly space T \u2212 1 temperature values within a range and apply a sigmoid function to these values to obtain \u03b2t. This schedule emphasizes locations where the distribution changes drastically. We use 0.01 as the leapfrog step size and initially employ 10 leapfrog updates per HMC loop for the synthetic 2D dataset and 20 leapfrog updates for the real datasets. The target acceptance rate is 0.65, as recommended by Neal [59]. A low acceptance rate means the leapfrog step size is too large, in which case the step size is automatically decreased by a factor of 0.98. In contrast, a high acceptance rate increases the step size by a factor of 1.02.\n\n4 Experiments\nBaselines: In the following, we evaluate the proposed approach on synthetic and imaging data. We compare Alg. 1 with two GD baselines employing two different initialization strategies. The first one samples a single z randomly. 
The second picks that one sample z from 5000 initial points which best matches the objective given in Eq. (1).\n\n4.1 Synthetic Data\nTo illustrate the advantage of our proposed method over the common baseline, we first demonstrate our results on 2-dimensional synthetic data. Specifically, the 2-dimensional data x = (x1, x2) is drawn from a mixture of five equally weighted Gaussians, each with a variance of 0.02, the means of which are spaced equally on the unit circle. See the blue points in columns (a), (b), and (d) of Fig. 2 for an illustration.\n\n1We adapt the AIS implementation from https://github.com/tonywu95/eval_gen\n\nFigure 4: Reconstruction errors over the number of progressive GAN training iterations. (a) MSSIM on CelebA; (b) MSE on CelebA; (c) MSSIM on LSUN; (d) MSE on LSUN; (e) MSSIM on CelebA-HQ; (f) MSE on CelebA-HQ.\n\nGround Truth    Masked Image    GD+single z    GD+multi z    AIS\n\nFigure 5: Reconstructions on 128 \u00d7 128 CelebA images for a progressive GAN trained for 10k iterations.\n\nIn this experiment we aim to reconstruct x = (x1, x2), given xo = x2 = 0. Assuming the generator has learned the synthetic data very well, the optimal solution for the reconstruction is \u02c6x = (1, 0), where the reconstruction error should be 0. However, as discussed in reference to Fig. 1 earlier, we observe that energy barriers in the Z-space complicate optimization. Specifically, if we initialize optimization with a sample far from the optimum, it is hard to recover. While the strategy to pick the best initializer from a set of 5,000 points works reasonably well in the low-dimensional setting, it obviously breaks down quickly if the latent space dimension increases even moderately.\nIn contrast, our proposed AIS co-generation method only requires one initialization to achieve the desired result after 6,000 AIS loops, as shown in Fig. 2 (15000 (d)). 
Speci\ufb01cally, reconstruction with\ngenerators trained for a different number of epochs (500, 1.5k, 2.5k and 15k) are shown in the rows.\nThe samples obtained from the generator for the data (blue points in column (a)) are illustrated in\ncolumn (a) using black color. Using the respective generator to solve the program given in Eq. (1)\nvia GD yields results highlighted with yellow color in column (b). The empirical reconstruction\nerror frequency for this baseline is given in column (c). The results and the reconstruction error\nfrequency obtained with Alg. 1 are shown in columns (d, e). We observe signi\ufb01cantly better results\nand robustness to initialization.\nIn Fig. 3 we show for 100 samples that Alg. 1 moves them across the energy barriers during the\nannealing procedure, illustrating the bene\ufb01ts of AIS based HMC over GD.\n4.2\nTo validate our method on real data, we evaluate on three datasets, using MSE and MSSIM metrics.\nFor all three experiments, we use the progressive GAN architecture [34] and evaluate baselines and\nAIS on progressive GAN training data.\nCelebA: For CelebA, the size of the input and the output are 512 and 128 \u00d7 128 respectively. We\ngenerate corrupted images by randomly masking blocks of width and height ranging from 30 to 60.\nThen we use Alg. 1 for reconstruction with 500 HMC loops.\nIn Fig. 4 (a,b), we observe that Alg. 1 outperforms both baselines for all GAN training iterations on\nboth MSSIM and MSE metrics. The difference increases for better trained generators. In Fig. 
5, we show some results generated by both baselines and Alg. 1. Compared to the baselines, the Alg. 1 results are more similar to the ground truth and more robust to different mask locations. Note that Alg. 1 only uses one initialization, which demonstrates its robustness to initialization.

Figure 6: Reconstructions on 256 × 256 LSUN images using a progressive GAN trained for 10k iterations (columns: ground truth, masked image, GD+single z, GD+multi z, AIS).

LSUN: The output size is 256 × 256. We mask images with blocks of width and height between 50 and 80 pixels. The complex distribution and intricate details of LSUN make reconstruction challenging. Here, we sample 5 initializations in Alg. 1 (line 2) and use 500 HMC loops for each initialization independently. For each image, we pick the best score among the five and show the average in Fig. 4 (c,d). We observe that Alg. 1 with 5 initializations easily outperforms GD with 5,000 initializations. We also show reconstructions in Fig. 6.

Figure 7: SISR: 128 × 128 to 1024 × 1024 for CelebA-HQ images using a progressive GAN (19k iterations); columns: LR input, GD+single z, GD+multi z, AIS.

CelebA-HQ: Besides recovering masked images, we also demonstrate co-generation on single image super-resolution (SISR).
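The HMC loops mentioned above follow the standard AIS recipe: draw z from the prior, anneal through a sequence of intermediate distributions toward the reconstruction target, and apply an HMC transition at each temperature while accumulating importance weights. Below is a minimal one-dimensional sketch of that loop; the Gaussian target, linear temperature schedule, and leapfrog settings are illustrative assumptions, not the paper's exact Alg. 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p0(z):   # initial distribution: standard normal prior on z
    return -0.5 * z ** 2

def log_p1(z):   # target: a narrow Gaussian at 2, standing in for the
    return -0.5 * ((z - 2.0) / 0.3) ** 2   # reconstruction "posterior"

def log_pt(z, beta):   # geometric bridge p0^(1-beta) * p1^beta
    return (1.0 - beta) * log_p0(z) + beta * log_p1(z)

def grad_log_pt(z, beta):
    return (1.0 - beta) * (-z) + beta * (-(z - 2.0) / 0.3 ** 2)

def hmc_step(z, beta, step=0.05, n_leapfrog=10):
    """One leapfrog HMC transition leaving p_beta invariant."""
    p0 = rng.standard_normal()
    z_new = z
    p = p0 + 0.5 * step * grad_log_pt(z, beta)   # half momentum step
    for i in range(n_leapfrog):
        z_new = z_new + step * p                 # full position step
        if i < n_leapfrog - 1:
            p = p + step * grad_log_pt(z_new, beta)
    p = p + 0.5 * step * grad_log_pt(z_new, beta)  # final half step
    h_old = -log_pt(z, beta) + 0.5 * p0 ** 2
    h_new = -log_pt(z_new, beta) + 0.5 * p ** 2
    # Metropolis accept/reject on the Hamiltonian
    if rng.random() < np.exp(min(0.0, h_old - h_new)):
        return z_new
    return z

def ais(n_chains=200, n_temps=100):
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    z = rng.standard_normal(n_chains)   # exact samples from p0
    log_w = np.zeros(n_chains)          # accumulated importance weights
    for k in range(1, n_temps + 1):
        log_w += log_pt(z, betas[k]) - log_pt(z, betas[k - 1])
        z = np.array([hmc_step(zi, betas[k]) for zi in z])
    return z, log_w
```

Importance-weighting the final samples recovers expectations under the target; in the paper's setting z is high-dimensional and the target energy comes from the GAN reconstruction objective, but the loop structure (anneal, HMC transition, accumulate weights) is the same.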
In this task, the ground truth is a high-resolution image x (1024 × 1024) and the observed information xo is a low-resolution image (128 × 128). Here, we use the progressive CelebA-HQ GAN as the generator. After obtaining a generated high-resolution image, we downsample it to 128 × 128 via pooling and aim to reduce the squared error between the downsampled image and the observed low-resolution image. We use 3 z samples for the SISR task. We show the MSSIM and MSE between the ground truth (1024 × 1024) and the final output in Fig. 4 (e, f). Fig. 7 compares the outputs of the baselines to those of Alg. 1.

5 Conclusion

We propose a co-generation approach, i.e., completing partially given input data, using annealed importance sampling (AIS) based on the Hamiltonian Monte Carlo (HMC) method. In contrast to classical optimization based methods, specifically GD, which easily get trapped in local optima on this task, the proposed approach is much more robust. Importantly, the method can traverse the large energy barriers that occur when training generative adversarial nets. Its robustness is due to AIS gradually annealing a probability distribution and to HMC avoiding localized random walks.

Acknowledgments: This work is supported in part by NSF under Grant No. 1718221 and MRI #1725729, UIUC, Samsung, 3M, Cisco Systems Inc. (Gift Award CG 1377144) and Adobe. We thank NVIDIA for providing GPUs used for this work and Cisco for access to the Arcetri cluster.

References

[1] B. J. Alder and T. E. Wainwright. Studies in molecular dynamics. I. General method. Journal of Chemical Physics, 1959.

[2] A. Almahairi, S. Rajeswar, A. Sordoni, P. Bachman, and A. Courville. Augmented CycleGAN: Learning many-to-many mappings from unpaired data. arXiv:1802.10151, 2018.

[3] A. Anoosheh, E. Agustsson, R. Timofte, and L. Van Gool. ComboGAN: Unrestrained scalability for image domain translation. arXiv:1712.06909, 2017.

[4] M. Arjovsky, S.
Chintala, and L. Bottou. Wasserstein GAN. In ICML, 2017.

[5] S. Benaim and L. Wolf. One-sided unsupervised domain mapping. In Proc. NIPS, 2017.

[6] D. Berthelot, T. Schumm, and L. Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv:1703.10717, 2017.

[7] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Proc. NIPS, 2016.

[8] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proc. CVPR, 2017.

[9] Q. Chen and V. Koltun. Photographic image synthesis with cascaded refinement networks. In Proc. ICCV, 2017.

[10] Y. Choi, M. Choi, M. Kim, J. W. Ha, S. Kim, and J. Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proc. CVPR, 2018.

[11] A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jegou. Word translation without parallel data. In ICLR, 2018.

[12] R. W. A. Cully, H. J. Chang, and Y. Demiris. MAGAN: Margin adaptation for generative adversarial networks. arXiv:1704.03817, 2017.

[13] E. Denton and V. Birodkar. Unsupervised learning of disentangled representations from video. In Proc. NIPS, 2017.

[14] I. Deshpande, Z. Zhang, and A. G. Schwing. Generative modeling using the sliced Wasserstein distance. In Proc. CVPR, 2018.

[15] I. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing. Max-sliced Wasserstein distance and its use for GANs. In Proc. CVPR, 2019.

[16] L. Dinh, D. Krueger, and Y. Bengio. NICE: Non-linear independent components estimation. In ICLR, 2015.

[17] C. Donahue, A. Balsubramani, J. McAuley, and Z. C. Lipton. Semantically decomposing the latent spaces of generative adversarial networks. In Proc. ICLR, 2018.

[18] A. Dosovitskiy, J. T. Springenberg, and T. Brox.
Learning to generate chairs with convolutional neural networks. In Proc. CVPR, 2015.

[19] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 1987.

[20] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. In Proc. ICLR, 2017.

[21] Z. Gan, L. Chen, W. Wang, Y. Pu, Y. Zhang, H. Liu, C. Li, and L. Carin. Triangle generative adversarial networks. In Proc. NIPS, 2017.

[22] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Proc. CVPR, 2016.

[23] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. In Proc. BMVC, 2017.

[24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Proc. NIPS, 2014.

[25] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. arXiv:1406.2661, 2014.

[26] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. In NeurIPS, 2017.

[27] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In Proc. SIGGRAPH, 2001.

[28] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.

[29] Y. Hoshen and L. Wolf. Identifying analogies across domains. In Proc. ICLR, 2018.

[30] X. Huang and S. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proc. ICCV, 2017.

[31] X. Huang, M.-Y. Liu, S. Belongie, and J. Kautz. Multimodal unsupervised image-to-image translation. In Proc. ECCV, 2018.

[32] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros.
Image-to-image translation with conditional adversarial networks. In CVPR, 2017.

[33] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proc. ECCV, 2016.

[34] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2017.

[35] T. Kim, M. Cha, H. Kim, J. Lee, and J. Kim. Learning to discover cross-domain relations with generative adversarial networks. In Proc. ICML, 2017.

[36] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013.

[37] R. Kiros, R. R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539, 2014.

[38] S. Kolouri, G. K. Rohde, and H. Hoffman. Sliced Wasserstein distance for learning Gaussian mixture models. In CVPR, 2018.

[39] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.

[40] P.-Y. Laffont, Z. Ren, X. Tao, C. Qian, and J. Hays. Transient attributes for high-level understanding and editing of outdoor scenes. TOG, 2014.

[41] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In Proc. CVPR, 2017.

[42] H. Y. Lee, H. Y. Tseng, J. B. Huang, M. K. Singh, and M. H. Yang. Diverse image-to-image translation via disentangled representation. In Proc. ECCV, 2018.

[43] C. Li, K. Xu, J. Zhu, and B. Zhang. Triple generative adversarial nets. In Proc. NIPS, 2017.

[44] C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Póczos. MMD GAN: Towards deeper understanding of moment matching network. In NeurIPS, 2017.

[45] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M. H. Yang.
Universal style transfer via feature transforms. In Proc. NIPS, 2017.

[46] Y. Li, A. G. Schwing, K.-C. Wang, and R. Zemel. Dualing GANs. In Proc. NeurIPS, 2017.

[47] Y. Li, M. Y. Liu, X. Li, M. H. Yang, and J. Kautz. A closed-form solution to photo-realistic image stylization. In Proc. ECCV, 2018.

[48] X. Liang, H. Zhang, and E. P. Xing. Generative semantic manipulation with contrasting GAN. arXiv:1708.00315, 2017.

[49] Z. Lin, A. Khetan, G. Fanti, and S. Oh. PacGAN: The power of two samples in generative adversarial networks. In NeurIPS, 2018.

[50] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In Proc. NIPS, 2016.

[51] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Proc. NeurIPS, 2017.

[52] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In Proc. NIPS, 2016.

[53] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 1953.

[54] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.

[55] Y. Mroueh and T. Sercu. Fisher GAN. In NeurIPS, 2017.

[56] Y. Mroueh, T. Sercu, and V. Goel. McGAN: Mean and covariance feature matching GAN. arXiv:1702.08398, 2017.

[57] R. M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics, 1996.

[58] R. M. Neal. Annealed importance sampling. Statistics and Computing, 2001.

[59] R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2010.

[60] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In Proc. ICML, 2011.

[61] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros.
Context encoders: Feature learning by inpainting. In Proc. CVPR, 2016.

[62] S. E. Reed, Y. Zhang, Y. Zhang, and H. Lee. Deep visual analogy-making. In Proc. NIPS, 2015.

[63] A. Royer, K. Bousmalis, S. Gouws, F. Bertsch, I. Moressi, F. Cole, and K. Murphy. XGAN: Unsupervised image-to-image translation for many-to-many mappings. arXiv:1711.05139, 2017.

[64] T. Salimans, H. Zhang, A. Radford, and D. Metaxas. Improving GANs using optimal transport. In ICLR, 2018.

[65] T. Shen, T. Lei, R. Barzilay, and T. Jaakkola. Style transfer from non-parallel text by cross-alignment. In Proc. NIPS, 2017.

[66] S. E. Shimony. Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 1994.

[67] H.-C. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter, K. Andriole, and M. Michalski. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. arXiv:1807.10225, 2018.

[68] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In Proc. CVPR, 2017.

[69] J. Sohl-Dickstein and B. Culpepper. Hamiltonian annealed importance sampling for partition function estimation. arXiv:1205.1925, 2012.

[70] N. Srivastava and R. R. Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Proc. NeurIPS, 2012.

[71] Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. In Proc. ICLR, 2017.

[72] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Proc. NeurIPS, 2003.

[73] T. Galanti, L. Wolf, and S. Benaim. The role of minimal complexity functions in unsupervised learning of semantic mappings. In Proc. ICLR, 2018.

[74] J. B. Tenenbaum and W. T. Freeman. Separating style and content. In Proc. NIPS, 1997.

[75] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun.
Large margin methods for structured and interdependent output variables. JMLR, 2005.

[76] S. Tulyakov, M. Y. Liu, X. Yang, and J. Kautz. MoCoGAN: Decomposing motion and content for video generation. In Proc. CVPR, 2018.

[77] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. In Proc. ICML, 2016.

[78] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 1979.

[79] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee. Decomposing motion and content for natural video sequence prediction. In Proc. ICLR, 2017.

[80] T. C. Wang, M. Y. Liu, J. Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proc. CVPR, 2018.

[81] L. Wolf, Y. Taigman, and A. Polyak. Unsupervised creation of parameterized avatars. In Proc. ICCV, 2017.

[82] Y. Wu, Y. Burda, R. Salakhutdinov, and R. B. Grosse. On the quantitative analysis of decoder-based generative models. In ICLR, 2017.

[83] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Proc. CVPR, 2017.

[84] Z. Yi, H. Zhang, P. Tan, and M. Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In Proc. ICCV, 2017.

[85] J. Yim, H. Jung, B. Yoo, C. Choi, D. Park, and J. Kim. Rotating your face using multi-task deep neural network. In Proc. CVPR, 2015.

[86] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In Proc. ECCV, 2016.

[87] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.

[88] J. Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward multimodal image-to-image translation.
In Proc. NeurIPS, 2017.