Diffeomorphic random sampling using optimal information transport

07/11/2017
Publication: GSI2017
OAI: oai:www.see.asso.fr:17410:22608


License

Creative Commons: None (all rights reserved)

Metadata

DOI: 10.23723/17410/22608
Creators: Martin Bauer, Klas Modin, Sarang Joshi
Publisher: SEE
Publication year: 2018
Resource type: Text
Keywords: Information geometry, Optimal transport, Fisher-Rao metric, density matching, image registration, diffeomorphism groups, random sampling
Created: Thu 8 Mar 2018; Updated: Thu 8 Mar 2018; Submitted: Tue 13 Nov 2018
Format: application/pdf

Diffeomorphic random sampling using optimal information transport

Martin Bauer (1), Sarang Joshi (2), and Klas Modin (3)

(1) Department of Mathematics, Florida State University, bauer@math.fsu.edu
(2) Department of Bioengineering, Scientific Computing and Imaging Institute, University of Utah, sjoshi@sci.utah.edu
(3) Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, klas.modin@chalmers.se

Abstract. In this article we explore an algorithm for diffeomorphic random sampling of nonuniform probability distributions on Riemannian manifolds. The algorithm is based on optimal information transport (OIT), an analogue of optimal mass transport (OMT). Our framework uses the deep geometric connections between the Fisher-Rao metric on the space of probability densities and the right-invariant information metric on the group of diffeomorphisms. The resulting sampling algorithm is a promising alternative to OMT, in particular as our formulation is semi-explicit, free of the nonlinear Monge-Ampère equation. Compared to Markov Chain Monte Carlo methods, we expect our algorithm to stand up well when a large number of samples from a low-dimensional nonuniform distribution is needed.

Keywords: density matching, information geometry, Fisher-Rao metric, optimal transport, image registration, diffeomorphism groups, random sampling

MSC2010: 58E50, 49Q10, 58E10

1 Introduction

We construct algorithms for random sampling, addressing the following problem.

Problem 1. Let µ be a probability distribution on a manifold M. Generate N random samples from µ.

The classic approach to sampling from a probability distribution on a higher-dimensional space is to use Markov Chain Monte Carlo (MCMC) methods, for example the Metropolis-Hastings algorithm [6]. An alternative idea is to use diffeomorphic density matching between the density µ and a standard density µ0 from which samples can be drawn easily. Standard samples are then transformed by the diffeomorphism to generate non-uniform samples. In Bayesian inference, for example, the distribution µ would be the posterior distribution and µ0 would be the prior distribution. In case the prior itself is hard to sample from, the uniform distribution can be used. For M a subset of the real line, the standard approach is to use the cumulative distribution function to define the diffeomorphic transformation. If, however, the dimension of M is greater than one, there is no obvious change of variables to transform the samples to the distribution of the prior. We are thus led to the following matching problem.

Problem 2. Given a probability distribution µ on M, find a diffeomorphism ϕ such that ϕ∗µ0 = µ. Here, µ0 denotes a standard distribution on M from which samples can be drawn, and ϕ∗ is the push-forward of ϕ acting on densities, i.e.,

$$\varphi_*\mu_0 = |D\varphi|\,\mu_0 \circ \varphi,$$

where |Dϕ| is the Jacobian determinant.

A benefit of transport-based methods over traditional MCMC methods is cheap computation of additional samples: it amounts to drawing uniform samples and then evaluating the transformation. On the other hand, transport-based methods scale poorly with increasing dimensionality of M, contrary to MCMC.

The action of the diffeomorphism group on the space of smooth probability densities is transitive (Moser's lemma [13]), so existence of a solution to Problem 2 is guaranteed. However, if the dimension of M is greater than one, there is an infinite-dimensional space of solutions.
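For comparison, the one-dimensional construction mentioned above is completely explicit. With target density µ > 0 on an interval and µ0 the uniform density on (0, 1), the classical inverse-transform method (a standard fact recalled here for context, not taken from this paper) reads

$$F(y) = \int_{-\infty}^{y} \mu(s)\,\mathrm{d}s, \qquad X \sim \mu_0 \;\Longrightarrow\; \mathbb{P}\bigl(F^{-1}(X) \le y\bigr) = \mathbb{P}\bigl(X \le F(y)\bigr) = F(y),$$

so Y = F^{-1}(X) has density µ, and F^{-1} is the unique increasing transformation with this property. In dimension greater than one there is no such canonical choice, which is exactly the non-uniqueness described above.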
Thus, one needs to select a specific diffeomorphism within the set of all solutions. Moselhy and Marzouk [12] and Reich [15] proposed to use optimal mass transport (OMT) to construct the desired diffeomorphism ϕ, thereby enforcing ϕ = ∇c for some convex function c. The OMT approach implies solving, in one form or another, the heavily nonlinear Monge-Ampère equation for c. A survey of the OMT approach to random sampling is given by Marzouk et al. [9].

In this article we pursue an alternative approach to diffeomorphism-based random sampling, replacing OMT by optimal information transport (OIT), which is diffeomorphic transport based on the Fisher-Rao geometry [11]. Building on deep geometric connections between the Fisher-Rao metric on the space of probability densities and the right-invariant information metric on the group of diffeomorphisms [7, 11], we developed in [3] an efficient numerical method for density matching. The efficiency stems from a solution formula for ϕ that is explicit up to inversion of the Laplace operator, thus avoiding the solution of nonlinear PDEs such as Monge-Ampère. In this paper we explore this method for random sampling (the initial motivation in [3] is medical imaging, although other applications, including random sampling, are also suggested). The resulting algorithm is implemented in a short MATLAB code, available under MIT license at https://github.com/kmodin/oit-random.

2 Density Transport Problems

Let M be a d-dimensional orientable, compact manifold equipped with a Riemannian metric g = ⟨·,·⟩. The volume density induced by g is denoted µ0, and without loss of generality we assume that the total volume of M with respect to µ0 is one, i.e., $\int_M \mu_0 = 1$. Furthermore, the space of smooth probability densities on M is given by

$$\mathrm{Prob}(M) = \Big\{\, \mu \in \Omega^d(M) \;\Big|\; \mu > 0,\ \int_M \mu = 1 \,\Big\}, \qquad (1)$$

where $\Omega^d(M)$ denotes the space of smooth d-forms. The group of smooth diffeomorphisms Diff(M) acts on the space of probability densities via push-forward:

$$\mathrm{Diff}(M) \times \mathrm{Prob}(M) \to \mathrm{Prob}(M), \qquad (2)$$
$$(\varphi, \mu) \mapsto \varphi_*\mu\,. \qquad (3)$$

By a result of Moser [13] this action is transitive. We introduce the subgroup of volume-preserving diffeomorphisms

$$\mathrm{SDiff}(M) = \{\, \varphi \in \mathrm{Diff}(M) \mid \varphi_*\mu_0 = \mu_0 \,\}\,. \qquad (4)$$

Note that SDiff(M) is the isotropy group of µ0 with respect to the action of Diff(M). The spaces Prob(M), Diff(M), and SDiff(M) all have the structure of smooth, infinite-dimensional Fréchet manifolds. Furthermore, Diff(M) and SDiff(M) are infinite-dimensional Fréchet Lie groups. A careful treatment of these Fréchet topologies can be found in the work by Hamilton [5].

In the following we focus our attention on the diffeomorphic density matching problem (Problem 2). A common approach to overcome the non-uniqueness of the solution is to add a regularization term to the problem, that is, to search for a minimum-energy solution that has the required matching property, for some energy functional E on the diffeomorphism group. Following ideas from mathematical shape analysis [10], it is natural to define this energy functional using the geodesic distance function dist of a Riemannian metric on the diffeomorphism group. The regularized diffeomorphic matching problem can then be written as follows.

Problem 3. Given a probability density µ ∈ Prob(M), find the diffeomorphism ϕ ∈ Diff(M) that minimizes the energy functional

$$E(\varphi) = \mathrm{dist}^2(\mathrm{id}, \varphi) \qquad (5)$$

over all diffeomorphisms ϕ with ϕ∗µ0 = µ.
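Before turning to the choice of metric, here is a small numerical illustration of the push-forward action defined above. The MATLAB snippet below checks on a grid that the coordinate formula from the Introduction, |Dϕ| µ0 ∘ ϕ, again yields a probability density (positive and of unit total mass); the circle diffeomorphism and the density used here are arbitrary choices made for this sketch, not taken from the paper.

```matlab
% Sanity check of the density action on M = S^1 = R/2piZ.
% phi(x) = x + 0.3*sin(x) is a circle diffeomorphism since
% phi'(x) = 1 + 0.3*cos(x) > 0 and phi(x + 2*pi) = phi(x) + 2*pi.

n    = 2^10;
x    = 2*pi*(0:n-1)'/n;               % periodic grid on [0, 2*pi)
mu0  = @(z) (1 + 0.5*cos(z))/(2*pi);  % a smooth probability density on S^1
phi  = x + 0.3*sin(x);                % the chosen diffeomorphism
dphi = 1 + 0.3*cos(x);                % its Jacobian |D phi|

pushed = dphi .* mu0(phi);            % coordinate formula |D phi| (mu0 o phi)

mass = 2*pi/n * sum(pushed);          % quadrature on the periodic grid
fprintf('total mass  : %.10f (should be 1)\n', mass);
fprintf('min density : %.4f (should be > 0)\n', min(pushed));
```

The total mass is preserved by the change-of-variables formula, which is exactly why the action maps Prob(M) to itself.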
The free variable in the above matching problem is the choice of Riemannian metric, and thus of distance function, on the group of diffeomorphisms. Although not formulated as here, Moselhy and Marzouk [12] proposed to use the L² metric on Diff(M),

$$G_\varphi(u \circ \varphi, v \circ \varphi) = \int_M \langle u \circ \varphi, v \circ \varphi \rangle\, \mu_0 \qquad (6)$$

for $u \circ \varphi, v \circ \varphi \in T_\varphi \mathrm{Diff}(M)$. This corresponds to distance-squared optimal mass transport (OMT), which induces the Wasserstein L² distance on Prob(M); see, for example, [14, 8, 16]. In this article we use the right-invariant H¹-type metric

$$G^I_\varphi(u \circ \varphi, v \circ \varphi) = -\int_M \langle \Delta u, v \rangle\, \mu_0 + \lambda \sum_{i=1}^{k} \int_M \langle u, \xi_i \rangle\, \mu_0 \int_M \langle v, \xi_i \rangle\, \mu_0, \qquad (7)$$

where λ > 0, ∆ is the Laplace-de Rham operator lifted to vector fields, and ξ1, ..., ξk is an orthonormal basis of the harmonic 1-forms on M. Because of the Hodge decomposition theorem, G^I is independent of the choice of orthonormal basis ξ1, ..., ξk for the harmonic vector fields. This construction is related to the Fisher-Rao metric on the space of probability densities [4, 2], which is predominant in the field of information geometry [1]. We call G^I the information metric. See [7, 11, 3] for more information on the underlying geometry.

The connection between the information metric and the Fisher-Rao metric allows us to construct almost explicit solution formulas for Problem 2, using the explicit formulas for the geodesics of the Fisher-Rao metric.

Theorem 1 ([11, 3]). Let µ ∈ Prob(M) be a smooth probability density. The diffeomorphism ϕ ∈ Diff(M) minimizing $\mathrm{dist}_{G^I}(\mathrm{id}, \varphi)$ under the constraint ϕ∗µ0 = µ is given by ϕ(1), where ϕ(t) is obtained as the solution to the problem

$$\Delta f(t) = \frac{\dot\mu(t)}{\mu(t)} \circ \varphi(t), \qquad v(t) = \nabla f(t), \qquad \frac{d}{dt}\,\varphi(t)^{-1} = v(t) \circ \varphi(t)^{-1}, \qquad \varphi(0) = \mathrm{id}, \qquad (8)$$

where µ(t) is the (unique) Fisher-Rao geodesic connecting µ0 and µ,

$$\mu(t) = \left( \frac{\sin\bigl((1-t)\theta\bigr)}{\sin\theta} + \frac{\sin(t\theta)}{\sin\theta}\sqrt{\frac{\mu}{\mu_0}} \right)^{\!2} \mu_0, \qquad \cos\theta = \int_M \sqrt{\frac{\mu}{\mu_0}}\, \mu_0\,. \qquad (9)$$

The algorithm for diffeomorphic random sampling, described in the following section, is directly based on solving the equations (8).

3 Numerical Algorithm

In this section we explain the algorithm for random sampling using optimal information transport. It is a direct adaptation of [3, Algorithm 1].

Algorithm 1 (OIT based random sampling). Assume we have a numerical way to represent functions, vector fields, and diffeomorphisms on M, and numerical methods for

– composing functions and vector fields with diffeomorphisms,
– computing the gradient of functions,
– computing solutions to Poisson's equation on M,
– sampling from the standard distribution µ0 on M, and
– evaluating diffeomorphisms.

An OIT based algorithm for Problem 1 is then given as follows:

1. Choose a step size ε = 1/K for some positive integer K and calculate the Fisher-Rao geodesic µ(t) and its derivative µ̇(t) at all time points t_k = k/K using equation (9).
2. Initialize ϕ_0 = id. Set k ← 0.
3. Compute $s_k = \frac{\dot\mu(t_k)}{\mu(t_k)} \circ \varphi_k$ and solve the Poisson equation
   $$\Delta f_k = s_k. \qquad (10)$$
4. Compute the gradient vector field v_k = ∇f_k.
5. Construct an approximation ψ_k of exp(−ε v_k), for example
   $$\psi_k = \mathrm{id} - \varepsilon v_k. \qquad (11)$$
6. Update the diffeomorphism,
   $$\varphi_{k+1} = \varphi_k \circ \psi_k. \qquad (12)$$
   (If needed, one may also update the inverse by $\varphi^{-1}_{k+1} = \varphi^{-1}_k + \varepsilon v_k \circ \varphi^{-1}_k$.)
7. Set k ← k + 1 and continue from step 3 unless k = K.
8. Draw N random samples x_1, ..., x_N from the uniform distribution µ0.
9. Set y_n = ϕ_K(x_n) for n ∈ {1, ..., N}.

The algorithm generates N random samples y_1, ..., y_N from the distribution µ. One can save ϕ_K and repeat steps 8-9 whenever additional samples are needed. The computationally most intensive part of the algorithm is the solution of Poisson's equation at each time step.
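To make the steps above concrete, the following is a minimal MATLAB sketch of Algorithm 1 for the flat torus M = T² = [−π, π)², in the spirit of the implementation described in the next section but not identical to the authors' released code at https://github.com/kmodin/oit-random. The grid size, the number of time steps, the placeholder target density muraw, and the helper functions padp and interpp are choices made for this sketch only; the Fisher-Rao geodesic (9) enters through the coefficients a, b and their time derivatives, and the Poisson equation (10) is inverted with the FFT.

```matlab
% Minimal sketch of Algorithm 1 (OIT based random sampling) on the flat torus.
% Illustrative only; the target density below is a placeholder, not eq. (13).

n  = 128;  K = 100;  dt = 1/K;               % grid size, time steps, step size
h  = 2*pi/n;  xg = -pi + h*(0:n-1);
[X, Y] = meshgrid(xg, xg);                   % rows <-> y, columns <-> x

muraw = @(x,y) exp(2*cos(x) + 2*cos(y)) + 0.2;    % placeholder target (unnormalized)
Z     = h^2 * sum(sum(muraw(X, Y)));              % normalization so that int mu = 1
w     = @(x,y) sqrt(muraw(x,y) * (2*pi)^2 / Z);   % sqrt(mu/mu0), mu0 = 1/(2*pi)^2

% Fisher-Rao geodesic (9): mu(t) = (a(t) + b(t)*w)^2 * mu0, assuming mu ~= mu0
costh = sum(sum(w(X, Y))) / n^2;             % cos(theta) = int sqrt(mu/mu0) mu0
th = acos(costh);
a  = @(t)  sin((1-t)*th)/sin(th);   da = @(t) -th*cos((1-t)*th)/sin(th);
b  = @(t)  sin(t*th)/sin(th);       db = @(t)  th*cos(t*th)/sin(th);

% Spectral Laplacian and gradient on the torus
k1 = [0:n/2-1, -n/2:-1];  [KX, KY] = meshgrid(k1, k1);
Lsym = -(KX.^2 + KY.^2);  Lsym(1,1) = 1;     % zero mode handled separately

% Periodic padding and interpolation of a periodic grid function
padp    = @(A) [A, A(:,1); A(1,:), A(1,1)];
xgp     = [xg, pi];  [XP, YP] = meshgrid(xgp, xgp);
interpp = @(A, qx, qy) interp2(XP, YP, padp(A), ...
            mod(qx+pi, 2*pi)-pi, mod(qy+pi, 2*pi)-pi, 'linear');

DX = zeros(n);  DY = zeros(n);               % displacement of phi_k (phi_0 = id)
for kk = 0:K-1
    t  = kk/K;
    wp = w(X + DX, Y + DY);                          % sqrt(mu/mu0) composed with phi_k
    s  = 2*(da(t) + db(t)*wp) ./ (a(t) + b(t)*wp);   % (mu_dot/mu)(t_k) o phi_k
    fhat = fft2(s) ./ Lsym;  fhat(1,1) = 0;          % Poisson solve, Delta f = s
    vx = real(ifft2(1i*KX.*fhat));                   % v = grad f, spectral derivatives
    vy = real(ifft2(1i*KY.*fhat));
    qx = X - dt*vx;  qy = Y - dt*vy;                 % psi_k = id - eps*v_k
    DX = -dt*vx + interpp(DX, qx, qy);               % phi_{k+1} = phi_k o psi_k,
    DY = -dt*vy + interpp(DY, qx, qy);               %   stored as displacement phi - id
end

% Push uniform samples on [-pi, pi)^2 through phi_K (steps 8-9)
N  = 1e5;
sx = 2*pi*rand(N,1) - pi;  sy = 2*pi*rand(N,1) - pi;
yx = mod(sx + interpp(DX, sx, sy) + pi, 2*pi) - pi;
yy = mod(sy + interpp(DY, sx, sy) + pi, 2*pi) - pi;
% scatter(yx, yy, 1, '.')                    % visual check of the samples
```

The arrays DX and DY store ϕ_k − id, which is a smooth periodic field on the torus, so composing with ψ_k only ever requires interpolating periodic data rather than a wrapping coordinate map.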
Note, however, that we do not need to solve nonlinear equations, such as Monge-Ampère, as is necessary in OMT.

4 Example

In this example we consider $M = \mathbb{T}^2 \cong (\mathbb{R}/2\pi\mathbb{Z})^2$ with the distribution defined in Cartesian coordinates x, y ∈ [−π, π) by

$$\mu \sim 3\exp\!\bigl(-x^2 - 10\,(y - x^2/2 + 1)^2\bigr) + 2\exp\!\bigl(-(x+1)^2 - y^2\bigr) + 1/10, \qquad (13)$$

normalized so that the ratio between the maximum and minimum of µ is 100. The resulting density is depicted in Fig. 1 (left).

We draw 10^5 samples from this distribution using a MATLAB implementation of our algorithm, available under MIT license at https://github.com/kmodin/oit-random. The implementation can be summarized as follows. To solve the lifting equations (8) we discretize the torus by a 256 × 256 mesh and use the fast Fourier transform (FFT) to invert the Laplacian. We use 100 time steps. The resulting diffeomorphism is shown as a mesh warp in Fig. 2. We then draw 10^5 uniform samples on [−π, π]² and apply the diffeomorphism to each sample (applying the diffeomorphism corresponds to interpolation on the warped mesh). The resulting random samples are depicted in Fig. 1 (right). Drawing new samples is very efficient; for example, another 10^7 samples can be drawn in less than a second on a standard laptop.

Fig. 1. (left) The probability density µ of (13). The maximal density ratio is 100. (right) 10^5 samples from µ calculated using our OIT based random sampling algorithm.

5 Conclusions

In this paper we explore random sampling based on the optimal information transport algorithm developed in [3]. Given the semi-explicit nature of the algorithm, we expect it to be an efficient competitor to existing methods, especially for drawing a large number of samples from a low-dimensional manifold. However, a detailed comparison with other methods, including MCMC methods, is outside the scope of this paper and left for future work.

We provide an example of a complicated distribution on the flat 2-torus. It is straightforward to extend the method to more elaborate manifolds, e.g., by using finite element methods for Poisson's equation on manifolds. For non-compact manifolds, most importantly $\mathbb{R}^n$, one might use standard techniques, such as Box-Muller, to first transform the required distribution to a compact domain.

Fig. 2. The computed diffeomorphism ϕ_K shown as a warp of the uniform 256 × 256 mesh (every 4th mesh line is shown). Notice that the warp is periodic. It satisfies ϕ∗µ0 = µ and solves Problem 3 by minimizing the information metric (7). The ratio between the largest and smallest warped volumes is 100.

Bibliography

[1] Amari, S., Nagaoka, H.: Methods of Information Geometry. Amer. Math. Soc., Providence, RI (2000)
[2] Bauer, M., Bruveris, M., Michor, P.W.: Uniqueness of the Fisher-Rao metric on the space of smooth densities. Bull. Lond. Math. Soc. 48(3), 499-506 (2016)
[3] Bauer, M., Joshi, S., Modin, K.: Diffeomorphic density matching by optimal information transport. SIAM J. Imaging Sci. 8(3), 1718-1751 (2015)
[4] Friedrich, T.: Die Fisher-Information und symplektische Strukturen. Math. Nachr. 153(1), 273-296 (1991)
[5] Hamilton, R.S.: The inverse function theorem of Nash and Moser. Bull. Amer. Math. Soc. (N.S.) 7(1), 65-222 (1982)
[6] Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97-109 (1970)
[7] Khesin, B., Lenells, J., Misiolek, G., Preston, S.C.: Geometry of diffeomorphism groups, complete integrability and geometric statistics. Geom. Funct. Anal. 23(1), 334-366 (2013)
[8] Khesin, B., Wendt, R.: The Geometry of Infinite-Dimensional Groups, A Series of Modern Surveys in Mathematics, vol. 51. Springer-Verlag, Berlin (2009)
[9] Marzouk, Y., Moselhy, T., Parno, M., Spantini, A.: Sampling via measure transport: An introduction. In: Ghanem, R., Higdon, D., Owhadi, H. (eds.) Handbook of Uncertainty Quantification. Springer International Publishing, Cham (2016)
[10] Miller, M.I., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annu. Rev. Biomed. Eng. 4, 375-405 (2002)
[11] Modin, K.: Generalized Hunter-Saxton equations, optimal information transport, and factorization of diffeomorphisms. J. Geom. Anal. 25(2), 1306-1334 (2015)
[12] Moselhy, T.A.E., Marzouk, Y.M.: Bayesian inference with optimal maps. J. Comput. Phys. 231(23), 7815-7850 (2012)
[13] Moser, J.: On the volume elements on a manifold. Trans. Amer. Math. Soc. 120, 286-294 (1965)
[14] Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations 26(1-2), 101-174 (2001)
[15] Reich, S.: A nonparametric ensemble transform method for Bayesian inference. SIAM J. Sci. Comput. 35(4), A2013-A2024 (2013)
[16] Villani, C.: Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften, vol. 338. Springer-Verlag, Berlin (2009)