Analysis of Optimal Transport Related Mis fit Functions in Seismic Imaging

07/11/2017
Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22619
contenu protégé  Document accessible sous conditions - vous devez vous connecter ou vous enregistrer pour accéder à ou acquérir ce document.
- Accès libre pour les ayants-droit
 

Résumé

We analyze different misfit functions for comparing synthetic and observed data in seismic imaging, for example, the Wasserstein metric and the conventional least-squares norm.We revisit the convexity and insensitivity to noise of the Wasserstein metric which demonstrate the robustness of the metric in seismic inversion. Numerical results illustrate that full waveform inversion with quadratic Wasserstein metric can often effectively overcome the risk of local minimum trapping in the optimization part of the algorithm. A mathematical study on Fréchet derivative with respect to the model parameters of the objective functions further illustrates the role of optimal transport maps in this iterative approach. In this context we refer to the objective function as misfit. A realistic numerical example is presented.

Analysis of Optimal Transport Related Misfit Functions in Seismic Imaging

Collection

application/pdf Analysis of Optimal Transport Related Misfi t Functions in Seismic Imaging (slides)
application/pdf Analysis of Optimal Transport Related Mis fit Functions in Seismic Imaging Yunan Yang, Björn Engquist
Détails de l'article
contenu protégé  Document accessible sous conditions - vous devez vous connecter ou vous enregistrer pour accéder à ou acquérir ce document.
- Accès libre pour les ayants-droit

We analyze different misfit functions for comparing synthetic and observed data in seismic imaging, for example, the Wasserstein metric and the conventional least-squares norm.We revisit the convexity and insensitivity to noise of the Wasserstein metric which demonstrate the robustness of the metric in seismic inversion. Numerical results illustrate that full waveform inversion with quadratic Wasserstein metric can often effectively overcome the risk of local minimum trapping in the optimization part of the algorithm. A mathematical study on Fréchet derivative with respect to the model parameters of the objective functions further illustrates the role of optimal transport maps in this iterative approach. In this context we refer to the objective function as misfit. A realistic numerical example is presented.
Analysis of Optimal Transport Related Misfit Functions in Seismic Imaging

Média

Voir la vidéo

Métriques

0
0
446.51 Ko
 application/pdf
bitcache://891b16db9f5e34b05dec5c30fad8304368fda40f

Licence

Creative Commons Aucune (Tous droits réservés)

Sponsors

Sponsors Platine

alanturinginstitutelogo.png
logothales.jpg

Sponsors Bronze

logo_enac-bleuok.jpg
imag150x185_couleur_rvb.jpg

Sponsors scientifique

logo_smf_cmjn.gif

Sponsors

smai.png
gdrmia_logo.png
gdr_geosto_logo.png
gdr-isis.png
logo-minesparistech.jpg
logo_x.jpeg
springer-logo.png
logo-psl.png

Organisateurs

logo_see.gif
<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/17410/22619</identifier><creators><creator><creatorName>Yunan Yang</creatorName></creator><creator><creatorName>Björn Engquist</creatorName></creator></creators><titles>
            <title>Analysis of Optimal Transport Related Misfit Functions in Seismic Imaging</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2018</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>Optimal transport</subject><subject>optimization</subject><subject>full waveform inversion</subject><subject>seismic imaging</subject><subject>inverse problem</subject></subjects><dates>
	    <date dateType="Created">Fri 9 Mar 2018</date>
	    <date dateType="Updated">Fri 9 Mar 2018</date>
            <date dateType="Submitted">Tue 13 Nov 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">891b16db9f5e34b05dec5c30fad8304368fda40f</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>37369</version>
        <descriptions>
            <description descriptionType="Abstract">We analyze different misfit functions for comparing synthetic and observed data in seismic imaging, for example, the Wasserstein metric and the conventional least-squares norm.We revisit the convexity and insensitivity to noise of the Wasserstein metric which demonstrate the robustness of the metric in seismic inversion. Numerical results illustrate that full waveform inversion with quadratic Wasserstein metric can often effectively overcome the risk of local minimum trapping in the optimization part of the algorithm. A mathematical study on Fréchet derivative with respect to the model parameters of the objective functions further illustrates the role of optimal transport maps in this iterative approach. In this context we refer to the objective function as misfit. A realistic numerical example is presented.
</description>
        </descriptions>
    </resource>
.

Analysis of Optimal Transport Related Misfit Functions in Seismic Imaging Yunan Yang and Björn Engquist Department of Mathematics, The University of Texas at Austin, University Station C1200, Austin, TX 78712 USA yunanyang@math.utexas.edu Abstract. We analyze different misfit functions for comparing synthetic and observed data in seismic imaging, for example, the Wasserstein met- ric and the conventional least-squares norm. We revisit the convexity and insensitivity to noise of the Wasserstein metric which demonstrate the robustness of the metric in seismic inversion. Numerical results illustrate that full waveform inversion with quadratic Wasserstein metric can often effectively overcome the risk of local minimum trapping in the optimiza- tion part of the algorithm. A mathematical study on Fréchet derivative with respect to the model parameters of the objective functions further illustrates the role of optimal transport maps in this iterative approach. In this context we refer to the objective function as misfit. A realistic numerical example is presented. Keywords: full waveform inversion, optimal transport, seismic imaging, optimization, inverse problem 1 Introduction Seismic data contains interpretable information about subsurface properties. Imaging predicts the spatial locations as well as properties that are useful in ex- ploration seismology. The inverse method in the imaging predicts more physical properties if a full wave equation is employed instead of an asymptotic far-field approximation to it [9]. This, so called full waveform inversion (FWI) is a data-driven method to ob- tain high resolution subsurface properties by minimizing the difference or mis- fit between observed and synthetic seismic waveforms [12]. In the past three decades, the least-squares norm (L2 ) has been widely used as a misfit func- tion [10], which is known to suffer from cycle skipping issues (local minimum trapping) and sensitivity to noise [12]. Optimal transport has become a well developed topic in mathematics since it was first proposed by Gaspard Monge in 1781. The idea of using optimal trans- port for seismic inversion was first proposed in 2014 [3]. A useful tool from the theory of optimal transport, the Wasserstein metric computes the optimal cost of rearranging one distribution into another given a cost function. In computer science the metric is often called the “Earth Mover’s Distance” (EMD). Here we will focus on the quadratic Wasserstein metric (W2). In this paper, we briefly review the theory of optimal transport and revisit the convexity and noise insensitivity of W2 that were proved in [4]. The properties come from the analysis of the objective function. Next, we compare the Fréchet derivative with respect to the model parameters in different misfit functions using the adjoint-state method [8]. Discussions and comparisons between large scale inversion results using W2 and L2 metrics illustrate that the W2 metric is very promising for overcoming the cycle skipping issue in seismic inversion. 2 Theory 2.1 Full waveform inversion and the least squares functional Full waveform inversion is a PDE-constrained optimization problem, minimizing the data misfit J(f, g) by updating the model m, i.e. : m? = argmin m J(f(xr, t; m), g(xr, t)), (1) where g is observed data, f is simulated data, xr are receiver locations. We get the modeled data f(x, t; m) by numerically solving in both the space and time domain [1]. Generalized least squares functional is a weighted sum of the squared errors and hence a generalized version of the standard least-squares misfit function. The formulation is J1(m) = X r Z |W(f(xr, t; m)) − W(g(xr, t))| 2 dt. (2) In the conventioinal L2 misfit, the weighting operator W is the identity I. The integral wavefields misfit functional [5] is a generalized least squares func- tional applied on full-waveform inversion (FWI) with weighting operator W(u) = R t 0 u(x, τ)dτ. If we define the integral wavefields U(x, t) = R t 0 u(x, τ)dτ, then mis- fit function becomes the ordinary least squares difference between R t 0 g(xr, τ)dτ and R t 0 f(xr, τ; m)dτ. The integral wavefields still satisfy the original acoustic wave equation with a different source term: δ(x − xs)H(t) ∗ S(t), where S is the original source term and H(t) is the Heaviside step function [5]. We will refer this misfit function as H−1 norm in this paper. Normalized Integration Method (NIM) is another generalized least squares functional, with an additional normalization step than integral wavefields misfit functional [6]. The weighting operator is W(u)(xr, t) = R t 0 P(u)(xr, τ)dτ R T 0 P(u)(xr, τ)dτ , (3) where function P is included to make the data nonnegative. Three common choices are P1(u) = |u|, P2(u) = u2 and P3 = E(u), which correspond to the absolute value, the square and the envelop of the signal [6]. 2.2 Optimal transport Optimal transport is a problem that seeks the minimum cost required to trans- port mass of one distribution into another given a cost function, e.g. |x − y| 2 . The mathematical definition of the distance between the distributions f : X → R+ and g : Y → R+ can then be formulated as W2 2 (f, g) = inf Tf,g∈M Z X |x − Tf,g(x)| 2 f(x) dx (4) where M is the set of all maps Tf,g that rearrange the distribution f into g [11]. 0 1 2 3 4 5 6 7 8 0 0.2 0.4 0.6 0.8 1 NIM /W2 /W1 F G dist dist' 2 W 1 = |F -1 - G -1 | dist 2 W 2 2 = |F -1 - G -1 | 2 NIM = |F - G| 2 Fig. 1: After data normalization NIM measures R (F −G)2 dt, while W2 considers R (F−1 − G−1 )2 dt and W1 considers R |F−1 − G−1 |dt. The Wasserstein metric is an alternative misfit function for FWI to measure the difference between synthetic data f and observed data g. We can compare the data trace by trace and use the Wasserstein metric (Wp) in 1D to measure the misfit. The overall misfit is then J2(m) = R X r=1 Wp p (f(xr, t; m), g(xr, t)), (5) where R is the total number of traces. In this paper, we mainly discuss about quadratic Wasserstein metric (W2) when p = 2 in (4) and (5). Here we consider f0(t) and g0(t) as synthetic data and observed data from one single trace. After proper scaling with operator P, we get preconditioned data f = P(f0) and g = P(g0) which are positive and having total sum one. If we consider they are probability density functions (pdf), then after integrating once, we get the cumulative distribution function (cdf) F(t) and G(t). If f is continuous we can write the explicit formulation for the 1D Wasserstein metric as: W2 2 (f, g) = Z 1 0 |F−1 (t) − G−1 (t)|2 dt = Z T 0 (G−1 F(t) − t)2 f(t)dt. (6) The interesting fact is that W2 computes the L2 misfit between F−1 and G−1 (Fig. 1), while the objective function of NIM measures the L2 misfit between F and G (Fig. 1). This is identical to the mathematical norm of Sobolev space H−1 , ||f − g||2 H−1 , given f and g are nonnegative and sharing equal mass. 3 Properties Fig. 2a shows two signals f and its shift g, both of which contain two ricker wavelets. Shift of signals are common in seismic data when we have incorrect velocity. We compute the L2 norm and W2 norm between f and g, and plot the misfit curves in terms of s in Fig. 2b and Fig. 2c. The L2 difference between two signals has many local minima and maxima as s changes. It is a clear demon- stration of the cycle skipping issue of L2 norm. The global convexity of Fig. 2c is a motivation to further study the ideal properties of W2 norm. 2 4 6 8 10 Time -0.5 0 0.5 1 Signal f and its shift g f g (a) Two seismic signals -4 -2 0 2 4 Shift 0 0.2 0.4 0.6 0.8 1 L2 misfit L2 difference between f and f(t-s) (b) L2 misfit -4 -2 0 2 4 Shift 0 5 10 15 20 W2 misfit W2 distince between P2(f) and P2(f(t-s)) (c) W2 misfit Fig. 2: (a) A signal consisting two Ricker wavelets (blue) and its shift (red). (b) L2 norm between f and g which is a shift of f. (c) W2 norm between P2(f) and P2(g) in terms of different shift s. As demonstrated in [4], the squared Wasserstein metric has several properties that make it attractive as a choice of misfit function. One highly desirable feature is its convexity with respect to several parameterizations that occur naturally in seismic waveform inversion [13]. For example, variations in the wave velocity lead to simulations f(m) that are derived from shifts, f(x; s) = g(x + sη), η ∈ Rn , (7) or dilations, f(x; A) = g(Ax), AT = A, A > 0, (8) applied to the observation g. Variations in the strength of a reflecting surface or the focusing of seismic waves can also lead to local rescalings of the form f(x; β) = ( βg(x), x ∈ E g(x), x ∈ Rn \E. (9) In Theorem 1, f and g are assumed to be nonnegative with identical integrals. Theorem 1 (Convexity of squared Wasserstein metric [4]). The squared Wasserstein metric W2 2 (f(m), g) is convex with respect to the model parameters m corresponding to a shift s in (7), the eigenvalues of a dilation matrix A in (8), or the local rescaling parameter β in (9). The Fig. 2c numerically exemplifies Theorem 1. Even if the scaling P(u) = u2 perfectly fits the theorem it has turned out not to work well in generating an adjoint source that works well in inversion. The linear scaling, P(u) = au+b, on the other hand works very well even if the related misfit lacks strict convexity with respect to shifts. The two-variable example described below and Fig. 3 are based on the linear scaling. It gives the convexity with respect to other variables in velocity than a simple shift in the data. The example from [7] shows a convexity result in higher dimensional model domain. The model velocity is increasing linearly in depth as v(x, z) = vp,0 +αz, where vp,0 is the starting velocity on the surface, α is vertical gradient and z is depth. The reference for (vp,0, α) is (2km/s, 0.7s−1 ), and we plot the misfit curves with α ∈ [0.4, 1] and v0 ∈ [1.75, 2.25] on 41 × 45 grid in Fig. 3. We observe many local minima and maxima in Fig. 3a. The curve for W2 (Fig. 3b) is globally convex in model parameters vp,0 and α. It demonstrates the capacity of W2 in mitigating cycle skipping issues. 0 1 0.5 Misfit 0.8 1.6 L2 misfit 1 1.8 v p,0 0.6 2 2.2 0.4 2.4 (a) L2 0 1 0.5 Misfit 0.8 1.6 W2 misfit 1 1.8 v p,0 0.6 2 2.2 0.4 2.4 (b) W2 Fig. 3: (a) Conventional L2 misfit function (b) W2 misfit function trace-by-trace Another ideal property of optimal transport is the insensitivity to noise. All seismic data contains either natural or experimental noise. For example, the ocean waves lead to extremely low frequency data in marine acquisition. Wind and cable motions also generate random noise. The L2 norm is known to be sensitive to noise since the misfit between clean and noisy data is calculated as the sum of squared noise amplitude at each sampling point. In [4] W2 norm is proved to be insensitive to mean-zero noise and the property apply for any dimension of the data. This is a natural result from optimal transport theory since the W2 metric defines a global comparison that not only considers the change in signal intensity but also the phase difference. Theorem 2 (Insensitivity to noise [4]). Let fns be f with a piecewise con- stant additive noise of mean zero uniform distribution. The squared Wasserstein metric W2 2 (f, fns) is of O( 1 N ) where N is the number of pieces of the additive noise in fns. 4 Discussions Typically we solve the linearized problem iteratively to approximate the solu- tion in FWI. This approach requires the Fréchet derivatives of the misfit function J(m) which is expensive to compute directly. The adjoint-state method [8] pro- vides an efficient way of computing the gradient. This approach requires the Fréchet derivative ∂J ∂f and two modelings by solving the wave equations. Here we will only discuss about the the acoustics wave equation (10). m ∂2 u(x, t) ∂t2 − ∆u(x, t) = S(x, t) (10) In the adjoint-state method, we first forward propagate the source wavelet with zero initial conditions. The simulated data f is the source wavefield u recorded on the boundary. Next we back propagate the Fréchet derivative ∂J ∂f as the source with zero final conditions and get the receiver wavefield v. With both the forward wavefield u and backward wavefield v, the Fréchet derivative of m becomes ∂J ∂m = − Z T 0 utt(x, t)v(x, T − t) = − Z T 0 u(x, t)vtt(x, T − t) (11) In the acoustic setting, the vtt(x, t) is equivalent to the wavefield with the second order time derivative of ∂J ∂f being the source. The change of the misfit function only impacts the source term of the back propagation, particularly the second order time derivative of ∂J ∂f . For L2 norm, the term is 2(ftt(x, t)−gtt(x, t)), and for H−1 norm it becomes 2(g(x, t)−f(x, t)). For trace-by-trace W2 norm, the second order time derivative of ∂W 2 2 (f,g) ∂f is 2  g(x,t0 )−f(x,t) g(x,t0)  where t0 = G−1 F(t), the optimal coupling of t for each trace. Compared with L2 norm, the source term of H−1 does not has the two time derivatives and therefore has more of a focus on the lower frequency part of the data. Lower frequency components normally provide a wider basin of attraction in optimization. The source term of W2 is similar to the one of H−1 norm, but the order of signal g in time has changed with the optimal map for each trace at receiver x. The optimal couplings often change the location of the wavefront. For example, if g is a shift of f, then the wavefront of g will be mapped to the wavefront of f even if two wavefronts do not match in time. The change of time order in g also helps generate a better image under the reflectors when we back propagate the source and compute the gradient as in (11). 0 5 10 15 x (km) 0 2 4 6 z (km) (a) True model velocity 0 5 10 15 x (km) 0 2 4 6 z (km) (b) Initial velocity 0 5 10 15 x (km) 0 2 4 6 z (km) (c) Inversion result using L2 0 5 10 15 x (km) 0 2 4 6 z (km) (d) Inversion result using W2 Fig. 4: Large scale FWI example 5 Numerical example In this section, we use a part of the BP 2004 benchmark velocity model [2] (Fig. 4a) and a highly smoothed initial model without the upper salt part (Fig. 4b) to do inversion with W2 and L2 norm respectively. A fixed-spread surface acquisition is used, involving 11 shots located every 1.6km on top. A Ricker wavelet centered on 5Hz is used to generate the synthetic data with a bandpass filter only keeping 3 to 9Hz components. We stopped the inversion after 300 L-BFGS iterations. Here we precondition the data with function P(f) = a · f + b to satisfy the nonnegativity and mass balance in optimal transport. Inversion with trace-by- trace W2 norm successfully construct the shape of the salt bodies (Fig. 4d), while FWI with the conventional L2 failed to recover boundaries of the salt bodies as shown by Fig. 4c. 6 Conclusion In this paper, we revisited the quadratic Wasserstein metric from the optimal transport theory in the application of seismic inversion. The desirable properties of convexity and insensitivity to noise make it a promising alternative misfit function in FWI. We also analyze the conventional least-squares inversion (L2 norm), the integral wavefields misfit function (H−1 norm) and the quadratic Wasserstein metric (W2) in terms of the model parameter gradient using the adjoint-state method. The analysis further demonstrate the effectiveness of op- timal transport ideas in dealing with cycle skipping. Acknowledgments. We thank Sergey Fomel, Junzhe Sun and Zhiguang Xue for helpful discussions, and thank the sponsors of the Texas Consortium for Computational Seismology (TCCS) for financial support. This work was also partially supported by NSF DMS-1620396. References 1. Alford, R., Kelly, K., Boore, D.M.: Accuracy of finite-difference modeling of the acoustic wave equation. Geophysics 39(6), 834–842 (1974) 2. Billette, F., Brandsberg-Dahl, S.: The 2004 bp velocity benchmark. In: 67th EAGE Conference & Exhibition (2005) 3. Engquist, B., Froese, B.D.: Application of the Wasserstein metric to seismic signals. Communications in Mathematical Sciences 12(5) (2014) 4. Engquist, B., Froese, B.D., Yang, Y.: Optimal transport for seismic full waveform inversion. Communications in Mathematical Sciences 14(8), 2309 2330 (2016) 5. Huang, G., Wang, H., Ren, H.: Two new gradient precondition schemes for full waveform inversion. arXiv preprint arXiv:1406.1864 (2014) 6. Liu, J., Chauris, H., Calandra, H.: The normalized integration method-an alterna- tive to full waveform inversion? In: 25th Symposium on the Application of Geoph- pysics to Engineering & Environmental Problems (2012) 7. Métivier, L., Brossier, R., Mrigot, Q., Oudet, E., Virieux, J.: An optimal transport approach for seismic tomography: application to 3d full waveform inversion. Inverse Problems 32(11), 115008 (2016) 8. Plessix, R.E.: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophysical Journal International 167(2), 495–503 (2006) 9. Stolt, R.H., Weglein, A.B.: Seismic imaging and inversion: Volume 1: Application of linear inverse theory, vol. 1. Cambridge University Press (2012) 10. Tarantola, A., Valette, B.: Generalized nonlinear inverse problems solved using the least squares criterion. Reviews of Geophysics 20(2), 219–232 (1982) 11. Villani, C.: Topics in optimal transportation, Graduate Studies in Mathematics, vol. 58. American Mathematical Society, Providence, RI (2003) 12. Virieux, J., Operto, S.: An overview of full-waveform inversion in exploration geo- physics. Geophysics 74(6), WCC1–WCC26 (2009) 13. Yang, Y., Engquist, B., Sun, J., Froese, B.D.: Application of optimal transport and the quadratic wasserstein metric to full-waveform inversion. arXiv preprint arXiv:1612.05075 (2016)