Variational Bayesian Approximation for Linear Inverse Problems with hierarchical prior models

28/08/2013
OAI : oai:www.see.asso.fr:2552:4879
DOI : 10.23723/2552/4879

Licence

Creative Commons: None (All rights reserved)


Variational Bayesian Approximation for Linear Inverse Problems with hierarchical prior models
Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes, UMR 8506 CNRS-SUPELEC-Univ Paris-Sud 11
SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.supelec.free.fr   Email: djafari@lss.supelec.fr   http://djafari.free.fr
GSI 2013: Geometric Science of Information, Aug 28-30, 2013, Mines ParisTech, France

1. General linear inverse problem

  g(t) = H f(t) + \epsilon(t),  t \in [1, \cdots, T]
  g(r) = H f(r) + \epsilon(r),  r = (x, y) \in \mathbb{R}^2

◮ f: unknown quantity (input)
◮ H: forward operator (convolution, Radon, Fourier or any linear operator)
◮ g: observed quantity (output)
◮ ε: represents the errors of modeling and measurement

Discretization: g = Hf + \epsilon
◮ Forward operation: Hf
◮ Adjoint operation: H'g, defined by \langle H'g, f \rangle = \langle Hf, g \rangle
◮ Inverse operation (if it exists): H^{-1}g, but this is never the case in practice.

2. General Bayesian Inference

◮ Bayesian inference:
  p(f|g, \theta) = p(g|f, \theta_1) \, p(f|\theta_2) / p(g|\theta),  with \theta = (\theta_1, \theta_2)
◮ Point estimators: Maximum A Posteriori (MAP) or Posterior Mean (PM) \longrightarrow \hat{f}
◮ Unsupervised Bayesian inference: joint estimation of f and \theta
◮ Simple prior models p(f|\theta_2):
  p(f, \theta|g) = p(g|f, \theta_1) \, p(f|\theta_2) \, p(\theta) / p(g)
◮ Prior models with hidden variables p(f|z, \theta_2) \, p(z|\theta_3):
  p(f, z, \theta|g) = p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta) / p(g)

Two main steps in the Bayesian approach

◮ Prior modeling: p(f|\theta_2)
  ◮ Separable: Gaussian, Generalized Gaussian, Gamma, mixture of Gaussians, mixture of Gammas, ...
  ◮ Markovian: Gauss-Markov, GGM, ...
  ◮ Separable or Markovian with hidden variables (contours, region labels)
◮ Choice of the estimator and computational aspects
  ◮ MAP, Posterior Mean, Marginal MAP
  ◮ MAP needs optimization algorithms
  ◮ Posterior Mean needs integration methods
  ◮ Marginal MAP and hyperparameter estimation need both integration and optimization
  ◮ Approximations:
    ◮ Gaussian approximation (Laplace)
    ◮ Numerical exploration (MCMC)
    ◮ Variational Bayes (separable approximation)

3. Sparsity enforcing prior models

◮ Simple heavy-tailed models:
  ◮ Generalized Gaussian, Double Exponential
  ◮ Student-t, Cauchy
  ◮ Elastic net
  ◮ Symmetric Weibull, Symmetric Rayleigh
  ◮ Generalized hyperbolic
◮ Hierarchical mixture models:
  ◮ Mixture of Gaussians
  ◮ Bernoulli-Gaussian
  ◮ Mixture of Gammas
  ◮ Bernoulli-Gamma
  ◮ Mixture of Dirichlet
  ◮ Bernoulli-Multinomial

4. Simple heavy tailed models

• Generalized Gaussian, Double Exponential:
  p(f|\gamma, \beta) = \prod_j GG(f_j|\gamma, \beta) \propto \exp\big[-\gamma \sum_j |f_j|^\beta\big]
  \beta = 1 gives the Double Exponential (Laplace) model; the range 0 < \beta \le 1 is of great interest for sparsity enforcing.
• Student-t and Cauchy models:
  p(f|\nu) = \prod_j St(f_j|\nu) \propto \exp\big[-\tfrac{\nu+1}{2} \sum_j \log(1 + f_j^2/\nu)\big]
  The Cauchy model is obtained when \nu = 1.
• Elastic net prior model:
  p(f|\gamma_1, \gamma_2) = \prod_j EN(f_j|\gamma_1, \gamma_2) \propto \exp\big[-\sum_j (\gamma_1 |f_j| + \gamma_2 f_j^2)\big]
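As a quick illustration (not part of the talk), the three priors above can be evaluated as penalty terms, i.e. their negative log-densities up to constants; a minimal Python sketch, with function names and parameter values chosen purely for illustration:

import numpy as np

def penalty_generalized_gaussian(f, gamma=1.0, beta=0.8):
    # -log p(f) ∝ gamma * sum_j |f_j|^beta ; beta <= 1 favours sparse f
    return gamma * np.sum(np.abs(f) ** beta)

def penalty_student_t(f, nu=3.0):
    # -log p(f) ∝ (nu + 1)/2 * sum_j log(1 + f_j^2 / nu) ; nu = 1 is the Cauchy prior
    return 0.5 * (nu + 1.0) * np.sum(np.log1p(f ** 2 / nu))

def penalty_elastic_net(f, gamma1=1.0, gamma2=0.1):
    # -log p(f) ∝ sum_j (gamma1 |f_j| + gamma2 f_j^2)
    return np.sum(gamma1 * np.abs(f) + gamma2 * f ** 2)

if __name__ == "__main__":
    f = np.array([0.0, 0.1, 2.0])
    for pen in (penalty_generalized_gaussian, penalty_student_t, penalty_elastic_net):
        print(pen.__name__, pen(f))

Comparing the values for small and large coefficients shows why these heavy-tailed penalties grow slowly for large |f_j| while still pushing small coefficients toward zero.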
5. Mixture models

• Mixture of two Gaussians (MoG2) model:
  p(f|\lambda, v_1, v_0) = \prod_j \big[\lambda N(f_j|0, v_1) + (1-\lambda) N(f_j|0, v_0)\big]
• Bernoulli-Gaussian (BG) model:
  p(f|\lambda, v) = \prod_j p(f_j) = \prod_j \big[\lambda N(f_j|0, v) + (1-\lambda) \delta(f_j)\big]
• Mixture of Gammas:
  p(f|\lambda, \alpha_1, \beta_1, \alpha_2, \beta_2) = \prod_j \big[\lambda G(f_j|\alpha_1, \beta_1) + (1-\lambda) G(f_j|\alpha_2, \beta_2)\big]
• Bernoulli-Gamma model:
  p(f|\lambda, \alpha, \beta) = \prod_j \big[\lambda G(f_j|\alpha, \beta) + (1-\lambda) \delta(f_j)\big]

6. MAP, Joint MAP

◮ Inverse problem: g = Hf + \epsilon
◮ Posterior law: p(f|\theta, g) \propto p(g|f, \theta_1) \, p(f|\theta_2)
◮ MAP: \hat{f} = \arg\max_f \{p(f|\theta, g)\} = \arg\min_f \{J(f) = -\ln p(f|\theta, g)\}
◮ Examples:
  Gaussian noise, Gaussian prior:            J(f) = \|g - Hf\|_2^2 + \lambda \|f\|_2^2
  Gaussian noise, Double Exponential prior:  J(f) = \|g - Hf\|_2^2 + \lambda \|f\|_1
◮ Full Bayesian, joint posterior: p(f, \theta|g) \propto p(g|f, \theta_1) \, p(f|\theta_2) \, p(\theta)
◮ Joint MAP: (\hat{f}, \hat{\theta}) = \arg\max_{(f, \theta)} \{p(f, \theta|g)\}

7. Marginal MAP and PM estimates

◮ Joint posterior: p(f, \theta|g)
◮ Marginal MAP: \hat{\theta} = \arg\max_\theta \{p(\theta|g)\}, where
  p(\theta|g) = \int p(f, \theta|g) \, df \propto p(\theta) \int p(g|f, \theta_1) \, p(f|\theta_2) \, df
  and then:
  ◮ MAP: \hat{f} = \arg\max_f p(f|\hat{\theta}, g)
  ◮ Posterior Mean: \hat{f} = \int f \, p(f|\hat{\theta}, g) \, df
◮ EM and BEM algorithms
◮ Variational Bayesian Approximation. Main idea: approximate p(f, \theta|g) by a simpler, for example separable, distribution q(f, \theta|g) = q_1(f) \, q_2(\theta), and then continue the computations with q_1(f) and q_2(\theta).

8. Variational Bayesian Approximation (VBA)

◮ Joint posterior law:
  p(f, \theta|g, M) = p(f, \theta, g|M) / p(g|M) = p(g|f, \theta_1, M) \, p(f|\theta_2, M) \, p(\theta|M) / p(g|M)
  where p(g|M) = \iint p(f, \theta, g|M) \, df \, d\theta
◮ Approximate p(f, \theta|g, M) by a simpler q(f, \theta) by minimizing
  KL(q : p) = \int q \ln (q/p) = \langle \ln (q/p) \rangle_q
◮ Free energy:
  F(q) = \big\langle \ln \big[ p(f, \theta, g|M) / q(f, \theta) \big] \big\rangle_q
◮ Link between the evidence of the model \ln p(g|M), KL(q : p) and F(q):
  KL(q : p) = \ln p(g|M) - F(q)
  (a short derivation is recalled after the comparison slide below)

Variational Bayesian Approximation (continued)

◮ Alternate optimization of F(q) (or of KL(q : p)) with respect to q_1 and q_2 results in
  q_1(f) \propto \exp\big( \langle \ln p(f, \theta, g) \rangle_{q_2(\theta)} \big)
  q_2(\theta) \propto \exp\big( \langle \ln p(f, \theta, g) \rangle_{q_1(f)} \big)
◮ q_1 = \delta(f - \hat{f}), q_2 = \delta(\theta - \hat{\theta}) \longrightarrow JMAP:
  \hat{\theta} = \arg\max_\theta \{p(\hat{f}, \theta|g)\},   \hat{f} = \arg\max_f \{p(f, \hat{\theta}|g)\}
◮ q_1 free but q_2 = \delta(\theta - \hat{\theta}) \longrightarrow Expectation-Maximization (EM):
  E step: Q(\theta) = \langle \ln p(f, \theta, g) \rangle_{p(f|g, \hat{\theta})}
  M step: \hat{\theta} = \arg\max_\theta \{Q(\theta)\}
◮ If p is in the exponential family, then q will be too.

Comparison between JMAP, BEM and VBA

◮ JMAP alternates two optimizations: given \hat{\theta}, optimize p(f, \hat{\theta}|g) with respect to f to obtain \hat{f}; given \hat{f}, optimize p(\hat{f}, \theta|g) with respect to \theta to obtain \hat{\theta}; iterate.
◮ BEM alternates an expectation and a maximization: E step Q(\theta) = \langle \ln p(f, \theta|g) \rangle_{q_1}; M step \hat{\theta} = \arg\max_\theta \{Q(\theta)\}; then compute q_1(f|\hat{\theta}, g) = p(f|\hat{\theta}, g) and iterate.
◮ VBA alternates two functional updates: given q_2 (i.e. \hat{\theta}), compute q_1(f|\hat{\theta}, g) and \hat{f}; given q_1 (i.e. \hat{f}), compute q_2(\theta|\hat{f}, g) and \hat{\theta}; iterate.
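For completeness, the link KL(q : p) = \ln p(g|M) - F(q) used on the VBA slides above can be checked in two lines; this is the standard evidence decomposition, recalled here rather than taken from the slides:

\ln p(g|M)
  = \iint q(f,\theta)\,\ln p(g|M)\,df\,d\theta
  = \iint q(f,\theta)\,\ln\frac{p(f,\theta,g|M)}{p(f,\theta|g,M)}\,df\,d\theta
  = \underbrace{\iint q(f,\theta)\,\ln\frac{p(f,\theta,g|M)}{q(f,\theta)}\,df\,d\theta}_{F(q)}
  + \underbrace{\iint q(f,\theta)\,\ln\frac{q(f,\theta)}{p(f,\theta|g,M)}\,df\,d\theta}_{KL(q:p)}

Since KL(q : p) \ge 0, the free energy F(q) is a lower bound on \ln p(g|M), so minimizing KL(q : p) over a family of separable q is equivalent to maximizing F(q).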
9. Hierarchical models and hidden variables

◮ All the mixture models and some of the simple models can be represented via hidden variables z.
◮ Example 1: Student-t model
  p(f|z) = \prod_j p(f_j|z_j) = \prod_j N(f_j|0, 1/z_j) \propto \exp\big[-\tfrac{1}{2} \sum_j z_j f_j^2\big]
  p(z_j|a, b) = G(z_j|a, b) \propto z_j^{a-1} \exp[-b z_j],  with a = b = \nu/2
◮ Example 2: MoG model
  p(f|z) = \prod_j p(f_j|z_j) = \prod_j N(f_j|0, v_{z_j}) \propto \exp\big[-\tfrac{1}{2} \sum_j f_j^2 / v_{z_j}\big]
  P(z_j = 1) = \lambda,  P(z_j = 0) = 1 - \lambda
◮ With these models we have:
  p(f, z, \theta|g) \propto p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta)

10. Bayesian Computation and Algorithms

◮ Often, the expression of p(f, z, \theta|g) is complex.
◮ Its optimization (for Joint MAP) or its marginalization and integration (for Marginal MAP or PM) is not easy.
◮ Two main techniques: MCMC and Variational Bayesian Approximation (VBA).
◮ MCMC: needs the expressions of the conditionals p(f|z, \theta, g), p(z|f, \theta, g) and p(\theta|f, z, g).
◮ VBA: approximate p(f, z, \theta|g) by a separable distribution
  q(f, z, \theta|g) = q_1(f) \, q_2(z) \, q_3(\theta)
  and do all subsequent computations with these separable factors.

11. Bayesian Variational Approximation

◮ Joint posterior law:
  p(f, z, \theta|g, M) = p(f, z, \theta, g|M) / p(g|M) = p(g|f, \theta_1, M) \, p(f|z, \theta_2, M) \, p(z|\theta_3, M) \, p(\theta|M) / p(g|M)
  where p(g|M) = \iiint p(f, z, \theta, g|M) \, df \, dz \, d\theta
◮ Approximate p(f, z, \theta|g, M) by a simpler q(f, z, \theta) by minimizing
  KL(q : p) = \int q \ln (q/p) = \langle \ln (q/p) \rangle_q
◮ Free energy: F(q) = \big\langle \ln \big[ p(f, z, \theta, g|M) / q(f, z, \theta) \big] \big\rangle_q
◮ Link between the evidence \ln p(g|M), KL(q : p) and F(q):
  KL(q : p) = \ln p(g|M) - F(q)

Bayesian Variational Approximation (continued)

◮ Alternate optimization of F(q) with respect to q_1, q_2 and q_3 results in
  q(f) \propto \exp\big( \langle \ln p(f, z, \theta, g) \rangle_{q(z) q(\theta)} \big)
  q(z) \propto \exp\big( \langle \ln p(f, z, \theta, g) \rangle_{q(f) q(\theta)} \big)
  q(\theta) \propto \exp\big( \langle \ln p(f, z, \theta, g) \rangle_{q(f) q(z)} \big)
◮ If p is in the exponential family, then q will be too.
◮ Other separable decompositions:
  q(f, z, \theta) = q_1(f|z) \, q_2(z) \, q_3(\theta)
  q(f, z, \theta) = \prod_j q_{1j}(f_j|z_j) \, q_{2j}(z_j) \prod_k q_{3k}(\theta_k)

12. BVA with Student-t priors

Scale mixture model of the Student-t:
  St(f_j|\nu) = \int_0^\infty N(f_j|0, 1/z_j) \, G(z_j|\nu/2, \nu/2) \, dz_j
Hidden variables z_j:
  p(f|z) = \prod_j p(f_j|z_j) = \prod_j N(f_j|0, 1/z_j) \propto \exp\big[-\tfrac{1}{2} \sum_j z_j f_j^2\big]
  p(z_j|\alpha, \beta) = G(z_j|\alpha, \beta) \propto z_j^{\alpha-1} \exp[-\beta z_j],  with \alpha = \beta = \nu/2
The Cauchy model is obtained when \nu = 1.
◮ Graphical model: the hyperparameters (\alpha_{z0}, \beta_{z0}) generate z, which generates f; f is mapped through H and, together with the noise \epsilon governed by z_\epsilon and its hyperparameters (\alpha_{\epsilon 0}, \beta_{\epsilon 0}), produces the observations g.
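The scale-mixture representation above can be verified numerically: drawing z ~ G(\nu/2, \nu/2) and then f|z ~ N(0, 1/z) should reproduce a Student-t with \nu degrees of freedom. A small Monte Carlo check in Python (not from the talk; variable names and sample sizes are illustrative), using the shape/rate convention for the Gamma so that rate \nu/2 corresponds to scale 2/\nu:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 3.0
n = 200_000

z = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)   # z ~ Gamma(nu/2, rate=nu/2)
f = rng.normal(loc=0.0, scale=1.0 / np.sqrt(z))          # f | z ~ N(0, 1/z)

# Empirical quantiles of f should match the Student-t(nu) quantiles
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print("empirical :", np.quantile(f, qs))
print("student-t :", stats.t.ppf(qs, df=nu))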
BVA with Student-t priors: Algorithm

Likelihood:
  p(g|f, z_\epsilon) = N(g|Hf, (1/z_\epsilon) I)
  p(z_\epsilon|\alpha_{z0}, \beta_{z0}) = G(z_\epsilon|\alpha_{z0}, \beta_{z0})
Prior laws:
  p(f|z) = \prod_j N(f_j|0, 1/z_j)
  p(z|\alpha_0, \beta_0) = \prod_j G(z_j|\alpha_0, \beta_0)
Joint posterior law:
  p(f, z, z_\epsilon|g) \propto p(g|f, z_\epsilon) \, p(f|z) \, p(z) \, p(z_\epsilon)
Approximation by:
  q(f, z, z_\epsilon) = q_1(f) \prod_j q_{2j}(z_j) \; q_3(z_\epsilon)
with
  q_1(f) = N(f|\mu, \Sigma),  \mu = \tilde{\lambda} \, \Sigma H' g,  \Sigma = (\tilde{\lambda} H'H + \tilde{Z})^{-1},  \tilde{Z} = \mathrm{diag}[\tilde{z}]
  q_{2j}(z_j) = G(z_j|\tilde{\alpha}_j, \tilde{\beta}_j),  \tilde{\alpha}_j = \alpha_0 + 1/2,  \tilde{\beta}_j = \beta_0 + \langle f_j^2 \rangle / 2
  q_3(z_\epsilon) = G(z_\epsilon|\tilde{\alpha}_{z_\epsilon}, \tilde{\beta}_{z_\epsilon}),  \tilde{\alpha}_{z_\epsilon} = \alpha_{z0} + (n+1)/2,
  \tilde{\beta}_{z_\epsilon} = \beta_{z0} + \tfrac{1}{2} \big[ \|g\|^2 - 2 \langle f \rangle' H' g + \mathrm{tr}(H'H \langle f f' \rangle) \big]
where
  \langle f \rangle = \mu,  \langle f f' \rangle = \Sigma + \mu \mu',  \langle f_j^2 \rangle = [\Sigma]_{jj} + \mu_j^2,  \tilde{\lambda} = \tilde{\alpha}_{z_\epsilon} / \tilde{\beta}_{z_\epsilon},  \tilde{z}_j = \tilde{\alpha}_j / \tilde{\beta}_j

The algorithm iterates these three updates: given \tilde{\lambda} and \tilde{z}, compute (\mu, \Sigma); given \mu and \Sigma, update the \tilde{z}_j; then update \tilde{\lambda}; repeat until convergence. A small numerical sketch is given after the implementation remarks below.

13. Implementation issues

◮ In inverse problems, we often do not have direct access to the matrix H, but we can compute:
  ◮ the forward operator:  Hf  \longrightarrow  g = direct(f, ...)
  ◮ the adjoint operator:  H'g \longrightarrow  f = transp(g, ...)
◮ For any particular application, we can always write two programs (direct & transp) corresponding to the application of these two operators.
◮ To compute \hat{f}, we use a gradient-based optimization algorithm which uses these operators.
◮ The main computational cost for BEM and VBA is the computation and inversion of the covariances.
◮ Complete factorization may reduce this to the computation of the diagonal elements of [H'H], which can be done using direct & transp.
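A minimal numpy sketch (not from the talk) of the VBA updates on the algorithm slide, run on a small synthetic problem with an explicit matrix H; in large problems H would only be applied through the direct/transp operators and the covariance handled through its diagonal, as discussed above. The toy data, hyperparameter values and variable names are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 32                        # number of observations, number of unknowns
H = rng.normal(size=(n, m))
f_true = np.zeros(m); f_true[[3, 10, 20]] = [2.0, -1.5, 1.0]   # sparse ground truth
g = H @ f_true + 0.05 * rng.normal(size=n)

a0 = b0 = 1.0                        # hyperparameters (alpha_0, beta_0) of the z_j prior
az0 = bz0 = 1.0                      # hyperparameters of the noise precision z_eps

z_tilde = np.ones(m)                 # current <z_j>
lam = 1.0                            # current <z_eps>
for it in range(50):
    # q1(f): Sigma = (lam H'H + diag(z))^{-1},  mu = lam Sigma H' g
    Sigma = np.linalg.inv(lam * H.T @ H + np.diag(z_tilde))
    mu = lam * Sigma @ H.T @ g
    Ef2 = np.diag(Sigma) + mu ** 2                     # <f_j^2>
    # q2j(z_j): alpha_j = a0 + 1/2,  beta_j = b0 + <f_j^2>/2
    z_tilde = (a0 + 0.5) / (b0 + 0.5 * Ef2)
    # q3(z_eps): alpha = az0 + (n+1)/2 (as on the slide), beta = bz0 + <||g - Hf||^2>/2
    resid2 = np.sum((g - H @ mu) ** 2) + np.trace(H @ Sigma @ H.T)
    lam = (az0 + 0.5 * (n + 1)) / (bz0 + 0.5 * resid2)

print("estimated support:", np.flatnonzero(np.abs(mu) > 0.5))

On this toy problem the posterior mean mu concentrates on the three nonzero entries of f_true, while the z_tilde associated with the zero entries grow large, shrinking those coefficients toward zero.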
14. Conclusions and Perspectives

◮ Inverse problems arise in many signal and image processing applications.
◮ We proposed a list of different prior models which can be used to enforce sparsity in Bayesian inversion.
◮ We classified these models into two categories: simple heavy-tailed models and hierarchical mixture models.
◮ We showed how to use these models for inverse problems where the desired solutions are sparse.
◮ Different algorithms (JMAP, BEM, VBA) have been developed and their relative performances compared.
◮ We use these models for inverse problems in different signal and image processing applications such as:
  ◮ Period estimation in biological time series
  ◮ X-ray Computed Tomography
  ◮ Signal deconvolution in proteomic and molecular imaging
  ◮ Diffraction Optical Tomography
  ◮ Microwave imaging, acoustic imaging and source localization
  ◮ Synthetic Aperture Radar (SAR) imaging

15. References

1. A. Mohammad-Djafari, "Bayesian approach with prior models which enforce sparsity in signal and image processing," EURASIP Journal on Advances in Signal Processing, special issue on Sparse Signal Processing, 2012.
2. S. Zhu, A. Mohammad-Djafari, H. Wang, B. Deng, X. Li and J. Mao, "Parameter estimation for SAR micromotion target based on sparse signal representation," EURASIP Journal on Advances in Signal Processing, special issue on Sparse Signal Processing, 2012.
3. N. Chu, J. Picheral and A. Mohammad-Djafari, "A robust super-resolution approach with sparsity constraint for near-field wideband acoustic imaging," IEEE International Symposium on Signal Processing and Information Technology, pp. 286-289, Bilbao, Spain, Dec. 14-17, 2011.
4. N. Bali and A. Mohammad-Djafari, "Bayesian Approach With Hidden Markov Modeling and Mean Field Approximation for Hyperspectral Data Analysis," IEEE Trans. on Image Processing, vol. 17.