Translations in the exponential Orlicz space with Gaussian weight

Publication GSI2017


We study the continuity of space translations on non-parametric exponential families based on the exponential Orlicz space with Gaussian reference density.



Giovanni Pistone





Creative Commons: None (All rights reserved)







DOI: 10.23723/17410/22636
Created: Fri 9 Mar 2018
Updated: Fri 9 Mar 2018
Submitted: Thu 14 Mar 2019

Giovanni Pistone
de Castro Statistics, Collegio Carlo Alberto, Moncalieri, Italy

1 Introduction

On the Gaussian probability space $(\mathbb{R}^n, \mathcal{B}, M \cdot \ell)$, $M$ being the standard Gaussian density and $\ell$ the Lebesgue measure, we consider densities of the form $e_M(U) = \exp(U - K_M(U)) \cdot M$, where $U$ belongs to the exponential Orlicz space $L^{(\cosh-1)}(M)$, $\mathbb{E}_M[U] = 0$, and $K_M(U)$ is constant [8, 7]. An application to the homogeneous Boltzmann equation has been discussed in [5].

The main limitation of the standard version of Information Geometry is its inability to deal with the structure of the sample space, as it provides a geometry of the "parameter space" only. As a first step to overcome that limitation, we want to study the effect of a space translation $\tau_h$, $h \in \mathbb{R}^n$, on the exponential probability density $e_M(U)$. Such a model has independent interest and, moreover, we expect such a study to convey information about the case where the density $e_M(U)$ admits directional derivatives.

The present note is devoted to the detailed discussion of some results concerning the translation model that were announced at the IGAIA IV Conference, Liblice CZ, in June 2016. All results are given in Sec. 2, in particular the continuity result in Prop. 4. The final Sec. 3 gives some pointers to further research work to be published elsewhere.

2 Gauss-Orlicz spaces and translations

The exponential space $L^{(\cosh-1)}(M)$ and the mixture space $L^{(\cosh-1)_*}(M)$ are the Orlicz spaces associated with the Young function $(\cosh-1)$ and its convex conjugate $(\cosh-1)_*$, respectively [6]. They are both Banach spaces and the second one has the $\Delta_2$-property, because of the inequality
$$(\cosh-1)_*(ay) \le \max(1, a^2)\,(\cosh-1)_*(y), \qquad a, y \in \mathbb{R} .$$
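The $\Delta_2$ inequality above can be checked numerically. The following Python sketch assumes the closed form $(\cosh-1)_*(y) = y\,\operatorname{asinh}(y) - \sqrt{1+y^2} + 1$, which is not stated in the text; it comes from maximizing $xy - \cosh x + 1$ at $x = \operatorname{asinh}(y)$, and the sketch compares it against a brute-force supremum. The function names and the grid are illustrative choices only.

```python
import math


def phi_star(y: float) -> float:
    """Closed form (assumed here) of the convex conjugate of cosh - 1."""
    y = abs(y)  # the conjugate of an even function is even
    return y * math.asinh(y) - math.sqrt(1.0 + y * y) + 1.0


def phi_star_bruteforce(y: float, x_max: float = 10.0, steps: int = 20000) -> float:
    """Grid approximation of sup over 0 <= x <= x_max of x*|y| - cosh(x) + 1."""
    y = abs(y)
    return max(
        (k * x_max / steps) * y - math.cosh(k * x_max / steps) + 1.0
        for k in range(steps + 1)
    )


def delta2_gap(a: float, y: float) -> float:
    """max(1, a^2) * phi_star(y) - phi_star(a*y); the inequality says this is >= 0."""
    return max(1.0, a * a) * phi_star(y) - phi_star(a * y)


# Illustrative grid of (a, y) pairs in [-10, 10] x [-10, 10].
grid = [k / 10.0 for k in range(-100, 101)]
worst_gap = min(delta2_gap(a, y) for a in grid for y in grid)
print("smallest gap on the grid:", worst_gap)
```

A grid check of this kind is only a sanity test, not a proof; the gap vanishes for $a = \pm 1$ and for $y = 0$, so the smallest value on the grid is expected to be zero up to rounding.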
The closed unit balls are
$$\left\{ f \,\middle|\, \int \phi(f(x))\, M(x)\,dx \le 1 \right\}$$
with $\phi = \cosh-1$ and $\phi = (\cosh-1)_*$, respectively. Convergence to 0 in norm of a sequence $g_n$, $n \in \mathbb{N}$, holds if, and only if, for all $\rho > 0$ one has
$$\limsup_{n\to\infty} \int \phi(\rho g_n(x))\, M(x)\,dx \le 1 .$$
If $1 < a < \infty$, the following inclusions hold,
$$L^\infty(M) \hookrightarrow L^{(\cosh-1)}(M) \hookrightarrow L^a(M) \hookrightarrow L^{(\cosh-1)_*}(M) \hookrightarrow L^1(M) ,$$
and the restrictions to the ball $\Omega_R = \{x \in \mathbb{R}^n \mid |x| < R\}$,
$$L^{(\cosh-1)}(M) \to L^a(\Omega_R), \qquad L^{(\cosh-1)_*}(M) \to L^1(\Omega_R) ,$$
are continuous.

The exponential space $L^{(\cosh-1)}(M)$ contains all functions $f \in C^2(\mathbb{R}^n; \mathbb{R})$ whose Hessian is uniformly bounded in operator norm. In particular, it contains all polynomials of degree up to 2, hence all functions which are bounded by such a polynomial. The mixture space $L^{(\cosh-1)_*}(M)$ contains all random variables $f : \mathbb{R}^n \to \mathbb{R}$ which are bounded by a polynomial, in particular, all polynomials.

Let us review those properties of the exponential function on the space $L^{(\cosh-1)}(M)$ that justify our definition of the non-parametric exponential model as the set of densities $e_M(U) = \exp(U - K_M(U)) \cdot M$, where $U$ has zero $M$-expectation and belongs to the interior $S_M$ of the proper domain of the partition functional $Z_M(U) = \mathbb{E}_M[e^U]$.

Proposition 1.
1. The functionals $Z_M$ and $K_M = \log Z_M$ are both convex.
2. The proper domain of both $Z_M$ and $K_M$ contains the open unit ball of $L^{(\cosh-1)}(M)$, hence its interior $S_M$ is nonempty.
3. The functions $Z_M$ and $K_M$ are both Fréchet differentiable on $S_M$.

Proof. Statements 1–3 above are all well known. Nevertheless, we give the proof of the differentiability. We have
$$0 \le \exp(U+H) - \exp(U) - \exp(U)H = \int_0^1 (1-s) \exp(U+sH) H^2\, ds .$$
For all $U, U+H \in S_M$, choose $\alpha > 1$ such that $\alpha U \in S_M$. We have
$$0 \le Z_M(U+H) - Z_M(U) - \mathbb{E}_M[\exp(U)H] = \int_0^1 (1-s)\, \mathbb{E}_M[\exp(U+sH) H^2]\, ds ,$$
where the derivative term $H \mapsto \mathbb{E}_M[\exp(U)H]$ is continuous at $U$ because
$$|\mathbb{E}_M[\exp(U)H]| \le \mathbb{E}_M[\exp(\alpha U)]^{1/\alpha}\, \mathbb{E}_M\left[|H|^{\alpha/(\alpha-1)}\right]^{(\alpha-1)/\alpha} \le \text{const} \times \mathbb{E}_M[\exp(\alpha U)]^{1/\alpha}\, \|H\|_{L^{(\cosh-1)}(M)} .$$
The remainder term is bounded by
$$\begin{aligned}
|Z_M(U+H) - Z_M(U) - \mathbb{E}_M[\exp(U)H]| &= \int_0^1 (1-s)\, \mathbb{E}_M[\exp(U+sH) H^2]\, ds \\
&\le \mathbb{E}_M[e^{\alpha U}]^{1/\alpha} \int_0^1 (1-s)\, \mathbb{E}_M\left[\exp\left(s\tfrac{\alpha}{\alpha-1}H\right) H^{2\frac{\alpha}{\alpha-1}}\right]^{(\alpha-1)/\alpha} ds \\
&\le \text{const} \times \mathbb{E}_M\left[H^{4\frac{\alpha}{\alpha-1}}\right]^{(\alpha-1)/2\alpha} \int_0^1 (1-s)\, \mathbb{E}_M\left[\exp\left(s\tfrac{2\alpha}{\alpha-1}H\right)\right]^{(\alpha-1)/2\alpha} ds .
\end{aligned}$$
We have
$$\mathbb{E}_M\left[\exp\left(s\tfrac{2\alpha}{\alpha-1}H\right)\right] \le 2\left(\mathbb{E}_M\left[(\cosh-1)\left(s\tfrac{2\alpha}{\alpha-1}H\right)\right] + 1\right) \le 4$$
if $\|H\|_{L^{(\cosh-1)}(M)} \le (\alpha-1)/2\alpha$. Under this condition, we have
$$|Z_M(U+H) - Z_M(U) - \mathbb{E}_M[\exp(U)H]| \le \text{const} \times \|H\|^2_{L^{4\alpha/(\alpha-1)}(M)} \le \text{const} \times \|H\|^2_{L^{(\cosh-1)}(M)} ,$$
where the constant depends on $U$. $\square$

The space $L^{(\cosh-1)}(M)$ is neither separable nor reflexive. However, we have the following density property for the bounded point-wise convergence. The proof uses a form of the Monotone-Class argument [3, 22.3]. Let $C_c(\mathbb{R}^n)$ and $C_c^\infty(\mathbb{R}^n)$ respectively denote the space of continuous real functions with compact support and its sub-space of infinitely-differentiable functions.

Proposition 2. For each $f \in L^{(\cosh-1)}(M)$ there exist a nonnegative function $h \in L^{(\cosh-1)}(M)$ and a sequence $f_n \in C_c^\infty(\mathbb{R}^n)$ with $|f_n| \le h$, $n = 1, 2, \dots$, such that $\lim_{n\to\infty} f_n = f$ a.e. As a consequence, $C_c^\infty(\mathbb{R}^n)$ is weakly dense in $L^{(\cosh-1)}(M)$.

Proof. Before starting the proof, let us note that $L^{(\cosh-1)}(M)$ is stable under bounded a.e. convergence. Assume $f_n, h \in L^{(\cosh-1)}(M)$ with $|f_n| \le h$, $n = 1, 2, \dots$, and $\lim_{n\to\infty} f_n = f$ a.e. By definition of $h \in L^{(\cosh-1)}(M)$, for $\alpha = \|h\|^{-1}_{L^{(\cosh-1)}(M)}$ we have the bound $\mathbb{E}_M[(\cosh-1)(\alpha h)] \le 1$. The sequence of functions $(\cosh-1)(\alpha f_n)$, $n = 1, 2, \dots$, is a.e. convergent to $(\cosh-1)(\alpha f)$ and it is bounded by the integrable function $(\cosh-1)(\alpha h)$. The inequality $\mathbb{E}_M[(\cosh-1)(\alpha f)] \le 1$ follows now by dominated convergence and is equivalent to $\|f\|_{L^{(\cosh-1)}(M)} \le \|h\|_{L^{(\cosh-1)}(M)}$. By taking a converging sequence $(f_n)$ in $C_c^\infty(\mathbb{R}^n)$ we see that the condition in the proposition is sufficient.
Conversely, let $\mathcal{L}$ be the set of all functions $f \in L^{(\cosh-1)}(M)$ such that there exists a sequence $(f_n)_{n\in\mathbb{N}}$ in $C_c(\mathbb{R}^n)$ which is dominated by a function $h \in L^{(\cosh-1)}(M)$ and converges to $f$ point-wise. The set $\mathcal{L}$ contains the constant functions and $C_c(\mathbb{R}^n)$ itself. The set $\mathcal{L}$ is a vector space: if $f^1, f^2 \in \mathcal{L}$, with $f^1_n \to f^1$ point-wise, $|f^1_n| \le h^1$, and $f^2_n \to f^2$ point-wise, $|f^2_n| \le h^2$, then $\alpha_1 f^1_n + \alpha_2 f^2_n \to \alpha_1 f^1 + \alpha_2 f^2$ point-wise with $|\alpha_1 f^1_n + \alpha_2 f^2_n| \le |\alpha_1| h^1 + |\alpha_2| h^2$. Moreover, $\mathcal{L}$ is closed under the min operation: if $f^1, f^2 \in \mathcal{L}$, with $f^1_n \to f^1$, $|f^1_n| \le h^1$, and $f^2_n \to f^2$, $|f^2_n| \le h^2$, then $f^1_n \wedge f^2_n \to f^1 \wedge f^2$ and $|f^1_n \wedge f^2_n| \le h^1 \vee h^2 \in L^{(\cosh-1)}(M)$. $\mathcal{L}$ is closed for the maximum too, because $f^1 \vee f^2 = -\left((-f^1) \wedge (-f^2)\right)$.

We come now to the application of the Monotone-Class argument. As $1_{f>a} = ((f-a) \vee 0) \wedge 1 \in \mathcal{L}$, each element of $\mathcal{L}$ is the point-wise limit of linear combinations of indicator functions in $\mathcal{L}$. Consider the class $\mathcal{C}$ of sets whose indicator belongs to $\mathcal{L}$. $\mathcal{C}$ is a $\sigma$-algebra because of the closure properties of $\mathcal{L}$ and contains all open bounded rectangles of $\mathbb{R}^n$ because they are all of the form $\{f > 1\}$ for some $f \in C_c(\mathbb{R}^n)$. Hence $\mathcal{C}$ is the Borel $\sigma$-algebra and $\mathcal{L}$ is the set of Borel functions which are bounded by an element of $L^{(\cosh-1)}(M)$, namely $\mathcal{L} = L^{(\cosh-1)}(M)$. To conclude, note that each $g \in C_c(\mathbb{R}^n)$ is the uniform limit of a sequence in $C_c^\infty(\mathbb{R}^n)$. The last statement is proved by bounded convergence. $\square$

Let us discuss some consequences of this result. Let $u \in S_M$ be given and consider the exponential family $p(t) = \exp(tu - K_M(tu)) \cdot M$, $t \in \,]-1, 1[$. From Prop. 2 we get a sequence $(f_n)_{n\in\mathbb{N}}$ in $C_c^\infty(\mathbb{R}^n)$ and a bound $h \in L^{(\cosh-1)}(M)$ such that $f_n \to u$ point-wise and $|f_n|, |u| \le h$. As $S_M$ is open and contains 0, we have $\alpha h \in S_M$ for some $0 < \alpha < 1$. For each $t \in \,]-\alpha, \alpha[$, $\exp(t f_n) \to \exp(tu)$ point-wise and $\exp(t f_n) \le \exp(\alpha h)$ with $\mathbb{E}_M[\exp(\alpha h)] < \infty$.
It follows that $K_M(t f_n) \to K_M(tu)$, so that we have the point-wise convergence of the density $p_n(t) = \exp(t f_n - K_M(t f_n)) \cdot M$ to the density $p(t)$. By Scheffé's lemma, the convergence holds in $L^1(\mathbb{R}^n)$. In particular, for each $\phi \in C_c^\infty(\mathbb{R}^n)$, we have the convergence
$$\int \partial_i \phi(x)\, p_n(x; t)\, dx \to \int \partial_i \phi(x)\, p(x; t)\, dx, \qquad n \to \infty ,$$
for all $t$ small enough. By computing the derivatives, we have
$$\int \partial_i \phi(x)\, p_n(x; t)\, dx = -\int \phi(x)\, \partial_i\left(e^{t f_n(x) - K_M(t f_n)} M(x)\right) dx = \int \phi(x)\, (x_i - t\, \partial_i f_n(x))\, p_n(x; t)\, dx ,$$
that is, $(X_i - t\, \partial_i f_n)\, p_n(t) \to -\partial_i p(t)$ in the sense of (Schwartz) distributions. It would be of interest to discuss the possibility of the stronger convergence of $p_n(t)$ in $L^{(\cosh-1)_*}(M)$, but we do not follow this development here.

The norm convergence of the point-wise bounded approximation will not hold in general. Consider the following example. The function $f(x) = |x|^2$ belongs to $L^{(\cosh-1)}(M)$, but for the tails $f_R(x) = (|x| > R)\,|x|^2$ we have
$$\int (\cosh-1)(\epsilon^{-1} f_R(x))\, M(x)\,dx \ge \frac12 \int_{|x|>R} e^{\epsilon^{-1}|x|^2} M(x)\,dx = +\infty, \qquad \text{if } \epsilon \le 2 ,$$
hence there is no convergence to 0. However, the truncation of $f(x) = |x|$ does converge. This, together with Prop. 2, suggests the following variation of the classical definition of Orlicz class.

Definition 1. The exponential class, $C_c^{(\cosh-1)}(M)$, is the closure of $C_c^\infty(\mathbb{R}^n)$ in the space $L^{(\cosh-1)}(M)$.

Proposition 3. Assume $f \in L^{(\cosh-1)}(M)$ and write $f_R(x) = f(x)(|x| > R)$. The following conditions are equivalent:
1. The real function $\rho \mapsto \int (\cosh-1)(\rho f(x))\, M(x)\,dx$ is finite for all $\rho > 0$.
2. $f$ is the limit in $L^{(\cosh-1)}(M)$-norm of a sequence of bounded functions.
3. $f \in C_c^{(\cosh-1)}(M)$.

Proof. (1) ⇔ (2) This is well known, but we give a proof for the sake of clarity. We can assume $f \ge 0$ and consider the sequence of bounded functions $f_n = f \wedge n$, $n = 1, 2, \dots$. We have for all $\rho > 0$ that $\lim_{n\to\infty} (\cosh-1)(\rho(f - f_n)) = 0$ point-wise and $(\cosh-1)(\rho(f - f_n))\, M \le (\cosh-1)(\rho f)\, M$, which is integrable by assumption.
Hence
$$0 \le \limsup_{n\to\infty} \int (\cosh-1)(\rho(f(x) - f_n(x)))\, M(x)\,dx \le \int \limsup_{n\to\infty}\, (\cosh-1)(\rho(f(x) - f_n(x)))\, M(x)\,dx = 0 ,$$
which in turn implies $\lim_{n\to\infty} \|f - f_n\|_{L^{(\cosh-1)}(M)} = 0$. Conversely, observe first that we have from the convexity of $(\cosh-1)$ that
$$2(\cosh-1)(\rho(x+y)) \le (\cosh-1)(2\rho x) + (\cosh-1)(2\rho y) .$$
It follows that, for all $\rho > 0$ and $n = 1, 2, \dots$, we have
$$2\int (\cosh-1)(\rho f(x))\, M(x)\,dx \le \int (\cosh-1)(2\rho(f(x) - f_n(x)))\, M(x)\,dx + \int (\cosh-1)(2\rho f_n(x))\, M(x)\,dx ,$$
where the $\limsup_{n\to\infty}$ of the first term of the RHS is bounded by 1 because of the assumption of strong convergence, while the second term is bounded by $(\cosh-1)(2\rho n)$. Hence the LHS is finite for all $\rho > 0$.

(2) ⇒ (3) Assume first $f$ bounded and use Prop. 2 to find a point-wise approximation $f_n \in C_0(\mathbb{R}^n)$, $n \in \mathbb{N}$, of $f$ together with a dominating function $|f_n(x)| \le h(x)$, $h \in L^{(\cosh-1)}(M)$. As $f$ is actually bounded, we can assume $h$ to be equal to the constant bounding $f$. We have $\lim_{n\to\infty} (\cosh-1)(\rho(f - f_n)) = 0$ point-wise, and $(\cosh-1)(\rho(f - f_n)) \le (\cosh-1)(2\rho h)$. By dominated convergence we have $\lim_{n\to\infty} \int (\cosh-1)(\rho(f(x) - f_n(x)))\, M(x)\,dx = 0$ for all $\rho > 0$, which implies the convergence $\lim_{n\to\infty} \|f - f_n\|_{L^{(\cosh-1)}(M)} = 0$. Because of (2), we have the desired result.

(3) ⇒ (2) Obvious from $C_c(\mathbb{R}^n) \subset L^\infty(M)$. $\square$

We discuss now properties of translation operators in a form adapted to the exponential space $L^{(\cosh-1)}(M)$. Define $\tau_h f(x) = f(x - h)$, $h \in \mathbb{R}^n$.

Proposition 4 (Translation by a vector).
1. For each $h \in \mathbb{R}^n$, the mapping $f \mapsto \tau_h f$ is linear from $L^{(\cosh-1)}(M)$ to itself and $\|\tau_h f\|_{L^{(\cosh-1)}(M)} \le 2\, \|f\|_{L^{(\cosh-1)}(M)}$ if $|h| \le \sqrt{\log 2}$.
2. The transpose of $\tau_h$ is defined on $L^{(\cosh-1)_*}(M)$ by $\langle \tau_h f, g \rangle_M = \langle f, \tau_h^* g \rangle_M$, $f \in L^{(\cosh-1)}(M)$, and is given by $\tau_h^* g(x) = e^{-h \cdot x - |h|^2/2}\, \tau_{-h} g(x)$. For the dual norm, the bound $\|\tau_h^* g\|_{(L^{(\cosh-1)}(M))^*} \le 2\, \|g\|_{(L^{(\cosh-1)}(M))^*}$ holds if $|h| \le \sqrt{\log 2}$.
3. If $f \in C_c^{(\cosh-1)}(M)$ then $\tau_h f \in C_c^{(\cosh-1)}(M)$, $h \in \mathbb{R}^n$, and the mapping $\mathbb{R}^n \ni h \mapsto \tau_h f$ is continuous in $L^{(\cosh-1)}(M)$.

Proof. 1.
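Let us first prove that $\tau_h f \in L^{(\cosh-1)}(M)$. It is enough to consider the case $\|f\|_{L^{(\cosh-1)}(M)} \le 1$. For each $\rho > 0$, with $\Phi = \cosh-1$, we have
$$\int \Phi(\rho\, \tau_h f(x))\, M(x)\,dx = e^{-\frac12 |h|^2} \int e^{-z \cdot h}\, \Phi(\rho f(z))\, M(z)\,dz ,$$
hence, using the elementary inequality $\Phi(u)^2 \le \Phi(2u)/2$, we obtain
$$\int \Phi(\rho\, \tau_h f(x))\, M(x)\,dx \le e^{-\frac12 |h|^2} \left(\int e^{-2 z \cdot h}\, M(z)\,dz\right)^{\frac12} \left(\int \Phi^2(\rho f(z))\, M(z)\,dz\right)^{\frac12} \le \frac{1}{\sqrt 2}\, e^{\frac{|h|^2}{2}} \left(\int \Phi(2\rho f(z))\, M(z)\,dz\right)^{\frac12} .$$
Take $\rho = 1/2$ to get $\mathbb{E}_M\left[\Phi\left(\tau_h \tfrac12 f(x)\right)\right] \le e^{\frac{|h|^2}{2}}/\sqrt 2$, which in particular implies $\tau_h f \in L^{(\cosh-1)}(M)$. Moreover, $\|\tau_h f\|_{L^{(\cosh-1)}(M)} \le 2$ if $e^{\frac{|h|^2}{2}} \le \sqrt 2$.

2. The computation of $\tau_h^*$ is
$$\langle \tau_h f, g \rangle_M = \int f(x - h)\, g(x)\, M(x)\,dx = \int f(x)\, g(x + h)\, M(x + h)\,dx = \int f(x)\, e^{-h \cdot x - \frac{|h|^2}{2}}\, \tau_{-h} g(x)\, M(x)\,dx = \langle f, \tau_h^* g \rangle_M .$$
If $|h| \le \sqrt{\log 2}$,
$$\|\tau_h^* g\|_{(L^{(\cosh-1)}(M))^*} = \sup\left\{ \langle \tau_h f, g \rangle_M \,\middle|\, \|f\|_{L^{(\cosh-1)}(M)} \le 1 \right\} \le \sup\left\{ \|\tau_h f\|_{L^{(\cosh-1)}(M)}\, \|g\|_{(L^{(\cosh-1)}(M))^*} \,\middle|\, \|f\|_{L^{(\cosh-1)}(M)} \le 1 \right\} \le 2\, \|g\|_{(L^{(\cosh-1)}(M))^*} .$$

3. For each $\rho > 0$ we have found that
$$\mathbb{E}_M[\Phi(\rho\, \tau_h f)] \le \frac{1}{\sqrt 2}\, e^{\frac{|h|^2}{2}} \left(\int \Phi(2\rho f(z))\, M(z)\,dz\right)^{\frac12} ,$$
where the right-hand side is finite for all $\rho$ if $f \in C_c^{(\cosh-1)}(M)$. It follows that $\tau_h f \in C_c^{(\cosh-1)}(M)$. Recall that $f \in C_c(\mathbb{R}^n)$ implies $\tau_h f \in C_c(\mathbb{R}^n)$ and $\lim_{h\to 0} \tau_h f = f$ in the uniform topology. Let $f_n$ be a sequence in $C_c(\mathbb{R}^n)$ that converges to $f$ in $L^{(\cosh-1)}(M)$-norm. Let $|h| \le \sqrt{\log 2}$ and let $A > 0$ be such that $\Phi(A) = 1$. Then
$$\begin{aligned}
\|\tau_h f - f\|_{L^{(\cosh-1)}(M)} &= \|\tau_h(f - f_n) + (\tau_h f_n - f_n) - (f - f_n)\|_{L^{(\cosh-1)}(M)} \\
&\le \|\tau_h(f - f_n)\|_{L^{(\cosh-1)}(M)} + \|\tau_h f_n - f_n\|_{L^{(\cosh-1)}(M)} + \|f - f_n\|_{L^{(\cosh-1)}(M)} \\
&\le 2\, \|f - f_n\|_{L^{(\cosh-1)}(M)} + A^{-1} \|\tau_h f_n - f_n\|_\infty + \|f - f_n\|_{L^{(\cosh-1)}(M)} \\
&\le 3\, \|f - f_n\|_{L^{(\cosh-1)}(M)} + A^{-1} \|\tau_h f_n - f_n\|_\infty ,
\end{aligned}$$
which implies the desired limit at 0. The continuity at a generic point follows from the continuity at 0 and the semigroup property,
$$\lim_{k \to h} \|\tau_k f - \tau_h f\|_{L^{(\cosh-1)}(M)} = \lim_{k - h \to 0} \|\tau_{k-h}(\tau_h f) - \tau_h f\|_{L^{(\cosh-1)}(M)} = 0 . \qquad \square$$

We conclude by giving, without proof, the corresponding result for a translation by a probability measure $\mu$, namely $\tau_\mu f(x) = \int f(x - y)\, \mu(dy)$.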
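As a sanity check in dimension $n = 1$, the change of variable and the bound from the proof of Prop. 4, together with the definition of $\tau_\mu$, can be evaluated numerically. The Python sketch below is illustrative only: the test function $f(x) = x$, the values of `rho` and `h`, the two-point measure `atoms`, and the trapezoidal quadrature on $[-12, 12]$ are all assumptions made here and are not taken from the text. The last comparison relies on Jensen's inequality for the convex function $\Phi$, which is not stated in the paper.

```python
import math


def gauss(x: float) -> float:
    """Standard Gaussian density M on the real line."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(u: float) -> float:
    """Young function cosh - 1."""
    return math.cosh(u) - 1.0


def integrate(g, lo: float = -12.0, hi: float = 12.0, n: int = 24000) -> float:
    """Composite trapezoidal rule on [lo, hi] (tails beyond are negligible here)."""
    step = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * step)
    return total * step


def translate(f, h: float):
    """tau_h f(x) = f(x - h)."""
    return lambda x: f(x - h)


def translate_by_measure(f, atoms):
    """tau_mu f(x) = sum_i w_i f(x - y_i) for mu = sum_i w_i delta_{y_i}."""
    return lambda x: sum(w * f(x - y) for y, w in atoms)


def modular(g, rho: float) -> float:
    """E_M[Phi(rho * g)]."""
    return integrate(lambda x: Phi(rho * g(x)) * gauss(x))


f = lambda x: x          # hypothetical test function
rho, h = 0.5, 0.8        # h satisfies |h| <= sqrt(log 2)

# Left-hand side, the change-of-variable form, and the Cauchy-Schwarz bound.
lhs = modular(translate(f, h), rho)
shifted = math.exp(-0.5 * h * h) * integrate(
    lambda z: math.exp(-z * h) * Phi(rho * f(z)) * gauss(z)
)
bound = math.exp(0.5 * h * h) / math.sqrt(2.0) * math.sqrt(modular(f, 2.0 * rho))

# Translation by a two-point probability measure, compared with the
# average of the translated modulars (Jensen's inequality).
atoms = [(-0.5, 0.5), (0.5, 0.5)]  # pairs (y_i, w_i)
mixed = modular(translate_by_measure(f, atoms), rho)
jensen = sum(w * modular(translate(f, y), rho) for y, w in atoms)

print(lhs, shifted, bound, mixed, jensen)
```

For this linear test function the first integral has the closed form $e^{\rho^2/2}\cosh(\rho h) - 1$, which gives an independent check of the quadrature.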
We denote by $\mathcal{P}_e$ the set of probability measures $\mu$ such that $h \mapsto e^{\frac12 |h|^2}$ is integrable; for example, $\mu$ could be a normal with variance $\sigma^2 I$ and $\sigma^2 < 1$, or $\mu$ could have a bounded support.

Proposition 5 (Translation by a probability). Let $\mu \in \mathcal{P}_e$.
1. The mapping $f \mapsto \tau_\mu f$ is linear and bounded from $L^{(\cosh-1)}(M)$ to itself. If, moreover, $\int e^{|h|^2/2}\, \mu(dh) \le \sqrt 2$, then its norm is bounded by 2.
2. If $f \in C_c^{(\cosh-1)}(M)$ then $\tau_\mu f \in C_c^{(\cosh-1)}(M)$. The mapping $\mathcal{P}_e \ni \mu \mapsto \tau_\mu f$ is continuous at $\delta_0$ from the weak convergence to the $L^{(\cosh-1)}(M)$ norm.

We can use the previous proposition to show the existence of sequences of mollifiers. A bump function is a non-negative function $\omega$ in $C_c^\infty(\mathbb{R}^n)$ such that $\int \omega(x)\,dx = 1$. It follows that $\int \lambda^{-n} \omega(\lambda^{-1} x)\,dx = 1$, $\lambda > 0$, and the family of mollifiers $\omega_\lambda(dx) = \lambda^{-n} \omega(\lambda^{-1} x)\,dx$ converges weakly to the Dirac mass at 0 as $\lambda \downarrow 0$, so that, for all $f \in C_c^{(\cosh-1)}(M)$, the translations $\tau_{\omega_\lambda} f$ belong to $C_c^\infty(\mathbb{R}^n)$ and converge to $f$ in $L^{(\cosh-1)}(M)$ as $\lambda \to 0$.

3 Conclusions

We have discussed the density for the bounded point-wise convergence of the space of smooth functions $C_c^\infty(\mathbb{R}^n)$ in the exponential Orlicz space with Gaussian weight $L^{(\cosh-1)}(M)$. The exponential Orlicz class $C_c^{(\cosh-1)}(M)$ has been defined as the norm closure of the space of smooth functions. The continuity of translations holds in the latter space.

The continuity of translation is the first step in the study of differentiability in the exponential Gauss-Orlicz space. The aim is to apply non-parametric exponential models to the study of Hyvärinen divergence [4, 5] and the projection problem for evolution equations [1, 2]. A preliminary version of the Gauss-Orlicz-Sobolev theory has been published in the second part of [5].

Acknowledgments

The author thanks Bertrand Lods (Università di Torino and Collegio Carlo Alberto, Moncalieri) for his comments and acknowledges the support of de Castro Statistics and Collegio Carlo Alberto, Moncalieri.
He is a member of GNAMPA-INDAM.

References

1. Brigo, D., Hanzon, B., Le Gland, F.: Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli 5(3), 495–534 (1999)
2. Brigo, D., Pistone, G.: Projection based dimensionality reduction for measure valued evolution equations in statistical manifolds. In: Nielsen, F., Critchley, F., Dodson, C. (eds.) Computational Information Geometry. For Image and Signal Processing, pp. 217–265. Signals and Communication Technology, Springer (2017)
3. Dellacherie, C., Meyer, P.A.: Probabilités et potentiel. Chapitres I à IV. Édition entièrement refondue. Hermann (1975)
4. Hyvärinen, A.: Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6, 695–709 (2005)
5. Lods, B., Pistone, G.: Information geometry formalism for the spatially homogeneous Boltzmann equation. Entropy 17(6), 4323–4363 (2015)
6. Musielak, J.: Orlicz spaces and modular spaces, Lecture Notes in Mathematics, vol. 1034. Springer-Verlag (1983)
7. Pistone, G.: Nonparametric information geometry. In: Nielsen, F., Barbaresco, F. (eds.) Geometric science of information, Lecture Notes in Comput. Sci., vol. 8085, pp. 5–36. Springer, Heidelberg (2013)
8. Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Statist. 23(5), 1543–1561 (October 1995)