## Nonparametric Information Geometry

28/08/2013
OAI : oai:www.see.asso.fr:2552:4886
DOI: 10.23723/2552/4886


## License

<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://datacite.org/schema/kernel-4">
<identifier identifierType="DOI">10.23723/2552/4886</identifier><creators><creator><creatorName>Giovanni Pistone</creatorName></creator></creators><titles>
<title>Nonparametric Information Geometry</title></titles>
<publisher>SEE</publisher>
<publicationYear>2013</publicationYear>
<resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
<date dateType="Created">Mon 16 Sep 2013</date>
<date dateType="Updated">Sun 25 Dec 2016</date>
<date dateType="Submitted">Tue 15 May 2018</date>
</dates>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="bitstream">a2510397ff96b4a2d35d0243d00d4c9bcfc8bc8e</alternateIdentifier>
</alternateIdentifiers>
<formats>
<format>application/pdf</format>
</formats>
<version>27115</version>
<descriptions>
<description descriptionType="Abstract"></description>
</descriptions>
</resource>

Nonparametric Information Geometry

Giovanni Pistone, de Castro Statistics Initiative, Moncalieri, Italy

August 30, 2013

Slides: http://www.giannidiorestino.it/GSI2013-talk.pdf

### Abstract

The differential-geometric structure of the set of positive densities on a given measure space has raised the interest of many mathematicians after the discovery by C.R. Rao of the geometric meaning of the Fisher information. Most of the research is focused on parametric statistical models. In a series of papers by the author and coworkers, a particular version of the nonparametric case has been discussed. It consists of a minimalistic structure modeled according to the theory of exponential families: given a reference density, other densities are represented by the centered log-likelihood, which is an element of an Orlicz space. These mappings give a system of charts of a Banach manifold. It has been observed that, while the construction is natural, the practical applicability is limited by the technical difficulty of dealing with such a class of Banach spaces. It has been suggested recently to replace the exponential function with other functions with similar behavior but polynomial growth at infinity in order to obtain more tractable Banach spaces, e.g. Hilbert spaces. We first give a review of our theory with special emphasis on the specific issues of the infinite-dimensional setting. In a second part we discuss two specific topics, differential equations and the metric connection. The position of this line of research with respect to other approaches is briefly discussed.

References: GP, GSI2013 Proceedings (a few typos corrected in arXiv:1306.0480); GP, arXiv:1308.5312.

- If $\mu_1$, $\mu_2$ are equivalent measures on the same sample space, a statistical model has two representations $L_1(x;\theta)\,\mu_1(dx) = L_2(x;\theta)\,\mu_2(dx)$.
- Fisher's score is a valid option: $s(x;\theta) = \frac{d}{d\theta} \ln L_i(x;\theta)$, $i = 1, 2$, and $E_\theta[s_\theta] = 0$.
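The claim that the score is the same for both representations and has zero mean can be checked numerically on a finite sample space. The sketch below is only an illustration under stated assumptions: the one-parameter model `probs` (a softmax of `theta * x`), the two reference measures `mu1` and `mu2`, and the finite-difference helper `score` are hypothetical choices, not taken from the talk.

```python
import math

def probs(theta, xs):
    """Hypothetical model: probabilities proportional to exp(theta * x)."""
    weights = [math.exp(theta * x) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

def score(theta, xs, mu, h=1e-6):
    """Central-difference derivative in theta of ln L(x; theta),
    where L = p_theta / mu is the likelihood with respect to mu."""
    def log_likelihood(t):
        return [math.log(p / m) for p, m in zip(probs(t, xs), mu)]
    up, down = log_likelihood(theta + h), log_likelihood(theta - h)
    return [(a - b) / (2 * h) for a, b in zip(up, down)]

xs = [0.0, 1.0, 2.0, 3.0]
mu1 = [1.0, 1.0, 1.0, 1.0]   # counting measure (assumed reference)
mu2 = [0.5, 2.0, 1.0, 4.0]   # an equivalent measure (assumed reference)
theta = 0.7

s1 = score(theta, xs, mu1)
s2 = score(theta, xs, mu2)

# Expectation of the score under the model at the same theta.
mean_score = sum(p * s for p, s in zip(probs(theta, xs), s1))
```

Up to finite-difference error, `s1` and `s2` agree because the reference measure only adds a term that does not depend on `theta`, and `mean_score` is close to zero.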
- Each density $q$ equivalent to $p$ is of the form $q(x) = \dfrac{e^{v(x)}\,p(x)}{E_p[e^v]} = \exp\left(v(x) - \ln E_p[e^v]\right) p(x)$, where $v$ is a random variable such that $E_p[e^v] < +\infty$.
- To avoid borderline cases, we actually require $E_p[e^{\theta v}] < +\infty$ for $\theta \in I$, with $I$ open and $I \supset [0, 1]$.
- Finally, we require $E_p[v] = 0$.

### Plan

- Part I: Exponential manifold
- Part II: Vector bundles
- Part III: Deformed exponential

### Part I: Exponential manifold

#### Sets of densities

Definition. $P_1$ is the set of real random variables $f$ such that $\int f \, d\mu = 1$, $P_\ge$ the convex set of probability densities, $P_>$ the convex set of strictly positive probability densities:

$$P_> \subset P_\ge \subset P_1$$

- We define the (differential) geometry of these spaces in a way which is meant to be a nonparametric generalization of Information Geometry.
- We try to avoid the use of explicit parameterization of the statistical models and therefore we use a parameter-free presentation of differential geometry.
- We construct a manifold modeled on an Orlicz space.
- We look for applications that are intrinsically nonparametric, i.e. Statistical Physics, Information Theory, Optimization, Filtering.

#### Banach manifold

Definition.

1. Let $P$ be a set, $E \subset P$ a subset, $B$ a Banach space. A 1-to-1 mapping $s : E \to B$ is a chart if the image $s(E) = S \subset B$ is open.
2. Two charts $s_1 : E_1 \to B_1$, $s_2 : E_2 \to B_2$ are both defined on $E_1 \cap E_2$ and are compatible if $s_1(E_1 \cap E_2)$ is an open subset of $B_1$ and the change of chart mapping $s_2 \circ s_1^{-1} : s_1(E_1 \cap E_2) \xrightarrow{\; s_1^{-1} \;} E_1 \cap E_2 \xrightarrow{\; s_2 \;} s_2(E_1 \cap E_2)$ is smooth.
3. An atlas is a set of compatible charts.

- Condition 2 implies that the model spaces $B_1$ and $B_2$ are isomorphic.
- In our case: $P = P_>$, the atlas has a chart $s_p$ for each $p \in P_>$ such that $s_p(p) = 0$, and two domains $E_{p_1}$ and $E_{p_2}$ are either equal or disjoint.

#### Charts on $P_>$

#### Model space: Orlicz $\Phi$-space

If $\Phi(y) = \cosh y - 1$, the Orlicz $\Phi$-space $L^\Phi(p)$ is the vector space of all random variables $u$ such that $E_p[\Phi(\alpha u)]$ is finite for some $\alpha > 0$.

#### Properties of the $\Phi$-space

1.
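   $u \in L^\Phi(p)$ if, and only if, the moment generating function $\alpha \mapsto E_p[e^{\alpha u}]$ is finite in a neighborhood of 0.
2. The set $S_{\le 1} = \left\{ u \in L^\Phi(p) : E_p[\Phi(u)] \le 1 \right\}$ is the closed unit ball of a Banach space with norm $\|u\|_p = \inf \left\{ \rho > 0 : E_p\left[\Phi\left(\frac{u}{\rho}\right)\right] \le 1 \right\}$.
3. $\|u\|_p = 1$ if either $E_p[\Phi(u)] = 1$, or $E_p[\Phi(u)] < 1$ and $E_p\left[\Phi\left(\frac{u}{\rho}\right)\right] = \infty$ for $\rho < 1$. If $\|u\|_p > 1$ then $\|u\|_p \le E_p[\Phi(u)]$. In particular, $\lim_{\|u\|_p \to \infty} E_p[\Phi(u)] = \infty$.

#### Example: boolean state space

- In the case of a finite state space, the moment generating function is finite everywhere, but its computation can be challenging.
- Boolean case: $\Omega = \{+1, -1\}^n$, uniform density $p(x) = 2^{-n}$, $x \in \Omega$. A generic real function on $\Omega$ has the form $u(x) = \sum_{\alpha \in L} \hat u(\alpha) x^\alpha$, with $L = \{0, 1\}^n$, $x^\alpha = \prod_{i=1}^n x_i^{\alpha_i}$, $\hat u(\alpha) = 2^{-n} \sum_{x \in \Omega} u(x) x^\alpha$.
- The moment generating function of $u$ under the uniform density $p$ is

  $$E_p\left[e^{tu}\right] = \sum_{B \in \mathcal{B}(\hat u)} \prod_{\alpha \in B^c} \cosh(t \hat u(\alpha)) \prod_{\alpha \in B} \sinh(t \hat u(\alpha)),$$

  where $\mathcal{B}(\hat u)$ are those $B \subset \operatorname{Supp} \hat u$ such that $\sum_{\alpha \in B} \alpha = 0 \bmod 2$.
- $E_p[\Phi(tu)] = \sum_{B \in \mathcal{B}_0(\hat u)} \prod_{\alpha \in B^c} \cosh(t \hat u(\alpha)) \prod_{\alpha \in B} \sinh(t \hat u(\alpha)) - 1$, where $\mathcal{B}_0(\hat u)$ are those $B \subset \operatorname{Supp} \hat u$ such that $\sum_{\alpha \in B} \alpha = 0 \bmod 2$ and $\sum_{\alpha \in \operatorname{Supp} \hat u} \alpha = 0$.

#### Example: the sphere is not smooth in general

- $p(x) \propto (a + x)^{-3/2} e^{-x}$, $x, a > 0$.
- For the random variable $u(x) = x$, the function

  $$E_p[\Phi(\alpha u)] = \frac{1}{e^a \, \Gamma\left(-\frac{1}{2}, a\right)} \int_0^\infty (a + x)^{-3/2} \, \frac{e^{-(1-\alpha)x} + e^{-(1+\alpha)x}}{2} \, dx - 1$$

  is convex and lower semi-continuous on $\alpha \in \mathbb{R}$, finite for $\alpha \in [-1, 1]$, infinite otherwise, hence not smooth.

[Figure: plot of $E_p[\Phi(\alpha u)]$ against $\alpha \in [-1, 1]$.]

#### Isomorphism of $L^\Phi$ spaces

Theorem. $L^\Phi(p) = L^\Phi(q)$ as Banach spaces if $\int p^{1-\theta} q^\theta \, d\mu$ is finite on an open neighborhood $I$ of $[0, 1]$. It is an equivalence relation $p \smile q$ and we denote by $\mathcal{E}(p)$ the class containing $p$. The two spaces have equivalent norms.

Proof. Assume $u \in L^\Phi(p)$ and consider the convex function $C : (s, \theta) \mapsto \int e^{su} p^{1-\theta} q^\theta \, d\mu$. The restriction $s \mapsto C(s, 0) = \int e^{su} p \, d\mu$ is finite on an open neighborhood $J_p$ of 0; the restriction $\theta \mapsto C(0, \theta) = \int p^{1-\theta} q^\theta \, d\mu$ is finite on the open set $I \supset [0, 1]$.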
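Hence, there exists an open interval $J_q \ni 0$ where $s \mapsto C(s, 1) = \int e^{su} q \, d\mu$ is finite.

[Figure: diagram showing $J_p$, $J_q$ and $I$.]

#### e-charts

Definition (e-chart). For each $p \in P_>$, consider the chart $s_p : \mathcal{E}(p) \to L^\Phi_0(p)$ given by

$$q \mapsto s_p(q) = \log \frac{q}{p} + D(p \| q) = \log \frac{q}{p} - E_p\left[\log \frac{q}{p}\right].$$

For $u \in L^\Phi_0(p)$ let $K_p(u) = \ln E_p[e^u]$ be the cumulant generating function of $u$ and let $S_p$ be the interior of its proper domain. Define

$$e_p : S_p \ni u \mapsto e^{u - K_p(u)} \cdot p.$$

$e_p \circ s_p$ is the identity on $\mathcal{E}(p)$ and $s_p \circ e_p$ is the identity on $S_p$.

Theorem (Exponential manifold). $\{ s_p : \mathcal{E}(p) \mid p \in P_> \}$ is an affine atlas on $P_>$.

#### Cumulant functional

- The divergence $q \mapsto D(p \| q)$ is represented in the chart centered at $p$ by $K_p(u) = \log E_p[e^u]$, where $q = e^{u - K_p(u)} \cdot p$, $u \in B_p = L^\Phi_0(p)$.
- $K_p : B_p \to \mathbb{R}_\ge \cup \{+\infty\}$ is convex and its proper domain $\operatorname{Dom}(K_p)$ contains the open unit ball of $T_p$.
- $K_p$ is infinitely Gâteaux-differentiable on the interior $S_p$ of its proper domain and analytic on the unit ball of $B_p$.
- For all $v, v_1, v_2, v_3 \in B_p$ the first derivatives are:

  $$\begin{aligned} d K_p(u)\, v &= E_q[v] \\ d^2 K_p(u)(v_1, v_2) &= \operatorname{Cov}_q(v_1, v_2) \\ d^3 K_p(u)(v_1, v_2, v_3) &= \operatorname{Cov}_q(v_1, v_2, v_3) \end{aligned}$$

#### Change of coordinate

The following statements are equivalent:

1. $q \in \mathcal{E}(p)$;
2. $p \smile q$;
3. $\mathcal{E}(p) = \mathcal{E}(q)$;
4. $\ln \frac{q}{p} \in L^\Phi(p) \cap L^\Phi(q)$.

Moreover:

1. If $p, q \in \mathcal{E}(p) = \mathcal{E}(q)$, the change of coordinate $s_q \circ e_p(u) = u - E_q[u] + \ln \frac{p}{q} - E_q\left[\ln \frac{p}{q}\right]$ is the restriction of an affine continuous mapping.
2. $u \mapsto u - E_q[u]$ is an affine transport from $B_p = L^\Phi_0(p)$ onto $B_q = L^\Phi_0(q)$.

#### Summary

$$p \smile q \implies \begin{array}{ccccccc} \mathcal{E}(p) & \xrightarrow{\; s_p \;} & S_p & \xrightarrow{\; I \;} & B_p & \xrightarrow{\; I \;} & L^\Phi(p) \\ & & \Big\downarrow {\scriptstyle s_q \circ s_p^{-1}} & & \Big\downarrow {\scriptstyle d(s_q \circ s_p^{-1})} & & \\ \mathcal{E}(q) & \xrightarrow{\; s_q \;} & S_q & \xrightarrow{\; I \;} & B_q & \xrightarrow{\; I \;} & L^\Phi(q) \end{array}$$

- If $p \smile q$, then $\mathcal{E}(p) = \mathcal{E}(q)$ and $L^\Phi(p) = L^\Phi(q)$.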
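On a finite sample space every moment generating function is finite, so the e-chart, its inverse, and the change of coordinate stated above can be checked directly. The following is a minimal sketch under that assumption; the helper names `expect`, `K`, `s`, `e` and the three example densities `p`, `q`, `r` are hypothetical and chosen only for illustration.

```python
import math

def expect(p, f):
    """Expectation of the random variable f under the density p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def K(p, u):
    """Cumulant generating function K_p(u) = ln E_p[e^u]."""
    return math.log(expect(p, [math.exp(ui) for ui in u]))

def s(p, q):
    """e-chart centered at p: log(q/p) - E_p[log(q/p)]."""
    log_ratio = [math.log(qi / pi) for pi, qi in zip(p, q)]
    mean = expect(p, log_ratio)
    return [l - mean for l in log_ratio]

def e(p, u):
    """Inverse chart: u -> exp(u - K_p(u)) * p."""
    k = K(p, u)
    return [math.exp(ui - k) * pi for pi, ui in zip(p, u)]

# Three strictly positive densities on a four-point sample space (assumed).
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
r = [0.4, 0.3, 0.2, 0.1]

u = s(p, r)              # coordinate of r in the chart centered at p
round_trip = e(p, u)     # should recover r

# K_p(u) should equal the divergence D(p || r) = E_p[log(p/r)].
divergence = expect(p, [math.log(pi / ri) for pi, ri in zip(p, r)])

# Change of coordinate: u - E_q[u] + ln(p/q) - E_q[ln(p/q)] versus s_q(r).
log_pq = [math.log(pi / qi) for pi, qi in zip(p, q)]
affine = [ui - expect(q, u) + l - expect(q, log_pq) for ui, l in zip(u, log_pq)]
direct = s(q, r)
```

In this finite setting `round_trip` matches `r`, `K(p, u)` matches `divergence`, and `affine` matches `direct` up to rounding. The infinite-dimensional case needs the Orlicz-space conditions discussed in the text, which this sketch does not address.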
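- $B_p = L^\Phi_0(p)$, $B_q = L^\Phi_0(q)$.
- $S_p \neq S_q$ and $s_q \circ s_p^{-1} : S_p \to S_q$ is affine: $s_q \circ s_p^{-1}(u) = u - E_q[u] + \ln \frac{p}{q} - E_q\left[\ln \frac{p}{q}\right]$.
- The tangent application is $d(s_q \circ s_p^{-1})(v) = v - E_q[v]$ (does not depend on $p$).

#### Duality

Young pair (N-function):

- $\phi^{-1} = \phi_*$,
- $\Phi(x) = \int_0^{|x|} \phi(u) \, du$,
- $\Phi_*(y) = \int_0^{|y|} \phi_*(v) \, dv$,
- $|xy| \le \Phi(x) + \Phi_*(y)$.

[Figure: plot of $\phi$ against $v$.]

| $\phi_*(u)$ | $\phi(v)$ | $\Phi_*(x)$ | $\Phi(y)$ |
|---|---|---|---|
| $\ln(1 + u)$ | $e^v - 1$ | $(1 + \lvert x \rvert) \ln(1 + \lvert x \rvert) - \lvert x \rvert$ | $e^{\lvert y \rvert} - 1 - \lvert y \rvert$ |
| $\sinh^{-1} u$ | $\sinh v$ | $\lvert x \rvert \sinh^{-1} \lvert x \rvert - \sqrt{1 + x^2} + 1$ | $\cosh y - 1$ |

- $L^{\Phi_*}(p) \times L^\Phi(p) \ni (v, u) \mapsto \langle u, v \rangle_p = E_p[uv]$
- $\langle u, v \rangle_p \le 2 \, \|u\|_{\Phi_*, p} \, \|v\|_{\Phi, p}$
- $\left(L^{\Phi_*}(p)\right)' = L^\Phi(p)$ because $\Phi_*(ax) \le a^2 \Phi_*(x)$ if $a > 1$ ($\Delta_2$).

#### m-charts

For each $p \in P_>$, consider a second type of chart on $f \in P_1$:

$$\eta_p : f \mapsto \eta_p(f) = \frac{f}{p} - 1$$

Definition (Mixture manifold). The chart is defined for all $f \in P_1$ such that $f/p - 1$ belongs to ${}^*B_p = L^{\Phi_+}_0(p)$. The atlas $(\eta_p : {}^*\mathcal{E}(p))$, $p \in P_>$, defines a manifold on $P_1$.

If the sample space is not finite, such a map does not define charts on $P_>$, nor on $P_\ge$.

#### Example: $N(\mu, \Sigma)$, $\det \Sigma \neq 0$ (I)

$$G = \left\{ (2\pi)^{-\frac{n}{2}} (\det \Sigma)^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) : \mu \in \mathbb{R}^n, \ \Sigma \in \operatorname{Sym}^n_+ \right\}$$

$$\begin{aligned} \ln \frac{f(x)}{f_0(x)} &= -\frac{1}{2} \ln(\det \Sigma) - \frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) + \frac{1}{2} x^T x \\ &= \frac{1}{2} x^T (I - \Sigma^{-1}) x + \mu^T \Sigma^{-1} x - \frac{1}{2} \mu^T \Sigma^{-1} \mu - \frac{1}{2} \ln(\det \Sigma) \end{aligned}$$

$$E_{f_0}\left[\ln \frac{f}{f_0}\right] = \frac{1}{2} (n - \operatorname{Tr} \Sigma^{-1}) - \frac{1}{2} \mu^T \Sigma^{-1} \mu - \frac{1}{2} \ln(\det \Sigma)$$

$$u(x) = \ln \frac{f(x)}{f_0(x)} - E_{f_0}\left[\ln \frac{f}{f_0}\right] = \frac{1}{2} x^T (I - \Sigma^{-1}) x + \mu^T \Sigma^{-1} x - \frac{1}{2} (n - \operatorname{Tr} \Sigma^{-1})$$

$$K_{f_0}(u) = -\frac{1}{2} (n - \operatorname{Tr} \Sigma^{-1}) + \frac{1}{2} \mu^T \Sigma^{-1} \mu + \frac{1}{2} \ln(\det \Sigma)$$

#### Example: $N(\mu, \Sigma)$, $\det \Sigma \neq 0$ (II)

$G$ as a sub-manifold of $P_>$:

$$G = \left\{ x \mapsto e^{u(x) - K(u)} f_0(x) : u \in H_{1,2} \cap S_{f_0} \right\}$$

- $H_{1,2}$ is the Hermite space of total degree 1 and 2, that is the vector space generated by the Hermite polynomials $X_1, \dots, X_n$, $(X_1^2 - 1), \dots, (X_n^2 - 1)$, $X_1 X_2, \dots, X_{n-1} X_n$.
- If the matrix $S$, $S_{ii} = \beta_{ii} - \frac{1}{2}$, $S_{ij}$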