Information Geometry Under Monotone Embedding. Part II: Geometry

07/11/2017
Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22621
 

Abstract

The rho-tau embedding of a parametric statistical model defines both a Riemannian metric, called “rho-tau metric”, and an alpha family of rho-tau connections. We give a set of equivalent conditions for such a metric to become Hessian and for the ±1-connections to be dually flat. Next we argue that for any choice of strictly increasing functions ρ(u) and τ(u) one can construct a statistical model which is Hessian and phi-exponential. The metric derived from the escort expectations is conformally equivalent with the rho-tau metric.


Licence

Creative Commons: none (all rights reserved)

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/17410/22621</identifier><creators><creator><creatorName>Jan Naudts</creatorName></creator><creator><creatorName>Jun Zhang</creatorName></creator></creators><titles>
            <title>Information Geometry Under Monotone Embedding. Part II: Geometry</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2018</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>Hessian geometry</subject><subject>dually-flat</subject><subject>rho-tau embedding</subject><subject>phi-exponential family</subject><subject>escort probability</subject></subjects><dates>
	    <date dateType="Created">Fri 9 Mar 2018</date>
	    <date dateType="Updated">Fri 9 Mar 2018</date>
            <date dateType="Submitted">Mon 17 Dec 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">3dd5209446e2a3969544cf06b34b451c4c131b41</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>37373</version>
        <descriptions>
            <description descriptionType="Abstract">
The rho-tau embedding of a parametric statistical model defines both a Riemannian metric, called “rho-tau metric”, and an alpha family of rho-tau connections. We give a set of equivalent conditions for such a metric to become Hessian and for the ±1-connections to be dually flat. Next we argue that for any choice of strictly increasing functions ρ(u) and τ (u) one can construct a statistical model which is Hessian and phi-exponential. The metric derived from the escort expectations is conformally equivalent with the rho-tau metric.

</description>
        </descriptions>
    </resource>

Information Geometry Under Monotone Embedding. Part II: Geometry

Jan Naudts (1) and Jun Zhang (2)

(1) Universiteit Antwerpen, Antwerpen, Belgium, jan.naudts@uantwerpen.be
(2) University of Michigan, Ann Arbor, MI, U.S.A., junz@umich.edu

Both authors contributed equally to this paper.

Abstract. The rho-tau embedding of a parametric statistical model defines both a Riemannian metric, called “rho-tau metric”, and an alpha family of rho-tau connections. We give a set of equivalent conditions for such a metric to become Hessian and for the ±1-connections to be dually flat. Next we argue that for any choice of strictly increasing functions $\rho(u)$ and $\tau(u)$ one can construct a statistical model which is Hessian and phi-exponential. The metric derived from the escort expectations is conformally equivalent with the rho-tau metric.

Keywords: Hessian geometry, dually-flat, rho-tau embedding, phi-exponential family, escort probability

1 Introduction

Amari [1, 2] introduced the alpha family of connections $\Gamma^{(\alpha)}$ for a statistical model belonging to the exponential family. He showed that $\Gamma^{(\alpha)}$ and $\Gamma^{(-\alpha)}$ are each other's dual and that for $\alpha = \pm 1$ the corresponding geometries are flat. Both the notion of an alpha family of connections and that of an exponential family of statistical models have been generalized. The present paper combines two general settings, that of the alpha family of connections determined by rho-tau embeddings [3] and that of phi-deformed exponential families [4].

Let $M$ denote the space of probability density functions over the measure space $(X, dx)$. A parametric model $p_\theta$ is a map from some open domain in $\mathbb{R}^n$ into $M$. It becomes a parametric statistical model if $\theta \mapsto p_\theta$ is a Riemannian manifold with metric tensor $g(\theta)$.

Throughout the paper it is assumed that two strictly increasing functions $\rho$ and $\tau$ are given. The rho-tau divergence (see Part I) induces a metric tensor $g$ on finite-dimensional manifolds of probability distributions and makes them into Riemannian manifolds.

2 The metric tensor

The rho-tau divergence $D_{\rho,\tau}(p, q)$ can be used ([3, 5, 6]) to define a metric tensor $g(\theta)$ by

$$g_{ij}(\theta) = \partial_j \partial_i D_{\rho,\tau}(p, p_\theta)\Big|_{p = p_\theta}, \tag{1}$$

with $\partial_i = \partial/\partial\theta^i$. A short calculation gives

$$g_{ij}(\theta) = \int_X dx\, \big[\partial_i \tau(p_\theta(x))\big]\,\big[\partial_j \rho(p_\theta(x))\big]. \tag{2}$$

Because $\tau = f' \circ \rho$, the rho-tau metric $g(\theta)$ also takes the form

$$g_{ij}(\theta) = \int_X dx\, \big[\partial_i f'(\rho(p_\theta(x)))\big]\,\big[\partial_j \rho(p_\theta(x))\big] = \int_X dx\, f''(\rho(p_\theta(x)))\,\big[\partial_i \rho(p_\theta(x))\big]\,\big[\partial_j \rho(p_\theta(x))\big].$$

This shows that the matrix $g(\theta)$ is symmetric. Moreover, it is positive-definite, because the derivatives $\rho'$ and $f''$ are strictly positive and, for each $x$, the matrix with components $[\partial_j p_\theta(x)]\,[\partial_i p_\theta(x)]$ has only non-negative eigenvalues, namely $0$ and $\sum_k [\partial_k p_\theta(x)]^2$ (assuming $\theta \mapsto p_\theta$ has no stationary points). Finally, $g(\theta)$ is covariant, so $g$ is indeed a metric tensor on the Riemannian manifold $p_\theta$. From (2) it follows that $g$ is invariant under the exchange of $\rho$ and $\tau$.

The rho-tau entropy $S_{\rho,\tau}$ of the parametric family $p_\theta$ can be written as

$$S_{\rho,\tau}(p_\theta) = -\int_X dx\, f(\rho(p_\theta(x))). \tag{3}$$

So its second derivative $h_{ij}(\theta) = -\partial_i \partial_j S_{\rho,\tau}(p_\theta)$ is symmetric in $i, j$. When positive-definite, $h(\theta)$ can also serve as a metric tensor, as is sometimes found in the physics literature. Note that $h(\theta)$ differs from $g(\theta)$ in general: the former is induced by the entropy function $S_{\rho,\tau}(p)$, whose definition depends on the single function $f \circ \rho$, whereas the latter is derived from the function $D_{\rho,\tau}(p, q)$.

3 Gauge freedom

Write the rho-tau metric $g_{ij}$ as

$$g_{ij}(\theta) = \int_X dx\, \frac{1}{\phi(p_\theta(x))}\,\big[\partial_i p_\theta(x)\big]\,\big[\partial_j p_\theta(x)\big], \tag{4}$$

where $\phi(u) = 1/(\rho'(u)\,\tau'(u))$. So despite the two independent choices of embedding functions $\rho$ and $\tau$, the metric tensor $g_{ij}$ is determined by one function $\phi$ only. More remarkably,

$$g_{ij}(\theta) = \int_X dx\, f''(\rho(p_\theta(x)))\,\big[\partial_i \rho(p_\theta(x))\big]\,\big[\partial_j \rho(p_\theta(x))\big] = \int_X dx\, (f^*)''(\tau(p_\theta(x)))\,\big[\partial_i \tau(p_\theta(x))\big]\,\big[\partial_j \tau(p_\theta(x))\big],$$

so the gauge freedom in $g_{ij}$ exists independently of the embedding: there is freedom in choosing an arbitrary function $f$ in the case of the $\rho$-embedding and an arbitrary function $f^*$ in the case of the $\tau$-embedding of $p_\theta$.

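The agreement between the two expressions (2) and (4) for the rho-tau metric can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the paper: the sample space has three points with counting measure, `model` is a hypothetical two-parameter family, and the embeddings $\rho(u) = \sqrt{u}$ and $\tau(u) = u^3$ are chosen only because they are strictly increasing on $(0, 1)$. Derivatives with respect to $\theta$ are approximated by central differences.

```python
import math

def model(theta):
    """Hypothetical 2-parameter model on a 3-point sample space (counting measure)."""
    weights = [math.exp(theta[0]), math.exp(theta[1]), 1.0]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative strictly increasing embeddings on (0, 1) and their derivatives.
def rho(u):
    return math.sqrt(u)

def tau(u):
    return u ** 3

def d_rho(u):
    return 0.5 / math.sqrt(u)

def d_tau(u):
    return 3.0 * u ** 2

def rho_of_model(theta):
    return [rho(p) for p in model(theta)]

def tau_of_model(theta):
    return [tau(p) for p in model(theta)]

def partial(func, theta, i, h=1e-5):
    """Central difference of a list-valued function with respect to theta[i]."""
    up = list(theta)
    up[i] += h
    down = list(theta)
    down[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(func(up), func(down))]

def metric_eq2(theta):
    """g_ij = sum_x [d_i tau(p(x))] [d_j rho(p(x))], the discrete analogue of (2)."""
    n = len(theta)
    d_t = [partial(tau_of_model, theta, i) for i in range(n)]
    d_r = [partial(rho_of_model, theta, j) for j in range(n)]
    return [[sum(a * b for a, b in zip(d_t[i], d_r[j])) for j in range(n)]
            for i in range(n)]

def metric_eq4(theta):
    """g_ij = sum_x rho'(p) tau'(p) [d_i p] [d_j p], i.e. (4) with 1/phi = rho' tau'."""
    n = len(theta)
    p = model(theta)
    d_p = [partial(model, theta, i) for i in range(n)]
    return [[sum(d_rho(pk) * d_tau(pk) * a * b
                 for pk, a, b in zip(p, d_p[i], d_p[j]))
             for j in range(n)]
            for i in range(n)]

g2 = metric_eq2([0.3, -0.2])
g4 = metric_eq4([0.3, -0.2])
```

Up to finite-difference error, `g2` and `g4` should coincide entry by entry, and the resulting matrix should be symmetric and positive-definite for this particular model. Swapping the roles of `rho` and `tau` in `metric_eq2` gives a quick check of the exchange invariance noted after (2).
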
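Without loss of generality, we choose the $\tau$-embedding and denote $X_\theta(x) = \tau(p_\theta(x))$. From the form of the rho-tau metric

$$g_{ij}(\theta) = \int_X dx\, \frac{\rho'(p_\theta(x))}{\tau'(p_\theta(x))}\,\big[\partial_i \tau(p_\theta(x))\big]\,\big[\partial_j \tau(p_\theta(x))\big],$$

we introduce a bilinear form $\langle \cdot, \cdot \rangle_\theta$ defined on pairs of random variables $u(x)$, $v(x)$ by

$$\langle u, v \rangle_\theta = \int_X dx\, \frac{\rho'(p_\theta(x))}{\tau'(p_\theta(x))}\, u(x)\, v(x).$$

For any random variable $u$ it holds that

$$\partial_j \int_X dx\, \rho(p_\theta(x))\, u(x) = \int_X dx\, \frac{\rho'(p_\theta(x))}{\tau'(p_\theta(x))}\, \partial_j \tau(p_\theta(x))\, u(x) = \langle \partial_j X_\theta, u \rangle_\theta.$$

Following [2], $\partial_j X_\theta$ is then, by definition, tangent to the rho-representation $\rho(p_\theta)$ of the model $p_\theta$. We also have

$$-\partial_j S_{\rho,\tau}(p_\theta) = \langle \partial_j X_\theta, X_\theta \rangle_\theta. \tag{5}$$

The difference between the metrics $g(\theta)$ and $h(\theta)$ can be readily appreciated:

$$g_{ij}(\theta) = \langle \partial_j X_\theta, \partial_i X_\theta \rangle_\theta,$$

whereas

$$h_{ij}(\theta) = -\partial_i \partial_j S_{\rho,\tau}(p_\theta) = \partial_i \langle \partial_j X_\theta, X_\theta \rangle_\theta = g_{ij}(\theta) + \int_X dx\, \tau(p_\theta(x))\, \partial_i \partial_j \rho(p_\theta(x)). \tag{6}$$

4 The Hessian case

We now consider the conditions under which the rho-tau metric $g$ becomes Hessian.

Theorem 1. Let a $C^\infty$-manifold of probability distributions $p_\theta$ be given. For fixed strictly increasing functions $\rho$ and $\tau$, let the metric tensor $g(\theta)$ be given by (2). Then the following statements are equivalent:

(i) $g$ is Hessian, i.e., there exists $\Phi(\theta)$ such that $g_{ij}(\theta) = \partial_i \partial_j \Phi(\theta)$.

(ii) There exists a function $V(\theta)$ such that
$$\frac{\partial^2 V}{\partial\theta^i \partial\theta^j} = -\int_X dx\, \tau(p_\theta(x))\, \partial_i \partial_j \rho(p_\theta(x)). \tag{7}$$

(iii) There exists a function $W(\theta)$ such that
$$\frac{\partial^2 W}{\partial\theta^i \partial\theta^j} = -\int_X dx\, \rho(p_\theta(x))\, \partial_i \partial_j \tau(p_\theta(x)). \tag{8}$$

(iv) There exist coordinates $\eta_i(\theta)$ for which $g_{ij}(\theta) = \partial_j \eta_i$.

(v) There exist coordinates $\xi_i$ such that
$$\partial_j \xi_i(\theta) = -\int_X dx\, \tau(p_\theta(x))\, \partial_i \partial_j \rho(p_\theta(x)). \tag{9}$$

(vi) There exist coordinates $\zeta_i$ such that
$$\partial_j \zeta_i(\theta) = -\int_X dx\, \rho(p_\theta(x))\, \partial_i \partial_j \tau(p_\theta(x)). \tag{10}$$

Proof.

(i) ←→ (iv) This is well known: the existence of a strictly convex function $\Phi$ is equivalent to the existence of dual coordinates $\eta_i$.

(ii) ←→ (v) From (ii) to (v): given the existence of $V(\theta)$ satisfying (7), choose $\xi_i = \partial_i V$, and (9) is satisfied. From (v) to (ii): since the right-hand side of (9) is symmetric with respect to $i, j$, we have $\partial_j \xi_i = \partial_i \xi_j$.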
Hence there exists a function $V(\theta)$ such that $\xi_i = \partial_i V$; this is the function $V$ satisfying (7).

(iii) ←→ (vi) The proof is similar to that of the previous paragraph, with $V$ replaced by $W$ and $\xi$ by $\zeta$.

(i) ←→ (ii) Write $S(\theta) = S_{\rho,\tau}(p_\theta)$. If $g_{ij} = \partial_i \partial_j \Phi$ for some $\Phi(\theta)$, then by the identity (6) the function $V = \Phi + S$ satisfies (7). So from (i) we obtain (ii). Conversely, if the integral term in (6) is minus the Hessian of some $V(\theta)$, then $\Phi = V - S$ satisfies $\partial_i \partial_j \Phi = g_{ij}$, again by (6). This yields (i) from (ii).

(i) ←→ (iii) The proof is similar to that of the previous paragraph, except that we invoke the following identity instead of (6):

$$-\partial_i \partial_j S^*_{\rho,\tau}(p_\theta) = g_{ij}(\theta) + \int_X dx\, \rho(p_\theta(x))\, \partial_i \partial_j \tau(p_\theta(x)).$$

□

The case when $g$ is Hessian is very special, because of the existence of various bi-orthogonal coordinates. The $\eta_i$ are the dual coordinates of the $\theta^i$. The $\zeta_i$ are called escort coordinates. They are linked to the $\eta_i$ by

$$\zeta_i = -\int_X dx\, \rho(p_\theta(x))\, \partial_i \tau(p_\theta(x)) + \eta_i = \partial_i S^*_{\rho,\tau}(p_\theta) + \eta_i. \tag{11}$$

They satisfy $\partial_j \partial_k \zeta_i = -\langle \partial_k X_\theta, \partial_i \partial_j X_\theta \rangle_\theta$. The dual escort coordinates $\xi_i$ are given by

$$\xi_j(\theta) = \partial_j S_{\rho,\tau}(p_\theta) + \eta_j. \tag{12}$$

The Hessian of the function $V(\theta)$, when it does not vanish, causes a discrepancy between the metric tensor $h$, defined as minus the Hessian of the entropy, and the metric tensor $g$ as defined by (2).

5 Zhang’s rho-tau connections

Given a pair of strictly increasing functions $\rho$ and $\tau$ and a model $p_\theta$, Zhang introduced the following connections [3]:

$$\Gamma^{(\alpha)}_{ij,k} = \frac{1+\alpha}{2} \int_X dx\, \big[\partial_i \partial_j \rho(p_\theta(x))\big]\,\big[\partial_k \tau(p_\theta(x))\big] + \frac{1-\alpha}{2} \int_X dx\, \big[\partial_i \partial_j \tau(p_\theta(x))\big]\,\big[\partial_k \rho(p_\theta(x))\big], \tag{13}$$

where $\Gamma^{(\alpha)}_{ij,k} \equiv (\Gamma^{(\alpha)})^l_{ij}\, g_{lk}$. Differentiating (2) and using the symmetry of $g$, one readily verifies

$$\Gamma^{(\alpha)}_{ij,k} + \Gamma^{(-\alpha)}_{ik,j} = \partial_i g_{jk}(\theta). \tag{14}$$

This shows that, by definition, $\Gamma^{(-\alpha)}$ is the dual connection of $\Gamma^{(\alpha)}$.

The coefficients of the connection $\Gamma^{(-1)}$ vanish identically if

$$\int_X dx\, \big[\partial_i \partial_j \tau(p_\theta(x))\big]\,\big[\partial_k \rho(p_\theta(x))\big] = 0. \tag{15}$$

This condition can be written as

$$\partial_j \partial_k \zeta_i = -\langle \partial_i \partial_j X_\theta, \partial_k X_\theta \rangle_\theta = 0. \tag{16}$$

It states that the escort coordinates are affine functions of $\theta$ and expresses that the second derivatives $\partial_i \partial_j X_\theta$ are orthogonal to the tangent plane of the statistical manifold. If it is satisfied, then the dual of $\Gamma^{(-1)}$ satisfies

$$\Gamma^{(1)}_{ij,k} = \partial_i g_{jk}(\theta). \tag{17}$$

Likewise, the coefficients of the connection $\Gamma^{(1)}$ vanish identically if

$$\int_X dx\, \big[\partial_i \partial_j \rho(p_\theta(x))\big]\,\big[\partial_k \tau(p_\theta(x))\big] = 0. \tag{18}$$

Proposition 1. With respect to conditions (15) and (18):

1. When (15) holds, the coordinates $\theta^i$ are affine coordinates for $\Gamma^{(-1)}$; the dual coordinates $\eta_i$ are affine coordinates for $\Gamma^{(1)}$.
2. When (18) holds, the coordinates $\theta^i$ are affine coordinates for $\Gamma^{(1)}$; the dual coordinates $\eta_i$ are affine coordinates for $\Gamma^{(-1)}$.
3. In either case above, $g(\theta)$ is Hessian.

Proof. One recalls that when $\Gamma = 0$ in a coordinate system $\theta$, the $\theta^i$ are affine coordinates: the geodesics are straight lines, $\theta(t) = (1 - t)\,\theta(0) + t\,\theta(1)$. The geodesics of the dual connection $\Gamma^*$ satisfy the Euler-Lagrange equations

$$\frac{d^2}{dt^2}\theta^i + \Gamma^i_{km} \left(\frac{d}{dt}\theta^k\right) \left(\frac{d}{dt}\theta^m\right) = 0. \tag{19}$$

Their solution is such that the dual coordinates $\eta$ are affine coordinates: $\eta(t) = (1 - t)\,\eta(0) + t\,\eta(1)$. For Statement 1, we apply the above with $\Gamma = \Gamma^{(-1)}$ and $\Gamma^* = \Gamma^{(1)}$; for Statement 2, with $\Gamma = \Gamma^{(1)}$ and $\Gamma^* = \Gamma^{(-1)}$.

To prove Statement 3, observe that

$$\partial_k g_{ij}(\theta) = \int_X dx\, \big[\partial_i \tau(p_\theta(x))\big]\, \partial_j \partial_k \rho(p_\theta(x)) + \int_X dx\, \big[\partial_j \rho(p_\theta(x))\big]\, \partial_i \partial_k \tau(p_\theta(x)).$$

So the vanishing of either term, i.e., either (15) or (18) holding, makes $\partial_k g_{ij}(\theta)$ symmetric in $j, k$ or in $i, k$, respectively. This, in conjunction with the fact that $g_{ij}$ is symmetric in $i, j$, leads to the conclusion that $\partial_k g_{ij}(\theta)$ is totally symmetric under an exchange of any two of the three indices $i, j, k$. This implies that there exist $\eta_i$ for which $g_{ij}(\theta) = \partial_j \eta_i$. That $g$ is Hessian now follows from Theorem 1. □

6 Rho-tau embedding of phi-exponential models

Let $\phi(u) = 1/(\rho'(u)\,\tau'(u))$ as before and fix real random variables $F_1, F_2, \dots, F_n$.
These functions determine a phi-exponential family $\theta \mapsto p_\theta$ by the relation (see [4, 7, 8])

$$p_\theta(x) = \exp_\phi\big(\theta^k F_k(x) - \alpha(\theta)\big). \tag{20}$$

The function $\alpha(\theta)$ is determined by the requirement that $p_\theta$ is a probability distribution and must be normalized to 1.

Assume that the integral

$$z(\theta) = \int_X dx\, \phi(p_\theta(x))$$

converges. Then the escort family of probability distributions $\tilde p_\theta$ is defined by

$$\tilde p_\theta(x) = \frac{1}{z(\theta)}\, \phi(p_\theta(x)).$$

The corresponding escort expectation is denoted $\tilde E_\theta$. From the normalization of the $p_\theta$ it follows that $\partial_i \alpha(\theta) = \tilde E_\theta F_i$.

Now calculate, starting from (4),

$$g_{ij}(\theta) = \int_X dx\, \frac{1}{\phi(p_\theta(x))}\,\big[\partial_i p_\theta(x)\big]\,\big[\partial_j p_\theta(x)\big] = \int_X dx\, \phi(p_\theta(x))\, \big[F_i - \partial_i \alpha(\theta)\big]\, \big[F_j - \partial_j \alpha(\theta)\big] = z(\theta) \left[ \tilde E_\theta F_i F_j - \tilde E_\theta F_i\, \tilde E_\theta F_j \right]. \tag{21}$$

The latter expression is the metric tensor of the phi-exponential model as introduced in [4]. It implies that the rho-tau metric tensor is conformally equivalent with the metric tensor derived from the escort expectation of the random variables $F_i$.

Finally, let $\eta_i = E_\theta F_i$. A short calculation shows that

$$\partial_j \eta_i = \int_X dx\, \phi(p_\theta(x))\, \big[F_j - \partial_j \alpha(\theta)\big]\, F_i = z(\theta) \left[ \tilde E_\theta F_i F_j - \tilde E_\theta F_i\, \tilde E_\theta F_j \right] = g_{ij}(\theta). \tag{22}$$

By (iv) of Theorem 1 this implies that the metric tensor $g_{ij}$ is Hessian. Note that the $\eta_i$ are dual coordinates. As defined here, they depend only on $\phi$ and not on the particular choice of embeddings $\rho$ and $\tau$. In particular, the dually flat geometry does not depend on this choice either.

One concludes that for any choice of strictly increasing functions $\rho(u)$ and $\tau(u)$ one can always construct statistical models for which the rho-tau metric is Hessian. These are phi-exponential models, with $\phi$ given by $\phi(u) = 1/(\rho'(u)\,\tau'(u))$.

Conversely, given a phi-exponential model, its metric tensor is always a rho-tau metric tensor, with $\rho$, $\tau$ subject to the condition that $\rho'(u)\,\tau'(u) = 1/\phi(u)$. Two special cases are that either $\rho$ or $\tau$ is the identity map, with the other being identified as the $\log_\phi$ function.

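The relations $\partial_i \alpha = \tilde E_\theta F_i$, (21) and (22) can be illustrated on a small example. The sketch below rests on assumptions that are not in the paper: the deformation $\phi(u) = u^q$ with $q = 1/2$, for which $\log_\phi(u) = (u^{1-q} - 1)/(1-q)$ and $\exp_\phi(y) = [1 + (1-q)\,y]_+^{1/(1-q)}$; a three-point sample space with counting measure; and a single hypothetical statistic `F`. The normalization $\alpha(\theta)$ is found by bisection and derivatives are approximated by central differences.

```python
import math

Q = 0.5  # illustrative deformation index; phi(u) = u**Q is an assumption

def phi(u):
    return u ** Q

def exp_phi(y):
    """Deformed exponential for phi(u) = u**Q, cut off at zero outside its domain."""
    base = 1.0 + (1.0 - Q) * y
    return base ** (1.0 / (1.0 - Q)) if base > 0.0 else 0.0

F = [0.0, 1.0, 2.0]  # hypothetical statistic on a 3-point sample space

def alpha(theta):
    """Solve sum_x exp_phi(theta * F(x) - alpha) = 1 for alpha by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        total = sum(exp_phi(theta * f - mid) for f in F)
        if total > 1.0:
            lo = mid  # total mass too large: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def density(theta):
    """The phi-exponential model (20) with one parameter."""
    a = alpha(theta)
    return [exp_phi(theta * f - a) for f in F]

def expectation(theta):
    """eta = E_theta F under the model itself (not the escort)."""
    return sum(pk * f for pk, f in zip(density(theta), F))

def diff(func, theta, h=1e-5):
    """Central difference of a scalar function of theta."""
    return (func(theta + h) - func(theta - h)) / (2 * h)

theta0 = 0.4
p = density(theta0)

# Escort distribution and escort moments of F.
z = sum(phi(pk) for pk in p)
escort = [phi(pk) / z for pk in p]
escort_mean = sum(e * f for e, f in zip(escort, F))
escort_var = sum(e * f * f for e, f in zip(escort, F)) - escort_mean ** 2

# Rho-tau metric from (4), using finite-difference derivatives of p.
d_p = [diff(lambda t, k=k: density(t)[k], theta0) for k in range(len(F))]
g_rho_tau = sum(dk * dk / phi(pk) for dk, pk in zip(d_p, p))

g_escort = z * escort_var          # right-hand side of (21)
d_alpha = diff(alpha, theta0)      # should match the escort mean of F
d_eta = diff(expectation, theta0)  # should match the metric, as in (22)
```

Up to finite-difference error, `g_rho_tau`, `g_escort` and `d_eta` should agree, and `d_alpha` should equal `escort_mean`. Only $\phi$ enters the computation, which reflects the remark that the result does not depend on how $1/\phi$ is split into $\rho'\tau'$.
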
In the terminology of Zhang [3], the models of the phi-exponential family are called $\rho$-affine models; there, however, the normalization condition is not imposed.

7 Discussion

This paper studies parametrized statistical models $p_\theta$ and the geometry induced on them by the choice of a pair of strictly increasing functions $\rho$ and $\tau$. Theorem 1 gives equivalent conditions for the metric to be Hessian. It is shown that for the existence of a dually flat geometry the metric has to be Hessian.

The rho-tau metric tensor depends on a single function $\phi$, which is defined by $\phi(u) = 1/(\rho'(u)\,\tau'(u))$. If the model is phi-exponential for the same function $\phi$, then the rho-tau metric coincides with the metric used in the context of phi-exponential families, and in particular the metric is Hessian. This shows that it is always possible to construct models which are Hessian for the given rho-tau metric.

Acknowledgement

The second author is supported by DARPA/ARO Grant W911NF-16-1-0383.

References

1. Amari, S.: Differential-geometric methods in statistics. Lecture Notes in Statistics 28 (Springer, 1985)
2. Amari, S., Nagaoka, H.: Methods of information geometry. Translations of Mathematical Monographs 191 (Am. Math. Soc., 2000; Oxford University Press, 2000); originally in Japanese (Iwanami Shoten, Tokyo, 1993)
3. Zhang, J.: Divergence function, duality, and convex analysis. Neural Comput. 16, 159–195 (2004)
4. Naudts, J.: Estimators, escort probabilities, and phi-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 5, 102 (2004)
5. Zhang, J.: Nonparametric information geometry: from divergence function to referential-representational biduality on statistical manifolds. Entropy 15, 5384–5418 (2013)
6. Zhang, J.: On monotone embedding in information geometry. Entropy 17, 4485–4499 (2015)
7. Naudts, J.: Generalised Exponential Families and Associated Entropy Functions. Entropy 10, 131–149 (2008)
8. Naudts, J.: Generalised Thermostatistics (Springer, 2011)