Information Geometry Under Monotone Embedding. Part I: Divergence Functions

07/11/2017
Publication GSI2017
DOI : 10.23723/17410/22620


Information Geometry Under Monotone Embedding. Part I: Divergence Functions

Jun Zhang¹ and Jan Naudts²

¹ University of Michigan, Ann Arbor, MI, U.S.A., junz@umich.edu
² Universiteit Antwerpen, Antwerpen, Belgium, Jan.Naudts@uantwerpen.be

Both authors contributed equally to this paper.

Keywords: phi-embedding, U-embedding, rho-tau embedding, rho-tau divergence, rho-tau cross-entropy, U cross-entropy, deformed entropy, gauge

Abstract. The standard model of information geometry, expressed by the Fisher-Rao metric and the Amari-Chentsov tensor, reflects an embedding of probability densities by the log-transform. This standard embedding has been generalized to one-parameter families of embedding functions, such as the α-embedding, the q-embedding, and the κ-embedding. Further generalizations using arbitrary monotone functions (or positive functions as their derivatives) include the deformed-log embedding (Naudts), the U-embedding (Eguchi), and the rho-tau dual embedding (Zhang). Here we demonstrate that the divergence function under the rho-tau dual embedding degenerates, upon taking ρ = id, to that under either the deformed-log embedding or the U-embedding; hence the latter two give an identical divergence function. While the rho-tau embedding gives rise to the most general form of cross-entropy, with two free functions, its entropy reduces to the deformed entropy of Naudts, with only one free function. Fixing the gauge freedom in the rho-tau embedding by normalizing the dual-entropy function makes the rho-tau cross-entropy degenerate to the U cross-entropy of Eguchi, which has the simpler property, not true for the general rho-tau cross-entropy, of reducing to the deformed entropy when the two pdfs are set equal. In Part I we investigate monotone embedding in divergence functions, entropy and cross-entropy; in the sequel (Part II), in the induced geometries and probability families.

1 Introduction: A Plethora of Probability Embeddings

One motivation to study probability embedding functions is to extend the framework of information geometry beyond the now-classic expressions of the Fisher-Rao metric and the Amari-Chentsov tensor. Realizing that the standard α-geometry is based on the log-embedding of probability functions, various approaches have been proposed to generalize such probability embedding, using a one-parameter family of specific functions at the first level of generality, and arbitrarily chosen (monotone or positive) functions at the second level of generality.

i) α-embedding. It was Amari [1] who first investigated the one-parameter family of embeddings $\log_\alpha : \mathbb{R}_+ \to \mathbb{R}$ defined by

\[
\log_\alpha(u) = \begin{cases} \log u, & \alpha = 1, \\ \dfrac{2}{1-\alpha}\, u^{(1-\alpha)/2}, & \alpha \neq 1. \end{cases} \tag{1}
\]

Under this α-embedding, the α-divergence becomes the canonical divergence, and the α-connections have a simple $\Gamma^{(1)}, \Gamma^{(-1)}$-like characterization [2].

ii) q-exponential embedding. Tsallis [3], investigating the equilibrium distributions of statistical physics that maximize the Boltzmann-Gibbs-Shannon entropy under constraints, replaced the entropy function by a q-dependent entropy, resulting in a deformed version of statistical physics; here q ∈ ℝ. The q-logarithmic and q-exponential functions were introduced in [4]:

\[
\log_q(u) = \frac{1}{1-q}\left(u^{1-q} - 1\right), \qquad \exp_q(u) = \left[1 + (1-q)u\right]^{1/(1-q)}, \qquad q \neq 1.
\]

Note that the q-embedding and α-embedding functions are different: $\log_q(\cdot) \neq \log_\alpha(\cdot)$, even after the identification α = 2q − 1. Like the α-embedding, the q-embedding reduces to the standard logarithm in the limit q → 1.
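As a quick numerical illustration of the difference noted in (ii), the following sketch (our addition, not part of the paper; plain NumPy, with hypothetical helper names log_alpha and log_q) evaluates both embeddings at the identification α = 2q − 1 and checks the q → 1 limit.

```python
import numpy as np

def log_alpha(u, alpha):
    """Amari's alpha-embedding, Eq. (1)."""
    if alpha == 1:
        return np.log(u)
    return 2.0 / (1.0 - alpha) * u ** ((1.0 - alpha) / 2.0)

def log_q(u, q):
    """Tsallis q-logarithm."""
    if q == 1:
        return np.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

u, q = 2.0, 0.7
alpha = 2 * q - 1                        # the identification alpha = 2q - 1
print(log_alpha(u, alpha), log_q(u, q))  # distinct values: the two embeddings differ
print(log_q(u, 1 - 1e-8), np.log(u))     # q -> 1 recovers the natural logarithm
```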
iii) κ-exponential embedding. An alternative to the q-deformed exponential model for statistical physics is the κ-model [5], where

\[
\log_\kappa(u) = \frac{1}{2\kappa}\left(u^\kappa - u^{-\kappa}\right), \qquad \exp_\kappa(u) = \left(\kappa u + \sqrt{1 + \kappa^2 u^2}\right)^{1/\kappa}, \qquad \kappa \neq 0;
\]

the limit κ → 0 recovers the standard exponential and logarithm.

iv) φ-, U-, and (ρ, τ)-embedding. Generalizing any parametric form of embedding function further leads to probability embeddings using arbitrary monotone (or, after taking a derivative, positive) functions. The prominent inventions are Naudts' phi-embedding [7], Eguchi's U-embedding [8], and Zhang's rho-tau embedding [6], though they have been re-invented and renamed by later authors, causing confusion and distraction. We discuss these in the next section.

Below we first review the deformed logarithm, $\log_\phi$, and deformed exponential, $\exp_\phi$. Then we point out that $\log_\phi$ and $\exp_\phi$ are nothing but an arbitrary pair of mutually inverse monotone functions, representable as derivatives of a pair of conjugate convex functions f, f*. The deformed divergence $D_\phi(p, q)$ is then precisely the Bregman divergence $D_f(p, q)$ associated with f. The construction of entropy and cross-entropy from this deformed approach is reviewed, as well as their construction from the U-embedding. We then review the rho-tau embedding, which provides two independently chosen embedding functions, and explicitly identify its entropy and cross-entropy. Our Main Theorem shows that the divergence function and entropy function of the rho-tau embedding reduce, as special cases, to those given by the phi-embedding and the U-embedding, while the rho-tau cross-entropy reduces, as another special case, to the U cross-entropy.

2 Deformation versus embedding

2.1 "Deforming" exponential and logarithmic functions

Naudts [7, 9] defines the phi-deformed logarithm

\[
\log_\phi(u) = \int_1^u \frac{1}{\phi(v)}\, dv.
\]

Here φ(v) is a strictly positive function. In the context of discrete probabilities it suffices that it is strictly positive on the open interval (0, 1), possibly vanishing at the end points. In the case of a probability density function it is assumed to be strictly positive on the interval (0, +∞). Note that by construction $\log_\phi(1) = 0$. The inverse of the phi-logarithm is denoted $\exp_\phi(u)$ and called the phi-exponential function:

\[
\exp_\phi(\log_\phi(u)) = \log_\phi(\exp_\phi(u)) = u.
\]

The phi-exponential has the integral expression

\[
\exp_\phi(u) = 1 + \int_0^u \psi(v)\, dv,
\]

where the function ψ(u) is given by

\[
\psi(u) = \frac{d}{du} \exp_\phi(u) = \frac{d}{du} (\log_\phi)^{-1}(u).
\]

In terms of φ and ψ we have the relations

\[
\psi(u) = \phi(\exp_\phi(u)), \qquad \phi(u) = \psi(\log_\phi(u)).
\]

We stress that all four functions φ, ψ, $\log_\phi$, $\exp_\phi$ arise from the choice of one positive-valued function φ.

As examples, φ(v) = v gives rise to the classic natural logarithm and exponential. Taking φ(u) = u/(1 + u) in [13] leads to $\log_\phi(u) = u - 1 + \log u$. Taking φ(u) = u(1 + εu) in [14] leads to

\[
\log_\phi(u) = \log\left(\frac{(1+\epsilon)u}{1+\epsilon u}\right), \qquad \exp_\phi(v) = \frac{1}{(1+\epsilon)e^{-v} - \epsilon}.
\]

2.2 Deformed entropy and deformed divergence functions

The phi-entropy of the probability distribution p is defined by [9]

\[
S_\phi(p) = -\mathrm{E}_p \log_\phi p + \int_X dx \int_0^{p(x)} \frac{u}{\phi(u)}\, du + \text{constant}. \tag{2}
\]

By partial integration one obtains the equivalent expression

\[
S_\phi(p) = -\int_X dx \int_1^{p(x)} \log_\phi(u)\, du + \text{constant}. \tag{3}
\]

For the standard logarithm, φ(u) = u, this is the well-known Boltzmann-Gibbs-Shannon entropy $S(p) = -\mathrm{E}_p \log p$.
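The four functions φ, ψ, log_φ, exp_φ can all be generated numerically from the single choice of φ. Below is a minimal sketch, not from the paper, assuming NumPy and SciPy are available; it uses the example φ(u) = u(1 + εu) of [14], computes log_φ by quadrature, inverts it by root finding, and checks the relation ψ(u) = φ(exp_φ(u)).

```python
import numpy as np
from scipy.integrate import quad     # numerical quadrature
from scipy.optimize import brentq    # bracketed root finding, to invert log_phi

eps = 0.5
phi = lambda u: u * (1 + eps * u)    # the example phi(u) = u(1 + eps*u) from [14]

def log_phi(u):
    """log_phi(u) = integral from 1 to u of dv/phi(v)."""
    val, _ = quad(lambda v: 1.0 / phi(v), 1.0, u)
    return val

def exp_phi(y):
    """Inverse of log_phi, obtained numerically.
    The bracket is valid for y below the supremum of log_phi (here log 3)."""
    return brentq(lambda u: log_phi(u) - y, 1e-6, 1e4)

u = 2.0
# Closed forms for this phi: log((1+eps)u/(1+eps*u)) and 1/((1+eps)e^{-v} - eps).
print(log_phi(u), np.log((1 + eps) * u / (1 + eps * u)))  # quadrature matches closed form
print(exp_phi(log_phi(u)))                                 # recovers u = 2.0

# psi(y) = d/dy exp_phi(y) should equal phi(exp_phi(y)):
y, h = log_phi(u), 1e-6
psi = (exp_phi(y + h) - exp_phi(y - h)) / (2 * h)
print(psi, phi(u))                                         # agree numerically
```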
The phi-divergence of two probability functions p and q is defined by [9]

\[
D_\phi(p, q) = \int_X dx \int_{q(x)}^{p(x)} dv\, \left[\log_\phi(v) - \log_\phi(q(x))\right], \tag{4}
\]

which has the equivalent expression

\[
D_\phi(p, q) = S_\phi(q) - S_\phi(p) - \int_X dx\, [p(x) - q(x)]\, \log_\phi(q(x)). \tag{5}
\]

Now let us express these quantities in terms of a strictly convex function f satisfying $f'(u) = \log_\phi(u)$. We have

\[
S_\phi(p) = -\int_X dx\, f(p(x)) + \text{constant}, \tag{6}
\]
\[
D_\phi(p, q) = \int_X dx\, \left\{ f(p(x)) - f(q(x)) - [p(x) - q(x)]\, f'(q(x)) \right\}. \tag{7}
\]

One readily recognizes that $D_\phi(p, q)$ is nothing but the Bregman divergence, whereas the function f itself determines the deformed entropy $S_\phi(p)$. Note that $p \mapsto S_\phi(p)$ is strictly concave, while $p \mapsto D_\phi(p, q)$ is strictly convex.

2.3 U-embedding

Eguchi [8] introduces the U-embedding, which is essentially the Bregman divergence under a strictly convex function U, coupled with an embedding using $\psi = (U')^{-1}$. The U cross-entropy $C_U(p, q)$ is defined as

\[
C_U(p, q) = \int_X dx\, \left\{ U(\psi(q(x))) - p(x)\, \psi(q(x)) \right\}, \tag{8}
\]

whereas the U entropy is $H_U(p) = C_U(p, p)$. The U-divergence is

\[
D_U(p, q) = C_U(p, q) - H_U(p) = \int_X dx\, \left\{ U(\psi(q(x))) - U(\psi(p(x))) - p(x)\, [\psi(q(x)) - \psi(p(x))] \right\}. \tag{9}
\]

Note that the U-embedding has only one arbitrarily chosen function, as does the phi-embedding.

2.4 Dual rho-tau embedding

In contrast with the "single function" embedding of the phi-model and the U-model, Zhang's (2004) rho-tau framework uses two arbitrarily and independently chosen monotone functions. It starts from the observation that a pair of mutually inverse functions occurs naturally in the context of convex duality. Indeed, if f is strictly convex and f* is its convex dual, then the derivatives f' and (f*)' are inverse functions of each other:

\[
f' \circ (f^*)'(u) = (f^*)' \circ f'(u) = u.
\]

Here the convex dual f* of f is defined by $f^*(u) = \sup_v \{uv - f(v)\}$. For u in the range of f' it is given by

\[
f^*(u) = u\,(f')^{-1}(u) - f \circ (f')^{-1}(u).
\]

Taking the derivative of this expression gives $(f^*)' \circ f'(u) = u$; by convex duality it then follows that also $f' \circ (f^*)'(u) = u$. Taking an additional derivative yields

\[
f''((f^*)'(u)) \cdot (f^*)''(u) = (f^*)''(f'(u)) \cdot f''(u) = 1. \tag{10}
\]

This identity will be used further on.

Consider now a pair (ρ(·), τ(·)) of strictly increasing functions. Then there exists a strictly convex function f(·) satisfying $f'(u) = \tau \circ \rho^{-1}(u)$. This is because the strictly increasing functions form a group under function composition, an observation made in [6, 12]. In terms of the conjugate function f*, the relation is $(f^*)'(u) = \rho \circ \tau^{-1}(u)$. The derivatives of f(u) and of its conjugate f*(u) have the property that

\[
f'(\rho(u)) = \tau(u) \quad \text{and} \quad (f^*)'(\tau(u)) = \rho(u). \tag{11}
\]

Among the triple (f, ρ, τ), any two of the functions specify the third. When we arbitrarily choose two strictly increasing functions ρ and τ as embedding functions, they are automatically linked by a pair of conjugate convex functions f, f*. Alternatively, we may independently specify (ρ, f), (ρ, f*), (τ, f), or (τ, f*), with the remaining functions then fixed. Therefore the rho-tau embedding is a mechanism with two independently chosen functions; this differs from both the phi-embedding and the U-embedding.

The following identities will be useful:

\[
f''(\rho(u))\, \rho'(u) = \tau'(u), \qquad (f^*)''(\tau(u))\, \tau'(u) = \rho'(u), \tag{12}
\]
\[
f''(\rho(u))\, (\rho'(u))^2 = (f^*)''(\tau(u))\, (\tau'(u))^2, \tag{13}
\]
\[
f''(\rho(u))\, (f^*)''(\tau(u)) = 1. \tag{14}
\]
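To make the triple (f, ρ, τ) concrete, the following sketch (our illustration, not the authors') picks ρ(u) = √u and τ(u) = log u, for which f'(u) = τ∘ρ⁻¹(u) = 2 log u gives f(u) = 2(u log u − u) and f*(u) = 2e^{u/2} in closed form, and then verifies identities (11)–(14) numerically.

```python
import numpy as np

# An illustrative rho-tau pair (our choice, not from the paper):
rho = lambda u: np.sqrt(u)
tau = lambda u: np.log(u)
# Since rho^{-1}(u) = u^2, f'(u) = tau(rho^{-1}(u)) = 2 log u, hence
# f(u) = 2(u log u - u), f''(u) = 2/u, (f*)'(u) = rho(tau^{-1}(u)) = e^{u/2}.
fp   = lambda u: 2 * np.log(u)        # f'
fpp  = lambda u: 2.0 / u              # f''
fsp  = lambda u: np.exp(u / 2)        # (f*)'
fspp = lambda u: np.exp(u / 2) / 2    # (f*)''

u, h = 1.7, 1e-6
rp = (rho(u + h) - rho(u - h)) / (2 * h)   # numerical rho'(u)
tp = (tau(u + h) - tau(u - h)) / (2 * h)   # numerical tau'(u)

print(fp(rho(u)), tau(u))                          # Eq. (11): f'(rho(u)) = tau(u)
print(fsp(tau(u)), rho(u))                         # Eq. (11): (f*)'(tau(u)) = rho(u)
print(fpp(rho(u)) * rp, tp)                        # Eq. (12), first identity
print(fpp(rho(u)) * rp**2, fspp(tau(u)) * tp**2)   # Eq. (13): both sides agree
print(fpp(rho(u)) * fspp(tau(u)))                  # Eq. (14): equals 1
```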
2.5 Divergence of the rho-tau embedding

Zhang (2004) introduces³ the rho-tau divergence (see Proposition 6 of [6])

\[
D_{\rho,\tau}(p, q) = \int_X dx\, \left\{ f(\rho(p(x))) + f^*(\tau(q(x))) - \rho(p(x))\, \tau(q(x)) \right\}, \tag{15}
\]

where f is a strictly convex function satisfying $f'(\rho(u)) = \tau(u)$.

³ The original definition as found in [6, 12] uses the notation $D_{f,\rho}(p, q)$ and treats f and ρ as independent. In the present definition $D_{\rho,\tau}(p, q)$ the function f depends on ρ and τ. The difference is only notational and inconsequential.

Lemma 1. Expression (15) can be written as

\[
D_{\rho,\tau}(p, q) = \int_X dx\, \left\{ f(\rho(p(x))) - f(\rho(q(x))) - [\rho(p(x)) - \rho(q(x))]\, \tau(q(x)) \right\}
= \int_X dx \int_{q(x)}^{p(x)} [\tau(v) - \tau(q(x))]\, d\rho(v)
= \int_X dx \int_{\rho(q(x))}^{\rho(p(x))} du\, [f'(u) - f'(\rho(q(x)))]. \tag{16}
\]

In particular this implies that $D_{\rho,\tau}(p, q) \geq 0$, with equality if and only if p = q.

We note the identity

\[
f(\rho(p(x))) - \rho(p(x))\, \tau(p(x)) + f^*(\tau(p(x))) = 0. \tag{17}
\]

The "reference-representation biduality" [6, 10, 12] manifests itself as $D_{\rho,\tau}(p, q) = D_{\tau,\rho}(q, p)$.

2.6 Entropy and cross-entropy of rho-tau embedding

It is now natural to define the rho-tau entropy

\[
S_{\rho,\tau}(p) = -\int_X dx\, f(\rho(p(x))), \tag{18}
\]

where f(u) is a strictly convex function satisfying $f'(u) = \tau \circ \rho^{-1}(u)$. This can be written as

\[
S_{\rho,\tau}(p) = -\int_X dx \int^{\rho(p(x))} f'(v)\, dv + \text{constant} = -\int_X dx \int^{p(x)} \tau(u)\, d\rho(u) + \text{constant}. \tag{19}
\]

Note that the rho-tau entropy $S_{\rho,\tau}(p)$ is concave in ρ(p), but not necessarily in p. This has consequences further on. We likewise define the rho-tau cross-entropy

\[
C_{\rho,\tau}(p, q) = -\int_X dx\, \rho(p(x))\, \tau(q(x)),
\]

with $C_{\rho,\tau}(p, q) = C_{\tau,\rho}(q, p)$. The rho-tau divergence can then be written as

\[
D_{\rho,\tau}(p, q) = S_{\rho,\tau}(q) - S_{\rho,\tau}(p) - \int_X dx\, [\rho(p(x)) - \rho(q(x))]\, \tau(q(x))
= [S_{\rho,\tau}(q) - C_{\rho,\tau}(q, q)] - [S_{\rho,\tau}(p) - C_{\rho,\tau}(p, q)].
\]

Note that in general $S_{\rho,\tau}(q) \neq C_{\rho,\tau}(q, q)$; this is because

\[
S_{\rho,\tau}(p) - C_{\rho,\tau}(p, p) = \int_X dx\, f^*(\tau(p(x))).
\]

So unless f(u) = cu for a constant c, f* does not vanish. In fact, denote

\[
S^*_{\rho,\tau}(p) = -\int_X dx\, f^*(\tau(p(x))). \tag{20}
\]

Then $S^*_{\rho,\tau}(p) = S_{\tau,\rho}(p)$, and

\[
S_{\rho,\tau}(p) - C_{\rho,\tau}(p, p) + S^*_{\rho,\tau}(p) = 0, \tag{21}
\]

which is, after integrating $\int_X dx$, a rewrite of (17). Therefore

\[
D_{\rho,\tau}(p, q) = C_{\rho,\tau}(p, q) - S_{\rho,\tau}(p) - S^*_{\rho,\tau}(q) \tag{22}
\]

(the signs follow directly from (15) and the definitions of $S_{\rho,\tau}$, $S^*_{\rho,\tau}$ and $C_{\rho,\tau}$). Because $D_{\rho,\tau}(p, q)$ is non-negative and vanishes if and only if p = q, the function $p \mapsto S_{\rho,\tau}(p) - C_{\rho,\tau}(p, q)$ has its unique maximum at p = q. Therefore minimizing $p \mapsto D_{\rho,\tau}(p, q)$ is equivalent to maximizing $p \mapsto S_{\rho,\tau}(p) - C_{\rho,\tau}(p, q)$.

2.7 Gauge freedom of the rho-tau embedding

Because the rho-tau embedding has the freedom of two functions, it reduces to the single-function embeddings (either the phi- or the U-embedding) upon fixing one embedding function.

Divergence. Expression (15) for $D_{\rho,\tau}(p, q)$ reduces to the phi-divergence $D_\phi(p, q)$ if, for instance, ρ = id, the identity function; in this case $\tau(u) = \log_\phi(u) = f'(u)$. The U-embedding is also a special case of the rho-tau embedding, with the same ρ = id identification: U = f*, τ = (U')⁻¹ = f'. So the phi-divergence (7) and the U-divergence (9) are identical; the U- and phi-embeddings are the same, with U' = $\exp_\phi$, as noted in [11].

Entropy. By virtue of the gauge selection ρ = id in the rho-tau embedding, any phi-deformed entropy (3) is a special case of the rho-tau entropy (18): $S_{\rho,\tau}(p) = S_\phi(\rho(p))$. On the other hand, though the rho-tau entropy (18) has two free functions in appearance, only their composition matters. So any rho-tau entropy is also a phi-entropy for a well-chosen φ.
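Continuing with the same illustrative pair ρ(u) = √u, τ(u) = log u (our example, not the paper's), the sketch below evaluates the rho-tau divergence (15) on two discrete pmfs, checking non-negativity, the equality case, and the decomposition (22).

```python
import numpy as np

# Same illustrative pair as before: rho(u) = sqrt(u), tau(u) = log(u),
# with f(u) = 2(u log u - u) and f*(u) = 2 e^{u/2}, so that f'(rho(u)) = tau(u).
rho   = lambda u: np.sqrt(u)
tau   = lambda u: np.log(u)
f     = lambda u: 2 * (u * np.log(u) - u)
fstar = lambda u: 2 * np.exp(u / 2)

p = np.array([0.2, 0.3, 0.5])   # two discrete pmfs standing in for densities
q = np.array([0.4, 0.4, 0.2])

def D_rho_tau(p, q):
    """Rho-tau divergence, Eq. (15), with the integral replaced by a sum."""
    return np.sum(f(rho(p)) + fstar(tau(q)) - rho(p) * tau(q))

S     = lambda p: -np.sum(f(rho(p)))           # rho-tau entropy, Eq. (18)
Sstar = lambda p: -np.sum(fstar(tau(p)))       # dual entropy, Eq. (20)
C     = lambda p, q: -np.sum(rho(p) * tau(q))  # rho-tau cross-entropy

print(D_rho_tau(p, q))             # strictly positive for p != q
print(D_rho_tau(p, p))             # 0 (up to rounding), by the identity (17)
print(C(p, q) - S(p) - Sstar(q))   # equals D_rho_tau(p, q), Eq. (22)
```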
The situation with the U-embedding is the same, because the U-entropy is identical with the phi-entropy:

\[
H_U(p) = \int_X dx\, \left\{ U((U')^{-1}(p(x))) - p(x)\,(U')^{-1}(p(x)) \right\}
= \int_X dx\, \left[ f^*(f'(p(x))) - p(x)\, f'(p(x)) \right]
= -\int_X dx\, f(p(x)) = S_\phi(p).
\]

Cross-entropy. The rho-tau embedding identifies $C_{\rho,\tau}(p, q)$ as a cross-entropy with a dual embedding mechanism, one free function for each of p and q. In this most general form, however, we do not require that $C_{\rho,\tau}(p, q)$ reduce to either $S_{\rho,\tau}(p)$ or $S^*_{\rho,\tau}(p) \equiv S_{\tau,\rho}(p)$ when p = q. This differs from the approach of the U-embedding, where the cross-entropy $C_U(p, q)$ is such that $C_U(p, p) = H_U(p)$. It turns out that $C_U(p, q)$ given by (8) equals the rho-tau cross-entropy minus the dual rho-tau entropy (after adopting the ρ = id gauge):

\[
C_{\rho,\tau}(p, q) - S^*_{\rho,\tau}(q) = C_U(p, q). \tag{23}
\]

Below we extend Eguchi's definition of the U cross-entropy by removing the ρ = id restriction. In other words, we call the left-hand side of (23) the U cross-entropy, which now depends on the two free functions ρ and τ, and obtain from (22)

\[
D_{\rho,\tau}(p, q) = C_U(p, q) - C_U(p, p).
\]

2.8 The normalization gauge

Let us fix the gauge by f* = τ⁻¹. In this case $\int_X dx\, f^*(\tau(p(x))) = \int_X p(x)\, dx = 1$, so $S^*_{\rho,\tau}(p) = S^*_{\rho,\tau}(q) = -1$. Adopting the f* = τ⁻¹ gauge (we call this the "normalization gauge") implies that

\[
\rho(p) = (f^*)'(\tau(p)) = (\tau^{-1})'(\tau(p)) = \frac{1}{\tau'(p)}.
\]

So the transformation

\[
\lambda : \tau(\cdot) \longrightarrow \frac{1}{\tau'(\cdot)} \equiv (\tau^{-1})'(\tau(\cdot))
\]

reflects a transformation of embedding functions. In phi-embedding language, τ → ρ is simply $\log_\phi \to \phi$, the phi-exponentiation operation. This transformation is important in studying the phi-exponential family of pdfs (Part II).

Fixing the gauge freedom by normalization simplifies the form of $D_{\rho,\tau}$. Making use of (21), with S* constant, the rho-tau cross-entropy $C_{\rho,\tau}$ and the U cross-entropy, as given by the left-hand side of (23), are equal up to an additive constant and are denoted $C_0$:

\[
C_0(p, q) = -\int_X dx\, \rho(p(x))\, \tau(q(x)) = -\int_X dx\, (\tau^{-1})'(\tau(p(x)))\, \tau(q(x)),
\]

or, in deformed-logarithm notation,

\[
C_0(p, q) = -\int_X dx\, \rho(p(x))\, \log_\rho(q(x)).
\]

Then

\[
H_0(p) \equiv C_0(p, p) = -\int_X dx\, \rho(p(x))\, \log_\rho(p(x)),
\]

with

\[
D_0(p, q) = C_0(p, q) - C_0(p, p) = \int_X dx\, \rho(p(x))\, \left[\log_\rho(p(x)) - \log_\rho(q(x))\right]
= \int_X dx\, \frac{1}{\tau'(p(x))}\, \left[\tau(p(x)) - \tau(q(x))\right]. \tag{24}
\]

Note that $D_0 \neq D_\phi$; both degenerate from $D_{\rho,\tau}$, but under different gauges.

We summarize the above conclusions in the following theorem.

Theorem 1. The (ρ, τ) embedding reduces to special cases upon fixing the gauge as follows:
(i) ρ = id: the rho-tau divergence $D_{\rho,\tau}$ reduces to the deformed phi-divergence $D_\phi$ with $\tau = f' = \log_\phi$, and to the U-divergence $D_U$ with U = f* and $\tau = f' = (U')^{-1}$;
(ii) f* = τ⁻¹: the rho-tau cross-entropy $C_{\rho,\tau}$ reduces to the U cross-entropy as redefined in (23). In this case ρ = φ and τ = $\log_\phi$, i.e., $\tau \to \rho = (\tau^{-1})' \circ \tau \equiv 1/\tau'$ is the phi-exponentiation operation;
(iii) ρ = τ: the rho-tau divergence $D_{\rho,\tau}$ becomes $\frac{1}{2} \int_X dx\, (\rho(p(x)) - \rho(q(x)))^2$.
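A quick sanity check of the normalization gauge (our own illustration, not from the paper): taking τ = log gives ρ(p) = 1/τ'(p) = p, and (24) then reduces to the classical Kullback-Leibler divergence.

```python
import numpy as np

# Normalization-gauge check: tau = log, so rho(p) = 1/tau'(p) = p,
# and Eq. (24) becomes the classical KL divergence.
tau       = np.log
tau_prime = lambda u: 1.0 / u

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

D0 = np.sum((tau(p) - tau(q)) / tau_prime(p))  # Eq. (24), discretized
kl = np.sum(p * np.log(p / q))                 # classical KL divergence
print(D0, kl)                                  # agree
```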
3 Discussion

The main thesis of this paper is that the divergence function $D_{\rho,\tau}$ constructed from the (ρ, τ)-embedding subsumes both the phi-divergence $D_\phi$ constructed from the deformed-log embedding and the U-divergence constructed from the U-embedding. A highlight of our analysis is that the rho-tau divergence $D_{\rho,\tau}$ provides a clear distinction between entropy and cross-entropy as two distinct quantities, without requiring the latter to degenerate to the former. This is significant for the geometry generated by these two quantities (see Part II).

On the other hand, fixing the gauge f* = τ⁻¹ (the normalization gauge) renders the rho-tau cross-entropy a U cross-entropy, with constant dual entropy. In this case τ ↔ ρ is akin to the $\log_\phi \leftrightarrow \phi$ transformation encountered in studying the normalization of the phi-exponential family. A thorough discussion of the geometries induced from the rho-tau divergence and from the phi-exponential family will be given in Part II.

Acknowledgement

The first author is supported by DARPA/ARO Grant W911NF-16-1-0383.

References

1. S. Amari, Differential-Geometric Methods in Statistics. Lecture Notes in Statistics 28 (Springer, 1985).
2. S. Amari and H. Nagaoka, Methods of Information Geometry. Translations of Mathematical Monographs 191 (Amer. Math. Soc. / Oxford University Press, 2000); originally in Japanese (Iwanami Shoten, Tokyo, 1993).
3. C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 52, 479–487 (1988).
4. C. Tsallis, What are the numbers that experiments provide? Quim. Nova 17, 468 (1994).
5. G. Kaniadakis, Non-linear kinetics underlying generalized statistics. Physica A 296(3), 405–425 (2001).
6. J. Zhang, Divergence function, duality, and convex analysis. Neural Comput. 16, 159–195 (2004).
7. J. Naudts, Estimators, escort probabilities, and phi-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 5, 102 (2004); arXiv:math-ph/0402005.
8. S. Eguchi, Information geometry and statistical pattern recognition. Sugaku Expositions (Amer. Math. Soc.) 19, 197–216 (2006); originally Sūgaku 56, 380 (2004), in Japanese.
9. J. Naudts, Generalised Thermostatistics (Springer, 2011). ISBN 978-0-85729-354-1.
10. J. Zhang, Nonparametric information geometry: from divergence function to referential-representational biduality on statistical manifolds. Entropy 15, 5384–5418 (2013).
11. J. Naudts and B. Anthonis, The exponential family in abstract information theory. In: F. Nielsen and F. Barbaresco (eds.), Geometric Science of Information (GSI 2013), LNCS (Springer, 2013), pp. 265–272.
12. J. Zhang, On monotone embedding in information geometry. Entropy 17, 4485–4499 (2015).
13. N. J. Newton, Information geometric nonlinear filtering. Inf. Dim. Anal. Quantum Prob. Rel. Topics 18, 1550014 (2015).
14. J. Zhou, Information theory and statistical mechanics revisited. arXiv:1604.08739.