Deformed exponential bundle: the linear growth case

07/11/2017
Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22627

Abstract

Vigelis and Cavalcante extended Naudts' deformed exponential families to a generic reference density. Here, the special case of Newton's deformed logarithm is used to construct a Hilbert statistical bundle for an infinite-dimensional class of probability densities.

Authors: Luigi Montrucchio, Giovanni Pistone
License

Creative Commons: None (All rights reserved)

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/17410/22627</identifier><creators><creator><creatorName>Giovanni Pistone</creatorName></creator><creator><creatorName>Luigi Montrucchio</creatorName></creator></creators><titles>
            <title>Deformed exponential bundle: the linear growth case</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2018</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Fri 9 Mar 2018</date>
	    <date dateType="Updated">Fri 9 Mar 2018</date>
            <date dateType="Submitted">Tue 13 Nov 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">da37b08798c8d91dbbd704fa3398f53ed8fddc26</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>37379</version>
        <descriptions>
            <description descriptionType="Abstract">Vigelis and Cavalcante extended the Naudts' deformed exponential families to a generic reference density. Here, the special case of Newton's deformed logarithm is used to construct an Hilbert statistical bundle for an infinite dimensional class of probability densities.
</description>
        </descriptions>
    </resource>

Deformed exponential bundle: the linear growth case

Luigi Montrucchio (1) and Giovanni Pistone (2)

(1) Collegio Carlo Alberto, Moncalieri, Italy, luigi.montrucchio@unito.it
(2) de Castro Statistics, Collegio Carlo Alberto, Moncalieri, Italy, giovanni.pistone@carloalberto.org, www.giannidiorestino.it

Abstract. Vigelis and Cavalcante extended Naudts' deformed exponential families to a generic reference density. Here, the special case of Newton's deformed logarithm is used to construct a Hilbert statistical bundle for an infinite-dimensional class of probability densities.

1 Introduction

Let $\mathcal P$ be a family of positive probability densities on the probability space $(X, \mathcal X, \mu)$. At each $p \in \mathcal P$ we have the Hilbert space of square-integrable random variables $L^2(p \cdot \mu)$, so that we can define the Hilbert bundle consisting of $\mathcal P$ with linear fibers $L^2(p \cdot \mu)$. Such a bundle supports most of the structure of Information Geometry, cf. [1] and the non-parametric version in [7, 6]. If $\mathcal P$ is an exponential manifold, there exists a splitting of each fiber $L^2(p \cdot \mu) = H_p \oplus H_p^{\perp}$ such that $H_p$ is equal to, or contains as a dense subset, the tangent space of the manifold at $p$. Moreover, the geometry on $\mathcal P$ is affine and, as a consequence, there are natural transport mappings on the Hilbert bundle.

We shall study a similar set-up when the manifold is defined by charts based on mappings other than the exponential, while retaining an affine structure, see e.g. [10]. Here, we use $p = \exp_A(v)$, where $\exp_A$ is an exponential-like function with linear growth at $+\infty$. In such a case, the Hilbert bundle has fibers which are all sub-spaces of the same $L^2(\mu)$ space.

The formalism of deformed exponentials by J. Naudts [4] is reviewed and adapted in Sec. 2. The following Sec. 3 is devoted to the adaptation of that formalism to the non-parametric case. Our construction is based on the work of R.F. Vigelis and C.C. Cavalcante [9], and we add a few more details about the infinite-dimensional case. Sec.
4 discusses the construction of the Hilbert statistical bundle in our case.

2 Background

We recall a special case of a nice and useful formalism introduced by J. Naudts [4]. Let $A \colon [0, +\infty[ \to [0, 1[$ be an increasing, concave and differentiable function with $A(0) = 0$, $A(+\infty) = 1$ and $A'(0+) = 1$. We focus on the case $A(x) = 1 - 1/(1+x) = x/(1+x)$, which was first discussed by N.J. Newton [5]. The deformed $A$-logarithm is the function

$$\log_A(x) = \int_1^x \frac{d\xi}{A(\xi)} = x - 1 + \log x, \qquad x \in ]0, +\infty[ .$$

The deformed $A$-exponential is $\exp_A = \log_A^{-1}$, which turns out to be the solution to the Cauchy problem

$$e'(y) = A(e(y)) = 1 - \frac{1}{1 + e(y)}, \qquad e(0) = 1 .$$

In the spirit of [9, 8] we consider the curve in the space of positive measures on $(X, \mathcal X)$ given by $t \mapsto \mu_t = \exp_A(tu + \log_A p) \cdot \mu$, where $u \in L^2(\mu)$. As $\exp_A(a + b) \le a^+ + \exp_A(b)$, each $\mu_t$ is a finite measure, $\mu_t(X) \le \int (tu)^+ \, d\mu + 1$, with $\mu_0 = p \cdot \mu$. The curve is actually continuous and differentiable because the pointwise derivative of the density $p_t = \exp_A(tu + \log_A p)$ is $\dot p_t = A(p_t) u$, so that $|\dot p_t| \le |u|$. In conclusion $\mu_0 = p$ and $\dot\mu_0 = u$.

Notice that there are two ways to normalize the density $p_t$: either dividing by a normalizing constant $Z(t)$ to get the statistical model $t \mapsto \exp_A(tu + \log_A p)/Z(t)$, or subtracting a constant $\psi(t)$ from the argument to get the model $t \mapsto \exp_A(tu - \psi(t) + \log_A p)$. In the standard exponential case the two methods lead to the same result, which is not the case for deformed exponentials, where $\exp_A(\alpha + \beta) \ne \exp_A(\alpha) \exp_A(\beta)$. In the present paper we choose the latter option.

3 Deformed exponential family based on $\exp_A$

Here we use the ideas of [4, 9, 8] to construct deformed non-parametric exponential families. Recall that we are given: the measure space $(X, \mathcal X, \mu)$; the set $\mathcal P$ of probability densities; the function $A(x) = x/(1 + x)$. Throughout this section, the density $p \in \mathcal P$ will be fixed.

Proposition 1.

1. The mapping $L^1(\mu) \ni u \mapsto \exp_A(u + \log_A p) \in L^1(\mu)$ has full domain and is 1-Lipschitz.
Consequently, the mapping $u \mapsto \int g \exp_A(u + \log_A p) \, d\mu$ is $\|g\|_\infty$-Lipschitz for each bounded function $g$.
2. For each $u \in L^1(\mu)$ there exists a unique constant $K(u) \in \mathbb R$ such that $\exp_A(u - K(u) + \log_A p) \cdot \mu$ is a probability.
3. It holds $K(u) = u$ if, and only if, $u$ is constant. In such a case, $\exp_A(u - K(u) + \log_A p) \cdot \mu = p \cdot \mu$. Otherwise, $\exp_A(u - K(u) + \log_A p) \cdot \mu \ne p \cdot \mu$.
4. A density $q$ takes the form $q = \exp_A(u - K(u) + \log_A p)$, with $u \in L^1(\mu)$, if, and only if, $\log_A q - \log_A p \in L^1(\mu)$.
5. If $u, v \in L^1(\mu)$ and $\exp_A(u - K(u) + \log_A p) = \exp_A(v - K(v) + \log_A p)$, then $u - v$ is constant.
6. The functional $K \colon L^1(\mu) \to \mathbb R$ is translation invariant. More specifically, $c \in \mathbb R$ implies $K(u + c) = K(u) + cK(1)$.
7. The functional $K \colon L^1(\mu) \to \mathbb R$ is continuous and quasi-convex, namely all its sub-levels $L_\alpha = \{ u \in L^1(\mu) \mid K(u) \le \alpha \}$ are convex.
8. $K \colon L^1(\mu) \to \mathbb R$ is convex.

Proof.

1. We have $\exp_A(u + \log_A p) \le u^+ + p$, so $\exp_A(u + \log_A p) \in L^1(\mu)$ for all $u \in L^1(\mu)$. The estimate $|\exp_A(u + \log_A p) - \exp_A(v + \log_A p)| \le |u - v|$ leads to the desired result.
2. For all $\kappa \in \mathbb R$ the integral $I(\kappa) = \int \exp_A(u - \kappa + \log_A p) \, d\mu$ is bounded by $1 + \int (u - \kappa)^+ \, d\mu < \infty$, and the function $\kappa \mapsto I(\kappa)$ is continuous and strictly decreasing. Convexity of $\exp_A$ together with the equation for its derivative imply $\exp_A(u - \kappa + \log_A p) \ge \exp_A(u + \log_A p) - A(\exp_A(u + \log_A p)) \kappa$, so that $\int \exp_A(u - \kappa + \log_A p) \, d\mu \ge \int \exp_A(u + \log_A p) \, d\mu - \kappa \int A(\exp_A(u + \log_A p)) \, d\mu$, where the coefficient of $\kappa$ is positive. Hence $\lim_{\kappa \to -\infty} \int \exp_A(u - \kappa + \log_A p) \, d\mu = +\infty$. For each $\kappa \ge 0$, we have $\exp_A(u - \kappa + \log_A p) \le \exp_A(u + \log_A p) \le p + u^+$, so that by dominated convergence we get $\lim_{\kappa \to \infty} I(\kappa) = 0$. Therefore $K(u)$ is the unique value of $\kappa$ for which $\int \exp_A(u - \kappa + \log_A p) \, d\mu = 1$.
3. If the function $u$ is a constant, then $\int \exp_A(u - u + \log_A p) \, d\mu = \int p \, d\mu = 1$ and so $K(u) = u$. The converse implication is trivial. The equality $\exp_A(u - K(u) + \log_A p) = p$ holds if, and only if, $u - K(u) = 0$.
4.
If $\log_A q = u - K(u) + \log_A p$, then $\log_A q - \log_A p = u - K(u) \in L^1(\mu)$. Conversely, if $\log_A q - \log_A p = v \in L^1(\mu)$, then $q = \exp_A(v + \log_A p)$. As $q$ is a density, $K(v) = 0$.
5. If $u - K(u) + \log_A p = v - K(v) + \log_A p$, then $u - v = K(u) - K(v)$.
6. Clearly, $K(c) = c = cK(1)$ and $K(u + c) = K(u) + c$.
7. Observe that $\int \exp_A(u + \log_A p) \, d\mu \le 1$ if, and only if, $K(u) \le 0$. Hence $u_1, u_2 \in L_0$ implies $\int \exp_A(u_i + \log_A p) \, d\mu \le 1$, $i = 1, 2$. Thanks to the convexity of the function $\exp_A$, we have $\int \exp_A((1 - \alpha) u_1 + \alpha u_2 + \log_A p) \, d\mu \le (1 - \alpha) \int \exp_A(u_1 + \log_A p) \, d\mu + \alpha \int \exp_A(u_2 + \log_A p) \, d\mu \le 1$, which gives $K((1 - \alpha) u_1 + \alpha u_2) \le 0$. Hence the sub-level $L_0$ is convex. Notice that all the other sub-levels are convex since they are obtained by translation of $L_0$. More precisely, $L_\alpha = L_0 + \alpha$. Clearly both sets $\{ u : \int \exp_A(u + \log_A p) \, d\mu \le 1 \}$ and $\{ u : \int \exp_A(u + \log_A p) \, d\mu \ge 1 \}$ are closed in $L^1(\mu)$, since the functional $u \mapsto \int \exp_A(u + \log_A p) \, d\mu$ is continuous. Hence $u \mapsto K(u)$ is continuous as well.
8. A functional which is translation invariant and quasiconvex is necessarily convex. Though this property is more or less known, a proof is given below.

Lemma 1. A translation invariant functional on a vector space $V$, namely $I \colon V \to \mathbb R$ such that for some $v \in V$ one has $I(x + \lambda v) = I(x) + \lambda I(v)$ for all $x \in V$ and $\lambda \in \mathbb R$, is convex if and only if $I$ is quasiconvex, namely all its sublevel sets are convex, provided $I(v) \ne 0$.

Proof. Let $I$ be quasiconvex; then the sublevel $L_0(I) = \{ x \in V : I(x) \le 0 \}$ is nonempty and convex. Clearly, $L_\lambda(I) = L_0(I) + (\lambda / I(v)) v$ holds for every $\lambda \in \mathbb R$. Hence, if $\lambda$ and $\mu$ are any pair of assigned real numbers and $\alpha \in (0, 1)$, $\bar\alpha = 1 - \alpha$, then

$$\alpha L_\lambda(I) + \bar\alpha L_\mu(I) = \alpha L_0(I) + \bar\alpha L_0(I) + \frac{\alpha\lambda + \bar\alpha\mu}{I(v)} v = L_0(I) + \frac{\alpha\lambda + \bar\alpha\mu}{I(v)} v = L_{\alpha\lambda + \bar\alpha\mu}(I) .$$

Therefore, if for any pair of points $x, y \in V$ we set $I(x) = \lambda$ and $I(y) = \mu$, then $x \in L_\lambda(I)$ and $y \in L_\mu(I)$. Consequently $\alpha x + \bar\alpha y \in \alpha L_\lambda(I) + \bar\alpha L_\mu(I) = L_{\alpha\lambda + \bar\alpha\mu}(I)$. That is, $I(\alpha x + \bar\alpha y) \le \alpha\lambda + \bar\alpha\mu = \alpha I(x) + \bar\alpha I(y)$, which shows the convexity of $I$.
Of course, the converse holds, since a convex function is quasiconvex.

For each positive density $q$, define its escort density to be $\tilde q = A(q) / \int A(q) \, d\mu$, see [4]. Notice that $0 < A(q) < 1$. The next proposition provides a subgradient of the convex function $K$.

Proposition 2. Let $v \in L^1(\mu)$ and $q(v) = \exp_A(v - K(v) + \log_A p)$. For every $u \in L^1(\mu)$, the inequality $K(u + v) - K(v) \ge \int u \, \tilde q(v) \, d\mu$ holds, i.e., the density $\tilde q(v) \in L^\infty(\mu)$ is a subgradient of $K$ at $v$.

Proof. Thanks to the convexity of $\exp_A$ and the derivation formula, writing $q = q(v)$ we have

$$\exp_A(u + v - K(u + v) + \log_A p) - q \ge A(q) (u - K(u + v) + K(v)) .$$

If we take the $\mu$-integral of both sides,

$$0 \ge \int u A(q) \, d\mu - (K(u + v) - K(v)) \int A(q) \, d\mu .$$

Isolating the increment $K(u + v) - K(v)$, the desired inequality follows.

By Prop. 2, if the functional $K$ were differentiable, the gradient mapping would be $v \mapsto \tilde q(v)$, whose strong continuity requires additional assumptions. We would like to show that $K$ is differentiable by means of the Implicit Function Theorem. That, too, would require specific assumptions. In fact, it is in general not true that a superposition operator such as $L^1(\mu) \ni u \mapsto \exp_A(u + \log_A p) \in L^1(\mu)$ is differentiable, cf. [2, §1.2]. In this perspective, we prove the following.

Proposition 3.

1. The superposition operator $L^2(\mu) \ni v \mapsto \exp_A(v + \log_A p) \in L^1(\mu)$ is continuously Fréchet differentiable with derivative $d \exp_A(v) = (h \mapsto A(\exp_A(v + \log_A p)) h) \in L(L^2(\mu), L^1(\mu))$.
2. The functional $K \colon L^2(\mu) \to \mathbb R$, implicitly defined by the equation $\int \exp_A(v - K(v) + \log_A p) \, d\mu = 1$, $v \in L^2(\mu)$, is continuously Fréchet differentiable with derivative $dK(v) = (h \mapsto \int h \, \tilde q(v) \, d\mu)$, $q(v) = \exp_A(v - K(v) + \log_A p)$, where $\tilde q(v) = A \circ q(v) / \int A \circ q(v) \, d\mu$ is the escort density of $q(v)$.

Proof.

1. It is easily seen that $\exp_A(v + h + \log_A p) - \exp_A(v + \log_A p) - A[\exp_A(v + \log_A p)] h = R_2(h)$, with the bound $|R_2(h)| \le (1/2) |h|^2$. It follows that

$$\frac{\int |R_2(h)| \, d\mu}{\left( \int |h|^2 \, d\mu \right)^{1/2}} \le \frac{1}{2} \frac{\int |h|^2 \, d\mu}{\left( \int |h|^2 \, d\mu \right)^{1/2}} = \frac{1}{2} \left( \int |h|^2 \, d\mu \right)^{1/2} .$$
Therefore $\|R_2(h)\|_{L^1(\mu)} = o(\|h\|_{L^2(\mu)})$ and so the operator $v \mapsto \exp_A(v + \log_A p)$ is Fréchet differentiable with derivative $h \mapsto A(\exp_A(v + \log_A p)) h$ at $v$. Let us show that the F-derivative is a continuous map $L^2(\mu) \to L(L^2(\mu), L^1(\mu))$. If $\|h\|_{L^2(\mu)} \le 1$ and $v, w \in L^2(\mu)$ we have

$$\int |(A[\exp_A(v + \log_A p)] - A[\exp_A(w + \log_A p)]) h| \, d\mu \le \|A[\exp_A(v + \log_A p)] - A[\exp_A(w + \log_A p)]\|_{L^2(\mu)} \le \|v - w\|_{L^2(\mu)} ,$$

hence the derivative is 1-Lipschitz.
2. Fréchet differentiability of $K$ is a consequence of the Implicit Function Theorem in Banach spaces, see [3], applied to the $C^1$-mapping $L^2(\mu) \times \mathbb R \ni (v, \kappa) \mapsto \int \exp_A(v - \kappa + \log_A p) \, d\mu$. The derivative can be easily obtained from the computation of the subgradient.

In the expression $q(u) = \exp_A(u - K(u) + \log_A p)$, $u \in L^1(\mu)$, the random variable $u$ is identified up to a constant. We can choose a unique representative in the class by assuming $\int u \, \tilde p \, d\mu = 0$, the expected value being well defined as the escort density is bounded. In this case we can solve for $u$ and get

$$u = \log_A q - \log_A p - E_{\tilde p}[\log_A q - \log_A p] .$$

In analogy with the exponential case, we can express the functional $K$ as a divergence associated to the N.J. Newton logarithm:

$$K(u) = E_{\tilde p}[\log_A p - \log_A q(u)] = D_A(p \| q(u)) .$$

It would be interesting to proceed with the study of the convex conjugate of $K$ and the related properties of the divergence, but we do not do that here.

4 Hilbert bundle based on $\exp_A$

In this section $A(x) = x/(1 + x)$ and $\mathcal P(\mu)$ denotes the set of all $\mu$-densities on the probability space $(X, \mathcal X, \mu)$ of the form $q = \exp_A(u - K(u))$ with $u \in L^2(\mu)$ and $E_\mu[u] = 0$, cf. [5]. Notice that $1 \in \mathcal P(\mu)$ because we can take $u = 0$. Equivalently, $\mathcal P(\mu)$ is the set of all densities $q$ such that $\log_A q \in L^2(\mu)$, because in such a case we can take $u = \log_A q - E_\mu[\log_A q]$. The condition for $q \in \mathcal P(\mu)$ can be expressed by saying that both $q$ and $\log q$ are in $L^2(\mu)$. In fact, as $\exp_A$ is 1-Lipschitz, we have $\|q - 1\|_\mu \le \|u - K(u)\|_\mu$, and the other inclusion follows from $\log q = \log_A q + 1 - q$.
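The objects introduced so far can be checked numerically on a finite sample space. The following is only a minimal sketch under stated assumptions, not part of the paper: the four-point space with uniform weights, the vectors `u` and `h`, the reference density $p = 1$, and the numerical methods (Newton iteration for $\exp_A$, bisection for $K$) are all illustrative choices. It computes $K(u)$, checks the bound $\|q - 1\|_\mu \le \|u - K(u)\|_\mu$, and compares a finite-difference derivative of $K$ with the escort-density formula of Prop. 3.

```python
import math

def log_A(x):
    # Newton's deformed logarithm for A(x) = x / (1 + x): log_A(x) = x - 1 + log x.
    return x - 1.0 + math.log(x)

def exp_A(y):
    # Inverse of log_A by Newton's method on f(x) = x - 1 + log x - y.
    # f is increasing and concave, so from a start with f(x) <= 0 the
    # iterates increase towards the root and stay positive.
    x = 1.0 if y >= 0.0 else math.exp(y)
    for _ in range(100):
        step = (y - log_A(x)) * x / (1.0 + x)  # -f(x)/f'(x), with f'(x) = 1/A(x)
        x += step
        if abs(step) <= 1e-15 * x:
            break
    return x

def K(u, p, mu):
    # Unique kappa with sum_i mu_i exp_A(u_i - kappa + log_A p_i) = 1 (Prop. 1.2).
    # The mass is decreasing in kappa and min(u) <= K(u) <= max(u): bisection.
    def mass(kappa):
        return sum(m * exp_A(ui - kappa + log_A(pi)) for ui, pi, m in zip(u, p, mu))
    lo, hi = min(u), max(u)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def escort(q, mu):
    # Escort density A(q) / int A(q) dmu.
    a = [qi / (1.0 + qi) for qi in q]
    z = sum(m * ai for ai, m in zip(a, mu))
    return [ai / z for ai in a]

# Hypothetical toy data: uniform mu on four points, reference density p = 1.
mu = [0.25, 0.25, 0.25, 0.25]
p = [1.0, 1.0, 1.0, 1.0]
u = [1.5, -0.5, 0.2, -1.2]          # E_mu[u] = 0

k = K(u, p, mu)
q = [exp_A(ui - k + log_A(pi)) for ui, pi in zip(u, p)]
total_mass = sum(m * qi for qi, m in zip(q, mu))
norm_q_minus_1 = math.sqrt(sum(m * (qi - 1.0) ** 2 for qi, m in zip(q, mu)))
norm_u_minus_K = math.sqrt(sum(m * (ui - k) ** 2 for ui, m in zip(u, mu)))

# Central finite difference of K in the direction h versus int h q~ dmu.
h = [0.3, -1.0, 0.7, 0.1]
eps = 1e-5
u_plus = [ui + eps * hi for ui, hi in zip(u, h)]
u_minus = [ui - eps * hi for ui, hi in zip(u, h)]
fd = (K(u_plus, p, mu) - K(u_minus, p, mu)) / (2.0 * eps)
grad = sum(m * hi * ei for hi, ei, m in zip(h, escort(q, mu), mu))
```

Bisection is used for $K$ because the integral $I(\kappa)$ is continuous and strictly decreasing, which is exactly the argument of Prop. 1.2; any bracketing root finder would do equally well.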
An easy but important consequence of such a characterization is the compatibility of the class $\mathcal P(\mu)$ with the product of measures. If $q_i = \exp_A(u_i - K_i(u_i)) \in \mathcal P(\mu_i)$, $i = 1, 2$, the product is $(q_1 \cdot \mu_1) \otimes (q_2 \cdot \mu_2) = (q_1 \otimes q_2) \cdot (\mu_1 \otimes \mu_2)$, hence $q_1 \otimes q_2 \in \mathcal P(\mu_1 \otimes \mu_2)$ since $\|q_1 \otimes q_2\|_{\mu_1 \otimes \mu_2} = \|q_1\|_{\mu_1} \|q_2\|_{\mu_2}$. Moreover $\log(q_1 \otimes q_2) = \log q_1 + \log q_2$, hence $\|\log(q_1 \otimes q_2)\|_{\mu_1 \otimes \mu_2} \le \|\log q_1\|_{\mu_1} + \|\log q_2\|_{\mu_2}$.

We proceed now to define a Hilbert bundle with base $\mathcal P(\mu)$. For each $p \in \mathcal P(\mu)$ consider the Hilbert space $H_p = \{ u \in L^2(\mu) \mid E_{\tilde p}[u] = 0 \}$ with scalar product $\langle u, v \rangle_p = \int uv \, d\mu$, and form the Hilbert bundle

$$H\mathcal P(\mu) = \{ (p, u) \mid p \in \mathcal P(\mu), u \in H_p \} .$$

For each $p, q \in \mathcal P(\mu)$ the mapping $U_p^q u = u - E_{\tilde q}[u]$ is a continuous linear mapping from $H_p$ to $H_q$. We have $U_q^r U_p^q = U_p^r$. In particular, $U_q^p U_p^q$ is the identity on $H_p$, hence $U_p^q$ is an isomorphism of $H_p$ onto $H_q$.

In the next proposition we introduce an affine atlas of charts for which $\mathcal P(\mu)$ is a Riemannian manifold and $H\mathcal P(\mu)$ is an expression of the tangent bundle. The velocity of a curve $t \mapsto p(t) \in \mathcal P(\mu)$ is expressed in the Hilbert bundle by the so-called $A$-score that, in our case, takes the form $A(p(t))^{-1} \dot p(t)$, with $\dot p(t)$ computed in $L^1(\mu)$.

Proposition 4.

1. $q \in \mathcal P(\mu)$ if, and only if, both $q$ and $\log q$ are in $L^2(\mu)$.
2. Fix $p \in \mathcal P(\mu)$. Then a positive density $q$ can be written as $q = \exp_A(v - K_p(v) + \log_A p)$, with $v \in L^2(\mu)$ and $E_{\tilde p}[v] = 0$, if, and only if, $q \in \mathcal P(\mu)$.
3. For each $p \in \mathcal P(\mu)$ the mapping $s_p \colon \mathcal P(\mu) \ni q \mapsto \log_A q - \log_A p - E_{\tilde p}[\log_A q - \log_A p] \in H_p$ is injective and surjective, with inverse $e_p(u) = \exp_A(u - K_p(u) + \log_A p)$.
4. The atlas $\{ s_p \mid p \in \mathcal P(\mu) \}$ is affine with transitions $s_q \circ e_p(u) = U_p^q u + s_q(p)$.
5. The expression of the velocity of the differentiable curve $t \mapsto p(t) \in \mathcal P(\mu)$ in the chart $s_p$ is $d s_p(p(t))/dt \in H_p$.
Conversely, given any $u \in H_p$, the curve $p \colon t \mapsto \exp_A(tu - K_p(tu) + \log_A p)$ has $p(0) = p$ and has velocity at $t = 0$ expressed in the chart $s_p$ by $u$. If the velocity of a curve is expressed in the chart $s_p$ by $t \mapsto \dot u(t)$, then its expression in the chart $s_q$ is $U_p^q \dot u(t)$.
6. If $t \mapsto p(t) \in \mathcal P(\mu)$ is differentiable with respect to the atlas, then it is differentiable as a mapping in $L^1(\mu)$. It follows that the $A$-score is well defined and is the expression of the velocity of the curve $t \mapsto p(t)$ in the moving chart $t \mapsto s_{p(t)}$.

Proof.

1. Assume $q = \exp_A(u - K(u))$ with $u \in L^2_0(\mu)$. It follows that $u - K(u) \in L^2(\mu)$, hence $q \in L^2(\mu)$ because $\exp_A$ is 1-Lipschitz. As moreover $q + \log q - 1 = u - K(u) \in L^2(\mu)$, then $\log q \in L^2(\mu)$. Conversely, $\log_A q = q - 1 + \log q = v \in L^2(\mu)$ and we can write $q = \exp_A v = \exp_A((v - E_\mu[v]) + E_\mu[v])$, so we can take $u = v - E_\mu[v]$.
2. The assumption $p, q \in \mathcal P(\mu)$ is equivalent to $\log_A p, \log_A q \in L^2(\mu)$. Define $u = \log_A q - \log_A p - E_{\tilde p}[\log_A q - \log_A p]$ and $D_A(p \| q) = E_{\tilde p}[\log_A p - \log_A q]$. It follows that $u \in L^2(\mu)$, $E_{\tilde p}[u] = 0$, and $\exp_A(u - D_A(p \| q) + \log_A p) = q$. Conversely, $\log_A q = u - K_p(u) + \log_A p \in L^2(\mu)$.
3. This has already been proved.
4. All are simple computations.
5. If $p(t) = \exp_A(u(t) - K_p(u(t)) + \log_A p)$, with $u(t) = s_p(p(t))$, then in that chart the velocity is $\dot u(t) \in H_p$. When $u(t) = tu$ the expression of the velocity is $u$. The proof of the second part follows from the fact that $U_p^q$ is the linear part of the affine change of coordinates $s_q \circ e_p$.
6. Choose a chart $s_p$ and express the curve as $t \mapsto s_p(p(t)) = u(t)$, so that $p(t) = \exp_A(u(t) - K_p(u(t)) + \log_A p)$. It follows that the derivative of $t \mapsto p(t)$ exists in $L^1(\mu)$ by derivation of the composite function and it is given by $\dot p(t) = A(p(t)) U_p^{p(t)} \dot u(t)$, hence $A(p(t))^{-1} \dot p(t) = U_p^{p(t)} \dot u(t)$. If the velocity at $t$ is expressed in the chart centered at $p(t)$, then its expression is the score.
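The chart $s_p$, its inverse $e_p$ and the affine transition of Prop. 4 can be illustrated on a finite sample space. This is a minimal sketch under stated assumptions rather than the authors' construction: the three-point space, its weights, the vectors used to build `p`, `q` and `u`, and the function names `chart`, `inverse_chart` and `transport` are all hypothetical. The transition is checked in the form $s_q \circ e_p(u) = U_p^q u + s_q(p)$, which is what a direct computation from the definitions gives.

```python
import math

def log_A(x):
    # log_A(x) = x - 1 + log x for A(x) = x / (1 + x).
    return x - 1.0 + math.log(x)

def exp_A(y):
    # Inverse of log_A by Newton's method, started to the left of the root.
    x = 1.0 if y >= 0.0 else math.exp(y)
    for _ in range(100):
        step = (y - log_A(x)) * x / (1.0 + x)
        x += step
        if abs(step) <= 1e-15 * x:
            break
    return x

def escort_mean(f, q, mu):
    # Expectation of f under the escort density A(q) / int A(q) dmu.
    a = [m * qi / (1.0 + qi) for qi, m in zip(q, mu)]
    return sum(ai * fi for ai, fi in zip(a, f)) / sum(a)

def K(u, p, mu):
    # Normalizing constant K_p(u), by bisection on [min(u), max(u)].
    def mass(kappa):
        return sum(m * exp_A(ui - kappa + log_A(pi)) for ui, pi, m in zip(u, p, mu))
    lo, hi = min(u), max(u)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inverse_chart(u, p, mu):
    # e_p(u) = exp_A(u - K_p(u) + log_A p).
    k = K(u, p, mu)
    return [exp_A(ui - k + log_A(pi)) for ui, pi in zip(u, p)]

def chart(q, p, mu):
    # s_p(q) = log_A q - log_A p - E_{p~}[log_A q - log_A p].
    d = [log_A(qi) - log_A(pi) for qi, pi in zip(q, p)]
    c = escort_mean(d, p, mu)
    return [di - c for di in d]

def transport(u, q, mu):
    # U_p^q u = u - E_{q~}[u]; the result is centred for the escort of q.
    c = escort_mean(u, q, mu)
    return [ui - c for ui in u]

# Hypothetical toy data on a three-point space.
mu = [0.2, 0.3, 0.5]
one = [1.0, 1.0, 1.0]
p = inverse_chart([0.8, -0.4, 0.1], one, mu)
q = inverse_chart([-0.6, 0.5, 0.2], one, mu)
u = transport([1.0, -2.0, 0.5], p, mu)      # an element of H_p
r = inverse_chart(u, p, mu)                 # the density e_p(u)

r_mass = sum(m * ri for ri, m in zip(r, mu))
roundtrip_err = max(abs(a - b) for a, b in zip(chart(r, p, mu), u))
lhs = chart(r, q, mu)                       # s_q(e_p(u))
rhs = [a + b for a, b in zip(transport(u, q, mu), chart(p, q, mu))]
transition_err = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Both `roundtrip_err` and `transition_err` should be at the level of floating-point rounding, which reflects the fact that the change of chart is affine with linear part $U_p^q$.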
5 Conclusions

We have constructed a Hilbert statistical bundle using an affine atlas of charts based on the $A$-logarithm with $A(x) = x/(1 + x)$. In particular, this entails a Riemannian manifold of densities. On the other hand, our bundle structure could be useful in certain contexts.

The general structure of the argument mimics the standard case of the exponential manifold. We would like to make explicit some, hopefully new, features of our set-up. The proof of the convexity and continuity of the functional $K$ when defined on $L^1(\mu)$ relies on the property of translation invariance. Whenever $K$ is restricted to $L^2(\mu)$, it is shown to be differentiable along with the deformed exponential and this, in turn, provides a rigorous construction of the $A$-score.

The gradient mapping of $K$ is continuous and 1-to-1, but its inverse cannot be continuous as it takes values which are bounded functions. It would be interesting to analyze the analytic properties of the convex conjugate $K^*$ of $K$, as both $K$ and $K^*$ are the coordinate expression of relevant divergences.

If $F$ is a section of the Hilbert bundle, namely $F \colon \mathcal P(\mu) \to L^2(\mu)$ with $E_{\tilde p}[F(p)] = 0$ for all $p$, differential equations take the form $A(p(t))^{-1} \dot p(t) = F(p(t))$ in the atlas, which in turn implies $\dot p(t) = A(p(t)) F(p(t))$ in $L^1(\mu)$. This is important for some applications, e.g., when the section $F$ is the gradient with respect to the Hilbert bundle of a real function. Namely, the gradient, $\operatorname{grad} \phi$, of a smooth function $\phi \colon \mathcal P(\mu) \to \mathbb R$ is a section of the Hilbert bundle such that

$$\frac{d}{dt} \phi(p(t)) = \langle \operatorname{grad} \phi(p(t)), A(p(t))^{-1} \dot p(t) \rangle_\mu$$

for each differentiable curve $t \mapsto p(t) \in \mathcal P(\mu)$.

Acknowledgments. L. Montrucchio acknowledges the support of Collegio Carlo Alberto Foundation. G. Pistone is a member of GNAFA-INDAM and acknowledges the support of de Castro Statistics Foundation and Collegio Carlo Alberto Foundation.

References

1. Amari, S.: Dual connections on the Hilbert bundles of statistical models.
In: Geometrization of statistical theory (Lancaster, 1987). pp. 123–151. ULDM Publ., Lancaster (1987)
2. Ambrosetti, A., Prodi, G.: A primer of nonlinear analysis, Cambridge Studies in Advanced Mathematics, vol. 34. Cambridge University Press, Cambridge (1993)
3. Dieudonné, J.: Foundations of Modern Analysis. Academic Press, New York (1960)
4. Naudts, J.: Generalised thermostatistics. Springer-Verlag London Ltd., London (2011)
5. Newton, N.J.: An infinite-dimensional statistical manifold modelled on Hilbert space. J. Funct. Anal. 263(6), 1661–1681 (2012)
6. Pistone, G.: Nonparametric information geometry. In: Geometric science of information, Lecture Notes in Comput. Sci., vol. 8085, pp. 5–36. Springer, Heidelberg (2013)
7. Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Statist. 23(5), 1543–1561 (October 1995)
8. Schwachhöfer, L., Ay, N., Jost, J., Lê, H.V.: Parametrized measure models. Bernoulli (online, to appear)
9. Vigelis, R.F., Cavalcante, C.C.: On φ-families of probability distributions. Journal of Theoretical Probability 26, 870–884 (2013)
10. Zhang, J., Hästö, P.: Statistical manifold as an affine space: a functional equation approach. Journal of Mathematical Psychology 50(1), 60–65 (2006)