The Cramér-Rao inequality on singular statistical models

07/11/2017
Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22633

Abstract

We introduce the notions of essential tangent space and reduced Fisher metric and extend the classical Cramér-Rao inequality to 2-integrable (possibly singular) statistical models for general ϕ-estimators, where ϕ is a V-valued feature function and V is a topological vector space. We show the existence of a ϕ-efficient estimator on strictly singular statistical models associated with a finite sample space and on a class of infinite dimensional exponential models that have been discovered by Fukumizu. We conclude that our general Cramér-Rao inequality is optimal.

Hong Van Le, Jürgen Jost, Lorenz Schwachhöfer


License

Creative Commons None (All rights reserved)

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/17410/22633</identifier><creators><creator><creatorName>Lorenz Schwachhöfer</creatorName></creator><creator><creatorName>Jürgen Jost</creatorName></creator><creator><creatorName>Hong Van Le</creatorName></creator></creators><titles>
            <title>The Cramér-Rao inequality on singular statistical models</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2018</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Fri 9 Mar 2018</date>
	    <date dateType="Updated">Fri 9 Mar 2018</date>
            <date dateType="Submitted">Tue 13 Nov 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">ec3ddcc06a6bca9f6864464a1e3565b1249e3897</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>37390</version>
        <descriptions>
            <description descriptionType="Abstract">
We introduce the notions of essential tangent space and reduced Fisher metric and extend the classical Cramér-Rao inequality to 2-integrable (possibly singular) statistical models for general ϕ-estimators, where ϕ is a V-valued feature function and V is a topological vector space. We show the existence of a ϕ-efficient estimator on strictly singular statistical models associated with a finite sample space and on a class of infinite dimensional exponential models that have been discovered by Fukumizu. We conclude that our general Cramér-Rao inequality is optimal.

</description>
        </descriptions>
    </resource>

The Cramér-Rao inequality on singular statistical models

Hông Vân Lê*¹, Jürgen Jost², and Lorenz Schwachhöfer³

¹ Mathematical Institute of ASCR, Zitna 25, 11567 Praha, Czech Republic
² Max-Planck-Institut für Mathematik in den Naturwissenschaften, Inselstrasse 22, 04103 Leipzig, Germany
³ Technische Universität Dortmund, Vogelpothsweg 87, 44221 Dortmund, Germany
* Speaker, partially supported by RVO: 6798584.

Abstract. We introduce the notions of essential tangent space and reduced Fisher metric and extend the classical Cramér-Rao inequality to 2-integrable (possibly singular) statistical models for general $\varphi$-estimators, where $\varphi$ is a $V$-valued feature function and $V$ is a topological vector space. We show the existence of a $\varphi$-efficient estimator on strictly singular statistical models associated with a finite sample space and on a class of infinite dimensional exponential models that have been discovered by Fukumizu. We conclude that our general Cramér-Rao inequality is optimal.

1 k-integrable parametrized measure models and the reduced Fisher metric

In this section we recall the notion of a k-integrable parametrized measure model (Definitions 1, 3). Then we give a characterization of k-integrability (Theorem 1), which is important for later deriving the classical Cramér-Rao inequalities from our general Cramér-Rao inequality. Finally we introduce the notion of the essential tangent space of a 2-integrable parametrized measure model (Definition 4) and the related notion of the reduced Fisher metric.

Notations. For a measurable space $\Omega$ and a finite measure $\mu_0$ on $\Omega$ we denote

$\mathcal{P}(\Omega) := \{\mu : \mu$ a probability measure on $\Omega\}$,
$\mathcal{M}(\Omega) := \{\mu : \mu$ a finite measure on $\Omega\}$,
$\mathcal{S}(\Omega) := \{\mu : \mu$ a signed finite measure on $\Omega\}$,
$\mathcal{S}(\Omega, \mu_0) := \{\mu = \phi\,\mu_0 : \phi \in L^1(\Omega, \mu_0)\}$.

Definition 1. ([AJLS2016b, Definition 4.1]) Let $\Omega$ be a measurable space.
1. A parametrized measure model is a triple $(M, \Omega, p)$ where $M$ is a (finite or infinite dimensional) Banach manifold and $p : M \to \mathcal{M}(\Omega) \subset \mathcal{S}(\Omega)$ is a Fréchet-$C^1$-map, which we shall call simply a $C^1$-map.
2. The triple $(M, \Omega, p)$ is called a statistical model if it consists only of probability measures, i.e., such that the image of $p$ is contained in $\mathcal{P}(\Omega)$.
3. We call such a model dominated by $\mu_0$ if the image of $p$ is contained in $\mathcal{S}(\Omega, \mu_0)$. In this case, we use the notation $(M, \Omega, \mu_0, p)$ for this model.

Let $(M, \Omega, p)$ be a parametrized measure model. It follows from [AJLS2016b, Proposition 2.1] that for all $\xi \in M$ the differential $d_\xi p(V)$ is dominated by $p(\xi)$. Hence the logarithmic derivative of $p$ at $\xi$ in the direction $V$ [AJLS2016b, (4.2)],

$$\partial_V \log p(\xi) := \frac{d\{d_\xi p(V)\}}{d\,p(\xi)}, \quad (1)$$

is an element of $L^1(\Omega, p(\xi))$. If the measures $p(\xi)$, $\xi \in M$, are dominated by $\mu_0$, we also write

$$p(\xi) = p(\cdot\,;\xi)\cdot\mu_0 \quad \text{for some } p(\cdot\,;\xi) \in L^1(\Omega, \mu_0). \quad (2)$$

Definition 2. ([AJLS2016b, Definition 4.2]) We say that a parametrized measure model $(M, \Omega, \mu_0, p)$ has a regular density function if the density function $p : \Omega \times M \to \mathbb{R}$ satisfying (2) can be chosen such that for all $V \in T_\xi M$ the partial derivative $\partial_V p(\cdot\,;\xi)$ exists and lies in $L^1(\Omega, \mu_0)$ for some fixed $\mu_0$. If the model has a positive regular density function, then

$$\partial_V \log p(\xi) = \partial_V \log p. \quad (3)$$

Next we recall the notion of k-integrability. On the set $\mathcal{M}(\Omega)$ we define the preordering $\mu_1 \le \mu_2$ if $\mu_2$ dominates $\mu_1$. Then $(\mathcal{M}(\Omega), \le)$ is a directed set, meaning that for any pair $\mu_1, \mu_2 \in \mathcal{M}(\Omega)$ there is a $\mu_0 \in \mathcal{M}(\Omega)$ dominating both of them (e.g. $\mu_0 := \mu_1 + \mu_2$). For fixed $r \in (0, 1]$ and measures $\mu_1 \le \mu_2$ on $\Omega$ we define the linear embedding

$$\imath^{\mu_1}_{\mu_2} : L^{1/r}(\Omega, \mu_1) \longrightarrow L^{1/r}(\Omega, \mu_2), \qquad \phi \longmapsto \phi \left(\frac{d\mu_1}{d\mu_2}\right)^{r}.$$

Observe that

$$\|\imath^{\mu_1}_{\mu_2}(\phi)\|_{1/r} = \left(\int_\Omega |\imath^{\mu_1}_{\mu_2}(\phi)|^{1/r}\, d\mu_2\right)^{r} = \left(\int_\Omega |\phi|^{1/r}\, \frac{d\mu_1}{d\mu_2}\, d\mu_2\right)^{r} = \left(\int_\Omega |\phi|^{1/r}\, d\mu_1\right)^{r} = \|\phi\|_{1/r}. \quad (4)$$

It has been proved that $\imath^{\mu_1}_{\mu_2}$ is an isometry [AJLS2016b, (2.6)]. Moreover, $\imath^{\mu_2}_{\mu_3} \circ \imath^{\mu_1}_{\mu_2} = \imath^{\mu_1}_{\mu_3}$ whenever $\mu_1 \le \mu_2 \le \mu_3$. We then define the space of r-th roots of measures on $\Omega$ to be the directed limit over the directed set $(\mathcal{M}(\Omega), \le)$,

$$\mathcal{S}^r(\Omega) := \varinjlim L^{1/r}(\Omega, \mu). \quad (5)$$
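For a finite sample space the isometry property (4) of the embedding $\imath^{\mu_1}_{\mu_2}$ can be checked numerically. The following is a minimal sketch of our own (the paper contains no code; all names are illustrative), taking $r = 1/2$, i.e. the space of square roots of measures:

```python
# Numerical check of the isometry (4) on a three-point sample space Omega.
Omega = [0, 1, 2]
mu1 = {0: 0.2, 1: 0.5, 2: 0.3}   # dominated measure mu1 <= mu2
mu2 = {0: 1.0, 1: 1.0, 2: 2.0}   # dominating measure
r = 0.5                          # so L^{1/r} is L^2

def norm(phi, mu):
    """The norm ||phi||_{1/r} = ( integral of |phi|^{1/r} dmu )^r."""
    return sum(abs(phi[x]) ** (1 / r) * mu[x] for x in Omega) ** r

def embed(phi):
    """The embedding phi -> phi * (dmu1/dmu2)^r of L^{1/r}(mu1) into L^{1/r}(mu2)."""
    return {x: phi[x] * (mu1[x] / mu2[x]) ** r for x in Omega}

phi = {0: 1.0, 1: -2.0, 2: 0.5}
lhs = norm(embed(phi), mu2)
rhs = norm(phi, mu1)
print(abs(lhs - rhs) < 1e-12)  # the embedding preserves the norm, as in (4)
```

The check reproduces the computation in (4): the Radon-Nikodym factor $(d\mu_1/d\mu_2)^r$ raised to the power $1/r$ turns the integral against $\mu_2$ back into an integral against $\mu_1$.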
By [AJLS2016b, (2.9)] the space $\mathcal{S}^r(\Omega)$ is a Banach space with the norm $\|\phi\|_{1/r}$ defined in (4). Denote the equivalence class of $\phi \in L^{1/r}(\Omega, \mu)$ by $\phi\,\mu^r$, so that $\mu^r \in \mathcal{S}^r(\Omega)$ is the equivalence class represented by $1 \in L^{1/r}(\Omega, \mu)$. In [AJLS2016b, Proposition 2.2], for $r \in (0, 1]$ and $0 < k \le 1/r$ we defined a map

$$\tilde\pi^k : \mathcal{S}^r(\Omega) \to \mathcal{S}^{rk}(\Omega), \qquad \phi\cdot\mu^r \mapsto \operatorname{sign}(\phi)\,|\phi|^k\,\mu^{rk}.$$

For $1 \le k \le 1/r$ the map $\tilde\pi^k$ is a $C^1$-map between Banach spaces [AJLS2016b, (2.13)]. Analogously, we set [AJLS2016b, (4.3)]

$$p^{1/k} := \tilde\pi^{1/k} \circ p : M \to \mathcal{S}^{1/k}(\Omega) \quad (6)$$

and

$$d_\xi p^{1/k}(V) := \frac{1}{k}\,\partial_V \log p(\xi)\; p^{1/k}(\xi) \in \mathcal{S}^{1/k}(\Omega, p(\xi)). \quad (7)$$

Definition 3. ([JLS2017a, Definition 2.6]) A parametrized measure model $(M, \Omega, p)$ is called k-integrable if the map $p^{1/k}$ from (6) is a Fréchet-$C^1$-map.

The k-integrability of parametrized measure models can be characterized in different ways.

Theorem 1. ([JLS2017a, Theorem 2.7]) Let $(M, \Omega, p)$ be a parametrized measure model. Then the model is k-integrable if and only if the map

$$V \longmapsto \|dp^{1/k}(V)\|_k \quad (8)$$

defined on $TM$ is continuous.

Thus $(M, \Omega, p)$ is k-integrable if and only if the map $dp^{1/k} : TM \to \mathcal{S}^{1/k}(\Omega)$ from (7) is well defined (i.e., $\partial_V \log p(\xi) \in L^k(\Omega, p(\xi))$) and continuous. In particular, the definition of k-integrability in Definition 3 above is equivalent to that in [AJLS2016b, Definition 4.4] and [AJLS2015, Definition 2.4].

Remark 1.
1. The Fisher metric $g$ on a 2-integrable parametrized measure model $(M, \Omega, p)$ is defined as follows: for $v, w \in T_\xi M$,

$$g_\xi(v, w) := \langle \partial_v \log p;\, \partial_w \log p\rangle_{L^2(\Omega, p(\xi))} = \langle dp^{1/2}(v);\, dp^{1/2}(w)\rangle_{\mathcal{S}^{1/2}(\Omega)}. \quad (9)$$

2. The standard notion of a statistical model always assumes that it is dominated by some measure and has a positive regular density function (e.g. [Borovkov1998, p. 140, p. 147], [BKRW1998, p. 23], [AN2000, §2.1], [AJLS2015, Definition 2.4]).
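For the simplest non-trivial statistical model, the Bernoulli family on $\Omega = \{0, 1\}$ with $p(\xi)(\{1\}) = \xi$, formula (9) can be evaluated directly. The sketch below is our own illustration (not code from the paper); it also shows how a parametrization whose differential vanishes at a point produces a degenerate Fisher metric, i.e. a singular model:

```python
def fisher_bernoulli(xi):
    """Fisher metric (9) of the Bernoulli model: E_{p(xi)}[(d/dxi log p)^2]."""
    masses = {0: 1 - xi, 1: xi}
    dlog = {0: -1 / (1 - xi), 1: 1 / xi}   # logarithmic derivative, as in (1)
    return sum(dlog[x] ** 2 * masses[x] for x in masses)

xi = 0.3
g = fisher_bernoulli(xi)
print(abs(g - 1 / (xi * (1 - xi))) < 1e-12)  # closed form 1/(xi(1-xi))

# Reparametrize by theta -> p(0.5 + theta**3): the chain rule gives the Fisher
# metric (3 theta^2)^2 * g(0.5 + theta^3), which vanishes at theta = 0 even
# though the map is injective -- the Fisher metric degenerates there, the
# situation addressed by the essential tangent space below.
theta = 0.0
g_singular = (3 * theta ** 2) ** 2 * fisher_bernoulli(0.5 + theta ** 3)
print(g_singular == 0.0)
```

The degenerate value at $\theta = 0$ is exactly the phenomenon that forces passing to a quotient of the tangent space when inverting the Fisher metric.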
In fact, the definition of a parametrized measure model or statistical model in [AJLS2015, Definition 2.4] is equivalent to a parametrized measure model or statistical model with a positive regular density function in the sense of Definition 2.

Let $(M, \Omega, p)$ be a 2-integrable parametrized measure model. Formula (9) shows that the kernel of the Fisher metric $g$ at $\xi \in M$ coincides with the kernel of the map $\Lambda_\xi : T_\xi M \to L^2(\Omega, p(\xi))$, $V \mapsto \partial_V \log p$. In other words, the degeneracy of the Fisher metric $g$ is caused by the non-effectiveness of the parametrisation of the family $p(\xi)$ by the map $p$. The tangent cone $T_{p(\xi)}p(M)$ of the image $p(M) \subset \mathcal{S}(\Omega)$ is isomorphic to the quotient $T_\xi M / \ker \Lambda_\xi$. This motivates the following

Definition 4. ([JLS2017a, Definition 2.9]) The quotient $\hat T_\xi M := T_\xi M / \ker \Lambda_\xi$ will be called the essential tangent space of $M$ at $\xi$.

Clearly the Fisher metric $g$ descends to a non-degenerate metric $\hat g$ on $\hat T M$, which we shall call the reduced Fisher metric. Denote by $\hat T^{\hat g} M$ the fiberwise completion of $\hat T M$ with respect to the reduced Fisher metric $\hat g$. Its inverse $\hat g^{-1}$ is a well-defined quadratic form on the fibers of the dual bundle $\hat T^{*, \hat g^{-1}} M$, which we can therefore identify with $\hat T^{\hat g} M$.

2 The general Cramér-Rao inequality

In this section we assume that $(M, \Omega, p)$ is a 2-integrable measure model. We introduce the notion of a regular function on a measure space $\Omega$ (Definition 5), state a rule of differentiation under the integral sign (Proposition 1) and derive a general Cramér-Rao inequality (Theorem 2). For $k \in \mathbb{N}^+$ we set

$$L^k_M(\Omega) := \{f \mid f \in L^k(\Omega, p(\xi)) \text{ for all } \xi \in M\}.$$

Definition 5. Let $(M, \Omega, p)$ be a parametrized measure model. We call an element $f \in L^k_M(\Omega)$ regular if the function $\xi \mapsto \|f\|_{L^k(\Omega, p(\xi))}$ is locally bounded, i.e. if for all $\xi_0 \in M$

$$\limsup_{\xi \to \xi_0} \|f\|_{L^k(\Omega, p(\xi))} < \infty.$$

The regularity of a function $f$ is important for the validity of differentiation under the integral sign.

Proposition 1. Let $k, k' > 1$ be dual indices, i.e.
$k^{-1} + k'^{-1} = 1$, and let $(M, \Omega, p)$ be a $k'$-integrable parametrized measure model. If $f \in L^k_M(\Omega)$ is regular, then the map

$$M \longrightarrow \mathbb{R}, \qquad \xi \longmapsto E_{p(\xi)}(f) = \int_\Omega f\, dp(\xi) \quad (10)$$

is Gâteaux-differentiable, and for $X \in TM$ the Gâteaux-derivative is

$$\partial_X E_{p(\xi)}(f) = E_{p(\xi)}(f\, \partial_X \log p(\xi)) = \int_\Omega f\, \partial_X \log p(\xi)\, dp(\xi). \quad (11)$$

Let $V$ be a topological vector space over the real field $\mathbb{R}$, possibly infinite dimensional. We denote by $V^M$ the vector space of all $V$-valued functions on $M$. A $V$-valued function $\varphi$ will stand for the coordinate functions on $M$ or, in general, a feature of $M$ (cf. [BKRW1998]). Let $V^*$ denote the dual space of $V$. Later, for $l \in V^*$ we denote the composition $l \circ \varphi$ by $\varphi^l$. This should be considered as the $l$-th coordinate of $\varphi$.

Assume that $(M, \Omega, p)$ is a 2-integrable parametrized measure model. A Gâteaux-differentiable function $f$ on $M$ whose differential $df$ vanishes on $\ker dp \subset TM$ will be called a visible function.

Recall that an estimator is a map $\hat\sigma : \Omega \to M$. If $k, k' > 1$ are dual indices, i.e., $k^{-1} + k'^{-1} = 1$, then given a $k'$-integrable parametrized measure model $(M, \Omega, p)$ and a function $\varphi \in V^M$, we define

$$L^k_\varphi(M, \Omega) := \{\hat\sigma : \Omega \to M \mid \varphi^l \circ \hat\sigma \in L^k_M(\Omega) \text{ for all } l \in V^*\}.$$

We call an estimator $\hat\sigma \in L^k_\varphi(M, \Omega)$ $\varphi$-regular if $\varphi^l \circ \hat\sigma \in L^k_M(\Omega)$ is regular for all $l \in V^*$. Any $\hat\sigma \in L^k_\varphi(M, \Omega)$ induces a $V^{**}$-valued function $\varphi_{\hat\sigma}$ on $M$ by computing the expectation of the composition $\varphi \circ \hat\sigma$ as follows:

$$\langle \varphi_{\hat\sigma}(\xi), l\rangle := E_{p(\xi)}(\varphi^l \circ \hat\sigma) = \int_\Omega \varphi^l \circ \hat\sigma\, dp(\xi) \quad (12)$$

for any $l \in V^*$. If $\hat\sigma \in L^k_\varphi(M, \Omega)$ is $\varphi$-regular, then Proposition 1 immediately implies that $\varphi_{\hat\sigma} : M \to V^{**}$ is visible with Gâteaux-derivative

$$\langle \partial_X \varphi_{\hat\sigma}(\xi), l\rangle = \int_\Omega \varphi^l \circ \hat\sigma \cdot \partial_X \log p(\xi)\, dp(\xi). \quad (13)$$

Let $\mathrm{pr} : TM \to \hat T M$ denote the natural projection.

Definition 6. ([JLS2017a, Definition 3.8]) A section $\xi \mapsto \nabla^{\hat g} f(\xi) \in \hat T^{\hat g}_\xi M$ will be called the generalized Fisher gradient of a visible function $f$ if for all $X \in T_\xi M$ we have $df(X) = \hat g(\mathrm{pr}(X), \nabla^{\hat g} f)$. If the generalized gradient belongs to $\hat T M$, we will call it the Fisher gradient.

We set (cf.
[Le2016])

$$L^k_1(\Omega) := \{(f, \mu) \mid \mu \in \mathcal{M}(\Omega) \text{ and } f \in L^k(\Omega, \mu)\}.$$

For a map $p : P \to \mathcal{M}(\Omega)$ we denote by $p^*(L^k_1(\Omega))$ the pull-back "fibration" (also called the fiber product) $P \times_{\mathcal{M}(\Omega)} L^k_1(\Omega)$.

Definition 7. ([JLS2017a, Definition 3.10]) Let $h$ be a visible function on $M$. A section $M \to p^*(L^2_1(\Omega))$, $\xi \mapsto \nabla h_\xi \in L^2(\Omega, p(\xi))$, is called a pre-gradient of $h$ if for all $\xi \in M$ and $X \in T_\xi M$ we have

$$dh(X) = E_{p(\xi)}((\partial_X \log p) \cdot \nabla h_\xi).$$

Proposition 2. ([JLS2017a, Proposition 3.12])
1. Let $(M, \Omega, p)$ be a 2-integrable measure model and let $f \in L^2_M(\Omega)$ be a regular function. Then the section of the pullback fibration $p^*(L^2_1(\Omega))$ defined by $\xi \mapsto f \in L^2(\Omega, p(\xi))$ is a pre-gradient of the visible function $E_{p(\xi)}(f)$.
2. Let $(P, \Omega, p)$ be a 2-integrable statistical model and $f \in L^2_P(\Omega)$. Then the section of the pullback fibration $p^*(L^2_1(\Omega))$ defined by $\xi \mapsto f - E_{p(\xi)}(f) \in L^2(\Omega, p(\xi))$ is a pre-gradient of the visible function $E_{p(\xi)}(f)$.

For an estimator $\hat\sigma \in L^2_\varphi(P, \Omega)$ we define the variance of $\hat\sigma$ with respect to $\varphi$ to be the quadratic form $V^\varphi_{p(\xi)}[\hat\sigma]$ on $V^*$ such that for all $l, k \in V^*$ we have [JLS2017a, (4.3)]

$$V^\varphi_{p(\xi)}[\hat\sigma](l, k) := E_{p(\xi)}\big[(\varphi^l \circ \hat\sigma - E_{p(\xi)}(\varphi^l \circ \hat\sigma)) \cdot (\varphi^k \circ \hat\sigma - E_{p(\xi)}(\varphi^k \circ \hat\sigma))\big]. \quad (14)$$

We regard $\|d\varphi^l_{\hat\sigma}\|^2_{\hat g^{-1}}(\xi)$ as a quadratic form on $V^*$ and denote the latter by $(\hat g^\varphi_{\hat\sigma})^{-1}(\xi)$, i.e.

$$(\hat g^\varphi_{\hat\sigma})^{-1}(\xi)(l, k) := \langle d\varphi^l_{\hat\sigma},\, d\varphi^k_{\hat\sigma}\rangle_{\hat g^{-1}}(\xi).$$

Theorem 2 (General Cramér-Rao inequality). ([JLS2017a, Theorem 4.4]) Let $(P, \Omega, p)$ be a 2-integrable statistical model, $\varphi$ a $V$-valued function on $P$ and $\hat\sigma \in L^2_\varphi(P, \Omega)$ a $\varphi$-regular estimator. Then the difference

$$V^\varphi_{p(\xi)}[\hat\sigma] - (\hat g^\varphi_{\hat\sigma})^{-1}(\xi)$$

is a positive semi-definite quadratic form on $V^*$ for any $\xi \in P$.

Remark 2. Assume that $V$ is finite dimensional and $\varphi$ is a coordinate mapping. Then $g = \hat g$, $d\varphi^l = d\xi^l$, and, abbreviating $b^\varphi_{\hat\sigma}$ as $b$, we write

$$(g^\varphi_{\hat\sigma})^{-1}(\xi)(l, k) = \Big\langle \sum_i \Big(\frac{\partial \xi^l}{\partial \xi^i} + \frac{\partial b^l}{\partial \xi^i}\Big)\, d\xi^i,\; \sum_j \Big(\frac{\partial \xi^k}{\partial \xi^j} + \frac{\partial b^k}{\partial \xi^j}\Big)\, d\xi^j \Big\rangle_{g^{-1}}(\xi). \quad (15)$$

Let $D(\xi)$ be the linear transformation of $V$ whose matrix coordinates are $D(\xi)^l_k := \partial b^l / \partial \xi^k$.
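Theorem 2 can be checked exactly in the simplest classical situation: the Bernoulli model on $\Omega = \{0, 1\}$ with the coordinate feature $\varphi$ and the unbiased estimator $\hat\sigma(x) = x$. The sketch below is our own illustration (not code from [JLS2017a]); it computes both sides of the inequality in closed form:

```python
# Both sides of Theorem 2 for the Bernoulli model p(xi)({1}) = xi on {0, 1},
# with the unbiased estimator sigma(x) = x and phi the coordinate function.

def moments(xi):
    """Mean and variance (14) of the estimator under p(xi)."""
    masses = {0: 1 - xi, 1: xi}
    mean = sum(x * masses[x] for x in masses)               # E(phi o sigma)
    var = sum((x - mean) ** 2 * masses[x] for x in masses)  # variance (14)
    return mean, var

xi = 0.3
mean, var = moments(xi)
g = 1 / (xi * (1 - xi))   # Fisher metric of the Bernoulli model
bound = 1 / g             # unbiased case: the bound is g^{-1}(xi)
print(abs(mean - xi) < 1e-12)   # the estimator is unbiased
print(var >= bound - 1e-12)     # Cramér-Rao: variance >= inverse Fisher metric
```

Here the two sides agree, $\xi(1-\xi)$ on both: this estimator attains the bound, which is the $\varphi$-efficiency studied in Section 3.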
Using (15) we rewrite the Cramér-Rao inequality in Theorem 2 as follows:

$$V_\xi[\hat\sigma] \ge (E + D(\xi))\, g^{-1}(\xi)\, (E + D(\xi))^T, \quad (16)$$

where $E$ denotes the identity matrix. The inequality (16) coincides with the Cramér-Rao inequality in [Borovkov1998, Theorem 1.A, p. 147]. By Theorem 1, the condition (R) in [Borovkov1998, p. 140, 147] for the validity of the Cramér-Rao inequality is essentially equivalent to the 2-integrability of the (finite dimensional) statistical model with positive density function under consideration; more precisely, Borovkov ignores/excludes the points $x \in \Omega$ where the density function vanishes when computing the Fisher metric. Borovkov also uses the $\varphi$-regularity assumption, written as $E_\theta((\theta^*)^2) < c < \infty$ for $\theta \in \Theta$; see also [Borovkov1998, Lemma 1, p. 141] for a more precise formulation. Classical versions of Cramér-Rao inequalities, as e.g. in [CT2006] and [AN2000], are special cases of the Cramér-Rao inequality in [Borovkov1998]. We refer the reader to [JLS2017a] for a comparison of our Cramér-Rao inequality with more recent Cramér-Rao inequalities in parametric statistics.

3 Optimality of the general Cramér-Rao inequality

To investigate the optimality of our general Cramér-Rao inequality we introduce the following

Definition 8. ([JLS2017b]) Assume that $\varphi$ is a $V$-valued function on $P$, where $(P, \Omega, p)$ is a 2-integrable statistical model. A $\varphi$-regular estimator $\hat\sigma \in L^2_\varphi(P, \Omega)$ will be called $\varphi$-efficient if $V^\varphi_{p(\xi)}[\hat\sigma] = (\hat g^\varphi_{\hat\sigma})^{-1}(\xi)$ for all $\xi \in P$.

If a statistical model $(P, \Omega, p)$ admits a $\varphi$-efficient estimator, the Cramér-Rao inequality is optimal on $(P, \Omega, p)$.

Example 1. Assume that $(P \subset \mathbb{R}^n, \Omega \subset \mathbb{R}^n, p)$ is a minimal full regular exponential family, $\varphi : P \to \mathbb{R}^n$ the canonical embedding, and $\hat\sigma : \Omega \to P$ the mean value parametrization. Then it is well known that $\hat\sigma$ is an unbiased $\varphi$-efficient estimator; see e.g. [Brown1986, Theorem 3.6, p. 74]. Let $S$ be a submanifold of $P$ and let $f : P' \to P$ be a blowing-up of $P$ along $S$, i.e.
$f$ is a smooth surjective map such that $\ker df$ is non-trivial exactly at $f^{-1}(S)$. Then $(P', \Omega, p \circ f)$ is a strictly singular statistical model which admits an unbiased $\varphi$-efficient estimator, since $(P, \Omega, p)$ admits an unbiased $\varphi$-efficient estimator.

Example 2. Let $\Omega_m$ be a finite set of $m$ elements. Let $A : \Omega_m \to \mathbb{R}^d_+$ be a map, where $d \le m - 1$. We define an exponential family $P_A(\cdot\,|\theta) \subset \mathcal{M}(\Omega_m)$ with parameter $\theta \in \mathbb{R}^d$ as follows:

$$P_A(x|\theta) = Z_A(\theta) \cdot \exp\langle\theta, A(x)\rangle \quad \text{for } \theta \in \mathbb{R}^d \text{ and } x \in \Omega_m. \quad (17)$$

Here $Z_A(\theta)$ is the normalizing factor such that $P_A(\cdot\,|\theta) \cdot \mu_0$ is a probability measure, where $\mu_0$ is the counting measure on $\Omega_m$: $\mu_0(x_i) = 1$ for $x_i \in \Omega_m$. Denote $A^l(x) := \langle l, A(x)\rangle$ for $l \in (\mathbb{R}^d)^*$. We set

$$\hat\sigma : \Omega_m \to \mathbb{R}^d, \quad x \mapsto \log A(x) := (\log A^1(x), \cdots, \log A^d(x)),$$
$$\varphi : \mathbb{R}^d \to \mathbb{R}^d_+ \subset \mathbb{R}^d, \quad \theta \mapsto \exp\theta.$$

Then $\hat\sigma$ is a (possibly biased) $\varphi$-efficient estimator [JLS2017b]. Using blowing-up, we obtain strictly singular statistical models admitting (possibly biased) $\varphi$-efficient estimators.

In [Fukumizu2009] Fukumizu constructed a large class of infinite dimensional exponential families using reproducing kernel Hilbert spaces (RKHS). Assume that $\Omega$ is a topological space and $\mu$ is a Borel probability measure such that $\mathrm{supp}(\mu) = \Omega$. Let $k : \Omega \times \Omega \to \mathbb{R}$ be a continuous positive definite kernel on $\Omega$. It is known that for a positive definite kernel $k$ on $\Omega$ there exists a unique RKHS $H_k$ such that
1. $H_k$ consists of functions on $\Omega$;
2. functions of the form $\sum_{i=1}^m a_i k(\cdot, x_i)$ are dense in $H_k$;
3. for all $f \in H_k$ we have $\langle f, k(\cdot, x)\rangle = f(x)$ for all $x \in \Omega$;
4. $H_k$ contains the constant functions $c|_\Omega$, $c \in \mathbb{R}$.

For a given positive definite kernel $k$ on $\Omega$ we set $\hat k : \Omega \to H_k$, $\hat k(x) := k(\cdot, x)$.

Theorem 3. ([JLS2017b]) Assume that $\Omega$ is a complete topological space and $\mu$ is a Borel probability measure with $\mathrm{supp}(\mu) = \Omega$. Suppose that a kernel $k$ on $\Omega$ is bounded and satisfies the following relation whenever $x, y \in \Omega$:

$$\hat k(x) - \hat k(y) = c|_\Omega \in H_k \implies c|_\Omega = 0 \in H_k.$$

Let

$$P_\mu := \Big\{f \in L^1(\Omega, \mu) \cap C^0(\Omega) \;\Big|\; f > 0 \text{ and } \int_\Omega f\, d\mu = 1\Big\}.$$
Set $p : P_\mu \to \mathcal{M}(\Omega)$, $f \mapsto f \cdot \mu$. Then there exists a map $\varphi : P_\mu \to H_k$ such that $(P_\mu, \Omega, p)$ admits a $\varphi$-efficient estimator.

References

[AJLS2015] N. Ay, J. Jost, H. V. Lê, and L. Schwachhöfer, Information geometry and sufficient statistics, Probability Theory and Related Fields 162 (2015), 327-364.
[AJLS2016] N. Ay, J. Jost, H. V. Lê, and L. Schwachhöfer, Information Geometry, Springer, 2017 (in press).
[AJLS2016b] N. Ay, J. Jost, H. V. Lê, and L. Schwachhöfer, Parametrized measure models, accepted for Bernoulli Journal, arXiv:1510.07305.
[Amari2016] S. Amari, Information Geometry and Its Applications, Applied Mathematical Sciences, vol. 194, Springer, 2016.
[AN2000] S. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, vol. 191, American Mathematical Society, 2000.
[BKRW1998] P. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Springer, 1998.
[Borovkov1998] A. A. Borovkov, Mathematical Statistics, Gordon and Breach Science Publishers, 1998.
[Brown1986] L. D. Brown, Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory, Lecture Notes-Monograph Series, vol. 9, IMS, 1986.
[CT2006] T. M. Cover and J. A. Thomas, Elements of Information Theory, second edition, Wiley and Sons, 2006.
[Fukumizu2009] K. Fukumizu, Exponential manifold by reproducing kernel Hilbert spaces, in P. Gibilisco, E. Riccomagno, M.-P. Rogantin, and H. Wynn (eds.), Algebraic and Geometric Methods in Statistics, pages 291-306, Cambridge University Press, 2009.
[JLS2017a] J. Jost, H. V. Lê, and L. Schwachhöfer, The Cramér-Rao inequality on singular statistical models I, arXiv:1703.09403.
[JLS2017b] J. Jost, H. V. Lê, and L. Schwachhöfer, The Cramér-Rao inequality on singular statistical models II, preprint (2017).
[Le2016] H. V. Lê, The uniqueness of the Fisher metric as information metric, AISM 69 (2017), 879-896.
[Watanabe2007] S. Watanabe, Almost all learning machines are singular, in Proceedings of the IEEE International Conference FOCI, pages 383-388, 2007.
[Watanabe2009] S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge University Press, 2009.