Warped metrics for location-scale models

07/11/2017
Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22336

Abstract

This paper argues that a class of Riemannian metrics, called warped metrics, plays a fundamental role in statistical problems involving location-scale models. The paper reports three new results: i) the Rao-Fisher metric of any location-scale model is a warped metric, provided that this model satisfies a natural invariance condition, ii) the analytic expression of the sectional curvature of this metric, iii) the exact analytic solution of the geodesic equation of this metric. The paper applies these new results to several examples of interest, where it shows that warped metrics turn location-scale models into complete Riemannian manifolds of negative sectional curvature. This is a very suitable situation for developing algorithms which solve problems of classification and on-line estimation. Thus, by revealing the connection between warped metrics and location-scale models, the present paper paves the way to the introduction of new efficient statistical algorithms.

Collection

application/pdf Warped metrics for location-scale models Salem Said, Yannick Berthoumieu

License

Creative Commons: None (All rights reserved)

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/17410/22336</identifier><creators><creator><creatorName>Yannick Berthoumieu</creatorName></creator><creator><creatorName>Salem Said</creatorName></creator></creators><titles>
            <title>Warped metrics for location-scale models</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2018</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>Sectional curvature</subject><subject>Rao-Fisher metric</subject><subject>warped metric</subject><subject>location-scale model</subject><subject>geodesic equation</subject></subjects><dates>
	    <date dateType="Created">Sat 17 Feb 2018</date>
	    <date dateType="Updated">Sun 18 Feb 2018</date>
            <date dateType="Submitted">Wed 19 Sep 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">636159c5b40e988ef9b8a449ff86bc42324e8647</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>36996</version>
        <descriptions>
            <description descriptionType="Abstract">This paper argues that a class of Riemannian metrics, called warped metrics, plays a fundamental role in statistical problems involving location-scale models. The paper reports three new results: i) the Rao-Fisher metric of any location-scale model is a warped metric, provided that this model satisfies a natural invariance condition, ii) the analytic expression of the sectional curvature of this metric, iii) the exact analytic solution of the geodesic equation of this metric. The paper applies these new results to several examples of interest, where it shows that warped metrics turn location-scale models into complete Riemannian manifolds of negative sectional curvature. This is a very suitable situation for developing algorithms which solve problems of classification and on-line estimation. Thus, by revealing the connection between warped metrics and location-scale models, the present paper paves the way to the introduction of new efficient statistical algorithms.
</description>
        </descriptions>
    </resource>

Warped metrics for location-scale models

Salem Said, Yannick Berthoumieu
Laboratoire IMS (CNRS - UMR 5218), Université de Bordeaux
salem.said;yannick.berthoumieu@ims-bordeaux.fr

Abstract. This paper argues that a class of Riemannian metrics, called warped metrics, plays a fundamental role in statistical problems involving location-scale models. The paper reports three new results: i) the Rao-Fisher metric of any location-scale model is a warped metric, provided that this model satisfies a natural invariance condition, ii) the analytic expression of the sectional curvature of this metric, iii) the exact analytic solution of the geodesic equation of this metric. The paper applies these new results to several examples of interest, where it shows that warped metrics turn location-scale models into complete Riemannian manifolds of negative sectional curvature. This is a very suitable situation for developing algorithms which solve problems of classification and on-line estimation. Thus, by revealing the connection between warped metrics and location-scale models, the present paper paves the way to the introduction of new efficient statistical algorithms.

Keywords: Rao-Fisher metric, warped metric, location-scale model, sectional curvature, geodesic equation

1 Introduction: definition and two examples

This paper argues that a class of Riemannian metrics, called warped metrics, is natural and useful to statistical problems involving location-scale models. A warped metric is defined as follows [1]. Let $M$ be a Riemannian manifold with Riemannian metric $ds^2_M$. Consider the manifold $\mathcal{M} = M \times (0, \infty)$, equipped with the Riemannian metric,

$$ds^2(z) = I_0(\sigma)\, d\sigma^2 + I_1(\sigma)\, ds^2_M(\bar{x}) \tag{1}$$

where each $z \in \mathcal{M}$ is a couple $(\bar{x}, \sigma)$ with $\bar{x} \in M$ and $\sigma \in (0, \infty)$. The Riemannian metric (1) is called a warped metric on $\mathcal{M}$. The functions $I_0$ and $I_1$ have strictly positive values and are part of the definition of this metric.
The main claim of this paper is that warped metrics arise naturally as Rao-Fisher metrics for a variety of location-scale models. Here, to begin, two examples of this claim are given. Example 1 is classic, while Example 2, to our knowledge, is new in the literature. From now on, the reader is advised to think of $\mathcal{M}$ as a statistical manifold, where $\bar{x}$ is a location parameter and $\sigma$ is either a scale parameter or a concentration parameter.

Example 1 (univariate normal model): let $M = \mathbb{R}$, with $ds^2_M(\bar{x}) = d\bar{x}^2$ the canonical metric of $\mathbb{R}$. If each $z = (\bar{x}, \sigma)$ in $\mathcal{M}$ is identified with the univariate normal density of mean $\bar{x}$ and standard deviation $\sigma$, then the resulting Rao-Fisher metric on $\mathcal{M}$ is given by [2]

$$ds^2(z) = \sigma^{-2}\, d\sigma^2 + \tfrac{1}{2}\, \sigma^{-2}\, d\bar{x}^2 \tag{2}$$

Example 2 (von Mises-Fisher model): let $M = S^2$, the unit sphere with $ds^2_M = d\theta^2$ its canonical metric induced from $\mathbb{R}^3$. Identify $z = (\bar{x}, \sigma)$ in $\mathcal{M}$ with the von Mises-Fisher density of mean direction $\bar{x}$ and concentration parameter $\sigma$ [3]. The resulting Rao-Fisher metric on $\mathcal{M}$ is given by

$$ds^2(z) = \left( \sigma^{-2} - \sinh^{-2}\sigma \right) d\sigma^2 + \left( \sigma \coth\sigma - 1 \right) d\theta^2(\bar{x}) \tag{3}$$

Remark a: note that $\sigma$ is a scale parameter in Example 1, but a concentration parameter in Example 2. Accordingly, at $\sigma = 0$, the metric (2) becomes infinite, while the metric (3) remains finite and degenerates to $ds^2(z)|_{\sigma=0} = (1/3)\, d\sigma^2$. Thus, (3) gives a Riemannian metric on the larger Riemannian manifold $\hat{\mathcal{M}} = \mathbb{R}^3$, which contains $\mathcal{M}$, obtained by considering $\sigma$ as a radial coordinate and $\sigma = 0$ as the origin of $\mathbb{R}^3$.

2 A general theorem: from Rao-Fisher to warped metrics

Examples 1 and 2 of the previous section are special cases of Theorem 1, given here. To state this theorem, let $(M, ds^2_M)$ be an irreducible Riemannian homogeneous space, under the action of a group of isometries $G$ [4]. Denote by $g \cdot x$ the action of $g \in G$ on $x \in M$.
Then, assume each $z = (\bar{x}, \sigma)$ in $\mathcal{M}$ can be identified uniquely and regularly with a probability density $p(x|z) = p(x|\bar{x}, \sigma)$ on $M$, with respect to the Riemannian volume element, such that the following property is verified,

$$p(g \cdot x \,|\, g \cdot \bar{x}, \sigma) = p(x \,|\, \bar{x}, \sigma) \qquad g \in G \tag{4}$$

The densities $p(x|\bar{x}, \sigma)$ form a statistical model on $M$, where $\bar{x}$ is a location parameter and $\sigma$ can be chosen as either a scale or a concentration parameter (roughly, a scale parameter is the inverse of a concentration parameter).

In the statement of Theorem 1, $\ell(z) = \log p(x|z)$ and $\nabla_{\bar{x}}\, \ell(z)$ denotes the Riemannian gradient vector field of $\ell(z)$, with respect to $\bar{x} \in M$. Moreover, $\|\nabla_{\bar{x}}\, \ell(z)\|$ denotes the length of this vector field, as measured by the metric $ds^2_M$.

Theorem 1 (warped metrics). The Rao-Fisher metric of the statistical model $\{ p(x|z) \,;\, z \in \mathcal{M} \}$ is a warped metric of the form (1), defined by

$$I_0(\sigma) = E_z \left( \partial_\sigma \ell(z) \right)^2 \qquad I_1(\sigma) = E_z \|\nabla_{\bar{x}}\, \ell(z)\|^2 \,/\, \dim M \tag{5}$$

where $E_z$ denotes expectation with respect to $p(x|z)$. Due to property (4), the two expectations appearing in (5) do not depend on the parameter $\bar{x}$, so $I_0$ and $I_1$ are well-defined functions of $\sigma$.

Remark b: the proof of Theorem 1 cannot be given here, due to lack of space. It relies strongly on the assumption that the Riemannian homogeneous space $M$ is irreducible. In particular, this allows the application of Schur's lemma, from the theory of group representations [5]. To say that $M$ is an irreducible Riemannian homogeneous space means that the following property is verified: if $K_{\bar{x}}$ is the stabiliser in $G$ of $\bar{x} \in M$, then the isotropy representation $k \mapsto dk|_{\bar{x}}$ is an irreducible representation of $K_{\bar{x}}$ in the tangent space $T_{\bar{x}} M$.

Remark c: if the assumption that $M$ is irreducible is relaxed, then Theorem 1 generalises to a similar statement, involving so-called multiply warped metrics. Roughly, this is because a homogeneous space which is not irreducible may still decompose into a direct product of irreducible homogeneous spaces [4].
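As a sanity check of formula (5), the following minimal sketch evaluates the two expectations numerically for the von Mises-Fisher model of Example 2 and compares them with the coefficients of (3). It assumes the standard form of the von Mises-Fisher density on $S^2$, $p(x|\bar{x}, \sigma) = \sigma\, (4\pi \sinh\sigma)^{-1} \exp(\sigma \langle x, \bar{x} \rangle)$, which is not written out in the text above. Under this assumption, $t = \langle x, \bar{x} \rangle$ has density $\sigma e^{\sigma t} / (2 \sinh\sigma)$ on $[-1, 1]$, $\partial_\sigma \ell = t - (\coth\sigma - 1/\sigma)$ and $\|\nabla_{\bar{x}}\, \ell\|^2 = \sigma^2 (1 - t^2)$. The function names are illustrative only.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def vmf_warping_functions(sigma):
    """I0 and I1 of formula (5) for the vMF model on S^2, by quadrature in t = <x, xbar>.

    Assumption: p(x | xbar, sigma) = sigma / (4 pi sinh sigma) * exp(sigma * t).
    """
    coth = math.cosh(sigma) / math.sinh(sigma)
    # density of t on [-1, 1] induced by the assumed vMF density
    dens = lambda t: sigma * math.exp(sigma * t) / (2 * math.sinh(sigma))
    # derivative of the log-density with respect to sigma
    score_sigma = lambda t: t - (coth - 1 / sigma)
    # squared length of the Riemannian gradient with respect to xbar
    grad_sq = lambda t: sigma ** 2 * (1 - t ** 2)
    i0 = simpson(lambda t: score_sigma(t) ** 2 * dens(t), -1.0, 1.0)
    i1 = simpson(lambda t: grad_sq(t) * dens(t), -1.0, 1.0) / 2  # dim S^2 = 2
    return i0, i1

def vmf_closed_form(sigma):
    """Coefficients of d sigma^2 and d theta^2 read off from equation (3)."""
    coth = math.cosh(sigma) / math.sinh(sigma)
    return sigma ** -2 - math.sinh(sigma) ** -2, sigma * coth - 1

for s in (0.5, 1.0, 3.0):
    print(s, vmf_warping_functions(s), vmf_closed_form(s))
```

The two tuples printed on each line should agree up to quadrature error, which is what Theorem 1 predicts for this model.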
Remark d: statistical models on $M$ which verify (4) often arise under an exponential form,

$$p(x|\bar{x}, \sigma) = \exp\left( \eta \cdot D(x, \bar{x}) - \psi(\eta) \right) \tag{6}$$

where $\eta = \eta(\sigma)$ is a natural parameter, and $\psi(\eta)$ is the cumulant generating function of the statistic $D(x, \bar{x})$. Then, for assumption (4) to hold, it is necessary and sufficient that

$$D(g \cdot x, g \cdot \bar{x}) = D(x, \bar{x}) \tag{7}$$

Both Examples 1 and 2 are of the form (6), as is Example 3, in the following section, which deals with the Riemannian Gaussian model [6][7].

3 Curvature equations and the extrinsic geometry of M

For each $\sigma \in (0, \infty)$, there is an embedding of $M$ into $\mathcal{M}$, as the surface $M \times \{\sigma\}$. This embedding yields an extrinsic geometry of $M$, given by the first and second fundamental forms [8].

The first fundamental form is the restriction of the metric $ds^2$ of $\mathcal{M}$ to the tangent bundle of $M$. This will be denoted $ds^2_M(x|\sigma)$ for $x \in M$. It is clear from (1) that

$$ds^2_M(x|\sigma) = I_1(\sigma)\, ds^2_M(x) \tag{8}$$

This extrinsic Riemannian metric on $M$ is a scaled version of its intrinsic metric $ds^2_M$. It induces an extrinsic Riemannian distance given by

$$d^2(x, y|\sigma) = I_1(\sigma)\, d^2(x, y) \qquad x, y \in M \tag{9}$$

where $d(x, y)$ is the intrinsic Riemannian distance, induced by the metric $ds^2_M$. The extrinsic distance (9) is a generalisation of the famous Mahalanobis distance. In fact, replacing $I_1$ in (9) by its expression from Example 1, $I_1(\sigma) = 1/(2\sigma^2)$, yields the classical expression of the Mahalanobis distance $d^2(x, y|\sigma) = |x - y|^2 / 2\sigma^2$. The significance of this distance can be visualised as follows: if $\sigma$ is a dispersion parameter, the extrinsic distance between two otherwise fixed points $x, y \in M$ will decrease as $\sigma$ increases, as if the space $M$ were contracting (for a concentration parameter, there is an expansion, rather than a contraction).

The second fundamental form is given by the tangent component of the covariant derivative of the unit normal to the surface $M \times \{\sigma\}$. This unit normal is $\partial_r$ where $r$ is the vertical distance coordinate, given by $dr/d\sigma = I_0^{1/2}(\sigma)$.
Using Koszul's formula [9], it is possible to express the second fundamental form,

$$S(v) = \tfrac{1}{2} \left( \partial_r I_1 / I_1 \right) v \tag{10}$$

for any $v$ tangent to $M$. Knowledge of the second fundamental form is valuable, as it yields the relationship between extrinsic and intrinsic curvatures of $M$.

Proposition 1 (curvature equations). Let $K^{\mathcal{M}}$ and $K^{M}$ denote the sectional curvatures of $\mathcal{M}$ and $M$. The following are true

$$K^{\mathcal{M}}(u, v) = \left( 1/I_1 \right) K^{M}(u, v) - \tfrac{1}{4} \left( \partial_r I_1 / I_1 \right)^2 \tag{11}$$

$$K^{\mathcal{M}}(u, \partial_r) = - \left( \partial_r^2\, I_1^{1/2} \,/\, I_1^{1/2} \right) \tag{12}$$

for any linearly independent $u, v$ tangent to $M$.

Remark e: here, Equation (11) is the Gauss curvature equation. Roughly, it shows that embedding $M$ into $\mathcal{M}$ adds negative curvature. Equation (12) is the mixed curvature equation. If the intrinsic sectional curvature $K^{M}$ is negative, then (11) and (12) show that the sectional curvature $K^{\mathcal{M}}$ of $\mathcal{M}$ is negative if and only if $I_1^{1/2}$ is a convex function of the vertical distance $r$.

Return to Example 1: here, $M = \mathbb{R}$ is one-dimensional, so the Gauss equation (11) does not provide any information. The mixed curvature equation gives the curvature of the two-dimensional manifold $\mathcal{M}$. In this equation, $\partial_r = \sigma\, \partial_\sigma$, and it follows that

$$K^{\mathcal{M}}(u, \partial_r) = -1 \tag{13}$$

so $\mathcal{M}$ has constant negative curvature. In fact, it was observed long ago that the metric (2) is essentially the Poincaré half-plane metric [2].

Return to Example 2: in Example 2, $M = S^2$ so $K^{M} \equiv 1$ is constant. It follows from the Gauss equation that each sphere $S^2 \times \{\sigma\}$ has constant extrinsic curvature, equal to

$$K^{\mathcal{M}}|_\sigma = \left( 1/I_1 \right) - \tfrac{1}{4} \left( \partial_r I_1 / I_1 \right)^2 \tag{14}$$

Upon replacing the expressions of $I_1$ and $\partial_r$ based on (3), this is found to be strictly negative for $\sigma > 0$,

$$K^{\mathcal{M}}|_\sigma < 0 \quad \text{for } \sigma > 0 \tag{15}$$

Thus, the Rao-Fisher metric (3) induces a negative extrinsic curvature on each spherical surface $S^2 \times \{\sigma\}$. In fact, by studying the mixed curvature equation (12), it is seen that the whole manifold $\mathcal{M}$ equipped with the Rao-Fisher metric (3) is a manifold of negative sectional curvature.
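The curvature equations (11) and (12) can be explored numerically for any pair $(I_0, I_1)$. The sketch below is one possible way to do so, under the assumption that central finite differences in $\sigma$ are accurate enough; it uses $\partial_r = I_0^{-1/2}\, \partial_\sigma$, which follows from $dr/d\sigma = I_0^{1/2}(\sigma)$. All function names are hypothetical, and the step size `h` is an arbitrary choice.

```python
import math

def d_r(g, I0, h=1e-4):
    """Return the function dg/dr, where d/dr = I0(sigma)**(-1/2) d/dsigma.

    The sigma-derivative is approximated by a central difference of step h.
    """
    return lambda s: (g(s + h) - g(s - h)) / (2.0 * h) / math.sqrt(I0(s))

def gauss_curvature(I0, I1, s, K_M):
    """Right-hand side of (11), for a base manifold of constant curvature K_M."""
    ratio = d_r(I1, I0)(s) / I1(s)
    return K_M / I1(s) - 0.25 * ratio ** 2

def mixed_curvature(I0, I1, s):
    """Right-hand side of (12), using two nested finite-difference derivatives."""
    f = lambda t: math.sqrt(I1(t))
    return -d_r(d_r(f, I0), I0)(s) / f(s)

# Example 1: coefficients of the metric (2)
I0_normal = lambda s: s ** -2
I1_normal = lambda s: 0.5 * s ** -2

# Example 2: coefficients of the metric (3)
I0_vmf = lambda s: s ** -2 - math.sinh(s) ** -2
I1_vmf = lambda s: s * math.cosh(s) / math.sinh(s) - 1.0

# Example 1: the value should be close to -1, in agreement with (13)
print("normal, mixed:", mixed_curvature(I0_normal, I1_normal, 2.0))

# Example 2: Gauss curvature (14) and mixed curvature (12) at a few values of sigma
for s in (1.0, 2.0, 5.0):
    print("vMF", s,
          gauss_curvature(I0_vmf, I1_vmf, s, K_M=1.0),
          mixed_curvature(I0_vmf, I1_vmf, s))
```

For the von Mises-Fisher model both curvatures are small in absolute value near $\sigma = 0$, so a finite-difference check of this kind is only meaningful away from the origin; a symbolic computation would be preferable for a rigorous verification of (15).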
Example 3 (Riemannian Gaussian model): a Riemannian Gaussian distribution may be defined on any Riemannian symmetric space $M$ of non-positive curvature. It is given by the probability density with respect to Riemannian volume

$$p(x|\bar{x}, \sigma) = Z^{-1}(\sigma) \exp\left[ - \frac{d^2(x, \bar{x})}{2\sigma^2} \right] \tag{16}$$

where the normalising constant $Z(\sigma)$ admits a general expression, which was given in [7]. If $M$ is an irreducible Riemannian symmetric space, then Theorem 1 above applies to the Riemannian Gaussian model (16), leading to a warped metric with

$$I_0(\sigma) = \psi''(\eta) \qquad I_1(\sigma) = 4\eta^2\, \psi'(\eta) / \dim M \tag{17}$$

where $\eta = -1/2\sigma^2$ and $\psi(\eta) = \log Z(\sigma)$. The result of equation (17) is here published for the first time. Consider now the special case where $M$ is the hyperbolic plane. The analytic expression of $I_0$ and $I_1$ can be found from (17) using

$$Z(\sigma) = \mathrm{Const.}\; \sigma \times e^{\sigma^2/4} \times \mathrm{erf}(\sigma/2) \tag{18}$$

which was derived in [6]. Here, erf denotes the error function. Then, replacing (17) in the curvature equations (11) and (12) yields the same result as for Example 2: the manifold $\mathcal{M}$ equipped with the Rao-Fisher metric (17) is a manifold of negative sectional curvature.

Remark f (a conjecture): based on the three examples just considered, it seems reasonable to conjecture that warped metrics arising from Theorem 1 will always lead to manifolds $\mathcal{M}$ of negative sectional curvature.

4 Solution of the geodesic equation: conservation laws

If the assumptions of Theorem 1 are slightly strengthened, then an analytic solution of the geodesic equation of the Riemannian metric (1) on $\mathcal{M}$ can be obtained, by virtue of the existence of a sufficient number of conservation laws. To state this precisely, let $\langle \cdot, \cdot \rangle_M$ and $\langle \cdot, \cdot \rangle_{\mathcal{M}}$ denote respectively the scalar products defined by the metrics $ds^2_M$ and $ds^2$.

Two kinds of conservation laws hold along any affinely parameterised geodesic curve $\gamma(t)$ in $\mathcal{M}$, with respect to the metric $ds^2$. These are conservation of energy and conservation of moments [10].
If the geodesic $\gamma(t)$ is expressed as a couple $(\sigma(t), x(t))$ where $\sigma(t) > 0$ and $x(t) \in M$, then the energy of this geodesic is

$$E = I_0(\sigma)\, \dot{\sigma}^2 + I_1(\sigma)\, \|\dot{x}\|^2 \tag{19}$$

where the dot denotes differentiation with respect to $t$, and $\|\dot{x}\|$ the Riemannian length of $\dot{x}$ as measured by the metric $ds^2_M$. On the other hand, if $\xi$ is any element of the Lie algebra of the group of isometries $G$ acting on $M$, the corresponding moment of the geodesic $\gamma(t)$ is

$$J(\xi) = I_1(\sigma)\, \langle \dot{x}, X_\xi \rangle_M \tag{20}$$

where $X_\xi$ is the vector field on $M$ given by $X_\xi(x) = \left. \frac{d}{dt} \right|_{t=0} e^{t\xi} \cdot x$. The equation of the geodesic $\gamma(t)$ is given as follows.

Proposition 2 (conservation laws and geodesics). For any geodesic $\gamma(t)$, its energy $E$ and its moment $J(\xi)$ for any $\xi$ are conserved quantities, remaining constant along this geodesic. If $M$ is an irreducible Riemannian symmetric space, the equation of the geodesic $\gamma(t)$ is the following,

$$x(t) = \mathrm{Exp}_{x(0)} \left[ \left( \int_0^t \frac{I_1(\sigma(0))}{I_1(\sigma(s))}\, ds \right) \dot{x}(0) \right] \tag{21}$$

$$t = \pm \int_{\sigma(0)}^{\sigma(t)} \frac{I_0^{1/2}(\sigma)\, d\sigma}{\sqrt{E - V(\sigma)}} \tag{22}$$

where Exp denotes the Riemannian exponential mapping of the metric $ds^2_M$ on $M$, and $V(\sigma)$ is the function $V(\sigma) = J_0 \times I_1(\sigma(0)) / I_1(\sigma)$, with $J_0 = I_1(\sigma(0))\, \|\dot{x}(0)\|^2$.

Remark g: under the assumption that $M$ is an irreducible Riemannian symmetric space, the second part of Proposition 2, stating the equations of $x(t)$ and $\sigma(t)$, is a corollary of the first part, stating the conservation of energy and moment. The proof, as usual not given due to lack of space, relies on a technique of lifting the geodesic equation to the Lie algebra of the group of isometries $G$.

Remark h: here, Equation (21) states that $x(t)$ describes a geodesic curve in the space $M$, with respect to the metric $ds^2_M$, at a variable speed equal to $I_1(\sigma(0)) / I_1(\sigma(t))$. Equation (22) states that $\sigma(t)$ describes the one-dimensional motion of a particle of energy $E$ and mass $2 I_0(\sigma)$, in a potential field $V(\sigma)$.
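Equation (22) reduces the motion of $\sigma(t)$ to a one-dimensional quadrature. The following minimal sketch evaluates this quadrature with a composite Simpson rule, assuming that $\sigma(t)$ is monotone between the two endpoints, so that $E - V(\sigma) > 0$ throughout and no turning point is crossed. The function `travel_time` and its arguments are hypothetical names introduced for illustration; the coefficients used are those of the metric (2).

```python
import math

def travel_time(I0, I1, sigma0, sigma1, sigma_dot0, x_speed0, n=2000):
    """Time needed for sigma to move from sigma0 to sigma1, from equation (22).

    sigma_dot0 is the initial value of d(sigma)/dt and x_speed0 the initial
    Riemannian length of dx/dt. Assumes E - V(sigma) > 0 on the whole interval.
    """
    J0 = I1(sigma0) * x_speed0 ** 2
    E = I0(sigma0) * sigma_dot0 ** 2 + J0        # energy (19) evaluated at t = 0
    V = lambda s: J0 * I1(sigma0) / I1(s)        # potential of Proposition 2
    integrand = lambda s: math.sqrt(I0(s) / (E - V(s)))
    # composite Simpson rule with n (even) subintervals
    h = (sigma1 - sigma0) / n
    total = integrand(sigma0) + integrand(sigma1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(sigma0 + k * h)
    return abs(total * h / 3)                    # the sign in (22) is absorbed here

# Coefficients of the metric (2) of Example 1
I0_normal = lambda s: s ** -2
I1_normal = lambda s: 0.5 * s ** -2

# Vertical geodesic (no motion in x): here E = 1 and t = log(sigma1 / sigma0) = 1
print(travel_time(I0_normal, I1_normal, 1.0, math.e, 1.0, 0.0))

# Geodesic with motion in x, with sigma decreasing from 1 to 0.5
print(travel_time(I0_normal, I1_normal, 1.0, 0.5, -1.0, 1.0))
```

For the metric (2), the integrand is $1 / (\sigma \sqrt{E - c\,\sigma^2})$ with $c = J_0 / \sigma(0)^2$, whose antiderivative is $-E^{-1/2}\, \mathrm{artanh}\big( \sqrt{E - c\,\sigma^2} / \sqrt{E} \big)$, so the numerical values can be compared with a closed form. Near a turning point, where $E = V(\sigma)$, the integrand is singular and a plain Simpson rule is no longer adequate.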
Remark i (completeness of $\mathcal{M}$): from Equation (22) it is possible to see that any geodesic $\gamma(t)$ in $\mathcal{M}$ is defined for all $t > 0$, if and only if the following conditions are verified

$$\int_0 I_0^{1/2}(\sigma)\, d\sigma = \infty \qquad \int^\infty I_0^{1/2}(\sigma)\, d\sigma = \infty \tag{23}$$

where the missing integration bounds are arbitrary. The first condition ensures that $\gamma(t)$ may not escape to $\sigma = 0$ within a finite time, while the second condition ensures the same for $\sigma = \infty$. The two conditions (23), taken together, are necessary and sufficient for $\mathcal{M}$ to be a complete Riemannian manifold.

Return to Example 2: for the von Mises-Fisher model of Example 2, the second condition in (23) is verified, but not the first. Therefore, a geodesic $\gamma(t)$ in $\mathcal{M}$ may escape to $\sigma = 0$ within a finite time. However, $\gamma(t)$ is also a geodesic in the larger manifold $\hat{\mathcal{M}} = \mathbb{R}^3$, which contains $\sigma = 0$ as its origin. If $\gamma(t)$ arrives at $\sigma = 0$ at some finite time, it will just go through this point and immediately return to $\mathcal{M}$. In fact, $\hat{\mathcal{M}}$ is a complete Riemannian manifold which has $\mathcal{M}$ as an isometrically embedded submanifold.

5 The road to applications: classification and estimation

The theoretical results of the previous sections have established that warped metrics are natural statistical objects arising in connection with location-scale models, which are invariant under some group action. Precisely, Theorem 1 has stated that warped metrics appear as Rao-Fisher metrics for all location-scale models which verify the group invariance condition (4).

Analytical knowledge of the Rao-Fisher metric of a statistical model is potentially useful to many applications, in particular to problems of classification and efficient on-line estimation. However, in order for such applications to be realised, it is necessary for the Rao-Fisher metric to be well-behaved. Propositions 1 and 2 above seem to indicate such good behaviour for warped metrics on location-scale models.
Indeed, as conjectured in Remark f, the curvature equations of Proposition 1 would indicate that the sectional curvature of these warped metrics is always negative. Then, if the conditions for completeness, given in Remark i based on Proposition 2, are verified, the location-scale models equipped with these warped metrics appear as complete Riemannian manifolds of negative curvature. This is a favourable scenario (which at least holds for the von Mises-Fisher model of Example 2), under which many algorithms can be implemented.

For classification problems, it becomes straightforward to find the analytic expression of Rao's Riemannian distance, and to compute Riemannian centres of mass, whose existence and uniqueness will be guaranteed. These form the building blocks of many classification methodologies.

For efficient on-line estimation, Amari's natural gradient algorithm turns out to be identical to the stochastic Riemannian gradient algorithm, defined using the Rao-Fisher metric. Then, analytical knowledge of the Rao-Fisher metric (which is here a warped metric), and of its completeness and curvature properties, yields an elegant formulation of the natural gradient algorithm, and a geometrical means of proving its efficiency and understanding its convergence properties.

References

1. Petersen, P.: Riemannian geometry (2nd edition). Springer (2006)
2. Atkinson, C., Mitchell, A.: Rao's distance measure. Sankhya Ser. A 43 (1981) 345–365
3. Mardia, K.V., Jupp, P.E.: Directional statistics. John Wiley & Sons Ltd. (2000)
4. Kobayashi, S., Nomizu, K.: Foundations of differential geometry, Volume II. John Wiley & Sons, Inc. (1969)
5. Chevalley, C.: Theory of Lie groups, Volume I. Princeton University Press (1946)
6. Said, S., Bombrun, L., Berthoumieu, Y., Manton, J.H.: Riemannian Gaussian distributions on the space of symmetric positive definite matrices (accepted). IEEE Trans. Inf. Theory (2016)
7. Said, S., Hajri, H., Bombrun, L., Vemuri, B.C.: Gaussian distributions on Riemannian symmetric spaces: statistical learning with structured covariance matrices (under review). IEEE Trans. Inf. Theory (2017)
8. Do Carmo, M.P.: Riemannian geometry (1st edition). Birkhäuser (1992)
9. Helgason, S.: Differential geometry, Lie groups, and symmetric spaces. American Mathematical Society (2001)
10. Gallot, S., Hulin, D., Lafontaine, J.: Riemannian geometry. Springer-Verlag (2004)