Bregman divergences from comparative convexity

Publication GSI2017
contenu protégé  Document accessible sous conditions - vous devez vous connecter ou vous enregistrer pour accéder à ou acquérir ce document.
- Accès libre pour les ayants-droit


Comparative convexity is a generalization of ordinary convexity based on abstract means instead of arithmetic means. We define and study the Bregman divergences with respect to comparative convexity. As an example, we consider the convexity induced by quasiarithmetic means, report explicit formulas, and show that those Bregman
divergences are equivalent to conformal ordinary Bregman divergences on monotone embeddings.

Bregman divergences from comparative convexity


application/pdf Bregman divergences from comparative convexity Frank Nielsen, Richard Nock
Détails de l'article
contenu protégé  Document accessible sous conditions - vous devez vous connecter ou vous enregistrer pour accéder à ou acquérir ce document.
- Accès libre pour les ayants-droit

Bregman divergences from comparative convexity


Voir la vidéo


267.35 Ko


Creative Commons Aucune (Tous droits réservés)


Sponsors Platine


Sponsors Bronze


Sponsors scientifique





<resource  xmlns:xsi=""
        <identifier identifierType="DOI">10.23723/17410/22338</identifier><creators><creator><creatorName>Frank Nielsen</creatorName></creator><creator><creatorName>Richard Nock</creatorName></creator></creators><titles>
            <title>Bregman divergences from comparative convexity</title></titles>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>convexity</subject><subject>regular mean</subject><subject>quasi-arithmetic weighted mean</subject><subject>skew Jensen divergence</subject><subject>Bregman divergence</subject><subject>conformal divergence</subject></subjects><dates>
	    <date dateType="Created">Sun 18 Feb 2018</date>
	    <date dateType="Updated">Sun 18 Feb 2018</date>
            <date dateType="Submitted">Fri 17 Aug 2018</date>
	    <alternateIdentifier alternateIdentifierType="bitstream">6f6891e39392508ea0534c85560aadad4aee6ba0</alternateIdentifier>
            <description descriptionType="Abstract">Comparative convexity is a generalization of ordinary convexity based on abstract means instead of arithmetic means. We define and study the Bregman divergences with respect to comparative convexity. As an example, we consider the convexity induced by quasiarithmetic means, report explicit formulas, and show that those Bregman<br />
divergences are equivalent to conformal ordinary Bregman divergences on monotone embeddings.

Bregman divergences from comparative convexity Frank Nielsen1,2 and Richard Nock3,4,5 1 École Polytechnique, France, 2 Sony CSL, Japan 3 Data61, Australia 4 Australian National University 5 University of Sydney, Australia. Abstract. Comparative convexity is a generalization of ordinary con- vexity based on abstract means instead of arithmetic means. We de- fine and study the Bregman divergences with respect to comparative convexity. As an example, we consider the convexity induced by quasi- arithmetic means, report explicit formulas, and show that those Bregman divergences are equivalent to conformal ordinary Bregman divergences on monotone embeddings. Keywords: convexity, regular mean, quasi-arithmetic weighted mean, skew Jensen divergence, Bregman divergence, conformal divergence 1 Introduction: Convexity, Jensen and Bregman divergences Convexity allows one to define classes of dissimilarity measures parameterized by functional generators. Let F : X → R be a real-valued function. Burbea and Rao[8] studied the Jensen difference for F ∈ C as such a family of dissimilarities: JF (p, q):= F(p) + F(q) 2 − F  p + q 2  . (1) A dissimilarity D(p, q) is proper iff D(p, q) ≥ 0 with equality iff p = q. It follows from the strict midpoint convex property of F that JF is proper. Nowadays, these Jensen differences are commonly called Jensen Divergences (JD), where a divergence is a smooth dissimilarity measure inducing a dual geometry [2]. One can further define the proper skewed Jensen divergences for α ∈ (0, 1), see [20, 15]: JF,α(p : q):=(1 − α)F(p) + αF(q) − F((1 − α)p + αq), (2) with JF,α(q : p) = JF,1−α(p : q). The “:” notation emphasizes the fact that the dissimilarity may be asymmetric. Another popular class of dissimilarities are the Bregman Divergences [6] (BDs): BF (p : q):=F(p) − F(q) − (p − q)> ∇F(q), (3) where ∇F denotes the gradient of F. Let J0 F,α(p : q):= 1 α(1−α) JF,α(p : q) denote the scaled skew JDs. Then it was proved that BDs can be obtained as limit cases of skew JDs [20, 15]: BF (p : q) = lim α→0+ J0 F,α(p : q), BF (q : p) = lim α→1− J0 F,1−α(p : q). 2 Jensen and Bregman divergences with comparative convexity 2.1 Comparative convexity The branch of comparative convexity [14] studies classes CM,N of (M, N)-strictly convex functions F that satisfies the following generalized strict midpoint convex inequality: F ∈ CM,N ⇔ F(M(p, q)) < N(F(p), F(q)), ∀p, q ∈ X, (4) where M and N are two abstract means defined on the domain X and codomain R, respectively. When M = N = A, the arithmetic mean, we recover the ordinary convexity. An abstract mean M(p, q) aggregates two values to produce an intermedi- ate quantity that satisfies the innerness property [7]: min{x, y} ≤ M(x, y) ≤ max{x, y}. There are many families of means. For example, the family of power means Pδ (Hölder means [10]) is defined by: Pδ(x, y) =  xδ +yδ 2 1 δ . The arith- metic, harmonic and quadratic means are obtained for δ = 1, δ = −1, and δ = 2, respectively. To get a continuous family of power means for δ ∈ R, we define for δ = 0, P0(x, y) = √ xy, the geometric mean. Notice that power means satisfy the innerness property, and include in the limit cases the minimum and maximum values: limδ→−∞ Pδ(x, y) = min{x, y} and limδ→∞ Pδ(x, y) = max{x, y}. More- over, the power means are ordered, Pδ(x, y) ≤ Pδ0 (x, y) for δ0 ≥ δ, a property generalizing the well-known inequality of arithmetic and geometric means [7]. There are many ways to define parametric family of means [7]: For example, let us cite the Stolarksy, Lehmer and Gini means, with the Gini means including the power means. Means can also be parameterized by monotone functions: Let us cite the quasi-arithmetic means [11, 13, 9], the Lagrange means [4], the Cauchy means [12], etc. 2.2 Generalized skew Jensen divergences We shall introduce univariate divergences for X ⊂ R in the remainder. Multivari- ate divergences for X ⊂ Rd can be built from univariate divergences component- wise. Definition 1 (Comparative Convexity Jensen Divergence). The Com- parative Convexity Jensen Divergence (ccJD) is defined for a midpoint (M, N)- strictly convex function F : I ⊂ R → R by: JM,N F (p, q):=N(F(p), F(q))) − F(M(p, q)) (5) It follows from the strict midpoint (M, N)-convexity that the ccJDs are proper: JM,N F (p, q) ≥ 0 with equality iff p = q. To define generalized skew Jensen divergences, we need (i) to consider weighted means, and (ii) to ensure that the divergence is proper. This restrict weighted means to be regular: Definition 2 (Regular mean). A mean M is said regular if it is (i) symmetric (M(p, q) = M(q, p)), (ii) continuous, (iii) increasing in each variable, and (iv) homogeneous (M(λp, λq) = λM(p, q), ∀λ > 0). Power means are regular: They belong to a broader family of regular means, the quasi-arithmetic means. A quasi-arithmetic mean is defined for a continuous and strictly increasing function f : I ⊂ R → J ⊂ R as: Mf (p, q):=f−1  f(p) + f(q) 2  . (6) These means are also called Kolmogorov-Nagumo-de Finetti means [11, 13, 9]. By choosing f(x) = x, f(x) = log x or f(x) = 1 x , we obtain the Pythagorean arith- metic, geometric, and harmonic (power) means, respectively. A quasi-arithmetic weighted mean is defined by Mf (p, q; 1 − α, α):=f−1 ((1 − α)f(p) + αf(q)) for α ∈ [0, 1]. Let Mα(p, q):=M(p, q; 1 − α, α) denote a shortcut for a weighted regular mean. A continuous function F satisfying the (M, N)-midpoint convex property for regular means M and N is (M, N)-convex (Theorem A of [14]): Nα(F(p), F(q)) ≥ F(Mα(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1]. (7) Thus we can define a proper divergence for a strictly (M, N)-convex function when considering regular weighted means: Definition 3 (Comparative Convexity skew Jensen Divergence). The Comparative Convexity skew α-Jensen Divergence (ccsJD) is defined for a strictly (M, N)-convex function F ∈ CM,N : I → R by: JM,N F,α (p : q):=Nα(F(p), F(q)) − F(Mα(p, q)), (8) where M and N are regular weighted means, and α ∈ (0, 1). For regular weighted means, we have JM,N F,α (q, p) = JM,N F,1−α(p : q) since the weighted means satisfy Mα(p, q) = M1−α(q, p). This generalized ccsJD can be extended to a positively weighted set of values by defining a notion of diversity [8] as: Definition 4 (Comparative Convexity Jensen Diversity Index). Let {(wi, xi)}n i=1 be a set of n positive weighted values so that Pn i=1 wi = 1. Then the Jensen diversity index with respect to the strict (M, N)-convexity of a function F for regular weighted means is: JM,N F (x1, . . . , xn; w1, . . . , wn):=N({(F(xi), wi)}i) − F(M({(xi, wi)}i)). When both means M and N are set to the arithmetic mean, this diversity index has also been called the Bregman information [3] in the context of k-means clustering. 2.3 Generalized Bregman divergences By analogy to the ordinary setting, let us define the (M, N)-Bregman diver- gence as the limit case of a scaled skew (M, N)-ccsJDs. Let J0M,N F,α (p : q) = 1 α(1−α) JM,N F,α (p : q). Definition 5 ((M, N)-Bregman divergence). For regular weighted means M and N, the (M, N)-Bregman divergence is defined for a strictly (M, N)-convex function F : I → R by BM,N F (p : q) := lim α→1− J0M,N F,α (p : q). (9) It follows from the symmetry J0 F,α(p : q) = J0 F,1−α(q : p) that we get the reverse Bregman divergence as: BM,N F (q : p) = limα→0+ J0M,N F,α (p : q). Note that a generalization of Bregman divergences has also been studied by Petz [18] to get generalized quantum relative entropies when considering the arithmetic weighted means: Petz defined the Bregman divergence between two points p and q of a convex set C sitting in a Banach space for a given function F : C → B(H) (Banach space induced by a Hilbert space H) as: BF (p : q):=F(p)−F(q)−limα→0+ 1 α (F(q +α(p−q))−F(q)). This last equation can be rewritten in our framework as BF (p : q) = limα→1− 1 1−α JA,A F,α (p, q). In general, we need to prove that (i) when the limits exists, (ii) the (M, N)- Bregman divergences are proper: BM,N F (q : p) ≥ 0 with equality iff p = q. 3 Quasi-arithmetic Bregman divergences Let us report direct formulas for the generalized Bregman divergences de- fined with respect to comparative convexity when using regular quasi-arithmetic means. Let ρ and τ be two continuous differentiable functions defining the quasi- arithmetic means Mρ and Mτ , respectively. 3.1 A direct formula By definition, a function F ∈ Cρ,τ is (ρ, τ)-convex iff Mτ (F(p), F(q))) ≥ F(Mρ(p, q)). This (ρ, τ)-midpoint convexity property with the continuity of F yields the more general definition of (ρ, τ)-convexity Mτ,α(F(p), F(q))) ≥ F(Mρ,α(p, q)), α ∈ [0, 1]. Let us study the generalized Bregman Divergences Bρ,τ F obtained when taking the limit: Bρ,τ F (q : p):= lim α→0 Mτ,α(F(p), F(q))) − F (Mρ,α(p, q)) α(1 − α) . (10) for Mρ,α and Mτ,α two quasi-arithmetic weighted means obtained for continuous and monotonic functions ρ and τ, respectively. We state the generalized Bregman divergence formula obtained with respect to quasi-arithmetic comparative convexity: Theorem 1 (Quasi-arithmetic Bregman divergences). Let F : I ⊂ R → R be a real-valued strictly (ρ, τ)-convex function defined on an interval I for two strictly monotone and differentiable functions ρ and τ. The Quasi-Arithmetic Bregman divergence (QABD) induced by the comparative convexity is: Bρ,τ F (p : q) = τ(F(p)) − τ(F(q)) τ0(F(q)) − ρ(p) − ρ(q) ρ0(q) F0 (q), = κτ (F(q) : F(p)) − κρ(q : p)F0 (q), (11) where κγ(x : y) = γ(y) − γ(x) γ0(x) . (12) Proof. By taking the first-order Taylor expansion of τ−1 (x) at x0, we get τ−1 (x) 'x0 τ−1 (x0) + (x − x0)(τ−1 )0 (x0). Using the property of the deriva- tive of an inverse function, (τ−1 )0 (x) = 1 (τ0(τ−1)(x)) , it follows that the first-order Taylor expansion of τ−1 (x) is τ−1 (x) ' τ−1 (x0)+(x−x0) 1 (τ0(τ−1)(x0)) . Plugging x0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), we get a first-order approximation of the weighted quasi-arithmetic mean Mτ when α → 0: Mα(p, q) ' p + α(τ(q) − τ(p)) τ0(p) . (13) For example, when τ(x) = x (ie., arithmetic mean), we have Aα(p, q) ' p + α(q − p), when τ(x) = log x (ie., geometric mean), we obtain Gα(p, q) ' p + αp log q p , and when τ(x) = 1 x (ie., harmonic mean) we get Hα(p, q) ' p + α(p − p2 q ). For the regular power means, we have Pα(p, q) ' p + αqδ −pδ δpδ−1 . These are first-order weighted mean approximations obtained for small values of α. Now, consider the comparative convexity skewed Jensen Divergence de- fined by Jτ,ρ F,α(p : q) = (Mτ,α(F(p), F(q)) − F(Mρ,α(p, q))), and apply a first-order Taylor expansion to get F(Mτ,α(p, q))) ' F  p + α(τ(q)−τ(p)) τ0(p)  ' F(p)+ α(τ(q)−τ(p)) τ0(p) F0 (p). Thus it follows that the Bregman divergence for quasi- arithmetic comparative convexity is Bρ,τ F (q : p) = limα→0 J0 τ,ρ,α(p : q) = τ(F (q))−τ(F (p)) τ0(F (p)) − ρ(q)−ρ(p) ρ0(p) F0 (p), and the reverse Bregman divergence Bρ,τ F (p : q) = limα→1 1 α(1−α) Jτ,ρ α (p : q) = limα→0 1 α(1−α) Jτ,ρ α (q : p). Since power means are regular quasi-arithmetic means, we get the following family of power mean Bregman divergences: Corollary 1 (Power Mean Bregman Divergences). For δ1, δ2 ∈ R\{0} with F ∈ CPδ1 ,Pδ2 , we have the family of Power Mean Bregman Divergences (PMBDs): Bδ1,δ2 F (p : q) = Fδ2 (p) − Fδ2 (q) δ2Fδ2−1(q) − pδ1 − qδ1 δ1qδ1−1 F0 (q) (14) A sanity check for δ1 = δ2 = 1 let us recover the ordinary Bregman divergence. 3.2 Quasi-arithmetic Bregman divergences are proper Appendix A proves that a function F ∈ Cρ,τ iff G = Fρ,τ = τ ◦ F ◦ ρ−1 ∈ C. We still need to prove that QABDs are proper: Bρ,τ F (p : q) ≥ 0 with equality iff p = q. Defining the ordinary Bregman divergence on the convex generator G(x) = τ(F(ρ−1 (x))) for a (ρ, τ)-convex function with G0 (x) = τ(F(ρ−1 (x)))0 = 1 (ρ0(ρ−1)(x)) F0 (ρ−1 (x))τ0 (F(ρ−1 (x))), we get an ordinary Bregman divergence that is, in general, different from the generalized quasi-arithmetic Bregman di- vergence Bρ,τ F : BG(p : q) 6= Bρ,τ F (p : q) with: BG(p : q) = τ(F(ρ−1 (p))) − τ(F(ρ−1 (q))) − (p − q) F0 (ρ−1 (q))τ0 (F(ρ−1 (q))) (ρ0(ρ−1)(q)) A sanity check shows that BG(p : q) = Bρ,τ F (p : q) when ρ(x) = τ(x) = x (since we have the derivatives ρ0 (x) = τ0 (x) = 1). Let us notice the following remarkable identity: Bρ,τ F (p : q) = 1 τ0(F(q)) BG(ρ(p) : ρ(q)). (15) This identity allows us to prove that QABDs are proper divergences. Theorem 2 (QABDs are proper). The quasi-arithmetic Bregman diver- gences are proper divergences. Proof. BG is a proper ordinary BD, τ0 > 0 a positive function since τ is a strictly increasing function, and ρ(p) = ρ(q) iff p = q since ρ is strictly monotonous. It follows that 1 τ0(F (q)) BG(ρ(p) : ρ(q)) ≥ 0 with equality iff p = q. 3.3 Conformal Bregman divergences on monotone embeddings A closer look at Eq. 15 allows one to interpret the QABDs Bρ,τ F (p : q) as conformal divergences. A conformal divergence [2, 17, 16] Dκ(p : q) of a diver- gence D(p : q) is defined by a positive conformal factor function κ as follows: Dκ(p : q) = κ(q)D(p : q). An example of Bregman conformal divergence is the total Bregman divergence [19] with κ(q) = 1 √ 1+k∇F (q)k2 . Property 1 (QABDs as conformal BDs). The quasi-arithmetic Bregman diver- gence Bρ,τ F (p : q) amounts to compute an ordinary Bregman conformal diver- gence in the ρ-embedded space: Bρ,τ F (p : q) = κ(ρ(q))BG(ρ(p) : ρ(q)), (16) with conformal factor κ(x) = 1 τ0(F (ρ−1(x))) > 0. 4 Concluding remarks We have introduced generalized (M, N)-Bregman divergences as limit of scaled skew (M, N)-Jensen divergences for regular M and N means. Regular means include power means, quasi-arithmetic means, Stolarsky means, etc. But not all means are regular: For example, the Lehmer mean L2(x, y) = x2 +y2 x+y is not increasing and therefore not regular. We reported closed-form expression for quasi-arithmetic (ρ, τ)-Bregman divergences, prove that those divergences are proper, and show that they can be interpreted as conformal ordinary Bregman divergences on a monotone embedding [21]. This latter observation further let us extend usual Bregman divergence results to quasi-arithmetic Bregman di- vergences (eg., conformal Bregman k-means [19], conformal Bregman Voronoi diagrams [5]). A Quasi-arithmetic to ordinary convexity criterion Lemma 1 ((ρ, τ)-convexity ↔ ordinary convexity [1]). Let ρ : I → R and τ : J → R be two continuous and strictly monotone real-valued functions with τ increasing, then function F : I → J is (ρ, τ)-convex iff function G = Fρ,τ = τ ◦ F ◦ ρ−1 is (ordinary) convex on ρ(I). Proof. Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows: F(Mρ(x, y)) ≤ Mτ (F(x), F(y)), F  ρ−1  ρ(x) + ρ(y) 2  ≤ τ−1  τ(F(x)) + τ(F(y)) 2  , Since τ is strictly increasing, we have: (τ ◦ F ◦ ρ−1 )  ρ(x) + ρ(y) 2  ≤ (τ ◦ F)(x) + (τ ◦ F)(y) 2 . (17) Let u = ρ(x) and v = ρ(y) so that x = ρ−1 (u) and y = ρ−1 (v) (with u, v ∈ ρ(I)). Then it comes that: (τ ◦ F ◦ ρ−1 )  u + v 2  ≤ (τ ◦ F ◦ ρ−1 )(u) + (τ ◦ F ◦ ρ−1 )(v) 2 . (18) This last inequality is precisely the ordinary midpoint convexity inequality for function G = Fρ,τ = τ ◦ F ◦ ρ−1 . Thus a function F is (ρ, τ)-convex iff G = τ ◦ F ◦ ρ−1 is ordinary convex, and vice-versa. References 1. John Aczél. A generalization of the notion of convex functions. Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947. 2. Shun-ichi Amari. Information Geometry and Its Applications. Applied Mathemat- ical Sciences. Springer Japan, 2016. 3. Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of machine learning research, 6(Oct):1705–1749, 2005. 4. Lucio R Berrone and Julio Moro. Lagrangian means. Aequationes Mathematicae, 55(3):217–226, 1998. 5. Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock. Bregman Voronoi di- agrams. Discrete & Computational Geometry, 44(2):281–307, 2010. 6. Lev M Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 7(3):200–217, 1967. 7. Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic. Means and their Inequalities, volume 31. Springer Science & Business Media, 2013. 8. Jacob Burbea and C Rao. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 28(3):489–495, 1982. 9. Bruno De Finetti. Sul concetto di media. 3:369396, 1931. Istituto italiano degli attuari. 10. Otto Ludwig Holder. Über einen Mittelwertssatz. Nachr. Akad. Wiss. Gottingen Math.-Phys. Kl., pages 38–47, 1889. 11. Andrey Nikolaevich Kolmogorov. Sur la notion de moyenne. 12:388391, 1930. Acad. Naz. Lincei Mem. Cl. Sci. His. Mat. Natur. Sez. 12. Janusz Matkowski. On weighted extensions of Cauchy’s means. Journal of math- ematical analysis and applications, 319(1):215–227, 2006. 13. Mitio Nagumo. Über eine klasse der mittelwerte. In Japanese journal of mathemat- ics: transactions and abstracts, volume 7, pages 71–79. The Mathematical Society of Japan, 1930. 14. Constantin P. Niculescu and Lars-Erik Persson. Convex functions and their appli- cations: A contemporary approach. Springer Science & Business Media, 2006. 15. Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, 2011. 16. Frank Nielsen and Richard Nock. Total Jensen divergences: definition, properties and clustering. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2016–2020. IEEE, 2015. 17. Richard Nock, Frank Nielsen, and Shun-ichi Amari. On conformal divergences and their population minimizers. IEEE Trans. Information Theory, 62(1):527–538, 2016. 18. Denes Petz. Bregman divergence as relative operator entropy. Acta Mathematica Hungarica, 116(1-2):127–131, 2007. 19. Baba C Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen. Total Bregman divergence and its applications to DTI analysis. IEEE Transactions on medical imaging, 30(2):475–483, 2011. 20. Jun Zhang. Divergence function, duality, and convex analysis. Neural Computation, 16(1):159–195, 2004. 21. Jun Zhang. Nonparametric information geometry: From divergence func- tion to referential-representational biduality on statistical manifolds. Entropy, 15(12):5384–5418, 2013.