An Information Geometry Problem in Mathematical Finance

Imre Csiszár, Thomas Breuer, Michel Broniatowski
GSI2015, 28/10/2015
DOI: 10.23723/11784/14346

Abstract

Familiar approaches to risk and preferences involve minimizing the expectation $E_P(X)$ of a payoff function $X$ over a family $\Gamma$ of plausible risk factor distributions $P$. We consider $\Gamma$ determined by a bound on a convex integral functional of the density of $P$; thus $\Gamma$ may be an I-divergence (relative entropy) ball, some other f-divergence ball, or a Bregman distance ball around a default distribution $P_0$. Using a Pythagorean identity we show that whether or not a worst case distribution exists (minimizing $E_P(X)$ subject to $P \in \Gamma$), the almost worst case distributions cluster around an explicitly specified, perhaps incomplete distribution. When $\Gamma$ is an f-divergence ball, a worst case distribution either exists for any radius, or it does/does not exist for radius less/larger than a critical value. It remains open how far the latter result extends beyond f-divergence balls.

License

Creative Commons Attribution-ShareAlike 4.0 International


An Information Geometry Problem in Mathematical Finance

Imre Csiszár (Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, Hungary; supported by the Hungarian National Science Foundation, Grant 105840)
Thomas Breuer (PPE Research Centre, FH Vorarlberg, Austria; supported by the Christian Doppler Gesellschaft)

GSI 2015, 29 October 2015, Ecole Polytechnique, Paris-Saclay

The problem

Problem: Minimize the expectation $E_P(X)$ of a real valued random variable $X$ for distributions $P$ in a "plausible set" $\Gamma$.

Motivation: The monetary payoff or utility of a financial act, such as a portfolio selection, is a function $X(\omega)$ of a collection of random risk factors whose distribution $P$ is unknown, but $P \in \Gamma$ may be assumed. A measure of risk of this act, in this multiple priors model, is the negative of the worst case expected payoff

$$\inf_{P\in\Gamma} E_P(X) = \inf_{P\in\Gamma} \int X(\omega)\,P(d\omega). \qquad (1)$$

In the theory of preferences, the infimum (1) serves as a criterion by which decision makers may prefer one act to another.

How to choose the plausible set Γ

Intuition: the plausible distributions are those not deviating much from a default distribution $P_0$. Among deviation measures, I-divergence (relative entropy) appears most versatile; larger classes are f-divergences and Bregman distances.

In this talk, we consider plausible sets consisting of distributions dominated by a given ($\sigma$-finite) measure $\mu$, of the form

$$\Gamma = \{P : dP = p\,d\mu,\ H(p) \le k\} \qquad (2)$$

where $H$ is an entropy functional (convex integral functional). I-divergence balls, and general f-divergence or Bregman distance balls around a default distribution, arise by specific choices of $H$.

Goal: Determine $\inf_{P\in\Gamma} E_P(X)$, the worst case distribution attaining the minimum (if attained), and, as our main contribution, the behavior of almost worst case distributions in general.

History sketch

Multiple prior models, risk measures, theory of preferences: Föllmer and Schied 2004, Hansen and Sargent 2008, Gilboa 2009. Plausible sets $\Gamma$: I-divergence balls in Hansen and Sargent 2001 and Ahmadi-Javid 2011; f-divergence balls in Maccheroni, Marinacci, Rustichini 2006 and Ben-Tal and Teboulle 2007.

Axiomatic approaches leading to specific divergences: in an inference context to I-divergence, with f-divergences and Bregman distances as alternatives, Csiszár 1991; in mathematical finance, distinguishing I-divergence, Strzalecki 2011.

Moment problem: the geometric view goes back to Chentsov 1972, clustering of approximate solutions to Topsøe 1979 and Csiszár 1984, the convex duality approach to Borwein and Lewis 1991, 1993. General results relied upon in this talk: Csiszár and Matúš 2012. Basic framework used in this talk: Breuer and Csiszár 2013.

Formal definitions

Work in a $\sigma$-finite measure space $(\Omega, \mathcal{F}, \mu)$; nonnegative, finite valued (measurable) functions on $\Omega$ are denoted by $p$ or $q$, and equality $p = q$ is meant in the $\mu$-a.e. sense.

Denote by $\mathcal{B}$ the class of functions $\beta(\omega, s)$ on $\Omega \times \mathbb{R}$ that are
• for each $s \in \mathbb{R}$, measurable in $\omega$;
• for each $\omega \in \Omega$, strictly convex and differentiable in $s$ on $(0, +\infty)$, equal to $+\infty$ if $s < 0$, with $\beta(\omega, 0) = \lim_{s\downarrow 0}\beta(\omega, s)$.

For $\beta \in \mathcal{B}$ define the entropy functional $H = H_\beta$ by

$$H(p) = H_\beta(p) := \int_\Omega \beta(\omega, p(\omega))\,\mu(d\omega). \qquad (3)$$

The functions $\beta \in \mathcal{B}$ are convex normal integrands, hence $\beta(\omega, p(\omega))$ and similar functions later on are measurable. (Shannon differential entropy is $-H(p)$ with $\beta(\omega, s) = s\log s$.)
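To make (3) concrete, here is a minimal numerical sketch (not part of the talk): it evaluates $H_\beta(p)$ by a Riemann sum on a discretized $\Omega = (0,1)$ for the I-divergence integrand $\beta(\omega, s) = s\log s$; the grid size and the test density $p(\omega) = 2\omega$ are illustrative assumptions.

```python
# Minimal sketch (not from the talk): evaluate the entropy functional
# H_beta(p) of (3) by a Riemann sum on a discretized Omega = (0, 1),
# with mu = Lebesgue measure and beta(omega, s) = s*log(s), so that
# H(p) = D(P || mu) when p is a mu-density.
import numpy as np

n = 10_000
omega = (np.arange(n) + 0.5) / n           # midpoint grid on (0, 1)
d_mu = np.full(n, 1.0 / n)                 # mu = Lebesgue measure

def beta(w, s):
    # s*log(s), extended by beta(w, 0) = lim_{s->0} s*log(s) = 0
    return np.where(s > 0, s * np.log(np.maximum(s, 1e-300)), 0.0)

def H(p):
    return np.sum(beta(omega, p) * d_mu)

p = 2.0 * omega                            # a mu-density (integrates to 1)
print(H(p))                                # ≈ log(2) - 1/2 ≈ 0.1931
```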
Special cases

• Let $\mu$ equal the default distribution $P_0$, and let $\beta(\omega, s) = f(s)$ be an autonomous convex integrand with $f(1) = 0$. Then $H(p)$ in (3) with $p = dP/d\mu$ is the f-divergence
$$D_f(P \| P_0) = \int f\Big(\frac{dP}{dP_0}\Big)\,dP_0.$$

• Let $\mu$ and the default distribution $P_0 \ll \mu$ be arbitrary, $f$ a strictly convex differentiable function on $(0, +\infty)$, and for $s \ge 0$ let $\beta(\omega, s) = \Delta_f(s, p_0(\omega))$, where
$$\Delta_f(s, t) := f(s) - f(t) - f'(t)(s - t); \qquad (4)$$
if $f$ is steep ($f'(0) = -\infty$), assume that $p_0 > 0$ $\mu$-a.e. Then $H(p)$ equals the Bregman distance $B_f(p, p_0)$.

• In the special case $f(s) = s\log s$, both examples above give for $H(p)$ in (3) with $p = dP/d\mu$ the I-divergence (relative entropy)
$$D(P \| P_0) = \int p \log\frac{p}{p_0}\,d\mu.$$
Then the plausible set $\Gamma$ in (2) is the I-divergence ball $\{P : D(P\|P_0) \le k\}$.

Standing Assumptions

• $X$ is a real valued measurable function and $P_0 \ll \mu$ a default distribution on $\Omega$, with density $p_0$;
• $-\infty \le m < b_0 < M \le +\infty$, where
$$m := \mu\text{-}\operatorname{ess\,inf} X, \qquad M := \mu\text{-}\operatorname{ess\,sup} X, \qquad b_0 := E_{P_0}(X) = \int X(\omega)p_0(\omega)\,\mu(d\omega);$$
• $H(p) \ge H(p_0) = 0$ whenever $\int p\,d\mu = 1$;
• $0 < k < k_{\max} := \lim_{b\downarrow m} F(b)$, where
$$F(b) := \inf_{p:\ \int p\,d\mu = 1,\ \int Xp\,d\mu = b} H(p). \qquad (5)$$

The version of Problem (1) we address is

$$V(k) := \inf_{p:\ \int p\,d\mu = 1,\ H(p) \le k} \int Xp\,d\mu. \qquad (6)$$

Main Lemma

Lemma (Main Lemma). Under our standing assumptions, there exists a unique $b$ with
$$F(b) = k, \qquad m < b < b_0, \qquad (7)$$
and then $V(k) = b$. A density $p$ attains the minimum in (6), the definition of $V(k)$, if and only if it attains that in (5) for $b$ in (7).

Moment problem

Given a moment mapping $\varphi : \Omega \to \mathbb{R}^d$, minimize $H(p)$ subject to the moment constraint $\int \varphi p\,d\mu = a$ ($a \in \mathbb{R}^d$). This moment problem arises in inferring a nonnegative function $p$ (often a probability density) when only the moment vector $\int \varphi p\,d\mu = a$ is known: one may adopt, as best guess, the minimizer of $H(p)$ subject to $\int \varphi p\,d\mu = a$.

It has been extensively studied, particularly for the entropy functional equal to I-divergence (thus for $\beta(\omega, s) = s\log s$). This has substantially contributed to the development of information geometry, including concepts like information projection and Pythagorean identities.

We will use results available on the moment problem, taken from Csiszár and Matúš 2012, with the choice $d = 2$, $\varphi(\omega) = (1, X(\omega))$. Differential geometry is often regarded as a basic ingredient of information geometry, but it will not be used here. On the other hand, as in the moment problem, convex duality will be a key tool.

Invoking moment problem results

For $\varphi(\omega) = (1, X(\omega))$, consider minimization of $H(p) = H_\beta(p)$ subject to $\int \varphi p\,d\mu = (\int p\,d\mu, \int Xp\,d\mu) = (a, b) \in \mathbb{R}^2$:

$$J(a, b) := \inf_{p:\ \int p\,d\mu = a,\ \int Xp\,d\mu = b} H(p). \qquad (8)$$

Instances of Csiszár and Matúš 2012, Theorem 1.1 and Lemma 6.6:

• The convex conjugate $J^*(\theta_1, \theta_2) := \sup_{a,b}[\theta_1 a + \theta_2 b - J(a, b)]$ of the function (8) equals
$$K(\theta_1, \theta_2) := \int \beta^*(\omega, \theta_1 + \theta_2 X(\omega))\,\mu(d\omega). \qquad (9)$$
Here the convex conjugate $\beta^*$ and (later on) the derivative of $\beta$ are taken in its second variable.

• The interior of $\operatorname{dom} J := \{(a, b) : J(a, b) < +\infty\}$ is
$$\operatorname{int}\operatorname{dom} J = \{(a, b) : am < b < aM\}.$$

Implications of convex duality

Since $F(b) = J(1, b)$, standard convex duality results give for $(1, b)$ not on the boundary of $\operatorname{dom} J$, i.e., for $b \ne m$, $b \ne M$, that

$$F(b) = J(1, b) = J^{**}(1, b) = \sup_{\theta_1, \theta_2} [\theta_1 + \theta_2 b - K(\theta_1, \theta_2)] \qquad (10)$$
$$= \sup_{\theta_2} [\theta_2 b - G(\theta_2)] = G^*(b), \qquad (11)$$

where

$$G(\theta_2) := \inf_{\theta_1} [K(\theta_1, \theta_2) - \theta_1]. \qquad (12)$$

Moreover, in (10) and (11) the maximum is attained if $b \in (m, M)$. A maximizer $(\theta_1, \theta_2)$ in (10) is equivalently a subgradient of $J$ at $(1, b)$. Its component $\theta_2$, a maximizer in (11), is the slope of a supporting line of the graph of $F$ at $b$. Equation (11) implies that $F^* = G^{**}$. A stronger result holds: $F^* = G$. (When $\Gamma$ in (2) is an I-divergence ball, $G$ plays the role of the logarithmic moment generating function.)
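As a hedged numerical illustration of (9) and (12) (again not from the talk): in the I-divergence case $\beta(\omega, s) = s\log s$ one has $\beta^*(\omega, r) = e^{r-1}$, and the sketch below checks that $G(\theta_2)$ indeed reduces to the logarithmic moment generating function $\log\int e^{\theta_2 X}\,d\mu$, the role noted above. The grid, $X$, and $\mu$ are illustrative assumptions.

```python
# Hedged sketch of (9) and (12) for beta(omega, s) = s*log(s), where
# beta*(omega, r) = exp(r - 1).  Checks numerically that
# G(theta2) = inf_theta1 [K(theta1, theta2) - theta1] equals
# log(int exp(theta2*X) dmu), the logarithmic moment generating function.
import numpy as np
from scipy.optimize import minimize_scalar

n = 10_000
omega = (np.arange(n) + 0.5) / n
d_mu = np.full(n, 1.0 / n)                 # mu = Lebesgue on (0, 1)
X = omega                                  # payoff X(omega) = omega

def K(theta1, theta2):                     # equation (9) with beta* = exp(r - 1)
    return np.sum(np.exp(theta1 + theta2 * X - 1.0) * d_mu)

def G(theta2):                             # equation (12), inner minimization
    return minimize_scalar(lambda t1: K(t1, theta2) - t1).fun

for t2 in (-0.5, -1.0, -2.0):
    log_mgf = np.log(np.sum(np.exp(t2 * X) * d_mu))
    print(t2, G(t2), log_mgf)              # last two columns agree
```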
Evaluation of V(k)

Fix $k \in (0, k_{\max})$. By the Main Lemma, $k = F(b)$ where $b = V(k) \in (m, b_0)$. For this $b$ the maximizer $\theta_2$ in (11) is negative, thus

$$k = \max_{\theta_2 < 0} [\theta_2 V(k) - G(\theta_2)]. \qquad (13)$$

Consequence (extension of Ahmadi-Javid 2011, Theorem 5.1):

Theorem 1. For $k \in (0, k_{\max})$,
$$V(k) = \max_{\theta_2 < 0} \frac{k + G(\theta_2)}{\theta_2}. \qquad (14)$$

Of course, the maximizers in (13) and (14) are the same.

Geometric interpretation: for $\theta_2 < 0$, the quotient $(k + G(\theta_2))/\theta_2$ equals the slope of the straight line through the point $(0, -k)$ and the point $(\theta_2, G(\theta_2))$ on the graph of $G$. Since $G$ is a closed convex function with $G(0) = 0$, the maximum of these slopes equals the slope of the supporting line to the graph of $G$ through $(0, -k)$; the proof of Theorem 1 shows that this supporting line exists and has slope $b = V(k)$, and then $k = G^*(b) = F(b)$, the first equality by the definition of the convex conjugate, the second by (11). The maximum in (14) is attained at $\theta_2$ if and only if $(\theta_2, G(\theta_2))$ lies on this supporting line.
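A small sketch of Theorem 1 for an I-divergence ball, where $G$ is the logarithmic moment generating function of $X$ under $P_0$. The choices $P_0$ = uniform on $(0,1)$ and $X(\omega) = \omega$, and the bounded search interval for $\theta_2$, are illustrative assumptions.

```python
# Sketch of the dual formula (14) for an I-divergence ball, with
# G(theta2) = log int exp(theta2*X) dP0 (illustrative P0 and X).
import numpy as np
from scipy.optimize import minimize_scalar

n = 10_000
omega = (np.arange(n) + 0.5) / n
d_P0 = np.full(n, 1.0 / n)                 # default P0 = uniform on (0, 1)
X = omega

def G(t2):
    return np.log(np.sum(np.exp(t2 * X) * d_P0))

def V(k):
    # maximize (k + G(t2))/t2 over t2 < 0 by minimizing its negative
    res = minimize_scalar(lambda t2: -(k + G(t2)) / t2,
                          bounds=(-200.0, -1e-6), method="bounded")
    return -res.fun

for k in (0.01, 0.1, 0.5):
    print(k, V(k))    # decreases from E_P0(X) = 0.5 toward ess inf X = 0
```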
Generalized exponential family

This is a key concept for the moment problem, see Csiszár and Matúš 2012. In our case of $\varphi(\omega) = (1, X(\omega))$, it consists of the nonnegative functions

$$p_{\theta_1,\theta_2}(\omega) := (\beta^*)'(\omega, \theta_1 + \theta_2 X(\omega)), \qquad (\theta_1, \theta_2) \in \Theta, \qquad (15)$$
$$\Theta := \{(\theta_1, \theta_2) \in \operatorname{dom} K : \theta_1 + \theta_2 X(\omega) < \beta'(\omega, +\infty)\ \mu\text{-a.e.}\}. \qquad (16)$$

(This family plays a key role here, like exponential families do for I-divergence minimization.)

Instance of Csiszár and Matúš 2012, Lemma 4.10: for $b \in (m, M)$, the minimum in the definition (5) of $F(b)$ is attained if and only if some $p_{\theta_1,\theta_2}$ in (15) is a density with $\int X p_{\theta_1,\theta_2}\,d\mu = b$. Then $H(p_{\theta_1,\theta_2}) = F(b) = \theta_1 + \theta_2 b - K(\theta_1, \theta_2)$.

Via the Main Lemma it follows: for $k \in (0, k_{\max})$, the minimum in the definition (6) of $V(k)$ is attained if and only if some $p_{\theta_1,\theta_2}$ in (15) with $\theta_2 < 0$ is a density with $H(p_{\theta_1,\theta_2}) = k$. Then $\theta_2$ is a maximizer in (14), and the minimizer in (6) (the worst case density) is $p = p_{\theta_1,\theta_2}$.

Additional auxiliary results

If $K(\theta_1, \theta_2) < +\infty$ and $\tilde\theta_1 < \theta_1$ then $(\tilde\theta_1, \theta_2) \in \Theta$; hence $\operatorname{dom} K$ and $\Theta$ have the same projection to the $\theta_2$-axis. Denote this projection by $\Theta_2$ and its infimum by $\theta_{\min}$.

• $p_{\theta_1,\theta_2}(\omega) = 0$ if and only if $\theta_1 + \theta_2 X(\omega) \le \beta'(\omega, 0)$. If $\beta'(\omega, 0) = -\infty$ $\mu$-a.e., then each $p_{\theta_1,\theta_2}$ is positive $\mu$-a.e.
• The generalized exponential family (15) contains the default density: $p_0 = p_{\theta_1, 0}$ for some $\theta_1$. In particular, $0 \in \Theta_2$.
• The standing assumption $k_{\max} > 0$ is equivalent to $\theta_{\min} < 0$, and implies that $F(b) > 0$ for each $b < b_0$.
• To each $\theta_2 \in \Theta_2$ there exists a unique $\theta_1 = \theta_1(\theta_2)$ with $G(\theta_2) = K(\theta_1, \theta_2) - \theta_1$. If $\int p_{\theta_1,\theta_2}\,d\mu = 1$ for some $\theta_1$ then this $\theta_1$ is unique and equals $\theta_1(\theta_2)$. Otherwise $\theta_1(\theta_2)$ is the largest $\theta_1$ with $(\theta_1, \theta_2) \in \Theta$, and for this $\theta_1$, $\int p_{\theta_1,\theta_2}\,d\mu < 1$.

Main Result

The Bregman distance corresponding to any $\beta \in \mathcal{B}$ is

$$B(p, q) = B_\beta(p, q) := \int \Delta_{\beta(\omega,\cdot)}(p(\omega), q(\omega))\,\mu(d\omega), \qquad (17)$$

with $\Delta_{\beta(\omega,\cdot)}$ defined as in (4), taking $f(s) = \beta(\omega, s)$ there.

Theorem 2 (Worst case localiser). For $k \in (0, k_{\max})$, let $\theta_2 < 0$ maximize $[k + G(\theta_2)]/\theta_2$, and let $\theta_1 = \theta_1(\theta_2)$. Then each density $p$ with $H(p) < +\infty$ satisfies

$$B(p, p_{\theta_1,\theta_2}) \le H(p) - k - \theta_2\Big(\int Xp\,d\mu - V(k)\Big). \qquad (18)$$

Consequently, each sequence of densities $p_n$ with

$$H(p_n) \to k, \qquad \int Xp_n\,d\mu \to V(k) \qquad (19)$$

converges to $p_{\theta_1,\theta_2}$ locally in measure.

Illustration of Main result

[Figure: densities $p$ with $H(p) \le k$ and $V(k) \le \int Xp\,d\mu \le V(k) + \varepsilon$ lie in a Bregman ball of radius $-\theta_2\varepsilon$ around $q_k$; also shown are $p_0$, the level set $H(p) \le k$, and the generalized exponential family.]

Worst case localiser

• By Theorem 2, almost worst case densities, i.e., densities with $H(p)$ close to $k$ and $\int Xp\,d\mu$ close to $V(k)$, cluster in Bregman distance around the function $p_{\theta_1,\theta_2}$.
• This function $p_{\theta_1,\theta_2}$ is uniquely determined by $k \in (0, k_{\max})$. It will be called the worst case localiser (WCL), denoted by $q_k$.
• The parameters $\theta_1, \theta_2$ of the WCL $q_k = p_{\theta_1,\theta_2}$ need not be unique, but they are if $q_k > 0$ $\mu$-a.e.
• If a worst case density (attaining the minimum in (6)) exists, it equals the WCL. However, the clustering property also holds when no worst case density exists.
• A sufficient condition for $q_k = p_{\theta_1,\theta_2}$ to be the worst case density is $(\theta_1, \theta_2) \in \operatorname{int}\operatorname{dom} K$.
• $q_k$ fails to be a density if and only if $\theta_2$ in Theorem 2 is such that $\int p_{\theta_1,\theta_2}\,d\mu < 1$ for each $\theta_1$ with $(\theta_1, \theta_2) \in \Theta$.
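A numerical sanity check of the bound (18), in the I-divergence case. There the correction term appearing in the proof below vanishes (since $\beta'(\omega, 0) = -\infty$), so (18) holds with equality for the dual-consistent pair computed in the sketch; the grid, $X$, $k$, and the trial densities are illustrative assumptions.

```python
# Sanity check of (18) for beta(omega, s) = s*log(s): Bregman distance
# B(p, q_k) versus H(p) - k - theta2*(int X p dmu - V(k)).
import numpy as np
from scipy.optimize import minimize_scalar

n = 10_000
w = (np.arange(n) + 0.5) / n
dmu = np.full(n, 1.0 / n)
X = w

G = lambda t2: np.log(np.sum(np.exp(t2 * X) * dmu))
k = 0.2
res = minimize_scalar(lambda t2: -(k + G(t2)) / t2,
                      bounds=(-200.0, -1e-6), method="bounded")
t2, Vk = res.x, -res.fun
qk = np.exp(t2 * X) / np.sum(np.exp(t2 * X) * dmu)  # worst case density q_k

H = lambda p: np.sum(p * np.log(p) * dmu)           # entropy functional (3)
B = lambda p, q: np.sum((p * np.log(p / q) - p + q) * dmu)  # Bregman (17)

for a in (0.5, 1.0, 2.0):                  # trial densities p_a ~ exp(-a*w)
    p = np.exp(-a * w) / np.sum(np.exp(-a * w) * dmu)
    lhs = B(p, qk)
    rhs = H(p) - k - t2 * (np.sum(X * p * dmu) - Vk)
    print(a, lhs, rhs)                     # equal up to float error
```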
Proof of Theorem 2

Instance of Csiszár and Matúš 2012, Lemma 4.15 (proven by simple algebra):

$$H(p) = \theta_1 + \theta_2 \int Xp\,d\mu - K(\theta_1, \theta_2) + B(p, p_{\theta_1,\theta_2}) + \int \big|\beta'(\omega, 0) - \theta_1 - \theta_2 X(\omega)\big|^+\, p(\omega)\,\mu(d\omega). \qquad (20)$$

For $\theta_1, \theta_2$ as in Theorem 2, $\theta_2 V(k) = k + K(\theta_1, \theta_2) - \theta_1$, hence

$$H(p) = k - \theta_2 V(k) + \theta_2 \int Xp\,d\mu + B(p, p_{\theta_1,\theta_2}) + \int \big|\beta'(\omega, 0) - \theta_1 - \theta_2 X(\omega)\big|^+\, p(\omega)\,\mu(d\omega). \qquad (21)$$

This proves (18), whence (19) follows, as $B(p_n, q) \to 0$ implies $p_n \to q$ locally in measure (Csiszár and Matúš 2012, Corollary 2.14).

Pythagorean identities

For any $b \in (m, M)$, (20) gives for densities $p$ with $\int Xp\,d\mu = b$ and $\theta_1, \theta_2$ attaining $F(b) = \theta_1 + \theta_2 b - K(\theta_1, \theta_2)$ that

$$H(p) = F(b) + B(p, p_{\theta_1,\theta_2}) + \text{correction term}. \qquad (22)$$

The correction term is the last integral in (20). When the minimum in the definition $F(b) = \inf_{p:\ \int p\,d\mu = 1,\ \int Xp\,d\mu = b} H(p)$ is attained (necessarily by $p_{\theta_1,\theta_2}$) and the correction term vanishes (specifically, when $p_{\theta_1,\theta_2} > 0$ $\mu$-a.e.), this reduces to the familiar Pythagorean identity
$$H(p) = H(p_{\theta_1,\theta_2}) + B(p, p_{\theta_1,\theta_2}).$$

The general version of identity (22) (for an arbitrary moment mapping $\varphi$) appears in Csiszár and Matúš 2012 as a generalized Pythagorean identity. The novel feature of the proof of Theorem 2 is that identity (20) is applied also to densities $p$ with $\int Xp\,d\mu \ne b$.

Example

Take the autonomous integrand $\beta(\omega, s) = f(s) = -\log s$ and let $\mu = P_0$. Then $H(p)$ for $p = dP/dP_0$ is the reverse I-divergence $D(P_0 \| P)$.

Specifically, let $\Omega = (0, 1)$, $X(\omega) = \omega$, and let $\mu = P_0$ have density $2\omega$ with respect to the Lebesgue measure. Then

$$f^*(r) = -1 - \log(-r) \quad (r < 0), \qquad K(\theta_1, \theta_2) = \int_0^1 [-1 - \log(-\theta_1 - \theta_2\omega)]\,2\omega\,d\omega,$$
$$\Theta = \operatorname{dom} K = \{(\theta_1, \theta_2) : \theta_1 \le 0,\ \theta_1 + \theta_2 < 0\}.$$

To $\theta_2 < 0$ there exists $\theta_1$ such that $p_{\theta_1,\theta_2}(\omega) = 1/(-\theta_1 - \theta_2\omega)$ is a $\mu$-density ($\int p_{\theta_1,\theta_2}(\omega)\,2\omega\,d\omega = 1$) if and only if $\theta_2 \ge -2$. Otherwise

$$G(\theta_2) = K(0, \theta_2) = \int_0^1 [-1 - \log(-\theta_2\omega)]\,2\omega\,d\omega = -\log(-\theta_2) - 1/2.$$

Calculus gives that $\theta_2 < -2$ is a maximizer of $[k + G(\theta_2)]/\theta_2$ if $k = \log(-\theta_2) - 1/2 > \log 2 - 1/2$. Then $V(k) = e^{-(k+1/2)}$, and $q_k(\omega) = 1/(-\theta_2\omega)$, not a density. If $k \le \log 2 - 1/2$ then $\theta_2 \ge -2$ and $q_k = p_{\theta_1(\theta_2),\theta_2}$ is the worst case density (no explicit formulas).

Illustration of Example

[Figure: the parameter set $\Theta$ of the example in the $(\theta_1, \theta_2)$-plane, with the curve $\theta_1(\theta_2)$, the value $\theta_{\min}$, and the critical value $k_{cr}$.]
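The closed forms of the example can be checked numerically. This sketch (with the illustrative choice $k = 1 > \log 2 - 1/2$) confirms $V(k) = e^{-(k+1/2)}$ and that $q_k$ integrates to $2/(-\theta_2) < 1$.

```python
# Numerical check of the example (reverse I-divergence, Omega = (0,1),
# X(omega) = omega, dP0 = 2*omega d(omega)), for k = 1 > log(2) - 1/2.
import numpy as np

n = 100_000
w = (np.arange(n) + 0.5) / n
d_mu = 2.0 * w / n                         # mu = P0, density 2*omega

k = 1.0
t2 = -np.exp(k + 0.5)                      # claimed maximizer (theta1 = 0)
G_t2 = -np.log(-t2) - 0.5                  # G(theta2) for theta2 < -2
print((k + G_t2) / t2, np.exp(-(k + 0.5))) # both ≈ 0.2231 = V(k)

q_k = 1.0 / (-t2 * w)                      # worst case localiser q_k
print(np.sum(q_k * d_mu), 2.0 / (-t2))     # ≈ 0.4463 < 1: not a density
```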
A pathological example

With $\Omega$, $\mu$, $X$ as in the previous example, now let $P_0$ be the uniform distribution on $\Omega = (0, 1)$, which has $\mu$-density $p_0(\omega) = \frac{1}{2\omega}$. Let $H(p)$ be the Bregman distance $B_{f,\mu}(p, p_0)$ with $f(s) = -\log s$, formally the functional $H_\beta(p)$ with

$$\beta(\omega, s) := \Delta_f(s, p_0(\omega)) = -\log s - \log(2\omega) + 2\omega\Big(s - \frac{1}{2\omega}\Big).$$

Then

$$\beta^*(\omega, r) = \log(2\omega) - \log(2\omega - r), \qquad (\beta^*)'(\omega, r) = \frac{1}{2\omega - r} \quad (r < 2\omega),$$

and $\Theta_\beta = \operatorname{dom} K_\beta$ consists of those $(\theta_1, \theta_2)$ for which $(\theta_1, \theta_2 - 2)$ belongs to the set $\Theta$ of the previous example. For $(\theta_1, \theta_2) \in \Theta_\beta$ the function

$$p_{\theta_1,\theta_2}(\omega) = \frac{1}{-\theta_1 - (\theta_2 - 2)\omega}$$

coincides with $p_{\theta_1,\theta_2-2}(\omega)$ of the previous example, which is never a density if $\theta_2 < 0$. Hence, in this example, no worst case density exists for any $k > 0$.
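A brief numerical scan supporting the pathological example (the grid and the sampled $\theta$ values are illustrative): for every $\theta_2 < 0$, the mass $\int p_{\theta_1,\theta_2}\,d\mu$ stays below 1 over the admissible $\theta_1 \le 0$, peaking at $\theta_1 = 0$ with value $2/(2 - \theta_2)$.

```python
# Scan for the pathological example: p_{theta1,theta2} never has total
# mass 1 w.r.t. mu (density 2*omega) when theta2 < 0.
import numpy as np

n = 100_000
w = (np.arange(n) + 0.5) / n
d_mu = 2.0 * w / n

def mass(theta1, theta2):
    return np.sum(d_mu / (-theta1 - (theta2 - 2.0) * w))

for t2 in (-0.5, -1.0, -4.0):
    best = max(mass(t1, t2) for t1 in np.linspace(-5.0, 0.0, 51))
    print(t2, best, 2.0 / (2.0 - t2))      # best stays below 1
```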
When the WCL is a density

Theorem 3.
(i) For $k \in (0, k_{\max})$, if the WCL $q_k$ is a density, it fails to be the worst case density only if $\theta_{\min} \in \Theta_2$, $G'(\theta_{\min}) > -\infty$ and
$$k > k_{cr} := -G(\theta_{\min}) + \theta_{\min} G'(\theta_{\min}). \qquad (23)$$
(ii) $q_k$ is always a density if $K(\theta_1, 0) < +\infty$ for all $\theta_1 \in \mathbb{R}$.

Proof. The pair $(\theta_1, \theta_2) \in \Theta$ with $q_k = p_{\theta_1,\theta_2}$ maximizes $\theta_1 + \theta_2 V(k) - K(\theta_1, \theta_2)$. Taking the directional derivative gives, for $(\tilde\theta_1, \tilde\theta_2) \in \operatorname{dom} K$,

$$(\tilde\theta_1 - \theta_1)\Big(1 - \int p_{\theta_1,\theta_2}\,d\mu\Big) + (\tilde\theta_2 - \theta_2)\Big(V(k) - \int X p_{\theta_1,\theta_2}\,d\mu\Big) \le 0.$$

If $p_{\theta_1,\theta_2}$ is a density, it follows that $\int X p_{\theta_1,\theta_2}\,d\mu \le V(k)$, with equality if $\theta_2 > \theta_{\min}$. Under the hypothesis of (ii) one can take $\tilde\theta_2 = 0$ and $\tilde\theta_1$ arbitrarily large; this rules out $\int p_{\theta_1,\theta_2}\,d\mu < 1$.

Illustration of Theorem 3

[Figure: the graph of $G$ over $\Theta_2$, the curve $\theta_1(\theta_2)$ with the value $\theta_1(\theta_{\min})$, and the critical value $k_{cr}$ at $\theta_{\min}$.]
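Connecting back to the example: its critical value is $k_{cr} = \log 2 - 1/2$, and at the boundary parameter $\theta_2 = -2$ (with $\theta_1 = 0$) the family member $p(\omega) = 1/(2\omega)$ is a $\mu$-density with $H(p)$ equal to that critical value. A short numerical confirmation, under the same illustrative discretization:

```python
# Boundary of the example's critical value: at theta2 = -2, theta1 = 0
# the member p(w) = 1/(2w) is a mu-density, and H(p) = D(P0 || P)
# equals log(2) - 1/2, the largest k for which a worst case density
# exists in the example.
import numpy as np

n = 100_000
w = (np.arange(n) + 0.5) / n
d_mu = 2.0 * w / n                         # mu = P0

p = 1.0 / (2.0 * w)                        # p_{0,-2}
print(np.sum(p * d_mu))                    # ≈ 1: a density
print(np.sum(-np.log(p) * d_mu), np.log(2.0) - 0.5)   # both ≈ 0.1931
```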
Critical value

The condition of Theorem 3 (ii) holds, e.g., if $H(p)$ is an f-divergence defined by a cofinite $f$ (i.e., $f'(+\infty) = +\infty$). Then the worst case density either exists for each $k \in (0, k_{\max})$, or it exists if and only if $k$ does not exceed a critical value.

For f-divergences with non-cofinite $f$ a similar result can be proved (using that in that case the standing assumption $k_{\max} > 0$ implies $m > -\infty$, and one may assume $m = 0$). It remains open whether the functional $H_\beta$ has this property for every $\beta \in \mathcal{B}$. Recalling the pathological example, this talk is concluded by the

Conjecture: The set of those $\theta_2 < 0$ for which there exists $\theta_1$ with $(\theta_1, \theta_2) \in \Theta$ and $\int p_{\theta_1,\theta_2}\,d\mu = 1$ (and hence $q_k = p_{\theta_1,\theta_2}$ is the worst case density for some $k \in (0, k_{\max})$) is, if nonempty, an interval with right endpoint 0.