Empirical ϕ*-Divergence Minimizers for Hadamard Differentiable Functionals

Publication GSI2017
OAI : oai:www.see.asso.fr:17410:22585


We study some extensions of the empirical likelihood method, when the Kullback distance is replaced by some general convex divergence or ϕ-discrepancy. We show that this generalized empirical likelihood method is asymptotically valid for general Hadamard differentiable functionals.

Patrice Bertail, Emmanuelle Gautherat, Hugo Harari-Kermadec
Creative Commons license: none (all rights reserved)


DOI: 10.23723/17410/22585
Created: Thu 8 Mar 2018
Updated: Thu 8 Mar 2018
Submitted: Fri 20 Apr 2018

Patrice Bertail, MODAL'X, University Paris-Ouest, patrice.bertail@gmail.com
Emmanuelle Gautherat, REGARDS, University of Reims Champagne Ardenne, emmanuelle.gautherat@univ-reims.fr
Hugo Harari-Kermadec, ENS-Cachan, hugo.harari@ens-cachan.fr

Abstract. We study some extensions of the empirical likelihood method, when the Kullback distance is replaced by some general convex divergence or ϕ-discrepancy. We show that this generalized empirical likelihood method is asymptotically valid for general Hadamard differentiable functionals.

1 Introduction

Empirical likelihood ([22], [20, 21]) is now a classical method for testing or constructing confidence regions for the value of some parameters in non-parametric or semi-parametric models. A possible interpretation of empirical likelihood is to see it as the minimization of the Kullback divergence, say $K$, between the empirical distribution of the data $P_n$ and a measure (or a probability measure) $Q_n$ dominated by $P_n$, under linear or non-linear constraints imposed on $Q_n$ by the model (see [26, 19], [2]). Related results may be found in the probabilistic literature about divergence or the method of entropy in mean (see [9, 16, 18, 12, 7, 8]). Some generalizations of the empirical likelihood method have also been obtained by using Cressie-Read discrepancies [1, 10] and led to some econometric extensions known as "generalized empirical likelihood" [19]. It is shown in [4] that Owen's original method in the case of the mean can be extended to any regular convex statistical divergence or $\varphi^*$-discrepancy (where $\varphi^*$ is a regular convex function) under weak assumptions. The purpose of this paper is to show that this general method remains asymptotically valid for a large class of non-linear parameters, mainly Hadamard differentiable parameters, in the same spirit as [2].

The paper is organized as follows. In Section 2, we first recall some basic facts about convex integral functionals and their dual representation. As a consequence, we briefly state the asymptotic validity of the corresponding "empirical $\varphi^*$-discrepancy" method in the case of M-estimators. In Section 3, we then extend this method to general Hadamard differentiable functionals $T(P)$ taking values in $\mathbb{R}^q$. An interesting interpretation of this method is that the image by $T$ of the ball centered at $P_n$ with radius $\chi^2_q(1-\alpha)/(2n)$ is, in regular cases, an asymptotic confidence region with coverage probability $1-\alpha$, for any "regular" $\varphi^*$-discrepancy.

2 Empirical ϕ*-discrepancy minimizers

2.1 ϕ*-discrepancy minimizers and duality

We consider a measured space $(\mathcal{X}, \mathcal{A}, \mathcal{M})$, where $\mathcal{M}$ is a space of signed measures. Let $f$ be a measurable function from $\mathcal{X}$ to $\mathbb{R}^r$, $r \geq 1$. For any signed measure $\mu \in \mathcal{M}$, we write $\mu f = \int f \, d\mu$; if $\mu$ is a probability measure, $\mu f = E_\mu(f(X))$.

In the following, we consider a convex function $\varphi$ whose support $d(\varphi)$, defined as $\{x \in \mathbb{R} : \varphi(x) < \infty\}$, is assumed to be non-void ($\varphi$ is said to be proper). We denote by $\inf d(\varphi)$ and $\sup d(\varphi)$ the endpoints of this support. For every convex function $\varphi$, its convex dual or Fenchel-Legendre transform is given by

$$\varphi^*(y) = \sup_{x \in \mathbb{R}} \{xy - \varphi(x)\}, \quad \forall y \in \mathbb{R}.$$

Recall that $\varphi^*$ is then a lower semi-continuous convex function. We denote by $\varphi^{(i)}$ the derivative of order $i$ of $\varphi$ when it exists. From now on, we make the following assumptions on the function $\varphi$.

H1 $\varphi$ is strictly convex and $d(\varphi)$ contains a neighborhood of 0;
H2 $\varphi$ is twice differentiable on a neighborhood of 0;
H3 (renormalization) $\varphi(0) = 0$, $\varphi^{(1)}(0) = 0$ and $\varphi^{(2)}(0) = 1$, which implies that $\varphi$ has a unique minimum at zero;
H4 $\varphi$ is differentiable on $d(\varphi)$, that is to say differentiable on $\mathrm{int}\{d(\varphi)\}$, with right and left limits at the respective endpoints of $d(\varphi)$, where $\mathrm{int}\{\cdot\}$ denotes the topological interior;
H5 $\varphi$ is twice differentiable on $d(\varphi)$;
H6 $\varphi^{(1)}$ is itself convex on its domain.

Assumptions H4-H6 will be useful in studying the generalized empirical likelihood. Notice that H6 implies that the second order derivative of $\varphi$ is bounded from below by some constant $m > 0$ on $\mathbb{R}^+ \cap d(\varphi)$, an assumption required in [4].

Let $\varphi$ satisfy hypotheses H1, H2 and H3. Then the Fenchel dual transform $\varphi^*$ of $\varphi$ also satisfies these hypotheses. Under H1-H6, $\varphi^*$ is convex with a minimum at 0, $\varphi^*(0) = 0$, non-negative and thus invertible on $d(\varphi^*) \cap \mathbb{R}^+$, and $\varphi^{*(2)}$ is non-increasing on $d(\varphi^*) \cap \mathbb{R}^-$.

The $\varphi^*$-discrepancy $I_{\varphi^*}$ between $Q$ and $P$, where $Q$ is a signed measure and $P$ a positive measure, is defined as follows:

$$I_{\varphi^*}(Q, P) = \begin{cases} \int_{\mathcal{X}} \varphi^*\left(\frac{dQ}{dP} - 1\right) dP & \text{if } Q \ll P, \\ +\infty & \text{otherwise.} \end{cases} \tag{1}$$

For more details on $\varphi^*$-discrepancies or Csiszár divergences, see [9, 4, 7]; for some historical comments, see [27, 28, 16, 17]. For us, the main interest of $\varphi^*$-discrepancies lies in the following duality representation, which follows from results of [3] on convex integral functionals (see also [18, 8, 4]).

Theorem 1. Let $P \in \mathcal{M}$ be a probability measure with finite support and $f$ be a measurable function on $(\mathcal{X}, \mathcal{A}, \mathcal{M})$. Let $\varphi$ be a convex function satisfying assumptions H1-H3. If the following constraint qualification holds,

$$\mathrm{Qual}(P): \quad \exists \mu \in \mathcal{M}, \quad \begin{cases} \mu f = \theta_0, \\ \inf d(\varphi^*) < \inf_{\mathcal{X}} \frac{d\mu}{dP} \leq \sup_{\mathcal{X}} \frac{d\mu}{dP} < \sup d(\varphi^*), \quad P\text{-a.s.}, \end{cases}$$

then we have the dual equality:

$$\inf_{Q \in \mathcal{M},\ (Q-P)f = \theta_0} I_{\varphi^*}(Q, P) = \sup_{\lambda \in \mathbb{R}^r} \left\{ \lambda' \theta_0 - \int_{\mathcal{X}} \varphi(\lambda' f) \, dP \right\}. \tag{2}$$

If $\varphi$ satisfies H4, then the supremum on the right-hand side of (2) is achieved at a point $\lambda^*$, and the infimum on the left-hand side is achieved at $Q^*$ given by $Q^* = (1 + \varphi^{(1)}(\lambda^{*\prime} f)) P$.

2.2 Empirical optimization of ϕ*-discrepancies

Let $X_1, \ldots, X_n$ be i.i.d. random variables defined on $\mathcal{X}$ with common probability measure $P$. Consider the empirical probability measure $P_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$, where $\delta_{X_i}$ is the Dirac measure at $X_i$. We first consider the case where the parameter of interest $\theta_0 \in \mathbb{R}^q$ is the solution of some M-estimation problem $E_P f(X, \theta_0) = 0$, where $f$ is a regular differentiable function from $\mathcal{X} \times \mathbb{R}^q$ to $\mathbb{R}^r$. For simplicity, we now assume that $f$ takes its values in $\mathbb{R}^q$, that is $r = q$, and that there is no over-identification problem. The over-identified case can be treated similarly by first reducing the problem to the strictly identified case (see [26, 5]).

Denote

$$\mathcal{M}_n = \{Q_n \in \mathcal{M} \text{ with } Q_n \ll P_n\} = \left\{ Q_n = \sum_{i=1}^n q_i \delta_{X_i}, \ (q_i)_{1 \leq i \leq n} \in \mathbb{R}^n \right\}.$$

Working with this set of signed measures, rather than with a set of probability measures, is partly motivated by Theorem 1, which is used to establish the existence of a solution to the dual problem. For a given $\varphi$, we define, by analogy with [21, 22], the quantity

$$\forall \theta \in \mathbb{R}^q, \quad \beta_n(\theta) = \inf_{Q_n \in \mathcal{M}_n,\ Q_n f(\cdot, \theta) = 0} I_{\varphi^*}(Q_n, P_n).$$

We define the corresponding random confidence region

$$C_n(r) = \left\{ \theta \in \mathbb{R}^q \mid \exists Q_n \in \mathcal{M}_n \text{ with } Q_n f(\cdot, \theta) = 0 \text{ and } n I_{\varphi^*}(Q_n, P_n) \leq r \right\},$$

where $r = r(\alpha)$ is a quantity such that $P(\theta_0 \in C_n(r)) = 1 - \alpha + o(1)$.

The underlying idea of empirical likelihood and its extensions is actually a plug-in rule. Consider the functional defined by

$$\forall \theta \in \mathbb{R}^q, \quad \beta(P, \theta) = \inf_{Q \in \mathcal{M},\ Q \ll P,\ Q f(\cdot, \theta) = 0} I_{\varphi^*}(Q, P),$$

that is, the minimization of a contrast under the constraints imposed by the model. This can be seen as a projection of $P$ on the model of interest for the given pseudo-metric $I_{\varphi^*}$. If the model is true at $P$, that is, if $E_P f(X, \theta_0) = 0$ at the true underlying probability $P$, then clearly $\beta(P, \theta_0) = 0$. A natural estimator of $\beta(P, \theta)$ for fixed $\theta$ is given by the plug-in estimator $\beta(P_n, \theta)$, which is $\beta_n(\theta)$.
This estimator can then be used to test $\beta(P, \theta) = 0$ or, in a dual approach, to build a confidence region for $\theta_0$ by inverting the test. For $Q_n$ in $\mathcal{M}_n$, the constraints can be rewritten as $(Q_n - P_n) f(\cdot, \theta) = -P_n f(\cdot, \theta)$. Using Theorem 1, we get the dual representation

$$\beta_n(\theta) := \inf_{Q_n \in \mathcal{M}_n,\ (Q_n - P_n) f(\cdot, \theta) = -P_n f(\cdot, \theta)} I_{\varphi^*}(Q_n, P_n) = \sup_{\lambda \in \mathbb{R}^q} P_n \left( -\lambda' f(\cdot, \theta) - \varphi(\lambda' f(\cdot, \theta)) \right). \tag{3}$$

Notice that $x \mapsto -x - \varphi(x)$ is a strictly concave function and that the function $\lambda \mapsto \lambda' f$ is linear, hence also concave. The parameter $\lambda$ can be simply interpreted as the Kuhn-Tucker coefficient associated with the original optimization problem. From this representation of $\beta_n(\theta)$, we can now derive the usual properties of the empirical likelihood and its generalization. In the following, we will also use the notations

$$\bar{f}_n = \frac{1}{n} \sum_{i=1}^n f(X_i, \theta), \quad S_n^2 = \frac{1}{n} \sum_{i=1}^n f(X_i, \theta) f(X_i, \theta)' \quad \text{and} \quad S_n^{-2} = (S_n^2)^{-1}.$$

The following theorem states that generalized empirical likelihood essentially behaves asymptotically like a self-normalized sum. Links to self-normalized sums for finite $n$ have been investigated in [5, 6].

Theorem 2. Let $X, X_1, \ldots, X_n$ be in $\mathbb{R}^p$, i.i.d. with probability $P$, and let $\theta_0 \in \mathbb{R}^q$ be such that $E_P f(X, \theta_0) = 0$. Assume that $S^2 = E_P f(X, \theta_0) f(X, \theta_0)'$ is of rank $q$ and that $\varphi$ satisfies hypotheses H1-H4. Assume that the qualification constraints Qual($P_n$) hold. For any $\alpha$ in $]0, 1[$, set $r = \frac{\chi^2_q(1-\alpha)}{2}$, where $\chi^2_q(\cdot)$ is the quantile function of the $\chi^2$ distribution with $q$ degrees of freedom. Then $C_n(r)$ is an asymptotic confidence region with

$$\lim_{n \to \infty} P(\theta_0 \in C_n(r)) = \lim_{n \to \infty} P(n \beta_n(\theta_0) \leq r) = \lim_{n \to \infty} P\left( n \bar{f}_n' S_n^{-2} \bar{f}_n \leq \chi^2_q(1-\alpha) \right) = 1 - \alpha.$$

The proof of this theorem starts from the convex dual representation and follows the main arguments of [4] and [22] for the case of the mean. It is left to the reader.

Remark 1. If $\varphi$ is finite everywhere, then the qualification constraints are not needed (this is for instance the case for the $\chi^2$ divergence). In this case, the empty set problem emphasized by [14] is solved.
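As an illustration of the dual representation (3) and of Remark 1, here is a minimal sketch for the $\chi^2$ case. It assumes $\varphi(x) = x^2/2$ (so that $\varphi^* = \varphi$) and the scalar mean constraint $f(x, \theta) = x - \theta$; these choices, the function names and the simulated sample are illustrative assumptions, not taken from the paper. Under these assumptions the supremum in (3) is reached at $\lambda^* = -\bar{f}_n / S_n^2$, which gives $\beta_n(\theta) = \bar{f}_n^2 / (2 S_n^2)$, and the minimizing measure has weights $q_i = (1 + \lambda^* f(X_i, \theta))/n$.

```python
import random


def chi2_discrepancy_mean(xs, theta):
    """Empirical chi-square discrepancy beta_n(theta) for the scalar mean.

    Assumes phi(x) = x**2 / 2 and f(x, theta) = x - theta (illustrative choices).
    Returns (beta_n, lambda_star, weights), where weights are the q_i of Q_n^*.
    """
    n = len(xs)
    f = [x - theta for x in xs]
    f_bar = sum(f) / n                  # empirical mean of f
    s2 = sum(v * v for v in f) / n      # uncentred second moment S_n^2
    lam = -f_bar / s2                   # maximiser of -lam*f_bar - lam**2*s2/2
    beta = 0.5 * f_bar * f_bar / s2     # value of the dual criterion at lam
    weights = [(1.0 + lam * v) / n for v in f]  # Q* = (1 + phi'(lam f)) P_n
    return beta, lam, weights


def primal_value(weights):
    """I(Q_n, P_n) = (1/n) * sum of phi*(n*q_i - 1), with phi*(y) = y**2 / 2."""
    n = len(weights)
    return sum(0.5 * (n * q - 1.0) ** 2 for q in weights) / n


def in_region(xs, theta, chi2_quantile):
    """theta lies in C_n(r), r = chi2_quantile / 2, iff n * beta_n(theta) <= r."""
    beta, _, _ = chi2_discrepancy_mean(xs, theta)
    return len(xs) * beta <= chi2_quantile / 2.0


rng = random.Random(0)
sample = [rng.gauss(1.0, 2.0) for _ in range(200)]
beta, lam, w = chi2_discrepancy_mean(sample, theta=1.0)
constraint = sum(q * (x - 1.0) for q, x in zip(w, sample))
# The weights satisfy the constraint and the primal value matches the dual value.
print(abs(constraint) < 1e-9, abs(primal_value(w) - beta) < 1e-12)
```

The weights may be negative, which is consistent with optimizing over signed measures, and no convex hull condition is needed. For $q = 1$ and $\alpha = 0.05$ one would call `in_region(sample, theta, 3.84)`, 3.84 being the approximate 95% quantile of the $\chi^2_1$ distribution.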
For empirical likelihood, the constraint qualification Qual($P$) simply says that there should be at least one solution which belongs to the support of the discrepancy. For the case of the mean, it boils down to assuming that $\theta$ belongs to the convex hull of the points.

3 Empirical ϕ*-discrepancy minimizers for Hadamard differentiable functionals

We now extend the preceding results to a general functional parameter $\theta_0 = T(P)$, where $T$ is defined on the space of signed measures $\mathcal{M}$ and takes its values in $\mathbb{R}^q$. The empirical $\varphi^*$-discrepancy minimizer evaluated at $\theta$ is now defined by

$$\forall \theta \in \mathbb{R}^q, \quad \beta_n(\theta) = \inf_{Q_n \in \mathcal{M}_n,\ T(Q_n) = \theta} I_{\varphi^*}(Q_n, P_n). \tag{4}$$

For any $r > 0$, define the empirical ball centered at $P_n$ with radius $r$ by

$$\mathcal{M}_n(r) = \{ Q_n \in \mathcal{M}_n : n I_{\varphi^*}(Q_n, P_n) \leq r \}.$$

By using the arguments of [21] and [2], the confidence region given by

$$C_n(r) = \{ \theta \in \mathbb{R}^q : \exists Q_n \in \mathcal{M}_n(r), \ \theta = T(Q_n) \}, \tag{5}$$

with $r = \frac{\chi^2_q(1-\alpha)}{2}$, has asymptotic coverage probability $1 - \alpha$. This was the main motivation of [2] for proving that empirical likelihood for general Hadamard differentiable functionals is still valid asymptotically. It means that the image by $T$ of the ball with respect to $I_{\varphi^*}$, centered at $P_n$ with radius $\frac{\chi^2_q(1-\alpha)}{2n}$, is, for any pseudo-metric $I_{\varphi^*}$, always an asymptotic $1 - \alpha$ confidence region for $T(P)$. If $T$ is a convex functional (in particular if it is linear), then the corresponding region is automatically convex (see also [15]).

3.1 Hadamard differentiability

For this purpose we consider the following abstract empirical process framework (see [29] for details). $\mathcal{F}$ is a subset of functions of $L_2(P) = \{h : P h^2 < \infty\}$ endowed with $\|f\|_{2,P} = (P(f^2))^{1/2}$. We assume that $L_\infty(\mathcal{F})$ is equipped with the uniform norm

$$\|\tilde{Q} - Q\|_{\mathcal{F}} = d_{\mathcal{F}}(\tilde{Q}, Q) = \sup_{h \in \mathcal{F}} |(\tilde{Q} - Q) h|.$$

We assume that expectations (resp. measures) are outer expectations (resp. outer measures), so that weak convergence is defined as Hoffmann-Jørgensen convergence. This avoids measurability problems. For the same reason, we will also assume that $\mathcal{F}$ is image admissible Suslin (see [11, 29]).
This ensures that the classes of squared functions and of differences of squared functions are $P$-measurable. Assume in addition that

H7 $\mathcal{F}$ is a Suslin-Donsker class of functions with envelope $H$ (without loss of generality such that $H \geq 1$) such that $0 < P H^2 < \infty$.

Recall that expectation should be understood as outer expectation. Under H7, the empirical process $n^{1/2}(P_n - P)$ indexed by $\mathcal{F}$ converges (as an element of $L_\infty(\mathcal{F})$) to a limit $G_P$, which is a tight Borel measurable element of $L_\infty(\mathcal{F})$ such that the sample paths $f \mapsto G_P(f)$ are uniformly $\|\cdot\|_{2,P}$-continuous. Denote by $N(\varepsilon, \mathcal{F}, \|\cdot\|)$ the covering number, that is, the minimal number of balls of radius $\varepsilon$, for a given seminorm $\|\cdot\|$, needed to cover $\mathcal{F}$.

H8 The following usual uniform entropy condition holds:

$$\int_0^\infty \sup_{\tilde{P} \in \mathcal{D}} \sqrt{\log N\left(\varepsilon \|H\|_{2,\tilde{P}}, \mathcal{F}, \|\cdot\|_{2,\tilde{P}}\right)} \, d\varepsilon < \infty,$$

where $\mathcal{D}$ is the set of all finitely discrete probability measures $\tilde{P}$ with $0 < \tilde{P} H^2 < \infty$.

Define now $B(\mathcal{F}, P)$, the subset of elements of $L_\infty(\mathcal{F})$ which are $\|\cdot\|_{2,P}$-uniformly continuous and bounded. We recall the following definition of Hadamard differentiability tangentially to $B(\mathcal{F}, P)$, adapted from [24]. Notice that differentiation taken tangentially to $B(\mathcal{F}, P)$ weakens the notion of differentiation and makes it easier to check in statistical problems (see examples in [13, 24], Chap. 3.9 of [29] and Chap. 20 of [30]). Our result may be applied, for instance, to the well-known functional $\int F \, dG$, see p. 298 of [30]. The empirical counterpart of this functional yields the Mann-Whitney statistic. It is known that this functional is Hadamard differentiable tangentially to some appropriate sets.

The functional $T$ from $\mathcal{M} \subset L_\infty(\mathcal{F})$ to $\mathbb{R}^q$ is said to be Hadamard differentiable at $P \in \mathcal{M}$ tangentially to $B(\mathcal{F}, P)$, say $T$ is $\mathrm{HDT}_{\mathcal{F}}$-$P$, iff there exists a continuous linear mapping $dT_P : \mathcal{M} \to \mathbb{R}^q$ such that, for every sequence $\mu_n \to \mu \in B(\mathcal{F}, P)$ and every sequence $t_n \to 0$,

$$\frac{T(P + t_n \mu_n) - T(P)}{t_n} - dT_P . \mu \xrightarrow[n \to \infty]{} 0.$$
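The definition above can be checked numerically on a toy case. The following sketch is an illustration under assumptions that are not in the paper: it takes the variance functional $T(Q) = Q(x^2) - (Qx)^2$ on finitely supported signed measures, a hypothetical three-point probability measure $P$, and a direction $\mu = \delta_3 - P$ of total mass zero. It compares the difference quotient along sequences $t_n \to 0$ and $\mu_n \to \mu$ with the candidate derivative $\mu T^{(1)}(\cdot, P)$, where $T^{(1)}(x, P) = (x - m)^2 - \sigma^2$ is the usual influence function of the variance.

```python
def integrate(measure, h):
    """Integrate h against a discrete signed measure given as {point: mass}."""
    return sum(mass * h(x) for x, mass in measure.items())


def variance_functional(measure):
    """T(Q) = Q(x^2) - (Q x)^2, defined for any discrete signed measure Q."""
    return integrate(measure, lambda x: x * x) - integrate(measure, lambda x: x) ** 2


def perturb(p, mu, t):
    """Return the signed measure P + t * mu."""
    points = set(p) | set(mu)
    return {x: p.get(x, 0.0) + t * mu.get(x, 0.0) for x in points}


def derivative(p, mu):
    """Candidate dT_P.mu = mu(T1), with T1(x) = (x - m)^2 - sigma^2."""
    m = integrate(p, lambda x: x)
    sigma2 = variance_functional(p)
    return integrate(mu, lambda x: (x - m) ** 2 - sigma2)


# Hypothetical example: P is a probability measure, MU = delta_3 - P has mass 0.
P = {0.0: 0.25, 1.0: 0.5, 3.0: 0.25}
MU = {0.0: -0.25, 1.0: -0.5, 3.0: 0.75}


def quotient_error(n):
    """Difference quotient minus dT_P.mu, for t_n = 1/n and a sequence mu_n -> mu."""
    t_n = 1.0 / n
    # mu_n moves a vanishing amount of mass t_n from the point 1 to the point 0.
    mu_n = {0.0: MU[0.0] + t_n, 1.0: MU[1.0] - t_n, 3.0: MU[3.0]}
    quotient = (variance_functional(perturb(P, mu_n, t_n)) - variance_functional(P)) / t_n
    return quotient - derivative(P, MU)


for n in (1, 10, 100, 1000):
    print(n, quotient_error(n))
```

The printed errors shrink roughly in proportion to $t_n$, as the definition requires. This only checks one direction and one pair of sequences, so it illustrates the definition rather than proving differentiability.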
For a Hadamard differentiable functional, $T^{(1)}(\cdot, P)$ is the canonical gradient or influence function, that is, any function from $\mathcal{X}$ to B1 such that $dT_P \mu = \mu T^{(1)}(\cdot, P)$, with the normalization $P T^{(1)}(\cdot, P) = 0$.

The following theorem establishes the validity of generalized empirical likelihood for Hadamard differentiable functionals.

Theorem 3. Assume H1 to H8. If $T$ defined on $\mathcal{M}$ is $\mathrm{HDT}_{\mathcal{F}}$-$P$ with gradient $T^{(1)}(\cdot, P)$ and $P (T^{(1)}(\cdot, P))^2 < \infty$ of rank $q$, then for all $\alpha \in \, ]0, 1[$ and for $r = \frac{\chi^2_q(1-\alpha)}{2}$, we have

$$\lim_{n \to \infty} P(\theta_0 \in C_n(r)) = 1 - \alpha.$$

An interesting example of Hadamard differentiability is given in [24] in the framework of two-dimensional censored survival times, with applications to tests of independence between duration data (see [25]). The idea is to show that the two-dimensional cumulative hazard function is Hadamard differentiable tangentially to a well-chosen class of functions (given in [24]). The same kind of results may be obtained directly for real Hadamard differentiable functionals of the cumulative hazard function by the chain rule. Recall that Hadamard differentiability is essentially the weakest form of differentiability which ensures the validity of the chain rule (see [29]).

Note that there is no need to construct an empirical likelihood version adapted to the censored data, as done for instance in [23] for univariate censored data. The censored structure is directly taken into account in the constraints. Comparisons between the two approaches would be of interest and will be the subject of future applied work. Other examples of interest may be found in [23] and may be treated by using Hadamard differentiability. In this framework, the choice of an adequate divergence is also a crucial issue which requires further extensive work.

References

1. Baggerly, K.A. (1998). Empirical likelihood as a goodness-of-fit measure. Biometrika, 85, 535-547.
2. Bertail, P. (2006). Empirical likelihood in some semi-parametric models. Bernoulli, 12, 299-331.
3. Borwein, J.M. and Lewis, A.S. (1991). Duality relationships for entropy-like minimization problems. SIAM J. Control Optim., 29, 325-338.
4. Bertail, P., Harari-Kermadec, H. and Ravaille, D. (2007). ϕ-Divergence empirique et vraisemblance empirique généralisée. Annales d'Économie et de Statistique, 85, 131-158.
5. Bertail, P., Gautherat, E. and Harari-Kermadec, H. (2005). Exponential bounds for quasi-empirical likelihood. Working Paper n 34, CREST. http://www.crest.fr/images/doctravail//2005-34.pdf Cited 12/12/2012.
6. Bertail, P., Gautherat, E. and Harari-Kermadec, H. (2008). Exponential bounds for multivariate self-normalized sums. Electronic Communications in Probability, 13, 628-640.
7. Broniatowski, M. and Keziou, A. (2006). Minimization of ϕ-divergences on sets of signed measures. Studia Sci. Math. Hungar., 43(4), 4032.
8. Broniatowski, M. and Keziou, A. (2009). Parametric estimation and tests through divergences and the duality technique. J. Multivariate Anal., 100(1), 16.
9. Csiszár, I. (1967). On topology properties of f-divergences. Studia Sci. Math. Hungar., 2, 3299.
10. Corcoran, S. (1998). Bartlett adjustment of empirical discrepancy statistics. Biometrika, 85, 967-972.
11. Dudley, R.M. (1984). A course on empirical processes. Ecole d'été de probabilité de Saint Flour. Lecture Notes in Mathematics, 1097, 2-241. Springer-Verlag, N.Y.
12. Gamboa, F. and Gassiat, E. (1996). Bayesian methods and maximum entropy for ill-posed inverse problems. Annals of Statistics, 25, 328-350.
13. Gill, R.D. (1989). Non- and semiparametric Maximum Likelihood Estimators and the von Mises Method. Scand. J. Statist., 16, 97-128.
14. Grendar, M. and Judge, G. (2009). Empty set problem of maximum empirical likelihood methods. Electron. J. Statist., 3, 1542-1555.
15. Hall, P. and La Scala, B. (1990). Methodology and Algorithms of Empirical Likelihood. Int. Statist. Rev., 58, 109-127.
16. Liese, F. and Vajda, I. (1987). Convex Statistical Distances. Teubner, Leipzig.
17. Léonard, C. (2001). Convex conjugates of integral functionals. Acta Mathematica Hungarica, 93, 253-280.
18. Léonard, C. (2001). Minimizers of Energy functionals. Acta Math. Hungar., 93, 281-325.
19. Newey, W.K. and Smith, R.J. (2004). Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica, 72, 219-255.
20. Owen, A.B. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, 75, 2, 237-249.
21. Owen, A.B. (1990). Empirical Likelihood Ratio Confidence Regions. Ann. Statist., 18, 90-120.
22. Owen, A.B. (2001). Empirical Likelihood. Chapman and Hall/CRC.
23. Pan, X.R. and Zhou, M. (2002). Empirical likelihood in terms of cumulative hazard function for censored data. Journal of Multivariate Analysis, 80, 166-188.
24. Pons, O. and Turckheim, E. (1991). Von Mises method, Bootstrap and Hadamard differentiability. Statistics, 22, 205-214.
25. Pons, O. and Turckheim, E. (1991). Tests of Independence for Bivariate Censored Data Based on the Empirical Joint Hazard Function. Scandinavian Journal of Statistics, 18, 1, 21-37.
26. Qin, J. and Lawless, J. (1994). Empirical Likelihood and General Estimating Equations. Ann. Statist., 22, 300-325.
27. Rockafellar, R.T. (1968). Integrals which are Convex Functionals. Pacific J. Math., 24, 525-539.
28. Rockafellar, R.T. (1970). Convex Analysis. Princeton University Press, Princeton, NJ.
29. van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Verlag.
30. van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics.