Coordinate-wise transformation and Stein-type densities

Authors: Tomonari Sei
Publication GSI2017


A Stein-type density function is defined as a stationary point of the free-energy functional over a fiber that consists of probability densities obtained by coordinate-wise transformations of a given density.
It is shown that under some conditions there exists a unique Stein-type density in each fiber. An application to rating is discussed.







Creative Commons: none (all rights reserved)







DOI: 10.23723/17410/22341
Created: Sun 18 Feb 2018
Updated: Sun 18 Feb 2018
Submitted: Sun 24 Mar 2019

Coordinate-wise transformation and Stein-type densities

Tomonari Sei
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

Abstract. A Stein-type density function is defined as a stationary point of the free-energy functional over a fiber that consists of probability densities obtained by coordinate-wise transformations of a given density. It is shown that under some conditions there exists a unique Stein-type density in each fiber. An application to rating is discussed.

Keywords: coordinate-wise transformation, copositivity, copula, free-energy functional, optimal transport, positive dependence, Stein-type density.

1 Introduction

Let $V(x)$ be a continuous function of $x \in \mathbb{R}^d$. Consider the problem of minimizing the free-energy functional

$$E(p) = E_V(p) = \int p(x)\log p(x)\,dx + \int p(x)V(x)\,dx \tag{1}$$

over a restricted set of probability density functions $p(x)$ on $\mathbb{R}^d$. The functional $E(p)$ is, as is well known, the Lagrange function for maximizing entropy under a given moment $\int p(x)V(x)\,dx$. Equation (1) is also discussed in the theory of optimal transport (e.g. [7] and [15]). We first recall the solution of the unconstrained problem.

Lemma 1. Let $Z = \int e^{-V(x)}\,dx$. If $Z < \infty$, then the functional $E$ is minimized at $q(x) = e^{-V(x)}/Z$. If $Z = \infty$, then $E$ is not bounded from below.

Proof. If $Z < \infty$, we have $E(p) = \mathrm{KL}(p, q) - \log Z$, where $\mathrm{KL}(p, q)$ denotes the Kullback-Leibler divergence. Therefore $E(p)$ is minimized at $p = q$. If $Z = \infty$, then $E(p_n) \to -\infty$ as $n \to \infty$ for $p_n(x) \propto e^{-V(x)} I_{[-n,n]^d}(x)$. □

Now consider a set of probability densities obtained by coordinate-wise transformations of a given density $p_0(x)$. Here a coordinate-wise transformation means

$$T(x) = (T_1(x_1), \dots, T_d(x_d)), \qquad T_i'(x_i) > 0. \tag{2}$$

If $p_0$ is pushed forward by $T$, then the resultant probability density is

$$p(x) = (T_\sharp p_0)(x) = p_0\bigl(T_1^{-1}(x_1), \dots, T_d^{-1}(x_d)\bigr) \prod_{i=1}^{d} (T_i^{-1})'(x_i).$$

We will call the set $\{p = T_\sharp p_0 \mid T \text{ satisfies (2)}\}$ a fiber.
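The push-forward formula can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a bivariate normal base density with correlation 0.5 and specifies the inverse maps directly as $T_i^{-1}(y) = y + 0.5\sin y$ (the names `p0`, `s`, `ds` and `pushforward` are hypothetical). It checks that the pushed-forward function still integrates to one, as a probability density should.

```python
import math

RHO = 0.5  # correlation of the base density (illustrative assumption)

def p0(x1, x2):
    """Base density: bivariate normal with unit variances and correlation RHO."""
    det = 1.0 - RHO ** 2
    quad = (x1 * x1 - 2.0 * RHO * x1 * x2 + x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def s(y):
    """Assumed inverse map T_i^{-1}(y); increasing because s'(y) >= 0.5 > 0."""
    return y + 0.5 * math.sin(y)

def ds(y):
    """Derivative of the inverse map, (T_i^{-1})'(y)."""
    return 1.0 + 0.5 * math.cos(y)

def pushforward(x1, x2):
    """Density of T#p0: p0 evaluated at the inverse images times the Jacobian."""
    return p0(s(x1), s(x2)) * ds(x1) * ds(x2)

def integrate_2d(density, half_width=9.0, n=360):
    """Midpoint rule on the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    pts = [-half_width + (k + 0.5) * h for k in range(n)]
    return sum(density(a, b) for a in pts for b in pts) * h * h

total = integrate_2d(pushforward)
print(total)  # should be close to 1
```

Specifying the inverse maps rather than the maps $T_i$ themselves avoids a numerical root search; any smooth increasing function would serve equally well here.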
It is widely known that each fiber has a unique copula density, which plays an important role in dependence modeling (e.g. [9]). Our problem is to minimize $E$ over the fiber, even if $Z = \int e^{-V(x)}\,dx = \infty$. In Theorem 1, it is shown that the stationary condition of $E$ over the fiber is

$$\int f(x_i)\,\partial_i V(x)\,p(x)\,dx = \int f'(x_i)\,p(x)\,dx, \qquad \forall f \in C^1(\mathbb{R}). \tag{3}$$

This equation is applied to a rating problem in Section 5. The following definition is a generalization of that in [13].

Definition 1 (Stein-type density). A $d$-dimensional probability density function $p(x)$ is called a Stein-type density with respect to $V$ if it satisfies (3).

The Stein-type density is named after the Stein identity (see e.g. [2], [14])

$$\int f(x)\,x\,\phi(x)\,dx = \int f'(x)\,\phi(x)\,dx,$$

which characterizes the standard normal density function $\phi(x) = e^{-x^2/2}/\sqrt{2\pi}$. The Stein identity corresponds to $d = 1$ and $V(x) = x^2/2$ in (3).

The remainder of the present paper is organized as follows. In Section 2, we describe the unique existence theorem on the constrained minimization problem, where the proof of existence is more challenging. In Section 3, we provide sufficient conditions for existence. In Section 4, examples of Stein-type densities are shown. In Section 5, we briefly discuss an application to rating.

Throughout the paper, we assume that the density functions are continuous and positive over $\mathbb{R}^d$. Otherwise, more careful treatment is necessary.

2 Main result

Suppose that $V(x)$ satisfies the following condition:

$$V(x) = \psi(x_1 + \cdots + x_d), \qquad \psi : \text{non-negative, convex,} \quad \lim_{x \to \pm\infty} \psi(x) = \infty. \tag{4}$$

A function $\psi$ satisfying the last condition is called coercive. For example, $V(x) = |x_1 + \cdots + x_d|$ satisfies the condition. Note that $Z = \int e^{-V(x)}\,dx = \infty$ when $d \ge 2$. As a referee pointed out, the restriction that $V$ is a function of the sum can be relaxed. However, we assume it to keep the description simple.
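As a quick sanity check of the Stein identity recalled in the Introduction, the sketch below compares both sides numerically for the test function $f(x) = \sin x$, chosen here only for illustration. For this choice both sides equal $e^{-1/2}$. The helper names `phi` and `integrate` are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(g, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule on [lo, hi]; the normal tails beyond are negligible."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

# Left-hand side: integral of f(x) * x * phi(x) with f = sin
lhs = integrate(lambda x: math.sin(x) * x * phi(x))
# Right-hand side: integral of f'(x) * phi(x) with f' = cos
rhs = integrate(lambda x: math.cos(x) * phi(x))

print(lhs, rhs, math.exp(-0.5))
```

The same pattern extends to condition (3) with $d = 1$ and another potential $V$, by replacing the factor $x$ with $V'(x)$ and $\phi$ with the candidate density.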
Under Equation (4), we can restrict the domain of $E$ to

$$\mathcal{P} = \left\{ p \;\middle|\; \int x_1 p_1(x_1)\,dx_1 = \cdots = \int x_d p_d(x_d)\,dx_d \right\} \tag{5}$$

without loss of generality, where $p_i$ denotes the $i$-th marginal of $p$. Indeed, $E(p)$ is invariant under the translation $x_i \mapsto x_i + a_i$ for any constants $a_i$ with $\sum_i a_i = 0$. Note that the translation is a coordinate-wise transformation.

For each $p \in \mathcal{P}$, let $\mathcal{T}_{\mathrm{cw}}(p)$ be the set of coordinate-wise transformations $T$ such that $T_\sharp p \in \mathcal{P}$. Then the $p$-fiber is defined by

$$\mathcal{F}_p = \{ T_\sharp p \mid T \in \mathcal{T}_{\mathrm{cw}}(p) \}.$$

If $p$ is not specified explicitly, we simply call it a fiber. The space $\mathcal{P}$ is the disjoint union of fibers. In the context of optimal transport, each fiber is a totally-geodesic subspace of the $L^2$-Wasserstein space (e.g. [15]).

Stein-type densities are characterized by the following theorem. It holds for any convex function $V(x)$ without the restriction (4).

Theorem 1 (Characterization). Let $V$ be a convex function on $\mathbb{R}^d$. Then the following two conditions on $p \in \mathcal{P}$ are equivalent to each other: (i) $p$ is Stein-type, (ii) the functional $E$ restricted to the $p$-fiber is minimized at $p$.

Proof. The proof relies on McCann's displacement convexity [7]. For each $p$,

$$E(T_\sharp p) = \int (T_\sharp p)(x) \log (T_\sharp p)(x)\,dx + \int (T_\sharp p)(x) V(x)\,dx = \int p(x) \log \frac{p(x)}{\prod_i T_i'(x_i)}\,dx + \int p(x) V(T(x))\,dx,$$

which is a convex functional of $T$. Thus it suffices to check the stationary condition. Consider a coordinate-wise transformation $T_t(x) = x + t f(x_i) e_i$ parameterized by $t$, where $f$ is any function in $C^1(\mathbb{R})$ and $e_i$ is the $i$-th unit vector. Then we have

$$\left. \frac{d}{dt} E((T_t)_\sharp p) \right|_{t=0} = -\int p_i(x_i) f'(x_i)\,dx_i + \int p(x) f(x_i) \{\partial_i V(x)\}\,dx.$$

Then the stationary condition $(d/dt) E((T_t)_\sharp p)|_{t=0} = 0$ is equivalent to (3). □

To state the main result, we define additional symbols. For each $p \in \mathcal{P}$, denote the product of the marginal densities of $p$ by $p^\perp(x) = \prod_{i=1}^{d} p_i(x_i)$.

Definition 2 (Copositivity). For each $p \in \mathcal{P}$, define

$$\beta(p) = \beta_V(p) = \inf_{T \in \mathcal{T}_{\mathrm{cw}}(p)} \frac{\int V(T(x))\,p(x)\,dx}{\int V(T(x))\,p^\perp(x)\,dx}.$$

If $\beta(p) > 0$, $p$ is called copositive with respect to $V$.
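To make the ratio in Definition 2 concrete, the sketch below evaluates it in closed form for an assumed toy case that is not from the paper: a zero-mean bivariate normal $p$ with unit variances and correlation $\rho$, the potential $\psi(x) = x^2/2$, and only linear maps $T(x) = (a_1 x_1, a_2 x_2)$ with $a_i > 0$. Because the scan covers just a subset of $\mathcal{T}_{\mathrm{cw}}(p)$, its minimum is only an upper bound on $\beta(p)$, not its value. The function names are hypothetical.

```python
def ratio(a1, a2, rho):
    """Ratio of integrals in Definition 2 for T(x) = (a1*x1, a2*x2).

    With psi(s) = s^2/2 and a zero-mean, unit-variance Gaussian pair:
      numerator   = E_p[(a1*X1 + a2*X2)^2] / 2 = (a1^2 + a2^2 + 2*rho*a1*a2) / 2
      denominator = same expectation under independent marginals
                  = (a1^2 + a2^2) / 2
    """
    num = 0.5 * (a1 * a1 + a2 * a2 + 2.0 * rho * a1 * a2)
    den = 0.5 * (a1 * a1 + a2 * a2)
    return num / den

def beta_upper_bound(rho, scales):
    """Minimum of the ratio over a grid of positive scalings (upper bound on beta)."""
    return min(ratio(a1, a2, rho) for a1 in scales for a2 in scales)

scales = [0.1 * k for k in range(1, 51)]  # a_i in {0.1, 0.2, ..., 5.0}
bound_neg = beta_upper_bound(-0.5, scales)
print(bound_neg)  # attained at a1 = a2, where the ratio equals 1 + rho
```

For negative correlation the bound $1 + \rho$ is strictly below one, which shows how negative dependence pulls the ratio down within this restricted family of maps.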
It is shown that $\beta(p)$ takes a common value in each fiber and that $\beta(p) \in [0, 1]$. Sufficient conditions for copositivity are discussed in Section 3. On the other hand, there is a positive density function that is not copositive [13].

The following theorem is our main result. The result for $\psi(x) = x^2/2$ is proved in [13]. In that case, the theorem is interpreted as a non-linear analogue of the diagonal scaling theorem on matrices established by [6].

Theorem 2 (Existence and uniqueness). Let $V$ satisfy Equation (4). Assume that $p_0 \in \mathcal{P}$ is copositive. Then there exists a unique Stein-type density in the $p_0$-fiber.

Proof. The uniqueness is a consequence of the displacement convexity used in the proof of Theorem 1, where strict convexity holds under the restriction (5). We now prove the existence along the lines of [13]. By Theorem 1, it is enough to show that the functional $E|_{\mathcal{F}_{p_0}}$ restricted to the $p_0$-fiber attains its minimum. For that purpose, we prove that $E|_{\mathcal{F}_{p_0}}$ is bounded from below and that each sublevel set $\{p \in \mathcal{F}_{p_0} \mid E(p) \le M\}$ is tight (refer to [7] for details). Note that $E$ itself is not bounded from below.

Let $p = T_\sharp p_0$ with $T \in \mathcal{T}_{\mathrm{cw}}(p_0)$. The assumption on copositivity of $p_0$ implies

$$\int V(x)\,p(x)\,dx \ge \beta \int V(x)\,p^\perp(x)\,dx,$$

where $\beta = \beta(p) = \beta(p_0) > 0$. Since the first term in the decomposition below is the Kullback-Leibler divergence $\mathrm{KL}(p, p^\perp) \ge 0$, we obtain

$$\begin{aligned}
E(p) &= \int p(x) \log \frac{p(x)}{p^\perp(x)}\,dx + \int p(x) \log p^\perp(x)\,dx + \int V(x)\,p(x)\,dx \\
&\ge \int p(x) \log p^\perp(x)\,dx + \beta \int V(x)\,p^\perp(x)\,dx \\
&= \sum_{i=1}^{d} \int p_i(x_i) \log p_i(x_i)\,dx_i + \beta \int V(x)\,p^\perp(x)\,dx.
\end{aligned}$$

Therefore the problem is essentially reduced to the independent case $p = p^\perp$. By using Jensen's inequality for $V(x) = \psi(\sum_i x_i)$, we have

$$\int V(x)\,p^\perp(x)\,dx \ge \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i$$

for each $i$, where $c = c(p) = \int x_j p_j(x_j)\,dx_j$ does not depend on the index $j$ due to the definition of $\mathcal{P}$. By combining these results, we obtain

$$E(p) \ge \sum_{i=1}^{d} \left\{ \int p_i(x_i) \log p_i(x_i)\,dx_i + \frac{\beta}{d} \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i \right\}.$$
Define a probability density

$$q(x) = \frac{e^{-\frac{\beta}{2d} \psi(x + (d-1)c)}}{A}, \qquad A = \int e^{-\frac{\beta}{2d} \psi(x)}\,dx < \infty,$$

to obtain

$$\begin{aligned}
&\int p_i(x_i) \log p_i(x_i)\,dx_i + \frac{\beta}{d} \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i \\
&\quad = \int p_i(x_i) \log \frac{p_i(x_i)}{q(x_i)}\,dx_i - \log A + \frac{\beta}{2d} \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i \\
&\quad \ge -\log A + \frac{\beta}{2d} \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i.
\end{aligned}$$

Then, putting $\alpha = -d \log A$, we have

$$E(p) \ge \alpha + \frac{\beta}{2d} \sum_{i=1}^{d} \int \psi(x_i + (d-1)c)\,p_i(x_i)\,dx_i \tag{6}$$

$$\ge \alpha + \frac{\beta}{2} \psi(dc) \quad \text{(Jensen's inequality)} \tag{7}$$

$$\ge \alpha. \tag{8}$$

By (8), $E|_{\mathcal{F}_{p_0}}$ is bounded from below. Define the sublevel set $\mathcal{P}_M = \{p \in \mathcal{F}_{p_0} \mid E(p) \le M\}$ for a fixed $M$. Equation (7) implies that, if $p \in \mathcal{P}_M$, then $\psi(dc)$ is bounded from above and therefore $c$ is contained in a bounded interval $[-C, C]$. Now a function defined by

$$\psi_*(x) = \min_{c \in [-C, C]} \psi(x + (d-1)c)$$

is coercive. In addition, the inequality (6) shows that $\int \psi_*(x_i)\,p_i(x_i)\,dx_i$ is bounded from above. Therefore we deduce that $\mathcal{P}_M$ is tight. This completes the proof. □

For an independent density $p(x) = \prod_{i=1}^{d} p_i(x_i)$, it is obvious that $\beta(p) = 1$ from the definition. Therefore we have the following corollary. This is interpreted as a variational Bayes method [1] except that $\int e^{-V(x)}\,dx = \infty$.

Corollary 1. Let $V$ satisfy (4). Then there is a unique independent Stein-type density.

3 Sufficient conditions for copositivity

In this section, we discuss sufficient conditions for copositivity (see Definition 2). The notion of positive dependence plays a central role.

A function $f : \mathbb{R}^d \to \mathbb{R}$ is called super-modular if for any $x, y \in \mathbb{R}^d$,

$$f(x \vee y) + f(x \wedge y) \ge f(x) + f(y).$$

Here $x \vee y$ and $x \wedge y$ are the coordinate-wise maximum and minimum, respectively. For smooth functions $f$, the super-modularity is equivalent to

$$\frac{\partial^2 f}{\partial x_i \partial x_j} \ge 0, \qquad i \ne j.$$

The following lemma is straightforward.

Lemma 2. Let $V(x) = \psi(x_1 + \cdots + x_d)$ with a convex function $\psi$. Then for any coordinate-wise transformation $T(x)$, the composite function $V(T(x))$ is super-modular.

Although there are a number of variants of positive dependence, we use only three of them. Refer to [11] for further information.
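Lemma 2 can be spot-checked numerically. The sketch below is an illustration under assumed choices that are not from the paper: $d = 2$, $\psi(s) = |s|$, and the increasing maps $T_1(x) = e^x$ and $T_2(x) = x^3 + x$. It samples random pairs of points and records the smallest value of $f(x \vee y) + f(x \wedge y) - f(x) - f(y)$, which super-modularity requires to be non-negative.

```python
import math
import random

def psi(s):
    """Convex potential (illustrative choice)."""
    return abs(s)

def t1(x):
    """Increasing map for the first coordinate (assumption)."""
    return math.exp(x)

def t2(x):
    """Increasing map for the second coordinate (assumption)."""
    return x ** 3 + x

def v_of_t(x):
    """Composite function V(T(x)) = psi(T1(x1) + T2(x2))."""
    return psi(t1(x[0]) + t2(x[1]))

def supermodular_gap(f, x, y):
    """f(x v y) + f(x ^ y) - f(x) - f(y); non-negative if f is super-modular."""
    join = tuple(max(a, b) for a, b in zip(x, y))
    meet = tuple(min(a, b) for a, b in zip(x, y))
    return f(join) + f(meet) - f(x) - f(y)

rng = random.Random(0)  # fixed seed for reproducibility

def random_point():
    return (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))

gaps = [supermodular_gap(v_of_t, random_point(), random_point())
        for _ in range(10000)]
min_gap = min(gaps)
print(min_gap)  # non-negative up to floating-point rounding
```

A random search of this kind can only fail to find a counterexample; it does not replace the proof.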
Definition 3 (Positive dependence). Let $p(x)$ be a probability density function on $\mathbb{R}^d$. Then
1. $p(x)$ is called MTP2 (multivariate totally positive of order 2) if $p(x \vee y)\,p(x \wedge y) \ge p(x)\,p(y)$ for all $x$ and $y$ in $\mathbb{R}^d$. In other words, $\log p(x)$ is super-modular.
2. $p(x)$ is said to be associated if $\int \phi(x)\psi(x)\,p(x)\,dx \ge \int \phi(x)\,p(x)\,dx \int \psi(x)\,p(x)\,dx$ for any increasing functions $\phi, \psi : \mathbb{R}^d \to \mathbb{R}$.
3. $p(x)$ is called PSMD (positive super-modular dependent) if $\int \phi(x)\,p(x)\,dx \ge \int \phi(x)\,p^\perp(x)\,dx$ for any super-modular function $\phi$. Recall that $p^\perp(x) = \prod_i p_i(x_i)$.

These variants of positive dependence have the following implications. The first implication is called the FKG inequality [5].

Lemma 3 ([5], [3]). MTP2 ⇒ associated ⇒ PSMD.

MTP2 is relatively easy to confirm, whereas association is easy to interpret. A Gaussian distribution is MTP2 (resp. PSMD) if and only if all the partial correlation coefficients (resp. all the correlation coefficients) are non-negative [10, 8]. Graphical models with the MTP2 property are discussed in [4]. PSMD meets our purpose as follows.

Theorem 3. Let $V(x)$ satisfy (4). If $p$ is PSMD, then $p$ is copositive.

Proof. For any $T \in \mathcal{T}_{\mathrm{cw}}(p)$, Lemma 2 implies that $V(T(x))$ is super-modular. Therefore if $p$ is PSMD, then

$$\int V(T(x))\,p(x)\,dx \ge \int V(T(x))\,p^\perp(x)\,dx,$$

which means $\beta(p) = 1$. □

We also provide two other sufficient conditions for copositivity. Each of them holds for any non-negative $V(x)$, and its proof is straightforward.

Lemma 4. If $p_0(x)$ is copositive and there exist constants $M, \delta > 0$ such that $\delta \le p(x)/p_0(x) \le M$ for all $x$, then $p(x)$ is also copositive.

Lemma 5. If there exists $\delta > 0$ such that $p(x)/p^\perp(x) \ge \delta$ for all $x$, then $p(x)$ is copositive. This condition holds if and only if the copula density (e.g. [9]) corresponding to $p(x)$ is greater than or equal to the constant $\delta$.

4 Examples of Stein-type densities

Let $V(x) = \psi(x_1 + \cdots + x_d)$.
First, by the integration-by-parts formula, Equation (3) is equivalent to

$$\left( \sum_i \partial_i p(x) \right) + \psi'(x_1 + \cdots + x_d)\,p(x) = r(x), \qquad \int_{\mathbb{R}^{d-1}} r(x)\,dx_{-i} = 0, \tag{9}$$

where $dx_{-i}$ denotes integration over all coordinates except $x_i$. When $r(x)$ is given, the partial differential equation (9) is solved by the characteristic curve method. In particular, if $r(x) = 0$, the general solution is

$$p(x) = \frac{1}{Z} e^{-\psi(x_1 + \cdots + x_d)}\,q(Q^\top x), \tag{10}$$

where $Z$ is the normalizing constant, $Q$ is a matrix such that $(\mathbf{1}/\sqrt{d}, Q)$ is an orthogonal matrix, with $\mathbf{1}$ the all-ones vector, and $q$ is an arbitrary $(d-1)$-dimensional density function. However, for given $p_0 \in \mathcal{P}$, it is generally difficult to find a Stein-type density that belongs to the $p_0$-fiber. The case for $\psi(x) = x^2/2$ is investigated in [13]. Here we give another example.

Example 1. Let $d = 2$ and $V(x_1, x_2) = |x_1 + x_2|$. Then the independent Stein-type density has the marginals

$$p_i(x_i) = \frac{1}{4\cosh^2(x_i/2)} = \frac{e^{x_i}}{(e^{x_i} + 1)^2}, \qquad i = 1, 2.$$

This is a logistic distribution. One can directly confirm (9).

5 Application to rating

Suppose that a $d$-dimensional density $p(x)$ denotes a distribution of students' marks on $d$ subjects. We want a rule to determine the general score of each student. An answer is given as follows. Fix a convex function $\psi$. Typically $\psi(x) = x^2/2$ or $\psi(x) = |x|$. As long as $p$ is copositive, Theorem 2 implies that there exists a coordinate-wise transformation $T$ such that $T_\sharp p$ is Stein-type. In particular, we obtain

$$\int p(x)\,f(x_i)\,\psi'(T_1(x_1) + \cdots + T_d(x_d))\,dx > 0 \tag{11}$$

for any increasing function $f$. Then we can use $T_1(x_1) + \cdots + T_d(x_d)$ as the general score of $x$. Refer to [12] for relevant information.

Example 2. Let $\psi(x) = |x|$. Consider a probability density function

$$p(x_1, x_2) = \frac{0.1}{4}\,e^{-|x_1 + 0.1 x_2|} \left\{ e^{-|x_1 - 0.1 x_2 - 1|} + e^{-|x_1 - 0.1 x_2 + 1|} \right\}.$$

A map $T(x_1, x_2) = (x_1, 0.1 x_2)$ attains the Stein-type density due to (10). The following contingency tables show that, under the law $p$, the signs of $x_1$ and $x_1 + x_2$ have negative correlation whereas those of $x_1$ and $x_1 + 0.1 x_2$ have positive correlation.
            x1 + x2 < 0    x1 + x2 > 0
x1 < 0         0.198          0.302
x1 > 0         0.302          0.198

            x1 + 0.1x2 < 0    x1 + 0.1x2 > 0
x1 < 0         0.342             0.158
x1 > 0         0.158             0.342

Indeed, (11) implies that the sign of $x_i - a$ for any $i$ and $a \in \mathbb{R}$ has positive correlation with the sign of the general score.

As referees pointed out, the general score depends on the choice of the potential $\psi$. A natural choice would be $\psi(x) = x^2/2$ because then the scores $x_1, \dots, x_d$ are transformed into normal quantiles if they are independent. On the other hand, if one is concerned with the "passing point" of a particular grading, the potential $\psi(x) = |x|$ seems preferable, as seen in Example 2.

Acknowledgements

The author is grateful to three anonymous referees for their constructive comments. This work was supported by JSPS KAKENHI Grant Numbers JP26108003 and JP17K00044.

References

1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Springer.
2. Chen, L. H. Y., Goldstein, L., and Shao, Q. (2011). Normal Approximation by Stein's Method, Springer.
3. Christofides, T. C. and Vaggelatou, E. (2004). A connection between supermodular ordering and positive/negative association, J. Multivariate Anal., 88, 138–151.
4. Fallat, S., Lauritzen, S., Sadeghi, K., Uhler, C., Wermuth, N., and Zwiernik, P. (2017). Total positivity in Markov structures, Ann. Statist., 45 (3), 1152–1184.
5. Fortuin, C. M., Kasteleyn, P. W., and Ginibre, J. (1971). Correlation inequalities on some partially ordered sets, Comm. Math. Phys., 22, 89–103.
6. Marshall, A. W. and Olkin, I. (1968). Scaling of matrices to achieve specified row and column sums, Numer. Math., 12, 83–90.
7. McCann, R. J. (1997). A convexity principle for interacting gases, Adv. Math., 128, 153–179.
8. Müller, A. and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks, Wiley.
9. Nelsen, R. B. (2006). An Introduction to Copulas, 2nd ed., Springer.
10. Rüschendorf, L. (1981). Characterization of dependence concepts in normal distributions, Ann. Inst. Statist. Math., 33, 347–359.
11. Rüschendorf, L. (2013). Mathematical Risk Analysis, Springer.
12. Sei, T. (2016). An objective general index for multivariate ordered data, J. Multivariate Anal., 147, 247–264.
13. Sei, T. (2017). Coordinate-wise transformation of probability distributions to achieve a Stein-type identity, Technical Report METR2017-04, Department of Mathematical Engineering and Information Physics, The University of Tokyo.
14. Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. on Math. Statist. and Prob., Vol. 2, 583–602.
15. Villani, C. (2003). Topics in Optimal Transportation, American Mathematical Society.