Weighted Closed Form Expressions Based on Escort Distributions for Rényi Entropy Rates of Markov Chains

07/11/2017
Publication GSI2017

Weighted Closed Form Expressions Based on Escort Distributions for Rényi Entropy Rates of Markov Chains

Philippe Regnault¹, Valérie Girardin², and Loïck Lhote³

¹ Laboratoire de Mathématiques de Reims, EA 4535, Université de Reims Champagne-Ardenne, BP 1039, 51687 Reims cedex 2, France. philippe.regnault@univ-reims.fr
² Laboratoire de mathématiques Nicolas Oresme, UMR CNRS 6139, ³ ENSICAEN, GREYC, UMR CNRS 6072, Université de Caen Normandie, BP 5186, 14032 Caen, France. valerie.girardin@unicaen.fr, loick.lhote@unicaen.fr

Abstract. For Markov chains with transition probabilities $p_{ij}$, the Shannon entropy rate is well known to be equal to the sum of the $-\sum_j p_{ij}\log p_{ij}$ weighted by the stationary distribution. This expression derives from the chain rule specific to Shannon entropy. For an ergodic Markov chain, the stationary distribution is the limit of the marginal distributions of the chain. Here, a weighted expression for the Rényi entropy rate is derived from a chain rule, involving escort distributions of the chain, specific to Rényi entropy. Precisely, up to the factor $1/(1-s)$, the rate is equal to the logarithm of a weighted sum of the $\sum_j p_{ij}^s$, where $s$ is the parameter of the Rényi entropy. The weighting distribution is the normalized left eigenvector associated with the maximum eigenvalue of the matrix $(p_{ij}^s)$. This distribution, which plays the role of the stationary distribution for Shannon entropy, is shown to be the limit of marginals of escort distributions of the chain.

1 Introduction

The concept of entropy $S(m)$ of a probability measure $m$ on a finite or denumerable set $E$ was originally introduced by Shannon, by setting
$$S(m) = -\sum_{i\in E} m_i \log m_i.$$
It constitutes a measure of information – or uncertainty – of the distribution $m$, in the sense that it is minimal when $m$ is a Dirac measure and, for a finite set $E$, maximal when $m$ is uniform. Beyond information theory, to which it gave birth, it is an essential tool in all scientific fields involving probability measures, such as signal processing, computer science, probability and statistics. A large toolbox has been developed around it; see, e.g., [3].

Due to the diversity of applications and of constraints specific to each domain, alternative measures of information have since been developed. The Rényi entropy functionals, depending on a parameter $s > 0$, are an important instance of such measures. They are defined by $R_1(m) = S(m)$ and
$$R_s(m) = \frac{1}{1-s}\,\log\Big(\sum_{i\in E} m_i^s\Big), \qquad s \neq 1.$$
A related toolbox has been developed too; see [13].

Further, the Rényi entropy of a random variable is the Rényi entropy of its distribution. Thus $R_s(X_0^n) = R_s(m_0^n)$ for any random vector $X_0^n = (X_0, \dots, X_{n-1})$ with distribution $m_0^n$. For a random sequence $X = (X_n)_{n\geq 0}$, the usual notion of averaging per time unit leads – when the limit exists – to the Rényi entropy rate
$$H_s(X) = \lim_{n\to\infty} \frac{1}{n}\, R_s(X_0^n).$$

The entropy rate $H = H_1$ associated with Shannon entropy was originally defined in [15] for an ergodic – homogeneous, aperiodic, irreducible – Markov chain with a finite state space $E$ as the sum of the entropies of the transition distributions $p_i = (p_{ij})_{j\in E}$ weighted by the probability of occurrence of each state $i \in E$ according to the stationary distribution $\pi$ of the chain, namely
$$H(X) = -\sum_{i\in E} \pi_i \sum_{j\in E} p_{ij} \log p_{ij}, \qquad (1)$$
where $p_{ij} = \mathbb{P}(X_n = j \mid X_{n-1} = i)$. This expression, which has a direct interpretation in terms of the dynamics of the chain, has allowed a complete toolbox to be developed around the Shannon entropy rate, including probabilistic and statistical results.
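As a concrete illustration of the weighted expression (1) (not taken from the paper), the following minimal sketch computes the Shannon entropy rate of a small chain; the transition matrix `P` is an arbitrary illustrative example, and the stationary distribution $\pi$ is obtained as the normalized left eigenvector of `P` for the eigenvalue 1.

```python
import numpy as np

# Arbitrary 3-state ergodic transition matrix (rows sum to 1); illustrative values only.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

def stationary_distribution(P):
    """Stationary distribution pi: normalized left eigenvector of P for the eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)      # left eigenvectors of P = right eigenvectors of P^T
    k = np.argmin(np.abs(eigvals - 1.0))       # eigenvalue closest to 1
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

def shannon_entropy_rate(P):
    """Weighted expression (1): H(X) = -sum_i pi_i sum_j p_ij log p_ij (in nats)."""
    pi = stationary_distribution(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)   # convention 0 log 0 = 0
    return float(-(pi @ plogp.sum(axis=1)))

print(shannon_entropy_rate(P))
```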
The convergence of $\frac{1}{n} S(X_0^n)$ to (1) is proven in [15] for ergodic finite chains. The proof, based on the chain rule specific to Shannon entropy, has since been extended to the denumerable case, and then under hypotheses weakened in many directions; see [6]. For ergodic Markov chains, the Shannon entropy rate is also known to be given by
$$H(X) = -\lambda'(1), \qquad (2)$$
where $\lambda(s)$ denotes the dominant eigenvalue of the perturbed matrix
$$P(s) = (p_{ij}^s)_{(i,j)\in E^2}, \qquad s > 0. \qquad (3)$$
The Rényi entropy rate for $s > 0$ of finite state space ergodic Markov chains is shown in [12] to be well defined and given by
$$H_s(X) = (1-s)^{-1} \log \lambda(s), \qquad s \neq 1. \qquad (4)$$
An attempt to extend this expression to denumerable Markov chains was made in [10], based on positive operator theory. Unfortunately, some necessary conditions on the involved operator are not checked by the authors, making their proof inaccurate. We refer to [5] for a correct statement and proof of (4) for denumerable Markov chains.

Still, no closed form weighted expression similar to (1) exists when $s \neq 1$. The aim of the present paper is to fill this gap. Precisely, we will show that, under technical assumptions to be specified,
$$H_s(X) = \frac{1}{1-s} \log\Big( \sum_{i\in E} u_i(s) \sum_{j\in E} p_{ij}^s \Big), \qquad (5)$$
where $u(s)$ denotes the normalized left eigenvector of $P(s)$ associated with the dominant eigenvalue $\lambda(s)$. The proof is based on a functional identity satisfied by Rényi entropy. This identity, strongly related to the escort distributions of the marginals of the chain, is formally close to the chain rule satisfied by Shannon entropy.

Escort distributions were initially introduced in [2]. The escort distributions of a probability distribution $m = (m_i)_{i\in E}$ on a finite or denumerable set $E$ are $m^{*s} = (m^{*s}_i)_{i\in E}$, for $s \in \mathbb{R}$, with
$$m^{*s}_i = \frac{m_i^s}{\sum_{j\in E} m_j^s}, \qquad i \in E,$$
whenever $\sum_{j\in E} m_j^s$ is finite. They provide a tool for zooming in on different parts of $m$, or for adapting $m$ to constraints through their ability to scan its structure. They also constitute the geodesics of information geometry with respect to an affine connection naturally induced by the Kullback-Leibler divergence; see [4], [1], [16], and also [8] for an application to hypothesis testing. The escort distributions of a random variable are the escort distributions of its distribution, in short its escorts. The chain rule for Rényi entropy is based on escorts. Further, we will show that the eigenvector $u(s)$ is closely linked to the asymptotic behavior of the escorts of the marginals of the Markov chain.

The paper is organized as follows. Since they constitute the keystone of the present paper, the chain rules for Shannon and Rényi entropies are recalled in Section 2. In Section 3, the asymptotic behavior of the sequence $(P(s)^n)_n$, together with the existence and uniqueness of $u(s)$, is considered. In Section 4, links between $u(s)$ and the asymptotic behavior of the escorts of the marginal distributions of the Markov chain are highlighted. Finally, the weighted expression (5) for the Rényi entropy rate is derived in Section 5.
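Before recalling the chain rules, here is a small numerical sketch, under assumed illustrative values (the matrix `P`, the distribution passed to `escort`, and the parameter `s` are arbitrary, not from the paper), of the objects just introduced: the escort $m^{*s}$ of a distribution, the perturbed matrix $P(s)$ of (3), and the eigenvalue expression (4) for the Rényi entropy rate.

```python
import numpy as np

def escort(m, s):
    """Escort distribution m^{*s}: m_i^s / sum_j m_j^s (defined when the denominator is finite)."""
    ms = np.asarray(m, dtype=float) ** s
    return ms / ms.sum()

def dominant_left_eigenpair(M):
    """Dominant eigenvalue lambda of M and its left eigenvector u, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(M.T)      # left eigenvectors of M
    k = np.argmax(eigvals.real)                # the Perron eigenvalue is real, positive and maximal
    u = np.abs(np.real(eigvecs[:, k]))         # the Perron eigenvector can be chosen positive
    return float(eigvals[k].real), u / u.sum()

def renyi_rate_eigenvalue(P, s):
    """Rényi entropy rate via the eigenvalue expression (4): (1-s)^{-1} log lambda(s)."""
    lam, _ = dominant_left_eigenpair(np.asarray(P, dtype=float) ** s)   # P(s) = (p_ij^s), eq. (3)
    return float(np.log(lam) / (1.0 - s))

# Illustrative values: a 3-state chain and s = 2 (collision entropy).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
print(escort([0.7, 0.2, 0.1], s=2.0))
print(renyi_rate_eigenvalue(P, s=2.0))
```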
2 Chain rules for Shannon and Rényi entropies

The convergence of the normalized marginal entropies $S(X_0^n)/n$ to (1) has been derived in [15, Theorem 5] from the so-called chain rule satisfied by Shannon entropy,
$$S(X, Y) = S(X) + S(Y \mid X), \qquad (6)$$
for any pair of random variables $X, Y$ taking values in finite or denumerable sets $E$ and $F$. Here $S(Y \mid X)$ denotes the so-called conditional entropy of $Y$ given $X$, i.e., the entropy of the distribution of $Y$ conditional on $X$, weighted by the distribution $m_X$ of $X$,
$$S(Y \mid X) = -\sum_{i\in E} m_X(i) \sum_{j\in F} \mathbb{P}(Y=j \mid X=i) \log \mathbb{P}(Y=j \mid X=i).$$

Weighting by the escort distributions of $X$ instead of the distribution of $X$ itself yields a sort of chain rule for Rényi entropy; see the axiomatics in [11]. Precisely, supposing that $\sum_{j\in E} \mathbb{P}(X=j)^s$ is finite,
$$R_s(X, Y) = R_s(X) + R_s(Y \mid X), \qquad (7)$$
with
$$R_s(Y \mid X) = \frac{1}{1-s} \log\Big( \sum_{i\in E} m^{*s}_X(i) \sum_{j\in F} \mathbb{P}(Y=j \mid X=i)^s \Big).$$
Therefore, adapting the original proof of the weighted expression of the Shannon entropy rate in order to obtain a similar expression for the Rényi entropy rate requires studying the escorts of the marginals of the Markov chain. The main tool for this purpose is the perturbation $P(s)$, given by (3), of the transition matrix $P = (p_{ij})_{(i,j)\in E^2}$.

3 The eigenvectors of the perturbed transition matrix

For finite ergodic Markov chains, the matrix $P(s)$ is positive, irreducible and aperiodic, so that the Perron-Frobenius theorem applies. A unique positive dominant – with maximal modulus – eigenvalue $\lambda(s)$ exists, with a spectral gap, namely
$$|\lambda(s)| > \sup\{|\lambda| : \lambda \text{ eigenvalue of } P(s),\ \lambda \neq \lambda(s)\}.$$
Left and right positive eigenvectors associated with $\lambda(s)$ for $P(s)$ exist, say $u(s)$ and $v(s)$, such that
$${}^t u(s) \cdot P(s) = \lambda(s)\, {}^t u(s) \quad \text{and} \quad P(s) \cdot v(s) = \lambda(s)\, v(s). \qquad (8)$$
Moreover, they can be normalized so that $\sum_{i\in E} u_i(s) = 1$ and $\sum_{i\in E} u_i(s) v_i(s) = 1$.

For denumerable ergodic Markov chains, a positive eigenvalue with maximal modulus $\lambda(s)$ again generally exists for $P(s)$. Still, a sequence of eigenvalues converging to $\lambda(s)$ may exist, hindering the existence of a spectral gap. Additional assumptions are thus required. Sufficient conditions for the operator $u \mapsto {}^t u \cdot P(s)$ to be compact on $\ell^1 = \{u = (u_i)_{i\in E} : \sum_{i\in E} |u_i| < \infty\}$, and hence for the existence of a spectral gap, are proposed in the following proposition, whose proof can be found in [5]. Moreover, the asymptotic behavior of $P(s)^n$ is also derived.

Proposition 1. Let $X = (X_n)_{n\in\mathbb{N}}$ be an ergodic Markov chain with finite or denumerable state space $E$, transition matrix $P = (p_{ij})_{(i,j)\in E^2}$ and initial distribution $\mu = (\mu_i)_{i\in E}$ satisfying the following assumptions:
C1. $\sup_{(i,j)\in E^2} p_{ij} < 1$;
C2. $\exists\, \sigma_0 < 1$ such that $\forall s > \sigma_0$, both $\sup_{i\in E} \sum_{j\in E} p_{ij}^s$ and $\sum_{i\in E} \mu_i^s$ are finite;
C3. $\forall \varepsilon > 0$ and $\forall s > \sigma_0$, $\exists A \subset E$ finite such that $\sup_{i\in E} \sum_{j\in E\setminus A} p_{ij}^s < \varepsilon$.
Then, for $s > \sigma_0$, the perturbed matrix $P(s)$ defined by (3) has a unique positive eigenvalue $\lambda(s)$ with maximum modulus and a spectral gap. Positive left and right eigenvectors $u(s)$ and $v(s)$ exist for $\lambda(s)$, and can be chosen such that $\sum_{i\in E} u_i(s) = 1$ and $\sum_{i\in E} u_i(s) v_i(s) = 1$. Moreover, for all $s > \sigma_0$, a positive real number $\rho(s) < 1$ exists such that
$$P(s)^n = \lambda(s)^n C(s) + O_{\ell^1}\big((\rho(s)\lambda(s))^n\big), \qquad n \in \mathbb{N}^*, \qquad (9)$$
where $C(s) = v(s)\cdot {}^t u(s)$.

Proposition 1 clearly induces the following property:
$$\sum_{i_0^n \in E^n} m_0^n(i_0^n)^s = {}^t \mu^s \cdot P(s)^{n-1} \cdot \mathbf{1} = \lambda(s)^{n-1} c(s) + O\big((\rho(s)\lambda(s))^{n-1}\big), \qquad n \in \mathbb{N}^*, \qquad (10)$$
where $\mu^s = (\mu_i^s)_{i\in E}$ and $c(s) = {}^t \mu^s \cdot C(s) \cdot \mathbf{1} = {}^t \mu^s \cdot v(s)$. This key property is called the quasi-power property in [7]. It is naturally satisfied for any finite ergodic Markov chain, since its transition matrix and initial distribution obviously satisfy conditions C1 to C3 with $\sigma_0 = -\infty$. In the denumerable case, condition C1 ensures mixing between the states of the chain, preventing some very probable paths from asymptotically capturing the chain. Condition C2 ensures that ${}^t u \cdot P(s) \in \ell^1$ for any $u \in \ell^1$. Condition C3 forbids the probability mass to escape to infinity.
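Since, as noted above, a finite ergodic chain satisfies C1–C3 with $\sigma_0 = -\infty$, Proposition 1 and the quasi-power property (10) can be illustrated numerically. The sketch below (the chain `P`, the initial distribution `mu` and the parameter `s` are assumed illustrative values, not from the paper) computes $u(s)$ and $v(s)$ with the normalizations $\sum_i u_i(s) = 1$ and $\sum_i u_i(s) v_i(s) = 1$, then compares ${}^t\mu^s \cdot P(s)^{n-1} \cdot \mathbf{1}$ with $\lambda(s)^{n-1} c(s)$ for growing $n$.

```python
import numpy as np

def normalized_eigenpairs(M):
    """Dominant eigenvalue of M with left/right eigenvectors normalized as in Proposition 1:
    sum_i u_i = 1 and sum_i u_i v_i = 1."""
    lam_l, U = np.linalg.eig(M.T)              # left eigenvectors of M
    lam_r, V = np.linalg.eig(M)                # right eigenvectors of M
    kl = np.argmax(lam_l.real)
    kr = np.argmax(lam_r.real)
    u = np.abs(np.real(U[:, kl]))
    u = u / u.sum()                            # sum_i u_i(s) = 1
    v = np.abs(np.real(V[:, kr]))
    v = v / (u @ v)                            # sum_i u_i(s) v_i(s) = 1
    return float(lam_l[kl].real), u, v

# Illustrative values only: a finite chain, an initial distribution and a Rényi parameter.
P  = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.2, 0.4]])
mu = np.array([0.2, 0.5, 0.3])
s  = 2.0

Ps = P ** s                                    # perturbed matrix P(s), eq. (3)
lam, u, v = normalized_eigenpairs(Ps)
c = (mu ** s) @ v                              # c(s) = t(mu^s) . v(s)

# Quasi-power property (10): t(mu^s) . P(s)^{n-1} . 1  ~  lambda(s)^{n-1} c(s).
for n in (2, 5, 10, 20):
    exact  = (mu ** s) @ np.linalg.matrix_power(Ps, n - 1) @ np.ones(3)
    approx = lam ** (n - 1) * c
    print(n, exact, approx, exact / approx)    # the ratio tends to 1 as n grows
```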
4 Escort distributions and left eigenvectors

To the best of our knowledge, escort distributions of random vectors or sequences have never been studied in the literature. Clearly, we do not aim here at studying them in a comprehensive way. We will establish a few properties linked to ergodic Markov chains, interesting in their own right, that will finally yield the weighted closed form expression of Rényi entropy rates.

First, the escort distribution of a random vector with independent components is equal to the product of the escort distributions of its components. In mathematical terms, for $n$ independent random variables $X_0^n = (X_0, \dots, X_{n-1})$,
$$m^{*s}_{X_0^n} = \bigotimes_{k=0}^{n-1} m^{*s}_{X_k}, \qquad (11)$$
for all $s > 0$ such that the escort distributions exist. If, in addition, the $X_k$ are independent and identically distributed (i.i.d.) with common distribution $m$, then $m^{*s}_{X_0^n} = (m^{*s})^{\otimes n}$. Such a random vector can be viewed as the first $n$ components of an i.i.d. sequence, and hence as a Markov chain $X$ whose transition matrix $P$ has all rows equal to $m$. Then, provided that $\sum_{i\in E} m_i^s$ is finite, its perturbation $P(s)$ has only two eigenvalues: $0$, and $\lambda(s) = \sum_{i\in E} m_i^s$ with multiplicity 1. An easy computation yields ${}^t m^{*s} \cdot P(s) = \lambda(s)\, {}^t m^{*s}$, meaning that the escort distribution $m^{*s}$ of $m$ is equal to the normalized left eigenvector $u(s)$ of $P(s)$.

For dependent random sequences, in particular Markov chains, (11) is no longer true. Still, the escorts of the marginals of the chain converge to $u(s)$.

Proposition 2. Let the assumptions of Proposition 1 be satisfied. Let $u(s)$ be defined by (8). For any $n \in \mathbb{N}^*$, let $m_{X_0^n}$ denote the distribution of $X_0^n$ and $m^{*s}_{X_0^n}$ its escorts. Then the $n$-th marginal of $m^{*s}_{X_0^n}$ converges to $u(s)$ as $n$ goes to infinity.

Proof. Thanks to C2, $\sum_{i_0^n \in E^n} m_{X_0^n}(i_0^n)^s = {}^t\mu^s \cdot P(s)^{n-1} \cdot \mathbf{1}$ is finite, and the escort $m^{*s}_{X_0^n}$ is well defined for $s > \sigma_0$. Let $\mathrm{margin}_n(m^{*s}_{X_0^n})$ denote its $n$-th marginal distribution. We have
$${}^t \mathrm{margin}_n(m^{*s}_{X_0^n}) = \frac{{}^t\mu^s \cdot P(s)^{n-1}}{{}^t\mu^s \cdot P(s)^{n-1} \cdot \mathbf{1}}.$$
Thanks to (9), ${}^t\mu^s \cdot P(s)^{n-1} = \lambda(s)^{n-1} c(s)\, {}^t u(s) + O_{\ell^1}\big((\rho(s)\lambda(s))^{n-1}\big)$, where $c(s) = {}^t\mu^s \cdot v(s)$. Together with (10), this implies that
$${}^t \mathrm{margin}_n(m^{*s}_{X_0^n}) = \frac{\lambda(s)^{n-1} c(s)\, {}^t u(s) + O_{\ell^1}\big((\rho(s)\lambda(s))^{n-1}\big)}{\lambda(s)^{n-1} c(s) + O\big((\rho(s)\lambda(s))^{n-1}\big)} = {}^t u(s) + o_{\ell^1}(1),$$
and the result follows. □

Note that no clear relationship exists between the limit $u(s)$ of the $n$-th marginal of the escort distribution and the escort distribution of the asymptotic distribution of the chain. First of all, they are not equal; hence, the limit and escort operations do not commute in general.
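Proposition 2 can also be checked by brute force on a small chain. The sketch below (with the same kind of assumed illustrative values for `P`, `mu` and `s` as before) enumerates all paths of length $n$, forms the escort of the joint distribution of $X_0^n$, marginalizes on the last coordinate, and compares the result with $u(s)$; the gap shrinks as $n$ grows.

```python
import numpy as np
from itertools import product

# Illustrative values only (same kind of toy chain as before).
P  = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.2, 0.4]])
mu = np.array([0.2, 0.5, 0.3])
s  = 2.0
states = range(len(mu))

def u_of_s(P, s):
    """Normalized left Perron eigenvector u(s) of the perturbed matrix P(s) = (p_ij^s)."""
    eigvals, eigvecs = np.linalg.eig((P ** s).T)
    k = np.argmax(eigvals.real)
    u = np.abs(np.real(eigvecs[:, k]))
    return u / u.sum()

def escort_last_marginal(P, mu, s, n):
    """n-th marginal of the escort of the distribution of X_0^n = (X_0, ..., X_{n-1}),
    computed by enumerating all |E|^n paths (only feasible for small chains and small n)."""
    last = np.zeros(len(mu))
    total = 0.0
    for path in product(states, repeat=n):
        p = mu[path[0]] * np.prod([P[path[k - 1], path[k]] for k in range(1, n)])
        total += p ** s
        last[path[-1]] += p ** s               # escort mass carried by the last state of the path
    return last / total

u = u_of_s(P, s)
for n in (2, 4, 8):
    gap = np.max(np.abs(escort_last_marginal(P, mu, s, n) - u))
    print(n, gap)                              # the gap shrinks as n grows (Proposition 2)
```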
5 Weighted expression for Rényi entropy rates

We can now state and prove the main result of the paper.

Theorem 1. Under the assumptions of Proposition 1, the Rényi entropy rate of the ergodic Markov chain $X$ is
$$H_s(X) = \frac{1}{1-s} \log\Big( \sum_{i\in E} u_i(s) \sum_{j\in E} p_{ij}^s \Big), \qquad s > \sigma_0,\ s \neq 1, \qquad (12)$$
$$H_1(X) = -\sum_{i\in E} \pi_i \sum_{j\in E} p_{ij} \log p_{ij}. \qquad (13)$$

Proof. Let $s > \sigma_0$, with $s \neq 1$. Applying (7) recursively yields
$$R_s(X_0^n) = R_s(X_{n-1} \mid X_0^{n-1}) + R_s(X_0^{n-1}) = R_s(X_0) + \sum_{k=1}^{n-1} R_s(X_k \mid X_0^k). \qquad (14)$$
Iterated use of the Markov property yields
$$R_s(X_k \mid X_0^k) = \frac{1}{1-s} \log\Big( \sum_{i_0^k \in E^k} m^{*s}_{X_0^k}(i_0^k) \sum_{i_k\in E} p_{i_{k-1} i_k}^s \Big) = \frac{1}{1-s} \log\Big( \sum_{i\in E} \Big( \sum_{i_0^{k-1} \in E^{k-1}} m^{*s}_{X_0^k}(i_0^{k-1}, i) \Big) \sum_{j\in E} p_{ij}^s \Big).$$
Since the sum $\sum_{i_0^{k-1}} m^{*s}_{X_0^k}(i_0^{k-1}, i)$ is the $k$-th marginal distribution of $m^{*s}_{X_0^k}$, Proposition 2 applies to show that it converges to $u_i(s)$. Finally,
$$R_s(X_k \mid X_0^k) \xrightarrow[k\to\infty]{} \frac{1}{1-s} \log\Big( \sum_{i\in E} u_i(s) \sum_{j\in E} p_{ij}^s \Big),$$
so that (14) yields
$$\frac{1}{n}\, R_s(X_0^n) \xrightarrow[n\to\infty]{} \frac{1}{1-s} \log\Big( \sum_{i\in E} u_i(s) \sum_{j\in E} p_{ij}^s \Big).$$
For $s = 1$, $R_1(X_0^n) = S(X_0^n)$, and hence (13) is classical. □

An obvious analogy exists between (12) and (13). The left eigenvector $u(s)$ plays in (12) the role of the stationary distribution $\pi$ of the chain in (13); if $s = 1$, then $u(1) = \pi$ is the stationary distribution of $X$. The stationary distribution $\pi$ is the asymptotic limit of the marginal distributions of the chain, while Proposition 2 provides an analogous asymptotic interpretation of $u(s)$. Further, thanks to the normalization, $\sum_{i\in E} u_i'(s) = 0$, so that (12) tends to (13) as $s$ tends to 1, proving that $H_s$ is a continuous function of $s > 0$.

Note that if $X$ is an i.i.d. sequence with common distribution $m$, then all $n$-th marginals of $m^{*s}_{X_0^n}$ are equal to $m^{*s} = u(s)$; see (11). Thus $R_s(X_n \mid X_0^{n-1})$ does not depend on $n$ and equals
$$R_s(X_n \mid X_0^{n-1}) = \frac{1}{1-s} \log\Big( \sum_{i\in E} m^{*s}_i \sum_{j\in E} m_j^s \Big) = R_s(m).$$
This expresses the well-known additivity of Rényi entropy for independent variables. Therefore, the Rényi entropy rate of any i.i.d. sequence reduces to the Rényi entropy of the common distribution, namely $H_s(X) = R_s(m)$.
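Theorem 1 can be illustrated numerically as well: with an arbitrary finite chain (illustrative values only, not from the paper), the weighted expression (12) coincides with the eigenvalue expression (4), since $\sum_i u_i(s)\sum_j p_{ij}^s = {}^t u(s)\cdot P(s)\cdot\mathbf{1} = \lambda(s)$, and (12) approaches the Shannon rate (13) as $s \to 1$.

```python
import numpy as np

# Illustrative chain only; any ergodic finite transition matrix would do.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

def left_perron(M):
    """Dominant eigenvalue and normalized (summing to 1) left Perron eigenvector."""
    eigvals, eigvecs = np.linalg.eig(M.T)
    k = np.argmax(eigvals.real)
    u = np.abs(np.real(eigvecs[:, k]))
    return float(eigvals[k].real), u / u.sum()

def renyi_rate_weighted(P, s):
    """Theorem 1, eq. (12): (1-s)^{-1} log( sum_i u_i(s) sum_j p_ij^s )."""
    _, u = left_perron(P ** s)
    return float(np.log(u @ (P ** s).sum(axis=1)) / (1.0 - s))

def renyi_rate_eigenvalue(P, s):
    """Eq. (4): (1-s)^{-1} log lambda(s)."""
    lam, _ = left_perron(P ** s)
    return float(np.log(lam) / (1.0 - s))

def shannon_rate(P):
    """Eq. (13): u(1) is the stationary distribution pi (this P has no zero entries)."""
    _, pi = left_perron(P)
    return float(-(pi @ (P * np.log(P)).sum(axis=1)))

for s in (0.5, 2.0, 3.0):
    print(s, renyi_rate_weighted(P, s), renyi_rate_eigenvalue(P, s))   # (12) and (4) agree
print(renyi_rate_weighted(P, 1.0 + 1e-6), shannon_rate(P))             # (12) -> (13) as s -> 1
```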
6 General entropy functionals

A huge class of entropy functionals is defined by setting
$$S_{h,\varphi}(m) = h\Big( \sum_{i\in E} \varphi(m_i) \Big).$$
In all cases of interest, $\varphi : [0,1] \to \mathbb{R}_+$ and $h : \mathbb{R}_+ \to \mathbb{R}$, with either $\varphi$ concave and $h$ increasing, or $\varphi$ convex and $h$ decreasing. Moreover, $h$ and $\varphi$ behave such that $S_{h,\varphi}$ takes only non-negative values, and $h(z)$ is finite for all $z \in \mathbb{R}_+$. See [14], and also [5]. Rényi entropy is the unique $(h,\varphi)$-entropy satisfying (7), just as Shannon entropy is the unique one satisfying the chain rule (6); see [11]. To the best of our knowledge, no tractable functional identity of this sort exists for the others, making this approach hard to extend.

An alternative consists in studying the asymptotic behavior of the entropy $S_{h,\varphi}(m_0^n)$ of the marginals of the chain through analytic combinatorics techniques, and then deriving the related entropy rate. In this way, [5] proves that the entropy rates $S_{h,\varphi}(X) = \lim_{n\to\infty} \frac{1}{n} S_{h,\varphi}(X_0^n)$ of ergodic Markov chains $X$ are degenerate (equal to 0 or $\pm\infty$) for all $(h,\varphi)$-entropy functionals except Shannon and Rényi. Indeed, the normalizing term $n$ appears to be badly adapted to the other $(h,\varphi)$-entropy functionals. Through rescaling by a suitable normalizing sequence, [7] defines non-degenerate entropy rates $S_{h,\varphi,r}(X) = \lim_{n\to\infty} \frac{1}{r_n} S_{h,\varphi}(X_0^n)$. Closed form expressions are obtained for ergodic Markov chains as functions of $\lambda(s)$ or $\lambda'(s)$, thus extending (2) and (4). In the continuation of [7], weighted closed form expressions similar to (12) and (13) are to be derived for all $(h,\varphi)$-entropy functionals in [9]. Obtaining weighted closed form expressions for divergences between Markov chains would be a relevant further step, as well as the statement of properties in information geometry linked to generalized entropy and divergence rates.

References

1. Amari S., Nagaoka H., Methods of Information Geometry. Oxford University Press (2000).
2. Beck C., Schlögl F., Thermodynamics of Chaotic Systems. Cambridge University Press, Cambridge (1993).
3. Cover T. M., Thomas J. A., Elements of Information Theory. Wiley Series in Telecommunications, New York (1991).
4. Csiszár I., I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, pp. 141–158 (1975).
5. Ciuperca G., Girardin V., Lhote L., Computation of generalized entropy rates. Application and estimation for countable Markov chains. IEEE Trans. Inf. Th. 57, pp. 4026–4034 (2011).
6. Girardin V., On the different extensions of the ergodic theorem of information theory. In: Recent Advances in Applied Probability, R. Baeza-Yates, J. Glaz, H. Gzyl, J. Hüsler & J. L. Palacios (Eds), Springer-Verlag, San Francisco, pp. 163–179 (2005).
7. Girardin V., Lhote L., Rescaling entropy and divergence rates. IEEE Trans. Inf. Th. 61, pp. 5868–5882 (2015).
8. Girardin V., Regnault P., Escort distributions minimizing the Kullback–Leibler divergence for a large deviations principle and tests of entropy level. Ann. Inst. Stat. Math. 68, pp. 439–468 (2016).
9. Girardin V., Lhote L., Regnault P., Different closed-form expressions for generalized entropy rates of Markov chains. Work in progress.
10. Golshani L., Pasha E., Yari G., Some properties of Rényi entropy and Rényi entropy rate. Information Sciences 179, pp. 2426–2433 (2009).
11. Jizba P., Arimitsu T., Generalized statistics: yet another generalization. Physica A 340, pp. 110–116 (2004).
12. Rached Z., Alajaji F., Campbell L. L., Rényi's divergence and entropy rates for finite alphabet Markov sources. IEEE Trans. Inf. Th. 47, pp. 1553–1561 (2001).
13. Rényi A., Probability Theory. North Holland, Amsterdam (1970).
14. Salicrú M., Menéndez M. L., Morales D., Pardo L., Asymptotic distribution of (h, φ)-entropies. Comm. Stat. Th. Meth. 22, pp. 2015–2031 (1993).
15. Shannon C., A mathematical theory of communication. Bell Syst. Techn. J. 27, pp. 379–423, 623–656 (1948).
16. Sgarro A., An informational divergence geometry for stochastic matrices. Calcolo 15, pp. 41–49 (1978).