## Process comparison combining signal power ratio and Jeffrey's divergence between unit-power signals

07/11/2017 **Publication** GSI2017

**OAI:** oai:www.see.asso.fr:17410:22590

*Eric Grivel, Leo Legrand*

**DOI:** 10.23723/17410/22590

**Publisher:** SEE, 2018 (record created and updated Thu 8 Mar 2018, submitted Tue 13 Nov 2018; format: application/pdf)

Eric Grivel (1) and Leo Legrand (1, 2)

(1) Bordeaux University - INP Bordeaux ENSEIRB-MATMECA - IMS - UMR CNRS 5218, Talence, France
(2) Thales Systèmes Aéroportés, Site de Merignac, France

**Abstract.** Jeffrey's divergence (JD), the symmetric Kullback-Leibler (KL) divergence, has been used in a wide range of applications. In recent works, it was shown that the JD between probability density functions of $k$ successive samples of autoregressive (AR) and/or moving average (MA) processes can tend to a stationary regime when the number $k$ of variates increases. The asymptotic JD increment, which is the difference between two JDs computed for $k$ and $k-1$ successive variates and which tends to a finite constant value when $k$ increases, can hence be useful to compare the random processes. However, interpreting the value of the asymptotic JD increment is not an easy task as it depends on too many parameters, i.e. the AR/MA parameters and the driving-process variances. In this paper, we propose to compute the asymptotic JD increment between the processes that have been normalized so that their powers are equal to 1. Analyzing the resulting JD on the one hand and the ratio between the original signal powers on the other hand makes the interpretation easier. Examples are provided to illustrate the relevance of this way of using the JD.

## 1 Introduction

Comparing stochastic processes such as autoregressive and/or moving-average (AR, MA or ARMA) processes can be useful in many applications, from speech processing to biomedical applications, from change detection to process classification. Several approaches exist. The cepstral distance can be useful, for instance for EEG classification [1]. Power spectrum comparison is also of interest: several distances have been proposed, such as the COSH distance, the log spectral distance or the Itakura-Saito distance.
They have been widely used, especially in speech processing [2].

Alternative solutions consist in measuring the dissimilarity between the probability density functions (pdf) of the data. In this case, divergences or distances can be computed, such as the Hellinger distance and the Bhattacharyya divergence. The reader may refer to [3] for a comparative study between them. Metrics in information geometry can also be seen as dissimilarity measures. The reader may refer to [4], where the information geometry of AR-model covariance matrices is studied. Alternatively, the practitioner often selects the Kullback-Leibler (KL) divergence [5]. This is probably due to the fact that a 'direct' and 'simple' expression of the KL can be deduced for Gaussian pdfs.

- When comparing autoregressive (AR) processes, a recursive way to deduce the Jeffrey's divergence (JD), which is the symmetric Kullback-Leibler (KL) divergence, has been proposed in [6]. The approach has also been extended to classify more than two AR processes in various subsets [7].
- When dealing with the comparison between moving-average (MA) processes, we recently gave the exact analytical expressions of the JD between 1st-order MA processes, for any MA parameter and any number of samples. Moreover, the MA processes can be real or complex, noise-free or disturbed by additive white Gaussian noises. For this purpose, we used the analytical expression of each element of the tridiagonal-correlation-matrix inverse [8].
- Comparing AR and MA processes using the JD has been presented in [9] by taking advantage of the expression of the correlation-matrix inverse [10].

In the above cases, some links with the Rao distance [11] have been drawn when it was possible. It was for instance confirmed that the square of the Rao distance was approximately twice the value of the JD, except when a 1st-order MA process whose zero is close to the unit circle in the z-plane is considered.

We also concluded that the JD tends to a stationary regime because the JD increment, i.e. the difference between two JDs computed for $k$ and $k-1$ successive variates, tends to a constant value when $k$ increases. This phenomenon was observed in most cases, except for ARMA processes whose zeros are on the unit circle in the z-plane. In addition, we showed that the asymptotic JD increment was sufficient to compare these random processes. The latter depends on the process parameters. In practice, the comparison then reduces to a single step: given the AR/MA parameters, the asymptotic JD increment is evaluated. The computational cost is reduced and the result no longer depends on the choice of $k$.

Nevertheless, interpreting the value of the asymptotic JD increment is not necessarily easy, especially because it is a function of all the parameters defining the processes under study. For this reason, we propose to operate differently in this paper. Instead of comparing two stochastic processes by using the asymptotic JD increment directly, we suggest computing the asymptotic JD increment between the processes that have been normalized so that their powers are equal to 1, and looking simultaneously at the ratio between the powers of the two original processes. We will illustrate the benefit of this approach in the following cases: when a 1st-order AR process is compared with a white noise, and then when two real 1st-order AR processes are compared. Note that due to the lack of space, other illustrations based on MA processes cannot be presented.

This paper is organized as follows: in Section 2, we briefly recall the definitions and properties of the AR processes. In Section 3, the expression of the JD is introduced and our contributions are presented. Illustrations are then proposed in Section 4.

In the following, $I_k$ is the identity matrix of size $k$ and $\mathrm{Tr}$ the trace of a matrix. The superscript $T$ denotes the transpose.
$x_{k_1:k_2} = (x_{k_1}, \ldots, x_{k_2})$ is the collection of samples from time $k_1$ to $k_2$, and $l = 1, 2$ is the label of the process under study.

## 2 Brief presentation of the AR processes

Let us consider the $l$th autoregressive (AR) process of order $p$. Its $n$th sample, denoted as $x_n^l$, is defined as follows:

$$x_n^l = -\sum_{i=1}^{p} a_i^l x_{n-i}^l + u_n^l \tag{1}$$

where the driving process $u_n^l$ is white, Gaussian and zero-mean with variance $\sigma_{u,l}^2$. These wide-sense stationary processes are characterized by their correlation functions, denoted as $r_{AR,l,\tau}$, with $\tau$ the lag. In addition, the Toeplitz covariance matrices are denoted as $Q_{AR,l,k}$ for $l = 1, 2$. Note that for 1st-order AR processes, the correlation function satisfies:

$$r_{AR,l,\tau} = \frac{(-a_1^l)^{|\tau|}}{1-(a_1^l)^2}\,\sigma_{u,l}^2$$

In addition, given (1), the AR processes can be seen as the outputs of filters whose inputs are zero-mean white sequences with unit variance and whose transfer functions are $H_l(z) = \sigma_{u,l}\dfrac{1}{\prod_{i=1}^{p}(1-p_i^l z^{-1})}$, where $\{p_i^l\}_{i=1,\ldots,p}$ are the poles. The inverse filters are then defined by the transfer functions $H_l^{-1}(z)$.

## 3 Jeffrey divergence analysis

The Kullback-Leibler (KL) divergence between the joint pdfs of $k$ successive values of two random processes, denoted as $p_1(x_{1:k})$ and $p_2(x_{1:k})$, can be evaluated to study the dissimilarities between the processes [12]:

$$KL_k^{(1,2)} = \int_{x_{1:k}} p_1(x_{1:k}) \ln\left(\frac{p_1(x_{1:k})}{p_2(x_{1:k})}\right) dx_{1:k} \tag{2}$$

If the real processes are Gaussian with means $\mu_{1,k}$ and $\mu_{2,k}$ and covariance matrices $Q_{1,k}$ and $Q_{2,k}$, it can be shown that the KL satisfies [13]:

$$KL_k^{(1,2)} = \frac{1}{2}\left[\mathrm{Tr}(Q_{2,k}^{-1}Q_{1,k}) - k - \ln\frac{\det Q_{1,k}}{\det Q_{2,k}} + (\mu_{2,k}-\mu_{1,k})^T Q_{2,k}^{-1}(\mu_{2,k}-\mu_{1,k})\right] \tag{3}$$

As the KL is not symmetric, the Jeffrey's divergence (JD) can be preferred:

$$JD_k^{(1,2)} = KL_k^{(1,2)} + KL_k^{(2,1)} \tag{4}$$

For zero-mean processes, the log-determinant terms of $KL_k^{(1,2)}$ and $KL_k^{(2,1)}$ cancel out. Given (3) and (4), the JD can hence be expressed as follows:

$$JD_k^{(1,2)} = -k + \frac{1}{2}\left[\mathrm{Tr}(Q_{2,k}^{-1}Q_{1,k}) + \mathrm{Tr}(Q_{1,k}^{-1}Q_{2,k})\right] \tag{5}$$

In the following, our purpose is to study the behavior of the JD when $k$ increases.
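As a numerical illustration of (3) and (5), the following minimal sketch (assuming NumPy; the function names and the parameter values are illustrative and do not come from the paper) builds the Toeplitz covariance matrices of two 1st-order AR processes from the correlation function given in Section 2. It checks that the trace expression (5) coincides with the sum of the two directed KL divergences computed from (3), and evaluates the difference between the JDs obtained for $k$ and $k-1$ samples.

```python
import numpy as np

def ar1_cov(a, var_u, k):
    """k x k Toeplitz covariance of x_n = -a x_{n-1} + u_n with Var(u_n) = var_u."""
    lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    return var_u * (-a) ** lags / (1.0 - a ** 2)

def kl_gauss_zero_mean(q1, q2):
    """KL divergence (3) between N(0, q1) and N(0, q2)."""
    k = q1.shape[0]
    _, logdet1 = np.linalg.slogdet(q1)
    _, logdet2 = np.linalg.slogdet(q2)
    return 0.5 * (np.trace(np.linalg.solve(q2, q1)) - k - (logdet1 - logdet2))

def jd_from_kl(q1, q2):
    """Sum of the two directed KL divergences."""
    return kl_gauss_zero_mean(q1, q2) + kl_gauss_zero_mean(q2, q1)

def jd_from_traces(q1, q2):
    """Trace expression (5), valid for zero-mean Gaussian processes."""
    k = q1.shape[0]
    t21 = np.trace(np.linalg.solve(q2, q1))  # Tr(Q2^{-1} Q1)
    t12 = np.trace(np.linalg.solve(q1, q2))  # Tr(Q1^{-1} Q2)
    return -k + 0.5 * (t21 + t12)

# Illustrative (hypothetical) parameters for the two AR(1) processes.
a1, var1 = 0.5, 1.0
a2, var2 = -0.3, 2.0

jd = {k: jd_from_traces(ar1_cov(a1, var1, k), ar1_cov(a2, var2, k))
      for k in (20, 21, 40, 41)}
increment_20 = jd[21] - jd[20]
increment_40 = jd[41] - jd[40]
print("JD increment around k = 20:", increment_20)
print("JD increment around k = 40:", increment_40)
```

For these two AR(1) processes the two printed increments should agree up to rounding errors, which is the stationary regime described in the introduction. Solving linear systems with `np.linalg.solve` rather than forming explicit inverses is a design choice made here for numerical robustness only.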
To this end, let us introduce the asymptotic JD increment defined by:

$$\Delta JD^{(1,2)} = -1 + \frac{1}{2}\left[\Delta T_{2,1} + \Delta T_{1,2}\right] \tag{6}$$

with the asymptotic increments $\Delta T_{l,l'}$ of the trace, for $(l, l') = (1, 2)$ or $(2, 1)$:

$$\Delta T_{l,l'} = \lim_{k\to+\infty}\left[\mathrm{Tr}(Q_{l,k}^{-1}Q_{l',k}) - \mathrm{Tr}(Q_{l,k-1}^{-1}Q_{l',k-1})\right] \tag{7}$$

In the next section, we analyze the relevance of the JD when comparing processes that are normalized so that their powers are equal to 1.

## 4 Applications

### 4.1 JD between a 1st-order AR process and a white noise

By taking advantage of the inverse of the correlation matrix of a 1st-order AR process [10], the asymptotic JD increment between a 1st-order AR process and a white noise can be expressed as follows:

$$\Delta JD^{(AR,WN)} = -1 + \frac{1}{2}\left[\Delta T_{WN,AR} + \Delta T_{AR,WN}\right] \tag{8}$$

where

$$\Delta T_{WN,AR} = \frac{\sigma_{u,1}^2}{\sigma_{u,2}^2}\,\frac{1}{1-(a_1^1)^2} \quad\text{and}\quad \Delta T_{AR,WN} = \frac{\sigma_{u,2}^2}{\sigma_{u,1}^2}\left(1+(a_1^1)^2\right) \tag{9}$$

Let us see whether one can easily analyze the sensitivity of the JD with respect to the process parameters. To this end, (8) is first rewritten as follows:

$$\Delta JD^{(AR,WN)} = -1 + \frac{1}{2}\left[\frac{1}{R_u}\,\frac{1}{1-(a_1^1)^2} + R_u\left(1+(a_1^1)^2\right)\right] \tag{10}$$

with $R_u = \dfrac{\sigma_{u,2}^2}{\sigma_{u,1}^2}$. Then, let us express $\sigma_{u,2}^2$ as $\sigma_{u,1}^2 + \delta\sigma_u^2$ and introduce the relative difference between the noise variances $\Delta\sigma_u^2 = \dfrac{\delta\sigma_u^2}{\sigma_{u,1}^2}$, so that $R_u = 1 + \Delta\sigma_u^2$. This leads to:

$$\Delta JD^{(AR,WN)} = -1 + \frac{1}{2}\left[\frac{1}{1+\Delta\sigma_u^2}\,\frac{1}{1-(a_1^1)^2} + \left(1+\Delta\sigma_u^2\right)\left(1+(a_1^1)^2\right)\right] \tag{11}$$

*Fig. 1: Asymptotic JD increment as a function of the AR parameter and $\Delta\sigma_u^2$*

In Fig. 1, the asymptotic JD increment between an AR process and a white noise is presented as a function of the AR parameter $a_1^1$ and the relative difference between the noise variances $\Delta\sigma_u^2$. The latter varies in the interval $]-1, 20]$ with a step equal to 0.03. Only positive values of the AR parameter $a_1^1$ are considered because $\Delta JD^{(AR,WN)}$ is an even function of the AR parameter in this case. In addition, the AR parameter varies with a step equal to 0.01. Therefore, this illustration makes it possible to present a large set of situations that could happen.

Let us now give some comments about Fig. 1. When the AR parameter is equal to zero, $\Delta JD^{(AR,WN)}$ is equal to 0 only when $\Delta\sigma_u^2$ is equal to 0. Indeed, this amounts to comparing two zero-mean white noises with the same variance. Then, assuming that $\Delta\sigma_u^2$ is still equal to zero, $\Delta JD^{(AR,WN)}$ is all the higher as the power spectral density (PSD) of the AR process is spikier, i.e. as the modulus of the AR pole tends to 1. When $\Delta\sigma_u^2$ increases or decreases, the phenomenon remains the same, although the range of the values taken by $\Delta JD^{(AR,WN)}$ is different. From a theoretical point of view, this is confirmed by calculating the derivative of $\Delta JD^{(AR,WN)}$ with respect to the AR parameter: from (10), $\dfrac{\partial \Delta JD^{(AR,WN)}}{\partial a_1^1} = a_1^1\left[\dfrac{1}{R_u\left(1-(a_1^1)^2\right)^2} + R_u\right]$, which has the sign of $a_1^1$ for any $R_u > 0$.

One can also calculate the derivative of $\Delta JD^{(AR,WN)}$ with respect to $\Delta\sigma_u^2$. In this case, one can notice that the minimum value of $\Delta JD^{(AR,WN)}$ is obtained when $\Delta\sigma_u^2 = \sqrt{\dfrac{1}{1-(a_1^1)^4}} - 1$ and is equal to $\dfrac{\sqrt{1-(a_1^1)^4}}{1-(a_1^1)^2} - 1$.

As a conclusion, when looking at Fig. 1, $\Delta JD^{(AR,WN)}$ takes into account the differences between all the process parameters, but it is not easy to know whether the value that is obtained is due to differences in terms of signal magnitudes and/or spectral shapes. One value of $\Delta JD^{(AR,WN)}$ is related to several situations.

Instead of using this criterion on the processes themselves, we suggest considering the ratio between the powers of the random processes on the one hand and, on the other hand, the asymptotic increment of the JD between the processes that have been normalized respectively by the square roots of their powers, i.e. $\dfrac{\sigma_{u,1}}{\sqrt{1-(a_1^1)^2}}$ and $\sigma_{u,2}$. It should be noted that in practical cases, the signal powers can be easily estimated from the data. With this normalization, $\Delta T_{WN,AR}$ is divided by the power of the AR process $\dfrac{\sigma_{u,1}^2}{1-(a_1^1)^2}$ and multiplied by the power of the white noise $\sigma_{u,2}^2$. Similarly, $\Delta T_{AR,WN}$ is divided by the power of the white noise $\sigma_{u,2}^2$ and multiplied by the AR-process power $\dfrac{\sigma_{u,1}^2}{1-(a_1^1)^2}$.
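The behavior of (11) discussed above can be checked numerically. The sketch below (assuming NumPy; the function names, the grid and the value of the AR parameter are illustrative choices, not the settings used for Fig. 1) compares a grid search over $\Delta\sigma_u^2$ with the stated location and value of the minimum. It also shows that two white noises with different variances can yield the same increment as an AR process compared with a white noise of equal driving variance, which is the ambiguity mentioned in the text.

```python
import numpy as np

def delta_jd_ar_wn(a, rel_diff):
    """Asymptotic JD increment (11) between an AR(1) process and a white noise."""
    ru = 1.0 + rel_diff  # R_u = sigma_{u,2}^2 / sigma_{u,1}^2
    return -1.0 + 0.5 * (1.0 / (ru * (1.0 - a ** 2)) + ru * (1.0 + a ** 2))

def predicted_minimum(a):
    """Location and value of the minimum over rel_diff, as stated in the text."""
    rel_diff_star = np.sqrt(1.0 / (1.0 - a ** 4)) - 1.0
    value_star = np.sqrt(1.0 - a ** 4) / (1.0 - a ** 2) - 1.0
    return rel_diff_star, value_star

def ratios_for_white_noises(target):
    """R_u values for which a = 0 gives the increment `target` (target >= 0).

    With a = 0, (10) reads -1 + (R_u + 1/R_u) / 2 = target, i.e. a quadratic in R_u.
    """
    s = 2.0 * (target + 1.0)
    disc = np.sqrt(s ** 2 - 4.0)
    return (s - disc) / 2.0, (s + disc) / 2.0

# Grid search over the relative variance difference (hypothetical grid).
a = 0.8
grid = np.arange(-0.99, 20.0, 1e-3)
values = delta_jd_ar_wn(a, grid)
i_min = np.argmin(values)
rel_diff_star, value_star = predicted_minimum(a)
print("grid minimum at", grid[i_min], "predicted at", rel_diff_star)
print("grid minimum value", values[i_min], "predicted value", value_star)

# Same increment, different situations: a spectral-shape difference only
# (a = 0.5, equal driving variances) versus a power difference only (a = 0).
target = delta_jd_ar_wn(0.5, 0.0)
print("R_u values giving the same increment with a = 0:", ratios_for_white_noises(target))
```

The last line makes the interpretation issue concrete: the same value of the increment is reached by processes that differ in spectral shape only and by processes that differ in power only.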
Therefore, given (8), it can easily be shown that the asymptotic increment of the JD between the normalized AR process (nAR) and the normalized white noise (nWN) is equal to:

$$\Delta JD^{(nAR,nWN)} = -1 + \frac{1}{2}\left(1 + \frac{1+(a_1^1)^2}{1-(a_1^1)^2}\right) \tag{12}$$

When $a_1^1$ is equal to 0, both processes have a flat PSD and the asymptotic increment of the JD between the unit-power processes is equal to 0. Meanwhile, one can easily compare the powers of both non-normalized processes. When the AR-parameter modulus increases, the PSD of the AR process becomes spikier and spikier whereas the PSD of the white noise is flat. In this case, the asymptotic JD increment becomes larger and larger. This is also illustrated in Fig. 2.

*Fig. 2: Asymptotic JD increment between a unit-power AR process and a unit-power white noise as a function of the AR parameter*

By removing the influence of the noise variances from the asymptotic JD increment, it is easier to give an interpretation related to the spectral shapes of the processes. Meanwhile, one can compare the powers of both processes.

### 4.2 JD between two 1st-order AR processes

In [6], we suggested using a recursive way to deduce the JD between two 1st-order AR processes. This leads to:

$$\Delta JD^{(AR_1,AR_2)} = A + B \tag{13}$$

with

$$\begin{cases} A = -1 + \dfrac{1}{2}\left(R_u + \dfrac{1}{R_u}\right) \\[2ex] B = \dfrac{(a_1^2-a_1^1)^2}{2}\left[\dfrac{1}{1-(a_1^1)^2}\,\dfrac{1}{R_u} + \dfrac{1}{1-(a_1^2)^2}\,R_u\right] \end{cases} \tag{14}$$

By reorganizing the terms in (14), one has:

$$\begin{aligned} \Delta JD^{(AR_1,AR_2)} &= -1 + \frac{1}{2}\left[\Delta T_{AR_1,AR_2} + \Delta T_{AR_2,AR_1}\right] \\ &= -1 + \frac{1}{2}\left(R_u\,\frac{1-2a_1^1a_1^2+(a_1^1)^2}{1-(a_1^2)^2} + \frac{1}{R_u}\,\frac{1-2a_1^1a_1^2+(a_1^2)^2}{1-(a_1^1)^2}\right) \end{aligned} \tag{15}$$

where

$$\Delta T_{AR_1,AR_2} = R_u\,\frac{1-2a_1^1a_1^2+(a_1^1)^2}{1-(a_1^2)^2} \quad\text{and}\quad \Delta T_{AR_2,AR_1} = \frac{1}{R_u}\,\frac{1-2a_1^1a_1^2+(a_1^2)^2}{1-(a_1^1)^2} \tag{16}$$

It should be noted that when $R_u = 1$, $\Delta JD^{(AR_1,AR_2)}$ is a symmetric function of the AR parameters:

$$\Delta JD^{(AR_1,AR_2)} = \frac{(a_1^2-a_1^1)^2}{2}\left(\frac{1}{1-(a_1^2)^2} + \frac{1}{1-(a_1^1)^2}\right) \tag{17}$$

In Fig. 3, $\Delta JD^{(AR_1,AR_2)}$ is presented as a function of the AR parameters $a_1^1$ and $a_1^2$ for two different cases: $R_u = 1$ and $R_u = \frac{3}{2}$.
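The equivalence between (13)-(14) and (15)-(16) can be cross-checked numerically. The sketch below (assuming NumPy; the function names, the choice $\sigma_{u,1}^2 = 1$ and the parameter values are illustrative assumptions) evaluates both closed forms and compares them with the increment obtained directly from the trace expression (5) applied to Toeplitz covariance matrices.

```python
import numpy as np

def delta_jd_a_plus_b(a1, a2, ru):
    """Increment written as A + B, following (13)-(14)."""
    a_term = -1.0 + 0.5 * (ru + 1.0 / ru)
    b_term = 0.5 * (a2 - a1) ** 2 * (1.0 / ((1.0 - a1 ** 2) * ru)
                                     + ru / (1.0 - a2 ** 2))
    return a_term + b_term

def delta_jd_traces(a1, a2, ru):
    """Same increment written with the two trace increments of (16)."""
    t12 = ru * (1.0 - 2.0 * a1 * a2 + a1 ** 2) / (1.0 - a2 ** 2)
    t21 = (1.0 - 2.0 * a1 * a2 + a2 ** 2) / (ru * (1.0 - a1 ** 2))
    return -1.0 + 0.5 * (t12 + t21)

def ar1_cov(a, var_u, k):
    """k x k Toeplitz covariance of x_n = -a x_{n-1} + u_n with Var(u_n) = var_u."""
    lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    return var_u * (-a) ** lags / (1.0 - a ** 2)

def jd_k(q1, q2):
    """JD between zero-mean Gaussian vectors, trace expression (5)."""
    k = q1.shape[0]
    return -k + 0.5 * (np.trace(np.linalg.solve(q2, q1))
                       + np.trace(np.linalg.solve(q1, q2)))

def numerical_increment(a1, a2, ru, k=30):
    """JD_k - JD_{k-1}, with sigma_{u,1}^2 = 1 and sigma_{u,2}^2 = ru (assumption)."""
    jd_prev = jd_k(ar1_cov(a1, 1.0, k - 1), ar1_cov(a2, ru, k - 1))
    jd_curr = jd_k(ar1_cov(a1, 1.0, k), ar1_cov(a2, ru, k))
    return jd_curr - jd_prev

for a1, a2, ru in [(0.5, -0.3, 1.0), (0.5, -0.3, 1.5), (-0.7, 0.2, 0.4)]:
    print(a1, a2, ru,
          delta_jd_a_plus_b(a1, a2, ru),
          delta_jd_traces(a1, a2, ru),
          numerical_increment(a1, a2, ru))
```

For each parameter triple, the three printed values should agree up to rounding errors. Swapping `a1` and `a2` leaves the result unchanged when `ru` equals 1, in line with (17), but not in general.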
Note that the AR parameters vary in the interval $]-1, 1[$ with a small step equal to 0.03. When $R_u = 1$, $\Delta JD^{(AR_1,AR_2)}$ is a symmetric function with respect to the AR parameters of both processes. Nevertheless, when $R_u \neq 1$, this is no longer true.

*Fig. 3: Asymptotic JD increment between two AR processes as a function of the AR parameters, where $R_u = 1$ and $R_u = 3/2$*

To ease the interpretation, let us now normalize both processes respectively by $\dfrac{\sigma_{u,1}}{\sqrt{1-(a_1^1)^2}}$ and $\dfrac{\sigma_{u,2}}{\sqrt{1-(a_1^2)^2}}$ so that the process powers become equal to 1. Using (15), it can easily be shown that the asymptotic increment of the JD between the normalized AR processes becomes equal to:

$$\begin{aligned} \Delta JD^{(nAR_1,nAR_2)} &= -1 + \frac{1}{2}\left(\frac{1-2a_1^1a_1^2+(a_1^1)^2}{1-(a_1^1)^2} + \frac{1-2a_1^1a_1^2+(a_1^2)^2}{1-(a_1^2)^2}\right) \\ &= (a_1^1-a_1^2)\left[\frac{a_1^1}{1-(a_1^1)^2} - \frac{a_1^2}{1-(a_1^2)^2}\right] \end{aligned} \tag{18}$$

As depicted in Fig. 4 and according to (18), $\Delta JD^{(nAR_1,nAR_2)}$ is equal to 0 when the AR parameters are the same. In addition, it is a symmetric function with respect to the AR parameters of both processes. In that case, looking at the power ratio at the same time is a way to clearly see that the processes have the same spectral shape but that their powers are not the same. This could not be pointed out by only looking at the JD between non-normalized AR processes.

*Fig. 4: Asymptotic JD increment between two unit-power AR processes as a function of the AR parameters for any $R_u$*

## 5 Conclusions

Interpreting the value of the asymptotic JD increment is not necessarily straightforward because the influences of the process parameters are mixed. To make the interpretation easier, two criteria should rather be taken into account: the process power ratio and the asymptotic increment of the JD between the processes that have first been normalized so that their powers are equal to 1.

## References

1. K. Assaleh, H. Al-Nashash, and N. Thakor, "Spectral subtraction and cepstral distance for enhancing EEG entropy," IEEE Proc. of the Engineering in Medicine and Biology, vol. 3, pp. 2751–2754, 2005.
2. W. Bobillet, R. Diversi, E. Grivel, R. Guidorzi, M. Najim, and U. Soverini, "Speech enhancement combining optimal smoothing and errors-in-variables identification of noisy AR processes," IEEE Trans. on Signal Processing, vol. 55, pp. 5564–5578, December 2007.
3. K. Abou-Moustafa and F. P. Ferrie, "A note on metric properties for some divergence measures: the Gaussian case," JMLR Workshop and Conference Proceedings, vol. 25, pp. 1–15, 2012.
4. P. Formont, J. Ovarlez, F. Pascal, and G. Vasile, "On the extension of the product model in POLSAR processing for unsupervised classification using information geometry of covariance matrices," IEEE International Geoscience and Remote Sensing Symposium, pp. 1361–1364, 2011.
5. R. Murthy, I. Pavlidis, and P. Tsiamyrtzis, "Touchless monitoring of breathing function," IEEE EMBS, pp. 1196–1199, 2004.
6. C. Magnant, A. Giremus, and E. Grivel, "On computing Jeffrey's divergence between time-varying autoregressive models," IEEE Signal Processing Letters, vol. 22, issue 7, pp. 915–919, 2014.
7. C. Magnant, A. Giremus, and E. Grivel, "Jeffreys divergence between state models: Application to target tracking using multiple models," EUSIPCO, pp. 1–5, 2013.
8. W.-C. Yueh, "Explicit inverses of several tridiagonal matrices," Applied Mathematics E-Notes, pp. 74–83, 2006.
9. L. Legrand and E. Grivel, "Jeffrey's divergence between moving-average and autoregressive models," IEEE ICASSP, 2017.
10. B. Cernuschi-Frias, "A derivation of the Gohberg-Semencul relation [signal analysis]," IEEE Trans. on Signal Processing, vol. 39, issue 1, pp. 190–192, 1991.
11. C. Rao, "Information and the accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81–89, 1945.
12. S. Kullback and R. A. Leibler, "On Information and Sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
13. C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. MIT Press, 2006.