On the Error Exponent of a Random Tensor with Orthonormal Factor Matrices

07/11/2017
Publication GSI2017
DOI: 10.23723/17410/22340

On the Error Exponent of a Random Tensor with Orthonormal Factor Matrices

Collection

application/pdf On the Error Exponent of a Random Tensor with Orthonormal Factor Matrices Rémy Boyer, Frank Nielsen
Détails de l'article
contenu protégé  Document accessible sous conditions - vous devez vous connecter ou vous enregistrer pour accéder à ou acquérir ce document.
- Accès libre pour les ayants-droit

In signal processing, the detection error probability of a random quantity is a fundamental and often dicult problem. In this work, we assume that we observe under the alternative hypothesis a noisy
tensor admitting a Tucker decomposition parametrized by a set of orthonormal factor matrices and a random core tensor of interest with fixed multilinear ranks. The detection of the random entries of the core tensor is hard to study since an analytic expression of the error probability is not tractable. To cope with this diculty, the Cherno Upper Bound (CUB) on the error probability is studied for this tensor-based detection problem. The tightest CUB is obtained for the minimal error exponent value, denoted by s*, that requires a costly numerical optimization algorithm. An alternative strategy to upper bound the error probability is to consider the Bhattacharyya Upper Bound (BUB) by prescribing s* = 1/2. In this case, the costly numerical optimization step is avoided but no guarantee exists on the tightness optimality of the BUB. In this work, a simple analytical expression of s? is provided with respect to the Signal to Noise Ratio (SNR). Associated to a compact expression of the CUB, an easily tractable expression of the tightest CUB is provided and studied. A main conclusion of this work is that the BUB is the tightest bound at low SNRs but this property is no longer true at higher SNRs.
On the Error Exponent of a Random Tensor with Orthonormal Factor Matrices


On the Error Exponent of a Random Tensor with Orthonormal Factor Matrices

Rémy Boyer (1) and Frank Nielsen (2)
(1) University of Paris-Sud, L2S, Department of Signals and Statistics, France. remy.boyer@l2s.centralesupelec.fr
(2) École Polytechnique, LIX, France. Frank.Nielsen@acm.org

This research was partially supported by Labex DigiCosme (project ANR-11-LABEX-0045-DIGICOSME) operated by ANR as part of the program "Investissement d'Avenir" Idex Paris-Saclay (ANR-11-IDEX-0003-02).

Abstract. In signal processing, the detection error probability of a random quantity is a fundamental and often difficult problem. In this work, we assume that we observe under the alternative hypothesis a noisy tensor admitting a Tucker decomposition parametrized by a set of orthonormal factor matrices and a random core tensor of interest with fixed multilinear ranks. The detection of the random entries of the core tensor is hard to study since an analytic expression of the error probability is not tractable. To cope with this difficulty, the Chernoff Upper Bound (CUB) on the error probability is studied for this tensor-based detection problem. The tightest CUB is obtained for the minimal error exponent value, denoted by $s^\star$, which requires a costly numerical optimization algorithm. An alternative strategy to upper bound the error probability is to consider the Bhattacharyya Upper Bound (BUB), obtained by prescribing $s^\star = 1/2$. In this case the costly numerical optimization step is avoided, but there is no guarantee on the tightness of the BUB. In this work, a simple analytical expression of $s^\star$ is provided as a function of the Signal to Noise Ratio (SNR). Combined with a compact expression of the CUB, an easily tractable expression of the tightest CUB is provided and studied. A main conclusion of this work is that the BUB is the tightest bound at low SNRs, but this property no longer holds at higher SNRs.

1 Introduction

The theory of tensor decompositions is an important research topic (see for instance [1, 2]). Tensor decompositions are useful to extract relevant information, confined to a low-dimensional subspace, from a massive volume of measurements, while reducing the computational cost. In contexts where the measurements are naturally modeled according to more than two axes of variation, i.e., as tensors, the problem of obtaining a low-rank approximation faces a number of practical and fundamental difficulties. Indeed, even if some aspects of tensor algebra can be considered mature, several "obvious" algebraic concepts from the matrix case, such as decomposition uniqueness, rank, or the notions of singular values and eigenvalues, remain active and challenging research areas [3]. The Tucker decomposition [4] and the HOSVD (Higher-Order SVD) [5] are two popular decompositions that serve as alternatives to the Canonical Polyadic decomposition [6]. In this setting, the notion of tensor rank is no longer relevant and an alternative rank definition is used. Specifically, it is standard to use the multilinear ranks, defined as the set of strictly positive integers $\{R_1, R_2, R_3\}$ where $R_p$ is the usual rank (in the matrix sense) of the $p$-th mode unfolding matrix. The practical construction of the HOSVD is non-iterative and optimal, in the sense of the Eckart-Young theorem, at each mode level. This approach is attractive because it can be computed in real time [7] or adaptively [8]. Unfortunately, the fixed multilinear-rank tensor approximation obtained by this procedure is generally suboptimal in the Frobenius-norm sense [5]. In other words, there exists no generalization of the Eckart-Young theorem for tensors of order strictly greater than two.
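To make the mode-wise construction concrete, here is a minimal numpy sketch of a truncated HOSVD (one SVD per unfolding); the helper names `unfold` and `hosvd` are ours, not from the paper, and the (3,3,3)-rank, 4x4x4 example mirrors the setting used later in the experiments.

```python
import numpy as np

def unfold(T, mode):
    """Mode-p unfolding: put axis `mode` first, flatten the remaining axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: one SVD per unfolding (non-iterative). Each factor is
    Eckart-Young optimal for its own mode, but the joint low-multilinear-rank
    approximation is in general suboptimal in the Frobenius norm."""
    factors = [np.linalg.svd(unfold(T, p), full_matrices=False)[0][:, :r]
               for p, r in enumerate(ranks)]
    core = T
    for p, Phi in enumerate(factors):   # core = T x1 Phi1^T x2 Phi2^T x3 Phi3^T
        core = np.moveaxis(np.tensordot(Phi.T, core, axes=(1, p)), 0, p)
    return core, factors

# Example: multilinear-rank-(3, 3, 3) truncation of a random 4 x 4 x 4 tensor.
T = np.random.randn(4, 4, 4)
S, (Phi1, Phi2, Phi3) = hosvd(T, (3, 3, 3))
print(S.shape, Phi1.shape)              # (3, 3, 3) (4, 3)
```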
Despite this theoretical singularity, we focus on the detection performance of a tensor with a given multilinear rank, following the Tucker model with orthonormal factor matrices, which leads to the HOSVD. It is important to note that detection theory for tensors is an under-studied research topic. To the best of our knowledge, only the publication [9] tackles this problem, in the context of multidimensional RADAR data detection. A major difference with that publication is that its analysis is dedicated to the performance of a low-rank detector after matched filtering.

More specifically, the goal is to decompose a 3-order tensor $\mathcal{X}$ of size $N_1 \times N_2 \times N_3$ into a core tensor $\mathcal{S}$ of size $R_1 \times R_2 \times R_3$ and three rank-$R_p$ orthonormal factor matrices $\{\Phi_1, \Phi_2, \Phi_3\}$, each of size $N_p \times R_p$ with $R_p < N_p$ for all $p$. For zero-mean independent Gaussian core and noise tensors, a key discriminative parameter is the Signal to Noise Ratio, defined by $\mathrm{SNR} = \sigma_s^2/\sigma^2$, where $\sigma_s^2$ and $\sigma^2$ are the variances of the vectorized core and noise tensors, respectively. The binary hypothesis test is described by the null hypothesis $H_0 : \mathrm{SNR} = 0$ (i.e., only noise is present) and the alternative hypothesis $H_1 : \mathrm{SNR} \neq 0$ (i.e., there exists a signal of interest). First note that the exact derivation of the error probability is not analytically tractable, even in the matrix case [10]. To tackle this problem, we adopt an information-geometric characterization of the detection performance [11, 12].

2 Chernoff information framework

2.1 The Bayes detection theory

Let $\Pr(H_i)$ be the a priori hypothesis probability, with $\Pr(H_0) + \Pr(H_1) = 1$. Let $p_i(y) = p(y|H_i)$ and $\Pr(H_i|y)$ be the $i$-th conditional and posterior probabilities, respectively. The Bayes detection rule chooses the hypothesis $H_i$ associated with the largest posterior probability $\Pr(H_i|y)$. Introduce the indicator hypothesis function $\phi(y) \sim \mathrm{Bernou}(\alpha)$, where $\mathrm{Bernou}(\alpha)$ stands for the Bernoulli distribution with success probability $\alpha = \Pr(\phi(y) = 1) = \Pr(H_1)$. The function $\phi$ is defined on $\mathcal{X} \to \{0, 1\}$, where $\mathcal{X}$ is the data set, which enjoys the decomposition $\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$ with $\mathcal{X}_0 = \{y : \phi(y) = 0\} = \mathcal{X} \setminus \mathcal{X}_1$ and

$$\mathcal{X}_1 = \{y : \phi(y) = 1\} = \left\{ y : \Omega(y) = \log \frac{\Pr(H_1|y)}{\Pr(H_0|y)} > 0 \right\} = \left\{ y : \Lambda(y) = \log \frac{p_1(y)}{p_0(y)} > \log \tau \right\},$$

in which $\tau = \frac{1-\alpha}{\alpha}$, $\Omega(y)$ is the log posterior-odds ratio and $\Lambda(y)$ is the log-likelihood ratio. The average error probability is defined as

$$P_e^{(N)} = \mathbb{E}\{\Pr(\mathrm{Error}|y)\}, \qquad (1)$$

with

$$\Pr(\mathrm{Error}|y) = \begin{cases} \Pr(H_0|y) & \text{if } y \in \mathcal{X}_1, \\ \Pr(H_1|y) & \text{if } y \in \mathcal{X}_0. \end{cases}$$

2.2 Chernoff Upper Bound (CUB)

Using the fact that $\min\{a, b\} \le a^s b^{1-s}$ with $s \in (0, 1)$ and $a, b > 0$ in eq. (1), the minimal error probability is upper bounded as follows:

$$P_e^{(N)} \le \frac{\alpha}{\tau^s} \int_{\mathcal{X}} p_0(y)^{1-s}\, p_1(y)^s \, dy = \frac{\alpha}{\tau^s} \exp[-\mu_s], \qquad (2)$$

where the Chernoff $s$-divergence is defined according to $\mu_s = -\log M_{\Lambda(y)|H_0}(s)$, with $M_{\Lambda(y)|H_0}(s)$ the Moment Generating Function (mgf) of the log-likelihood ratio under the null hypothesis. The Chernoff Upper Bound (CUB) on the error probability is given by

$$P_e^{(N)} \le \frac{\alpha}{\tau^s} \exp[-\mu_s] = \alpha \exp[-r_s], \qquad (3)$$

where $r_s = \mu_s + s \log \tau$ is the exponential decay rate of the CUB. Assume that an optimal value of $s$, denoted by $s^\star \in (0, 1)$, exists; then the tightest CUB verifies

$$P_e^{(N)} \le \alpha \exp[-r_{s^\star}] < \alpha \exp[-r_s].$$

The above condition is equivalent to maximizing the exponential decay rate, i.e.,

$$s^\star = \arg\max_{s \in (0,1)} r_s. \qquad (4)$$

Finally, using eq. (3) and eq. (4), we obtain the tightest Chernoff Upper Bound. The Bhattacharyya Upper Bound (BUB) is obtained from eq. (3) by fixing $s = 1/2$ instead of solving eq. (4). The two bounds verify the following inequality relation:

$$P_e^{(N)} \le \alpha \exp[-r_{s^\star}] \le \alpha \exp[-r_{1/2}].$$

Note that in many encountered problems the two hypotheses are assumed to be equi-probable, i.e., $\alpha = 1/2$ and $\tau = 1$. In this scenario the exponential decay rate is given by the Chernoff $s$-divergence, since $r_s = \mu_s$, and the tightest CUB is

$$P_e^{(N)} \le \frac{1}{2} \exp[-\mu_{s^\star}].$$
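As a sanity check of eqs. (2)-(4), the following illustrative sketch (a toy setup of ours, not from the paper) evaluates the Chernoff s-divergence by quadrature for a scalar test N(0, 1) versus N(0, 1 + SNR) with equi-probable hypotheses, then maximizes the decay rate numerically:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

snr = 10.0                                   # illustrative value
p0 = norm(scale=1.0).pdf                     # H0: N(0, 1)
p1 = norm(scale=np.sqrt(1.0 + snr)).pdf      # H1: N(0, 1 + SNR)

def mu(s):
    """Chernoff s-divergence: minus the log-mgf of the LLR under H0."""
    val, _ = quad(lambda y: p0(y) ** (1 - s) * p1(y) ** s, -np.inf, np.inf)
    return -np.log(val)

# Equi-probable case (alpha = 1/2, tau = 1): r_s = mu_s, so eq. (4)
# amounts to maximizing mu_s over (0, 1).
s_star = minimize_scalar(lambda s: -mu(s), bounds=(1e-6, 1 - 1e-6),
                         method="bounded").x
print(f"s* = {s_star:.3f}")
print(f"tightest CUB = {0.5 * np.exp(-mu(s_star)):.4f}")
print(f"BUB (s = 1/2) = {0.5 * np.exp(-mu(0.5)):.4f}")
```

For SNR = 10 this yields s* close to 0.68, already noticeably away from the Bhattacharyya choice s = 1/2.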
3 Tensor detection with orthonormal factors

3.1 Binary hypothesis test formulation for random tensors

Tucker model with orthonormal factors. Assume that the multidimensional measurements form a noisy 3-order tensor of size $N_1 \times N_2 \times N_3$ given by $\mathcal{Y} = \mathcal{X} + \mathcal{N}$, where $\mathcal{N}$ is the $N_1 \times N_2 \times N_3$ noise tensor whose entries are centered i.i.d. Gaussian, i.e., $[\mathcal{N}]_{n_1,n_2,n_3} \sim \mathcal{N}(0, \sigma^2)$, and

$$\mathcal{X} = \mathcal{S} \times_1 \Phi_1 \times_2 \Phi_2 \times_3 \Phi_3 \qquad (5)$$

is the $N_1 \times N_2 \times N_3$ "data" tensor following a Tucker model of $(R_1, R_2, R_3)$-multilinear rank. The matrices $\{\Phi_1, \Phi_2, \Phi_3\}$ are the three orthonormal factors, each of size $N_p \times R_p$ with $N_p > R_p$. These factors are for instance involved in the Higher-Order SVD (HOSVD) [5], with $\Phi_p^T \Phi_p = I_{R_p}$ and $\Pi_p = \Phi_p \Phi_p^T$ the $N_p \times N_p$ orthogonal projector onto the range space of $\Phi_p$. The $R_1 \times R_2 \times R_3$ core tensor is given by $\mathcal{S} = \mathcal{X} \times_1 \Phi_1^T \times_2 \Phi_2^T \times_3 \Phi_3^T$.

Formulating the detection test. We assume that each entry of the core tensor is centered i.i.d. Gaussian, i.e., $[\mathcal{S}]_{r_1,r_2,r_3} \sim \mathcal{N}(0, \sigma_s^2)$. Let $Y_n$ be the $n$-th frontal $N_1 \times N_2$ slab of the 3-order tensor $\mathcal{Y}$; the vectorized tensor is defined according to

$$y = \left[ (\mathrm{vec}\, Y_1)^T, \ldots, (\mathrm{vec}\, Y_{N_3})^T \right]^T = x + n \in \mathbb{R}^{N \times 1},$$

where $N = N_1 N_2 N_3$, $n$ is the vectorization of the noise tensor $\mathcal{N}$, and $x = \Phi s$ with $s$ the vectorization of the core tensor $\mathcal{S}$ and $\Phi = \Phi_3 \otimes \Phi_2 \otimes \Phi_1$ an $N \times R$ structured matrix with $R = R_1 R_2 R_3$. In this framework, the associated equi-probable binary hypothesis test for the detection of the random signal $s$ is

$$\begin{cases} H_0 : y \,|\, \Phi, \sigma^2 \sim \mathcal{N}\big(0, \Sigma_0 = \sigma^2 I_N\big), \\ H_1 : y \,|\, \Phi, \sigma^2 \sim \mathcal{N}\big(0, \Sigma_1 = \sigma^2 (\mathrm{SNR} \cdot \Pi + I_N)\big), \end{cases}$$

where $\mathrm{SNR} = \sigma_s^2/\sigma^2$ is the signal to noise ratio and $\Pi = \Pi_3 \otimes \Pi_2 \otimes \Pi_1$ is an orthogonal projector. The performance of the detector of interest is quite often difficult to determine analytically [10]. As a consequence, we adopt the CUB methodology to upper bound it.

Geometry of the expected log-likelihood ratio. Consider $p(y|\hat{H}) = \mathcal{N}(0, \Sigma)$ associated with the estimated hypothesis $\hat{H}$. The expected log-likelihood ratio over $y|\hat{H}$ is given by

$$\mathbb{E}_{y|\hat{H}}\, \Lambda(y) = \int_{\mathcal{X}} p(y|\hat{H}) \log \frac{p_1(y)}{p_0(y)} \, dy = \int_{\mathcal{X}} p(y|\hat{H}) \log \left[ \frac{p(y|\hat{H})}{p_0(y)} \left( \frac{p(y|\hat{H})}{p_1(y)} \right)^{-1} \right] dy = \mathrm{KL}(\hat{H}, H_0) - \mathrm{KL}(\hat{H}, H_1),$$

where the Kullback-Leibler pseudo-distances are

$$\mathrm{KL}(\hat{H}, H_0) = \int_{\mathcal{X}} p(y|\hat{H}) \log \frac{p(y|\hat{H})}{p_0(y)} \, dy, \qquad \mathrm{KL}(\hat{H}, H_1) = \int_{\mathcal{X}} p(y|\hat{H}) \log \frac{p(y|\hat{H})}{p_1(y)} \, dy.$$

The corresponding data space for hypothesis $H_1$ is $\mathcal{X}_1 = \{y : \Lambda(y) > \tau_0\}$ with

$$\Lambda(y) = y^T \big(\Sigma_0^{-1} - \Sigma_1^{-1}\big) y = \frac{1}{\sigma^2}\, y^T \Phi \left( \Phi^T \Phi + \mathrm{SNR}^{-1} \cdot I \right)^{-1} \Phi^T y,$$

$$\tau_0 = \log \frac{\det(\Sigma_0)}{\det(\Sigma_1)} = -\log \det(\mathrm{SNR} \cdot \Pi + I_N),$$

where $\det(\cdot)$ stands for the determinant. Thus, the alternative hypothesis is selected, i.e., $\hat{H} = H_1$, if

$$\mathbb{E}_{y|\hat{H}}\, \Lambda(y) = \frac{1}{\sigma^2} \mathrm{Tr}\left[ \left( \Phi^T \Phi + \mathrm{SNR}^{-1} \cdot I \right)^{-1} \Phi^T \Sigma \Phi \right] > \tau_0,$$

or equivalently $\mathrm{KL}(\hat{H}, H_0) > \tau_0 + \mathrm{KL}(\hat{H}, H_1)$.
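The test above is easy to simulate. The sketch below, a non-authoritative illustration, instantiates a 4x4x4 observation with multilinear rank (3,3,3), assuming sigma^2 = 1, SNR = 10, and orthonormal factors obtained by QR of Gaussian matrices (the paper treats the Phi_p as given); the Bayes rule for alpha = 1/2 compares the exact log-likelihood ratio to zero.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)
dims, ranks = (4, 4, 4), (3, 3, 3)           # N_p and R_p, with R_p < N_p
N, R = int(np.prod(dims)), int(np.prod(ranks))
sigma2, snr = 1.0, 10.0                      # assumed values for the demo

# Orthonormal factors Phi_p (N_p x R_p) via QR; Phi = Phi3 kron Phi2 kron Phi1.
Phi1, Phi2, Phi3 = (np.linalg.qr(rng.standard_normal((n, r)))[0]
                    for n, r in zip(dims, ranks))
Phi = np.kron(np.kron(Phi3, Phi2), Phi1)     # N x R, Phi^T Phi = I_R
Pi = Phi @ Phi.T                             # orthogonal projector, rank R

# Covariances under the two hypotheses.
Sigma0 = sigma2 * np.eye(N)                  # H0: noise only
Sigma1 = sigma2 * (snr * Pi + np.eye(N))     # H1: signal plus noise

# One draw under each hypothesis: y = Phi s + n with vectorized core s.
y0 = np.sqrt(sigma2) * rng.standard_normal(N)
y1 = Phi @ (np.sqrt(snr * sigma2) * rng.standard_normal(R)) \
     + np.sqrt(sigma2) * rng.standard_normal(N)

# Bayes rule for alpha = 1/2: decide H1 iff the log-likelihood ratio is > 0.
llr = lambda y: mvn.logpdf(y, cov=Sigma1) - mvn.logpdf(y, cov=Sigma0)
print("decide H1 on y0:", llr(y0) > 0, "| decide H1 on y1:", llr(y1) > 0)
```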
4 Tightest CUB

4.1 Derivation of the bound

Theorem 1. Let $c = R/N < 1$. The Chernoff $s$-divergence for the above test is given by

$$\mu_s = \frac{c}{2} \Big( (1-s) \log(\mathrm{SNR} + 1) - \log\big(\mathrm{SNR} \cdot (1-s) + 1\big) \Big).$$

Proof. According to [13], the Chernoff $s$-divergence for the above test is given by

$$\mu_s = \frac{1-s}{2N} \log \det(\mathrm{SNR} \cdot \Pi + I) - \frac{1}{2N} \log \det\big(\mathrm{SNR} \cdot (1-s) \Pi + I\big).$$

Using the spectrum $\lambda\{\Pi\} = \{\underbrace{1, \ldots, 1}_{R}, \underbrace{0, \ldots, 0}_{N-R}\}$ in the above expression yields Theorem 1.

Theorem 2.
1. The Chernoff $s$-divergence is a strictly convex function and admits a unique minimizer given by

$$s^\star = \frac{1}{\mathrm{SNR}} \left( 1 + \mathrm{SNR} - \frac{1}{\psi(\mathrm{SNR})} \right), \qquad (6)$$

where $\psi(\mathrm{SNR}) = \frac{\log(\mathrm{SNR}+1)}{\mathrm{SNR}}$.

2. The tightest CUB for the $(R_1, R_2, R_3)$-multilinear-rank orthonormal Tucker decomposition of eq. (5) is given by

$$\mu_{s^\star} = \frac{c}{2} \Big( 1 - \psi(\mathrm{SNR}) + \log \psi(\mathrm{SNR}) \Big).$$

Proof. The proof is straightforward and thus omitted due to the lack of space.

[Fig. 1. Optimal $s$-value versus SNR in dB for a $(3,3,3)$-multilinear-rank tensor $\mathcal{X}$ of size $4 \times 4 \times 4$, comparing the numerical optimization, the analytical solution of eq. (6), and the limits $s \to 0.5$ (BUB) and $s \to 1$: the exact analytical formula is in full agreement with the numerical approximation scheme.]

4.2 Analysis in typical limit regimes

We can identify the two following limit scenarios:

- At low SNR, the tightest divergence, denoted by $\mu_{s^\star}$, coincides with the divergence $\mu_{1/2}$ associated with the BUB. Indeed, the optimal value in (6) admits a second-order approximation for $\mathrm{SNR} \ll 1$ according to

$$s^\star \approx 1 + \frac{1}{\mathrm{SNR}} \left( 1 - \left( 1 + \frac{\mathrm{SNR}}{2} \right) \right) = \frac{1}{2}.$$

- On the contrary, for $\mathrm{SNR} \gg 1$ we have $s^\star \approx 1$, so the BUB is a loose bound in this regime.

To illustrate our analysis, Fig. 1 plots the optimal $s$-value obtained by a numerical optimization of eq. (4), using the divergence given in Theorem 1, together with the analytical solution reported in eq. (6). The predicted analytical optimal $s$-value is in agreement with the approximated numerical one. We also verify the $s$-values in the low and high SNR regimes; in particular, for high SNRs, the optimal value is far from $1/2$.

5 Conclusion

Detection performance, in terms of the minimal Bayes error probability, for multidimensional measurements is a fundamental problem at the heart of many challenging applications. Interestingly, this tensor detection problem has received little attention so far. In this work, we derived analytically the tightest upper bound on the minimal Bayes error probability for the detection of a random core tensor $\mathcal{S}$ given an $N_1 \times N_2 \times N_3$ noisy observation tensor $\mathcal{Y}$ following an orthonormal Tucker model with $(R_1, R_2, R_3)$-multilinear rank, $R_p < N_p$ for all $p$. In particular, we showed that the tightest upper bound in the high-SNR regime is not the Bhattacharyya upper bound.
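The agreement shown in Fig. 1 can be reproduced in a few lines. The sketch below assumes the printed expressions of Theorems 1 and 2 and uses c = R/N = 27/64, matching the (3,3,3)-rank, 4x4x4 example of the figure.

```python
import numpy as np
from scipy.optimize import minimize_scalar

c = 27 / 64  # R/N for a (3,3,3)-rank core in a 4x4x4 tensor, as in Fig. 1

def mu(s, snr):
    """Chernoff s-divergence of Theorem 1 (strictly convex in s, Theorem 2)."""
    return 0.5 * c * ((1 - s) * np.log(snr + 1) - np.log(snr * (1 - s) + 1))

def s_star_analytical(snr):
    """Closed-form optimal exponent, eq. (6)."""
    psi = np.log(snr + 1) / snr
    return (1 + snr - 1 / psi) / snr

for snr_db in [-40, -20, 0, 20, 40]:
    snr = 10 ** (snr_db / 10)
    s_num = minimize_scalar(mu, bounds=(1e-9, 1 - 1e-9), args=(snr,),
                            method="bounded").x
    print(f"SNR = {snr_db:+3d} dB:  s* (numeric) = {s_num:.4f},  "
          f"s* (eq. 6) = {s_star_analytical(snr):.4f}")
```

At -40 dB both values sit at s* close to 0.5 (the BUB regime), while at +40 dB they approach 1, consistently with the analysis of Section 4.2.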
References

1. Comon, P.: Tensors: A brief introduction. IEEE Signal Processing Magazine 31(3) (2014) 44-53
2. Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine 32(2) (2015) 145-163
3. Chang, K., Qi, L., Zhang, T.: A survey on the spectral theory of nonnegative tensors. Numerical Linear Algebra with Applications 20(6) (2013) 891-912
4. Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3) (1966) 279-311
5. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 21(4) (2000) 1253-1278
6. Bro, R.: PARAFAC: Tutorial and applications. Chemometrics and Intelligent Laboratory Systems 38(2) (1997) 149-171
7. Badeau, R., Boyer, R.: Fast multilinear singular value decomposition for structured tensors. SIAM Journal on Matrix Analysis and Applications 30(3) (2008) 1008-1021
8. Boyer, R., Badeau, R.: Adaptive multilinear SVD for structured tensors. In: IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Volume 3, IEEE (2006)
9. Boizard, M., Ginolhac, G., Pascal, F., Forster, P.: Low-rank filter and detector for multidimensional data based on an alternative unfolding HOSVD: Application to polarimetric STAP. EURASIP Journal on Advances in Signal Processing 2014(1) (2014) 1-14
10. Kay, S.M.: Fundamentals of Statistical Signal Processing: Estimation Theory. PTR Prentice-Hall, Englewood Cliffs, NJ (1993)
11. Nielsen, F., Bhatia, R.: Matrix Information Geometry. Springer (2013)
12. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons (2012)
13. Boyer, R., Nielsen, F.: Information geometry metric for random signal detection in large random sensing systems. ICASSP (2017)