Hypothesis testing, information divergence and computational geometry

28/08/2013
Author: Frank Nielsen
OAI: oai:www.see.asso.fr:2552:5066
DOI: 10.23723/2552/5066

License

Creative Commons None (All rights reserved)


Hypothesis testing, information divergence and computational geometry
Frank Nielsen (Frank.Nielsen@acm.org, www.informationgeometry.org)
Sony Computer Science Laboratories, Inc.
August 2013, GSI, Paris, FR

The Multiple Hypothesis Testing (MHT) problem

Given a random variable $X$ with $n$ hypotheses $H_1: X \sim P_1, \ldots, H_n: X \sim P_n$, decide for an IID sample $x_1, \ldots, x_m \sim X$ which hypothesis holds true.

$P_{\mathrm{correct}}^{(m)} = 1 - P_{\mathrm{error}}^{(m)}$

Asymptotic regime: $\alpha = -\frac{1}{m} \log P_e^{(m)}$ as $m \to \infty$.

Bayesian hypothesis testing (preliminaries)

◮ Prior probabilities: $w_i = \Pr(X \sim P_i) > 0$, with $\sum_{i=1}^n w_i = 1$.
◮ Conditional probabilities: $\Pr(X = x \mid X \sim P_i)$, so that $\Pr(X = x) = \sum_{i=1}^n \Pr(X \sim P_i) \Pr(X = x \mid X \sim P_i) = \sum_{i=1}^n w_i \Pr(X = x \mid P_i)$.
◮ Let $c_{i,j}$ denote the cost of deciding $H_i$ when in fact $H_j$ is true; the matrix $[c_{i,j}]$ is the cost design matrix.
◮ Let $p_{i,j}(u)$ denote the probability of making this decision using rule $u$.

Bayesian detector

Minimize the expected cost $E_X[c(r(x))]$, where
$c(r(x)) = \sum_i w_i \sum_{j \neq i} c_{i,j} p_{i,j}(r(x))$.

Special case: the probability of error $P_e$ is obtained for $c_{i,i} = 0$ and $c_{i,j} = 1$ for $i \neq j$:
$P_e = E_X\left[\sum_i w_i \sum_{j \neq i} p_{i,j}(r(x))\right]$.

The maximum a posteriori probability (MAP) rule classifies $x$ as
$\mathrm{MAP}(x) = \mathrm{argmax}_{i \in \{1, \ldots, n\}} w_i p_i(x)$,
where $p_i(x) = \Pr(X = x \mid X \sim P_i)$ are the conditional probabilities.
→ The MAP Bayesian detector minimizes $P_e$ over all rules [8].

Probability of error and divergences

Without loss of generality, consider equal priors ($w_1 = w_2 = \frac{1}{2}$):
$P_e = \int_{x \in \mathcal{X}} p(x) \min(\Pr(H_1 \mid x), \Pr(H_2 \mid x)) \, \mathrm{d}\nu(x)$
($P_e > 0$ as soon as $\mathrm{supp}\, p_1 \cap \mathrm{supp}\, p_2 \neq \emptyset$).

From Bayes' rule, $\Pr(H_i \mid X = x) = \frac{\Pr(H_i) \Pr(X = x \mid H_i)}{\Pr(X = x)} = w_i p_i(x)/p(x)$, so
$P_e = \frac{1}{2} \int_{x \in \mathcal{X}} \min(p_1(x), p_2(x)) \, \mathrm{d}\nu(x)$.

Rewrite or bound $P_e$ using tricks of the trade:
◮ Trick 1: $\forall a, b \in \mathbb{R}$, $\min(a, b) = \frac{a+b}{2} - \frac{|a-b|}{2}$.
◮ Trick 2: $\forall a, b > 0$, $\min(a, b) \leq \min_{\alpha \in (0,1)} a^\alpha b^{1-\alpha}$.

Probability of error and total variation

Applying Trick 1:
$P_e = \frac{1}{2} \int_{x \in \mathcal{X}} \left( \frac{p_1(x) + p_2(x)}{2} - \frac{|p_1(x) - p_2(x)|}{2} \right) \mathrm{d}\nu(x) = \frac{1}{2} \left( 1 - \frac{1}{2} \int_{x \in \mathcal{X}} |p_1(x) - p_2(x)| \, \mathrm{d}\nu(x) \right)$,
that is,
$P_e = \frac{1}{2} \left( 1 - \mathrm{TV}(P_1, P_2) \right)$
with the total variation metric distance $\mathrm{TV}(P, Q) = \frac{1}{2} \int_{x \in \mathcal{X}} |p(x) - q(x)| \, \mathrm{d}\nu(x)$.
→ Difficult to compute when handling multivariate distributions.

Bounding the probability of error $P_e$

Applying Trick 2, $\min(a, b) \leq \min_{\alpha \in (0,1)} a^\alpha b^{1-\alpha}$ for $a, b > 0$, upper-bounds $P_e$:
$P_e = \frac{1}{2} \int_{x \in \mathcal{X}} \min(p_1(x), p_2(x)) \, \mathrm{d}\nu(x) \leq \frac{1}{2} \min_{\alpha \in (0,1)} \int_{x \in \mathcal{X}} p_1^\alpha(x) p_2^{1-\alpha}(x) \, \mathrm{d}\nu(x)$.

Chernoff information:
$C(P_1, P_2) = -\log \min_{\alpha \in (0,1)} \int_{x \in \mathcal{X}} p_1^\alpha(x) p_2^{1-\alpha}(x) \, \mathrm{d}\nu(x) \geq 0$.

Best error exponent $\alpha^*$ [7]:
$P_e \leq w_1^{\alpha^*} w_2^{1-\alpha^*} e^{-C(P_1, P_2)} \leq e^{-C(P_1, P_2)}$.

The bounding technique can be extended using any quasi-arithmetic $\alpha$-means [13, 9].

Computational information geometry

Exponential family manifold [4]: $M = \{p_\theta \mid p_\theta(x) = \exp(t(x)^\top \theta - F(\theta))\}$.
Dually flat manifolds enjoy dual affine connections [1]: $(M, \nabla^2 F(\theta), \nabla^{(e)}, \nabla^{(m)})$, with dual coordinate systems
$\eta = \nabla F(\theta)$, $\theta = \nabla F^*(\eta)$.
Canonical divergence from the Young inequality:
$A(\theta_1, \eta_2) = F(\theta_1) + F^*(\eta_2) - \theta_1^\top \eta_2 \geq 0$, with equality $F(\theta) + F^*(\eta) = \theta^\top \eta$ for dual pairs $(\theta, \eta = \nabla F(\theta))$.
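As a numerical sanity check of the identities and bounds above (not part of the original slides), here is a minimal Python sketch that verifies $P_e = \frac{1}{2}(1 - \mathrm{TV}(P_1, P_2))$ and the Chernoff upper bound $P_e \leq \frac{1}{2} e^{-C(P_1, P_2)}$ by quadrature on two univariate Gaussians. It assumes NumPy/SciPy, and the Gaussian parameters are illustrative.

```python
# Quadrature check of P_e = (1/2)(1 - TV) and P_e <= (1/2) exp(-C).
# Illustrative example (not from the talk): two unit-variance Gaussians.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

p1 = norm(loc=0.0, scale=1.0)  # hypothesis H1
p2 = norm(loc=2.0, scale=1.0)  # hypothesis H2
x = np.linspace(-10.0, 12.0, 200001)
dx = x[1] - x[0]

# P_e = (1/2) \int min(p1, p2) dnu, with equal priors w1 = w2 = 1/2.
pe = 0.5 * np.sum(np.minimum(p1.pdf(x), p2.pdf(x))) * dx

# TV(P1, P2) = (1/2) \int |p1 - p2| dnu, so that P_e = (1/2)(1 - TV).
tv = 0.5 * np.sum(np.abs(p1.pdf(x) - p2.pdf(x))) * dx

# Chernoff alpha-coefficient c_alpha = \int p1^alpha p2^(1-alpha) dnu;
# Chernoff information C(P1, P2) = -log min_{alpha in (0,1)} c_alpha.
def c_alpha(alpha):
    return np.sum(p1.pdf(x) ** alpha * p2.pdf(x) ** (1.0 - alpha)) * dx

res = minimize_scalar(c_alpha, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
C = -np.log(res.fun)

print(f"P_e = {pe:.6f}, (1 - TV)/2 = {0.5 * (1.0 - tv):.6f}")
print(f"Chernoff bound: P_e <= {0.5 * np.exp(-C):.6f} (alpha* = {res.x:.3f})")
```

For these two unit-variance Gaussians the exact values are $P_e = \Phi(-1) \approx 0.1587$ and $C = (\mu_1 - \mu_2)^2/8 = 0.5$, so the bound evaluates to about $0.303$: loose for one sample, but it yields the optimal error exponent in the asymptotic $m$-sample regime.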
MAP decision rule and additive Bregman Voronoi diagrams

$\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B(\theta_2 : \theta_1) = A(\theta_2 : \eta_1) = A^*(\eta_1 : \theta_2) = B^*(\eta_1 : \eta_2)$

Canonical divergence (mixed primal/dual coordinates):
$A(\theta_2 : \eta_1) = F(\theta_2) + F^*(\eta_1) - \theta_2^\top \eta_1 \geq 0$.

Bregman divergence (uni-coordinates, primal or dual):
$B(\theta_2 : \theta_1) = F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1)$.

For members of the family,
$\log p_i(x) = -B^*(t(x) : \eta_i) + F^*(t(x)) + k(x)$, with $\eta_i = \nabla F(\theta_i) = \eta(P_{\theta_i})$.

Optimal MAP decision rule:
$\mathrm{MAP}(x) = \mathrm{argmax}_{i \in \{1,\ldots,n\}} w_i p_i(x) = \mathrm{argmax}_{i \in \{1,\ldots,n\}} \left( -B^*(t(x) : \eta_i) + \log w_i \right) = \mathrm{argmin}_{i \in \{1,\ldots,n\}} \left( B^*(t(x) : \eta_i) - \log w_i \right)$
→ a nearest neighbor classifier [2, 10, 15, 16].

MAP & nearest neighbor classifier

Bregman Voronoi diagrams (with additive weights) are affine diagrams [2]. To evaluate $\mathrm{argmin}_{i \in \{1,\ldots,n\}} B^*(t(x) : \eta_i) - \log w_i$:
◮ point location in arrangements [3] (small dimensions),
◮ divergence-based search trees [16],
◮ GPU brute force [6].

Geometry of the best error exponent: binary hypothesis

On the exponential family manifold, the Chernoff $\alpha$-coefficient [5] is
$c_\alpha(P_{\theta_1} : P_{\theta_2}) = \int p_{\theta_1}^\alpha(x) \, p_{\theta_2}^{1-\alpha}(x) \, \mathrm{d}\mu(x) = \exp(-J_F^{(\alpha)}(\theta_1 : \theta_2))$,
where $J_F^{(\alpha)}$ is the skew Jensen divergence [14] on the natural parameters:
$J_F^{(\alpha)}(\theta_1 : \theta_2) = \alpha F(\theta_1) + (1 - \alpha) F(\theta_2) - F(\theta_{12}^{(\alpha)})$, with $\theta_{12}^{(\alpha)} = \alpha \theta_1 + (1 - \alpha) \theta_2$.

Chernoff information = Bregman divergence for exponential families:
$C(P_{\theta_1} : P_{\theta_2}) = B(\theta_1 : \theta_{12}^{(\alpha^*)}) = B(\theta_2 : \theta_{12}^{(\alpha^*)})$.

The Chernoff distribution $P^*$ [12] lies at the intersection of an e-geodesic with an m-bisector:
$P^* = P_{\theta_{12}^*} = G_e(P_1, P_2) \cap \mathrm{Bi}_m(P_1, P_2)$,
e-geodesic: $G_e(P_1, P_2) = \{E_{12}^{(\lambda)} \mid \theta(E_{12}^{(\lambda)}) = (1 - \lambda)\theta_1 + \lambda \theta_2, \ \lambda \in [0, 1]\}$,
m-bisector: $\mathrm{Bi}_m(P_1, P_2) = \{P \mid F(\theta_1) - F(\theta_2) + \eta(P)^\top \Delta\theta = 0\}$, with $\Delta\theta = \theta_2 - \theta_1$.

Optimal natural parameter of $P^*$:
$\theta^* = \theta_{12}^{(\alpha^*)} = \mathrm{argmin}_{\theta \in \mathrm{Bi}_m} B(\theta_1 : \theta) = \mathrm{argmin}_{\theta \in \mathrm{Bi}_m} B(\theta_2 : \theta)$.
→ Closed form for order-1 families, or efficient bisection search otherwise.

[Figure: in the $\eta$-coordinate system, the Chernoff distribution $p_{\theta_{12}^*}$ sits at the intersection of the e-geodesic $G_e(P_{\theta_1}, P_{\theta_2})$ with the m-bisector $\mathrm{Bi}_m(P_{\theta_1}, P_{\theta_2})$, and $C(\theta_1 : \theta_2) = B(\theta_1 : \theta_{12}^*)$.]

Geometry of the best error exponent: multiple hypothesis

For $n$-ary MHT [8], the exponent is the minimum pairwise Chernoff distance:
$C(P_1, \ldots, P_n) = \min_{i, j \neq i} C(P_i, P_j)$,
$P_e^{(m)} \leq e^{-m C(P_{i^*}, P_{j^*})}$, $(i^*, j^*) = \mathrm{argmin}_{i, j \neq i} C(P_i, P_j)$.

Compute the Chernoff distance $C(P_{\theta_i}, P_{\theta_j})$ for each pair of natural neighbors [3] $P_{\theta_i}$ and $P_{\theta_j}$, and choose the pair with minimal distance. (Proof by contradiction using the Bregman Pythagorean theorem.)
→ A closest Bregman pair problem (the Chernoff distance fails the triangle inequality).

Hypothesis testing: illustration

[Figure: Chernoff distributions between natural neighbours, drawn in the $\eta$-coordinate system.]

Summary

Bayesian multiple hypothesis testing, from the viewpoint of computational geometry:
◮ probability of error & the best MAP Bayesian rule,
◮ total variation & $P_e$, upper-bounded via the Chernoff distance,
◮ on exponential family manifolds:
  ◮ the MAP rule is a nearest neighbor classifier (additive Bregman Voronoi diagram),
  ◮ the best error exponent from the geodesic/bisector intersection for binary hypotheses,
  ◮ the best error exponent from the closest Bregman pair for multiple hypotheses.
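To make the geodesic/bisector characterization concrete, here is a minimal Python sketch (not from the original slides) of the bisection search for the Chernoff point: univariate Gaussians are written as an exponential family with sufficient statistic $t(x) = (x, x^2)$, and the equalization $B(\theta_1 : \theta_\lambda) = B(\theta_2 : \theta_\lambda)$ is bisected along the e-geodesic $\theta_\lambda = (1 - \lambda)\theta_1 + \lambda\theta_2$. The helper names are mine; the parametrization is the standard one.

```python
# Bisection search for the Chernoff point on the e-geodesic, sketched for
# univariate Gaussians N(mu, sigma^2) as an exponential family:
#   t(x) = (x, x^2), theta = (mu/sigma^2, -1/(2 sigma^2)),
#   F(theta) = -theta_1^2/(4 theta_2) + (1/2) log(-pi/theta_2).
import numpy as np

def theta_of(mu, sigma2):
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def F(theta):
    return -theta[0] ** 2 / (4.0 * theta[1]) + 0.5 * np.log(-np.pi / theta[1])

def grad_F(theta):
    # eta = grad F(theta) = (E[x], E[x^2]) = (mu, mu^2 + sigma^2).
    mu = -theta[0] / (2.0 * theta[1])
    sigma2 = -1.0 / (2.0 * theta[1])
    return np.array([mu, mu ** 2 + sigma2])

def bregman(ta, tb):
    # B(theta : theta') = F(theta) - F(theta') - <theta - theta', grad F(theta')>.
    return F(ta) - F(tb) - (ta - tb) @ grad_F(tb)

def chernoff_information(t1, t2, iters=60):
    # g(lambda) = B(t1 : theta_lambda) - B(t2 : theta_lambda) is strictly
    # increasing with g(0) < 0 < g(1); its root equalizes the two divergences.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        t = (1.0 - mid) * t1 + mid * t2
        if bregman(t1, t) < bregman(t2, t):
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    t = (1.0 - lam) * t1 + lam * t2
    return bregman(t1, t), lam

t1, t2 = theta_of(0.0, 1.0), theta_of(2.0, 1.0)  # illustrative parameters
C, lam = chernoff_information(t1, t2)
print(f"C(P1, P2) = {C:.6f} at lambda* = {lam:.3f}")
```

By symmetry of this example the root lands at $\lambda^* = 1/2$ with $C = (\mu_1 - \mu_2)^2/8 = 0.5$, matching the quadrature estimate above; the same routine applies to any exponential family once $F$ and $\nabla F$ are supplied, since $g$ is strictly increasing by the strict convexity of $F$.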
Thank you

GSI 2013, 28th-30th August, Paris.

@incollection{HTIGCG-GSI-2013,
  year      = {2013},
  booktitle = {Geometric Science of Information},
  volume    = {8085},
  series    = {Lecture Notes in Computer Science},
  editor    = {Frank Nielsen and Fr\'ed\'eric Barbaresco},
  title     = {Hypothesis testing, information divergence and computational geometry},
  publisher = {Springer Berlin Heidelberg},
  author    = {Nielsen, Frank},
  pages     = {241--248}
}

Bibliographic references

[1] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
[2] Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock. Bregman Voronoi diagrams. Discrete & Computational Geometry, 44(2):281–307, 2010.
[3] Jean-Daniel Boissonnat and Mariette Yvinec. Algorithmic Geometry. Cambridge University Press, New York, NY, USA, 1998.
[4] Lawrence D. Brown. Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory. Institute of Mathematical Statistics, Hayward, CA, USA, 1986.
[5] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952.
[6] Vincent Garcia, Eric Debreuve, Frank Nielsen, and Michel Barlaud. k-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching. In IEEE International Conference on Image Processing (ICIP), pages 3757–3760, 2010.
[7] Martin E. Hellman and Josef Raviv. Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, 16:368–372, 1970.
[8] C. C. Leang and D. H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280–282, January 1997.
[9] Frank Nielsen. Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. Submitted, 2012.
[10] Frank Nielsen. k-MLE: a fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. Preliminary technical report on arXiv.
[11] Frank Nielsen. Hypothesis testing, information divergence and computational geometry. In Frank Nielsen and Frédéric Barbaresco, editors, Geometric Science of Information, volume 8085 of Lecture Notes in Computer Science, pages 241–248. Springer Berlin Heidelberg, 2013.
[12] Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters (SPL), 20(3):269–272, March 2013.
[13] Frank Nielsen. Pattern learning and recognition on statistical manifolds: an information-geometric review. In Edwin Hancock and Marcello Pelillo, editors, Similarity-Based Pattern Recognition, volume 7953 of Lecture Notes in Computer Science, pages 1–25. Springer Berlin Heidelberg, 2013.
[14] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
[15] Frank Nielsen, Paolo Piro, and Michel Barlaud. Bregman vantage point trees for efficient nearest neighbor queries. In IEEE International Conference on Multimedia and Expo (ICME), pages 878–881, 2009.
[16] Paolo Piro, Frank Nielsen, and Michel Barlaud. Tailored Bregman ball trees for effective nearest neighbors. In European Workshop on Computational Geometry (EuroCG), LORIA, Nancy, France, March 2009.