Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

28/10/2015
Publication: GSI2015
OAI: oai:www.see.asso.fr:11784:14262

Abstract

We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald statistic (equivalently, the Pearson χ² and score statistics) is unworkable in this context, while the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.


Authors: Paul Marriott, Radka Sabolova, Germain Van Bever, Frank Critchley


Licence

Creative Commons Attribution-ShareAlike 4.0 International

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/11784/14262</identifier><creators><creator><creatorName>Paul Marriott</creatorName></creator><creator><creatorName>Frank Critchley</creatorName></creator><creator><creatorName>Radka Sabolova</creatorName></creator><creator><creatorName>Germain Van Bever</creatorName></creator></creators><titles>
            <title>Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2015</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Sat 7 Nov 2015</date>
	    <date dateType="Updated">Sat 15 Oct 2016</date>
            <date dateType="Submitted">Fri 20 Apr 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">3075c0fc40370b1da2fe55cd93663ed5115d2715</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>29580</version>
        <descriptions>
            <description descriptionType="Abstract">
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald statistic (equivalently, the Pearson χ² and score statistics) is unworkable in this context, while the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.

</description>
        </descriptions>
    </resource>

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
R. Sabolová (1), P. Marriott (2), G. Van Bever (1) & F. Critchley (1)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI 2015, October 28th 2015

Key points

In computational information geometry (CIG), the multinomial model
$$\Delta_k = \left\{ (\pi_0, \dots, \pi_k) : \pi_i \ge 0,\ \sum_i \pi_i = 1 \right\}$$
provides a universal model.
1 Goodness-of-fit testing in large sparse extended multinomial contexts.
2 The Cressie-Read power divergence λ-family (equivalent to Amari's α-family): asymptotic properties of two test statistics, Pearson's χ² and the deviance, together with a simulation study for other statistics within the family.
3 k-asymptotics instead of N-asymptotics.

Outline

1 Introduction
2 Pearson's χ² versus the deviance
3 Other test statistics from the power divergence family
4 Summary

Big data

Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008): "... the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. ... Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science."
Continuous data: high dimensional low sample size (HDLSS) data.
Discrete data: databases, image analysis.
Sparsity (N << k) changes everything!

Image analysis: example

For a binary image on an m1 × m2 grid (Figure: m1 = 10, m2 = 10), the dimension of the state space is k = 2^(m1 m2) − 1.

Sparsity changes everything

S. Fienberg and A. Rinaldo (2012), Maximum Likelihood Estimation in Log-Linear Models: "Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult."

Extended multinomial distribution

Let n = (n_i) ~ Mult(N, (π_i)), i = 0, 1, ..., k, where each π_i ≥ 0. The goodness-of-fit hypothesis is H0: π = π*. Pearson's χ² test statistic (equivalently, the Wald or score statistic) is
$$W := \sum_{i=0}^{k} \frac{(\pi_i^* - n_i/N)^2}{\pi_i^*} \equiv \frac{1}{N^2} \sum_{i=0}^{k} \frac{n_i^2}{\pi_i^*} - 1.$$
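The Wald statistic above is a one-liner to compute. A minimal numpy sketch (the function name, the seed, and the exponential-decay null are our own illustrative choices, echoing the N = 50, k = 200 setting used in the slides' examples):

```python
import numpy as np

def wald_statistic(n, pi_star):
    """Wald / Pearson chi-squared statistic W = sum_i (pi*_i - n_i/N)^2 / pi*_i."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    return float(np.sum((pi_star - n / N) ** 2 / pi_star))

rng = np.random.default_rng(0)
k, N = 200, 50                              # sparse regime: N << k
pi_star = np.exp(-0.05 * np.arange(k + 1))  # exponentially decreasing null, as in the slides
pi_star /= pi_star.sum()

n = rng.multinomial(N, pi_star)
W = wald_statistic(n, pi_star)

# The two algebraic forms of W given in the text agree:
W_alt = float((n.astype(float) ** 2 / pi_star).sum() / N ** 2 - 1)
assert abs(W - W_alt) <= 1e-6 * (1 + abs(W))
```

Repeating the draw many times reproduces the instability shown in the slides' boundary example: a single count landing in a near-zero cell inflates W enormously.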
Rule of thumb (for accuracy of the χ²_k asymptotic approximation): Nπ_i ≥ 5.

Performance of Pearson's χ² test on the boundary: example

Figure: (a) null distribution (cell probabilities by rank); (b) sample of the Wald statistic. N = 50, k = 200, exponentially decreasing π_i.

Performance of Pearson's χ² test on the boundary: theory

Theorem. For k > 1 and N ≥ 6, the first three moments of W are
$$E(W) = \frac{k}{N}, \qquad \operatorname{var}(W) = \frac{\pi^{(-1)} - (k+1)^2 + 2k(N-1)}{N^3},$$
and E[{W − E(W)}³] is given by
$$\frac{\pi^{(-2)} - (k+1)^3 - (3k + 25 - 22N)\left\{\pi^{(-1)} - (k+1)^2\right\} + g(k, N)}{N^5},$$
where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π^(a) := Σ_i π_i^a. In particular, for fixed k and N, as π_min → 0,
var(W) → ∞ and γ(W) → +∞, where γ(W) := E[{W − E(W)}³]/{var(W)}^(3/2).

The deviance statistic

Define the deviance D via
$$D/2 = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \left\{ \log(n_i/N) - \log(\pi_i) \right\} = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \log(n_i/\mu_i),$$
where μ_i := E(n_i) = Nπ_i.

Distribution of the deviance

Let {n*_i, i = 0, ..., k} be mutually independent with n*_i ~ Po(μ_i). Then N* := Σ_i n*_i ~ Po(N) and (n_i) = ((n*_i) | N* = N) ~ Mult(N, (π_i)). Define
$$S^* := \begin{pmatrix} N^* \\ D^*/2 \end{pmatrix} = \sum_{i=0}^{k} \begin{pmatrix} n_i^* \\ n_i^* \log(n_i^*/\mu_i) \end{pmatrix}$$
and define ν, τ and ρ via
$$\begin{pmatrix} N \\ \nu \end{pmatrix} := E(S^*) = \begin{pmatrix} N \\ \sum_{i=0}^{k} E\left(n_i^* \log\{n_i^*/\mu_i\}\right) \end{pmatrix}, \qquad
\begin{pmatrix} N & \rho\tau\sqrt{N} \\ \rho\tau\sqrt{N} & \tau^2 \end{pmatrix} := \operatorname{cov}(S^*) = \begin{pmatrix} N & \sum_i C_i \\ \sum_i C_i & \sum_i V_i \end{pmatrix},$$
where C_i := cov(n*_i, n*_i log(n*_i/μ_i)) and V_i := var(n*_i log(n*_i/μ_i)). Then, under an equicontinuity condition,
$$D/2 \xrightarrow[k \to \infty]{D} N_1\!\left(\nu,\ \tau^2(1 - \rho^2)\right).$$

Uniformity near the boundary

Figure: stability of sampling distributions of Pearson's χ² and the deviance; N = 50, k = 200, exponentially decreasing π_i. Panels: (a) null distribution; (b) sample of the Wald statistic; (c) sample of the deviance statistic.

Asymptotic approximations

The normal approximation can be improved by:
- a χ² approximation, correcting for skewness;
- the symmetrised deviance statistic.
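The deviance is cheap to compute, and the stability of its sampling distribution near the boundary can be checked directly by simulation. A sketch under the same assumed exponential null (names, seed and constants ours, not from the paper):

```python
import numpy as np

def deviance(n, pi):
    """Deviance D, with D/2 = sum over occupied cells of n_i log(n_i / mu_i), mu_i = N pi_i."""
    n = np.asarray(n, dtype=float)
    mu = n.sum() * np.asarray(pi, dtype=float)
    occ = n > 0                                # empty cells contribute 0 by convention
    return float(2.0 * np.sum(n[occ] * np.log(n[occ] / mu[occ])))

rng = np.random.default_rng(1)
k, N = 200, 50
pi = np.exp(-0.05 * np.arange(k + 1))          # exponentially decreasing null
pi /= pi.sum()

# Monte Carlo sample of the deviance under the null
D = np.array([deviance(rng.multinomial(N, pi), pi) for _ in range(2000)])
mean_D, sd_D = D.mean(), D.std()
# D >= 0 always (D/2 is N times a relative entropy), and near the boundary its
# spread stays moderate relative to its mean, unlike the Wald statistic.
```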
Figure: quality of the k-asymptotic approximations near the boundary (QQ-plots of deviance quantiles against the normal, χ², and symmetrised-deviance approximations).

Uniformity and higher moments

Does the k-asymptotic approximation hold uniformly across the simplex? Rewrite the deviance as
$$D^*/2 = \sum_{\{0 \le i \le k : n_i^* > 0\}} n_i^* \log(n_i^*/\mu_i) = \Gamma^* + \Delta^*,$$
where
$$\Gamma^* := \sum_{i=0}^{k} \alpha_i n_i^*, \qquad \Delta^* := \sum_{\{0 \le i \le k : n_i^* > 1\}} n_i^* \log n_i^* \ \ge 0, \qquad \alpha_i := -\log \mu_i.$$
How well is the moment generating function of the (standardised) Γ* approximated by that of a (standard) normal?
$$M_\Gamma(t) = \exp\left( -\frac{E(\Gamma^*)\, t}{\sqrt{\operatorname{Var}(\Gamma^*)}} \right) \exp\left( \sum_{i=0}^{k} \left\{ \sum_{h=1}^{\infty} \frac{(-1)^h}{h!}\, \mu_i (\log \mu_i)^h \left( \frac{t}{\sqrt{\operatorname{Var}(\Gamma^*)}} \right)^{h} \right\} \right)$$
To find the worst case, maximise the skewness term Σ_i μ_i(log μ_i)³ for fixed E(Γ*) = −Σ_i μ_i log μ_i and Var(Γ*) = Σ_i μ_i(log μ_i)². The solution is a distribution with three distinct values for the μ_i.

Figure: worst case for normality of Γ*. Panels: (a) null distribution; (b) sample of the Wald statistic; (c) sample of the deviance statistic.

Uniformity and discreteness

Worst case for asymptotic normality? Where? Why?
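Before answering, note that the decomposition D*/2 = Γ* + Δ* is an exact algebraic identity, which a short Poissonised simulation can confirm (a sketch; the decay rate 0.05 and the seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
k, N = 200, 50
pi = np.exp(-0.05 * np.arange(k + 1))
pi /= pi.sum()
mu = N * pi

n_star = rng.poisson(mu)              # Poissonisation: independent n*_i ~ Po(mu_i)

occ = n_star > 0
half_D = float(np.sum(n_star[occ] * np.log(n_star[occ] / mu[occ])))  # D*/2

alpha = -np.log(mu)
gamma = float(np.sum(alpha * n_star))                 # linear part Gamma*
mult = n_star > 1                                     # cells with n*_i in {0, 1} contribute 0
delta = float(np.sum(n_star[mult] * np.log(n_star[mult])))  # Delta* >= 0

# Decomposition D*/2 = Gamma* + Delta* holds exactly (up to rounding):
assert abs(half_D - (gamma + delta)) < 1e-9
```

Only Γ* is linear in the counts, which is why the moment-generating-function analysis above targets Γ*; Δ* collects the discrete, non-linear remainder.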
Pearson's χ²: worst on the boundary ('unstable').
The deviance: worst at the centre (discreteness), since
$$D^*/2 = \sum_{\{0 \le i \le k : n_i^* > 0\}} n_i^* (\log n_i^* - \log \mu_i) = \Gamma^* + \Delta^*.$$
For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together.

Uniformity and discreteness, continued

Figure: behaviour at the centre of the simplex, N = 30, k = 200. Panels: (a) null distribution; (b) sample of the deviance statistic; (c) QQ-plot of the standardised deviance.
Figure: behaviour at the centre of the simplex, N = 60, k = 200 (same panels).

Other test statistics from the power divergence family

Comparison of the performance of different test statistics from the power divergence family as the boundary is approached (exponentially decreasing values of π):
$$2N I^{\lambda}(n_i/N, \pi^*) = \frac{2}{\lambda(\lambda+1)} \sum_{i=0}^{k} n_i \left\{ \left( \frac{n_i}{N \pi_i^*} \right)^{\lambda} - 1 \right\},$$
indexed below by α = 1 + 2λ:
α = 3: Pearson's χ² statistic
α = 7/3: Cressie-Read recommendation
α = 1: deviance
α = 0: Hellinger statistic
α = −1: Kullback minimum discrimination information (MDI)
α = −3: Neyman χ²

Figure: sampling distributions near the boundary for Pearson's χ² (α = 3), Cressie-Read (α = 7/3) and the deviance (α = 1).
Figure: sampling distributions near the boundary for the Hellinger statistic (α = 0), Kullback MDI (α = −1) and Neyman χ² (α = −3).

Summary: key points

1 Goodness-of-fit testing in large sparse extended multinomial contexts.
2 k-asymptotics instead of N-asymptotics.
3 The Cressie-Read power divergence λ-family: asymptotic properties of Pearson's χ² statistic and the deviance, and a simulation study for other statistics within the family.
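The whole Cressie-Read family can be computed from one routine. A hedged numpy sketch (function name ours; λ = 0 is handled as its deviance limit, while the Kullback MDI case λ = −1 is likewise a limiting case that this sketch does not cover):

```python
import numpy as np

def power_divergence(n, pi_star, lam):
    """Cressie-Read statistic 2N I^lam = 2/(lam(lam+1)) sum_i n_i[(n_i/(N pi*_i))^lam - 1].

    lam = 1 gives Pearson's chi^2, lam = -2 Neyman's chi^2, lam = -1/2 the
    Hellinger statistic; lam = 0 (the deviance) is taken as a limit.
    lam = -1 (Kullback MDI) is also a limit and is not handled here.
    """
    n = np.asarray(n, dtype=float)
    N = n.sum()
    mu = N * np.asarray(pi_star, dtype=float)
    occ = n > 0                                # empty cells contribute 0
    if lam == 0:
        return float(2.0 * np.sum(n[occ] * np.log(n[occ] / mu[occ])))
    return float(2.0 / (lam * (lam + 1.0))
                 * np.sum(n[occ] * ((n[occ] / mu[occ]) ** lam - 1.0)))

rng = np.random.default_rng(3)
k, N = 200, 50
pi = np.exp(-0.05 * np.arange(k + 1))
pi /= pi.sum()
n = rng.multinomial(N, pi)

# alpha = 1 + 2*lam, so the slides' alpha values map to lam = (alpha - 1)/2:
stats = {a: power_divergence(n, pi, (a - 1) / 2.0) for a in (3, 7 / 3, 1, 0, -3)}
```

For λ = 1 the routine reduces algebraically to Σᵢ nᵢ²/μᵢ − N, the usual Pearson X² under Σᵢ μᵢ = N.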
References

A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken, NJ.
K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? A diagnostic. STAT, 3: 17-22.
K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS.
F. Critchley and P. Marriott (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454-2471.
S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996-1023.
L. Holst (1972): Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika, 59: 137-145.
C. Morris (1975): Central limit theorems for multinomial sums. Annals of Statistics, 3: 165-188.