Computational Information Geometry: mixture modelling

28/10/2015
Publication GSI2015
OAI : oai:www.see.asso.fr:11784:14268
DOI : 10.23723/11784/14268

Collection

application/pdf Computational Information Geometry: mixture modelling Germain Van Bever, Radka Sabolova, Frank Critchley, Paul Marriott

License

Creative Commons Attribution-ShareAlike 4.0 International

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/11784/14268</identifier><creators><creator><creatorName>Paul Marriott</creatorName></creator><creator><creatorName>Frank Critchley</creatorName></creator><creator><creatorName>Radka Sabolova</creatorName></creator><creator><creatorName>Germain Van Bever</creatorName></creator></creators><titles>
            <title>Computational Information Geometry: mixture modelling</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2015</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Sat 7 Nov 2015</date>
	    <date dateType="Updated">Wed 31 Aug 2016</date>
            <date dateType="Submitted">Fri 20 Jul 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">b272a9a9ec0cd0c1441fb8aba15fe23bce72c831</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>24661</version>
        <descriptions>
            <description descriptionType="Abstract"></description>
        </descriptions>
    </resource>

Computational Information Geometry... ...in mixture modelling

Computational Information Geometry: mixture modelling
Germain Van Bever (1), R. Sabolová (1), F. Critchley (1) & P. Marriott (2)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI15, 28-30 October 2015, Paris

Germain Van Bever CIG for mixtures 1/19

Outline

1 Computational Information Geometry...
    Information Geometry
    CIG
2 ...in mixture modelling
    Introduction
    Lindsay’s convex geometry
    (C)IG for mixture distributions

Generalities

The use of geometry in statistics has given rise to many different approaches. Traditionally, information geometry (IG) refers to the application of differential geometry to statistical theory and practice.

The main ingredients of IG in exponential families (Amari, 1985) are
1 the manifold of parameters $M$,
2 the Riemannian (Fisher information) metric $g$, and
3 the set of affine connections $\{\nabla^{-1}, \nabla^{+1}\}$ (mixture and exponential connections).

These make it possible to define notions of curvature, dimension reduction or information loss, and invariant higher-order expansions.

Two affine structures (maps on $M$) are used simultaneously:
-1: Mixture affine geometry on probability measures: $\lambda f(x) + (1 - \lambda) g(x)$.
+1: Exponential affine geometry on probability measures: $C(\lambda) f(x)^{\lambda} g(x)^{1-\lambda}$.
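The two affine structures can be compared numerically on a small discrete example. The following is a minimal sketch, assuming two strictly positive probability vectors on three cells; the function names and the vectors `f` and `g` are made up for illustration and do not come from the talk.

```python
import math


def mixture_path(f, g, lam):
    """-1 (mixture) affine combination: lam * f + (1 - lam) * g, cell by cell."""
    return [lam * fi + (1.0 - lam) * gi for fi, gi in zip(f, g)]


def exponential_path(f, g, lam):
    """+1 (exponential) affine combination: C(lam) * f^lam * g^(1 - lam).

    Assumes every entry of f and g is strictly positive.
    """
    unnormalised = [fi ** lam * gi ** (1.0 - lam) for fi, gi in zip(f, g)]
    c = 1.0 / sum(unnormalised)  # C(lam) is the normalising constant
    return [c * u for u in unnormalised]


# Hypothetical distributions on three cells.
f = [0.7, 0.2, 0.1]
g = [0.1, 0.3, 0.6]

mid_m = mixture_path(f, g, 0.5)      # midpoint of the -1 path
mid_e = exponential_path(f, g, 0.5)  # midpoint of the +1 path
end_e = exponential_path(f, g, 1.0)  # lam = 1 recovers f
```

Both paths join `g` (at `lam = 0`) to `f` (at `lam = 1`) and stay inside the simplex, but they pass through different intermediate distributions: the mixture path is a straight line in the probabilities, the exponential path is a straight line in the log-probabilities up to normalisation.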
Computational Information Geometry

This talk is about Computational Information Geometry (CIG; Critchley and Marriott, 2014).

1 In CIG, the multinomial model provides, modulo discretization, a universal model. It therefore moves from manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex.
2 It provides a unifying framework for different geometries.
3 Tractability of the geometry allows for efficient algorithms in a computational framework.

It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex.

Multinomial distributions

$X \sim \mathrm{Mult}(\pi_0, \ldots, \pi_k)$, $\pi = (\pi_0, \ldots, \pi_k) \in \mathrm{int}(\Delta^k)$, with

$\Delta^k := \{\pi : \pi_i \geq 0, \ \sum_{i=0}^{k} \pi_i = 1\}$.

In this case, $\pi_{(0)} = (\pi_1, \ldots, \pi_k)$ is the mean parameter, while $\eta = \log(\pi_{(0)}/\pi_0)$ is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005).

Figure: mixed geodesics in -1-space (axes $\pi_1$, $\pi_2$) and in +1-space (axes $\eta_1$, $\eta_2$).

Restricting to the multinomial families

For regular exponential families with compact support, the cost of discretization on the components of information geometry is bounded! The same holds true for the MLE and the log-likelihood function.

The log-likelihood $\ell(x, \pi) = \sum_{i=0}^{k} n_i \log(\pi_i)$ is
(i) strictly concave (in the $-1$-representation) on the observed face (counts $n_i > 0$),
(ii) strictly decreasing in the normal direction towards the unobserved face ($n_i = 0$), and, otherwise,
(iii) constant.
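The mean and natural parametrisations and the behaviour of the log-likelihood near an unobserved cell can be checked on a toy example. This is a hedged sketch: the counts and probability vectors are invented, and the helper names are not from the talk.

```python
import math


def natural_from_probs(pi):
    """eta_i = log(pi_i / pi_0) for i = 1..k, with pi in the interior of the simplex."""
    return [math.log(p / pi[0]) for p in pi[1:]]


def probs_from_natural(eta):
    """Inverse map: pi_0 = 1 / (1 + sum_i exp(eta_i)) and pi_i = pi_0 * exp(eta_i)."""
    z = 1.0 + sum(math.exp(e) for e in eta)
    return [1.0 / z] + [math.exp(e) / z for e in eta]


def log_likelihood(counts, pi):
    """ell(x, pi) = sum_i n_i log(pi_i); cells with n_i = 0 contribute nothing."""
    return sum(n * math.log(p) for n, p in zip(counts, pi) if n > 0)


# Toy counts: the third cell is unobserved (n_2 = 0).
counts = [3, 5, 0]
total = sum(counts)

pi_far = [0.30, 0.50, 0.20]   # 20% of the mass on the unobserved cell
pi_near = [0.36, 0.60, 0.04]  # same proportions on the observed cells, 4% on the unobserved cell
pi_mle = [n / total for n in counts]  # relative frequencies, on the observed face

round_trip = probs_from_natural(natural_from_probs(pi_far))
ll_far = log_likelihood(counts, pi_far)
ll_near = log_likelihood(counts, pi_near)
ll_mle = log_likelihood(counts, pi_mle)
```

With the proportions on the observed cells held fixed, putting more mass on the zero-count cell lowers the log-likelihood, and the relative frequencies, which lie on the observed face, give the largest of the three values.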
Considering an infinite-dimensional simplex makes it possible to remove the compactness assumption (Critchley and Marriott, 2014).

Binomial subfamilies

A (discrete) example: binomial distributions as a subfamily of multinomial distributions.

Let $X \sim \mathrm{Bin}(k, p)$. Then the distributions of $X$ form a subfamily of

$M = \{X \mid X \sim \mathrm{Mult}(\pi_0, \ldots, \pi_k)\}$, with $\pi_i(p) = \binom{k}{i} p^i (1 - p)^{k-i}$.

Figure: Left: Embedded binomial ($k = 2$) in the 2-simplex. Right: Embedded binomial ($k = 3$) in the 3-simplex.

Mixture distributions

The generic mixture distribution is

$f(x; Q) = \int f(x; \theta) \, dQ(\theta)$,

that is, a mixture of (regular) parametric distributions. Regularity: same support $S$, absolutely continuous with respect to a measure $\nu$.

Mixture distributions arise naturally in many statistical problems, including
- Overdispersed models
- Random effects ANOVA
- Random coefficient regression models and measurement error models
- Graphical models and many more

Hard mixture problems

Inference in the class of mixture distributions generates well-known difficulties:
- Identifiability issues: without imposing constraints on the mixing distribution $Q$, there may exist distinct $Q_1$ and $Q_2$ such that $f(x; Q_1) = \int f(x; \theta) \, dQ_1(\theta) = \int f(x; \theta) \, dQ_2(\theta) = f(x; Q_2)$.
  Byproduct: parametrisation issues.
  Byproduct: multimodal likelihood functions.
- Boundary problems.
  Byproduct: singularities in the likelihood function.

NPMLE

Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of $Q$ is necessary. Also,

Theorem
The log-likelihood

$\ell(Q) = \sum_{s=1}^{n} \log L_s(Q) = \sum_{s=1}^{n} \log \int f(x_s; \theta) \, dQ(\theta)$

has a unique maximum over the space of all distribution functions $Q$. Furthermore, the maximiser $\hat{Q}$ is a discrete distribution with no more than $D$ distinct points of support, where $D$ is the number of distinct points in $(x_1, \ldots, x_n)$.

The likelihood on the space of mixtures is therefore defined on the convex hull of the image of $\theta \mapsto (L_1(\theta), \ldots, L_D(\theta))$. Finding the NPMLE amounts to maximizing a concave function over this convex set.

Limits to convex geometry

Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) gives extra insight. Convex geometry correctly captures the $-1$-geometry of the simplex but NOT the 0 and $+1$ geometries (for example, Fisher information requires knowledge of the full sample space).

Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling. In this talk, we mention results on
1 $(-1)$-dimensionality of exponential families in the simplex,
2 convex polytope approximation algorithms: information geometry can give efficient approximations of high-dimensional convex hulls by polytopes.
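The NPMLE computation, maximising a concave log-likelihood over a convex set, can be sketched on a fixed grid of support points, which replaces the convex hull of the likelihood curve by a polytope. The sketch below uses EM updates of the mixing weights for a mixture of binomials. The grid, the toy counts and the function names are assumptions made for illustration, and EM is a simple stand-in here, not the algorithm used by the authors.

```python
import math


def binom_pmf(x, k, p):
    """Binomial probability of x successes out of k with success probability p."""
    return math.comb(k, x) * p ** x * (1.0 - p) ** (k - x)


def mixture_loglik(weights, lik):
    """ell(Q) = sum_s log sum_j w_j L_s(theta_j) for a mixing distribution on the grid."""
    return sum(math.log(sum(w * l for w, l in zip(weights, row))) for row in lik)


def em_mixing_weights(xs, k, grid, n_iter=200):
    """EM updates of the mixing weights with the support points held fixed."""
    # lik[s][j] = L_s(theta_j): likelihood of observation s under grid point j.
    lik = [[binom_pmf(x, k, p) for p in grid] for x in xs]
    w = [1.0 / len(grid)] * len(grid)  # start from uniform weights
    history = [mixture_loglik(w, lik)]
    for _ in range(n_iter):
        new_w = [0.0] * len(grid)
        for row in lik:
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j, (wj, lj) in enumerate(zip(w, row)):
                new_w[j] += wj * lj / denom  # posterior weight of grid point j
        w = [v / len(xs) for v in new_w]
        history.append(mixture_loglik(w, lik))
    return w, history


# Toy overdispersed counts out of k = 7 trials (made up, not the Kupper and Haseman data).
xs = [0, 0, 1, 1, 2, 5, 6, 6, 7, 3]
k = 7
grid = [(j + 0.5) / 20 for j in range(20)]  # interior grid, so every pmf value is positive

weights, history = em_mixing_weights(xs, k, grid)
```

Each observation contributes one row of `lik`, so repeated values simply appear as repeated rows; grouping them by distinct value would give the $D$-dimensional likelihood vectors of the theorem. Because the support is fixed, the problem is concave in the weights and the recorded log-likelihood values in `history` never decrease.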
Local mixture models (IG)

Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups.

Theorem (Marriott, 2002)
If $f(x; \theta)$ is an $n$-dimensional exponential family satisfying regularity conditions and $Q_\lambda(\theta)$ is a local mixing distribution around $\theta_0$, then $f(x; Q_\lambda) = \int f(x; \theta) \, dQ_\lambda(\theta)$ has the expansion

$f(x; Q_\lambda) - f(x; \theta_0) - \sum_{i=1}^{n} \lambda_i \frac{\partial}{\partial \theta_i} f(x; \theta_0) - \sum_{i,j=1}^{n} \lambda_{ij} \frac{\partial^2}{\partial \theta_i \partial \theta_j} f(x; \theta_0) = O(\lambda^{-3})$.

This is equivalent to $f(x; Q_\lambda) + O(\lambda^{-3}) \in T^2 M_{\theta_0}$.

If the density $f(x; \theta)$ and all its derivatives are bounded, then the approximation will be uniform in $x$.

Dimensionality in CIG

It is therefore possible to approximate mixture distributions with low-dimensional families. In contrast, the $(-1)$-representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general.

Theorem (VB et al.)
The $-1$-convex hull of an open subset of an exponential subfamily of $M$ with tangent dimension $k - d$ has dimension at least $k - d$.

Corollary (Critchley and Marriott, 2014)
The $-1$-convex hull of an open subset of a generic one-dimensional subfamily of $M$ is of full dimension.

The tangent dimension is the maximal number of different components of any $(+1)$ tangent vector to the exponential family. Generic ↔ tangent dimension $= k$, i.e. the tangent vector has distinct components.

Example: Mixture of binomials

As mentioned, IG gives efficient approximation by polytopes. IG maximises a concave function on (convex) polytopes.

Example: toxicological data (Kupper and Haseman, 1978).
‘simple one-parameter binomial [...] models generally provides poor fits to this type of binary data’.

Approximation in CIG

Define the norm $\|\pi\|_{\pi_0} = \sum_{i=1}^{k} \pi_i^2 / \pi_{i,0}$ (preferred point metric, Critchley et al., 1993).

Let $\pi(\theta)$ be an exponential family and $\cup S_i$ be a polytope surface. Define the distance function as

$d(\pi(\theta), \pi_0) := \inf_{\pi \in \cup S_i} \|\pi(\theta) - \pi\|_{\pi_0}$.

Theorem (Anaya-Izquierdo et al.)
Let $\cup S_i$ be such that $d(\pi(\theta)) \leq \epsilon$ for all $\theta$. Then

$\ell(\hat{\pi}^{NPMLE}) - \ell(\hat{\pi}) \leq N \|\hat{\pi}^{G} - \hat{\pi}^{NPMLE}\|_{\hat{\pi}} + o(\epsilon)$,

where $(\hat{\pi}^{G})_i = n_i / N$ and $\hat{\pi}$ is the NPMLE on $\cup S_i$.

Summary

High-dimensional (extended) multinomial space is used as a proxy for the ‘space of all models’.

This computational approach encompasses Amari’s information geometry and Lindsay’s convex geometry, while having a tractable and mostly explicit geometry, which allows for a computational theory.

Future work
- Converse of the dimensionality result ($-1$ to $+1$).
- Long-term aim: implementing geometric theories within an R package/software.

References:

Amari, S.-I. (1985), Differential-geometrical methods in statistics, Springer-Verlag.
Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, arXiv report, 1209.1988v1.
Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21, 3, 1197-1224.
Critchley, F. and Marriott, P. (2014), Computational Information Geometry in Statistics: Theory and Practice, Entropy, 16, 2454-2471.
Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probability, 33, 2, 582-600.
Kupper, L.L. and Haseman, J.K. (1978), The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments, Biometrics, 34, 1, 69-76.
Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89, 1, 77-93.