Bag-of-components: an online algorithm for batch learning of mixture models

28/10/2015
Publication: GSI2015
DOI: 10.23723/11784/14304
OAI: oai:www.see.asso.fr:11784:14304

Abstract

Practical estimation of mixture models can be problematic when a large number of observations is involved: in such cases, online versions of Expectation-Maximization may be preferred, since they avoid storing all the observations before running the algorithm. We introduce a new online method that is well suited when the number of observations is large and many mixture models need to be learned from different sets of points. Inspired by dictionary methods, our algorithm begins with a training step that builds a dictionary of components. The next step, which can be done online, amounts to populating the weights of the components as each observation arrives. The dictionary of components is most beneficial when many mixtures are learned from the same dictionary, which maximizes the return on investment of the training step. We evaluate the proposed method on an artificial dataset built from random Gaussian mixture models.

Authors

Olivier Schwander, Frank Nielsen


License

Creative Commons Attribution-ShareAlike 4.0 International


Slides

Bag-of-components: an online algorithm for batch learning of mixture models
Olivier Schwander and Frank Nielsen
Université Pierre et Marie Curie, Paris, France; École polytechnique, Palaiseau, France
October 29, 2015
Outline: information geometry for mixtures, co-mixture models, bag of components.

Exponential families

Definition:
p(x; \lambda) = p_F(x; \theta) = \exp\big(\langle t(x), \theta\rangle - F(\theta) + k(x)\big)
where \lambda is the source parameter, t(x) the sufficient statistic, \theta the natural parameter, F(\theta) the log-normalizer and k(x) the carrier measure. F is a strictly convex and differentiable function and \langle\cdot,\cdot\rangle is a scalar product.

Multiple parameterizations: dual parameter spaces

The Legendre transform links (F, \Theta) and (F^*, H), with \theta \in \Theta the natural parameters and \eta \in H the expectation parameters:
\theta = \nabla F^*(\eta), \qquad \eta = \nabla F(\theta).
Source parameters are not unique: \lambda_1 \in \Lambda_1, \lambda_2 \in \Lambda_2, \dots, \lambda_n \in \Lambda_n. There are multiple source parameterizations but only two canonical parameterizations.

Bregman divergences

Definition:
B_F(x \,\|\, y) = F(x) - F(y) - \langle x - y, \nabla F(y)\rangle
with F a strictly convex and differentiable function. The divergence is not symmetric in general. This family contains many common divergences: squared Euclidean, Mahalanobis, Kullback-Leibler, Itakura-Saito, etc.

Bregman centroids

Left-sided centroid: \min_c \sum_i \omega_i B_F(c \,\|\, x_i).
Right-sided centroid: \min_c \sum_i \omega_i B_F(x_i \,\|\, c).
Both are available in closed form:
c_L = \nabla F^*\Big(\sum_i \omega_i \nabla F(x_i)\Big), \qquad c_R = \sum_i \omega_i x_i.

Link with exponential families [Banerjee 2005]

Bijection with exponential families:
\log p_F(x; \theta) = -B_{F^*}(t(x) \,\|\, \eta) + F^*(t(x)) + k(x).
Kullback-Leibler divergence between members of the same exponential family:
KL(p_F(x; \theta_1) \,\|\, p_F(x; \theta_2)) = B_F(\theta_2 \,\|\, \theta_1) = B_{F^*}(\eta_1 \,\|\, \eta_2).
Kullback-Leibler centroids are therefore available in closed form through the Bregman divergence.

Maximum likelihood estimator: a Bregman centroid

\hat\eta = \arg\max_\eta \sum_i \log p_F(x_i; \eta)
= \arg\min_\eta \sum_i \big[ B_{F^*}(t(x_i) \,\|\, \eta) - F^*(t(x_i)) - k(x_i) \big]
= \arg\min_\eta \sum_i B_{F^*}(t(x_i) \,\|\, \eta)
= \frac{1}{N} \sum_i t(x_i),
since F^*(t(x_i)) and k(x_i) do not depend on \eta. Then \hat\theta = \nabla F^*(\hat\eta).

Mixtures of exponential families

m(x; \omega, \theta) = \sum_{1 \le i \le k} \omega_i \, p_F(x; \theta_i)
Fixed: the family of the components p_F and the number of components k (chosen with model selection techniques).
Parameters: the weights \omega_i, with \sum_i \omega_i = 1, and the component parameters \theta_i.
Learning a mixture: input, the observations x_1, \dots, x_N; output, the \omega_i and \theta_i.
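To make the maximum likelihood estimator above concrete, here is a minimal Python sketch (our own illustration, not code from the paper; NumPy and the function name are our assumptions) for the univariate Gaussian case: the sufficient statistic is t(x) = (x, x^2), the estimated expectation parameter is the average of the t(x_i), and the usual (mu, sigma^2) parameters are recovered from it.

```python
import numpy as np

def gaussian_mle_via_sufficient_statistics(x):
    """MLE of a univariate Gaussian viewed as an exponential family:
    the expectation parameter is the average sufficient statistic
    eta_hat = mean of t(x) = (x, x^2), then mapped back to (mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    t = np.stack([x, x ** 2], axis=1)       # sufficient statistic t(x) = (x, x^2)
    eta_hat = t.mean(axis=0)                # eta_hat = (1/N) sum_i t(x_i)
    mu = eta_hat[0]                         # E[x]
    sigma2 = eta_hat[1] - eta_hat[0] ** 2   # E[x^2] - E[x]^2
    return mu, sigma2

# Usage: matches the empirical mean and the (biased) empirical variance,
# as expected for the maximum likelihood estimator.
samples = np.random.default_rng(0).normal(loc=2.0, scale=3.0, size=10_000)
print(gaussian_mle_via_sufficient_statistics(samples))
print(samples.mean(), samples.var())
```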
Bregman soft clustering: EM for exponential families [Banerjee 2005]

E-step:
p(i, j) = \frac{\omega_j \, p_F(x_i; \theta_j)}{m(x_i)}
M-step:
\eta_j = \arg\max_\eta \sum_i p(i, j) \log p_F(x_i; \theta_j)
= \arg\min_\eta \sum_i p(i, j) \big[ B_{F^*}(t(x_i) \,\|\, \eta) - F^*(t(x_i)) - k(x_i) \big]
= \frac{\sum_i p(i, j) \, t(x_i)}{\sum_u p(u, j)},
where the terms F^*(t(x_i)) and k(x_i) do not depend on \eta.

Motivation: joint estimation of mixture models

Exploit the information shared between multiple point sets, both to improve quality and to improve speed. Inspiration: dictionary methods and transfer learning. This requires efficient algorithms for building the mixtures and for comparing them.

Co-mixtures

All the mixtures share the same components:
m_1(x; \omega^{(1)}, \eta) = \sum_{j=1}^{k} \omega^{(1)}_j \, p_F(x; \eta_j)
\dots
m_S(x; \omega^{(S)}, \eta) = \sum_{j=1}^{k} \omega^{(S)}_j \, p_F(x; \eta_j)
The parameters \eta_1, \dots, \eta_k are the same everywhere; only the weights \omega^{(l)} differ.

co-Expectation-Maximization

Maximize the mean of the likelihoods over the S datasets.
E-step: one posterior matrix per dataset,
p^{(l)}(i, j) = \frac{\omega^{(l)}_j \, p_F(x^{(l)}_i; \theta_j)}{m(x^{(l)}_i; \omega^{(l)}, \eta)}.
M-step: maximization on each dataset,
\eta^{(l)}_j = \frac{\sum_i p^{(l)}(i, j) \, t(x^{(l)}_i)}{\sum_u p^{(l)}(u, j)},
followed by an aggregation step:
\eta_j = \frac{1}{S} \sum_{l=1}^{S} \eta^{(l)}_j.

Variational approximation of the Kullback-Leibler divergence [Hershey and Olsen 2007]

KL_{variational}(m_1 \,\|\, m_2) = \sum_{i=1}^{K} \omega^{(1)}_i \log \frac{\sum_j \omega^{(1)}_j e^{-KL(p_F(\cdot;\, \theta_i) \,\|\, p_F(\cdot;\, \theta_j))}}{\sum_j \omega^{(2)}_j e^{-KL(p_F(\cdot;\, \theta_i) \,\|\, p_F(\cdot;\, \theta_j))}}
With shared parameters, the matrix
D_{ij} = e^{-KL(p_F(\cdot;\, \eta_i) \,\|\, p_F(\cdot;\, \eta_j))}
can be precomputed once, which gives the fast version
KL_{var}(m_1 \,\|\, m_2) = \sum_i \omega^{(1)}_i \log \frac{\sum_j \omega^{(1)}_j D_{ij}}{\sum_j \omega^{(2)}_j D_{ij}}.
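With shared components, the fast variational approximation above reduces to two matrix-vector products against the precomputed matrix D. The following sketch (our own, assuming NumPy and taking D_ij = exp(-KL(p_i || p_j)) as already precomputed; kl_var_shared is an illustrative name, not an API from the paper) shows this computation.

```python
import numpy as np

def kl_var_shared(w1, w2, D):
    """Variational KL approximation (Hershey & Olsen, 2007) between two mixtures
    that share the same components, given the precomputed matrix
    D[i, j] = exp(-KL(p_i || p_j)) over the shared components."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    num = D @ w1            # num[i] = sum_j w1[j] * D[i, j]
    den = D @ w2            # den[i] = sum_j w2[j] * D[i, j]
    return float(np.sum(w1 * np.log(num / den)))

# Sanity check: the approximation of KL(m || m) is zero.
rng = np.random.default_rng(1)
K = 4
D = np.exp(-np.abs(rng.normal(size=(K, K))))  # placeholder for the precomputed matrix
np.fill_diagonal(D, 1.0)                      # KL(p_i || p_i) = 0, so D_ii = 1
w = rng.dirichlet(np.ones(K))
print(kl_var_shared(w, w, D))                 # ~0.0
```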
co-Segmentation

Image segmentation from 5D RGBxy mixtures. [Figure: original images and the segmentations obtained with EM and with co-EM.]

Transfer learning

Increase the quality of one particular mixture of interest. In the experiment, the first image keeps only 1% of its points while the two other images keep their full set of points: 1% of the points is not enough for plain EM.

Bag of Components

Training step: run the co-mixture estimation (Comix) on a training set and keep the resulting parameters. This step is costly but is done offline; it produces the dictionary
D = \{\theta_1, \dots, \theta_K\}.
Online learning of mixtures: for a new point set, each arriving observation x_j is assigned to the component
\arg\max_{\theta \in D} p_F(x_j; \theta), or equivalently \arg\min_{\theta \in D} B_{F^*}(t(x_j) \,\|\, \eta),
where \eta is the expectation parameter associated with \theta.

Nearest neighbor search

Naive version: a linear search in O(number of observations × number of components), which is the same order of magnitude as one step of EM.
Improvements: computational Bregman geometry can speed up the search, using Bregman ball trees, hierarchical clustering, or approximate nearest-neighbor queries.

Image segmentation

Segmentation computed on a random subset of the pixels (100%, 10% and 1%), comparing EM and BoC. [Figure: segmentation results for EM and BoC at each sampling rate.]

Computation times

[Figure: bar chart of the computation times of the training step, EM and BoC for 100%, 10% and 1% of the points.]

Summary

Comix: mixtures with shared components, giving a compact description of a large number of mixtures, fast KL approximations, and dictionary-like methods.
Bag of Components: an online method with predictable running time (no iterations), which works with only a few points and is fast.
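As a concrete illustration of the online step of Bag of Components, here is a minimal sketch (our own, not the authors' code) with a dictionary of univariate Gaussian components: each arriving observation is assigned to the most likely component of the dictionary and only per-component counts are updated, so the weights of the new mixture are the normalized counts. The Gaussian setting, the hard-assignment weight update, and the function names are illustrative assumptions.

```python
import numpy as np

def boc_online_weights(stream, dictionary):
    """Online Bag-of-Components sketch: 'dictionary' is a list of (mu, sigma) pairs
    learned offline; each observation increments the count of its most likely
    component, and the mixture weights are the normalized counts."""
    mus = np.array([mu for mu, _ in dictionary])
    sigmas = np.array([s for _, s in dictionary])
    counts = np.zeros(len(dictionary))
    for x in stream:                          # observations arrive one at a time
        # Gaussian log-likelihood up to an additive constant
        log_lik = -0.5 * ((x - mus) / sigmas) ** 2 - np.log(sigmas)
        counts[np.argmax(log_lik)] += 1       # arg max over the dictionary components
    return counts / counts.sum()              # mixture weights

# Usage: a two-component dictionary and a stream drawn mostly from the second component.
rng = np.random.default_rng(2)
dictionary = [(0.0, 1.0), (5.0, 1.0)]
stream = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 800)])
print(boc_online_weights(stream, dictionary))  # roughly [0.2, 0.8]
```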