A kernel view on manifold sub-sampling based on Karcher variance optimization

28/08/2013
OAI : oai:www.see.asso.fr:2552:4909
DOI : 10.23723/2552/4909
Publisher : SEE

Licence

Creative Commons: None (All rights reserved)


Geometric Science of Information 2013, Paris, France

A kernel view on manifold sub-sampling based on Karcher variance optimization
Nicolas Courty (1) and Thomas Burger (2)
(1) IRISA/University of Bretagne Sud, France
(2) CEA/CNRS, Grenoble, France

The sampling problem

A big-data problem: in several real-world applications, the amount of available data is rising dramatically, mainly because of improvements in sensor precision and acquisition frequency, and in data storage and transfer. These data contain a lot of knowledge, but extracting it raises a computational problem, because traditional algorithms often fail to scale up.

This computational burden can be alleviated by working on a subset of the input data. This is the so-called sampling problem. Of course, this subset should be chosen so as to represent the original data as faithfully as possible.

The sampling problem: selection criteria

What are good criteria for deciding that the original data are well represented?
Manifold assumption: the data live on a subspace of lower dimension, possibly a Riemannian manifold.
Coarse-graining approaches from the graph community: minimizing the distortion of the Laplacian eigenvectors [Lafon and Lee et al., TPAMI 06]; label propagation [Farajtabar et al., ECML 12].
Manifold précis [Schroff et al., NIPS 10]: select data by maximizing both diversity and representativity.

The sampling problem: diversity and representativity

This last idea has motivated our work. How can we measure the diversity and representativity of a subset of samples? (A short code sketch scoring both criteria is given at the end of this section.)

Representativity: the subset $S^k_X$ should minimize the representational error
$$\sum_{x_i \in X} \min_{z_i \in S^k_X} d(z_i, x_i)^2.$$

Diversity: the subset $S^k_X$ should maximize the variance
$$\frac{1}{k} \sum_{x_i \in S^k_X} d\big(\mu(S^k_X), x_i\big)^2,$$
where $\mu(\cdot)$ computes the mean of the subset.

Because of the manifold assumption, we should instead use an intrinsic metric (geodesic distances); the mean $\mu(\cdot)$ must then be replaced by the Karcher mean, and the variance is extended to the notion of Karcher variance.

Problem: deriving the geodesic distance directly from the data is difficult and hardly tractable for large amounts of data.

Proposition: we propose to use kernel theory to embed the data in a space with a convenient topology, exploiting the links between the Gaussian and heat kernels and the Laplace-Beltrami operator, as well as recent Riemannian kernels derived from the Gaussian kernel [Jayasumana et al., CVPR 13].
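As a concrete reading of the two criteria above, here is a minimal NumPy sketch scoring a candidate subset with the plain Euclidean distance and mean; the function names are illustrative, and the geodesic distance and Karcher mean introduced next are meant to replace these Euclidean ingredients.

```python
import numpy as np

def representational_error(X, S):
    """Sum over all points of the squared distance to the closest selected sample."""
    # X: (n, d) full dataset, S: (k, d) candidate subset
    d2 = ((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)  # (n, k) squared distances
    return d2.min(axis=1).sum()

def subset_variance(S):
    """Variance of the selected subset around its (Euclidean) mean."""
    mu = S.mean(axis=0)
    return ((S - mu) ** 2).sum(axis=-1).mean()

# Toy usage: a representative subset has a low error, a diverse one has a high variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
S = X[rng.choice(len(X), size=10, replace=False)]
print(representational_error(X, S), subset_variance(S))
```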
Gaussian RKHS

Let us consider a Gaussian kernel $k(x_i, x_j) = \exp\!\left(-\frac{d(x_i, x_j)^2}{2\sigma^2}\right)$, where $d(\cdot,\cdot)$ can be any meaningful distance (Euclidean, shortest path on a graph, etc.). It is well known that there exists a space $\mathcal{H}$ in which the dot product reproduces the behavior of this kernel function. Such a space can be accessed via a mapping function $\phi(\cdot)$ from $\mathbb{R}^n$ onto $\mathcal{H}$, the corresponding Reproducing Kernel Hilbert Space (RKHS).

The hypersphere of the Gaussian RKHS

We have $k(x_i, x_i) = \langle \phi(x_i), \phi(x_i) \rangle = 1$, meaning that every $\phi(x_i)$ belongs to the unit hypersphere of the Gaussian RKHS.

Question: instead of conducting a linear analysis in the corresponding RKHS, why not also take this particular geometry into account?

Proposition: use tools from the geometry of Riemannian manifolds. We propose to consider the geodesic distance between elements of $\phi(X)$ rather than the Euclidean one.

Figure: the Gaussian RKHS hypersphere $S^{p-1}$, showing a mapped point $\phi(x_i)$, the intrinsic and extrinsic means, and the linear versus geodesic paths.

Distance between points on the hypersphere

The Riemannian (geodesic) distance between $\phi(x_i)$ and $\phi(x_j)$ on $S^{p-1}$ is the length of the arc of the great circle joining $\phi(x_i)$ and $\phi(x_j)$. It is simply given by:
$$d(\phi(x_i), \phi(x_j)) = \arccos\big(\langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}}\big) \qquad (1)$$
$$= \arccos k(x_i, x_j). \qquad (2)$$
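Equations (1)-(2) translate directly into code: the geodesic distance matrix on the RKHS hypersphere is the entrywise arccosine of the Gaussian Gram matrix. Below is a minimal NumPy sketch; the Euclidean input-space distance and the clipping against rounding errors are illustrative choices, not prescriptions from the slides.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian kernel matrix k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def geodesic_distances(K):
    """Geodesic distances on the unit hypersphere of the RKHS: arccos(k(x_i, x_j))."""
    return np.arccos(np.clip(K, -1.0, 1.0))  # clip guards the diagonal against rounding

X = np.random.default_rng(1).normal(size=(100, 3))
Dg = geodesic_distances(gaussian_gram(X, sigma=0.5))  # Dg[i, j] = d(phi(x_i), phi(x_j))
```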
Outline
1. Motivations of the sampling problem
2. Sampling in the feature space: Karcher mean of the data in feature space; minimizing the representational error on the hypersphere; maximizing the diversity on the hypersphere
3. Results

Karcher mean of the data in feature space

We need to define the mean of the data on the hypersphere (a.k.a. the Fréchet mean). It is given by the following minimization problem:
$$\mu = \arg\min_{x \in \mathcal{M}} \sum_{i=1}^{p} d_{geod}(x_i, x)^2. \qquad (3)$$

Usually, a Newton-like method can be used to find a (unique) solution on $S^n$.

Problem: however, this minimization cannot be conducted in the feature space, since we do not have access to the coordinates of the points there.

Solution: perform the minimization in the input space and find the pre-image $\tilde{x} \in \mathbb{R}^n$ of the Karcher mean $\mu \in \mathcal{H}$ (such that $\mu = \phi(\tilde{x})$). It is the solution of the following (non-linear) minimization problem:
$$\tilde{x} = \arg\min_{x \in \mathbb{R}^n} \sum_{i=1}^{p} \arccos\big(k(x_i, x)\big)^2. \qquad (4)$$

To carry out this minimization, let us consider
$$f : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \sum_{i=1}^{p} \arccos\big(k(x_i, x)\big)^2$$
and compute its gradient:
$$\nabla f(x) = \frac{2}{\sigma^2} \sum_{i=1}^{p} \frac{\arccos\big(k(x_i, x)\big)\, k(x_i, x)}{\sqrt{1 - k(x_i, x)^2}}\,(x_i - x).$$
Setting this gradient to zero leads to a fixed-point algorithm that iteratively refines a solution $\tilde{x}_t$:
$$\tilde{x}_{t+1} = \frac{\sum_i \alpha_t(i)\, x_i}{\sum_i \alpha_t(i)}, \quad \text{with} \quad \alpha_t(i) = \frac{\arccos\big(k(x_i, \tilde{x}_t)\big)\, k(x_i, \tilde{x}_t)}{\sqrt{1 - k(x_i, \tilde{x}_t)^2}}. \qquad (5)$$

Figure: illustration of the Karcher mean on two datasets. The dataset is shown as red points; the blue point is the data mean in input space; the green point is the pre-image of the Karcher mean after mapping onto the RKHS (the grayscale encodes the values of the function f defined above).
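A minimal NumPy sketch of the fixed-point update (5). The initialisation at the Euclidean mean, the iteration cap and the tolerance are assumptions for illustration, not prescribed by the slides.

```python
import numpy as np

def karcher_mean_preimage(X, sigma, n_iter=100, tol=1e-8):
    """Fixed-point iteration for the pre-image of the Karcher mean in the Gaussian RKHS (Eq. 5)."""
    x = X.mean(axis=0)                              # assumed initialisation: Euclidean mean
    for _ in range(n_iter):
        k = np.exp(-((X - x) ** 2).sum(axis=-1) / (2.0 * sigma ** 2))  # k(x_i, x_t)
        k = np.clip(k, 1e-12, 1.0 - 1e-12)          # keep arccos and the square root well defined
        alpha = np.arccos(k) * k / np.sqrt(1.0 - k ** 2)               # weights alpha_t(i)
        x_new = (alpha[:, None] * X).sum(axis=0) / alpha.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```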
Minimizing the representational error on the hypersphere

We want to minimize the following criterion:
$$\sum_{x_i \in X} \min_{z_i \in S^k_X} d_g\big(\phi(z_i), \phi(x_i)\big)^2.$$
This bears strong similarities with a kernel k-medoid algorithm, except that the medoid of each cluster is the point closest, in the sense of the geodesic distance, to the Karcher mean.

A new kernel clustering algorithm

Algorithm 1: k-GC algorithm
Input: dataset X, size of the sub-sampling k, Gaussian kernel variance sigma^2
Output: k samples
  p = |X|; randomly initialize {m_1, ..., m_k}
  repeat
    for j = 1 to p do
      for i = 1 to k do
        compute d_geod(phi(x_j), phi(m_i)) = arccos( exp(-||x_j - m_i||^2 / (2 sigma^2)) )
      end for
      c_j = argmin_i d_geod(phi(x_j), phi(m_i))^2;  d_j = min_i d_geod(phi(x_j), phi(m_i))
    end for
    for i = 1 to k do
      psi_i = { x_j in X : c_j = i };  m_i = argmin_{x in psi_i} sum_{x' in psi_i} d_geod(phi(x), phi(x'))^2
    end for
  until no more changes in {m_1, ..., m_k}
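A compact NumPy transcription of Algorithm 1. The medoid update follows the reconstruction above (the member of each cluster minimising the summed squared geodesic distances to the other members), and the iteration cap is an added safeguard; both are assumptions rather than verbatim slide content.

```python
import numpy as np

def k_gc(X, k, sigma, max_iter=100, seed=0):
    """k-GC: k-medoid-style clustering with geodesic distances on the RKHS hypersphere."""
    rng = np.random.default_rng(seed)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Dg = np.arccos(np.clip(np.exp(-d2 / (2.0 * sigma ** 2)), -1.0, 1.0))  # pairwise d_geod
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iter):
        c = np.argmin(Dg[:, medoids], axis=1)           # assign each point to its closest medoid
        new_medoids = medoids.copy()
        for i in range(k):
            members = np.where(c == i)[0]
            if len(members) == 0:
                continue                                # keep the old medoid for an empty cluster
            cost = (Dg[np.ix_(members, members)] ** 2).sum(axis=1)
            new_medoids[i] = members[np.argmin(cost)]   # reconstructed medoid update
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids                                      # indices of the k selected samples
```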
Maximizing the diversity on the hypersphere

The Karcher variance over $S^k_X$ reads
$$V(S^k_X) = \frac{1}{k} \sum_{x_i \in S^k_X} d_g\big(\mu(S^k_X), \phi(x_i)\big)^2.$$
It is possible to find the subset $S^k_X$ that maximizes $V(S^k_X)$ through a rank-revealing QR decomposition of the matrix $P_\mu = \mathrm{Log}_\mu(\phi(X))$. However, $\mathrm{Log}_\mu(\phi(X))$ is not accessible in the feature space, so we propose to estimate $P_\mu$ through a Cholesky decomposition of $K_\mu = P_\mu^T P_\mu$ (see the paper for details).

Rank-revealing Cholesky decomposition

The Cholesky decomposition $\mathcal{C}$ of $K_\mu$ yields an upper-triangular matrix $U \in \mathbb{R}^{n \times n}$ and a diagonal matrix $D \in \mathbb{R}^{n \times n}$ such that $\mathcal{C}(K_\mu) = U^T D U$, with
$$U = \begin{pmatrix} A_k & B_k \\ 0 & I_{n-k} \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} I_k & 0 \\ 0 & C_{n-k} \end{pmatrix},$$
$I_m$ being the identity matrix of rank $m$.

In the case of a rank-revealing Cholesky decomposition, it is also possible to find a permutation matrix $\Pi$ such that $\mathcal{C}(\Pi K_\mu \Pi^T) = \hat{U}^T \hat{D} \hat{U}$ and $\det(\hat{A}_k)$ is maximized, i.e. the first k rows and columns of $\Pi K_\mu \Pi^T$ contain the dot products of the most linearly independent vectors of $\{\phi(x_i)\}_{i=1,\dots,n}$.

A new diversity maximization algorithm

Algorithm 2: kernel rank-revealing Cholesky decomposition for Karcher variance maximization
Input: size of the sub-sampling k, K_mu
Output: k samples
  Pi = I_n;  (U, D) = C(K_mu)
  repeat
    find (i, j) realizing beta_ij = max_{i,j} [ (A_k^{-T} B_k^T)_{ij}^2 + (C_{n-k})_{jj} (w_i(A_k))^2 ]
    Pi = Pi * Pi_{i <-> k+j}
    (U, D) = C(Pi K_mu Pi^T)
  until beta_ij <= 1
The first k columns of Pi indicate the k chosen samples.
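Algorithm 2 relies on a full rank-revealing Cholesky with swap tests, which is fairly involved. As a rough illustration of the underlying idea (pick the k columns of $K_\mu$ corresponding to the most linearly independent feature-space vectors), here is a simpler greedy diagonally-pivoted Cholesky sketch in NumPy. This greedy variant is not the paper's algorithm, only an approximation of its selection principle.

```python
import numpy as np

def greedy_pivoted_cholesky_selection(K, k):
    """Greedy pivoted Cholesky on a PSD Gram matrix K: at each step pick the column with
    the largest remaining Schur-complement diagonal, which greedily grows det(A_k)."""
    n = K.shape[0]
    residual_diag = np.diag(K).astype(float).copy()
    L = np.zeros((n, k))
    selected = []
    for step in range(k):
        j = int(np.argmax(residual_diag))        # pivot: largest remaining diagonal entry
        pivot = residual_diag[j]
        if pivot <= 1e-12:                       # K is (numerically) of rank < k
            break
        selected.append(j)
        L[:, step] = (K[:, j] - L[:, :step] @ L[j, :step]) / np.sqrt(pivot)
        residual_diag -= L[:, step] ** 2         # update the Schur-complement diagonal
        residual_diag[j] = 0.0                   # never pick the same column twice
    return selected                              # indices of the selected samples
```

The returned indices would then map back to the k selected samples of the dataset.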
Results

Toy datasets

A combination of two unbalanced Gaussian distributions, on which the Karcher variance residuals are minimized: (a) original distribution; (b) selection with Euclidean distances; (c) selection with geodesic distances. [Figure, panels a-c]

A square-shaped distribution, on which the Karcher variance of the selection is maximized: (d) original distribution; (e) linear kernel; (f) Log-map kernel (a 4-NN graph is added to highlight the geometry of the samples). [Figure, panels d-f]

S-shaped data cloud, representational error minimization. [Figure, panels g-i]

S-shaped data cloud, diversity maximization. Images (k) and (l) are obtained with the same Log-map kernel but with different values of sigma (0.6 and 0.1 respectively). [Figure, panels j-l]

3D mesh resampling

(a) Random resampling; (b) representational error minimization; (c) diversity maximization. [Figures]

Conclusion

Summary of our contributions: a new geometry-preserving sampling scheme based on different expressions of the Karcher variance, together with operational algorithms. Some work remains to be done: the rank-revealing Cholesky is computationally demanding, and linear-time equivalent procedures are being developed (work in progress).

Thank you. Any questions?

Projection in the tangent space

Logarithmic map: the logarithmic map at location $\mu$, which projects any point $\phi(x_i) \in \mathcal{R} \subset S^{p-1}$ onto the tangent space $T_\mu$, has the following form:
$$\mathrm{Log}_\mu : \mathcal{R} \setminus \{\mu\} \to T_\mu, \quad y \mapsto \frac{\theta}{\sin(\theta)} \big(y - \cos(\theta)\,\mu\big), \qquad (6)$$
where $\theta$ is the angle between $\mu$ and $y$, i.e. $\theta = \arccos(\langle \mu, y \rangle_{\mathcal{H}})$ (if $\theta = 0$, it is natural to consider that $y = \mu$).

A new kernel

In order to compute PCA in the tangent space, we need to be able to compute the dot product there (just as in kernel PCA):
$$K^{\tilde{x}}_{ij} = \big\langle \mathrm{Log}_{\phi(\tilde{x})}(\phi(x_i)),\ \mathrm{Log}_{\phi(\tilde{x})}(\phi(x_j)) \big\rangle_{\mathcal{H}}. \qquad (7)$$
It is possible to interpret $K^{\tilde{x}}$ directly as the Gram matrix of a new, data-dependent kernel $k^{\tilde{x}}$ such that
$$k^{\tilde{x}}(x_i, x_j) = \frac{\arccos\big(k(x_i, \tilde{x})\big)\,\arccos\big(k(x_j, \tilde{x})\big)}{\sqrt{1 - k(x_i, \tilde{x})^2}\,\sqrt{1 - k(x_j, \tilde{x})^2}} \cdot \big(k(x_i, x_j) - k(x_i, \tilde{x})\,k(x_j, \tilde{x})\big),$$
with the convention that if $x_i = \tilde{x}$ (resp. $x_j = \tilde{x}$), then $k^{\tilde{x}}(x_i, x_j) = \arccos k(x_i, x_j)$ (see the paper for details).
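The Gram matrix $K^{\tilde{x}}$ of this log-map kernel can be assembled directly from the ordinary Gaussian Gram matrix and the kernel evaluations against the pre-image $\tilde{x}$. A minimal NumPy sketch follows; the clipping that keeps the arccosine and the square roots well defined is a defensive choice, not specified on the slides.

```python
import numpy as np

def logmap_gram(K, k_mu):
    """Gram matrix of the data-dependent log-map kernel (Eq. 7).
    K: (n, n) Gaussian Gram matrix k(x_i, x_j); k_mu: (n,) vector of k(x_i, x_tilde)."""
    k_mu = np.clip(k_mu, -1.0 + 1e-12, 1.0 - 1e-12)
    w = np.arccos(k_mu) / np.sqrt(1.0 - k_mu ** 2)     # per-point weight theta_i / sin(theta_i)
    return np.outer(w, w) * (K - np.outer(k_mu, k_mu))
```

The resulting matrix can then be used like any other Gram matrix, e.g. fed to kernel PCA or to the Cholesky-based selection sketched earlier.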