# GSI2013 - Geometric Science of Information

## About

The objective of this SEE conference, hosted by MINES ParisTech, is to bring together pure and applied mathematicians and engineers with a common interest in geometric tools and their applications to information analysis.

It emphasises the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.

Current uses of information geometry manifolds in applied mathematics include advanced signal/image/video processing, complex data modeling and analysis, information ranking and retrieval, coding, cognitive systems, optimal control, statistics on manifolds, machine learning, speech/sound recognition and natural language processing, all of which are also of substantial relevance to industry.

The conference therefore focuses on priority themes and topics of mutual interest, with a mandate to:

• Provide an overview of the most recent state of the art
• Exchange mathematical information, knowledge and expertise in the area
• Identify research areas and applications for future collaboration
• Identify academic and industry lab expertise for further collaboration

The conference is an interdisciplinary event, federating skills from geometry, probability and information theory to address, among others, the topics below. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.

- Computational Information Geometry
- Hessian/Symplectic Information Geometry
- Optimization on Matrix Manifolds
- Probability on Manifolds
- Optimal Transport Geometry
- Divergence Geometry & Ancillarity
- Machine/Manifold/Topology Learning
- Tensor-Valued Mathematical Morphology
- Differential Geometry in Signal Processing
- Geometry of Audio Processing
- Geometry for Inverse Problems
- Shape Spaces: Geometry and Statistics
- Geometry of Shape Variability
- Relational Metric
- Discrete Metric Spaces

## Documents

### Summary (Frédéric Barbaresco)


### OPENING SESSION


### POSTER SESSION (Frédéric Barbaresco)


**Fast Polynomial Spline Approximation for Large Scattered Data Sets via L1 Minimization** (poster)
Laurent Gajny, Éric Nyiri, Olivier Gibaru (Laurent.GAJNY@ensam.eu, Eric.NYIRI@ensam.eu, Olivier.GIBARU@ensam.eu)

**Context.** We propose a fast method of approximation by polynomial splines with no Gibbs phenomenon near abrupt changes in the shape of the data [Gib1899]. The aims are: shape-preserving properties (the L1 norm), a prescribed approximation error (the ε-control), and speed suitable for real-time use (a sliding-window process).

**The Lp regression line.** Given $n$ points $(x_i, y_i)$ in the plane, the $L_p$ regression line problem is to find the line $y = a^*x + b^*$ solving
$$(a^*, b^*) = \arg\min_{a, b} \sum_{i=1}^{n} \left| y_i - (a x_i + b) \right|^p .$$
The L1 regression line ($p = 1$) is stable against outliers, unlike the L2 (least-squares) line.

**Cubic spline interpolation.** Given points $q_i \in \mathbb{R}^d$ with associated parameters $u_i$, the cubic spline interpolation problem is to find a curve $\gamma$ with $\gamma(u_i) = q_i$ for all $i$, polynomial of degree at most 3 on each interval $[u_i, u_{i+1}]$; a cubic spline is determined by its knots and the associated first-derivative values, and the solution set is infinite. The classical $C^2$ cubic interpolation spline is the unique solution of a least-squares (L2) minimization property, and it exhibits the Gibbs phenomenon near jumps.

**L1 cubic spline interpolation.** The L1 cubic spline introduced in [Lav00] instead minimizes the L1 norm of the second derivative over the set of $C^1$ cubic splines interpolating the points $q_i$. It shows no Gibbs phenomenon, but the problem is non-linear with a non-unique solution. The sliding-window process proposed in [NGA11] has linear complexity in the number of data points: a sequence of local L1 problems is solved, keeping only the derivative value at the middle point of each window. This yields an algebraic solution and manages the non-uniqueness of solutions.

**Lp smoothing splines** minimize a weighted sum of a data-fidelity term and a smoothness term, but the smoothing parameter is not easy to choose and gives no control on the deviation from the initial data points (the L2 smoothing spline overshoots). We instead look for spline solutions that deviate from the initial data by no more than a given tolerance.

**The three-point algorithm.** Local problems are solved algebraically on three points. When the degeneracy condition (3) of Proposition 1 is satisfied, the local solution set may be infinite; the window is then extended with supplementary neighbours and a weighted, ε-controlled L1 regression line is solved. With $n = 5$ we give more importance to the three middle points, choosing $w_2 = w_3 = w_4 > w_1 = w_5 = 1$.

**Behaviour.** On a Heaviside data set, the jump is reproduced without overshoot. On a noisy data set the spline produced by one pass of the algorithm is not smooth, but the method can be iterated: at step $k+1$ the data points are the approximation points computed at step $k$, which produces a smoothing effect while keeping the jumps. We are currently studying the proposed approach for the approximation of functions, and we apply the results to sensor data from the Leap Motion, a low-cost computer-vision system that captures finger motion very accurately but leaves holes in the data when the user's moves are too fast (motion capture and industrial robot steering; see [Her13]).

References:
- [Her13] F. Hernoux, R. Béarée, L. Gajny, E. Nyiri, J. Bancalin, O. Gibaru, "Leap Motion pour la capture de mouvement 3D par spline L1. Application à la robotique," GTMG 2013, Marseille, 27-28 March 2013.
- [Gib1899] J. Willard Gibbs, letter to the editor, Nature 59 (April 27, 1899) 606.
- [Lav00] J. E. Lavery, "Univariate cubic Lp splines and shape-preserving multiscale interpolation by univariate cubic L1 splines," CAGD 17 (2000) 319-336.
- [Lav00bis] J. E. Lavery, "Shape-preserving, multiscale fitting of univariate data by cubic L1 smoothing splines," Comput. Aided Geom. Design 17 (2000) 715-727.
- [NGA11] E. Nyiri, O. Gibaru, P. Auquiert, "Fast L1 C^k polynomial spline interpolation algorithm with shape-preserving properties," CAGD 28(1) (2011) 65-74.
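
As a quick illustration of the robustness property that motivates the poster, the sketch below compares L2 and L1 regression lines on data with a single outlier. It is a minimal sketch, not the authors' algorithm; the use of scipy's Nelder-Mead minimizer for the non-smooth L1 cost is an assumption made for brevity.

```python
# Minimal sketch (not the authors' algorithm): L2 vs L1 regression lines
# on data containing a single outlier.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(20)
y[10] += 5.0                                  # one outlier

# L2 (least-squares) line: closed form via the normal equations.
A = np.column_stack([x, np.ones_like(x)])
a2, b2 = np.linalg.lstsq(A, y, rcond=None)[0]

# L1 line: minimize the sum of absolute residuals numerically
# (derivative-free method, since the cost is non-smooth).
l1_cost = lambda p: np.abs(y - (p[0] * x + p[1])).sum()
a1, b1 = minimize(l1_cost, x0=[a2, b2], method="Nelder-Mead").x

print(f"L2 fit: y = {a2:.2f} x + {b2:.2f}")   # visibly pulled by the outlier
print(f"L1 fit: y = {a1:.2f} x + {b1:.2f}")   # stays close to y = 2x + 1
```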


### Guest speech (Shun-Ichi Amari)


**Information Geometry and Its Applications** (slides)
Shun-Ichi Amari, RIKEN Brain Science Institute

Application areas surveyed: statistical inference, machine learning and AI, convex programming, signal processing (ICA), information theory, systems theory, quantum information geometry, and higher-order asymptotics (the leading estimation-error term is the inverse Fisher information $G^{-1}/n$ of the Cramér-Rao bound, with computable higher-order corrections).

- **Semiparametric estimation.** For linear regression where both coordinates are observed with noise (the Neyman-Scott problem), naive least squares and maximum likelihood are inconsistent. The semiparametric model $p(x, y; \theta, z)$, with $\theta$ the parameter of interest, a nuisance parameter, and functional degrees of freedom $z$, is treated through estimating functions, orthogonalized and projected scores, and the e- and m-parallel transports of scores; examples of valid estimating functions are exhibited.
- **Independent component analysis.** Observations $x = As$ are mixtures of independent sources $s$; a demixing matrix $W$ is estimated by minimizing the KL divergence $l(W) = \mathrm{KL}[\,p(y; W) : q(y)\,]$ from the distribution of $y = Wx$ to the manifold of independent distributions $q_1(y_1)\cdots q_n(y_n)$, yielding natural-gradient learning whose stability and efficiency are analyzed via estimating functions.
- **Compressed sensing.** When is the $L_1$-minimizing solution of $y = X\beta$ identical to the $L_0$ (sparsest, $k$-sparse) solution? Conditions are given in terms of the sparsity $k$, the number of measurements $m$ and the dimension $N$.
- **Machine learning.** Stochastic reasoning and belief propagation for distributions $p(x, y, z, r, s)$ (Boltzmann machines, spin glasses, neural networks, turbo codes, LDPC codes), analyzed information-geometrically via e- and m-projections onto the exponential submanifolds $M_r = \{\exp(c_0(x) + \xi r(x))\}$; boosting as the sequential combination of weak learners with exponential reweighting $Q_{t+1} \propto Q_t \exp(-\alpha_t\, y\, h_t(x))$, a sequence of divergence minimizations; support vector machines; clustering; optimization.
- **Neural firing.** Joint distributions $p(x_1, \ldots, x_n)$ of binary firing patterns decompose orthogonally into firing rates $r_i = E[x_i]$, pairwise covariances, and higher-order correlations; for two neurons, the firing rates $(r_1, r_2)$ together with the log-odds ratio $r_{12} = \log\frac{p_{11}\,p_{00}}{p_{10}\,p_{01}}$ form orthogonal coordinates.
- **Multilayer perceptrons.** The "neuromanifold" of models $p(y \mid x) \propto \exp\!\big(-\tfrac12 (y - f(x; w, v))^2\big)$ with $f(x) = \sum_i v_i\,\varphi(w_i \cdot x)$ has singularities in the space of functions where hidden units coalesce, which shapes learning dynamics.
- **Total Bregman divergence and t-centers.** A rotationally invariant (conformal-geometric) modification of the Bregman divergence whose cluster centers (t-centers, $\arg\min_x \sum_i \mathrm{TD}(x_i, x)$) are robust: the influence function of an outlier remains bounded, unlike for ordinary K-means centers.
- **Linear and convex cone programming.** Interior-point methods for $\max c^T x$ subject to $Ax \le b$ seen through convex (log-barrier) potential functions; convex cone programming, e.g. on the cone of positive semidefinite matrices, via a dual-geodesic approach.
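
The natural-gradient ICA rule mentioned in the slides is concrete enough to sketch. The following is a minimal illustration of the update $W \leftarrow W + \eta\,(I - E[\varphi(y)y^T])\,W$; the Laplace sources, the tanh score function, the step size and the iteration count are illustrative choices, not taken from the slides.

```python
# Minimal sketch of natural-gradient ICA: recover a demixing matrix W
# from linear mixtures of independent super-Gaussian sources.
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 20000
S = rng.laplace(size=(n, T))          # independent super-Gaussian sources
A = rng.standard_normal((n, n))       # unknown mixing matrix
X = A @ S                             # observed mixtures x = A s

W = np.eye(n)                         # demixing matrix estimate
eta = 0.02
phi = np.tanh                         # score function for super-Gaussian sources
for _ in range(500):
    Y = W @ X                         # current source estimates y = W x
    # Natural-gradient step on the group of invertible matrices.
    W += eta * (np.eye(n) - (phi(Y) @ Y.T) / T) @ W

# As learning proceeds, W @ A approaches a scaled permutation matrix,
# i.e. the sources are recovered up to order and scale.
print(np.round(W @ A, 2))
```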

### ORAL SESSION 1 Geometric Statistics on manifolds and Lie groups (Xavier Pennec)


**A Subspace Learning of Dynamics on a Shape Manifold: A Generative Modeling Approach** (slides, 10/15/13)
Sheng Yi* and H. Krim, VISSTA, ECE Dept., NCSU, Raleigh NC 27695 (*GE Research Center, NY). Thanks to AFOSR.

**Outline.** Motivation; statement of the problem; key issues and brief review; proposed model and solution; experiments.

**Problem statement.** Given a curve $X(t)$ on a shape manifold, find a low-dimensional representation $Z(t)$ in a subspace that preserves the geometric properties of the data in the original space.

**Related work.**
- Point-wise subspace learning: PCA, MDS, LLE, ISOMAP, Hessian LLE, Laplacian mapping, diffusion maps, LTSA [T. Wittman, "Manifold Learning Techniques: So Which is the Best?", UCLA].
- Curve-wise subspace learning: Whitney embedding [D. Aouada and H. Krim, IEEE Trans. IP, 2010].
- Shape manifolds: Kendall's shape space (based on landmarks); Klassen et al.'s shape space (functional representation; concise description of the tangent space and fast implementation); Michor & Mumford's shape space (focus on parameterization; complex tangent-space description and heavy computation); the Trouvé-Younes diffeomorphism approach.

**Contributions.** The proposed subspace learning is invertible (original sequence → subspace sequence → reconstructed sequence); the parallel transport of representative frames, defined by a metric on the shape manifold, preserves curvatures in the subspace; and an ambient-space calculus can be used instead of relying essentially on manifold calculus.

**Shape representation** (from Klassen et al.): a simple closed planar curve $\alpha(s) = (x(s), y(s)) \in \mathbb{R}^2$ is represented, modulo similarities, by its angle function $\theta(s)$ with $\partial\alpha/\partial s = (\cos\theta(s), \sin\theta(s))$, subject to the closure and rotation-normalization constraints
$$\int_0^{2\pi} \cos\theta(s)\,ds = 0, \qquad \int_0^{2\pi} \sin\theta(s)\,ds = 0, \qquad \frac{1}{2\pi}\int_0^{2\pi} \theta(s)\,ds = \pi.$$

**Dynamic modeling on a manifold.** The core idea is to describe a process $X_t$ on $M$ through a driving process $Z(t)$ on $\mathbb{R}^{\dim M}$ and a frame of tangent vectors $V_i(X_t) \in T_{X_t}M$:
$$dX_t = \sum_{i=1}^{\dim M} V_i(X_t)\, dZ_i(t) \;\in\; T_{X_t}M.$$
The frame is chosen adaptively so that the driving process lives in a lower-dimensional subspace: PCA is performed on tangent vectors parallel-transported to a common tangent space [Yi et al., IEEE IP, 2012].

**Core idea.** The selection of $V$ is restricted to parallel frames on the manifold. A parallel moving frame preserves angles between tangent vectors, hence (with the same sampling scheme) curvatures; and given the initial frame and initial location on the manifold, the original curve can be reconstructed. An $L^2$-optimal $V$ is sought (Euclidean distance is legitimate here because the comparison happens within a tangent space), and under mild assumptions the solution is obtained as a PCA. In practice, parallel transport on the shape manifold is computed by discretizing its defining equation into a linear system, using the fact that the tangent space of the shape manifold is normal to explicit basis functions $b_1, b_2, b_3$.

**Experiments.** Data: the Weizmann action dataset (Moshe Blank et al., ICCV 2005, http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html) with the actions walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack, skip; and the Kimia shape database (Sharvit, D. et al., Content-Based Access of Image and Video Libraries, 1998). Reconstruction experiments compare favourably with PCA in Euclidean space; further results cover embeddings, curvature preservation, and generative reconstruction.

**Conclusions.** A low-dimensional embedding of a parallelly transported shape flow is proposed; a learning-based inference framework is achieved; and a generative model for various shape-based activities is obtained.
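
The PCA-on-transported-tangents step can be sketched generically. Below is a minimal sketch assuming the tangent vectors have already been expressed in a common parallel frame (the manifold-specific transport from the talk is not reproduced); the function name and toy data are hypothetical.

```python
# Minimal sketch: learn a low-dimensional subspace for the driving
# process by running PCA on tangent vectors expressed in a common frame.
import numpy as np

def pca_subspace(tangent_vecs, k):
    """PCA on tangent vectors already transported to a common frame."""
    V = np.asarray(tangent_vecs)            # shape (T, d), one vector per step
    V = V - V.mean(axis=0)                  # center
    # SVD gives the principal directions of the transported tangents.
    _, _, Vt = np.linalg.svd(V, full_matrices=False)
    return Vt[:k]                           # (k, d) orthonormal basis

# Toy usage: a planar curve embedded in R^5; its discrete tangents
# concentrate in a 2-dimensional subspace.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
B = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # random 2D frame in R^5
curve = np.column_stack([np.cos(t), np.sin(t)]) @ B.T
tangents = np.diff(curve, axis=0)                  # discrete dX_t
basis = pca_subspace(tangents, k=2)
print(basis.shape)  # (2, 5)
```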


**Bi-invariant Means on Lie Groups with Cartan-Schouten Connections** (slides, GSI, 30 August 2013)
Xavier Pennec, Asclepios team, INRIA Sophia-Antipolis – Méditerranée, France

**Computational anatomy.** Design mathematical methods and algorithms to model and analyze the anatomy: statistics of organ shapes across subjects, species, populations and diseases (mean shape; shape variability as a covariance); models of organ development across time (heart-beat, growth, ageing); predictive rather than descriptive models of evolution; correlation with clinical variables. The geometric features involved — tensors and covariance matrices, curves and tracts, surfaces, and transformations from rigid and affine to diffeomorphisms — are noisy measurements on non-Euclidean manifolds, so a consistent statistical (and computing) framework is needed.

**Example: statistical analysis of the scoliotic spine.** 307 scoliotic patients from Montreal's St-Justine Hospital; 3D geometry from multi-planar X-rays; an articulated model of the 17 relative poses of successive vertebrae. The main translational variability is axial (growth?), the main rotational variability is around the anterior-posterior axis, and the first four variation modes are related to King's classes [J. Boisvert et al., ISBI'06, AMDO'06, IEEE TMI 27(4), 2008].

**Morphometry through deformations** [D'Arcy Thompson 1917; Grenander & Miller]: an observation is a "random" deformation of a reference template; the deterministic template encodes anatomical invariants (atlas ≈ mean) and the random deformations the geometric variability (covariance matrix). Hierarchical deformation models combine a population-level spatial structure common to all subjects with subject-level deformation atoms, using Aff(3)-valued trees [Seiler, Pennec, Reyes, Medical Image Analysis 16(7):1371-1384, 2012].

**Outline.** Riemannian frameworks on Lie groups; Lie groups as affine connection spaces; a glimpse of applications in infinite dimensions; conclusion and challenges.

**Riemannian frameworks.** Riemannian geometry is a powerful structure for building consistent statistical computing algorithms, with literature spanning shape spaces and directional statistics [Kendall StatSci 89; Small 96; Dryden & Mardia 98], numerical integration, dynamical systems and optimization [Helmke & Moore 1994; Hairer et al. 2002; Owren BIT 2000; Mahony JGO 2002; Absil, Mahony & Sepulchre 2008], information geometry [Amari 1990, 2000; Kass & Vos 1997; Oller & Corcuera Ann. Stat. 1995; Bhattacharya & Patrangenaru Ann. Stat. 2003, 2005], statistics for image analysis [Pennec PhD96, JMIV98, NSIP99, JMIV06; Fletcher IPMI03, TMI04; Klassen & Srivastava PAMI 2003], and geometric computing [Rahman, ..., Donoho, Schröder, SIAM MMS 2005].

**The geometric framework: Riemannian manifolds.** A Riemannian metric is a dot product on each tangent space; it defines speeds and lengths of curves, geodesics as length-minimizing curves, and the Riemannian distance. The exponential map realizes normal coordinates: $\mathrm{Exp}_x(v)$ shoots the geodesic with initial speed $v$, and $\mathrm{Log}_x(y) = \overrightarrow{xy}$ finds the vector to shoot back (geodesic completeness required); unfolding ($\mathrm{Log}_x$) and folding ($\mathrm{Exp}_x$) replace equivalence classes of bipoints by tangent vectors. Euclidean operations translate as:

| Operation | Euclidean space | Riemannian manifold |
|---|---|---|
| Subtraction | $\overrightarrow{xy} = y - x$ | $\overrightarrow{xy} = \mathrm{Log}_x(y)$ |
| Addition | $y = x + \overrightarrow{xy}$ | $y = \mathrm{Exp}_x(\overrightarrow{xy})$ |
| Distance | $\mathrm{dist}(x,y) = \lVert y - x\rVert$ | $\mathrm{dist}(x,y) = \lVert\overrightarrow{xy}\rVert_x$ |
| Gradient descent | $x_{t+\tau} = x_t - \tau\,\nabla C(x_t)$ | $x_{t+\tau} = \mathrm{Exp}_{x_t}\!\big(-\tau\,\nabla C(x_t)\big)$ |

**Statistical tools.** The Fréchet/Karcher mean minimizes the variance, $\bar x = \arg\min_y \mathbb{E}\big[\mathrm{dist}(y, x)^2\big]$, equivalently $\mathbb{E}[\overrightarrow{\bar x x}] = 0$; existence and uniqueness results are due to Karcher, Kendall, Le and Afsari, and the mean is computed by the Gauss-Newton "geodesic marching" $\bar x_{t+1} = \mathrm{Exp}_{\bar x_t}\!\big(\mathbb{E}[\overrightarrow{\bar x_t x}]\big)$. The covariance $\Sigma = \mathbb{E}\big[\overrightarrow{\bar x x}\;\overrightarrow{\bar x x}^{T}\big]$ and higher moments then support PCA [Oller & Corcuera 95; Bhattacharya & Patrangenaru 2002; Pennec NSIP'99, JMIV06].

**Distributions for parametric tests.** Generalizations of the Gaussian density: the stochastic heat kernel $p(x,y,t)$ (complex time dependency); the wrapped Gaussian (an infinite series, difficult to compute); or the maximal-entropy density knowing the mean and covariance, $N(y) = k\exp\!\big(-\tfrac12\,\overrightarrow{\bar x y}^{T}\,\Gamma\,\overrightarrow{\bar x y}\big)$ with concentration $\Gamma = \Sigma^{-1} - \tfrac13\mathrm{Ric} + O(\sigma)$ up to curvature corrections. The Mahalanobis distance $\mu^2(y) = \overrightarrow{\bar x y}^{T}\,\Sigma^{-1}\,\overrightarrow{\bar x y}$ satisfies $\mathbb{E}[\mu^2(x)] = n$ for any distribution, with $\chi^2$-like behaviour in the Gaussian case up to curvature terms [Pennec NSIP'99, JMIV 2006].

**Natural Riemannian metrics on Lie groups.** A Lie group is a smooth manifold $G$ compatible with the group structure: composition $g \circ h$ and inversion $g^{-1}$ are smooth, with left and right translations $L_g(f) = g \circ f$ and $R_g(f) = f \circ g$. Choose a metric at the identity and propagate it by left (or right) translation, $\langle x, y\rangle_g = \langle DL_{g^{-1}}\,x,\; DL_{g^{-1}}\,y\rangle_{\mathrm{Id}}$, so that practical computations reduce to the identity.

**Example: 3D rotations.** $SO(3) = \{R : R^T R = \mathrm{Id},\ \det R = +1\}$ is a compact Lie group, so a bi-invariant metric exists (left and right invariant, induced by the ambient metric $\langle X, Y\rangle = \mathrm{Tr}(X^T Y)$). One-parameter subgroups are the bi-invariant geodesics starting at the identity, given by the matrix exponential and Rodrigues' formula, $R = \exp(X)$, $X = \log(R)$; geodesics everywhere follow by left (or right) translation, $\mathrm{Log}_R(U) = R\log(R^T U)$ and $\mathrm{Exp}_R(X) = R\exp(R^T X)$, with bi-invariant Riemannian distance $d(R, U) = \lVert\log(R^T U)\rVert = \theta(R^T U)$, the rotation angle of $R^T U$.

**The general non-compact, non-commutative case.** No bi-invariant metric exists in general; the slides' running example is 2D rigid-body transformations, for which no left-invariant metric yields a bi-invariant mean.
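
The Karcher-mean iteration summarized above (tangent-space averaging followed by geodesic marching) can be sketched on SO(3) with scipy's rotation utilities; this is a minimal sketch under the bi-invariant metric, not Pennec's code.

```python
# Minimal sketch: Karcher (Fréchet) mean of rotations by geodesic marching.
import numpy as np
from scipy.spatial.transform import Rotation as R

def karcher_mean_so3(rotations, iters=50, tol=1e-10):
    mean = rotations[0]
    for _ in range(iters):
        # Log_mean(R_i) = log(mean^T R_i), expressed as rotation vectors.
        vs = np.array([(mean.inv() * r).as_rotvec() for r in rotations])
        v = vs.mean(axis=0)              # tangent-space (Euclidean) average
        if np.linalg.norm(v) < tol:      # E[Log_mean(x)] = 0 at the mean
            break
        mean = mean * R.from_rotvec(v)   # Exp_mean(v): geodesic marching step
    return mean

rng = np.random.default_rng(0)
base = R.from_rotvec([0.3, 0.1, -0.2])
# Cloud of rotations scattered around `base`.
sample = [base * R.from_rotvec(0.2 * rng.standard_normal(3)) for _ in range(30)]
print(karcher_mean_so3(sample).as_rotvec())  # close to [0.3, 0.1, -0.2]
```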


### Keynote speech 1 (Yann Ollivier)


**Objective Improvement in Information-Geometric Optimization**
Youhei Akimoto, Project TAO – INRIA Saclay, LRI, Bât. 490, Univ. Paris-Sud, 91405 Orsay, France (Youhei.Akimoto@lri.fr); Yann Ollivier, CNRS & Univ. Paris-Sud, LRI, Bât. 490, 91405 Orsay, France (yann.ollivier@lri.fr)

ABSTRACT: Information-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. IGO recovers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-μ update of CMA-ES.
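
Since the abstract notes that IGO on Bernoulli families recovers PBIL, a minimal sketch of that instance is given below; the toy objective, selection scheme, population size and learning rate are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of an IGO-style natural-gradient update for independent
# Bernoulli distributions (a PBIL-like update) on a toy objective.
import numpy as np

def onemax(x):                      # toy objective: number of ones
    return x.sum()

rng = np.random.default_rng(0)
d, lam, eta = 20, 50, 0.1           # dimension, population size, step size
p = np.full(d, 0.5)                 # Bernoulli parameters

for _ in range(100):
    X = (rng.random((lam, d)) < p).astype(float)   # sample population
    f = np.array([onemax(x) for x in X])
    # Rank-based weights (selection quantiles): best half weighted equally.
    w = np.where(np.argsort(np.argsort(-f)) < lam // 2, 2.0 / lam, 0.0)
    # For Bernoulli families, the natural-gradient step in the mean
    # parameters reduces to a weighted frequency update.
    p += eta * (w @ (X - p))
    p = p.clip(0.05, 0.95)          # keep sampling non-degenerate

print(p.round(2))                   # parameters drift toward 1 on onemax
```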

### ORAL SESSION 2 Deformations in Shape Spaces (Alain Trouvé)


**Geodesic Image Regression with a Sparse Parameterization of Diffeomorphisms** (slides)
James Fishbaugh¹, Marcel Prastawa¹, Guido Gerig¹, Stanley Durrleman²
¹ Scientific Computing and Imaging Institute, University of Utah; ² INRIA/ICM, Pitié-Salpêtrière Hospital, Paris, France

**Why image regression?** Given images of a subject at several ages (e.g. 6 to 18 months), regression supports extrapolation for change prediction, alignment of images and cognitive scores acquired at different times, alignment of subjects with scans acquired at different times, and improved understanding of normal and pathological brain changes.

**Previous work.** Kernel regression (Davis et al., ICCV 2007) and geodesic regression (Niethammer et al., MICCAI 2011; Singh et al., ISBI 2013) require storing a number of model parameters on the order of the number of voxels. Here image evolution is described by considerably fewer parameters, concentrated in the areas undergoing the most dynamic changes.

**Motivation for sparsity.** Fewer parameters give potential for greater statistical power (less noise in the description); the number of parameters should reflect the complexity of the anatomical changes, not the sampling of the images; and the locations of the parameters can localize potential biomarkers. The result is a compact, generative statistical model of growth.

**Method.** A geodesic path on a sub-group of diffeomorphisms (Dupuis 98; Trouvé 95, 98) is parameterized by control points and initial momenta, $S_0 = \{c_0, \alpha_0\}$. Geodesic shooting evolves the control points; their trajectory defines a flow of diffeomorphisms; physical pixel coordinates $y$ follow the flow as it evolves in time; and deformed images are constructed by interpolation in the baseline image $I_0$. In summary: (1) shoot control points; (2) the trajectory defines a flow; (3) flow the pixel locations; (4) interpolate in the baseline image. The regression criterion measures mismatch with the observed images $O_1, O_2, O_3, \ldots$, subject to the shooting and flow equations.

**Gradient of the regression criterion** (with respect to control points, initial momenta, and the initial image): (1) flow each voxel $Y_k(t)$ to time $t$ and compute the residual; (2) distribute the grey value in the residual to neighbouring voxels with trilinear-interpolation weights; (3) accumulate grey values over every observed image.

**Sparsity on the initial momenta** uses the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA; Beck 09), previously used in the context of atlas building (Durrleman 12, 13): use the gradient of the criterion without the L1 penalty, then threshold momentum vectors with small magnitude, selecting a small subset of momenta which best describe the dynamics of image evolution.

**Experiments.**
- Synthetic evolution (2D), generated by shooting a baseline image with 79,804 predefined momenta over 5 time points; the impact of the sparsity parameter on model estimation is studied, reducing from 79,804 to 67 momenta.
- Pediatric brain development (2D): T1W images of the same child over time; models estimated backwards in time with varying sparsity, from 45,435 to 47 momenta.
- Brain atrophy in Alzheimer's disease (3D): T1W images of the same patient at 70.75, 71.38, 71.78 and 72.79 years; six years of predicted brain atrophy with 35,937 momenta, a 98% decrease in the number of parameters.

**Conclusions.** The geodesic image regression framework decouples deformation parameters from the image representation, and the L1 penalty selects an optimal subset of initial momenta; the number of parameters is reduced at only minimal cost in terms of matching the target data. Future work: kernels at multiple scales (Sommer 11); other image-matching metrics such as LCC (Avants 07; Lorenzi 13); combination with a framework for longitudinal analysis.

Acknowledgments: this work was supported by NIH (NINDS) 1 U01 NS082086-01 (4D shape HD), NIH (NICHD) RO1 HD055741 (ACE, project IBIS), and NIH (NIBIB) 2U54 EB005149 (NA-MIC).
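
The sparsity step (threshold momentum vectors of small magnitude after a gradient step without the L1 term) amounts to vectorial soft-thresholding. The sketch below illustrates that operation in isolation; it is a toy under stated assumptions, not the authors' implementation, and the penalty weight `gamma` is hypothetical.

```python
# Minimal sketch: vectorial soft-thresholding of momentum vectors, the
# shrinkage step of a FISTA-type iteration with a group-L1 penalty.
import numpy as np

def soft_threshold_momenta(alpha, gamma):
    """alpha: (N, d) array of momentum vectors; gamma: L1 penalty weight."""
    norms = np.linalg.norm(alpha, axis=1, keepdims=True)
    scale = np.maximum(1.0 - gamma / np.maximum(norms, 1e-12), 0.0)
    return alpha * scale            # vectors with ||alpha_i|| <= gamma vanish

rng = np.random.default_rng(0)
alpha = rng.standard_normal((1000, 3)) * rng.random((1000, 1)) ** 4
sparse = soft_threshold_momenta(alpha, gamma=0.2)
n_active = int((np.linalg.norm(sparse, axis=1) > 0).sum())
print(f"{n_active} of {len(alpha)} momenta remain active")
```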


### Keynote speech 2 (Hirohiko Shima)


**Geometry of Hessian Structures** (slides, 29 August 2013)
Hirohiko Shima, Yamaguchi University (h-shima@c-able.ne.jp)

**Outline.** 1. Hessian structures; 2. Hessian structures and Kählerian structures; 3. Dual Hessian structures; 4. Hessian curvature tensor; 5. Regular convex cones; 6. Hessian structures and affine differential geometry; 7. Hessian structures and information geometry; 8. Invariant Hessian structures.

**Preface.** "In 1964 Prof. Koszul was sent to Japan by the French government and gave lectures at Osaka University. I was a student in those days and attended the lectures together with the late Professors Matsushima and Murakami. The topic of the lectures was a theory of flat manifolds with flat connection $D$ and closed 1-form $\alpha$ such that $D\alpha$ is positive definite. $\alpha$ being a closed 1-form, it is locally expressed as $\alpha = d\varphi$, and so $D\alpha = Dd\varphi$ is just a Hessian metric from our viewpoint. This is the ultimate origin of the notion of Hessian structures and the starting point of my research."

**1. Hessian structures.** Definition (Hessian metric): let $M$ be a manifold with flat connection $D$. A Riemannian metric $g$ on $M$ is a Hessian metric if it can be locally expressed as
$$g = Dd\varphi, \qquad g_{ij} = \frac{\partial^2 \varphi}{\partial x^i\, \partial x^j},$$
where $\{x^1, \ldots, x^n\}$ is an affine coordinate system with respect to $D$. Then $(D, g)$ is a Hessian structure on $M$, $(M, D, g)$ a Hessian manifold, and $\varphi$ a potential of $(D, g)$.

Definition (difference tensor $\gamma$): let $\gamma = \nabla - D$, i.e. $\gamma_X Y = \nabla_X Y - D_X Y$, where $\nabla$ is the Levi-Civita connection of $g$; in affine coordinates the components $\gamma^i_{jk}$ coincide with the Christoffel symbols $\Gamma^i_{jk}$ of $g$.

Proposition (characterizations of Hessian metrics): for a flat manifold $(M, D)$ with Riemannian metric $g$, the following are equivalent: (1) $g$ is a Hessian metric; (2) $(D_X g)(Y, Z) = (D_Y g)(X, Z)$, the Codazzi equation, i.e. the covariant tensor $Dg$ is symmetric; (3) $g(\gamma_X Y, Z) = g(Y, \gamma_X Z)$.

**2. Hessian structures and Kählerian structures.** Definition (Kählerian metric): a Hermitian metric $g = \sum_{i,j} g_{i\bar\jmath}\, dz^i\, d\bar z^j$ on a complex manifold is Kählerian if $g$ is a complex Hessian, $g_{i\bar\jmath} = \partial^2\psi / \partial z^i \partial\bar z^j$, for a holomorphic coordinate system $\{z^1, \ldots, z^n\}$. Cheng and Yau called Hessian metrics affine Kählerian metrics. Proposition: the tangent bundle $TM$ of a Hessian manifold $(M, D, g)$ is a complex manifold with Kählerian metric $g^T = \sum_{i,j=1}^n g_{ij}\, dz^i\, d\bar z^j$, in complex coordinates $z^i = x^i + \sqrt{-1}\,y^i$ combining the affine base coordinates $x^i$ with the fibre coordinates $y^i$.

Example (tangent bundle of a paraboloid):
$$\Omega = \Big\{x \in \mathbb{R}^n \;\Big|\; x^n - \tfrac{1}{2}\sum_{i=1}^{n-1}(x^i)^2 > 0\Big\}, \qquad \varphi = -\log\Big(x^n - \tfrac{1}{2}\sum_{i=1}^{n-1}(x^i)^2\Big);$$
then $g = Dd\varphi$ is a Hessian metric on $\Omega$, and $T\Omega \cong \Omega + \sqrt{-1}\,\mathbb{R}^n \subset \mathbb{C}^n$ is the tube domain over $\Omega$.
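
The defining computation $g_{ij} = \partial^2\varphi / \partial x^i \partial x^j$ is easy to carry out symbolically. The sketch below does so for the paraboloid potential of the example, taking $n = 2$ for readability; the use of sympy is an illustrative choice.

```python
# Minimal sketch: the components of a Hessian metric g = D d(phi),
# computed symbolically for the paraboloid potential (n = 2).
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
phi = -sp.log(x2 - sp.Rational(1, 2) * x1**2)   # potential on the domain

coords = (x1, x2)
g = sp.Matrix(2, 2, lambda i, j: sp.diff(phi, coords[i], coords[j]))
# g is symmetric, and positive definite where x2 - x1^2/2 > 0.
print(sp.simplify(g))
```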

### ORAL SESSION 3 Differential Geometry in Signal Processing (Michel Berthier)


**K-Centroids-Based Supervised Classification of Texture Images using the SIRV Modeling** (slides, 28-30 August 2013)
Aurélien Schutz, Lionel Bombrun, Yannick Berthoumieu — IMS Laboratory, CNRS UMR 5218, Groupe Signal

**Context.** Database classification (musical genres, image databases, textured-image databases, video databases). Two ingredients are combined: information geometry, through class centroids $\bar\theta_i$ [Choy2007; Fisher1925; Burbea1982; Pennec1999; Banerjee2005; Amari2007; Nielsen2009], and a Bayesian framework of classification with an intrinsic prior $p(\theta \mid H_i)$ [Bayes1763; Whittaker1915; Robert1996; Bernardo2003]. The prior must be capable of handling intra-class diversity.

**Bayesian decision.** Data space $X$ with $x \sim P$; parameter space $\Theta$, a Riemannian manifold with Fisher information matrix $G$; $N_c$ classes, decision space $\mathcal{D} = \{H_i\}_{i=1}^{N_c}$; prerequisites are the likelihood $p(x \mid \theta, H_i)$, the prior $p(\theta \mid H_i)$, and the 0-1 loss. The decision rule on $X$,
$$\hat H_i = \arg\min_{H_j \in \mathcal{D}} \; -\log \int_{\Theta_j} p(x \mid \theta, H_j)\, p(\theta \mid H_j)\, d\theta,$$
has high computational complexity; a decision rule on $\Theta$ minimizing the conditional risk is used instead (Duda, Bayesian decision theory).

**Intrinsic prior based on the Jeffrey divergence.** Following Pennec ("Probabilities and statistics on Riemannian manifolds: basic tools for geometric measurements," NSIP 1999), the prior is a Gaussian distribution on the manifold $\Theta$. Proposition: with $\lambda_i = (\bar\theta_i, \sigma_i^2)$,
$$p(\theta \mid \lambda_i, H_i) \;\propto\; \frac{|G(\bar\theta_i)|^{1/2}}{(\sigma_i\sqrt{2\pi})^{d}} \exp\!\Big(-\frac{1}{2\sigma_i^2}\, J\big(p(\cdot \mid \theta),\, p(\cdot \mid \bar\theta_i)\big)\Big),$$
where the Jeffrey divergence is
$$J\big(p(\cdot \mid \theta), p(\cdot \mid \bar\theta_i)\big) = \int_X \big(p(x \mid \theta) - p(x \mid \bar\theta_i)\big)\,\log\frac{p(x \mid \theta)}{p(x \mid \bar\theta_i)}\, dx$$
(Fisher, Proc. Cambridge Phil. Soc. 22, 1925, 700-725; Burbea & Rao, J. Multivariate Analysis 1982, 575-596).

**Optimal decision on $\Theta$.** Using empirical Bayes (Kass & Steffey 1989) and a fully exponential Laplace approximation (Miyata, JASA 2004), the decision reduces to
$$\hat\lambda_i = \arg\min_{\lambda_j} \;\; \frac{d}{2}\log\!\big(2\sigma_j^2 + 1\big) + \frac{1}{2\sigma_j^2}\, J\big(p(\cdot \mid \hat\theta(x)),\, p(\cdot \mid \bar\theta_j)\big),$$
where $\hat\theta$ can be the maximum-likelihood estimator for $p(x \mid \theta, H_i)$.

**Textured images.** A space/scale decomposition (real/complex wavelets, Gabor, steerable filters, bandelets, grouplets, dual-tree; Mallat IEEE PAMI 1989; Do & Vetterli IEEE IP 2002; Choy & Tong JMIV 2007) provides subband coefficients, modeled as a Spherically Invariant Random Vector (SIRV): $x = \sqrt{\tau}\, g$, with $g$ a zero-mean multivariate Gaussian with covariance $\Sigma$ and $\tau$ a Weibull multiplier with parameter $a$. With $\theta = (\Sigma, a)$, the joint density factorizes, $p(y \mid \theta) = p_G(g \mid \Sigma)\, p_w(\tau \mid a)$, and the Jeffrey divergence separates:
$$J\big(p(\cdot \mid \theta), p(\cdot \mid \theta')\big) = J\big(p_G(\cdot \mid \Sigma), p_G(\cdot \mid \Sigma')\big) + J\big(p_w(\cdot \mid a), p_w(\cdot \mid a')\big)$$
(Bombrun, Lasmar, Berthoumieu & Verdoolaege, 2011).

**Centroid computation.** For exponential families (centred multivariate Gaussians), the right- and left-sided Bregman centroids are
$$\bar\Sigma_{R,i} = \Big(\frac{1}{N_i}\sum_{n=1}^{N_i}\Sigma_n^{-1}\Big)^{-1}, \qquad \bar\Sigma_{L,i} = \frac{1}{N_i}\sum_{n=1}^{N_i}\Sigma_n$$
(Banerjee et al., Clustering with Bregman divergences, 2005; Nielsen & Nock, Sided and Symmetrized Bregman Centroids, 2009); the Weibull centroid is found by a steepest-descent/root-finding scheme (Dekker 1969; Brent 1971). Proposition: estimate each centroid separately, interpolating $\bar\Sigma_i = (1-\epsilon_i)\bar\Sigma_{R,i} + \epsilon_i\bar\Sigma_{L,i}$ and taking $\arg\min_{a} \frac{1}{N_i}\sum_n J(p_w(\cdot \mid a_n), p_w(\cdot \mid a))$ for the Weibull parameter.

**K-CB: multiple centroids per class.** A single centroid cannot capture high intra-class diversity (cf. Varma & Zisserman, 2005), so each class $i$ receives $K$ centroids $(\bar\theta_{i,k})_{k=1}^K$ through a mixture prior with binary weights $w_k$. The K-CB algorithm is a K-means (hard C-means) run within each class: (1) assign each parameter vector $\theta$ to the nearest centroid in Jeffrey divergence; (2) update each centroid $\bar\theta_{i,k}$ as the minimizer of the average Jeffrey divergence over its cluster.

**Results** (SIRV-Weibull model, Jeffrey divergence; average kappa index):

| Spatial neigh. | Database | K | 1-NN ($N_{Tr} = K$) | 1-CB [1] ($N_{Tr} = N_{Sa}/2$) | K-CB ($N_{Tr} = N_{Sa}/2$) |
|---|---|---|---|---|---|
| 3 × 3 | VisTex | 3 | 83.7 % ± 2.0 | 90.4 % ± 1.3 | 96.8 % ± 1.2 |
| 3 × 3 | Brodatz | 10 | 50.6 % ± 2.6 | 79.9 % ± 1.5 | 96.2 % ± 1.2 |
| 1 × 1 | VisTex | 3 | 78.7 % ± 2.3 | 72.7 % ± 2.0 | 88.9 % ± 1.7 |
| 1 × 1 | Brodatz | 10 | 65.8 % ± 2.7 | 70 % ± 1 | 97 % ± 2 |

[1] Choy, S.K., Tong, C.S.: Supervised texture classification using characteristic generalized Gaussian density. Journal of Mathematical Imaging and Vision 29 (Aug. 2007) 35-47.

**Conclusions.** Bayesian classification theory is combined with information geometry; a concentrated Gaussian distribution serves as the prior $p(\theta \mid H_i)$, intrinsic when $p(\theta)$ or the loss depends on the Fisher information matrix $G(\theta)$, with the decision rule taken on $\Theta$; K-centroids-based classification handles high intra-class diversity (one class, $K$ centroids, K-means on each class), with performance close to 1-NN at much lower computational complexity. Perspectives: K-CB with the Possibilistic Fuzzy C-Means (PFCM) algorithm; adapting the number of centroids per class. (Databases: Vision Texture database VisTex; Brodatz, P., Textures: A Photographic Album for Artists and Designers, 1966.)
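
The K-CB loop itself is an ordinary K-means with a divergence in place of the Euclidean distance. The sketch below illustrates the assignment and update steps using univariate Gaussians and their Jeffrey divergence as a stand-in for the paper's SIRV model; the centroid update by a generic Nelder-Mead minimizer is an illustrative shortcut, not the paper's closed-form/steepest-descent scheme.

```python
# Minimal sketch of a K-centroids loop under a Jeffrey-type divergence
# (stand-in model: univariate Gaussians parameterized by (mean, std)).
import numpy as np
from scipy.optimize import minimize

def jeffrey_gauss(m1, s1, m2, s2):
    """Jeffrey (symmetrized KL) divergence between two 1-D Gaussians."""
    kl = lambda ma, sa, mb, sb: (np.log(sb / sa)
                                 + (sa**2 + (ma - mb)**2) / (2 * sb**2) - 0.5)
    return kl(m1, s1, m2, s2) + kl(m2, s2, m1, s1)

def kcb(params, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    cents = params[rng.choice(len(params), K, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid in Jeffrey divergence.
        d = np.array([[jeffrey_gauss(*p, *c) for c in cents] for p in params])
        labels = d.argmin(axis=1)
        # Update step: each centroid minimizes the summed divergence.
        for k in range(K):
            members = params[labels == k]
            if len(members):
                cost = lambda c: sum(jeffrey_gauss(*p, c[0], abs(c[1]) + 1e-6)
                                     for p in members)
                res = minimize(cost, cents[k], method="Nelder-Mead")
                cents[k] = res.x[0], abs(res.x[1]) + 1e-6
    return cents, labels

params = np.array([[0.0, 1.0], [0.1, 1.1], [5.0, 0.5], [5.2, 0.6]])
print(kcb(params, K=2)[0])   # one centroid per natural cluster
```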

### Keynote speech 3 (Giovanni Pistone)


GSI2013 - Geometric Science of Information, Paris, 28-30 August 2013

**Dimensionality Reduction for Classification of Stochastic Fibre Radiographs** (poster)
C.T.J. Dodson¹ and W.W. Sampson², School of Mathematics¹ and School of Materials², University of Manchester, UK (ctdodson@manchester.ac.uk)

**Abstract.** Dimensionality reduction helps to identify small numbers of essential features of stochastic fibre networks for the classification of image pixel-density datasets from experimental radiographic measurements of commercial samples and from simulations. Typical commercial macro-fibre networks use finite-length fibres suspended in a fluid from which they are continuously deposited onto a moving bed to make a continuous web; the fibres can cluster to differing degrees, depending primarily on fluid turbulence, fibre dimensions and flexibility. Here we use the information geometry of trivariate Gaussian spatial distributions of pixel density among first and second neighbours to reveal features related to the sizes and densities of fibre clusters.

**Introduction.** Much analytic work has been done on modelling the statistical geometry of stochastic fibre networks and their behaviour with regard to strength and fluid ingress or transfer [1, 5, 7]. Using complete sampling by square cells, their areal density distribution is typically well represented by a log-gamma or a (truncated) Gaussian distribution whose variance decreases monotonically with increasing cell size; the rate of decay depends on fibre and fibre-cluster dimensions. Clustering of fibres is well approximated by Poisson processes of Poisson clusters of differing density and size. A Poisson fibre network is a standard reference structure for any given size distribution of fibres; its statistical geometry is well understood for finite and infinite fibres.

Figure 1: electron micrographs of four stochastic fibrous materials — nonwoven carbon-fibre mat; glass-fibre filter; electrospun nylon nanofibrous network (courtesy S.J. Eichhorn and D.J. Scurr); paper using wood cellulose fibres, typically flat ribbon-like, 1-2 mm long and 0.02-0.03 mm wide.
Figure 2: areal density radiographs of three paper networks made from natural wood cellulose fibres (of order 1 mm in length) with constant mean density but different fibre distributions; each image is a square region of side 5 cm, darker regions corresponding to higher coverage. The left image is similar to that expected for a Poisson process of the same fibres, so typical real samples exhibit clustering of fibres.

**Spatial statistics.** We use the information geometry of trivariate Gaussian spatial distributions of pixel density, with covariances among first and second neighbours, to reveal features related to the sizes and densities of fibre clusters, which could arise in one, two or three dimensions (a grey-level barcode of the ordered sequence of the 20 amino acids in a yeast genome, Saccharomyces cerevisiae, illustrates a 1-dimensional stochastic texture). For the isotropic spatial processes considered here, the variables are means over shells of first and second neighbours, which share the population mean with the central pixel; for anisotropic networks the neighbour groups would be split into more, orthogonal, variables to pick up the spatial anisotropy in the available directions.

Figure 3: trivariate distribution of areal density values for a typical newsprint sample — source radiograph; histogram of pixel densities $\tilde\beta_i$, first-neighbour averages $\tilde\beta_{1,i}$ and second-neighbour averages $\tilde\beta_{2,i}$; 3D scatter plot of $(\tilde\beta_i, \tilde\beta_{1,i}, \tilde\beta_{2,i})$.

**Information geodesic distances between multivariate Gaussians.** What is known analytically [2] is the geodesic distance between two $n$-variate Gaussians $f_A$, $f_B$ in two particular cases.

(1) Common covariance $\Sigma_A = \Sigma_B = \Sigma$, different means:
$$D_\mu(f_A, f_B) = \sqrt{(\mu_A - \mu_B)^T\, \Sigma^{-1}\, (\mu_A - \mu_B)}. \tag{1}$$

(2) Common mean $\mu_A = \mu_B = \mu$, different covariances:
$$D_\Sigma(f_A, f_B) = \sqrt{\tfrac{1}{2}\sum_{j=1}^{n}\log^2\lambda_j}, \qquad \{\lambda_j\} = \mathrm{Eig}\big(\Sigma_A^{-1/2}\,\Sigma_B\,\Sigma_A^{-1/2}\big). \tag{2}$$

The general case is not known analytically, but for studying the stochastic textures arising from areal density arrays of stochastic fibre networks, a satisfactorily discriminating approximation is
$$D(f_A, f_B) \approx D_\mu(f_A, f_B) + D_\Sigma(f_A, f_B).$$
From the form of $D_\Sigma$ in (2), an approximately monotonic relationship arises with the more easily computed symmetrized log-trace function
$$\Delta_\Sigma(f_A, f_B) = \log\Big(\frac{1}{2n}\Big[\mathrm{Tr}\big(\Sigma_A^{-1/2}\Sigma_B\Sigma_A^{-1/2}\big) + \mathrm{Tr}\big(\Sigma_B^{-1/2}\Sigma_A\Sigma_B^{-1/2}\big)\Big]\Big). \tag{3}$$
This is illustrated by the plot of $D_\Sigma$ from (2) against $\Delta_\Sigma$ from (3) (Figure 4) for 185 trivariate Gaussian covariance matrices. For comparing relative proximity, this is a better measure near zero than the symmetrized Kullback-Leibler distance [6] in the multivariate Gaussian cases tested so far, and may be quicker for handling large batch processes.

**Dimensionality reduction for data sets.**
1. Obtain mutual information distances $D(i, j)$ among the members of the dataset of textures $X_1, \ldots, X_N$, each with 250×250 pixel-density values.
2. The $N \times N$ array $D(i, j)$ is symmetric with zero diagonal; centralize it by subtracting row and column means and then adding back the grand mean, giving $CD(i, j)$.
3. The centralized matrix $CD(i, j)$ is again symmetric; compute its $N$ (necessarily real) eigenvalues $ECD(i)$ and the corresponding $N$-dimensional eigenvectors $VCD(i)$.
4. Form the $3 \times 3$ diagonal matrix $A$ of the three eigenvalues of largest absolute magnitude and the $3 \times N$ matrix $B$ of corresponding eigenvectors; the product $A \cdot B$ is $3 \times N$, and its transpose is the $N \times 3$ matrix $T$ of coordinates $(x_i, y_i, z_i)$ embedding the $N$ samples in 3-space.

Figures 5-7: $D_\Sigma$ as a cubic-smoothed surface and contour plot of trivariate Gaussian information distances among 16 simulated datasets of 1 mm pixel-density arrays for networks of 1 mm fibres — density differences between a Poisson network and clustered networks of the same mean density (Fig. 5); networks of the same mean density with different clustering, where subgroups in the embedding show the numbers of fibres per cluster and the cluster densities, the solitary point being an unclustered Poisson network (Fig. 6); and Poisson networks of different mean density, where the embedding shows the effect of mean network density (Fig. 7).
Figure 8: embedding of 182 trivariate Gaussian distributions for samples from a dataset of radiographs of commercial papers; the embedding separates different forming methods into subgroups.

References:
[1] K. Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics, Springer-Verlag, New York, Berlin, 2008; Chapter 9 with W.W. Sampson, Stochastic Fibre Networks, pp. 161-194.
[2] C. Atkinson and A.F.S. Mitchell. Rao's distance measure. Sankhya: Indian Journal of Statistics 48, A, 3 (1981) 345-365.
[3] K.M. Carter, R. Raich and A.O. Hero. Learning on statistical manifolds for clustering and visualization. In 45th Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, 2007. https://wiki.eecs.umich.edu/global/data/hero/images/c/c6/Kmcarter-learnstatman.pdf
[4] K.M. Carter. Dimensionality reduction on statistical manifolds. PhD thesis, University of Michigan, 2009. http://tbayes.eecs.umich.edu/kmcarter/thesis
[5] M. Deng and C.T.J. Dodson. Paper: An Engineered Stochastic Structure. Tappi Press, Atlanta, 1994.
[6] F. Nielsen, V. Garcia and R. Nock. Simplifying Gaussian mixture models via entropic quantization. In Proc. 17th European Signal Processing Conference, Glasgow, Scotland, 24-28 August 2009, pp. 2012-2016.
[7] W.W. Sampson. Modelling Stochastic Fibre Materials with Mathematica. Springer-Verlag, New York, Berlin, 2009.
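
Steps 1-4 of the dimensionality reduction are essentially a classical multidimensional-scaling embedding. A minimal sketch following the poster's centring convention (the distance matrix itself is centred):

```python
# Minimal sketch of the four-step embedding: centre the pairwise
# information-distance matrix, eigendecompose, keep three coordinates.
import numpy as np

def embed3(D):
    """D: (N, N) symmetric matrix of pairwise information distances."""
    # Step 2: subtract row and column means, add back the grand mean.
    CD = D - D.mean(axis=0) - D.mean(axis=1, keepdims=True) + D.mean()
    # Step 3: eigendecomposition of the centred symmetric matrix.
    vals, vecs = np.linalg.eigh(CD)
    # Step 4: keep the three eigenvalues of largest absolute magnitude.
    idx = np.argsort(-np.abs(vals))[:3]
    return vecs[:, idx] * vals[idx]      # (N, 3) embedding coordinates

# Toy usage: pairwise distances between random points in 3-space.
rng = np.random.default_rng(0)
P = rng.standard_normal((10, 3))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
print(embed3(D).shape)   # (10, 3): one 3-D coordinate per sample
```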


Nonparametric Information Geometry http://www.giannidiorestino.it/GSI2013-talk.pdf Giovanni Pistone de Castro Statistics Initiative Moncalieri, Italy August 30, 2013 Abstract The diﬀerential-geometric structure of the set of positive densities on a given measure space has raised the interest of many mathematicians after the discovery by C.R. Rao of the geometric meaning of the Fisher information. Most of the research is focused on parametric statistical models. In series of papers by author and coworkers a particular version of the nonparametric case has been discussed. It consists of a minimalistic structure modeled according the theory of exponential families: given a reference density other densities are represented by the centered log likelihood which is an element of an Orlicz space. This mappings give a system of charts of a Banach manifold. It has been observed that, while the construction is natural, the practical applicability is limited by the technical diﬃculty to deal with such a class of Banach spaces. It has been suggested recently to replace the exponential function with other functions with similar behavior but polynomial growth at inﬁnity in order to obtain more tractable Banach spaces, e.g. Hilbert spaces. We give ﬁrst a review of our theory with special emphasis on the speciﬁc issues of the inﬁnite dimensional setting. In a second part we discuss two speciﬁc topics, diﬀerential equations and the metric connection. The position of this line of research with respect to other approaches is brieﬂy discussed. References in • GP, GSI2013 Proceedings. A few typos corrected in arXiv:1306.0480; • GP, arXiv:1308.5312 • If µ1, µ2 are equivalent measures on the same sample space, a statistical model has two representations L1(x; θ)µ1(dx) = L2(x; θ)µ2(dx). • Fisher’s score is a valid option s(x; θ) = d dθ ln Li (x; θ), i = 1, 2, and Eθ [sθ] = 0. • Each density q equivalent to p is of the form q(x) = ev(x) p(x) Ep [ev ] = exp (v(x) − ln (Ep [ev ])) p(x), where v is a random variable such that Ep [ev ] < +∞. • To avoid borderline cases, we actually require Ep eθv < +∞, θ ∈ I open ⊃ [0, 1]. • Finally, we require Ep [v] = 0. Plan Part I Exponential manifold Part II Vector bundles Part III Deformed exponential Part I Exponential manifold Sets of densities Deﬁnition P1 is the set of real random variables f such that f dµ = 1, P≥ the convex set of probability densities, P> the convex set of strictly positive probability densities: P> ⊂ P≥ ⊂ P1 • We deﬁne the (diﬀerential) geometry of these spaces in a way which is meant to be a non-parametric generalization of Information Geometry • We try to avoid the use of explicit parameterization of the statistical models and therefore we use a parameter free presentation of diﬀerential geometry. • We construct a manifold modeled on an Orlicz space. • We look for applications to applications intrisically non parametric, i.e. Statistical Physics, Information Theory, Optimization, Filtering. Banach manifold Deﬁnition 1. Let P be a set, E ⊂ P a subset, B a Banach space. A 1-to-1 mapping s : E → B is a chart if the image s(E) = S ⊂ B is open. 2. Two charts s1 : E1 → B1, s2 : E2 → B2, are both deﬁned on E1 ∩ E2 and are compatible if s1(E1 ∩ E2) is an open subset of B1 and the change of chart mapping s2 ◦ s−1 1 : s1(E1 ∩ E2) s−1 1 // E1 ∩ E2 s2 // s2(E1 ∩ E2) is smooth. 3. An atlas is a set of compatible charts. • Condition 2 implies that the model spaces B1 and B2 are isomorphic. 
- In our case $P = \mathcal P_>$; the atlas has a chart $s_p$ for each $p \in \mathcal P_>$ such that $s_p(p) = 0$, and two domains $\mathcal E_{p_1}$, $\mathcal E_{p_2}$ are either equal or disjoint.

*Charts on $\mathcal P_>$; the model space is an Orlicz $\Phi$-space.* If $\Phi(y) = \cosh y - 1$, the Orlicz $\Phi$-space $L^\Phi(p)$ is the vector space of all random variables $u$ such that $E_p[\Phi(\alpha u)]$ is finite for some $\alpha > 0$.

Properties of the $\Phi$-space:
1. $u \in L^\Phi(p)$ if, and only if, the moment generating function $\alpha \mapsto E_p[e^{\alpha u}]$ is finite in a neighborhood of 0.
2. The set $S_{\le 1} = \{u \in L^\Phi(p) : E_p[\Phi(u)] \le 1\}$ is the closed unit ball of a Banach space with norm $\|u\|_p = \inf\{\rho > 0 : E_p[\Phi(u/\rho)] \le 1\}$.
3. $\|u\|_p = 1$ if either $E_p[\Phi(u)] = 1$, or $E_p[\Phi(u)] < 1$ and $E_p[\Phi(u/\rho)] = \infty$ for $\rho < 1$. If $\|u\|_p > 1$ then $\|u\|_p \le E_p[\Phi(u)]$; in particular, $\lim_{\|u\|_p \to \infty} E_p[\Phi(u)] = \infty$.

*Example: boolean state space.* In the case of a finite state space, the moment generating function is finite everywhere, but its computation can be challenging. Boolean case: $\Omega = \{+1,-1\}^n$, uniform density $p(x) = 2^{-n}$, $x \in \Omega$. A generic real function on $\Omega$ has the form $u(x) = \sum_{\alpha \in L} \hat u(\alpha)\, x^\alpha$, with $L = \{0,1\}^n$, $x^\alpha = \prod_{i=1}^n x_i^{\alpha_i}$ and $\hat u(\alpha) = 2^{-n}\sum_{x\in\Omega} u(x)\, x^\alpha$. The moment generating function of $u$ under the uniform density $p$ is
$$E_p[e^{tu}] = \sum_{B \in \mathcal B(\hat u)} \prod_{\alpha \in B^c} \cosh(t\,\hat u(\alpha)) \prod_{\alpha \in B} \sinh(t\,\hat u(\alpha)),$$
where $\mathcal B(\hat u)$ consists of those $B \subset \operatorname{Supp}\hat u$ such that $\sum_{\alpha\in B}\alpha = 0 \bmod 2$. Likewise,
$$E_p[\Phi(tu)] = \sum_{B \in \mathcal B_0(\hat u)} \prod_{\alpha \in B^c} \cosh(t\,\hat u(\alpha)) \prod_{\alpha \in B} \sinh(t\,\hat u(\alpha)) - 1,$$
where $\mathcal B_0(\hat u)$ consists of those $B \subset \operatorname{Supp}\hat u$ such that $\sum_{\alpha\in B}\alpha = 0 \bmod 2$ and $\sum_{\alpha\in\operatorname{Supp}\hat u}\alpha = 0$.

*Example: the sphere is not smooth in general.* Take $p(x) \propto (a+x)^{-3/2} e^{-x}$, $x, a > 0$. For the random variable $u(x) = x$, the function
$$E_p[\Phi(\alpha u)] = \frac{1}{e^a\,\Gamma(-\tfrac12, a)} \int_0^\infty (a+x)^{-3/2}\, \frac{e^{-(1-\alpha)x} + e^{-(1+\alpha)x}}{2}\, dx - 1$$
is convex and lower semi-continuous on $\alpha \in \mathbb R$, finite for $\alpha \in [-1,1]$, infinite otherwise, hence not smooth. [Figure: plot of $\alpha \mapsto E_p[\Phi(\alpha u)]$.]

*Isomorphism of $L^\Phi$ spaces.* Theorem: $L^\Phi(p) = L^\Phi(q)$ as Banach spaces, with equivalent norms, if $\int p^{1-\theta} q^{\theta}\, d\mu$ is finite on an open neighborhood $I$ of $[0,1]$. This is an equivalence relation $p \sim q$, and we denote by $\mathcal E(p)$ the class containing $p$. Proof: assume $u \in L^\Phi(p)$ and consider the convex function $C : (s,\theta) \mapsto \int e^{su}\, p^{1-\theta} q^\theta\, d\mu$. The restriction $s \mapsto C(s,0) = \int e^{su}\, p\, d\mu$ is finite on an open neighborhood $J_p$ of 0; the restriction $\theta \mapsto C(0,\theta) = \int p^{1-\theta} q^\theta\, d\mu$ is finite on the open set $I \supset [0,1]$. Hence there exists an open interval $J_q \ni 0$ on which $s \mapsto C(s,1) = \int e^{su}\, q\, d\mu$ is finite.

*e-charts.* Definition (e-chart): for each $p \in \mathcal P_>$, consider the chart $s_p : \mathcal E(p) \to L^\Phi_0(p)$,
$$q \mapsto s_p(q) = \log\frac{q}{p} + D(p\,\|\,q) = \log\frac{q}{p} - E_p\Big[\log\frac{q}{p}\Big].$$
For $u \in L^\Phi_0(p)$ let $K_p(u) = \ln E_p[e^u]$ be the cumulant generating function of $u$ and let $\mathcal S_p$ be the interior of its proper domain. Define $e_p : \mathcal S_p \ni u \mapsto e^{u - K_p(u)}\cdot p$. Then $e_p \circ s_p$ is the identity on $\mathcal E(p)$ and $s_p \circ e_p$ is the identity on $\mathcal S_p$.

Theorem (exponential manifold): $\{s_p : \mathcal E(p) \mid p \in \mathcal P_>\}$ is an affine atlas on $\mathcal P_>$. (A numerical illustration on a finite state space is given at the end of this summary.)

*Cumulant functional.*
- The divergence $q \mapsto D(p\,\|\,q)$ is represented in the chart centered at $p$ by $K_p(u) = \log E_p[e^u]$, where $q = e^{u-K_p(u)}\cdot p$, $u \in B_p = L^\Phi_0(p)$.
- $K_p : B_p \to \mathbb R_{\ge 0} \cup \{+\infty\}$ is convex and its proper domain $\operatorname{Dom}(K_p)$ contains the open unit ball of $B_p$.
- $K_p$ is infinitely Gâteaux-differentiable on the interior $\mathcal S_p$ of its proper domain and analytic on the unit ball of $B_p$.
- For all $v, v_1, v_2, v_3 \in B_p$ the first derivatives are $dK_p(u)v = E_q[v]$, $d^2K_p(u)(v_1,v_2) = \operatorname{Cov}_q(v_1,v_2)$, $d^3K_p(u)(v_1,v_2,v_3) = \operatorname{Cov}_q(v_1,v_2,v_3)$.

*Change of coordinate.* The following statements are equivalent:
1. $q \in \mathcal E(p)$;
2. $p \sim q$;
3. $\mathcal E(p) = \mathcal E(q)$;
4. $\ln\frac{q}{p} \in L^\Phi(p) \cap L^\Phi(q)$.
Moreover:
1. If $p, q \in \mathcal E(p) = \mathcal E(q)$, the change of coordinates
$$s_q \circ e_p(u) = u - E_q[u] + \ln\frac{p}{q} - E_q\Big[\ln\frac{p}{q}\Big]$$
is the restriction of an affine continuous mapping.
2. $u \mapsto u - E_q[u]$ is an affine transport from $B_p = L^\Phi_0(p)$ onto $B_q = L^\Phi_0(q)$.

*Summary.* If $p \sim q$, then:
- $\mathcal E(p) = \mathcal E(q)$ and $L^\Phi(p) = L^\Phi(q)$;
- $B_p = L^\Phi_0(p)$ and $B_q = L^\Phi_0(q)$;
- $s_q \circ s_p^{-1} : \mathcal S_p \to \mathcal S_q$ is affine, $s_q \circ s_p^{-1}(u) = u - E_q[u] + \ln\frac{p}{q} - E_q\big[\ln\frac{p}{q}\big]$;
- the tangent map is $d(s_q \circ s_p^{-1})(v) = v - E_q[v]$, which does not depend on $p$.

*Duality.* Young pair ($N$-function): $\phi^{-1} = \phi_*$, $\Phi(x) = \int_0^{|x|} \phi(u)\,du$, $\Phi_*(y) = \int_0^{|y|} \phi_*(v)\,dv$, and $|xy| \le \Phi(x) + \Phi_*(y)$.

| $\phi_*(u)$ | $\phi(v)$ | $\Phi_*(x)$ | $\Phi(y)$ |
|---|---|---|---|
| $\ln(1+u)$ | $e^v - 1$ | $(1+\lvert x\rvert)\ln(1+\lvert x\rvert) - \lvert x\rvert$ | $e^{\lvert y\rvert} - 1 - \lvert y\rvert$ |
| $\sinh^{-1} u$ | $\sinh v$ | $\lvert x\rvert \sinh^{-1}\lvert x\rvert - \sqrt{1+x^2} + 1$ | $\cosh y - 1$ |

- $L^{\Phi_*}(p) \times L^{\Phi}(p) \ni (v,u) \mapsto \langle u, v\rangle_p = E_p[uv]$;
- $\langle u, v\rangle_p \le 2\,\|u\|_{\Phi_*,p}\,\|v\|_{\Phi,p}$;
- $(L^{\Phi_*}(p))' = L^\Phi(p)$, because $\Phi_*(ax) \le a^2\,\Phi_*(x)$ for $a > 1$ (the $\Delta_2$ condition).

*m-charts.* For each $p \in \mathcal P_>$, consider a second type of chart, defined on $f \in \mathcal P_1$: $\eta_p : f \mapsto \eta_p(f) = \frac{f}{p} - 1$. Definition (mixture manifold): the chart is defined for all $f \in \mathcal P_1$ such that $f/p - 1$ belongs to ${}^*B_p = L^{\Phi_*}_0(p)$. The atlas $(\eta_p : {}^*\mathcal E(p))$, $p \in \mathcal P_>$, defines a manifold on $\mathcal P_1$. If the sample space is not finite, such a map does not define charts on $\mathcal P_>$, nor on $\mathcal P_\ge$.

*Example: $N(\mu,\Sigma)$, $\det\Sigma \ne 0$ (I).* The Gaussian family is
$$\mathcal G = \Big\{ (2\pi)^{-n/2} (\det\Sigma)^{-1/2} \exp\big(-\tfrac12 (x-\mu)^T\Sigma^{-1}(x-\mu)\big) \;\Big|\; \mu\in\mathbb R^n,\ \Sigma \in \operatorname{Sym}^+_n \Big\}.$$
With reference density $f_0$ the standard normal:
$$\ln\frac{f(x)}{f_0(x)} = -\tfrac12\ln(\det\Sigma) - \tfrac12 (x-\mu)^T\Sigma^{-1}(x-\mu) + \tfrac12 x^T x = \tfrac12 x^T(I - \Sigma^{-1})x + \mu^T\Sigma^{-1}x - \tfrac12\mu^T\Sigma^{-1}\mu - \tfrac12\ln(\det\Sigma),$$
$$E_{f_0}\Big[\ln\frac{f}{f_0}\Big] = \tfrac12\big(n - \operatorname{Tr}\Sigma^{-1}\big) - \tfrac12\mu^T\Sigma^{-1}\mu - \tfrac12\ln(\det\Sigma),$$
$$u(x) = \ln\frac{f(x)}{f_0(x)} - E_{f_0}\Big[\ln\frac{f}{f_0}\Big] = \tfrac12 x^T(I-\Sigma^{-1})x + \mu^T\Sigma^{-1}x - \tfrac12\big(n - \operatorname{Tr}\Sigma^{-1}\big),$$
$$K_{f_0}(u) = -\tfrac12\big(n - \operatorname{Tr}\Sigma^{-1}\big) + \tfrac12\mu^T\Sigma^{-1}\mu + \tfrac12\ln(\det\Sigma).$$

*Example: $N(\mu,\Sigma)$, $\det\Sigma \ne 0$ (II).* $\mathcal G$ as a sub-manifold of $\mathcal P_>$:
$$\mathcal G = \big\{ x \mapsto e^{u(x) - K(u)} f_0(x) \;\big|\; u \in \mathcal H_{1,2} \cap \mathcal S_{f_0} \big\},$$
where $\mathcal H_{1,2}$ is the Hermite space of total degree 1 and 2, that is, the vector space generated by the Hermite polynomials $X_1,\dots,X_n$, $(X_1^2-1),\dots,(X_n^2-1)$, $X_1X_2,\dots,X_{n-1}X_n$. If the matrix $S$, with $S_{ii} = \beta_{ii} - \tfrac12$, $S_{ij}$ ...
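Returning to the e-chart construction: on a finite state space everything above is directly computable. The following sketch (plain NumPy; the helper names are mine, not the author's) builds $e_p(u) = e^{u-K_p(u)}\cdot p$ and checks numerically that $dK_p(u)v = E_q[v]$:

```python
import numpy as np

def K_p(u, p):
    """Cumulant generating function K_p(u) = log E_p[e^u] on a finite state space."""
    return np.log(np.sum(p * np.exp(u)))

def e_chart(u, p):
    """Inverse chart e_p(u) = exp(u - K_p(u)) * p, a density in E(p)."""
    return np.exp(u - K_p(u, p)) * p

rng = np.random.default_rng(0)
p = rng.random(8); p /= p.sum()            # reference density p > 0
u = rng.standard_normal(8)
u -= np.sum(p * u)                          # center so that E_p[u] = 0, i.e. u in B_p
q = e_chart(u, p)
assert np.isclose(q.sum(), 1.0)             # q is again a density

# First derivative of K_p at u in direction v equals E_q[v]:
v = rng.standard_normal(8); v -= np.sum(p * v)
eps = 1e-6
numeric = (K_p(u + eps * v, p) - K_p(u - eps * v, p)) / (2 * eps)
print(numeric, np.sum(q * v))               # approximately equal
```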

### ORAL SESSION 4 Relational Metric (Jean-François Marcotorchino)

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

A general framework for comparing heterogeneous binary relations. Julien Ah-Pine (julien.ah-pine@eric.univ-lyon2.fr), University of Lyon, ERIC Lab. GSI 2013, Paris, 28/08/2013.

Outline: 1. Introduction. 2. Kendall's general coefficient Γ. 3. Another view of Kendall's Γ (relational matrices; reinterpreting Kendall's Γ using RM; the weighted indeterminacy deviation principle). 4. Extending Kendall's Γ to heterogeneous BR (heterogeneous BR; a geometrical framework; similarities of order t > 0). 5. A numerical example. 6. Conclusion.

**Introduction.**

*Binary relations (BR).* A binary relation R over a finite set A = {a, ..., i, j, ..., n} of n items is a subset of A × A. If (i, j) ∈ R we say "i is in relation with j for R", denoted iRj. Equivalence relations (ER) are reflexive, symmetric and transitive BR. Order relations (OR) come in different types, depending on the data:
- ties and missing values: preorders (reflexive, transitive BR);
- no ties but missing values: partial orders (reflexive, antisymmetric, transitive BR);
- no ties and no missing values: total (or linear, or complete) orders (reflexive, antisymmetric, transitive and complete BR).

*Equivalence relations and qualitative variables.* ER are related to qualitative (nominal categorical) variables. Example: color of eyes, x = (Brown, Brown, Blue, Blue, Green) on items (a, b, c, d, e). X is the ER "has the same color of eyes as"; it can be represented by a graph and its adjacency matrix (AM), denoted X, with $X_{ij} = 1$ if iXj and $X_{ij} = 0$ otherwise:
$$X = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

*Order relations and quantitative variables.* OR are related to quantitative variables. Example: ranking of items, x = (1, 2, 4, 3, 5) on (a, b, c, d, e). X is the OR "has a lower rank than", and its AM is again such that $X_{ij} = 1$ if iXj and $X_{ij} = 0$ otherwise:
$$X = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

*How to compare the relationships between BR?* We are given two variables of measurements x and y of the same kind (both qualitative or both quantitative). How can we measure the proximity between the BR underlying the two variables? And how do we deal with heterogeneity:
- when ER have different numbers of categories and different distributions, e.g. x = (A, A, B, B, C) and y = (D, D, D, D, E);
- when OR are of different types, e.g. x = (1, 2, 4, 3, 5) and y = (1, 1, 1, 4, 5)?

**Kendall's general coefficient Γ.** In statistics, Kendall [Kendall(1948)] proposed a general correlation coefficient defining a broad family of association measures between x and y:
$$\Gamma(x, y) = \frac{\sum_{i,j} X_{ij}\, Y_{ij}}{\sqrt{\sum_{i,j} X_{ij}^2\; \sum_{i,j} Y_{ij}^2}} \qquad (1)$$
where X and Y are two square matrices derived from x and y (a small numerical illustration is given at the end of this summary).

*Particular cases of Γ* (given in [Vegelius and Janson(1982), Kendall(1948)]). For ER, where $n^x_u$ is the number of items in category u of x and $p^x$ is the number of categories of x:

| Association measure | $X_{ij}$ |
|---|---|
| Tchuprow's T | $\frac{n}{n^x_u} - 1$ if $x_i = x_j$; $-1$ if $x_i \ne x_j$ |
| J-index | $p^x - 1$ if $x_i = x_j$; $-1$ if $x_i \ne x_j$ |

For OR, where for Spearman's $\rho_a$, $x_i$ is the rank of item i:

| Association measure | $X_{ij}$ |
|---|---|
| Kendall's $\tau_a$ | $1$ if $x_i < x_j$; $-1$ if $x_i > x_j$ |
| Spearman's $\rho_a$ | $X_{ij} = x_i - x_j$ |

**Another view of Kendall's Γ.**

*Relational matrices (RM) and some properties.* AM of BR have particular properties; they are more specifically called relational matrices (RM) by Marcotorchino in the Relational Analysis approach [Marcotorchino and Michaud(1979)]. For instance, the relational properties of X can be expressed as linear constraints on X:
- reflexivity: ∀i, $X_{ii} = 1$;
- symmetry: ∀i, j, $X_{ij} - X_{ji} = 0$;
- antisymmetry: ∀i ≠ j, $X_{ij} + X_{ji} \le 1$;
- completeness (totality): ∀i ≠ j, $X_{ij} + X_{ji} \ge 1$;
- transitivity: ∀i, j, k, $X_{ij} + X_{jk} - X_{ik} \le 1$.

*Reinterpreting Kendall's Γ using RM.* There have been several works using RM to reformulate association measures in order to better understand their differences [Marcotorchino(1984-85), Ghashghaie(1990), Najah Idrissi(2000), Ah-Pine and Marcotorchino(2010)]. In this work, we reinterpret Kendall's Γ in terms of RM, which emphasizes the so-called weighted indeterminacy deviation principle:
1. We define the opposite of an ER and of an OR.
2. We introduce Λ, our formulation of Kendall's Γ in terms of RM of BR, RM of opposites of BR, and weighting schemes.
3. We show how Λ yields well-known association measures.
4. We explain the weighted indeterminacy deviation principle.

*Opposite relation of an ER and of an OR.* We introduce the opposite relation $\bar X$ of an ER or an OR X. If x is a categorical variable then $\bar X$ is the complement relation, $\bar X_{ij} = 1 - X_{ij}$ ...
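As a concrete reading of equation (1), the sketch below (illustrative code, not the author's) builds the Kendall-$\tau_a$-style matrices of two rankings and evaluates Γ; swapping in the ER or Spearman encodings from the tables above gives the other special cases:

```python
import numpy as np

def kendall_matrix(x):
    """tau_a encoding: X_ij = 1 if x_i < x_j, -1 if x_i > x_j, 0 on ties/diagonal."""
    x = np.asarray(x)
    return np.sign(x[None, :] - x[:, None])

def gamma(X, Y):
    """Kendall's general coefficient, equation (1)."""
    return (X * Y).sum() / np.sqrt((X ** 2).sum() * (Y ** 2).sum())

x = [1, 2, 4, 3, 5]          # total order from the slides
y = [1, 1, 1, 4, 5]          # preorder with ties
print(gamma(kendall_matrix(x), kendall_matrix(y)))
```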

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

On Prime-valent Symmetric Bicirculants and Cayley Snarks. Klavdija Kutnar (University of Primorska), Paris, 2013. Joint work with Ademir Hujdurović and Dragan Marušič.

*Snarks.* A snark is a connected, bridgeless cubic graph with chromatic index equal to 4; a non-snark is a bridgeless cubic 3-edge-colorable graph. The Petersen graph is a snark. The Blanuša snarks are not vertex-transitive.

*Vertex-transitive and Cayley graphs.* An automorphism of a graph X = (V, E) is an isomorphism of X with itself; thus each automorphism α of X is a permutation of the vertex set V which preserves adjacency. A graph is vertex-transitive if its automorphism group acts transitively on vertices. A vertex-transitive graph is a Cayley graph if its automorphism group has a regular subgroup. Given a group G and a subset S of G \ {1} such that S = S⁻¹, the Cayley graph Cay(G, S) has vertex set G and edges of the form {g, gs} for all g ∈ G and s ∈ S. Cay(G, S) is connected if and only if G = ⟨S⟩.

Example: the Cayley graph Cay(Z₇, {±1, ±2}) on the left-hand side and the Petersen graph on the right-hand side. [Figure]

*Snarks among Cayley graphs?* Are there any other snarks amongst vertex-transitive graphs, in particular Cayley graphs?

Theorem (Nedela, Škoviera, Combin., 2001). If there exists a Cayley snark, then there is a Cayley snark Cay(G, {a, x, x⁻¹}) where x has odd order, a² = 1, and G = ⟨a, x⟩ is either a non-abelian simple group, or G has a unique non-trivial proper normal subgroup H which is either simple non-abelian or the direct product of two isomorphic non-abelian simple groups, with |G : H| = 2.

Theorem (Potočnik, JCTB, 2004). The Petersen graph is the only vertex-transitive snark containing a solvable transitive subgroup of automorphisms.

Hunting for vertex-transitive/Cayley snarks is essentially a special case of the Lovász question regarding Hamiltonian paths/cycles: the existence of a Hamiltonian cycle implies that the graph is 3-edge-colorable, and thus a non-snark. The Hamiltonicity problem is hard; the snark problem is hard too, but should be easier to deal with. Compare: the Coxeter graph is not a snark (easy) vs. the Coxeter graph is not Hamiltonian (harder); Hamiltonian cycles in cubic Cayley graphs (hard) vs. Cayley snarks (still hard, but easier).

*Types of cubic Cayley graphs Cay(G, S).*
- Type 1: S consists of 3 involutions. No snarks; nothing is known about Hamiltonian cycles, except YES when two of the involutions commute (Cherkassov, Sjerve).
- Type 2: S = {a, x, x⁻¹}, where a² = 1 and x is of even order. No snarks; nothing is known about Hamiltonian cycles.
- Type 3: S = {a, x, x⁻¹}, where a² = 1 and x is of odd order. See below.

*Partial results for Type 3 graphs.* A (2, s, t)-generated group is a group G = ⟨a, x | a² = xˢ = (ax)ᵗ = 1, ...⟩ ...
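To make the opening definitions concrete, the following sketch (illustrative; it uses networkx plus a brute-force backtracking edge colorer written here) verifies that the Petersen graph is cubic, bridgeless, and not 3-edge-colorable, i.e. a snark, and builds the circulant Cay(Z₇, {±1, ±2}) from the example:

```python
import networkx as nx

def edge_colorable(G, k):
    """Backtracking test: does G admit a proper k-edge-coloring?"""
    edges = list(G.edges())
    color = {}

    def conflicts(e, c):
        u, v = e
        # two edges may not share a color if they share an endpoint
        return any(col == c and (u in f or v in f) for f, col in color.items())

    def solve(i):
        if i == len(edges):
            return True
        for c in range(k):
            if not conflicts(edges[i], c):
                color[edges[i]] = c
                if solve(i + 1):
                    return True
                del color[edges[i]]
        return False

    return solve(0)

P = nx.petersen_graph()
assert all(d == 3 for _, d in P.degree())   # cubic
assert not nx.has_bridges(P)                # bridgeless
print(edge_colorable(P, 3))                 # False -> chromatic index 4: a snark

C = nx.circulant_graph(7, [1, 2])           # Cay(Z_7, {+-1, +-2}) from the example
print(sorted(C[0]))                         # neighbours of 0: [1, 2, 5, 6]
```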

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

### ORAL SESSION 5 Discrete Metric Spaces (Michel Deza)

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

Counting the number of solutions of KDMDGP instances. Leo Liberti (IBM "T.J. Watson" Research Center, Yorktown Heights, NY, USA; LIX, École Polytechnique, Palaiseau, France), Carlile Lavor, Jorge Alencar, Germano Abud (Dept. of Applied Math., University of Campinas, Brazil).

Contents: 1. Introduction (Distance Geometry Problem; Discretizable Molecular DGP; incongruence; probability 1; partial reflections). 2. The main result (counting incongruent realizations).

*Distance Geometry Problem (DGP).* Given an integer K > 0 and an undirected simple graph G = (V, E) whose edges are weighted by $d : E \to \mathbb R_+$, is there a function $x : V \to \mathbb R^K$ such that $\|x(u) - x(v)\| = d(\{u, v\})$ for all $\{u, v\} \in E$? In other words: find an embedding (or realization) of G in $\mathbb R^K$ such that the Euclidean distances in $\mathbb R^K$ match the given edge weights. The DGP is NP-complete for K = 1 and strongly NP-hard for K > 1; an important case is K = 3.

*Discretizable Molecular DGP (DMDGP).* Given an undirected simple graph G = (V, E) with edge weights $d : E \to \mathbb R_+$, such that there is an order $v_1, v_2, \dots, v_n$ of V satisfying
(a) $\forall i \in \{4, \dots, n\},\ \forall j, k \in \{i-3, \dots, i\}: \{v_j, v_k\} \in E$, and
(b) $\forall i \in \{2, \dots, n-1\}: d(v_{i-1}, v_{i+1}) < d(v_{i-1}, v_i) + d(v_i, v_{i+1})$,
is there an embedding $x : V \to \mathbb R^3$ such that $\|x(u) - x(v)\| = d(\{u, v\})$ for all $\{u, v\} \in E$?

For each vertex $v_i$ (with $i \ge 4$), if we know the realizations of its 3 immediate predecessors, as well as their distances to $v_i$, then with probability 1 there are two possible positions for $v_i$, given by the intersection of spheres such as $S^2(i-3, d_{i-3,i}) \cap S^2(i-2, d_{i-2,i})$ centered at the predecessors. [Figure]

Let $E_D = \{\{v, v-j\} : j \in \{1, \dots, K\}\}$, $E_P = E \setminus E_D$ and $m = |E|$. A discrete search is possible (Branch-and-Prune). Let X be the set of all realizations; our goal is to determine the cardinality of X.
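As a concrete rendering of the feasibility condition $\|x(u) - x(v)\| = d(\{u,v\})$, here is a minimal checker (an illustrative helper, not from the talk):

```python
import numpy as np

def is_realization(x, weighted_edges, tol=1e-9):
    """DGP feasibility: x maps vertices to points in R^K (dict vertex -> array);
    weighted_edges is an iterable of triples (u, v, d_uv)."""
    return all(abs(np.linalg.norm(x[u] - x[v]) - d) <= tol
               for u, v, d in weighted_edges)

# Tiny K = 2 example: a unit right triangle
x = {1: np.array([0.0, 0.0]), 2: np.array([1.0, 0.0]), 3: np.array([0.0, 1.0])}
print(is_realization(x, [(1, 2, 1.0), (1, 3, 1.0), (2, 3, 2 ** 0.5)]))  # True
```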
[Figure: successive animation frames of the standard BP (Branch-and-Prune) search tree.]

*Incongruence.* Two sets in $\mathbb R^K$ are congruent if there is a sequence of translations, rotations and reflections that turns one into the other. X is partially correct in this respect: there is a "four level symmetry". Half of the realizations in X are reflections of the other half, across the plane through the first K vertices.

*Probability 1.* The theory supporting the BP algorithm is based on d satisfying strict simplex inequalities. The intersection of K spheres in $\mathbb R^K$ might have uncountable cardinality, or be a singleton set; these degenerate cases lie on manifolds of Lebesgue measure zero in $\mathbb R^K$. The probability of uniformly sampling d such that it yields a YES KDMDGP instance satisfying the strict simplex inequalities is therefore 1, and we state most of our results "with probability 1".
*Partial reflections.* For $x \in X$ and $v \in V$ with $v > K$, let $R^v_x$ be the reflection across the hyperplane through $x_{v-K}, \dots, x_{v-1}$. The partial reflection operator with respect to x is
$$g_v(x) = \big(x_1, \dots, x_{v-1}, R^v_x(x_v), R^v_x(x_{v+1}), \dots, R^v_x(x_n)\big).$$
[Figure: the action of the reflection $R^v_x$ in $\mathbb R^K$.]

For $v > u > K$ and $x \in X$, we define a product between partial reflections:
$$g_u g_v(x) = g_u(g_v(x)) = g_u\big(x_1, \dots, x_{v-1}, R^v_x(x_v), \dots, R^v_x(x_n)\big) = \big(x_1, \dots, x_{u-1}, R^u_x(x_u), \dots, R^u_x(x_{v-1}), R^u_{g_v(x)}(x_v), \dots, R^u_{g_v(x)}(x_n)\big).$$
Let $\Gamma_D = \{g_v : v > K\}$ and let $G_D = \langle\Gamma_D\rangle$ be the invariant group of the set of realizations $X_D$ (found by BP) of $G_D = (V, E_D)$.

*Assuming $E_P \ne \emptyset$.* For $\{u, w\} \in E_P$ with $u < w$, define $S_{uw} = \{u + K + 1, \dots, w\}$. $G_P$ is the subgroup of $G_D$ generated by $\Gamma_P = \{g_v : v > K \wedge \forall \{u, w\} \in E_P,\ v \notin S_{uw}\}$. [Figure: on the left, the set $X_D$; on the right, the effect of the pruning edge {1, 4} on $X_D$.]

*Counting incongruent realizations.* There is an integer $\ell$ such that $|X| = 2^\ell$ with probability 1. We can easily refine this (see the sketch below).

Proposition. With probability 1, $|X| = 2^{|\Gamma_P|}$.

Proof. $G_D \cong C_2^{n-K}$, so that $|G_D| = 2^{n-K}$. Since $G_P \le G_D$, $|G_P|$ divides $|G_D|$; moreover $|G_P| = 2^{|\Gamma_P|}$. The action of $G_P$ on X has only one orbit: $G_P x = X$ for all $x \in X$. Every partial reflection operator is idempotent; thus $gx = g'x$ implies $g'g = 1$, whence $g = g'$, and $|G_P x| = |G_P|$. For any $x \in X$, $|X| = |G_P x| = |G_P| = 2^{|\Gamma_P|}$.
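The proposition has a directly computable form. The sketch below is illustrative (the vertex numbering and the example instance are hypothetical, chosen only to match the definitions of $\Gamma_P$ and $S_{uw}$ above):

```python
def num_incongruent_realizations(n, K, pruning_edges):
    """|X| = 2^{|Gamma_P|} with probability 1 (see the Proposition above).
    pruning_edges: iterable of pairs (u, w), u < w, the edges of E_P."""
    gamma_P = [v for v in range(K + 1, n + 1)          # generators g_v exist for v > K
               if all(not (u + K + 1 <= v <= w) for u, w in pruning_edges)]
    return 2 ** len(gamma_P)

# Hypothetical instance: n = 7, K = 3, one pruning edge {1, 5}.
# Then S_{1,5} = {5}, so Gamma_P = {g_4, g_6, g_7} and |X| = 2^3 = 8.
print(num_incongruent_realizations(7, 3, [(1, 5)]))    # 8
```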

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

Discretization Orders for Distance Geometry. Carlile Lavor, Leo Liberti, Nelson Maculan, Antonio Mucherino (Discretizable DG group). GSI13, Paris, France, August 28th, 2013.

*The Distance Geometry Problem (for molecular conformations).* Let G = (V, E, d) be a simple weighted undirected graph, where V is the set of vertices of G (the atoms); E is the set of edges of G (the known distances); E′ ⊂ E is the subset of E where distances are exact; and d gives the weights associated to the edges of G: the numerical value of each weight corresponds to the known distance, and it can be an interval.

Definition (DGP). Determine whether there exists a function $x : V \to \mathbb R^K$ for which, for all edges $(u, v) \in E$, $\|x_u - x_v\| = d(u, v)$.

*Sphere intersections.* In 3-dimensional space, the intersection of 2 spheres gives one circle; 3 spheres give two points; and 2 spheres and 1 spherical shell give two disjoint curves. Spheres and spherical shells can be centered at known vertex positions, while their radii are given by the distance information. All this is true with probability 1: the reference vertices cannot be aligned, and the strict triangular inequality needs to be satisfied. Generalization to any dimension K: the volume of the (K−1)-simplex defined by the reference vertices needs to be strictly positive (simplex inequalities).

*The Branch & Prune algorithm.* The Branch & Prune (BP) algorithm is based on the idea of branching over all possible positions for each vertex, and of pruning by using additional information not used in the discretization process. In the search tree it is supposed that all available distances are exact; D sample (exact) distances can be taken from interval distances.

*Importance of orders.* The definition of an order on the vertices in V allows us to ensure that vertex coordinates are available when needed. Given (1) a simple weighted undirected graph G = (V, E, d) and (2) a vertex v ∈ V, how do we identify K vertices $w_i$, $i = 1, 2, \dots, K$, for which the coordinates of every $w_i$ are available and every edge $(w_i, v) \in E$? We refer to $w_i$ as a reference vertex for v, and to $(w_i, v)$ as a reference distance for v.

*Definition of order.* An order for V is a sequence $r : \mathbb N \to V \cup \{0\}$ with length $|r| \in \mathbb N$ (for which $r_i = 0$ for all $i > |r|$) such that, for each v ∈ V, there is an index $i \in \mathbb N$ for which $r_i = v$.
Some facts about orders: they allow for vertex repetitions ($|r| \ge |V|$); however, each vertex can be used as a reference only once, because the simplex inequalities (generally satisfied with probability 1) would not be satisfied if the same vertex were used twice as a reference.

*Counting the reference vertices.* Let r be an order for V, and consider the following counters: $\alpha(r_i)$, the number of adjacent predecessors of $r_i$; $\beta(r_i)$, the number of adjacent successors of $r_i$; $\alpha_{ex}(r_i)$, the number of adjacent predecessors of $r_i$ related to an exact distance. A necessary condition for V to admit a discretization order is that, for any order r on V, $\forall i \in \{1, 2, \dots, |r|\}: \alpha(r_i) + \beta(r_i) \ge K$.

*Discretization orders.* We refer to any order r that allows for the discretization of the DGP as a discretization order. There are two big families:
- discretization orders with consecutive reference vertices: for each $r_i$, the reference vertices always immediately precede $r_i$ in the order;
- discretization orders without consecutive reference vertices: any vertex with rank < i can be a reference for $r_i$.

For K = 3: with consecutivity, the reference vertices for $r_i$ are searched only in the "window" $[r_{i-K}, \dots, r_{i-1}]$, and the simplex inequality must be satisfied on the window; without consecutivity, the reference vertices for $r_i$ are found in the "big window" $[r_1, \dots, r_{i-1}]$, and the simplex inequality must be satisfied for the K reference vertices inside the big window.

*Why different discretization assumptions?* Different motivations: historical reasons; the habit of handcrafting orders; symmetry properties of BP trees; the methods for intersecting the spheres (and spherical shells); the interest in methods for automatic detection of discretization orders. Consecutivity: IN FAVOUR vs. AGAINST.

*The ordering problem.* Definition. Given a simple weighted undirected graph G = (V, E, d) and a positive integer K, establish whether there exists an order r such that:
(a) $G_C = (V_C, E_C) \equiv G[\{r_1, r_2, \dots, r_K\}]$ is a clique and $E_C \subset E'$;
(b) $\forall i \in \{K+1, \dots, |r|\}: \alpha(r_i) \ge K$ and $\alpha_{ex}(r_i) \ge K - 1$.
Remarks: this problem is NP-complete when K is not fixed; with no consecutivity assumption, it is solvable in polynomial time when K is known; when dealing with proteins, K = 3.

*A greedy algorithm* (reorder(G); a Python sketch follows below):
- while a valid order r has not been found:
  - let i = 0; find a K-clique C in G with exact distances;
  - position C at the beginning of the new order: for all vertices v in C, let i = i + 1 and $r_i$ = v;
  - greedy search: while V is not covered:
    - v = argmax{α(u) : u not yet placed in r and $\alpha_{ex}(u) \ge K - 1$};
    - if α(v) < K, break the inner loop: there is no possible ordering extending C;
    - otherwise let i = i + 1 and $r_i$ = v;
- return r.

*An order for the protein backbone.* [Figure: an order automatically obtained by the greedy algorithm.]

*Computational experiments.* NMR-like instances are considered in these experiments, including protein backbones and side chains:

| PDB name | n_aa | n | \|E\| | D | LDE |
|---|---|---|---|---|---|
| 1brv | 4 | 51 | 368 | 3 | 2.10e-4 |
| 1brv | 8 | 98 | 853 | 9 | 5.88e-4 |
| 1ccq | 6 | 114 | 1181 | 3 | 1.16e-4 |
| 1ccq | 10 | 183 | 2169 | 8 | 1.63e-4 |
| 1acz | 6 | 94 | 929 | 3 | 1.63e-4 |
| 1acz | 13 | 199 | 2144 | 3 | 1.95e-4 |
| 1acz | 21 | 308 | 3358 | 10 | 4.93e-4 |
| 1k1v | 6 | 110 | 1236 | 3 | 3.04e-4 |
| 1k1v | 18 | 317 | 4169 | 3 | 3.66e-4 |
| 1k1v | 30 | 519 | 7068 | 3 | 5.63e-4 |

All instances were automatically reordered by the greedy algorithm, and the BP algorithm was invoked to find one solution. [Figure: two solutions obtained during the experiments; on the left, a 4-amino-acid fragment of 1brv; on the right, an 18-amino-acid fragment of 1k1v.]

*Work in progress.* Can we find orders that help BP in finding solutions? Minimize the length of the subsequences in the order having no pruning distances; (for proteins) maximize the interval distances related to pairs of hydrogen atoms; ...

*Other work in progress.* Identify clusters of solutions in BP solution sets; find a way to avoid discretizing the intervals; improve and tailor the parallel versions of BP to interval data; (for proteins) use real NMR data and compare our results to what is currently available in the PDB; ...
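A minimal sketch of the greedy ordering heuristic above, assuming a dict-of-sets graph encoding and a set of exact-distance pairs (both hypothetical data layouts chosen here), and simplifying away vertex repetitions:

```python
from itertools import combinations

def greedy_discretization_order(G, exact, K=3):
    """Sketch of reorder(G). G: dict {vertex: set of neighbours}; exact: set of
    frozenset({u, v}) pairs whose distance is exact. Returns an order r or None."""
    def exact_clique(C):
        return all(v in G[u] and frozenset((u, v)) in exact
                   for u, v in combinations(C, 2))

    for C in combinations(G, K):                # initial K-clique with exact distances
        if not exact_clique(C):
            continue
        r, placed = list(C), set(C)
        while len(placed) < len(G):
            best, best_alpha = None, -1
            for v in set(G) - placed:
                preds = G[v] & placed            # alpha(v): adjacent predecessors
                alpha_ex = sum(frozenset((v, u)) in exact for u in preds)
                if alpha_ex >= K - 1 and len(preds) > best_alpha:
                    best, best_alpha = v, len(preds)
            if best is None or best_alpha < K:   # condition (b) fails: try another clique
                break
            r.append(best)
            placed.add(best)
        else:
            return r
    return None
```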

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

Tessellabilities, Reversibilities, and Decomposabilities of Polytopes: A Survey. Jin Akiyama (Tokyo University of Science), Ikuro Sato (Miyagi Cancer Center), Hyunwoo Seong (The University of Tokyo). École nationale supérieure des mines de Paris, August 28, 2013.

**1. P1-tiles and P2-tiles.** A P1-tile is a polygon which tiles the plane with translations only. There are two families of convex P1-tiles: (1) parallelograms, and (2) hexagons with three pairs of opposite sides parallel and of the same lengths (P1-hexagons). A 3-dimensional P1-tile is a polyhedron which tiles space with translations only. There are five families F1-F5 of convex 3-dimensional P1-tiles (Fedorov): the parallelepiped (PP), the rhombic dodecahedron (RD), the hexagonal prism (HP), the elongated rhombic dodecahedron (ERD), and the truncated octahedron (TO).

A P2-tile is a polygon which tiles the plane by translations and 180° rotations only. Theorem A. Every convex P2-tile belongs to one of the following four families: F1, triangles; F2, quadrilaterals; F3, P2-pentagons (BC ∥ ED); F4, P2-hexagons (QPH: AB ∥ ED and |AB| = |ED|).

Problem: determine all convex 3-dimensional P2-tiles, i.e., convex polyhedra each of which tiles space in the P2 manner (cf. the triangular prism, ...).

A net of a convex polyhedron P is defined to be a connected planar object obtained by cutting the surface of P. An ART (almost regular tetrahedron) is a tetrahedron with four congruent faces. Theorem B (J.A. (2007)). Every net (convex or concave) of an ART tiles the plane in the P2 manner. [Artwork figures]

**2. Reversibility.** Volvox, a kind of green alga known as one of the simplest colonial (approximately multicellular) organisms, reproduces itself by reversing its interior offspring and its surface. Theorem C (J.A. (2007)). If a pair of polygons A and B is reversible, then each of them tiles the plane by translations and 180° rotations only (P2-tiling). [Figure: A, a red quadrilateral; B, a blue triangle.]

Theorem D (J.A., I. Sato, H. Seong (2013)). For an arbitrary convex P2-tile P and an arbitrary family $F_i$ (i = 1, 2, 3, 4) of convex P2-tiles, there exists a polygon Q ∈ $F_i$ such that the pair P and Q is reversible. [Figures: "A king in a cage"; "Spider ⇔ Geisha".]

A 3-dimensional P1-tile is said to be canonical if it is convex and symmetric with respect to each orthogonal axis. [Figures: F1-F5; "UFO ⇔ Alien".]

Theorem E (J.A., I. Sato, H. Seong (2011)). For an arbitrary canonical 3-dimensional P1-tile P and an arbitrary family $F_i$ (i = 1, 2, 3, 4, 5) of canonical 3-dimensional P1-tiles, there exists a polyhedron Q ∈ $F_i$ such that the pair P and Q is reversible. [Figures: Cube → Hexagonal Prism; Hexagonal Prism → Truncated Octahedron; Rhombic Dodecahedron → Elongated Rhombic Dodecahedron.]

**3. Tilings and atoms.** Let Π be the set of the five Platonic solids; there are polyhedra σ1, σ2, σ3, σ4 such that Φ = {σ1, ..., σ4} is an element set for Π, and the decomposition of each Platonic solid into these elements is summarized in a table in the slides. A pentadron is a convex pentahedron whose net is as shown in the slides; a symmetric pair of pentadra is used below. A tetrapak is a special kind of ART (tetrahedron with four congruent faces) made of pentadra. Theorem F (J.A.). A tetrapak tiles space and its net tiles the plane.
Problem: determine all convex polyhedra, each of which tiles space while one of its nets tiles the plane.

Theorem G (J.A., G. Nakamura, I. Sato (2012)). Every convex 3-dimensional P1-tile (or its affine-stretching transform) can be constructed from copies of a pentadron. [Figures: cube; hexagonal prism; truncated octahedron; rhombic dodecahedron; elongated rhombic dodecahedron.]

### ORAL SESSION 6 Computational Information Geometry (Frank Nielsen)

 Télécharger Le téléchargement implique l’acceptation de nos conditions d’utilisation

A new implementation of k-MLE for mixture modelling of Wishart distributions. Christophe Saint-Jean, Frank Nielsen. Geometric Science of Information 2013, August 28, 2013, Mines ParisTech.

*Application context.* We are interested in clustering varying-length sets of multivariate observations of the same dimension p, e.g.
$$X_1 = \begin{pmatrix} 3.6 & 0.05 & -4 \\ 3.6 & 0.05 & -4 \\ 3.6 & 0.05 & -4 \end{pmatrix}, \quad \dots, \quad X_N = \begin{pmatrix} 5.3 & -0.5 & 2.5 \\ 3.6 & 0.5 & 3.5 \\ 1.6 & -0.5 & 4.6 \\ -1.6 & 0.5 & 5.1 \\ -2.9 & -0.5 & 6.1 \end{pmatrix}.$$
The sample mean is a good but not discriminative enough feature; second-order cross-product matrices ${}^tX_i X_i$ may capture some relations between the (column) variables. The problem is then the clustering of a set of p × p PSD matrices
$$\chi = \big\{ x_1 = {}^tX_1 X_1,\ x_2 = {}^tX_2 X_2,\ \dots,\ x_N = {}^tX_N X_N \big\}.$$
Examples of applications: multispectral/DTI/radar imaging, motion retrieval systems, ...

*Outline.* 1. MLE and the Wishart distribution (exponential family and maximum likelihood estimate; the Wishart distribution; two sub-families of the Wishart distribution). 2. Mixture modeling with k-MLE (original k-MLE; k-MLE for Wishart distributions; heuristics for the initialization). 3. Application to motion retrieval.

*Reminder: exponential family (EF).* An exponential family is a set of parametric probability distributions
$$\mathrm{EF} = \big\{ p(x;\lambda) = p_F(x;\theta) = \exp\big( \langle t(x), \theta\rangle + k(x) - F(\theta) \big) \ \big|\ \theta \in \Theta \big\}.$$
Terminology: λ are the source parameters; θ the natural parameters; t(x) the sufficient statistic; k(x) the auxiliary carrier measure; F(θ) the log-normalizer, differentiable and strictly convex, where $\Theta = \{\theta \in \mathbb R^D \mid F(\theta) < \infty\}$ is an open convex set. Almost all commonly used distributions are EF members (exceptions include the uniform and Cauchy distributions).

*Reminder: maximum likelihood estimate (MLE).* The MLE principle is a very common approach for fitting the parameters of a distribution:
$$\hat\theta = \operatorname*{argmax}_\theta L(\theta;\chi) = \operatorname*{argmax}_\theta \prod_{i=1}^N p(x_i;\theta) = \operatorname*{argmin}_\theta\ {-\frac1N} \sum_{i=1}^N \log p(x_i;\theta),$$
assuming a sample $\chi = \{x_1, \dots, x_N\}$ of i.i.d. observations. The log-density has a convenient expression for EF members, $\log p_F(x;\theta) = \langle t(x), \theta\rangle + k(x) - F(\theta)$, so it follows that
$$\hat\theta = \operatorname*{argmax}_\theta \sum_{i=1}^N \langle t(x_i), \theta\rangle - N F(\theta).$$

*MLE with EF.* Since F is a strictly convex, differentiable function, the MLE exists and is unique:
$$\nabla F(\hat\theta) = \frac1N \sum_{i=1}^N t(x_i).$$
Ideally we have a closed form, $\hat\theta = \nabla F^{-1}\big(\frac1N\sum_{i=1}^N t(x_i)\big)$; otherwise numerical methods, including Newton-Raphson, can be successfully applied.

*Wishart distribution.* Definition (central Wishart distribution). The Wishart distribution characterizes empirical covariance matrices of zero-mean Gaussian samples:
$$W_d(X; n, S) = \frac{|X|^{\frac{n-d-1}{2}} \exp\big(-\tfrac12 \operatorname{tr}(S^{-1}X)\big)}{2^{\frac{nd}{2}}\, |S|^{\frac n2}\, \Gamma_d\big(\tfrac n2\big)},$$
where, for x > 0, $\Gamma_d(x) = \pi^{\frac{d(d-1)}{4}} \prod_{j=1}^d \Gamma\big(x - \tfrac{j-1}{2}\big)$ is the multivariate gamma function. Remarks: n > d − 1 and E[X] = nS; this is the multivariate generalization of the chi-square distribution.
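For a quick numerical cross-check of the density above, scipy exposes the Wishart distribution directly (scipy's `df` corresponds to n and `scale` to S); a small sketch:

```python
import numpy as np
from scipy.stats import wishart

d, n = 3, 7.0                     # dimension and degrees of freedom (n > d - 1)
S = np.eye(d)                     # scale matrix
rng = np.random.default_rng(0)
X = wishart.rvs(df=n, scale=S, random_state=rng)   # one d x d PSD sample
print(wishart.logpdf(X, df=n, scale=S))            # log W_d(X; n, S)
print(wishart.mean(df=n, scale=S))                 # equals n * S
```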
*Wishart distribution as an EF.* It is an exponential family:
$$\log W_d(X; \theta_n, \theta_S) = \langle \theta_n, \log|X|\rangle_{\mathbb R} + \big\langle \theta_S, -\tfrac12 X\big\rangle_{HS} + k(X) - F(\theta_n, \theta_S),$$
with k(X) = 0 and
$$(\theta_n, \theta_S) = \Big(\frac{n-d-1}{2},\, S^{-1}\Big), \qquad t(X) = \big(\log|X|,\, -\tfrac12 X\big),$$
$$F(\theta_n, \theta_S) = \Big(\theta_n + \frac{d+1}{2}\Big)\big(d\log 2 - \log|\theta_S|\big) + \log\Gamma_d\Big(\theta_n + \frac{d+1}{2}\Big).$$

*MLE for the Wishart distribution.* In the case of the Wishart distribution, a closed form would be obtained by solving the system $\nabla F(\hat\theta) = \frac1N\sum_i t(x_i)$, i.e.
$$\begin{cases} d\log 2 - \log|\theta_S| + \Psi_d\big(\theta_n + \frac{d+1}{2}\big) = \eta_n \\[2pt] -\big(\theta_n + \frac{d+1}{2}\big)\,\theta_S^{-1} = \eta_S \end{cases} \qquad (1)$$
with $\eta_n$ and $\eta_S$ the expectation parameters and $\Psi_d$ the derivative of $\log\Gamma_d$. Unfortunately, no closed-form solution is known.

*Two sub-families of the Wishart distribution.* For n fixed ($n = 2\theta_n + d + 1$):
$$F_n(\theta_S) = \frac{nd}{2}\log 2 - \frac n2 \log|\theta_S| + \log\Gamma_d\Big(\frac n2\Big), \qquad k_n(X) = \frac{n-d-1}{2}\log|X|.$$
For S fixed ($S = \theta_S^{-1}$):
$$F_S(\theta_n) = \Big(\theta_n + \frac{d+1}{2}\Big)\log|2S| + \log\Gamma_d\Big(\theta_n + \frac{d+1}{2}\Big), \qquad k_S(X) = -\tfrac12\operatorname{tr}(S^{-1}X).$$
Both are exponential families and their MLE equations are solvable. For n fixed:
$$-\frac n2\,\hat\theta_S^{-1} = \frac1N\sum_{i=1}^N -\tfrac12 X_i \;\Longrightarrow\; \hat\theta_S = Nn\Big(\sum_{i=1}^N X_i\Big)^{-1}. \qquad (2)$$
For S fixed:
$$\hat\theta_n = \Psi_d^{-1}\Big(\frac1N\sum_{i=1}^N \log|X_i| - \log|2S|\Big) - \frac{d+1}{2}, \qquad \hat\theta_n > 0, \qquad (3)$$
with $\Psi_d^{-1}$ the functional reciprocal of $\Psi_d$.

*An iterative estimator for the Wishart distribution.* Algorithm 1 (an estimator for the parameters of the Wishart). Input: a sample $X_1, X_2, \dots, X_N$ of $S_d^{++}$. Output: final values of $\hat\theta_n$ and $\hat\theta_S$. Initialize $\hat\theta_n$ with some value > 0; repeat until convergence of the likelihood: update $\hat\theta_S$ using Eq. (2) with $n = 2\hat\theta_n + d + 1$; update $\hat\theta_n$ using Eq. (3) with S the inverse matrix of $\hat\theta_S$. (A sketch follows below.)

*Questions and open problems.* From a sample of Wishart matrices, the distribution parameters are recovered in a few iterations. Major question: is this an MLE? Probably. Minor question: what about sample size N = 1? The system is under-determined; regularize by sampling around $X_1$.
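A minimal NumPy/SciPy sketch of Algorithm 1, alternating the two closed-form sub-family MLEs; the inversion of $\Psi_d$ is done by bracketing plus Brent's method, and the helper names are mine, not the authors':

```python
import numpy as np
from scipy.special import psi
from scipy.optimize import brentq

def Psi_d(x, d):
    """Derivative of log Gamma_d: a sum of digammas, defined for x > (d-1)/2."""
    return sum(psi(x - j / 2.0) for j in range(d))

def Psi_d_inv(y, d):
    """Invert the increasing function Psi_d(., d) by bracketing + Brent."""
    lo, hi = (d - 1) / 2.0 + 1e-8, (d - 1) / 2.0 + 1.0
    while Psi_d(hi, d) < y:
        hi *= 2.0
    return brentq(lambda x: Psi_d(x, d) - y, lo, hi)

def fit_wishart(Xs, max_iter=100, tol=1e-10):
    """Algorithm 1: alternate the closed-form updates (2) and (3)."""
    d, N = Xs[0].shape[0], len(Xs)
    mean_X = sum(Xs) / N
    mean_logdet = np.mean([np.linalg.slogdet(X)[1] for X in Xs])
    theta_n = 1.0                                   # any initial value > 0
    for _ in range(max_iter):
        n = 2.0 * theta_n + d + 1.0
        S = mean_X / n                              # Eq. (2): theta_S = n mean_X^{-1}, S = theta_S^{-1}
        target = mean_logdet - (d * np.log(2.0) + np.linalg.slogdet(S)[1])
        new_theta_n = Psi_d_inv(target, d) - (d + 1.0) / 2.0   # Eq. (3)
        if abs(new_theta_n - theta_n) < tol:
            break
        theta_n = new_theta_n
    n = 2.0 * theta_n + d + 1.0
    return n, mean_X / n                            # estimated (n, S)
```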
*Mixture models (MM).* An additive (finite) mixture is a flexible tool to model a more complex distribution m:
$$m(x) = \sum_{j=1}^k w_j\, p_j(x), \qquad 0 \le w_j \le 1, \quad \sum_{j=1}^k w_j = 1,$$
where the $p_j$ are the component distributions of the mixture and the $w_j$ the mixing proportions. In our case, we take each $p_j$ as a member of some parametric family (EF): $m(x; \Psi) = \sum_{j=1}^k w_j\, p_{F_j}(x; \theta_j)$ with $\Psi = (w_1, \dots, w_{k-1}, \theta_1, \dots, \theta_k)$. Expectation-Maximization is not fast enough [5].

*Original k-MLE (primal formulation) in one slide.* Algorithm 2 (k-MLE). Input: a sample $\chi = \{x_1, \dots, x_N\}$ and Bregman generators $F_1, \dots, F_k$. Output: an estimate $\hat\Psi$ of the mixture parameters.
1. Use a good initialization for Ψ (see below).
2. Repeat until convergence of the complete likelihood:
   - repeat until convergence of the complete likelihood:
     - for each $x_i \in \chi$: $z_i = \operatorname{argmax}_j \log \hat w_j\, p_{F_j}(x_i; \hat\theta_j)$;
     - for each $C_j := \{x_i \in \chi \mid z_i = j\}$: $\hat\theta_j = \mathrm{MLE}_{F_j}(C_j)$;
   - update the mixing proportions: $\hat w_j = |C_j|/N$.

*k-MLE's properties.* Another formulation comes from the connection between EF and Bregman divergences [3]:
$$\log p_F(x;\theta) = -B_{F^*}(t(x) : \eta) + F^*(t(x)) + k(x),$$
where $B_F(\cdot : \cdot)$ is the Bregman divergence associated with a strictly convex and differentiable function F.

*Original k-MLE (dual formulation) in one slide.* Algorithm 3 (k-MLE, dual). Input: a sample $y_1 = t(x_1), y_2 = t(x_2), \dots, y_N = t(x_N)$ and Bregman generators $F^*_1, \dots, F^*_k$. Output: $\hat\Psi = (\hat w_1, \dots, \hat w_{k-1}, \hat\theta_1 = \nabla F^*(\hat\eta_1), \dots, \hat\theta_k = \nabla F^*(\hat\eta_k))$.
1. Use a good initialization for Ψ (see below).
2. Repeat until convergence of the complete likelihood:
   - repeat until convergence of the complete likelihood:
     - for each $x_i \in \chi$: $z_i = \operatorname{argmin}_j B_{F^*_j}(y_i : \hat\eta_j) - \log \hat w_j$;
     - for each $C_j := \{x_i \in \chi \mid z_i = j\}$: $\hat\eta_j = \sum_{x_i \in C_j} y_i / |C_j|$;
   - update the mixing proportions: $\hat w_j = |C_j|/N$.

*k-MLE for Wishart distributions.* Practical considerations impose modifications of the algorithm: during the assignment, empty clusters may appear (high-dimensional data makes this worse). A possible solution is to use Hartigan and Wong's strategy [6] instead of Lloyd's: optimally transfer one observation at a time, update the parameters of the involved clusters, and stop when no transfer is possible. This should guarantee non-empty clusters [7], but it does not work when considering weighted clusters; we therefore get back to an "old school" criterion, $|C_{z_i}| > 1$. This was experimentally shown to perform better in high dimension than Lloyd's strategy.

*k-MLE, Hartigan and Wong style.* Criterion for a potential transfer (max form):
$$\frac{\log \hat w_{z_i}\, p_{F_{z_i}}(x_i; \hat\theta_{z_i})}{\log \hat w_{z_i^*}\, p_{F_{z_i^*}}(x_i; \hat\theta_{z_i^*})} < 1, \qquad z_i^* = \operatorname*{argmax}_j \log \hat w_j\, p_{F_j}(x_i; \hat\theta_j),$$
with update rules $\hat\theta_{z_i} = \mathrm{MLE}_{F_j}(C_{z_i} \setminus \{x_i\})$ and $\hat\theta_{z_i^*} = \mathrm{MLE}_{F_j}(C_{z_i^*} \cup \{x_i\})$. Equivalently (min form):
$$\frac{B_{F^*}(y_i : \eta_{z_i^*}) - \log w_{z_i^*}}{B_{F^*}(y_i : \eta_{z_i}) - \log w_{z_i}} < 1, \qquad z_i^* = \operatorname*{argmin}_j \big(B_{F^*}(y_i : \eta_j) - \log w_j\big),$$
with update rules
$$\eta_{z_i} = \frac{|C_{z_i}|\,\eta_{z_i} - y_i}{|C_{z_i}| - 1}, \qquad \eta_{z_i^*} = \frac{|C_{z_i^*}|\,\eta_{z_i^*} + y_i}{|C_{z_i^*}| + 1}.$$
**Towards a good initialization.** Classical initialization methods: random centers, random partition, furthest point (a 2-approximation), ... A better approach is k-means++ [8]: sample each new seed proportionally to the squared distance to the nearest center. It is a fast and greedy approximation, $\Theta(kN)$, with a probabilistic guarantee of good initialization:
$$\mathrm{OPT}_F \le \text{k-means}_F \le O(\log k)\,\mathrm{OPT}_F.$$
The dual Bregman divergence $B_{F^*}$ may replace the squared distance.

**A heuristic to avoid fixing k.** k-means requires fixing $k$, the number of clusters. We propose on-the-fly cluster creation together with k-MLE++ (inspired by DP-k-means [9]): create a cluster whenever some observation contributes too much to the loss function with the already selected centers. It may overestimate the number of clusters...

**Initialization with DP-k-MLE++.** Algorithm 4 (DP-k-MLE++), sketched in code below:
- Input: a sample $y_1 = t(X_1), \dots, y_N = t(X_N)$, a generator $F$, and $\lambda > 0$. Output: $C$, a subset of $\{y_1,\dots,y_N\}$, and $k$ the number of clusters.
- Choose the first seed $C = \{y_j\}$, for $j$ uniformly random in $\{1,2,\dots,N\}$. Repeat: for each $y_i$ compute $p_i = B_{F^*}(y_i : C) / \sum_{i'=1}^N B_{F^*}(y_{i'} : C)$, where $B_{F^*}(y_i : C) = \min_{c \in C} B_{F^*}(y_i : c)$; if some $p_i > \lambda$, choose the next seed $s$ among $y_1,\dots,y_N$ with probability $p_i$ and add it: $C = C \cup \{s\}$. Stop when all $p_i \le \lambda$; return $k = |C|$.
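A minimal Python sketch of Algorithm 4, again with `bregman` an assumed callable for $B_{F^*}$; the random-generator handling and the degenerate-case guard are ours:

```python
import numpy as np

def dp_kmle_pp(Y, bregman, lam, rng=None):
    """Pick seeds until every normalized contribution drops below lam."""
    rng = rng or np.random.default_rng()
    N = len(Y)
    C = [Y[rng.integers(N)]]                          # first seed, uniform
    while True:
        dist = np.array([min(bregman(y, c) for c in C) for y in Y])
        if dist.sum() == 0:                           # every point is a seed
            return C, len(C)
        p = dist / dist.sum()
        if np.all(p <= lam):
            return C, len(C)
        C.append(Y[rng.choice(N, p=p)])               # next seed, prob. p_i
```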
**Motion capture.** Real dataset: motion capture of contemporary dancers (15 sensors in 3D).

**Application to motion retrieval (1).** Motion-capture data can be viewed as matrices $X_i$ with different row counts but the same column size $d$. The idea is to describe each $X_i$ through the parameters $\hat\Psi_i$ of one mixture model. Remark: the size of each sub-motion is known (and so is its $\theta_n$). The mixture parameters can be viewed as a sparse representation of the local dynamics in $X_i$.

**Application to motion retrieval (2).** Comparing two movements amounts to computing a dissimilarity measure between $\hat\Psi_i$ and $\hat\Psi_j$. Remark 1: with DP-k-MLE++, the two mixtures will probably not have the same number of components. Remark 2: when both mixtures have one component, a natural choice is
$$\mathrm{KL}\big(W_d(\cdot;\hat\theta)\,\|\,W_d(\cdot;\hat\theta')\big) = B_{F^*}(\hat\eta : \hat\eta') = B_F(\hat\theta' : \hat\theta),$$
for which a closed form is always available! No closed form exists for the KL divergence between general mixtures.

**Application to motion retrieval (3).** A possible solution is to use the CS divergence [10]:
$$CS(m : m') = -\log \frac{\int m(x)\,m'(x)\,dx}{\sqrt{\int m(x)^2\,dx\,\int m'(x)^2\,dx}}.$$
It has an analytic formula, since
$$\int m(x)\,m'(x)\,dx = \sum_{j=1}^k \sum_{j'=1}^{k'} w_j\, w'_{j'} \exp\Big( F(\theta_j + \theta'_{j'}) - \big(F(\theta_j) + F(\theta'_{j'})\big) \Big).$$
Note that this expression is well defined since the natural parameter space $\Theta = \mathbb{R}_+^* \times S_{++}^d$ is a convex cone. (A small sketch follows the references below.)

**Implementation.** Early specific code in MATLAB; the current implementation is in Python (based on pyMEF [2]). Ongoing proof of concept (with Herranz F., Beurivé A.).

**Conclusions and future work.** Still some mathematical work to be done: solve the MLE equations to get $\nabla F^* = (\nabla F)^{-1}$ and then $F^*$; characterize our estimator for the full Wishart distribution. Complete and validate the prototype system for motion retrieval. Speed up the algorithm with computational/numerical/algorithmic tricks; a library for Bregman-divergence learning? Possible extensions: reintroduce the mean vector in the model (Gaussian-Wishart); online k-means to online k-MLE...

**References**

1. Nielsen, F.: k-MLE: A fast algorithm for learning statistical mixture models. In: International Conference on Acoustics, Speech and Signal Processing (2012), pp. 869-872
2. Schwander, O., Nielsen, F.: pyMEF - A framework for exponential families in Python. In: Proceedings of the 2011 IEEE Workshop on Statistical Signal Processing
3. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6 (2005), 1705-1749
4. Nielsen, F., Garcia, V.: Statistical exponential families: A digest with flash cards. http://arxiv.org/abs/0911.4863 (2009)
5. Hidot, S., Saint-Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: Application to movement clustering. Pattern Recognition Letters 31(14) (2010), 2318-2324
6. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1) (1979), 100-108
7. Telgarsky, M., Vattani, A.: Hartigan's method: k-means clustering without Voronoi. In: Proc. of International Conference on Artificial Intelligence and Statistics (AISTATS) (2010), pp. 820-827
8. Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027-1035
9. Kulis, B., Jordan, M.I.: Revisiting k-means: New algorithms via Bayesian nonparametrics. In: International Conference on Machine Learning (ICML) (2012)
10. Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: International Conference on Pattern Recognition (ICPR) (2012), pp. 1723-1726
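As promised above, here is a hedged sketch of the closed-form CS divergence between two mixtures of the same exponential family, using only the log-normalizer `F` (an assumed callable on natural parameters); `logsumexp` keeps the double sums numerically stable.

```python
import numpy as np
from scipy.special import logsumexp

def log_inner(ws, thetas, ws2, thetas2, F):
    """log of sum_{j,j'} w_j w'_j' exp( F(th_j + th'_j') - F(th_j) - F(th'_j') )."""
    terms = [np.log(w) + np.log(w2) + F(th + th2) - F(th) - F(th2)
             for w, th in zip(ws, thetas) for w2, th2 in zip(ws2, thetas2)]
    return logsumexp(terms)

def cs_divergence(ws, thetas, ws2, thetas2, F):
    return (-log_inner(ws, thetas, ws2, thetas2, F)
            + 0.5 * (log_inner(ws, thetas, ws, thetas, F)
                     + log_inner(ws2, thetas2, ws2, thetas2, F)))
```

Note that `th + th2` assumes the natural parameters are stored as arrays, so that the sum stays inside the convex cone $\Theta$.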

 Download (downloading implies acceptance of our terms of use)

Hypothesis testing, information divergence and computational geometry
Frank Nielsen, Frank.Nielsen@acm.org, www.informationgeometry.org
Sony Computer Science Laboratories, Inc. August 2013, GSI, Paris, FR

**The Multiple Hypothesis Testing (MHT) problem.** Given a random variable $X$ with $n$ hypotheses $H_1 : X \sim P_1, \dots, H_n : X \sim P_n$, decide from an IID sample $x_1,\dots,x_m \sim X$ which hypothesis holds true. $P^m_{\text{correct}} = 1 - P^m_{\text{error}}$. Asymptotic regime: $\alpha = -\frac{1}{m}\log P^m_e$, $m \to \infty$.

**Bayesian hypothesis testing (preliminaries).** Prior probabilities $w_i = \Pr(X \sim P_i) > 0$ (with $\sum_{i=1}^n w_i = 1$) and conditional probabilities $\Pr(X = x \mid X \sim P_i)$, so that
$$\Pr(X = x) = \sum_{i=1}^n \Pr(X \sim P_i)\Pr(X = x \mid X \sim P_i) = \sum_{i=1}^n w_i \Pr(X \mid P_i).$$
Let $c_{i,j}$ be the cost of deciding $H_i$ when in fact $H_j$ is true; the matrix $[c_{ij}]$ is the cost design matrix. Let $p_{i,j}(u)$ be the probability of making this decision using rule $u$.

**Bayesian detector.** Minimize the expected cost $E_X[c(r(x))]$, where
$$c(r(x)) = \sum_i w_i \sum_{j \ne i} c_{i,j}\, p_{i,j}(r(x)).$$
Special case: the probability of error $P_e$ is obtained for $c_{i,i} = 0$ and $c_{i,j} = 1$ for $i \ne j$:
$$P_e = E_X\Big[\sum_i w_i \sum_{j \ne i} p_{i,j}(r(x))\Big].$$
The maximum a posteriori probability (MAP) rule classifies $x$ as $\mathrm{MAP}(x) = \mathrm{argmax}_{i\in\{1,\dots,n\}}\, w_i\, p_i(x)$, where $p_i(x) = \Pr(X = x \mid X \sim P_i)$ are the conditional probabilities. The MAP Bayesian detector minimizes $P_e$ over all rules [8].

**Probability of error and divergences.** Without loss of generality, consider equal priors $w_1 = w_2 = \frac12$:
$$P_e = \int_{x\in\mathcal X} p(x)\,\min\big(\Pr(H_1\mid x), \Pr(H_2\mid x)\big)\,d\nu(x)$$
($P_e > 0$ as soon as $\mathrm{supp}\,p_1 \cap \mathrm{supp}\,p_2 \ne \emptyset$). From Bayes' rule, $\Pr(H_i \mid X = x) = \Pr(H_i)\Pr(X = x\mid H_i)/\Pr(X = x) = w_i p_i(x)/p(x)$, so
$$P_e = \frac12 \int_{x\in\mathcal X} \min(p_1(x), p_2(x))\,d\nu(x).$$
Rewrite or bound $P_e$ using two tricks of the trade: Trick 1, $\forall a,b \in \mathbb R$, $\min(a,b) = \frac{a+b}{2} - \frac{|a-b|}{2}$; Trick 2, $\forall a,b > 0$, $\min(a,b) \le \min_{\alpha\in(0,1)} a^\alpha b^{1-\alpha}$.

**Probability of error and total variation.** Trick 1 gives
$$P_e = \frac12 \int_{x\in\mathcal X} \Big(\frac{p_1(x)+p_2(x)}{2} - \frac{|p_1(x)-p_2(x)|}{2}\Big)d\nu(x) = \frac12\Big(1 - \frac12\int_{x\in\mathcal X} |p_1(x) - p_2(x)|\,d\nu(x)\Big) = \frac12\big(1 - TV(P_1,P_2)\big),$$
with the total variation metric distance $TV(P,Q) = \frac12\int_{x\in\mathcal X} |p(x) - q(x)|\,d\nu(x)$. It is difficult to compute when handling multivariate distributions.

**Bounding the probability of error $P_e$.** Trick 2 upper-bounds $P_e$:
$$P_e = \frac12\int_{x\in\mathcal X} \min(p_1(x), p_2(x))\,d\nu(x) \le \frac12 \min_{\alpha\in(0,1)} \int_{x\in\mathcal X} p_1^\alpha(x)\, p_2^{1-\alpha}(x)\,d\nu(x).$$
Define the Chernoff information
$$C(P_1,P_2) = -\log \min_{\alpha\in(0,1)} \int_{x\in\mathcal X} p_1^\alpha(x)\, p_2^{1-\alpha}(x)\,d\nu(x) \ge 0.$$
With the best error exponent $\alpha^*$ [7]:
$$P_e \le w_1^{\alpha^*} w_2^{1-\alpha^*} e^{-C(P_1,P_2)} \le e^{-C(P_1,P_2)}.$$
The bounding technique can be extended using any quasi-arithmetic $\alpha$-means [13, 9]...
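For distributions with finite support, the Chernoff information reduces to a one-dimensional convex minimization over $\alpha$. A small numeric sketch, assuming SciPy; the tolerance bounds are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_information(p1, p2):
    """C(P1,P2) = -log min_a sum_x p1^a p2^(1-a), for 0 < a < 1."""
    log_coeff = lambda a: np.log(np.sum(p1 ** a * p2 ** (1 - a)))
    res = minimize_scalar(log_coeff, bounds=(1e-9, 1 - 1e-9), method="bounded")
    return -res.fun, res.x       # Chernoff information, best exponent a*

# usage: two biased coins
p1, p2 = np.array([0.2, 0.8]), np.array([0.7, 0.3])
C, a_star = chernoff_information(p1, p2)
```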
**Computational information geometry.** Exponential family manifold [4]: $\mathcal M = \{p_\theta \mid p_\theta(x) = \exp(t(x)^\top\theta - F(\theta))\}$. Dually flat manifolds [1] enjoy dual affine connections: $(\mathcal M, \nabla^2 F(\theta), \nabla^{(e)}, \nabla^{(m)})$, with $\eta = \nabla F(\theta)$ and $\theta = \nabla F^*(\eta)$. Canonical divergence from the Young inequality:
$$A(\theta_1, \eta_2) = F(\theta_1) + F^*(\eta_2) - \theta_1^\top \eta_2 \ge 0, \qquad F(\theta) + F^*(\eta) = \theta^\top\eta \text{ for dual pairs } \eta = \nabla F(\theta).$$

**MAP decision rule and additive Bregman Voronoi diagrams.**
$$\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B(\theta_2 : \theta_1) = A(\theta_2 : \eta_1) = A^*(\eta_1 : \theta_2) = B^*(\eta_1 : \eta_2).$$
Canonical divergence (mixed primal/dual coordinates): $A(\theta_2 : \eta_1) = F(\theta_2) + F^*(\eta_1) - \theta_2^\top \eta_1 \ge 0$. Bregman divergence (uni-coordinates, primal or dual): $B(\theta_2 : \theta_1) = F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1)$. Since
$$\log p_i(x) = -B^*(t(x) : \eta_i) + F^*(t(x)) + k(x), \qquad \eta_i = \nabla F(\theta_i) = \eta(P_{\theta_i}),$$
the optimal MAP decision rule becomes
$$\mathrm{MAP}(x) = \mathrm{argmax}_{i\in\{1,\dots,n\}}\, w_i p_i(x) = \mathrm{argmax}_i\, \big(-B^*(t(x) : \eta_i) + \log w_i\big) = \mathrm{argmin}_i\, B^*(t(x) : \eta_i) - \log w_i,$$
i.e., a nearest-neighbor classifier [2, 10, 15, 16].

**MAP and nearest-neighbor classifiers.** Bregman Voronoi diagrams (with additive weights) are affine diagrams [2]. To evaluate $\mathrm{argmin}_{i} B^*(t(x):\eta_i) - \log w_i$: point location in an arrangement [3] (small dimensions), divergence-based search trees [16], or GPU brute force [6].

**Geometry of the best error exponent: binary hypotheses.** On the exponential family manifold, the Chernoff $\alpha$-coefficient [5] is
$$c_\alpha(P_{\theta_1} : P_{\theta_2}) = \int p_{\theta_1}^\alpha(x)\, p_{\theta_2}^{1-\alpha}(x)\,d\mu(x) = \exp\big(-J_F^{(\alpha)}(\theta_1 : \theta_2)\big),$$
with the skew Jensen divergence [14] on the natural parameters
$$J_F^{(\alpha)}(\theta_1 : \theta_2) = \alpha F(\theta_1) + (1-\alpha)F(\theta_2) - F(\theta_{12}^{(\alpha)}),$$
where $\theta_{12}^{(\alpha)} = \alpha\theta_1 + (1-\alpha)\theta_2$ denotes the interpolated natural parameter. The Chernoff information amounts to a Bregman divergence for exponential families:
$$C(P_{\theta_1} : P_{\theta_2}) = B(\theta_1 : \theta_{12}^{(\alpha^*)}) = B(\theta_2 : \theta_{12}^{(\alpha^*)}).$$
The Chernoff distribution $P^*$ [12] is the intersection of an e-geodesic with an m-bisector:
$$P^* = P_{\theta_{12}^{*}} = G_e(P_1,P_2) \cap Bi_m(P_1,P_2),$$
where the e-geodesic is $G_e(P_1,P_2) = \{E_{12}^{(\lambda)} \mid \theta(E_{12}^{(\lambda)}) = (1-\lambda)\theta_1 + \lambda\theta_2,\ \lambda\in[0,1]\}$ and the m-bisector is $Bi_m(P_1,P_2) : \{P \mid F(\theta_1) - F(\theta_2) + \eta(P)^\top\Delta\theta = 0\}$. The optimal natural parameter of $P^*$ is
$$\theta^* = \theta_{12}^{(\alpha^*)} = \mathrm{argmin}_{\theta\in\Theta} B(\theta_1 : \theta) = \mathrm{argmin}_{\theta\in\Theta} B(\theta_2 : \theta),$$
with a closed form for order-1 families, or an efficient bisection search otherwise.

[Figure: in the $\eta$-coordinate system, the m-bisector $Bi_m(P_{\theta_1}, P_{\theta_2})$ and the e-geodesic $G_e(P_{\theta_1}, P_{\theta_2})$ intersect at $p_{\theta_{12}^*}$, with $C(\theta_1 : \theta_2) = B(\theta_1 : \theta_{12}^*)$.]

**Geometry of the best error exponent: multiple hypotheses.** $n$-ary MHT [8] reduces to the minimum pairwise Chernoff distance:
$$C(P_1,\dots,P_n) = \min_{i,j\ne i} C(P_i,P_j), \qquad P^m_e \le e^{-mC(P_{i^*},P_{j^*})}, \quad (i^*,j^*) = \mathrm{argmin}_{i,j\ne i}\, C(P_i,P_j).$$
Compute for each pair of natural neighbors [3] $P_{\theta_i}$, $P_{\theta_j}$ the Chernoff distance $C(P_{\theta_i},P_{\theta_j})$, and choose the pair with minimal distance (proof by contradiction using the Bregman Pythagorean theorem). This is a closest Bregman pair problem (the Chernoff distance fails the triangle inequality).

[Figure: Chernoff distributions between natural neighbours, drawn in the $\eta$-coordinate system.]
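The bisection search mentioned above is a one-parameter root-finding problem along the e-geodesic: $h(\lambda) = B(\theta_1 : \theta_\lambda) - B(\theta_2 : \theta_\lambda)$ changes sign between $\lambda = 0$ and $\lambda = 1$. A hedged Python sketch, with `F` and `grad_F` assumed callables for the log-normalizer and its gradient:

```python
import numpy as np
from scipy.optimize import brentq

def chernoff_point(theta1, theta2, F, grad_F):
    """theta* on the e-geodesic where B(theta1:theta) = B(theta2:theta)."""
    def h(lam):
        th = (1 - lam) * theta1 + lam * theta2
        # B(th1:th) - B(th2:th) = F(th1) - F(th2) + <th2 - th1, grad F(th)>
        return F(theta1) - F(theta2) + (theta2 - theta1) @ grad_F(th)
    lam = brentq(h, 0.0, 1.0)             # h(0) < 0 < h(1) when theta1 != theta2
    theta_star = (1 - lam) * theta1 + lam * theta2
    C = F(theta1) - F(theta_star) - (theta1 - theta_star) @ grad_F(theta_star)
    return theta_star, C                  # Chernoff point, Chernoff information
```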
**Summary.** Bayesian multiple hypothesis testing, from the viewpoint of computational geometry: probability of error and the best MAP Bayesian rule; total variation and $P_e$, upper-bounded by the Chernoff distance; on exponential family manifolds, the MAP rule is a NN classifier (additive Bregman Voronoi diagram), the best error exponent comes from the geodesic/bisector intersection for binary hypotheses, and from the closest Bregman pair for multiple hypotheses.

Thank you. 28th-30th August, Paris.

@incollection{HTIGCG-GSI-2013, year={2013}, booktitle={Geometric Science of Information}, volume={8085}, series={Lecture Notes in Computer Science}, editor={Frank Nielsen and Fr\'ed\'eric Barbaresco}, title={Hypothesis testing, information divergence and computational geometry}, publisher={Springer Berlin Heidelberg}, author={Nielsen, Frank}, pages={241-248}}

**Bibliographic references**

1. Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
2. Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock. Bregman Voronoi diagrams. Discrete & Computational Geometry, 44(2):281-307, 2010.
3. Jean-Daniel Boissonnat and Mariette Yvinec. Algorithmic Geometry. Cambridge University Press, New York, NY, USA, 1998.
4. Lawrence D. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory. Institute of Mathematical Statistics, Hayworth, CA, USA, 1986.
5. Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-507, 1952.
6. Vincent Garcia, Eric Debreuve, Frank Nielsen, and Michel Barlaud. k-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching. In IEEE International Conference on Image Processing (ICIP), pages 3757-3760, 2010.
7. Martin E. Hellman and Josef Raviv. Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, 16:368-372, 1970.
8. C. C. Leang and D. H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280-282, January 1997.
9. Frank Nielsen. Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. Submitted, 2012.
10. Frank Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. Preliminary technical report on arXiv.
11. Frank Nielsen. Hypothesis testing, information divergence and computational geometry. In Frank Nielsen and Frédéric Barbaresco, editors, Geometric Science of Information, volume 8085 of Lecture Notes in Computer Science, pages 241-248. Springer Berlin Heidelberg, 2013.
12. Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters (SPL), 20(3):269-272, March 2013.
13. Frank Nielsen. Pattern learning and recognition on statistical manifolds: An information-geometric review. In Edwin Hancock and Marcello Pelillo, editors, Similarity-Based Pattern Recognition, volume 7953 of Lecture Notes in Computer Science, pages 1-25. Springer Berlin Heidelberg, 2013.
14. Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455-5466, 2011.
15. Frank Nielsen, Paolo Piro, and Michel Barlaud. Bregman vantage point trees for efficient nearest neighbor queries. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo (ICME), pages 878-881, 2009.
16. Paolo Piro, Frank Nielsen, and Michel Barlaud. Tailored Bregman ball trees for effective nearest neighbors. In European Workshop on Computational Geometry (EuroCG), LORIA, Nancy, France, March 2009. IEEE.

 Download (downloading implies acceptance of our terms of use)

The exponential family in abstract information theory
Jan Naudts and Ben Anthonis, Universiteit Antwerpen. Paris, August 2013

Outline: Fisher information; Example; Abstract information theory; Assumptions; Examples of generalized divergences; Deformed exponential families; Conclusions.

**Fisher information.** The standard expression for the Fisher information matrix,
$$I_{k,l}(\theta) = E_\theta\Big[\frac{\partial}{\partial\theta^k}\ln p_\theta\;\frac{\partial}{\partial\theta^l}\ln p_\theta\Big],$$
is a relevant quantity when $p_\theta$ belongs to the exponential family. A different quantity is needed in the general case. It involves the Kullback-Leibler divergence $D(p\|p_\theta) = E_p \ln\frac{p}{p_\theta}$. Remember that
$$I_{k,l}(\theta) = \frac{\partial^2}{\partial\theta^k\partial\theta^l} D(p\|p_\theta)\bigg|_{p=p_\theta}.$$
The divergence $D(p\|p_\theta)$ is a measure of the distance between an arbitrary state $p$ and a point $p_\theta$ of the statistical manifold. Let $D(p\|M)$ denote the minimal 'distance', and let $F_\theta$ denote the 'fiber' of all points $p$ for which $D(p\|M) = D(p\|p_\theta)$ (the minimum contrast leaf of Eguchi, 1992).

**Definition.** The extended Fisher information of a pdf $p$ (not necessarily in $M$) is
$$I_{k,l}(p) = \frac{\partial^2}{\partial\theta^k\partial\theta^l} D(p\|p_\theta), \qquad p \in F_\theta.$$
Note that on the manifold the two definitions coincide: $I_{k,l}(\theta) = I_{k,l}(p_\theta)$.

**Proposition.** $I_{k,l}(p)$ is covariant. Proof: let $\eta$ be a function of $\theta$. One calculates
$$\frac{\partial^2}{\partial\theta^k\partial\theta^l} D(x\|\theta) = \frac{\partial^2}{\partial\eta^m\partial\eta^n} D(x\|\theta)\,\frac{\partial\eta^m}{\partial\theta^k}\frac{\partial\eta^n}{\partial\theta^l} + \frac{\partial}{\partial\eta^m} D(x\|\theta)\,\frac{\partial^2\eta^m}{\partial\theta^k\partial\theta^l}.$$
The latter term vanishes because $p \in F_\theta$; the former term is manifestly covariant.

**Proposition.** If $p_\theta$ belongs to the exponential family, then $I_{k,l}(p)$ is constant on the fiber $F_\theta$. Proof: $p$, $p_\theta$ satisfy the Pythagorean relation $D(p\|p_\eta) = D(p\|p_\theta) + D(p_\theta\|p_\eta)$. Hence taking derivatives w.r.t. $\eta$ only involves $D(p_\theta\|p_\eta)$; only afterwards put $\eta = \theta$. One concludes that $I_{k,l}(p) = I_{k,l}(p_\theta)$. This gives a coordinate-independent method to verify that $M$ is not an exponential family!

**Example** (suggested by H. Matsuzoe). Consider the manifold of normal distributions $p_{\mu,\sigma}(x)$ with mean $\mu$ and standard deviation $\sigma$,
$$p_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2},$$
and the submanifold $M$ of normal distributions for which $\mu = \sigma$:
$$p_\theta(x) = \frac{1}{\sqrt{2\pi\theta^2}}\,e^{-(x-\theta)^2/2\theta^2}.$$
Question: is $M$ an exponential family? Answer: it is known to be curved (Efron, 1975). Let us show that $I(p_{\mu,\sigma})$ is not constant along the fibers $F_\theta$. The Kullback-Leibler divergence $D(p_{\mu,\sigma}\|p_\theta)$ is minimal when $\theta$ is the positive root of the equation
$$\theta^2 + \mu\theta = \mu^2 + \sigma^2.$$
The Fisher information $I(p_{\mu,\sigma})$ equals
$$I(p_{\mu,\sigma}) = \frac{\theta^2 + \mu^2 + \sigma^2}{\theta^4}.$$
It is not constant on $F_\theta$: it cannot be written as a function of $\theta$ alone. This implies that $M$ is not an exponential family.
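The example can be checked numerically: the KL divergence between normal distributions has a closed form, so we can confirm both the minimizing $\theta$ and the dependence of the Fisher information on $(\mu,\sigma)$. A small sketch assuming SciPy; the specific values of $\mu$ and $\sigma$ are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ), closed form."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

mu, sigma = 1.3, 0.7
res = minimize_scalar(lambda t: kl_normal(mu, sigma, t, t),
                      bounds=(1e-3, 10.0), method="bounded")
theta = (-mu + np.sqrt(mu**2 + 4 * (mu**2 + sigma**2))) / 2  # positive root
assert abs(res.x - theta) < 1e-4
# extended Fisher information: depends on (mu, sigma), not on theta alone
I = (theta**2 + mu**2 + sigma**2) / theta**4
```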
**Abstract information theory.** Our aims: formulate the notion of an exponential family in the context of abstract information theory; and ask, if $M$ is not an exponential family w.r.t. the Kullback-Leibler divergence, can it be exponential w.r.t. some other divergence? Abstract information theory does not rely on probability theory. We try to bring classical and quantum information theory together in a single formalism.

A generalized divergence is a map $D : X \times M \to [0,+\infty]$ between two different spaces. A divergence is generically asymmetric in its two arguments; this is an indication that the two arguments play different roles: $X$ is the space of data sets, $M$ is a manifold of models. $D(x\|m)$ has the meaning of a loss of information when the data set $x$ is replaced by the model point $m$. In the classical setting, $X$ is the space of empirical measures and $M$ is a statistical manifold; one has in this case $M \subset X$.

**Assumptions.** Let $Q$ denote a linear space of continuous real functions on $X$. Instead of $q(x)$ we write $\langle x|q\rangle$ to stress that $Q$ is not an algebra. In the classical setting $Q$ is the space of random variables; in the quantum setting $Q$ is a space of operators on a Hilbert space. We consider a class of generalized divergences which can be written in the form
$$D(x\|m) = \xi(m) - \zeta(x) - \langle x|Lm\rangle,$$
where $\xi$ and $\zeta$ are real functions and $L : M \to Q$ is a map from the manifold $M$ into the linear space $Q$. We assume in addition a compatibility and a consistency condition, stated below.

For instance, the quantities $\ln p$, $\ln p_\theta$ appearing in the Kullback-Leibler divergence
$$D(p\|p_\theta) = E_p\ln p - E_p\ln p_\theta = \langle p|\ln p\rangle - \langle p|\ln p_\theta\rangle$$
are used as random variables and belong to $Q$. One can define a map $L : M \to Q$ by $Lp_\theta = \ln p_\theta$ and write the divergence as $D(p\|p_\theta) = \xi(p_\theta) - \zeta(p) - \langle p|Lp_\theta\rangle$ with $\xi(p_\theta) = 0$, $\zeta(p) = -E_p\ln p$ and $\langle p|q\rangle = E_p q$. The quantity $\xi(p_\theta)$ has been called the corrector by Flemming Topsøe; $\zeta(p)$ is the entropy. We call $L$ the logarithmic map.

Compatibility condition: for each $x \in X$ there exists a unique point $m \in M$ which minimizes the divergence $D(x\|m)$. This means that each point of $X$ belongs to some fiber $F_m$. Consistency condition: each point $m$ of $M$ can be approached by points $x$ of $F_m$, in the sense that $D(x\|m)$ can be made arbitrarily small.

**Example: Bregman divergence.** A divergence of the Bregman type is defined by
$$D(x\|m) = \sum_a \Big[F(x(a)) - F(m(a)) - (x(a) - m(a))\,f(m(a))\Big] = \sum_a \int_{m(a)}^{x(a)} du\,\big[f(u) - f(m(a))\big],$$
where $F$ is any strictly convex function defined on the interval $(0,1]$ and $f = F'$ is its derivative. (L.M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Comp. Math. Math. Phys. 7 (1967) 200-217.)

In the notation of our abstract information theory one has $\langle x|q\rangle = E_x q$; $Lm(a) = f(m(a))$; $\zeta(x) = -\sum_a F(x(a))$; $\xi(m) = \sum_a m(a)f(m(a)) - \sum_a F(m(a))$.

Note that the Bregman divergence can also be written as
$$D(x\|m) = \sum_a \int_{f(x(a))}^{f(m(a))} du\,\big[g(u) - x(a)\big],$$
where $g$ is the inverse function of $f$. (N. Murata, T. Takenouchi, T. Kanamori, S. Eguchi, Information Geometry of U-Boost and Bregman Divergence, Neural Computation 16, 1437-1481 (2004).) In the language of non-extensive statistical physics, $f$ is the deformed logarithm and $g$ the deformed exponential function. The Kullback-Leibler divergence is recovered by taking $F(u) = u\ln u - u$, which implies $g(u) = e^u$ and $f(u) = \ln u$.

**Deformed exponential families.** A parametrized exponential family is of the form
$$m_\theta(a) = c(a)\exp\big(-\alpha(\theta) - \theta^k H_k(a)\big) \qquad \text{(physicists' notation)}.$$
This implies a logarithmic map of the form
$$Lm_\theta(a) = \ln\frac{m_\theta(a)}{c(a)} = -\alpha(\theta) - \theta^k H_k(a).$$
It is obvious to generalize this definition by replacing the exponential function by a deformed exponential function. (J. Naudts, J. Ineq. Pure Appl. Math. 5, 102 (2004); S. Eguchi, Sugaku Expositions (Amer. Math. Soc.) 19, 197-216 (2006); P.D. Grünwald and A.Ph. Dawid, Ann. Statist. 32, 1367-1433 (2004).)

Can we give a definition of a deformed exponential family which relies only on the divergence, which does not involve canonical coordinates, and which has a geometric interpretation? Lafferty (1999, additive models): $m_\theta$ minimizes $d(m\|m_0) + \theta^k E_m H_k$; this is a constrained maximum entropy principle. Our proposal: the Fisher information $I(x)$ is constant along the fibers of minimal divergence. This property is a minimum requirement for a distribution to be a (deformed) exponential family. It is satisfied by the deformed exponential families based on Bregman-type divergences.
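For discrete measures, the Bregman-type divergence above is a one-liner once $F$ and $f = F'$ are supplied. A small sketch; with $F(u) = u\ln u - u$ (so $f = \ln$) it reproduces the Kullback-Leibler divergence on normalized measures:

```python
import numpy as np

def bregman_divergence(x, m, F, f):
    """D(x||m) = sum_a [ F(x_a) - F(m_a) - (x_a - m_a) f(m_a) ]."""
    return np.sum(F(x) - F(m) - (x - m) * f(m))

F_kl = lambda u: u * np.log(u) - u
x = np.array([0.2, 0.3, 0.5]); m = np.array([0.25, 0.25, 0.5])
d = bregman_divergence(x, m, F_kl, np.log)  # equals sum x*log(x/m) when sums match
```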
**Csiszár-type divergences.** A Csiszár-type divergence is of the form
$$D(x\|m) = \sum_a m(a)\,F\Big(\frac{x(a)}{m(a)}\Big).$$
The choice $F(u) = u\ln u$ reproduces Kullback-Leibler.

**Example.** In the context of non-extensive statistical mechanics, both Csiszár- and Bregman-type divergences are being used. Fix the deformation parameter $q \ne 1$, $0 < q < 2$. Then (a numeric sketch follows the conclusions below):

Csiszár:
$$D_q(x\|m) = \frac{1}{q-1}\sum_a x(a)\Big[\Big(\frac{x(a)}{m(a)}\Big)^{q-1} - 1\Big],$$
Bregman:
$$D_q(x\|m) = \frac{1}{q-1}\sum_a x(a)\big[m(a)^{1-q} - x(a)^{1-q}\big] + \sum_a \big[m(a) - x(a)\big]\,m(a)^{1-q}.$$

Introduce the q-deformed exponential function
$$\exp_q(u) = \big[1 + (1-q)u\big]_+^{1/(1-q)}.$$
A distribution of the form $m_\theta(a) = \exp_q\big(-\alpha(\theta) - \theta^k H_k(a)\big)$ is a deformed exponential family relative to the Bregman-type divergence, but not relative to the Csiszár-type divergence. In the latter case the extended Fisher information is given by
$$I_{k,l}(x) = z(x)\,\frac{\partial^2\alpha}{\partial\theta^k\partial\theta^l}, \qquad \text{with } \frac{\partial\alpha}{\partial\theta^k} = -\frac{1}{z(\theta)}\sum_a x(a)^q H_k(a) \text{ and } z(x) = \sum_a x(a)^q.$$
If $q = 1$ then $z(x) = 1$ and the extended Fisher information is constant along $F_\theta$; if $q \ne 1$ it is generically not constant along $F_\theta$.

**Conclusions.** We consider Fisher information not only on the statistical manifold of model states but also for empirical measures. If the model is an exponential family, then the Fisher information is constant along fibers of minimal divergence. We extend the notion of an exponential family to an abstract setting of information theory; in this abstract setting, the definition of a generalized exponential family depends only on the choice of the divergence.
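As announced above, a hedged numeric sketch of the q-deformed quantities (the deformation values and test distributions are arbitrary); both q-divergences tend to the Kullback-Leibler divergence as $q \to 1$:

```python
import numpy as np

def exp_q(u, q):
    """q-deformed exponential [1 + (1-q)u]_+^{1/(1-q)}; plain exp as q -> 1."""
    if np.isclose(q, 1.0):
        return np.exp(u)
    return np.maximum(1 + (1 - q) * u, 0.0) ** (1.0 / (1 - q))

def dq_csiszar(x, m, q):
    return np.sum(x * ((x / m) ** (q - 1) - 1)) / (q - 1)

def dq_bregman(x, m, q):
    return (np.sum(x * (m ** (1 - q) - x ** (1 - q))) / (q - 1)
            + np.sum((m - x) * m ** (1 - q)))

x = np.array([0.2, 0.3, 0.5]); m = np.array([0.25, 0.25, 0.5])
vals = [(dq_csiszar(x, m, q), dq_bregman(x, m, q)) for q in (0.5, 0.99, 1.5)]
```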

 Download (downloading implies acceptance of our terms of use)

Variational Problem in Euclidean Space With Density
Lakehal BELARBI¹ and Mohamed BELKHELFA²
¹Département de Mathématiques, Université de Mostaganem, B.P. 227, 27000 Mostaganem, Algeria.
²Laboratoire de Physique Quantique de la Matière et Modélisations Mathématiques (LPQ3M), Université de Mascara, B.P. 305, 29000 Route de Mamounia, Mascara, Algeria.
GEOMETRIC SCIENCE OF INFORMATION, Paris, École des Mines, 28-30 August 2013

Outline:
1. Introduction: what is a manifold with density; examples of manifolds with density.
2. Preliminaries.
3. Plateau's problem in R³ with density: theorem and motivation.
4. The divergence operator in manifolds with density.

**What is a manifold with density.** A manifold with density is a Riemannian manifold $M^n$ with a positive density function $e^\varphi$ used to weight volume and hyperarea (and sometimes lower-dimensional area and length). In terms of the underlying Riemannian volume $dV_0$ and area $dA_0$, the new weighted volume and area are given by
$$dV = e^\varphi\,dV_0, \qquad dA = e^\varphi\,dA_0.$$

**Examples of manifolds with density.** One of the first examples of a manifold with density appeared in the realm of probability and statistics: Euclidean space with the Gaussian density $e^{-\pi|x|^2}$ (see [13] for a detailed exposition in the context of isoperimetric problems).
For reasons coming from the study of diffusion processes, Bakry and Émery ([1]) defined a generalization of the Ricci tensor of a Riemannian manifold $M^n$ with density $e^\varphi$ (the $\infty$-Bakry-Émery-Ricci tensor):
$$\mathrm{Ric}_\varphi^\infty = \mathrm{Ric} - \mathrm{Hess}\,\varphi, \tag{1}$$
where $\mathrm{Ric}$ denotes the Ricci curvature of $M^n$ and $\mathrm{Hess}\,\varphi$ the Hessian of $\varphi$.

By Perelman ([11], 1.3, p.6), in a Riemannian manifold $M^n$ with density $e^\varphi$, in order for the Lichnerowicz formula to hold, the corresponding $\varphi$-scalar curvature is given by
$$S_\varphi^\infty = S - 2\Delta\varphi - |\nabla\varphi|^2, \tag{2}$$
where $S$ denotes the scalar curvature of $M^n$. Note that this is different from the trace of $\mathrm{Ric}_\varphi^\infty$, which is $S - \Delta\varphi$.

Following Gromov ([6], p.213), the natural generalization of the mean curvature of hypersurfaces in a manifold with density $e^\varphi$ is
$$H_\varphi = H - \frac{1}{n-1}\,\frac{d\varphi}{dN}, \tag{3}$$
where $H$ is the Riemannian mean curvature and $N$ the unit normal vector field of the hypersurface.

For a 2-dimensional smooth manifold with density $e^\varphi$, Corwin et al. ([5], p.6) define a generalized Gauss curvature
$$G_\varphi = G - \Delta\varphi \tag{4}$$
and obtain a generalization of the Gauss-Bonnet formula for a smooth disc $D$:
$$\int_D G_\varphi + \int_{\partial D} \kappa_\varphi = 2\pi, \tag{5}$$
where $\kappa_\varphi$ is the inward one-dimensional generalized mean curvature as in (3), and the integrals are taken with respect to unweighted Riemannian area and arclength ([9], p.181).

Bayle ([2]) derived the first and second variation formulae for the weighted volume functional (see also [9], [10], [13]). From the first variation formula, it can be shown that an immersed submanifold $N^{n-1}$ of $M^n$ is minimal if and only if the generalized mean curvature vanishes ($H_\varphi = 0$). Doan The Hieu and Nguyen Minh Hoang ([8]) classified ruled minimal surfaces in $\mathbb{R}^3$ with density $\Psi = e^z$.

In ([4]) we previously wrote the equation of minimal surfaces in $\mathbb{R}^3$ with linear density $\Psi = e^\varphi$ (in the cases $\varphi(x,y,z) = x$, $\varphi(x,y,z) = y$ and $\varphi(x,y,z) = z$), and we gave some solutions of the equation of minimal graphs in $\mathbb{R}^3$ with linear density $\Psi = e^\varphi$.
In ([3]) we gave a description of ruled minimal surfaces by geodesic straight lines in the Heisenberg space $\mathbb{H}_3$ with linear density $\Psi = e^\varphi = e^{\alpha x + \beta y + \gamma z}$, where $(\alpha,\beta,\gamma) \in \mathbb{R}^3 - \{(0,0,0)\}$ (in particular $\varphi(x,y,z) = \alpha x$ and $\varphi(x,y,z) = \beta y$), and we gave the $\infty$-Bakry-Émery Ricci curvature tensor and the $\varphi$-scalar curvature of the Heisenberg space $\mathbb{H}_3$ with radial density $e^{-a\rho^2 + c}$, where $\rho^2 = x^2 + y^2 + z^2$.

**Preliminaries.** In this section we introduce notation, definitions, and preliminary facts used throughout. We deal with two-dimensional surfaces in Euclidean 3-space, given parametrically by $X : U \subseteq \mathbb{R}^2 \to \mathbb{R}^3$. We denote the parameters by $u$ and $v$, and the partial derivatives with respect to them by the corresponding subscripts. The normal vector to the surface at a given point is
$$N = \frac{X_u \wedge X_v}{\|X_u \wedge X_v\|}.$$
The first fundamental form of the surface is the metric induced on the tangent space at each point. The $(u,v)$ coordinates define a basis $(X_u, X_v)$ of the tangent space, in which the matrix of the first fundamental form is
$$\begin{pmatrix} E & F \\ F & G \end{pmatrix}, \qquad E = X_u\cdot X_u,\quad F = X_u\cdot X_v,\quad G = X_v\cdot X_v.$$
In this basis, the second fundamental form of the surface is given by the matrix
$$\begin{pmatrix} L & M \\ M & N \end{pmatrix}, \qquad L = -X_u\cdot N_u,\quad M = -X_u\cdot N_v,\quad N = -X_v\cdot N_v.$$

**Definition** ([12]). The area $A_X(R)$ of the part $X(R)$ of a surface patch $X : U \subseteq \mathbb{R}^2 \to \mathbb{R}^3$ corresponding to a region $R \subseteq U$ is
$$A_X(R) = \int_R \|X_u \wedge X_v\|\,du\,dv, \qquad \|X_u \wedge X_v\| = (EG - F^2)^{\frac12}.$$

We shall now study a family of surfaces $S_t$, parameterized by $X_t : U \to \mathbb{R}^3$, in $\mathbb{R}^3$ with density $e^{\varphi_t}$, where $U$ is a subset of $\mathbb{R}^2$ independent of $t$ and $t$ lies in some open interval $(-\epsilon, \epsilon)$ for some $\epsilon > 0$. Let $S = S_0$ and $e^{\varphi_0} = e^\varphi$. The family is required to be smooth, in the sense that the map $(u,v,t) \mapsto X_t(u,v)$ from the open subset $\{(u,v,t) \mid (u,v) \in U,\ t \in (-\epsilon,\epsilon)\}$ of $\mathbb{R}^3$ to $\mathbb{R}^3$ is smooth.
The surface variation of the family is the function $\eta : U \to \mathbb{R}^3$ given by $\eta = \partial X_t/\partial t\big|_{t=0}$. Let $\gamma$ be a simple closed curve contained, along with its interior $\mathrm{int}(\gamma)$, in $U$. Then $\gamma$ corresponds to a closed curve $\gamma_t = X_t \circ \gamma$ in the surface $S_t$, and we define the $\varphi_t$-area function $A_{\varphi_t}(t)$ to be the area of the surface $S_t$ inside $\gamma_t$ in $\mathbb{R}^3$ with density $e^{\varphi_t}$:
$$A_{\varphi_t}(t) = \int_{\mathrm{int}(\gamma)} e^{\varphi_t}\,dA_{X_t}.$$

**Theorem.** With the above notation, assume that the surface variation $\eta$ vanishes along the boundary curve $\gamma$. Then
$$\frac{\partial A_{\varphi_t}(t)}{\partial t}\bigg|_{t=0} = A'_\varphi(0) = -2\int_{\mathrm{int}(\gamma)} H_\varphi\,(\eta\cdot N)\,e^{\varphi}\,(EG - F^2)^{\frac12}\,du\,dv, \tag{6}$$
where $H_\varphi = H - \frac12\nabla\varphi\cdot N$ is the $\varphi$-mean curvature of $S$ in $\mathbb{R}^3$ with density $e^\varphi$, $E$, $F$ and $G$ are the coefficients of its first fundamental form, and $N$ is the standard unit normal of $S$.

**Motivation.** If $S$ has the smallest $\varphi$-area among all surfaces in $\mathbb{R}^3$ with density $e^\varphi$ with the given boundary curve $\gamma$, then $A_\varphi$ must have an absolute minimum at $t = 0$, so $A'_\varphi(0) = 0$ for all smooth families of surfaces as above. This means that the integral in Eq. (6) must vanish for all smooth functions $\zeta = \eta\cdot N : U \to \mathbb{R}$, which can happen only if the term that multiplies $\zeta$ in the integrand vanishes, in other words only if $H_\varphi = 0$. This suggests the following definition.

**Definition.** A minimal surface in $\mathbb{R}^3$ with density $e^\varphi$ is a surface whose $\varphi$-mean curvature is zero everywhere.

**Proposition.** The minimal equation of the surface $S : z = f(x,y)$ in $\mathbb{R}^3$ with linear density $e^x$, given by the parametrization $X : (x,y) \mapsto (x, y, f(x,y))$, $(x,y)\in\mathbb{R}^2$, is
$$\Big(1 + \Big(\frac{\partial f}{\partial x}\Big)^2\Big)\Big(\frac{\partial^2 f}{\partial y^2} + \frac{\partial f}{\partial x}\Big) + \Big(1 + \Big(\frac{\partial f}{\partial y}\Big)^2\Big)\Big(\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial x}\Big) - 2\,\frac{\partial f}{\partial x}\,\frac{\partial f}{\partial y}\,\frac{\partial^2 f}{\partial x\partial y} - \frac{\partial f}{\partial x} = 0.$$

**Example.** The surface $S$ in $\mathbb{R}^3$ with linear density $e^x$ defined by the parametrization
$$X : (x,y) \mapsto \Big(x,\ y,\ -\frac{a^2}{\sqrt{1+a^2}}\arcsin\big(\beta\, e^{-\frac{1+a^2}{a^2}x}\big) + ay + b + \gamma\Big),$$
where $(x,y)\in\mathbb{R}^2$ and $a, b, \beta \in \mathbb{R}^*$, is minimal.
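The proposition can be re-derived symbolically: compute $2W^3 H_\varphi$ for the graph with $\varphi = x$ and compare it with the stated equation. A hedged SymPy sketch, under our own sign convention for the normal $N = (-f_x, -f_y, 1)/W$:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.Function("f")(x, y)
fx, fy = sp.diff(f, x), sp.diff(f, y)
fxx, fxy, fyy = sp.diff(f, x, 2), sp.diff(f, x, y), sp.diff(f, y, 2)

W2 = 1 + fx**2 + fy**2                       # W^2 = EG - F^2 for a graph
two_W3_H = (1 + fx**2) * fyy - 2 * fx * fy * fxy + (1 + fy**2) * fxx
# H_phi = H - (1/2) grad(phi).N with phi = x, so grad(phi).N = -fx / W
two_W3_Hphi = two_W3_H + fx * W2

stated = ((1 + fx**2) * (fyy + fx) + (1 + fy**2) * (fxx + fx)
          - 2 * fx * fy * fxy - fx)
assert sp.expand(two_W3_Hphi - stated) == 0  # the two forms agree
```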
**The divergence operator in manifolds with density.** Let $(M^n, g)$ be a Riemannian manifold equipped with the Riemannian metric $g$. For any smooth function $f$ on $M$, the gradient $\nabla f$ is a vector field on $M$ which in local coordinates $x^1, x^2, \dots, x^n$ has the form $(\nabla f)^i = g^{ij}\,\partial f/\partial x^j$, where summation is assumed over repeated indices. For any smooth vector field $F$ on $M$, the divergence $\mathrm{div}\,F$ is the scalar function given in local coordinates by
$$\mathrm{div}\,F = \frac{1}{\sqrt{\det g_{ij}}}\,\frac{\partial}{\partial x^i}\Big(\sqrt{\det g_{ij}}\,F^i\Big).$$
Let $\nu$ be the Riemannian volume on $M$, that is, $\nu = \sqrt{\det g_{ij}}\,dx^1\cdots dx^n$. By the divergence theorem, for any smooth function $f$ and smooth vector field $F$ such that either $f$ or $F$ has compact support,
$$\int_M f\,\mathrm{div}\,F\,d\nu = -\int_M \langle \nabla f, F\rangle\,d\nu, \tag{7}$$
where $\langle\cdot,\cdot\rangle = g(\cdot,\cdot)$. In particular, if $F = \nabla\psi$ for a function $\psi$, then we obtain
$$\int_M f\,\mathrm{div}\,\nabla\psi\,d\nu = -\int_M \langle\nabla f, \nabla\psi\rangle\,d\nu, \tag{8}$$
provided one of the functions $f$, $\psi$ has compact support. The operator $\Delta := \mathrm{div}\circ\nabla$ is called the Laplace (or Laplace-Beltrami) operator of the Riemannian manifold $M$. From (8) we obtain the Green formulas
$$\int_M f\,\Delta\psi\,d\nu = -\int_M \langle\nabla f, \nabla\psi\rangle\,d\nu = \int_M \psi\,\Delta f\,d\nu. \tag{9}$$

Let now $\mu$ be another measure on $M$ defined by $d\mu = e^\varphi\,d\nu$, where $\varphi$ is a smooth function on $M$. A triple $(M^n, g, \mu)$ is called a weighted manifold, or manifold with density. The associated divergence $\mathrm{div}_\mu$ is defined by
$$\mathrm{div}_\mu F = \frac{1}{e^\varphi\sqrt{\det g_{ij}}}\,\frac{\partial}{\partial x^i}\Big(e^\varphi\sqrt{\det g_{ij}}\,F^i\Big),$$
and the Laplace-Beltrami operator $\Delta_\mu$ of $(M^n,g,\mu)$ by
$$\Delta_\mu := \mathrm{div}_\mu\circ\nabla = \frac{1}{e^\varphi}\,\mathrm{div}\big(e^\varphi\,\nabla\,\cdot\,\big) = \Delta + \nabla\varphi\cdot\nabla. \tag{10}$$
It is easy to see that the Green formulas hold with respect to the measure $\mu$, that is,
$$\int_M f\,\Delta_\mu\psi\,d\mu = -\int_M \langle\nabla f, \nabla\psi\rangle\,d\mu = \int_M \psi\,\Delta_\mu f\,d\mu, \tag{11}$$
provided $f$ or $\psi$ belongs to $C_0^\infty(M)$.

**Theorem.** Let $S$ be a surface in $M^3$ with density $\Psi = e^\varphi$. Then
$$\mathrm{div}_\varphi N = -2H_\varphi, \tag{12}$$
where $H_\varphi$ is the $\varphi$-mean curvature of the surface $S$ and $N$ is its unit normal vector field.

**Proof.** By definition,
$$\mathrm{div}_\varphi N = \frac{1}{e^\varphi}\,\mathrm{div}(e^\varphi N) = \mathrm{div}\,N + N\cdot\nabla\varphi = \sum_{i=1}^{2}\langle\nabla_{e_i} N, e_i\rangle + N\cdot\nabla\varphi = \sum_{i=1}^{2}\big(\nabla_{e_i}\langle e_i, N\rangle - \langle\nabla_{e_i} e_i, N\rangle\big) + N\cdot\nabla\varphi = -2\langle \vec H, N\rangle + N\cdot\nabla\varphi = -2\Big(H - \frac12\nabla\varphi\cdot N\Big) = -2H_\varphi,$$
where we used $\langle e_i, N\rangle = 0$ and the definition of the mean curvature vector.
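Equation (10) can be verified symbolically in Euclidean coordinates, where $\det g_{ij} = 1$. A small SymPy sketch:

```python
import sympy as sp

X = sp.symbols("x1 x2 x3")
f, phi = sp.Function("f")(*X), sp.Function("phi")(*X)

grad = lambda h: [sp.diff(h, v) for v in X]
div = lambda F: sum(sp.diff(Fi, v) for Fi, v in zip(F, X))

# weighted Laplacian: e^{-phi} div( e^{phi} grad f )
lap_mu = sp.exp(-phi) * div([sp.exp(phi) * g for g in grad(f)])
rhs = div(grad(f)) + sum(gp * gf for gp, gf in zip(grad(phi), grad(f)))
assert sp.simplify(lap_mu - rhs) == 0   # Delta_mu = Delta + <grad phi, grad .>
```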
**References**

[1] D. Bakry, M. Émery: Diffusions hypercontractives. Séminaire de Probabilités XIX (1983/1984), LNM 1123 (1985), 177-206.
[2] V. Bayle: Propriétés de concavité du profil isopérimétrique et applications. Graduate thesis, Institut Fourier, Univ. Joseph-Fourier, Grenoble I (2004).
[3] L. Belarbi, M. Belkhelfa: Heisenberg space with density (submitted).
[4] L. Belarbi, M. Belkhelfa: Surfaces in R³ with density. i-manager's Journal on Mathematics 1(1) (2012), 34-48.
[5] I. Corwin, N. Hoffman, S. Hurder, V. Sesum, Y. Xu: Differential geometry of manifolds with density. Rose-Hulman Und. Math. J. 7(1) (2006).
[6] M. Gromov: Isoperimetry of waists and concentration of maps. Geom. Func. Anal. 13 (2003), 178-215.
[7] J. Lott, C. Villani: Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. 169(3) (2009), 903-991.
[8] N. Minh, D.T. Hieu: Ruled minimal surfaces in R³ with density eᶻ. Pacific J. Math. 243(2) (2009), 277-285.
[9] F. Morgan: Geometric Measure Theory: A Beginner's Guide, fourth edition. Academic Press (2009).
[10] F. Morgan: Manifolds with density. Notices Amer. Math. Soc. 52 (2005), 853-858.
[11] G.Ya. Perelman: The entropy formula for the Ricci flow and its geometric applications. Preprint, http://www.arxiv.org/abs/math.DG/0211159 (2002).
[12] A. Pressley: Elementary Differential Geometry, second edition. Springer (2010).
[13] C. Rosales, A. Cañete, V. Bayle, F. Morgan: On the isoperimetric problem in Euclidean space with density. Calc. Var. Partial Differential Equations 31(1) (2008), 27-46.

Thank you for your attention.

### ORAL SESSION 7 Hessian Information Geometry I (Michel Nguiffo Boyom)

 Download (downloading implies acceptance of our terms of use)

Complexification of Information Geometry in view of quantum estimation theory

**Introduction.** As H. Shima pointed out in his book:
- If $M$ is a manifold with an affine structure (a flat connection) and affine coordinates $(\theta^i)$, then the tangent bundle $TM$ carries a complex structure.
- If $M$ is a manifold with a Hessian structure $((\theta^i), g, \psi)$, $g_{ij}(\theta) = \partial_i\partial_j\psi(\theta)$ ($\partial_i = \partial/\partial\theta^i$), and hence a dually flat structure, then $TM$ carries a Kähler structure with Kähler potential $\psi(\theta)$.

A similar situation will appear in the context of quantum estimation theory, where $M$ will be replaced by a (classical and quantum) exponential family and $TM$ by the complex projective space (the set of quantum pure states).

**Classical exponential families.** Let $X$ be a finite set,
$$P = P(X) := \Big\{p \,\Big|\, p : X \to (0,1),\ \sum_{x\in X} p(x) = 1\Big\},$$
and $M = \{p_\theta \mid \theta \in \Theta \subset \mathbb{R}^n\} \subset P$, where
$$p_\theta(x) = p_0(x)\exp\Big[\sum_{i=1}^n \theta^i f_i(x) - \psi(\theta)\Big], \qquad \psi(\theta) := \log\sum_{x\in X} p_0(x)\exp\Big[\sum_{i=1}^n \theta^i f_i(x)\Big].$$
We assume $\{1, f_1, \dots, f_n\}$ are linearly independent, which implies $\theta \mapsto p_\theta$ is injective.
**Geometrical structure of an exponential family.**
- Fisher information metric: $g_{ij} = E_\theta[\partial_i\log p_\theta\,\partial_j\log p_\theta] = \partial_i\partial_j\psi(\theta)$, which yields the Cramér-Rao inequality $V(\text{estimator}) \ge [g_{ij}]^{-1}$.
- e- and m-connections: the affine coordinates $\theta^i$ define the flat connection $\nabla^{(e)}$, and the expectation coordinates $\eta_i := E_\theta[f_i]$ define the flat connection $\nabla^{(m)}$.
- Duality: $Xg(Y,Z) = g(\nabla_X^{(e)}Y, Z) + g(Y, \nabla_X^{(m)}Z)$, so $(M, g, \nabla^{(e)}, \nabla^{(m)})$ is dually flat.
- $\hat\eta := (f_1,\dots,f_n)$ is an estimator achieving the Cramér-Rao bound (an efficient estimator).
- $P$ itself is an exponential family.
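For a finite sample space these objects are directly computable. A hedged numeric sketch: `psi` is the log-normalizer, and the Fisher metric is obtained as its Hessian by central finite differences (the discretization choices are ours):

```python
import numpy as np

def psi(theta, p0, f):
    """log-normalizer; f has shape (n, |X|), p0 has shape (|X|,)."""
    return np.log(np.sum(p0 * np.exp(f.T @ theta)))

def p_theta(theta, p0, f):
    w = p0 * np.exp(f.T @ theta)
    return w / w.sum()

def eta(theta, p0, f):
    return p_theta(theta, p0, f) @ f.T           # eta_i = E_theta[f_i]

def fisher_metric(theta, p0, f, h=1e-5):
    n = len(theta)
    g, E = np.zeros((n, n)), np.eye(n) * h
    for i in range(n):
        for j in range(n):
            g[i, j] = (psi(theta + E[i] + E[j], p0, f)
                       - psi(theta + E[i] - E[j], p0, f)
                       - psi(theta - E[i] + E[j], p0, f)
                       + psi(theta - E[i] - E[j], p0, f)) / (4 * h * h)
    return g                                     # g_ij = d_i d_j psi(theta)
```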
**Quantum state space.** Let $\mathcal H \cong \mathbb{C}^d$ be a Hilbert space with an inner product $\langle\cdot|\cdot\rangle$, and define
$$\mathcal L(\mathcal H) := \{A \mid A : \mathcal H \to \mathcal H \text{ linear}\} \quad \text{(linear operators)},$$
$$\mathcal L_h(\mathcal H) := \{A \in \mathcal L(\mathcal H) \mid A = A^*\} \quad \text{(Hermitian operators)},$$
$$\bar{\mathcal S} := \{\rho \in \mathcal L_h(\mathcal H) \mid \rho \ge 0,\ \mathrm{Tr}[\rho] = 1\} \quad \text{(quantum states)} = \bigcup_{r=1}^{d} \mathcal S_r, \qquad \mathcal S_r := \{\rho \in \bar{\mathcal S} \mid \mathrm{rank}\,\rho = r\}.$$
We mainly treat $\mathcal S_1$ and $\mathcal S_d$ in the sequel.

**SLD Fisher metric.** Given a manifold $M = \{\rho_\theta \mid \theta = (\theta^i) \in \Theta\} \subset \bar{\mathcal S}$, let $L_{\theta,i} \in \mathcal L_h(\mathcal H)$ be such that
$$\frac{\partial}{\partial\theta^i}\rho_\theta = \frac12\big(\rho_\theta L_{\theta,i} + L_{\theta,i}\rho_\theta\big)$$
(the symmetric logarithmic derivatives, or SLDs, of $M$), and set $g_{ij} := \mathrm{Re}\,\mathrm{Tr}[\rho_\theta L_{\theta,i} L_{\theta,j}]$. Then $g = [g_{ij}]$ defines a Riemannian metric on $M$; in particular, every $\mathcal S_r$ becomes a Riemannian space with $g$. The metric $g$ is a quantum version of the classical Fisher metric and is called the SLD metric. It gives a quantum version of the Cramér-Rao inequality, $V(\text{estimator}) \ge [g_{ij}]^{-1}$ (Helstrom, 1967), and it is the minimum monotone metric (Petz, 1996). How about the e-, m-connections and the dualistic structure?
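The SLD equation is a continuous Lyapunov equation in $L_{\theta,i}$, so for full-rank states it can be solved directly. A hedged sketch assuming SciPy:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sld(rho, drho):
    """L solving rho L + L rho = 2 drho (rho Hermitian, full rank)."""
    return solve_continuous_lyapunov(rho, 2 * drho)

def sld_metric(rho, drhos):
    """g_ij = Re Tr[rho L_i L_j] from the partial derivatives drhos of rho."""
    Ls = [sld(rho, d) for d in drhos]
    n = len(Ls)
    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            g[i, j] = np.real(np.trace(rho @ Ls[i] @ Ls[j]))
    return g
```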
**r = d: faithful states.**
- $\mathcal S_d = \{\rho \in \bar{\mathcal S} \mid \rho > 0\}$, the faithful states.
- Since $\mathcal S_d$ is an open subset of the affine space $\{A \mid A = A^*,\ \mathrm{Tr}A = 1\}$, the m-connection $\nabla^{(m)}$ on $\mathcal S_d$ is defined as the natural flat connection.
- The e-connection $\nabla^{(e)}$ is defined as the dual of $\nabla^{(m)}$ w.r.t. $g$: $Xg(Y,Z) = g(\nabla_X^{(e)}Y, Z) + g(Y, \nabla_X^{(m)}Z)$.
- $R^{(e)} = 0$ (curvature) but $T^{(e)} \ne 0$ (torsion), so $(\mathcal S_d, g, \nabla^{(e)}, \nabla^{(m)})$ is not dually flat.

**r = 1: pure states.**
- $\mathcal S_1 = \{|\xi\rangle\langle\xi| \mid \xi \in \mathcal H,\ \|\xi\| = 1\}$, the pure states.
- $\mathcal S_1 \cong P(\mathcal H) := (\mathcal H \setminus \{0\})/\!\sim$ (complex projective space), where $\xi_1 \sim \xi_2 \iff \exists c \in \mathbb{C},\ \xi_1 = c\,\xi_2$.
- The SLD metric $g$ on $\mathcal S_1$ coincides with the well-known Fubini-Study metric on $P(\mathcal H)$ (up to a constant).
$S_1 \cong P(H)$ as a complex manifold

• An almost complex structure, i.e. a $(1,1)$-tensor field $J$ satisfying $J^2 = -1$, is canonically defined by $J\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial}{\partial y^j}$, $J\!\left(\frac{\partial}{\partial y^j}\right) = -\frac{\partial}{\partial x^j}$ for an arbitrary holomorphic (complex analytic) coordinate system $(z^j) = (x^j + \sqrt{-1}\,y^j)$.
• $g(JX, JY) = g(X, Y)$.
• A differential 2-form $\omega$ is defined by $\omega(X,Y) = g(X, JY)$.
• $g$ (or $(J, g, \omega)$) is a Kähler metric in the sense that $\omega$ is a symplectic form: $d\omega = 0$, or equivalently that there is a function $f$, called a Kähler potential, satisfying $\omega = \frac{\sqrt{-1}}{2}\partial\bar\partial f$.

Kähler potential

Let $a_{jk} = g\!\left(\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^k}\right) = g\!\left(\frac{\partial}{\partial y^j}, \frac{\partial}{\partial y^k}\right)$ and $b_{jk} = g\!\left(\frac{\partial}{\partial y^j}, \frac{\partial}{\partial x^k}\right) = -g\!\left(\frac{\partial}{\partial x^j}, \frac{\partial}{\partial y^k}\right)$. Then $f$ is a Kähler potential iff
$$a_{jk} = \frac{1}{4}\left(\frac{\partial^2 f}{\partial x^j\partial x^k} + \frac{\partial^2 f}{\partial y^j\partial y^k}\right) \quad\text{and}\quad b_{jk} = \frac{1}{4}\left(\frac{\partial^2 f}{\partial x^j\partial y^k} - \frac{\partial^2 f}{\partial y^j\partial x^k}\right).$$

Quasi-Classical Exponential Family (QCEF)

$M = \{\rho_\theta \mid \theta \in \mathbb{R}^n\} \subset \bar{S}$ is called a quasi-classical exponential family when it is represented as
$$\rho_\theta = \exp\Big[\tfrac{1}{2}\Big(\sum_i \theta^i F_i - \psi(\theta)\Big)\Big]\,\rho_0\,\exp\Big[\tfrac{1}{2}\Big(\sum_i \theta^i F_i - \psi(\theta)\Big)\Big],$$
where $\{F_1,\dots,F_n\} \subset L_h(H)$ with $[F_i, F_j] := F_iF_j - F_jF_i = 0$ (commutative), $\{\rho_0, F_1\rho_0, \dots, F_n\rho_0\}$ are linearly independent, and $\psi(\theta) = \log \mathrm{Tr}\big[\rho_0 \exp\big(\sum_j \theta^j F_j\big)\big]$.

Properties of QCEFs

• The e-, m-connections are the flat connections defined by the affine coordinates: $\theta^i \to \nabla^{(e)}$ and $\eta_i := \mathrm{Tr}[\rho_\theta F_i] \to \nabla^{(m)}$.
• $(M, g, \nabla^{(e)}, \nabla^{(m)})$ is dually flat, where $g$ is the SLD metric.
• Suppose $M \subset S_d$. Then $M$ is e-autoparallel in $S_d$, and $(g, \nabla^{(e)}, \nabla^{(m)})$ on $M$ is induced from $(S_d, g, \nabla^{(e)}, \nabla^{(m)})$.
• $(F_1, \dots, F_n)$ is an estimator for the coordinates $(\eta_1,\dots,\eta_n)$ achieving the SLD Cramér-Rao bound.
• Since the $F_i$ are commutative, there exist an orthonormal basis $\{|x\rangle\}_{x\in X}$ of common eigenvectors, with $X = \{1,2,\dots,d = \dim H\}$, and eigenvalue functions $f_i : X \to \mathbb{R}$ $(i = 1,\dots,n)$ such that $F_i = \sum_{x\in X} f_i(x)\,|x\rangle\langle x|$. Then $p_\theta(x) := \langle x|\rho_\theta|x\rangle = p_0(x)\exp\big[\sum_i \theta^i f_i(x) - \psi(\theta)\big]$ (a classical exponential family), and $M = \{\rho_\theta\} \cong \{p_\theta\}$ w.r.t. $(g, \nabla^{(e)}, \nabla^{(m)})$.

Complexification of a pure state QCEF

Let $M = \{\rho_\theta\}$ be a quasi-classical exponential family (with the same assumptions on $\{F_i\}$ as before), and suppose that $M \subset S_1(H) \cong P(H)$. For $z = (z^1,\dots,z^n) \in \mathbb{C}^n$ with $z^i = \theta^i + \sqrt{-1}\,y^i$, $\theta^i, y^i$ real, let
$$\rho_z := \exp\Big[\tfrac{1}{2}\Big(\sum_i z^i F_i - \psi(\theta)\Big)\Big]\,\rho_0\,\exp\Big[\tfrac{1}{2}\Big(\sum_i \bar z^i F_i - \psi(\theta)\Big)\Big] = U_y\,\rho_\theta\,U_y^*,$$
where $U_y := \exp\big[\frac{\sqrt{-1}}{2}\sum_i y^i F_i\big]$ is unitary.
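The commutativity reduction above is easy to verify numerically: with the $F_i$ chosen diagonal (a hypothetical choice made here purely for illustration), the QCEF formula reproduces a classical exponential family on the diagonal. A minimal sketch assuming numpy and scipy:

```python
import numpy as np
from scipy.linalg import expm

d, n = 4, 2
rng = np.random.default_rng(0)
f = rng.normal(size=(n, d))                   # eigenvalues f_i(x): F_i = diag(f_i)
p0 = rng.random(d); p0 /= p0.sum()            # reference state rho0 = diag(p0)
rho0 = np.diag(p0)
theta = np.array([0.7, -0.3])

A = sum(t * np.diag(fi) for t, fi in zip(theta, f))
psi = np.log(np.trace(rho0 @ expm(A)).real)   # psi = log Tr[rho0 exp(sum theta^i F_i)]
E = expm(0.5 * (A - psi * np.eye(d)))
rho_theta = E @ rho0 @ E                      # QCEF: exp[...] rho0 exp[...]

assert np.isclose(np.trace(rho_theta).real, 1.0)
# diagonal entries reproduce the classical exponential family p_theta(x):
p = p0 * np.exp(f.T @ theta - psi)
assert np.allclose(np.diag(rho_theta).real, p)
```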
Complexification of a pure state QCEF (cont.)

• Letting $V$ be a neighbourhood of $\mathbb{R}^n$ in $\mathbb{C}^n$ on which $V \ni z \mapsto \rho_z$ is injective, define $\tilde M := \{\rho_z \mid z \in V\}$ $(\supset M = \{\rho_\theta \mid \theta \in \mathbb{R}^n\})$.
• $\tilde M$ is a complex (holomorphic) submanifold of $S_1$ with a holomorphic coordinate system $(z^i)$, and hence is Kähler w.r.t. $g_{\tilde M} = (\text{Fubini-Study})|_{\tilde M}$.
• When $n = d - 1$, $\tilde M$ is open in $S_1$.
• $4\psi(\theta)$ gives a Kähler potential on $\tilde M$: $\omega_{\tilde M} := \omega|_{\tilde M} = 2\sqrt{-1}\,\partial\bar\partial\psi$.
• $(\tilde M;\ \eta_i, y^i)$ forms a Darboux coordinate system: $\omega_{\tilde M} = \sum_{i=1}^n d\eta_i \wedge dy^i$.
• Letting $\nabla^{(m)}$ be the flat connection with affine coordinates $(\eta_i; y^i)$ and $\nabla^{(e)}$ be its dual w.r.t. $g_{\tilde M}$, we have $\nabla^{(e)}\circ J = J\circ\nabla^{(m)}$ and $\nabla^{(e)}\omega_{\tilde M} = \nabla^{(m)}\omega_{\tilde M} = 0$.
• This is similar to the case of Shima's observation on $M$ and $TM$.

[Figure: the embedding $z \mapsto \rho_z$ of a neighbourhood $V$ of $\mathbb{R}^n \subset \mathbb{C}^n$ into $S_1 \cong P(H)$, with $M \subset \tilde M$.]
Relation to parallel displacement

• Duality $\iff \forall X,Y,Z,\ Xg(Y,Z) = g(\nabla^{(e)}_X Y, Z) + g(Y, \nabla^{(m)}_X Z)$; equivalently, $g(X,Y) = g(X',Y')$ whenever $X \overset{e}{\to} X'$ and $Y \overset{m}{\to} Y'$ under parallel displacement along the same curve.
• $\nabla^{(e)}\circ J = J\circ\nabla^{(m)} \iff \forall X,Y,\ \nabla^{(e)}_X J(Y) = J(\nabla^{(m)}_X Y)$; equivalently, $X \overset{e}{\to} X'$ iff $J(X) \overset{m}{\to} J(X')$, and $X \overset{m}{\to} X'$ iff $J(X) \overset{e}{\to} J(X')$.
• $\nabla^{(e)}\omega = \nabla^{(m)}\omega = 0 \iff \forall X,Y,Z,$
$$X\omega(Y,Z) = \omega(\nabla^{(e)}_X Y, Z) + \omega(Y, \nabla^{(e)}_X Z) = \omega(\nabla^{(m)}_X Y, Z) + \omega(Y, \nabla^{(m)}_X Z);$$
equivalently, $\omega(X,Y) = \omega(X',Y')$ whenever both arguments are displaced by the same connection: $X \overset{e}{\to} X'$ and $Y \overset{e}{\to} Y'$, or $X \overset{m}{\to} X'$ and $Y \overset{m}{\to} Y'$.


Foliations on Affinely Flat Manifolds: Information Geometry

Robert Wolak (Jagiellonian University, Krakow, Poland), joint work with Michel Nguiffo Boyom (UMR CNRS 5149, 3M Département de Mathématiques et de Modélisation, Université Montpellier 2). GSI2013, MINES ParisTech, Paris, 28-30 August 2013.

Contents: 1. Algebraic preliminaries (Koszul-Vinberg algebra (KV-algebra); algebroid of Koszul-Vinberg; twisted KV-cochain complex; Chevalley-Eilenberg complex of the twisted module). 2. Foliations on locally flat manifolds. 3. Fisher information metric. 4. Dual pairs of connections. 5. Foliations on locally flat manifolds (cont.): Ehresmann connections, topological properties.

An algebra $A$ is an $\mathbb{R}$-vector space endowed with a bilinear map $\mu : A \times A \to A$, the multiplication map of $A$; for $a, b \in A$, $ab$ stands for $\mu(a,b)$. Given an algebra $A$, the Koszul-Vinberg anomaly (KV-anomaly) of $A$ is the trilinear map $KV : A^3 \to A$ defined by
$$KV(a,b,c) = (ab)c - a(bc) - (ba)c + b(ac)$$
(a numerical check appears at the end of this subsection).

Definition. An algebra $A$ is called a Koszul-Vinberg algebra (KV-algebra) if its KV-anomaly vanishes identically.

Definition. An algebroid of Koszul-Vinberg is a couple $(V, a)$, where $V$ is a vector bundle over the base manifold $M$ whose module of sections $\Gamma(V)$ has the structure of a Koszul-Vinberg algebra over $\mathbb{R}$, and $a$ is a homomorphism of the vector bundle $V$ into $TM$ satisfying: (i) $(fs)\cdot s' = f(ss')$ for all $s, s' \in \Gamma(V)$ and $f \in C^\infty(M,\mathbb{R})$; (ii) $s\cdot(fs') = (a(s)f)s' + f(ss')$.

Remark. (i) If we equip $\Gamma(V)$ with the bracket $[s,s'] = ss' - s's$, the Koszul-Vinberg algebroid $(V,a)$ becomes a Lie algebroid. (ii) The condition $[s, fs'] = (a(s)f)s' + f[s,s']$ ensures that $a$ is a homomorphism of Lie algebras, i.e. $a([s,s']) = [a(s), a(s')]$. The vector space spanned by the commutators $[a,b] = ab - ba$ of a KV-algebra $A$ is a Lie algebra denoted by $A_L$.

Let $A$ be a Koszul-Vinberg algebra; an $A$-module is a vector space $W$ equipped with a right action and a left action of $A$ related by the following equalities: for any $a, b \in A$ and any $w \in W$, $a(bw) - (ab)w = b(aw) - (ba)w$ and $a(wb) - (aw)b = w(ab) - (wa)b$. Let $A = (\mathfrak{X}(M), \cdot)$ be the algebra with multiplication $X \cdot Y = D_X Y$; then $A$ is a Koszul-Vinberg algebra, and the space $T(M)$ of tensors on $M$ is a two-sided $A$-module, bigraded by the subspaces $T^{p,q}(M)$ of tensors of type $(p,q)$.

Twisted KV-cochain complex. Let $A$ be a KV-algebra and $W$ a two-sided KV-module over $A$. Equip $W$ with the left module structure $A \times W \to W$ defined by $a * w = aw - wa$ for all $a \in A$, $w \in W$. One has $KV(a,b,w) = (a,b,w) - (b,a,w) = 0$, where $(a,b,w) = (ab)*w - a*(b*w)$.

Definition. The left KV-module structure so defined is called the twisted KV-module structure derived from the two-sided KV-module $W$; the vector space $W$ endowed with it is denoted $W_\tau$. The map $(a,w) \mapsto a*w$ defines on $W_\tau$ a left module structure over the Lie algebra $A_L$; the complex $C_{CE}(A_L, W_\tau)$ is called the Chevalley-Eilenberg complex of the twisted module.
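As a concrete check of the definitions above, the KV-anomaly of a finite-dimensional algebra given by structure constants can be computed directly; any associative algebra (here $\mathbb{C}$ viewed as an $\mathbb{R}$-algebra, an example chosen for illustration) has vanishing anomaly and is therefore a KV-algebra. A minimal sketch assuming numpy:

```python
import numpy as np

def kv_anomaly(mu):
    """KV(a,b,c) = (ab)c - a(bc) - (ba)c + b(ac) on all basis triples.
    mu[i, j, :] holds the coordinates of the product e_i e_j."""
    prod = lambda x, y: np.einsum('i,j,ijk->k', x, y, mu)
    n = mu.shape[0]
    e = np.eye(n)
    KV = np.zeros((n, n, n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a, b, c = e[i], e[j], e[k]
                KV[i, j, k] = (prod(prod(a, b), c) - prod(a, prod(b, c))
                               - prod(prod(b, a), c) + prod(b, prod(a, c)))
    return KV

# R^2 with the multiplication of complex numbers (basis {1, i}); associative,
# hence the anomaly vanishes and this is a KV-algebra:
mu = np.zeros((2, 2, 2))
mu[0, 0] = [1, 0]; mu[0, 1] = [0, 1]; mu[1, 0] = [0, 1]; mu[1, 1] = [-1, 0]
assert np.allclose(kv_anomaly(mu), 0)
```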
Now let $A$ be a KV-algebra and $W$ a two-sided KV-module over $A$. We consider the graded vector space $C_{KV}(A, W_\tau) = \bigoplus_{q\in\mathbb{Z}} C^q_{KV}(A, W_\tau)$, where $C^q_{KV}(A, W_\tau) = \{0\}$ if $q < 0$, $C^0_{KV}(A, W_\tau) = W_\tau$, and $C^q_{KV}(A,W_\tau) = \mathrm{Hom}_{\mathbb{R}}(\otimes^q A, W_\tau)$ for $q \ge 1$. When there is no risk of confusion, $C(A, W_\tau)$ stands for $C_{KV}(A, W_\tau)$.

Define the linear mapping $d : C^q(A,W_\tau) \to C^{q+1}(A,W_\tau)$: for all $w \in W_\tau$, $f \in C^q(A,W_\tau)$, $a \in A$ and $\zeta = a_1\otimes\cdots\otimes a_{q+1} \in \otimes^{q+1}A$,
$$(dw)(a) = -aw + wa, \qquad (df)(\zeta) = \sum_{i=1}^{q+1}(-1)^i\big\{a_i * (f(\partial_i\zeta)) - f(a_i.\partial_i\zeta)\big\},$$
where the action $a_i.\partial_i\zeta$ is defined by the standard tensor product extension.

Theorem. (i) The pair $(C(A,W_\tau), d)$ is a cochain complex whose $q$th cohomology space is denoted $H^q_{KV}(A,W_\tau)$. (ii) The graded space $C_N(A,W_\tau) = W \oplus \bigoplus_{q>0}\mathrm{Hom}(\wedge^q A, W_\tau)$ is a subcomplex of $(C(A,W_\tau), d)$ whose cohomology coincides with the cohomology of the Chevalley-Eilenberg complex $C_{CE}(A_L, W_\tau)$.

Let $(M,\nabla)$ be a locally flat manifold, $A_\nabla = (\mathfrak{X}(M), \nabla)$ the KV-algebra associated to $(M,\nabla)$, and $W_\tau = C^\infty(M)$ the left KV-module over $A_\nabla$ under the covariant derivative. Let $C_0(A_\nabla, W_\tau)$ be the vector subspace of $C(A_\nabla, W_\tau)$ formed by the cochains of order 0; thus $C_0(A_\nabla, W_\tau)$ consists of the $C^\infty(M)$-multilinear mappings.

Theorem. The second cohomology space $H^2_0(A_\nabla, W_\tau)$ can be decomposed as
$$H^2_0(A_\nabla, W_\tau) = H^2_{dR}(M) \oplus H^0(A_\nabla, \mathrm{Hom}(S^2A_\nabla, W_\tau)),$$
where $H^2_{dR}(M)$ is the second de Rham cohomology space of $M$.

$H(A_\nabla, W_\tau) = \bigoplus_{q\ge 0}H^q(A_\nabla, W_\tau)$ is a geometric invariant of $(M,\nabla)$; $b_q(\nabla) = \dim H^q_0(A_\nabla, C^\infty(M))$ is the $q$th Betti number of $(M,\nabla)$ and $b_q(M) = \dim H^q_{dR}(M,\mathbb{R})$ the classical $q$th Betti number of $M$; one has $b_q(M) \le b_q(\nabla)$.

M. Nguiffo Boyom, F. Ngakeu, P. M. Byande, R. Wolak, KV-cohomology and differential geometry of affinely flat manifolds. Information geometry, African Diaspora Journal of Mathematics, Special Volume in Honor of Prof. Augustin Banyaga, Vol. 14(2), pp. 197-226 (2012).

Definition. Let $(M,\nabla)$ be a locally flat manifold. (i) A totally geodesic foliation $F$ of $(M,\nabla)$ is called an affine foliation. (ii) A totally geodesic foliation $F$ of $M$ is transversally euclidean if its normal bundle $TM/TF$ is endowed with a $\nabla$-parallel (pseudo-)euclidean scalar product.

Let $Q(M) = \mathrm{Hom}_{C^\infty(M)}(S^2A, W_\tau)$ be the vector space of tensorial quadratic forms on (sections of) $TM$. For $\sigma \in H^0_{KV}(A, Q(M))$, let $\bar\sigma$ be the quadratic form on $TM/\ker\sigma$ deduced from $\sigma$ and let $\mathrm{sign}(\sigma)$ be the Morse index of $\bar\sigma$. We define the numerical invariants
$$\rho_\nabla(M) = \min\{\rho_\nabla(\sigma) = \dim\ker\sigma \mid \sigma \in H^0(A, Q(M))\}, \qquad S_\nabla(M) = \min\{S_\nabla(\sigma) = \dim\ker\sigma + \mathrm{sign}(\sigma) \mid \sigma \in H^0(A, Q(M))\}.$$

Definition. Let $(\Xi,\Omega)$ be a measurable set. A connected open subset $\Theta \subset \mathbb{R}^n$ is an $n$-dimensional statistical model for $(\Xi,\Omega)$ if there exists a real-valued positive function $p : \Theta\times\Xi \to \mathbb{R}$ subject to the following requirements: (i) for every fixed $\xi\in\Xi$ the function $\theta\mapsto p(\theta,\xi)$ is smooth; (ii) for every fixed $\theta\in\Theta$ the function $\xi\mapsto p(\theta,\xi)$ is a probability density in $(\Xi,\Omega)$, viz. $\int_\Xi p(\theta,\xi)\,d\xi = 1$; (iii) for every fixed $\xi\in\Xi$ there exists a couple $(\theta,\theta')$ such that $p(\theta,\xi) \neq p(\theta',\xi)$.

Let $\nabla$ be a torsion-free linear connection on the manifold $\Theta$ and set $\ln(\theta,\xi) = \log(p(\theta,\xi))$. At each point $\theta\in\Theta$ we define the family $\{q_{(\theta,\xi)}\}$ of bilinear forms. Let $(X,Y)$ be a couple of smooth vector fields on $\Theta$.
We put $q_{(\theta,\xi)}(X,Y) = -(\nabla d\ln)(X,Y)(\theta,\xi)$. Since $\nabla$ is torsion-free, $q_{(\theta,\xi)}(X,Y)$ is symmetric w.r.t. the couple $(X,Y)$.

Definition. The Fisher information $g$ of the local model $(\Theta, p)$ is the mathematical expectation of the bilinear form $q_{(\theta,\xi)}$:
$$g(X,Y)(\theta) = \int_\Xi p(\theta,\xi)\,q_{(\theta,\xi)}(X,Y)\,d\xi.$$
The Fisher information $g$ does not depend on the choice of the symmetric connection $\nabla$, and it is positive semi-definite; when $g$ is definite it is called the Fisher metric of the model $(\Theta,p)$.

The dualistic relation between linear connections. Definition. In a Riemannian manifold $(M,g)$, a couple $(\nabla,\nabla^*)$ of linear connections are dual of each other if the identity $Xg(Y,Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z)$ holds for all vector fields $X,Y,Z$ on the manifold $M$.

Consider a dual pair $(\nabla,\nabla^*)$ in a Riemannian manifold $(M,g)$ and assume that both $(M,\nabla)$ and $(M,\nabla^*)$ are locally flat structures. They define the pair $([\rho_\nabla],[\rho_{\nabla^*}])$ of conjugation classes of canonical representations, and we have the following two properties.

Theorem. The pair $([\rho_\nabla],[\rho_{\nabla^*}])$ does not depend on the choice of the Riemannian structure $g$.

Theorem. Every locally flat manifold $(M,\nabla)$ whose 2-dimensional twisted cohomology $H^2_0(A_\nabla, W_\tau)$ differs from the de Rham cohomology space $H^2_{dR}(M)$ is either a flat (pseudo-)Riemannian manifold or is foliated by a pair $(F, F^*)$ of $g$-orthogonal foliations for every Riemannian metric $g$. Moreover, these foliations are totally geodesic w.r.t. the $g$-dual pair $(D, D^*)$ (respectively).

With $TM = TF \oplus TF^*$, define a torsion-free linear connection by
$$\tilde D_{(X_1,X_2)}(Y_1,Y_2) = (D_{X_1}Y_1 + [X_2,Y_1],\ D^*_{X_2}Y_2 + [X_1,Y_2])$$
for all $(X_1,X_2),(Y_1,Y_2) \in \Gamma(TF)\times\Gamma(TF^*)$. $\tilde D$ is the unique torsion-free linear connection which preserves $(F,F^*)$.

Assume that one of the connections $(D, D^*)$ is geodesically complete. Then the foliations are Ehresmann connections for the other; the universal coverings of the leaves of $F$ (respectively $F^*$) are $D$-affinely (respectively $D^*$-affinely) isomorphic; and the universal covering $\tilde M$ of the manifold $M$ is the product $K\times L$, where $K$ and $L$ are the universal coverings of the leaves of $F$ and $F^*$. If the connection $D$ is complete, then its restriction to the leaves of $F$ is complete and each leaf of $F$ is a geodesically complete locally flat manifold, so its universal covering is diffeomorphic to $\mathbb{R}^p$, where $p$ is the dimension of the leaves of $F$; the same is true if $D^*$ is complete.

Merci. Thank you.
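The definition of $g$ as the expectation of $q_{(\theta,\xi)} = -(\nabla d\ln)$ can be checked numerically: for the normal model and the flat connection on $\Theta$, a Monte Carlo average of $-\mathrm{Hess}\,\log p$ should recover $\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$. A minimal sketch assuming numpy; the second derivatives are the analytic ones of the normal log-density:

```python
import numpy as np

def fisher_mc(mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of g = E[-Hess log p] for the normal model."""
    xi = np.random.default_rng(seed).normal(mu, sigma, size=n)
    d2_mumu = -np.ones(n) / sigma**2
    d2_musig = -2 * (xi - mu) / sigma**3
    d2_sigsig = 1 / sigma**2 - 3 * (xi - mu)**2 / sigma**4
    H = np.array([[d2_mumu.mean(), d2_musig.mean()],
                  [d2_musig.mean(), d2_sigsig.mean()]])
    return -H

g = fisher_mc(0.0, 2.0)
print(g)   # close to [[1/sigma^2, 0], [0, 2/sigma^2]] = [[0.25, 0], [0, 0.5]]
```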


Hessian structures on deformed exponential families

MATSUZOE Hiroshi (Nagoya Institute of Technology), joint works with HENMI Masayuki (The Institute of Statistical Mathematics).

Outline: 1. Statistical manifolds and statistical models; 2. Deformed exponential family; 3. Geometry of deformed exponential family (1); 4. Geometry of deformed exponential family (2); 5. Maximum q-likelihood estimator.

1. Preliminaries: geometry of statistical models

Definition 1.1. $S$ is a statistical model (or a parametric model) on $\Omega$ iff $S$ is a set of probability densities with parameter $\xi\in\Xi$:
$$S = \Big\{p(x;\xi)\ \Big|\ \int_\Omega p(x;\xi)\,dx = 1,\ p(x;\xi) > 0,\ \xi\in\Xi\subset\mathbb{R}^n\Big\}.$$
We regard $S$ as a manifold with a local coordinate system $\{\Xi; \xi^1,\dots,\xi^n\}$.

$g^F = (g^F_{ij})$ is the Fisher metric (Fisher information matrix) of $S$:
$$g^F_{ij}(\xi) := \int_\Omega \frac{\partial}{\partial\xi^i}\log p(x;\xi)\,\frac{\partial}{\partial\xi^j}\log p(x;\xi)\,p(x;\xi)\,dx = \int_\Omega \partial_i p_\xi\Big(\frac{\partial}{\partial\xi^j}\log p_\xi\Big)dx = E_\xi[\partial_i l_\xi\,\partial_j l_\xi],$$
where $\partial_i p_\xi$ is the mixture representation and $\partial_i l_\xi = \partial_i p_\xi/p_\xi$ the exponential representation (the score function).

A statistical model $S_e$ is an exponential family iff $S_e = \{p(x;\theta) = \exp[C(x) + \sum_{i=1}^n\theta^iF_i(x) - \psi(\theta)]\}$, with $C, F_1,\dots,F_n$ random variables on $\Omega$ and $\psi$ a function on the parameter space $\Theta$. The coordinate system $[\theta^i]$ is called the natural parameters.

Proposition 1.2. For an exponential family $S_e$: (1) $\nabla^{(1)}$ is flat; (2) $[\theta^i]$ is an affine coordinate system, i.e. $\Gamma^{(1)k}_{ij} \equiv 0$.

For simplicity, assume $C = 0$. Then $g^F_{ij}(\theta) = E[(\partial_i\log p)(\partial_j\log p)] = E[-\partial_i\partial_j\log p] = \partial_i\partial_j\psi(\theta)$ (the Fisher metric) and $C^F_{ijk}(\theta) = E[(\partial_i\log p)(\partial_j\log p)(\partial_k\log p)] = \partial_i\partial_j\partial_k\psi(\theta)$ (the cubic form). The triplets $(S_e,\nabla^{(e)},g^F)$ and $(S_e,\nabla^{(m)},g^F)$ are Hessian manifolds. Remark: $(S,\nabla^{(\alpha)},g^F)$ is an invariant statistical manifold.

Normal distributions. $\Omega = \mathbb{R}$, $n = 2$, $\xi = (\mu,\sigma)\in\mathbb{R}^2_+$ (the upper half plane):
$$S = \Big\{p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big[-\frac{(x-\mu)^2}{2\sigma^2}\Big]\Big\}.$$
The Fisher metric is $(g_{ij}) = \frac{1}{\sigma^2}\begin{pmatrix}1&0\\0&2\end{pmatrix}$, so $S$ is a space of constant negative curvature $-\frac{1}{2}$, and $\nabla^{(1)}$ and $\nabla^{(-1)}$ are flat affine connections. With $\theta^1 = \mu/\sigma^2$, $\theta^2 = -1/(2\sigma^2)$ and $\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\log\big(-\frac{\pi}{\theta^2}\big)$, we get $p(x;\mu,\sigma) = \exp[x\theta^1 + x^2\theta^2 - \psi(\theta)]$. Here $\{\theta^1,\theta^2\}$ are the natural parameters ($\nabla^{(1)}$-geodesic coordinate system) and $\eta_1 = E[x] = \mu$, $\eta_2 = E[x^2] = \sigma^2+\mu^2$ are the moment parameters ($\nabla^{(-1)}$-geodesic coordinate system).

Finite sample space. $\Omega = \{x_0,x_1,\dots,x_n\}$, $\dim S_n = n$, with $p(x_i;\eta) = \eta_i$ for $1\le i\le n$ and $p(x_0;\eta) = 1-\sum_{j=1}^n\eta_j$; $\Xi = \{(\eta_1,\dots,\eta_n) \mid \eta_i>0\ (\forall i),\ \sum_{j=1}^n\eta_j<1\}$ is an $n$-dimensional simplex. The Fisher metric is
$$g_{ij} = \frac{1}{\eta_0} + \frac{\delta_{ij}}{\eta_i}, \qquad \eta_0 = 1-\sum_{j=1}^n\eta_j,$$
so $S_n$ is a space of constant positive curvature $\frac{1}{4}$. The natural parameters ($\nabla^{(1)}$-geodesic coordinates) are $\theta^i = \log p(x_i) - \log p(x_0) = \log\frac{\eta_i}{1-\sum_j\eta_j}$, with $\psi(\theta) = \log\big(1+\sum_je^{\theta^j}\big)$; the $\{\eta_i\}$ are the moment parameters ($\nabla^{(-1)}$-geodesic coordinates), as verified numerically below.
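The natural/moment parameter correspondence on the simplex is a one-liner to verify numerically; a minimal sketch assuming numpy (the specific $\eta$ below is an arbitrary illustration):

```python
import numpy as np

eta = np.array([0.2, 0.5, 0.1])           # (eta_1, ..., eta_n); eta_0 = 1 - sum
eta0 = 1 - eta.sum()
theta = np.log(eta / eta0)                # natural parameters
psi = np.log(1 + np.exp(theta).sum())     # psi(theta) = -log eta_0

# recover the moment parameters (a softmax in disguise):
assert np.allclose(np.exp(theta - psi), eta)
assert np.isclose(psi, -np.log(eta0))
```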
Proposition 1.3. For $S_e$ the following hold: (1) $(S_e, g^F, \nabla^{(e)}, \nabla^{(m)})$ is a dually flat space. (2) $\{\theta^i\}$ is a $\nabla^{(e)}$-affine coordinate system on $S_e$. (3) $\psi(\theta)$ is the potential of $g^F$ w.r.t. $\{\theta^i\}$: $g^F_{ij}(\theta) = \partial_i\partial_j\psi(\theta)$. (4) Setting the expectations of $F_i(x)$ as $\eta_i = E_\theta[F_i(x)]$, $\{\eta_i\}$ is the dual coordinate system of $\{\theta^i\}$ with respect to $g^F$. (5) Setting $\varphi(\eta) = E_\theta[\log p_\theta]$, $\varphi(\eta)$ is the potential of $g^F$ w.r.t. $\{\eta_i\}$.

Since $(S_e,g^F,\nabla^{(e)},\nabla^{(m)})$ is a dually flat space, the Legendre transformation holds:
$$\frac{\partial\psi}{\partial\theta^i} = \eta_i, \qquad \frac{\partial\varphi}{\partial\eta_i} = \theta^i, \qquad \psi(p)+\varphi(p)-\sum_i\theta^i(p)\,\eta_i(p) = 0, \qquad g^F_{ij} = \frac{\partial^2\psi}{\partial\theta^i\partial\theta^j}, \quad C^F_{ijk} = \frac{\partial^3\psi}{\partial\theta^i\partial\theta^j\partial\theta^k}.$$

The Kullback-Leibler divergence (or relative entropy) on $S$:
$$D_{KL}(p,r) = \int_\Omega p(x)\log\frac{p(x)}{r(x)}\,dx = E_p[\log p(x) - \log r(x)] \ \Big(= \psi(r)+\varphi(p)-\sum_i\theta^i(r)\,\eta_i(p) = D(r,p)\Big).$$
For $S_e$, $D_{KL}$ coincides with the canonical divergence $D$ on the dually flat space $(S_e, \nabla^{(m)}, g^F)$.

Construction of a divergence from an estimating function: $s(x;\xi) = (\partial/\partial\xi^1\log p,\dots,\partial/\partial\xi^n\log p)^T$ is the score function of $p(x;\xi)$ (an estimating function). By integrating the score function and taking an expectation one obtains the cross entropy on $S$, $d_{KL}(p,r) := \int_\Omega p(x;\xi)\log r(x;\xi')\,dx$; the KL-divergence is given by the difference of cross entropies, $D_{KL}(p,r) = d_{KL}(p,p) - d_{KL}(p,r)$.

2. Deformed exponential family ($\chi$-exponential family)

Let $\chi : (0,\infty)\to(0,\infty)$ be strictly increasing.

Definition 2.1. $\log_\chi x := \int_1^x \frac{dt}{\chi(t)}$ ($\chi$-logarithm); $\exp_\chi x := 1 + \int_0^x\lambda(t)\,dt$ ($\chi$-exponential), where $\lambda(\log_\chi t) = \chi(t)$. Usually the $\chi$-exponential is called the $\varphi$-exponential in statistical physics; in this talk $\varphi$ is used as the dual potential on a dually flat space.

Example 2.2. For $\chi(t) = t^q$:
$$\log_q x = \int_1^x t^{-q}\,dt = \frac{x^{1-q}-1}{1-q} \quad\text{(q-logarithm)}, \qquad \lambda(t) = (1+(1-q)t)^{\frac{q}{1-q}}, \qquad \exp_q x = 1+\int_0^x\lambda(t)\,dt = (1+(1-q)x)^{\frac{1}{1-q}} \quad\text{(q-exponential)}.$$

Definition 2.3. With $F_1(x),\dots,F_n(x)$ functions on $\Omega$ and parameters $\theta = \{\theta^1,\dots,\theta^n\}$, the $\chi$-exponential family (deformed exponential family) is
$$S_\chi := \Big\{p(x;\theta)\in S\ \Big|\ p(x;\theta) = \exp_\chi\Big[\sum_{i=1}^n\theta^iF_i(x) - \psi(\theta)\Big]\Big\}.$$

Proposition 2.4 (discrete distributions). The set of discrete distributions is a $\chi$-exponential family for any $\chi$. (Proof) For $\Omega = \{x_0,\dots,x_n\}$ and $S_n = \{p(x;\eta) = \sum_{i=0}^n\eta_i\delta_i(x)\}$ with $\eta_0 = 1-\sum_{i=1}^n\eta_i$, set $\theta^i = \log_\chi p(x_i) - \log_\chi p(x_0) = \log_\chi\eta_i - \log_\chi\eta_0$. Then $\log_\chi p(x) = \sum_{i=1}^n(\log_\chi\eta_i - \log_\chi\eta_0)\,\delta_i(x) + \log_\chi\eta_0$, with $\psi(\theta) = -\log_\chi\eta_0$.
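Definition 2.1 and Example 2.2 translate directly into code; a minimal sketch assuming numpy, checking that $\exp_q$ and $\log_q$ are mutually inverse on their domains:

```python
import numpy as np

def log_q(x, q):
    """q-logarithm: (x^{1-q} - 1)/(1 - q); tends to log x as q -> 1."""
    return np.log(x) if np.isclose(q, 1.0) else (x**(1 - q) - 1) / (1 - q)

def exp_q(x, q):
    """q-exponential: (1 + (1-q) x)^{1/(1-q)} on its domain 1 + (1-q)x > 0."""
    return np.exp(x) if np.isclose(q, 1.0) else (1 + (1 - q) * x)**(1 / (1 - q))

x = np.linspace(0.1, 5.0, 50)
for q in (0.5, 1.0, 1.5):
    assert np.allclose(exp_q(log_q(x, q), q), x)   # mutually inverse
```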
Example 2.5 (Student t-distribution, q-normal distribution). $\Omega = \mathbb{R}$, $n = 2$, $\xi = (\mu,\sigma)\in\mathbb{R}^2_+$ (the upper half plane), $q>1$:
$$p(x;\mu,\sigma) = \frac{1}{z_q}\Big[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\Big]^{\frac{1}{1-q}}.$$
Set $\theta^1 = \frac{2}{3-q}\,z_q^{q-1}\,\frac{\mu}{\sigma^2}$ and $\theta^2 = -\frac{1}{3-q}\,z_q^{q-1}\,\frac{1}{\sigma^2}$. Then
$$\log_q p(x) = \frac{1}{1-q}(p^{1-q}-1) = \theta^1x + \theta^2x^2 - \psi(\theta), \qquad \psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} - \frac{z_q^{q-1}-1}{1-q}.$$
The set of Student t-distributions is a q-exponential family.

3. Geometry of deformed exponential family (1)

Let $S_\chi$ be a deformed exponential family with $\psi(\theta)$ strictly convex (normalization for $S_\chi$). The $\chi$-score function $s^\chi(x;\theta) = ((s^\chi)_1(x;\theta),\dots,(s^\chi)_n(x;\theta))^T$ is defined by $(s^\chi)_i(x;\theta) = \frac{\partial}{\partial\theta^i}\log_\chi p(x;\theta)$, $i = 1,\dots,n$.

Statistical structure for $S_\chi$: the Riemannian metric $g^M_{ij}(\theta) = \int_\Omega\partial_ip(x;\theta)\,\partial_j\log_\chi p(x;\theta)\,dx$ and the dual affine connections $\nabla^{M(e)},\nabla^{M(m)}$ with
$$\Gamma^{M(e)}_{ij,k}(\theta) = \int_\Omega\partial_kp\,\partial_i\partial_j\log_\chi p\,dx, \qquad \Gamma^{M(m)}_{ij,k}(\theta) = \int_\Omega\partial_i\partial_jp\,\partial_k\log_\chi p\,dx.$$
$(S_\chi,\nabla^{M(e)},g^M)$ and $(S_\chi,\nabla^{M(m)},g^M)$ are Hessian manifolds.

Proposition 3.1. For $S_\chi$ the following hold: (1) $(S_\chi,g^M,\nabla^{M(e)},\nabla^{M(m)})$ is a dually flat space. (2) $\{\theta^i\}$ is a $\nabla^{M(e)}$-affine coordinate system on $S_\chi$. (3) $\Psi(\theta)$ is the potential of $g^M$ w.r.t. $\{\theta^i\}$: $g^M_{ij}(\theta) = \partial_i\partial_j\Psi(\theta)$. (4) With $\eta_i = E_\theta[F_i(x)]$, $\{\eta_i\}$ is the dual coordinate system of $\{\theta^i\}$ w.r.t. $g^M$. (5) $\Phi(\eta) = -I_\chi(p_\theta)$ is the potential of $g^M$ w.r.t. $\{\eta_i\}$. Here
$$I_\chi(p_\theta) = -\int_\Omega\{U_\chi(p(x;\theta)) + (p(x;\theta)-1)\,U_\chi(0)\}\,dx, \qquad U_\chi(t) = \int_1^t\log_\chi(s)\,ds, \quad U_\chi(0) = \lim_{t\to+0}U_\chi(t) < \infty$$
(the generalized entropy functional), and $\Psi(\theta) = \int_\Omega p\log_\chi p\,dx + I_\chi(p_\theta) + \psi(\theta)$ (the generalized Massieu potential).

Construction of the $\beta$-divergence ($\beta = 1-q$). The weighted score function $u_q(x;\theta) = (u^1_q,\dots,u^n_q)^T$, $u^i_q(x;\theta) = p^{1-q}\,s^i(x;\theta) - E_\theta[p^{1-q}\,s^i(x;\theta)]$, can be written, from the definition of the q-logarithm, as
$$u^i_q(x;\theta) = \frac{\partial}{\partial\theta^i}\Big\{\frac{1}{1-q}p^{1-q} - \frac{1}{2-q}\int_\Omega p^{2-q}\,dx\Big\} = \frac{\partial}{\partial\theta^i}\log_q p - E_\theta\Big[\frac{\partial}{\partial\theta^i}\log_q p\Big];$$
hence this estimating function is the bias-corrected q-score. By integrating $u_q$ and taking the expectation we define the cross entropy $d_{1-q}(p,r) = -\frac{1}{1-q}\int_\Omega p\,r^{1-q}\,dx + \frac{1}{2-q}\int_\Omega r^{2-q}\,dx$, and the $\beta$-divergence is
$$D_{1-q}(p,r) = -d_{1-q}(p,p) + d_{1-q}(p,r) = \frac{1}{(1-q)(2-q)}\int_\Omega p^{2-q}\,dx - \frac{1}{1-q}\int_\Omega p\,r^{1-q}\,dx + \frac{1}{2-q}\int_\Omega r^{2-q}\,dx.$$
Remark 3.2. A $\beta$-divergence $D_{1-q}$ induces the Hessian manifolds $(S_q,\nabla^{M(m)},g^M)$ and $(S_q,\nabla^{M(e)},g^M)$.
4. Geometry of deformed exponential family (2)

Definition 4.1. $P_\chi(x;\theta) = \frac{1}{Z_\chi(\theta)}\chi(p(x;\theta))$, with $Z_\chi(\theta) = \int_\Omega\chi(p(x;\theta))\,dx$, is the escort distribution of $p(x;\theta)$; the $\chi$-expectation of $f(x)$ is its expectation with respect to the escort distribution, $E_{\chi,\theta}[f(x)] = \int f(x)\,P_\chi(x;\theta)\,dx = \frac{1}{Z_\chi(\theta)}\int f(x)\,\chi(p(x;\theta))\,dx$.

Definition 4.2. For a deformed exponential family $S_\chi = \{p(x;\theta)\}$: $g^\chi_{ij}(\theta) = \partial_i\partial_j\psi(\theta)$ is the $\chi$-Fisher information metric and $C^\chi_{ijk}(\theta) = \partial_i\partial_j\partial_k\psi(\theta)$ the $\chi$-cubic form. Set $\Gamma^{\chi(e)}_{ij,k} := \Gamma^{\chi(0)}_{ij,k} - \frac{1}{2}C^\chi_{ijk}$ and $\Gamma^{\chi(m)}_{ij,k} := \Gamma^{\chi(0)}_{ij,k} + \frac{1}{2}C^\chi_{ijk}$.

Proposition 4.3. For $S_\chi$ the following hold: (1) $(S_\chi,g^\chi,\nabla^{\chi(e)},\nabla^{\chi(m)})$ is a dually flat space. (2) $\{\theta^i\}$ is a $\nabla^{\chi(e)}$-affine coordinate system on $S_\chi$. (3) $\psi$ is the potential of $g^\chi$ w.r.t. $\{\theta^i\}$: $g^\chi_{ij}(\theta) = \partial_i\partial_j\psi(\theta)$. (4) With the $\chi$-expectations $\eta_i = E_{\chi,\theta}[F_i(x)]$, $\{\eta_i\}$ is the dual coordinate system of $\{\theta^i\}$ w.r.t. $g^\chi$. (5) $\varphi(\eta) = E_{\chi,\theta}[\log_\chi p(x;\theta)]$ is the potential of $g^\chi$ w.r.t. $\{\eta_i\}$.

Proof: statements 1, 2 and 3 are obtained from the definitions of the $\chi$-Fisher metric and the $\chi$-cubic form; statements 4 and 5 follow from $E_{\chi,\theta}[\log_\chi p(x;\theta)] = E_{\chi,\theta}[\sum_i\theta^iF_i(x) - \psi(\theta)] = \sum_i\theta^i\eta_i - \psi(\theta)$.

The generalized relative entropy (or $\chi$-relative entropy) of $S_\chi$ is $D^\chi(p,r) = E_{\chi,p}[\log_\chi p(x) - \log_\chi r(x)]$; it coincides with the canonical divergence $D(r,p)$ on $(S_\chi,\nabla^{\chi(e)},g^\chi)$. In fact,
$$D^\chi(p_\theta,r_{\theta'}) = E_{\chi,p}\Big[\Big(\sum_i\theta^iF_i - \psi(\theta)\Big) - \Big(\sum_i(\theta')^iF_i - \psi(\theta')\Big)\Big] = \psi(\theta') + \Big(\sum_i\theta^i\eta_i - \psi(\theta)\Big) - \sum_i(\theta')^i\eta_i = D(r_{\theta'},p_\theta).$$
Tsallis relative entropy (q-exponential case):
$$D^q(p,r) = E_{q,p}[\log_q p(x) - \log_q r(x)] = \frac{1-\int p(x)^q\,r(x)^{1-q}\,dx}{(1-q)\,Z_q(p)} = \frac{q}{Z_q(p)}\,D^{(1-2q)}(p,r);$$
the Tsallis relative entropy is conformal to the $\alpha$-divergence with $\alpha = 1-2q$.

Construction of the $\chi$-relative entropy: the $\chi$-score $(s^\chi)_i(x;\theta) = \partial_i\log_\chi p(x;\theta)$ is unbiased w.r.t. the $\chi$-expectation, $E_{\chi,\theta}[(s^\chi)_i(x;\theta)] = 0$, so we regard $s^\chi(x;\theta)$ as a generalization of an estimating function. Integrating it defines the $\chi$-cross entropy $d^\chi(p,r) = -\int_\Omega P(x)\log_\chi r(x)\,dx$ ($P$ the escort of $p$), and we obtain $D^\chi(p,r) = -d^\chi(p,p) + d^\chi(p,r)$.

5. Maximum q-likelihood estimators

5.1 The q-independence. For $X\sim p_1(x)$ and $Y\sim p_2(y)$: $X$ and $Y$ are independent iff $p(x,y) = p_1(x)\,p_2(y) = \exp[\log p_1(x) + \log p_2(y)]$ (for $p_1(x)>0$, $p_2(y)>0$). For $x>0$, $y>0$ with $x^{1-q}+y^{1-q}-1>0$ ($q>0$), the q-product of $x$ and $y$ is
$$x\otimes_q y := \big[x^{1-q}+y^{1-q}-1\big]^{\frac{1}{1-q}} = \exp_q\big[\log_q x + \log_q y\big].$$
It satisfies $\exp_q x\otimes_q\exp_q y = \exp_q(x+y)$ and $\log_q(x\otimes_q y) = \log_q x + \log_q y$.
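The q-product and its interplay with $\exp_q$ can be verified numerically; a minimal sketch assuming numpy (the values of $q$, $a$, $b$ are arbitrary illustrations):

```python
import numpy as np

def exp_q(x, q):
    """q-exponential on its domain 1 + (1-q)x > 0."""
    return (1 + (1 - q) * x)**(1 / (1 - q))

def q_product(x, y, q):
    """q-product: x (x)_q y = [x^{1-q} + y^{1-q} - 1]^{1/(1-q)}."""
    return (x**(1 - q) + y**(1 - q) - 1)**(1 / (1 - q))

q = 0.8
a, b = 0.4, 0.9
# exp_q(a + b) = exp_q(a) (x)_q exp_q(b):
assert np.isclose(q_product(exp_q(a, q), exp_q(b, q), q), exp_q(a + b, q))
```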
$X$ and $Y$ are q-independent with m-normalization (mixture normalization) iff
$$p_q(x,y) = \frac{p_1(x)\otimes_q p_2(y)}{Z_{p_1,p_2}}, \qquad Z_{p_1,p_2} = \iint_{XY}p_1(x)\otimes_q p_2(y)\,dx\,dy.$$

5.2 Geometry for q-likelihood estimators. Let $S_q = \{p(x;\xi) \mid \xi\in\Xi\}$ be a q-exponential family and $\{x_1,\dots,x_N\}$ be $N$ observations from $p(x;\xi)\in S_q$. The q-likelihood function is
$$L_q(\xi) = p(x_1;\xi)\otimes_q p(x_2;\xi)\otimes_q\cdots\otimes_q p(x_N;\xi), \quad\text{equivalently}\quad \log_q L_q(\xi) = \sum_{i=1}^N\log_q p(x_i;\xi).$$
In the limit $q\to1$, $L_q$ is the standard likelihood function on $\Xi$. Note that
$$\exp_q(x_1+\cdots+x_N) = \exp_q x_1\cdot\exp_q\Big(\frac{x_2}{1+(1-q)x_1}\Big)\cdots\exp_q\Big(\frac{x_N}{1+(1-q)\sum_{i=1}^{N-1}x_i}\Big),$$
so each measurement influences the others.

The maximum q-likelihood estimator is $\hat\xi = \arg\max_{\xi\in\Xi}L_q(\xi)$ $\big(= \arg\max_{\xi\in\Xi}\log_q L_q(\xi)\big)$, and the q-likelihood is maximum iff the canonical divergence (the Tsallis relative entropy) is minimum.

Summary (in the case of the q-exponential family):

• $\beta$-divergence, $(S_q,g^M,\nabla^{M(e)},\nabla^{M(m)})$: estimating function $u^i_q(x;\theta) = \partial_i\log_q p(x;\theta) - E_\theta[\partial_i\log_q p(x;\theta)]$; Riemannian metric $g^M_{ij}(\theta) = \int_\Omega\partial_ip(x;\theta)\,\partial_j\log_q p(x;\theta)\,dx$; dual coordinates $\eta_i = E_p[F_i(x)]$.
• Tsallis relative entropy, $(S_q,g^q,\nabla^{q(e)},\nabla^{q(m)})$: estimating function $(s^q)_i(x;\theta) = \partial_i\log_q p(x;\theta)$ (unbiased under the q-expectation); Riemannian metric $g^q_{ij}(\theta) = \partial_i\partial_j\psi(\theta)$; dual coordinates $\eta_i = E_{q,p}[F_i(x)]$.

The notions of expectation and independence are thus determined by the geometric structure of the statistical model.
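A maximum q-likelihood fit can be sketched numerically by maximizing $\sum_i\log_q p(x_i;\xi)$. The sketch below is not from the talk: it uses the standard correspondence between the q-Gaussian with $q>1$ and the Student t-distribution with $\nu = (3-q)/(q-1)$ degrees of freedom, and assumes numpy and scipy; the Student-t `scale` is a convenient reparametrization, not the $\sigma$ of Example 2.5.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t

q = 1.5
nu = (3 - q) / (q - 1)        # q-Gaussian (q > 1) as a scaled Student-t

def log_q(x):
    return (x**(1 - q) - 1) / (1 - q)

rng = np.random.default_rng(1)
data = t.rvs(df=nu, loc=2.0, scale=0.7, size=2000, random_state=rng)

def neg_q_likelihood(params):
    mu, log_s = params                       # optimize log-scale for positivity
    p = t.pdf(data, df=nu, loc=mu, scale=np.exp(log_s))
    return -np.sum(log_q(p))

res = minimize(neg_q_likelihood, x0=[0.0, 0.0])
mu_hat, s_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, s_hat)          # should land near the generating (2.0, 0.7)
```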


### ORAL SESSION 8 Computational Aspects of Information Geometry in Statistics (chaired by Frank Critchley)


A General Metric for Riemannian Hamiltonian Monte Carlo

Michael Betancourt, University College London, August 30th, 2013.

I'm going to talk about probability and geometry, but not information geometry! Instead our interest is Bayesian inference: $\pi(\theta|D) \propto \pi(D|\theta)\,\pi(\theta)$. Markov Chain Monte Carlo admits the practical analysis and manipulation of posteriors even in high dimensions. Markov transitions can be thought of as an "average" isomorphism that preserves a target distribution: maps $t : \Omega\to\Omega$ on $(\Omega,\mathcal{B}(\Omega),\pi)$ with $\pi T = \pi$.

Random Walk Metropolis and the Gibbs sampler have been the workhorse Markov transitions:

• RWM: $T(\theta,\theta') = N(\theta'|\theta,\Sigma)\,\min\big(1, \frac{\pi(\theta')}{\pi(\theta)}\big)$; i.e. $\epsilon\sim N(0,\Sigma)$, $a\,|\,\epsilon\sim\mathrm{Ber}\big(\min\big(1,\frac{\pi(\theta+\epsilon)}{\pi(\theta)}\big)\big)$, $t:\theta\to\theta+a\cdot\epsilon$.
• Gibbs: $T(\theta,\theta') = \prod_i\pi(\theta'_i\,|\,\theta_{j\setminus i})$; i.e. $t_i:\theta_i\to\epsilon$ with $\epsilon\sim\pi(\epsilon\,|\,\theta_{j\setminus i})$.

MCMC performance is limited by complex posteriors, which are common in large dimensions: random walk Metropolis explores only slowly, Gibbs sampling doesn't fare much better, and both explore incoherently in large dimensions. How do we generate coherent transitions?

Hamiltonian flow is a coherent, measure-preserving map $T : M\to T^*M\to T^*M\to M$ (random lift, then Hamiltonian flow, then marginalization). We just need to define a lift from the sample space to its cotangent bundle, $\pi(q)\to\pi(p|q)\,\pi(q)$, with Hamiltonian $H = -\log\pi(p|q) - \log\pi(q) = T + V$.

Quadratic kinetic energies with constant metrics emulate dynamics on a Euclidean manifold: $\pi(p|q) = N(0,M)$, $T = \frac{1}{2}p_ip_j\,(M^{-1})^{ij}$. The coherent flow moves the Markov chain along the target distribution, avoiding random walk behavior. Unfortunately, Euclidean HMC is sensitive to large variations in curvature, as well as to variations in the target density (the kinetic energy can compensate only variations $\Delta V \approx \Delta T = d/2$).
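In practice the Hamiltonian flow is simulated with a symplectic (leapfrog) integrator and corrected with a Metropolis accept step. A minimal Euclidean-metric sketch with identity mass matrix, assuming numpy (this is generic HMC, not the speaker's code):

```python
import numpy as np

def leapfrog(q, p, grad_V, eps, n_steps):
    """Leapfrog integration of H = p.p/2 + V(q)."""
    p = p - 0.5 * eps * grad_V(q)            # initial half kick
    for _ in range(n_steps - 1):
        q = q + eps * p                      # full drift
        p = p - eps * grad_V(q)              # full kick
    q = q + eps * p
    p = p - 0.5 * eps * grad_V(q)            # final half kick
    return q, p

def hmc_step(q, V, grad_V, eps=0.1, n_steps=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p = rng.normal(size=q.shape)             # lift: p | q ~ N(0, I)
    q_new, p_new = leapfrog(q, p, grad_V, eps, n_steps)
    dH = (V(q_new) + 0.5 * p_new @ p_new) - (V(q) + 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < -dH else q    # Metropolis correction

# usage on a standard Gaussian target:
V = lambda q: 0.5 * q @ q
grad_V = lambda q: q
q, rng = np.zeros(10), np.random.default_rng(0)
for _ in range(1000):
    q = hmc_step(q, V, grad_V, rng=rng)
```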
These weaknesses are particularly evident in hierarchical models, $x_1,\dots,x_n \mid v$ with $\pi(x,v) = \prod_{i=1}^n\pi(x_i|v)\,\pi(v)$, where the potential varies hugely across the funnel ($\Delta V\approx250$ in the example shown).

Quadratic kinetic energies with dynamic metrics emulate dynamics on a Riemannian manifold: $\pi(p|q) = N(0,\Sigma(q))$, $T = \frac{1}{2}p_ip_j\,(\Sigma^{-1}(q))^{ij} + \frac{1}{2}\log|\Sigma(q)|$. Optimal numerical integration suggests using the Hessian, $\Sigma(q)_{ij} = \partial_i\partial_jV(q)$, but the Hessian isn't positive-definite. Fisher-Rao, $\Sigma(q)_{ij} = E_D[\partial_i\partial_jV(q|D)]$, is both impractical and ineffective.

We can regularize without appealing to expectations: the "SoftAbs" metric serves as a differentiable absolute value of the Hessian, replacing each eigenvalue $\lambda$ of the Hessian by
$$\lambda\,\coth(\alpha\lambda) = \lambda\,\frac{e^{\alpha\lambda}+e^{-\alpha\lambda}}{e^{\alpha\lambda}-e^{-\alpha\lambda}},$$
which behaves like $|\lambda|$ for large $|\lambda|$ and is floored near $1/\alpha$ for small $|\lambda|$. The SoftAbs metric locally standardizes the target distribution, and the log determinant admits full exploration of the funnel. The SoftAbs metric admits a general-purpose, practical implementation of RHMC. [Figure: RHMC samples filling the $(x_1, v)$ funnel.]
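A minimal sketch of the SoftAbs map, assuming numpy: eigen-decompose the Hessian and replace each eigenvalue $\lambda$ by $\lambda\coth(\alpha\lambda)$ (for an exact zero eigenvalue the limit value $1/\alpha$ would have to be substituted by hand; the example matrix is an arbitrary illustration):

```python
import numpy as np

def softabs_metric(H, alpha=1e6):
    """SoftAbs: replace each Hessian eigenvalue lambda by lambda*coth(alpha*lambda),
    a smooth, positive surrogate for |lambda| (floored near 1/alpha)."""
    lam, Q = np.linalg.eigh(H)
    lam_soft = lam / np.tanh(alpha * lam)    # -> |lambda| as alpha -> infinity
    return Q @ np.diag(lam_soft) @ Q.T

H = np.array([[1.0, 0.9], [0.9, -0.5]])     # indefinite Hessian
Sigma = softabs_metric(H)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive-definite metric
```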


### ORAL SESSION 11 Probability on Manifolds (chaired by Marc Arnaudon)


Nonlinear Modeling and Processing Using Empirical Intrinsic Geometry with Application to Biomedical Imaging

Ronen Talmon¹, Yoel Shkolnisky², and Ronald Coifman¹ (¹Mathematics Department, Yale University; ²Applied Mathematics Department, Tel Aviv University). Geometric Science of Information (GSI 2013), August 28-30, 2013, Paris.

Introduction: example for intrinsic modeling I (molecular dynamics)

• Consider a molecule oscillating stochastically in water, for example Alanine Dipeptide. Due to the coherent structure of molecular motion, we assume that the configuration at any given time is essentially described by a small number of structural variables; in the Alanine case we will discover two factors, corresponding to the dihedral angles.
• We observe three atoms of the molecule for a certain period, three other atoms for a second period, and the rest in the last period. The goal is to describe the positions of all atoms at all times; more precisely, to derive intrinsic variables that correspond to the dihedral angles and to describe their relation to the positions of all atoms. We always derive the same intrinsic variables (angles) from partial observations, independently of the specific atoms we observe; if we learn the model, we can describe the positions of all atoms.

Introduction: example for intrinsic modeling II (predicting epileptic seizures)

• Goal: to warn the patient prior to the seizure (when medication or surgery are not viable). Data: intracranial EEG (icEEG) recordings. [Figure: an icEEG recording and its time-frequency spectrogram.]
• Our assumption: the measurements are controlled by underlying processes that represent the brain activity. Main idea: predict seizures based on these "brain activity processes". Challenges: noisy data, unknown model, and no available examples.

Introduction: manifold learning

• Represent the data as points in a high dimensional space; the points lie on a low dimensional structure (manifold) governed by latent factors, for example atom trajectories and the dihedral angles.
• Manifold learning techniques providing a parameterization of the manifold: Laplacian eigenmaps [Belkin & Niyogi, 03'], diffusion maps [Coifman & Lafon, 05'; Singer & Coifman, 08'].

Empirical Intrinsic Geometry: formulation ("state" space)

• Dynamical model: let $\theta_t$ be a $d$-dimensional underlying process (the state) in time index $t$ that evolves according to $d\theta^i_t = a^i(\theta_t)\,dt + dw^i_t$, $i = 1,\dots,d$, where the $a^i$ are unknown drift coefficients and the $w^i_t$ are independent white noises.
• Measurement modality: let $z_t$ be an $n$-dimensional measured signal, given by $z_t = g(y_t, v_t)$, where $y_t$ is the clean observation component drawn from the time-varying pdf $f(y;\theta_t)$, $v_t$ is a corrupting noise (independent of $y_t$), and $g$ is an arbitrary measurement function.
• The goal: recover and track $\theta_t$ given $z_t$.

Manifold learning for time series, the general outline:

• Construct an affinity matrix (kernel) between the measurements $z_t$, e.g. $k(z_t, z_s) = \exp\{-\|z_t - z_s\|^2/\varepsilon\}$.
• Normalize the kernel to obtain a Laplace operator [Chung, 97'].
• The spectral decomposition (eigenvectors $\varphi_i \in \mathbb{R}^N$) represents the underlying factors $\theta^i_t \in \mathbb{R}^N$.

Intrinsic modeling

• The mapping between the observable data and the underlying processes is often stochastic and contains measurement noise: repeated observations of the same phenomenon usually yield different measurement realizations, and the measurements may be performed using different instruments/sensors.
• Each set of related measurements of the same phenomenon will therefore have a different geometric structure, depending on the instrument and the specific realization. This poses a problem for standard manifold learning methods. [Figure: two observable domains and two partial observations, all mapping to a single intrinsic embedding.]

How to obtain an intrinsic model?

• Q: Does the Euclidean distance between the measurements, as in $k(z_t,z_s) = \exp\{-\|z_t-z_s\|^2/\varepsilon\}$, convey the information? The measurements are realizations of a random process plus measurement noise.
• A: We propose a new approach, Empirical Intrinsic Geometry (EIG) [Talmon & Coifman, PNAS, 13']: find a proper high dimensional representation, and find an intrinsic distance measure robust to measurement noise and modality.

Geometric interpretation

• Exploit perturbations to explore and learn the tangent plane; compare the points based on the principal directions of the tangent planes ("local PCA").

The Mahalanobis distance

• We view the local histograms as feature vectors for each measurement: $z_t \to h_t$.
• For each feature vector we compute the local covariance matrix in a temporal neighborhood of length $L$: $C_t = \frac{1}{L}\sum_{s=t-L+1}^{t}(h_s-\mu_t)(h_s-\mu_t)^T$, where $\mu_t$ is the local mean.
• Definition (Mahalanobis distance): a symmetric $C$-dependent distance between feature vectors,
$$d^2_C(z_t,z_s) = \tfrac{1}{2}(h_t-h_s)^T\big(C_t^{-1}+C_s^{-1}\big)(h_t-h_s).$$

Results

• Each histogram bin can be expressed as $h^j_t = \int_{z\in H_j}p(z;\theta)\,dz$, where the $H_j$ are the histogram bins; by relying on the independence of the processes, $p(z;\theta) = \int_{g(y,v)=z}f(y;\theta)\,q(v)\,dy\,dv$.
• Lemma: in the histogram domain, any stationary noise is a linear transformation.
• Assumption: the histograms are linear transformations of the pdf $p(z;\theta)$.
• Theorem [Talmon & Coifman, PNAS, 13']: the Mahalanobis distance is invariant under linear transformations, and thus, by the lemma, noise resilient; and it approximates the Euclidean distance between samples of the underlying process, i.e. $\|\theta_t-\theta_s\|^2 = d^2_C(z_t,z_s) + O(\|h_t-h_s\|^4)$.
• Here the process $h_t$ can be described as a (possibly nonlinear) bi-Lipschitz function of the underlying process, and we rely on a first order approximation of the measurement function, $h_t = J_t^T\theta_t + \epsilon_t$, where $J_t$ is the Jacobian, $J^{ji}_t = \partial h^j/\partial\theta^i$.

Relationship to information geometry

• Q: Does the structure of the measurements convey the information? A: The local densities of the measurements do, not the particular realizations.
• Information geometry [Amari & Nagaoka, 00']: use the Kullback-Leibler divergence approximated by the Fisher metric, $D(p(z_t;\theta)\,\|\,p(z_{t'};\theta)) = \delta\theta_t^T\,I_t\,\delta\theta_t$, where $I_t$ is the Fisher information matrix.
• EIG uses a similar data-driven metric: consider the features $l^j_t = \alpha_j\log(h^j_t)$. Theorem: $I_t = J_t^TJ_t$ (a matrix of the underlying manifold dimensionality) and $C_t = J_tJ_t^T$ (a matrix of the feature-vector dimensionality).

Anisotropic kernel

• Let $\{z_t\}$ be a set of $N$ measurements; for each measurement we compute the local histogram and covariance.
• Construct an $N\times N$ symmetric affinity matrix that approximates the Euclidean distances between samples of the underlying process; it is invariant to the measurement modality and resilient to noise.
• The corresponding Laplace operator can recover the underlying process.
• Compute
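A minimal sketch of the EIG distance computation, assuming numpy: given the histogram feature vectors $h_t$, estimate the local covariances over a trailing window and form the symmetric Mahalanobis distances $d_C^2$; an affinity kernel is then $\exp(-d_C^2/\varepsilon)$. Window length, regularization, and kernel scale are illustrative choices, not values from the talk.

```python
import numpy as np

def eig_mahalanobis(h, L=50):
    """Pairwise EIG (Mahalanobis) distances between feature vectors h[t] (shape N x m),
    using local covariances C_t = (1/L) sum (h_s - mu_t)(h_s - mu_t)^T over a
    trailing window (the first few windows are short and only crudely estimated)."""
    N, m = h.shape
    Cinv = np.empty((N, m, m))
    for t in range(N):
        w = h[max(0, t - L + 1): t + 1]
        mu = w.mean(axis=0)
        C = (w - mu).T @ (w - mu) / len(w) + 1e-8 * np.eye(m)   # regularized
        Cinv[t] = np.linalg.inv(C)
    D = np.empty((N, N))
    for s in range(N):
        diff = h - h[s]
        A = Cinv + Cinv[s]                  # C_t^{-1} + C_s^{-1} for all t at once
        D[:, s] = 0.5 * np.einsum('ti,tij,tj->t', diff, A, diff)
    return D

# usage: affinity kernel for the anisotropic Laplace operator
h = np.random.default_rng(0).normal(size=(300, 8))   # stand-in feature vectors
W = np.exp(-eig_mahalanobis(h) / 1.0)
```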