GSI2013 - Geometric Science of Information
A propos
The conference emphasises the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.
Current and ongoing uses of Information Geometry Manifolds in applied mathematics include: advanced signal/image/video processing, complex data modeling and analysis, information ranking and retrieval, coding, cognitive systems, optimal control, statistics on manifolds, machine learning, speech/sound recognition and natural language processing, all of which are also highly relevant to industry.
The conference will therefore be organized around priority themes and topics of mutual interest, with a mandate to:
- Provide an overview of the most recent state of the art
- Exchange mathematical information, knowledge and expertise in the area
- Identify research areas and applications for future collaboration
- Identify academic and industrial lab expertise for further collaboration
This conference is an interdisciplinary event federating skills from Geometry, Probability and Information Theory to address a broad range of topics. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.
Comités
Comité d'organisation
- Valérie Alidor - SEE, France https://www.see.asso.fr
- Catherine Moysan - Mines ParisTech, France
- Jean Vieille - SyntropicFactory http://www.syntropicfactory.com
Program chairs
- Jesus Angulo - Mines ParisTech, France http://cmm.ensmp.fr/~angulo/
- Frédéric Barbaresco - Thales, France http://www.thalesgroup.com
- Silvère Bonnabel - Mines ParisTech, France http://www.silvere-bonnabel.com/
- Arshia Cont - Ircam, France http://repmus.ircam.fr/arshia-cont
- Frank Nielsen - Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
Scientific committee
- Jesus Angulo - Mines ParisTech, France http://cmm.ensmp.fr/~angulo/
- Marc Arnaudon - Université de Bordeaux, France http://www.math.u-bordeaux1.fr/~marnaudo/
- Michael Aupetit - Qatar Computing Research Institute, Qatar http://michael.aupetit.free.fr/
- Frédéric Barbaresco - Thales, France http://www.thalesgroup.com
- Michèle Basseville - IRISA, France http://people.irisa.fr/Michele.Basseville/
- Silvère Bonnabel - Mines ParisTech, France http://www.silvere-bonnabel.com/
- Michel Boyom - Université de Montpellier, France http://www.i3m.univ-montp2.fr/
- Michel Broniatowski - Université Pierre et Marie Curie, France http://www.lsta.upmc.fr/Broniatowski/
- Paul Byande - Université de Montpellier, France
- Frédéric Chazal - INRIA, France http://geometrica.saclay.inria.fr/team/Fred.Chazal/
- Arshia Cont - Ircam, France http://repmus.ircam.fr/arshia-cont
- Arnaud Dessein - Ircam, France http://imtr.ircam.fr/imtr/Arnaud_Dessein
- Michel Deza - Ecole Normale Supérieure Paris, CNRS, France http://www.liga.ens.fr/~deza/
- Stanley Durrleman - INRIA, France https://who.rocq.inria.fr/Stanley.Durrleman/index.html
- Edwin Hancock - University of York http://www-users.cs.york.ac.uk/erh/
- Nicolas Le Bihan - Université de Grenoble, CNRS, France - University of Melbourne, Australia http://www.gipsa-lab.grenoble-inp.fr/~nicolas.le-bihan/
- Jonathan Manton - The University of Melbourne http://people.eng.unimelb.edu.au/jmanton/
- Jean-François Marcotorchino - Thales, France https://www.thalesgroup.com/
- Bertrand Maury - Université Paris Sud, France http://www.math.u-psud.fr/~maury/
- Ali Mohammad-Djafari - Supelec, CNRS, France http://djafari.free.fr/
- Frank Nielsen - Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
- Richard Nock - Université des Antilles et de la Guyane, France - NICTA, Australia http://www.univ-ag.fr/rnock/index.html
- Xavier Pennec - INRIA, France http://www-sop.inria.fr/members/Xavier.Pennec/
- Michel Petitjean - Université Paris Diderot, CNRS, France http://petitjeanmichel.free.fr/itoweb.petitjean.html
- Gabriel Peyre - Université Paris Dauphine, CNRS, France http://gpeyre.github.io/
- Olivier Schwander - Ecole Polytechnique, France http://www.lix.polytechnique.fr/~schwander/en/
- Rodolphe Sepulchre - Cambridge University, Department of Engineering, UK http://www-control.eng.cam.ac.uk/Main/RodolpheSepulchre
- Hichem Snoussi - Université de Technologie de Troyes, France http://h.snoussi.free.fr/
- Alain Trouvé - ENS Cachan, France http://atrouve.perso.math.cnrs.fr/
Sponsors et organisateurs
Documents
Synthèse (Frédéric Barbaresco)
OPENING SESSION
Geometric Science of Information GSI'13 — Frédéric Barbaresco, GSI'13 General Chair, President of the SEE SI2D Club (Signal, Image, Information & Decision), Société de l'électricité, de l'électronique et des technologies de l'information et de la communication.

SEE at a glance
- Meeting place for science, industry and society
- An officially recognised non-profit organisation
- About 2000 members and 5000 individuals involved
- Large participation from industry (~50%)
- 6 Technical Commissions and 12 Regional Groups
- Organizes conferences and seminars; initiates and attracts international conferences in France
- Institutional French member of IFAC and IFIP
- Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize, Thévenin Prize), grades and medals (Blondel, Ampère)
- Publishes 2 periodicals (REE, 3E.I) and 3 monographs each year
- On the web: http://www.see.asso.fr and the SEE LinkedIn group
- Past SEE presidents: Louis de Broglie, Paul Langevin, …

1883-2013, from SIE & SFE to SEE: 130 years of science
- 1881: Exposition Internationale d'Electricité
- 1883: SIE, Société Internationale des Electriciens
- 1886: SFE, Société Française des Electriciens
- 2013: SEE, 17 rue de l'Amiral Hamelin, 75783 Paris Cedex 16, http://www.see.asso.fr/

GSI'13 at a glance (sponsors: SMF/SEE)
- More than 150 international attendees and 100 scientific presentations over 3 days
- 3 keynote speakers:
  - Yann Ollivier (Paris-Sud Univ.): "Information-geometric optimization: the interest of information theory for discrete and continuous optimization"
  - Hirohiko Shima (Yamaguchi Univ.): "Geometry of Hessian Structures", dedicated to Prof. J.L. Koszul
  - Giovanni Pistone (Collegio Carlo Alberto): "Nonparametric Information Geometry"
- 1 guest speaker: Shun-ichi Amari (RIKEN Brain Science Institute): "Information Geometry and Its Applications: Survey"
- 4 social events: welcome cocktail at the Ecole des Mines, visit and concert at IRCAM, dinner at the Eiffel Tower, visit of the Mineralogy Museum at the Ecole des Mines

GSI'13: dedicated to the work of Jean-Louis Koszul
- Hessian geometry and J.L. Koszul's works: Hirohiko Shima's book "Geometry of Hessian Structures" (World Scientific Publishing, 2007) is dedicated to Jean-Louis Koszul; see also Shima's keynote talk at GSI'13 and the plenary session on Hessian Information Geometry chaired by Prof. M. Boyom.
- Selected works by J.L. Koszul:
  - « Sur la forme hermitienne canonique des espaces homogènes complexes », Canad. J. Math. 7, pp. 562-576, 1955
  - « Domaines bornés homogènes et orbites de groupes de transformations affines », Bull. Soc. Math. France 89, pp. 515-533, 1961
  - « Ouverts convexes homogènes des espaces affines », Math. Z. 79, pp. 254-259, 1962
  - « Variétés localement plates et convexité », Osaka J. Math. 2, pp. 285-290, 1965
  - « Déformations des variétés localement plates », Ann. Inst. Fourier 18, pp. 103-114, 1968

GSI'13 proceedings
- Published by Springer in the "Lecture Notes in Computer Science" series, LNCS vol. 8085 (879 pages), ISBN 978-3-642-40019-3
- http://www.springer.com/computer/image+processing/book/978-3-642-40019-3

GSI'13 topics
GSI'13 federates skills from Geometry, Probability and Information Theory around, among others:
- shape spaces (geometric statistics on manifolds and Lie groups, deformations in shape space, …)
- probability, optimization and algorithms on manifolds (structured matrix manifolds, structured data/information, …)
- relational and discrete metric spaces (graph metrics, distance geometry, relational analysis, …)
- computational and Hessian information geometry
- algebraic, infinite-dimensional and Banach information manifolds
- divergence geometry
- tensor-valued morphology
- optimal transport theory
- manifold and topology learning
- applications: audio processing, inverse problems and signal processing

GSI'13 program (Wednesday 28 to Friday 30 August)
- Daily schedule: 08h30-09h00 welcome/registration; 09h00-10h00 keynote talk (Amphi L108 Poincaré); 10h00-10h30 coffee break and poster session; 10h30-12h35 plenary session (Amphi L108 Poincaré); 12h35-13h30 lunch break at the Ecole des Mines with poster session (chair: Frédéric Barbaresco); 13h30-15h35 and 16h05-18h10 parallel oral sessions in amphis V107, V106A and V106B, separated by a coffee break. A SCILAB "GSI" Toolbox initiative session is held in Amphi V107.
- Plenary sessions: Hessian Information Geometry I (chair: Michel Boyom); Deformations in Shape Space (chair: Alain Trouvé); Probability on Manifolds (chair: Marc Arnaudon).
- Oral sessions, first afternoon slot: Relational Metric (chair: Jean-François Marcotorchino); Algebraic/Infinite-dimensional/Banach Information Manifolds (chair: Giovanni Pistone); Computational Information Geometry (chair: Frank Nielsen); Hessian Information Geometry II (chair: Frédéric Barbaresco); Tensor-Valued Mathematical Morphology (chair: Jesus Angulo); Geometry of Inverse Problems (chair: Ali Mohammad-Djafari); Geometric Statistics on Manifolds and Lie Groups (chair: Xavier Pennec); Machine/Manifold/Topology Learning (chairs: Michael Aupetit & Frédéric Chazal); Differential Geometry in Signal Processing (chair: Michel Berthier).
- Oral sessions, second afternoon slot: Discrete Metric Spaces (chairs: Michel Deza & Michel Petitjean); Optimal Transport Theory (chairs: Gabriel Peyré & Bertrand Maury); Geometry of Audio Processing (chairs: Arshia Cont & Arnaud Dessein); Optimization on Matrix Manifolds (chair: Silvère Bonnabel); Divergence Geometry & Ancillarity (chair: Michel Broniatowski); Information Geometry Manifolds (chair: Hichem Snoussi); Entropic Geometry (chair: Roger Balian); Algorithms on Manifolds (chair: Olivier Schwander); Computational Aspects of Information Geometry in Statistics (chair: Frank Critchley).
- Evening events: cocktail and guest talk by Shun-ichi Amari, 18h15-19h15 (closure at 19h30); concert at IRCAM; gala dinner at the Eiffel Tower; Mineralogy Museum visits.

SCILAB "GSI" Toolbox
To contribute to the development of Geometric Science of Information, a SCILAB "GSI" Toolbox project has been initiated: contributors are invited to write external modules that extend Scilab capabilities in specific fields of GSI (information geometry, geometry of structured matrices, statistics and optimization on manifolds, …). These modules provide new features and documentation to Scilab users. A website called the "ATOMS Portal" hosts external modules; if the author wishes, a module can be made available to Scilab users directly from the Scilab console via ATOMS (AuTomatic mOdules Management for Scilab).
- http://wiki.scilab.org/ATOMS
- External module sources can also be managed through the new Scilab Forge: http://forge.scilab.org/index.php/projects/

GSI'13 cocktail
- At the Ecole des Mines, on the terrace of the Hôtel de Vendôme (exit of the lunch-break area)
- Wednesday 28 August, 18h15-19h15 (the school closes at 19h30)

IRCAM visit & concert
- At IRCAM, 1 Place Igor-Stravinsky, 75004 Paris (Métro/RER: Châtelet-Les-Halles), Wednesday 28 August
- 19h30-20h00: first group (20 people) of the IRCAM lab tour
- 20h00-20h30: second group (20 people) of the IRCAM lab tour
- 20h30-21h30: presentation and demo/concert of an automatic improvisation system based on information geometry, with a saxophonist (70 people max)
- IRCAM is a unique research center in the heart of Paris bringing artists and scientists together to foster research and creativity. Besides being a joint research venture between the French Ministry of Culture, the CNRS, INRIA and Parisian universities, it is also home to artists dealing with technological innovation in their works; IRCAM has accompanied our efforts in Geometric Science of Information for some time now.
- The lab tour is followed by a live demonstration of automatic improvisation featuring jazz saxophone and computer. The tour takes place in two separate sessions, each for a limited number of people; participants are kindly asked to register at the registration desk for both the visit and the demonstration.

GSI'13 gala dinner
- At the Eiffel Tower, Champ de Mars, 5 Avenue Anatole France; restaurant "58 Tour Eiffel" on the first floor
- Thursday 29 August, 20h30-22h30; métro tickets and first-floor lift tickets are handed out at the GSI'13 registration desk

Visit of the Mineralogy Museum
- At the Ecole des Mines, Thursday 29 August afternoon, by groups; opening hours 10h00-12h00 / 13h30-17h00; more information at the GSI'13 registration desk

Ecole des Mines de Paris
- Founded in 1783: 230 years of science
- Special thanks to the "Mathématiques et Systèmes" department of Mines ParisTech: Pierre Rouchon, Silvère Bonnabel, Jesus Angulo; http://www.mines-paristech.eu/

GSI topics and Ecole des Mines alumni (Corps des Mines)
- 1869: François Massieu introduces the characteristic function in thermodynamics (Gibbs-Duhem potentials): "I show, in this memoir, that all the properties of a body can be deduced from a single function, which I call the characteristic function of this body." Massieu's work was later taken up by Poincaré (1908).
- Henri Poincaré introduces the characteristic function in probability (1912), and Paul Lévy then makes general use of it in probability theory.
- Roger Balian derives a metric for quantum states as a Hessian metric built from the von Neumann entropy (R. Balian, "Dissipation in many-body systems: a geometric approach based on information theory", 1986).

Enjoy "geometry"
- 1713-2013: 300th anniversary of the birth of Denis Diderot; the Diderot & d'Alembert Encyclopédie includes a Geometry chapter (Henri Poincaré portrait © Gallica). Enjoy all "geometries"!

Keynote 1 (09h00-10h00, Amphi L108 Poincaré): Yann Ollivier, Paris-Sud University, France — "Information-geometric optimization: the interest of information theory for discrete and continuous optimization".
Biography: Yann's research focuses on probabilistic models on structured objects, and more particularly on the interplay between probability and differential geometry. He is a research scientist at CNRS, currently in the Computer Science department of Paris-Sud Orsay University and previously in the Mathematics department of the École Normale Supérieure de Lyon (2004-2010). He received his PhD in Mathematics in 2003 under the supervision of M. Gromov and P. Pansu, and has held the accreditation to supervise research since 2009. http://www.yann-ollivier.org/rech/index
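The three "characteristic function" milestones cited on these slides can be written out explicitly. The formulas below are a reconstruction in standard modern notation (the slide's own equations were garbled in extraction, so the identifications are my reading, not a verbatim transcription):

```latex
% Massieu (1869): with inverse temperature \beta = 1/T, all thermodynamic
% properties of a body derive from a single potential, the Massieu
% characteristic function
\Phi(\beta) \;=\; \log Z(\beta) \;=\; \log \sum_i e^{-\beta E_i},
\qquad S \;=\; \Phi - \beta\,\partial_\beta \Phi .
% Poincare (1912) / Levy: the probabilistic analogue for a random variable X,
% the log-characteristic (cumulant-generating) function
\psi(z) \;=\; \log \mathbb{E}\!\left[ e^{z X} \right].
% Balian (1986): a Riemannian metric on quantum states as the Hessian of
% minus the von Neumann entropy S(\rho) = -\operatorname{Tr}(\rho \log \rho):
ds^2 \;=\; -\,d^2 S .
```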
POSTER SESSION (Frédéric Barbaresco)
Fast Polynomial Spline Approximation for Large Scattered Data Sets via L1 Minimization
Laurent Gajny, Éric Nyiri, Olivier Gibaru — Laurent.GAJNY@ensam.eu, Eric.NYIRI@ensam.eu, Olivier.GIBARU@ensam.eu

Context of the study
We propose a fast method of polynomial-spline approximation exhibiting no Gibbs phenomenon near abrupt changes in the shape of the data [Gib1899]. The results are applied to sensor data from a low-cost computer-vision system, the Leap Motion (see [Her13]), which captures finger motion very accurately; when the user's moves are too fast, however, holes may appear in the data. (Fig. 1: motion capture by the Leap Motion and industrial robot steering.)

Aims: using theoretical results, develop a spline approximation method
- with shape-preserving properties (the L1 norm),
- with prescribed error (the ε-control),
- fast enough for real-time use (the sliding-window process).

A brief state of the art
- L1 vs L2 regression line: the L1 norm is robust against outliers. Given n points (xi, yi), i = 1, …, n, the Lp regression-line problem is to find a line y = a*x + b* minimizing the sum of the |yi − (a xi + b)|^p. (Fig. 2: stability of the L1 regression line against outliers.)
- Cubic spline interpolation: given n points qi in Rd with associated parameters ui, find a curve γ such that γ(ui) = qi for all i and γ is a polynomial of degree at most 3 on each interval [ui, ui+1]. The solution set is infinite; the C2 cubic spline interpolation problem has a unique solution, characterized by a minimization property on the L2 norm of the second derivative — it is a least-squares (L2) method. (Fig. 3: a cubic spline is defined by its knots and associated first-derivative values. Fig. 4: Gibbs phenomenon with cubic spline interpolation.)
- L1 cubic spline interpolation, introduced in [Lav00], consists in minimizing the L1 norm of the second derivative over the set of C1 cubic splines interpolating the points qi. It produces no Gibbs phenomenon (Fig. 5), but it is a non-linear problem with non-unique solutions.
- Sliding window for L1 interpolation: the sliding-window process proposed in [NGA11] has linear complexity in the number of data points. A sequence of local L1 problems is solved, keeping only the derivative value at the middle point of each window; this yields an algebraic solution and manages the non-unicity of solutions. (Fig. 6: the sliding-window process. Fig. 7: global method (left) vs local method (right).)
- Lp smoothing splines minimize a weighted sum of a data-fidelity term and the Lp norm of the second derivative. The smoothing parameter is not easy to choose, and there is no control with respect to the initial data points. (Fig. 8: overshoot with the L2 smoothing spline.)

The proposed method
- We look for spline solutions that do not deviate from the initial data by more than a given tolerance (Fig. 9).
- Three-point algorithm with algebraic resolution on three points. When condition (3) is satisfied, the solution set may be infinite (Fig. 10: the two cases of Proposition 1). To compute a relevant solution, we extend the window with supplementary neighbours and solve a weighted problem: for n = 5, we give more importance to the three middle points by choosing w2 = w3 = w4 > w1 = w5 = 1. (Fig. 11: the ε-controlled L1 regression-line step in the three-point algorithm.)

Behavior on a Heaviside data set
Fig. 12: approximation of a Heaviside data set (left) and zoom on the top part of the jump (right).

Behavior on noisy data sets
After applying the algorithm to an initial noisy data set, the spline solution is not smooth; the method can, however, be iterated: at step k+1, the data points are the approximation points computed at step k. This process produces a smoothing effect while preserving jumps (Fig. 13: approximation over noisy data sets, iterations 1, 3 and 6). We are currently studying the proposed approach for the problem of approximation of functions.

References
[Her13] F. Hernoux, R. Béarée, L. Gajny, E. Nyiri, J. Bancalin, O. Gibaru, Leap Motion pour la capture de mouvement 3D par spline L1. Application à la robotique. GTMG 2013, Marseille, 27-28 mars 2013.
[Gib1899] J. Willard Gibbs, letter to the editor, Nature 59 (April 27, 1899), 606.
[Lav00] J.E. Lavery, Univariate cubic Lp splines and shape-preserving multiscale interpolation by univariate cubic L1 splines. CAGD 17 (2000), 319-336.
[Lav00bis] J.E. Lavery, Shape-preserving, multiscale fitting of univariate data by cubic L1 smoothing splines. Comput. Aided Geom. Design 17 (2000), 715-727.
[NGA11] E. Nyiri, O. Gibaru, P. Auquiert, Fast L1 Ck polynomial spline interpolation algorithm with shape-preserving properties. CAGD 28(1) (2011), 65-74.
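The poster's first building block, the L1 (least-absolute-deviations) regression line, can be sketched in a few lines. The snippet below solves it by iteratively reweighted least squares, which is one common approach; the poster itself uses an algebraic resolution, so this is an illustrative substitute, not the authors' algorithm:

```python
import numpy as np

def l1_regression_line(x, y, iters=100, eps=1e-8):
    """Fit y ~ a*x + b by minimizing sum |y_i - a*x_i - b| (LAD).

    Iteratively reweighted least squares: each pass solves a weighted L2
    problem with weights 1/|residual|, which drives the fit toward the
    L1 solution.
    """
    A = np.column_stack([x, np.ones_like(x)])
    a, b = np.linalg.lstsq(A, y, rcond=None)[0]      # plain L2 start
    for _ in range(iters):
        r = np.abs(y - a * x - b)
        w = np.sqrt(1.0 / np.maximum(r, eps))        # LAD reweighting
        a, b = np.linalg.lstsq(w[:, None] * A, w * y, rcond=None)[0]
    return a, b

if __name__ == "__main__":
    x = np.arange(20.0)
    y = 2.0 * x + 1.0
    y[5] += 50.0                                     # one large outlier
    a1, b1 = l1_regression_line(x, y)
    a2, b2 = np.linalg.lstsq(np.column_stack([x, np.ones_like(x)]), y,
                             rcond=None)[0]          # plain L2 fit
    print(f"L1 fit: a={a1:.3f}, b={b1:.3f}; L2 fit: a={a2:.3f}, b={b2:.3f}")
```

On this data the L1 line stays essentially on y = 2x + 1 while the L2 line is pulled toward the outlier, which is exactly the robustness property (Fig. 2) the poster relies on.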
Target Detection in Non-Stationary Clutter Background and Riemannian Geometry
Haiyan Fan, 2013.05.21
Contents: 1. Background; 2. Methodology & technology road; 3. Experiment program & results; 4. Conclusions.

Background
- The emergence of Riemannian-geometry approaches opens a new era of statistical signal processing, while non-stationary signal detection is gradually gaining importance. Many signals met today are non-stationary: non-Gaussian sea clutter is essentially non-stationary, as are the ultrasound Doppler signals obtained from physiological flows.
- A Riemannian manifold gives a more natural description of signal structure: measured signals often belong to manifolds that are not vector spaces, so processing them in flat Euclidean space is imprecise, whereas Riemannian manifolds satisfy the invariance requirements needed to build statistical tools on transformation groups and homogeneous manifolds without paradoxes.
- Barbaresco et al. have done much work applying Riemannian metrics to target detection in radar signals.

Existing method: the RG approach proposed by Barbaresco
- Step 1: autoregressive-coefficient parameterization — a regularized Burg algorithm parameterizes the signal, mapping it into a complex Riemannian manifold identified by the autoregressive (reflection) coefficients.
- Steps 2-3: Riemannian metric, Riemannian distance and geodesics on this manifold.
- Step 4: Riemannian median.
- Step 5: target detection. The detection principle: if a location has a large Riemannian distance from its Riemannian median, a target is assumed to be present at that location.

Methodology & technology road: RG+SLAR
- Building on Barbaresco's work, we extend the Riemannian-geometry (RG) method to target detection in non-stationary signals by combining it with a smooth-prior long AR model (SLAR): we inherit Barbaresco's RG technical road while better accommodating the non-stationarity of the signal.
- SLAR: pseudo-stationary spectral analysis of non-stationary signals in short analysis windows. A large model order is fitted to a relatively short analysis window, with a smoothness constraint to overcome the ill-posedness and spurious peaks brought by the high-order model; this avoids the underestimation of model order that order-selection criteria produce.
- RG approach: the observation flow is mapped to a complex Riemannian manifold by reflection-coefficient parameterization with the SLAR model. For each cell under test, the Riemannian median of the surrounding reference cells (excluding the guard cells) is computed, and the Riemannian distance between the cell under test and this median is compared to a threshold set by empirical value.

Experiment program
- A typical instance of target detection in a non-stationary clutter background is radar detection in non-Gaussian sea clutter; the experiments therefore use this problem to demonstrate the performance of the proposed method.
- Numeric experiments: simulated examples validate the RG+SLAR method by comparison with DFT-based Doppler filtering and with the RG approach based on the regularized Burg algorithm (RG+ReBurg).
- Real target detection: RG+SLAR is applied to real target detection within sea clutter using McMaster IPIX radar data.

Numeric experiment results
- Table 1 — simulated radar parameters: carrier frequency 10 GHz; bandwidth 10 MHz; pulse repetition frequency 10 kHz; unambiguous range interval 15 km; unambiguous velocity interval 150 m/s.
- Table 2 — target parameters (range / SNR / Rel_RCS / velocity): 2 km / -3 dB / -26.7 dB / 60 m/s; 3.8 km / 5 dB / -7.55 dB / 30 m/s; 4.4 km / 10 dB / 0 dB / -30 m/s; 4.4 km / 7 dB / -3 dB / -60 m/s. (SNR = signal-to-noise ratio; Rel_RCS = relative RCS, i.e. RCS/max(RCS) in dB, where max(RCS) is the maximum RCS of the 4 targets.)
- Figures 1-2: range-velocity maps of the clutter-cancelled data obtained from SLAR-modeled spectral estimation (the velocity axis is linearly mapped from frequency using the speed of light and the carrier frequency) and from the Riemannian median of the clutter-cancelled data based on the SLAR reflection-coefficient parameterization.
- Table 3 — detected target parameters with RG+SLAR (range / Rel_RCS / velocity): 2 km / -30.8 dB / 61.81 m/s; 3.8 km / -12.5 dB / 31.39 m/s; 4.4 km / 0 dB / -29.89 m/s; 4.4 km / -3.38 dB / -62.05 m/s. (Fig. 3: range bins with targets using RG+SLAR.)
- Figures 5-7 compare the baselines: range-velocity maps of the clutter-cancelled data using the regularized Burg algorithm and Doppler filtering in slow time, the corresponding ambient (Riemannian-median) estimations, and the range peaks detected by RG+ReBurg and by Doppler filtering.

Real target detection results
- The measured data is file 19931118_023604_stare C0000.cdf, collected by the McMaster IPIX radar.
- Table 4 — IPIX radar parameters: wind condition 0-60 km/h, gusts up to 90 km/h; wave condition 0.8-3.8 m, wave peak 5.5 m; antenna azimuth 170.2606°, elevation 359.5605°, beam width 0.9°, antenna gain 45.7 dB; unambiguous velocity 7.9872 m/s; range resolution 15 m; carrier frequency 9.39 GHz; PRF 1 kHz.
- The average target-to-clutter ratio varies in the range 0-6 dB; only one weak static target with small fluctuation is present, in range bin 8 (primary target bin), and the target may also be visible in the neighbouring range bins 7-10 (secondary target bins).
- Figure 8: range-velocity contour of the pre-processed data and its ambient estimation based on the SLAR reflection-coefficient parameterization. Figure 9: range bins with target — the primary target appears in range bin 8, and the secondary target region spreads over bins 7-9. Figure 10: velocity detection for the primary range bin 8.

Conclusions
A. Numeric and real target-detection experiments show that the proposed RG+SLAR method can attenuate the contamination brought by non-stationary clutter disturbance.
B. The Riemannian-geometry-based detection statistic achieves higher target-detection accuracy than DFT-based Doppler filtering.
C. Combining the SLAR model with Riemannian geometry achieves precise measurement of target location and velocity for non-stationary signals.
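The detection principle of Step 5 — flag a cell whose Riemannian distance to the median of its reference cells exceeds a threshold — can be sketched with a toy one-coefficient model. Here each cell is summarized by a single complex reflection coefficient in the unit disk, the distance is the standard Poincaré-disk (hyperbolic) metric (the metric used for reflection coefficients in this line of work is of this hyperbolic type, up to scale factors), and the "median" is approximated by the reference cell minimizing the sum of distances to the others. These simplifications are mine, not the paper's full construction:

```python
import math

def disk_distance(m1, m2):
    """Poincare-disk (hyperbolic) distance between two points with |m| < 1."""
    delta = abs(m1 - m2) / abs(1 - m1 * m2.conjugate())
    return 0.5 * math.log((1 + delta) / (1 - delta))

def discrete_median(points):
    """Point of the set minimizing the sum of disk distances to the others
    (a cheap stand-in for the true Riemannian median)."""
    return min(points, key=lambda p: sum(disk_distance(p, q) for q in points))

def detect(cell_under_test, reference_cells, threshold):
    """CFAR-like test: distance from the cell under test to the median."""
    med = discrete_median(reference_cells)
    return disk_distance(cell_under_test, med) > threshold

if __name__ == "__main__":
    # Homogeneous clutter cluster vs. one cell far from it on the disk.
    clutter = [0.60 + 0.05j, 0.62 - 0.02j, 0.58 + 0.01j, 0.61 + 0.03j]
    print(detect(-0.70 + 0.20j, clutter, threshold=1.0))  # -> True (target)
    print(detect(0.59 + 0.02j, clutter, threshold=1.0))   # -> False (clutter)
```

The point of the hyperbolic metric is that it respects the geometry of the reflection-coefficient space near the boundary |m| → 1, where the Euclidean distance badly underestimates spectral differences.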
Visual Point Set Processing with Lattice Structures : Application to Parsimonious Representations of Digital Histopathology Images Nicolas Lom´enie Universit´e Paris Descartes, LIPADE, SIP Group nicolas.lomenie@parisdescartes.fr Digital tissue images are too big to be processed with traditional image processing pipelines. We resort to the nuclear architecture within the tissue to explore such big images with geometrical and topological representations based on Delaunay triangulations of seed points. Then, we relate this representation to the parsimonious paradigm. Finally, we develop speciﬁc mathematical morphology operators to analyze any point set and contribute to the exploration of these huge medical images. Preliminary results proved good performance for both focusing on areas of interest and discrimination between slightly but signiﬁcantly varying nuclear geometric conﬁgurations. Keywords : Digital Histopathology ; Point Set Processing ; Mathematical Morphology Sparsity and Digital Histopathology The rationale : 1. Shape as a geometric visual point set vs. an assembly of radiometric pixels ; 2. Image Analysis/Pattern Recognition Issues over Geometric and hence Sparse repre- sentations ; 3. Versatile nature of digital high-content histopathological images : staining procedure, biopsy techniques → structural analysis. The statement : Promoting new representations for the exploration of Whole Slide Images (WSIs) by using the recently acknowledged sparsity paradigm based on geometric representations. In [Chen et al. 2001], Chen et al. relates Huo’s ﬁndings about general image analysis : ”In one experiment, Huo analyzed a digitized image and found that the humanly interpretable information was really carried by the edgelet component of the de- composition. This surprising ﬁnding shows that, in a certain sense, images are not made of wavelets, but instead, the perceptually important components of the image are carried by edgelets. 
This contradicts the frequent claim that wavelets are the optimal basis for image representation, which may stimulate discussion.” We propose a sparse representation of a WSI based on a codebook of representative cells that are translated over the seed points detected by a low level processing operator as illustrated below. We use a semantic sparse representation relying on the most robustly detected signiﬁcant tissue elements : the nuclei. WSInuclear(x, y) = (i,j)∈S δi,j(x, y) ∗ Cell Atom where S is a geometric point set corresponding to the nucleus seeds and Cell Atom is an atomic cell element image in the speciﬁc case of a 1-cell dictionary. S can be considered as a sparse representation of a WSI according to the given deﬁnition of a s-sparse vector x ∈ ℜd as given in [Needell & Ward 2012] : ||x||0 = |supp(x)| ≤ d << s (a) (b) (c) (d) (e) (f) (g) (h) (i) Sparse representation of a WSI illustrated with a tubule/gland structure ; (a) based on the (b) 1- atomic cell dictionary and the sparse represen- tation in (c) as a point set binary matrix S1 ; (d) Reconstruction of the tubule by convolution with a point set S1 obtained with a speciﬁc seed extractor ; (e) Superimposed with the gland ; (f) Reconstruction of the tubule by convolution with a point set S2 obtained with another speciﬁc seed extractor ; (g) superimposed with the gland struc- ture ; (h)(i) Sparse representations over a 1024 × 1024 sub-image of a more complex view out of a WSI (about 50 000 × 70 000 pixels size). In the ﬁeld of computational pathology, graph-based representations and geometric science of information are gaining momentum [Doyle et al. 2008]. R´ef´erences [Chen et al. 2001] Chen SS, Donoho DL, Saunders MA. (2001) Atomic Decomposition Basis Pursuit, SIAM Review, 3(1), 129-159. [Doyle et al. 2008] Doyle, S., Agner, S., Madabhushi, A., Feldman, M. and Tomaszewski, J. (2008). 
Automated Grading of Breast Cancer Histopathology Using Spectral Clustering with Textural and Architectural Image Features, 5th IEEE International Symposium on Biomedical Imaging, pp. 496-499.
[Needell & Ward 2012] Needell, D., Ward, R. (2012). Stable image reconstruction using total variation minimization. http://arxiv.org/abs/1202

Point Set Processing
Point set processing in the manner of image processing is gaining momentum in the computer graphics community [Rusu & Cousins 2011], for example with the Point Cloud Library (PCL: http://www.pointclouds.org), inspired by the GNU Image Manipulation Program (GIMP: http://www.gimp.org). At the same time, in the field of applied mathematics, a new trend consists in adapting mature image analysis algorithms working on regular grids to parsimonious representations such as graphs of interest points or superpixels [Ta et al. 2009]. Applying mathematical morphology to graphs was first suggested in [Heijmans et al. 1992] but never really came up with tractable applications. Nevertheless, the idea is emerging again in recent works by the mathematical morphology pioneers [Cousty et al. 2009]; it was also related to the concept of α-objects in [Loménie & Stamon 2008], based on seminal ideas in [Loménie et al. 2000], and then applied to the modeling of spatial relations and histopathology in [Loménie & Racoceanu 2012].

Lattice Structures for Point Set Processing
We refer the reader to [Loménie & Stamon 2011] for a detailed presentation of the mathematical morphology framework operating on point sets. Formally, it is enough to define a lattice structure operating on unorganized point sets or, more precisely, on a tessellation of the space that embeds any point set S in a neighborhood system. For any point set S ⊂ R², there exists a Delaunay triangulation Del(S) defining the aforementioned topology of the workspace. This mesh plays the role that the regular grid plays for a radiometric image.
We then define the complete lattice algebraic structure L = (M(Del), ≤), where M(Del) is the set of meshes defined on Del, that is, the set of mappings from a triangle T in Del(S) to a value φ_T in R: M ∈ M(Del) = {(T, φ_T)}_{T∈Del}. The partial ordering relation ≤ is defined as follows:

∀M1, M2 ∈ M(Del), M1 ≤ M2 ⟺ ∀T ∈ Del, φ¹_T ≤ φ²_T

where φ_T is a positive measure of the k-simplex T in Del(S) related to the size, shape, area or visibility of the triangle [Loménie & Racoceanu 2012]. The infimum and supremum operators are defined as follows:

∀M1, M2 ∈ M(Del),
inf(M1, M2) = {(T, min(φ¹_T, φ²_T))}_{T∈Del}
sup(M1, M2) = {(T, max(φ¹_T, φ²_T))}_{T∈Del}

Then, given the basic definition of an erosion e and of an involution (complement) c,

∀M ∈ M(Del), e(M) = {(T, e_T)}_{T∈Del} and M^c = {(T, 1 − φ_T)}_{T∈Del},

we inherit the infinite, theoretically well-founded spectrum of operators from mathematical morphology.

[Figure, left: the pyramid of structural operators we can obtain, ranging from the fundamental low-level erosion operator to the semantic high-level Ductal Carcinoma In Situ (DCIS) characterization and the representation of spatial relationships such as 'between'.]

Structural Analysis for Digital Histopathology
[Figure, focusing operators: (top) focusing on a tumorous area at magnification ×1 of the WSI; (bottom) focusing on a small part of the WSI at ×20.]
[Figure, pattern recognition operators: characterizing a DCIS structure with a structural bio-code '110' based on our operators, with a precise (Method 1) and a coarse (Method 2) seed nuclei extractor, at magnification ×40.]
New results on a small database:

Type                    Nb samples   Correct biocodes
                                     Method 1   Method 2
DCIS(S) = '110'         10           9          8
DCISpost(S) = '110'     10           9          9
Tubule(S) = '101'       10           10         10

Digital histopathology and geometric information science: great challenges to tackle in the coming decade [GE Healthcare 2012].

Références
[Cousty et al. 2009] Cousty, J., Najman, L., and Serra, J. (2009).
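The lattice operators above act on triangle values through the mesh neighborhood system. A minimal sketch on a toy mesh, where the adjacency lists are an illustrative stand-in for the Delaunay neighborhood structure Del(S) (this is not the authors' actual operator set):

```python
import numpy as np

# Triangle values phi_T on a toy mesh of 5 triangles; each entry of
# `neighbors` lists the triangles adjacent to a given triangle.
phi = np.array([0.9, 0.8, 0.1, 0.7, 0.6])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def erode(phi, neighbors):
    # Mesh erosion: e_T = min of phi over T and its adjacent triangles.
    return np.array([min(phi[t], *(phi[n] for n in neighbors[t]))
                     for t in range(len(phi))])

def dilate(phi, neighbors):
    # Adjoint dilation: max over the same neighborhood.
    return np.array([max(phi[t], *(phi[n] for n in neighbors[t]))
                     for t in range(len(phi))])

opened = dilate(erode(phi, neighbors), neighbors)  # morphological opening
complement = 1.0 - phi                             # involution M^c
```

As on a regular grid, composing these two adjoint operators yields openings, closings and the rest of the morphological pyramid, here driven by the mesh topology instead of pixel neighborhoods.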
Some morphological operators in graph spaces, Mathematical Morphology and Its Application to Signal and Image Processing, Lecture Notes in Computer Science, Springer, 5720:149-160.
[GE Healthcare 2012] Pathology Innovation Centre of Excellence (PICOE). Digital Histopathology: A New Frontier in Canadian Healthcare. White Paper, January 2012. GE Healthcare. http://www.gehealthcare.com/canada/it/downloads/digitalpathology/GE_PICOE_Digital_Pathology_A_New_Frontier_in_Canadian_Healthcare.pdf. Accessed December 2012.
[Heijmans et al. 1992] Heijmans, H., Nacken, P., Toet, A., & Vincent, L. (1992). Graph Morphology. Journal of Visual Communication and Image Representation, 3(1):24-38.
[Loménie et al. 2000] Loménie, N., Gallo, L., Cambou, N. & Stamon, G. (2000). Morphological Operations on Delaunay Triangulations. International Conference on Pattern Recognition, 556-559.
[Loménie & Stamon 2008] Loménie, N. and Stamon, G. (2008). Morphological Mesh Filtering and alpha-objects, Pattern Recognition Letters, 29(10):1571-1579.
[Loménie & Stamon 2011] Loménie, N. and Stamon, G. (2011). Point Set Analysis, Advances in Imaging and Electron Physics, Peter W. Hawkes (ed.), San Diego: Academic Press, vol. 167, pp. 255-294.
[Loménie & Racoceanu 2012] Loménie, N. and Racoceanu, D. (2012). Point set morphological filtering and semantic spatial configuration modeling: application to microscopic image and bio-structure analysis, Pattern Recognition, 45(8):2894-2911.
[Rusu & Cousins 2011] Rusu, R.B. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL), IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China.
[Ta et al. 2009] Ta, V.T., Lezoray, O., Elmoataz, A. and Schupp, S. (2009). Graph-based Tools for Microscopic Cellular Image Segmentation, Pattern Recognition, Special Issue on Digital Image Processing and Pattern Recognition Techniques for the Detection of Cancer, 42(6):1113-1125.
Acknowledgement: This work is part of the SPIRIT project, program JCJC 2011 (ref. ANR-11-JS02-008-01), and of the MICO project, program TecSan 2010 (ref. ANR-10-TECS-015). A free demonstrator can be downloaded as an ImageJ plugin at http://www.math-info.univ-paris5.fr/~lomn/Data/MorphoMesh.zip.
Guest speech (Shun-Ichi Amari)
Information Geometry and Its Applications
Shun-Ichi Amari, RIKEN Brain Science Institute

Applications of information geometry: statistical inference; machine learning and AI; convex programming; signal processing (ICA); information theory and systems theory; quantum information geometry; higher-order asymptotics (the Cramér-Rao bound and its higher-order corrections).

Linear regression and semiparametrics. For paired observations (x_1, y_1), ..., (x_n, y_n) observed with noise in both variables, naive least squares is biased, and since each observation carries its own nuisance parameter, maximum likelihood fails as well (the Neyman-Scott problem). In the semiparametric statistical model p(x, y; θ, ξ, z), θ is the parameter of interest, ξ a nuisance parameter and z functional degrees of freedom. Inference relies on orthogonalized and projected scores and on parallel transports of scores, leading to estimating functions f(x, y; θ) whose expectation vanishes for every value of the nuisance.

ICA: independent component analysis. Mixtures x = As of n independent sources s are unmixed by y = Wx. On the manifold S = {p(y)}, with the submanifold of factorized distributions I = {q_1(y_1) q_2(y_2) ... q_n(y_n)}, the learning criterion is l(W) = KL[p(y; W) : q(y)]; the information geometry of ICA yields the natural gradient algorithm and estimating functions, together with an analysis of stability and efficiency.

Compressed sensing: for a k-sparse signal, when does the L1-minimizing solution coincide with the L0-minimizing one? Recovery is possible from a number of measurements m of the order of k log(N/k).

Applications to machine learning: stochastic reasoning (belief propagation), boosting, support vector machines, neural networks, clustering, optimization. Stochastic reasoning over joint distributions p(x, y, z, r, s) of ±1 variables, with Boltzmann machines, spin glasses and neural networks, and with turbo codes and LDPC codes, is analyzed through the information geometry of exponential-family submanifolds M_0, M_1, ..., M_L.

Boosting: a combination of weak learners, f(x) = sgn Σ_t α_t h(x, u_t), trained on D = {(x_1, y_1), ..., (x_N, y_N)}, y_i = ±1; each round reweights the data, Q_{t+1} ∝ Q_t exp(−α_t y h_t(x)), and generalization is analyzed as an iterative minimization of a divergence D(P, Q_t).

Neural firing: higher-order correlations and synchronous firing of binary neurons x_i ∈ {0, 1}, with firing rates r_i = E[x_i] and covariances v_ij = Cov[x_i, x_j], admit an orthogonal decomposition of the joint probability p(x_1, ..., x_n). For two neurons with probabilities p_00, p_10, p_01, p_11, the rates r_1 = p_10 + p_11 and r_2 = p_01 + p_11 together with the log odds-ratio r_12 = log(p_11 p_00 / (p_10 p_01)) form orthogonal coordinates separating firing rates from correlations.

Multilayer perceptrons: y = Σ_i v_i f(w_i · x) + Gaussian noise, with parameters (w_1, ..., w_m; v_1, ..., v_m). The neuromanifold of perceptrons, viewed in the space of functions, has singularities (where hidden units coincide), which affect learning dynamics.

Total Bregman divergence and clustering. The total Bregman divergence TD(x, y) is rotationally invariant (conformal geometry). The t-center of a cluster E = {x_1, ..., x_m}, argmin_x Σ_i TD(x, x_i), has a bounded influence function and is therefore robust to outliers, unlike the ordinary K-means center argmin_x Σ_i D(x, x_i).

Linear and convex cone programming: interior-point methods maximize c·x subject to Ax ≤ b via the log-barrier Σ_i log(b_i − A_i·x); convex cone programs (over positive semidefinite matrices) are treated through convex potential functions and a dual geodesic approach.
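The natural gradient ICA rule mentioned above can be sketched as follows; the tanh score function, step size and Laplacian toy sources are illustrative assumptions (a common choice for super-Gaussian sources), not Amari's exact experimental setup:

```python
import numpy as np

def natural_gradient_ica(x, n_iter=2000, eta=0.01):
    # Unmix x = A s by y = W x using the natural gradient update
    #   W <- W + eta * (I - E[phi(y) y^T]) W,   phi(y) = tanh(y),
    # where the expectation is estimated over the whole batch.
    n = x.shape[0]
    W = np.eye(n)
    for _ in range(n_iter):
        y = W @ x
        g = np.eye(n) - (np.tanh(y) @ y.T) / x.shape[1]
        W = W + eta * g @ W
    return W

# hypothetical toy data: two independent Laplacian sources, mixed linearly
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 2000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
W = natural_gradient_ica(A @ s)
```

At the equilibrium of this update, E[tanh(y) yᵀ] = I on the batch, which is exactly the estimating-function condition for independence under this score.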
ORAL SESSION 1 Geometric Statistics on manifolds and Lie groups (Xavier Pennec)
10/15/13
A Subspace Learning of Dynamics on a Shape Manifold: A Generative Modeling Approach
Sheng Yi* and H. Krim, VISSTA, ECE Dept., NCSU, Raleigh NC 27695 (*GE Research Center, NY). Thanks to AFOSR.

Outline: motivation; statement of the problem; highlight of key issues and brief review; proposed model and solution; experiments.

Problem Statement: given a process X(t) on a shape manifold, find a low-dimensional process Z(t) in a subspace that preserves the geometrical properties of the data in the original space.

Related Work
- Point-wise subspace learning: PCA, MDS, LLE, ISOMAP, Hessian LLE, Laplacian eigenmaps, diffusion maps, LTSA [T. Wittman, "Manifold Learning Techniques: So Which is the Best?", UCLA].
- Curve-wise subspace learning: Whitney embedding [D. Aouada and H. Krim, IEEE Trans. IP, 2010].
- Shape manifolds:
  - Kendall's shape space, based on landmarks;
  - Klassen et al.'s shape space: functional representation, concise description of the tangent space, fast implementation;
  - Michor & Mumford's shape space: focus on parameterization, complex description of the tangent space, heavy computation;
  - Trouvé and Younes' diffeomorphism approach.

Contribution Summary
- The proposed subspace learning is invertible: original sequence → subspace sequence → reconstructed sequence.
Contribution Summary (continued)
- The parallel transport of representative frames defined by a metric on the shape manifold preserves curvatures in the subspace.
- Ability to apply ambient space calculus instead of relying essentially on manifold calculus.

Shape Representation (from curve to shape, [Klassen et al.])
A simple closed planar curve α(s) = (x(s), y(s)) ∈ R² is represented by its direction function θ(s), with ∂α/∂s = (cos θ(s), sin θ(s)), modulo similarity transformations. Closure requires ∫₀^{2π} cos θ(s) ds = 0 and ∫₀^{2π} sin θ(s) ds = 0, and the rotation is fixed by (1/2π) ∫₀^{2π} θ(s) ds = π.

Dynamic Modeling on a Manifold
The core idea: a process X_t on the manifold M is driven by a Euclidean process Z(t) in R^{dim(M)} through a moving frame V_i(X_t) ∈ T_{X_t}M:

dX_t = Σ_{i=1}^{dim(M)} V_i(X_t) dZ_i(t) ∈ T_{X_t}M

Parallel transport: the frame is parallel-transported along the curve from X_0 to X_1 on M, so that tangent directions at different points of the curve are compared consistently [Yi et al., IEEE IP, 2012]. The frame is then adaptively selected so as to represent the process in a lower-dimensional space: PCA is applied to the tangent vectors parallel-transported to a common tangent space, yielding a driving process dZ_i(t) in R^{dim(M)}.

Formulation of Curves on the Shape Manifold
In the original space: a shape on the shape manifold, vectors spanning the tangent space of the shape manifold [Yi et al., IEEE IP, 2012], and a Euclidean driving process. In a subspace: vectors spanning the subspace and a driving process in that subspace.

Core Idea
- Restrict the selection of V to parallel frames on the manifold. Advantages of a parallel moving frame: angles between tangent vectors are preserved, so with the same sampling scheme curvatures are preserved as well; and given the initial frame and initial location on the manifold, the original curve can be reconstructed.
- Find an L²-optimal V; the Euclidean distance can be used here because the comparison takes place within a tangent space of the manifold.
- Given a parallel transport on the shape manifold, under some mild assumptions the solution is obtained as a PCA.

Parallel Transport Flow Chart: by the definition of parallel transport, a discrete approximation
of the derivative is used; since the tangent space of the shape manifold is normal to the constraint gradients b1, b2, b3, the transport reduces to a linear system.

Experiments
Data:
- Moshe Blank et al., ICCV 2005, http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html
- Kimia Shape Database: Sharvit, D. et al., Content-Based Access of Image and Video Libraries, 1998.
Actions: walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack, skip.

[Figures: reconstruction experiment comparing PCA in Euclidean space with the proposed method; further reconstructions and embeddings; experiments on curvature preservation; generative reconstruction.]

Conclusions
- A low-dimensional embedding of a parallelly transported shape flow is proposed.
- A learning-based inference framework is achieved.
- A generative model for various shape-based activities is obtained.
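On a general shape manifold the parallel transport above requires solving a linear system along the curve, but the idea can be illustrated in closed form on the unit sphere, a toy stand-in for the shape manifold (the formula below is the standard transport along a great-circle geodesic, not the authors' shape-space implementation):

```python
import numpy as np

def log_map(p, q):
    # Log map on the unit sphere: tangent vector at p pointing toward q.
    u = q - np.dot(p, q) * p
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    n = np.linalg.norm(u)
    return theta * u / n if n > 1e-12 else np.zeros_like(p)

def parallel_transport(p, q, v):
    # Transport tangent vector v from T_p S^2 to T_q S^2 along the geodesic:
    # the component along the geodesic direction rotates with it, the
    # orthogonal component is carried unchanged.
    u = log_map(p, q)
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return v
    e = u / theta                                  # geodesic direction at p
    e_q = -np.sin(theta) * p + np.cos(theta) * e   # same direction seen at q
    return v - np.dot(v, e) * e + np.dot(v, e) * e_q

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0])   # tangent vector at p
w = parallel_transport(p, q, v)
```

Transport preserves norms and angles between tangent vectors, which is precisely the property the authors exploit to preserve curvature in the subspace.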
Xavier Pennec, Asclepios team, INRIA Sophia-Antipolis – Méditerranée, France
Bi-invariant Means on Lie Groups with Cartan-Schouten Connections
GSI, August 2013

Computational Anatomy: design mathematical methods and algorithms to model and analyze the anatomy. Statistics of organ shapes across subjects in species, populations and diseases (mean shape, shape variability/covariance); modeling of organ development across time (heart-beat, growth, ageing); predictive (vs. descriptive) models of evolution; correlation with clinical variables.

Statistical analysis of geometric features from noisy geometric measures: tensors and covariance matrices; curves and tracts; surfaces; transformations (rigid, affine, locally affine, diffeomorphisms). Goal: deal with noise consistently on these non-Euclidean manifolds through a consistent statistical (and computing) framework.

Statistical Analysis of the Scoliotic Spine
Data: 307 scoliotic patients from Montreal's St-Justine Hospital; 3D geometry from multi-planar X-rays; articulated model with 17 relative poses of successive vertebrae. Statistics: the main translation variability is axial (growth?); the main rotational variability is around the anterior-posterior axis; the first 4 variation modes are related to King's classes [J. Boisvert et al., ISBI'06, AMDO'06 and IEEE TMI 27(4), 2008].

Morphometry through Deformations
Measure of deformation [D'Arcy Thompson 1917, Grenander & Miller]: an observation is a "random" deformation of a reference template; the deterministic template encodes anatomical invariants [atlas ~ mean], while the random deformations encode geometrical variability [covariance matrix]. A hierarchical deformation model uses varying deformation atoms for each subject at the subject level, and a spatial structure of the anatomy common to all subjects at the population level (Aff(3)-valued trees) [Seiler, Pennec, Reyes, Medical Image Analysis 16(7):1371-1384, 2012].

Outline: Riemannian frameworks on Lie groups; Lie groups as affine connection spaces; a glimpse of applications in infinite dimensions; conclusion and challenges.

Riemannian geometry is a powerful structure to build consistent statistical computing algorithms:
- shape spaces and directional statistics [Kendall StatSci 89, Small 96, Dryden & Mardia 98];
- numerical integration, dynamical systems and optimization [Helmke & Moore 1994, Hairer et al. 2002];
- matrix Lie groups [Owren BIT 2000, Mahony JGO 2002]; optimization on matrix manifolds [Absil, Mahony, Sepulchre 2008];
- information geometry (statistical manifolds) [Amari 1990 & 2000, Kass & Vos 1997, Oller & Corcuera Ann. Stat. 1995, Bhattacharya & Patrangenaru Ann. Stat. 2003 & 2005];
- statistics for image analysis: rigid body transformations [Pennec PhD96], general Riemannian manifolds [Pennec JMIV98, NSIP99, JMIV06], PGA for M-Reps [Fletcher IPMI03, TMI04], planar curves [Klassen & Srivastava PAMI 2003];
- geometric computing: subdivision schemes [Rahman, ..., Donoho, Schroder SIAM MMS 2005].
The geometric framework: Riemannian manifolds. A Riemannian metric is a dot product on each tangent space; it gives the speed and length of a curve, geodesics as length-minimizing curves, and the Riemannian distance. The exponential map Exp_x (normal coordinate system) folds the tangent space onto the manifold by geodesic shooting, Exp_x(v) = γ_{(x,v)}(1); its inverse Log_x unfolds the manifold: Log_x(y) is the vector to shoot to reach y (geodesic completeness is required). The vector xy = Log_x(y) replaces the bipoint (no more equivalence classes), and the basic operators translate as follows:

Operator           Euclidean space                 Riemannian manifold
Subtraction        xy = y − x                      xy = Log_x(y)
Addition           y = x + xy                      y = Exp_x(xy)
Distance           dist(x, y) = ||y − x||          dist(x, y) = ||xy||_x
Gradient descent   x_{t+ε} = x_t − ε ∇C(x_t)       x_{t+ε} = Exp_{x_t}(−ε ∇C(x_t))

Statistical tools: moments. The Fréchet/Karcher mean minimizes the variance,

x̄ = argmin_y E[dist(y, z)²], characterized implicitly by E[x̄z] = 0,

with existence and uniqueness results by Karcher, Kendall, Le and Afsari, and computation by Gauss-Newton geodesic marching, x̄_{t+1} = Exp_{x̄_t}(v_t) with v_t = E[x̄_t z]. The covariance Σ = E[x̄z · x̄zᵀ] supports PCA, and higher moments follow similarly [Oller & Corcuera 95, Bhattacharya & Patrangenaru 2002, Pennec JMIV06, NSIP'99].

Distributions for parametric tests. Generalizations of the Gaussian density include the stochastic heat kernel p(x, y, t) (complex time dependency), the wrapped Gaussian (infinite series, difficult to compute), and the maximal-entropy distribution knowing the mean and covariance, N(y) = k exp(−x̄yᵀ Γ x̄y / 2), where the concentration matrix satisfies Γ ≃ Σ⁻¹ − Ric/3 + O(σ) for small variance, with Ric the Ricci curvature. The Mahalanobis distance μ²(y) = x̄yᵀ Σ⁻¹ x̄y then supports statistical tests: for any distribution E[μ²(x)] = n, and for the above Gaussian μ² follows a χ² law up to curvature corrections of order O(σ³) [Pennec, NSIP'99, JMIV 2006].

Natural Riemannian metrics on Lie groups. A Lie group is a smooth manifold G compatible with the group structure: composition g ∘ h and inversion g⁻¹ are smooth, with left and right translations L_g(f) = g ∘ f and R_g(f) = f ∘ g. A natural Riemannian metric is obtained by choosing a dot product at the identity and propagating it by left or right translation.
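The geodesic-marching computation of the Karcher mean can be sketched on the unit sphere, where Exp and Log have closed forms (a toy illustration; the symmetric point configuration is made up so that the mean is known by symmetry):

```python
import numpy as np

def exp_map(m, v):
    # Exp_m(v) on the unit sphere S^2: shoot the geodesic from m along v.
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return m
    return np.cos(theta) * m + np.sin(theta) * v / theta

def log_map(m, x):
    # Log_m(x) on S^2: tangent vector at m reaching x in unit time.
    u = x - np.dot(m, x) * m
    n = np.linalg.norm(u)
    theta = np.arccos(np.clip(np.dot(m, x), -1.0, 1.0))
    return theta * u / n if n > 1e-12 else np.zeros_like(m)

def karcher_mean(points, m0, n_iter=50):
    # Gauss-Newton geodesic marching: m <- Exp_m(mean_i Log_m(x_i)).
    m = m0
    for _ in range(n_iter):
        v = np.mean([log_map(m, x) for x in points], axis=0)
        m = exp_map(m, v)
    return m

a = 0.3  # four points placed symmetrically around the north pole
pts = [np.array([np.sin(a), 0.0, np.cos(a)]), np.array([-np.sin(a), 0.0, np.cos(a)]),
       np.array([0.0, np.sin(a), np.cos(a)]), np.array([0.0, -np.sin(a), np.cos(a)])]
m = karcher_mean(pts, m0=pts[0])
# by symmetry the Karcher mean is the north pole (0, 0, 1)
```

The same iteration, with Exp and Log supplied by the manifold at hand, is the generic mean-computation primitive of the Riemannian statistical framework above.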
Faculty of Science, University of Copenhagen
Horizontal Dimensionality Reduction and Iterated Frame Bundle Development
Stefan Sommer (sommer@diku.dk), Department of Computer Science, University of Copenhagen
August 30, 2013

Dimensionality Reduction in Non-Linear Manifolds
- Aim: find subspaces with coordinates that approximate data in non-linear manifolds. This is not learning non-linear subspaces from data in linear Euclidean spaces (ISOMAP, LLE, etc.).
- Principal Geodesic Analysis (PGA, Fletcher et al., 2004) finds geodesic subspaces: geodesic rays originating from a manifold mean µ ∈ M; the non-linear data space is linearized to TµM. The analysis is relative to the data mean: data points on the non-linear manifold are projected to the tangent space TµM at the intrinsic mean µ, where a Euclidean PCA is performed.
- Geodesic PCA (GPCA, Huckemann et al., 2010) finds principal geodesics: geodesics minimizing residual distances that pass through a principal mean.
- But what happens when µ is a poor zero-dimensional descriptor?

Curvature Skews Centered Analysis
[Figure: a bimodal distribution on S² with variance 0.5²; PGA estimates a variance of 1.07², whereas HCA (Horizontal Component Analysis) estimates 0.49².]

HCA Properties
- A data-adapted coordinate system r: R^D → M.
- Preserves distances orthogonal to lower-order components: ||x − y|| = d_M(r(x), r(y)) for x = (x¹, ..., x^d, 0, ..., 0) and y = (x¹, ..., x^d, 0, ..., 0, y^d̃, 0, ..., 0), 1 ≤ d̃ < d.
- Intrinsic interpretation of covariance:
  cov(X^d, X^d̃) = ∫_{R²} X^d X^d̃ p(X^d, X^d̃) d(X^d, X^d̃)
               = ∫_{γ_d} ∫_{γ_d̃} ± d_M(µ, r(X^d)) d_M(r(X^d), r(X^d̃)) p(r(X^d, X^d̃)) ds^d̃ ds^d
- Orthogonal coordinates/subspaces.
- Coordinate-wise decorrelation with respect to a curvature-adapted measure.
Iterated Frame Bundle Development Slide 4/14 HCA Properties • data-adapted coordinate system r : RD → M • preserves distances orthogonal to lower order components: x −y = dM(r(x),r(y)) x = (x1 ,...,xd ,0,...,0), y = (x1 ,...,xd ,0,...,0,y ˜d ,0,...,0), 1 ≤ ˜d < d • intrinsic interpretation of covariance: cov(Xd ,X ˜d ) = R2 Xd X ˜d p(Xd ,X ˜d )d(Xd ,X ˜d ) = γd γ˜d ±dM(µ,r(Xd ))dM(r(Xd ),r(X ˜d ))p(r(Xd ,X ˜d ))ds ˜d dsd • orthogonal coordinates/subspaces • coordinate-wise decorrelation with respect to curvature-adapted measure Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 4/14 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Frame Bundle • the manifold and frames (bases) for the 
tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs (p,u), p ∈ M, u frame for TpM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 5/14 The Subspaces: Iterated Development • manifolds in general provides no canonical generalization of afﬁne subspaces • SDEs are deﬁned in the frame bundle using development of curves wt = t 0 u−1 s ˙xsds , wt ∈ Rη i.e. pull-back to Euclidean space using parallel transported frames ut • iterated development constructs subspaces of dimension > 1 (geodesic, polynomial, etc.) 
• geodesic developments (multi-step Fermi coordinates) generalize geodesic subspaces

Iterated Development
• V ⊂ ℝ^η linear subspace, V = V_1 ⊥ V_2, η = dim M
• f : V_1 → F(M) smooth map (e.g. an immersion)
• D_f(v_1 + t v_2): development starting at f(v_1)
• vector fields W_1, …, W_{dim V_2} : V → V, the columns of W
• Euclidean integral curve ẇ_t = W(w_t) v_2
• development D_{f,W}(v_1 + t v_2) = (x_t, u_t) ∈ F(M) defined by u̇_t = u_t ẇ_t, x_t = π_{F(M)}(u_t)
• an immersion for small v = v_1 + v_2 if W has full rank on V_2
• W_i constant: geodesic development; W_i = D_{e_i} p: polynomial submanifolds; etc.

Horizontal Component Analysis
• distances measured relative to lower order components
• iterative definition of h_d given h_{d−1} and data points {x_i}:
  • curves: geodesics x_t^{h_d}(x_i) passing the points closest to x_i on the (d−1)st component h_{d−1}
  • projection: π_{h_d}(x_i) = argmin_t d_M(x_t^{h_d}(x_i), x_i)²
  • transport: the derivatives ẋ_0^{h_d}(x_i) are connected in h_{d−1} by parallel transport
  • orthogonality: x_t^{h_d} is orthogonal to the d−1 basis vectors transported horizontally in h_{d−1}
  • horizontal component h_d: the subspace containing the curves x_t^{h_d}(x_i) minimizing
    res_{h_{d−1}}(h_d) = Σ_{i=1}^N d_M(x_i, π_{h_d}(x_i))²
    with the first d−1 coordinates fixed

The algorithm:
1. find the geodesic h_1 with (d/dt) h_1|_{t=0} = u_1 that minimizes res(h_1) = Σ_{i=1}^N d_M(x_i, π_{h_1}(x_i))²
2. find u_2 ⊥ u_1 such that the x_t^{h_2}(x_i) are geodesics
   • that pass π_{h_1}(x_i)
   • with derivatives ẋ_0^{h_2}(x_i) equal to the transported P_{h_1} u_2
   • that minimize res_{h_1}(h_2) = Σ_{i=1}^N d_M(x_i, π_{h_2}(x_i))²
3. find u_3 ⊥ {u_1, u_2} such that the x_t^{h_3}(x_i) are geodesics
   • that pass π_{h_2}(x_i)
   • with derivatives ẋ_0^{h_3}(x_i) parallel transported in h_2
   • that minimize res_{h_2}(h_3) = Σ_{i=1}^N d_M(x_i, π_{h_3}(x_i))²
4. and so on …

Parallel Transport and Local Analysis
[Figure: sample on S², horizontal spread uniform, vertical spread normal; PGA vs. HCA coordinates; parallel transport along the first component]

Conditional Congruency
• data/geometry congruency: data can be approximated by geodesics (Huckemann et al.)
• a one-dimensional concept
• conditional congruency: X_d̃ | X_1, …, X_d is congruent
• HCA defines a data-adapted coordinate system that provides a conditionally congruent splitting

Components May Flip
Figure: the 3-dimensional manifold 2x_1² − 2x_2² + x_3² + x_4² = 1 in ℝ⁴ with samples from two Gaussians with largest variance in the x_2 direction (0.6² vs. 0.4²). (a,b) Slices x_1 = 0 and x_2 = 0. (c) The second HCA horizontal component has largest x_2 component (blue vector) whereas the second PGA component has largest x_1 component (red vector).
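The projection onto a component, π_h(x_i) = argmin_t d_M(x_t^h(x_i), x_i)², is the workhorse of the residual criteria above. On the unit sphere S² it has a closed form for great-circle geodesics, which allows a compact numerical sketch. This is my own Python illustration, not the talk's implementation; the function names are hypothetical:

```python
import numpy as np

def sphere_dist(p, q):
    """Geodesic (great-circle) distance d_M on the unit sphere."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def project_to_geodesic(q, mu, v):
    """Project q onto the geodesic t -> cos(t)*mu + sin(t)*v, where mu is a
    unit base point and v a unit tangent at mu (v orthogonal to mu).
    Maximizing q . x(t) = (q.mu) cos t + (q.v) sin t gives the closed form
    t* = atan2(q.v, q.mu). Returns the minimizer t* and the foot point."""
    t_star = np.arctan2(np.dot(q, v), np.dot(q, mu))
    foot = np.cos(t_star) * mu + np.sin(t_star) * v
    return t_star, foot

mu = np.array([0.0, 0.0, 1.0])   # base point of the geodesic
v  = np.array([1.0, 0.0, 0.0])   # unit tangent direction at mu

# A point already on the geodesic projects to itself (t* = 0.7, residual ~ 0):
q_on = np.cos(0.7) * mu + np.sin(0.7) * v
t, foot = project_to_geodesic(q_on, mu, v)

# A point off the geodesic has a positive residual d_M(q, foot):
q_off = np.array([np.sin(0.5), 0.3, np.cos(0.5)])
q_off /= np.linalg.norm(q_off)
t2, foot2 = project_to_geodesic(q_off, mu, v)
res = sphere_dist(q_off, foot2)
```

Summing such squared residuals over the data points gives exactly the quantity res(h) minimized by the first HCA/PGA component on the sphere.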
Corpora Callosa
Corpus callosum variation: 3σ_1 along h_1, 3σ_2 along h_2 (video: corporacallosa.mp4)

Summary
• Horizontal Component Analysis performs PCA-like dimensionality reduction in Riemannian manifolds
• subspaces are constructed from iterated frame bundle development
• the implied coordinate system
  • is data adapted
  • preserves certain pairwise distances and orthogonality
  • provides a covariance interpretation
  • decorrelates a curvature-adapted measure
  • provides conditionally congruent components
  • handles multi-modal distributions with spread over large-curvature areas

References
- Sommer: Horizontal Dimensionality Reduction and Iterated Frame Bundle Development, GSI 2013.
- Sommer et al.: Optimization over Geodesics for Exact Principal Geodesic Analysis, ACOM, in press.
- Sommer et al.: Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations, ECCV 2010.
http://github.com/nefan/smanifold
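The linearization to T_µM that PGA performs (and that motivates HCA) can be sketched end-to-end on S²: estimate an intrinsic (Karcher) mean by fixed-point iteration, lift the data with the Log map, and run Euclidean PCA in the tangent space. This is an illustrative sketch of the tangent-space approximation of Fletcher et al. (2004), not code from the smanifold toolbox referenced above; all function names are mine:

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p along v."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * v / theta

def sphere_log(p, q):
    """Log map: tangent vector at p pointing to q with norm d_M(p, q)."""
    u = q - np.dot(p, q) * p              # component of q orthogonal to p
    norm_u = np.linalg.norm(u)
    if norm_u < 1e-12:
        return np.zeros(3)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * u / norm_u

def karcher_mean(points, iters=50):
    """Fixed-point iteration mu <- Exp_mu(mean of Log_mu(x_i))."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        mu = sphere_exp(mu, np.mean([sphere_log(mu, x) for x in points], axis=0))
    return mu

def tangent_pca(points):
    """PGA-style linearized analysis: PCA of the data lifted to T_mu M."""
    mu = karcher_mean(points)
    V = np.array([sphere_log(mu, x) for x in points])
    # principal directions = right singular vectors of the centered lifts
    _, _, Vt = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
    return mu, Vt

# data spread mainly along the x-direction around the north pole
base = np.array([0.0, 0.0, 1.0])
data = [sphere_exp(base, np.array([a, b, 0.0]))
        for a in (-0.6, -0.2, 0.2, 0.6) for b in (-0.1, 0.1)]
mu, pcs = tangent_pca(data)
```

For this symmetric sample the mean recovers the north pole and the first tangent principal direction aligns with the x-axis; the talk's point is precisely that this linearized picture degrades when curvature and data spread are large, which is what HCA's horizontal construction addresses.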
- 1 Parallel Transport with the Pole Ladder: Application to Deformations of Time Series of Images
Marco Lorenzi, Xavier Pennec
Asclepios research group, INRIA Sophia Antipolis, France
GSI 2013

- 2 Paradigms of Deformation-based Morphometry
• Cross-sectional (Sub A at t1 vs. Sub B at t2): different topologies, large deformations; biological interpretation is not obvious
• Longitudinal (within-subject): subtle changes, biologically meaningful

- 3 Combining longitudinal and cross-sectional: how should the trajectories of Sub A and Sub B be related through a template?

- 4 Standard TBM approach: Jacobian determinant analysis. Focuses on volume changes only; scalar analysis (statistical power); no modeling.

- 5 Transporting longitudinal trajectories: vector transport is not uniquely defined; theoretical insights are missing.

- 6 Diffeomorphic registration
• Stationary Velocity Field (SVF) setting [Arsigny 2006]: v(x) is a stationary velocity field; the Lie group Exp(v) is a geodesic wrt the Cartan connections (non-metric); the geodesic is defined by the SVF.
• LDDMM setting [Trouvé, 1998]: v(x,t) is a time-varying velocity field; the Riemannian exp_id(v) is a metric geodesic wrt the Levi-Civita connection; the geodesic is defined by the initial momentum.
• Transporting trajectories: parallel transport of initial tangent vectors.

- 7 From relativity to image processing: Schild's Ladder [Schild, 1970], a geodesic parallelogram construction through P0, P0', P1 along a curve C.

- 8 Schild's Ladder: intuitive application to images via inter-subject registration [Lorenzi et al, IPMI 2011].

- 9/10 Applied to a time series t0, t1, t2, t3: multiple geodesics must be evaluated for each time-point, and parallel transport is not consistently computed among time-points.

- 11 The Pole Ladder: an optimized Schild's ladder that reuses the inter-subject geodesic C itself, transporting A to −A'.

- 12 Pole Ladder, equivalence to Schild's ladder: for a symmetric connection, B is the parallel transport of A; a locally linear construction; the pole ladder is a Schild's ladder.

- 13/14 On the time series t0, …, t3 the pole ladder minimizes the number of geodesics required, and parallel transport is consistently computed amongst time-points.

- 15 Pole Ladder in the SVF setting [Lorenzi et al, IPMI 2011], via the Baker-Campbell-Hausdorff (BCH) formula (Bossa 2007):
B ≈ A + [v, A] + ½ [v, [v, A]]

- 16 Iterative computation [Lorenzi et al, IPMI 2011]: the BCH step is applied iteratively with v/n.

- 17 Synthetic example: ventricles expansion from a real time series (baseline, time 1, …, time 4).

- 18 Comparison with: Schild's ladder, vector reorientation, conjugate action, scalar transport.

- 19 Transport consistency: deformation → vector transport → scalar summary (log-Jacobian determinant, …), versus scalar transport → scalar summary.

- 20/21 Synthetic example, quantitative analysis: the pole ladder compares well with respect to scalar transport; Schild's ladder leads to high variability.

- 22 Application to Alzheimer's disease: group-wise statistics and extrapolation of longitudinal trajectories.

- 23/24 Longitudinal changes in Alzheimer's disease (141 subjects, ADNI data): expansion/contraction maps of the Student's t statistic. Comparison with standard TBM (pole ladder vs. scalar transport): consistent results and equivalent statistical power.

- 25 Conclusions
• A general framework for the parallel transport of deformations (it does not necessarily require the choice of a metric)
• Minimal number of computations for the transport of time series of deformations
• An efficient solution in the SVF setting
• Consistent statistical results
• Multivariate group-wise analysis of longitudinal changes
Perspectives
• Further investigation of numerical issues (step-size)
• Comparison with other numerical methods for parallel transport in diffeomorphic registration (Younes, 2007)

- 26 Thank you
GSI 2013
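The pole ladder rung is easy to make concrete on the unit sphere, where Exp and Log are closed-form and the geodesic symmetry at the rung midpoint is a rotation by π about the midpoint axis. The sketch below is my own S² illustration (not the SVF/BCH implementation used in the talk); for this equatorial example the single rung reproduces the exact parallel transport:

```python
import numpy as np

def sphere_exp(p, v):
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * v / theta

def sphere_log(p, q):
    u = q - np.dot(p, q) * p
    norm_u = np.linalg.norm(u)
    if norm_u < 1e-12:
        return np.zeros(3)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * u / norm_u

def pole_ladder_step(p0, p1, A):
    """One rung of the pole ladder transporting tangent vector A from p0 to p1:
    1. shoot: P0' = Exp_{p0}(A)
    2. reflect P0' through the midpoint C of the geodesic p0 -> p1
       (geodesic point symmetry: Q = Exp_C(-Log_C(P0')))
    3. read off the transported vector as -Log_{p1}(Q)."""
    C = sphere_exp(p0, 0.5 * sphere_log(p0, p1))   # midpoint of the rung
    P0p = sphere_exp(p0, A)
    Q = sphere_exp(C, -sphere_log(C, P0p))          # point symmetry at C
    return -sphere_log(p1, Q)

# transport the 'north-pointing' vector along the equator from (1,0,0) to (0,1,0);
# parallel transport along the equator keeps the north component: (0, 0, 0.1)
p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 1.0, 0.0])
A  = np.array([0.0, 0.0, 0.1])
A_transported = pole_ladder_step(p0, p1, A)
```

For longer curves or less symmetric spaces the rung is repeated along the discretized geodesic, which is the iterative scheme the slides implement with the BCH approximation in the SVF setting.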
Keynote speech 1 (Yann Ollivier)
Objective Improvement in Information-Geometric Optimization Youhei Akimoto Project TAO – INRIA Saclay LRI, Bât. 490, Univ. Paris-Sud 91405 Orsay, France Youhei.Akimoto@lri.fr Yann Ollivier CNRS & Univ. Paris-Sud LRI, Bât. 490 91405 Orsay, France yann.ollivier@lri.fr ABSTRACT Information-Geometric Optimization (IGO) is a uniﬁed frame- work of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization prob- lem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. IGO re- covers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-
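As a hedged illustration of the abstract's claim that IGO recovers PBIL for the family of Bernoulli distributions: in the expectation parameters of independent Bernoulli laws, the natural-gradient update reduces to a weighted mean shift θ ← θ + η Σᵢ ŵᵢ(xᵢ − θ). The quantile weighting scheme, the clipping bounds, and all names below are our illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def igo_bernoulli_step(theta, f, n_samples=50, lr=0.1):
    """One IGO step for independent Bernoulli distributions (PBIL-like).
    Samples bitstrings, ranks them by f (to be maximized), and moves theta
    along the natural gradient; the weights use a simple truncation scheme."""
    d = theta.size
    x = (rng.random((n_samples, d)) < theta).astype(float)
    scores = np.array([f(xi) for xi in x])
    order = np.argsort(-scores)                 # best samples first
    k = n_samples // 4
    w = np.zeros(n_samples)
    w[order[:k]] = 1.0 / k                      # selection-quantile weights
    # In expectation parameters, the natural-gradient update is a mean shift.
    theta = theta + lr * (w @ (x - theta))
    return np.clip(theta, 0.05, 0.95)           # keep away from degeneracy

# Maximize the number of ones (OneMax); theta should drift upward.
theta = np.full(8, 0.5)
for _ in range(100):
    theta = igo_bernoulli_step(theta, lambda x: x.sum())
assert theta.mean() > 0.8
```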
ORAL SESSION 2 Deformations in Shape Spaces (Alain Trouvé)
Geodesic image regression with a sparse parameterization of diffeomorphisms James Fishbaugh1 Marcel Prastawa1 Guido Gerig1 Stanley Durrleman2 1 Scientific Computing and Imaging Institute, University of Utah 2 INRIA/ICM, Pitié Salpêtrière Hospital, Paris, France Image Regression 6 months 12 months 18 months8 months 14 months 16 months10 months Why image regression? • Extrapolation for change prediction • Align images and cognitive scores acquired at different times • Align subjects with scans acquired at different times • Improved understanding of normal and pathological brain changes 1 of 18 Previous Work Kernel regression Geodesic regression Geodesic regression Davis et al. ICCV 2007 Niethammer et al. MICCAI 2011 Singh et al. ISBI 2013 Require to store many model parameters ~ number of voxels Image evolution described by considerably fewer parameters Concentrated in areas undergoing most dynamic changes 2 of 18 Motivation for Sparsity Fewer parameters Location of parameters • Potential for greater statistical power – less noise in description • Concentrated in areas undergoing the most dynamic changes • Number of parameters should reflect complexity of anatomical changes, not the sampling of the images • Localize potential biomarkers 3 of 18 Compact and generative statistical model of growth Geodesic Image Regression Geodesic path on a sub-group of diffeomorphisms (Dupuis 98, Trouvè 95,98) 4 of 18 Geodesic Image Regression S0 = {c0, α0} I0 O1 O3 O2 5 of 18 Geodesic shooting to evolve control points S0 = {c0, α0} Methods: Shooting 6 of 18 Trajectory of control points defines flow of diffeomorphisms Physical pixel coordinates y follow the trajectory which evolves in time as Methods: Flow (5, 5, 60.25)Deformed images constructed by interpolation 7 of 18 Summary Of Method 1) Shoot control points 2) Trajectory defines flow 3) Flow pixel locations 4) Interpolate in baseline image 8 of 18 Subject to Shoot Flow Regression Criterion 9 of 18 Method Overview Gradient with 
respect to control points and initial momenta; gradient with respect to the initial image.
Gradient of the regression criterion: 1) flow voxel Yk(t) to time t and compute the residual; 2) the grey value in the residual is distributed to neighboring voxels with weights from trilinear interpolation; 3) grey values are accumulated for every observed image.
Sparsity on initial momenta: Fast Iterative Shrinkage-Thresholding Algorithm (Beck 09). Use the previous gradient of the criterion without the L1 penalty; threshold momentum vectors with small magnitude; select a small subset of momenta which best describe the dynamics of image evolution. Used in the context of atlas building (Durrleman 12,13).
Synthetic evolution (2D): generated by shooting the baseline with 79,804 predefined momenta (Time 1 to Time 5); impact of the sparsity parameter on model estimation. From 79,804 down to 67 momenta.
Pediatric brain development (2D): T1W images of the same child over time; models estimated backwards in time with varying sparsity. From 45,435 down to 47 momenta.
Brain atrophy in Alzheimer's disease (3D): T1W images of the same patient over time (70.75, 71.38, 71.78, 72.79 years). Six years of predicted brain atrophy with 35,937 momenta, a 98% decrease in the number of parameters.
Conclusions. Geodesic image regression framework: decouples deformation parameters from the image representation; an L1 penalty selects an optimal subset of initial momenta; the number of parameters is reduced with only minimal cost in terms of matching the target data. Future work: kernels at multiple scales (Sommer 11); other image matching metrics, e.g. LCC (Avants 07, Lorenzi 13); combination with a framework for longitudinal analysis.
Acknowledgments. This work was supported by: NIH (NINDS) 1 U01 NS082086-01 (4D shape HD); NIH (NICHD) RO1 HD055741 (ACE, project IBIS); NIH (NIBIB) 2U54 EB005149 (NA-MIC). Thank you
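The momentum-thresholding step described above (a FISTA iteration with an L1 penalty on the magnitudes of the momentum vectors) is, in proximal-operator terms, a block soft-thresholding of each 3D momentum. The sketch below shows this generic proximal step under our assumptions; it is not the authors' implementation.

```python
import numpy as np

def threshold_momenta(alpha, step, weight):
    """Block soft-thresholding of momentum vectors: the proximal step of
    FISTA for an L1 penalty on the norms of the 3D momenta. Vectors whose
    norm falls below step*weight are zeroed out (sparsity selection)."""
    norms = np.linalg.norm(alpha, axis=1, keepdims=True)
    shrink = np.maximum(1.0 - step * weight / np.maximum(norms, 1e-12), 0.0)
    return alpha * shrink

alpha = np.array([[3.0, 4.0, 0.0],    # norm 5: kept, shrunk to norm 4
                  [0.1, 0.0, 0.0]])   # norm 0.1: removed
out = threshold_momenta(alpha, step=1.0, weight=1.0)
assert np.allclose(out[1], 0.0)
assert np.isclose(np.linalg.norm(out[0]), 4.0)
```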
On the geometry and the deformation of shapes represented by piecewise continuous Bézier curves with application to shape optimization Olivier Ruatta XLIM, UMR 7252 Université de Limoges CNRS Geometric Science of Information 2013 Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 1 / 32
Motivation: shape optimization problems. Let Ω ⊂ P(R2) be such that for each ω ∈ Ω the boundary ∂ω of this "region" is a regular curve (here, piecewise continuous). Let F : Ω −→ R+ be a positive real-valued function. Problem: find ω0 ∈ Ω such that F(ω0) ≤ F(ω) for all ω ∈ Ω. Very often, computing F(ω) requires solving a system of PDEs. Two problems arise: the cost of computing F(ω) and its differentiability (together with the computation of its derivatives), and the compatibility of the space of shapes with the discretization of R2 used for the PDE system.
Shape optimization methods: Geometric gradient techniques (level sets, …): compute how to deform the boundary of the shape and try to deform it in a coherent way [Hadamard, Pierre, Henrot, Allaire, Jouve, …]. Relaxation methods (SIMP, …): compute a density that represents the support of the shape [Bendsøe, Sigmund, …]. Topological gradient: generally for PDEs, remove or add finite elements containing the shape [Masmoudi, Sokolowski, …]. Parametric optimization: reduce the shapes to a small space controlled by a few parameters and view the problem as a parametric optimization problem [Elyssa (Dido in Latin, 4th century BC), …, Goldberg, …]. Our approach: try to mix the best aspects of the first (level sets) and the last (parametric) approaches.
Bézier curves. Let P0, . . . , Pd ∈ R2; we define a family of curves parametrized over [0, 1]: B([P0], t) := P0 (degree 0 Bézier curve).
B([P0, P1], t) := (1 − t)B([P0], t) + tB([P1], t) = (1 − t)P0 + tP1 (degree 1 Bézier curve) . . . B([P0, . . . , Pd ], t) := (1 − t)B([P0, . . . , Pd−1], t) + tB([P1, . . . , Pd ], t) (degree d Bézier curve). Those are polynomial curves. The points P0, . . . , Pd ∈ R2 are called the control polygon of the curve deﬁned by B([P0, . . . , Pd ], t). Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 5 / 32 Bézier curves P0 P1 P2 P3 Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 6 / 32 Bernstein polynomials Deﬁnition Let d be a positive integer, for all i ∈ {0, . . . , d} we deﬁne: bi,d (t) = d i (1 − t)d−i ti . The polynomials b0,d , . . . , bd,d are called the Bernstein polynomials of degree d. Proposition The Bernstein polynomials of degree d, b0,d , . . . , bd,d , form a basis of the vector space of polynomials of degree less or equal to d. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 7 / 32 Bernstein polynomials and Bézier curves Theorem B([P0, . . . , Pd ], t) = d i=0 Pibi,d (t) Corollary Every parametrized curve with polynomial parametrization of degree at most can be represented as a Bézier curve of degree at most d. Deﬁnition We deﬁne Bd the space of all Bézier curves of degree at most d. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 8 / 32 Structure of Bd We denote E = R2 and we consider the following map: Ψd : Ed+1 −→ Bd deﬁned by Ψd (P0, . . . , Pd ) = B([P0, . . . , Pd ], t). Proposition Ψd is a linear isomorphism between Ed+1 and Bd . Let t = t0 = 0 < t1 < · · · < td = 1 be a subdivision of [0, 1]. We deﬁne the sampling map: St,d : Γ(t) ∈ Bd −→ (Γ(t0), · · · , Γ(td )) ∈ Ed+1 . Proposition St,d is a linear isomorphism between Bd and Ed+1. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 9 / 32 Evaluation-Interpolation Let t = t0 = 0 ≤ t1 ≤ · · · ≤ td = 1 be a subdivision of [0, 1] and let P0, . . . , Pd ∈ E. Bt,d := b0,d (t0) · · · bd,d (t0) ... ... ... b0,d (td ) · · · bd,d (td ) . 
Proposition (Evaluation) Bt,d PT 0 ... PT d = B([P0, . . . , Pd ], t0)T ... B([P0, . . . , Pd ], td )T . Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 10 / 32 Multi-evaluation t = (0, 1/3, 2/3, 1), MT 0 MT 1 MT 2 MT 3 = Bt,3 PT 0 PT 1 PT 2 PT 3 P0 P1 P2 P3 M0 M1 M2 M3 Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 11 / 32 Evaluation-Interpolation Let t = t0 = 0 ≤ t1 ≤ · · · ≤ td = 1 be a subdivision of [0, 1] and let M0, . . . , Md ∈ E. Problem Find P0, . . . , Pd ∈ E such that B([P0, . . . , Pd ], ti) = Mi for all i ∈ {0, . . . , d}. Proposition (Interpolation) The points deﬁned by: PT 0 ... PT d = B−1 t,d B([P0, . . . , Pd ], t0)T ... B([P0, . . . , Pd ], td )T solve the problem. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 12 / 32 Summary 1 We have 3 spaces: Pd Ed+1 the vector space of the control polygons, St,d Ed+1 the vector space of the sampling of Bézier curves associated to a subdivision t, Bd the vector space of the degree d Bézier parametrizations. Proposition The following diagram of isomorphisms is commutative: Pd Ψd −→ Bd Bt,d ↓ Et,d St,d . Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 13 / 32 Deformation problem Let Γ(t) := B([P0, . . . , Pd ], t) be a degree d Bézier curve s.t. Et,d (Γ) = M = MT 0 ... MT d and let δM := δMT 0 ... δMT d ∈ TMSt,d , we consider the following problem: Problem (Deformation problem) Denoting P = PT 0 ... PT d ﬁnd δP ∈ TPPd such that Λ(t) := B([P0 + δP0, . . . , Pd + δPd ], t) satisﬁes Λ(ti) = Mi + δMi for all i ∈ {0, . . . , d}. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 14 / 32 Deformation problem P1 P2 P3 M0 M1 M2 M3 P0 δM0 δM1 δM2 δM3 Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 15 / 32 Deformation curve Proposition (Deformation polygon) Taking δP = B−1 t,d δM, the curve Ψd (P + δP) is a solution of the "Deformation problem". δP ∈ TPPd is called the deformation polygon and Ψd (δP) ∈ TB([P0,...,Pd ],t)Bd is called the deformation curve. 
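The evaluation/interpolation pair above (multi-evaluation by the matrix B_{t,d} of Bernstein values, interpolation by its inverse) can be sketched directly. A minimal Python sketch; the function names are ours:

```python
import numpy as np
from math import comb

def bernstein_matrix(ts, d):
    """Matrix B_{t,d} with entries b_{i,d}(t_j): row j evaluates the
    Bernstein basis of degree d at the subdivision point t_j."""
    return np.array([[comb(d, i) * (1 - t) ** (d - i) * t ** i
                      for i in range(d + 1)] for t in ts])

def evaluate(P, ts):
    # Multi-evaluation: sampled points of the Bézier curve at parameters ts.
    return bernstein_matrix(ts, len(P) - 1) @ P

def interpolate(M, ts):
    # Interpolation: recover the control polygon from the sampled points.
    return np.linalg.solve(bernstein_matrix(ts, len(M) - 1), M)

P = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, -1.0], [3.0, 0.0]])
ts = np.array([0.0, 1 / 3, 2 / 3, 1.0])   # the subdivision t of the slides
M = evaluate(P, ts)
assert np.allclose(interpolate(M, ts), P)  # round trip recovers the polygon
```

For a subdivision of distinct points in [0, 1] the matrix B_{t,d} is invertible, which is exactly the isomorphism statement of the slides.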
Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 16 / 32 Piecewize Bézier curves Let P1 := (P1,0, . . . , P1,d ) ∈ Pd , P2 := (P2,0, . . . , P2,d ) ∈ Pd , . . . and PN := (PN,0, . . . , PN,d ) ∈ Pd be such that P1,d = P2,0, . . . , PN−1,d = PN,0. We deﬁne the following parametrization: B((P1, . . . , PN), t) = B([P1,0, . . . , P1,d ], N ∗ t) if t ∈ [0, 1/N[ B([P2,0, . . . , P2,d ], N ∗ t − 1) if t ∈ [1/N, 2/N[ ... B([PN,0, . . . , PN,d ], N ∗ t − (N − 1)) if t ∈ [N−1 N , N] We denote Ψ(P1, . . . , PN) := B((P1, . . . , PN), t). This is a continuous curve joining P1,0 to PN,d and the curve is a loop if P1,0 = PN,d . Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 17 / 32 Piecewize Bézier curves P1,3 = P2,0 P1,0 P1,1 P1,2 P2,1 P2,2 P2,3 Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 18 / 32 Vector space of Piecewize Bézier curves We deﬁne : The vector space PN,d = {(P1, . . . , PN)|P1,d = P2,0, . . . , PN−1,d = PN,0} ⊂ PN d . The vector space LN,d = {(P1, . . . , PN)|P1,d = P2,0, . . . , PN−1,d = PN,0, P1,0 = PN,d } ⊂ PN d . The vector space of PBC BN,d = {B((P1, . . . , Pd ), t)|(P1, . . . , Pd ) ∈ PN,d }. The vector space of PBL Bc N,d = {B((P1, . . . , Pd ), t)|(P1, . . . , Pd ) ∈ LN,d }. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 19 / 32 Sampling piecewize Bézier curves Let t = (t1,0 := 0, t1,1 := 1 N∗d , . . . , t1,d := 1 N , t2,0 = 1 N , . . . , tN−1,d := N−1 N , tN,0 := N−1 N , . . . , tN,d := 1) be a multi-regular subdivision and denote ti := (ti,0, . . . , ti,d ). We deﬁne the following linear map: Et,N,d : λ(t) ∈ BN,d −→ (λ(t1,0), . . . , λ(tN,d )) ∈ St,N,d ⊂ SN d The same way we deﬁne: Ec t,N,d : λ(t) ∈ Bc N,d −→ (λ(t1,0), . . . , λ(tN,d )) ∈ Sc t,N,d ⊂ SN d Finally, we deﬁne: Bt,N,d : (P1,0, . . . , PN,d ) ∈ PN,d −→ Bt1,d × · · · × BtN ,d PT 1,0 ... PT N,d ∈ St,N,d . 
Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 20 / 32 Summary 2 Proposition The following diagram of isomorphisms is commutative: PN,d ΨN,d −→ BN,d Bt,N,d ↓ Et,N,d St,N,d . Remark B−1 t,N,d := B−1 t1,d × · · · × B−1 tN ,d Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 21 / 32 Deformation problem for PBC Let Γ(t) ∈ BN,d be s.t. Et,N,d (Γ) = M := MT 1,0 ... MT N,d ∈ St,N,d and let δM = δMT 1,0 ... δMT N,d ∈ TMSt,N,d , we consider the following problem: Problem (Deformation problem for PBC) Denoting P = PT 1,0 ... PT N,d ﬁnd δP ∈ TPPN,d such that Λ(t) := B((P0 + δP0, . . . , Pd + δPd ), t) satisﬁes Λ(ti,j) = Mi,j + δMi,j for all i ∈ {1, . . . , N} and j ∈ {0, . . . , d}. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 22 / 32 Deformation Piecewize Bézier curve Proposition (Deformation polygons) Taking δP = B−1 t,N,d δM, the curve ΨN,d (P + δP) is a solution of the "Deformation problem for PBC". δP ∈ TPPN,d is called the deformation polygons and ΨN,d (δP) ∈ TB((P0,...,Pd ),t)BN,d is called the deformation piecewize Bézier curve. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 23 / 32 Back to shapes optimization Let ω ⊂ E be such that ∂ω is a piecewise continuous curve and F : P(E) −→ R+ the objective functional. The geometric gradient F(ω) : M ∈ ∂ω −→ F(ω)(M) ∈ TME give a perturbation for each point of the frontier to decrease the objective functional. R+ M ∂ω F F(ω) ω F(ω)(M) Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 24 / 32 Basic idea of the approach The space of admissible shape is ΩN,d := ω ∈ P(E)|∂ω ∈ Bc N,d and let ω ∈ ΩN,d such that ∂ω = B((P1, . . . , PN), t). Let M = MT 1,0 ... MT N,d = Et,N,d (B((P1, . . . , PN), t), to obtain a better shape we compute δM = F(ω)(M1,0)T ... F(ω)(MN,d )T . Then we compute δP = B−1 t,N,d δM and let λ(t) = B((P + δP), t), we have: Proposition λ(ti,j) = Mi,j + F(ω)(Mi,j) for all i ∈ {1, . . . , N} and j ∈ {0, . . . , d}. 
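The basic idea just described (sample the curve, evaluate the geometric gradient at the sampled points, and pull δM back to a deformation polygon δP = B⁻¹δM) can be sketched for a single Bézier piece. The shrinking gradient field and all names below are hypothetical illustrations, not part of the talk.

```python
import numpy as np
from math import comb

def bernstein_matrix(ts, d):
    # Entries b_{i,d}(t_j) of the evaluation matrix B_{t,d}.
    return np.array([[comb(d, i) * (1 - t) ** (d - i) * t ** i
                      for i in range(d + 1)] for t in ts])

def deformation_step(P, ts, grad_field):
    """One iteration of the deformation scheme: sample the curve, evaluate
    the geometric gradient at the samples, and pull the perturbation back
    to the control polygon via the inverse evaluation matrix."""
    B = bernstein_matrix(ts, len(P) - 1)
    M = B @ P                              # sampled points of the curve
    dM = np.array([grad_field(m) for m in M])
    dP = np.linalg.solve(B, dM)            # the deformation polygon
    return P + dP

# Hypothetical gradient field shrinking the shape toward the origin.
P = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 1.0]])
ts = np.array([0.0, 1 / 3, 2 / 3, 1.0])
P2 = deformation_step(P, ts, lambda m: -0.1 * m)
```

By construction, the new curve interpolates the perturbed samples: sampling P2 gives exactly 0.9 times the original samples here.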
Basic theoretical contribution. Let Ω = {ω ∈ P(E) | ∂ω ∈ C0p([0, 1], E)} and let F : Ω −→ R+ be a smooth function such that ∇F(ω) : ∂ω −→ TE is everywhere well defined. To every γ ∈ BN,d we associate "the" shape γ̄ such that ∂γ̄ = γ.
Proposition. For all integers N and d and every compatible subdivision t, we associate to F a vector field VF : BN,d −→ TBN,d by B((P), t) −→ ΨN,d(B−1t,N,d [(∇F(B(P, t))(B((P), t1,0)))T, . . . , (∇F(B(P, t))(B((P), tN,d)))T]).
Optimal shapes and fixed points. Proposition. Let ω ∈ Ω be such that ∇F(ω) ≡ 0. Then, for all N and d and every compatible subdivision t, there is γ ∈ BN,d satisfying VF(γ) = 0. In other words, every optimum of F induces at least one fixed point of VF over BN,d.
Meta-algorithm for shape optimization. Input: an initial shape ω s.t. ∂ω = B((P), t) ∈ BN,d. Output: the control polygon P of the boundary of a local minimum of F. λ ← B((P), t); while criterion not satisfied do: δM ← ∇F(λ̄)(Et,N,d(λ)); δP ← B−1t,N,d(δM); P ← P + δP; λ ← ΨN,d(P); end while.
Snake-like algorithm for omnidirectional image segmentation. Image segmentation can be interpreted as a shape optimization problem using the "snake" approach. The geometric gradient is built from: a "balloon" force making the contour expand, and the gradient of the intensity of the image (vanishing at the contours). We use a classical approach: a Canny filter to detect contours. This problem is used to detect free space for an autonomous robot with a catadioptric sensor.
Snake-like algorithm for classical image segmentation. Joint work with Ouiddad Labbani-I.
and Pauline Merveilleux-O. of "Université de Picardie - Jules Vernes". Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 31 / 32 Snake-like algorithm for omnidirectional images segmentation Joint work with Ouiddad Labbani-I. and Pauline Merveilleux-O. of "Université de Picardie - Jules Vernes". Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 32 / 32
Random Spatial Structure of Geometric Deformations and Bayesian Nonparametrics Xavier Pennec Asclepios Research Project INRIA Sophia Antipolis Christof Seiler Department of Statistics Stanford University Susan Holmes Department of Statistics Stanford University GSI2013 - Geometric Science of Information, Paris, http://www.gsi2013.org/ 2 Clinical question Group 1: Back pain patients Group 2: Abdominal pain patients 3 Clinical question What is the right basis to compare anatomical structures? Geometric differences between groups? Learn shape and number of parts from data. 4 Example: Motion-based segmentation of 3D objects [Soumya Ghosh, Erik B. Sudderth, Matthew Loper, and Michael J. Black, From Deformations to Parts: Motion-based Segmentation of 3D Objects, NIPS 2012] Reference pose for female body Five example poses out of 56 Manual segmentati on ddCR P 5 Template Random deformations Motion-based partitioning of geometric deformations Patient 1 Patient 2 … deform Deformations parameterized as stationary velocity fields and estimated using: [Vercauteren et al., NeuroImage 2009] [Ashburner et al., NeuroImage 2007][Hernandez et al., ICCV 2007] [Lorenzi et al., NeuroImage 2013] 6 Bayesian model of the anatomy [V. Arsigny, O. Commowick, N. Ayache, X. Pennec, A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration, J Math Imaging Vis 2009] Observed velocity vectors Subset of voxel coordinates: Distribution on partitions Velocity vector noise: Multivariate normal with mean 0 and covariance Deformation parameter: Multivariate normal with mean 0 and diagonal covariance 7 Prior on deformation parameters Hyperparamet ers: Prior: Concentration matrix Skew symmetric part / rotation Symmetric part / scaling (and shearing) Translati on Connection: Affine transformations (motion) and velocity field? Let t = 1, velocity vectors are consistent with the transformation A = exp(M). 
linear ODE with an analytic solution, with initial condition at t = 0.
Simulated samples from the prior: draw M i.i.d. from a multivariate normal. Hyperparameters: mean = 0; variance = 0.01 for rotation and scaling, 1 for translation. Decompose into the rotation angle (Rodrigues' formula, SVD) and the volume change (determinant). [Figures: histogram of rotation angles (degrees); histogram of volume changes (scale factor).]
Prior on spatial partitions: distance-dependent Chinese Restaurant Process. A link is sampled for each voxel (for x11, then for all voxels), and partitions are given by the link structure, through the distances between voxels, a decay function, and a self-linking probability. [D.M. Blei, P.I. Frazier, Distance Dependent Chinese Restaurant Processes, Journal of Machine Learning Research 12 (2011) 2383-2410]
Sample partitions with the Gibbs sampler. Step 1: delete a link. Step 2: sample a new link (splitting or rejoining partitions; joining partitions or linking to itself). Step 3: new partitions, or the same as before.
Link to data / model selection: fixed velocity-noise hyperparameter; voxel coordinates; observed velocity field; prior on deformation parameters. How well does the data (velocity field) fit a given model (partition) for all possible parameters?
Answer: Marginal likelihood 13 Inference with the Gibbs sampler x11 x12 x13 x21 x22 x23 Separate partitions Joined partitions x11 x12 x13 x21 x22 x23 Marginal likelihood for all t velocity fields Results – 2D Velocity Fields 14 Target: 15 Results – 3D Velocity Fields Front view Lateral view Scaling Rotation Translation Rotation 16 Results – 3D Velocity Fields Front view Initializ e Step 1 Step 2 Rotation and translation Scaling and translation 17 Results – 3D Velocity Fields Templat e Patient 1 Patient 2 Partitio n 18 Results – 3D Velocity Fields – Spine Template Step 10 Step 20 Step 30 Abdomin al pain Back pain 19 Conclusions Nonparametric way of estimating the number and structure of partitions. Incorporating uncertainty in a Bayesian fashion (avoiding overfitting). Prior with medically intuitive interpretation. Histogramof rotation angles Degree Frequency 0 5 10 15 20 050010001500200025003000 Histogramof volume changes Scale factor Frequency 0.5 1.0 1.5 2.0 01000200030004000 x11 x12 x13 x21 x22 x23 20 Next step and open questions With more data are partitions of the two groups drawn from the same distribution? Group 1 Group 2 Compute posterior of deformation parameters Histogramof rotation angles Degree Frequency 0 5 10 15 20 050010001500200025003000 Histogramof volume changes Scale factor Frequency 0.5 1.0 1.5 2.0 01000200030004000 21 Thanks for your attention! 22 23 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines [Vercauteren et al., NeuroImage 2009] [Ashburner et al., NeuroImage 2007][Hernandez et al., ICCV 2007] [Lorenzi et al., NeuroImage 2013] 24 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines 25 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines 26 Dirichlet Process is the de Finetti measure for the Chinese Restaurant Process Two different ways to sample from exchangeable distributions. 
"parallel" construction First sampling some latent object that then renders all the sequence elements conditionally independent. • [Gosh, van der Vaart, Fundamentals of Nonparametric Bayesian Inference (Chapters 1-5 of book draft)] • [J. Pitman, Combinatorial Stochastic Processes, 2002] • [Question on http://metaoptimize.com/, http://tinyurl.com/n8wtjgy] 27 Prior on affine transformations Affine group: Multiplication of elements of the affine group: First order Baker-Campbell-Hausdorff terms Jordan/Schur decomposition Lie algebraic representation of affine group: 28 Regional Bayesian Linear Regression Prior on deformation parameters: Likelihood given velocity field and fixed noise parameter: Velocity vector noise: 29 Probability distribution over partitions of exchangeable “things”, order doesn’t matter: Random partitions with the Chinese Restaurant Process • [Ghosh, van der Vaart, Fundamentals of Nonparametric Bayesian Inference (Chapters 1-5 of book draft)] • [J. Pitman, Combinatorial Stochastic Processes, 2002] • [QA on http://metaoptimize.com/, http://tinyurl.com/n8wtjgy] Total number of partitions z1z2 z3 z4 z5 … Partition k = 1 Partition k = 2 nk = 3 nk = 2 =2 =2 =1 =1 =1 New partition z2 z3 z4 z1 z5 … 30 Prior on partitions with dependencies Time series t1 …t2 t2 Spatial data x11 …x12 x13 x21 …x22 x23 … … … Graphs g1 g2 g3 g4 … … [S. N. MacEachern, Dependent Dirichlet processes] 2000 [N.J. Foti, S. Williamson, A survey of non-exchangeable priors for Bayesian nonparametric models] 2012 [D.M. Cifarelli, E. Regazzini, Nonparametric statistical problems under partial exchangeability] (in italian) 1978 31 Some implementation details for the spatial ddCPR [Richard Socher, Christopher D. Manning, A Gibbs Sampler for Spatial Clustering with the Distance-dependent Chinese Restaurant Process] In original paper by D. Blei focuses on time series data. In spatial data special care need to be taken for cyclic links. x12 x22 Solution: Recursive function. 
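A minimal sketch of the ddCRP prior on spatial partitions discussed above: each voxel links to another voxel with probability proportional to a decaying function of their distance (or to itself with weight α), and partitions are the connected components of the resulting link graph. The exponential decay function and all names are our illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ddcrp_links(coords, alpha=1.0, scale=1.0):
    """Sample links from a distance-dependent CRP prior: voxel i links to
    voxel j with probability proportional to exp(-d_ij/scale), or to
    itself with probability proportional to alpha."""
    n = len(coords)
    links = np.empty(n, dtype=int)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        p = np.exp(-d / scale)
        p[i] = alpha
        links[i] = rng.choice(n, p=p / p.sum())
    return links

def partitions(links):
    # Connected components of the link graph give the partition labels.
    n = len(links)
    label = list(range(n))
    def root(i):
        while label[i] != i:
            label[i] = label[label[i]]  # path halving
            i = label[i]
        return i
    for i, j in enumerate(links):
        label[root(i)] = root(j)
    return [root(i) for i in range(n)]

# A tiny 3x2 voxel grid, as in the x11..x23 diagrams of the slides.
coords = np.array([[x, y] for x in range(3) for y in range(2)], float)
links = ddcrp_links(coords)
labels = partitions(links)
```

Every voxel ends up in the same partition as the voxel it links to, which is the property the Gibbs sampler exploits when deleting and resampling one link at a time.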
32 Not marginal invariant: Links that are unobserved influence the distribution. x11 x12 x13 x21 x22 x23 If x12 is unobserved x11 x13 x21 x22 x23 Prior on spatial partitions: Distance dependent Chinese Restaurant Process then x11 and x12 are not in the same partition. 33 Applications of ddCRP: Image segmentation ddCR with different α rddC RP Thresholded GP [Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, and David M. Blei, Spatial distance dependent Chinese restaurant processes for image segmentation, NIPS 2011] 34 Prior on affine transformations Representation in homogeneous coordinates: Decomposition into translation, rotation and scaling: Transform a point: 35 Overview Results – 2D Velocity Fields 36 Target:
TEMPLATE ESTIMATION FOR LARGE DATABASE: A DIFFEOMORPHIC ITERATIVE CENTROID METHOD USING CURRENTS CLAIRE CURY, JOAN A. GLAUNES AND OLIVIER COLLIOT GSI2013 - Geometric Science of Information Note : send me an email at claire.cury.pro @ gmail.com if you want the pptx version of this presentation. INTRODUCTION Computational Anatomy (CA): Analysis of anatomical structures variability Characterizing differences between normal and pathological anatomies. Link between function and structures Template-based analysis in CA: Population data is encoded in the template coordinate system, then statistics are processed on this data. Large Deformation Diffeomorphic Metric Mapping (LDDMM) methods: provides diffeomorphic maps: invertible smooth transformations that preserve topology. Defines a metric distance that can be used to quantify the similarity between two shapes. GSI2013 - Geometric Science of Information 2 INTRODUCTION Template estimation methods in the LDDMM framework: J. Glaunès and S. Joshi (MFCA 2006). S. Durrleman et al. (MFCA 2008, MICCAI 2012). J. Ma et al. (NeuroImage 2008). GSI2013 - Geometric Science of Information 3 S1 S2 S3 S4 S5 T S1 S2 S3 S4 S5 T S1 S2 S3 S4 S5 TJ0 INTRODUCTION All these methods need a lot of computation time, which is a limitation for the study of large database. example: a matching from one surface to another (with about 3000 vertices each): around 30 minutes Template estimation (N≈100) until convergence: few days or some weeks. To study large databases : need to go faster We can increase the convergence speed by providing a better initialization to the template optimization process. We propose an Iterative Centroid method (IC). GSI2013 - Geometric Science of Information 4 MATHEMATICAL SETUP: LDDMM FRAMEWORK Large Deformation Diffeomorphic Metric Mapping: to quantify the difference between shapes. 2 shapes can be connected by a continuum of intermediate anatomically plausible shapes (shape space idea). 
Diffeomorphic maps act on the whole 3D space, so spatial organization is preserved. GSI2013 - Geometric Science of Information 5 MATHEMATICAL SETUP: LDDMM FRAMEWORK In LDDMM framework deformation maps : R3 R3 are generated by integration of time-varying vector fields vt(x) : vt belong to a RKHS V , the norm controls the regularity of the maps One can define a right invariant distance on the diffeomorphisms group Geodesic shooting: The last diffeomorphism at t=1 is completely parameterized by the initial condition GSI2013 - Geometric Science of Information 6 The position of point x at time t The velocity of point x at time t The point i of the surface at time t Momentum vector of point i at time t MATHEMATICAL SETUP: CURRENTS GSI2013 - Geometric Science of Information 7 Framework of currents (Vaillant and Glaunès 2005) has been chosen to measure dissimilarities between anatomical structures. Interests Point correspondence solved Robust to different sampling and topologies The set of surfaces gets embedded in a vector space : currents can be added, subtracted If S is a surface, [S] is a current, i.e. a continuous linear map from a space of differential 2-forms to R : with a differential 2-form of R3 MATHEMATICAL SETUP: CURRENTS The space of current W* is the dual space of a RKHS of 2- forms W. Scalar product : Optimal match, is the diffeomorphism minimizing J : GSI2013 - Geometric Science of Information 8 TEMPLATE ESTIMATION METHOD USED We used the method presented by J. Glaunès and S. Joshi (MFCA 2006): Estimates a template given a collection of unlabeled points sets or surfaces Let Si be N surfaces in R3. 
In the framework of LDDMM and currents the template estimation problem is posed as a minimum mean squared error estimation problem: The template is composed by all meshes of the population Alternated optimization: we successively match each on the template [S] = , then we update the template, and we iterate this whole loop Standard initialization: i= Id , which is equivalent to GSI2013 - Geometric Science of Information 9 THE ITERATIVE CENTROID METHOD Centroid computed iteratively via currents and LDDMM General idea : GSI2013 - Geometric Science of Information 10 THE ITERATIVE CENTROID METHOD WAY 1 We have N shapes Si : Fast process. There is (N-1) matching of 1 to 1 surfaces. GSI2013 - Geometric Science of Information 11 Start with a first subject : B1=S1 We iterate the following process: •Bi is matched to Si+1 we obtain the deformation map •Bi+1 is set as . Bi is transported along the geodesic and stopped at time t = 1/(i+1). We have N shapes Si : This is a slower way, at each step we add one surface more to the centroid. At the end the Centroid is a combination of N surfaces. Start with a first subject : B1=S1 We iterate the following process: •Bi is matched to Si+1 we obtain the deformation map •Bi+1 is set as where ui(x,t) = -vi(x,1-t), ui is the reverse flow. THE ITERATIVE CENTROID METHOD WAY 2 GSI2013 - Geometric Science of Information 12 DATA Human hippocampi, small cerebral structures related with memory process. Base of datasets: 95 human hippocampi segmented by SACHA (Chupin et al. NeuroImage, 2009) from Magnetic Resonance Images (MRI) of the IMAGEN database. We build 3 datasets from this database GSI2013 - Geometric Science of Information 13 DATA RealData: 95 hippocampus meshes from the database IMAGEN. Rigid alignment to a typical subject. 
Meshes from RealData have between 1716 and 2256 vertices GSI2013 - Geometric Science of Information 12 DATA Data1: one subject S0 is decimated to keep about 100 vertices and then deformed using geodesic shooting in random directions composed with small translations and rotations We have 500 subjects Data1 is a large database with simple meshes and mainly global deformations GSI2013 - Geometric Science of Information 15 DATA Data2: From S0 , we decimate less (about 1000 vertices), and we match via LDDMM this mesh to the 95 hippocampi We have 95 subjects Data2 has more local variability. Closer to the anatomical truth GSI2013 - Geometric Science of Information 12 THE ITERATIVE CENTROID METHOD: RESULTS GSI2013 - Geometric Science of Information 17 Data2 : Iterative Centroid computed via Way 1. T =1.5h Data2 : Iterative Centroid computed via Way 2. T=3.5h THE ITERATIVE CENTROID METHOD: RESULTS GSI2013 - Geometric Science of Information 18 RESULTS : EFFECT OF SUBJECT ORDERING GSI2013 - Geometric Science of Information 19 RESULTS : EFFECTS OF INITIALIZATION AND ORDERING GSI2013 - Geometric Science of Information 20 Data1 C1 C2 C3 Std init 41.1 41.1 40.6 C1 0 0.67 1.17 C2 0.67 0 1.11 C3 1.17 1.11 0 Data2 C1 C2 C3 Std init 20.5 20.2 20.7 C1 0 0.53 0.67 C2 0.53 0 0.84 C3 0.67 0.84 0 RealData C1 C2 C3 Std init 27.4 26.7 26.5 C1 0 7.03 6.24 C2 7.03 0 1.86 C3 6.24 1.86 0 Template initialized by: Std init C1 C2 C3 Data1 0.0062 0.0056 0.0059 0.0212 Data2 0.0077 0.0086 0.0060 0.0206 RealData 0.0073 0.0060 0.0088 0.0094 Distance ||.||W* between template estimated from standard initialization and from I.C. with different orderings. Is the computed template correctly centered ? •We calculated the ratio: •With vector field corresponding to the initial momentum vector of the deformation from the template to subject i RESULTS : EFFECT OF THE NUMBER OF ITERATIONS GSI2013 - Geometric Science of Information 21 W∗-distance ratios between the I.C. 
computed with x% of the population and the full population, with different orderings.

[Plots: W*-distance ratio vs. percentage of the population used, for Data1 (n=500), Data2 (n=95) and RealData (n=95).]

RESULTS: COMPUTATION TIME
A GPU implementation was used for the kernel computation. One matching takes between one and five minutes. Template estimation was stopped after 7 loops of alternated optimization.

                                Std init   I.C.    Initialized by an I.C.   Saving (%)
Data1 (n=500, nbPoints=135)     96 h       1.8 h   25 h (26.8 h)            72%
Data2 (n=95, nbPoints=1001)     21 h       1.5 h   12 h (13.5 h)            36%
RealData (n=95, nbPoints≈2000)  99 h       2.7 h   26 h (28.7 h)            71%

CONCLUSION AND PERSPECTIVES
This method quickly provides a centroid, in a couple of hours. The method presented here is used as an initialization for a template estimation method, in order to increase the convergence speed. The method needs an ordering of the subjects, but this ordering has an insignificant impact on the resulting template estimation.
Perspectives: analyze Way 2 of the Iterative Centroid method and compare Way 1 and Way 2. We would also like to test the centroid itself as a template for the analysis of the population, compared to a full template estimation. Is an actual and precise template estimation process really required?

THANK YOU FOR YOUR ATTENTION
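As a toy illustration (not the LDDMM implementation from the talk), the Way 1 recursion can be sketched in a flat shape space, where transporting Bi toward Si+1 along the geodesic and stopping at time t = 1/(i+1) reduces to a running mean:

```python
import numpy as np

def iterative_centroid(shapes):
    """Euclidean analogue of Iterative Centroid Way 1.

    B1 = S1; at step i, B_i is 'matched' to S_{i+1} and transported a
    fraction 1/(i+1) of the way.  In a flat space the geodesic is a
    straight line, so the recursion reduces to a running mean.
    """
    B = shapes[0].astype(float).copy()
    for i, S in enumerate(shapes[1:], start=1):
        B += (S - B) / (i + 1)   # stop at time t = 1/(i+1) on the line B -> S
    return B

rng = np.random.default_rng(0)
shapes = [rng.normal(size=(100, 3)) for _ in range(50)]  # 50 toy "meshes"
B = iterative_centroid(shapes)
assert np.allclose(B, np.mean(shapes, axis=0))  # running mean = arithmetic mean
```

In the curved LDDMM setting the transport is along a diffeomorphic geodesic rather than a straight line, which is why the subject ordering can matter there at all.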
Keynote speech 2 (Hirohiko Shima)
Geometry of Hessian Structures
Hirohiko Shima (h-shima@c-able.ne.jp), Yamaguchi University, 2013/8/29

Outline:
1. Hessian Structures
2. Hessian Structures and Kählerian Structures
3. Dual Hessian Structures
4. Hessian Curvature Tensor
5. Regular Convex Cones
6. Hessian Structures and Affine Differential Geometry
7. Hessian Structures and Information Geometry
8. Invariant Hessian Structures

Preface
In 1964 Prof. Koszul was sent to Japan by the French Government and gave lectures at Osaka University. I was a student in those days and attended the lectures together with the late Professors Matsushima and Murakami. The topic of the lectures was a theory of flat manifolds with a flat connection D and a closed 1-form α such that Dα is positive definite. α being a closed 1-form, it is locally expressed as α = dφ, and so Dα = Ddφ is just a Hessian metric from our viewpoint. This is the ultimate origin of the notion of Hessian structures and the starting point of my research.

1. Hessian structures
Definition (Hessian metric). Let M be a manifold with a flat connection D. A Riemannian metric g on M is said to be a Hessian metric if g can be locally expressed by
    g = Ddφ,   g_ij = ∂²φ / ∂x^i ∂x^j,
where {x¹, ..., xⁿ} is an affine coordinate system w.r.t. D. Then (D, g) is a Hessian structure on M, and (M, D, g) a Hessian manifold. The function φ is called a potential of (D, g).

Definition (difference tensor γ). Let γ be the difference tensor between the Levi-Civita connection ∇ for g and the flat connection D:
    γ = ∇ − D,   γ_X Y = ∇_X Y − D_X Y.
In affine coordinates, γ^i_jk (the components of γ) equal Γ^i_jk (the Christoffel symbols for g).

Proposition (characterizations of Hessian metrics). Let (M, D) be a flat manifold and g a Riemannian metric on M. The following conditions are equivalent:
(1) g is a Hessian metric.
(2) (D_X g)(Y, Z) = (D_Y g)(X, Z) (the Codazzi equation), i.e. the covariant tensor Dg is symmetric.
(3) g(γ_X Y, Z) = g(Y, γ_X Z).

2. Hessian Structures and Kählerian Structures
Definition (Kählerian metric). A Hermitian metric g = Σ_{i,j} g_ij dz^i dz̄^j on a complex manifold is called a Kählerian metric if g is expressed by a complex Hessian,
    g_ij = ∂²ψ / ∂z^i ∂z̄^j,
where {z¹, ..., zⁿ} is a holomorphic coordinate system. Cheng and Yau called Hessian metrics affine Kählerian metrics.

Proposition. Let TM be the tangent bundle over a Hessian manifold (M, D, g). Then TM is a complex manifold with Kählerian metric g^T = Σ_{i,j=1..n} g_ij dz^i dz̄^j, where z^i = x^i + √−1 dx^i.

Example (tangent bundle of a paraboloid).
    Ω = { x ∈ Rⁿ | xⁿ − (1/2) Σ_{i=1..n−1} (x^i)² > 0 }   (paraboloid)
    φ = log [ xⁿ − (1/2) Σ_{i=1..n−1} (x^i)² ]^{−1}
    g = Ddφ : Hessian metric on Ω
    TΩ ≅ Ω + √−1 Rⁿ ⊂ Cⁿ : tube domain over Ω
    TΩ ∼
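As a numerical sanity check of the definitions above (a sketch, not from the talk), one can verify for the paraboloid potential φ that g = Ddφ is positive definite and that Dg is totally symmetric in affine coordinates, which is the Codazzi condition:

```python
import numpy as np

def phi(x):
    # Potential of the Hessian metric on the paraboloid domain Omega:
    # phi = -log( x_n - 0.5 * sum_{i<n} (x_i)^2 )
    return -np.log(x[-1] - 0.5 * np.dot(x[:-1], x[:-1]))

def num_diff(f, x, i, h=1e-5):
    # Central finite difference of f along coordinate i
    e = np.zeros_like(x); e[i] = h
    return (f(x + e) - f(x - e)) / (2 * h)

def hessian(f, x, h=1e-4):
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = num_diff(lambda y: num_diff(f, y, j, h), x, i, h)
    return H

x0 = np.array([0.3, 0.2, 1.0])            # a point with x_n - 0.5*|x'|^2 > 0
g = hessian(phi, x0)                       # g_ij = d^2 phi / dx^i dx^j
assert np.all(np.linalg.eigvalsh(g) > 0)   # g is positive definite: a metric

# Codazzi equation in affine coordinates: d_k g_ij is totally symmetric,
# because it is the third derivative of the potential phi.
Dg = np.array([[[num_diff(lambda y: hessian(phi, y)[i, j], x0, k, 1e-3)
                 for j in range(3)] for i in range(3)] for k in range(3)])
assert np.allclose(Dg, np.swapaxes(Dg, 0, 1), atol=1e-3)
```

The point x0 and the finite-difference step sizes are illustrative choices; any point inside Ω works.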
ORAL SESSION 3 Differential Geometry in Signal Processing (Michel Berthier)
A Riemannian Fourier Transform via Spin Representations
T. Batard and M. Berthier, Geometric Science of Information 2013

Outline of the talk: the Fourier transform for multidimensional signals (examples); three simple ideas; the Riemannian Fourier transform via spin representations; applications to filtering.

The Fourier transform for multidimensional signals
The problem: how to define a Fourier transform for a signal φ : R^d → Rⁿ that does not reduce to componentwise Fourier transforms and that takes into account the (local) geometry of the graph associated to the signal?
Framework of the talk: the signal φ : Ω ⊂ R² → Rⁿ is a grey-level image (n = 1) or a color image (n = 3). In the latter case, we want to deal with the full color information in a truly non-marginal way.
Many propositions already exist (without geometric considerations):

• The T. Ell and S.J. Sangwine transform:
    F_μφ(U) = ∫_{R²} φ(X) exp(−μ⟨X, U⟩) dX    (1)
where φ : R² → H₀ is a color image and μ is a pure unit quaternion encoding the grey axis. It decomposes as
    F_μφ = A_∥ exp[μθ_∥] + A_⊥ exp[μθ_⊥] ν    (2)
where ν is a unit quaternion orthogonal to μ. This allows one to define an amplitude and a phase in the chrominance and in the luminance.

• The T. Bülow transform:
    F_{ij}φ(U) = ∫_{R²} exp(−2iπx₁u₁) φ(X) exp(−2jπu₂x₂) dX    (1)
where φ : R² → R. It decomposes as
    F_{ij}φ(U) = F_{cc}φ(U) − i F_{sc}φ(U) − j F_{cs}φ(U) + k F_{ss}φ(U)    (2)
This allows one to analyse the symmetries of the signal with respect to the x and y variables.
• The M. Felsberg transform:
    F_{e₁e₂e₃}φ(U) = ∫_{R²} exp(−2π e₁e₂e₃ ⟨U, X⟩) φ(X) dX    (1)
where φ(X) = φ(x₁e₁ + x₂e₂) = φ(x₁, x₂)e₃ is a real-valued function defined on R² (a grey-level image). The coefficient e₁e₂e₃ is the pseudoscalar of the Clifford algebra R_{3,0}. This transform is well adapted to the monogenic signal.

• The F. Brackx et al. transform:
    F_±φ(U) = (1/√(2π))ⁿ ∫_{Rⁿ} exp(iπΓ_U/2) exp(−i⟨U, X⟩) φ(X) dX    (1)
where Γ_U is the angular Dirac operator. For φ : R² → R_{0,2} ⊗ C,
    F_±φ(U) = (1/2π) ∫_{R²} exp(±U ∧ X) φ(X) dX    (2)
where exp(±U ∧ X) is the exponential of a bivector.

Three simple ideas
1. The abstract Fourier transform is defined through the action of a group.
• Shift theorem:
    Fφ_α(u) = e^{2iπαu} Fφ(u)    (3)
where φ_α(x) = φ(x + α). Here the group involved is the group of translations of R.
The action is given by
    (α, x) → x + α := τ_α(x)    (4)
The mapping (a group morphism)
    χ_u : τ_α → e^{2iπuα} = χ_u(α) ∈ S¹    (5)
is a so-called character of the group (R, +). The Fourier transform reads
    Fφ(u) = ∫_R χ_u(−x) φ(x) dx    (6)

• More precisely:
– By means of χ_u, every element of the group is represented as a unit complex number that acts by multiplication on the values of the function. Every u gives a representation, and the Fourier transform is defined on the set of representations.
– If the group G is abelian, we only deal with the group morphisms from G to S¹ (the characters).

• Some transforms:
– G = (Rⁿ, +): we recover the usual Fourier transform.
– G = SO(2, R): this corresponds to the theory of Fourier series.
– G = Z/nZ: we obtain the discrete Fourier transform.
– In the non-abelian case one has to deal with the equivalence classes of unitary irreducible representations (the Pontryagin dual). Some of these irreducible representations are infinite-dimensional. Applications include generalized Fourier descriptors with the group of motions of the plane, shearlets, etc.

• The problem: find a good way to represent the group of translations (R², +) in order to make it act naturally on the values (in Rⁿ) of a multidimensional function.

2. The vectors of Rⁿ can be considered as generalized numbers.
• Usual identifications:
    X = (x₁, x₂) ∈ R² ↔ z = x₁ + ix₂ ∈ C    (3)
    X = (x₁, x₂, x₃, x₄) ∈ R⁴ ↔ q = x₁ + ix₂ + jx₃ + kx₄ ∈ H    (4)
The fields C and H are the Clifford algebras R_{0,1} (of the vector space R with the quadratic form Q(x) = −x²) and R_{0,2} (of the vector space R² with the quadratic form Q(x₁, x₂) = −x₁² − x₂²).
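The character viewpoint can be checked numerically for G = Z/nZ: pairing a signal against the characters χ_u(−x) reproduces the discrete Fourier transform (a small sketch using NumPy's FFT convention):

```python
import numpy as np

# For G = Z/nZ the characters are chi_u(m) = exp(2*pi*i*u*m/n); pairing a
# signal against chi_u(-x) for every frequency u, as in
# F phi(u) = sum_x chi_u(-x) phi(x), is exactly the DFT.
n = 8
m = np.arange(n)
chi = np.exp(-2j * np.pi * np.outer(m, m) / n)   # chi[u, x] = chi_u(-x)

rng = np.random.default_rng(1)
phi = rng.normal(size=n)
assert np.allclose(chi @ phi, np.fft.fft(phi))   # character pairing = DFT
```

The same table of characters, transposed and conjugated (divided by n), inverts the transform, which is the abstract statement of Fourier inversion for a finite abelian group.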
• Clifford algebras: the vector space Rⁿ with the quadratic form Q_{p,q} is embedded in an algebra R_{p,q} of dimension 2ⁿ that contains scalars, vectors and, more generally, multivectors such as the bivector
    u ∧ v = (1/2)(uv − vu)    (5)

• The spin groups: the group Spin(n) is the group of elements of R_{0,n} that are products
    x = n₁n₂···n_{2k}    (3)
of an even number of unit vectors of Rⁿ.
• Some identifications:
    Spin(2) ≃ S¹    (4)
    Spin(3) ≃ H₁    (5)
    Spin(4) ≃ H₁ × H₁    (6)
• Natural idea: replace the group morphisms from (R², +) to S¹ (the characters) by group morphisms from (R², +) to Spin(n): the spin characters.
• The problem: compute the spin characters, i.e. the group morphisms from (R², +) to Spin(n), and find meaningful representation spaces for the action of the spin characters.

• Spin(3) characters:
    χ_{u₁,u₂,B} : (x₁, x₂) → exp (1/2)[ B A (x₁, x₂)ᵀ ] = exp (1/2)[ (x₁u₁ + x₂u₂) B ]    (3)
where A = (u₁ u₂) is the matrix of frequencies and B = ef, with e and f two orthonormal vectors of R³.
• Spin(4) and Spin(6) characters:
    (x₁, x₂) → exp (1/2)[ (B₁ B₂) A (x₁, x₂)ᵀ ]    (3)
    (x₁, x₂) → exp (1/2)[ (B₁ B₂ B₃) A (x₁, x₂)ᵀ ]    (4)
where A is a 2 × 2 (resp. 2 × 3) real matrix and B_i = e_i f_i for i = 1, 2 (resp. i = 1, 2, 3), with (e₁, e₂, f₁, f₂) (resp. (e₁, e₂, e₃, f₁, f₂, f₃)) an orthonormal basis of R⁴ (resp. R⁶).

3. The spin characters are parametrized by bivectors.
• Fundamental remark: the spin characters are, as usual, parametrized by frequencies (the entries of the matrix A). But they are also parametrized by bivectors (B, or B₁ and B₂, or B₁, B₂ and B₃, depending on the context).
• How to involve the geometry?
It seems natural to parametrize the spin characters by the bivector corresponding to the tangent plane of the image graph, more precisely by the field of bivectors corresponding to the fiber bundle of the image graph.

• Several possibilities for the representation spaces on which the spin characters act:
– Using Spin(3) characters and the generalized Weierstrass representation of surfaces (T. Friedrich): in "Quaternion and Clifford Fourier Transforms and Wavelets" (E. Hitzer and S.J. Sangwine, Eds.), Trends in Mathematics, Birkhäuser, 2013.
– Using Spin(4) and Spin(6) characters and the so-called standard representations of the spin groups: in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Differential Geometry in Signal Processing, Vol. 7, Issue 4, 2013.

The Riemannian Fourier transform
The spin representations of Spin(n) are defined through complex representations of Clifford algebras. They do not "descend" to the orthogonal group SO(n, R), since they send −1 to −Identity, contrary to the standard representations. These are the representations used in physics.
The complex spin representation of Spin(3) is the group morphism
    ζ₃ : Spin(3) → C(2)    (5)
obtained by restricting to Spin(3) ⊂ (R_{3,0} ⊗ C)⁰ a complex irreducible representation of R_{3,0}.
A color image is considered as a section
    σ_φ : (x₁, x₂) → Σ_{k=1..3} (0, φ_k(x₁, x₂)) ⊗ g_k    (6)
of the spinor bundle
    P_Spin(E₃(Ω)) ×_{ζ₃} C²    (7)
where E₃(Ω) = Ω × R³ and (g₁, g₂, g₃) is the canonical basis of R³.
The Riemannian Fourier transform
Dealing with spinor bundles allows varying spin characters. The most natural choice for the field of bivectors B := B(x₁, x₂), which generalizes the field of tangent planes, is
    B = γ₁ g₁g₂ + γ₂ g₁g₃ + γ₃ g₂g₃    (8)
with
    γ₁ = 1/δ,   γ₂ = √(Σ_{k=1..3} φ²_{k,x₂}) / δ,   γ₃ = −√(Σ_{k=1..3} φ²_{k,x₁}) / δ,   δ = 1 + Σ_{j=1..2} Σ_{k=1..3} φ²_{k,x_j}    (9)
The operator B· acting on the sections of S(E₃(Ω)), where · denotes the Clifford multiplication, is represented by the 2 × 2 complex matrix field
    B· = [ iγ₁    −γ₂ − iγ₃ ;  γ₂ − iγ₃    −iγ₁ ]    (10)
Since B² = −1, this operator has the two eigenvalue fields i and −i. Consequently, every section σ of S(E₃(Ω)) can be decomposed as σ = σ₊ᴮ + σ₋ᴮ, where
    σ₊ᴮ = (1/2)(σ − iB·σ),   σ₋ᴮ = (1/2)(σ + iB·σ)    (11)

The Riemannian Fourier transform of σ_φ is given by
    F_B σ_φ(u₁, u₂) = ∫_{R²} χ_{u₁,u₂,B(x₁,x₂)}(−x₁, −x₂) · σ_φ(x₁, x₂) dx₁ dx₂    (12)
The decomposition of a section σ_φ associated to a color image leads to
    φ(x₁, x₂) = ∫_{R²} Σ_{k=1..3} [ φ_{k+}(u₁, u₂) e_{u₁,u₂}(x₁, x₂) √((1 − γ₁)/2) + φ_{k−}(u₁, u₂) e_{−u₁,−u₂}(x₁, x₂) √((1 + γ₁)/2) ] ⊗ g_k du₁ du₂    (13)
where
    φ_{k+} = φ_k √((1 − γ₁)/2),   φ_{k−} = φ_k √((1 + γ₁)/2)    (14)

Low-pass filtering
Figure: left: original; centre: + component; right: − component.
Figure: low-pass filtering on the + component, for Gaussian variances 10000, 1000, 100, 10 and variance → 0.
Figure: low-pass filtering on the − component, for Gaussian variances 10000, 1000, 100, 10 and variance → 0.

Thank you for your attention!
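Equations (10)-(11) admit a quick numerical check (a sketch assuming the bivector coefficients are normalized so that γ₁² + γ₂² + γ₃² = 1; the values c1, c2, c3 below are illustrative stand-ins for γ₁, γ₂, γ₃):

```python
import numpy as np

# Matrix of the Clifford action B. from eq. (10), for a unit bivector
# B = c1*g1g2 + c2*g1g3 + c3*g2g3 with c1^2 + c2^2 + c3^2 = 1 (assumed).
c1, c2, c3 = 0.6, 0.48, 0.64
B = np.array([[1j * c1, -c2 - 1j * c3],
              [c2 - 1j * c3, -1j * c1]])

assert np.allclose(B @ B, -np.eye(2))   # B^2 = -1, so eigenvalues are +/- i

# Decomposition (11): any spinor value splits into +i / -i eigenparts.
sigma = np.array([1.0 + 2j, -0.5 + 1j])      # arbitrary test spinor
sp = 0.5 * (sigma - 1j * B @ sigma)          # sigma_+^B
sm = 0.5 * (sigma + 1j * B @ sigma)          # sigma_-^B
assert np.allclose(sp + sm, sigma)           # the parts recompose sigma
assert np.allclose(B @ sp, 1j * sp)          # + part lives in the +i eigenspace
assert np.allclose(B @ sm, -1j * sm)         # - part lives in the -i eigenspace
```

This is exactly the split used by the filtering experiments: the + and − components of a color image section are processed separately.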
K-Centroids-Based Supervised Classification of Texture Images using the SIRV modeling
Aurélien Schutz, Lionel Bombrun, Yannick Berthoumieu
IMS Laboratory - CNRS UMR5218, Groupe Signal, 28-30 August 2013

Database classification: musical genres, image databases, textured-image databases, video databases.
Propositions:
- Information geometry: centroid θ̄i [Choy2007], [Fisher1925], [Burbea1982], [Pennec1999], [Banerjee2005], [Amari2007], [Nielsen2009]
- Bayesian framework of classification: intrinsic prior p(θ | Hi) [Bayes1763], [Whittaker1915], [Robert1996], [Bernardo2003]
Goal: a prior capable of handling the intra-class diversity.

Outline: Bayesian classification and information geometry; textured images; K-CB and results; conclusion and perspectives.
Bayesian decision
- Data space X where x ∼ P; parameter space Θ; PΘ a parametric prior model on Θ; Θ is a Riemannian manifold with metric G, the Fisher information matrix; Nc classes, with decision space D = {Hi}_{i=1..Nc}.
- Prerequisites: likelihood p(x | θ, Hi), prior p(θ | Hi), 0-1 loss L.
- Decision rule on X (high computational complexity):
    Xi = { x | Ĥi = argmin_{Hj ∈ D} −log ∫_{Θj} p(x | θ, Hj) p(θ | Hj) dθ }
- Decision rule on Θ: minimize the conditional risk (Duda, Bayesian decision theory).

Intra-class parametric prior p(θ | Hi)
θ̄i is the centroid of class i, for i = 1, ..., Nc.
Definition of an intrinsic prior based on the Jeffreys divergence
Prior following a Gaussian distribution on the manifold Θ:
    p(θ | Hi) = Zi exp( −(1/2) γ̇_{θ̄i,θ}(1)ᵀ Ci γ̇_{θ̄i,θ}(1) )
(Pennec, X., "Probabilities and statistics on Riemannian manifolds: basic tools for geometric measurements," NSIP, 1999.)

Proposition. Intrinsic prior as a Gaussian distribution on the manifold Θ, with λi = (θ̄i, σi²):
    p(θ | λi, Hi) ∝ [ |G(θ̄i)|^{1/2} / (σi √(2π))^d ] exp( −(1/(2σi²)) J(p(· | θ), p(· | θ̄i)) )
Jeffreys divergence:
    J(p(· | θ), p(· | θ̄i)) = ∫_X (p(x | θ) − p(x | θ̄i)) log[ p(x | θ) / p(x | θ̄i) ] dx
(Fisher, R.A., "Theory of statistical estimation," Proc. Cambridge Phil. Soc. 22, pp. 700-725, 1925. Burbea, J. and Rao, C.R., "Entropy differential metric, distance and divergence measures in probability spaces: a unified approach," Journal of Multivariate Analysis 4, pp. 575-596, 1982.)

Optimal decision on Θ
Decision on X based on empirical Bayes:
    Xi = { x | Hi = argmin_{Hj ∈ D} −log ∫_{Θj} p(x | θ, Hj) p(θ | Hj) dθ }
(Kass, R.E. and Steffey, D., "Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)," 1989. Miyata, Y., "Fully Exponential Laplace Approximations Using Asymptotic Modes," Journal of the American Statistical Association, 2004.)

Proposition. Decision on X via Laplace approximation:
    x ∈ Xi ⟺ λ̂i = argmin_{λj ∈ D} [ (d/2) log(2σj² + 1) + (1/(2σj²)) J(p(· | θ̂(x)), p(· | θ̄j)) ]
θ̂ can be the maximum likelihood estimator for p(x | θ, Hi) [Miyata2004].
Textured images: space/scale decomposition
Real/complex wavelets, Gabor, steerable filters, bandelets, grouplets, dual-tree.
(Mallat, S., "A theory for multiresolution signal decomposition: the wavelet representation," IEEE PAMI, 1989. Do, M. and Vetterli, M., "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance," IEEE IP, 2002. Choy, S.-K. and Tong, C.-S., "Supervised Texture Classification Using Characteristic Generalized Gaussian Density," Journal of Mathematical Imaging and Vision, 2007.)

Stochastic models for the likelihood p(x | θ, Hi)
Spherically Invariant Random Vector (SIRV): x = g√τ, where g is multivariate Gaussian with covariance Σ and τ follows a Weibull distribution with parameter a.
Joint distribution, with y = (τ, g) and θ = (Σ, a):
    p(y | θ) = pG(g | Σ) pw(τ | a)
Separability of the Jeffreys divergence:
    J(p(· | θ), p(· | θ′)) = J(pG(· | Σ), pG(· | Σ′)) + J(pw(· | a), pw(· | a′))
(Bombrun, L., Lasmar, N.-E., Berthoumieu, Y. and Verdoolaege, G., "Multivariate texture retrieval using the SIRV representation and the geodesic distance," 2011.)
Centroid θ̄i computation
State of the art for exponential families; centred multivariate Gaussian:
    Σ̄_{R,i} = [ (1/Ni) Σ_{n=1..Ni} Σn^{−1} ]^{−1}   and   Σ̄_{L,i} = (1/Ni) Σ_{n=1..Ni} Σn
(Banerjee, A., Merugu, S., Dhillon, I. and Ghosh, J., "Clustering with Bregman divergences," 2005. Nielsen, F. and Nock, R., "Sided and Symmetrized Bregman Centroids," 2009.)
Steepest descent algorithm for the Weibull centroid (Dekker, T.J., "Finding a zero by means of successive linear interpolation," 1969; Brent, R.P., "An algorithm with guaranteed convergence for finding a zero of a function," 1971).
Proposition. Separate estimation of each centroid:
    θ̄i = ( (1 − εi) Σ̄_{R,i} + εi Σ̄_{L,i} ,  argmin_{a ∈ R+} (1/Ni) Σ_{n=1..Ni} J(pw(· | an), pw(· | a)) )

Unique centroid versus multiple centroids (K-CB)
(Varma, M. and Zisserman, A., "A Statistical Approach to Texture Classification from Single Images," 2005.)
Several centroids per class, (θ̄_{i,k})_{k=1..K}; K-CB likelihood with binary weights wk:
    pm(θ | (H_{i,k})_{k=1..K}) = Σ_{k=1..K} wk Z_{i,k} exp( −(1/(2σi²)) J(p(· | θ), p(· | θ̄_{i,k})) )
Algorithms for K-CB: K-means (hard C-means).
Proposition.
1. Assignment of the parametric vectors θ:
    Θ_{i,k} = { θ | θ̂_{i,k} = argmin_{θ̄_{i,l} ∈ Hi} (1/(2σi²)) J(p(· | θ), p(· | θ̄_{i,l})) }
2. Update of θ̄_{i,k}:
    θ̄_{i,k} = argmin_{θ̄ ∈ Θ} ∫_{Θ_{i,k}} (1/(2σi²)) J(p(· | θ), p(· | θ̄)) dθ

Textured image databases: Vision Texture database (VisTex); Brodatz database.

Results (VisTex, SIRV Weibull, Jeffreys divergence)
[Plot: average kappa index (%) vs. number of training samples (2/16 to 14/16), for 1-CB [1], 3-CB and 1-NN.]

Spatial neigh.   Database   K    1-NN (NTr = K)   1-CB [1] (NTr = NSa/2)   K-CB (NTr = NSa/2)
3 × 3            VisTex     3    83.7% ±2.0       90.4% ±1.3               96.8% ±1.2
3 × 3            Brodatz    10   50.6% ±2.6       79.9% ±1.5               96.2% ±1.2
1 × 1            VisTex     3    78.7% ±2.3       72.7% ±2.0               88.9% ±1.7
1 × 1            Brodatz    10   65.8% ±2.7       70% ±1                   97% ±2

[1] Choy, S.K. and Tong, C.S.: "Supervised texture classification using characteristic generalized Gaussian density." Journal of Mathematical Imaging and Vision 29 (Aug. 2007) 35-47.
Conclusions and Perspectives
Conclusion:
1. Bayesian classification theory combined with information geometry.
2. Concentrated Gaussian distribution as the prior p(θ | Hi); the prior is intrinsic when p(θ) or L depends on the Fisher information matrix G(θ); the decision rule is carried out on Θ.
3. K-Centroids-Based (K-CB) classification: when the intra-class diversity is too high, a class receives K centroids, obtained by K-means run on each class. Numerical application: K-CB performance is close to 1-NN performance, with low computational complexity.
Perspectives:
1. K-CB with the Possibilistic Fuzzy C-Means (PFCM) algorithm.
2. Adapting the number of centroids per class.

Brodatz, P., Textures: A Photographic Album for Artists and Designers, 1966.
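A minimal sketch of the K-CB idea (not the authors' SIRV implementation): hard C-means inside each class under the Jeffreys divergence between univariate Gaussians, with a plain parameter-wise mean as a simplified stand-in for the divergence-based centroid update, followed by nearest-centroid classification:

```python
import numpy as np

def jeffreys_gauss(p, q):
    # Jeffreys (symmetrised KL) divergence between univariate Gaussians
    (m1, s1), (m2, s2) = p, q
    d2 = (m1 - m2) ** 2
    return 0.5 * ((s1**2 + d2) / s2**2 + (s2**2 + d2) / s1**2) - 1.0

def k_centroids(params, K, iters=20, seed=0):
    # Hard C-means on one class's parameter vectors (mean, std); the
    # divergence drives the assignment step, the update is a plain mean.
    rng = np.random.default_rng(seed)
    cent = params[rng.choice(len(params), K, replace=False)]
    for _ in range(iters):
        lab = np.array([np.argmin([jeffreys_gauss(p, c) for c in cent])
                        for p in params])
        cent = np.array([params[lab == k].mean(axis=0)
                         if np.any(lab == k) else cent[k] for k in range(K)])
    return cent

# Two texture "classes"; class A has two sub-populations (intra-class diversity)
rng = np.random.default_rng(1)
classA = np.vstack([np.column_stack([rng.normal(0, .1, 50), rng.uniform(.9, 1.1, 50)]),
                    np.column_stack([rng.normal(5, .1, 50), rng.uniform(.9, 1.1, 50)])])
classB = np.column_stack([rng.normal(2.5, .1, 100), rng.uniform(1.9, 2.1, 100)])
cents = {c: k_centroids(d, K=2) for c, d in [("A", classA), ("B", classB)]}

def classify(p):
    # Assign to the class whose nearest centroid is closest in divergence
    return min(cents, key=lambda c: min(jeffreys_gauss(p, c0) for c0 in cents[c]))

assert classify((5.0, 1.0)) == "A" and classify((2.5, 2.0)) == "B"
```

With a single centroid per class, the bimodal class A would be summarized by an average that matches neither of its modes, which is exactly the intra-class diversity problem K-CB addresses.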
Keynote speech 3 (Giovanni Pistone)
GSI2013 - Geometric Science of Information, Paris, 28-30 August 2013
Dimensionality reduction for classification of stochastic fibre radiographs
C.T.J. Dodson (School of Mathematics) and W.W. Sampson (School of Materials), University of Manchester, UK. ctdodson@manchester.ac.uk

Abstract
Dimensionality reduction helps to identify small numbers of essential features of stochastic fibre networks for classification of image pixel density datasets from experimental radiographic measurements of commercial samples and simulations. Typical commercial macro-fibre networks use finite-length fibres suspended in a fluid from which they are continuously deposited onto a moving bed to make a continuous web; the fibres can cluster to differing degrees, depending primarily on the fluid turbulence, fibre dimensions and flexibility. Here we use the information geometry of trivariate Gaussian spatial distributions of pixel density among first and second neighbours to reveal features related to the sizes and density of fibre clusters.

Introduction
Much analytic work has been done on modelling the statistical geometry of stochastic fibre networks and their behaviour with regard to strength and fluid ingress or transfer [1, 5, 7]. Using complete sampling by square cells, their areal density distribution is typically well represented by a log-gamma or a (truncated) Gaussian distribution whose variance decreases monotonically with increasing cell size; the rate of decay depends on fibre and fibre-cluster dimensions. Clustering of fibres is well approximated by Poisson processes of Poisson clusters of differing density and size. A Poisson fibre network is a standard reference structure for any given size distribution of fibres; its statistical geometry is well understood for finite and infinite fibres.

Figure 1. Electron micrographs of four stochastic fibrous materials. Top left: nonwoven carbon fibre mat; top right: glass fibre filter; bottom left: electrospun nylon nanofibrous network (courtesy S.J.
Eichhorn and D.J. Scurr); bottom right: paper using wood cellulose fibres, typically flat and ribbonlike, of length 1 to 2 mm and width 0.02 to 0.03 mm.

Figure 2. Areal density radiographs of three paper networks made from natural wood cellulose fibres, of order 1 mm in length, with constant mean density but different distributions of fibres. Each image represents a square region of side length 5 cm; darker regions correspond to higher coverage. The left image is similar to that expected for a Poisson process of the same fibres, so typical real samples exhibit clustering of fibres.

Spatial statistics
We use the information geometry of trivariate Gaussian spatial distributions of pixel density, with covariances among first and second neighbours, to reveal features related to the sizes and density of fibre clusters, which could arise in one, two or three dimensions. The graphic shows a grey-level barcode for the ordered sequence of the 20 amino acids in a yeast genome (Saccharomyces cerevisiae), a 1-dimensional stochastic texture.
For isotropic spatial processes, which we consider here, the variables are means over shells of first and second neighbours, respectively, which share the population mean with the central pixel. For anisotropic networks the neighbour groups would be split into more, orthogonal, new variables to pick up the spatial anisotropy in the available spatial directions.

Typical sample data
Figure 3. Trivariate distribution of areal density values for a typical newsprint sample. Left: source radiograph; centre: histogram of pixel densities β̃i, averages of first neighbours β̃1,i and second neighbours β̃2,i; right: 3D scatter plot of β̃i, β̃1,i and β̃2,i.
Information geodesic distances between multivariate Gaussians
What we know analytically is the geodesic distance between two multivariate Gaussians fA, fB with the same number n of variables, in two particular cases [2]: Dμ(fA, fB) when they have a common mean μ but different covariances ΣA, ΣB, and DΣ(fA, fB) when they have a common covariance Σ but different means μA, μB. The general case is not known analytically, but for studying the stochastic textures arising from areal density arrays of samples of stochastic fibre networks, a satisfactorily discriminating approximation is
    D(fA, fB) ≈ Dμ(fA, fB) + DΣ(fA, fB).

Information geodesic distances between multivariate Gaussians [2]:
(1) μA ≠ μB, ΣA = ΣB = Σ; fA = (n, μA, Σ), fB = (n, μB, Σ):
    Dμ(fA, fB) = √[ (μA − μB)ᵀ · Σ⁻¹ · (μA − μB) ].    (1)
(2) μA = μB = μ, ΣA ≠ ΣB; fA = (n, μ, ΣA), fB = (n, μ, ΣB):
    DΣ(fA, fB) = √[ (1/2) Σ_{j=1..n} log²(λj) ],    (2)
with {λj} = Eig(ΣA^{−1/2} · ΣB · ΣA^{−1/2}).

From the form of DΣ(fA, fB) in (2) it may be seen that an approximate monotonic relationship arises with a more easily computed symmetrized log-trace function,
    ΔΣ(fA, fB) = log{ (1/2n) [ Tr(ΣA^{−1/2} · ΣB · ΣA^{−1/2}) + Tr(ΣB^{−1/2} · ΣA · ΣB^{−1/2}) ] }.    (3)
This is illustrated by the plot of DΣ(fA, fB) from equation (2) against ΔΣ(fA, fB) from equation (3) in Figure 4, for 185 trivariate Gaussian covariance matrices. For comparing relative proximity, this is a better measure near zero than the symmetrized Kullback-Leibler distance [6] in those multivariate Gaussian cases so far tested, and it may be quicker for handling large batch processes.

Figure 4. Plot of DΣ(fA, fB) from (2) against ΔΣ(fA, fB) from (3) for 185 trivariate Gaussian covariance matrices.

Dimensionality reduction for data sets
1. Obtain mutual 'information distances' D(i, j) among the members of the data set of textures X1, X2, ..., XN, each with 250×250 pixel density values.
2.
2. The array of N × N differences D(i, j) is a symmetric matrix with zero diagonal. This is centralized by subtracting row and column means and then adding back the grand mean, to give CD(i, j).

3. The centralized matrix CD(i, j) is again symmetric. We compute its N eigenvalues ECD(i), which are necessarily real, and find the N corresponding N-dimensional eigenvectors VCD(i).

4. Make a 3 × 3 diagonal matrix A of the three eigenvalues of largest absolute magnitude, and a 3 × N matrix B of the corresponding eigenvectors. The matrix product A · B is a 3 × N matrix; its transpose is an N × 3 matrix T, which gives us N coordinate values (x_i, y_i, z_i) to embed the N samples in 3-space.

Figure 5. D_Σ(f_A, f_B) as a cubic-smoothed surface (left) and contour plot (right): trivariate Gaussian information distances among 16 datasets of 1 mm pixel density differences between a Poisson network and simulated networks made from 1 mm fibres, with the same mean density but different clustering. Embedding: subgroups show numbers of fibres in clusters and cluster densities.

Figure 6. D_Σ(f_A, f_B) as a cubic-smoothed surface (left) and contour plot (right), for trivariate Gaussian information distances among 16 datasets of 1 mm pixel density arrays for simulated networks made from 1 mm fibres, each network with the same mean density but with different clustering. Embedding: subgroups show numbers of fibres in clusters and cluster densities; the solitary point is an unclustered Poisson network.

Figure 7. D_Σ(f_A, f_B) as a cubic-smoothed surface (left) and as a contour plot (right), for trivariate Gaussian information distances among 16 simulated Poisson networks made from 1 mm fibres, with different mean density, using pixels at 1 mm scale.
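Equations (1)–(3) and embedding steps 1–4 can be sketched in a few lines of numpy. This is an illustrative reconstruction under the definitions above, not the authors' code, and the function names are mine:

```python
import numpy as np

def d_sigma(sig_a, sig_b):
    """Geodesic distance (2): common mean, differing covariances."""
    w, v = np.linalg.eigh(sig_a)
    a_inv_half = v @ np.diag(w ** -0.5) @ v.T          # Sigma_A^{-1/2}
    lam = np.linalg.eigvalsh(a_inv_half @ sig_b @ a_inv_half)
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))

def delta_sigma(sig_a, sig_b):
    """Symmetrized log-trace surrogate (3) for D_Sigma."""
    n = sig_a.shape[0]
    def tr(p, q):
        w, v = np.linalg.eigh(p)
        p_inv_half = v @ np.diag(w ** -0.5) @ v.T
        return np.trace(p_inv_half @ q @ p_inv_half)
    return np.log((tr(sig_a, sig_b) + tr(sig_b, sig_a)) / (2.0 * n))

def embed_3d(d):
    """Steps 2-4: double-centre the N x N distance matrix, take the three
    eigenvalues of largest absolute magnitude, and return N x 3 coordinates."""
    d = np.asarray(d, dtype=float)
    cd = d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    evals, evecs = np.linalg.eigh(cd)                  # real, cd is symmetric
    idx = np.argsort(-np.abs(evals))[:3]
    return (np.diag(evals[idx]) @ evecs[:, idx].T).T
```

For example, d_sigma(I, e²·I) in three dimensions equals √6, since each λ_j = e² and log λ_j = 2.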
Second row: embedding of the same Poisson network data, showing the effect of mean network density.

Figure 8. Embedding using 182 trivariate Gaussian distributions for samples from a data set of radiographs of commercial papers. The embedding separates different forming methods into subgroups.

References

[1] K. Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics. Springer-Verlag, New York, Berlin, 2008. Chapter 9, with W.W. Sampson: Stochastic Fibre Networks, pp. 161-194.
[2] C. Atkinson and A.F.S. Mitchell. Rao's distance measure. Sankhya: The Indian Journal of Statistics, Series A 43, 3 (1981) 345-365.
[3] K.M. Carter, R. Raich and A.O. Hero. Learning on statistical manifolds for clustering and visualization. In 45th Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, 2007. https://wiki.eecs.umich.edu/global/data/hero/images/c/c6/Kmcarter-learnstatman.pdf
[4] K.M. Carter. Dimensionality reduction on statistical manifolds. PhD thesis, University of Michigan, 2009. http://tbayes.eecs.umich.edu/kmcarter/thesis
[5] M. Deng and C.T.J. Dodson. Paper: An Engineered Stochastic Structure. Tappi Press, Atlanta, 1994.
[6] F. Nielsen, V. Garcia and R. Nock. Simplifying Gaussian mixture models via entropic quantization. In Proc. 17th European Signal Processing Conference, Glasgow, Scotland, 24-28 August 2009, pp. 2012-2016.
[7] W.W. Sampson. Modelling Stochastic Fibre Materials with Mathematica. Springer-Verlag, New York, Berlin, 2009.
Nonparametric Information Geometry
http://www.giannidiorestino.it/GSI2013-talk.pdf
Giovanni Pistone, de Castro Statistics Initiative, Moncalieri, Italy
August 30, 2013

Abstract. The differential-geometric structure of the set of positive densities on a given measure space has raised the interest of many mathematicians after the discovery by C.R. Rao of the geometric meaning of the Fisher information. Most of the research is focused on parametric statistical models. In a series of papers by the author and coworkers, a particular version of the nonparametric case has been discussed. It consists of a minimalistic structure modeled according to the theory of exponential families: given a reference density, other densities are represented by the centered log-likelihood, which is an element of an Orlicz space. These mappings give a system of charts of a Banach manifold. It has been observed that, while the construction is natural, its practical applicability is limited by the technical difficulty of dealing with such a class of Banach spaces. It has been suggested recently to replace the exponential function with other functions having similar behavior but polynomial growth at infinity, in order to obtain more tractable Banach spaces, e.g. Hilbert spaces. We first give a review of our theory, with special emphasis on the specific issues of the infinite-dimensional setting. In a second part we discuss two specific topics: differential equations and the metric connection. The position of this line of research with respect to other approaches is briefly discussed.

References in:
• GP, GSI2013 Proceedings; a few typos corrected in arXiv:1306.0480;
• GP, arXiv:1308.5312.

• If µ1, µ2 are equivalent measures on the same sample space, a statistical model has two representations L1(x; θ)µ1(dx) = L2(x; θ)µ2(dx).
• Fisher's score is a valid option: s(x; θ) = (d/dθ) ln L_i(x; θ), i = 1, 2, and E_θ[s_θ] = 0.
• Each density q equivalent to p is of the form

    q(x) = (e^{v(x)} / E_p[e^v]) p(x) = exp(v(x) − ln E_p[e^v]) p(x),

where v is a random variable such that E_p[e^v] < +∞.
• To avoid borderline cases, we actually require E_p[e^{θv}] < +∞ for θ in an open interval I ⊃ [0, 1].
• Finally, we require E_p[v] = 0.

Plan. Part I: Exponential manifold. Part II: Vector bundles. Part III: Deformed exponential.

Part I: Exponential manifold

Sets of densities. Definition: P1 is the set of real random variables f such that ∫ f dµ = 1; P≥ is the convex set of probability densities; P> is the convex set of strictly positive probability densities: P> ⊂ P≥ ⊂ P1.

• We define the (differential) geometry of these spaces in a way which is meant to be a nonparametric generalization of Information Geometry.
• We try to avoid the use of explicit parameterizations of the statistical models, and therefore we use a parameter-free presentation of differential geometry.
• We construct a manifold modeled on an Orlicz space.
• We look for applications that are intrinsically nonparametric, i.e. Statistical Physics, Information Theory, Optimization, Filtering.

Banach manifold. Definition:
1. Let P be a set, E ⊂ P a subset, B a Banach space. A 1-to-1 mapping s : E → B is a chart if the image s(E) = S ⊂ B is open.
2. Two charts s1 : E1 → B1, s2 : E2 → B2, both defined on E1 ∩ E2, are compatible if s1(E1 ∩ E2) is an open subset of B1 and the change-of-chart mapping s2 ∘ s1⁻¹ : s1(E1 ∩ E2) → s2(E1 ∩ E2) is smooth.
3. An atlas is a set of compatible charts.

• Condition 2 implies that the model spaces B1 and B2 are isomorphic.
• In our case: P = P>; the atlas has a chart s_p for each p ∈ P> such that s_p(p) = 0, and two domains E_{p1}, E_{p2} are either equal or disjoint.

Charts on P>. Model space: Orlicz Φ-space. If Φ(y) = cosh y − 1, the Orlicz Φ-space L^Φ(p) is the vector space of all random variables u such that E_p[Φ(αu)] is finite for some α > 0.

Properties of the Φ-space:
1. u ∈ L^Φ(p) if, and only if, the moment generating function α ↦ E_p[e^{αu}] is finite in a neighborhood of 0.
2. The set S≤1 = {u ∈ L^Φ(p) : E_p[Φ(u)] ≤ 1} is the closed unit ball of a Banach space with norm ‖u‖_p = inf{ρ > 0 : E_p[Φ(u/ρ)] ≤ 1}.
3. ‖u‖_p = 1 if either E_p[Φ(u)] = 1, or E_p[Φ(u)] < 1 and E_p[Φ(u/ρ)] = ∞ for ρ < 1. If ‖u‖_p > 1 then ‖u‖_p ≤ E_p[Φ(u)]. In particular, lim_{‖u‖_p→∞} E_p[Φ(u)] = ∞.

Example: boolean state space.
• In the case of a finite state space, the moment generating function is finite everywhere, but its computation can be challenging.
• Boolean case: Ω = {+1, −1}ⁿ, uniform density p(x) = 2⁻ⁿ, x ∈ Ω. A generic real function on Ω has the form u(x) = Σ_{α∈L} û(α) xᵅ, with L = {0, 1}ⁿ, xᵅ = Π_{i=1}^n x_i^{α_i}, û(α) = 2⁻ⁿ Σ_{x∈Ω} u(x) xᵅ.
• The moment generating function of u under the uniform density p is E_p[e^{tu}] = Σ_{B∈B(û)} Π_{α∈Bᶜ} cosh(t û(α)) Π_{α∈B} sinh(t û(α)), where B(û) are those B ⊂ Supp û such that Σ_{α∈B} α = 0 mod 2.
• E_p[Φ(tu)] = Σ_{B∈B0(û)} Π_{α∈Bᶜ} cosh(t û(α)) Π_{α∈B} sinh(t û(α)) − 1, where B0(û) are those B ⊂ Supp û such that Σ_{α∈B} α = 0 mod 2 and Σ_{α∈Supp û} α = 0.

Example: the sphere is not smooth in general.
• p(x) ∝ (a + x)^{−3/2} e^{−x}, x, a > 0.
• For the random variable u(x) = x, the function

    E_p[Φ(αu)] = (1 / (eᵃ Γ(−1/2, a))) ∫₀^∞ (a+x)^{−3/2} (e^{−(1−α)x} + e^{−(1+α)x})/2 dx − 1

is convex lower semi-continuous in α ∈ R, finite for α ∈ [−1, 1], infinite otherwise, hence not smooth. (Figure: E_p[Φ(αu)] plotted against α.)

Isomorphism of L^Φ spaces. Theorem: L^Φ(p) = L^Φ(q) as Banach spaces if ∫ p^{1−θ} q^θ dµ is finite on an open neighborhood I of [0, 1]. This is an equivalence relation p ∼ q, and we denote by E(p) the class containing p. The two spaces have equivalent norms.

Proof. Assume u ∈ L^Φ(p) and consider the convex function C : (s, θ) ↦ ∫ e^{su} p^{1−θ} q^θ dµ. The restriction s ↦ C(s, 0) = ∫ e^{su} p dµ is finite on an open neighborhood J_p of 0; the restriction θ ↦ C(0, θ) = ∫ p^{1−θ} q^θ dµ is finite on the open set I ⊃ [0, 1];
hence there exists an open interval J_q containing 0 where s ↦ C(s, 1) = ∫ e^{su} q dµ is finite.

e-charts. Definition (e-chart): for each p ∈ P>, consider the chart s_p : E(p) → L^Φ_0(p) given by

    q ↦ s_p(q) = log(q/p) + D(p‖q) = log(q/p) − E_p[log(q/p)].

For u ∈ L^Φ_0(p), let K_p(u) = ln E_p[e^u] be the cumulant generating function of u, and let S_p be the interior of its proper domain. Define e_p : S_p ∋ u ↦ e^{u−K_p(u)} · p. Then e_p ∘ s_p is the identity on E(p), and s_p ∘ e_p is the identity on S_p.

Theorem (Exponential manifold). {s_p : E(p) | p ∈ P>} is an affine atlas on P>.

Cumulant functional.
• The divergence q ↦ D(p‖q) is represented in the chart centered at p by K_p(u) = log E_p[e^u], where q = e^{u−K_p(u)} · p, u ∈ B_p = L^Φ_0(p).
• K_p : B_p → R≥0 ∪ {+∞} is convex, and its proper domain Dom(K_p) contains the open unit ball of B_p.
• K_p is infinitely Gâteaux-differentiable on the interior S_p of its proper domain and analytic on the unit ball of B_p.
• For all v, v1, v2, v3 ∈ B_p the first derivatives are:

    dK_p(u)v = E_q[v],
    d²K_p(u)(v1, v2) = Cov_q(v1, v2),
    d³K_p(u)(v1, v2, v3) = Cov_q(v1, v2, v3).

Change of coordinate. The following statements are equivalent:
1. q ∈ E(p); 2. p ∼ q; 3. E(p) = E(q); 4. ln(q/p) ∈ L^Φ(p) ∩ L^Φ(q).

1. If p, q ∈ E(p) = E(q), the change of coordinate s_q ∘ e_p(u) = u − E_q[u] + ln(p/q) − E_q[ln(p/q)] is the restriction of an affine continuous mapping.
2. u ↦ u − E_q[u] is an affine transport from B_p = L^Φ_0(p) onto B_q = L^Φ_0(q).

Summary.
• If p ∼ q, then E(p) = E(q) and L^Φ(p) = L^Φ(q).
• B_p = L^Φ_0(p), B_q = L^Φ_0(q).
• S_p = S_q, and s_q ∘ s_p⁻¹ : S_p → S_q is affine: s_q ∘ s_p⁻¹(u) = u − E_q[u] + ln(p/q) − E_q[ln(p/q)].
• The tangent application is d(s_q ∘ s_p⁻¹)(v) = v − E_q[v]; it does not depend on p.

Duality. Young pair (N-function):
• φ⁻¹ = φ*,
• Φ(x) = ∫₀^{|x|} φ(u) du,
• Φ*(y) = ∫₀^{|y|} φ*(v) dv,
• |xy| ≤ Φ(x) + Φ*(y).

Examples of Young pairs:
    φ*(u) = ln(1 + u);    φ(v) = eᵛ − 1;    Φ*(x) = (1 + |x|) ln(1 + |x|) − |x|;    Φ(y) = e^{|y|} − 1 − |y|.
    φ*(u) = sinh⁻¹ u;     φ(v) = sinh v;    Φ*(x) = |x| sinh⁻¹|x| − √(1 + x²) + 1;  Φ(y) = cosh y − 1.

• L^{Φ*}(p) × L^Φ(p) ∋ (v, u) ↦ ⟨u, v⟩_p = E_p[uv].
• ⟨u, v⟩_p ≤ 2 ‖u‖_{Φ,p} ‖v‖_{Φ*,p}.
• (L^{Φ*}(p))′ = L^Φ(p), because Φ*(ax) ≤ a² Φ*(x) if a > 1 (the ∆2 condition).

m-charts. For each p ∈ P>, consider a second type of chart on f ∈ P1: η_p : f ↦ η_p(f) = f/p − 1.

Definition (Mixture manifold). The chart is defined for all f ∈ P1 such that f/p − 1 belongs to *B_p = L^{Φ*}_0(p). The atlas (η_p : *E(p)), p ∈ P>, defines a manifold on P1. If the sample space is not finite, such a map does not define charts on P>, nor on P≥.

Example: N(µ, Σ), det Σ ≠ 0 (I).

    G = { (2π)^{−n/2} (det Σ)^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)) : µ ∈ Rⁿ, Σ ∈ Symₙ⁺ }.

    ln(f(x)/f0(x)) = −½ ln(det Σ) − ½ (x − µ)ᵀ Σ⁻¹ (x − µ) + ½ xᵀx
                   = ½ xᵀ(I − Σ⁻¹)x + µᵀΣ⁻¹x − ½ µᵀΣ⁻¹µ − ½ ln(det Σ).

    E_{f0}[ln(f/f0)] = ½ (n − Tr Σ⁻¹) − ½ µᵀΣ⁻¹µ − ½ ln(det Σ).

    u(x) = ln(f(x)/f0(x)) − E_{f0}[ln(f/f0)] = ½ xᵀ(I − Σ⁻¹)x + µᵀΣ⁻¹x − ½ (n − Tr Σ⁻¹).

    K_{f0}(u) = −½ (n − Tr Σ⁻¹) + ½ µᵀΣ⁻¹µ + ½ ln(det Σ).

Example: N(µ, Σ), det Σ ≠ 0 (II). G as a sub-manifold of P>:

    G = { x ↦ e^{u(x)−K(u)} f0(x) : u ∈ H_{1,2} ∩ S_{f0} }

• H_{1,2} is the Hermite space of total degrees 1 and 2, that is, the vector space generated by the Hermite polynomials X1, ..., Xn, (X1² − 1), ..., (Xn² − 1), X1X2, ..., X_{n−1}Xn.
• If the matrix S, S_ii = β_ii − ½, S_ij
ORAL SESSION 4 Relational Metric (Jean-François Marcotorchino)
A general framework for comparing heterogeneous binary relations
Julien Ah-Pine (julien.ah-pine@eric.univ-lyon2.fr)
University of Lyon - ERIC Lab
GSI 2013, Paris, 28/08/2013

Outline
1 Introduction
2 Kendall's general coefficient Γ
3 Another view of Kendall's Γ: Relational Matrices; Reinterpreting Kendall's Γ using RM; The Weighted Indeterminacy Deviation Principle
4 Extending Kendall's Γ for heterogeneous BR: Heterogeneous BR; A geometrical framework; Similarities of order t > 0
5 A numerical example
6 Conclusion

Introduction: binary relations (BR)

A Binary Relation (BR) R over a finite set A = {a, ..., i, j, ..., n} of n items is a subset of A × A.
If (i, j) ∈ R we say "i is in relation with j for R", and this is denoted iRj.

Equivalence Relations (ER) are reflexive, symmetric and transitive BR.

Order Relations (OR) are of different types: preorders, partial orders and total (or linear, or complete) orders.
- If there are ties and missing values: preorders (reflexive, transitive BR).
- If there are no ties but missing values: partial orders (reflexive, antisymmetric, transitive BR).
- If there are no ties and no missing values: total orders (reflexive, antisymmetric, transitive and complete BR).

Equivalence Relations and qualitative variables

ER are related to qualitative or nominal categorical variables. Example: color of eyes,

    x = (a, b, c, d, e) = (Brown, Brown, Blue, Blue, Green).

X is the ER "has the same color of eyes as", and can be represented by a graph and its adjacency matrix (AM), denoted X, such that ∀i, j: Xij = 1 if iXj and Xij = 0 otherwise:

        a b c d e
    a   1 1 . . .
    b   1 1 . . .
    c   . . 1 1 .
    d   . . 1 1 .
    e   . . . . 1

Order Relations and quantitative variables

OR are related to quantitative variables. Example: ranking of items,

    x = (a, b, c, d, e) = (1, 2, 4, 3, 5).
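The two kinds of adjacency matrix above can be generated mechanically. A small numpy sketch (illustrative, not from the talk; function names are mine):

```python
import numpy as np

def er_matrix(x):
    """AM of the equivalence relation 'has the same value as': Xij = 1 iff x_i = x_j."""
    x = np.asarray(x)
    return (x[:, None] == x[None, :]).astype(int)

def or_matrix(ranks):
    """AM of the total order 'has a lower (or equal) rank than': Xij = 1 iff x_i <= x_j,
    with reflexivity giving the 1s on the diagonal."""
    r = np.asarray(ranks)
    return (r[:, None] <= r[None, :]).astype(int)
```

For x = (Brown, Brown, Blue, Blue, Green) this reproduces the block-diagonal ER matrix above, and for the ranks (1, 2, 4, 3, 5) the OR matrix shown next.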
X is the OR "has a lower rank than", and its AM X is again such that ∀i, j: Xij = 1 if iXj and Xij = 0 otherwise:

        a b c d e
    a   1 1 1 1 1
    b   . 1 1 1 1
    c   . . 1 . 1
    d   . . 1 1 1
    e   . . . . 1

How to compare the relationships between BR?

We are given two variables of measurements x and y of the same kind (both qualitative or both quantitative).
- How can we measure the proximity between the BR underlying the two variables?
- How to deal with heterogeneity?
- When ER have different numbers of categories and different distributions? For example: x = (A, A, B, B, C); y = (D, D, D, D, E).
- When OR are of different types? For example: x = (1, 2, 4, 3, 5); y = (1, 1, 1, 4, 5).

Kendall's general coefficient Γ

In statistics, Kendall [Kendall(1948)] proposed a general correlation coefficient in order to define a broad family of association measures between x and y:

    Γ(x, y) = Σ_{i,j} Xij Yij / √( Σ_{i,j} Xij² · Σ_{i,j} Yij² )   (1)

where X and Y are two square matrices derived from x and y.

Particular cases of Γ

Particular cases of Γ are given in [Vegelius and Janson(1982), Kendall(1948)].
For ER:
- Tchuprow's T: Xij = n/nˣᵤ − 1 if x_i = x_j; Xij = −1 if x_i ≠ x_j.
- J-index: Xij = pˣ − 1 if x_i = x_j; Xij = −1 if x_i ≠ x_j.

Here nˣᵤ is the number of items in category u of x, and pˣ is the number of categories of x.

For OR:
- Kendall's τa: Xij = 1 if x_i < x_j; Xij = −1 if x_i > x_j.
- Spearman's ρa: Xij = x_i − x_j, where x_i is the rank of item i.

Another view of Kendall's Γ: Relational Matrices

Relational Matrices (RM) and some properties. The AM of BR have particular properties, and they are more specifically called Relational Matrices (RM) by Marcotorchino in the Relational Analysis approach [Marcotorchino and Michaud(1979)].
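Formula (1) and the two OR matrix choices from the table can be sketched in numpy (illustrative code, names are mine). With x = (1, 2, 4, 3, 5) and y = (1, 2, 3, 4, 5), the τa matrices give Γ = 0.8 and the rank-difference matrices give Γ = 0.9, the usual Kendall and Spearman coefficients:

```python
import numpy as np

def gamma(x_mat, y_mat):
    """Kendall's general coefficient (1): <X, Y> / (||X|| ||Y||)."""
    num = np.sum(x_mat * y_mat)
    return num / np.sqrt(np.sum(x_mat ** 2) * np.sum(y_mat ** 2))

def tau_a_matrix(x):
    """Xij = +1 if x_i < x_j, -1 if x_i > x_j (0 on ties): Kendall's tau_a."""
    x = np.asarray(x)
    return np.sign(x[None, :] - x[:, None])

def spearman_matrix(ranks):
    """Xij = x_i - x_j with x_i the rank of item i: Spearman's rho_a."""
    r = np.asarray(ranks, dtype=float)
    return r[:, None] - r[None, :]
```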
For instance, the relational properties of X can be expressed as linear constraints on X:
- reflexivity: ∀i, Xii = 1;
- symmetry: ∀i, j, Xij − Xji = 0;
- antisymmetry: ∀i ≠ j, Xij + Xji ≤ 1;
- completeness (totality): ∀i ≠ j, Xij + Xji ≥ 1;
- transitivity: ∀i, j, k, Xij + Xjk − Xik ≤ 1.

Reinterpreting Kendall's Γ using RM

There have been several works using RM to reformulate association measures, in order to better understand their differences [Marcotorchino(1984-85), Ghashghaie(1990), Najah Idrissi(2000), Ah-Pine and Marcotorchino(2010)].

In this work, we propose a reinterpretation of Kendall's Γ in terms of RM which emphasizes the so-called weighted indeterminacy deviation principle:
1 We give the definition of the opposite of an ER and of an OR.
2 We introduce Λ, our formulation of Kendall's Γ in terms of RM of BR, RM of opposites of BR, and weighting schemes.
3 We show how Λ yields well-known association measures.
4 We explain the weighted indeterminacy deviation principle.
Opposite relation of an ER and of an OR

We introduce the opposite relation X̄ of an ER or an OR X. If x is a categorical variable then X̄ = Xᶜ (the complement relation): X̄ij
Comparison of linear modularization criteria of networks using relational metric
Patricia Conde Céspedes, LSTA, Paris 6
August 2013
Thesis supervised by J.F. Marcotorchino (Thales Scientific Director)

Outline
1 Introduction and objective
2 Mathematical relational representations of criteria
3 Algorithm and some results
4 Comparison of criteria
5 Applications
6 Conclusions

Introduction and objective: description of the problem

Objective: compare the partitions found by different linear criteria.

Nowadays, we can find networks everywhere (biology, computer programming, marketing, etc.); some practical applications are cyber-marketing and cyber-security. It is difficult to analyse a network directly because of its big size; therefore, we need to decompose it into clusters or modules, i.e. to modularize it. Different modularization criteria have been formulated in different contexts in the last few years, and we need to compare them.

Graph partition: definition of module or community.
Mathematical relational representations of criteria

Mathematical relational modeling. Modularizing a graph G(V, E) ⇔ defining an equivalence relation on V. Let X be a square matrix of order N = |V| defining an equivalence relation on V as follows:

    x_ii' = 1 if i and i' are in the same cluster, 0 otherwise, ∀(i, i') ∈ V × V.   (1)

We present a modularization criterion as a linear function to optimize:

    Max_X F(A, X)   (2)

subject to the constraints of an equivalence relation:

    x_ii' ∈ {0, 1}                              Binarity   (3)
    x_ii = 1  ∀i                                Reflexivity
    x_ii' − x_i'i = 0  ∀(i, i')                 Symmetry
    x_ii' + x_i'i'' − x_ii'' ≤ 1  ∀(i, i', i'')  Transitivity

Properties verified by linear criteria

Every linear criterion is separable, as it can be written in the general form (it is possible to separate the data from the variables):

    F(X) = Σ_{i=1}^N Σ_{i'=1}^N φ(a_ii') x_ii' + constant   (4)

where a_ii' is the general term of the adjacency matrix A, and φ(a_ii') is a function of the adjacency matrix only. Besides, the criterion is balanced if it can be written in the form:

    F(X) = Σ_{i=1}^N Σ_{i'=1}^N φ(a_ii') x_ii' + Σ_{i=1}^N Σ_{i'=1}^N φ̄(a_ii') x̄_ii'   (5)

where x̄_ii' = 1 − x_ii' represents the opposite relation of X, noted X̄, and φ(a_ii') ≥ 0, φ̄(a_ii') ≥ 0 ∀i, i' are non-negative functions verifying Σ_{i,i'} φ_ii' > 0 and Σ_{i,i'} φ̄_ii' > 0.
As we will see later, the functions φ and φ̄ behave as "costs".

The property of linear balance. Given a graph:
- If φ̄_ii' = 0 ∀i, i', all the nodes are clustered together, so κ = 1.
- If φ_ii' = 0 ∀i, i', all the nodes are separated, so κ = N.
- If Σ_{i,i'} φ_ii' = Σ_{i,i'} φ̄_ii', the criterion is a null model and therefore has a resolution limit.

Existing linear criteria:
- Zahn-Condorcet (1785, 1964): F_ZC(X) = Σ_{i=1}^N Σ_{i'=1}^N (a_ii' x_ii' + ā_ii' x̄_ii').
- Owsiński-Zadrożny (1986): F_OZ(X) = Σ_{i=1}^N Σ_{i'=1}^N ((1−α) a_ii' x_ii' + α ā_ii' x̄_ii'), with 0 < α < 1.
- Newman-Girvan (2004): F_NG(X) = (1/2M) Σ_{i=1}^N Σ_{i'=1}^N (a_ii' − a_i. a_.i' / 2M) x_ii'.

Three new linear criteria:
- Deviation to uniformity (2013): F_UNIF(X) = (1/2M) Σ_{i=1}^N Σ_{i'=1}^N (a_ii' − 2M/N²) x_ii'.
- Deviation to indetermination (2013): F_DI(X) = (1/2M) Σ_{i=1}^N Σ_{i'=1}^N (a_ii' − a_i./N − a_.i'/N + 2M/N²) x_ii'.
- Balanced modularity (2013): F_BM(X) = Σ_{i=1}^N Σ_{i'=1}^N ((a_ii' − P_ii') x_ii' + (ā_ii' − P̄_ii') x̄_ii'), where P_ii' = a_i. a_.i' / 2M and P̄_ii' = (N − a_i.)(N − a_.i') / (N² − 2M).

Interpretation of the new linear criteria: uniformity structure; indetermination structure; duality between the independence and indetermination structures.

Some properties of these new criteria:
- Whereas the Newman-Girvan modularity is based on the "deviation from independence" structure, the DI criterion is based on the "deviation from indetermination" structure.
- All three new criteria are null models, like the Newman-Girvan modularity.
- The balanced modularity is a balanced version of the Newman-Girvan modularity.
- If all the nodes had the same degree, d_av = Σ_{i=1}^N a_i. / N = 2M/N, all the new criteria would behave as the Newman-Girvan modularity does: F_NG ≡ F_UNIF ≡ F_BM ≡ F_DI.

Algorithm and some results

The number of clusters. The Louvain algorithm is easy to adapt to separable criteria.
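Because linear criteria are separable, their value for a given partition is a single sum over pairs. A numpy sketch for F_ZC and F_NG (illustrative; it assumes a 0/1 adjacency matrix with zero diagonal and a cluster-label vector, and sums over all ordered pairs including i = i', as in the relational formulation):

```python
import numpy as np

def partition_matrix(labels):
    """Equivalence-relation matrix X of a partition: x_ii' = 1 iff same cluster."""
    lab = np.asarray(labels)
    return (lab[:, None] == lab[None, :]).astype(float)

def f_zahn_condorcet(a, labels):
    """F_ZC(X) = sum_{i,i'} (a_ii' x_ii' + abar_ii' xbar_ii')."""
    x = partition_matrix(labels)
    return float(np.sum(a * x + (1.0 - a) * (1.0 - x)))

def f_newman_girvan(a, labels):
    """F_NG(X) = (1/2M) sum_{i,i'} (a_ii' - a_i. a_.i' / 2M) x_ii'."""
    x = partition_matrix(labels)
    deg = a.sum(axis=1)
    two_m = a.sum()
    return float(np.sum((a - np.outer(deg, deg) / two_m) * x) / two_m)
```

For two disjoint triangles split into their two natural clusters, this gives F_ZC = 30 and F_NG = 0.5.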
Number of clusters κ found by the adapted Louvain algorithm:
- Jazz data (N = 198, M = 2742): Zahn-Condorcet 38; Owsiński-Zadrożny 6 (α = 2%); Newman-Girvan 4; Deviation to uniformity 20; Deviation to indetermination 6; Balanced modularity 5.
- Internet data (N = 69949, M = 351380): Zahn-Condorcet 40123; Owsiński-Zadrożny 456 (α < 1%); Newman-Girvan 46; Deviation to uniformity 173; Deviation to indetermination 45; Balanced modularity 46.

How to explain these differences?

Impact of merging two clusters. Suppose we want to merge two clusters C_1 and C_2 of the network, of sizes n_1 and n_2 respectively. Suppose as well that they are connected by l edges and have average degrees d¹_av and d²_av respectively.

What is the contribution of merging the two clusters to the value of each criterion? The contribution C of merging the two clusters is

C = Σ_{i ∈ C_1} Σ_{i' ∈ C_2} (φ_{ii'} − φ̄_{ii'})    (6)

The objective is to compare the function φ(.) to the function φ̄(.):
- If C > 0, the criterion merges the two clusters: the contribution is a gain.
- If C < 0, the criterion separates the two clusters: the contribution is a cost.

Here l is the number of edges between the clusters C_1 and C_2, and l̄ is the number of missing edges between them.
The contribution for the Zahn-Condorcet criterion:

C_{ZC} = l − l̄ = 2 (l − n_1 n_2 / 2), since l̄ = n_1 n_2 − l    (7)

The Zahn-Condorcet criterion requires that the connections within the merged cluster outnumber the absences of connections, i.e. the number l of connections between C_1 and C_2 must be at least half of the n_1 n_2 possible connections between the two subgraphs. This criterion has no resolution limit, as the contribution depends only upon the local properties l, l̄, n_1, n_2; it does not depend on the size of the network. With this criterion we obtain many small clusters or cliques, some of them single nodes.

The contribution for the Owsiński-Zadrożny criterion:

C_{OZ} = l − α n_1 n_2, with 0 < α < 1    (8)

As the Zahn-Condorcet criterion is so demanding (hence the many small clusters), the Owsiński-Zadrożny criterion gives the user the choice of the minimum required fraction α of within-cluster edges. This coefficient defines the balance between φ and φ̄; for α = 0.5, the Owsiński-Zadrożny criterion is equivalent to the Zahn-Condorcet criterion. Like Zahn-Condorcet, this criterion has no resolution limit: the contribution depends only upon the local properties l, l̄, n_1, n_2, not on the size of the network.
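The two local contributions (7) and (8) are simple enough to state as code; a hedged sketch (helper names are mine, not the talk's):

```python
# Sketch: the local merge contributions C_ZC (7) and C_OZ (8).

def c_zahn_condorcet(l, n1, n2):
    # l_bar = n1*n2 - l missing edges; C_ZC = l - l_bar is positive
    # exactly when l > n1*n2/2, i.e. at least half of the possible
    # connections between the two clusters are present.
    l_bar = n1 * n2 - l
    return l - l_bar

def c_owsinski_zadrozny(l, n1, n2, alpha):
    # alpha in (0,1) is the user-chosen minimum fraction of
    # within-cluster edges; alpha = 0.5 recovers the ZC behaviour.
    return l - alpha * n1 * n2
```

Both depend only on l, n1, n2, which is exactly why neither criterion has a resolution limit.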
The contribution for the Newman-Girvan criterion:

C_{NG} = l − n_1 n_2 d¹_av d²_av / (2M)    (9)

The contribution depends on the degree distribution of the clusters. This criterion has a resolution limit, since the contribution depends on a global property of the whole network, M. The optimal partition has no single-node clusters.

The contribution for the deviation to uniformity criterion:

C_{UNIF} = l − n_1 n_2 (2M / N²)    (10)

This criterion is the particular case of the Zahn-Condorcet (or Owsiński-Zadrożny) criterion with α = 2M/N² = δ, which can be interpreted as the global density of edges among the nodes. To merge the two clusters, l / (n_1 n_2) > δ must hold: the fraction of edges between the clusters must exceed the global edge density δ. This criterion has a resolution limit, since the contribution depends on the global properties M and N.

The contribution for the deviation to indetermination criterion:

C_{DI} = l − n_1 n_2 (d¹_av/N + d²_av/N − 2M/N²)    (11)

The contribution depends on the degree distribution of the clusters and on their sizes. This criterion has a resolution limit, since the contribution depends on the global properties M and N. It favours big clusters with high average degree and small clusters with low average degree.
Consequently, the degree distribution of each cluster obtained with this criterion tends to be more homogeneous than that of the clusters obtained by optimizing the Newman-Girvan criterion.

The contribution for the balanced modularity criterion:

C_{BM} = 2l + n_1 n_2 (N − d¹_av)(N − d²_av) / (N² − 2M) − n_1 n_2 − n_1 n_2 d¹_av d²_av / (2M)    (12)

The contribution depends on the degree distribution of the clusters and on their sizes. This criterion has a resolution limit, since the contribution depends on the global properties M and N. Depending upon δ and d_av, it behaves like a regulator between the Newman-Girvan criterion and the deviation to indetermination criterion: on one hand, the degree distribution within clusters is more homogeneous than that found with the Newman-Girvan criterion; on the other hand, it is more heterogeneous than that found with the deviation to indetermination criterion.

Summary by criterion:
- Zahn-Condorcet: clusters have a fraction of within-cluster edges greater than 50%.
- Owsiński-Zadrożny: a generalization of the ZC criterion where the user defines the minimum fraction of within-cluster edges.
- Deviation to uniformity: the OZ criterion for α = δ, the density of edges among the nodes.
- Newman-Girvan: has a resolution limit; the optimal clustering does not contain isolated nodes.
- Deviation to indetermination: the within-cluster degree distribution is more homogeneous than that found by the Newman-Girvan criterion.
- Balanced modularity: behaves like a regulator between the NG criterion and the DI criterion.
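A sketch gathering the four global contributions (9)-(12) (function names are mine). It also lets one check the earlier claim that, when every node has the common degree d_av = 2M/N, the new criteria behave like Newman-Girvan:

```python
# Sketch: the merge contributions (9)-(12). All depend on the global
# quantities M and/or N, which is the source of their resolution limit.

def c_ng(l, n1, n2, d1, d2, M):
    return l - n1 * n2 * d1 * d2 / (2 * M)

def c_unif(l, n1, n2, M, N):
    return l - n1 * n2 * 2 * M / N**2

def c_di(l, n1, n2, d1, d2, M, N):
    return l - n1 * n2 * (d1 / N + d2 / N - 2 * M / N**2)

def c_bm(l, n1, n2, d1, d2, M, N):
    return (2 * l
            + n1 * n2 * (N - d1) * (N - d2) / (N**2 - 2 * M)
            - n1 * n2
            - n1 * n2 * d1 * d2 / (2 * M))

# Equal-degree check: with d1 = d2 = 2M/N the contributions coincide
# (C_BM up to a constant factor of 2), matching F_NG ≡ F_UNIF ≡ F_DI ≡ F_BM.
N, M = 100, 250
d = 2 * M / N
print(c_ng(7, 10, 8, d, d, M), c_unif(7, 10, 8, M, N),
      c_di(7, 10, 8, d, d, M, N), c_bm(7, 10, 8, d, d, M, N))
```

In the equal-degree case C_BM reduces algebraically to 2·C_UNIF, so all four criteria rank merges identically, which is the sense of the equivalence stated above.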
Applications — Data: Zachary karate club network. A network of friendships between the 34 members of a karate club at a US university: N = 34 nodes, M = 78 edges, d_av = 4.6 and δ = 0.13.

The number of clusters per criterion (with the Louvain algorithm):
- Zahn-Condorcet: 19 clusters, 12 single nodes
- Owsiński-Zadrożny (α = 0.2): 7 clusters, 3 single nodes
- Deviation to uniformity: 6 clusters, 2 single nodes
- Newman-Girvan: 4 clusters
- Deviation to indetermination: 4 clusters
- Balanced modularity: 4 clusters

The partitions found with Newman-Girvan, deviation to indetermination and balanced modularity are the same.

[Figure: density of within-cluster edges per criterion]
[Figure: partitions obtained by the criteria]

The Jazz network. A network of jazz musicians: N = 198, M = 2742, d_av ≈ 27.7 and δ = 0.14.
The clusters found by three criteria, with columns n_j (cluster size), d^j_av (average degree), σ_j (standard deviation) and cv_j (coefficient of variation of the degrees of cluster j):

Newman-Girvan:
- 62, 32.3, 18.5, 0.57
- 53, 30.5, 16.2, 0.53
- 61, 20.3, 14.1, 0.69
- 22, 28.4, 20.1, 0.71

Balanced modularity:
- 60, 33.1, 18.2, 0.55
- 53, 31.3, 16.3, 0.52
- 61, 20.3, 14.1, 0.69
- 23, 26, 19.4, 0.75
- 1, 1, 0, 0

Deviation to indetermination:
- 63, 19.8, 14.2, 0.71
- 63, 33.7, 16, 0.48
- 18, 13.8, 5.2, 0.37
- 51, 36.4, 17.7, 0.49
- 2, 2.5, 2.1, 0.85
- 1, 1, 0, 0

[Figure: the coefficient of variation for the three criteria]

Conclusions. We presented six modularization criteria in Relational Analysis notation, which allowed us to easily calculate their contribution (cost or gain) when merging two clusters. We analysed important characteristics of the different criteria and compared the differences in the partitions provided by each of them. Although the three criteria we introduced have nearly the same properties, they differ mainly in the degree distribution, the sizes of the clusters, and global characteristics of the graph.

Thanks for your attention!
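The cv_j column above is just σ_j / d^j_av; a one-line sketch (name mine) recomputing it from the table's own entries:

```python
# Sketch: the coefficient of variation cv_j = sigma_j / d^j_av used in
# the Jazz tables, recomputed from the (d^j_av, sigma_j) columns.

def coeff_of_variation(d_av, sigma):
    return sigma / d_av

# First Newman-Girvan cluster of the Jazz network: d_av = 32.3, sigma = 18.5.
print(round(coeff_of_variation(32.3, 18.5), 2))
```

A smaller cv_j means a more homogeneous degree distribution inside the cluster, which is how the DI and BM partitions are compared to the NG one.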
On Prime-valent Symmetric Bicirculants and Cayley Snarks. Klavdija Kutnar, University of Primorska. Paris, 2013. Joint work with Ademir Hujdurović and Dragan Marušič.

Snarks. A snark is a connected, bridgeless cubic graph with chromatic index equal to 4; a non-snark is a bridgeless cubic 3-edge-colorable graph. The Petersen graph is a snark. The Blanuša snarks are snarks that are not vertex-transitive.

Vertex-transitive graphs. An automorphism of a graph X = (V, E) is an isomorphism of X with itself; thus each automorphism α of X is a permutation of the vertex set V which preserves adjacency. A graph is vertex-transitive if its automorphism group acts transitively on its vertices.

Cayley graphs. A vertex-transitive graph is a Cayley graph if its automorphism group has a regular subgroup. Given a group G and a subset S of G \ {1} such that S = S⁻¹, the Cayley graph Cay(G, S) has vertex set G and edges of the form {g, gs} for all g ∈ G and s ∈ S. Cay(G, S) is connected if and only if G = ⟨S⟩.

Example: [figure: the Cayley graph Cay(Z₇, {±1, ±2}) on the left-hand side and the Petersen graph on the right-hand side]

Snarks. Are there any other snarks amongst vertex-transitive graphs, in particular Cayley graphs?

Nedela, Škoviera (Combin., 2001): If there exists a Cayley snark, then there is a Cayley snark Cay(G, {a, x, x⁻¹}) where x has odd order, a² = 1, and G = ⟨a, x⟩ is either a non-abelian simple group, or G has a unique non-trivial proper normal subgroup H which is either simple non-abelian or the direct product of two isomorphic non-abelian simple groups, and |G : H| = 2.
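A small sketch (names mine, not the talk's) building Cay(Z₇, {±1, ±2}) exactly as defined above, with edges {g, g + s}:

```python
# Sketch: Cay(Z_n, S) as a set of undirected edges, per the definition
# above: vertex set Z_n, edges {g, g+s} for every g and every s in S.

def cayley_graph(n, S):
    # The connection set must avoid the identity and satisfy S = S^{-1}.
    assert 0 not in S and all((-s) % n in S for s in S)
    return {frozenset({g, (g + s) % n}) for g in range(n) for s in S}

edges = cayley_graph(7, {1, 2, 5, 6})   # generators ±1, ±2 in Z_7
print(len(edges))
```

Since 1 ∈ S already generates Z₇, ⟨S⟩ = Z₇ and the graph is connected; each vertex has the 4 distinct neighbours g ± 1, g ± 2, so the graph is 4-valent with 14 edges.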
Potočnik (JCTB, 2004): The Petersen graph is the only vertex-transitive snark containing a solvable transitive subgroup of automorphisms.

The hunt for vertex-transitive/Cayley snarks is essentially a special case of the Lovász question regarding hamiltonian paths/cycles: the existence of a hamiltonian cycle implies that a cubic graph is 3-edge-colorable, and thus a non-snark. The hamiltonicity problem is hard; the snark problem is hard too, but should be easier to deal with. Compare: the Coxeter graph is not a snark (easy) vs the Coxeter graph is not hamiltonian (harder); hamiltonian cycles in cubic Cayley graphs (hard) vs Cayley snarks (still hard, but easier).

Types of cubic Cayley graphs Cay(G, S):
- Type 1: S consists of 3 involutions. No snarks; nothing is known about hamiltonian cycles, except YES for the case when two of the involutions commute (Cherkassov, Sjerve).
- Type 2: S = {a, x, x⁻¹}, where a² = 1 and x is of even order. No snarks; nothing is known about hamiltonian cycles.
- Type 3: S = {a, x, x⁻¹}, where a² = 1 and x is of odd order. See below.

Partial results for Type 3 graphs. A (2, s, t)-generated group is a group G = ⟨a, x | a² …
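The claim that the Petersen graph is a snark can be checked mechanically: it is cubic and bridgeless, so it suffices to verify that no proper 3-edge-colouring exists. A brute-force backtracking sketch (not from the talk; names mine):

```python
# Sketch: the Petersen graph admits no proper 3-edge-colouring, i.e. its
# chromatic index is 4 -- together with bridgeless cubicity, the snark
# condition. Standard construction: outer 5-cycle 0..4, inner pentagram
# 5..9, spokes i -- i+5.
edges = ([(i, (i + 1) % 5) for i in range(5)]              # outer cycle
         + [(i, i + 5) for i in range(5)]                  # spokes
         + [(i + 5, (i + 2) % 5 + 5) for i in range(5)])   # pentagram

def three_edge_colorable(edges):
    """Backtracking search for a proper 3-edge-colouring."""
    share = [[bool(set(e) & set(f)) for f in edges] for e in edges]
    colors = [None] * len(edges)

    def extend(k):
        if k == len(edges):
            return True
        for c in range(3):
            # c is legal for edge k if no earlier incident edge uses it
            if all(not (share[k][j] and colors[j] == c) for j in range(k)):
                colors[k] = c
                if extend(k + 1):
                    return True
                colors[k] = None
        return False

    return extend(0)

print(three_edge_colorable(edges))
```

The same routine returns True on any 3-edge-colorable cubic graph (e.g. K₄, whose three perfect matchings give a colouring), which makes a convenient sanity check.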
Optimal Transport and Minimal Trade Problem: Impacts on Relational Metrics and Applications to Large Graphs and Networks Modularity. Jean-François Marcotorchino, Thales Communications et Sécurité (TCS) and LSTA, Paris 6. August 2013.

Outline: 1 Goal of the presentation; 2 The optimal transport problem: Monge and Monge-Kantorovich problems; 3 Extensions and variants of the MKP problem; 4 Monge and anti-Monge matrices and some related structural properties; 5 Duality related to "independence" and "indetermination" structures; 6 Relational Analysis approach; 7 A new graph modularization criterion.

Goal of the presentation: exhibiting a relationship between Monge and Condorcet (1781-1785).
1. Using Optimal Transport Theory, based on G. Monge (1781) and L. Kantorovich's MK problem, to define two alternatives for measuring "correlation" within "stressed contingency structures", following M. Fréchet's first attempt of 1951.
2. Introducing two extended variants of the MKP problem concerned with spatial interaction models: Alan Wilson's Entropy Model and the Minimal Trade Model.
3. Deriving and justifying from those models two "dual structures" of correlation measures: deviation from independence (Mutual Information index) and deviation from indetermination (Indetermination index).

Goal of the presentation (continued):
1. Justifying this duality through the so-called Monge conditions.
2. Translating those specific situations into quite distinct but usual indexes (the Tchuprow χ² and the Janson-Vegelius index).
3. Explaining "deviation from indetermination" by its filiation with the "Relational Analysis scheme" of A. de Condorcet.
4. Applying this principle to graph modularization criteria.

The optimal transport problem: Monge and Monge-Kantorovich problems.

The Monge-Kantorovich problem:

P[π*] = inf_{π ∈ Π(μ,ν)} ∫_{X×Y} c(x, y) dπ(x, y)    (1)

The linear Monge-Kantorovich problem has a dual formulation:

D[φ, ψ] = sup_{(φ,ψ)} { ∫_X φ dμ + ∫_Y ψ dν : c(x, y) ≥ φ(x) + ψ(y) on X × Y }    (2)

Theorem (Kantorovich duality). If there exist π* ∈ Π(μ, ν) and an admissible pair (φ*, ψ*) such that

∫_{X×Y} c(x, y) dπ*(x, y) = ∫_X φ*(x) dμ(x) + ∫_Y ψ*(y) dν(y),

then π* is an optimal transport plan and the pair (φ*, ψ*) solves problem (2). So there is no gap between the values:

inf_π P[π] = sup_{(φ,ψ)} D[φ, ψ]

The discrete version of the MKP problem:

min_π Σ_{u=1}^{p} Σ_{v=1}^{q} c(u, v) π_{uv}    (3)

subject to:

Σ_{v=1}^{q} π_{uv} = μ_u ∀ u ∈ {1, 2, …, p}    (4)
Σ_{u=1}^{p} π_{uv} = ν_v ∀ v ∈ {1, 2, …, q}    (5)
Σ_{u=1}^{p} Σ_{v=1}^{q} π_{uv} = 1    (6)
π_{uv} ≥ 0 ∀ u ∈ {1, …, p}, v ∈ {1, …, q}    (7)
Variants of the MKP problem.

Alan Wilson's Entropy Model:
- Objective: max_π − Σ_{u=1}^{p} Σ_{v=1}^{q} π_{uv} ln π_{uv}
- Subject to: constraints (4), (5) and (7)
- Optimal solution: π*_{uv} = μ_u ν_v ∀ (u, v), i.e. n*_{uv} = n_{u.} n_{.v} / N

The Minimal Trade Model:
- Objective: min_π Σ_{u=1}^{p} Σ_{v=1}^{q} (π_{uv} − 1/pq)²
- Subject to: constraints (4), (5) and (7)
- Optimal solution: π*_{uv} = μ_u/q + ν_v/p − 1/pq, i.e. n*_{uv} = n_{u.}/q + n_{.v}/p − N/pq

The continuous version of the Minimal Trade Problem. The optimal solution, obtained by considering the Kantorovich duality (2), is given by

π*(x, y) = f(x)/B + g(y)/A − 1/(AB) ∀ (x, y) ∈ [a, b] × [c, d]

where π : [a, b] × [c, d] → [0, 1] is defined on the product of two closed intervals of the cartesian plane; A = (b − a) and B = (d − c) are the respective lengths of those intervals; and μ and ν (the marginals of π) have densities f and g respectively.
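A quick numerical sketch (the marginals are illustrative, not from the talk) verifying that both closed-form optima satisfy the marginal constraints (4)-(6). Non-negativity (7) of the Minimal Trade solution must still be checked separately, since the affine formula can go negative for very skewed marginals:

```python
# Sketch: checking constraints (4)-(6) for the two closed-form optima,
# pi*_uv = mu_u/q + nu_v/p - 1/(pq)  (Minimal Trade Model) and
# pi*_uv = mu_u * nu_v               (Wilson's entropy model).

mu = [0.5, 0.3, 0.2]          # p = 3 row marginals
nu = [0.4, 0.35, 0.25]        # q = 3 column marginals
p, q = len(mu), len(nu)

trade = [[mu[u] / q + nu[v] / p - 1 / (p * q) for v in range(q)]
         for u in range(p)]
wilson = [[mu[u] * nu[v] for v in range(q)] for u in range(p)]

for pi in (trade, wilson):
    assert all(abs(sum(pi[u]) - mu[u]) < 1e-12 for u in range(p))    # (4)
    assert all(abs(sum(pi[u][v] for u in range(p)) - nu[v]) < 1e-12
               for v in range(q))                                    # (5)
    assert abs(sum(map(sum, pi)) - 1) < 1e-12                        # (6)
    assert all(x >= 0 for row in pi for x in row)                    # (7) here

print("constraints (4)-(7) satisfied for both optima")
```

The row-sum check works symbolically too: Σ_v (μ_u/q + ν_v/p − 1/pq) = μ_u + 1/p − 1/p = μ_u.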
Monge and anti-Monge matrices and some related structural properties.

Definition. A p × q real matrix C = {c_{uv}} is called a Monge matrix if C satisfies the so-called Monge property:

c_{uv} + c_{u'v'} ≤ c_{uv'} + c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (8)

Reciprocally, an "inverse Monge matrix" (or anti-Monge matrix) C satisfies the following inequality:

c_{uv} + c_{u'v'} ≥ c_{uv'} + c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (9)

In case both inequalities (8) and (9) hold:

c_{uv} + c_{u'v'} = c_{uv'} + c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (10)

Theorem. Let {π_{uv}} be a p × q real nonnegative frequency matrix; then the following properties hold and are equivalent:
i) if {π_{uv}} is both a Monge and an anti-Monge matrix, then π_{uv} + π_{u'v'} = π_{uv'} + π_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q;
ii) π_{uv} = μ_u/q + ν_v/p − 1/pq is a minimizer of the Minimal Trade Model;
iii) all the 2 × 2 subtables {u, v, u', v'} with 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q have the sum of their diagonal terms equal to the sum of their anti-diagonal terms.

The Log-Monge and Log-anti-Monge matrices.

Definition. A p × q positive real matrix C = {c_{uv}} is called a Log-Monge matrix if C satisfies the Log-Monge property:

ln c_{uv} + ln c_{u'v'} ≤ ln c_{uv'} + ln c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (11)

Reciprocally, an "inverse Log-Monge matrix" (or Log-anti-Monge matrix) C satisfies the following inequality:

ln c_{uv} + ln c_{u'v'} ≥ ln c_{uv'} + ln c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (12)

In case both inequalities (11) and (12) hold:

ln c_{uv} + ln c_{u'v'} = ln c_{uv'} + ln c_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q    (13)

Theorem. Let {π_{uv}} be a p × q real positive frequency matrix; then the following properties hold and are equivalent:
i) if {π_{uv}} is both a Log-Monge and a Log-anti-Monge matrix, then ln π_{uv} + ln π_{u'v'} = ln π_{uv'} + ln π_{u'v} ∀ 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q;
ii) π_{uv} = μ_u ν_v is a minimizer of Alan Wilson's program of spatial interaction systems based upon the entropy model, with fixed margins;
iii) all the 2 × 2 subtables {u, v, u', v'} with 1 ≤ u < u' ≤ p, 1 ≤ v < v' ≤ q have the product of their diagonal terms equal to the product of their anti-diagonal terms.
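A sketch (helper name mine) checking the Monge property (8) and anti-Monge property (9) over all 2×2 subtables; as the theorem predicts, the Minimal Trade optimum satisfies both at once, i.e. condition (10):

```python
# Sketch: testing the Monge (8) / anti-Monge (9) property by scanning
# every 2x2 subtable of a matrix.

def is_monge(c, anti=False):
    p, q = len(c), len(c[0])
    for u in range(p):
        for u2 in range(u + 1, p):
            for v in range(q):
                for v2 in range(v + 1, q):
                    diag = c[u][v] + c[u2][v2]       # diagonal sum
                    anti_diag = c[u][v2] + c[u2][v]  # anti-diagonal sum
                    ok = (diag >= anti_diag - 1e-12 if anti
                          else diag <= anti_diag + 1e-12)
                    if not ok:
                        return False
    return True

# The Minimal Trade optimum is additive, c_uv = f(u) + g(v), hence both
# Monge and anti-Monge: every 2x2 subtable satisfies (10) with equality.
mu, nu = [0.5, 0.3, 0.2], [0.4, 0.35, 0.25]
p, q = 3, 3
trade = [[mu[u] / q + nu[v] / p - 1 / (p * q) for v in range(q)]
         for u in range(p)]
print(is_monge(trade), is_monge(trade, anti=True))
```

Any matrix of the form c_{uv} = f(u) + g(v) passes both tests, which is exactly the equivalence i) ⇔ ii) of the theorem in the discrete case.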
Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 17 / 29 Monge and Anti-Monge matrices and some related structural properties A contingency table X\Y 1 . . . v . . . q Total 1 n11 . . . n1v . . . n1q n1. ... ... ... ... ... u nu1 . . . nuv . . . nuq nu. ... ... ... ... ... p np1 . . . npv . . . npq np. Total n.1 . . . n.v . . . n.q n.. where: nuv = Nπuv : quantity of mass transported from u ∈ X to v ∈ Y . nu. = Nπu.: total mass located originaly at u. n.v = Nπ.v: total mass transported to v. n.. = N Total exchange mass. Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 18 / 29 Monge and Anti-Monge matrices and some related structural properties Example of applications of Monge’s conditions on two Contingency Tables subject to the same marginals Indetermination structure: A + B = C + D Independance Structure: AB = CD Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 19 / 29 Duality related to ”Independence” and ”Indetermination” structures Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 20 / 29 Duality related to ”Independence” and ”Indetermination” structures The 
Mutual Information index (MI) and Deviation to Indetermination index (IND)

The Mutual Information index (MI) $\rho_{MI}$:
$\rho_{MI} = S_X + S_Y - S_{(X,Y)}$,
where $S_{(X,Y)} = -\sum_{u=1}^{p}\sum_{v=1}^{q} \pi_{uv} \ln \pi_{uv}$; $S_X = -\sum_{u=1}^{p} \mu_u \ln \mu_u$ and $S_Y = -\sum_{v=1}^{q} \nu_v \ln \nu_v$.

The Deviation to Indetermination index (IND) $\rho_{IND}$:
$\rho_{IND}(X,Y) = K_{(X,Y)} - K_X - K_Y$,
where $K_{(X,Y)} = pq \sum_{u=1}^{p}\sum_{v=1}^{q} \left(\pi_{uv} - \tfrac{1}{pq}\right)^2$; $K_X = p \sum_{u=1}^{p} \left(\mu_u - \tfrac{1}{p}\right)^2$ and $K_Y = q \sum_{v=1}^{q} \left(\nu_v - \tfrac{1}{q}\right)^2$.

Jean-François Marcotorchino (Thales Communications et Sécurité, TCS and LSTA, Paris 6), "Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and Applications", August 2013

Duality related to "Independence" and "Indetermination" structures

Duality between independence and indetermination structures, $\forall X, Y$ with $X \sim \mu$, $Y \sim \nu$, $(X,Y) \sim \pi$:
- The independence case: $S_{(X,Y)} \le S_X + S_Y$, with equality in case of independence.
- The indetermination case: $K_{(X,Y)} \ge K_X + K_Y$, with equality in case of indetermination.

$\rho_{MI}(X,Y) = S_X + S_Y - S_{(X,Y)} = \sum_{u=1}^{p}\sum_{v=1}^{q} \pi_{uv} \ln \frac{\pi_{uv}}{\mu_u \nu_v}$

$\rho_{IND}(X,Y) = K_{(X,Y)} - K_X - K_Y = pq \sum_{u=1}^{p}\sum_{v=1}^{q} \left(\pi_{uv} - \frac{\mu_u}{q} - \frac{\nu_v}{p} + \frac{1}{pq}\right)^2$

Translation of the duality between independence and indetermination into contingency correlation measures

The Mutual Information index behaves as the $\chi^2$ does in the neighbourhood of independence:
$\rho_{MI}(X,Y) \cong \sum_{u=1}^{p}\sum_{v=1}^{q} \frac{(\pi_{uv} - \mu_u \nu_v)^2}{\mu_u \nu_v} = \frac{1}{n_{..}} F_{\chi^2}[\pi]$

The Janson-Vegelius index is fully derived from the indetermination index $\rho_{IND}(X,Y)$:
$JV(X,Y) = \frac{pq \sum_{u=1}^{p}\sum_{v=1}^{q} \pi_{uv}^2 - p \sum_{u=1}^{p} \mu_{u.}^2 - q \sum_{v=1}^{q} \nu_{.v}^2 + 1}{\sqrt{\left(p(p-2)\sum_{u=1}^{p} \mu_{u.}^2 + 1\right)\left(q(q-2)\sum_{v=1}^{q} \nu_{.v}^2 + 1\right)}} \quad \forall (X,Y)$

Relational Analysis Approach

Principle: representing relations between objects by binary coding. A partition is nothing but an equivalence relation on the set of objects, which is represented by a relational $N \times N$ matrix $X$, whose entries are defined as follows:
$x_{ij} = 1$ if $i$ and $j$ belong to the same cluster, $0$ otherwise. (14)
As $X$ is an equivalence relation, it must be reflexive, symmetric and transitive; those properties can be turned into linear constraints on the general terms of the relational matrix $X$. $\bar{x}_{ij} = 1 - x_{ij}$, $\forall(i,j)$, is the inverse relation of $X$.

The Relational Transfer Principle

$\sum_{u=1}^{p}\sum_{v=1}^{q} \pi_{uv}^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij} y_{ij}$; $\quad \sum_{u=1}^{p} \mu_{u.}^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij}$; $\quad \sum_{v=1}^{q} \nu_{.v}^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_{ij}$; $\quad \sum_{u=1}^{p}\sum_{v=1}^{q} \frac{\pi_{uv}^2}{\mu_{u.}\nu_{.v}} = \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{x_{ij}}{x_{i.}} \frac{y_{ij}}{y_{.j}}$

Using the Relational Transfer Principle we get:
$N^2 \rho_{IND}(X,Y) = pq \sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij} y_{ij} - p \sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij} - q \sum_{i=1}^{N}\sum_{j=1}^{N} y_{ij} + N^2$ (15)

Origin of the indetermination concept: when $\rho_{IND}(X,Y) = 0$ we get
$\underbrace{\sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij} y_{ij} + \sum_{i=1}^{N}\sum_{j=1}^{N} \bar{x}_{ij}\bar{y}_{ij}}_{\text{votes in favor}} = \underbrace{\sum_{i=1}^{N}\sum_{j=1}^{N} \bar{x}_{ij} y_{ij} + \sum_{i=1}^{N}\sum_{j=1}^{N} x_{ij}\bar{y}_{ij}}_{\text{votes against}}$

A new Graph modularization criterion

Graph Modularization Criteria

The Newman-Girvan criterion:
$\frac{1}{2M}\sum_{i=1}^{N}\sum_{i'=1}^{N}\left(a_{ii'} - \frac{a_{i.}\,a_{.i'}}{2M}\right) x_{ii'}$

The deviation to indetermination criterion:
$\frac{1}{2M}\sum_{i=1}^{N}\sum_{i'=1}^{N}\left(a_{ii'} - \frac{a_{i.}}{N} - \frac{a_{.i'}}{N} + \frac{2M}{N^2}\right) x_{ii'}$
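The two indices and their dual equality cases can be checked numerically. Below is a minimal NumPy sketch (the function names `rho_MI` and `rho_IND` are mine, not the author's): $\rho_{MI}$ vanishes on an independent joint law $\pi_{uv} = \mu_u \nu_v$, while $\rho_{IND}$ vanishes on an indetermination-structured law $\pi_{uv} = \mu_u/q + \nu_v/p - 1/(pq)$.

```python
import numpy as np

def rho_MI(pi):
    """Mutual Information index: rho_MI = S_X + S_Y - S_(X,Y)."""
    mu, nu = pi.sum(axis=1), pi.sum(axis=0)             # marginal laws
    S = lambda a: -np.sum(a[a > 0] * np.log(a[a > 0]))  # Shannon entropy
    return S(mu) + S(nu) - S(pi)

def rho_IND(pi):
    """Deviation-to-indetermination index: rho_IND = K_(X,Y) - K_X - K_Y."""
    p, q = pi.shape
    mu, nu = pi.sum(axis=1), pi.sum(axis=0)
    K_XY = p * q * np.sum((pi - 1.0 / (p * q)) ** 2)
    K_X = p * np.sum((mu - 1.0 / p) ** 2)
    K_Y = q * np.sum((nu - 1.0 / q) ** 2)
    return K_XY - K_X - K_Y

mu = np.array([0.6, 0.4])        # p = 2
nu = np.array([0.5, 0.3, 0.2])   # q = 3
p, q = len(mu), len(nu)
# Independence structure pi_uv = mu_u * nu_v: rho_MI vanishes.
pi_indep = np.outer(mu, nu)
# Indetermination structure pi_uv = mu_u/q + nu_v/p - 1/(pq): rho_IND vanishes
# (its marginals are still mu and nu).
pi_indet = mu[:, None] / q + nu[None, :] / p - 1.0 / (p * q)
```

With non-uniform marginals, each index is strictly positive on the structure dual to the one it characterizes, illustrating that independence and indetermination are genuinely different notions.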
ORAL SESSION 5 Discrete Metric Spaces (Michel Deza)
Counting the number of solutions of $^K$DMDGP instances

Leo Liberti (1,2), Carlile Lavor (3), Jorge Alencar (3), Germano Abud (3)
(1) IBM "T.J. Watson" Research Center, Yorktown Heights, 10598 NY, USA
(2) LIX, Ecole Polytechnique, 91128 Palaiseau, France
(3) Dept. of Applied Math., University of Campinas, Campinas - SP, Brazil

Contents
1 Introduction: Distance Geometry Problem (DGP); Discretizable Molecular DGP (DMDGP); Incongruence; Probability 1; Partial Reflections
2 The Main Result: Counting Incongruent Realizations

Distance Geometry Problem (DGP)
Given an integer $K > 0$ and an undirected simple graph $G = (V, E)$, whose edges are weighted by $d: E \to \mathbb{R}_+$, is there a function $x: V \to \mathbb{R}^K$ such that $\|x(u) - x(v)\| = d(\{u,v\})$, $\forall \{u,v\} \in E$?
In other words: find an embedding (or a realization) of $G$ in $\mathbb{R}^K$, such that the Euclidean distances in $\mathbb{R}^K$ match the given edge weights.
NP-complete for $K = 1$; strongly NP-hard for $K > 1$. Important case: $K = 3$.

Discretizable Molecular DGP (DMDGP)
Given an undirected simple graph $G = (V, E)$, whose edges are weighted by $d: E \to \mathbb{R}_+$, and such that there is an order $v_1, v_2, \ldots, v_n$ of $V$ satisfying
(a) $\forall i \in \{4, \ldots, n\}$, $\forall j, k \in \{i-3, \ldots, i\}$: $\{v_j, v_k\} \in E$,
(b) $\forall i \in \{2, \ldots, n-1\}$, $d(v_{i-1}, v_{i+1}) < d(v_{i-1}, v_i) + d(v_i, v_{i+1})$,
is there an embedding $x: V \to \mathbb{R}^3$ such that $\|x(u) - x(v)\| = d(\{u,v\})$, $\forall \{u,v\} \in E$?
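The DGP feasibility question is straightforward to verify for a candidate embedding. A small sketch (names are mine) checking $\|x(u) - x(v)\| = d(\{u,v\})$ on every weighted edge:

```python
import numpy as np

def is_realization(x, edges, tol=1e-9):
    """Check that the embedding x: V -> R^K matches every edge weight,
    i.e. ||x(u) - x(v)|| = d({u, v}) for all weighted edges."""
    return all(abs(np.linalg.norm(x[u] - x[v]) - d) <= tol
               for (u, v), d in edges.items())

# A 3-vertex example realized in R^2: the 3-4-5 right triangle.
x = {1: np.array([0.0, 0.0]),
     2: np.array([3.0, 0.0]),
     3: np.array([3.0, 4.0])}
edges = {(1, 2): 3.0, (2, 3): 4.0, (1, 3): 5.0}
```

The hardness results above concern the converse direction (finding such an $x$), which is what the discretization and the BP algorithm address.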
For each vertex $v_i$ (with $i \ge 4$), if we know the realizations of its 3 immediate predecessors, as well as their distances to $v_i$, then there are, with probability 1, two possible positions for $v_i$: the intersection $S^2(i-3, d_{i-3,i}) \cap S^2(i-2, d_{i-2,i}) \cap S^2(i-1, d_{i-1,i})$ of the spheres centred at the predecessors. [Figure: the two candidate positions for $v_i$, with the distances $d_{i-3,i-2}$, $d_{i-2,i-1}$, $d_{i-3,i}$, $d_{i-2,i}$ and the angles $\theta_{i-3,i-1}$, $\theta_{i-2,i}$.]

Let $E_D = \{\{v, v-j\} : j \in \{1, \ldots, K\}\}$, $E_P = E \setminus E_D$ and $m = |E|$. A discrete search is possible (Branch-and-Prune). Let $X$ be the set of all realizations. Our goal is to determine the cardinality of $X$.

Standard BP. [Figure: successive animation frames of the standard BP search tree being built.]

Incongruence
Two sets in $\mathbb{R}^K$ are congruent if there is a sequence of translations, rotations and reflections that turns one into the other. $X$ is partially correct in this respect: there is a "four level symmetry". Half of the realizations in $X$ are reflections of the other half, along the plane through the first $K$ vertices.

Probability 1
The theory supporting the BP algorithm is based on $d$ satisfying strict simplex inequalities.
The intersection of $K$ spheres in $\mathbb{R}^K$ might have uncountable cardinality, or be a singleton set; these degenerate cases are manifolds of Lebesgue measure zero in $\mathbb{R}^K$. The probability of uniformly sampling $d$ such that it yields a YES $^K$DMDGP instance satisfying the strict simplex inequalities is 1. We state most of our results "with probability 1".

Partial Reflections
For $x \in X$ and $v \in V$ with $v > K$, let $R^v_x$ be the reflection along the hyperplane through $x_{v-K}, \ldots, x_{v-1}$. The partial reflection operator with respect to $x$ is:
$g_v(x) = (x_1, \ldots, x_{v-1}, R^v_x(x_v), R^v_x(x_{v+1}), \ldots, R^v_x(x_n)).$
[Figure: the action of the reflection $R^v_x$ in $\mathbb{R}^K$.]

For $v > u > K$ and $x \in X$, we define a product between partial reflections:
$g_u g_v(x) = g_u(g_v(x)) = g_u(x_1, \ldots, x_{v-1}, R^v_x(x_v), \ldots, R^v_x(x_n)) = (x_1, \ldots, x_{u-1}, R^u_x(x_u), \ldots, R^u_x(x_{v-1}), R^u_{g_v(x)}(x_v), \ldots, R^u_{g_v(x)}(x_n)).$
Let $\Gamma_D = \{g_v : v > K\}$ and $G_D = \langle \Gamma_D \rangle$, the invariant group of the set of realizations $X_D$ (found by the BP) of $G_D = (V, E_D)$.

Assuming $E_P \ne \emptyset$
Let $\{u, w\} \in E_P$ (with $u < w$) and define $S_{uw} = \{u + K + 1, \ldots, w\}$. $G_P$ is the subgroup of $G_D$ generated by $\Gamma_P = \{g_v : v > K \wedge \forall \{u, w\} \in E_P\ (v \notin S_{uw})\}$.
[Figure: on the left, the set $X_D$; on the right, the effect of the pruning edge $\{1, 4\}$ on $X_D$.]

Counting Incongruent Realizations
There is an integer $\ell$ such that $|X| = 2^{\ell}$ with probability 1. We can easily refine this:
Proposition. With probability 1, $|X| = 2^{|\Gamma_P|}$.
Proof. $G_D \cong C_2^{n-K}$, so that $|G_D| = 2^{n-K}$. Since $G_P \le G_D$, $|G_P|$ divides $|G_D|$. But $|G_P| = 2^{|\Gamma_P|}$. The action of $G_P$ on $X$ has only one orbit: $G_P x = X$, $\forall x \in X$. Every partial reflection operator is an involution; thus $g x = g' x$ implies $g g' = 1$, whence $g = g'$ and $|G_P x| = |G_P|$. For any $x \in X$, $|X| = |G_P x| = |G_P| = 2^{|\Gamma_P|}$.

Thank you!
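The proposition gives a direct way to count incongruent realizations without running BP: with probability 1, $|X| = 2^{|\Gamma_P|}$, where $g_v \in \Gamma_P$ iff $v > K$ and $v$ lies outside $S_{uw} = \{u+K+1, \ldots, w\}$ for every pruning edge $\{u, w\}$. A small sketch (the function name is mine):

```python
def count_incongruent_realizations(n, K, pruning_edges):
    """|X| = 2^{|Gamma_P|}: the generator g_v (v > K) survives iff v lies
    outside S_uw = {u+K+1, ..., w} for every pruning edge {u, w}, u < w."""
    gamma_P = [v for v in range(K + 1, n + 1)
               if all(not (u + K + 1 <= v <= w)
                      for (u, w) in (tuple(sorted(e)) for e in pruning_edges))]
    return 2 ** len(gamma_P)
```

For instance, with $n = 7$, $K = 3$ and no pruning edges, every $g_v$ with $v \in \{4, \ldots, 7\}$ survives and $|X| = 2^4$; a single pruning edge $\{1, 7\}$ removes $g_5, g_6, g_7$ and leaves $|X| = 2$.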
Discretization Orders for Distance Geometry

Discretizable DG group: Carlile Lavor, Leo Liberti, Nelson Maculan, Antonio Mucherino
GSI13, Paris, France, August 28th 2013

Outline: Discretization of the DGP; Getting started; Discretization assumptions; Ordering problem; A greedy algorithm; Computational experiments; Ending...

The Distance Geometry Problem (for molecular conformations)
Let $G = (V, E, d)$ be a simple weighted undirected graph, where
- $V$ is the set of vertices of $G$: the set of atoms;
- $E$ is the set of edges of $G$: the set of known distances;
- $E' \subseteq E$ is the subset of $E$ where distances are exact;
- $d$ gives the weights associated to the edges of $G$: the numerical value of each weight corresponds to the known distance, and it can be an interval.

Definition (The DGP). Determine whether there exists a function $x: V \to \mathbb{R}^K$ for which, for all edges $(u, v) \in E$, $\|x_u - x_v\| = d(u, v)$.

Sphere intersections
In the 3-dimensional space, the intersection of
- 2 spheres gives one circle;
- 3 spheres gives two points;
- 2 spheres and 1 spherical shell gives two disjoint curves.
Spheres and spherical shells can be centered in known vertex positions, while their radii are related to the distance information. All this is true with probability 1: the reference vertices cannot be aligned, and the strict triangular inequality needs to be satisfied. Generalization to any dimension $K$: the volume of the $(K-1)$-simplex defined by the reference vertices needs to be strictly positive (simplex inequalities).

The Branch & Prune algorithm
The Branch & Prune (BP) algorithm is based on the idea of branching over all possible positions for each vertex, and of pruning by using additional information not used in the discretization process. In this tree, it is supposed that all available distances are exact; $D$ sample (exact) distances can be taken from interval distances.

Importance of orders
The definition of an order on the vertices in $V$ allows us to ensure that vertex coordinates are available when needed. Given (1) a simple weighted undirected graph $G = (V, E, d)$ and (2) a vertex $v \in V$, how do we identify $K$ vertices $w_i$, with $i = 1, 2, \ldots, K$, for which the coordinates of every $w_i$ are available and every edge $(w_i, v) \in E$? We refer to $w_i$ as a reference vertex for $v$, and to $(w_i, v)$ as a reference distance for $v$.

Definition of order
Definition. An order for $V$ is a sequence $r: \mathbb{N} \to V \cup \{0\}$ with length $|r| \in \mathbb{N}$ (for which $r_i = 0$ for all $i > |r|$) such that, for each $v \in V$, there is an index $i \in \mathbb{N}$ for which $r_i = v$.
Some facts about orders: they allow for vertex repetitions ($|r| \ge |V|$); however, each vertex can be used as a reference only once, because the simplex inequalities (generally satisfied with probability 1) would not be satisfied if the same vertex were used twice as a reference.
Counting the reference vertices
Let $r$ be an order for $V$, and consider the following counters:
- $\alpha(r_i)$: counter of adjacent predecessors of $r_i$;
- $\beta(r_i)$: counter of adjacent successors of $r_i$;
- $\alpha_{ex}(r_i)$: counter of adjacent predecessors of $r_i$ related to an exact distance.
A necessary condition for $V$ to admit a discretization order is that, for any order $r$ on $V$, $\forall i \in \{1, 2, \ldots, |r|\}$, $\alpha(r_i) + \beta(r_i) \ge K$.

Discretization orders
We refer to any order $r$ that allows for the discretization of the DGP as a discretization order. There are two big families:
- discretization orders with consecutive reference vertices: for each $r_i$, the reference vertices always immediately precede $r_i$ in the order;
- discretization orders without consecutive reference vertices: any vertex with rank $< i$ can be a reference for $r_i$.
For $K = 3$: in the consecutive case, the reference vertices for $r_i$ are searched only in the "window" $[r_{i-K}, \ldots, r_{i-1}]$, and the simplex inequality must be satisfied on the window; in the non-consecutive case, the reference vertices for $r_i$ are in the "big window" $[r_1, \ldots, r_{i-1}]$, and the simplex inequality must be satisfied for the $K$ reference vertices inside the big window.

Why different discretization assumptions?
Different motivations: historical reasons; the habit to handcraft orders; symmetry properties of BP trees; the methods for intersecting the spheres (and spherical shells); the interest in methods for automatic detection of discretization orders. Consecutivity: IN FAVOUR vs. AGAINST.

The ordering problem
Definition. Given a simple weighted undirected graph $G = (V, E, d)$ and a positive integer $K$, establish whether there exists an order $r$ such that:
(a) $G_C = (V_C, E_C) \equiv G[\{r_1, r_2, \ldots, r_K\}]$ is a clique and $E_C \subseteq E'$;
(b) $\forall i \in \{K+1, \ldots, |r|\}$, $\alpha(r_i) \ge K$ and $\alpha_{ex}(r_i) \ge K - 1$.
Remarks: this problem is NP-complete when $K$ is not fixed; with no consecutivity assumption, it is solvable in polynomial time when $K$ is known; when dealing with proteins, $K = 3$.

A greedy algorithm

  reorder(G):
    while a valid order r is not found yet:
      let i = 0
      find a K-clique C in G with exact distances
      # position C at the beginning of the new order
      for all vertices v in C:
        let i = i + 1; let r_i = v
      # greedy search
      while V is not covered:
        v = argmax{ alpha(u) | there is no j <= i with r_j = u, and alpha_ex(u) >= K - 1 }
        if alpha(v) < K:
          break the inner loop: there are no possible orderings for C
        # adding the vertex to the order
        let i = i + 1; let r_i = v
    return r

An order for the protein backbone
[Figure: an order for the protein backbone, automatically obtained by the greedy algorithm.]

Computational experiments
NMR-like instances are considered in these experiments, including protein backbones and side chains.

  PDB name   naa    n    |E|    D    LDE
  1brv         4   51    368    3    2.10e-4
  1brv         8   98    853    9    5.88e-4
  1ccq         6  114   1181    3    1.16e-4
  1ccq        10  183   2169    8    1.63e-4
  1acz         6   94    929    3    1.63e-4
  1acz        13  199   2144    3    1.95e-4
  1acz        21  308   3358   10    4.93e-4
  1k1v         6  110   1236    3    3.04e-4
  1k1v        18  317   4169    3    3.66e-4
  1k1v        30  519   7068    3    5.63e-4

All instances were automatically reordered by the greedy algorithm, and the BP algorithm was invoked for finding one solution.
[Figure: two solutions obtained during the experiments. On the left, a 4-amino acid fragment of 1brv; on the right, an 18-amino acid fragment of 1k1v.]

Work in progress...
Can we find orders that help BP in finding solutions? Minimize in length the subsequences in the order having no pruning distances; (for proteins) maximize the interval distances that are related to pairs of hydrogen atoms; ...

Other work in progress...
Identify clusters of solutions in BP solution sets; find a way for avoiding discretizing the intervals; improve and tailor the parallel versions of BP to interval data; (for proteins) use real NMR data and compare our results to what is currently available on the PDB; ...

Thanks!
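The greedy reordering described above can be sketched in Python. This is a simplified, repetition-free reading of the pseudocode (names are mine; the theory also allows orders that repeat vertices): it tries each $K$-clique with exact distances as the initial clique $C$, then greedily appends the unplaced vertex maximizing $\alpha(u)$ among those with $\alpha_{ex}(u) \ge K - 1$, failing when $\alpha(v) < K$.

```python
from itertools import combinations

def greedy_order(V, E, exact, K):
    """Greedy search for a (repetition-free) discretization order.
    E and exact are sets of frozenset edges, exact being the subset of E
    with exact distances. Returns an order covering V, or None."""
    adj = {v: {u for u in V if frozenset((u, v)) in E} for v in V}
    for C in combinations(V, K):
        # the first K vertices must form a clique with exact distances
        if not all(frozenset(e) in exact for e in combinations(C, 2)):
            continue
        r = list(C)
        placed = set(C)
        while len(placed) < len(V):
            # candidates: unplaced vertices with >= K-1 exact reference
            # distances to already-placed vertices (alpha_ex(u) >= K-1)
            cand = [u for u in V if u not in placed
                    and sum(1 for w in adj[u] & placed
                            if frozenset((u, w)) in exact) >= K - 1]
            if not cand:
                break
            v = max(cand, key=lambda u: len(adj[u] & placed))  # argmax alpha
            if len(adj[v] & placed) < K:
                break  # alpha(v) < K: no ordering starting from this clique
            r.append(v)
            placed.add(v)
        else:
            return r  # every vertex placed
    return None

# Demo: complete graph on 4 vertices, all distances exact, K = 2.
V4 = [1, 2, 3, 4]
E4 = {frozenset(e) for e in combinations(V4, 2)}
order = greedy_order(V4, E4, E4, 2)
```

A path graph fails for $K = 2$, since its endpoints never reach $\alpha(v) \ge 2$, matching the necessary degree condition stated earlier.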
Studying new classes of graph metrics

Pavel Chebotarev
Russian Academy of Sciences: Institute of Control Sciences
pavel4e@gmail.com
GSI'2013 - Geometric Science of Information, Paris, Ecole des Mines, August 28, 2013

Classical graph distances
1 Shortest path distance
2 Weighted shortest path distance
3 Resistance distance
Are other distances needed?

Distance
Let $M$ be an arbitrary set. A distance on $M$ is a function $d: M \times M \to \mathbb{R}$ such that for all $x, y, z \in M$,
1. $d(x, y) \ge 0$
2. $d(x, y) = 0$ iff $x = y$
3. $d(x, y) = d(y, x)$
4. $d(x, y) + d(y, z) \ge d(x, z)$
A shorter definition: $d: M \times M \to \mathbb{R}$ such that for all $x, y, z \in M$,
2. $d(x, y) = 0$ iff $x = y$
4'. $d(x, y) + d(x, z) \ge d(y, z)$
M.M. Deza, E. Deza, Encyclopedia of Distances, Springer, 2013.
The shortest path distance
1 The shortest path distance on a graph $G$, $d_s(i, j)$, is the length of a shortest path between $i$ and $j$ in $G$. A graph and its distance matrix (vertices $1, \ldots, 5$; the edges, read off the unit entries, are $\{1,2\}, \{2,3\}, \{2,4\}, \{3,4\}, \{4,5\}$):

$D_s(G) = \begin{pmatrix} 0 & 1 & 2 & 2 & 3 \\ 1 & 0 & 1 & 1 & 2 \\ 2 & 1 & 0 & 1 & 2 \\ 2 & 1 & 1 & 0 & 1 \\ 3 & 2 & 2 & 1 & 0 \end{pmatrix}$

F. Buckley, F. Harary, Distance in Graphs, Addison-Wesley, 1990.

The weighted shortest path distance
Let $G$ be a weighted graph with weighted adjacency matrix $A$; the weights are positive.
2 The weighted shortest path distance: $d_{ws}(i, j) = \min_{\pi} \sum_{e \in E(\pi)} \ell_e$, where the minimum is taken over all paths $\pi$ from $i$ to $j$, and $\ell_e = 1/w_e$ is the weight-based length of $e$ ($w_e$ is the weight). If $w_e$ is the conductivity of $e$ then $\ell_e$ is the resistance of $e$.

The resistance distance
3 The resistance distance $d_r(i, j)$ is the effective resistance between $i$ and $j$ in the electrical network corresponding to $G$. Gerald Subak-Sharpe was the first to study this distance:
G.E. Sharpe, Solution of the (m+1)-terminal resistive network problem by means of metric geometry, in: Proc. First Asilomar Conference on Circuits and Systems, Pacific Grove, CA (November 1967) 319-328.

Rediscovering and review:
A.D. Gvishiani, V.A. Gurvich, Metric and ultrametric spaces of resistances, Russian Math. Surveys 42 (1987) 235-236.
D.J. Klein, M. Randić, Resistance distance, J. Math. Chem. 12 (1993) 81-95.
F. Harary: "electric metric".
A nice paper: R.B. Bapat, Resistance distance in graphs, Math. Student 68 (1999) 87-98.
A short review is in: Y. Yang, D.J. Klein, A recursion formula for resistance distances and its applications, DAM, 2013. In press.

Resistance distance: Connections
The resistance distance is proportional to the commute-time distance in the corresponding Markov chain. It is expressed as follows:
$d_r(i, j) = \ell^+_{ii} + \ell^+_{jj} - 2\ell^+_{ij}$,
where $(\ell^+_{ij})_{n \times n} = L^+$ is the Moore-Penrose pseudoinverse of $L$, and $L = \mathrm{diag}(A\mathbf{1}) - A$ is the Laplacian matrix of $G$. Here, $\mathrm{diag}(A\mathbf{1})$ is the matrix of weighted vertex degrees.

Example
For any tree, the resistance distance coincides with the shortest path distance! For our graph:

$D_r(G) = \begin{pmatrix} 0 & 1 & 5/3 & 5/3 & 8/3 \\ 1 & 0 & 2/3 & 2/3 & 5/3 \\ 5/3 & 2/3 & 0 & 2/3 & 5/3 \\ 5/3 & 2/3 & 2/3 & 0 & 1 \\ 8/3 & 5/3 & 5/3 & 1 & 0 \end{pmatrix}$

A combinatorial interpretation of the resistance distance:
$d_r(i, j) = \frac{f^{(i,j)}_{[2]}}{T}$,
where $f^{(i,j)}_{[2]}$ is the number of spanning 2-component forests separating $i$ and $j$, and $T$ is the number of spanning trees of $G$.

Connection with the graph structure
$d_r(1, 2) + d_r(2, 5) = d_r(1, 5)$, while $d_r(1, 3) + d_r(3, 5) > d_r(1, 5)$. Notation $[i{-}j{-}k]$: all $i \to k$ paths pass through $j$.
Definition. $d(\cdot, \cdot): V^2 \to \mathbb{R}$ is cutpoint additive provided that $d(i, j) + d(j, k) = d(i, k)$ iff all $i \to k$ paths pass through $j$.
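The example matrices can be reproduced from the formula $d_r(i, j) = \ell^+_{ii} + \ell^+_{jj} - 2\ell^+_{ij}$. A short NumPy sketch for the 5-vertex graph above (edge list read off the distance matrix; unit weights):

```python
import numpy as np

# Edges of the 5-vertex example graph.
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0

L = np.diag(A.sum(axis=1)) - A   # Laplacian L = diag(A1) - A
Lp = np.linalg.pinv(L)           # Moore-Penrose pseudoinverse L+

# Resistance distance matrix: d_r(i,j) = l+_ii + l+_jj - 2 l+_ij.
d = np.diag(Lp)
Dr = d[:, None] + d[None, :] - 2 * Lp
```

The resulting `Dr` matches $D_r(G)$ above (e.g. $d_r(2,3) = 2/3$: the direct edge in parallel with the two-edge path through vertex 4), and one can also check cutpoint additivity at vertex 2 on the pair $(1, 5)$.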
Pavel Chebotarev New classes of graph metrics Page 11 Connection with the graph structure 2 3 4 1 5 Dr (G) = 0 1 5 3 5 3 8 3 1 0 2 3 2 3 5 3 5 3 2 3 0 2 3 5 3 5 3 2 3 2 3 0 1 8 3 5 3 5 3 1 0 d r (1, 2) + d r (2, 5) = d r (1, 5) d r (1, 3) + d r (3, 5) > d r (1, 5) i j k [i−j−k] Deﬁnition d(·, ·): V2 → R is cutpoint additive provided that d(i, j) + d(j, k) = d(i, k) iff all i → k paths pass through j. Pavel Chebotarev New classes of graph metrics Page 11 As we could conjecture... Theorem (Gvishiani, Gurvich, 1992) The electric metric is cutpoint additive. However... The shortest path distance only satisﬁes the “if” part: d(i, j) + d(j, k) = d(i, k) does not imply [i−j−k]: 1 + 1 = 2 Observation The Euclidean distance satisﬁes a similar condition resulting by replacing “path” by “line segment”. Pavel Chebotarev New classes of graph metrics Page 12 As we could conjecture... Theorem (Gvishiani, Gurvich, 1992) The electric metric is cutpoint additive. However... The shortest path distance only satisﬁes the “if” part: d(i, j) + d(j, k) = d(i, k) does not imply [i−j−k]: 1 + 1 = 2 Observation The Euclidean distance satisﬁes a similar condition resulting by replacing “path” by “line segment”. Pavel Chebotarev New classes of graph metrics Page 12 As we could conjecture... Theorem (Gvishiani, Gurvich, 1992) The electric metric is cutpoint additive. However... The shortest path distance only satisﬁes the “if” part: d(i, j) + d(j, k) = d(i, k) does not imply [i−j−k]: 1 + 1 = 2 Observation The Euclidean distance satisﬁes a similar condition resulting by replacing “path” by “line segment”. Pavel Chebotarev New classes of graph metrics Page 12 Proximity measures Let’s think of proximity measures... Pavel Chebotarev New classes of graph metrics Page 13 Applications World Wide Web Social networks Semantic networks Transport networks Other... ... Cry: “Measure us!” So functions that “measure” networks are necessary ... 
including proximity measures.

Proximity measures

Spanning forest measures, reliability measures, path measures, walk measures.

The spanning forest measure

Q = (I + L)^(−1), Q = (q_ij)_(n×n)

Matrix Forest Theorem (Chebotarev & Shamis, 1995). q_ij = f_ij / f, where f is the number of spanning rooted forests in G and f_ij is the number of those forests having i in a tree rooted at j.

Q is a proximity measure having natural properties.
For networks, the number of forests is replaced by the weight of forests, where the weight of a forest is the product of its edge weights.

Transitional measures

Theorem (a property of the forest measure). The matrix Q satisfies

q_ij q_jk ≤ q_ik q_jj   (transition inequality)

with equality q_ij q_jk = q_ik q_jj iff all paths from i to k in G contain j (bottleneck identity).

Definition. In this case we say that Q determines a transitional measure on G.

A matrix S = (s_ij) ∈ R^(n×n) satisfies the transition inequality if for all i, j, k = 1, . . .
, n,

s_ij s_jk ≤ s_ik s_jj.   (1)

Is there any connection between transitional measures and cutpoint additive distances?

The connection reliability measure

Let the edge weight w_ij ∈ (0, 1] be the intactness probability of the edge (i, j).

Definition. Let p_ij be the i → j connection reliability, i.e., the probability that at least one path from i to j remains intact, provided that edge failures are independent. P = (p_ij) is the matrix of pairwise connection reliabilities.

Is the connection reliability a transitional measure?

Theorem. For any graph G with edge weights w^p_ij ∈ (0, 1], the matrix P = (p_ij) of connection reliabilities determines a transitional measure on G.

A representation of the connection reliabilities, as a sum over the paths P_k from i to j: p_ij = Σ_k Pr(P_k) − . . .
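The forest-measure properties can be verified exactly on a small example. The sketch below is a minimal check reusing the 5-vertex example graph of the earlier slides (the edge set {(1,2), (2,3), (2,4), (3,4), (4,5)} is an assumption inferred from its distance matrices); it inverts I + L in rational arithmetic and tests the transition inequality over all triples.

```python
from fractions import Fraction

edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]  # assumed example graph
n = 5
L = [[Fraction(0)] * n for _ in range(n)]
for a, b in edges:
    i, j = a - 1, b - 1
    L[i][i] += 1; L[j][j] += 1; L[i][j] -= 1; L[j][i] -= 1

def inverse(A):
    """Exact matrix inverse by Gauss-Jordan elimination."""
    m = len(A)
    M = [row[:] + [Fraction(int(r == c)) for c in range(m)]
         for r, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[m:] for row in M]

# Spanning forest measure Q = (I + L)^{-1}
IL = [[L[r][c] + Fraction(int(r == c)) for c in range(n)] for r in range(n)]
Q = inverse(IL)

# Since (I + L) has unit row sums, so does Q.
assert all(sum(row) == 1 for row in Q)

# Transition inequality: q_ij * q_jk <= q_ik * q_jj for all triples.
ok = all(Q[i][j] * Q[j][k] <= Q[i][k] * Q[j][j]
         for i in range(n) for j in range(n) for k in range(n))
```

On this graph the bottleneck identity is attained for the triple (1, 2, 5), since every path from 1 to 5 passes through the cutpoint 2, and is strict for (1, 3, 5).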
Tessellabilities, Reversibilities, and Decomposabilities of Polytopes ― A Survey
École nationale supérieure des mines de Paris, August 28, 2013
Jin Akiyama (Tokyo University of Science), Ikuro Sato (Miyagi Cancer Center), Hyunwoo Seong (The University of Tokyo)

1. P1-TILES AND P2-TILES

A P1-tile is a polygon which tiles the plane by translations only. There are two families of convex P1-tiles: (1) parallelograms and (2) hexagons with three pairs of opposite sides parallel and of equal length (P1-hexagons).

A 3-dimensional P1-tile is a polyhedron which tiles space by translations only. There are five families F1-F5 of convex 3-dimensional P1-tiles (Fedorov): the parallelepiped (PP), hexagonal prism (HP), rhombic dodecahedron (RD), elongated rhombic dodecahedron (ERD), and truncated octahedron (TO).

A P2-tile is a polygon which tiles the plane by translations and 180° rotations only.

Theorem A. Every convex P2-tile belongs to one of the following four families: triangles (F1), quadrilaterals (F2), P2-pentagons (F3; BC ∥ ED), and P2-hexagons (F4; QPH, with AB ∥ ED and |AB| = |ED|).

Problem. Determine all convex 3-dimensional P2-tiles, i.e., convex polyhedra each of which tiles space in the P2 manner (cf. the triangular prism, ...).

A net of a convex polyhedron P is defined to be a connected planar object obtained by cutting the surface of P. An ART (almost regular tetrahedron) is a tetrahedron with four congruent faces.

Theorem B (J. Akiyama, 2007). Every net (convex or concave) of an ART tiles the plane in the P2 manner.

2. REVERSIBILITY

Volvox, a green alga known as one of the simplest colonial (≈ multicellular) organisms, reproduces by reversing its interior offspring through its surface.

Theorem C (J. Akiyama, 2007). If a pair of polygons A and B is reversible, then each of them tiles the plane by translations and 180° rotations only (P2-tiling).
Example: A is a red quadrilateral, B is a blue triangle.

Theorem D (J.A., I. Sato, H. Seong, 2013). For an arbitrary convex P2-tile P and an arbitrary family Fi (i = 1, 2, 3, 4) of convex P2-tiles, there exists a polygon Q ∈ Fi such that the pair P and Q is reversible.

Examples: a king in a cage; Spider ⇔ Geisha; UFO ⇔ Alien.

A 3-dimensional P1-tile is said to be canonical if it is convex and symmetric with respect to each orthogonal axis.

Theorem E (J.A., I. Sato, H. Seong, 2011). For an arbitrary canonical 3-dimensional P1-tile P and an arbitrary family Fi (i = 1, 2, 3, 4, 5) of canonical 3-dimensional P1-tiles, there exists a polyhedron Q ∈ Fi such that the pair P and Q is reversible.

Examples: cube → hexagonal prism; hexagonal prism → truncated octahedron; rhombic dodecahedron → elongated rhombic dodecahedron.

3. TILINGS AND ATOMS

Let Π be the set of the five Platonic solids and σ1, σ2, σ3, σ4 be elementary pieces. Then Φ = {σ1, . . ., σ4} is an element set for Π, and the decomposition of each Platonic solid into these elements is summarized in Table 3.

A pentadron is a convex pentahedron, occurring as a symmetric pair, whose net is as shown. A tetrapak is a special kind of ART (tetrahedron with four congruent faces) assembled from pentadra.

Theorem F (J.A.). A tetrapak tiles space and its net tiles the plane.

Problem. Determine all convex polyhedra, each of which tiles space and one of whose nets tiles the plane.

Theorem G (J.A., G. Nakamura, I. Sato, 2012). Every convex 3-dimensional P1-tile (or its affine-stretching transform) can be constructed from copies of a pentadron: the cube, hexagonal prism, truncated octahedron, rhombic dodecahedron, and elongated rhombic dodecahedron.
ORAL SESSION 6 Computational Information Geometry (Frank Nielsen)
A new implementation of k-MLE for mixture modelling of Wishart distributions
Christophe Saint-Jean, Frank Nielsen
Geometric Science of Information 2013, August 28, 2013, Mines ParisTech

Application context

We are interested in clustering varying-length sets of multivariate observations of the same dimension p (e.g., X1 a 3 × p block of observations, ..., XN a 5 × p block). The sample mean is a good but not discriminative enough feature; the second-order cross-product matrices ᵗXi Xi may capture relations between the (column) variables.

The problem is thus the clustering of a set of p × p PSD matrices:

χ = { x1 = ᵗX1 X1, x2 = ᵗX2 X2, . . . , xN = ᵗXN XN }

Examples of applications: multispectral/DTI/radar imaging, motion retrieval systems, ...

Outline of this talk: (1) MLE and the Wishart distribution (exponential families and the maximum likelihood estimate; two sub-families of the Wishart distribution); (2) mixture modelling with k-MLE (original k-MLE; k-MLE for Wishart distributions; initialization heuristics); (3) application to motion retrieval.

Reminder: Exponential Family (EF)

An exponential family is a set of parametric probability distributions

EF = { p(x; λ) = p_F(x; θ) = exp( ⟨t(x), θ⟩ + k(x) − F(θ) ) | θ ∈ Θ }

Terminology: λ are the source parameters; θ the natural parameters; t(x) the sufficient statistic; k(x) the auxiliary carrier measure; F(θ) the log-normalizer, which is differentiable and strictly convex on the open convex set Θ = {θ ∈ R^D | F(θ) < ∞}. Almost all commonly used distributions are EF members, with exceptions such as the uniform and Cauchy distributions.
Reminder: Maximum Likelihood Estimate (MLE)

The maximum likelihood principle is a very common approach for fitting the parameters of a distribution:

θ̂ = argmax_θ L(θ; χ) = argmax_θ ∏_{i=1}^N p(xi; θ) = argmin_θ −(1/N) Σ_{i=1}^N log p(xi; θ)

assuming a sample χ = {x1, x2, ..., xN} of i.i.d. observations. The log-density has a convenient expression for EF members:

log p_F(x; θ) = ⟨t(x), θ⟩ + k(x) − F(θ)

It follows that

θ̂ = argmax_θ Σ_{i=1}^N log p_F(xi; θ) = argmax_θ ⟨ Σ_{i=1}^N t(xi), θ ⟩ − N F(θ)

MLE with EF

Since F is a strictly convex, differentiable function, the MLE exists and is unique:

∇F(θ̂) = (1/N) Σ_{i=1}^N t(xi)

Ideally, we have a closed form θ̂ = ∇F^(−1)( (1/N) Σ_{i=1}^N t(xi) ); otherwise numerical methods, including Newton-Raphson, can be successfully applied.

Wishart Distribution

Definition (central Wishart distribution). The Wishart distribution characterizes empirical covariance matrices of zero-mean Gaussian samples:

W_d(X; n, S) = |X|^((n−d−1)/2) exp(−½ tr(S^(−1) X)) / ( 2^(nd/2) |S|^(n/2) Γ_d(n/2) )

where, for x > 0, Γ_d(x) = π^(d(d−1)/4) ∏_{j=1}^d Γ(x − (j−1)/2) is the multivariate gamma function. Remarks: n > d − 1 and E[X] = nS; this is the multivariate generalization of the chi-square distribution.

Wishart Distribution as an EF

It is an exponential family:

log W_d(X; θ_n, θ_S) = ⟨θ_n, log |X|⟩_R + ⟨θ_S, −½X⟩_HS + k(X) − F(θ_n, θ_S)

with k(X) = 0,

(θ_n, θ_S) = ( (n − d − 1)/2, S^(−1) ),   t(X) = ( log |X|, −½X ),

F(θ_n, θ_S) = (θ_n + (d+1)/2)( d log 2 − log |θ_S| ) + log Γ_d(θ_n + (d+1)/2)

MLE for the Wishart Distribution

A closed form would be obtained by solving ∇F(θ̂) = (1/N) Σ_{i=1}^N t(xi), i.e. the system

d log 2 − log |θ_S| + Ψ_d(θ_n + (d+1)/2) = η_n
−(θ_n + (d+1)/2) θ_S^(−1) = η_S     (1)

with η_n and η_S the expectation parameters and Ψ_d the derivative of log Γ_d. Unfortunately, no closed-form solution is known.
Two sub-families of the Wishart Distribution

Case n fixed (n = 2θ_n + d + 1):

F_n(θ_S) = (nd/2) log 2 − (n/2) log |θ_S| + log Γ_d(n/2),   k_n(X) = ((n − d − 1)/2) log |X|

Case S fixed (S = θ_S^(−1)):

F_S(θ_n) = (θ_n + (d+1)/2) log |2S| + log Γ_d(θ_n + (d+1)/2),   k_S(X) = −½ tr(S^(−1) X)

Both are exponential families and their MLE equations are solvable!

Case n fixed:

−(n/2) θ̂_S^(−1) = (1/N) Σ_{i=1}^N (−½ Xi)   ⟹   θ̂_S = Nn ( Σ_{i=1}^N Xi )^(−1)     (2)

Case S fixed:

θ̂_n = Ψ_d^(−1)( (1/N) Σ_{i=1}^N log |Xi| − log |2S| ) − (d+1)/2,   θ̂_n > 0     (3)

with Ψ_d^(−1) the functional reciprocal of Ψ_d.

An iterative estimator for the Wishart Distribution

Algorithm 1: An estimator for the parameters of the Wishart
Input: a sample X1, X2, . . . , XN of S_d^(++)
Output: final values of θ̂_n and θ̂_S
Initialize θ̂_n with some value > 0;
repeat
  update θ̂_S using Eq. 2 with n = 2θ̂_n + d + 1;
  update θ̂_n using Eq. 3 with S the inverse matrix of θ̂_S;
until convergence of the likelihood;

Questions and open problems. From a sample of Wishart matrices, the distribution parameters are recovered in a few iterations. Major question: is this an MLE? Probably... Minor question: a sample of size N = 1 gives an under-determined system; regularize by sampling around X1.

Mixture Models (MM)

An additive (finite) mixture is a flexible tool to model a more complex distribution m:

m(x) = Σ_{j=1}^k wj pj(x),   0 ≤ wj ≤ 1,   Σ_{j=1}^k wj = 1

where the pj are the component distributions of the mixture and the wj the mixing proportions. In our case, we take each pj to be a member of some parametric family (EF):

m(x; Ψ) = Σ_{j=1}^k wj p_{Fj}(x; θj),   with Ψ = (w1, w2, ..., w_{k−1}, θ1, θ2, ..., θk)

Expectation-Maximization is not fast enough [5]...

Original k-MLE (primal form.)
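Equation (2), the n-fixed case, reduces to a sample average: since S = θ_S^(−1), it reads Ŝ = (1/(Nn)) Σ_i Xi. A minimal simulation sketch of this estimator follows; the 2×2 scale matrix S and the sample sizes are illustrative assumptions, and each Wishart sample is drawn as a sum of n Gaussian outer products.

```python
import random
import math

random.seed(0)
d, n, N = 2, 5, 2000              # dimension, Wishart dof (known), sample count
S = [[2.0, 0.3], [0.3, 1.0]]      # illustrative scale matrix (an assumption)

# Cholesky factor of S (S = A A^T), for drawing z ~ N(0, S).
a11 = math.sqrt(S[0][0])
a21 = S[1][0] / a11
a22 = math.sqrt(S[1][1] - a21 * a21)

def wishart_sample():
    """X = sum of n outer products z z^T with z ~ N(0, S), so X ~ W_d(n, S)."""
    X = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
        z = (a11 * g1, a21 * g1 + a22 * g2)
        for i in range(d):
            for j in range(d):
                X[i][j] += z[i] * z[j]
    return X

# Eq. (2) with n fixed: S_hat = (sum_i X_i) / (N n).
acc = [[0.0] * d for _ in range(d)]
for _ in range(N):
    X = wishart_sample()
    for i in range(d):
        for j in range(d):
            acc[i][j] += X[i][j]
S_hat = [[acc[i][j] / (N * n) for j in range(d)] for i in range(d)]
```

With N = 2000 samples the entries of S_hat should be close to those of S, consistent with E[X] = nS.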
in one slide:

Algorithm 2: k-MLE (primal)
Input: a sample χ = {x1, x2, ..., xN}; Bregman generators F1, F2, ..., Fk
Output: estimate Ψ̂ of the mixture parameters
A good initialization for Ψ (see later);
repeat
  repeat
    foreach xi ∈ χ do zi = argmax_j log ŵj p_{Fj}(xi; θ̂j);
    foreach Cj := {xi ∈ χ | zi = j} do θ̂j = MLE_{Fj}(Cj);
  until convergence of the complete likelihood;
  update the mixing proportions: ŵj = |Cj| / N;
until further convergence of the complete likelihood;

k-MLE's properties

Another formulation comes from the connection between EFs and Bregman divergences [3]:

log p_F(x; θ) = −B_{F*}(t(x) : η) + F*(t(x)) + k(x)

where B_F(· : ·) is the Bregman divergence associated with a strictly convex and differentiable function F.

Original k-MLE (dual form.) in one slide:

Algorithm 3: k-MLE (dual)
Input: a sample {y1 = t(x1), y2 = t(x2), ..., yN = t(xN)}; Bregman generators F*_1, F*_2, ..., F*_k
Output: Ψ̂ = (ŵ1, ŵ2, ..., ŵ_{k−1}, θ̂1 = ∇F*(η̂1), ..., θ̂k = ∇F*(η̂k))
A good initialization for Ψ (see later);
repeat
  repeat
    foreach xi ∈ χ do zi = argmin_j B_{F*_j}(yi : η̂j) − log ŵj;
    foreach Cj := {xi ∈ χ | zi = j} do η̂j = Σ_{xi ∈ Cj} yi / |Cj|;
  until convergence of the complete likelihood;
  update the mixing proportions: ŵj = |Cj| / N;
until further convergence of the complete likelihood;

k-MLE for Wishart distributions

Practical considerations impose modifications of the algorithm. During assignment, empty clusters may appear (high-dimensional data make this worse). A possible solution is to use Hartigan and Wang's strategy [6] instead of Lloyd's strategy: optimally transfer one observation at a time, update the parameters of the clusters involved, and stop when no transfer is possible. This should guarantee non-empty clusters [7], but it does not work when considering weighted clusters... so we fall back on an "old school" criterion: |C_{zi}| > 1. Experimentally, this performs better in high dimension than Lloyd's strategy.
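The primal loop of Algorithm 2 can be sketched in the simplest setting: components that are unit-variance Gaussians on the line, for which MLE_F(Cj) is just the cluster mean. This is a toy illustration, not the Wishart case; the data and initialization below are made up.

```python
import math

def k_mle(xs, init_mus, iters=20):
    """Primal k-MLE with unit-variance 1-D Gaussian components:
    hard MAP assignment, then per-cluster MLE (the mean) and weight update."""
    k = len(init_mus)
    mus, ws = list(init_mus), [1.0 / k] * k
    for _ in range(iters):
        # Assignment: zi = argmax_j log w_j + log p(x_i; mu_j)
        z = [max(range(k), key=lambda j: math.log(ws[j]) - 0.5 * (x - mus[j]) ** 2)
             for x in xs]
        # Per-cluster MLE: the mean (keep old theta_j if a cluster empties).
        for j in range(k):
            cj = [x for x, zi in zip(xs, z) if zi == j]
            if cj:
                mus[j] = sum(cj) / len(cj)
        # Mixing proportions |C_j|/N (floored at one point to avoid log(0)).
        ws = [max(z.count(j), 1) / len(xs) for j in range(k)]
    return mus, ws

xs = [0.0, 0.2, 0.4, 5.0, 5.2]
mus, ws = k_mle(xs, [0.0, 5.0])
```

On this separated toy data the loop converges in one pass to the cluster means 0.2 and 5.1 with weights 0.6 and 0.4.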
k-MLE - Hartigan and Wang

Criterion for a potential transfer (max form):

log[ ŵ_{zi} p_{F_{zi}}(xi; θ̂_{zi}) ] / log[ ŵ_{z*i} p_{F_{z*i}}(xi; θ̂_{z*i}) ] < 1,   with z*_i = argmax_j log ŵj p_{Fj}(xi; θ̂j)

Update rules: θ̂_{zi} = MLE_{Fj}(C_{zi} \ {xi}),  θ̂_{z*i} = MLE_{Fj}(C_{z*i} ∪ {xi}).

OR (min form):

( B_{F*}(yi : η_{z*i}) − log w_{z*i} ) / ( B_{F*}(yi : η_{zi}) − log w_{zi} ) < 1,   with z*_i = argmin_j ( B_{F*}(yi : ηj) − log wj )

Update rules:
η_{zi} = ( |C_{zi}| η_{zi} − yi ) / ( |C_{zi}| − 1 ),
η_{z*i} = ( |C_{z*i}| η_{z*i} + yi ) / ( |C_{z*i}| + 1 ).

Towards a good initialization...

Classical initialization methods: random centers, random partition, furthest point (a 2-approximation), ... A better approach is k-means++ [8]: "sample proportionally to the squared distance to the nearest center."
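The k-means++ seeding rule can be sketched generically; the example below uses the squared Euclidean distance on the line, and a dual Bregman divergence B_{F*} could be substituted for dist2 as the slides suggest. The point set is made up for illustration.

```python
import random

random.seed(1)

def kmeanspp_seed(points, k, dist2):
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its (squared) distance to the nearest chosen center."""
    centers = [random.choice(points)]
    while len(centers) < k:
        d2 = [min(dist2(x, c) for c in centers) for x in points]
        r = random.random() * sum(d2)
        acc = 0.0
        for x, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(x)
                break
        else:                      # numerical fallback
            centers.append(points[-1])
    return centers

pts = [0.0, 0.1, 0.2, 10.0, 10.1, 20.0, 20.2]
seeds = kmeanspp_seed(pts, 3, lambda a, b: (a - b) ** 2)
```

Already-chosen centers have zero weight and so are never re-drawn, which is why the seeds are distinct.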
k-means++ is a fast and greedy approximation, in time Θ(kN), with a probabilistic guarantee of good initialization:

OPT_F ≤ k-means_F ≤ O(log k) · OPT_F

The dual Bregman divergence B_{F*} may replace the squared distance.

Heuristic to avoid fixing k

k-means requires fixing k, the number of clusters. We propose on-the-fly cluster creation together with k-MLE++ (inspired by DP-k-means [9]): "create a cluster whenever some observation contributes too much to the loss function with the already selected centers." This may overestimate the number of clusters...

Initialization with DP-k-MLE++

Algorithm 4: DP-k-MLE++
Input: a sample y1 = t(X1), . . . , yN = t(XN); F; λ > 0
Output: C, a subset of {y1, . . . , yN}, and k, the number of clusters
Choose the first seed C = {yj}, for j uniformly random in {1, 2, . . . , N};
repeat
  foreach yi do compute pi = B_{F*}(yi : C) / Σ_{i'=1}^N B_{F*}(y_{i'} : C), where B_{F*}(yi : C) = min_{c∈C} B_{F*}(yi : c);
  if ∃ pi > λ then choose the next seed s among y1, y2, . . . , yN with prob.
pi; add the selected seed to C: C = C ∪ {s};
until all pi ≤ λ;
k = |C|;

Motion capture

Real dataset: motion capture of contemporary dancers (15 sensors in 3D).

Application to motion retrieval

Motion capture data can be viewed as matrices Xi with different row counts but the same column size d. The idea is to describe each Xi through the parameters Ψ̂i of one mixture model. Remark: the size of each sub-motion is known (and hence its θ_n). The mixture parameters can be viewed as a sparse representation of the local dynamics of Xi.

Comparing two movements then amounts to computing a dissimilarity measure between Ψ̂i and Ψ̂j. Remark 1: with DP-k-MLE++, the two mixtures will probably not have the same number of components. Remark 2: when both mixtures have a single component, a natural choice is

KL( W_d(·; θ̂) || W_d(·; θ̂′) ) = B_F(θ̂′ : θ̂) = B_{F*}(η̂ : η̂′)

A closed form is always available!
No closed form exists for the KL divergence between general mixtures. A possible solution is to use the Cauchy-Schwarz (CS) divergence [10]:

CS(m : m′) = − log [ ∫ m(x) m′(x) dx / sqrt( ∫ m(x)² dx · ∫ m′(x)² dx ) ]

It has an analytic formula, since

∫ m(x) m′(x) dx = Σ_{j=1}^k Σ_{j′=1}^{k′} wj w′_{j′} exp[ F(θj + θ′_{j′}) − (F(θj) + F(θ′_{j′})) ]

Note that this expression is well defined since the natural parameter space Θ = R₊* × S_p^(++) is a convex cone.

Implementation: early specific code in Matlab; today the implementation is in Python (based on pyMEF [2]); an ongoing proof of concept (with F. Herranz and A. Beurivé).

Conclusions and future work. Mathematical work remains: solve the MLE equations to get ∇F* = (∇F)^(−1) and then F*; characterize our estimator for the full Wishart distribution. Complete and validate the prototype motion-retrieval system. Speed up the algorithm (computational/numerical/algorithmic tricks; a library for Bregman divergence learning?). Possible extensions: reintroduce the mean vector in the model (Gaussian-Wishart); online k-means → online k-MLE; ...

References

[1] Nielsen, F.: k-MLE: A fast algorithm for learning statistical mixture models. In: ICASSP (2012), pp. 869-872.
[2] Schwander, O., Nielsen, F.: pyMEF - A framework for exponential families in Python. In: Proc. 2011 IEEE Workshop on Statistical Signal Processing.
[3] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6 (2005) 1705-1749.
[4] Nielsen, F., Garcia, V.: Statistical exponential families: A digest with flash cards. arXiv:0911.4863 (2009).
[5] Hidot, S., Saint-Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: Application to movement clustering. Pattern Recognition Letters 31(14) (2010) 2318-2324.
[6] Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society,
Series C (Applied Statistics) 28(1) (1979) 100-108.
[7] Telgarsky, M., Vattani, A.: Hartigan's method: k-means clustering without Voronoi. In: Proc. AISTATS (2010), pp. 820-827.
[8] Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proc. 18th ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007), pp. 1027-1035.
[9] Kulis, B., Jordan, M.I.: Revisiting k-means: New algorithms via Bayesian nonparametrics. In: ICML (2012).
[10] Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: ICPR (2012), pp. 1723-1726.
Hypothesis testing, information divergence and computational geometry
Frank Nielsen (Frank.Nielsen@acm.org, www.informationgeometry.org)
Sony Computer Science Laboratories, Inc., August 2013, GSI, Paris, FR
© 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

The Multiple Hypothesis Testing (MHT) problem

Given a random variable X with n hypotheses H1: X ∼ P1, ..., Hn: X ∼ Pn, decide from an IID sample x1, ..., xm ∼ X which hypothesis holds true. P^m_correct = 1 − P^m_error. Asymptotic regime: α = −(1/m) log P^m_e, m → ∞.

Bayesian hypothesis testing (preliminaries)

Prior probabilities: wi = Pr(X ∼ Pi) > 0, with Σ_{i=1}^n wi = 1; conditional probabilities: Pr(X = x | X ∼ Pi).

Pr(X = x) = Σ_{i=1}^n Pr(X ∼ Pi) Pr(X = x | X ∼ Pi) = Σ_{i=1}^n wi Pr(X = x | Pi)

Let c_{i,j} be the cost of deciding Hi when in fact Hj is true; [c_{ij}] is the cost design matrix. Let p_{i,j}(u) be the probability of making this decision using rule u.

Bayesian detector

Minimize the expected cost E_X[c(r(x))], where c(r(x)) = Σ_i wi Σ_{j≠i} c_{i,j} p_{i,j}(r(x)). The probability of error Pe is the special case c_{i,i} = 0 and c_{i,j} = 1 for i ≠ j:

Pe = E_X[ Σ_i wi Σ_{j≠i} p_{i,j}(r(x)) ]

The maximum a posteriori probability (MAP) rule classifies x as MAP(x) = argmax_{i∈{1,...,n}} wi pi(x), where pi(x) = Pr(X = x | X ∼ Pi) are the conditional probabilities. The MAP Bayesian detector minimizes Pe over all rules [8].

Probability of error and divergences

Without loss of generality, consider equal priors w1 = w2 = ½:

Pe = ∫_X p(x) min(Pr(H1|x), Pr(H2|x)) dν(x)

(Pe > 0 as soon as supp p1 ∩ supp p2 ≠ ∅.)

From Bayes' rule, Pr(Hi | X = x) = Pr(Hi) Pr(X = x | Hi) / Pr(X = x) = wi pi(x)/p(x), so

Pe = ½ ∫_X min(p1(x), p2(x)) dν(x)

Rewrite or bound Pe using tricks of the trade:
Trick 1: ∀a, b ∈ R, min(a, b) = (a + b)/2 − |a − b|/2.
Trick 2:
∀a, b > 0, min(a, b) ≤ min_{α∈(0,1)} a^α b^(1−α).

Probability of error and total variation

Pe = ½ ∫_X [ (p1(x) + p2(x))/2 − |p1(x) − p2(x)|/2 ] dν(x) = ½ ( 1 − ½ ∫_X |p1(x) − p2(x)| dν(x) )

Pe = ½ (1 − TV(P1, P2))

with the total variation metric distance TV(P, Q) = ½ ∫_X |p(x) − q(x)| dν(x). This is difficult to compute when handling multivariate distributions.

Bounding the probability of error Pe

Using min(a, b) ≤ min_{α∈(0,1)} a^α b^(1−α) for a, b > 0, upper-bound Pe:

Pe = ½ ∫_X min(p1(x), p2(x)) dν(x) ≤ ½ min_{α∈(0,1)} ∫_X p1^α(x) p2^(1−α)(x) dν(x)

Define the Chernoff information

C(P1, P2) = − log min_{α∈(0,1)} ∫_X p1^α(x) p2^(1−α)(x) dν(x) ≥ 0

For the best error exponent α* [7]: Pe ≤ w1^(α*) w2^(1−α*) e^(−C(P1,P2)) ≤ e^(−C(P1,P2)). The bounding technique can be extended using any quasi-arithmetic α-means [13, 9]...

Computational information geometry

Exponential family manifold [4]: M = { pθ | pθ(x) = exp(t(x)^⊤θ − F(θ)) }. Dually flat manifolds [1] enjoy dual affine connections: (M, ∇²F(θ), ∇^(e), ∇^(m)), with η = ∇F(θ) and θ = ∇F*(η). The canonical divergence comes from the Young inequality:

A(θ1, η2) = F(θ1) + F*(η2) − θ1^⊤η2 ≥ 0,   with equality F(θ) + F*(η) = θ^⊤η when η = ∇F(θ).

MAP decision rule and additive Bregman Voronoi diagrams

KL(pθ1 : pθ2) = B(θ2 : θ1) = A(θ2 : η1) = A*(η1 : θ2) = B*(η1 : η2)

Canonical divergence (mixed primal/dual coordinates): A(θ2 : η1) = F(θ2) + F*(η1) − θ2^⊤η1 ≥ 0.
Bregman divergence (single coordinate system, primal or dual): B(θ2 : θ1) = F(θ2) − F(θ1) − (θ2 − θ1)^⊤∇F(θ1).

log pi(x) = −B*(t(x) : ηi) + F*(t(x)) + k(x),   ηi = ∇F(θi) = η(Pθi)

Optimal MAP decision rule:

MAP(x) = argmax_{i∈{1,...,n}} wi pi(x) = argmax_i [ −B*(t(x) : ηi) + log wi ] = argmin_i [ B*(t(x) : ηi) − log wi ]

→ a nearest neighbor classifier [2, 10, 15, 16].
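Both the total-variation identity for Pe and the Chernoff bound can be checked on a toy discrete example. The sketch below uses made-up distributions, and a grid search over α stands in for the exact minimization (the integrand is convex in α, so a bisection on its derivative would be faster).

```python
import math

# Toy discrete distributions on a common support (made-up numbers).
p1 = [0.5, 0.3, 0.2]
p2 = [0.1, 0.2, 0.7]

# Trick 1: min(a, b) = (a + b)/2 - |a - b|/2.
assert all(abs(min(a, b) - ((a + b) / 2 - abs(a - b) / 2)) < 1e-12
           for a, b in zip(p1, p2))

# Probability of error with equal priors w1 = w2 = 1/2, and the
# total-variation identity Pe = (1 - TV(P1, P2)) / 2.
Pe = 0.5 * sum(min(a, b) for a, b in zip(p1, p2))
TV = 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))
assert abs(Pe - 0.5 * (1 - TV)) < 1e-12

# Trick 2 gives the Chernoff bound:
# C = -log min_alpha sum_x p1(x)^alpha p2(x)^(1-alpha),
# approximated here by a grid search over alpha in (0, 1).
def bhatt_coeff(alpha):
    return sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p1, p2))

c_star = min(bhatt_coeff(i / 1000) for i in range(1, 1000))
C = -math.log(c_star)
assert Pe <= 0.5 * c_star          # Pe <= (1/2) e^{-C}
```

Since the pointwise bound min(a, b) ≤ a^α b^(1−α) holds for every α, the grid minimum (which is an upper bound on the true minimum) still upper-bounds 2·Pe.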
MAP & nearest neighbor classifier

Bregman Voronoi diagrams (with additive weights) are affine diagrams [2]. To evaluate argmin_{i∈{1,...,n}} B*(t(x) : ηi) − log wi, one can use: point location in an arrangement [3] (small dimensions), divergence-based search trees [16], or GPU brute force [6].

Geometry of the best error exponent: binary hypothesis

On the exponential family manifold, the Chernoff α-coefficient [5] is

c_α(Pθ1 : Pθ2) = ∫ pθ1^α(x) pθ2^(1−α)(x) dμ(x) = exp(−J_F^((α))(θ1 : θ2)),

with the skew Jensen divergence [14] on the natural parameters

J_F^((α))(θ1 : θ2) = αF(θ1) + (1−α)F(θ2) − F(θ12^((α))),   θ12^((α)) = αθ1 + (1−α)θ2.

Chernoff information = a Bregman divergence, for exponential families:

C(Pθ1 : Pθ2) = B(θ1 : θ12^((α*))) = B(θ2 : θ12^((α*)))

The Chernoff distribution P* [12] lies at the intersection P* = Pθ*12 = Ge(P1, P2) ∩ Bim(P1, P2), where

e-geodesic: Ge(P1, P2) = { E12^((λ)) | θ(E12^((λ))) = (1−λ)θ1 + λθ2, λ ∈ [0, 1] },
m-bisector: Bim(P1, P2): { P | F(θ1) − F(θ2) + η(P)^⊤Δθ = 0 },

and the optimal natural parameter of P* is θ* = θ12^((α*)) = argmin_{θ∈Θ} B(θ1 : θ) = argmin_{θ∈Θ} B(θ2 : θ). → A closed form for order-1 families, or an efficient bisection search.

[Figure: in the η-coordinate system, the m-bisector Bim(Pθ1, Pθ2) intersects the e-geodesic Ge(Pθ1, Pθ2) at Pθ*12, with C(θ1 : θ2) = B(θ1 : θ*12).]

Geometry of the best error exponent: multiple hypothesis

For n-ary MHT [8], the exponent is the minimum pairwise Chernoff distance:

C(P1, ..., Pn) = min_{i, j≠i} C(Pi, Pj),   P^m_e ≤ e^(−m C(Pi*, Pj*)),   (i*, j*) = argmin_{i, j≠i} C(Pi, Pj)

Compute, for each pair of natural neighbors [3] Pθi and Pθj, the Chernoff distance C(Pθi, Pθj), and choose the pair with minimal distance.
(Proof by contradiction using the Bregman Pythagoras theorem.) → The closest Bregman pair problem (the Chernoff distance fails the triangle inequality).

Hypothesis testing, illustration: in the η-coordinate system, the Chernoff distribution between natural neighbours.

Summary

Bayesian multiple hypothesis testing, from the viewpoint of computational geometry:
- probability of error & the best MAP Bayesian rule;
- total variation & Pe, upper-bounded via the Chernoff distance;
- on exponential family manifolds: the MAP rule is a NN classifier (additive Bregman Voronoi diagram); the best error exponent comes from the geodesic/bisector intersection for binary hypotheses, and from the closest Bregman pair for multiple hypotheses.

Thank you. 28th-30th August, Paris.

@incollection{HTIGCG-GSI-2013,
  year={2013},
  booktitle={Geometric Science of Information},
  volume={8085},
  series={Lecture Notes in Computer Science},
  editor={Frank Nielsen and Fr\'ed\'eric Barbaresco},
  title={Hypothesis testing, information divergence and computational geometry},
  publisher={Springer Berlin Heidelberg},
  author={Nielsen, Frank},
  pages={241-248}
}

Bibliographic references

[1] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
[2] Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock. Bregman Voronoi diagrams. Discrete & Computational Geometry, 44(2):281-307, 2010.
[3] Jean-Daniel Boissonnat and Mariette Yvinec. Algorithmic Geometry. Cambridge University Press, New York, NY, USA, 1998.
[4] Lawrence D. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory. Institute of Mathematical Statistics, Hayward, CA, USA, 1986.
[5] Herman Chernoff.
A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-507, 1952.
[6] Vincent Garcia, Eric Debreuve, Frank Nielsen, and Michel Barlaud. k-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching. In IEEE International Conference on Image Processing (ICIP), pages 3757-3760, 2010.

Bibliographic references II

[7] Martin E. Hellman and Josef Raviv. Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, 16:368-372, 1970.
[8] C. C. Leang and D. H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280-282, January 1997.
[9] Frank Nielsen. Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. Submitted, 2012.
[10] Frank Nielsen. k-MLE: a fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. Preliminary technical report on arXiv.
[11] Frank Nielsen. Hypothesis testing, information divergence and computational geometry. In Frank Nielsen and Frédéric Barbaresco, editors, Geometric Science of Information, volume 8085 of Lecture Notes in Computer Science, pages 241-248. Springer Berlin Heidelberg, 2013.

Bibliographic references III

[12] Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters (SPL), 20(3):269-272, March 2013.
[13] Frank Nielsen. Pattern learning and recognition on statistical manifolds: an information-geometric review. In Edwin Hancock and Marcello Pelillo, editors, Similarity-Based Pattern Recognition, volume 7953 of Lecture Notes in Computer Science, pages 1-25. Springer Berlin Heidelberg, 2013.
[14] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455-5466, 2011.
[15] Frank Nielsen, Paolo Piro, and Michel Barlaud. Bregman vantage point trees for efficient nearest neighbor queries. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 878-881, 2009.
[16] Paolo Piro, Frank Nielsen, and Michel Barlaud. Tailored Bregman ball trees for effective nearest neighbors. In European Workshop on Computational Geometry (EuroCG), LORIA, Nancy, France, March 2009. IEEE.
The exponential family in abstract information theory
Jan Naudts and Ben Anthonis, Universiteit Antwerpen
Paris, August 2013

Outline: Fisher information; Example; Abstract information theory; Assumptions; Examples of generalized divergences; Deformed exponential families; Conclusions.

Fisher information

The standard expression for the Fisher information matrix

    I_{k,l}(\theta) = \mathbb{E}_\theta\Big[\frac{\partial}{\partial\theta^k} \ln p_\theta \; \frac{\partial}{\partial\theta^l} \ln p_\theta\Big]

is a relevant quantity when p_\theta belongs to the exponential family. A different quantity is needed in the general case. It involves the Kullback-Leibler divergence

    D(p\|p_\theta) = \mathbb{E}_p \ln \frac{p}{p_\theta}.

Remember that

    I_{k,l}(\theta) = \frac{\partial^2}{\partial\theta^k \partial\theta^l} D(p\|p_\theta)\Big|_{p = p_\theta}.

The divergence D(p\|p_\theta) is a measure for the distance between an arbitrary state p and a point p_\theta of the statistical manifold. Let D(p\|M) denote the minimal 'distance'. Let F_\theta denote the 'fiber' of all points p for which D(p\|M) = D(p\|p_\theta) (the minimum contrast leaf of Eguchi, 1992).

Definition. The extended Fisher information of a pdf p (not necessarily in M) is

    I_{k,l}(p) = \frac{\partial^2}{\partial\theta^k \partial\theta^l} D(p\|p_\theta)\Big|_{p \in F_\theta}.

Note that on the manifold the two definitions coincide: I_{k,l}(\theta) = I_{k,l}(p_\theta).

Proposition. I_{k,l}(p) is covariant.

Proof. Let \eta be a function of \theta. One calculates

    \frac{\partial^2}{\partial\theta^k \partial\theta^l} D(x\|\theta) = \frac{\partial^2}{\partial\eta^m \partial\eta^n} D(x\|\theta) \frac{\partial\eta^m}{\partial\theta^k} \frac{\partial\eta^n}{\partial\theta^l} + \frac{\partial}{\partial\eta^m} D(x\|\theta) \frac{\partial^2 \eta^m}{\partial\theta^k \partial\theta^l}.

The latter term vanishes because p \in F_\theta. The former term is manifestly covariant.

Proposition. If p_\theta belongs to the exponential family then I_{k,l}(p) is constant on the fibre F_\theta.

Proof. p and p_\theta satisfy the Pythagorean relation

    D(p\|p_\eta) = D(p\|p_\theta) + D(p_\theta\|p_\eta).

Hence taking derivatives w.r.t. \eta only involves D(p_\theta\|p_\eta). Only afterwards put \eta = \theta. One concludes that I_{k,l}(p) = I_{k,l}(p_\theta).

This gives a coordinate-independent method to verify that M is not an exponential family!

Example (suggested by H. Matsuzoe)

Consider the manifold of normal distributions p_{\mu,\sigma}(x) with mean \mu and standard deviation \sigma,

    p_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2}.

Consider the submanifold M of normal distributions for which \mu = \sigma:

    p_\theta(x) = \frac{1}{\sqrt{2\pi\theta^2}} e^{-(x-\theta)^2/2\theta^2}.
Question: Is M an exponential family?
Answer: It is known to be curved (Efron, 1975). Let us show that I(p_{\mu,\sigma}) is not constant along the fibers F_\theta.

The Kullback-Leibler divergence D(p_{\mu,\sigma}\|p_\theta) is minimal when \theta is the positive root of the equation

    \theta^2 + \mu\theta = \mu^2 + \sigma^2.

The Fisher information I(p_{\mu,\sigma}) equals

    I(p_{\mu,\sigma}) = \frac{\theta^2 + \mu^2 + \sigma^2}{\theta^4}.

It is not constant on F_\theta, since it cannot be written as a function of \theta alone. This implies that M is not an exponential family.

Abstract Information Theory

Our aims:
- Formulate the notion of an exponential family in the context of abstract information theory.
- If M is not an exponential family w.r.t. the Kullback-Leibler divergence, can it be exponential w.r.t. some other divergence?

Abstract information theory does not rely on probability theory. We try to bring classical and quantum information theory together in a single formalism.

A generalized divergence is a map D : X x M -> [0, +infinity] between two different spaces.
- A divergence is generically asymmetric in its two arguments. This is an indication that the two arguments play a different role: X is the space of data sets, M is a manifold of models.
- D(x\|m) has the meaning of a loss of information when the data set x is replaced by the model point m.
- In the classical setting X is the space of empirical measures and M is a statistical manifold. One has in this case M \subset X.

Assumptions

Let Q denote a linear space of continuous real functions on X. Instead of q(x) we write \langle x|q\rangle to stress that Q is not an algebra. In the classical setting Q is the space of random variables. In the quantum setting Q is a space of operators on a Hilbert space.

We consider a class of generalized divergences which can be written in the form

    D(x\|m) = \xi(m) - \zeta(x) - \langle x|Lm\rangle,

where \xi and \zeta are real functions and L : M -> Q is a map from the manifold M into the linear space Q. We assume in addition a compatibility and a consistency condition (see a later slide).
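The claims of the normal-distribution example above are easy to check numerically. The sketch below uses the closed-form KL divergence between normals and helper names of my own (`theta_star`, `extended_fisher`); it verifies that the positive root of \theta^2 + \mu\theta = \mu^2 + \sigma^2 minimizes D(p_{\mu,\sigma}\|p_\theta), and that two points on the same fiber give different extended Fisher information.

```python
import math

def kl_normal(mu, sigma, theta):
    # D(N(mu, sigma^2) || N(theta, theta^2)), closed form
    return math.log(theta / sigma) + (sigma**2 + (mu - theta)**2) / (2.0 * theta**2) - 0.5

def theta_star(mu, sigma):
    # positive root of theta^2 + mu*theta = mu^2 + sigma^2
    return (-mu + math.sqrt(mu**2 + 4.0 * (mu**2 + sigma**2))) / 2.0

def extended_fisher(mu, sigma):
    # I(p_{mu,sigma}) = (theta^2 + mu^2 + sigma^2) / theta^4 at theta = theta_star
    th = theta_star(mu, sigma)
    return (th**2 + mu**2 + sigma**2) / th**4
```

For instance, (mu, sigma) = (1, 1) and (mu, sigma) = (0.5, sqrt(1.25)) both lie on the fiber of theta = 1, yet give I = 3 and I = 2.5 respectively, so I is not constant on the fiber.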
For instance, the quantities \ln p and \ln p_\theta appearing in the Kullback-Leibler divergence

    D(p\|p_\theta) = \mathbb{E}_p \ln p - \mathbb{E}_p \ln p_\theta = \langle p|\ln p\rangle - \langle p|\ln p_\theta\rangle

are used as random variables and belong to Q. One can define a map L : M -> Q by Lp_\theta = \ln p_\theta and write the divergence as

    D(p\|p_\theta) = \xi(p_\theta) - \zeta(p) - \langle p|Lp_\theta\rangle

with \xi(p_\theta) = 0, \zeta(p) = -\mathbb{E}_p \ln p and \langle p|q\rangle = \mathbb{E}_p q. The quantity \xi(p_\theta) has been called the corrector by Flemming Topsøe; \zeta(p) is the entropy. We call L the logarithmic map.

Compatibility condition: for each x \in X there exists a unique point m \in M which minimizes the divergence D(x\|m). This means that each point of X belongs to some fiber F_m.

Consistency condition: each point m of M can be approached by points x of F_m, in the sense that D(x\|m) can be made arbitrarily small.

Example: Bregman divergence

A divergence of the Bregman type is defined by

    D(x\|m) = \sum_a \big[ F(x(a)) - F(m(a)) - (x(a) - m(a))\, f(m(a)) \big]
            = \sum_a \int_{m(a)}^{x(a)} du\, [f(u) - f(m(a))],

where F is any strictly convex function defined on the interval (0, 1] and f = F' is its derivative.

L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Comput. Math. Math. Phys. 7 (1967) 200-217.

In the notations of our abstract information theory one has
- \langle x|q\rangle = E_x q;
- Lm(a) = f(m(a));
- \zeta(x) = -\sum_a F(x(a));
- \xi(m) = \sum_a m(a) f(m(a)) - \sum_a F(m(a)).

Note that the Bregman divergence can also be written as

    D(x\|m) = \sum_a \int_{f(x(a))}^{f(m(a))} du\, [g(u) - x(a)],

where g is the inverse function of f.

N. Murata, T. Takenouchi, T. Kanamori, S. Eguchi, Information geometry of U-Boost and Bregman divergence, Neural Computation 16, 1437-1481 (2004).

In the language of non-extensive statistical physics, f is the deformed logarithm and g the deformed exponential function. The Kullback-Leibler divergence is recovered by taking F(u) = u \ln u - u, which implies f(u) = \ln u and g(u) = e^u.
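The recovery of Kullback-Leibler as a special case of the Bregman construction can be checked with a few lines of code. This is a sketch of my own (function names `bregman` and `kl` are not from the talk), using F(u) = u ln u - u so that f = F' = ln u; the affine part of F drops out of the divergence, which is why different affine shifts of u ln u all give KL.

```python
import math

def bregman(x, m, F, f):
    # D(x||m) = sum_a [ F(x(a)) - F(m(a)) - (x(a) - m(a)) f(m(a)) ]
    return sum(F(xa) - F(ma) - (xa - ma) * f(ma) for xa, ma in zip(x, m))

# F(u) = u ln u - u is strictly convex on (0, 1] with derivative f(u) = ln u
F = lambda u: u * math.log(u) - u
f = lambda u: math.log(u)

def kl(x, m):
    # Kullback-Leibler divergence between normalized positive vectors
    return sum(xa * math.log(xa / ma) for xa, ma in zip(x, m))
```

On normalized vectors the two agree exactly, since the extra terms -sum(x) + sum(m) cancel.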
Deformed exponential families

A parametrized exponential family is of the form

    m_\theta(a) = c(a) \exp\big(-\alpha(\theta) - \theta^k H_k(a)\big)    (physicists' notation).

This implies a logarithmic map of the form

    Lm_\theta(a) = \ln \frac{m_\theta(a)}{c(a)} = -\alpha(\theta) - \theta^k H_k(a).

It is obvious to generalize this definition by replacing the exponential function by a deformed exponential function.

J. Naudts, J. Ineq. Pure Appl. Math. 5, 102 (2004).
S. Eguchi, Sugaku Expositions (Amer. Math. Soc.) 19, 197-216 (2006).
P. D. Grünwald and A. Ph. Dawid, Ann. Statist. 32, 1367-1433 (2004).

Can we give a definition of a deformed exponential family
- which relies only on the divergence?
- which does not involve canonical coordinates?
- which has a geometric interpretation?

Lafferty 1999 (additive models): m_\theta minimizes d(m\|m_0) + \theta^k E_m H_k. This is a constrained maximum entropy principle.

Our proposal: the Fisher information I(x) is constant along the fibers of minimal divergence. This property is a minimum requirement for a distribution to be a (deformed) exponential family. It is satisfied for the deformed exponential families based on Bregman type divergences.

Csiszár type of divergences

A Csiszár type divergence is of the form

    D(x\|m) = \sum_a m(a)\, F\Big(\frac{x(a)}{m(a)}\Big).

The choice F(u) = u \ln u reproduces Kullback-Leibler.

Example. In the context of non-extensive statistical mechanics both Csiszár and Bregman type divergences are being used. Fix the deformation parameter q \neq 1, 0 < q < 2:

    Csiszár:  D_q(x\|m) = \frac{1}{q-1} \sum_a x(a) \Big[ \Big(\frac{x(a)}{m(a)}\Big)^{q-1} - 1 \Big],

    Bregman:  D_q(x\|m) = \frac{1}{q-1} \sum_a x(a) \big[ m(a)^{1-q} - x(a)^{1-q} \big] + \sum_a [m(a) - x(a)]\, m(a)^{1-q}.

Introduce the q-deformed exponential function

    \exp_q(u) = [1 + (1-q)u]_+^{1/(1-q)}.

The distribution of the form

    m_\theta(a) = \exp_q\big(-\alpha(\theta) - \theta^k H_k(a)\big)

is a deformed exponential family relative to the Bregman type divergence, but not relative to the Csiszár type divergence. In the latter case the extended Fisher information is given by

    I_{k,l}(x) = z(x) \frac{\partial^2 \alpha}{\partial\theta^k \partial\theta^l},  with  \frac{\partial\alpha}{\partial\theta^k} = -\frac{1}{z(\theta)} \sum_a x(a)^q H_k(a)  and  z(x) = \sum_a x(a)^q.
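Both q-deformed divergences above, and the q-deformed exponential itself, reduce to their classical counterparts as q -> 1. The sketch below (my own helper names, not from the talk) implements the two Tsallis-type divergences and exp_q, and the test checks numerically that for q close to 1 the Csiszár and Bregman variants nearly coincide (both approach KL).

```python
import math

def exp_q(u, q):
    # q-deformed exponential: [1 + (1-q)u]_+^{1/(1-q)}; ordinary exp at q = 1
    if q == 1.0:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def csiszar_q(x, m, q):
    # Csiszar-type Tsallis divergence
    return sum(xa * ((xa / ma) ** (q - 1.0) - 1.0) for xa, ma in zip(x, m)) / (q - 1.0)

def bregman_q(x, m, q):
    # Bregman-type Tsallis divergence
    s1 = sum(xa * (ma ** (1.0 - q) - xa ** (1.0 - q)) for xa, ma in zip(x, m)) / (q - 1.0)
    s2 = sum((ma - xa) * ma ** (1.0 - q) for xa, ma in zip(x, m))
    return s1 + s2
```

For q far from 1 the two divergences differ, which is the origin of the fiber-constancy asymmetry discussed above.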
If q = 1 then z(x) = 1 and the extended Fisher information is constant along F_\theta. If q \neq 1 it is generically not constant along F_\theta.

Conclusions

- We consider Fisher information not only on the statistical manifold of model states but also for empirical measures.
- If the model is an exponential family then the Fisher information is constant along fibers of minimal divergence.
- We extend the notion of an exponential family to an abstract setting of information theory.
- In the abstract setting the definition of a generalized exponential family only depends on the choice of the divergence.
Variational Problem in Euclidean Space With Density

Lakehal BELARBI (1) and Mohamed BELKHELFA (2)
(1) Département de Mathématiques, Université de Mostaganem, B.P. 227, 27000 Mostaganem, Algérie.
(2) Laboratoire de Physique Quantique de la Matière et Modélisations Mathématiques (LPQ3M), Université de Mascara, B.P. 305, 29000 Route de Mamounia, Mascara, Algérie.

GEOMETRIC SCIENCE OF INFORMATION, Paris, École des Mines, 28, 29 and 30 August 2013

Outline
1. Introduction: what is a manifold with density; examples of manifolds with density.
2. Preliminaries.
3. Plateau's problem in R^3 with density: theorem; motivation.
4. The divergence operator in manifolds with density.
What is a manifold with density

A manifold with density is a Riemannian manifold M^n with a positive density function e^\varphi used to weight volume and hyperarea (and sometimes lower-dimensional area and length). In terms of the underlying Riemannian volume dV_0 and area dA_0, the new weighted volume and area are given by

    dV = e^\varphi \, dV_0,    dA = e^\varphi \, dA_0.

Examples of manifolds with density

One of the first examples of a manifold with density appeared in the realm of probability and statistics: Euclidean space with the Gaussian density e^{-\pi|x|^2} (see [13] for a detailed exposition in the context of isoperimetric problems).

For reasons coming from the study of diffusion processes, Bakry and Émery [1] defined a generalization of the Ricci tensor of a Riemannian manifold M^n with density e^\varphi (the infinity-Bakry-Émery-Ricci tensor) by

    Ric_\varphi^\infty = Ric - \mathrm{Hess}\,\varphi,    (1)

where Ric denotes the Ricci curvature of M^n and Hess \varphi the Hessian of \varphi.
Following Perelman ([11], 1.3, p. 6), in a Riemannian manifold M^n with density e^\varphi, in order for the Lichnerowicz formula to hold the corresponding \varphi-scalar curvature is given by

    S_\varphi^\infty = S - 2\Delta\varphi - |\nabla\varphi|^2,    (2)

where S denotes the scalar curvature of M^n. Note that this is different from taking the trace of Ric_\varphi^\infty, which is S - \Delta\varphi.

Following Gromov ([6], p. 213), the natural generalization of the mean curvature of hypersurfaces in a manifold with density e^\varphi is given by

    H_\varphi = H - \frac{1}{n-1} \frac{d\varphi}{dN},    (3)

where H is the Riemannian mean curvature and N is the unit normal vector field of the hypersurface.

For a 2-dimensional smooth manifold with density e^\varphi, Corwin et al. ([5], p. 6) define a generalized Gauss curvature

    G_\varphi = G - \Delta\varphi    (4)

and obtain a generalization of the Gauss-Bonnet formula for a smooth disc D:

    \int_D G_\varphi + \int_{\partial D} \kappa_\varphi = 2\pi,    (5)

where \kappa_\varphi is the inward one-dimensional generalized mean curvature as in (3), and the integrals are taken with respect to unweighted Riemannian area and arclength ([9], p. 181).
Bayle [2] has derived the first and second variation formulae for the weighted volume functional (see also [9], [10], [13]). From the first variation formula, it can be shown that an immersed submanifold N^{n-1} in M^n is minimal if and only if the generalized mean curvature H_\varphi vanishes (H_\varphi = 0). Doan The Hieu and Nguyen Minh Hoang [8] classified ruled minimal surfaces in R^3 with density \Psi = e^z.

In [4] we previously wrote the equation of minimal surfaces in R^3 with linear density \Psi = e^\varphi (in the cases \varphi(x,y,z) = x, \varphi(x,y,z) = y and \varphi(x,y,z) = z), and we gave some solutions of the equation of minimal graphs in R^3 with linear density \Psi = e^\varphi. In [3] we gave a description of ruled minimal surfaces by geodesic straight lines in the Heisenberg space H_3 with linear density \Psi = e^\varphi = e^{\alpha x + \beta y + \gamma z}, where (\alpha, \beta, \gamma) \in R^3 - \{(0,0,0)\} (in particular \varphi(x,y,z) = \alpha x and \varphi(x,y,z) = \beta y), and we gave the infinity-Bakry-Émery Ricci curvature tensor and the \varphi-scalar curvature of the Heisenberg space H_3 with radial density e^{-a\rho^2 + c}, where \rho^2 = x^2 + y^2 + z^2.

Preliminaries

In this section we introduce notations, definitions and preliminary facts which are used throughout this work. We deal with two-dimensional surfaces in Euclidean 3-space. We assume that the surface is given parametrically by X : U \subseteq R^2 -> R^3. We denote the parameters by u and v.
We denote the partial derivatives with respect to u and v by the corresponding subscripts. The normal vector N to the surface at a given point is defined by

    N = \frac{X_u \wedge X_v}{\|X_u \wedge X_v\|}.

The first fundamental form of the surface is the metric induced on the tangent space at each point of the surface. The (u, v) coordinates define a basis for the tangent space, consisting of the vectors X_u and X_v. In this basis the matrix of the first fundamental form is

    ( E  F )
    ( F  G ),   where E = X_u \cdot X_u,  F = X_u \cdot X_v,  G = X_v \cdot X_v.

In this basis, the second fundamental form of the surface is given by the matrix

    ( L  M )
    ( M  N ),   where L = -X_u \cdot N_u,  M = -X_u \cdot N_v,  N = -X_v \cdot N_v.

Definition ([12]). The area A_X(R) of the part X(R) of a surface patch X : U \subseteq R^2 -> R^3 corresponding to a region R \subseteq U is

    A_X(R) = \int_R \|X_u \wedge X_v\| \, du\, dv,   with   \|X_u \wedge X_v\| = (EG - F^2)^{1/2}.

We shall now study a family of surfaces S_t parametrized by X_t : U -> R^3 in R^3 with density e^{\varphi_t}, where U is an open subset of R^2 independent of t, and t lies in some open interval ]-\epsilon, \epsilon[ for some \epsilon > 0. Let S = S_0 and e^{\varphi_0} = e^\varphi. The family is required to be smooth, in the sense that the map (u, v, t) -> X_t(u, v) from the open subset \{(u,v,t) \mid (u,v) \in U, t \in ]-\epsilon, \epsilon[\} of R^3 to R^3 is smooth.
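The weighted area integral dA = e^\varphi dA_0 over a patch can be approximated directly from the parametrization. Below is a minimal numerical sketch of my own (the function `weighted_area` and its midpoint-rule quadrature are assumptions, not from the talk), evaluating the integrand e^\varphi \|X_u \wedge X_v\| with central-difference partials over the unit square.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def weighted_area(X, phi, n=100, h=1e-6):
    # integral over [0,1]^2 of e^phi(X(u,v)) * |X_u ^ X_v| du dv  (midpoint rule)
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) / n, (j + 0.5) / n
            Xu = tuple((a - b) / (2 * h) for a, b in zip(X(u + h, v), X(u - h, v)))
            Xv = tuple((a - b) / (2 * h) for a, b in zip(X(u, v + h), X(u, v - h)))
            p = X(u, v)
            total += math.exp(phi(*p)) * norm(cross(Xu, Xv)) / n**2
    return total
```

For the tilted plane X(u,v) = (u, v, u), the unweighted area is sqrt(2) and the e^x-weighted area is sqrt(2)(e - 1), which the quadrature reproduces.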
The surface variation of the family is the function \eta : U -> R^3 given by

    \eta = \frac{\partial X_t}{\partial t}\Big|_{t=0}.

Let \gamma be a simple closed curve that is contained, along with its interior int(\gamma), in U. Then \gamma corresponds to a closed curve \gamma_t = X_t \circ \gamma in the surface S_t, and we define the \varphi_t-area function A_{\varphi_t}(t) in R^3 with density e^{\varphi_t} to be the area of the surface S_t inside \gamma_t:

    A_{\varphi_t}(t) = \int_{int(\gamma)} e^{\varphi_t} \, dA_{X_t}.

Theorem. With the above notation, assume that the surface variation \eta vanishes along the boundary curve \gamma. Then

    \frac{\partial A_{\varphi_t}(t)}{\partial t}\Big|_{t=0} = A'_\varphi(0) = -2 \int_{int(\gamma)} H_\varphi \, (\eta \cdot N) \, e^\varphi \, (EG - F^2)^{1/2} \, du\, dv,    (6)

where H_\varphi = H - \frac{1}{2} \nabla\varphi \cdot N is the \varphi-mean curvature of S in R^3 with density e^\varphi, E, F and G are the coefficients of its first fundamental form, and N is the standard unit normal of S.
Motivation

If S in R^3 with density e^\varphi has the smallest \varphi-area among all surfaces in R^3 with density e^\varphi with the given boundary curve \gamma, then A_\varphi must have an absolute minimum at t = 0, so A'_\varphi(0) = 0 for all smooth families of surfaces as above. This means that the integral in Eq. (6) must vanish for all smooth functions \zeta = \eta \cdot N : U -> R. This can happen only if the term that multiplies \zeta in the integrand vanishes, in other words only if H_\varphi = 0. This suggests the following definition.

Definition. A minimal surface in R^3 with density e^\varphi is a surface whose \varphi-mean curvature is zero everywhere.

Proposition. The minimal surface equation for S : z = f(x, y) in R^3 with linear density e^x, given by the parametrization X : (x, y) -> (x, y, f(x, y)), (x, y) \in R^2, is

    \Big(1 + \Big(\frac{\partial f}{\partial x}\Big)^2\Big) \frac{\partial^2 f}{\partial y^2} + \Big(1 + \Big(\frac{\partial f}{\partial y}\Big)^2\Big) \frac{\partial^2 f}{\partial x^2} - 2\, \frac{\partial f}{\partial x}\, \frac{\partial f}{\partial y}\, \frac{\partial^2 f}{\partial x \partial y} + \frac{\partial f}{\partial x} \Big(1 + \Big(\frac{\partial f}{\partial x}\Big)^2 + \Big(\frac{\partial f}{\partial y}\Big)^2\Big) = 0.

Example. The surface S in R^3 with linear density e^x defined by the parametrization

    X : (x, y) -> \Big(x, y, -\frac{a^2}{\sqrt{1+a^2}} \arcsin\big(\beta\, e^{-\frac{1+a^2}{a^2} x}\big) + a y + b + \gamma\Big),

where (x, y) \in R^2 and a, b, \beta \in R^*, is minimal.
The divergence operator in manifolds with density

Let (M^n, g) be a Riemannian manifold equipped with the Riemannian metric g. For any smooth function f on M, the gradient \nabla f is a vector field on M which in local coordinates x^1, \dots, x^n has the form

    (\nabla f)^i = g^{ij} \frac{\partial f}{\partial x^j},

where summation is assumed over repeated indices. For any smooth vector field F on M, the divergence div F is a scalar function on M, given in local coordinates by

    \mathrm{div}\, F = \frac{1}{\sqrt{\det g_{ij}}} \frac{\partial}{\partial x^i} \Big( \sqrt{\det g_{ij}} \, F^i \Big).

Let \nu be the Riemannian volume on M, that is, d\nu = \sqrt{\det g_{ij}} \, dx^1 \cdots dx^n. By the divergence theorem, for any smooth function f and smooth vector field F such that either f or F has compact support,

    \int_M f \, \mathrm{div}\, F \, d\nu = - \int_M \langle \nabla f, F \rangle \, d\nu,    (7)

where \langle \cdot, \cdot \rangle = g(\cdot, \cdot). In particular, if F = \nabla\psi for a function \psi, then we obtain

    \int_M f \, \mathrm{div}\, \nabla\psi \, d\nu = - \int_M \langle \nabla f, \nabla\psi \rangle \, d\nu,    (8)

provided one of the functions f, \psi has compact support. The operator \Delta := \mathrm{div} \circ \nabla is called the Laplace (or Laplace-Beltrami) operator of the Riemannian manifold M. From (8) we obtain the Green formulas

    \int_M f \, \Delta\psi \, d\nu = - \int_M \langle \nabla f, \nabla\psi \rangle \, d\nu = \int_M \psi \, \Delta f \, d\nu.    (9)

Let now \mu be another measure on M defined by d\mu = e^\varphi \, d\nu, where \varphi is a smooth function on M.
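For the weighted measure d\mu = e^\varphi d\nu just introduced, the associated weighted divergence of a gradient field reduces in flat space to (1/e^\varphi)\,\mathrm{div}(e^\varphi \nabla f) = \Delta f + \nabla\varphi \cdot \nabla f. This identity can be checked numerically; the sketch below (helper name `weighted_laplacian_fd` is mine, not from the talk) does so in flat R^2 with central differences.

```python
import math

def weighted_laplacian_fd(f, phi, x, y, h=1e-4):
    # (1/e^phi) div(e^phi grad f) in flat R^2, via nested central differences
    def flux_x(x, y):
        return math.exp(phi(x, y)) * (f(x + h, y) - f(x - h, y)) / (2 * h)
    def flux_y(x, y):
        return math.exp(phi(x, y)) * (f(x, y + h) - f(x, y - h)) / (2 * h)
    div = (flux_x(x + h, y) - flux_x(x - h, y)) / (2 * h) \
        + (flux_y(x, y + h) - flux_y(x, y - h)) / (2 * h)
    return div / math.exp(phi(x, y))
```

For f = x^2 + y^2 and \varphi = x, the exact value is \Delta f + \nabla\varphi \cdot \nabla f = 4 + 2x, which the finite-difference version reproduces to high accuracy.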
A triple (M^n, g, \mu) is called a weighted manifold or manifold with density. The associated divergence div_\mu is defined by

    \mathrm{div}_\mu F = \frac{1}{e^\varphi \sqrt{\det g_{ij}}} \frac{\partial}{\partial x^i} \Big( e^\varphi \sqrt{\det g_{ij}} \, F^i \Big),

and the Laplace-Beltrami operator \Delta_\mu of (M^n, g, \mu) is defined by

    \Delta_\mu := \mathrm{div}_\mu \circ \nabla = \frac{1}{e^\varphi} \mathrm{div}(e^\varphi \, \nabla \cdot) = \Delta + \nabla\varphi \cdot \nabla.    (10)

It is easy to see that the Green formulas hold with respect to the measure \mu, that is,

    \int_M f \, \Delta_\mu \psi \, d\mu = - \int_M \langle \nabla f, \nabla\psi \rangle \, d\mu = \int_M \psi \, \Delta_\mu f \, d\mu,    (11)

provided f or \psi belongs to C_0^\infty(M).

Theorem. Let S be a surface in M^3 with density \Psi = e^\varphi. Then

    \mathrm{div}_\varphi N = -2 H_\varphi,    (12)

where H_\varphi is the \varphi-mean curvature of the surface S and N is the unit normal vector field of S.

Proof. By definition we have

    \mathrm{div}_\varphi N = \frac{1}{e^\varphi} \mathrm{div}(e^\varphi N) = \mathrm{div}\, N + N \cdot \nabla\varphi
                 = \sum_{i=1}^{2} \langle \nabla_{e_i} N, e_i \rangle + N \cdot \nabla\varphi
                 = \sum_{i=1}^{2} \big( \nabla_{e_i} \langle e_i, N \rangle - \langle \nabla_{e_i} e_i, N \rangle \big) + N \cdot \nabla\varphi
                 = -2 \langle H N, N \rangle + N \cdot \nabla\varphi
                 = -2 \big( H - \tfrac{1}{2} \nabla\varphi \cdot N \big) = -2 H_\varphi,

where we have used that \langle e_i, N \rangle = 0 and the definition of the mean curvature vector.

References

[1] D. Bakry, M. Émery. Diffusions hypercontractives. Séminaire de Probabilités XIX (1983/84), Lecture Notes in Mathematics 1123 (1985), 177-206.
[2] V. Bayle. Propriétés de concavité du profil isopérimétrique et applications. Graduate thesis, Institut Fourier, Université Joseph-Fourier, Grenoble I, 2004.
[3] L. Belarbi, M. Belkhelfa. Heisenberg space with density. Submitted.
[4] L. Belarbi, M. Belkhelfa. Surfaces in R^3 with density. i-manager's Journal on Mathematics, 1(1) (2012), 34-48.
[5] I. Corwin, N. Hoffman, S. Hurder, V. Sesum, Y. Xu. Differential geometry of manifolds with density. Rose-Hulman Und. Math. J., 7(1) (2006).
[6] M. Gromov. Isoperimetry of waists and concentration of maps. Geom. Funct. Anal. 13 (2003), 285-215.
[7] J. Lott, C. Villani. Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. 169(3) (2009), 903-991.
[8] N. Minh, D. T. Hieu. Ruled minimal surfaces in R^3 with density e^z. Pacific J. Math. 243(2) (2009), 277-285.
[9] F. Morgan. Geometric Measure Theory: A Beginner's Guide. Fourth edition, Academic Press, 2009.
[10] F. Morgan. Manifolds with density. Notices Amer. Math. Soc. 52 (2005), 853-858.
[11] G. Ya. Perelman. The entropy formula for the Ricci flow and its geometric applications. Preprint, http://www.arxiv.org/abs/math.DG/0211159, 2002.
[12] A. Pressley. Elementary Differential Geometry. Second edition, Springer, 2010.
[13] C. Rosales, A. Cañete, V. Bayle, F. Morgan. On the isoperimetric problem in Euclidean space with density. Calc. Var. Partial Differential Equations 31(1) (2008), 27-46.

THANK YOU FOR YOUR ATTENTION
ORAL SESSION 7: Hessian Information Geometry I (Michel Nguiffo Boyom)
Complexification of Information Geometry in view of quantum estimation theory

Introduction

As H. Shima pointed out in his book:

- M : manifold with an affine structure (flat connection) (\theta^i)
  -> TM : tangent bundle with a complex structure;
- M : manifold with a Hessian structure ((\theta^i), g, \psi) (a dually flat structure), with g_{ij}(\theta) = \partial_i \partial_j \psi(\theta), where \partial_i = \partial/\partial\theta^i
  -> TM : tangent bundle with a Kähler structure, with Kähler potential \psi(\theta).

Introduction (cont.)
A similar situation will appear in the context of quantum estimation theory, where M will be replaced with a (classical and quantum) exponential family and TM will be replaced with the complex projective space (the set of quantum pure states).

Classical Exponential Families

Let
- X : a finite set,
- P = P(X) := \{ p \mid p : X -> (0, 1), \ \sum_{x \in X} p(x) = 1 \},
- M = \{ p_\theta \mid \theta \in \Theta (\subset R^n) \} (\subset P), where

    p_\theta(x) = p_0(x) \exp\Big[ \sum_{j=1}^{n} \theta^j f_j(x) - \psi(\theta) \Big],
    \psi(\theta) := \log \sum_{x \in X} p_0(x) \exp\Big[ \sum_{j=1}^{n} \theta^j f_j(x) \Big].

We assume \{1, f_1, \dots, f_n\} are linearly independent, which implies \theta -> p_\theta is injective.
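For a family of this form, a standard identity states that the Hessian of the log-normalizer \psi equals the Fisher information, i.e. the variance of the sufficient statistic. The sketch below checks this numerically for a one-parameter family on a three-point set; all names (`psi`, `fisher_expectation`, `fisher_hessian`) and the particular p_0 and f are my own illustrative choices.

```python
import math

# a one-parameter exponential family on the finite set X = {0, 1, 2}
p0 = [0.2, 0.5, 0.3]
f  = [0.0, 1.0, -1.0]

def psi(theta):
    # log-normalizer
    return math.log(sum(pi * math.exp(theta * fi) for pi, fi in zip(p0, f)))

def p(theta):
    z = math.exp(psi(theta))
    return [pi * math.exp(theta * fi) / z for pi, fi in zip(p0, f)]

def fisher_expectation(theta, h=1e-5):
    # g(theta) = E_theta[(d/dtheta log p_theta)^2], score by central differences
    pm, pp, pt = p(theta - h), p(theta + h), p(theta)
    score = [(math.log(a) - math.log(b)) / (2.0 * h) for a, b in zip(pp, pm)]
    return sum(pi * s * s for pi, s in zip(pt, score))

def fisher_hessian(theta, h=1e-4):
    # g(theta) = psi''(theta), by a second-order central difference
    return (psi(theta + h) - 2.0 * psi(theta) + psi(theta - h)) / h**2
```

Both finite-difference estimates agree with each other and with the variance of f under p_theta.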
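The definitions above can be made concrete numerically. A minimal sketch (the set X, the base distribution p_0, and the statistics f_1, f_2 below are toy data, not from the slides), checking that ψ(θ) normalizes every member of the family:

```python
import numpy as np

# Toy classical exponential family on X = {0,1,2,3} with two statistics.
p0 = np.array([0.1, 0.2, 0.3, 0.4])            # base distribution p_0(x)
F = np.array([[0., 1., 2., 3.],                # f_1(x)
              [1., 0., 1., 0.]])               # f_2(x)

def psi(theta):
    # psi(theta) = log sum_x p_0(x) exp(sum_i theta^i f_i(x))
    return np.log(np.sum(p0 * np.exp(theta @ F)))

def p_theta(theta):
    # p_theta(x) = p_0(x) exp(sum_i theta^i f_i(x) - psi(theta))
    return p0 * np.exp(theta @ F - psi(theta))

for theta in (np.array([0., 0.]), np.array([0.3, -0.5]), np.array([-1., 2.])):
    p = p_theta(theta)
    assert abs(p.sum() - 1.0) < 1e-12          # psi normalizes each member
    assert np.all(p > 0)                       # and keeps it inside P(X)
assert abs(psi(np.zeros(2))) < 1e-12           # theta = 0 recovers p_0
```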
Geometrical Structure of Exponential Family
- Fisher information metric: g_ij = E_θ[∂_i log p_θ · ∂_j log p_θ] = ∂_i ∂_j ψ(θ)
  (⇒ Cramér-Rao inequality: V(estimator) ≥ [g_ij]^{-1}).
- e-, m-connections, each flat with the indicated affine coordinates:
  θ^i → ∇^(e), η_i := E_θ[f_i] → ∇^(m).
- Duality: Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z)
  ⇒ (M, g, ∇^(e), ∇^(m)) is dually flat.
- η̂ := (f_1, ..., f_n) is an estimator achieving the Cramér-Rao bound (an efficient estimator).
- P itself is an exponential family.

Quantum State Space
Let H ≅ C^d be a Hilbert space with inner product ⟨·|·⟩, and define
- L(H) := {A | A: H → H linear} = {linear operators},
- L_h(H) := {A ∈ L(H) | A = A*} = {Hermitian operators},
- S̄ := {ρ ∈ L_h(H) | ρ ≥ 0, Tr[ρ] = 1} = {quantum states} = ∪_{r=1}^d S_r,
  where S_r := {ρ ∈ S̄ | rank ρ = r}.
We mainly treat S_1 and S_d in the sequel.
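The identity g_ij = E_θ[∂_i log p_θ ∂_j log p_θ] = ∂_i∂_jψ(θ) for exponential families can be verified numerically: since ∂_i log p_θ = f_i − η_i, the Fisher metric is the covariance matrix of the statistics, and it should agree with the Hessian of ψ. A sketch with assumed toy data:

```python
import numpy as np

# Toy exponential family on X = {0,1,2} with two statistics f_1, f_2.
p0 = np.array([0.2, 0.5, 0.3])
F = np.array([[0., 1., 2.],
              [1., -1., 0.]])

def psi(th):
    return np.log(np.sum(p0 * np.exp(th @ F)))

th = np.array([0.2, 0.4])
p = p0 * np.exp(th @ F - psi(th))

# Fisher metric as the covariance of the statistics:
# g_ij = E[(f_i - eta_i)(f_j - eta_j)] = E[f_i f_j] - eta_i eta_j
eta = F @ p
g = (F * p) @ F.T - np.outer(eta, eta)

# Hessian of psi by central finite differences
eps = 1e-5
E = np.eye(2)
hess = np.array([[(psi(th + eps*(E[i]+E[j])) - psi(th + eps*(E[i]-E[j]))
                   - psi(th - eps*(E[i]-E[j])) + psi(th - eps*(E[i]+E[j])))
                  / (4*eps**2) for j in range(2)] for i in range(2)])
assert np.allclose(g, hess, atol=1e-4)         # g_ij = d_i d_j psi
```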
SLD Fisher Metric
Given a manifold M = {ρ_θ | θ = (θ^i) ∈ Θ} ⊂ S̄, let
- L_{θ,i} ∈ L_h(H) be such that ∂ρ_θ/∂θ^i = (1/2)(ρ_θ L_{θ,i} + L_{θ,i} ρ_θ):
  the Symmetric Logarithmic Derivatives, or SLDs, of M,
- g_ij := Re Tr[ρ_θ L_{θ,i} L_{θ,j}].
Then g = [g_ij] defines a Riemannian metric on M. In particular, every S_r becomes a Riemannian space with g.

SLD Fisher Metric (cont.)
- The metric g is a quantum version of the classical Fisher metric, and is called the SLD metric.
- A quantum version of the Cramér-Rao inequality: V(estimator) ≥ [g_ij]^{-1} (Helstrom, 1967).
- g is the minimum monotone metric (Petz, 1996).
- Every S_r becomes a Riemannian space with the SLD metric. How about the e-, m-connections and the dualistic structure?

r = d: faithful states
- S_d = {ρ ∈ S̄ | ρ > 0} = {faithful states}.
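The defining equation ∂_iρ_θ = (1/2)(ρ_θ L_{θ,i} + L_{θ,i} ρ_θ) is a Lyapunov-type equation; one standard way to solve it (assuming ρ_θ > 0) is elementwise in the eigenbasis of ρ_θ. A sketch on a single-qubit family ρ(θ) = (I + θσ_z)/2, which is a hypothetical example, not from the slides; for it the SLD metric equals 1/(1−θ²), matching the classical Fisher information of the eigenvalue distribution:

```python
import numpy as np

def sld(rho, drho):
    # Solve drho = (rho L + L rho)/2 for the SLD L, in the eigenbasis of rho.
    lam, U = np.linalg.eigh(rho)
    dr = U.conj().T @ drho @ U
    L = 2.0 * dr / (lam[:, None] + lam[None, :])
    return U @ L @ U.conj().T

sz = np.diag([1.0, -1.0])                      # Pauli z
theta = 0.6
rho = 0.5 * (np.eye(2) + theta * sz)           # the qubit state rho_theta
drho = 0.5 * sz                                # d rho / d theta
L = sld(rho, drho)
assert np.allclose(drho, 0.5 * (rho @ L + L @ rho))   # L really is the SLD

# SLD Fisher metric g = Re Tr[rho L L]; for this family g = 1/(1 - theta^2)
g = np.real(np.trace(rho @ L @ L))
assert abs(g - 1.0 / (1.0 - theta**2)) < 1e-10
```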
- Since S_d is an open subset of the affine space {A | A = A*, Tr A = 1}, the m-connection ∇^(m) on S_d is defined as the natural flat connection.
- The e-connection ∇^(e) is defined as the dual of ∇^(m) w.r.t. g:
  Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z).
- R^(e) = 0 (curvature-free), but T^(e) ≠ 0 (torsion), so (S_d, g, ∇^(e), ∇^(m)) is not dually flat.

r = 1: pure states
- S_1 = {|ξ⟩⟨ξ| | ξ ∈ H, ‖ξ‖ = 1} = {pure states}.
- S_1 ≅ P(H) := (H \ {0})/∼ (complex projective space), where ξ_1 ∼ ξ_2 ⟺ ∃c ∈ C, ξ_1 = c ξ_2.
- The SLD metric g on S_1 coincides with the well-known Fubini-Study metric on P(H) (up to a constant).
S_1 ≅ P(H) as a complex manifold
- An almost complex structure, i.e. a (1,1)-tensor field J satisfying J² = −1, is canonically defined by
  J(∂/∂x^j) = ∂/∂y^j, J(∂/∂y^j) = −∂/∂x^j
  for an arbitrary holomorphic (complex analytic) coordinate system (z^j) = (x^j + √−1 y^j).
- g(JX, JY) = g(X, Y).
- A differential 2-form ω is defined by ω(X, Y) = g(X, JY).
- g (or (J, g, ω)) is a Kähler metric in the sense that ω is a symplectic form: dω = 0, or equivalently that there is a function f, called a Kähler potential, satisfying ω = (√−1/2) ∂∂̄f.
Kähler potential
Let
  a_jk = g(∂/∂x^j, ∂/∂x^k) = g(∂/∂y^j, ∂/∂y^k),
  b_jk = g(∂/∂y^j, ∂/∂x^k) = −g(∂/∂x^j, ∂/∂y^k).
Then f is a Kähler potential iff
  a_jk = (1/4)(∂²f/∂x^j∂x^k + ∂²f/∂y^j∂y^k) and b_jk = (1/4)(∂²f/∂x^j∂y^k − ∂²f/∂y^j∂x^k).

Quasi-Classical Exponential Family (QCEF)
M = {ρ_θ | θ ∈ R^n} ⊂ S̄ is called a quasi-classical exponential family when it is represented as
  ρ_θ = exp[ (1/2)(Σ_i θ^i F_i − ψ(θ)) ] ρ_0 exp[ (1/2)(Σ_i θ^i F_i − ψ(θ)) ],
where {F_1, ..., F_n} ⊂ L_h(H), [F_i, F_j] := F_i F_j − F_j F_i = 0 (commutative), {ρ_0, F_1 ρ_0, ..., F_n ρ_0} are linearly independent, and
  ψ(θ) = log Tr[ ρ_0 exp(Σ_i θ^i F_i) ].
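The QCEF definition can be sketched numerically with commuting diagonal observables F_1, F_2 (a toy construction with assumed data, not from the slides). By cyclicity of the trace, ψ(θ) normalizes ρ_θ, and the dual coordinates η_i = Tr[ρ_θ F_i] should equal the gradient of ψ:

```python
import numpy as np

f = np.array([[0., 1., 2.],
              [1., 0., -1.]])                  # eigenvalues f_i(x) of F_i
V = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
rho0 = V @ V.T / np.trace(V @ V.T)             # a non-diagonal faithful base state

def psi(th):
    # psi(theta) = log Tr[rho_0 exp(sum_i theta^i F_i)], with F_i diagonal
    return np.log(np.trace(rho0 @ np.diag(np.exp(th @ f))))

def rho_theta(th):
    half = np.diag(np.exp(0.5 * (th @ f - psi(th))))
    return half @ rho0 @ half

th = np.array([0.3, -0.2])
rho = rho_theta(th)
assert abs(np.trace(rho) - 1.0) < 1e-12        # psi normalizes the family
assert np.allclose(rho, rho.T)                 # rho_theta stays symmetric

# dual coordinates eta_i = Tr[rho_theta F_i] equal the gradient of psi
eta = np.array([np.trace(rho @ np.diag(f[i])) for i in range(2)])
eps = 1e-6
grad = np.array([(psi(th + eps*e) - psi(th - eps*e)) / (2*eps)
                 for e in np.eye(2)])
assert np.allclose(eta, grad, atol=1e-6)
```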
Properties of QCEFs
- e-, m-connections are defined by the affine coordinates: θ^i → flat connection ∇^(e), η_i := Tr[ρ_θ F_i] → ∇^(m).
- (M, g, ∇^(e), ∇^(m)) is dually flat, where g is the SLD metric.
- Suppose M ⊂ S_d. Then M is e-autoparallel in S_d, and (g, ∇^(e), ∇^(m)) on M is induced from (S_d, g, ∇^(e), ∇^(m)).
- (F_1, ..., F_n) is an estimator for the coordinates (η_1, ..., η_n) achieving the SLD Cramér-Rao bound.

Properties of QCEFs (cont.)
- Since {F_i} are commutative, there exist an orthonormal basis {|x⟩}_{x∈X} (eigenvectors) with X = {1, 2, ..., d = dim H} and functions (eigenvalues) f_i: X → R (i = 1, ..., n) such that F_i = Σ_{x∈X} f_i(x) |x⟩⟨x|.
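Since the {F_i} are simultaneously diagonal in the basis {|x⟩}, the diagonal entries ⟨x|ρ_θ|x⟩ of a QCEF form a classical exponential family with p_0(x) = ⟨x|ρ_0|x⟩. A numeric sketch of this reduction (toy data, assumed):

```python
import numpy as np

f = np.array([[0., 1., 2.],
              [1., 0., -1.]])                  # eigenvalues f_i(x) of F_i
V = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
rho0 = V @ V.T / np.trace(V @ V.T)             # non-diagonal base state
p0 = np.diag(rho0)                             # p_0(x) = <x|rho_0|x>

th = np.array([0.4, 0.1])
a = th @ f                                     # eigenvalues of sum_i theta^i F_i
psi = np.log(np.sum(p0 * np.exp(a)))           # log Tr[rho_0 exp(sum theta^i F_i)]
half = np.diag(np.exp(0.5 * (a - psi)))
rho = half @ rho0 @ half                       # the QCEF state rho_theta

# the diagonal of rho_theta is the classical exponential family p_theta
p_classical = p0 * np.exp(a - psi)
assert np.allclose(np.diag(rho), p_classical)
assert abs(p_classical.sum() - 1.0) < 1e-12
```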
Then we have
  p_θ(x) := ⟨x|ρ_θ|x⟩ = p_0(x) exp[ Σ_i θ^i f_i(x) − ψ(θ) ]
(a classical exponential family) and M = {ρ_θ} ≅ {p_θ} w.r.t. (g, ∇^(e), ∇^(m)).

Complexification of a pure state QCEF
Let M = {ρ_θ} be a quasi-classical exponential family,
  ρ_θ = exp[ (1/2)(Σ_i θ^i F_i − ψ(θ)) ] ρ_0 exp[ (1/2)(Σ_i θ^i F_i − ψ(θ)) ]
(with the same assumptions on {F_i} as before), and suppose that M ⊂ S_1(H) ≅ P(H). For z = (z^1, ..., z^n) ∈ C^n, z^i = θ^i + √−1 y^i (θ^i, y^i real), let
  ρ_z := exp[ (1/2)(Σ_i z^i F_i − ψ(θ)) ] ρ_0 exp[ (1/2)(Σ_i z̄^i F_i − ψ(θ)) ] = U_y ρ_θ U_y*,
where U_y := exp[ (√−1/2) Σ_i y^i F_i ] is unitary.

Complexification of a pure state QCEF (cont.)
Letting V be a neighborhood of R^n in C^n for which V ∋ z ↦ ρ_z is injective, define
  M̃ := {ρ_z | z ∈ V} (⊃ M = {ρ_θ | θ ∈ R^n}).
(Figure: the map z ↦ ρ_z sends V ⊃ R^n into S_1 ≅ P(H); M̃ is the image of V and contains M.)
Complexification of a pure state QCEF (cont.)
- M̃ is a complex (holomorphic) submanifold of S_1 with a holomorphic coordinate system (z^i), and hence is Kähler w.r.t. g_M̃ = (Fubini-Study)|_M̃.
- When n = d − 1, M̃ is open in S_1.
- 4ψ(θ) gives a Kähler potential on M̃: ω_M̃ := ω|_M̃ = 2√−1 ∂∂̄ψ.
- (M̃, η_i, y^i) forms a Darboux coordinate system: ω_M̃ = Σ_{i=1}^n dη_i ∧ dy^i.
  (Similar to the case of Shima's observation on M and TM.)
- Letting ∇^(m) be the flat connection with affine coordinates (η_i; y^i) and ∇^(e) be its dual w.r.t. g_M̃, we have
  ∇^(e) ∘ J = J ∘ ∇^(m) and ∇^(e) ω_M̃ = ∇^(m) ω_M̃ = 0.
Relation to parallel displacement
- Duality ⟺ ∀X, Y, Z: Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z).
- ∇^(e) ∘ J = J ∘ ∇^(m) ⟺ ∀X, Y: ∇^(e)_X J(Y) = J(∇^(m)_X Y).
- ∇^(e) ω = 0 ⟺ ∀X, Y, Z: Xω(Y, Z) = ω(∇^(e)_X Y, Z) + ω(Y, ∇^(e)_X Z), and similarly ∇^(m) ω = 0 with ∇^(m) in place of ∇^(e).

In terms of parallel displacement (write X →e X′ for e-parallel displacement of X along a curve, and X →m X′ for m-parallel displacement):
- If X →e X′ and Y →m Y′, then g(X, Y) = g(X′, Y′) (duality).
- X →e X′ iff J(X) →m J(X′), and X →m X′ iff J(X) →e J(X′).
Relation to parallel displacement (cont.)
- ∇^(e) ω = ∇^(m) ω = 0 ⟺ ∀X, Y, Z:
  Xω(Y, Z) = ω(∇^(e)_X Y, Z) + ω(Y, ∇^(e)_X Z) = ω(∇^(m)_X Y, Z) + ω(Y, ∇^(m)_X Z).
- Hence, if X →e X′ and Y →e Y′, or if X →m X′ and Y →m Y′, then ω(X, Y) = ω(X′, Y′).
Fisher Information Geometry of the Barycenter of Probability Measures
Mitsuhiro Itoh and Hiroyasu Satoh
Institute of Mathematics, University of Tsukuba, Japan, and Tokyo Denki University, Japan

Motivation. Consider the following characterization problem. Let (X_o, g_o) be a Damek-Ricci space, and let (X, g) be an Hadamard manifold, i.e., a simply connected complete Riemannian manifold of nonpositive curvature. Assume (X, g) ≅ (X_o, g_o) (quasi-isometric). Then, is (X, g) itself Damek-Ricci?

Here (X_o, g_o) is Damek-Ricci, an R-extension of a generalized Heisenberg group N. A Damek-Ricci space is a solvable Lie group with a left invariant metric; it is Riemannian homogeneous and of nonpositive curvature. Moreover, a Damek-Ricci space is harmonic, namely, the mean curvature of a geodesic sphere is a function of the radius alone. A Damek-Ricci space is a rank one symmetric space of noncompact type when it has strictly negative curvature; RH^n, CH^n, HH^n and the Cayley hyperbolic space QH^2 exhaust the rank one symmetric spaces of noncompact type.

§1 Barycenter and barycenter-isometric maps
Denote by ∂X the ideal boundary of (X, g). Let P^+(∂X) = P^+(∂X, dθ) be the space of probability measures on ∂X, absolutely continuous with respect to the canonical measure dθ and having positive density function, so any µ ∈ P^+(∂X) is written as µ(θ) = f(θ)dθ, θ ∈ ∂X, f(θ) > 0.

Let B_θ(x) = B(x, θ), x ∈ X, θ ∈ ∂X, be the Busemann function on X associated to θ, normalized at a reference point o and defined by
  B_θ(x) = lim_{t→∞} { d(x, γ(t)) − t },
where γ(t) denotes the geodesic starting at o and going to θ. It holds that |∇B_θ| = 1 at any point x. Further, we have the so-called Busemann cocycle formula with respect to a Riemannian isometry φ of (X, g):
  B_θ(φx) = B_{φ^{-1}θ}(x) + B_θ(φo), ∀(x, θ) ∈ X × ∂X.
See [G-J-T].

Definition 1.1. Let µ ∈ P^+(∂X). A point y ∈ X is called a barycenter of µ if the function B_µ: X → R, defined by
  B_µ(x) = ∫_{∂X} B_θ(x) dµ(θ),   (1)
attains its least value at y.
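The Busemann function is completely explicit on the Poincaré disk model of RH^2 (an Hadamard manifold of curvature −1, metric 4|dx|²/(1−|x|²)²), where the standard formula, normalized at o = 0, is B_θ(x) = log(|x−θ|²/(1−|x|²)) for θ on the unit circle. A numeric sketch (the model and the closed form are standard background, not taken from this text) checking |∇B_θ| = 1 in the hyperbolic metric and the unit decay rate along the ray to θ:

```python
import numpy as np

def busemann(x, th):
    # Busemann function on the Poincare disk, normalized at the origin
    return np.log(np.sum((x - th)**2) / (1.0 - np.sum(x**2)))

x = np.array([0.3, 0.2])
th = np.array([1.0, 0.0])                      # a boundary point on S^1

# Euclidean gradient by central differences; for tangent vectors
# |v|_hyp = 2|v|_euc/(1-|x|^2), so |grad B|_hyp = (1-|x|^2)/2 * |grad_euc B|.
eps = 1e-6
ge = np.array([(busemann(x + eps*e, th) - busemann(x - eps*e, th)) / (2*eps)
               for e in np.eye(2)])
assert abs((1 - np.sum(x**2)) / 2 * np.linalg.norm(ge) - 1.0) < 1e-6

# Along the geodesic ray t -> t*theta, B_theta(t*theta) = -d(o, t*theta).
t = 0.9
assert abs(busemann(t*th, th) - np.log((1 - t) / (1 + t))) < 1e-12
```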
Note that the Busemann function, and hence the function B_µ(x), is convex on a nonpositively curved manifold X. To discuss the existence and uniqueness of the barycenter we need in addition a strict convexity hypothesis on the Busemann function.

Proposition 1.1. Let (X, g) be an Hadamard manifold. Assume that the Hessian DdB_(x,θ) of any Busemann function B(x, θ) is strictly positive, except in the gradient direction ∇B(·, θ). Then there exists a unique barycenter for every µ ∈ P^+(∂X).

See [B-C-G-1, Appendice A] for the proof. We thus have a map bar: P^+(∂X, dθ) → X; µ ↦ y, which we call the barycenter map, and write y = bar(µ).

Note. bar(φ̂♯µ) = φ(bar(µ)) for a Riemannian isometry φ of (X, g). Here φ̂ denotes the bijective map (homeomorphism) ∂X → ∂X induced from φ.

Remark. In [B-C-G-1], Besson, Courtois and Gallot utilize the notion of barycenter to establish the Mostow rigidity of hyperbolic manifolds. In fact, let f: ∂X → ∂X be a certain map, where X = RH^n, n ≥ 3, a real hyperbolic space. Then there exists a map F: X → X; F(y) = bar(f_*µ_y), y ∈ X, associated to the map f, where µ_y ∈ P^+(∂X, dθ) is a special probability measure, called the Poisson kernel probability measure, which appears in §2; they showed that F: X → X is an isometry by using a Schwarz inequality lemma ([B-C-G-1]).

Now, let Φ: ∂X → ∂X be a bijective map (homeomorphism) and Φ♯: P^+(∂X, dθ) → P^+(∂X, dθ) the push-forward map induced from Φ. By definition of the push-forward map, Φ♯ satisfies
  ∫_{θ∈∂X} f(θ) d[Φ♯µ](θ) = ∫_{θ∈∂X} (f ∘ Φ)(θ) dµ(θ)
for any function f = f(θ) on ∂X.

Definition 1.2. We consider the following situation: the map Φ♯ yields a bijective map φ: X → X satisfying bar ∘ Φ♯ = φ ∘ bar, as in the diagram
  P^+(∂X, dθ) --Φ♯--> P^+(∂X, dθ)   (2)
      | bar                | bar
      v                    v
      X --------φ--------> X
We call such a φ a barycenter-isometric map of (X, g) and denote it bar(Φ).

Lemma 1.1. The composition φ ∘ φ_1 of barycenter-isometric maps φ = bar(Φ), φ_1 = bar(Φ_1) is also barycenter-isometric, with φ ∘ φ_1 = bar(Φ ∘ Φ_1).

Proof.
With respect to their composition one can check bar(Φ ◦ Φ1) = bar(Φ) ◦ bar(Φ1). (3)

Theorem 1.1. Let φ : X → X be a barycenter-isometric map induced from a homeomorphism Φ : ∂X → ∂X. Assume that φ is C1; then φ is a Riemannian isometry of (X, g), i.e., φ fulfills φ∗g = g. (4)

For its proof we need the notion of Fisher information geometry together with the Poisson kernel.

§2 Poisson kernel and Fisher information geometry
Now we assume that the Hadamard manifold (X, g) admits a Poisson kernel.

Definition 2.1. A function P(x, θ) of (x, θ) ∈ X × ∂X is called a Poisson kernel when (i) it induces the fundamental solution of the Dirichlet problem at the ideal boundary, ∆u = 0 on X and u|∂X = f for a given datum f ∈ C(∂X), so the solution u is described as u = u(x) = ∫_{∂X} P(x, θ)f(θ)dθ; (ii) P(x, θ) > 0 for any (x, θ), so that the measure P(x, θ)dθ is a probability measure on ∂X parametrized by a point x of X; (iii) P(o, θ) = 1 for any θ (normalization at the reference point o); and (iv) lim_{x→θ1} P(x, θ) = 0 for all θ, θ1 ∈ ∂X, θ1 ≠ θ.

A Damek-Ricci space admits a Poisson kernel described explicitly as P(x, θ) = exp{−QB(x, θ)} in terms of B(x, θ) and the volume entropy Q > 0. See [B-C-G-1], [I-S-1], [I-S-2], [I-S-3], [A-B].

Lemma 2.1. µx := P(x, θ)dθ ∈ P+(∂X) is a probability measure, parametrized by x, for which bar(µx) = x.

For a point x ∈ X, let bar⁻¹(x) := {µ ∈ P+(∂X) | bar(µ) = x}. Then the set bar⁻¹(x) ⊂ P+(∂X) is path-connected and we can discuss the tangent space Tµbar⁻¹(x) to bar⁻¹(x); then ν ∈ TµP+(∂X) belongs to Tµbar⁻¹(x) if and only if ∫_θ dB(x,θ)(U) dν(θ) = 0 for any tangent vector U ∈ TxX. Now take µx = P(x, θ)dθ. Then µx ∈ bar⁻¹(x), as seen before. Let Θ : X → P+(∂X); x → µx be the canonical map, which we call the Poisson kernel map.

Proposition 2.1. Let x be a fixed point and U a tangent vector at x. For any ν ∈ T_{µx}bar⁻¹(x), G(dΘx(U), ν) = 0, where G is the Fisher information metric defined on the space P+(∂X).
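On the disk model of RH² (where the volume entropy is Q = 1 and B is the Busemann function of §1 normalized at o = 0) the Poisson kernel has the classical closed form P(z, θ) = (1 − |z|²)/|θ − z|² = exp{−B(z, θ)}. The short numerical check below (an illustration of ours, with dθ the normalized angular measure) verifies properties (ii)-(iii) of Definition 2.1 and the premise of Lemma 2.1 that µ_z = P(z, θ)dθ is a probability measure.

```python
import numpy as np

# Numerical sketch: Poisson kernel of the disk model of RH^2,
# P(z, theta) = (1 - |z|^2)/|theta - z|^2 = exp(-B(z, theta))  (Q = 1),
# against the normalized boundary measure dtheta.
t = np.linspace(0, 2*np.pi, 10000, endpoint=False)
theta = np.exp(1j * t)              # boundary points of the disk

def P(z):
    return (1.0 - abs(z)**2) / np.abs(theta - z)**2

print(round(float(np.mean(P(0.3 + 0.4j))), 6))  # 1.0: mu_z has total mass one
print(np.allclose(P(0.0), 1.0))                 # True: normalization (iii)
```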
From the proposition we have the fibration of P+(∂X) over the Hadamard manifold X whose fibre over x is bar⁻¹(x). Further, the Poisson kernel map Θ : X → P+(∂X) gives a cross section of this fibration.

Proof of Proposition 2.1. Since P(x, θ) = exp{−QB(x, θ)}, dΘx(U) = −Q dB(x,θ)(U) µx, which we denote by νo. Then, from the definition of the Fisher information metric, we have G_{µx}(νo, ν) = ∫_{∂X} (dνo/dµx)(dν/dµx) dµx = ∫ (−Q dB(x,θ)(U) P(x, θ)/P(x, θ)) × (f(θ)/P(x, θ)) P(x, θ)dθ = −Q ∫ dB(x,θ)(U) dν(θ), which must be zero, since ν = f(θ)dθ belongs to T_{µx}bar⁻¹(x).

Remark. At µx ∈ P+(∂X) the tangent space T_{µx}P+(∂X) is written as the orthogonal direct sum T_{µx}P+(∂X) = dΘx(TxX) ⊕ T_{µx}bar⁻¹(x) (5) with respect to the Fisher information metric G.

Remark. (5) is valid also with respect to the L2-inner product ⟨f, f1⟩ = ∫_{∂X} f(θ) f1(θ) dθ.

Here the differential of the Poisson kernel map (dΘ)x : TxX → T_{µx}P+(∂X) is injective. In fact, assume that (dΘ)x(U) = 0 in T_{µx}P+(∂X) for U ∈ TxX. This means dB(x,θ)(U)P(x, θ)dθ = 0. Since P(x, θ) > 0, this implies dB(x,θ)(U) = 0 for any θ. To conclude that U = 0, assume U is not zero; we may then assume U is a unit vector. Then we have a geodesic γ(t) = exp_x tU and hence a point θo = [γ], so dB(x,θo)(U) = −1. This is a contradiction, and thus the map dΘx is injective.

Proof of Theorem 1.1. For x ∈ X let y = φx, where φ = bar(Φ). From the definition of barycenter, for any µ ∈ bar⁻¹(x), ∫ dB(x,θ)(U) dµ(θ) = 0, ∀U ∈ TxX. Since Φ♯µ ∈ bar⁻¹(y) for µ ∈ bar⁻¹(x), y is a barycenter of Φ♯µ if and only if ∫ dB(y,θ)(V) d(Φ♯µ)(θ) = 0, ∀V ∈ TyX. Since θ = Φ⁻¹Φθ, from this we have ∫ dB(y,Φ⁻¹Φθ)(V) d(Φ♯µ)(θ) = ∫ dB(y,Φθ)(V) dµ(θ) = 0, which is valid for any µ ∈ bar⁻¹(x) and indicates that dB(y,Φθ)(V)dµ(θ) is orthogonal to the tangent space Tµbar⁻¹(x). In particular, dB(y,Φθ)(V)dµx(θ) belongs, by (5), to dΘx(TxX). So we conclude that for any V ∈ T_{φx}X there exists U ∈ TxX such that dB(φx,Φθ)(V) = dB(x,θ)(U).
The vector V depends on the vector U, so we may write V = dφxU, where dφx is the differential map TxX → TφxX of the map φ. Then we may assume ⟨∇B(φx,Φθ), dφx(U)⟩_{φx} = ⟨∇B(x,θ), U⟩_x, which reduces, by using the formal adjoint dφ∗x : TφxX → TxX, to ⟨dφ∗x∇B(φx,Φθ), U⟩_x = ⟨∇B(x,θ), U⟩_x for any U. As a consequence, the gradient vector fields must satisfy dφ∗x∇B(φx,Φθ) = ∇B(x,θ) (6) for any x in X and θ ∈ ∂X. Now take an arbitrary unit vector V ∈ TφxX. Then V = ∇B(φx,Φθ) for some θ, and from the above equation |dφ∗xV| = |dφ∗x∇B(φx,Φθ)| = |∇B(x,θ)| = 1, where we used |∇B(x,θ)| = 1. This holds for any unit vector, so dφ∗x, and hence dφx : TxX → TφxX, is a linear isometry; therefore φ : X → X is a Riemannian isometry of (X, g).

§3 Quasi-isometries and quasi-geodesics
Let X be an Hadamard manifold with ideal boundary ∂X.

Definition 3.1. Let φ : X → X be a (smooth) map. It is called rough-isometric, or quasi-isometric, when φ satisfies the following: there exist λ > 1 and k > 0 such that for any points x, x′ in X, (1/λ) d(x, x′) − k < d(φ(x), φ(x′)) < λ d(x, x′) + k. (7)

Note that a rough-isometric map is not necessarily continuous. See [Bourd]. More generally, a map f : X1 → X2 is called a (λ, k)-quasi-isometric map if there exist constants λ > 1 and k > 0 such that λ⁻¹ d1(x, x′) − k < d2(f x, f x′) < λ d1(x, x′) + k. A quasi-isometric map is a generalization of an isometric map. We simply call a (λ, k)-quasi-isometric map a quasi-isometric map when we do not need to mention the constants λ, k precisely.

We say that metric spaces X1 and X2 are quasi-isometric, X1 ∼= X2 (quasi-isometric), if they satisfy one of the following two conditions: (i) there exist quasi-isometric maps f : X1 → X2 and g : X2 → X1 and a positive number ε such that g ◦ f and f ◦ g are in an ε-neighborhood of the identity maps idX1, idX2, respectively; (ii) there exist a quasi-isometry f : X1 → X2 and ε > 0 such that f(X1) is ε-dense in X2.
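The remark that a rough-isometric map need not be continuous already shows up in the simplest example: the floor map t ↦ ⌊t⌋ of the real line. Since |⌊t⌋ − ⌊t′⌋| differs from |t − t′| by less than 1, inequality (7) holds with any λ > 1 and k = 1. A randomized sanity check (an illustration of ours, not from the paper):

```python
import math, random

# Sketch: the discontinuous floor map is a quasi-isometry of R:
# |t - t'| - 1 <= |floor(t) - floor(t')| <= |t - t'| + 1, so (7) holds
# with any lambda > 1 and k = 1. Randomized check, illustrative only.
random.seed(0)
ok = True
for _ in range(10000):
    t, s = random.uniform(-100, 100), random.uniform(-100, 100)
    d_img = abs(math.floor(t) - math.floor(s))
    d = abs(t - s)
    ok &= (d - 1 <= d_img <= d + 1)
print(ok)  # True
```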
Let (X, g) be a Riemannian manifold which is quasi-isometric to another Riemannian manifold (Xo, go). Then any Riemannian isometry of (Xo, go) induces a bijective quasi-isometric map of (X, g). A curve c : R → X is called a quasi-geodesic if c is a quasi-isometric map, that is, λ⁻¹|t′ − t| − k < d(c(t), c(t′)) < λ|t′ − t| + k, t, t′ ∈ R, for some λ > 1, k > 0. We also call a curve c : [a, b] → X a quasi-geodesic segment when it satisfies the above inequality for any t, t′ ∈ [a, b]. A geodesic is a quasi-geodesic. A quasi-isometric map f : (Xo, go) → (X, g) maps a geodesic γ : R → Xo to a quasi-geodesic f ◦ γ : R → X. Moreover, if φ : X → X is quasi-isometric and γ : R → X is a quasi-geodesic, then the curve φ ◦ γ : R → X is a quasi-geodesic.

Let F : ∂Xgeod → ∂Xq−geod; [γ]geod → [γ]q−geod (8) be the inclusion map. If an Hadamard manifold (X, g) satisfies a certain negative curvature condition or a hyperbolicity condition, then F is bijective. In fact, this is the case if the curvature satisfies K < −k² < 0. See [K-1] for the strictly negative curvature case and [Bourd], [K-2] for the case of manifolds satisfying the hyperbolicity condition.

Now we consider the following situation: an Hadamard manifold (X, g) is quasi-isometric to another Hadamard manifold (Xo, go) which is equipped with isometries. An isometry φ of (Xo, go) gives rise to a quasi-isometric bijective map of (X, g). So φ induces a bijective map φ̂ : ∂Xq−geod → ∂Xq−geod, since, for any quasi-geodesic σ, φ ◦ σ is a quasi-geodesic, and if σ ∼ σ1, then φ ◦ σ ∼ φ ◦ σ1. However, ∂Xq−geod is identified with ∂Xgeod = ∂X by the natural map F. So φ induces a bijective map φ̃ = F ◦ φ̂ ◦ F⁻¹ : ∂X → ∂X.

References
[A-N] S. Amari and H. Nagaoka, Methods of Information Geometry, AMS, 2000.
[B-G-S] W. Ballmann, M. Gromov and V. Schroeder, Manifolds of Nonpositive Curvature, Birkhäuser, Boston, 1985.
[Bar] F.
Barbaresco, Chap. 9, Information Geometry of Covariance Matrix: Cartan-Siegel Homogeneous Bounded Domains, Mostow/Berger Fibration and Fréchet Median, pdf file, 2013.
[Bern-T-V] J. Berndt, F. Tricerri and L. Vanhecke, Generalized Heisenberg Groups and Damek-Ricci Harmonic Spaces, Lecture Notes, 1598, Springer, 1991.
[B-C-G-1] G. Besson, G. Courtois and S. Gallot, Entropies et rigidités des espaces localement symétriques de courbure strictement négative, Geom. Funct. Anal., 5(1995), 731-799.
[B-C-G-2] G. Besson, G. Courtois and S. Gallot, A simple and constructive proof of Mostow's rigidity and the minimal entropy theorems, Erg. Th. Dyn. Sys., 16(1996), 623-649.
[Bourd] M. Bourdon, Structure conforme au bord et flot géodésique d'un CAT(-1)-espace, L'Enseignement Math., 41(1995), 63-102.
[D-E] E. Douady, C. Earle, Conformally natural extension of homeomorphisms of the circle, Acta Math., 157(1986), 23-48.
[G-J-T] Y. Guivarc'h, L. Ji and J.C. Taylor, Compactifications of Symmetric Spaces, Birkhäuser, 1997.
[I-Sat-1] M. Itoh and H. Satoh, Information geometry of Poisson kernels on Damek-Ricci spaces, Tokyo J. Math., 33(2010), xx-xx.
[I-Sat-2] M. Itoh and H. Satoh, Fisher information geometry, Poisson kernel and asymptotical harmonicity, Differ. Geom. Appl., 29(2011), S107-S115.
[I-Shi] M. Itoh and Y. Shishido, Fisher information metric and Poisson kernels, Differ. Geom. Appl., 26(2008), 347-356.
[K-1] G. Knieper, Hyperbolic dynamics and Riemannian geometry, in Handbook of Dynamical Systems, Vol. 1A, edited by B. Hasselblatt and A. Katok, 453-545, Elsevier Science B.V., 2002.
[K-2] G. Knieper, New results on noncompact harmonic manifolds, Comment. Math. Helv., 87(2012), 669-703.
[S] T. Sakai, Riemannian Geometry, AMS, 2000.
Foliations on Affinely Flat Manifolds. Information Geometry
Robert Wolak, Jagiellonian University, Krakow (Poland); joint work with Michel Nguiffo Boyom, UMR CNRS 5149, I3M, Département de Mathématiques et de Modélisation, Université Montpellier 2, Montpellier.
GSI2013 - Geometric Science of Information, MINES ParisTech, Paris, 28-08-2013 - 30-08-2013

Contents
1. Algebraic preliminaries: Koszul-Vinberg algebra (KV-algebra); Koszul-Vinberg algebroid; twisted KV-cochain complex; Chevalley-Eilenberg complex of the twisted module
2. Foliations on locally flat manifolds
3. Fisher information metric
4. Dual pairs of connections
5. Foliations on locally flat manifolds (cont.): Ehresmann connections; topological properties

An algebra A is an R-vector space endowed with a bilinear map µ : A × A → A. This map µ is the multiplication map of A. For a, b ∈ A, ab will stand for µ(a, b). Given an algebra A, the Koszul-Vinberg anomaly (KV-anomaly) of A is the trilinear map KV : A³ → A defined by KV(a, b, c) = (ab)c − a(bc) − (ba)c + b(ac).

Definition. An algebra A is called a Koszul-Vinberg algebra (KV-algebra) if its KV-anomaly vanishes identically.

Definition. A Koszul-Vinberg algebroid is a couple (V, a) where V is a vector bundle over the base manifold M whose module of sections Γ(V) has the structure of a Koszul-Vinberg algebra over R, and a is a homomorphism of the vector bundle V into TM satisfying the following properties: (i) (fs).s′ = f(ss′) ∀s, s′ ∈ Γ(V), ∀f ∈ C∞(M, R); (ii) s.(fs′) = (a(s)f)s′ + f(ss′).

Remark. (i) If we equip Γ(V) with the bracket [s, s′] = ss′ − s′s, then the Koszul-Vinberg algebroid (V, a) becomes a Lie algebroid. (ii) The condition [s, fs′] = (a(s)f)s′ + f[s, s′] ensures that a is a homomorphism of Lie algebras, i.e.,
a([s, s′]) = [a(s), a(s′)]. The vector space spanned by the commutators [a, b] = ab − ba of a KV-algebra A is a Lie algebra denoted by A_L.

Let A be a Koszul-Vinberg algebra; an A-module is a vector space W equipped with a right action and a left action of A related by the following equalities: for any a, b ∈ A and any w ∈ W we have a(bw) − (ab)w = b(aw) − (ba)w and a(wb) − (aw)b = w(ab) − (wa)b. Let A = (X(M), ·) be the algebra with multiplication given by X · Y = D_X Y; then A is a Koszul-Vinberg algebra and the space T(M) of tensors on M is a two-sided A-module. T(M) is bigraded by the subspaces T^{p,q}(M) of tensors of type (p, q).

Twisted KV-cochain complex. Let A be a KV-algebra and let W be a two-sided KV-module over A. We equip the vector space W with the left module structure A × W → W defined by a ∗ w = aw − wa, ∀a ∈ A, w ∈ W. (1) One has KV(a, b, w) = (a, b, w) − (b, a, w) = 0, where (a, b, w) = (ab) ∗ w − a ∗ (b ∗ w).

Definition. The left KV-module structure defined by (1) is called the twisted KV-module structure derived from the two-sided KV-module W. The vector space W endowed with the twisted module structure is denoted by Wτ.

The map (a, w) → a ∗ w defines on Wτ a left module structure over the Lie algebra A_L. The complex C_CE(A_L, Wτ) is called the Chevalley-Eilenberg complex of the twisted module. Let A be a KV-algebra and let W be a two-sided KV-module over A. We consider the graded vector space C_KV(A, Wτ) = ⊕_{q∈Z} C^q_KV(A, Wτ), where C^q_KV(A, Wτ) = {0} if q < 0, C^0_KV(A, Wτ) = Wτ, and for q ≥ 1, C^q_KV(A, Wτ) = Hom_R(⊗^q A, Wτ). If there is no risk of confusion, C(A, Wτ) will stand for C_KV(A, Wτ).

Let us define the linear mapping d : C^q(A, Wτ) → C^{q+1}(A, Wτ): ∀w ∈ Wτ, f ∈ C^q(A, Wτ), a ∈ A and ζ = a1 ⊗ ...
⊗ a_{q+1} ∈ ⊗^{q+1}A,
(dw)(a) = −aw + wa,
(df)(ζ) = Σ_{i=1}^{q+1} (−1)^i { a_i ∗ (f(∂_iζ)) − f(a_i.∂_iζ) }, (2)
where the action a_i.∂_iζ is defined by the standard tensor product extension.

Theorem. (i) The pair (C(A, Wτ), d) is a cochain complex whose qth cohomology space is denoted by H^q_KV(A, Wτ). (ii) The graded space C_N(A, Wτ) = W ⊕ ⊕_{q>0} Hom(∧^q A, Wτ) is a subcomplex of (C(A, Wτ), d) whose cohomology coincides with the cohomology of the Chevalley-Eilenberg complex C_CE(A_L, Wτ).

(M, ∇): a locally flat manifold. A∇ = (X(M), ∇): the KV-algebra associated to (M, ∇). Wτ = C∞(M): the left KV-module over A∇ under the covariant derivative. C0(A∇, Wτ): the vector subspace of C∞(A∇, Wτ) formed by cochains of order 0; thus C0(A∇, Wτ) consists of C∞(M)-multilinear mappings.

Theorem. The second cohomology space H^2_0(A∇, Wτ) can be decomposed as follows: H^2_0(A∇, Wτ) = H^2_dR(M) ⊕ H^0(A∇, Hom(S²A∇, Wτ)), (3) where H^2_dR(M) is the 2nd de Rham cohomology space of M.

H(A∇, Wτ) = ⊕_{q≥0} H^q(A∇, Wτ): a geometric invariant of (M, ∇); b_q(∇) = dim H^q_0(A∇, C∞(M)): the qth Betti number of (M, ∇); b_q(M) = dim H^q_dR(M, R): the classical qth Betti number of M. One has b_q(M) ≤ b_q(∇).

M. Nguiffo Boyom, F. Ngakeu, P. M. Byande, R. Wolak, KV-cohomology and differential geometry of affinely flat manifolds. Information geometry, African Diaspora Journal of Mathematics, Special Volume in Honor of Prof. Augustin Banyaga, Vol. 14, 2, pp. 197-226 (2012).

Definition. Let (M, ∇) be a locally flat manifold. (i) A totally geodesic foliation F of (M, ∇) is called an affine foliation. (ii) A totally geodesic foliation F of M is transversally euclidean if its normal bundle TM/TF is endowed with a ∇-parallel (pseudo-)euclidean scalar product.

Q(M) = Hom_{C∞(M)}(S²A, Wτ), the vector space of tensorial quadratic forms on (sections of) TM.
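A toy instance of the KV-algebra A = (X(M), ·) above: on M = R with the flat connection, the product of the vector fields f d/dx and g d/dx is (f·g) d/dx with f·g = f g′, and the KV-anomaly vanishes identically. The pure-Python polynomial check below is only an illustration of ours (all helper names are hypothetical):

```python
# Sketch: the product (f.g) := f * g' on vector fields f d/dx over R (induced by
# the flat connection) has vanishing Koszul-Vinberg anomaly. Polynomials are
# coefficient lists [c0, c1, ...] for c0 + c1*x + ...; illustrative only.
def pmul(f, g):                       # polynomial product
    out = [0.0]*(len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a*b
    return out

def pdiff(f):                         # polynomial derivative
    return [i*c for i, c in enumerate(f)][1:] or [0.0]

def psub(f, g):                       # polynomial difference (padded)
    n = max(len(f), len(g))
    f = f + [0.0]*(n - len(f)); g = g + [0.0]*(n - len(g))
    return [a - b for a, b in zip(f, g)]

def mult(f, g):                       # the KV product: f.g = f * g'
    return pmul(f, pdiff(g))

def KV(a, b, c):                      # (ab)c - a(bc) - (ba)c + b(ac)
    t = psub(mult(mult(a, b), c), mult(a, mult(b, c)))
    return psub(t, psub(mult(mult(b, a), c), mult(b, mult(a, c))))

a, b, c = [1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 4.0], [5.0, 0.0, 1.0]
print(all(abs(x) < 1e-12 for x in KV(a, b, c)))  # True: the KV-anomaly vanishes
```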
For σ ∈ H^0_KV(A, Q(M)), let σ̄ be the quadratic form on TM/ker σ deduced from σ and let sign(σ) be the Morse index of σ̄. We define the following numerical invariants:

Definition. We set ρ∇(M) = min{ ρ∇(σ) = dim ker σ, σ ∈ H^0(A, Q(M)) } and S∇(M) = min{ S∇(σ) = dim ker σ + sign(σ), σ ∈ H^0(A, Q(M)) }.

Let (Ξ, Ω) be a measurable set and Θ ⊂ R^n a connected subset.

Definition. A connected open subset Θ ⊂ R^n is an n-dimensional statistical model for a measurable set (Ξ, Ω) if there exists a real-valued positive function p : Θ × Ξ → R subject to the following requirements: (i) for every fixed ξ ∈ Ξ the function θ → p(θ, ξ) is smooth; (ii) for every fixed θ ∈ Θ the function ξ → p(θ, ξ) is a probability density on (Ξ, Ω), viz. ∫_Ξ p(θ, ξ)dξ = 1; (iii) for every couple θ ≠ θ′ there exists ξ ∈ Ξ such that p(θ, ξ) ≠ p(θ′, ξ).

Let ∇ be a torsion-free linear connection on the manifold Θ and set ln(θ, ξ) = log(p(θ, ξ)). At each point θ ∈ Θ we define the family {q(θ,ξ)} of bilinear forms: for a couple (X, Y) of smooth vector fields on Θ, put q(θ,ξ)(X, Y) = −(∇d ln)(X, Y)(θ, ξ). Since ∇ is torsion-free, q(θ,ξ)(X, Y) is symmetric in the couple (X, Y).

Definition. The Fisher information g of the local model (Θ, p) is the mathematical expectation of the bilinear form q(θ,ξ): g(X, Y)(θ) = ∫_Ξ p(θ, ξ) q(θ,ξ)(X, Y) dξ. The Fisher information g does not depend on the choice of the symmetric connection ∇. The Fisher information g is positive semi-definite. When g is definite it is called the Fisher metric of the model (Θ, p).

The dualistic relation between linear connections.

Definition. In a Riemannian manifold (M, g) a couple (∇, ∇∗) of linear connections are dual to each other if the identity Xg(Y, Z) = g(∇_X Y, Z) + g(Y, ∇∗_X Z) holds for all vector fields X, Y, Z on the manifold M.
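For a one-dimensional model in a flat coordinate θ, the bilinear form q(θ,ξ) above reduces to −∂²_θ log p(θ, ξ), and the definition of g can be sanity-checked against the familiar score form E[(∂_θ log p)²]. A toy check on the Bernoulli model p(θ, 1) = θ, p(θ, 0) = 1 − θ, whose Fisher information is 1/(θ(1 − θ)) (an illustration of ours, with hypothetical helper names):

```python
import math

# Toy check: Fisher information of the Bernoulli model, closed form
# 1/(theta*(1-theta)), computed from the score and from the (negative)
# Hessian of log p, as in the definition of g via q_(theta,xi).
def logp(theta, xi):
    return math.log(theta if xi == 1 else 1.0 - theta)

theta, h = 0.3, 1e-4
w = {1: theta, 0: 1.0 - theta}        # p(theta, xi)
g_score = sum(w[xi] * ((logp(theta + h, xi) - logp(theta - h, xi))/(2*h))**2
              for xi in (0, 1))
g_hess = -sum(w[xi] * (logp(theta + h, xi) - 2*logp(theta, xi)
              + logp(theta - h, xi))/h**2 for xi in (0, 1))
closed = 1.0/(theta*(1.0 - theta))
print(round(g_score, 3), round(g_hess, 3), round(closed, 3))  # 4.762 4.762 4.762
```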
A dual pair (∇, ∇∗) in a Riemannian manifold (M, g). Assume that both (M, ∇) and (M, ∇∗) are locally flat structures. They define the pair ([ρ∇], [ρ∇∗]) of conjugation classes of canonical representations. Therefore we have the following two properties.

Theorem. The pair ([ρ∇], [ρ∇∗]) does not depend on the choice of the Riemannian structure g.

Theorem. Every locally flat manifold (M, ∇) whose 2-dimensional twisted cohomology H^2_0(A∇, Wτ) differs from the de Rham cohomology space H^2_dR(M) is either a flat (pseudo-)Riemannian manifold or is foliated by a pair (F, F∗) of g-orthogonal foliations for every Riemannian metric g. Moreover, these foliations are totally geodesic w.r.t. the g-dual pair (D, D∗) (respectively).

TM = TF ⊕ TF∗. Define a torsion-free linear connection by setting D̃_{(X1,X2)}(Y1, Y2) = (D_{X1}Y1 + [X2, Y1], D∗_{X2}Y2 + [X1, Y2]) for all (X1, X2), (Y1, Y2) ∈ Γ(TF) × Γ(TF∗). D̃ is the unique torsion-free linear connection which preserves (F, F∗).

Assume that one of the connections (D, D∗) is geodesically complete. Then the foliations are Ehresmann connections for the other:
- the universal coverings of leaves of the foliation F, respectively F∗, are D-affinely isomorphic, respectively D∗-affinely isomorphic;
- the universal covering M̃ of the manifold M is the product K × L, where K is the universal covering of leaves of the foliation F and L is the universal covering of leaves of the foliation F∗.

Assume that the connection D is complete. Then the restriction of D to leaves of F is complete and each leaf of F is a geodesically complete locally flat manifold, so its universal covering is diffeomorphic to R^p, where p is the dimension of the leaves of F. The same is true if the connection D∗ is complete.

Merci. Thank you.
Hessian structures on deformed exponential families
MATSUZOE Hiroshi, Nagoya Institute of Technology; joint works with HENMI Masayuki (The Institute of Statistical Mathematics)

1 Statistical manifolds and statistical models
2 Deformed exponential family
3 Geometry of deformed exponential family (1)
4 Geometry of deformed exponential family (2)
5 Maximum q-likelihood estimator

1 Preliminaries
1.1 Geometry of statistical models
Definition 1.1. S is a statistical model or a parametric model on Ω if S is a set of probability densities with parameter ξ ∈ Ξ such that S = { p(x; ξ) | ∫_Ω p(x; ξ)dx = 1, p(x; ξ) > 0, ξ ∈ Ξ ⊂ R^n }. We regard S as a manifold with a local coordinate system {Ξ; ξ^1, ..., ξ^n}.

g^F = (g^F_ij) is the Fisher metric (Fisher information matrix) of S, defined by g^F_ij(ξ) := ∫_Ω (∂/∂ξ^i log p(x; ξ))(∂/∂ξ^j log p(x; ξ)) p(x; ξ)dx = ∫_Ω ∂_i p_ξ (∂/∂ξ^j log p_ξ) dx = E_ξ[∂_i l_ξ ∂_j l_ξ], where ∂_i p_ξ is the mixture representation and ∂_i l_ξ = ∂_i p_ξ / p_ξ is the exponential representation (the score function).

A statistical model S_e is an exponential family if S_e = { p(x; θ) | p(x; θ) = exp[ C(x) + Σ_{i=1}^n θ^i F_i(x) − ψ(θ) ] }, where C, F_1, ..., F_n are random variables on Ω and ψ is a function on the parameter space Θ. The coordinate system [θ^i] is called the natural parameters.

Proposition 1.2. For an exponential family S_e, (1) ∇^(1) is flat; (2) [θ^i] is an affine coordinate system, i.e., Γ^{(1)k}_{ij} ≡ 0.

For simplicity, assume that C = 0. Then g^F_ij(θ) = E[(∂_i log p(x; θ))(∂_j log p(x; θ))] = E[−∂_i∂_j log p(x; θ)] = E[∂_i∂_jψ(θ)] = ∂_i∂_jψ(θ), the Fisher metric, and C^F_ijk(θ) = E[(∂_i log p(x; θ))(∂_j log p(x; θ))(∂_k log p(x; θ))] = ∂_i∂_j∂_kψ(θ), the cubic form. The triplets (S_e, ∇^(e), g^F) and (S_e, ∇^(m), g^F) are Hessian manifolds. Remark: (S, ∇^(α), g^F) is an invariant statistical manifold.

Normal distributions: Ω = R, n = 2, ξ = (µ, σ) ∈ R^2_+ (the upper half plane).
S = { p(x; µ, σ) | p(x; µ, σ) = (1/(√(2π)σ)) exp[ −(x − µ)²/(2σ²) ] }. The Fisher metric is (g_ij) = (1/σ²) diag(1, 2), so S is a space of constant negative curvature −1/2, and ∇^(1) and ∇^(−1) are flat affine connections. In addition, setting θ^1 = µ/σ², θ^2 = −1/(2σ²) and ψ(θ) = −(θ^1)²/(4θ^2) + (1/2) log(−π/θ^2), we get p(x; µ, σ) = (1/(√(2π)σ)) exp[ −(x − µ)²/(2σ²) ] = exp[ xθ^1 + x²θ^2 − ψ(θ) ]. {θ^1, θ^2}: natural parameters (∇^(1)-geodesic coordinate system). η_1 = E[x] = µ, η_2 = E[x²] = σ² + µ². {η_1, η_2}: moment parameters (∇^(−1)-geodesic coordinate system).

Finite sample space: Ω = {x_0, x_1, ..., x_n}, dim S_n = n; p(x_i; η) = η_i (1 ≤ i ≤ n), p(x_0; η) = 1 − Σ_{j=1}^n η_j; Ξ = { {η_1, ..., η_n} | η_i > 0 (∀i), Σ_{j=1}^n η_j < 1 }, an n-dimensional simplex. The Fisher metric is g_ij = δ_ij/η_i + 1/η_0, i.e., (g_ij) = (1/η_0) times the matrix with diagonal entries 1 + η_0/η_i and all off-diagonal entries 1, where η_0 = 1 − Σ_{j=1}^n η_j; S_n is a space of constant positive curvature 1/4. {θ^1, ..., θ^n}: natural parameters (∇^(1)-geodesic coordinate system), where θ^i = log p(x_i) − log p(x_0) = log( η_i / (1 − Σ_{j=1}^n η_j) ) and ψ(θ) = log( 1 + Σ_{j=1}^n e^{θ^j} ). {η_1, ..., η_n}: moment parameters (∇^(−1)-geodesic coordinate system).

Proposition 1.3. For S_e, the following hold: (1) (S_e, g^F, ∇^(e), ∇^(m)) is a dually flat space. (2) {θ^i} is a ∇^(e)-affine coordinate system on S_e. (3) ψ(θ) is the potential of g^F w.r.t. {θ^i}: g^F_ij(θ) = ∂_i∂_jψ(θ). (4) Setting η_i = E_θ[F_i(x)], the expectations of the F_i(x), {η_i} is the dual coordinate system of {θ^i} with respect to g^F. (5) Setting ϕ(η) = E_θ[log p_θ], ϕ(η) is the potential of g^F w.r.t. {η_i}. Since (S_e, g^F, ∇^(e), ∇^(m)) is a dually flat space, the Legendre transformation holds:
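The Fisher metric diag(1/σ², 2/σ²) of the normal family can be reproduced numerically from the defining expectation g^F_ij = E[∂_i log p ∂_j log p]; a sketch of ours (the grid truncation, step sizes and all names are our illustrative choices):

```python
import numpy as np

# Sketch: Fisher metric of N(mu, sigma^2) in coordinates (mu, sigma),
# expected to be diag(1/sigma^2, 2/sigma^2).
def fisher_metric(mu, sigma):
    x = np.linspace(mu - 50.0, mu + 50.0, 200001)
    dx = x[1] - x[0]
    def logp(m, s):
        return -0.5*np.log(2*np.pi) - np.log(s) - (x - m)**2/(2*s**2)
    h = 1e-4                           # finite-difference step in the parameters
    d_mu  = (logp(mu + h, sigma) - logp(mu - h, sigma)) / (2*h)
    d_sig = (logp(mu, sigma + h) - logp(mu, sigma - h)) / (2*h)
    p = np.exp(logp(mu, sigma))
    E = lambda f: np.sum(f * p) * dx   # expectation by Riemann sum
    return np.array([[E(d_mu*d_mu),  E(d_mu*d_sig)],
                     [E(d_sig*d_mu), E(d_sig*d_sig)]])

g = fisher_metric(0.0, 2.0)
print(np.round(g, 3))   # approx [[0.25, 0], [0, 0.5]] = diag(1/sigma^2, 2/sigma^2)
```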
∂ψ/∂θ^i = η_i, ∂ϕ/∂η_i = θ^i, ψ(p) + ϕ(p) − Σ_{i=1}^n θ^i(p)η_i(p) = 0; g^F_ij = ∂²ψ/∂θ^i∂θ^j, C^F_ijk = ∂³ψ/∂θ^i∂θ^j∂θ^k.

The Kullback-Leibler divergence (or relative entropy) on S: D_KL(p, r) = ∫_Ω p(x) log(p(x)/r(x)) dx = E_p[log p(x) − log r(x)] ( = ψ(r) + ϕ(p) − Σ_{i=1}^n θ^i(r)η_i(p) = D(r, p) ). For S_e, D_KL coincides with the canonical divergence D on the dually flat space (S_e, ∇^(m), g^F).

Construction of a divergence from an estimating function: s(x; ξ) = (∂/∂ξ^1 log p(x; ξ), ..., ∂/∂ξ^n log p(x; ξ))^T is the score function of p(x; ξ) (an estimating function). By integrating the score function and taking an expectation, we obtain the cross entropy on S, d_KL(p, r) := ∫_Ω p(x; ξ) log r(x; ξ′) dx. The KL-divergence is given by the difference of cross entropies: D_KL(p, r) = d_KL(p, p) − d_KL(p, r).

2 Deformed exponential family (χ-exp. family)
Let χ : (0, ∞) → (0, ∞) be strictly increasing.
Definition 2.1. logχ x := ∫_1^x (1/χ(t)) dt is the χ-logarithm; expχ x := 1 + ∫_0^x λ(t)dt is the χ-exponential, where λ(logχ t) = χ(t). (Usually the χ-exponential is called the ϕ-exponential in statistical physics; in this talk ϕ is used as the dual potential on a dually flat space.)

Example 2.2. In the case χ(t) = t^q we have ∫_1^x t^{−q} dt = (x^{1−q} − 1)/(1 − q) = logq x, the q-logarithm; λ(t) = (1 + (1 − q)t)^{q/(1−q)}, so 1 + ∫_0^x λ(t) dt = (1 + (1 − q)x)^{1/(1−q)}, the q-exponential.

Let F_1(x), ..., F_n(x) be functions on Ω and θ = {θ^1, ...
, θ^n} the parameters, and let S = { p(x; θ) | p(x; θ) > 0, ∫_Ω p(x; θ)dx = 1 } be a statistical model.

Definition 2.3. S_χ = {p(x; θ)} is a χ-exponential family (deformed exponential family) if S_χ := { p(x; θ) | p(x; θ) = expχ[ Σ_{i=1}^n θ^i F_i(x) − ψ(θ) ], p(x; θ) ∈ S }.

Proposition 2.4 (discrete distributions). The set of discrete distributions is a χ-exponential family for any χ. (Proof) Ω = {x_0, x_1, ..., x_n}, S_n = { p(x; η) | η_i > 0, Σ_{i=0}^n η_i = 1, p(x; η) = Σ_{i=0}^n η_iδ_i(x) }, η_0 = 1 − Σ_{i=1}^n η_i. Set θ^i = logχ p(x_i) − logχ p(x_0) = logχ η_i − logχ η_0. Then logχ p(x) = logχ( Σ_{i=0}^n η_iδ_i(x) ) = Σ_{i=1}^n (logχ η_i − logχ η_0) δ_i(x) + logχ η_0, and ψ(θ) = −logχ η_0.

Example 2.5 (Student t-distribution (q-normal distribution)). Ω = R, n = 2, ξ = (µ, σ) ∈ R^2_+ (the upper half plane), q > 1: p(x; µ, σ) = (1/z_q)[ 1 − ((1 − q)/(3 − q)) (x − µ)²/σ² ]^{1/(1−q)}. Set θ^1 = (2/(3 − q)) z_q^{q−1} µ/σ² and θ^2 = −(1/(3 − q)) z_q^{q−1}/σ². Then logq p_q(x) = (p^{1−q} − 1)/(1 − q) = (1/(1 − q)){ z_q^{q−1}( 1 − ((1 − q)/(3 − q))(x − µ)²/σ² ) − 1 } = (2µ z_q^{q−1}/((3 − q)σ²)) x − (z_q^{q−1}/((3 − q)σ²)) x² − (z_q^{q−1}/(3 − q)) µ²/σ² + (z_q^{q−1} − 1)/(1 − q) = θ^1 x + θ^2 x² − ψ(θ), where ψ(θ) = −(θ^1)²/(4θ^2) − (z_q^{q−1} − 1)/(1 − q). The set of Student t-distributions is a q-exponential family.

3 Geometry of deformed exponential family (1)
Let S_χ be a deformed exponential family with ψ(θ) strictly convex (the normalization for S_χ). s^χ(x; θ) = ((s^χ)^1(x; θ), ..., (s^χ)^n(x; θ))^T is the χ-score function, defined by (s^χ)^i(x; θ) = (∂/∂θ^i) logχ p(x; θ), i = 1, ..., n. (1)

Statistical structure for S_χ: the Riemannian metric g^M, g^M_ij(θ) = ∫_Ω ∂_i p(x; θ) ∂_j logχ p(x; θ) dx, and the dual affine connections ∇^{M(e)}, ∇^{M(m)}: Γ^{M(e)}_{ij,k}(θ) = ∫_Ω ∂_k p(x; θ) ∂_i∂_j logχ p(x; θ) dx, Γ^{M(m)}_{ij,k}(θ) = ∫_Ω ∂_i∂_j p(x; θ) ∂_k logχ p(x; θ) dx. (S_χ, ∇^{M(e)}, g^M) and (S_χ, ∇^{M(m)}, g^M) are Hessian manifolds.

Proposition 3.1. For S_χ, the following hold: (1) (S_χ, g^M, ∇^{M(e)}, ∇^{M(m)}) is a dually flat space. (2) {θ^i} is a ∇^{M(e)}-affine coordinate system on S_χ. (3) Ψ(θ) is the potential of g^M with respect to {θ^i}, that is, g^M_ij(θ) = ∂_i∂_jΨ(θ). (4) Setting η_i = E_θ[F_i(x)], {η_i} is the dual coordinate system of {θ^i} with respect to g^M. (5) Setting Φ(η) = −I_χ(p_θ), Φ(η) is the potential of g^M with respect to {η_i}. Here I_χ(p_θ) = −∫_Ω { U_χ(p(x; θ)) + (p(x; θ) − 1)U_χ(0) } dx, with U_χ(t) = ∫_1^t logχ(s) ds and U_χ(0) = lim_{t→+0} U_χ(t) < ∞, is the generalized entropy functional, and Ψ(θ) = ∫_Ω p(x; θ) logχ p(x; θ)dx + I_χ(p_θ) + ψ(θ) is the generalized Massieu potential.

Construction of β-divergence (β = 1 − q). u_q(x; θ) = (u^1_q(x; θ), ..., u^n_q(x; θ))^T is a weighted score function, with u^i_q(x; θ) = p(x; θ)^{1−q} s^i(x; θ) − E_θ[p(x; θ)^{1−q} s^i(x; θ)].
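The weighted score u_q just defined can be exercised on the Bernoulli model p(1) = θ, p(0) = 1 − θ, which is a discrete (hence q-exponential) family: numerically, p^{1−q} times the ordinary score equals ∂/∂θ logq p, and u_q has zero expectation under p. An illustrative check of ours (helper names are hypothetical):

```python
import math

# Illustrative check: Bernoulli model p(1) = theta, p(0) = 1 - theta.
q, theta, h = 0.6, 0.3, 1e-6

def p(th, x):
    return th if x == 1 else 1.0 - th

def log_q(u):                      # q-logarithm
    return (u**(1 - q) - 1)/(1 - q)

def d(f, th):                      # central difference d/dtheta
    return (f(th + h) - f(th - h))/(2*h)

for x in (0, 1):
    lhs = p(theta, x)**(1 - q) * d(lambda th: math.log(p(th, x)), theta)
    rhs = d(lambda th: log_q(p(th, x)), theta)   # d/dtheta log_q p
    assert abs(lhs - rhs) < 1e-6   # p^{1-q} * score = d log_q p

mean = sum(p(theta, x)*d(lambda th: log_q(p(th, x)), theta) for x in (0, 1))
u_q = {x: d(lambda th: log_q(p(th, x)), theta) - mean for x in (0, 1)}
print(abs(sum(p(theta, x)*u_q[x] for x in (0, 1))) < 1e-12)  # True: E[u_q] = 0
```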
From the definition of the q-logarithm function, u_q(x; θ) can be written as u^i_q(x; θ) = (∂/∂θ^i){ (1/(1 − q)) p(x; θ)^{1−q} − (1/(2 − q)) ∫_Ω p(x; θ)^{2−q} dx } = (∂/∂θ^i) logq p(x; θ) − E_θ[ (∂/∂θ^i) logq p(x; θ) ]. Hence this estimating function is the bias-corrected q-score function.

By integrating u_q(x; θ) and taking the expectation, we define a cross entropy by d_{1−q}(p, r) = −(1/(1 − q)) ∫_Ω p(x; θ) r(x; θ)^{1−q} dx + (1/(2 − q)) ∫_Ω r(x; θ)^{2−q} dx. Then the β-divergence (β = 1 − q) is given by D_{1−q}(p, r) = −d_{1−q}(p, p) + d_{1−q}(p, r) = (1/((1 − q)(2 − q))) ∫_Ω p(x)^{2−q} dx − (1/(1 − q)) ∫_Ω p(x) r(x)^{1−q} dx + (1/(2 − q)) ∫_Ω r(x)^{2−q} dx.

Remark 3.2. A β-divergence D_{1−q} induces the Hessian manifolds (S_q, ∇^{M(m)}, g^M) and (S_q, ∇^{M(e)}, g^M).

4 Geometry of deformed exponential family (2)
Definition 4.1. P_χ(x) is the escort distribution of p(x; θ), defined by P_χ(x; θ) = (1/Z_χ(θ)) χ(p(x; θ)), Z_χ(θ) = ∫_Ω χ(p(x; θ))dx. E_{χ,θ}[f(x)] is the χ-expectation of f(x), i.e., the expectation of f(x) with respect to the escort distribution: E_{χ,θ}[f(x)] = ∫ f(x)P_χ(x; θ)dx = (1/Z_χ(θ)) ∫ f(x)χ(p(x; θ))dx.

Definition 4.2. For a deformed exponential family S_χ = {p(x; θ)}, g^χ_ij(θ) = ∂_i∂_jψ(θ) is the χ-Fisher information metric and C^χ_ijk(θ) = ∂_i∂_j∂_kψ(θ) is the χ-cubic form. Set Γ^{χ(e)}_{ij,k} := Γ^{χ(0)}_{ij,k} − (1/2) C^χ_ijk and Γ^{χ(m)}_{ij,k} := Γ^{χ(0)}_{ij,k} + (1/2) C^χ_ijk.

Proposition 4.3. For S_χ, the following hold: (1) (S_χ, g^χ, ∇^{χ(e)}, ∇^{χ(m)}) is a dually flat space. (2) {θ^i} is a ∇^{χ(e)}-affine coordinate system on S_χ. (3) ψ is the potential of g^χ with respect to {θ^i}, that is, g^χ_ij(θ) = ∂_i∂_jψ(θ). (4) Set the χ-expectation of F_i(x) by η_i = E_{χ,θ}[F_i(x)].
⇒ {η_i} is the dual coordinate system of {θ^i} with respect to g^χ. (5) Setting ϕ(η) = E_{χ,θ}[logχ p(x; θ)], ϕ(η) is the potential of g^χ with respect to {η_i}.

Proof: Statements 1, 2 and 3 are obtained from the definition of the χ-Fisher metric and the χ-cubic form. Statements 4 and 5 follow from the fact that E_{χ,θ}[logχ p(x; θ)] = E_{χ,θ}[ Σ_{i=1}^n θ^i F_i(x) − ψ(θ) ] = Σ_{i=1}^n θ^i η_i − ψ(θ).

The generalized relative entropy (or χ-relative entropy) of S_χ is defined by D^χ(p, r) = E_{χ,p}[logχ p(x) − logχ r(x)]. The generalized relative entropy D^χ of S_χ coincides with the canonical divergence D(r, p) on (S_χ, ∇^{χ(e)}, g^χ). In fact, D^χ(p_θ, r_{θ′}) = E_{χ,p}[ ( Σ_i θ^i F_i(x) − ψ(θ) ) − ( Σ_i (θ′)^i F_i(x) − ψ(θ′) ) ] = ψ(θ′) + ( Σ_i θ^i η_i − ψ(θ) ) − Σ_i (θ′)^i η_i = D(r_{θ′}, p_θ).

Tsallis relative entropy (q-exponential case): D^q(p, r) = E_{q,p}[ logq p(x) − logq r(x) ] = ( 1 − ∫ p(x)^q r(x)^{1−q} dx ) / ( (1 − q)Z_q(p) ) = (q/Z_q(p)) D^{(1−2q)}(p, r). The Tsallis relative entropy is conformal to the α-divergence (α = 1 − 2q).

Construction of the χ-relative entropy. s^χ(x; θ) is the χ-score function, (s^χ)^i(x; θ) = (∂/∂θ^i) logχ p(x; θ), i = 1, ..., n. The χ-score is unbiased w.r.t. the χ-expectation: E_{χ,θ}[(s^χ)^i(x; θ)] = 0. We therefore regard s^χ(x; θ) as a generalization of an estimating function. By integrating the χ-score function, we define the χ-cross entropy by d^χ(p, r) = −∫_Ω P(x) logχ r(x)dx, where P is the escort distribution of p. Then we obtain the generalized relative entropy by D^χ(p, r) = −d^χ(p, p) + d^χ(p, r) = E_{χ,p}[logχ p(x) − logχ r(x)].

5 Maximum q-likelihood estimators
5.1 The q-independence
X ∼ p_1(x), Y ∼ p_2(y). X and Y are independent iff p(x, y) = p_1(x)p_2(y) ⇐⇒ p(x, y) = exp[ log p_1(x) + log p_2(y) ] (p_1(x) > 0, p_2(y) > 0). Assume x > 0, y > 0 and x^{1−q} + y^{1−q} − 1 > 0 (q > 0).
5 MAXIMUM Q-LIKELIHOOD ESTIMATORS

5.1 The q-independence

Let $X \sim p_1(x)$ and $Y \sim p_2(y)$. $X$ and $Y$ are independent iff $p(x,y) = p_1(x)p_2(y)$, i.e.
$$p(x,y) = \exp\big[\log p_1(x) + \log p_2(y)\big] \qquad (p_1(x) > 0,\ p_2(y) > 0).$$

Suppose $x > 0$, $y > 0$ and $x^{1-q} + y^{1-q} - 1 > 0$ $(q > 0)$. Then $x \otimes_q y$, the q-product of $x$ and $y$, is defined by
$$x \otimes_q y := \big[x^{1-q} + y^{1-q} - 1\big]^{\frac{1}{1-q}} = \exp_q\big[\log_q x + \log_q y\big],$$
so that
$$\exp_q x \otimes_q \exp_q y = \exp_q(x+y), \qquad \log_q(x \otimes_q y) = \log_q x + \log_q y.$$

$X$ and $Y$ are q-independent with m-normalization (mixture normalization) iff
$$p_q(x,y) = \frac{p_1(x) \otimes_q p_2(y)}{Z_{p_1,p_2}}, \qquad Z_{p_1,p_2} = \iint_{XY} p_1(x) \otimes_q p_2(y)\,dx\,dy.$$

5.2 Geometry for q-likelihood estimators

Let $S_q = \{p(x;\xi) \mid \xi \in \Xi\}$ be a q-exponential family and $\{x_1,\ldots,x_N\}$ be $N$ observations from $p(x;\xi) \in S_q$. $L_q(\xi)$, the q-likelihood function, is defined by
$$L_q(\xi) = p(x_1;\xi) \otimes_q p(x_2;\xi) \otimes_q \cdots \otimes_q p(x_N;\xi) \quad\Big(\iff \log_q L_q(\xi) = \sum_{i=1}^N \log_q p(x_i;\xi)\Big).$$
In the limit $q \to 1$, $L_q$ is the standard likelihood function on $\Xi$. Note that
$$\exp_q(x_1 + \cdots + x_N) = \exp_q x_1 \otimes_q \cdots \otimes_q \exp_q x_N = \exp_q x_1 \cdot \exp_q\Big(\frac{x_2}{1 + (1-q)x_1}\Big) \cdots \exp_q\Big(\frac{x_N}{1 + (1-q)\sum_{i=1}^{N-1} x_i}\Big),$$
so each measurement influences the others.

$\hat\xi$, the maximum q-likelihood estimator, is defined by
$$\hat\xi = \arg\max_{\xi \in \Xi} L_q(\xi) \quad\Big( = \arg\max_{\xi \in \Xi} \log_q L_q(\xi)\Big).$$
The q-likelihood is maximal iff the canonical divergence (Tsallis relative entropy) is minimal.
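The two q-product identities stated above follow directly from the definitions, and a quick numerical check makes them concrete. A minimal sketch (function names are illustrative; the domain restrictions $x,y>0$ and $1+(1-q)x>0$ are assumed):

```python
import numpy as np

def exp_q(x, q):
    """q-exponential, defined where 1 + (1-q)x > 0."""
    return (1 + (1 - q) * x) ** (1 / (1 - q))

def log_q(x, q):
    """q-logarithm, defined for x > 0."""
    return (x ** (1 - q) - 1) / (1 - q)

def q_product(x, y, q):
    """q-product, defined where x^(1-q) + y^(1-q) - 1 > 0."""
    return (x ** (1 - q) + y ** (1 - q) - 1) ** (1 / (1 - q))

q, x, y = 0.8, 0.9, 1.4
# exp_q(x) (x)_q exp_q(y) = exp_q(x + y)
print(q_product(exp_q(x, q), exp_q(y, q), q), exp_q(x + y, q))
# log_q(x (x)_q y) = log_q(x) + log_q(y)
print(log_q(q_product(x, y, q), q), log_q(x, q) + log_q(y, q))
```

Both printed pairs agree to machine precision; for $q \to 1$ the three functions collapse to the ordinary exponential, logarithm, and product.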
Summary (in the q-exponential case)

β-divergence: $(S_q, g^M, \nabla^{M(e)}, \nabla^{M(m)})$
- estimating function $u_q(x;\theta)$: $\ u^i_q(x;\theta) = \frac{\partial}{\partial\theta^i}\log_q p(x;\theta) - E_\theta\big[\frac{\partial}{\partial\theta^i}\log_q p(x;\theta)\big]$
- Riemannian metric $g^M$: $\ g^M_{ij}(\theta) = \int_\Omega \partial_i p(x;\theta)\,\partial_j\log_q p(x;\theta)\,dx$
- dual coordinates $\{\eta_i\}$: $\ \eta_i = E_p[F_i(x)]$

Tsallis relative entropy: $(S_q, g^q, \nabla^{q(e)}, \nabla^{q(m)})$
- estimating function $s^q(x;\theta)$: $\ (s^q)^i(x;\theta) = \frac{\partial}{\partial\theta^i}\log_q p(x;\theta)$ (unbiased under the q-expectation)
- Riemannian metric $g^q$: $\ g^q_{ij}(\theta) = \frac{\partial^2}{\partial\theta^i\partial\theta^j}\psi(\theta)$
- dual coordinates $\{\eta_i\}$: $\ \eta_i = E_{q,p}[F_i(x)]$

The notions of expectation and independence are determined from the geometric structure of the statistical model.
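The maximum q-likelihood estimator can be explored numerically. The following sketch is illustrative only: the unit-variance Gaussian location model and the grid search are my own choices, not from the slides (a Gaussian is a q-exponential family only for specific deformations, but $\hat\xi = \arg\max_\xi \sum_i \log_q p(x_i;\xi)$ is still well defined). For $q$ close to 1 the estimate recovers the ordinary MLE, here the sample mean:

```python
import numpy as np

def log_q(x, q):
    return np.log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)

def q_loglik(theta, data, q):
    # unit-variance Gaussian location model p(x; theta)
    p = np.exp(-0.5 * (data - theta) ** 2) / np.sqrt(2 * np.pi)
    return np.sum(log_q(p, q))

data = np.array([0.5, 1.2, -0.3, 2.0, 0.8])
grid = np.arange(-1.0, 3.0, 0.01)

# for q -> 1 the maximum q-likelihood estimate approaches the sample mean
theta_hat = grid[np.argmax([q_loglik(t, data, 0.999) for t in grid])]
print(theta_hat, data.mean())
```

For $q$ far from 1 the stationarity condition becomes $\sum_i p(x_i;\theta)^{1-q}(x_i-\theta) = 0$, a likelihood-weighted mean, which is what makes the estimator robust to outliers.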
Isometric Reeb Flow and Related Results on Hermitian Symmetric Spaces of Rank 2

Young Jin Suh
Department of Mathematics, Kyungpook National University, Taegu 702-701, Korea
E-mail: yjsuh@knu.ac.kr
Ecole des Mines, Paris, France — Geometric Science of Information, GSI'13, 28-30th August 2013
August 29, 2013

Contents
1 Introduction: Homogeneous Hypersurfaces; Isometric Reeb Flow
2 Hyperbolic Grassmannians: Hypersurfaces in SU2,m/S(U2Um); Isometric Reeb Flow
3 Complex Quadrics: Real hypersurfaces in Q2k; Tubes around the totally geodesic CPk ⊂ Q2k; Proof of Main Theorem

1 INTRODUCTION

Hermitian Symmetric Spaces

Hereafter HSSP means Hermitian Symmetric Space.
- HSSP of compact type with rank 1: $\mathbb{C}P^m$, $\mathbb{Q}P^m$
- HSSP of noncompact type with rank 1: $\mathbb{C}H^m$, $\mathbb{Q}H^m$
- HSSP of compact type with rank 2: $SU(2+q)/S(U(2){\times}U(q))$, $Q^m$, $SO(8)/U(4)$, $Sp(2)/U(2)$ and $(\mathfrak{e}_{6(-78)}, \mathfrak{so}(10)+\mathbb{R})$
- HSSP of noncompact type with rank 2: $SU(2,q)/S(U(2){\times}U(q))$, $Q^{m*}$, $SO^*(8)/U(4)$, $Sp(2,\mathbb{R})/U(2)$ and $(\mathfrak{e}_{6(2)}, \mathfrak{so}(10)+\mathbb{R})$
(See Helgason [6], [7].)

Hypersurfaces in Hermitian Symmetric Spaces

Let $M$ be a hypersurface in a Hermitian symmetric space $\bar M$ with Kaehler structure $J$.
$\bar\nabla_X N = -AX$: the Weingarten formula, where $A$ is the shape operator of $M$ in $\bar M$;
$\xi = -JN$: the Reeb vector field;
$JX = \phi X + \eta(X)N$ and $\nabla_X\xi = \phi AX$ for any vector field $X \in \Gamma(M)$.
Then $(\phi, \xi, \eta, g)$ is an almost contact structure on the hypersurface $M$.

Definition. A hypersurface $M$ has isometric Reeb flow iff
$$\mathcal{L}_\xi g = 0 \iff g(d\phi_t X, d\phi_t Y) = g(X, Y)$$
for any $X, Y \in \Gamma(M)$, where $\phi_t$ denotes the one-parameter group of the Reeb flow of $M$, defined by
$$\frac{d\phi_t}{dt} = \xi(\phi_t(p)), \qquad \phi_0(p) = p, \qquad \dot\phi_0(p) = \xi(p).$$

Note.
$$\mathcal{L}_\xi g = 0 \iff \nabla_j\xi_i + \nabla_i\xi_j = 0\ \ (\nabla\xi\ \text{skew-symmetric}) \iff g(\nabla_X\xi, Y) + g(\nabla_Y\xi, X) = 0 \iff g((\phi A - A\phi)X, Y) = 0$$
for any $X, Y \in \Gamma(M)$.
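The last equivalence in the Note is pointwise linear algebra: with $\nabla_X\xi = \phi AX$, a symmetric shape operator $A$ and a $g$-skew-symmetric $\phi$, one gets $(\mathcal{L}_\xi g)(X,Y) = g((\phi A - A\phi)X, Y)$. An illustrative numerical check with random matrices standing in for $A$ and $\phi$ (the matrices are my own, not geometric data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # shape operator: symmetric
S = rng.standard_normal((n, n))
phi = (S - S.T) / 2                    # phi: skew-symmetric w.r.t. g = <.,.>
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

# (L_xi g)(X, Y) = g(nabla_X xi, Y) + g(nabla_Y xi, X), with nabla_X xi = phi A X
lie = (phi @ A @ X) @ Y + (phi @ A @ Y) @ X
# g((phi A - A phi) X, Y)
comm = ((phi @ A - A @ phi) @ X) @ Y
print(abs(lie - comm))                 # vanishes identically
```

The identity holds for every $X, Y$, so $\mathcal{L}_\xi g = 0$ iff $\phi$ and $A$ commute, which is the form of the condition used in the classification theorems below.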
In the future, homogeneous hypersurfaces in HSSP satisfying certain geometric conditions might be classified completely:

Problem 1. Classify all homogeneous hypersurfaces in HSSP.

In this talk we consider hypersurfaces with isometric Reeb flow in Hermitian symmetric spaces:

Problem 2. If $M$ is a complete hypersurface in an HSSP $\bar M$ with isometric Reeb flow, does $M$ become homogeneous?

Note 1) Hypersurfaces in $\mathbb{C}P^m$, $\mathbb{C}H^m$ and $\mathbb{Q}P^m$ with isometric Reeb flow: see Okumura 1976, Montiel and Romero 1986, Perez and Martinez 1986.
Note 2) Hypersurfaces in $G_2(\mathbb{C}^{m+2})$, $G_2^*(\mathbb{C}^{m+2})$ and the complex quadric $Q^m = SO(m+2)/SO(2)SO(m)$ with isometric Reeb flow: see Berndt and Suh, 2002 and 2012; Suh, 2013; Berndt and Suh, 2013.
Note 3) In the near future, hypersurfaces with isometric Reeb flow in the noncompact complex quadric $Q^{m*} = SO(2,m)/SO(2)SO(m)$ will be classified.

Complex Projective Space

Montiel and Romero classified hypersurfaces in $\mathbb{C}H^m$ with isometric Reeb flow as follows:

Theorem 1.1 (Montiel and Romero 1986). Let $M$ be a real hypersurface in $\mathbb{C}H^m$ with isometric Reeb flow. Then $M$ is one of the following:
(A) an open part of a tube around a totally geodesic $\mathbb{C}H^k$ in $\mathbb{C}H^m$,
(C) a geodesic hypersphere,
(D) a horosphere.

Complex Two-Plane Grassmannians

When the maximal complex subbundle $\mathcal{C}$ (resp. quaternionic subbundle $\mathcal{Q}$) of $M$ in $G_2(\mathbb{C}^{m+2})$ is invariant under the shape operator, that is $A\mathcal{C} \subset \mathcal{C}$ (resp. $A\mathcal{Q} \subset \mathcal{Q}$), we say $M$ is Hopf (resp. curvature adapted).
Berndt and Suh (Monatsh. Math., 1999) classified real hypersurfaces in $G_2(\mathbb{C}^{m+2})$ as follows:

Theorem 1.2. A real hypersurface of $G_2(\mathbb{C}^{m+2})$, $m \ge 3$, is Hopf and curvature adapted if and only if it is congruent to
(A) a tube over a totally geodesic $G_2(\mathbb{C}^{m+1})$ in $G_2(\mathbb{C}^{m+2})$, or
(B) a tube over a totally geodesic, totally real $\mathbb{Q}P^n$, $m = 2n$, in $G_2(\mathbb{C}^{m+2})$.

Berndt and Suh (Monatsh. Math., 2002) gave a classification of hypersurfaces in $G_2(\mathbb{C}^{m+2})$, $m \ge 3$, with isometric Reeb flow:

Theorem 1.3. Let $M$ be a real hypersurface in $G_2(\mathbb{C}^{m+2})$, $m \ge 3$, with isometric Reeb flow. Then $M$ is locally congruent to
(A) a tube over a totally geodesic $G_2(\mathbb{C}^{m+1})$ in $G_2(\mathbb{C}^{m+2})$; the two singular orbits are the totally geodesically embedded $\mathbb{C}P^m$ and $G_2(\mathbb{C}^{m+1})$.

2 HYPERBOLIC GRASSMANNIANS

Hypersurfaces in SU2,m/S(U2Um)

The Riemannian symmetric space $SU(2,m)/S(U(2){\times}U(m))$ is a connected, simply connected, irreducible Riemannian symmetric space of noncompact type with rank 2. Let $G = SU(2,m)$ and $K = S(U(2){\times}U(m))$, and denote by $\mathfrak{g}$ and $\mathfrak{k}$ the corresponding Lie algebras. Let $B$ denote the Cartan-Killing form of $\mathfrak{g}$ and $\mathfrak{p}$ the orthogonal complement of $\mathfrak{k}$ in $\mathfrak{g}$ with respect to $B$.
The decomposition $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ is a Cartan decomposition of $\mathfrak{g} = \mathfrak{su}(2,m)$. The Cartan involution $\theta \in \mathrm{Aut}(\mathfrak{g})$ on $\mathfrak{su}(2,m)$ is given by $\theta(A) = I_{2,m} A I_{2,m}$ for $A \in \mathfrak{su}(2,m)$, where
$$I_{2,m} = \begin{pmatrix} -I_2 & 0_{2,m} \\ 0_{m,2} & I_m \end{pmatrix}.$$
Then $\langle X, Y\rangle = -B(X, \theta Y)$ is a positive definite $\mathrm{Ad}(K)$-invariant inner product on $\mathfrak{g}$. Its restriction to $\mathfrak{p}$ induces a Riemannian metric $g$, the Killing metric on $SU(2,m)/S(U(2){\times}U(m))$.

Killing-Cartan forms related to $\mathfrak{sl}(n,\mathbb{C})$

The Killing-Cartan form $B(X,Y)$ of $\mathfrak{sl}(n,\mathbb{C})$ is given by $B(X,Y) = 2n\,\mathrm{Tr}\,XY$ for any $X, Y \in \mathfrak{sl}(n,\mathbb{C})$.
In $\mathfrak{su}(m+2) = \{X \in M(m+2,\mathbb{C}) \mid X^* + X = 0,\ \mathrm{Tr}\,X = 0\}$, $B(X,Y)$ is negative definite, because $B(X,X) = -2n\,\mathrm{Tr}\,XX^* \le 0$. So $\langle X, Y\rangle = -B(X,Y)$.
In $\mathfrak{su}(2,m) = \{X \in M(m+2,\mathbb{C}) \mid X^* I_{2,m} + I_{2,m} X = 0,\ \mathrm{Tr}\,X = 0\}$, the product $\langle X, Y\rangle = -B(X,\theta Y)$, $\theta^2 = I$, is positive definite, because
$$\langle X, X\rangle = -2n\,\mathrm{Tr}\,X\theta X = -2n\,\mathrm{Tr}\,X I_{2,m} X I_{2,m} = 2n\,\mathrm{Tr}\,XX^* I_{2,m}^2 = 2n\,\mathrm{Tr}\,XX^* \ge 0.$$

Let $\mathcal{C} = \{X \in TM \mid JX \in TM\}$ be the maximal complex subbundle and $\mathcal{Q} = \{X \in TM \mid \mathfrak{J}X \subset TM\}$ the maximal quaternionic subbundle for $M$ in $SU(2,m)/S(U(2){\times}U(m))$, where $\mathfrak{J}$ denotes the quaternionic Kaehler structure. When $\mathcal{C}$ and $\mathcal{Q}$ are both invariant under the shape operator $A$ of $M$, we write $h(\mathcal{C}, \mathcal{C}^\perp) = 0$ and $h(\mathcal{Q}, \mathcal{Q}^\perp) = 0$, where $h$ denotes the second fundamental form defined by $g(h(X,Y), N) = g(AX, Y)$ for any $X, Y$ on $M$.

Using the theory of focal points and the method due to P.B. Eberlein, Berndt and Suh proved the following (see Int. J. Math., 2012):

Theorem 2.1. Let $M$ be a connected hypersurface in $SU_{2,m}/S(U_2 U_m)$, $m \ge 2$. Then $h(\mathcal{C}, \mathcal{C}^\perp) = 0$ and $h(\mathcal{Q}, \mathcal{Q}^\perp) = 0$ if and only if $M$ is congruent to an open part of one of the following:
(A) a tube around a totally geodesic $SU_{2,m-1}/S(U_2 U_{m-1})$ in $SU_{2,m}/S(U_2 U_m)$,
(B) a tube around a totally geodesic $\mathbb{H}H^n$ in $SU_{2,m}/S(U_2 U_m)$, $m = 2n$, or
(C) a horosphere whose center at infinity is singular.
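The positivity computation for $\langle X, X\rangle = -B(X,\theta X)$ on $\mathfrak{su}(2,m)$ can be verified numerically. The sketch below is illustrative (the way a random element of $\mathfrak{su}(2,m)$ is generated is my own construction); it checks the defining relation $X^* I_{2,m} + I_{2,m} X = 0$ and that $-2n\,\mathrm{Tr}\,X\theta X = 2n\,\mathrm{Tr}\,XX^* \ge 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
n = m + 2
I2m = np.diag([-1, -1] + [1] * m).astype(complex)   # I_{2,m} = diag(-I_2, I_m)

# build a random element of su(2, m): note I2m Z* I2m projects onto the
# complementary part, so Z - I2m Z* I2m satisfies X* I2m + I2m X = 0
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = Z - I2m @ Z.conj().T @ I2m
X = X - (np.trace(X) / n) * np.eye(n)               # Tr X = 0 (trace is purely imaginary)
assert np.allclose(X.conj().T @ I2m + I2m @ X, 0)

theta_X = I2m @ X @ I2m                              # the Cartan involution theta
inner = -2 * n * np.trace(X @ theta_X)               # <X, X> = -B(X, theta X)
print(inner.real, (2 * n * np.trace(X @ X.conj().T)).real)  # equal and positive
```

The key step, as on the slide, is that the defining relation gives $I_{2,m} X I_{2,m} = -X^*$, so $-\mathrm{Tr}\,X\theta X = \mathrm{Tr}\,XX^*$, a sum of squared moduli.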
Horosphere

Let $H_t = \cos t\, e_1 + \sin t\, e_2 \in \mathfrak{a}$ be a unit normal to a horosphere $M_t$, where $\mathfrak{a}$ denotes a maximal abelian subspace of $\mathfrak{p}$ for the E. Cartan decomposition $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$. Here the horosphere is given by $M_t = S_{H_t} \cdot o$, where $S_{H_t}$ denotes the Lie subgroup of $G$ corresponding to the Lie subalgebra $\mathfrak{s}_H = \mathfrak{s} \ominus \mathbb{R}H$, with $\mathfrak{s} = \mathfrak{a} \oplus \mathfrak{n}$ and $\mathfrak{n} = \oplus_{\lambda\in\Sigma^+}\mathfrak{g}_\lambda$ for the Iwasawa decomposition $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{a} \oplus \mathfrak{n}$ with corresponding $G = KAN$. The shape operator of the horosphere $M_t$ is given by $A_H = \mathrm{ad}(H)$.

Characterization of type (A) and a horosphere

In this subsection we introduce a classification of hypersurfaces with isometric Reeb flow in $SU_{2,m}/S(U_2 U_m)$ (see Suh, Advances in Applied Math., 2013):

Theorem 2.5. Let $M$ be a connected orientable real hypersurface in $SU_{2,m}/S(U_2 U_m)$, $m \ge 3$. Then the Reeb flow on $M$ is isometric if and only if $M$ is congruent to an open part of one of the following:
(A) a tube around some totally geodesic $SU_{2,m-1}/S(U_2 U_{m-1})$ in $SU_{2,m}/S(U_2 U_m)$, or
(C) a horosphere whose center at infinity is singular.
Characterization of type (B) and a horosphere

Definition. A real hypersurface $M$ in $SU_{2,m}/S(U_2 U_m)$ is said to be contact iff there exists a non-zero constant function $\rho$ on $M$ such that
$$\phi A + A\phi = k\phi, \qquad k = 2\rho.$$
The condition is equivalent to $g((\phi A + A\phi)X, Y) = 2\,d\eta(X,Y)$, where $d\eta$ is defined by $d\eta(X,Y) = (\nabla_X\eta)Y - (\nabla_Y\eta)X$ for any $X, Y$ on $M$ in $SU_{2,m}/S(U_2 U_m)$.

We then give another classification in the noncompact complex two-plane Grassmannian $SU_{2,m}/S(U_2 U_m)$ in terms of contact hypersurfaces:

Theorem 2.6. Let $M$ be a contact real hypersurface in $SU_{2,m}/S(U_2 U_m)$ with constant mean curvature. Then one of the following statements holds:
(B) $M$ is an open part of a tube around a totally geodesic $\mathbb{H}H^n$ in $SU_{2,2n}/S(U_2 U_{2n})$, $m = 2n$, or
(C) $M$ is an open part of a horosphere in $SU_{2,m}/S(U_2 U_m)$ whose center at infinity is singular and of type $JN \perp \mathfrak{J}N$.

3 COMPLEX QUADRICS

The Reeb flow on a real hypersurface in $G_2(\mathbb{C}^{m+2})$ is isometric if and only if $M$ is an open part of a tube around a totally geodesic $G_2(\mathbb{C}^{m+1}) \subset G_2(\mathbb{C}^{m+2})$. In view of the previous results a natural expectation would lead to the totally geodesic $Q^{m-1} \subset Q^m$. Surprisingly, this is not the case. In fact, we will prove:

Theorem 3.1. Let $M$ be a real hypersurface of the complex quadric $Q^m$, $m \ge 3$. The Reeb flow on $M$ is isometric if and only if $m$ is even, say $m = 2k$, and $M$ is an open part of a tube around a totally geodesic $\mathbb{C}P^k \subset Q^{2k}$.
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The homogeneous quadratic equation Qm = {z∈Cm+2 |z2 1 + . . . + z2 m+2 = 0}⊂CPm+1 deﬁnes a complex hypersurface in complex projective space CPm+1 = SUm+2/S(Um+1U1). For a unit normal vector N of Qm at a point [z] ∈ Qm we denote by AN the shape operator of Qm in CPm+1 with respect to N. The shape operator is an involution on T[z]Qm and T[z]Qm = V(AN) ⊕ JV(AN), where V(AN) is the (+1)-eigenspace and JV(AN) is the (−1)-eigenspace of AN. Geometrically this means that AN deﬁnes a real structure on the complex vector space T[z]Qm, or equivalently, is a complex conjugation on T[z]Qm. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The Riemannian curvature tensor ¯R of Qm can be expressed as follows: ¯R(X, Y)Z = g(Y, Z)X − g(X, Z)Y + g(JY, Z)JX −g(JX, Z)JY − 2g(JX, Y)JZ +g(AY, Z)AX − g(AX, Z)AY +g(JAY, Z)JAX − g(JAX, Z)JAY. A nonzero tangent vector W ∈ T[z]Qm is called singular if it is tangent to more than one maximal ﬂat in Qm. 1. If a conjugation A ∈ A[z] such that W ∈ V(A), then W is singular, that is A-principal. 2. If a conjugation A ∈ A[z] and orthonormal vectors X, Y ∈ V(A) such that W/||W|| = (X + JY)/ √ 2, then W is said to be A-isotropic. 
Let $M$ be a real hypersurface of $Q^m$ and denote $\xi = -JN$, where $N$ is a (local) unit normal vector field of $M$. For $A \in \mathfrak{A}_{[z]}$ and $X \in T_{[z]}M$ we decompose $AX$ as
$$AX = BX + \rho(X)N,$$
where $BX$ is the tangential component of $AX$ and
$$\rho(X) = g(AX, N) = g(X, AN) = g(X, AJ\xi) = -g(X, JA\xi) = g(JX, A\xi).$$
Since $JX = \phi X + \eta(X)N$ and $A\xi = B\xi + \rho(\xi)N$ we also have
$$\rho(X) = g(\phi X, B\xi) + \eta(X)\rho(\xi) = g(-\phi B\xi + \rho(\xi)\xi, X).$$
We also define
$$\delta = g(N, AN) = g(JN, JAN) = -g(JN, AJN) = -g(\xi, A\xi).$$

Geometric Descriptions of the Tube

We assume that $m$ is even, say $m = 2k$. The map
$$\mathbb{C}P^k \to Q^{2k} \subset \mathbb{C}P^{2k+1}, \qquad [z_1, \ldots, z_{k+1}] \mapsto [z_1, \ldots, z_{k+1}, iz_1, \ldots, iz_{k+1}]$$
gives an embedding of $\mathbb{C}P^k$ into $Q^{2k}$ as a totally geodesic complex submanifold. Define a complex structure $j$ on $\mathbb{C}^{2k+2}$ by
$$j(z_1, \ldots, z_{k+1}, z_{k+2}, \ldots, z_{2k+2}) = (-z_{k+2}, \ldots, -z_{2k+2}, z_1, \ldots, z_{k+1}).$$
Then $j^2 = -I$ and note that $ij = ji$. We can then identify $\mathbb{C}^{2k+2}$ with $\mathbb{C}^{k+1} \oplus j\mathbb{C}^{k+1}$ and get
$$T_{[z]}\mathbb{C}P^k = \{X + jiX \mid X \in \mathbb{C}^{k+1}_{[z]}\} = \{X + ijX \mid X \in V(A_{\bar{z}})\}.$$
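One can verify directly that the image of this map lies in $Q^{2k}$: for $w = (z_1, \ldots, z_{k+1}, iz_1, \ldots, iz_{k+1})$,
$$\sum_{j=1}^{2k+2} w_j^2 = \sum_{i=1}^{k+1} z_i^2 + \sum_{i=1}^{k+1} (iz_i)^2 = \sum_{i=1}^{k+1} z_i^2 - \sum_{i=1}^{k+1} z_i^2 = 0,$$
so the homogeneous quadratic equation defining the quadric is satisfied identically.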
The normal space is
$$\nu_{[z]}\mathbb{C}P^k = A_{\bar{z}}(T_{[z]}\mathbb{C}P^k) = \{X - ijX \mid X \in V(A_{\bar{z}})\}.$$
The normal $N$ of $\mathbb{C}P^k$ is $A$-isotropic, and the four vectors $N, JN, AN, JAN$ are pairwise orthonormal. The normal Jacobi operator $\bar{R}_N$ is given by
$$\bar{R}_N Z = \bar{R}(Z, N)N = Z - g(Z, N)N + 3g(Z, JN)JN - g(Z, AN)AN - g(Z, JAN)JAN.$$
Both $T_{[z]}\mathbb{C}P^k$ and $\nu_{[z]}\mathbb{C}P^k$ are invariant under $\bar{R}_N$, and $\bar{R}_N$ has the three eigenvalues $0$, $1$ and $4$, with eigenspaces $\mathbb{R}N \oplus [AN]$, $T_{[z]}Q^{2k} \ominus ([N] \oplus [AN])$ and $\mathbb{R}JN$, respectively.

Principal Curvatures and Spaces of the Tube

To calculate the principal curvatures of the tube of radius $0 < r < \pi/2$ around $\mathbb{C}P^k$ we use the standard Jacobi field method, as described in Section 8.2 of Berndt, Console and Olmos. Let $\gamma$ be the geodesic in $Q^{2k}$ with $\gamma(0) = [z]$ and $\dot\gamma(0) = N$, and let $\gamma^\perp$ be the parallel subbundle of $TQ^{2k}$ along $\gamma$ defined by
$$\gamma^\perp_{\gamma(t)} = T_{\gamma(t)}Q^{2k} \ominus \mathbb{R}\dot\gamma(t).$$
Define the $\gamma^\perp$-valued tensor field $R^\perp_\gamma$ along $\gamma$ by $R^\perp_{\gamma(t)}X = \bar{R}(X, \dot\gamma(t))\dot\gamma(t)$, and consider the $\mathrm{End}(\gamma^\perp)$-valued differential equation
$$Y'' + R^\perp_\gamma \circ Y = 0.$$
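The eigenvalues $0$, $1$, $4$ of $\bar{R}_N$ can be read off directly from the displayed formula, since $N$, $JN$, $AN$, $JAN$ are pairwise orthonormal:
$$\bar{R}_N N = N - N = 0, \qquad \bar{R}_N (AN) = AN - AN = 0, \qquad \bar{R}_N (JAN) = JAN - JAN = 0,$$
$$\bar{R}_N (JN) = JN + 3JN = 4JN, \qquad \bar{R}_N Z = Z \ \text{ for } Z \perp \{N, JN, AN, JAN\}.$$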
Let $D$ be the unique solution of this differential equation with initial values
$$D(0) = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad D'(0) = \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix},$$
where the decomposition of the matrices is with respect to
$$\gamma^\perp_{[z]} = T_{[z]}\mathbb{C}P^k \oplus (\nu_{[z]}\mathbb{C}P^k \ominus \mathbb{R}N)$$
and $I$ denotes the identity transformation on the corresponding space. Then the shape operator $S(r)$ of the tube of radius $0 < r < \pi/2$ around $\mathbb{C}P^k$ with respect to $\dot\gamma(r)$ is given by
$$S(r) = -D'(r) \circ D^{-1}(r).$$
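On each parallel eigenbundle of $R^\perp_\gamma$ with constant eigenvalue $\kappa \in \{0, 1, 4\}$, the operator equation reduces to the scalar Jacobi equation $y'' + \kappa y = 0$, and $S(r)$ acts there by $-y'(r)/y(r)$. With the two initial conditions carried by $D$ this produces all principal curvatures of the tube:
$$\begin{aligned}
&y(0) = 1,\ y'(0) = 0 \ \text{(tangent part):} && \kappa = 0:\ y = 1 \ \Rightarrow\ 0; \qquad \kappa = 1:\ y = \cos t \ \Rightarrow\ \tan r;\\
&y(0) = 0,\ y'(0) = 1 \ \text{(normal part):} && \kappa = 1:\ y = \sin t \ \Rightarrow\ -\cot r; \qquad \kappa = 4:\ y = \tfrac{1}{2}\sin 2t \ \Rightarrow\ -2\cot 2r.
\end{aligned}$$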
If we decompose $\gamma^\perp_{[z]}$ further into
$$\gamma^\perp_{[z]} = [AN] \oplus (T_{[z]}\mathbb{C}P^k \ominus [AN]) \oplus (\nu_{[z]}\mathbb{C}P^k \ominus [N]) \oplus \mathbb{R}JN,$$
we get by explicit computation that
$$S(r) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \tan(r)\,I & 0 & 0 \\ 0 & 0 & -\cot(r)\,I & 0 \\ 0 & 0 & 0 & -2\cot(2r) \end{pmatrix}$$
with respect to that decomposition.

Proposition 3.1. Let $M$ be the tube of radius $0 < r < \pi/2$ around the totally geodesic $\mathbb{C}P^k$ in $Q^{2k}$. Then the following hold:
1. $M$ is a Hopf hypersurface.
2. The normal bundle of $M$ consists of $A$-isotropic singular tangent vectors of $Q^{2k}$.
3. $M$ has four distinct constant principal curvatures.
principal curvature    eigenspace                                                     multiplicity
$0$                    $\mathcal{C} \ominus \mathcal{Q}$                              $2$
$\tan(r)$              $T\mathbb{C}P^k \ominus (\mathcal{C} \ominus \mathcal{Q})$     $2k - 2$
$-\cot(r)$             $\nu\mathbb{C}P^k \ominus \mathbb{C}\nu M$                     $2k - 2$
$-2\cot(2r)$           $\mathcal{F}$                                                  $1$

4. $S\phi = \phi S$.
5. The Reeb flow on $M$ is an isometric flow.
Proof of Main Theorem

Now we investigate real hypersurfaces in $Q^m$ for which the Reeb flow is isometric. From this, we get a complete expression for the covariant derivative of the shape operator:
$$\begin{aligned}
(\nabla_X S)Y = {}& \{\,d\alpha(X)\eta(Y) + g((\alpha S\phi - S^2\phi)X, Y) + \delta\eta(Y)\rho(X) + \delta g(BX, \phi Y) + \eta(BX)\rho(Y)\,\}\,\xi \\
&+ \{\,\eta(Y)\rho(X) + g(BX, \phi Y)\,\}\,B\xi + g(BX, Y)\,\phi B\xi \\
&- \rho(Y)BX - \eta(Y)\phi X - \eta(BY)\phi BX.
\end{aligned}$$

Lemma 3.1. Let $M$ be a real hypersurface in $Q^m$, $m \geq 3$, with isometric Reeb flow. Then the normal vector field $N$ is $A$-isotropic everywhere.
From the Proposition and the Lemma, the principal curvature function $\alpha$ is constant. Then, for $X = Y + Z$ with $Y \in T_\lambda$ and $Z \in T_\mu$, we get
$$(\lambda^2 - \alpha\lambda)Y + (\mu^2 - \alpha\mu)Z = (S^2 - \alpha S)X = X.$$
By virtue of this equation, we can assert the following propositions:

Proposition 3.2. Let $M$ be a real hypersurface in $Q^m$, $m \geq 3$, with isometric Reeb flow. Then the distributions $\mathcal{Q}$ and $\mathcal{C} \ominus \mathcal{Q} = [B\xi]$ are invariant under the shape operator of $M$.

Proposition 3.3. Let $M$ be a real hypersurface in $Q^m$, $m \geq 3$, with isometric Reeb flow. Then $m$ is even, say $m = 2k$, and the real structure $A$ maps $T_\lambda$ onto $T_\mu$, and vice versa.
For each point $[z] \in M$ we denote by $\gamma_{[z]}$ the geodesic in $Q^{2k}$ with $\gamma_{[z]}(0) = [z]$ and $\dot\gamma_{[z]}(0) = N_{[z]}$, and by $F$ the smooth map
$$F: M \to Q^m, \qquad [z] \mapsto \gamma_{[z]}(r),$$
the displacement of $M$ at distance $r$ in the direction of $N$. The differential $d_{[z]}F$ of $F$ at $[z]$ can be computed as
$$d_{[z]}F(X) = Z_X(r),$$
where $Z_X$ is the Jacobi vector field along $\gamma_{[z]}$ with $Z_X(0) = X$ and $Z_X'(0) = -SX$. Since $N$ is $A$-isotropic, the normal Jacobi operator $\bar{R}_N = \bar{R}(\,\cdot\,, N)N$ has the three constant eigenvalues $0$, $1$ and $4$, with corresponding eigenbundles $\nu M \oplus (\mathcal{C} \ominus \mathcal{Q}) = \nu M \oplus T_\nu$, $\mathcal{Q} = T_\lambda \oplus T_\mu$ and $\mathcal{F} = T_\alpha$.

Rigidity of totally geodesic submanifolds then implies that $M$ is an open part of a tube of radius $r$ around a $k$-dimensional connected, complete, totally geodesic complex submanifold $P$ of $Q^{2k}$. Klein classified the totally geodesic submanifolds $P$ in $Q^{2k}$: the focal submanifold $P$ is either a totally geodesic $Q^k \subset Q^{2k}$ or a totally geodesic $\mathbb{C}P^k \subset Q^{2k}$, and it follows that $M$ is an open part of a tube around $\mathbb{C}P^k$.
References

J. Berndt and Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians, Monatshefte für Math. 127 (1999), 1–14.
J. Berndt and Y.J. Suh, Isometric flows on real hypersurfaces in complex two-plane Grassmannians, Monatshefte für Math. 137 (2002), 87–98.
S. Montiel and A. Romero, On some real hypersurfaces of a complex hyperbolic space, Geom. Dedicata 20 (1986), 245–261.
M. Okumura, On some real hypersurfaces of a complex projective space, Trans. Amer. Math. Soc. 212 (1975), 355–364.
References II

J.D. Pérez and Y.J. Suh, Real hypersurfaces of quaternionic projective space satisfying $\nabla_{U_i}R = 0$, Diff. Geom. Appl. 7 (1997), 211–217.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with commuting Ricci tensor, J. of Geom. and Physics 60 (2010), 1792–1805.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with parallel Ricci tensor, Proc. Royal Soc. Edinburgh 142(A) (2012), 1309–1324.
References III

Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with Reeb parallel Ricci tensor, J. of Geom. and Physics 64 (2013), 1–11.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with harmonic curvature, Journal de Math. Pures Appl. 100 (2013), 16–33.
J. Berndt and Y.J. Suh, Real hypersurfaces in the noncompact Grassmannians $SU_{2,m}/S(U_2 \cdot U_m)$, http://arxiv.org/abs/0911.3081, International J. of Math., World Sci. Publ. 23 (2012), 1250103 (35 pages).
References IV

J. Berndt and Y.J. Suh, Real hypersurfaces with isometric Reeb flow in complex two-plane Grassmannians, Monatshefte für Math. 137 (2002), 87–98.
Y.J. Suh, Hypersurfaces with isometric Reeb flow in complex hyperbolic two-plane Grassmannians, Advances in Applied Mathematics 50 (2013), 645–659.
J. Berndt and Y.J. Suh, Real hypersurfaces with isometric Reeb flow in complex quadrics, International J. Math. 24 (2013), in press.
J. Berndt, S. Console and C. Olmos, Submanifolds and Holonomy, Research Notes in Mathematics 434, Chapman & Hall/CRC, 2003.
References V

P.B. Eberlein, Geometry of Nonpositively Curved Manifolds, Chicago Lectures in Math., The Univ. of Chicago Press, 1996.
A.W. Knapp, Lie Groups beyond an Introduction, Progress in Math., Birkhäuser, 2002.
S. Helgason, Differential Geometry, Lie Groups and Symmetric Spaces, Graduate Studies in Mathematics 34, Amer. Math. Soc., 2001.
S. Helgason, Geometric Analysis on Symmetric Spaces, 2nd Edition, Math. Surveys and Monographs 39, Amer. Math. Soc., 2008.
ENDING

THANKS FOR YOUR ATTENTION!

Y.J. Suh, Isometric Reeb Flow on Hermitian Symmetric Spaces
ORAL SESSION 8 Computational Aspects of Information Geometry in Statistics (chaired by Frank Critchley)
A General Metric for Riemannian Hamiltonian Monte Carlo
Michael Betancourt, University College London, August 30th, 2013

I'm going to talk about probability and geometry, but not information geometry! Instead our interest is Bayesian inference:
$$\pi(\theta \mid D) \propto \pi(D \mid \theta)\,\pi(\theta).$$
Markov chain Monte Carlo admits the practical analysis and manipulation of posteriors even in high dimensions. Markov transitions can be thought of as an "average" isomorphism that preserves a target distribution: given the target probability space $(\Omega, \mathcal{B}(\Omega), \pi)$, a transition is a family of maps $t: \Omega \to \Omega$, indexed by an auxiliary probability space, such that $\pi T = \pi$.

Random Walk Metropolis and the Gibbs sampler have been the workhorse Markov transitions. For Random Walk Metropolis,
$$\epsilon \sim N(0, \sigma^2), \qquad a \mid \epsilon \sim \mathrm{Ber}\!\left(\min\!\left(1, \frac{\pi(\theta + \epsilon)}{\pi(\theta)}\right)\right), \qquad t: \theta \mapsto \theta + a \cdot \epsilon,$$
with transition density
$$T(\theta, \theta') = N(\theta' \mid \theta, \sigma^2)\,\min\!\left(1, \frac{\pi(\theta')}{\pi(\theta)}\right).$$
For the Gibbs sampler,
$$t_i: \theta_i \mapsto \epsilon, \quad \epsilon \sim \pi(\epsilon \mid \theta_{j \neq i}), \qquad T(\theta, \theta') = \prod_i \pi(\theta_i' \mid \theta_{j \neq i}).$$
MCMC performance is limited by complex posteriors, which are common in large dimensions: Random Walk Metropolis explores only slowly, Gibbs sampling doesn't fare much better, and both explore incoherently in large dimensions.
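The Random Walk Metropolis transition above fits in a few lines; the following is an illustrative stand-alone sketch (function names and tuning constants are my own, not from the talk):

```python
import math
import random

def rwm_step(theta, log_pi, sigma, rng):
    """One Random Walk Metropolis transition t: theta -> theta + a * eps."""
    eps = [rng.gauss(0.0, sigma) for _ in theta]          # eps ~ N(0, sigma^2 I)
    prop = [t + e for t, e in zip(theta, eps)]
    # a | eps ~ Bernoulli(min(1, pi(theta + eps) / pi(theta))), done in log space
    log_accept = min(0.0, log_pi(prop) - log_pi(theta))
    return prop if math.log(rng.random()) < log_accept else theta

def rwm_chain(theta0, log_pi, sigma, n, seed=0):
    """Run n RWM transitions from theta0 and return the whole chain."""
    rng = random.Random(seed)
    chain = [list(theta0)]
    for _ in range(n):
        chain.append(rwm_step(chain[-1], log_pi, sigma, rng))
    return chain
```

Even with a well-tuned step size the chain moves diffusively, which is exactly the slow, incoherent exploration criticized above.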
How do we generate coherent transitions? Hamiltonian flow is a coherent, measure-preserving map. Starting from the sample space $(M, \mathcal{B}(M), \pi)$, the transition
$$T: M \to T^*M \to T^*M \to M$$
is built from a random lift, the Hamiltonian flow, and a marginalization. We just need to define a lift from the sample space to its cotangent bundle,
$$\pi(q) \to \pi(p \mid q)\,\pi(q), \qquad H = -\log \pi(p \mid q) - \log \pi(q) = T + V.$$
Quadratic kinetic energies with constant metrics emulate dynamics on a Euclidean manifold:
$$\pi(p \mid q) = N(0, M), \qquad T = \tfrac{1}{2}\, p_i p_j (M^{-1})^{ij}.$$
The coherent flow moves the Markov chain along the target distribution, avoiding random walk behavior. Unfortunately, Euclidean HMC is sensitive to large variations in curvature, as well as to variations in the target density: the kinetic energy, with $\langle T \rangle = d/2$, bounds the variation in $V$ that a single flow can traverse.
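The lift-flow-marginalize transition for the Euclidean case $\pi(p \mid q) = N(0, I)$ can be sketched with a leapfrog discretization of the Hamiltonian flow plus a Metropolis correction for the discretization error (an illustrative implementation under those assumptions, not code from the talk):

```python
import math
import random

def leapfrog(q, p, grad_V, eps, L):
    """Leapfrog integration of Hamilton's equations for H = V(q) + |p|^2 / 2."""
    q, p = list(q), list(p)
    g = grad_V(q)
    for _ in range(L):
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]   # half step in momentum
        q = [qi + eps * pi for qi, pi in zip(q, p)]         # full step in position
        g = grad_V(q)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]   # half step in momentum
    return q, p

def hmc_step(q, V, grad_V, eps, L, rng):
    """Random lift -> Hamiltonian flow -> accept/reject -> marginalization."""
    p = [rng.gauss(0.0, 1.0) for _ in q]                    # lift: pi(p | q) = N(0, I)
    q_new, p_new = leapfrog(q, p, grad_V, eps, L)           # approximate Hamiltonian flow
    H0 = V(q) + 0.5 * sum(x * x for x in p)
    H1 = V(q_new) + 0.5 * sum(x * x for x in p_new)
    if math.log(rng.random()) < H0 - H1:                    # correct the discretization error
        return q_new
    return q                                                # momentum is discarded either way
```

Each transition draws a fresh momentum (the random lift), integrates Hamilton's equations, and discards the momentum (the marginalization), so the chain moves coherently along level sets of the target.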
v ⇡(x, v) = nY i=1 ⇡(xi|v) ⇡(v) These weaknesses are particularly evident in hierarchical models These weaknesses are particularly evident in hierarchical models These weaknesses are particularly evident in hierarchical models DV ⇡ 250 These weaknesses are particularly evident in hierarchical models DV ⇡ 250 Quadratic kinetic energies with dynamic metrics emulate dynamics on a Riemannian manifold ⇡(p|q) = N(0, ⌃(q)) T = 1 2 pipj ⌃ 1 (q) ij + 1 2 log |⌃(q)| Optimal numerical integration suggests using the Hessian, but the Hessian isn’t positive-definite ⌃(q)ij = @i@jV (q) Fisher-Rao is both impractical and ineffective ⌃(q)ij = ED [@i@jV (q|D)] Fisher-Rao is both impractical and ineffective ⌃(q)ij = ED [@i@jV (q|D)] ( ) @i@jV (q|D) Fisher-Rao is both impractical and ineffective ⌃(q)ij = ED [@i@jV (q|D)] ( ) ED [@i@jV (q|D)] We can regularize without appealing to expectations [exp(↵Hlj) exp( ↵Hlj)] 1 ·Hkl· ⌃ij(q) = [exp(↵Hik) + exp( ↵Hik)] The “SoftAbs” metric serves as a differentiable absolute value of the Hessian λ’ λ 1 / α The SoftAbs metric locally standardizes the target distribution The SoftAbs metric locally standardizes the target distribution And the log determinant admits full exploration of the funnel And the log determinant admits full exploration of the funnel The SoftAbs metric admits a general- purpose, practical implementation of RHMC -15 -10 -5 0 5 10 15 x1 -10 -5 0 5 10v
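The SoftAbs eigenvalue map is easy to sketch numerically. The function name and test matrices below are my own illustration, not from the talk: eigendecompose a symmetric Hessian and apply $\lambda \mapsto \lambda \coth(\alpha \lambda)$.

```python
import numpy as np

def softabs_metric(H, alpha=1e6):
    """SoftAbs regularization of a symmetric matrix H: each eigenvalue
    lambda is mapped to lambda * coth(alpha * lambda), a smooth,
    strictly positive surrogate for |lambda| (floored at 1/alpha)."""
    lam, Q = np.linalg.eigh(H)
    # Guard the 0/0 limit: lambda * coth(alpha * lambda) -> 1/alpha as lambda -> 0.
    lam_safe = np.where(np.isclose(lam, 0.0), 1.0, lam)
    soft = np.where(np.isclose(lam, 0.0),
                    1.0 / alpha,
                    lam_safe / np.tanh(alpha * lam_safe))
    return Q @ np.diag(soft) @ Q.T

# An indefinite Hessian: one negative, one zero, one positive eigenvalue.
H = np.diag([-2.0, 0.0, 3.0])
Sigma = softabs_metric(H, alpha=1.0)
print(np.linalg.eigvalsh(Sigma))  # all strictly positive
```

For large $\alpha$ the metric approaches the exact absolute value of the Hessian, while small $\alpha$ gives a stronger floor; the talk's point is that the map stays differentiable throughout.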
Computational Information Geometry (CIG) in Statistics: foundations
Karim Anaya-Izquierdo, FC, Paul Marriott and Paul Vos (Bath, OU, Waterloo and East Carolina)
GSI'13: Paris, August 2013

Outline: Introduction; Discretisation; Extended Multinomial IG; Example 1 (continued); Conclusion.

OVERALL AIM: The power and elegance of IG have yet to be fully exploited in statistical practice. To that end, the overall aim of CIG here is to provide tools to help resolve outstanding problems in statistical science, via an operational `universal space of all possible models'. Such problems include:
- (local-to-global) sensitivity analysis
- handling both data and model uncertainty
- inference in graphical & related models
- transdimensional & other issues in MCMC
- mixture estimation (see PM's talk)

KEY IDEA: NB: a statistical model $\leftrightarrow$ (sample space $\Omega$, {probability distributions on $\Omega$}). Represent inference problems arising in such models inside adequately large but finite-dimensional spaces. In these embedding spaces, the building blocks of IG in statistics are explicit, computable & algorithmically usable. Modulo a possible initial discretisation, for a r.v. of interest, an operational universal model space $\leftrightarrow$ the simplex
$$\Delta^k := \{\pi = (\pi_0, \pi_1, \ldots, \pi_k) : \pi_i \ge 0,\ \textstyle\sum_{i=0}^{k} \pi_i = 1\}, \qquad (1)$$
having a unique label for each vertex, representing the r.v.
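As a concrete sketch of a point in this universal model space (the choice of an Exponential density and the bin edges are my own illustration, not from the slides), a continuous model can be discretised into an element of $\Delta^k$:

```python
import numpy as np

def discretise_exponential(rate, edges):
    """Bin an Exponential(rate) distribution into the cells defined by
    `edges`, returning a point pi in the simplex Delta^k.  The final
    cell collects all mass beyond the last edge."""
    cdf = 1.0 - np.exp(-rate * np.asarray(edges, dtype=float))
    # Differences of the CDF over [0, e1], (e1, e2], ..., (e_last, inf).
    return np.diff(np.concatenate(([0.0], cdf, [1.0])))

edges = np.linspace(0.5, 10.0, 20)   # 20 cut points -> 21 bins
pi = discretise_exponential(rate=0.5, edges=edges)
print(pi.sum())  # 1.0: pi lies in the simplex Delta^20
```

Varying `rate` traces out a curve inside the simplex, which is exactly the sense in which a working model becomes a subset of $\Delta^k$.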
Multinomials on $k+1$ categories $\leftrightarrow$ int($\Delta^k$), the relative interior of $\Delta^k$; moreover, (1) allows distributions with different support sets.

(One Iteration of) Statistical Science: Working Problem Formulation: WPF = (Q, p.c., model, data, inference) $\Rightarrow$ A. A question Q takes the form `what is $\theta_Q \equiv \theta_Q[F]$?', so that $\theta_Q$ has the same (= population) meaning in all models. Perturbations of the problem formulation are pertinent $\Rightarrow$ sensitivity analyses are sensible. Perturb (weight) the data via the CSF: see CALB (2001), JRSS, B. Focus: (perturb) the working model, M say, a set of (often, explicitly parameterised) distributions on $\Omega$.

AGENDA: Represent the working model M by a subset of $\Delta^k$ (cf. coarse-graining). Use the IG of $\Delta^k$ to numerically compute statistically important features of M, including:
- properties of the likelihood (can be nontrivial here)
- adequacy of first-order asymptotic methods, notably via higher-order asymptotic expansions
- curvature-based dimension reduction
- mixture model structure/inference (see PM's talk).
Focus: ideas, not proofs (given in the arXiv paper [2]).

KEY QUESTION: The CIG approach is inherently discrete and finite. Sometimes this is without loss; in general, there exists an appropriate theory for suitably fine partitions. Cost: some loss of generality (an obvious equivalence relation is induced). Benefit: an excellent foundation for a computational theory. Meanwhile, FMP $\Rightarrow$ models can (arguably, should) be seen as fundamentally categorical. This poses the key question: what is the effect on the inferential objects of interest of a particular selection of such categories? Addressed in Theorems 1 & 2 but, first, ...

EXAMPLE 1: leukaemia patient data. 43 survival times Z from diagnosis, measured in days. Q: what is the mean survival time $\mu \equiv \mu[F]$? For (later) expository purposes: suppose $Z \sim$ Exponential, but only the censored value $Y = \min\{Z, t\}$ is observed $\Rightarrow$ Y is a 1-D curved exponential family (EF) inside a 2-D regular EF [PM & West (2002)]; t is chosen to give a reasonable, but not perfect, fit. This directly illustrates 2 points:
1. whereas the model is continuous, the data are discrete $\Rightarrow$ ZERO loss in treating them as sparse categorical;
2. a further level of coarseness, using bin size = 4 days, produces effectively NO inferential loss.

EXAMPLE 1: log-likelihood for the interest parameter $\mu$. [Figure: panel (a), log-likelihoods against mu, with circles = bin size 1 day and solid line = bin size 4 days; panel (b), the full exponential family in (theta1, theta2).]

Information loss under discretisation: for continuous r.v.'s, we need to truncate & discretise $\Omega$ into a finite number of bins. Theorems 1 & 2 show the associated information loss can be made arbitrarily small. Key: control bin-conditional moments of the r.v.'s of interest, uniformly in the parameters of the model.

THEOREM 1 (likelihood ratios). Let $f(x; \theta)$, $\theta \in \Theta$, be a family of density functions with common support $\mathcal{X} \subseteq \mathbb{R}^d$, each continuously differentiable on r.i.($\mathcal{X}$) $\neq \emptyset$; let $\mathcal{X}$ be compact; and let $\{\|\partial f(x;\theta)/\partial x\| : x \in \mathcal{X}\}$ be uniformly bounded in $\theta \in \Theta$. Then, $\forall \epsilon > 0$ and $\forall$ sample sizes $N > 0$, $\exists$ a finite measurable partition $\{B_i\}_{i=0}^{k(\epsilon,N)}$ of $\mathcal{X}$ such that, for all $(x_1, \ldots, x_N) \in \mathcal{X}^N$ and all $(\theta_0, \theta) \in \Theta^2$,
$$\left| \log \frac{\mathrm{Lik}_c(\theta)}{\mathrm{Lik}_c(\theta_0)} - \log \frac{\mathrm{Lik}_d(\theta)}{\mathrm{Lik}_d(\theta_0)} \right| < \epsilon,$$
where $\mathrm{Lik}_c$ and $\mathrm{Lik}_d$ are the likelihood functions for the continuous and discretised distributions respectively.

Theorem 2 considers discretisation of an EF, so that the tools of classical IG can be applied. In general, a discretised full EF $\neq$ a full EF, and there is information loss. However, Theorem 2 shows this loss can be made inferentially unimportant: all IG results on the two families can be made arbitrarily close.

THEOREM 2 (Amari structure). Let $f(x; \theta) = \nu(x) \exp\{\theta^T s(x) - \psi(\theta)\}$, $x \in \mathcal{X}$, $\theta \in \Theta$, be an EF satisfying the regularity conditions of Amari (1990), p. 16; let $s(x)$ be uniformly continuous and $s(\mathcal{X})$ be compact. Then, $\forall \epsilon > 0$, $\exists$ a finite measurable partition $\{B_i\}_{i=0}^{k(\epsilon)}$ of $\mathcal{X}$ such that, for all choices of bin labels $s_i \in s(B_i)$, all terms of Amari's IG for $f(x; \theta)$ can be approximated to the relevant order of $\epsilon$ by the corresponding terms for the discretised family $\{(\pi_i(\theta), s_i) : \pi_i(\theta) := \int_{B_i} f(x; \theta)\, dx,\ s_i \in s(B_i)\}$. In particular:
(a) for all $\theta \in \Theta$, and any norm, $\|\mu_c(\theta) - \mu_d(\theta)\| = O(\epsilon)$, where $\mu_c(\theta) = \int_{\mathcal{X}} x f(x;\theta)\, dx$ and $\mu_d(\theta) = \sum_i s_i \pi_i(\theta)$;
(b) the expected Fisher information matrices for $\theta$ of $f(x;\theta)$ and of $\{\pi_i(\theta)\}$, denoted $I_c(\theta)$ and $I_d(\theta)$ respectively, satisfy $\|I_c(\theta) - I_d(\theta)\|_\infty = O(\epsilon^2)$;
(c) the skewness tensors [Amari (1990), p. 105] $T_c(\theta)$ and $T_d(\theta)$, for $f(x;\theta)$ and $\{\pi_i(\theta)\}$ respectively, satisfy $\|T_c(\theta) - T_d(\theta)\|_\infty = O(\epsilon^3)$.

EXTENSIONS: Above, the compactness condition keeps the geometry finite. A later paper treats the case where compactness is not needed. There, the `space of all distributions' = the (closure of the) $\infty$-dimensional simplex; extending classical IG then raises convergence issues, handled with appropriate Hilbert space structures, especially to bound the loss of inferential information in moving to the finite ($\Leftrightarrow$ computable) simplex.

BACKGROUND: IG $\leftrightarrow$ $\pm 1$ affine geometries, non-linearly related via duality & Fisher information. In a full EF context: the $+1$ geometry $\leftrightarrow$ the natural parameterisation; the $-1$ geometry $\leftrightarrow$ the mixture parameterisation.
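For multinomials, the two parameterisations are related by log and softmax maps. A minimal round-trip sketch (the coordinate convention, taking $\pi_0$ as the reference bin, is my own choice for illustration):

```python
import numpy as np

def to_natural(pi):
    """Mixture (-1) -> natural (+1) coordinates: theta_i = log(pi_i / pi_0),
    i = 1..k, for a distribution with full support."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[1:] / pi[0])

def to_mixture(theta):
    """Natural (+1) -> mixture (-1) coordinates via the softmax."""
    z = np.concatenate(([0.0], np.asarray(theta, dtype=float)))
    w = np.exp(z - z.max())          # subtract max for numerical stability
    return w / w.sum()

pi = np.array([0.1, 0.2, 0.3, 0.4])
theta = to_natural(pi)
print(to_mixture(theta))  # recovers [0.1, 0.2, 0.3, 0.4]
```

The nonlinearity of this map is exactly why $-1$ geodesics look curved in $+1$ parameters and vice versa, as in the trinomial panels below.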
Closures of EFs have been studied by, e.g., B-N ('78), Brown ('86), Lauritzen ('96) & Rinaldo (2006) and, in the $\infty$-D case, Csiszar & Matus (2005). Here, rather than pointwise limits, the focus = limits of families of distributions.

The IG theory follows Amari (1990) via Murray & Rice's (1993) affine space construction, extended by Marriott (2002). Recall: the r.v.'s take values in a finite set of categories (bins) $B = \{B_i\}_{i \in I}$, so a distribution = the set of corresponding probabilities $\{\pi_i\}_{i \in I}$. NB: identify bin $B_i$ with its label $i \in I = \{0, \ldots, k\}$.

The $-1$ affine space structure over distributions on B is $(A_{mix}, V_{mix}, +)$, where $A_{mix} = \{\{a_i\}_{i \in I} : \sum_{i \in I} a_i = 1\}$, $V_{mix} = \{\{v_i\}_{i \in I} : \sum_{i \in I} v_i = 0\}$, and `+' is the usual addition of sequences. $\Delta^k$ is a $-1$-convex subset of $(A_{mix}, V_{mix}, +)$.

The $+1$ affine space structure over distributions on B: {sets of distributions with the same support} form a simplicial complex, support $\leftrightarrow \emptyset \neq F \subseteq I$, where `F' connotes `face'. Each F has a separate $+1$ structure $(A_{exp,F}, V_{exp,F}, \oplus_F)$: defining $\sim_F$ on $A_F := \{\{a_i\}_{i \in F} : a_i > 0\}$ by $\{a_i\} \sim_F \{b_i\} \Leftrightarrow \exists \lambda > 0$ s.t. $\forall i \in F$, $a_i = \lambda b_i$, we put $A_{exp,F} := A_F / \sim_F$ and $V_{exp,F} := \{\{v_i\}_{i \in F} : v_i \in \mathbb{R}\}$, defining $\oplus_F$ by $\langle\{a_i\}\rangle \oplus_F \{v_i\} := \langle\{a_i \exp(v_i)\}\rangle$.

Extended TRInomial IG (the obvious extensions give the general case in [2]): $\Delta^2$ has bin probabilities $\pi = (\pi_0, \pi_1, \pi_2)$, $\pi_i \ge 0$. Panels (a) to (d) show $\pm 1$ geodesics in $\pm 1$ parameters: panels (a), (c) $\leftrightarrow$ $\Delta^2$ in $-1$ (mixture) parameters; panels (b), (d) $\leftrightarrow$ $+1$ (natural) parameters (each $\pi_i > 0$). With $c^T = (1, 2, 3)$ and $X \sim$ Trinomial(1; $\pi$): in (a), (b), the blue lines = level sets of $E(c^T X)$ = $-1$ geodesics; in (c), (d), the black lines = 1-D full EFs, with probabilities of the form $\pi_i \exp(\theta c_i) / \sum_{j=0}^{2} \pi_j \exp(\theta c_j)$, = $+1$ geodesics. These $-1$-parallel blue lines & $+1$-parallel black lines are everywhere orthogonal w.r.t. the Fisher information metric.

[Figure: (a) $-1$ geodesics in the $-1$ simplex; (b) $-1$ geodesics in $+1$ parameters; (c) $+1$ geodesics in the $-1$ simplex; (d) $+1$ geodesics in $+1$ parameters.]
(b): $-1$ geodesics are nonlinear in $+1$ parameters, and vice versa: see (c). (a): $-1$ geodesics extend naturally to the boundary in $-1$ parameters. (c): limits of $+1$ geodesics lie in the boundary of $\Delta^2$; define the $+1$ closure so that these continuous limits are defined `at $\infty$' in $+1$ parameters: shown schematically as the dotted triangle in (b), and key to understanding the simplicial nature of the $+1$ geometry.

SHAPE OF LOG-LIKELIHOOD: the natural spaces for CIG = high-dimensional simplicial structures, so the primary question is: how does the log-likelihood $l(\cdot)$ behave in them? Two important issues: typically, the sample size $N \ll k$ = the dimension of the simplex; and $\Delta^k$ contains sub-simplices of varying support. Hence standard intuition about the shape of $l(\cdot)$ will not hold; in particular, the standard $\chi^2$ approximation to the distribution of the deviance fails. Discretising, the data $\{x_i\}_{i=1}^{N} \sim f(x;\theta)$ become counts $\{n_i\}_{i \in I} \sim$ Multinomial($N$; $\pi(\theta)$), $I = \{0, \ldots, k\}$. Write $I = P \cup Z$, where $P := \{i : n_i > 0\}$ and $Z := \{i : n_i = 0\}$.

SHAPE OF LOG-LIKELIHOOD (continued): the observed face := the face spanned by the vertices (bins) in P; the unobserved face := the face spanned by the vertices (bins) in Z. The log-likelihood $l(\cdot)$ is: strictly concave on the observed face; strictly decreasing in the normal direction from it to the unobserved face; and, otherwise, constant. For more on the geometry of the observed face, see PM's talk.

Theorem 3: Let $V_{mix} := \{\{v_i\}_{i \in I} : \sum_{i \in I} v_i = 0\}$, $V_0 := \{\{v_i\}_{i \in I} \in V_{mix} : v_i = 0,\ i \in P\}$ and, for any $i_Z \in Z$, $V_{i_Z} := \{\{v_i\}_{i \in I} \in V_{mix} : v_i = 0,\ i \in Z \setminus \{i_Z\}\}$. Then: $V_0$ is a linear subspace of $V_{mix}$; $l(\cdot)$ is constant on $-1$-affine subspaces of the form $\pi + V_0$; and $V_{mix}$ has the direct sum $V_0 \oplus V_{i_Z}$.

Spectrum of the Fisher information: denote all bin probabilities except $\pi_0$ by $\pi_{(0)} := (\pi_1, \ldots, \pi_k)^T$. Viewed as the covariance matrix of the score, $N^{-1} \times$ (the Fisher information matrix for the $+1$ parameters) is
$$I(\pi) := \mathrm{diag}(\pi_{(0)}) - \pi_{(0)} \pi_{(0)}^T,$$
whose explicit spectral decomposition is, in all cases, an example of interlacing eigenvalue results. Accordingly, the Fisher spectrum mimics key features of the bin probabilities. Of central importance:
1. eigenvalues are exponentially small $\Leftrightarrow$ the same is true of the $\{\pi_i\}_{i=0}^{k}$;
2. the Fisher information matrix is singular $\Leftrightarrow$ one of the $\{\pi_i\}_{i=0}^{k}$ vanishes.
[Again, typically, two eigenvalues are close whenever two corresponding $\pi_i$ are.]
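The structure of $I(\pi)$ is easy to probe numerically. A short sketch (the example probability vectors are illustrative, not from the slides):

```python
import numpy as np

def fisher_info(pi):
    """Multinomial Fisher information in +1 parameters (per observation):
    I(pi) = diag(pi_(0)) - pi_(0) pi_(0)^T, with pi_(0) = (pi_1, ..., pi_k)."""
    p = np.asarray(pi, dtype=float)[1:]
    return np.diag(p) - np.outer(p, p)

# Full support: I(pi) is nonsingular and its eigenvalues interlace
# with the distinct bin probabilities 0.4 > 0.3 > 0.2.
pi = np.array([0.1, 0.2, 0.3, 0.4])
print(np.linalg.eigvalsh(fisher_info(pi)))

# A vanishing pi_i makes I(pi) singular, matching the claim above.
pi_sparse = np.array([0.5, 0.0, 0.2, 0.3])
print(np.linalg.det(fisher_info(pi_sparse)))  # 0.0
```

Since $I(\pi)$ is a rank-one downdate of a diagonal matrix, the interlacing pattern $\lambda_1 > \tilde{\lambda}_1 > \lambda_2 > \ldots$ follows directly from standard eigenvalue interlacing.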
Spectrum of the Fisher information (continued): in particular, if the $\{\pi_i\}_{i=1}^{k}$ comprise $g > 1$ distinct values $\lambda_1 > \ldots > \lambda_g > 0$, $\lambda_i$ occurring $m_i$ times, then the spectrum of $I(\pi)$ comprises $g$ simple eigenvalues $\{\tilde{\lambda}_i\}_{i=1}^{g}$, the roots of an explicit polynomial, satisfying
$$\lambda_1 > \tilde{\lambda}_1 > \ldots > \lambda_g > \tilde{\lambda}_g \ge 0,$$
together, if $g < k$, with $\{\lambda_i : m_i > 1\}$, each such $\lambda_i$ having multiplicity $m_i - 1$; while $\tilde{\lambda}_g > 0 \Leftrightarrow \pi_0 > 0$. [Further, each $\tilde{\lambda}_i$ ($i < g$) is typically (much) closer to $\lambda_i$ than to $\lambda_{i+1}$, making it a near replicate of $\lambda_i$.]

CLOSURE: Given a full EF embedded in a high-dimensional sparse simplex, an important question is to identify its limit points, i.e. how it is connected to the boundary. Panel (c) in the trinomial example above illustrates that 1-D EF limits lie at vertices, and which vertex is determined by the rank order of the components of the tangent vector of the $+1$ geodesic. In general (see [2]): finding the limit points $\leftrightarrow$ finding redundant linear constraints, and this can be converted, via duality, into finding extremal points in a finite-dimensional affine space. Cf. Geyer (2009): directions of recession.

Total positivity and the convex hull: The $-1$-convex hull of an EF is of great interest, mixture models being widely used in statistical science. It is explored further in PM's talk; we simply state the main result here. It follows easily from the total positivity of EFs that, generically, convex hulls are of maximal dimension k. Here, `generically' means that the $+1$ tangent vector which defines the EF has components which are all distinct.
Theorem 4: The $-1$-convex hull of an open subset of a generic 1-D EF is of full dimension.

EXAMPLE 1 (continued): leukaemia patient data. Return now to Example 1 to illustrate the above results, in particular to show an application of dimension reduction based on IG. Recall: we have 43 survival times Z from diagnosis, measured in days; Q: what is the mean survival time $\mu \equiv \mu[F]$? For expository purposes, suppose $Z \sim$ Exponential, but only the censored value $Y = \min\{Z, t\}$ is observed, so that Y is a 1-D curved EF inside a 2-D regular EF [PM & West (2002)], with t chosen to give a reasonable, but not perfect, fit.

DIMENSION REDUCTION: [Figure: panel (a), log-likelihoods against mu; panel (b), the full exponential family in (theta1, theta2).] (a): the plot of $l(\mu)$ shows appreciable skewness, suggesting that standard first-order asymptotics can be improved by the higher-order asymptotic methods of classical IG. (b): in $+1$ parameters, the solid curve = Y's 1-D curved EF embedded in the 2-D full EF, and the dashed lines = contours of $l(\cdot)$ for the full EF. It is clear, even visually, that Y has low $+1$ curvature on this inferential scale, so its curved EF behaves inferentially like a 1-D full EF and the Marriott & Vos (2004) dimension-reduction techniques can be used.

[Figure: panel (c), the distribution of the MLE.] Panel (c) shows how well a saddlepoint-based approximation does at approximating the distribution of $\hat{\mu}_{MLE}$.

CONCLUSION: The power and elegance of IG have yet to be fully exploited in statistical practice, to which end the overall aim of CIG here is to provide tools to help resolve outstanding problems in statistical science, via an operational `universal space of all possible models', such problems including: (local-to-global) sensitivity analysis; handling both data and model uncertainty; inference in graphical & related models; transdimensional & other issues in MCMC; and mixture estimation (see PM's talk).
Computational information geometry in statistics: mixture modelling — Paul Marriott, University of Waterloo
GSI2013 - Geometric Science of Information

Overview
• Joint work with Karim Anaya-Izquierdo, Frank Critchley and Paul Vos
• This paper applies the tools of computational information geometry (see Frank's talk)
• High-dimensional extended multinomial families serve as proxies for the 'space of all distributions'
• We look at the inferentially demanding area of statistical mixture modelling

Overview (continued)
• We look at, and show the relationship between, different geometric approaches to mixture modelling:
  - Lindsay's data-dependent, finite-dimensional affine space
  - our extended multinomial embedding space
• We show a new algorithm which uses the full information geometry of the problem to its advantage
• We exploit the idea of polytope approximation in the 'correct' geometry

Mixture models
• Mixture models form an extremely flexible class of models
• They are used when some data are not observed, when there are hidden dependence structures, or when there is unexplained heterogeneity
• They are of the form Σᵢ ρᵢ fX(x; θᵢ) or ∫ fX(x; θ) dQ(θ)
• As a running example, consider ρ₀N(µ₀, σ₀²) + ρ₁N(µ₁, σ₁²) + ρ₂N(µ₂, σ₂²)
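The finite-mixture form above can be evaluated directly. A minimal sketch of the three-component normal mixture follows; the particular values of ρᵢ, µᵢ, σᵢ are illustrative assumptions, not values from the talk.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, rhos, mus, sigmas):
    """Finite mixture density: sum_i rho_i * N(x; mu_i, sigma_i^2)."""
    return sum(r * normal_pdf(x, m, s) for r, m, s in zip(rhos, mus, sigmas))

# Illustrative parameter values (assumed for this sketch)
rhos = (0.5, 0.3, 0.2)      # mixing weights, summing to 1
mus = (-1.5, 0.0, 1.5)
sigmas = (0.5, 0.7, 0.5)

density_at_zero = mixture_pdf(0.0, rhos, mus, sigmas)
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a density; varying (ρ₀, ρ₁, ρ₂) over the simplex traces out exactly the kind of family of mixed densities shown in the figures that follow.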
Mixtures

[Figure sequence: repeated animation frames, each showing a pair of plots — left, the "Mixing distribution" on the simplex with vertices ρ₁, ρ₂ and q; right, the resulting "Mixed density" (Density against x) — illustrating how the mixed density changes as the mixing distribution moves across the simplex.]
Convex Geometry
• Inference for mixture models can be problematic
• They can be 'too flexible' and hence can overfit
• The likelihood function can have multiple modes and singularities, and can be unbounded
• The underlying structure is not a manifold, so calculus must be used with care
• Inference questions, where Z ∼ ∫ f(z; θ) dQ(θ):
  1. What can we learn about E(Z)?
  2. What can we learn about Q?
  3. Can we predict the next value of Z?

Mixture of binomial distributions
• The first example comes from Kupper and Haseman (1978)
• It concerns the frequency of death of implanted foetuses in laboratory animals
• Underlying clustering could be expected, hence mixture modelling is appropriate
• The paper states: 'simple one-parameter binomial and Poisson models generally provide poor fits to this type of binary data'
• It is of interest to look in a 'neighbourhood' of these models
• The extended multinomial space is a natural place to define such a 'neighbourhood'
• Our new computational algorithm is used for inference
Tripod model
• The second example is the tripod model, discussed in Zwiernik and Smith (2011)
• The graphical model has terminal nodes 1, 2, 3 and an internal node H
• Binary variables Xi, i = 1, 2, 3, sit on the terminal nodes and are assumed independent given the binary variable at the internal node H
• H is unobserved
• This gives a very complex likelihood structure, which is problematic for inference

Extended Multinomial
• Look at discrete models
• The space of distributions is simplicial
• Boundaries occur where probabilities are zero
• Information geometry of extended multinomial models
• Applications to graphical models and elsewhere
• A proxy for the space of all models
• The IG is explicit: is it computable?

Convex Geometry
• Lindsay's (1995) fundamental result characterises the maximum likelihood estimate in the class of all mixtures of fX(x; θ)
• It finds the Q which maximises the likelihood of ∫ f(x; θ) dQ(θ) over all possible Q, when f(x; θ) is an exponential family
• This is called the non-parametric maximum likelihood estimate of Q
• It uses results from finite-dimensional convex geometry
• Tangent spaces are replaced by tangent cones
• Asymptotic limits are mixtures of χ² distributions

Convex Geometry
• Lindsay's geometry lies in an affine space which is determined by the observed data
• In particular, it is always finite dimensional, and the dimension is determined by the number of distinct observations
• Let Lθ = (L1(θ), . . . , LN∗(θ)) represent the N∗ distinct likelihood values
• The likelihood on the space of mixtures is defined on the convex hull of the image of the map θ → (L1(θ), . . . , LN∗(θ)) ⊂ RN∗
• The non-parametric likelihood estimate, f(y; Q), is found by maximising a concave function over this convex set

Embedding in Extended Multinomial
• In our examples the families can be embedded in a different affine space
• Assume a discrete sample space, with data of the form (n0, n1, . . . , nk)
• Define ∆k := {π = (π0, π1, . . . , πk) : πi ≥ 0, Σ_{i=0}^k πi = 1}
• Embed the unmixed model in ∆k and look at its convex hull
• Define the observed face P to be determined by the index set of the strictly positive observed counts
• The affine structure of Lindsay is determined by the vertices of P (Theorem 1)

Embedding in Extended Multinomial
• Definition: Define ΠL to be the Euclidean orthogonal projection from the simplex ∆k to the smallest vector space containing the vertices indexed by P
• Theorem: (a) The likelihood on the simplex is completely determined by the likelihood on the image of ΠL; in particular, all elements of the pre-image of ΠL have the same likelihood value. (b) ΠL maps −1-convex hulls in the simplex to the convex hull of Lindsay's geometry
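Part (a) of the theorem is easy to check numerically: the multinomial log-likelihood involves only the coordinates indexed by the observed face, so two simplex points that agree on that face (and hence share an image under ΠL, which simply zeroes the off-face coordinates) have identical likelihood. A minimal numpy sketch with made-up counts:

```python
import numpy as np

def log_lik(counts, pi):
    """Multinomial log-likelihood; categories with zero counts contribute nothing."""
    obs = counts > 0
    return float(np.sum(counts[obs] * np.log(pi[obs])))

def Pi_L(pi, counts):
    """Orthogonal projection onto the span of the vertices of the observed face."""
    return np.where(counts > 0, pi, 0.0)

counts = np.array([3, 0, 5, 0, 2])              # observed face P = {0, 2, 4}
pi1 = np.array([0.2, 0.10, 0.4, 0.20, 0.1])
pi2 = np.array([0.2, 0.25, 0.4, 0.05, 0.1])     # same coordinates on P, both sum to 1

assert np.allclose(Pi_L(pi1, counts), Pi_L(pi2, counts))
assert abs(log_lik(counts, pi1) - log_lik(counts, pi2)) < 1e-12
```

The two distributions lie in the same pre-image of ΠL, and the asserts confirm they carry the same likelihood.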
Embedding in Extended Multinomial
[Figure: (a) the extended multinomial simplex with vertices (1,0,0), (0,1,0), (0,0,1) and (0,0,0), with its observed face marked; (b) the projected image with vertices (1,0,0), (0,0,1) and (0,0,0), again with the observed face marked.]

Embedding in Extended Multinomial
• There are some definite advantages to working in the larger space
• We can define a new search algorithm which exploits the information geometry of the full simplex
• This enables finessing the label-switching problem encountered by many other methods
• Lindsay's geometry captures the −1-affine and likelihood structure, but it does not capture the full information geometry
• For example, the expected Fisher information cannot be represented

Total positivity and local mixing
• Two seemingly contradictory results:
• Theorem: The −1-convex hull of an open subset of a generic one-dimensional exponential family π(θ) is of full dimension
• Anaya and Marriott (2007) show that, under regularity conditions which hold for many applications, mixtures of exponential families have accurate low-dimensional representations: local mixtures
• If the curve π(θ) for θ ∈ U ⊂ Θ lies 'close' to a low-dimensional −1-affine subspace, then all mixtures over U ⊂ Θ also lie 'close' to this space
• Such subspaces are determined by the −1-curvature
• Good approximations can be obtained using polygonal approximations

Polygonal approximations
• Given a norm ‖·‖, the curve π(θ) and the polygonal path ∪Si, define the distance function, for each θ, by
  d(π(θ)) := inf_{π ∈ ∪Si} ‖π(θ) − π‖
• Which norm?
• Define the inner product ⟨v, w⟩_π := Σ_{i=0}^k (vi wi)/πi for v, w ∈ Vmix and π such that πi > 0 for all i
• This defines a preferred point metric, as discussed in Critchley et al. (1993); let ‖·‖_π be the corresponding norm

Polygonal approximations
• As motivation for using such a metric, consider the Taylor expansion of the log-likelihood ℓ around π̂:
  ℓ(π) − ℓ(π̂) ≈ −(N/2) ‖π − π̂‖²_π̂
• Theorem: Let π(θ) be an exponential family, and {θi} a finite and fixed set of support points such that d(π(θ)) ≤ ε for all θ. Denote by π̂NP and π̂ the maximum likelihood estimates in the convex hulls of π(θ) and {π(θi) | i = 1, . . . , M} respectively, and by π̂G, with π̂G_i := ni/N, the global maximiser in the simplex.
Then
  ℓ(π̂NP) − ℓ(π̂) ≤ εN ‖π̂G − π̂NP‖_π̂ + o(ε).   (1)

[Figure: the mixture fit using the polygonal approximation, in three panels: 'Data and Fit' (probability/proportion against counts 0 to 7), 'Mixing proportions' (probability against support points 0 to 7), and the directional derivative against µ.]

Polygonal approximation: tripod example
[Figure: the bipod model: the space of unmixed independent distributions, showing the ruled-surface structure.]

Conclusions
• We look at, and show the relationships between, different geometric approaches to mixture modelling
• Lindsay's data-dependent, finite-dimensional affine space
• Our extended multinomial embedding space
• We show a new algorithm which uses the full information geometry of the problem to its advantage
• We exploit the idea of polytope approximation in the correct geometry
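A fit of the kind shown in the figure can be imitated with a generic fixed-support EM over mixtures of Binomial(7, θ) densities: with the support grid fixed, only the mixing weights are unknown, and each EM step is a closed-form reweighting. This is only a sketch under stated assumptions (it is not the polygonal algorithm of the talk, and the counts below are invented, not the Kupper and Haseman data):

```python
import numpy as np
from math import comb

def em_mixture_weights(counts, n=7, n_grid=10, iters=200):
    """EM over mixing weights only, on a fixed grid of binomial support points."""
    thetas = np.linspace(0.05, 0.95, n_grid)
    x = np.arange(n + 1)
    # component likelihood matrix: L[i, j] = P(X = x_i | theta_j)
    L = np.array([[comb(n, int(xi)) * t**xi * (1 - t)**(n - xi) for t in thetas]
                  for xi in x])
    w = np.full(n_grid, 1.0 / n_grid)
    lls = []
    for _ in range(iters):
        mix = L @ w                        # mixed density at each count value
        lls.append(float(counts @ np.log(mix)))
        resp = L * w / mix[:, None]        # responsibilities; each row sums to 1
        w = (counts @ resp) / counts.sum() # M-step: reweight the support points
    return w, lls

counts = np.array([15, 8, 5, 3, 2, 2, 1, 4])   # hypothetical frequencies of 0..7 deaths
w, lls = em_mixture_weights(counts)
assert abs(w.sum() - 1.0) < 1e-9
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))   # EM never decreases the likelihood
```

The monotone log-likelihood is the standard EM guarantee; the weight vector `w` plays the role of the discrete mixing distribution Q on the support grid.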
Visualizing projective shape space
John Kent, University of Leeds
j.t.kent@leeds.ac.uk, http://maths.leeds.ac.uk/~john
GSI, August 2013

Overview
This talk is about a camera view of a "scene", where the scene contains a set of collinear points in the plane (using a one-dimensional film), or a set of coplanar points in three dimensions (using a two-dimensional film). We are interested in the information in the scene that is invariant to the location of the focal point of the camera and the orientation of the film. Thus we are looking for features in the scene that are invariant under the group of projective transformations. Such features are known as projective invariants. The collection of information in projective invariants is called "projective shape". Unfortunately, projective invariants, as usually formulated, are not suitable for quantitative statistical analysis — there is no obvious metric between different sets of projective invariants. The purpose of this talk is to give a standardized representation of projective shape that is amenable to metric comparisons.

The simplest case — 4 collinear points
For much of the talk we focus on the simplest case (k = 4 points in m = 1 dimension), where there is just one projective invariant — the cross ratio. We then generalize the methodology to higher values of k and m. The next slides illustrate the main issues. First is a figure containing a scene of 4 collinear points, a focal point of a camera, and a linear film. The effect of changing camera position is then illustrated by two images from my back garden taken from different positions.

[Figures: a camera view of 4 collinear points; two views of the lanterns in my back garden, taken from different positions.]

The cross ratio
Given four numbers u1, . . . , u4 (representing coordinates for four labelled collinear points in a two-dimensional scene), the cross ratio is defined by
  τ = (u2 − u1)(u4 − u3) / [(u3 − u1)(u4 − u2)].
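As a quick numerical check (with arbitrary made-up coordinates): the cross ratio is unchanged by projective (Möbius) maps of the line, and relabelling the four points produces only six distinct values. A small sketch:

```python
from itertools import permutations

def cross_ratio(u1, u2, u3, u4):
    return (u2 - u1) * (u4 - u3) / ((u3 - u1) * (u4 - u2))

u = [0.0, 1.0, 2.5, 4.0]            # arbitrary distinct collinear points
tau = cross_ratio(*u)

# A projective transformation of the line is a Moebius map x -> (ax + b)/(cx + d);
# the cross ratio is invariant under it.
a, b, c, d = 2.0, -1.0, 0.5, 3.0    # any choice with a*d - b*c != 0
v = [(a * x + b) / (c * x + d) for x in u]
assert abs(cross_ratio(*v) - tau) < 1e-9

# Relabelling the four points (24 permutations) yields only 6 distinct values.
vals = {round(cross_ratio(*p), 9) for p in permutations(u)}
assert len(vals) == 6
```

This illustrates both the invariance and the relabelling behaviour discussed below.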
It can be shown that the cross ratio is the one and only projective invariant in this situation. If the landmarks are re-labelled (there are 24 permutations), the cross ratio takes 6 possible forms (spanning all of R if the original value of τ is restricted to the interval (0, 1/2)):
  τ, 1 − τ, 1/(1 − τ), 1/τ, −(1 − τ)/τ, −τ/(1 − τ),
with respective ranges
  (0, 1/2), (1/2, 1), (1, 2), (2, ∞), (−∞, −1), (−1, 0).

Cross ratios in the back garden
From the two images of my back garden, I extracted the coordinates of the lanterns and computed τ in each case. The answers are very similar (as expected): τ1 = 0.489, τ2 = 0.487.

Unsuitability of the cross ratio for metric comparisons
The behaviour of the cross ratio under relabelling underscores its unsuitability for metric comparisons. In particular, if we want to compare two cross ratios near 0 (e.g. τ1 = 0.1, τ2 = 0.01), they look very close together on the τ scale (|0.1 − 0.01| = 0.09), but quite far apart on the 1/τ scale (|10 − 100| = 90); that is, the labelling of the landmarks affects metric comparisons between cross ratios. What to do? We shall look at a geometric solution (limited to 4 collinear landmarks) and an algebraic solution (more landmarks and higher dimensions).

Geometric standardization for 4 collinear landmarks
Suppose the four landmarks are labelled A, B, C, D in increasing order on the line. Draw two semicircles, one with diameter AC and the other with diameter BD. The two semicircles intersect in a point O, say. Make this point the focal point of the camera, and switch from linear film to circular film. The image of a landmark is now a pair of antipodal points on the circle. The angles AOC and BOD are right angles, and the angle AOB, δ say, is related to the cross ratio by τ = sin²δ. Further, under relabellings the cross ratio takes the following forms in terms of δ:
  sin²δ, cos²δ, sec²δ, csc²δ, −tan²δ, −cot²δ.

Geometric choice of preferred focal point
[Figure: the geometric construction, with landmarks A, B, C, D on the line, the two semicircles, and the preferred focal point O.]

Standardized image of 4 collinear points on circular film
[Figure: the standardized configuration Y, shown as four pairs of antipodal points on the unit circle.]

Homogeneous coordinates
To understand why this choice of focal point is useful for metric comparisons, we need to do some algebraic calculations. The first step is to construct homogeneous coordinates. Starting with the four real coordinates u1, . . . , u4, construct a 4 × 2 "augmented" configuration matrix by adding a column of ones,
  X = [u1 1; u2 1; u3 1; u4 1],
with rows x1T, . . . , x4T, where xiT denotes the ith row of X, and think of each row as defined only up to a scalar multiple. (In general X is a k × p matrix, p = m + 1.)

Projective shape as an equivalence class of matrices
It can be shown that the projective shape is precisely the information in X that is invariant under the transformations X → DXB, where D = diag(di) is a k × k diagonal nonsingular matrix (the distance between the focal point and each landmark in the scene is unknown), and B (p × p) is nonsingular (representing the effect of focal point position). Thus projective shape can be described in terms of an equivalence class of matrices. How can we choose a preferred element of the equivalence class?

Tyler standardization for projective shape — 1
For projective shape recall that X ≡ DXB. Let us choose D and B so that, after standardization, (a) the rows of X are unit vectors, xiT xi = 1, i = 1, . . . , k, and (b) the columns of X are orthonormal up to a factor k/p, XT X = (k/p)Ip. Choice of D: since each row of X is defined only up to a multiplicative constant, we can scale each row of X so that its first element is 1 (the conventional choice, appropriate for flat film) or so that it has norm 1 (the Tyler choice, appropriate for spherical film), in both cases with the focal point of the camera at the origin.
The existence of a solution for D and B is due to Dave Tyler, who developed a similar result in the context of robust estimation of a covariance matrix in multivariate analysis. In general D and B must be found numerically using an iterative algorithm.

Tyler standardization for projective shape — 2
Let Y = DXB denote the Tyler-standardized configuration after using the optimal D and B. Then the rows yi, i = 1, . . . , k, are unit vectors and the columns are orthonormal up to a factor k/p. On our spherical film the yi are "uniformly spread" around the unit sphere in Rp in terms of their moment of inertia matrix, YT Y = Σ yi yiT = (k/p)Ip. Note that Y is unique up to (a) multiplying each row of Y by ±1, and (b) multiplying Y on the right by a p × p orthogonal matrix. How can we remove these remaining indeterminacies in Y?

Embedding
From a standardized configuration Y, define an "absolute inner product" matrix M (k × k) by mij = |yiT yj|, i, j = 1, . . . , k. Then (a) mij is invariant under sign changes for each row and under rotation/reflection of the data around the circle; (b) at least for p = 2, it is possible to reconstruct the projective shape of Y from M; (c) hence, at least for p = 2, M is a representation of the projective shape of Y.

Tyler standardization for 4 collinear points
In the case k = 4, p = 2 it can be shown that a standardized configuration Y takes the form
  Y = [v(−δ/2)T; v(δ/2)T; v(π/2 − δ/2)T; v(π/2 + δ/2)T] = [c −s; c s; s c; s −c],
where v(θ) = (cos θ, sin θ)T, c = cos(δ/2), s = sin(δ/2), 0 < δ < π/4, unique up to (a) permutation of the landmarks, (b) the sign of each row, and (c) rotation/reflection of the data around the circle. Then τ is related to δ by one of the trig functions sin²δ, cos²δ, sec²δ, csc²δ, −tan²δ, −cot²δ, depending on the permutation.
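The iterative algorithm can be sketched with a naive alternating scheme: rescale the rows to unit norm (the choice of D), then re-mix the columns so that YᵀY = (k/p)Ip (the choice of B), and repeat. This is an illustrative sketch rather than Tyler's actual iteration, applied to four made-up collinear points u = 1, 2, 3, 5 in homogeneous coordinates:

```python
import numpy as np

def tyler_standardize(X, iters=5000):
    """Naive alternating sketch (not necessarily Tyler's own iteration):
    row normalization picks D, column whitening picks B."""
    k, p = X.shape
    for _ in range(iters):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)    # rows -> unit vectors
        evals, evecs = np.linalg.eigh(X.T @ X)              # whiten the columns
        X = X @ evecs @ np.diag(evals ** -0.5) @ evecs.T * np.sqrt(k / p)
    return X

u = np.array([1.0, 2.0, 3.0, 5.0])             # illustrative collinear points
X = np.column_stack([u, np.ones_like(u)])      # homogeneous coordinates: k = 4, p = 2
Y = tyler_standardize(X)

assert np.allclose(np.linalg.norm(Y, axis=1), 1.0, atol=1e-4)   # rows unit vectors
assert np.allclose(Y.T @ Y, 2.0 * np.eye(2), atol=1e-8)         # Y^T Y = (k/p) I_p
M = np.abs(Y @ Y.T)                            # absolute inner product matrix
assert abs((M[0, 1:] ** 2).sum() - 1.0) < 1e-4 # m12^2 + m13^2 + m14^2 = 1
```

The final assert checks the identity used later for the spherical-triangle embedding; it follows from the two standardization conditions.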
Standardized representation of 4 collinear points
[Figure: the standardized configuration Y, shown as four pairs of antipodal points on the unit circle.]

Embedding for 4 collinear points
In this case
  Y = [c −s; c s; s c; s −c], c = cos(δ/2), s = sin(δ/2),
where 0 < δ < π/2. Then
  M = [1 C 0 S; C 1 S 0; 0 S 1 C; S 0 C 1],
where C = cos δ, S = sin δ. Note that m12² + m13² + m14² = 1, with one structural 0, so M can be represented as the edges of a spherical triangle in the unit sphere in R3.

Projective shape space for 4 collinear points as a spherical triangle
[Figure: the spherical triangle, with vertices labelled A=B, A=C, A=D, edges labelled A~B, A~C, A~D, and cross-ratio values 0, 0.5, 1, 2, −1, ±∞ marked along the edges.]

Interpretation of the spherical triangle
The position of the structural 0 in M is closely related to the ordering of the landmarks. In particular it identifies which pairs of landmarks are perpendicular in the circular film image. In our earlier picture with ordered landmarks A, B, C, D, the angles AOC (and hence also BOD) were right angles. At one end of this edge (i.e. vertex of the spherical triangle), landmarks A & B coalesce (as do landmarks C & D). At the other vertex, landmarks A & D coalesce (as do landmarks B & C).

Why corners?
Why does the spherical triangle representation of projective shape space for 4 collinear landmarks have corners? In terms of the cross ratio, τ = {(B − A)(D − C)}/{(C − A)(D − B)}, there seems to be no reason for corners. E.g. hold A < C < D fixed and let B vary through the extended real line. Then the cross ratio varies in a bijective fashion through the extended real line. If we avoid the singularity at B = D, the cross ratio is an infinitely differentiable function of B. In particular, there is no hint of a singularity as B passes through A and C. But at these points (B = A and B = C) the cross ratio takes the values 0 and 1, respectively, corresponding to two of the vertices in projective shape space. Where do these singularities (i.e. vertices or corners) come from?

The reasons for corners
(a) The first answer is that when B approaches one of the other three landmarks, e.g.
B → A, Tyler standardization forces the other two landmarks to come together as well. Thus the single-pair singularity in the simple cross-ratio description (B = A) is actually a double-pair singularity (B = A, D = C) in the Tyler-standardized description.
(b) Further, there are two distinct ways to move away from a singularity (e.g. B = A, D = C) in terms of the separation of the landmarks. On one edge (the lower edge of the spherical triangle) A is separated from C (and hence B is separated from D). On the other edge (the left edge of the spherical triangle) A is separated from D (and hence B is separated from C).
(c) The rank of the Tyler-standardized configuration Y drops from 2 to 1 at the corners.

Further ideas I: Statistical issues
It is possible to do distribution theory in some simple cases (e.g. 4 i.i.d. normally distributed landmarks on the line), but the results are complicated, the pdfs have singularities at the corners of the spherical triangle, and such models are not very realistic. A more promising approach is to look in more detail at the effect of small-scale variability about a fixed configuration/projective shape. But the pose of the object affects the distribution of projective shape.

Further ideas II: Four types of projective shape space
In many cases there is partial information about the camera: (a) oriented vs. unoriented, and (b) directional vs. axial.
(a) In an oriented camera we know the side of the scene that the camera lies on; that is, mathematically we know whether det(B) is positive or negative. Conversely, for an unoriented camera, the sign of det(B) is unknown.
(b) In a directional camera we know whether an image point lies between the focal point of the camera and the corresponding real-world point, or whether the focal point lies between the image point and the real-world point. In an axial camera this information is not available.
Mathematically, in terms of the k × k diagonal matrix D, we require di > 0 for a directional camera, and merely that di ≠ 0 for an axial camera.

Which version of projective shape space to use?
Projective geometry focuses mainly on an unoriented axial camera. However, in real life a camera is usually oriented and directional. We now illustrate these ideas for the simplest situation of k = 4 collinear points (m = 1 dimension).

Comments for 4 collinear points
Directional vs. axial: for a directional camera, the red "X"s are observed; for an axial camera, we cannot distinguish each red "X" from the opposite point on the circle.
Oriented vs. unoriented: for an oriented camera, we see the circle as given; for an unoriented camera, we cannot distinguish the circle from its reflection.
ORAL SESSION 9 Optimization on Matrix Manifolds (Silvère Bonnabel)
A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices
Martin Kleinsteuber, joint work with Hao Shen
Research Group for Geometric Optimization and Machine Learning (Fachgebiet Geometrische Optimierung und Maschinelles Lernen), www.gol.ei.tum.de
August 29, 2013

Outline
• Mixing model and problem statement
• Separating with second-order information
• The geometric setting for NUJD
• Uniqueness result for complex NUJD
• Conclusion

The complex linear BSS model
Mixing model: w(t) = As(t), A ∈ Gl(m).
• s(t) = [s1(t), . . . , sm(t)]T: an m-dimensional complex source signal
• w(t): the observed signals
• Mixing matrix A ∈ Gl(m), the set of all invertible complex (m × m) matrices
Task: recover s(t) given w(t) only, via the demixing model y(t) = XH w(t), with demixing matrix X ∈ Gl(m).

Second-Order Statistics Based ICA Approaches
Idea: use the uncorrelatedness assumption on the source signals to estimate the mixing matrix.
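The mixing/demixing model can be sanity-checked in a few lines: if the mixing matrix A were known, the demixing matrix X = A^{−H} would recover the sources exactly, since then XH w(t) = A^{−1}As(t) = s(t). A purely illustrative numpy sketch with random data (in practice A is unknown, which is why the joint diagonalization below is needed):

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 3, 100
A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))   # mixing matrix in Gl(m)
s = rng.normal(size=(m, T)) + 1j * rng.normal(size=(m, T))   # complex source signals
w = A @ s                                                    # observations w(t) = A s(t)

X = np.linalg.inv(A).conj().T      # ideal demixing matrix X = A^{-H}
y = X.conj().T @ w                 # demixing model y(t) = X^H w(t)
assert np.allclose(y, s)           # sources recovered exactly
```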
• Covariance of the observations: Cw(t) := E[w(t)wH(t)] = A Cs(t) AH, with Cs(t) := E[s(t)sH(t)]
• Cs(t) is diagonal
• For non-stationary signals: Cw(ti) ≠ Cw(tj)
• Estimate A by simultaneously diagonalizing a set of covariance matrices

• Pseudo-covariance of the observations: Rw(t) := E[w(t)wT(t)] = A Rs(t) AT
• Rs(t) is diagonal
• For non-stationary signals: Rw(ti) ≠ Rw(tj)
• Estimate A by simultaneously diagonalizing a set of pseudo-covariance matrices

• Time-delayed (pseudo-)correlation: Sw(t, τ) := E[w(t)w†(t + τ)] = A Ss(t, τ) A†.
• Ss(t, τ) is diagonal, with † = T or H
• The Sw are in general neither Hermitian nor symmetric
• Estimate A by simultaneously diagonalizing a set of time-delayed (pseudo-)correlations
• Or estimate A by combining all the second-order information

The SUT algorithm [Eriksson, Koivunen 2004]
Particular task: diagonalize Cw(t) and Rw(t) simultaneously via XH Cw(t) X and XH Rw(t) X∗.
Pseudo-code:
1. Diagonalize C = UΦUH via the SVD
2. Compute R̃ = Φ−1/2 UH R U∗ Φ−1/2
3. Diagonalize R̃ = VΨVT via the Takagi factorization
4.
Output X = UΦ−1/2V.

Non-unitary joint diagonalization (NUJD)
Problem: given a set of complex symmetric (pseudo-covariance) matrices {Ri}, i = 1, . . . , n, find X ∈ Gl(m) such that the XT Ri X are all diagonal.
• Permutation and scale ambiguity of the solutions: X is a solution ⇐⇒ XDΠ is a solution, with D diagonal and Π a permutation
• Optimization methods like to have isolated solutions; what to do?

The geometric setting for NUJD
• Let D := {D ∈ Gl(m) | D diagonal}
• Equivalence classes [X] := {XD ∈ Gl(m) | D ∈ D}
• Complex oblique projective (COP) manifold Op := {[X] | X ∈ Gl(m)}

The geometric setting for NUJD
Let f : Gl(m) → R be a function that measures simultaneous diagonality, e.g. the reconstruction error
  f(X) = Σ_{i=1}^n (1/4) ‖Ri − X−T ddiag(XT Ri X) X−1‖F².
Since f(X) = f(XD), f naturally induces a function f̂ on Op.
Idea: optimize f̂ on the COP manifold.
Benefits:
- lower-dimensional search space
- a chance to have non-degenerate minima
Slide 10/22

The geometric setting for NUJD
Op is an open and dense Riemannian submanifold of a product of CP^{m−1}; tangent spaces, geodesics and parallel transport coincide locally. It remains to consider CP^{m−1} = S_m/S^1 with S_m := {x ∈ C^m | x^H x = 1} [Absil et al., Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008].
There is no representation of CP^{m−1} in C^m. Use rank-1 projection matrices in C^{m×m}? Here: use quotient-space properties.
Slide 11/22

The geometric setting for NUJD
Identify the tangent space at [x] ∈ CP^{m−1} with the horizontal lift of the tangent space at x:
T_{[x]} CP^{m−1} = ∩_{z∈S^1} T_{xz} S_m = ∩_{z∈S^1} {h ∈ C^m | Re(h^H x z) = 0} = {h ∈ C^m | h^H x = 0}.
Slide 12/22

The geometric setting for NUJD
Result A: the geodesics in CP^{m−1} through x are given by γ(t) := e^{t(h x^H − x h^H)} x.
Result B: the parallel transport from T_{γ(0)} CP^{m−1} to T_{γ(t)} CP^{m−1} along the geodesic γ is given by τ(t) := e^{t(h x^H − x h^H)} h.
Slide 13/22

The geometric setting for NUJD
These results transfer straightforwardly to Op (use the product manifold structure). This gives all ingredients for a geometric Conjugate Gradient method for minimizing f̂.
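Results A and B can be checked numerically: h x^H − x h^H is skew-Hermitian, so its matrix exponential is unitary, the curve stays on the sphere, and the transported vector stays tangent along the curve. A small sketch (illustrative, not from the talk):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
m = 5
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x /= np.linalg.norm(x)               # a point on the sphere S_m
h = rng.standard_normal(m) + 1j * rng.standard_normal(m)
h -= (x.conj() @ h) * x              # horizontal lift: h^H x = 0

Om = np.outer(h, x.conj()) - np.outer(x, h.conj())   # h x^H - x h^H

def geodesic(t):
    """Result A: geodesic through x in direction h."""
    return expm(t * Om) @ x

def transport(t):
    """Result B: parallel transport of h along the geodesic."""
    return expm(t * Om) @ h
```

Since expm(t·Om) is unitary, ‖γ(t)‖ = 1 for all t, γ'(0) = Om x = h, and γ(t)^H τ(t) = x^H h = 0.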
Slide 14/22

Separation performance
[Figure: Amari error (0 to 0.9) of AC/DC, Off-norm CG and the proposed CG.]
AC/DC: alternating algorithm minimizing the direct fit cost function. Off-norm CG: a CG algorithm minimizing the off-norm cost function.
Slide 15/22

Uniqueness result for complex NUJD
Well known: CG has good local convergence properties if the minimum has a non-degenerate Hessian. Under what conditions on the source signals is the minimizer isolated on the COP manifold? Equivalently: under what conditions on the source signals is the diagonalizer unique (up to permutation and diagonal scaling)?
Slide 16/22

Uniqueness result for complex NUJD
Let D_k = diag(d_{k1}, ..., d_{km}) be the (diagonal) pseudo-covariances of the signals. For a fixed diagonal position i, denote by d_i := [d_{1i}, ..., d_{Ki}]^T ∈ C^K the vector consisting of the i-th diagonal entry of each D_k. E.g. for 2 × 2 matrices D_k = diag(d_{k1}, d_{k2}), k = 1, ..., K, one obtains d_1 = [d_{11}, d_{21}, ..., d_{K1}]^T and d_2 = [d_{12}, d_{22}, ..., d_{K2}]^T.
Slide 17/22

Uniqueness result for complex NUJD
Result (Kleinsteuber, Shen 2013): the simultaneous diagonalizer is unique up to permutation and scaling if and only if |c(d_i, d_j)| ≠ 1 for all i ≠ j, where
c(v, w) := v^H w / (‖v‖ ‖w‖) if v ≠ 0 and w ≠ 0, and c(v, w) := 1 otherwise.
Slide 18/22

Literature
- Absil et al.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008).
- Kleinsteuber, M., Shen, H.: Uniqueness analysis of non-unitary matrix joint diagonalization. IEEE Transactions on Signal Processing 61(7) (2013) 1786–1796.
- Shen, H., Kleinsteuber, M.: Complex blind source separation via simultaneous strong uncorrelating transform. In: Proc. 9th International Conference on Latent Variable Analysis and Signal Separation, LNCS vol. 6365, Springer-Verlag, Berlin/Heidelberg (2010) 287–294.
Slide 19/22

Conclusion
- Simultaneous diagonalization of matrices based on second-order statistics.
- The Complex Oblique Projective manifold is the appropriate geometric setting.
- Under generic conditions on the sources, the diagonalizer is isolated on the COP manifold.
Slide 20/22

Preprints and Matlab codes available at www.gol.ei.tum.de
Slide 21/22

Properties of complex signals
Second-order stationarity:
(i) E[s(t)] = E[s(t + τ)]
(ii) E[s(t1) s(t2)] = E[s(t1 + τ) s(t2 + τ)]
Circularity: s(t) and e^{iα} s(t) have the same probability distribution. This implies E[s(t)²] = 0 and motivates the circularity coefficient λ_{s(t)} := |E[s(t)²]| / E[|s(t)|²].
Slide 22/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen
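Both scalar quantities from this talk are directly computable: the collinearity coefficient c(d_i, d_j) from the uniqueness result and the circularity coefficient λ from the last slide. A sketch with sample-based estimates (illustrative, not the talk's code):

```python
import numpy as np

def c(v, w):
    """Collinearity coefficient from the uniqueness result:
    c(v, w) = v^H w / (|v| |w|), and 1 if either vector vanishes."""
    nv, nw = np.linalg.norm(v), np.linalg.norm(w)
    return (v.conj() @ w) / (nv * nw) if nv > 0 and nw > 0 else 1.0

def circularity(s):
    """Sample estimate of lambda_s = |E[s^2]| / E[|s|^2]."""
    return abs(np.mean(s**2)) / np.mean(np.abs(s)**2)

rng = np.random.default_rng(2)
n = 200_000
# Circular complex Gaussian: invariant under s -> e^{i alpha} s, so E[s^2] = 0.
s_circ = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
# A real Gaussian is maximally non-circular: E[s^2] = E[|s|^2].
s_real = rng.standard_normal(n).astype(complex)
```

Collinear vectors d_i, d_j give |c| = 1 (diagonalizer not unique); the circular signal gives λ ≈ 0 and the real signal λ = 1.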
An extrinsic look at the Riemannian Hessian
Pierre-Antoine Absil (UCLouvain), Robert Mahony (Australian National University), Jochen Trumpf (Australian National University)
GSI 2013, Paris, 29 August 2013
Easy-to-implement Newton-type optimization methods on manifolds: An extrinsic look at the Riemannian Hessian

Broader topic: easy-to-implement Newton-type methods on manifolds
Given:
◮ a manifold M, i.e. a set endowed (often implicitly) with a manifold structure (a collection of compatible charts);
◮ a function f : M → R, smooth in the sense of the manifold structure.
Task: compute a local minimizer of f. Approach: Newton-type methods on manifolds.

Some specific manifolds and related applications I
◮ Stiefel manifold St(p, n) and orthogonal group O_n = St(n, n): St(p, n) = {X ∈ R^{n×p} : X^T X = I_p}. Applications: computer vision; principal component analysis; independent component analysis...
◮ Grassmann manifold Gr(p, n): the set of all p-dimensional subspaces of R^n. Applications: various dimensionality-reduction problems...

Some specific manifolds and related applications II
◮ Low-rank symmetric PSD manifold: R^{n×p}_*/O_p ≃ {Y Y^T : Y ∈ R^{n×p}_*}, where R^{n×p}_* is the set of all full-rank n × p matrices. Applications: low-rank approximation of positive-definite matrices, e.g. for metric learning.
◮ Low-rank manifold: M_p(m × n) = {X ∈ R^{m×n} : rank(X) = p}. Applications: low-rank approximation of matrices, e.g. for recommender systems.
◮ Shape manifolds. Applications: shape analysis, e.g. for medical applications.
Some specific manifolds and related applications III
◮ Oblique manifold R^{n×p}_*/S_{diag+} ≃ {Y ∈ R^{n×p}_* : diag(Y^T Y) = I_p}. Applications: independent component analysis; factor analysis (oblique Procrustes problem)...
◮ Flag manifold R^{n×p}_*/S_{upp*}: elements of the flag manifold can be viewed as p-tuples of linear subspaces (V_1, ..., V_p) such that dim(V_i) = i and V_i ⊂ V_{i+1}. Applications: analysis of the QR algorithm...

Reminder: Newton in R^n
Required: a (smooth) real-valued function f on R^n.
Iteration x_k ∈ R^n → x_{k+1} ∈ R^n:
1. Solve the Newton equation Hess f · η_k = −∂f(x_k) for the unknown η_k ∈ T_{x_k} R^n ≃ R^n, where ∂f(x) := [∂_1 f(x), ..., ∂_n f(x)]^T and, for all z_x ∈ T_x M, Hess f · z_x := D_{z_x}(∂f).
2. Set x_{k+1} := x_k + η_k.

Newton on Riemannian submanifolds (with Levi-Civita connection)
Required: a Riemannian submanifold M of a Euclidean space E; a retraction R on M; a real-valued function f on M; an extension f̄ of f to E.
Iteration x_k ∈ M → x_{k+1} ∈ M:
1. Solve the Newton equation Hess f · η_k = −grad f(x_k) for the unknown η_k ∈ T_{x_k} M, where grad f(x) = P_x(∂f̄(x)), with P_x the orthogonal projection onto T_x M, and, for all z_x ∈ T_x M, Hess f · z_x := P_x D_{z_x}(grad f).
2.
Set x_{k+1} := R(x_k, η_k).

An extrinsic look at the Riemannian Hessian
For all z_x ∈ T_x M (dropping the subscript x to lighten the notation), we have
Hess f · z = P_x D_z(grad f) = P_x D_z(P ∂f̄) = P_x(P_x ∂²f̄(x) z + (D_z P) ∂f̄(x)) = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x).

An extrinsic look at the Riemannian Hessian
Recall: Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x).
◮ Theorem: P_x D_z P u = P_x D_z P P⊥_x u = −P_x D_z(P⊥ U) =: A_x(z, P⊥_x u), for all x ∈ M, z ∈ T_x M, u ∈ T_x E ≃ E, and every extension U of u. The symbol A_x stands for the Weingarten map of the submanifold M of the Euclidean space E.
◮ Proof: for the first equality, observe that 0 = D_z(P P⊥) = (D_z P) P⊥_x + P_x D_z P⊥ = (D_z P) P⊥_x − P_x D_z P. Multiplying by P_x on the left and using the identity P_x P_x = P_x yields P_x D_z P = P_x (D_z P) P⊥_x. For the second equality, observe that, for all extensions U of u, −P_x D_z(P⊥ U) = −P_x (D_z P⊥) U − P_x P⊥_x D_z U = −P_x (D_z P⊥) U = P_x (D_z P) U.

An extrinsic look at the Riemannian Hessian: the Stiefel manifold
Recall: Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x).
◮ The Stiefel manifold St(p, n) is the set of orthonormal p-frames in R^n: St(p, n) = {X ∈ R^{n×p} : X^T X = I_p}.
◮ The orthogonal projector P_X onto T_X St(p, n) is given by P_X U = (I − X X^T) U + X · ½(X^T U − U^T X) = U − X · ½(X^T U + U^T X).
◮ For Z ∈ T_X M and W ∈ T⊥_X M, we have P_X D_Z P W = −Z X^T W − X · ½(Z^T W + W^T Z).

An extrinsic look at the Riemannian Hessian: the Grassmann manifold
Recall: Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x).
◮ The Grassmann manifold Gr_{m,n} is the set of m-dimensional subspaces of R^n.
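To make the extrinsic formula Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x) concrete, here is a sketch on the unit sphere S^{n−1} (an example not among those treated in the slides) with f̄(x) = ½ x^T A x. There P_x = I − x x^T, D_z P = −(z x^T + x z^T), and the Weingarten term reduces to −(x^T A x) z, so the Newton iteration converges locally to an eigenvector of A.

```python
import numpy as np

def newton_sphere(A, x, iters=15):
    """Riemannian Newton for f(x) = 0.5 x^T A x on the unit sphere.

    With P = I - x x^T and rho = x^T A x, the extrinsic Hessian is
    Hess f . z = P A z - rho z on the tangent space; the second term
    is the Weingarten/curvature contribution P_x D_z P grad-bar f."""
    n = A.shape[0]
    for _ in range(iters):
        P = np.eye(n) - np.outer(x, x)
        rho = x @ A @ x
        g = P @ A @ x                       # Riemannian gradient P_x grad-bar f
        H = P @ A @ P - rho * P             # Hessian, singular along x
        eta = np.linalg.lstsq(H, -g, rcond=None)[0]  # min-norm => tangent
        x = x + eta
        x = x / np.linalg.norm(x)           # retraction back to the sphere
    return x

A = np.diag([1.0, 2.0, 10.0])
x0 = np.array([0.05, 0.05, 1.0])
x_star = newton_sphere(A, x0 / np.linalg.norm(x0))
```

Started near the dominant eigenvector, the iterate satisfies A x ≈ (x^T A x) x at convergence.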
Equivalently, the Grassmann manifold can be viewed as the set of rank-m orthogonal projectors in R^n, i.e. Gr_{m,n} = {X ∈ R^{n×n} : X^T = X, X² = X, tr X = m}.
◮ It is known that P_X = ad²_X, with ad_X A := [X, A] := XA − AX and ad²_X := ad_X ∘ ad_X.
◮ We obtain that, for all Z ∈ T_X Gr_{m,n} and all W ∈ T⊥_X Gr_{m,n}, P_X D_Z P W = −ad_X ad_W Z. One recovers herewith the Hessian formula given by Helmke et al. (arXiv:0709.2205).

An extrinsic look at the Riemannian Hessian: the fixed-rank manifold
Recall: Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x).
◮ The fixed-rank manifold M_p(m × n) is the set of all m × n matrices of rank p.
◮ The projector P_X onto T_X M_p(m × n) is given by P_X W = P_U W P_V + P⊥_U W P_V + P_U W P⊥_V = W P_V + P_U W − P_U W P_V, where P_U := U U^T and P⊥_U := I − P_U.
◮ For Z ∈ T_X M_p(m × n) and W ∈ T⊥_X M_p(m × n), we obtain P_X D_Z P W = W Z^T (X^+)^T + (X^+)^T Z^T W.

Take-home message
◮ Several reasons to optimize a real-valued function on a manifold (Stiefel manifold, Grassmann manifold, fixed-rank manifold...).
◮ Newton's method is the archetypal second-order method.
◮ Newton's method on a submanifold: solve Hess f · η_k = −grad f(x_k), then set x_{k+1} := R(x_k, η_k).
◮ Recent results for the Hessian: Hess f · z = P_x ∂²f̄(x) z + P_x D_z P ∂f̄(x), with formulas for P_x and P_x D_z P on several specific manifolds. Ref: PAA, Mahony & Trumpf, http://sites.uclouvain.be/absil/2013.01
◮ Recent results for retractions: projection-like techniques to construct R. Ref: PAA, Malick, http://sites.uclouvain.be/absil/2010.038

A freely available toolbox for optimization on manifolds: www.manopt.org

Projection-like retractions on submanifolds
Ref: PAA, Malick, http://sites.uclouvain.be/absil/2010.038
◮ A retraction on M is a smooth mapping R : TM → M such that R(x, 0_x) = x and (d/dt) R(x, tu)|_{t=0} = u, for all (x, u) ∈ TM.
◮ A retractor on a d-dimensional submanifold M of an n-dimensional Euclidean space E is a smooth mapping D : TM → Gr(n − d, E) such that, for all x ∈ M, D(x, 0) is transverse to T_x M.
◮ Define the affine space D(x, u) = x + u + D(x, u), and let R(x, u) be the point of M ∩ D(x, u) nearest to x + u. [Figure: the affine space D(x, u) through x + u intersecting M at R(x, u).]
◮ Theorem: R is a retraction on M.

Projection-like retractions: orthographic retraction
[Figure: the orthographic retraction maps x + u back to M along the normal directions at x.]

Projection-like retractions: orthographic retraction on the fixed-rank manifold
◮ Now M := M_p(m × n), the set of all m × n matrices of rank p.
◮ Let X = U [Σ₀, 0; 0, 0] V^T with Σ₀ ∈ R^{p×p} the diagonal matrix of non-zero singular values, and let Z = U [A, C; B, 0] V^T be in T_X M_p(m × n).
◮ The orthographic retraction R on M is given by
R(X, Z) = U [Σ₀+A, C; B, B(Σ₀+A)^{−1}C] V^T = U [Σ₀+A; B] [I, (Σ₀+A)^{−1}C] V^T.
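The fixed-rank formulas above (the tangent-space projector and the orthographic retraction) can be sketched with thin SVD factors. Illustrative code, not the authors' implementation; it represents X as U0 S0 V0^T and assumes Σ₀ + A is invertible:

```python
import numpy as np

def tangent_project(U0, V0, W):
    """P_X W = W P_V + P_U W - P_U W P_V onto T_X M_p(m x n),
    where P_U = U0 U0^T and P_V = V0 V0^T."""
    PU, PV = U0 @ U0.T, V0 @ V0.T
    return W @ PV + PU @ W - PU @ W @ PV

def orthographic(U0, S0, V0, Z):
    """Orthographic retraction R(X, Z) for X = U0 S0 V0^T and tangent Z,
    via the block formula R(X, Z) = U [S0+A, C; B, B (S0+A)^{-1} C] V^T."""
    A = U0.T @ Z @ V0
    M = S0 + A                            # Sigma_0 + A, assumed invertible
    B = Z @ V0 - U0 @ A                   # U_perp block, in ambient coordinates
    C = U0.T @ Z - A @ V0.T               # V_perp block, in ambient coordinates
    Minv = np.linalg.inv(M)
    return (U0 + B @ Minv) @ M @ (V0.T + Minv @ C)
```

The retraction preserves the rank p and agrees with X + Z to first order: R(X, tZ) = X + tZ + O(t²).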
Discrete curve fitting on manifolds
Nicolas Boumal, joint work with Pierre-Antoine Absil
Université catholique de Louvain, August 2013

Motivation: interpolation on SO(3)

The regression problem in R²: a balance between fitting and smoothness
Each data point p_i corresponds to a fixed time t_i. Regression is about denoising and filling the gaps.

The regression problem in R² can be seen as an optimization problem. Minimize
Ê(γ) = Σ_{i=1}^N ‖p_i − γ(t_i)‖²  (penalty on misfit)
  + λ ∫_{t1}^{tN} ‖γ̇(t)‖² dt  (penalty on velocity)
  + µ ∫_{t1}^{tN} ‖γ̈(t)‖² dt  (penalty on acceleration),
where λ and µ (≥ 0) balance fitting versus smoothness. Minimize over some curve space Γ̂; dim Γ̂ may be infinite.

We discretize the curves γ, hence reverting to finite-dimensional optimization: discrete points γ_1, ..., γ_{Nd}, each γ_i corresponding to a fixed time τ_i, with Γ = R^n × ··· × R^n ≡ R^{Nd×n}.

We thus need a new objective E defined over the new curve space Γ:
E(γ) = Σ_{i=1}^N ‖p_i − γ_{s_i}‖² + λ Σ_i α_i ‖v_i‖² + µ Σ_i β_i ‖a_i‖²,
where the integrals are replaced by weighted sums over finite-difference velocities v_i and accelerations a_i.

What if the data lies on a manifold? Manifolds are smoothly "curved" spaces. Simple toy example: the sphere S² in R³. More exciting manifolds discussed in this work: P_n^+ and SO(n).
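The discretized objective E can be written down directly in NumPy. A minimal sketch with uniform quadrature weights (the talk's α_i, β_i, taken here as a constant Δτ for simplicity):

```python
import numpy as np

def discrete_energy(gamma, data, idx, dtau, lam, mu):
    """Discrete regression objective in R^n.

    gamma : (Nd, n) discrete curve gamma_1 .. gamma_Nd
    data  : (N, n) data points p_i, matched to gamma[idx[i]]
    """
    misfit = np.sum((data - gamma[idx]) ** 2)
    v = np.diff(gamma, axis=0) / dtau            # v_i = (g_{i+1} - g_i) / dtau
    a = np.diff(gamma, 2, axis=0) / dtau ** 2    # a_i = (g_{i+1} - 2 g_i + g_{i-1}) / dtau^2
    return misfit + lam * dtau * np.sum(v ** 2) + mu * dtau * np.sum(a ** 2)
```

A straight line through the data has zero misfit and zero acceleration penalty, so only the velocity term contributes.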
The regression problem on S²

We need a few concepts from Riemannian geometry to define discrete regression on S². Redefine E over Γ = S² × ··· × S²:
E(γ) = Σ_{i=1}^N dist²(p_i, γ_{s_i}) + λ Σ_i α_i ‖v_i‖² + µ Σ_i β_i ‖a_i‖².

Finite differences are linear combinations, but S² is not a vector space. The linear combination
a_i = (γ_{i+1} − 2γ_i + γ_{i−1}) / Δτ²
can be rewritten as
a_i = ((γ_{i+1} − γ_i) + (γ_{i−1} − γ_i)) / Δτ².
Now we can interpret the terms: γ_{i+1} − γ_i is a vector rooted at γ_i and pointing toward γ_{i+1}.

Logarithms on manifolds generalize differences; we use them to define geometric finite differences. Log_a(b) is a vector rooted at a, in the tangent space to S² at a, pointing toward b. Furthermore, ‖Log_a(b)‖ = dist(a, b).
The difference b − a is replaced by Log_a(b). Hence:
v_i = Log_{γ_i}(γ_{i+1}) / Δτ,
a_i = (Log_{γ_i}(γ_{i+1}) + Log_{γ_i}(γ_{i−1})) / Δτ².

We now have a proper objective for manifolds:
E(γ) = Σ_{i=1}^N dist²(p_i, γ_{s_i})  (penalty on misfit)
  + λ Σ_{i=1}^{Nd−1} α_i ‖Log_{γ_i}(γ_{i+1}) / Δτ‖²  (penalty on velocity)
  + µ Σ_{i=2}^{Nd−1} β_i ‖(Log_{γ_i}(γ_{i+1}) + Log_{γ_i}(γ_{i−1})) / Δτ²‖²  (penalty on acceleration),
minimized over Γ = S² × ··· × S², a finite-dimensional manifold. The constraint γ ∈ Γ is tough for standard software.

To minimize E, we use Manopt, a Matlab toolbox for optimization on manifolds. Manopt is a user-friendly, documented package which gives access to:
1. a large collection of manifold descriptions;
2. a number of solvers (including Riemannian trust-regions);
3. and helper tools to get things right.
It is available at www.manopt.org.

In a nutshell:
1. We defined the discrete regression problem in R^n;
2. then generalized it to manifolds as an optimization problem on Γ;
3. and we feed it to Manopt.

Example of convergence on S² with geometric non-linear CG and iterative refinement.

It works well, but the manifold has to be "gentle": you need to compute the logarithmic map and its derivatives, and that may not always be possible. Second-order methods seem to help, but they require more work. If your manifold is not nice enough, perhaps you can make do with an approximate log map?
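The sphere's logarithmic map has a closed form, which makes the geometric finite differences above concrete. A sketch using the standard formula (illustrative, not the talk's Matlab code):

```python
import numpy as np

def sphere_log(a, b):
    """Log_a(b) on the unit sphere: tangent at a, pointing toward b,
    with ||Log_a(b)|| = dist(a, b) = arccos(<a, b>)."""
    c = np.clip(a @ b, -1.0, 1.0)
    w = b - c * a                        # component of b orthogonal to a
    nw = np.linalg.norm(w)
    return np.zeros_like(a) if nw < 1e-15 else np.arccos(c) * w / nw

def velocity(g, g_next, dtau):
    """v_i = Log_{g_i}(g_{i+1}) / dtau"""
    return sphere_log(g, g_next) / dtau

def acceleration(g_prev, g, g_next, dtau):
    """a_i = (Log_{g_i}(g_{i+1}) + Log_{g_i}(g_{i-1})) / dtau^2"""
    return (sphere_log(g, g_next) + sphere_log(g, g_prev)) / dtau ** 2
```

Three equally spaced points along a great circle (a geodesic) give zero geometric acceleration, as expected.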
Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information
Roman V. Belavkin
School of Science and Technology, Middlesex University, London NW4 4BT, UK
August 29, 2013
This work was supported by EPSRC grant EP/H031936/1.
Roman Belavkin (Middlesex University) | Shannon-Pythagorean Theorem for Quantum Information | August 29, 2013

Main Result: Shannon-Pythagorean Theorem
[Diagram: a triangle with vertices w, q ⊗ q and q ⊗ p; the edge from q ⊗ q to q ⊗ p has length I(p, q), the edge from w to q ⊗ p has length I(w, q ⊗ p), and the edge from w to q ⊗ q has length I(w, q ⊗ q).]
- w is a joint measure (state); q, p are its marginals.
- w defines T : P → P transforming q → T(q) = p.
Outline
- Duality: Observables and States
- Quantum Information Distance
- Law of Cosines and Shannon-Pythagorean Theorem
- Discussion

Duality: Observables and States
Observables x ∈ X and states y ∈ Y are paired by ⟨·, ·⟩ : X × Y → C, for example
⟨x, y⟩ := Σ x_i y_i,  ⟨x, y⟩ := ∫ x dy,  ⟨x, y⟩ := tr{xy}.
- X is a ∗-algebra with 1 ∈ X; involution (x∗z)∗ = z∗x; positive cone X₊ := {x : z∗z = x, ∃ z ∈ X}; observables are the self-adjoint elements x = x∗.
- Y is the dual of X; involution ⟨x, y∗⟩ = ⟨x∗, y⟩∗; positive cone Y₊ := {y : ⟨x, y⟩ ≥ 0, ∀ x ∈ X₊}; states are the y ≥ 0 with ⟨1, y⟩ = 1.
- The base of Y₊ is the set of all states (the statistical manifold): P(X) := {p ∈ Y₊ : ⟨1, p⟩ = 1}.
- Transposition: for every z ∈ X there is a transposed action on Y with ⟨zx, y⟩ = ⟨x, zᵀy⟩. Y is a left (resp. right) module over X ⊆ Y with respect to zᵀy (resp. (yz∗)∗).

Exponents and Logarithms
Define by the power series
e^x := Σ_{n=0}^∞ x^n/n!,  ln y := Σ_{n=1}^∞ ((−1)^{n−1}/n)(y − 1)^n.
- Group homomorphisms for commuting elements (xz = zx, yz = zy): e^{x+z} = e^x e^z and ln(yz) = ln y + ln z.
- Group homomorphisms for the tensor product ⊗ and the Kronecker sum ⊕: e^{x⊕z} = e^x ⊗ e^z and ln(y ⊗ z) = ln y ⊕ ln z.
- Because X ⊆ Y, we can consider exp : X → Y and ln : Y → X.

Quantum Information Distance
Additivity axiom (Khinchin, 1957): I(p₁ ⊗ p₂, q₁ ⊗ q₂) = I(p₁, q₁) + I(p₂, q₂).
Let F : Y → R ∪ {∞} and F∗ : X → R ∪ {∞} be dual closed convex functionals:
F∗(x) := sup{⟨x, y⟩ − F(y)},  F∗∗ = F.
The sub-differentials ∂F : Y → 2^X and ∂F∗ : X → 2^Y are inverses of each other (Moreau, 1967; Rockafellar, 1974):
∂F(y) ∋ x ⟺ y ∈ ∂F∗(x).
Example: F(y) = ⟨ln y − 1, y⟩ and F∗(x) = ⟨1, e^x⟩, with gradients ∇F(y) = ln y and ∇F∗(x) = e^x.

Additive Quantum Information Distance
Definition (I : Y × Y → R ∪ {∞}):
I(y, z) := ⟨ln y − ln z, y⟩ − ⟨1, y − z⟩,
with gradient ∇_y I(y, z) = ln y − ln z, second derivative ∇²_y I(y, z) = y^{−1}, and dual I∗(x, z) := ⟨1, e^x z⟩.
Non-commutativity: ln(e^{x+z}) = x + z iff xz = zx, so the Radon-Nikodym derivative is defined as y/z := exp(ln y − ln z) (Araki, 1975; Umegaki, 1962).
Y → R ∪ {∞}) I(y, z) := ln y − ln z, y − 1, y − z yI(y, z) = ln y − ln z 2 yI(y, z) = y−1 I∗(x, z) := 1, exz Non-commutativity ln(ex+z) = x + z iﬀ xz = zx, so that Radon-Nikodym derivative y/z: y/z := exp(ln y − ln z) (Araki, 1975; Umegaki, 1962) y/z := y1/2z−1y1/2 z−1/2yz−1/2 (Belavkin & Staszewski, 1984) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 9 / 16 Quantum Information Distance Additive Quantum Information Distance Deﬁnition (I : Y × Y → R ∪ {∞}) I(y, z) := ln y − ln z, y − 1, y − z yI(y, z) = ln y − ln z 2 yI(y, z) = y−1 I∗(x, z) := 1, exz Non-commutativity ln(ex+z) = x + z iﬀ xz = zx, so that Radon-Nikodym derivative y/z: y/z := exp(ln y − ln z) (Araki, 1975; Umegaki, 1962) y/z := y1/2z−1y1/2 z−1/2yz−1/2 (Belavkin & Staszewski, 1984) I(y, z) := I∗∗(y, z) = sup{ x, y − I∗(x, z)}, where I∗ (x, z) := 1, z1/2 ex z1/2 or I∗ (x, z) := 1, ex/2 zex/2 Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 9 / 16 Law of Cosines and Shannon-Pythagorean Theorem Duality: Observables and States Quantum Information Distance Law of Cosines and Shannon-Pythagorean Theorem Discussion Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 10 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Logarithmic Law of Cosines) For any w, y, z in Y with ﬁnite distances I(w, z) = I(w, y) + I(y, z) − ln y − ln z, y − w w z y Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 11 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Logarithmic Law of Cosines) For any w, y, z in Y with ﬁnite distances I(w, z) = I(w, y) + I(y, z) − ln y − ln z, y − w w z y Proof. 
First order Taylor expansion of I(·, z) at y: I(w, z) = I(y, z) + wI(y, z), w − y + R1(y, w) where wI(y, z) = ln y − ln z and the remainder R1(y, w) = I(w, y) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 11 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Logarithmic Law of Cosines) For any w, y, z in Y with ﬁnite distances I(w, z) = I(w, y) + I(y, z) − ln y − ln z, y − w w z y Proof. First order Taylor expansion of I(·, z) at y: I(w, z) = I(y, z) + wI(y, z), w − y + R1(y, w) where wI(y, z) = ln y − ln z and the remainder R1(y, w) = I(w, y) Corollary (Log-Pythagorean Theorem) If ln y − ln z, y − w = 0, then I(w, z) = I(w, y) + I(y, z) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 11 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Inequality for Information) I(y, z) ≥ 1, (y − z)2 2 max{ y ∞, z ∞} Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 12 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Inequality for Information) I(y, z) ≥ 1, (y − z)2 2 max{ y ∞, z ∞} Proof. Recall that I(y, z) is the remainder R1(z, y) in Taylor expansion: I(y, w) = I(z, w) + yI(z, w), y − z + R1(z, y) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 12 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Inequality for Information) I(y, z) ≥ 1, (y − z)2 2 max{ y ∞, z ∞} Proof. Recall that I(y, z) is the remainder R1(z, y) in Taylor expansion: I(y, w) = I(z, w) + yI(z, w), y − z + R1(z, y) R1(z, y) = 1 0 (1 − t) 1, 2 yI(z + t(y − z), w)(y − z)2 dt Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 12 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Inequality for Information) I(y, z) ≥ 1, (y − z)2 2 max{ y ∞, z ∞} Proof. 
Recall that I(y, z) is the remainder R1(z, y) in Taylor expansion: I(y, w) = I(z, w) + yI(z, w), y − z + R1(z, y) R1(z, y) = 1 0 (1 − t) 1, 2 yI(z + t(y − z), w)(y − z)2 dt = 1 2 1, 2 yI(ξ, w)(y − z)2 for some ξ ∈ [z, y) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 12 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Inequality for Information) I(y, z) ≥ 1, (y − z)2 2 max{ y ∞, z ∞} Proof. Recall that I(y, z) is the remainder R1(z, y) in Taylor expansion: I(y, w) = I(z, w) + yI(z, w), y − z + R1(z, y) R1(z, y) = 1 0 (1 − t) 1, 2 yI(z + t(y − z), w)(y − z)2 dt = 1 2 1, 2 yI(ξ, w)(y − z)2 for some ξ ∈ [z, y) Corollary (Stratonovich, 1975) I(p, q) + I(q, p) ≥ 1, (p − q)2 for all p, q ∈ P(X). Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 12 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then 
I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Proof. I(w, q⊗q) = I(w, q⊗p)+I(q ⊗ p, q ⊗ q) I(p,q) − ln q ⊗ p − ln q ⊗ q, q ⊗ p − w 0 Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Proof. I(w, q⊗q) = I(w, q⊗p)+I(q ⊗ p, q ⊗ q) I(p,q) − ln q ⊗ p − ln q ⊗ q, q ⊗ p − w 0 ln q ⊗ p − ln q ⊗ q = 1A ⊗ (ln p − ln q) Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Proof. I(w, q⊗q) = I(w, q⊗p)+I(q ⊗ p, q ⊗ q) I(p,q) − ln q ⊗ p − ln q ⊗ q, q ⊗ p − w 0 ln q ⊗ p − ln q ⊗ q = 1A ⊗ (ln p − ln q) B b → 1A ⊗ b ∈ A ⊗ B, where B = ln p − ln q Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Law of Cosines and Shannon-Pythagorean Theorem Theorem (Shannon-Pythagorean) w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), A ⊆ B If q = 1, w B, p = 1, w A, then I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Proof. 
I(w, q⊗q) = I(w, q⊗p)+I(q ⊗ p, q ⊗ q) I(p,q) − ln q ⊗ p − ln q ⊗ q, q ⊗ p − w 0 ln q ⊗ p − ln q ⊗ q = 1A ⊗ (ln p − ln q) B b → 1A ⊗ b ∈ A ⊗ B, where B = ln p − ln q b, p = 1A ⊗ b, w = 1A ⊗ b, z ⊗ p , as p = 1, w A = 1, z ⊗ p A Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 13 / 16 Discussion Duality: Observables and States Quantum Information Distance Law of Cosines and Shannon-Pythagorean Theorem Discussion Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 14 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 I(w, q ⊗ p) capacity of T w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 I(w, q ⊗ p) capacity of T I(w, q ⊗ q) hypotenuse of T w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum 
Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 I(w, q ⊗ p) capacity of T I(w, q ⊗ q) hypotenuse of T w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Information-Theoretic Variational Problems Type I Maximize Ep{u} = u, p subject to I(p, q) ≤ λ Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 I(w, q ⊗ p) capacity of T I(w, q ⊗ q) hypotenuse of T w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Information-Theoretic Variational Problems Type I Maximize Ep{u} = u, p subject to I(p, q) ≤ λ Type III Maximize Ew{v} = v, w subject to I(w, q ⊗ p)} ≤ γ Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 Discussion Applications to Optimisation of Dynamical Systems w ∈ P(A ⊗ B) deﬁnes a channel (Markov morphism or operation): T : P(A) q → Tq = p ∈ P(B) I(p(t), q) = I(Ttq, q) divergence in t ∈ N0 I(w, q ⊗ p) capacity of T I(w, q ⊗ q) hypotenuse of T w I(w,q⊗p) q ⊗ q I(p,q) // I(w,q⊗q) :: q ⊗ p Information-Theoretic Variational Problems Type I Maximize Ep{u} = u, p subject to I(p, q) ≤ λ Type III Maximize Ew{v} = v, w subject to I(w, q ⊗ p)} ≤ γ I+III=IV I(w, q ⊗ q) ≤ γ + λ Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 15 / 16 References Duality: Observables and States Quantum Information Distance Law of Cosines and Shannon-Pythagorean Theorem Discussion Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 16 / 16 Discussion Araki, H. (1975). 
Relative entropy of states of von Neumann algebras. Publications of the Research Institute for Mathematical Sciences, 11(3), 809–833. Belavkin, V. P., & Staszewski, P. (1984). Relative entropy in C∗-algebraic statistical mechanics. Reports in Mathematical Physics, 20, 373–384. Khinchin, A. I. (1957). Mathematical foundations of information theory. New York: Dover. Moreau, J.-J. (1967). Functionelles convexes. Paris: Coll´ege de France. Rockafellar, R. T. (1974). Conjugate duality and optimization (Vol. 16). PA: Society for Industrial and Applied Mathematics. Stratonovich, R. L. (1975). Information theory. Moscow, USSR: Sovetskoe Radio. (In Russian) Umegaki, H. (1962). Conditional expectation in an operator algebra. IV. entropy and information. Kodai Mathematical Seminar Reports, 14(2), 59–85. Roman Belavkin (Middlesex University) Shannon-Pythagorean Theorem for Quantum Information August 29, 2013 16 / 16
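In the commutative (diagonal) case the distance I reduces to the classical Kullback-Leibler divergence, and both the additivity axiom and the Shannon-Pythagorean identity can be checked numerically. A minimal sketch (plain Python; probability vectors stand in for commuting states, and the particular weights are illustrative assumptions):

```python
import math

def info_distance(y, z):
    """Additive information distance I(y, z) = <ln y - ln z, y> - <1, y - z>,
    restricted to commuting (diagonal) states given as vectors of positive
    weights; for normalised states the second term vanishes and I is the
    Kullback-Leibler divergence."""
    return sum(yi * (math.log(yi) - math.log(zi)) - (yi - zi)
               for yi, zi in zip(y, z))

def tensor(p, q):
    """Tensor product of two diagonal states = outer product, flattened."""
    return [pi * qj for pi in p for qj in q]

# Additivity axiom (Khinchin, 1957): I(p1 x p2, q1 x q2) = I(p1,q1) + I(p2,q2)
p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]
lhs = info_distance(tensor(p1, p2), tensor(q1, q2))
rhs = info_distance(p1, q1) + info_distance(p2, q2)
print(abs(lhs - rhs) < 1e-12)  # True

# Shannon-Pythagorean identity for a joint state w on A x B with marginals
# q on A and p on B, and a reference state r on B:
w = [[0.1, 0.2, 0.1], [0.3, 0.1, 0.2]]
q = [sum(row) for row in w]                       # marginal on A
p = [sum(row[b] for row in w) for b in range(3)]  # marginal on B
w_flat = [x for row in w for x in row]
r = [1/3, 1/3, 1/3]
lhs2 = info_distance(w_flat, tensor(q, r))
rhs2 = info_distance(w_flat, tensor(q, p)) + info_distance(p, r)
print(abs(lhs2 - rhs2) < 1e-12)  # True
```

The second check uses that subtracting I(w, q ⊗ p) from I(w, q ⊗ r) leaves exactly the divergence of the B-marginal p from r, which is the commutative content of the theorem.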
Some remarks on the intrinsic Cramér-Rao bound
GSI 2013, Paris. Axel Barrau and Silvère Bonnabel, Mines ParisTech, August 29th.

Introduction
Problem: estimate a covariance matrix Σ given a sample of X ∼ N(0, Σ) (sample covariance matrix estimation). S. T. Smith has proven¹ that, for the natural distance d on the cone of covariance matrices, the intrinsic Cramér-Rao lower bound on E(d²(Σ, Σ̂)) is a constant, independent of Σ. This result can be seen as: a consequence of information geometry; a consequence of the invariances of the problem.
¹ S. T. Smith, "Covariance, Subspace, and Intrinsic Cramér-Rao Bounds", IEEE Trans. Signal Process., vol. 53, no. 5, May 2005.

Outline: 1. Cramér-Rao bound in classical estimation theory; 2. Intrinsic Cramér-Rao bound; 3. Invariant parametric families.

Cramér-Rao bound in classical estimation theory
Parametric family of densities: p(x|θ), θ ∈ ℝⁿ. Classical Fisher information matrix:
I_{i,j}(θ) = E(∂/∂θᵢ log p(x|θ) · ∂/∂θⱼ log p(x|θ)).
Cramér-Rao lower bound for unbiased estimators: Var(θ̂) ⪰ I⁻¹(θ).
Definition. The Fisher metric is the Riemannian metric defined by the local scalar product dθ⊤ I(θ) dθ. The Fisher distance is the geodesic distance associated with the Fisher metric.
Examples. Location parameter, p(x|θ) = f(x − θ): the Fisher distance is proportional to the Euclidean distance. Scale parameter, p(x|θ) = (1/θ) f(x/θ): the Fisher distance is proportional to d(θ₁, θ₂) = ‖log θ₁ − log θ₂‖.
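The scale-parameter example can be checked numerically: for the exponential scale family p(t|θ) = (1/θ) exp(−t/θ), the Fisher information is I(θ) = 1/θ², so the Fisher metric is (dθ/θ)² and the Fisher distance is |log θ₁ − log θ₂|. A small Monte-Carlo sketch (the sample size and the chosen θ = 2 are illustrative assumptions):

```python
import math, random

random.seed(0)
theta = 2.0
N = 200_000

def score(t, theta):
    # log p(t|theta) = -log(theta) - t/theta, so
    # d/dtheta log p = -1/theta + t/theta**2
    return -1.0 / theta + t / theta**2

# Monte-Carlo estimate of I(theta) = E[score**2]
samples = [random.expovariate(1.0 / theta) for _ in range(N)]
I_hat = sum(score(t, theta)**2 for t in samples) / N
print(I_hat)  # close to 1/theta**2 = 0.25

def fisher_distance(t1, t2):
    """Geodesic distance for the metric ds = dtheta/theta."""
    return abs(math.log(t1) - math.log(t2))
```

For instance, fisher_distance(1.0, math.e) equals 1, whatever units the scale is expressed in.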
Intrinsic Cramér-Rao bound
Normal coordinates: θ lives on a manifold M endowed with a Riemannian metric g_θ. An orthonormal basis X₁, …, Xₙ of the tangent plane defines a set of local coordinates through (a₁, …, aₙ) ↦ exp_θ(a₁X₁ + … + aₙXₙ), in which g_θ becomes the Euclidean scalar product.
Basic statistical tools². The exponential coordinates map M to its tangent plane at θ. Bias of an estimator θ̂: b(θ) = E(exp_θ⁻¹(θ̂)). Covariance of an estimator θ̂: C(θ) = Cov(exp_θ⁻¹(θ̂)).
² X. Pennec, "Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements", Journal of Mathematical Imaging and Vision, 25:127-164, 2006.
Examples: estimation of a covariance matrix in statistics; subspace estimation in signal processing; pose estimation in robotics.
The intrinsic Fisher information matrix is defined using the local coordinates: I_{i,j}(θ) = E(∂/∂θᵢ log p(x|θ) · ∂/∂θⱼ log p(x|θ)). Intrinsic Cramér-Rao lower bound without bias: C(θ) ⪰ I(θ)⁻¹ + curvature terms.
Intrinsic root mean square error. Let d(·, ·) denote the Riemannian distance on M. Definition:
ε²_θ = E(d(θ, θ̂)²) (= E(‖exp_θ⁻¹(θ̂)‖²) = E(Tr(exp_θ⁻¹(θ̂) exp_θ⁻¹(θ̂)⊤)) = Tr[C(θ)]).
If d is the Fisher distance, then I(θ) = Id; neglecting the curvature terms, C(θ) ⪰ I(θ)⁻¹ = Id gives ε²_θ = Tr(C(θ)) ≥ n.
Application. Sample covariance matrix estimation: p(x|Σ) = N(0, Σ). The Fisher metric is the natural metric G_Σ(D, D) = Tr((DΣ⁻¹)²). As proved by Smith, ε²_Σ ≥ n(n + 1)/2, which does not depend on Σ.
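The independence of the intrinsic error from Σ can be seen numerically already in the scalar case n = 1 (an assumption made here for brevity): for the natural, affine-invariant distance d(a, b) = |log(b/a)| on the cone of 1×1 covariance "matrices", the intrinsic error of the sample variance is the same for every true variance, since s²/σ² has a fixed (chi-squared) law.

```python
import math, random

def intrinsic_mse(sigma2, trials=20_000, N=20, seed=0):
    """Monte-Carlo estimate of E[d^2(sigma2, s2)] for the natural distance
    d(a, b) = |log(b/a)|, with s2 the sample variance of N draws from
    N(0, sigma2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s2 = sum(rng.gauss(0.0, math.sqrt(sigma2))**2 for _ in range(N)) / N
        total += math.log(s2 / sigma2)**2
    return total / trials

e1 = intrinsic_mse(1.0, seed=1)
e2 = intrinsic_mse(25.0, seed=2)
print(e1, e2)   # nearly equal despite very different true variances
```

The trial counts and seeds are arbitrary; only the closeness of the two estimates matters.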
Invariant parametric families
Consider a parametric family p(x|θ). Assume there exist two actions of a group G: (g, x) ↦ φ_g(x), an action of G on X, and (g, θ) ↦ ρ_g(θ), an action of G on M.
Definition. The family is invariant under the action of G if, whenever x has density p_x(·|θ), then y = φ_g(x) has density p_y(·|ρ_g(θ)).
Example: radioactive decay. Law: p(t|θ) = (1/θ) exp(−t/θ). This law has to be insensitive to a change of units (for instance from minutes to seconds): θ ↦ Θ = 60θ and t ↦ T = 60t, and indeed T has density p_T(T|Θ) = (1/Θ) exp(−T/Θ), so the family is invariant under rescaling.
Properties of invariant families.
Proposition. If p(x|θ) is invariant under the actions ρ_g and φ_g of a group G and if ρ_g is transitive, then for all (θ, g) the estimation problem at ρ_g(θ) is equivalent to the estimation problem at θ.
Corollary. If p(x|θ) is invariant under the actions ρ_g and φ_g of a group G and if ρ_g is transitive, then the Cramér-Rao bound on the mean square error associated with any G-invariant metric on M is constant.
Examples. Wahba's problem: estimate a rotation R from noisy measurements Yᵢ = R⊤bᵢ + Wᵢ; for any right-invariant metric, ε²_R = const. Sample covariance matrix estimation: p(x|Σ) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−½ x⊤Σ⁻¹x); the family is invariant under the action of GLₙ(ℝ), ρ_A(Σ) = AΣA⊤, and as the natural metric of the cone of covariance matrices has the same invariance, ε²_Σ = const.
Conclusions. The constant lower bound found by Smith has two interpretations: it is a general property of the Fisher metric; it is a consequence of the invariances of the problem. Further result: an optimal estimator respects the invariances of the system.
Questions?
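The closing remark, that an optimal estimator respects the invariances of the system, can be seen directly in the radioactive-decay example: the maximum-likelihood estimator of θ in p(t|θ) = (1/θ) exp(−t/θ) is the sample mean, and it transforms exactly as θ under a change of units. A tiny sketch (the decay times below are made-up illustrative data):

```python
# MLE for the exponential scale family is the sample mean; it is
# equivariant under the unit change t -> 60 t, theta -> 60 theta.
data_min = [0.5, 1.25, 2.25, 1.0]                # decay times in minutes
theta_hat = sum(data_min) / len(data_min)        # MLE in minutes
data_sec = [60.0 * t for t in data_min]          # same data in seconds
theta_hat_sec = sum(data_sec) / len(data_sec)    # MLE in seconds
print(theta_hat_sec == 60.0 * theta_hat)  # True: the estimate follows rho_g
```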
ORAL SESSION 10 Optimal Transport Theory (Gabriel Peyré)
A primal-dual approach for a total variation Wasserstein flow
Martin Benning¹, Luca Calatroni², Bertram Düring³, Carola-Bibiane Schönlieb⁴
¹ Magnetic Resonance Research Centre, University of Cambridge, UK; ² Cambridge Centre for Analysis, University of Cambridge, UK; ³ Department of Mathematics, University of Sussex, UK; ⁴ Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK
Geometric Science of Information 2013, École des Mines, Paris, 28-30 August 2013.

Outline: 1. The problem; 2. Primal-dual formulation of the problem (a relaxed optimality system of PDEs; the numerical approach); 3. Numerical results (the 1-D case; the 2-D case with applications to denoising); 4. Conclusions.

The problem: a highly nonlinear fourth-order PDE
For a regular domain Ω ⊂ ℝᵈ, d = 1, 2, we consider:
u_t = ∇·(u ∇q), q ∈ ∂|Du|(Ω), in Ω × (0, T),
u(0, x) = u₀(x) ≥ 0 in Ω,
where ∫_Ω u₀ dx = 1 and the total variation of u over Ω is defined as
|Du|(Ω) = sup { ∫_Ω u ∇·p dx : p ∈ C₀^∞(Ω; ℝᵈ), ‖p‖_∞ ≤ 1 }.
Subgradients of TV can be characterised such that
q ∈ ∂|Du|(Ω) ⇒ q = −∇·(∇u/|∇u|) if |∇u| ≠ 0,
which makes the problem above a nonlinear fourth-order PDE with severe restrictions and constraints for its numerical solution.
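On a one-dimensional grid the total variation defined above reduces to the sum of absolute jumps of the signal; a minimal discrete sketch:

```python
def tv_1d(u):
    """Discrete total variation of a 1-D signal (sum of absolute forward
    differences), the grid analogue of |Du|(Omega)."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

print(tv_1d([0, 0, 1, 1, 0]))  # 2: one unit jump up and one jump down
```

The value is insensitive to where the jumps sit, which is why TV preserves sharp edges while penalising oscillation.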
An L²-Wasserstein flow for density smoothing
An equivalent problem has been investigated by Burger, Franek, Schönlieb (2012). Therein, a smoothed version u of a given probability density u₀ was computed as a minimiser of
½ W₂(u₀Lᵈ, uLᵈ)² [the L²-Wasserstein distance] + α E(u) [the smoothing term]
for different choices of E(u) (Dirichlet energy, log-entropy, Fisher information, total variation, …); e.g. u₀ could be a noisy MRI image or represent some real-world data (earthquake or fire measurements).
Previous work in imaging by means of the Wasserstein distance: S. Haker, L. Zhu and A. Tannenbaum (2004) for image registration; G. Peyré et al. (2013) for image color transfer; X. Bresson, T. Chan et al. (2009) for image segmentation; L. P. S. Demers et al. (2010) for particle image velocimetry.

The L²-Wasserstein metric
Let (Ω, d) be a metric space. The L²-Wasserstein distance between two probability measures μ¹, μ² ∈ P₂(Ω) (the space of all probability measures on Ω with finite second moment) is defined by
W₂(μ¹, μ²)² := min_{Π ∈ Γ(μ¹, μ²)} ∫_{Ω×Ω} d(x, y)² dΠ(x, y).
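In one dimension the optimal coupling Π is the monotone rearrangement, so for two empirical measures with equally many atoms W₂ can be computed simply by sorting. A small sketch (uniform atom weights are an assumption made here for simplicity):

```python
def w2_empirical(xs, ys):
    """L2-Wasserstein distance between two empirical measures with the same
    number of equally weighted atoms: in 1-D the optimal coupling pairs the
    sorted samples (monotone rearrangement)."""
    assert len(xs) == len(ys)
    cost = sum((x - y)**2 for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** 0.5

# translating a measure by c moves it a Wasserstein distance of exactly |c|
print(w2_empirical([0.0, 1.0, 4.0], [1.0, 2.0, 5.0]))  # 1.0
```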
Here Γ(μ¹, μ²) denotes the space of pairings γ ∈ P(Ω × Ω) such that μ¹ is the first marginal of γ and μ² is the second marginal of γ. The definition can be extended to p-th Wasserstein distances.

Why TV-Wasserstein?
Compared to smoother regularisers, TV is capable of preserving discontinuities and structures when regularising densities (Rudin, Osher, Fatemi '92). Interest in image processing: discontinuities are the edges of the image, i.e. characteristic features in many imaging applications (bone density and brain images, …). The combination of TV and the Wasserstein fidelity term gives: mass conservation (u₀ an initial probability measure implies a regularised probability density u), and a higher-order smoothing that reduces TV artifacts.

The minimisation problem and our PDE
The problem ½ W₂(u₀Lᵈ, uLᵈ)² + α E(u) has to be interpreted as a time-discrete approximation of a solution of the gradient flow of E with respect to the L²-Wasserstein metric: it represents one timestep of De Giorgi's minimising movement scheme. Solving
u^{k+1} := argmin_u { ½ W₂(u^k Lᵈ, uLᵈ)² + (t_{k+1} − t_k) E(u) }
provides an iterative approach (the JKO scheme) to approximately solve diffusion equations of the type u_t = ∇·(u ∇E'(u)); for E(u) = |Du|(Ω) this reads u_t ∈ ∇·(u ∇∂|Du|(Ω)), which is our PDE.

Previous results and our goal
In their work, Burger, Franek, Schönlieb have shown: existence results (by standard techniques in the calculus of variations); self-similarity properties of the solutions; numerical results, namely augmented Lagrangian schemes solving the minimisation problem (for a fixed α, this means computing one timestep of the minimising movement scheme). We want to study the dynamics of the corresponding gradient flow (multiple timesteps), finding a numerical scheme providing its discrete approximation.
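The structure of one minimising-movement step can be illustrated with the Euclidean distance standing in for W₂ (an assumption that removes the transport problem but keeps De Giorgi's scheme intact): the step then becomes exactly implicit Euler for the gradient flow of E.

```python
def minimizing_movement_step(x_prev, dt, E, grid):
    """One De Giorgi step: minimise 0.5*|x - x_prev|**2 + dt*E(x) over a
    grid.  With the Euclidean metric in place of the Wasserstein metric
    this is implicit Euler for x' = -E'(x)."""
    return min(grid, key=lambda x: 0.5 * (x - x_prev)**2 + dt * E(x))

E = lambda x: 0.5 * x * x                      # toy energy, flow x' = -x
grid = [i / 1000.0 for i in range(-2000, 2001)]
x = 1.0
for _ in range(5):
    x = minimizing_movement_step(x, 0.5, E, grid)
print(x)   # close to (1/1.5)**5, the implicit Euler trajectory
```

Replacing the quadratic distance by W₂ and E by the total variation gives (conceptually) one step of the flow studied in the talk.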
Primal-dual formulation: an alternative approach
The original equation we consider is u_t = ∇·(u ∇q), q ∈ ∂|Du|(Ω). Can we formulate the problem differently? By definition of the subdifferential,
q ∈ ∂|Du|(Ω) ⟺ |Du|(Ω) − ∫_Ω qu dx ≤ |Dv|(Ω) − ∫_Ω qv dx for all v ∈ L²(Ω).
So, if u ∈ BV(Ω) ⊂ L²(Ω) is the solution of
min_{u ∈ BV(Ω)} |Du|(Ω) − ∫_Ω qu dx = min_{u ∈ BV(Ω)} sup_{p ∈ C₀^∞(Ω; ℝ²), ‖p‖_∞ ≤ 1} ∫_Ω u ∇·p dx − ∫_Ω qu dx,
then q ∈ ∂|Du|(Ω). Replacing the hard constraint ‖p‖_∞ ≤ 1 by a penalty yields
min_{u ∈ BV(Ω)} sup_{p ∈ C₀^∞(Ω; ℝ²)} ∫_Ω u ∇·p dx − (1/ε) F(|p| − 1) − ∫_Ω qu dx,
where 0 < ε ≪ 1 measures the weight of the penalisation (Benning, Müller).

The relaxed problem
Merging the original equation we started from with the optimality conditions with respect to u and p, we get:
u_t = ∇·(u ∇q),  q = ∇·p,  0 = −∇u − (1/ε) F'(|p| − 1),
and the nonlinearity is now encoded in the penalty term F. A typical choice for F is
F(|p| − 1) = ½ max{|p| − 1, 0}²,  F'(|p| − 1) = 1_{{|p| ≥ 1}} sgn(p)(|p| − 1).
We can now linearise F' via its first-order Taylor approximation.
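The effect of the penalty (1/ε)F can be worked out exactly in a scalar reduction (a hypothetical one-dimensional stand-in for the full problem): maximising g·p − (1/ε)·½ max{|p| − 1, 0}² over p ∈ ℝ gives |p| = 1 + ε|g|, so the penalised dual variable overshoots the unit ball only by O(ε) and the constraint ‖p‖_∞ ≤ 1 is recovered as ε → 0.

```python
def optimal_p(g, eps):
    """Maximiser of g*p - (1/eps)*0.5*max(|p| - 1, 0)**2 over scalar p.
    Setting the derivative to zero gives |p| = 1 + eps*|g|."""
    if g == 0:
        return 0.0
    return (1.0 + eps * abs(g)) * (1.0 if g > 0 else -1.0)

for eps in (1.0, 0.1, 0.001):
    print(eps, optimal_p(3.0, eps))   # |p| shrinks towards 1 as eps -> 0
```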
A damped Newton method to solve the system

We discretise the differential operators and compute the numerical approximation of the solution u using the following scheme:
(U^(k)_{n+1} − U_n)/Δt = ∇·(U_n ∇Q^(k)_{n+1}),
Q^(k)_{n+1} = ∇·P^(k)_{n+1},
0 = −∇U^(k)_{n+1} − (1/ε) F′(P^(k−1)_{n+1}) − (1/ε) F″(P^(k−1)_{n+1}) (P^(k)_{n+1} − P^(k−1)_{n+1}) − τ_k (P^(k)_{n+1} − P^(k−1)_{n+1}).
Outer iterations (n subscripts) for the time evolution;
Inner process (k superscripts) producing approximations of U_{n+1}, Q_{n+1} and P_{n+1} via a Newton method;
The damping sequence τ_k guarantees the invertibility of the operators defining the system: it starts from a large τ₀ and decreases to ensure quick convergence.

Numerical ingredients
1 Discretisation of the differential operators using forward differences (for ∇) and backward differences (for ∇·), thus preserving adjointness;
2 Neumann boundary conditions;
3 Computational domains: closed and bounded (cartesian products of) intervals;
4 The matrix defining the linear system in each Newton step has block structure ⇒ numerical inversion of the operators using the Schur complement;
5 Stopping criterion for the inner Newton loop:
‖U^(k)_{n+1} − U^(k−1)_{n+1}‖₂ / ‖U^(k)_{n+1}‖₂ ≤ tol.

Some 1-D examples

We compare the TV-Wasserstein approach with the standard TV one:

Figure: Solutions for TV and TV-Wasserstein flows, showing for each test the initial condition, the TV result and the TV-Wasserstein result. (a) Gaussian initial condition; (b) χ_[a,b] initial condition; (c) staircase initial condition. ε = 10⁻⁵, τ₀ = 1.
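The adjointness-preserving discretisation mentioned in the numerical ingredients (forward differences for ∇, backward differences for ∇·) can be checked numerically; this 1-D sketch with Neumann boundaries is our own illustration, not the authors' code:

```python
import random

def grad(u):
    # forward differences; Neumann boundary: last entry has zero gradient
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]

def div(p):
    # backward differences, defined as the negative adjoint of grad above
    n = len(p)
    return [p[0]] + [p[i] - p[i - 1] for i in range(1, n - 1)] + [-p[n - 2]]

# adjointness check: <grad u, p> == -<u, div p> for arbitrary u, p
u = [random.random() for _ in range(16)]
p = [random.random() for _ in range(16)]
lhs = sum(g * q for g, q in zip(grad(u), p))
rhs = -sum(a * b for a, b in zip(u, div(p)))
```

This exact discrete adjointness is what makes the primal and dual updates of the scheme consistent with each other.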
Features:
Similar to TV: preservation of structure (i.e. discontinuities);
Different from TV:
* decreasing intensity ↔ enlarging support (because of the mass conservation);
* constant background, unlike TV solutions, which converge to their mean.

2-D results

Solution of the TV-Wasserstein flow:

Figure: (a) initial condition; (b) TV result; (c) TV-Wasserstein result.

The intensity of the square decreases, but the intensity of the background stays constant (different from TV!);
As the intensity of the square decreases, the support enlarges due to the mass conservation property.
2-D results: applications to denoising

Solution of the TV-Wasserstein flow:

Figure: (a) original pyramid; (b) noisy pyramid; (c) TV-Wasserstein result; (d) TV result.

Reduced staircasing!

2-D results: applications to denoising (cont.)

Solution of the TV-Wasserstein flow for real-world images:

Figure: (a) noisy LEGO image; (b) TV result; (c) TV-Wasserstein result.

Applications in MRI: the images of interest are densities restored from undersampled measurements and/or corrupted by noise or blur.
Recap and future directions

Tackling directly the non-smoothness of the higher-order TV-subgradients and relaxing via a penalty term leads to a system of nonlinear PDEs;
The numerical solution is computed efficiently by a nested damped Newton method that approximates the solution in each time iteration;
The results preserve the mass-conservation property and perform well in density smoothing (e.g. denoising in imaging), reducing artifacts compared to lower-order models;
Q1 Rigorous analysis of the scheme? Barrier term? Stability properties?
Q2 From the analysis of the 1-D case, more insights into the theory underlying the TV-Wasserstein gradient flow (joint work with M. Burger, D. Matthes).

Thanks for listening!
e-mail: l.calatroni@maths.cam.ac.uk
Dual Methods for Optimal Transport
Quentin Mérigot
Geometric Science of Information 2013, Paris, August 28, 2013
LJK / CNRS / Université de Grenoble

Computational optimal transport
- For α_i, β_j = 1: Hungarian algorithm, linear programming.
- General α_i, β_j: Bertsekas' auction algorithm.
- Source with density, finite target: Aurenhammer-Hoffmann-Aronov '98, Oliker-Prussner '89, Caffarelli-Kochengin-Oliker '04, Kitagawa '12.
- Source and target with density: Benamou-Brenier '00, Loeper-Rapetti '05, Angenent-Haker-Tannenbaum '03, Benamou-Froese-Oberman '12.

Optimal transport: Monge's problem
µ = probability measure on X with density; ν = probability measure on a finite set Y.
Monge problem: T_c(µ, ν) := min{ C_c(T) ; T_#µ = ν }.
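For the first case above (all α_i, β_j = 1, i.e. the assignment problem tackled by the Hungarian algorithm or linear programming), a tiny brute-force sketch of ours makes the definition concrete; enumerating permutations is of course only feasible for very small n:

```python
import itertools

def assignment_cost(C):
    """Minimum-cost assignment (Monge problem with unit masses):
    min over permutations sigma of sum_i C[i][sigma[i]].
    Brute force stands in for the Hungarian algorithm here."""
    n = len(C)
    return min(sum(C[i][sigma[i]] for i in range(n))
               for sigma in itertools.permutations(range(n)))

# transporting points x = (0, 1) to y = (0, 2) with cost |x - y|:
cost = assignment_cost([[0, 2], [1, 1]])  # best plan: 0 -> 0 and 1 -> 2
```

Real solvers replace the factorial enumeration with the O(n³) Hungarian algorithm or an auction scheme, but the optimum they compute is the same quantity.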
The Tangent Earth Mover's Distance
Ofir Pele (Ariel University), Ben Taskar (University of Washington)

Motivation
• Computer Vision / Machine Learning problems with distances:
– Image retrieval.
– Descriptors matching.
– K-nearest neighbor classification.
– Support vector machines.
– Clustering.
– …

Outline
• Motivation.
• The Earth Mover's Distance.
• The Tangent Distance.
• The Tangent Earth Mover's Distance.
• Experimental Results.
• Future Work.

The Earth Mover's Distance - Rubner, Tomasi, Guibas IJCV 2000
• Pele and Werman 08 – ÊMD, a new EMD definition.

Differences between ÊMD and EMD
• EMD is scale invariant; ÊMD is scale variant.
• EMD allows partial matching; ÊMD not necessarily.
• If the ground distance is a metric: …

The Earth Mover's Distance
• EMD shortcoming: it does not differentiate between a global transformation and local non-structured ones.
• Our solution: the Tangent Earth Mover's Distance.
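As a sanity-check sketch (ours, not the paper's code): for 1-D histograms of equal total mass with the standard |i − j| ground distance, the EMD has a well-known closed form as the L1 distance between cumulative sums:

```python
def emd_1d(p, q):
    # EMD between equal-mass 1-D histograms with ground distance |i - j|:
    # accumulate the running surplus and pay |surplus| at each bin boundary
    assert abs(sum(p) - sum(q)) < 1e-9, "EMD needs equal total mass"
    surplus, cost = 0.0, 0.0
    for pi, qi in zip(p, q):
        surplus += pi - qi
        cost += abs(surplus)
    return cost
```

Moving one unit of mass one bin over costs exactly 1; moving two units two bins over costs 4.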
The Tangent Distance – Simard et al. 98
• What we want: a distance that is invariant to small global transformations.
• Main idea: approximate transforms of a pattern by its tangent plane at the pattern.
• Tangent distance shortcoming: not robust to small local deformations.
• Our solution: the Tangent Earth Mover's Distance.

The Tangent Earth Mover's Distance
• The tangent part: small global movement for free (for example, to the right).
• The EMD part: arbitrary movements that cost.
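The one-sided tangent distance with a single tangent vector has a closed-form least-squares solution; this minimal sketch is our own illustration of the linear case of Simard et al.'s idea:

```python
def tangent_distance(x, y, t):
    # one-sided tangent distance: min over a of || (x + a*t) - y ||,
    # solved by projecting the residual y - x onto the tangent vector t
    num = sum(ti * (yi - xi) for ti, xi, yi in zip(t, x, y))
    den = sum(ti * ti for ti in t)
    a = num / den
    return sum(((xi + a * ti) - yi) ** 2
               for ti, xi, yi in zip(t, x, y)) ** 0.5
```

A displacement of x along t costs nothing, while components orthogonal to t are fully charged, which is exactly the invariance-to-small-global-transformations the slide describes.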
The Tangent Earth Mover's Distance – an example of a tangent vector
Features histogram:     0  0  0  0  0  1  1  2
Tangent vector:         0  0  0  1  1  1 −1 −2
Transformed histogram:  0  0  0  1  1  2  0  0

The Tangent Earth Mover's Distance
• Previous work on the efficient computation of the EMD can also be used to accelerate the TEMD computation:
– Ling and Okada 2007: Manhattan grids.
– Pele and Werman 2009: thresholded ground distances.
– Combinations.

Experiments
• 10 classes: People in Africa, Beaches, Outdoor Buildings, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains, Food.
• 5 queries from each class.
• Computed the distance of each image to the query and its reflection and chose the minimum.
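The tangent-vector example above can be verified directly: adding the tangent vector to the histogram yields the transformed (shifted) histogram, and the tangent vector sums to zero, so total mass is preserved:

```python
hist        = [0, 0, 0, 0, 0, 1, 1, 2]
tangent     = [0, 0, 0, 1, 1, 1, -1, -2]
transformed = [h + t for h, t in zip(hist, tangent)]
# transformed is a shifted version of hist, with the same total mass
assert sum(tangent) == 0  # a shift tangent vector is mass-preserving
```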
Experiments
• Image representations: image, 32×48 L*a*b*, SIFT.

Why thresholded distances?

The Earth Mover's Distance
• EMD shortcomings: poor performance with outliers; long computation time.
• Our solutions: thresholded distances between bins; efficient algorithms.

Robust distances
• Very high distances usually come from outliers and should all count as the same difference.
• Usually a negative exponent is used, because it is (Ruzon and Tomasi 01) robust, smooth, monotonic, and a metric. But the input is always discrete anyway…
• Color distance should be thresholded (robust).
• Another reason for thresholded distances: the exponent changes small distances.

FastEMD - Pele and Werman ICCV 2009
• For any thresholded distance, only a small number of edges have cost different from the threshold.

The flow network transformation
• Original network → simplified network.
• Flow between exactly corresponding bins (a Monge sequence for metrics).
• Remove empty bins and their edges.
• We actually finished here…
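The exponent-vs-thresholding point can be made concrete: both saturate for large distances, but the negative exponent warps small distances while min(d, t) leaves them untouched. A small illustration of ours, with hypothetical parameter values:

```python
import math

def robust_exp(d, sigma=1.0):
    # exponent-based robust distance: saturates at 1, but distorts small d
    return 1.0 - math.exp(-d / sigma)

def robust_thresh(d, t=1.0):
    # thresholded distance: exactly d below the threshold, capped at t above
    return min(d, t)
```

For small d, robust_thresh(d) == d exactly, whereas robust_exp(d) is already strictly smaller than d; for large d both are capped.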
Many successful applications of FastEMD: superpixel comparison, image retargeting, object class recognition, semantic scene / surface layout, image segmentation.

Results – retrieval curves (average number of correct images retrieved vs. number of nearest-neighbor images retrieved) and normalized AUC using SIFT:
[Figure: comparison of c=1,2 with D2 and 2D1 (our 13), c=1,2 with D2 (our ICCV 09), our ECCV 08, Simard et al. 98, EMD-L1 (Ling & Okada 07), QF A2, L1, L2.]

Results using color images:
[Figure: retrieval curve and normalized AUC comparing D20 (our 13), D20 (our ICCV 09), QF A20.]

Near future work
• Faster algorithms for the Tangent Earth Mover's Distance:
– Greedy algorithms for quick hot-start.
– Decomposition idea: break the problem into easy-to-solve "pieces" and then enforce consistency.
• Efficient and accurate segmentation of images using the Tangent Earth Mover's Distance.

Far future work
• "Big Data" – need for complex non-linear models.
• Complex distances are a perfect fit for this.
• New research about how to learn efficiently with such distances (cascades, nearest neighbors, …).

Papers & code are / will be at my website: "Ofir Pele" http://www.seas.upenn.edu/~ofirpele/
ORAL SESSION 11 Probability on Manifolds (Marc Arnaudon)
Group Action Induced Distances on Spaces of Linear Stochastic Processes
Bijan Afsari and René Vidal (JHU)

Motivation: classification and clustering of high-dimensional time-series data
- High-dimensional dynamic data: econometrics, video surveillance, biomedical applications, ...
- How to classify and cluster such data?
- Linear Dynamical System (LDS) based approach: model time-series data as outputs of LDSs; do statistics on spaces of models, i.e., LDSs.
- Example: classification & clustering of human actions. Model with / learn LDSs; then, on the LDS space, choose a distance and run 1-nearest neighbor, nearest mean, k-means clustering, …

[1] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
[2] P. V. Overschee and B. D. Moor. Subspace algorithms for the stochastic identification problem. Automatica, 29(3):649–660, 1993.

Some LDS basics
- State space representation: n-dimensional state, m-dimensional input (e.g., standard white Gaussian), p-dimensional output (video or features); n is the order of the LDS, (p, m) its size or dimension.
- R = (A, B, C) is called a realization. We distinguish between a realization R and the LDS M it realizes.
- Important fact we will visit again: an LDS has an equivalence class of realizations, all of which are indistinguishable from input-output relations.
- Equivalent representations exist, e.g., vector ARMA or transfer function representations; the state space representation is advantageous because of the fast learning (system identification) algorithms available, e.g., [1,2].
- Consider p-dimensional time series; model with an LDS of order n and size (p, m).
- Statistical analysis on spaces of LDSs: choose an appropriate space containing the models; geometrize it (e.g., define a distance, find shortest paths); develop tools for statistical analysis: probability distributions, averaging algorithms, PCA.
Problem: pattern recognition for time-series data via statistical analysis on spaces of LDSs

Statistical analysis on spaces (cont'd)
- We assume that all LDSs have the same order and size, motivated by implementational and theoretical considerations (an important feature of our approach).
- Statistical analysis on spaces of LDSs is not a new problem! The 1-D version, under different disguises, is an old problem, but it has not been fully addressed or solved in full generality.
- For 1-D AR models a nice theory exists and is widely used, e.g., in speech processing [1,2,3,4] (e.g., the Itakura-Saito divergence and its variants).
- The high-dimensional version is more recent (e.g., activity recognition). Even here some theoretical frameworks already exist, but they are not computationally friendly.

[1] S. I. Amari. Differential geometry of a parametric family of invertible linear systems: Riemannian metric, …. Math. Systems Theory, 20:53–82, 1987.
[2] S. I. Amari and H. Nagaoka. Methods of Information Geometry, volume 191 of Translations of Mathematical Monographs. AMS, 2000.
[3] A. Gray Jr. and J. Markel. Distance measures for speech processing. IEEE Trans. on Acoustics, Speech and Signal Processing, 24(5):380–391, 1976.
[4] F. Barbaresco. Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet …

Target space: processes generated by LDSs of size (p, m) and order n
- From processes to spectra: identify a Gaussian process with its power spectral density (PSD) matrix, a p × p matrix.
- Parameterization of this space is difficult. Why? The PSD's dependence on the parameters is highly nonlinear.

A panoramic view: various ambient spaces
- Any distance in an ambient space induces an extrinsic distance.
- Infinite vs. finite dimensional ambient spaces: most available distances are for the infinite dimensional ambient spaces; specializing to smaller spaces is possible but practically difficult [1,2].
- We will try to by-pass this difficulty by directly comparing realizations!
Recall: under Gaussianity a process is identified with its PSD. By directly comparing realizations we will have a small ambient space!

[1] S. I. Amari. Differential geometry of a parametric family of invertible linear systems: Riemannian metric, …. Mathematical Systems Theory, 20:53–82, 1987.
[2] N. Ravishanker, E. L. Melnick, and C.-L. Tsai. Differential geometry of ARMA models. Journal of Time Series Analysis, 11(3):259–274, 1990.

Some existing approaches
- Control theory literature: our target space of fixed order and fixed size LDSs is an important space in control theory. Its topology has been studied, e.g., [3,4]. Riemannian distances have been proposed [1,4], but they are computationally very demanding, especially in high dimensions.

[1] B. Hanzon. Identifiability, Recursive Identification and Spaces of Linear Dynamical Systems, volume CWI Tracts 63 and 64. Amsterdam, 1989.
[2] M. Hazewinkel. Moduli and canonical forms for linear dynamical systems II: the topological case. Mathematical Systems Theory, 10:363–385, 1977.
[3] M. Hazewinkel and R. E. Kalman. On invariants, canonical forms and moduli for linear, constant, finite dimensional, dynamical systems. Springer Berlin Heidelberg, 1976.
[4] P. S. Krishnaprasad. Geometry of Minimal Systems and the Identification Problem. PhD thesis, Harvard University, 1977.

How to go from spectra to realizations (A, B, C)?
- Internal & input symmetries: the group GL(n) of n × n non-singular matrices and the group O(m) of m × m orthogonal matrices (the input being m-dimensional unit-variance Gaussian noise) act on realizations, and a realization and its transform generate the same output process.
- When is the converse true? Under minimum phase and certain extra rank conditions (i.e., on some submanifolds of the realization space). For example: C is a tall, full-rank matrix; (A, B) is controllable; (A, C) is observable; plus a rank condition in terms of a complex variable (frequency), well known in control theory.
- The realization-LDS space pair form a principal fiber bundle with this structure group.
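The internal symmetry can be checked numerically: a realization (A, B, C) and its transform (PAP⁻¹, PB, CP⁻¹) for invertible P have the same Markov parameters C Aⁱ B, hence the same input-output behavior. A small pure-Python sketch of ours, with a hand-picked 2×2 example:

```python
def matmul(X, Y):
    # plain nested-list matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def markov_params(A, B, C, k=4):
    # first k Markov parameters C A^i B of the realization (A, B, C)
    out, M = [], B
    for _ in range(k):
        out.append(matmul(C, M))
        M = matmul(A, M)
    return out

A = [[0.5, 0.1], [0.2, 0.3]]
B = [[1.0], [0.0]]
C = [[1.0, 2.0]]
P    = [[1.0, 1.0], [0.0, 1.0]]   # invertible change of basis
Pinv = [[1.0, -1.0], [0.0, 1.0]]
A2 = matmul(matmul(P, A), Pinv)   # equivalent realization (PAP^-1, PB, CP^-1)
B2 = matmul(P, B)
C2 = matmul(C, Pinv)
```

Since C2 A2ⁱ B2 = C P⁻¹ (P A P⁻¹)ⁱ P B = C Aⁱ B, the two realizations are indistinguishable from input-output data, which is exactly the fiber the alignment distance quotients out.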
Road map: instead of comparing PSDs we compare realizations, taking the group action into account.

LDS space as base space of a principal fiber bundle
- Under these rank conditions the action is free and proper, so the LDS space is a smooth quotient manifold; this follows from a theorem in differential geometry [1].
- It locally looks like a product space, but not globally!

[1] J. M. Lee. Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer, 2002.

The alignment distance: a group action induced distance
- Let a group-invariant distance on the realization space be given. Slide one realization along its fiber until it is aligned with the other, i.e., solve the resulting minimization over the group. This yields a true distance [1].
- Difficulty: since GL(n) is non-compact, constructing such an invariant distance is difficult.
- Computational advantage: many (extrinsic) invariant distances are available.
- A similar notion of standardization is used in Kendall's shape analysis theory [2].

Standardize then align the realizations
- Reduction of the structure group: a standardized (orthogonal) subbundle and the maximal compact subgroup acting on it, so the non-compact part of the structure group can be thrown out safely!
- This is a basic fact from the differential geometry of fiber bundles [1]; the proof is essentially based on Gram-Schmidt orthogonalization. (Reduction of the structure group is also consequential in quantum gauge theory!)
- There is no canonical reduction: depending on the application we might prefer one over another.

[1] S. Kobayashi and K. Nomizu. Foundations of Differential Geometry, Volume I. Wiley Classics Library Edition. John Wiley & Sons, 1963.
[2] D. G. Kendall, D. Barden, T. K. Carne, and H. Le. Shape and Shape Theory. Wiley Series in Probability and Statistics. John Wiley & Sons, 1999.

An example: tall and full rank LDSs of order n and size (p, m)
- These appear in video analysis and generalized dynamic factor models [1,3,4].
- Standardize via the SVD of C: C belongs to a Stiefel manifold, which appears naturally as the output of a fast system identification algorithm [1].
- For the simplest distance a fast algorithm is available [2].
- For other LDS spaces, other standardizations are possible via methods known as balancing.

[1] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
[2] N. D. Jimenez et al. Fast Jacobi-type algorithm for computing distances between linear dynamical systems. In ECC, 2013.
[3] B. Afsari et al. Group action induced distances for averaging and clustering linear dynamical systems […]. In IEEE CVPR, 2012.
[4] M. Deistler et al. Generalized linear dynamic factor models: an approach via singular autoregressions. EJC, 3:211–224, 2010.

The alignment distance: pros & cons
- It is not an intrinsic distance on our target space, but it does not come from an infinite dimensional ambient space, and in some instances it can preserve system order in averaging naturally.
- Optimization on the orthogonal group is non-convex, so local minimizers are possible; but even for intrinsic Riemannian distances not every solution of the geodesic equation is length minimizing. Instead of a (large) set of ODEs we solve a static optimization.

More examples of alignment distances
- The basic definition can be extended in various ways, for example: getting rid of the input-symmetry factor; norms or distances other than the Frobenius norm (e.g., nuclear norm or 1-norm); distances on other spaces (e.g., the Stiefel manifold for C); distances insensitive to scaling; suitable realization submanifolds.

Align and average algorithm
- Consider tall and full rank LDSs; take realizations R_1, …, R_N.
- Coordinate descent: alternate between alignment and finding the average. This decouples into alignment, Euclidean averaging, and projection onto the Stiefel manifold:
  1. Align each realization with the current average.
  2. Take the Euclidean average and orthonormalize.
  3. Iterate.
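As a brute-force sketch of the alignment idea for n = 2, restricted to rotations and to the (A, C) part of the realization; this grid search is only our illustration of the definition, not the fast Jacobi-type algorithm of [2]:

```python
import math

def rot(th):
    return [[math.cos(th), -math.sin(th)], [math.sin(th), math.cos(th)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def fro2(X, Y):
    # squared Frobenius distance between two matrices
    return sum((a - b) ** 2 for ra, rb in zip(X, Y) for a, b in zip(ra, rb))

def align_dist(A1, C1, A2, C2, steps=2000):
    # d^2 = min over rotations Q of ||A1 - Q A2 Q^T||_F^2 + ||C1 - C2 Q^T||_F^2
    best = float("inf")
    for k in range(steps):
        Q = rot(2 * math.pi * k / steps)
        Qt = transpose(Q)
        d = fro2(A1, matmul(matmul(Q, A2), Qt)) + fro2(C1, matmul(C2, Qt))
        best = min(best, d)
    return math.sqrt(best)
```

If the second realization is just a rotated copy of the first, the alignment distance between them should be (numerically) zero, which is exactly the quotient-space behavior the talk describes.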
The iterates (LDSs) are almost surely tall, full rank, and minimum phase; however, stability of the average LDS is not guaranteed.
[1] B. Afsari, R. Chaudhry, A. Ravichandran, and R. Vidal. Group action induced distances for averaging and clustering linear dynamical systems with applications to the analysis of dynamic visual scenes. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[2] M. Deistler, B. O. Anderson, A. Filler, C. Zinner, and W. Chen. Generalized linear dynamic factor models: An approach via singular autoregressions. European Journal of Control, 3:211-224, 2010.

Example: Clustering of Human Actions via the K-means Algorithm on the Space of Tall Full-Rank LDSs
We have 55 videos, modeled with LDSs (m = n = 5, p = 13542), in four classes: running to the left/right and walking to the left/right [1]. The K-means clustering algorithm is run with only the number of clusters known; the Align and Average algorithm is used inside K-means to find the cluster centers. The four clusters are recovered.

Conclusions
More information: Vision Lab @ Johns Hopkins University, http://www.vision.jhu.edu. Thank you! Supported by ONR N00014-09-10084, NSF #0941362, NSF #0941463, NSF #0931805, NSF #1335035.
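The clustering pipeline of the example (K-means where the metric is the alignment distance and the centers come from Align and Average) can be sketched generically. The skeleton below is an illustration, not the authors' implementation: `dist` and `average` are plug-in callables (in the paper these would be the alignment distance and the Align and Average routine); the demo here uses plain scalars.

```python
import numpy as np

def kmeans(points, k, dist, average, n_iter=50, seed=0):
    """Generic K-means: `dist(p, c)` compares a point to a center and
    `average(list_of_points)` recomputes a center."""
    rng = np.random.default_rng(seed)
    centers = [points[i] for i in rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center under the supplied distance
        labels = np.array([np.argmin([dist(p, c) for c in centers])
                           for p in points])
        # Update step: recompute each non-empty cluster's center
        new_centers = [average([p for p, l in zip(points, labels) if l == j])
                       if np.any(labels == j) else centers[j]
                       for j in range(k)]
        if all(dist(c, nc) < 1e-12 for c, nc in zip(centers, new_centers)):
            break
        centers = new_centers
    return labels, centers
```

Swapping in the alignment distance and Align and Average for `dist` and `average` gives the structure of the clustering used in the example.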
Integral Geometry of Linearly Combined Gaussian and Student t, and Skew t Random Fields
Yann Gavet, Ola Ahmad and Jean-Charles Pinoli
École Nationale Supérieure des Mines de Saint-Étienne, LGF 5307, France
ahmad@emse.fr, gavet@emse.fr, pinoli@emse.fr
GSI2013 - Geometric Science of Information, Paris, August 28-30, 2013

Outline
- Introduction: background & motivation
- Preliminaries: random fields; integral geometry of random fields
- Linear mixture random fields: Gaussian t random field; application
- Skew random fields: skew t random field; application
- Conclusions and future work

General stochastic problem: Y = Data + error. For example, the data is a matrix of unknown random variables of N dimensions represented on a group of voxels (2D or 3D images). How can they be approximated or represented?
- Non-parametric methods ("often numeric") have no reference probability model.
- Parametric methods, e.g., random field theory: a priori knowledge of Y based on the measurements; a few significant parameters and geometric information that control and interpret some physical problems.

Application example: total hip implant.

Statistical analysis via stochastic modelling: a real phenomenon (biology, physics, mechanics, ...) is given a stochastic representation; experimental observations and measurements yield empirical features, while geometric features (Minkowski functionals or LKCs, ...) are calculated from the model; one then estimates the parameters, tests the validity of the model, and makes decisions about and analyses the phenomena.

Why Gaussian random fields?
They are completely characterized by their first- and second-order moments (mean and covariance function), and they are smooth and twice differentiable. Why not Gaussian random fields? Real observations are often not Gaussian.

Example: a worn engineered surface is rough and skewed, with heavy-tailed distributions. We need to go beyond the Gaussian.

Beyond Gaussian random fields:
- Gaussian-related RFs: $f(x) = F(g(x))$ with $F : \mathbb{R}^k \to \mathbb{R}$, where $g_1, \ldots, g_k$ are i.i.d. Gaussian RFs. Examples: $\chi^2$, $F$, and $t$ RFs.
- Mixed random fields (high flexibility): $f(x) = \beta_1 Z(x) + \beta_2 G(x)$, where $G$ and $Z$ are independent random fields.
Random Fields
A random field $Y(x)$, $x \in S$, indexed by some space $S$ (e.g., $S \subset \mathbb{R}^N$), satisfies that any arbitrary collection of $p$ samples $(Y(x_1), \ldots, Y(x_p))$ follows a multivariate probability density function with $(p \times p)$ covariance matrix $\Omega_Y$.

Excursion sets
The excursion set of $Y$ over a level $h$ is the random set $E_h = \{x \in S : Y(x) \geq h\}$. Example: thresholding the surface at some height level.

Integral geometry
The expected intrinsic volumes of $E_h$ are
$$\mathbb{E}[\mathcal{L}_k(E_h(Y,S))] = \sum_{j=0}^{N-k} \begin{bmatrix} j+k \\ j \end{bmatrix} \mathcal{L}_{j+k}(S)\,\rho_j(h),$$
where $\mathcal{L}_0(\cdot) = \chi(\cdot)$ is the Euler-Poincaré characteristic, $\mathcal{L}_j(\cdot)$ is the $j$-th dimensional intrinsic volume, and $\rho_j(\cdot)$ are the EC densities. How do we get $\mathcal{L}_j$ and $\rho_j$?
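Excursion sets and the Euler-Poincaré characteristic $\mathcal{L}_0 = \chi$ can be illustrated with a small numpy sketch. This is an assumption-laden pixel-complex computation of ours, not the authors' code: each foreground pixel of the thresholded image is treated as a closed unit square, and $\chi = V - E + F$ is counted directly from vertices, edges, and faces.

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic of a binary 2D set, treating each foreground
    pixel as a closed unit square (chi = V - E + F)."""
    B = np.pad(mask.astype(bool), 1)  # False border simplifies edge counting
    F = int(B.sum())
    # An edge/vertex of the grid is occupied if any adjacent pixel is on
    E = int((B[:-1] | B[1:]).sum() + (B[:, :-1] | B[:, 1:]).sum())
    V = int((B[:-1, :-1] | B[:-1, 1:] | B[1:, :-1] | B[1:, 1:]).sum())
    return V - E + F

def excursion_set(Y, h):
    # E_h = {x in S : Y(x) >= h}
    return Y >= h
```

Sweeping the level h and plotting `euler_characteristic(excursion_set(Y, h))` gives the empirical Euler characteristic curve that the expectation formula for the intrinsic volumes predicts.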
Intrinsic volumes $\mathcal{L}_j(S)$:
$$\mathcal{L}_N(S) = \sigma^{-N} \int_S \det(\Lambda(x))^{1/2}\,dx,$$
$$\mathcal{L}_{N-1}(S) = \tfrac{1}{2}\,\sigma^{-(N-1)} \int_{\partial S} \det(\Lambda_{\partial S}(x))^{1/2}\,\mathcal{H}_{N-1}(dx).$$
EC densities $\rho_j(\cdot)$, via Morse theory:
$$\rho_j(h) = \mathbb{E}\!\left[\dot{Y}^+_{(j)} \det(-\ddot{Y}|_{j-1}) \,\middle|\, \dot{Y}|_{j-1} = 0,\, Y = h\right] p_{\dot{Y}|_{j-1}}(0; h).$$

Gaussian t random field
Definition: $Y(x) = G(x) + \beta T(x)$, $\beta > 0$, $x \in S$, where $G$ is a stationary Gaussian random field and $T$ is a homogeneous Student's t random field with $\nu$ degrees of freedom. The pdf at each fixed point $x$ is the convolution of the normal and t pdfs:
$$p_Y(y) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{2\pi\beta\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(\frac{2}{\nu}\right)^{1/2} \int_{-\infty}^{\infty} \left(1 + \frac{(y-u)^2}{\beta^2\nu}\right)^{-\frac{\nu+1}{2}} e^{-\frac{u^2}{2}}\,du.$$

EC densities of the Gaussian t random field
Theorem [Ahmad and Pinoli (2013a)]: the EC densities $\rho_j(\cdot)$ of a two-dimensional real-valued Gaussian t random field with $\nu \geq 2$ degrees of freedom and $\beta > 0$ are given in closed form at level $h$ (formula omitted here), where $\Lambda_G = \lambda_G I_2$ is the second spectral moment matrix of $G$ and $\Lambda_T = \lambda_T I_2$ is the one associated with $T$.

Simulation example: simulated and analytical Minkowski functionals for the Gaussian-t random field with 5 degrees of freedom and β = 0.2.
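The convolution formula for the Gaussian-t density can be checked numerically. The sketch below is only a verification of the formula by direct quadrature (grid sizes are arbitrary choices of ours, not from the paper); it confirms that the density integrates to one and is symmetric.

```python
import numpy as np
from math import gamma, pi

def gaussian_t_pdf(y, beta=0.2, nu=5, u_grid=np.linspace(-12.0, 12.0, 1201)):
    """pdf of Y = G + beta*T with G ~ N(0,1) and T ~ Student-t(nu),
    evaluated by numerical quadrature of the convolution integral."""
    coef = gamma((nu + 1) / 2) / (2 * pi * beta * gamma(nu / 2)) * (2 / nu) ** 0.5
    y = np.asarray(y, dtype=float)[:, None]
    integrand = (1 + (y - u_grid) ** 2 / (beta ** 2 * nu)) ** (-(nu + 1) / 2) \
                * np.exp(-u_grid ** 2 / 2)
    du = u_grid[1] - u_grid[0]
    return coef * integrand.sum(axis=1) * du  # simple Riemann sum over u
```

With the slide's parameters (5 degrees of freedom, β = 0.2), the numerical mass over a wide interval is 1 to within quadrature error.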
Application to surface characterization
A machined surface observed from polyethylene material [Ahmad and Pinoli (2012)]: fitting the empirical and analytical intrinsic volumes of the real surface and the Gaussian-t random field with 5 degrees of freedom and β = 1.2.

Skew t random field
Definition: let $G_0(x), G_1(x), \ldots, G_k(x)$, $x \in S$, be i.i.d. stationary centred Gaussian random fields with $(N \times N)$ spectral moment matrix $\Lambda$, and let $Z \sim \mathrm{Normal}(0,1)$ be independent of $G_0, G_1, \ldots, G_k$. Then
$$Y(x) = \frac{\delta |Z| + \sqrt{1-\delta^2}\,G_0(x)}{\left(\sum_{i=1}^{k} G_i^2(x)/k\right)^{1/2}}, \qquad \delta^2 < 1, \tag{1}$$
defines a skew t RF with $k$ degrees of freedom and skewness index $\alpha = \delta/\sqrt{1-\delta^2}$.

Example: two-dimensional skew t RFs.

EC densities
Theorem [Ahmad and Pinoli (2013c)]: the EC densities $\rho_j(\cdot)$ of a two-dimensional real-valued skew t random field with $k$ degrees of freedom and skewness parameter $\alpha \in \mathbb{R}$ are given in closed form (formula omitted here).

Simulation example: simulated and analytical Minkowski functionals for the skew-t random field with 5 degrees of freedom and skewness index α = 0.7.
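Definition (1) can be simulated directly. The sketch below is ours and rests on assumptions: the i.i.d. smooth Gaussian fields are generated by FFT low-pass filtering of white noise (a convenience, not the authors' method), $|Z|$ is drawn once per field, and the slide's simulation parameters (k = 5, α = 0.7) are used as defaults.

```python
import numpy as np

def smooth_gaussian_field(rng, shape, sigma=3.0):
    # Stationary, unit-variance Gaussian field: white noise low-passed in Fourier
    w = rng.standard_normal(shape)
    ky = np.fft.fftfreq(shape[0])[:, None]
    kx = np.fft.fftfreq(shape[1])[None, :]
    H = np.exp(-2 * (np.pi * sigma) ** 2 * (ky ** 2 + kx ** 2))
    g = np.fft.ifft2(np.fft.fft2(w) * H).real
    return (g - g.mean()) / g.std()

def skew_t_field(rng, shape, k=5, alpha=0.7):
    delta = alpha / np.sqrt(1 + alpha ** 2)   # invert alpha = delta/sqrt(1-delta^2)
    Z = abs(rng.standard_normal())            # one half-normal draw per field
    G0 = smooth_gaussian_field(rng, shape)
    denom = np.sqrt(sum(smooth_gaussian_field(rng, shape) ** 2
                        for _ in range(k)) / k)
    return (delta * Z + np.sqrt(1 - delta ** 2) * G0) / denom
```

A pointwise i.i.d. variant of the same construction (scalars instead of fields) can be used to sanity-check the skewness direction: for α > 0 the mean exceeds the median.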
Application to worn engineered surfaces
A worn engineered surface observed from polyethylene material [Ahmad and Pinoli (2013b)]: fitting the empirical and analytical intrinsic volumes of the real surface and the skew-t random field with 6 degrees of freedom and δ = 0.5.

Conclusion
- Random fields are computationally feasible, voxel-based, probabilistic models that can be used to approximate and represent some physical problems.
- Integral geometry provides interesting geometric information about the excursion sets, namely the intrinsic volumes.
- These geometric characteristics can be calculated analytically to fit the real measurements with some probabilistic model and to estimate its parameters.
- The skew t random field is an appropriate model for the statistical representation of worn engineered surfaces.

Future Work
- Using the skew t random field for statistical analysis of surface-roughness evolution.
- Space-scale random fields for multi-scale characterization.
- Space-time random fields for prediction of future behaviour and for estimation of the roughness evolution of rough engineered surfaces.
- Open question: intrinsic volumes of probabilistic models with no explicit or closed analytical form.

References:
Ola Ahmad and Jean-Charles Pinoli. On the linear combination of the Gaussian and Student's t random field and the integral geometry of its excursion sets. Statistics & Probability Letters, 83(2):559-567, 2013a. ISSN 0167-7152. doi: 10.1016/j.spl.2012.10.022.
Ola Suleiman Ahmad and Jean-Charles Pinoli.
On the linear combination of the Gaussian and Student-t random fields and the geometry of its excursion sets. In Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering and Computer Science 2012, WCECS 2012, 24-26 October, San Francisco, USA, pages 1-5, 2012.
Ola Suleiman Ahmad and Jean-Charles Pinoli. Lipschitz-Killing curvatures of the excursion sets of skew Student-t random fields. In 2nd Annual International Conference on Computational Mathematics, Computational Geometry & Statistics, volume 1, Feb 2013b. doi: 10.5176/2251-1911_CMCGS13.05.
Ola Suleiman Ahmad and Jean-Charles Pinoli. Lipschitz-Killing curvatures of the excursion sets of skew Student's t random fields. Stochastic Models, 29(2):273-289, 2013c. ISSN 1532-6349. doi: 10.1080/15326349.2013.783290.
Thank you for your attention! ahmad@emse.fr, gavet@emse.fr, pinoli@emse.fr
Nonlinear Modeling and Processing Using Empirical Intrinsic Geometry with Application to Biomedical Imaging
Ronen Talmon¹, Yoel Shkolnisky², and Ronald Coifman¹
¹Mathematics Department, Yale University; ²Applied Mathematics Department, Tel Aviv University
Geometric Science of Information (GSI 2013), August 28-30, 2013, Paris
Introduction: Example of Intrinsic Modeling I (Molecular Dynamics)
- Consider a molecule oscillating stochastically in water, for example alanine dipeptide.
- Due to the coherent structure of molecular motion, we assume that the configuration at any given time is essentially described by a small number of structural variables; in the alanine case, we will discover two factors, corresponding to the dihedral angles.
- We observe three atoms of the molecule for a certain period, three other atoms for a second period, and the rest in the last period. The task is to describe the positions of all atoms at all times; more precisely, to derive intrinsic variables that correspond to the dihedral angles and to describe their relation to the positions of all atoms.
- We always derive the same intrinsic variables (angles) from partial observations, independently of the specific atoms we observe. If we learn the model, we can describe the positions of all atoms.
Introduction: Example of Intrinsic Modeling II (Predicting Epileptic Seizures)
- Goal: to warn the patient prior to the seizure (when medication or surgery are not viable).
- Data: intracranial EEG (icEEG) recordings. [Figure: icEEG recording and its time-frequency representation.]
- Our assumption: the measurements are controlled by underlying processes that represent the brain activity.
- Main idea: predict seizures based on the "brain activity processes".
- Challenges: noisy data, an unknown model, and no available examples.
Introduction: Manifold Learning
- Represent the data as points in a high-dimensional space; the points lie on a low-dimensional structure (manifold) that is governed by latent factors, for example atom trajectories and the dihedral angles.
- Traditional manifold learning techniques: Laplacian eigenmaps [Belkin & Niyogi, 03]; diffusion maps [Coifman & Lafon, 05; Singer & Coifman, 08]. Manifold learning yields a parameterization of the manifold.
Empirical Intrinsic Geometry: Formulation ("State" Space)
- Dynamical model: let $\theta_t$ be a $d$-dimensional underlying process (the state) in time index $t$ that evolves according to
$$d\theta^i_t = a^i(\theta_t)\,dt + dw^i_t, \quad i = 1, \ldots, d,$$
where the $a^i$ are unknown drift coefficients and the $w^i_t$ are independent white noises.
- Measurement modality: let $z_t$ be an $n$-dimensional measured signal, given by $z_t = g(y_t, v_t)$, where $y_t$ is the clean observation component drawn from the time-varying pdf $f(y; \theta_t)$, $v_t$ is a corrupting noise (independent of $y_t$), and $g$ is an arbitrary measurement function.
- The goal: recover and track $\theta_t$ given $z_t$.
Manifold Learning for Time Series
The general outline:
- Construct an affinity matrix (kernel) between the measurements $z_t$, e.g.,
$$k(z_t, z_s) = \exp\left\{-\frac{\|z_t - z_s\|^2}{\varepsilon}\right\}.$$
- Normalize the kernel to obtain a Laplace operator [Chung, 97].
- The spectral decomposition (eigenvectors $\varphi_i \in \mathbb{R}^N$) represents the underlying factors $\theta^i_t$.
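The outline above can be sketched in a few lines. The following is a generic diffusion-maps construction with the symmetric normalization of the kernel, a minimal illustration rather than the talk's specific algorithm; setting ε by the median heuristic is our assumption.

```python
import numpy as np

def diffusion_map(Z, n_coords=2, eps=None):
    """Embed the rows of Z (one measurement per row) via the spectral
    decomposition of a normalized Gaussian kernel."""
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise ||z_t - z_s||^2
    if eps is None:
        eps = np.median(D2)                              # median heuristic (assumption)
    K = np.exp(-D2 / eps)
    d = K.sum(1)
    A = K / np.sqrt(np.outer(d, d))                      # D^{-1/2} K D^{-1/2}
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]                       # leading eigenpairs first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / vecs[:, [0]]                            # Markov-chain eigenvectors
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]  # drop the trivial pair
```

On data sampled from a closed curve, the two leading nontrivial eigenvectors recover a smooth parameterization: nearby points on the curve stay nearby in the embedding.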
Empirical Intrinsic Geometry: Intrinsic Modeling
- The mapping between the observable data and the underlying processes is often stochastic and contains measurement noise: repeated observations of the same phenomenon usually yield different measurement realizations, and the measurements may be performed using different instruments/sensors.
- Each set of related measurements of the same phenomenon will therefore have a different geometric structure, depending on the instrument and the specific realization. This poses a problem for standard manifold learning methods.
[Diagram: partial observations in two observable domains mapped to a common intrinsic embedding.]
How to Obtain an Intrinsic Model?
- Q: Does the Euclidean distance between the measurements, as used in the kernel $k(z_t, z_s) = \exp\{-\|z_t - z_s\|^2/\varepsilon\}$, convey the information? The measurements are realizations of a random process and contain measurement noise.
- A: We propose a new paradigm, Empirical Intrinsic Geometry (EIG) [Talmon & Coifman, PNAS, 13]: find a proper high-dimensional representation, and find an intrinsic distance measure that is robust to measurement noise and modality.
Geometric Interpretation
- Exploit perturbations to explore and learn the tangent plane.
- Compare the points based on the principal directions of the tangent planes ("local PCA").
[Diagram: underlying process and two measurement domains.]
The Mahalanobis Distance
- We view the local histograms as feature vectors $z_t \mapsto h_t$ for each measurement.
- For each feature vector, we compute the local covariance matrix in a temporal neighborhood of length $L$:
$$C_t = \frac{1}{L} \sum_{s=t-L+1}^{t} (h_s - \mu_t)(h_s - \mu_t)^T,$$
where $\mu_t$ is the local mean.
- Definition (Mahalanobis distance): a symmetric $C$-dependent distance between feature vectors,
$$d^2_C(z_t, z_s) = \frac{1}{2}(h_t - h_s)^T \left(C_t^{-1} + C_s^{-1}\right)(h_t - h_s).$$
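The definition can be implemented directly. Below is a minimal numpy sketch under stated assumptions: the local statistics use a trailing window, and a small ridge term is added before inversion for numerical safety; the function names are ours.

```python
import numpy as np

def local_stats(H, L):
    """Local mean and covariance of the feature vectors H[t] (shape (T, m))
    over a trailing temporal window of length L."""
    T, m = H.shape
    mus = np.zeros((T, m))
    covs = np.zeros((T, m, m))
    for t in range(T):
        W = H[max(0, t - L + 1): t + 1]
        mus[t] = W.mean(axis=0)
        D = W - mus[t]
        covs[t] = D.T @ D / len(W)
    return mus, covs

def mahalanobis_d2(H, covs, t, s, reg=1e-6):
    # d^2_C(z_t, z_s) = 1/2 (h_t - h_s)^T (C_t^{-1} + C_s^{-1}) (h_t - h_s)
    m = H.shape[1]
    d = H[t] - H[s]
    Ci_t = np.linalg.inv(covs[t] + reg * np.eye(m))  # regularized inverses
    Ci_s = np.linalg.inv(covs[s] + reg * np.eye(m))
    return 0.5 * d @ (Ci_t + Ci_s) @ d
```

The key property claimed on the next slide (invariance under linear transformations of the features) is easy to check numerically: applying an invertible matrix to all feature vectors leaves the distance essentially unchanged.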
Results
- Each histogram bin can be expressed as $h^j_t = \int_{z \in H_j} p(z; \theta)\,dz$, where the $H_j$ are the histogram bins and
$$p(z; \theta) = \int_{g(y,v)=z} f(y; \theta)\,q(v)\,dy\,dv.$$
- By relying on the independence of the processes:
Assumption: the histograms are linear transformations of the pdf $p(z; \theta)$.
Lemma: in the histogram domain, any stationary noise is a linear transformation.
Results
The Mahalanobis distance:
- is invariant under linear transformations, and thus, by the lemma, noise resilient;
- approximates the Euclidean distance between samples of the underlying process.
Theorem [Talmon & Coifman, PNAS, 13]:
$$\|\theta_t - \theta_s\|^2 = d^2_C(z_t, z_s) + O(\|h_t - h_s\|^4).$$
- The process $h_t$ can be described as a (possibly nonlinear) bi-Lipschitz function of the underlying process.
- We rely on a first-order approximation of the measurement function, $h_t = J_t^T \theta_t + \epsilon_t$, where $J_t$ is the Jacobian, defined as $J^{ji}_t = \partial h^j / \partial \theta^i$.
Relationship to Information Geometry
- Q: Does the structure of the measurements convey the information? A: The local densities of the measurements do, and not particular realizations.
- Information geometry [Amari & Nagaoka, 00]: use the Kullback-Leibler divergence, approximated by the Fisher metric
$$D(p(z_t; \theta)\,\|\,p(z_{t'}; \theta)) \approx \delta\theta_t^T I_t\,\delta\theta_t,$$
where $I_t$ is the Fisher information matrix.
- EIG uses a similar data-driven metric; consider the features $l^j_t = \alpha_j \log(h^j_t)$.
- Theorem: $I_t = J_t^T J_t$ (of the underlying manifold dimensionality) and $C_t = J_t J_t^T$ (of the feature-vector dimensionality).
Anisotropic Kernel
- Let $\{z_t\}$ be a set of measurements; for each measurement, we compute the local histogram and covariance.
- Construct a symmetric affinity matrix that approximates the Euclidean distances between samples of the underlying process and is invariant to the measurement modality and resilient to noise.
- The corresponding Laplace operator can recover the underlying process.
- Compute