GSI2013 - Geometric Science of Information

About

The objective of this SEE conference, hosted by MINES ParisTech, is to bring together pure and applied mathematicians and engineers with a common interest in geometric tools and their applications to information analysis.

 It emphasises the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.

Current and ongoing uses of information geometry manifolds in applied mathematics include advanced signal/image/video processing, complex data modeling and analysis, information ranking and retrieval, coding, cognitive systems, optimal control, statistics on manifolds, machine learning, speech/sound recognition, and natural language processing, all of which are also of substantial relevance to industry.

The conference is therefore organized around priority themes and topics of mutual interest, with a mandate to:

  • Provide an overview of the most recent state of the art
  • Exchange mathematical information, knowledge and expertise in the area
  • Identify research areas and applications for future collaboration
  • Identify academic and industry labs with expertise for further collaboration

This conference is an interdisciplinary event that federates skills from geometry, probability and information theory to address, among others, the topics listed below. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.

- Computational Information Geometry
- Hessian/Symplectic Information Geometry
- Optimization on Matrix Manifolds
- Probability on Manifolds
- Optimal Transport Geometry
- Divergence Geometry & Ancillarity
- Machine/Manifold/Topology Learning
- Tensor-Valued Mathematical Morphology
- Differential Geometry in Signal Processing
- Geometry of Audio Processing
- Geometry for Inverse Problems
- Shape Spaces: Geometry and Statistics
- Geometry of Shape Variability
- Relational Metric
- Discrete Metric Spaces

Committees

Organizing committee

Program chairs

Scientific committee

Sponsors and organizers

Documents

XLS

OPENING SESSION

Watch the video

Geometric Science of Information GSI’13
Frédéric Barbaresco, GSI’13 General Chair, President of the SEE SI2D Club (Signal, Image, Information & Decision)
SEE: Société de l'électricité, de l'électronique et des technologies de l'information et de la communication

SEE at a glance
• Meeting place for science, industry and society
• An officially recognised non-profit organisation
• About 2000 members and 5000 individuals involved
• Large participation from industry (~50%)
• 6 technical commissions and 12 regional groups
• Organizes conferences and seminars; initiates and attracts international conferences in France
• Institutional French member of IFAC and IFIP
• Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize, Thévenin Prize), grades and medals (Blondel, Ampère)
• Publishes 2 periodicals (REE, 3E.I) and 3 monographs each year
• Present on the web: http://www.see.asso.fr and the LinkedIn SEE group
• Past SEE presidents: Louis de Broglie, Paul Langevin, …

1883-2013, from SIE & SFE to SEE: 130 years of science
• 1881: Exposition Internationale d’Electricité
• 1883: SIE, Société Internationale des Electriciens
• 1886: SFE, Société Française des Electriciens
• 2013: SEE, 17 rue de l'Amiral Hamelin, 75783 Paris Cedex 16, http://www.see.asso.fr/

GSI’13 sponsors: SMF/SEE

GSI’13 in figures
• More than 150 international attendees, 100 scientific presentations over 3 days
• 3 keynote speakers:
  - Yann Ollivier (Paris-Sud Univ.): “Information geometric optimization: The interest of information theory for discrete and continuous optimization”
  - Hirohiko Shima (Yamaguchi Univ.): “Geometry of Hessian Structures”, dedicated to Prof. J.-L. Koszul
  - Giovanni Pistone (Collegio Carlo Alberto): “Nonparametric Information Geometry”
• 1 guest speaker: Shun-ichi Amari (RIKEN Brain Science Institute): “Information Geometry and Its Applications: Survey”
• 4 social events: welcome cocktail at Ecole des Mines, visit and concert at IRCAM, dinner at the Eiffel Tower, visit of the Mineralogy Museum at Ecole des Mines

GSI’13: dedicated to Jean-Louis Koszul’s work
• Hessian geometry and J.-L. Koszul’s works: Hirohiko Shima’s book “Geometry of Hessian Structures” (World Scientific Publishing, 2007) is dedicated to Jean-Louis Koszul; Hirohiko Shima keynote talk at GSI’13; plenary session on Hessian Information Geometry chaired by Prof. M. Boyom
• J.L. Koszul, « Sur la forme hermitienne canonique des espaces homogènes complexes », Canad. J. Math. 7, pp. 562-576, 1955
• J.L. Koszul, « Domaines bornés homogènes et orbites de groupes de transformations affines », Bull. Soc. Math. France 89, pp. 515-533, 1961
• J.L. Koszul, « Ouverts convexes homogènes des espaces affines », Math. Z. 79, pp. 254-259, 1962
• J.L. Koszul, « Variétés localement plates et convexité », Osaka J. Math. 2, pp. 285-290, 1965
• J.L. Koszul, « Déformations des variétés localement plates », Ann. Inst. Fourier 18, pp. 103-114, 1968

GSI’13 proceedings
• Published by Springer in “Lecture Notes in Computer Science”, LNCS vol. 8085 (879 pages), ISBN 978-3-642-40019-3
• http://www.springer.com/computer/image+processing/book/978-3-642-40019-3

GSI’13 topics
GSI’13 federates skills from geometry, probability and information theory: shape spaces (geometric statistics on manifolds and Lie groups, deformations in shape space, …), probability/optimization and algorithms on manifolds (structured matrix manifolds, structured data/information, …), relational and discrete metric spaces (graph metrics, distance geometry, relational analysis, …), computational and Hessian information geometry, algebraic/infinite-dimensional/Banach information manifolds, divergence geometry, tensor-valued morphology, optimal transport theory, manifold and topology learning, … and applications (audio processing, inverse problems and signal processing).

GSI’13 program (Wednesday 28th to Friday 30th of August; the original program grid is summarized below)
• Keynote and guest speaker talks and plenary sessions in the L108 Poincaré amphitheatre; parallel sessions in amphitheatres V107, V106A and V106B
• Daily structure: 08h30-09h00 welcome/registration, 09h00-10h00 keynote talk, 10h00-10h30 coffee break with poster session, 10h30-12h35 plenary session, 12h35-13h30 lunch break at Ecole des Mines with poster session (chairman: Frédéric Barbaresco), 13h30-15h35 parallel sessions, 15h35-16h05 coffee break, 16h05-18h10 parallel sessions
• Keynote talks: Yann Ollivier, “Information-Geometric Optimization: the Interest of Information Theory for Discrete and Continuous Optimization”; Hirohiko Shima, “Geometry of Hessian Structures” (dedicated to Prof. J.L. Koszul); Giovanni Pistone, “Nonparametric Information Geometry”
• Plenary sessions: Hessian Information Geometry I (chairman: Michel Boyom), Deformations in Shape Space (chairman: Alain Trouvé), Probability on Manifolds (chairman: Marc Arnaudon)
• Parallel sessions in the 13h30-15h35 slots: Relational Metric (Jean-François Marcotorchino); Algebraic/Infinite-dimensional/Banach Information Manifolds (Giovanni Pistone); Computational Information Geometry (Frank Nielsen); Hessian Information Geometry II (Frédéric Barbaresco); Tensor-Valued Mathematical Morphology (Jesus Angulo); Geometry of Inverse Problems (Ali Mohammad-Djafari); Geometric Statistics on Manifolds and Lie Groups (Xavier Pennec); Machine/Manifold/Topology Learning (Michael Aupetit & Frédéric Chazal); Differential Geometry in Signal Processing (Michel Berthier)
• Parallel sessions in the 16h05-18h10 slots: Discrete Metric Spaces (Michel Deza & Michel Petitjean); Optimal Transport Theory (Gabriel Peyré & Bertrand Maury); Geometry of Audio Processing (Arshia Cont & Arnaud Dessein); Optimization on Matrix Manifolds (Silvère Bonnabel); Divergence Geometry & Ancillarity (Michel Broniatowski); Information Geometry Manifolds (Hichem Snoussi); Entropic Geometry (Roger Balian); Algorithms on Manifolds (Olivier Schwander); Computational Aspects of Information Geometry in Statistics (Frank Critchley)
• SCILAB “GSI” Toolbox initiative session (Amphi V107)
• Guest speaker talk with cocktail, 18h15-19h15 (closure at 19h30): Shun-ichi Amari, “Information Geometry and Its Applications: Survey”
• Evening events: cocktail at Ecole des Mines and concert at IRCAM on Wednesday, gala dinner at the 58 Tour Eiffel restaurant on Thursday, Mineralogy Museum visits; closing session and end of GSI’13 on Friday

SCILAB GSI TOOLBOX
• To contribute to the development of Geometric Science of Information, a SCILAB “GSI” toolbox project has been initiated, inviting contributors to write external modules that extend Scilab capabilities in specific fields of GSI (information geometry, geometry of structured matrices, statistics/optimization on manifolds, …). These modules provide new features and documentation to Scilab users. A website called “ATOMS Portal” hosts all external modules developed by external developers; these modules can be made available to Scilab users directly from the Scilab console via a feature named ATOMS (AuTomatic mOdules Management for Scilab), if the module author wishes it. http://wiki.scilab.org/ATOMS
• In parallel, external module sources can be managed through the new Scilab Forge: http://forge.scilab.org/index.php/projects/

GSI’13 cocktail
• At Ecole des Mines, on the terrace of the Hôtel de Vendôme (exit of the lunch break area)
• Wednesday 28th of August, 18h15-19h15; the school closes at 19h30!

IRCAM visit & concert
• At IRCAM, 1 Place Igor-Stravinsky, 75004 Paris; Metro/RER: Châtelet-Les-Halles
• Wednesday 28th of August: 19h30-20h00 first group (20 pers.) of the IRCAM labs visit; 20h00-20h30 second group (20 pers.); 20h30-21h30 presentation and demo/concert of an automatic improvisation system with a saxophonist, based on information geometry (70 persons max)
• IRCAM lab tour: we are glad to invite GSI participants to a lab tour of IRCAM. IRCAM is a unique research center in the heart of Paris bringing artists and scientists together to foster research and creativity. Besides being a joint research venture between the French Ministry of Culture, the CNRS, INRIA and Parisian universities, it is also home to artists dealing with technological innovation in their works. IRCAM has accompanied our efforts in Geometric Science of Information for some time now.
• IRCAM demo/concert: the visit will be followed by a live demonstration of automatic improvisation, featuring jazz saxophone and computer! The lab tour will be done in two separate sessions and for a limited number of people at each session. Participants are kindly asked to register at the registration desk for the visit and the demonstration.

GSI’13 gala dinner
• At the Eiffel Tower, Champ de Mars, 5 Avenue Anatole France; first floor of the Eiffel Tower: 58 Tour Eiffel restaurant
• Thursday 29th of August, 20h30-22h30; a metro ticket and a first-floor lift ticket will be given at the GSI’13 registration desk

Visit of the Mineralogy Museum
• At Ecole des Mines, Thursday 29th of August afternoon, visits by groups
• Opening hours: 10h00-12h00 / 13h30-17h00; more information at the GSI’13 registration desk

Ecole des Mines de Paris
• Special thanks to the « Mathématiques et Systèmes » department of Mines ParisTech: Pierre Rouchon, Silvère Bonnabel, Jesus Angulo; http://www.mines-paristech.eu/
• Since 1783 (230 years of science)
• GSI topics and Ecole des Mines students (Corps des Mines): François Massieu introduced the characteristic function in thermodynamics in 1869 (Gibbs-Duhem potentials): « je montre, dans ce mémoire, que toutes les propriétés d’un corps peuvent se déduire d’une fonction unique, que j’appelle la fonction caractéristique de ce corps » (“I show, in this memoir, that all the properties of a body can be deduced from a single function, which I call the characteristic function of this body”). Henri Poincaré introduced the characteristic function in probability (Massieu’s work cited by Poincaré, 1908; characteristic function used by Poincaré, 1912), and Paul Lévy made general use of it. Roger Balian derived a metric for quantum states as a Hessian metric obtained from the von Neumann entropy S, ds² = −d²S (Roger Balian, 1986, “Dissipation in many-body systems: a geometric approach based on information theory”).
• Enjoy “geometry”: 1713-2013, 300th birthday of Denis Diderot; Diderot & d’Alembert’s Encyclopédie (Geometry chapter); Henri Poincaré / © Gallica. Enjoy all “geometries”!

Opening keynote (Wednesday 28th of August, 09h00-10h00, L108 Poincaré amphitheatre)
Yann Ollivier, Paris-Sud University, France: “Information-geometric optimization: The interest of information theory for discrete and continuous optimization”
Biography: Yann’s research generally focuses on the introduction of probabilistic models on structured objects, and more particularly addresses the interplay between probability and differential geometry. He is currently a research scientist at the CNRS, in the Computer Science department at Paris-Sud Orsay University, previously in the Mathematics department at the École Normale Supérieure in Lyon (2004-2010). He obtained his PhD in Mathematics under the supervision of M. Gromov and P. Pansu in 2003 and has been accredited to supervise research since 2009. http://www.yann-ollivier.org/rech/index

POSTER SESSION (Frédéric Barbaresco)


Fast Polynomial Spline Approximation for Large Scattered Data Sets via L1 Minimization
Laurent Gajny, Éric Nyiri, Olivier Gibaru
Laurent.GAJNY@ensam.eu, Eric.NYIRI@ensam.eu, Olivier.GIBARU@ensam.eu

Context of the study
We propose to develop a fast method of approximation by polynomial splines with no Gibbs phenomenon near abrupt changes in the shape of the data [Gib1899]. We apply these results to the treatment of sensor data from a low-cost computer-vision system, the Leap Motion (see [Her13]). This system captures finger motion very accurately, but when the user's moves are too fast we may observe holes in the data. (Fig. 1: motion capture by the Leap Motion and industrial robot steering.)

Aims: using theoretical results, develop a spline approximation method
• with shape-preserving properties (the L1 norm),
• with prescribed error (the ε-control),
• fast enough for real-time use (the sliding-window process).

A brief state of the art
• L1 versus L2 regression line: let (x_i, y_i), i = 1, …, n, be n points in the plane; the Lp regression line problem is to find a line y = a*x + b* minimizing the Lp norm of the residuals y_i − a x_i − b. The L1 norm is robust against outliers (Fig. 2: stability of the L1 regression line against outliers).
• Cubic spline interpolation: let q_i ∈ R^d, i = 1, …, n, be points with associated parameters u_i; the cubic spline interpolation problem is to find a curve γ such that γ(u_i) = q_i for all i and γ is a polynomial function of degree at most 3 on each interval [u_i, u_{i+1}]. The solution set is infinite; the C² cubic spline interpolation problem leads to a unique solution, which also satisfies a least-squares (L2) minimization property (Fig. 3: a cubic spline is defined by its knots and associated first-derivative values). Gibbs phenomenon occurs with cubic spline interpolation (Fig. 4).
• The L1 cubic spline interpolation introduced in [Lav00] consists in minimizing the L1 counterpart of this functional over the set of C¹ cubic splines interpolating the points q_i: no Gibbs phenomenon (Fig. 5), but a non-linear problem with non-unique solution. The sliding-window process proposed in [NGA11] has linear complexity in the number of data points: we solve a sequence of local L1 problems and keep only the derivative value at the middle point of each window (Fig. 6). This yields an algebraic solution and manages the non-uniqueness of solutions (Fig. 7: global method, left, and local method, right).
• The Lp smoothing splines are obtained by minimizing a functional balancing data fidelity and smoothness. Overshoot appears with the L2 smoothing spline (Fig. 8); the smoothing parameter is not easy to choose, and there is no control of the distance to the initial data points.

The proposed method
We look for spline solutions that do not deviate from the initial data by more than a given tolerance ε (Fig. 9). Proposition 1 distinguishes two cases, depending on whether condition (3) is satisfied or not (Fig. 10). The ε-controlled L1 regression line handles the non-uniqueness cases: when (3) is satisfied, the solution set may be infinite; to compute a relevant solution, we extend the window with supplementary neighbours and solve a weighted problem. We consider the case n = 5 and give more importance to the three middle points by choosing w2 = w3 = w4 > w1 = w5 = 1 (Fig. 11: the ε-controlled L1 regression line step in the three-point algorithm, with algebraic resolution on three points).

Behavior of the method on a Heaviside data set
Fig. 12: approximation of a Heaviside data set (left) and zoom on the top part of the jump (right).

Behavior on noisy data sets
After applying the algorithm to an initial noisy data set, the spline solution is not smooth; however, we can iterate the method: at step k+1, the data points are the approximation points computed at step k. This process emphasizes a smoothing effect while keeping the jumps (Fig. 13: approximation over noisy data sets, iterations 1, 3 and 6). We are currently studying the proposed approach for the problem of approximation of functions.

References
[Her13] F. Hernoux, R. Béarée, L. Gajny, E. Nyiri, J. Bancalin, O. Gibaru, Leap Motion pour la capture de mouvement 3D par spline L1. Application à la robotique. GTMG 2013, Marseille, 27-28 mars 2013.
[Gib1899] J. Willard Gibbs, letter to the editor, Nature 59 (April 27, 1899), 606.
[Lav00] J.E. Lavery, Univariate cubic Lp splines and shape-preserving multiscale interpolation by univariate cubic L1 splines. CAGD 17 (2000), 319-336.
[Lav00bis] J.E. Lavery, Shape-preserving, multiscale fitting of univariate data by cubic L1 smoothing splines. Comput. Aided Geom. Design 17 (2000), 715-727.
[NGA11] E. Nyiri, O. Gibaru, P. Auquiert, Fast L1-C^k polynomial spline interpolation algorithm with shape-preserving properties. CAGD 28(1), 2011, 65-74.
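As a small illustration of the L1 building block used throughout this poster, the sketch below (Python, not the authors' code; the toy data and variable names are illustrative) computes an L1 regression line exactly by recasting the sum of absolute residuals as a linear program with slack variables.

```python
# A minimal sketch of the L1 regression line: fit y = a*x + b by minimizing
# sum_i |y_i - a*x_i - b|, written as a linear program with slacks t_i.
import numpy as np
from scipy.optimize import linprog

def l1_regression_line(x, y):
    """Return (a, b) minimizing the L1 residual sum."""
    n = len(x)
    # Decision vector: [a, b, t_1, ..., t_n]; objective: sum of t_i.
    c = np.concatenate(([0.0, 0.0], np.ones(n)))
    # Constraints  a*x_i + b - y_i <= t_i   and   y_i - a*x_i - b <= t_i.
    A_ub = np.zeros((2 * n, n + 2))
    b_ub = np.zeros(2 * n)
    A_ub[:n, 0], A_ub[:n, 1], A_ub[:n, 2:] = x, 1.0, -np.eye(n)
    b_ub[:n] = y
    A_ub[n:, 0], A_ub[n:, 1], A_ub[n:, 2:] = -x, -1.0, -np.eye(n)
    b_ub[n:] = -y
    bounds = [(None, None), (None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1]

# Toy check: one large outlier barely moves the L1 line, unlike least squares.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[7] += 50.0
print(l1_regression_line(x, y))
```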


Target detection in non-stationary clutter background and Riemannian geometry
Haiyan Fan, 2013.05.21

Contents: 1. Background; 2. Methodology & technology road; 3. Experiment program & results; 4. Conclusions.

Background
• The emergence of the Riemannian geometry approach opens a new era of statistical signal processing, and non-stationary signal detection is gradually gaining importance. Many signals met today are non-stationary: non-Gaussian sea clutter is essentially non-stationary, and ultrasound Doppler signals obtained from physiological flows are non-stationary as well. A Riemannian manifold gives a more natural description of the signal structure, and Barbaresco et al. have done much work applying Riemannian metrics to target detection in radar signals.
• On "a Riemannian manifold gives a more natural description of the signal structure": measured signals often belong to manifolds that are not vector spaces, so processing them in flat Euclidean space is imprecise; a Riemannian manifold satisfies the invariance requirements needed to build statistical tools on transformation groups and homogeneous manifolds while avoiding paradoxes.
• Existing methods: the RG approach proposed by Barbaresco proceeds in five steps: (1) autoregressive-coefficient parameterization, (2) Riemannian metric and Riemannian distance, (3) Riemannian geodesics, (4) Riemannian median, (5) target detection. In step 1, a regularized Burg algorithm parameterizes the signal; the signal is then mapped into a complex Riemannian manifold identified by the autoregressive coefficients, on which the Riemannian distance and Riemannian median are derived. Detection principle: if a location lies at a large Riemannian distance from its Riemannian median, a target is declared at that location.

Methodology & technology road
• RG+SLAR: based on Barbaresco's work, we extend the Riemannian geometry method (RG) for target detection in non-stationary signals by combining it with a smoothness-prior long AR model (SLAR). SLAR better accommodates the non-stationarity of the signal, while the RG part inherits Barbaresco's technical road.
• SLAR model: pseudo-stationary spectral analysis of the non-stationary signal in a short analysis window; a large model order is fitted to the relatively short window; a smoothness constraint overcomes the ill-posedness and the spurious peaks brought by the high-order model, and avoids the underestimation of model order produced by order-selection criteria.
• RG approach: the observation flow is mapped to a complex Riemannian manifold through reflection-coefficient parameterization by the SLAR model. For a cell under test (CUT), guard cells are excluded, the Riemannian median of the neighbouring reference cells θ_1, …, θ_N is computed, and the Riemannian distance between the CUT and this median is compared to a threshold (set by an empirical value) for detection.

Experiment program & results
• Experiment program: a typical instance of target detection in a non-stationary clutter background is radar detection in non-Gaussian sea clutter, so the experiments use this problem to demonstrate the performance of the proposed method. Numeric experiments: simulated examples validate the RG+SLAR method against Doppler filtering with the DFT and against the RG approach with the regularized Burg algorithm (RG+ReBurg). Real target detection: the RG+SLAR method is applied to real targets within sea clutter using McMaster IPIX radar data.
• Table 1, radar parameters: carrier frequency 10 GHz; bandwidth 10 MHz; pulse repetition frequency 10 kHz; unambiguous range interval 15 km; unambiguous velocity interval 150 m/s.
• Table 2, target parameters (range, SNR, Rel_RCS, velocity): 2 km, -3 dB, -26.7 dB, 60 m/s; 3.8 km, 5 dB, -7.55 dB, 30 m/s; 4.4 km, 10 dB, 0 dB, -30 m/s; 4.4 km, 7 dB, -3 dB, -60 m/s. Notes: SNR is the signal-to-noise ratio; Rel_RCS is the relative RCS, i.e. RCS/max(RCS) in dB, where max(RCS) is the maximum RCS of the 4 targets.
• Numeric results, RG+SLAR: Figure 1, range-velocity map of the clutter-cancelled data obtained from SLAR-modeled spectral estimation (the velocity axis is linearly mapped from frequency using the speed of light and the carrier frequency); Figure 2, range-velocity map obtained from the Riemannian median of the clutter-cancelled data based on reflection-coefficient parameterization with the SLAR model. Table 3, detected target parameters (range, Rel_RCS, velocity): 2 km, -30.8 dB, 61.81 m/s; 3.8 km, -12.5 dB, 31.39 m/s; 4.4 km, 0 dB, -29.89 m/s; 4.4 km, -3.38 dB, -62.05 m/s. Figure 3: range bins with targets using the RG+SLAR method.
• Numeric results, Doppler filtering and RG+ReBurg: Figure 5, range-velocity maps of the clutter-cancelled data, (a) spectral estimation of each range bin with the regularized Burg algorithm, (b) Doppler filtering in slow time. Figure 6, ambient estimation of the clutter-cancelled data, (a) Riemannian median with RG+ReBurg, (b) Doppler filtering. Figure 7, detected range peaks, (a) RG+ReBurg, (b) Doppler filtering.
• Real target detection: the measured data is the file 19931118_023604_stare C0000.cdf collected by the McMaster IPIX radar. Table 4, IPIX radar parameters: wind condition 0-60 km/h, wind gust 90 km/h, wave condition 0.8-3.8 m, wave peak 5.5 m; antenna azimuth 170.2606°, antenna elevation 359.5605°, beam width 0.9°, antenna gain 45.7 dB; unambiguous velocity 7.9872 m/s, range resolution 15 m, carrier frequency 9.39 GHz, PRF 1 kHz. The average target-to-clutter ratio varies in the range 0-6 dB, and only one weak static target with small fluctuation is available, in range bin 8 (primary target bin), with neighbouring range bins 7-10 where the target may also be visible (secondary target bins). Figure 8: (a) range-velocity contour of the pre-processed data, (b) ambient estimation of the pre-processed data based on reflection-coefficient parameterization with SLAR. Figure 9: range bins with target; the primary target appears in range bin 8, and the secondary target region spreads over range bins 7-9. Figure 10: velocity detection for the primary range bin 8.

Conclusions
A. Numeric and real target detection experiments show that the proposed RG+SLAR method can attenuate the contamination brought by non-stationary clutter disturbance.
B. The detection statistic based on Riemannian geometry achieves higher accuracy of target detection than DFT-based Doppler filtering does.
C. Combining the SLAR model with Riemannian geometry achieves a precise measurement of target location and velocity for non-stationary signals.
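To make the distance-to-median detection rule concrete, here is a hedged Python sketch on complex reflection coefficients in the unit disk. The hyperbolic (Poincaré-disk) distance below is one common normalization for this parameterization, the Riemannian median is approximated by a medoid (the reference cell minimizing the summed distance to the others) for simplicity, and the threshold and toy data are purely illustrative; this is not the author's implementation.

```python
# Minimal sketch of a distance-to-reference detector on reflection coefficients.
import numpy as np

def disk_distance(mu1, mu2):
    """Hyperbolic distance between two points of the unit disk (one common normalization)."""
    delta = np.abs((mu1 - mu2) / (1.0 - mu1 * np.conj(mu2)))
    return np.arctanh(delta)

def cell_distance(theta_a, theta_b):
    """Sum the per-order disk distances between two vectors of reflection coefficients."""
    return sum(disk_distance(a, b) for a, b in zip(theta_a, theta_b))

def detect(theta_cut, theta_refs, thr):
    """Declare a target if the cell under test is far from the reference cells' medoid."""
    costs = [sum(cell_distance(r, s) for s in theta_refs) for r in theta_refs]
    medoid = theta_refs[int(np.argmin(costs))]
    return cell_distance(theta_cut, medoid) > thr

# Toy example: 8 reference cells with small coefficients, one "target-like" cell under test.
rng = np.random.default_rng(0)
refs = [0.05 * (rng.standard_normal(3) + 1j * rng.standard_normal(3)) for _ in range(8)]
cut = np.array([0.6 + 0.2j, 0.3 - 0.4j, 0.1 + 0.5j])
print(detect(cut, refs, thr=1.0))
```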


Visual Point Set Processing with Lattice Structures: Application to Parsimonious Representations of Digital Histopathology Images
Nicolas Loménie, Université Paris Descartes, LIPADE, SIP Group, nicolas.lomenie@parisdescartes.fr

Digital tissue images are too big to be processed with traditional image processing pipelines. We resort to the nuclear architecture within the tissue to explore such big images with geometrical and topological representations based on Delaunay triangulations of seed points. We then relate this representation to the parsimonious paradigm. Finally, we develop specific mathematical morphology operators to analyze any point set and to contribute to the exploration of these huge medical images. Preliminary results show good performance both for focusing on areas of interest and for discriminating between slightly but significantly varying nuclear geometric configurations.
Keywords: digital histopathology; point set processing; mathematical morphology.

Sparsity and Digital Histopathology
The rationale: 1. shape as a geometric visual point set rather than an assembly of radiometric pixels; 2. image analysis and pattern recognition issues addressed over geometric, hence sparse, representations; 3. the versatile nature of digital high-content histopathological images (staining procedure, biopsy techniques) calls for structural analysis.
The statement: promote new representations for the exploration of Whole Slide Images (WSIs) by using the recently acknowledged sparsity paradigm based on geometric representations.
In [Chen et al. 2001], Chen et al. relate Huo's findings about general image analysis: "In one experiment, Huo analyzed a digitized image and found that the humanly interpretable information was really carried by the edgelet component of the decomposition. This surprising finding shows that, in a certain sense, images are not made of wavelets, but instead, the perceptually important components of the image are carried by edgelets. This contradicts the frequent claim that wavelets are the optimal basis for image representation, which may stimulate discussion."
We propose a sparse representation of a WSI based on a codebook of representative cells that are translated over the seed points detected by a low-level processing operator, as illustrated below. We use a semantic sparse representation relying on the most robustly detected significant tissue elements: the nuclei.

WSI_nuclear(x, y) = Σ_{(i,j)∈S} δ_{i,j}(x, y) ∗ CellAtom(x, y)

where S is a geometric point set corresponding to the nucleus seeds and CellAtom is an atomic cell element image in the specific case of a 1-cell dictionary. S can be considered a sparse representation of a WSI according to the definition of an s-sparse vector x ∈ ℝ^d given in [Needell & Ward 2012]: ||x||_0 = |supp(x)| ≤ s ≪ d.

Figure: sparse representation of a WSI illustrated with a tubule/gland structure; (a) based on the (b) 1-atomic-cell dictionary and the sparse representation in (c) as a point-set binary matrix S1; (d) reconstruction of the tubule by convolution with a point set S1 obtained with a specific seed extractor; (e) superimposed with the gland; (f) reconstruction of the tubule by convolution with a point set S2 obtained with another specific seed extractor; (g) superimposed with the gland structure; (h)(i) sparse representations over a 1024 × 1024 sub-image of a more complex view out of a WSI (about 50 000 × 70 000 pixels).

In the field of computational pathology, graph-based representations and geometric science of information are gaining momentum [Doyle et al. 2008].

References
[Chen et al. 2001] Chen SS, Donoho DL, Saunders MA. (2001) Atomic Decomposition by Basis Pursuit, SIAM Review, 43(1), 129-159.
[Doyle et al. 2008] Doyle, S., Agner, S., Madabhushi, A., Feldman, M. and Tomaszewski, J. (2008). Automated Grading of Breast Cancer Histopathology Using Spectral Clustering with Textural and Architectural Image Features, 5th IEEE International Symposium on Biomedical Imaging, 496-499.
[Needell & Ward 2012] Needell D, Ward, R. (2012) Stable image reconstruction using total variation minimization, http://arxiv.org/abs/1202

Point Set Processing
Point set processing in the manner of image processing is gaining momentum in the computer graphics community [Rusu & Cousins 2011], with the example of the Point Cloud Library (PCL: http://www.pointclouds.org) inspired by the GNU Image Manipulation Program (GIMP: http://www.gimp.org). At the same time, in the field of applied mathematics, a new trend consists in adapting mature image analysis algorithms working on regular grids to parsimonious representations such as graphs of interest points or superpixels [Ta et al. 2009]. Applying mathematical morphology to graphs was first suggested in [Heijmans et al. 1992] but never really came up with tractable applications. Nevertheless, the idea is emerging again with recent works by the mathematical morphology pioneers [Cousty et al. 2009]; it was also related to the concept of α-objects in [Loménie & Stamon 2008], based on seminal ideas in [Loménie et al. 2000], and then applied to the modeling of spatial relations and histopathology in [Loménie & Racoceanu 2012].

Lattice Structures for Point Set Processing
We refer the reader to [Loménie & Stamon 2011] for a detailed presentation of the mathematical morphology framework operating on point sets. Formally, it is enough to define a lattice structure operating on unorganized point sets or, more precisely, on a tessellation of the space that embeds any point set S in a neighborhood system. For any point set S ⊂ ℝ², there exists a Delaunay triangulation Del(S) defining the aforementioned topology of the workspace. This mesh acts as the regular grid does for a radiometric image. We then define the complete lattice algebraic structure L = (M(Del), ≤), where M(Del) is the set of meshes defined on Del, that is the set of mappings assigning to each triangle T ∈ Del(S) a value φ_T ∈ ℝ, M = {(T, φ_T)}_{T∈Del}, and where the partial ordering relation ≤ is defined by

for all M1, M2 ∈ M(Del): M1 ≤ M2 ⟺ φ¹_T ≤ φ²_T for all T ∈ Del,

where φ_T is a positive measure of the k-simplex T in Del(S), related to the size, shape, area or visibility of the triangle [Loménie & Racoceanu 2012]. The infimum and supremum operators are defined as

inf(M1, M2) = {(T, min(φ¹_T, φ²_T))}_{T∈Del},  sup(M1, M2) = {(T, max(φ¹_T, φ²_T))}_{T∈Del}.

Then, given the basic definitions of an erosion e(M) = {(T, e_T)}_{T∈Del} and of an involution (complement) M^c = {(T, 1 − φ_T)}_{T∈Del}, we inherit the infinite spectrum of theoretically well-founded operators from mathematical morphology.

Figure (left): the pyramid of structural operators we can obtain, ranging from the fundamental low-level erosion operator to the semantic high-level Ductal Carcinoma In Situ (DCIS) characterization and the representation of spatial relationships like 'between'.

Structural Analysis for Digital Histopathology
Focusing operators: (top) focusing on a tumorous area at magnification ×1 of the WSI; (bottom) focusing on a small part of the WSI at ×20.
Pattern recognition operators: (above) characterizing a DCIS structure with a structural bio-code '110' based on our operators, with a precise (Method 1) and a coarse (Method 2) seed nuclei extractor at magnification ×40. (Below) new results on a small database:

Type | Nb samples | Correct biocodes (Method 1) | Correct biocodes (Method 2)
DCIS(S) = '110' | 10 | 9 | 8
DCISpost(S) = '110' | 10 | 9 | 9
Tubule(S) = '101' | 10 | 10 | 10

Digital histopathology and geometric information science: great challenges to tackle in the coming decade [GE Healthcare 2012].

References
[Cousty et al. 2009] Cousty, J., Najman, L., and Serra, J. (2009). Some morphological operators in graph spaces, Lecture Notes in Computer Science, Mathematical Morphology and Its Application to Signal and Image Processing, Springer, 5720:149-160.
[GE Healthcare 2012] Pathology Innovation Centre of Excellence (PICOE). Digital Histopathology: A New Frontier in Canadian Healthcare. White Paper, January 2012, GE Healthcare. http://www.gehealthcare.com/canada/it/downloads/digitalpathology/GE_PICOE_Digital_Pathology_A_New_Frontier_in_Canadian_Healthcare.pdf . Accessed December 2012.
[Heijmans et al. 1992] Heijmans, H., Nacken, P., Toet, A., & Vincent, L. (1992). Graph Morphology. Journal of Visual Communication and Image Representation, 3(1):24-38.
[Loménie et al. 2000] Loménie, N., Gallo, L., Cambou, N. & Stamon, G. (2000). Morphological Operations on Delaunay Triangulations. International Conference on Pattern Recognition, 556-559.
[Loménie & Stamon 2008] Loménie, N. and Stamon, G. (2008). Morphological Mesh filtering and alpha-objects, Pattern Recognition Letters, 29(10):1571-1579.
[Loménie & Stamon 2011] Loménie, N. and Stamon, G. (2011). Point Set Analysis, Advances in Imaging and Electron Physics, Peter W. Hawkes, San Diego: Academic Press, vol. 167, pp. 255-294.
[Loménie & Racoceanu 2012] Loménie, N. and Racoceanu, D. (2012). Point set morphological filtering and semantic spatial configuration modeling: application to microscopic image and bio-structure analysis, Pattern Recognition, 45(8):2894-2911.
[Rusu & Cousins 2011] Rusu, R.B. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL), IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China.
[Ta et al. 2009] Ta, V.T., Lezoray, O., Elmoataz, A. and Schupp, S. (2009). Graph-based Tools for Microscopic Cellular Image Segmentation, Pattern Recognition, Special Issue on Digital Image Processing and Pattern Recognition Techniques for the Detection of Cancer, 42(6):1113-1125.

Acknowledgement: This work is part of the SPIRIT project, program JCJC 2011, ref. ANR-11-JS02-008-01, and of the MICO project, program TecSan 2010, ref. ANR-10-TECS-015. A free demonstrator can be downloaded at http://www.math-info.univ-paris5.fr/~lomn/Data/MorphoMesh.zip as an ImageJ plugin.
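A hedged sketch of the mesh-lattice idea described above: triangles of Del(S) carry a value φ_T and morphological operators act on those values. The Python code below (not the paper's implementation) uses SciPy's Delaunay triangulation, takes the triangle area as one possible measure φ_T, and implements a one-step erosion as a minimum over edge-adjacent triangles; the complement is only involution-like here because φ is not normalized to [0, 1].

```python
# Minimal mesh-morphology sketch on a Delaunay triangulation of seed points.
import numpy as np
from scipy.spatial import Delaunay

def triangle_areas(points, simplices):
    p = points[simplices]                     # (n_tri, 3, 2)
    u, v = p[:, 1] - p[:, 0], p[:, 2] - p[:, 0]
    return 0.5 * np.abs(u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0])

def mesh_erosion(phi, neighbors):
    """One erosion step: min of phi over each triangle and its adjacent triangles."""
    eroded = phi.copy()
    for t, nbrs in enumerate(neighbors):
        for n in nbrs:
            if n != -1:                        # -1 marks a missing (boundary) neighbour
                eroded[t] = min(eroded[t], phi[n])
    return eroded

rng = np.random.default_rng(0)
seeds = rng.random((200, 2))                   # stand-in for detected nucleus seeds
tri = Delaunay(seeds)
phi = triangle_areas(seeds, tri.simplices)     # one possible positive measure phi_T
phi_eroded = mesh_erosion(phi, tri.neighbors)
phi_complement = phi.max() - phi               # involution-like complement (up to scaling)
print(phi.mean(), phi_eroded.mean())
```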

Guest speech (Shun-Ichi Amari)

Watch the video

Information Geometry and Its Applications: Survey
Shun-ichi Amari, RIKEN Brain Science Institute

Applications of information geometry: statistical inference; machine learning and AI; convex programming; signal processing (ICA); information theory; systems theory; quantum information geometry.

Slide outline:
• Higher-order asymptotics of statistical estimation: the first-order error of the maximum-likelihood estimator is the Cramér-Rao bound (inverse Fisher information), and the second-order correction terms are expressed through the exponential (e-) curvature of the model and the mixture (m-) curvature of the ancillary family.
• Linear regression and semiparametric models: least squares versus total least squares for errors-in-variables data, and the Neyman-Scott problem where each observation carries its own nuisance parameter; semiparametric statistical models p(x; u, z) with a parameter of interest u, a nuisance parameter and a functional degree of freedom z; estimating functions, orthogonalized and projected scores, and e-/m-parallel transports of scores; an example of estimating functions for a noisy linear relation between x and y.
• Independent component analysis (ICA): mixtures x = A s of independent signals s are unmixed by y = W x; information geometry yields the natural gradient learning rule, estimating functions, and stability/efficiency analysis, working in the manifold of densities S = {p(y)} with the submanifold of independent distributions q_1(y_1)⋯q_n(y_n) and the Kullback-Leibler divergence l(W) = KL[p(y; W) : q(y)] as the cost.
• Compressed sensing: when is the L1-minimization solution the same as the sparsest L0 solution of an underdetermined system y = Ax with a k-sparse unknown?
• Applications to machine learning: stochastic reasoning (belief propagation), boosting, support vector machines, neural networks, clustering, optimization.
• Stochastic reasoning: graphical models p(x, y, z, r, s), Boltzmann machines, spin glasses and neural networks; turbo codes and LDPC codes; an information-geometric picture in terms of exponential-family submanifolds M_0, M_r and their projections.
• Boosting: a combination of weak learners h(x, u) trained on data D = {(x_i, y_i)}, y_i = ±1, with exponential reweighting of the examples at each round; generalization is analyzed geometrically as an iterative divergence minimization between weighted distributions.
• Neural firing and higher-order correlations: for binary neurons x_i ∈ {0, 1}, the joint probability decomposes orthogonally into firing rates r_i = E[x_i] and correlation parameters; for two neurons, (r_1, r_2) together with the interaction coordinate θ_12 = log(p_11 p_00 / (p_10 p_01)) form orthogonal coordinates separating firing rates from correlations.
• Multilayer perceptrons: y = Σ_i v_i f(w_i · x) + noise; the parameter space (the neuromanifold, a space of functions) contains singularities.
• Clustering: the center of a cluster is argmin_x Σ_i D(x_i : x) (K-means with a divergence); the total Bregman divergence adds rotational invariance (conformal geometry), and its t-center is a robust cluster representative with a bounded influence function.
• Linear programming and convex cone programming (positive semidefinite matrices): interior-point methods viewed through convex potential functions and a dual-geodesic approach, for problems of the form max c·x subject to Ax ≤ b with a logarithmic barrier.
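As one concrete instance of the "signal processing (ICA)" item above, here is a hedged sketch of natural-gradient ICA. The update W ← W + η (I − φ(y) yᵀ) W with φ(y) = tanh(y) is a standard choice for super-Gaussian sources; the data, step size and iteration counts below are purely illustrative.

```python
# Minimal natural-gradient ICA sketch: unmix y = W x.
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 20000
S = rng.laplace(size=(n, T))            # independent super-Gaussian sources
A = rng.standard_normal((n, n))         # unknown mixing matrix
X = A @ S                               # observed mixtures  x = A s

W = np.eye(n)
eta = 0.02
for _ in range(1000):
    idx = rng.integers(0, T, size=256)  # mini-batch
    Y = W @ X[:, idx]
    grad = np.eye(n) - (np.tanh(Y) @ Y.T) / idx.size
    W += eta * grad @ W                 # natural-gradient step

# If separation succeeded, W @ A is close to a scaled permutation matrix.
print(np.round(W @ A, 2))
```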

ORAL SESSION 1 Geometric Statistics on manifolds and Lie groups (Xavier Pennec)


A Subspace Learning of Dynamics on a Shape Manifold: A Generative Modeling Approach
Sheng Yi* and H. Krim, VISSTA, ECE Dept., NCSU, Raleigh NC 27695 (*GE Research Center, NY). Thanks to AFOSR.

Outline: motivation; statement of the problem; highlight of key issues and brief review; proposed model and solution; experiments.

Problem statement
Given a process X(t) on the shape manifold, find a lower-dimensional process Z(t), i.e. a subspace that preserves the geometrical properties of the data in the original space.

Related work
• Point-wise subspace learning: PCA, MDS, LLE, ISOMAP, Hessian LLE, Laplacian eigenmaps, diffusion maps, LTSA [T. Wittman, "Manifold Learning Techniques: So Which is the Best?", UCLA].
• Curve-wise subspace learning: Whitney embedding [D. Aouada and H. Krim, IEEE Trans. IP, 2010].
• Shape manifolds: Kendall's shape space (based on landmarks); Klassen et al.'s shape space (functional representation, concise description of the tangent space, fast implementation); Michor & Mumford's shape space (focus on parameterization, complex description of the tangent space, heavy computation); the Trouvé-Younes diffeomorphism approach.

Contribution summary
• The proposed subspace learning is invertible: original sequence → subspace sequence → reconstructed sequence.
• The parallel transport of representative frames, defined by a metric on the shape manifold, preserves curvatures in the subspace, and allows an ambient-space calculus to be used instead of relying essentially on manifold calculus.

Shape representation (from curve to shape, following Klassen et al.)
A planar curve α(s) = (x(s), y(s)) ∈ R², parameterized by arc length, is represented by its angle function θ(s) with ∂α/∂s = (cos θ(s), sin θ(s)), for simple closed curves, modulo similarity transformations. Closure requires ∫₀^{2π} cos θ(s) ds = 0 and ∫₀^{2π} sin θ(s) ds = 0, and the rotation is fixed by (1/2π) ∫₀^{2π} θ(s) ds = π.

Dynamic modeling on a manifold
The core idea: represent a process X_t on M through a driving process Z(t) on R^n, dX_t = Σ_{i=1}^{dim M} V_i(X_t) dZ_i(t) ∈ T_{X_t}M, where the vectors V_i(X_t) span the tangent space.

Parallel transport
Tangents along the curve are parallel-transported between shapes X_0 and X_1 on M [Yi et al., IEEE IP, 2012].

The core idea
• Adaptively select the frame so as to represent the driving process in a lower-dimensional space: PCA is performed on tangent vectors parallel-transported to a common tangent space, giving the driving increments dZ_i(t) in R^{dim M} and then in a subspace.
• In the original space: a shape on the shape manifold, vectors spanning the tangent space of the shape manifold [Yi et al., IEEE IP, 2012], and a Euclidean driving process. In a subspace: vectors spanning the subspace and a driving process in that subspace.
• Restrict the selection of V to parallel frames on the manifold. Advantages of a parallel moving frame: angles between tangent vectors are preserved, so with the same sampling scheme curvatures are preserved as well; given the initial frame and the initial location on the manifold, the original curve can be reconstructed.
• Find an L²-optimal V; the Euclidean distance can be used here because everything lives within a single tangent space of the manifold. Given a parallel transport on the shape manifold, under mild assumptions the solution is obtained as a PCA.

Parallel transport flow chart
By the definition of parallel transport, with a discrete approximation of the derivative and the fact that the tangent space of the shape manifold is normal to b1, b2, b3, the transport reduces to a linear system.

Experiments
• Data: Moshe Blank et al., ICCV 2005, http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html (actions: walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack, skip); Kimia shape database (Sharvit, D. et al., Content-Based Access of Image and Video Libraries, 1998).
• Reconstruction experiment: PCA in Euclidean space versus the proposed method; further reconstructions and embeddings; other embedding results; experiments on curvature preservation; generative reconstruction.

Conclusions
• A low-dimensional embedding of a parallelly transported shape flow is proposed.
• A learning-based inference framework is achieved.
• A generative model for various shape-based activities is obtained.
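A hedged sketch of the angle-function shape representation recalled above (in the style of Klassen et al.): a discretized closed planar curve is mapped to its direction angle θ(s), and the closure conditions are checked numerically. The example curve and sampling are illustrative only; the full shape-space construction (projection onto the closure constraints, similarity quotient) is not reproduced here.

```python
# Angle-function representation of a discretized closed planar curve.
import numpy as np

def angle_function(curve):
    """curve: (N, 2) array of points on a closed curve. Returns (theta, segment lengths)."""
    d = np.diff(np.vstack([curve, curve[:1]]), axis=0)   # edge vectors, closing edge included
    seg = np.linalg.norm(d, axis=1)
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))      # continuous direction angle
    return theta, seg

# Example: an ellipse sampled at 400 points.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(t), np.sin(t)])
theta, ds = angle_function(ellipse)

L = ds.sum()
print("closure (cos):", np.sum(np.cos(theta) * ds))      # ~ 0 for a closed curve
print("closure (sin):", np.sum(np.sin(theta) * ds))      # ~ 0 for a closed curve
print("mean angle   :", np.sum(theta * ds) / L)          # the slides fix rotation by normalizing this
```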


Bi-invariant Means on Lie Groups with Cartan-Schouten Connections
Xavier Pennec, Asclepios team, INRIA Sophia-Antipolis – Méditerranée, France. GSI, August 2013.

Computational anatomy
• Design mathematical methods and algorithms to model and analyze the anatomy: statistics of organ shapes across subjects in species, populations and diseases (mean shape, shape variability/covariance); models of organ development across time (heart-beat, growth, ageing); predictive (versus descriptive) models of evolution; correlation with clinical variables.
• Statistical analysis of geometric features: noisy geometric measures such as tensors and covariance matrices, curves and tracts, surfaces, and transformations (rigid, affine, locally affine, diffeomorphisms). Goal: deal with noise consistently on these non-Euclidean manifolds, within a consistent statistical (and computing) framework.

Statistical analysis of the scoliotic spine
• Data: 307 scoliotic patients from Montreal's St-Justine Hospital; 3D geometry from multi-planar X-rays; articulated model with 17 relative poses of successive vertebrae.
• Statistics: the main translational variability is axial (growth?); the main rotational variability is around the anterior-posterior axis; the 4 first variation modes are related to King's classes. [J. Boisvert et al., ISBI'06, AMDO'06 and IEEE TMI 27(4), 2008]

Morphometry through deformations
• Measure of deformation [D'Arcy Thompson 1917, Grenander & Miller]: an observation is a "random" deformation of a reference template; the deterministic template encodes the anatomical invariants (atlas ~ mean), and the random deformations encode the geometrical variability (covariance matrix).
• Hierarchical deformation model: deformation atoms varying for each subject (subject level) on top of a spatial structure of the anatomy common to all subjects (population level); Aff(3)-valued trees. [Seiler, Pennec, Reyes, Medical Image Analysis 16(7):1371-1384, 2012]

Outline: Riemannian frameworks on Lie groups; Lie groups as affine connection spaces; a glimpse of applications in infinite dimensions; conclusion and challenges.

Riemannian geometry is a powerful structure to build consistent statistical computing algorithms
• Shape spaces and directional statistics [Kendall StatSci 89, Small 96, Dryden & Mardia 98]
• Numerical integration, dynamical systems and optimization [Helmke & Moore 1994, Hairer et al. 2002]; matrix Lie groups [Owren BIT 2000, Mahony JGO 2002]; optimization on matrix manifolds [Absil, Mahony, Sepulchre 2008]
• Information geometry (statistical manifolds) [Amari 1990 & 2000, Kass & Vos 1997, Oller & Corcuera Ann. Stat. 1995, Bhattacharya & Patrangenaru Ann. Stat. 2003 & 2005]
• Statistics for image analysis: rigid-body transformations [Pennec PhD 96], general Riemannian manifolds [Pennec JMIV 98, NSIP 99, JMIV 06], PGA for M-reps [Fletcher IPMI 03, TMI 04], planar curves [Klassen & Srivastava PAMI 2003]
• Geometric computing: subdivision schemes [Rahman, …, Donoho, Schröder SIAM MMS 2005]

The geometric framework: Riemannian manifolds
• A Riemannian metric is a dot product on each tangent space; it defines the speed and length of curves, geodesics as length-minimizing curves, and the Riemannian distance.
• The exponential map (normal coordinate system) is given by geodesic shooting: Exp_x(v) is the geodesic from x with initial speed v evaluated at time 1; the log map Log_x(y) finds the vector to shoot to reach y (geodesic completeness required).
• Correspondence of operators between a Euclidean space and a Riemannian manifold: subtraction xy = y − x becomes xy = Log_x(y); addition y = x + xy becomes y = Exp_x(xy); distance dist(x, y) = ||y − x|| becomes dist(x, y) = ||Log_x(y)||_x; gradient descent x_{t+dt} = x_t − dt·grad C(x_t) becomes x_{t+dt} = Exp_{x_t}(−dt·grad C(x_t)).
• Unfolding (Log_x) and folding (Exp_x): a vector becomes a bipoint (no more equivalence classes).

Statistical tools: moments
• The Fréchet/Karcher mean minimizes the variance: x̄ = argmin_y E[dist(y, x)²]; it is characterized by E[Log_x̄(x)] = 0. Existence and uniqueness: Karcher / Kendall / Le / Afsari. It is computed by a Gauss-Newton geodesic marching: x̄_{t+1} = Exp_{x̄_t}(mean of Log_{x̄_t}(x_i)).
• Covariance (and PCA, higher moments): Σ = E[Log_x̄(x) Log_x̄(x)^T]. [Oller & Corcuera 95, Bhattacharya & Patrangenaru 2002, Pennec JMIV 06, NSIP 99]

Distributions for parametric tests
• Generalizations of the Gaussian density: the stochastic heat kernel p(x, y, t) (complex time dependency); the wrapped Gaussian (infinite series, difficult to compute); the maximal-entropy distribution knowing the mean and covariance, N(y) = k exp(−(1/2) Log_x̄(y)^T Γ Log_x̄(y)), whose concentration matrix Γ equals Σ^{-1} up to curvature (Ricci) correction terms.
• Mahalanobis D² distance and test: μ²_x̄(y) = Log_x̄(y)^T Σ^{-1} Log_x̄(y); for any distribution E[μ²_x̄(x)] = n, and for the Gaussian above μ² follows a χ² law up to curvature corrections. [Pennec NSIP'99, JMIV 2006]

Natural Riemannian metrics on Lie groups
• A Lie group is a smooth manifold G compatible with the group structure: the composition g∘h and the inversion g^(-1) are smooth; left and right translations are L_g(f) = g∘f and R_g(f) = f∘g.
• Natural Riemannian metric choices: choose a metric <·,·>_Id at the identity and propagate it to each point g by left (or right) translation, <x, y>_g = <DL_g^(-1)·x, DL_g^(-1)·y>_Id. Practical computations then use left (or right) translations: Exp_f(x) = f∘Exp_Id(DL_f^(-1)·x) and Log_f(g) = DL_f·Log_Id(f^(-1)∘g).

Example on 3D rotations
• The space of rotations SO(3) is the manifold {R : R^T R = Id, det(R) = +1}; it is a Lie group with composition R1∘R2 = R1·R2 and inversion R^(-1) = R^T.
• Metrics on SO(3): the group is compact, so a bi-invariant metric exists; the left/right-invariant metric induced by the ambient space is <X, Y> = Tr(X^T Y).
• Group exponential: the one-parameter subgroups are the bi-invariant geodesics starting at Id, given by the matrix exponential and Rodrigues' formula, R = exp(X) and X = log(R). Geodesics everywhere follow by left (or right) translation: Log_R(U) = R·log(R^T U) and Exp_R(X) = R·exp(R^T X).
• Bi-invariant Riemannian distance: d(R, U) = ||log(R^T U)|| = θ(R^T U), the rotation angle of R^T U.

General non-compact and non-commutative case
• No bi-invariant mean for 2D rigid-body transformations; the counter-example is constructed from a metric chosen at the identity.
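A hedged sketch of the Karcher-mean iteration recalled above, specialized to SO(3) with its bi-invariant metric: the data are lifted to the tangent space at the current estimate with the matrix logarithm, averaged, and shot back with the matrix exponential. SciPy's generic expm/logm stand in for Rodrigues' closed form, and the toy data and iteration counts are illustrative.

```python
# Karcher/Fréchet mean of rotation matrices by geodesic marching.
import numpy as np
from scipy.linalg import expm, logm
from scipy.spatial.transform import Rotation

def karcher_mean_so3(rotations, iters=20):
    mu = rotations[0]
    for _ in range(iters):
        # Lift the data to the tangent space at mu, average, and shoot back.
        v = np.mean([logm(mu.T @ R) for R in rotations], axis=0)
        mu = mu @ expm(np.real(v))
    return mu

# Toy data: random rotations concentrated around a reference rotation.
rng = np.random.default_rng(0)
ref = Rotation.random(random_state=1).as_matrix()
Rs = [ref @ Rotation.from_rotvec(0.2 * rng.standard_normal(3)).as_matrix() for _ in range(50)]
mu = karcher_mean_so3(Rs)
print(np.round(mu.T @ ref, 3))   # close to the identity if the mean is near ref
```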


Faculty of Science — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development. Stefan Sommer (sommer@diku.dk), Department of Computer Science, University of Copenhagen. August 30, 2013. Slide 1/14

Dimensionality Reduction in Non-Linear Manifolds • Aim: find subspaces with coordinates that approximate data in non-linear manifolds • Not learning non-linear subspaces from data in linear Euclidean spaces (ISOMAP, LLE, etc.) • Principal Geodesic Analysis (PGA, Fletcher et al., 2004): finds geodesic subspaces - geodesic rays originating from a manifold mean µ ∈ M; the non-linear data space is linearized to TµM • Geodesic PCA (GPCA, Huckemann et al., 2010): finds principal geodesics - geodesics minimizing residual distances that pass the principal mean. PGA: analysis relative to the data mean - data points on the non-linear manifold, intrinsic mean µ, tangent space TµM, projection of the data points to TµM, Euclidean PCA in the tangent space. What happens when µ is a poor zero-dimensional descriptor? Slide 2/14

Curvature Skews Centered Analysis • Bimodal distribution on S², var. 0.5² • PGA, est. var. 1.07² • HCA (Horizontal Component Analysis), est. var. 0.49² [figure: tangent-space coordinates of the bimodal sample under PGA and under HCA] Slide 3/14
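As a concrete illustration of the tangent-space linearization that PGA performs, the following is a minimal numpy sketch of linearized PGA on the unit sphere S²: the data are mapped to TµM with the log map at a mean µ (assumed pre-computed; here simply taken as the north pole), and ordinary Euclidean PCA is run on the tangent vectors. Function and variable names are illustrative and not taken from the paper or the smanifold toolbox.

```python
import numpy as np

def sphere_log(mu, x):
    """Log map on S^2: tangent vector at mu pointing towards x (mu, x unit vectors)."""
    c = np.clip(np.dot(mu, x), -1.0, 1.0)
    theta = np.arccos(c)                       # geodesic distance from mu to x
    if theta < 1e-12:
        return np.zeros(3)
    v = x - c * mu                             # component of x orthogonal to mu
    return theta * v / np.linalg.norm(v)

def tangent_pga(points, mu):
    """Linearized PGA: log-map the data to T_mu S^2, then Euclidean PCA there."""
    V = np.array([sphere_log(mu, p) for p in points])   # N x 3 tangent vectors
    C = V.T @ V / len(points)                  # at the intrinsic mean the tangent mean is ~0
    eigval, eigvec = np.linalg.eigh(C)
    order = np.argsort(eigval)[::-1]
    return eigval[order], eigvec[:, order]     # principal variances and directions in T_mu M

# toy usage: points clustered around the north pole (mu is only an approximate intrinsic mean)
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])
pts = mu + 0.3 * rng.standard_normal((50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
variances, directions = tangent_pga(pts, mu)
```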
HCA Properties • data-adapted coordinate system r : R^D → M • preserves distances orthogonal to lower order components: ‖x − y‖ = d_M(r(x), r(y)) for x = (x^1, ..., x^d, 0, ..., 0), y = (x^1, ..., x^d, 0, ..., 0, y^d̃, 0, ..., 0), 1 ≤ d̃ < d • intrinsic interpretation of covariance: cov(X^d, X^d̃) = ∫_{R²} X^d X^d̃ p(X^d, X^d̃) d(X^d, X^d̃) = ∫_{γ_d} ∫_{γ_d̃} ± d_M(µ, r(X^d)) d_M(r(X^d), r(X^d̃)) p(r(X^d, X^d̃)) ds^d̃ ds^d • orthogonal coordinates/subspaces • coordinate-wise decorrelation with respect to a curvature-adapted measure. Slide 4/14

The Frame Bundle • the manifold and frames (bases) for the tangent spaces T_pM • F(M) consists of pairs (p, u), p ∈ M, u a frame for T_pM • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames. Slide 5/14
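To make the horizontal-curve picture concrete, here is a small numpy sketch (an illustration only, not code from the paper) of a horizontal curve in F(S²): a point moves along a great-circle arc while an orthonormal frame of its tangent space is parallel transported along it. For S² with the Levi-Civita connection, parallel transport along a great circle coincides with the ambient rotation about the circle's axis.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues' formula: rotation matrix about a unit axis."""
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def horizontal_curve(p, frame, direction, length, steps=100):
    """Move p along the geodesic with unit initial direction `direction` (tangent at p)
    and parallel transport `frame` (3x2, columns = tangent basis) along it.
    Returns the horizontal curve in F(S^2) as lists of points and frames."""
    axis = np.cross(p, direction)               # rotation axis of the great circle
    axis /= np.linalg.norm(axis)
    points, frames = [p], [frame]
    for k in range(1, steps + 1):
        R = rotation(axis, k * length / steps)
        points.append(R @ p)                    # point on the geodesic
        frames.append(R @ frame)                # parallel-transported frame
    return points, frames

# usage: start at the north pole with the standard frame, move towards the x-axis
p0 = np.array([0.0, 0.0, 1.0])
u0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # frame for T_p0 S^2
pts, frs = horizontal_curve(p0, u0, direction=np.array([1.0, 0.0, 0.0]), length=np.pi / 2)
```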
The Subspaces: Iterated Development • manifolds in general provide no canonical generalization of affine subspaces • SDEs are defined in the frame bundle using development of curves: w_t = ∫₀ᵗ u_s⁻¹ ẋ_s ds, w_t ∈ R^η, i.e. pull-back to Euclidean space using parallel transported frames u_t • iterated development constructs subspaces of dimension > 1 (geodesic, polynomial, etc.) • geodesic developments (multi-step Fermi coordinates) generalize geodesic subspaces. Slide 6/14

Iterated Development • V ⊂ R^η linear subspace, V = V₁ ⊥ V₂, η = dim M • f : V₁ → F(M) smooth map (e.g. an immersion) • D_f(v₁ + t v₂) development starting at f(v₁) • vector fields W₁, ..., W_{dim V₂} : V → V, columns of W • Euclidean integral curve ẇ_t = W(w_t) v₂ • development D_{f,W}(v₁ + t v₂) = (x_t, u_t) ∈ F(M) defined by u̇_t = u_t ẇ_t, x_t = π_{F(M)}(u_t) • immersion for small v = v₁ + v₂ if W has full rank on V₂ • W_i constant: geodesic development; W_i = D_{e_i} p: polynomial submanifolds; etc. Slide 7/14
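The development construction can be illustrated numerically. Below is a hedged numpy sketch (illustrative only, not the paper's implementation) that "rolls" a planar curve w_t onto S²: at each Euler step the point moves by the exponential map of u_t ẇ_t dt, and the frame u_t is parallel transported along that small geodesic step (again using the fact that both operations are rotations on the round sphere).

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues' rotation matrix about a unit axis (exp map / transport on S^2)."""
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def develop(w, p0, u0):
    """Develop a planar curve w (T x 2 array) onto S^2, starting at p0 with frame u0 (3x2).
    Returns the developed curve x_t on the sphere and the transported frames u_t."""
    p, u = p0.copy(), u0.copy()
    xs, us = [p.copy()], [u.copy()]
    for k in range(1, len(w)):
        dw = w[k] - w[k - 1]                     # increment of the Euclidean curve
        v = u @ dw                               # tangent vector u_t * dw at the current point
        step = np.linalg.norm(v)
        if step > 1e-12:
            axis = np.cross(p, v / step)         # axis of the small geodesic step
            R = rotation(axis, step)             # exp map + parallel transport for this step
            p, u = R @ p, R @ u
        xs.append(p.copy()); us.append(u.copy())
    return np.array(xs), us

# usage: develop a planar circle of radius 0.5 onto the sphere
t = np.linspace(0.0, 2 * np.pi, 400)
w = 0.5 * np.column_stack([np.cos(t), np.sin(t)])
x0 = np.array([0.0, 0.0, 1.0])
u0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
curve, frames = develop(w, x0, u0)
```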
Horizontal Component Analysis • distances measured relative to lower order components • iterative definition of h_d given h_{d−1}: data points {x_i} • curves: geodesics x_t^{h_d}(x_i) passing the points that are closest to x_i on the (d−1)st component h_{d−1} • projection: π_{h_d}(x_i) = argmin_t d_M(x_t^{h_d}(x_i), x_i)² • transport: derivatives ẋ_0^{h_d}(x_i) connected in h_{d−1} by parallel transport • orthogonality: x_t^{h_d} orthogonal to the d−1 basis vectors transported horizontally in h_{d−1} • horizontal component h_d: the subspace containing the curves x_t^{h_d}(x_i) minimizing res_{h_{d−1}}(h_d) = ∑_{i=1}^{N} d_M(x_i, π_{h_d}(x_i))² with the d−1th coordinates fixed. Slide 8/14

Parallel Transport and Local Analysis: 1. find geodesic h₁ with (d/dt) h₁|_{t=0} = u₁ that minimizes res(h₁) = ∑_{i=1}^{N} d_M(x_i, π_{h₁}(x_i))² 2. find u₂ ⊥ u₁ such that the x_t^{h₂}(x_i) are geodesics • that pass π_{h₁}(x_i) • with derivatives ẋ_0^{h₂}(x_i) equal to the transported P_{h₁} u₂ • that minimize res_{h₁}(h₂) = ∑_{i=1}^{N} d_M(x_i, π_{h₂}(x_i))² 3. find u₃ ⊥ {u₁, u₂} such that the x_t^{h₃}(x_i) are geodesics • that pass π_{h₂}(x_i) • with derivatives ẋ_0^{h₃}(x_i) parallel transported in h₂ • that minimize res_{h₂}(h₃) = ∑_{i=1}^{N} d_M(x_i, π_{h₃}(x_i))² 4. and so on ... Slide 9/14

Conditional Congruency • Sample on S², horizontal: uniform, vertical: normal [figure: tangent-space coordinates of the sample under PGA and under HCA; parallel transport along the first component] • data/geometry congruency: data can be approximated by geodesics (Huckemann et al.) • one-dimensional concept • conditional congruency: X^d̃ | X^1, ..., X^d is congruent • HCA defines a data-adapted coordinate system that provides a conditionally congruent splitting. Slide 10/14
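Step 1 of this procedure (finding the first geodesic h₁ that minimizes the residual distances) can be sketched on S², where geodesics are great circles. The snippet below is a minimal assumed implementation for the sphere only: a great circle is parameterized by its unit normal n, the projection removes the n-component and renormalizes, and scipy is used for the residual minimization. All names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def project_to_great_circle(x, n):
    """Closest point to x on the great circle with unit normal n (x a unit vector)."""
    p = x - np.dot(x, n) * n
    return p / np.linalg.norm(p)

def residual(n, data):
    """Sum of squared geodesic distances from the data to the great circle normal to n."""
    n = n / np.linalg.norm(n)
    d = [np.arccos(np.clip(np.dot(x, project_to_great_circle(x, n)), -1.0, 1.0)) for x in data]
    return np.sum(np.square(d))

def first_horizontal_component(data):
    """Fit the first geodesic component h1 on S^2 by minimizing residuals over the normal n."""
    best = None
    for n0 in np.eye(3):                                   # a few crude initializations
        res = minimize(residual, n0, args=(data,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best.x / np.linalg.norm(best.x), best.fun       # unit normal of h1 and its residual

# usage with random points scattered around a great circle (the equator)
rng = np.random.default_rng(1)
ang = rng.uniform(0, 2 * np.pi, 40)
pts = np.column_stack([np.cos(ang), np.sin(ang), 0.1 * rng.standard_normal(40)])
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
normal, res = first_horizontal_component(pts)
```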
Components May Flip [figure panels: (a) x₁ = 0 slice, (b) x₂ = 0 slice, (c) HCA visualization] Figure: 3-dimensional manifold 2x₁² − 2x₂² + x₃² + x₄² = 1 in R⁴ with samples from two Gaussians with largest variance in the x₂ direction (0.6² vs. 0.4²). (a,b) Slices x₁ = 0 and x₂ = 0. (c) The second HCA horizontal component has largest x₂ component (blue vector) whereas the second PGA component has largest x₁ component (red vector). Slide 11/14

Corpora Callosa • Corpus callosum variation: 3σ₁ along h₁, 3σ₂ along h₂ (video: corporacallosa.mp4). Slide 12/14

Summary • Horizontal Component Analysis performs PCA-like dimensionality reduction in Riemannian manifolds • subspaces constructed from iterated frame bundle development • the implied coordinate system is data adapted, preserves certain pairwise distances and orthogonality, provides a covariance interpretation, decorrelates with respect to a curvature-adapted measure, provides conditionally congruent components, and handles multi-modal distributions with spread over large-curvature areas. Slide 13/14

References - Sommer: Horizontal Dimensionality Reduction and Iterated Frame Bundle Development, GSI 2013. - Sommer et al.: Optimization over Geodesics for Exact Principal Geodesic Analysis, ACOM, in press. - Sommer et al.: Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations, ECCV 2010.
http://github.com/nefan/smanifold Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Horizontal Dimensionality Reduction and Iterated Frame Bundle Development Slide 14/14

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

- 1 Parallel Transport with the Pole Ladder: Application to Deformations of Time Series of Images. Marco Lorenzi, Xavier Pennec, Asclepios research group - INRIA Sophia Antipolis, France. GSI 2013
- 2 Paradigms of Deformation-based Morphometry. Cross-sectional (Sub A at t1, Sub B at t2): different topologies, large deformations, biological interpretation is not obvious. Longitudinal (within-subject): subtle changes, biologically meaningful.
- 3 Combining longitudinal and cross-sectional: Sub A (t1, t2), Sub B (t1, t2), Template — how to relate them?
- 4 Combining longitudinal and cross-sectional — standard TBM approach: focuses on volume changes only, scalar analysis (statistical power), no modeling, Jacobian determinant analysis.
- 5 Combining longitudinal and cross-sectional — longitudinal trajectories: vector transport is not uniquely defined, missing theoretical insights.
- 6 Diffeomorphic registration. Stationary Velocity Field setting [Arsigny 2006]: v(x) stationary velocity field; Lie group Exp(v) is a geodesic wrt the Cartan connections (non-metric); geodesic defined by the SVF. LDDMM setting [Trouvé, 1998]: v(x,t) time-varying velocity field; Riemannian exp_id(v) is a metric geodesic wrt the Levi-Civita connection; geodesic defined by the initial momentum. Transporting trajectories: parallel transport of initial tangent vectors.
- 7 The Schild's Ladder: from relativity to image processing [Schild, 1970]. [figure: points P0, P'0, P1, P2, P'1, vectors A, A', curve C]
- 8 Schild's Ladder — intuitive application to images: baseline/follow-up along time, inter-subject registration [Lorenzi et al, IPMI 2011]. [figure: P0, P'0, T0, A, T'0]
- 9/10 Time series t0, t1, t2, t3: • evaluation of multiple geodesics for each time-point • parallel transport is not consistently computed among time-points.
- 11 The Pole Ladder: optimized Schild's ladder. [figure: P0, P'0, T0, A, −A', A', geodesic C]
- 12 Pole Ladder — equivalence to Schild's ladder: symmetric connection ⇒ B is the parallel transport of A; locally linear construction ⇒ the pole ladder is the Schild's ladder.
- 13/14 Time series t0, t1, t2, t3: • minimize the number of geodesics required • parallel transport consistently computed among time-points.
- 15 Pole Ladder — application to the SVF setting [Lorenzi et al, IPMI 2011]: B ≈ A + [v, A] + ½ [v, [v, A]], Baker-Campbell-Hausdorff formula (BCH) (Bossa 2007).
- 16 Pole Ladder — iterative computation [Lorenzi et al, IPMI 2011]: B ≈ A + [v, A] + ½ [v, [v, A]], applied iteratively with small steps v/n.
- 17 Synthetic example: baseline, Time 1, ..., Time 4; ventricles expansion from the real time series.
- 18 Synthetic example — comparison: • Schild's ladder • vector reorientation • conjugate action • scalar transport.
- 19 Synthetic example — transport consistency. [diagram: deformation; vector transport vs. scalar transport; vector measure vs. scalar summary (log-Jacobian determinant, ...)]
- 20/21 Synthetic example — quantitative analysis: • the pole ladder compares well with respect to scalar transport • high variability led by Schild's ladder.
- 22 Application on Alzheimer's disease: • group-wise statistics • extrapolation.
Group-wise analysis of longitudinal trajectories. GSI 2013
- 23 Longitudinal changes in Alzheimer's disease (141 subjects - ADNI data): contraction/expansion maps, Student's t statistic.
- 24 Longitudinal changes in Alzheimer's disease (141 subjects - ADNI data), comparison with standard TBM: Student's t statistic, pole ladder vs. scalar transport • consistent results • equivalent statistical power.
- 25 Conclusions • general framework for the parallel transport of deformations (does not necessarily require the choice of a metric) • minimal number of computations for the transport of time series of deformations • efficient solution with the SVF setting • consistent statistical results • multivariate group-wise analysis of longitudinal changes. Perspectives • further investigation of numerical issues (step-size) • comparison with other numerical methods for parallel transport in diffeomorphic registration (Younes, 2007).
- 26 Thank you. GSI 2013
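The BCH-based SVF transport described above can be sketched on discrete velocity fields. The following numpy snippet is a minimal illustration, assuming 2D fields sampled on a regular grid (it is not the authors' implementation): the Lie bracket of vector fields, [v, w](x) = Dw(x) v(x) − Dv(x) w(x), is approximated by finite differences and plugged into B ≈ A + [v, A] + ½ [v, [v, A]].

```python
import numpy as np

def lie_bracket(v, w, spacing=1.0):
    """Lie bracket of two 2D stationary velocity fields of shape (H, W, 2).
    Convention: channel 0 = x-component, channel 1 = y-component; array axes are (y, x)."""
    def jacobian(f):
        J = np.empty(f.shape[:2] + (2, 2))
        for i in range(2):                       # component f_i
            gy, gx = np.gradient(f[..., i], spacing)
            J[..., i, 0] = gx                    # d f_i / d x
            J[..., i, 1] = gy                    # d f_i / d y
        return J
    Jv, Jw = jacobian(v), jacobian(w)
    return (np.einsum('...ij,...j->...i', Jw, v)
            - np.einsum('...ij,...j->...i', Jv, w))

def bch_transport(v, a):
    """Second-order BCH approximation of the transport of the SVF `a` along exp(v):
    a + [v, a] + 1/2 [v, [v, a]]."""
    b1 = lie_bracket(v, a)
    b2 = lie_bracket(v, b1)
    return a + b1 + 0.5 * b2

# usage on toy random fields
rng = np.random.default_rng(0)
v = 0.1 * rng.standard_normal((64, 64, 2))
a = 0.1 * rng.standard_normal((64, 64, 2))
a_transported = bch_transport(v, a)
```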

Keynote speech 1 (Yann Ollivier)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Objective Improvement in Information-Geometric Optimization Youhei Akimoto Project TAO – INRIA Saclay LRI, Bât. 490, Univ. Paris-Sud 91405 Orsay, France Youhei.Akimoto@lri.fr Yann Ollivier CNRS & Univ. Paris-Sud LRI, Bât. 490 91405 Orsay, France yann.ollivier@lri.fr ABSTRACT Information-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. IGO recovers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-
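The Bernoulli example from the abstract can be made concrete. For an independent-Bernoulli model the Fisher metric is diagonal with entries 1/(θ_j(1−θ_j)), so the natural gradient of the log-likelihood with respect to θ_j is simply x_j − θ_j; combined with rank-based weights this yields a PBIL-like update. The sketch below is illustrative only (hypothetical names, a generic bit-string objective); it is not the paper's code.

```python
import numpy as np

def igo_bernoulli(objective, dim, iters=200, pop=20, lr=0.1, seed=0):
    """IGO on {0,1}^dim with an independent-Bernoulli model: natural-gradient ascent of a
    rank-based (quantile) transform of the objective, recovering a PBIL-like update."""
    rng = np.random.default_rng(seed)
    theta = np.full(dim, 0.5)                            # Bernoulli parameters
    # rank-based weights: the best half gets positive weight, the rest zero (one common choice)
    w = np.where(np.arange(pop) < pop // 2, 2.0 / pop, 0.0)
    for _ in range(iters):
        X = (rng.random((pop, dim)) < theta).astype(float)    # sample the population
        order = np.argsort([-objective(x) for x in X])        # best individuals first
        # natural gradient for the Bernoulli family: sum_i w_i (x_i - theta)
        grad = (w[:, None] * (X[order] - theta)).sum(axis=0)
        theta = np.clip(theta + lr * grad, 0.01, 0.99)
    return theta

# usage: onemax (number of ones); theta should converge towards all ones
best_theta = igo_bernoulli(lambda x: x.sum(), dim=30)
```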

ORAL SESSION 2 Deformations in Shape Spaces (Alain Trouvé)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo

Geodesic image regression with a sparse parameterization of diffeomorphisms James Fishbaugh1 Marcel Prastawa1 Guido Gerig1 Stanley Durrleman2 1 Scientific Computing and Imaging Institute, University of Utah 2 INRIA/ICM, Pitié Salpêtrière Hospital, Paris, France Image Regression 6, 8, 10, 12, 14, 16, 18 months Why image regression? • Extrapolation for change prediction • Align images and cognitive scores acquired at different times • Align subjects with scans acquired at different times • Improved understanding of normal and pathological brain changes 1 of 18 Previous Work Kernel regression Geodesic regression Geodesic regression Davis et al. ICCV 2007 Niethammer et al. MICCAI 2011 Singh et al. ISBI 2013 Require storing many model parameters ~ number of voxels Image evolution described by considerably fewer parameters Concentrated in areas undergoing most dynamic changes 2 of 18 Motivation for Sparsity Fewer parameters Location of parameters • Potential for greater statistical power – less noise in description • Concentrated in areas undergoing the most dynamic changes • Number of parameters should reflect complexity of anatomical changes, not the sampling of the images • Localize potential biomarkers 3 of 18 Compact and generative statistical model of growth Geodesic Image Regression Geodesic path on a sub-group of diffeomorphisms (Dupuis 98, Trouvé 95, 98) 4 of 18 Geodesic Image Regression S0 = {c0, α0} I0 O1 O3 O2 5 of 18 Geodesic shooting to evolve control points S0 = {c0, α0} Methods: Shooting 6 of 18 Trajectory of control points defines flow of diffeomorphisms Physical pixel coordinates y follow the trajectory which evolves in time as Methods: Flow (5, 5, 60.25) Deformed images constructed by interpolation 7 of 18 Summary Of Method 1) Shoot control points 2) Trajectory defines flow 3) Flow pixel locations 4) Interpolate in baseline image 8 of 18 Subject to Shoot Flow Regression Criterion 9 of 18 Method Overview Gradient with respect to control points and initial momenta Gradient with respect to initial image Gradient of Regression Criterion 1) Flow voxel Yk(t) to time t and compute residual 2) Grey value in residual is distributed to neighboring voxels with weights from trilinear interpolation 3) Grey values accumulated for every observed image 10 of 18 Method Overview Sparsity on Initial Momenta Fast Iterative Shrinkage-Thresholding Algorithm (Beck 09) • Use previous gradient of criterion without L1 penalty • Threshold momentum vectors with small magnitude Select a small subset of momenta which best describe the dynamics of image evolution 11 of 18 • Used in context of atlas building (Durrleman 12, 13) Synthetic Evolution (2D) Generated by shooting baseline with 79,804 predefined momenta Time 1 Time 2 Time 3 Time 4 Time 5 Impact of sparsity parameter on model estimation 12 of 18 Synthetic Evolution (2D) From 79,804 to 67 momenta 13 of 18 Pediatric Brain Development (2D) Models estimated backwards in time with varying sparsity T1W image of same child over time 14 of 18 Method Overview Pediatric Brain Development (2D) From 45,435 to 47 momenta 15 of 18 Brain Atrophy in Alzheimer's Disease (3D) T1W image of same patient over time 70.75 years 71.38 years 71.78 years 72.79 years Six years predicted brain atrophy with 35,937 momenta 98% decrease in number of parameters 16 of 18 Conclusions Geodesic image regression framework: • Decouples deformation parameters from image representation • L1 penalty which selects optimal subset of initial momenta Number of
parameters reduced with only minimal cost in terms of matching target data Future work:  Kernels at multiple scales (Sommer 11)  Other image matching metrics, LCC (Avants 07, Lorenzi 13)  Combine with a framework for longitudinal analysis 17 of 18 This work was supported by: NIH (NINDS) 1 U01 NS082086-01 (4D shape HD) NIH (NICHD) RO1 HD055741 (ACE, project IBIS) NIH (NIBIB) 2U54 EB005149 (NA-MIC) Acknowledgments Thank you 18 of 18
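The sparsity step described above (thresholding momentum vectors of small magnitude inside a FISTA iteration) can be sketched as a generic vectorial soft-thresholding operator. This is shown as an assumed illustration of that step, not the authors' code.

```python
import numpy as np

def threshold_momenta(alpha, tau):
    """Vectorial soft-thresholding of initial momenta.

    alpha : (N, d) array, one momentum vector per control point.
    tau   : threshold (sparsity weight times step size).
    Vectors with norm <= tau are set to zero; the others are shrunk, which is the
    proximal operator of the group-L1 penalty sum_i ||alpha_i||."""
    norms = np.linalg.norm(alpha, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return alpha * scale

# usage: most momenta vanish, only the largest survive
rng = np.random.default_rng(0)
alpha = rng.standard_normal((1000, 3)) * rng.random((1000, 1))
sparse_alpha = threshold_momenta(alpha, tau=0.8)
print((np.linalg.norm(sparse_alpha, axis=1) > 0).sum(), "momenta retained")
```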

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo

On the geometry and the deformation of shapes represented by piecewise continuous Bézier curves with application to shape optimization. Olivier Ruatta, XLIM, UMR 7252, Université de Limoges CNRS. Geometric Science of Information 2013. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 (1 / 32)

Motivation: shape optimisation problems. Let Ω ⊂ P(R²) be such that for each ω ∈ Ω the frontier ∂ω of this "region" is a regular curve (i.e. piecewise continuous here). Let F : Ω → R⁺ be a positive real valued function. Problem: find ω₀ ∈ Ω such that F(ω₀) ≤ F(ω) for all ω ∈ Ω. Very often, the computation of F(ω) requires solving a system of PDEs. Two problems: the cost of the computation of F(ω) and its differentiability (and the computation of derivatives as well); the compatibility of the space of shapes with the discretization of R² for the system of PDEs. (2-3 / 32)

Shape optimization methods. Geometric gradient techniques (level sets, ...): compute how to deform the frontier of the shape and try to deform it in a coherent way [Hadamard, Pierre, Henrot, Allaire, Jouve, ...]. Relaxation methods (SIMP, ...): compute a density that represents the support of the shape [Bendsøe, Sigmund, ...]. Topological gradient: generally for PDEs, remove or add finite elements containing the shape [Masmoudi, Sokolowski, ...]. Parametric optimization: reduce the shapes to a small space controlled by few parameters and view the problem as a parametric optimization problem [Elyssa (Dido in Latin, 4th century BC), ..., Goldberg, ...]. Our approach: try to mix the best aspects of the first (level sets) and the last (parametric) approaches. (4 / 32)

Bézier curves. Let P₀, ..., P_d ∈ R²; we define a family of curves parametrized over [0, 1]: B([P₀], t) := P₀ (degree 0 Bézier curve); B([P₀, P₁], t) := (1 − t)B([P₀], t) + tB([P₁], t) = (1 − t)P₀ + tP₁ (degree 1 Bézier curve); ...; B([P₀, ..., P_d], t) := (1 − t)B([P₀, ..., P_{d−1}], t) + tB([P₁, ..., P_d], t) (degree d Bézier curve). Those are polynomial curves. The points P₀, ..., P_d ∈ R² are called the control polygon of the curve defined by B([P₀, ..., P_d], t). (5-6 / 32)

Bernstein polynomials. Definition: let d be a positive integer; for all i ∈ {0, ..., d} we define b_{i,d}(t) = C(d, i) (1 − t)^{d−i} tⁱ, where C(d, i) is the binomial coefficient. The polynomials b_{0,d}, ..., b_{d,d} are called the Bernstein polynomials of degree d. Proposition: the Bernstein polynomials of degree d, b_{0,d}, ..., b_{d,d}, form a basis of the vector space of polynomials of degree less than or equal to d. (7 / 32)

Bernstein polynomials and Bézier curves. Theorem: B([P₀, ..., P_d], t) = ∑_{i=0}^{d} P_i b_{i,d}(t). Corollary: every parametrized curve with polynomial parametrization of degree at most d can be represented as a Bézier curve of degree at most d. Definition: we define B_d as the space of all Bézier curves of degree at most d. (8 / 32)

Structure of B_d. We denote E = R² and consider the map Ψ_d : E^{d+1} → B_d defined by Ψ_d(P₀, ..., P_d) = B([P₀, ..., P_d], t). Proposition: Ψ_d is a linear isomorphism between E^{d+1} and B_d.
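The recursive definition above is exactly the de Casteljau algorithm, and the Bernstein representation B([P₀, ..., P_d], t) = ∑ P_i b_{i,d}(t) gives an equivalent direct evaluation. Below is a small numpy sketch of both (illustrative names, assuming 2D control points); the two evaluations agree.

```python
import numpy as np
from math import comb

def de_casteljau(P, t):
    """Evaluate the Bézier curve B([P_0,...,P_d], t) by repeated linear interpolation."""
    Q = np.asarray(P, dtype=float)
    while len(Q) > 1:
        Q = (1.0 - t) * Q[:-1] + t * Q[1:]
    return Q[0]

def bernstein_eval(P, t):
    """Same curve evaluated through the Bernstein basis b_{i,d}(t) = C(d,i)(1-t)^(d-i) t^i."""
    P = np.asarray(P, dtype=float)
    d = len(P) - 1
    b = np.array([comb(d, i) * (1 - t) ** (d - i) * t ** i for i in range(d + 1)])
    return b @ P

# usage: a cubic with four control points; both evaluations give the same point
P = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(de_casteljau(P, 0.4), bernstein_eval(P, 0.4))
```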
Let t = (t₀ = 0 < t₁ < ··· < t_d = 1) be a subdivision of [0, 1]. We define the sampling map S_{t,d} : Γ(t) ∈ B_d → (Γ(t₀), ..., Γ(t_d)) ∈ E^{d+1}. Proposition: S_{t,d} is a linear isomorphism between B_d and E^{d+1}. (9 / 32)

Evaluation-Interpolation. Let t = (t₀ = 0 ≤ t₁ ≤ ··· ≤ t_d = 1) be a subdivision of [0, 1] and let P₀, ..., P_d ∈ E. B_{t,d} is the (d+1)×(d+1) matrix whose row i is (b_{0,d}(t_i), ..., b_{d,d}(t_i)). Proposition (Evaluation): B_{t,d} (P₀ᵀ, ..., P_dᵀ)ᵀ = (B([P₀, ..., P_d], t₀)ᵀ, ..., B([P₀, ..., P_d], t_d)ᵀ)ᵀ. (10 / 32)

Multi-evaluation. For t = (0, 1/3, 2/3, 1): (M₀ᵀ, M₁ᵀ, M₂ᵀ, M₃ᵀ)ᵀ = B_{t,3} (P₀ᵀ, P₁ᵀ, P₂ᵀ, P₃ᵀ)ᵀ. [figure: control points P₀, ..., P₃ and sampled points M₀, ..., M₃] (11 / 32)

Evaluation-Interpolation. Let t = (t₀ = 0 ≤ t₁ ≤ ··· ≤ t_d = 1) be a subdivision of [0, 1] and let M₀, ..., M_d ∈ E. Problem: find P₀, ..., P_d ∈ E such that B([P₀, ..., P_d], tᵢ) = Mᵢ for all i ∈ {0, ..., d}. Proposition (Interpolation): the points defined by (P₀ᵀ, ..., P_dᵀ)ᵀ = B_{t,d}⁻¹ (M₀ᵀ, ..., M_dᵀ)ᵀ solve the problem. (12 / 32)

Summary 1. We have 3 spaces: P_d ≅ E^{d+1}, the vector space of control polygons; S_{t,d} ≅ E^{d+1}, the vector space of samplings of Bézier curves associated to a subdivision t; B_d, the vector space of degree d Bézier parametrizations. Proposition: the diagram of isomorphisms Ψ_d : P_d → B_d, B_{t,d} : P_d → S_{t,d}, E_{t,d} : B_d → S_{t,d} is commutative (E_{t,d} ∘ Ψ_d = B_{t,d}). (13 / 32)

Deformation problem. Let Γ(t) := B([P₀, ..., P_d], t) be a degree d Bézier curve such that E_{t,d}(Γ) = M = (M₀ᵀ, ..., M_dᵀ)ᵀ and let δM = (δM₀ᵀ, ..., δM_dᵀ)ᵀ ∈ T_M S_{t,d}. Problem (Deformation problem): denoting P = (P₀ᵀ, ..., P_dᵀ)ᵀ, find δP ∈ T_P P_d such that Λ(t) := B([P₀ + δP₀, ..., P_d + δP_d], t) satisfies Λ(tᵢ) = Mᵢ + δMᵢ for all i ∈ {0, ..., d}. [figure: control points Pᵢ, sample points Mᵢ and prescribed displacements δMᵢ] (14-15 / 32)

Deformation curve. Proposition (Deformation polygon): taking δP = B_{t,d}⁻¹ δM, the curve Ψ_d(P + δP) is a solution of the "Deformation problem". δP ∈ T_P P_d is called the deformation polygon and Ψ_d(δP) ∈ T_{B([P₀,...,P_d],t)} B_d is called the deformation curve. (16 / 32)

Piecewise Bézier curves. Let P₁ := (P_{1,0}, ..., P_{1,d}) ∈ P_d, P₂ := (P_{2,0}, ..., P_{2,d}) ∈ P_d, ..., P_N := (P_{N,0}, ..., P_{N,d}) ∈ P_d be such that P_{1,d} = P_{2,0}, ..., P_{N−1,d} = P_{N,0}. We define the parametrization B((P₁, ..., P_N), t) := B([P_{i,0}, ..., P_{i,d}], N·t − (i − 1)) for t ∈ [(i−1)/N, i/N[ (i = 1, ..., N, with t = 1 included in the last piece). We denote Ψ(P₁, ..., P_N) := B((P₁, ..., P_N), t). This is a continuous curve joining P_{1,0} to P_{N,d}, and the curve is a loop if P_{1,0} = P_{N,d}. [figure: two cubic pieces sharing the control point P_{1,3} = P_{2,0}] (17-18 / 32)
Vector spaces of piecewise Bézier curves. We define: the vector space P_{N,d} = {(P₁, ..., P_N) | P_{1,d} = P_{2,0}, ..., P_{N−1,d} = P_{N,0}} ⊂ P_d^N; the vector space L_{N,d} = {(P₁, ..., P_N) | P_{1,d} = P_{2,0}, ..., P_{N−1,d} = P_{N,0}, P_{1,0} = P_{N,d}} ⊂ P_d^N; the vector space of PBC (piecewise Bézier curves) B_{N,d} = {B((P₁, ..., P_N), t) | (P₁, ..., P_N) ∈ P_{N,d}}; the vector space of PBL (piecewise Bézier loops) B^c_{N,d} = {B((P₁, ..., P_N), t) | (P₁, ..., P_N) ∈ L_{N,d}}. (19 / 32)

Sampling piecewise Bézier curves. Let t = (t_{1,0} := 0, t_{1,1} := 1/(N·d), ..., t_{1,d} := 1/N, t_{2,0} := 1/N, ..., t_{N−1,d} := (N−1)/N, t_{N,0} := (N−1)/N, ..., t_{N,d} := 1) be a multi-regular subdivision and denote tᵢ := (t_{i,0}, ..., t_{i,d}). We define the linear map E_{t,N,d} : λ(t) ∈ B_{N,d} → (λ(t_{1,0}), ..., λ(t_{N,d})) ∈ S_{t,N,d} ⊂ S_d^N. The same way we define E^c_{t,N,d} : λ(t) ∈ B^c_{N,d} → (λ(t_{1,0}), ..., λ(t_{N,d})) ∈ S^c_{t,N,d} ⊂ S_d^N. Finally, we define B_{t,N,d} : (P_{1,0}, ..., P_{N,d}) ∈ P_{N,d} → (B_{t₁,d} × ··· × B_{t_N,d}) (P_{1,0}ᵀ, ..., P_{N,d}ᵀ)ᵀ ∈ S_{t,N,d}. (20 / 32)

Summary 2. Proposition: the diagram of isomorphisms Ψ_{N,d} : P_{N,d} → B_{N,d}, B_{t,N,d} : P_{N,d} → S_{t,N,d}, E_{t,N,d} : B_{N,d} → S_{t,N,d} is commutative. Remark: B_{t,N,d}⁻¹ := B_{t₁,d}⁻¹ × ··· × B_{t_N,d}⁻¹. (21 / 32)

Deformation problem for PBC. Let Γ(t) ∈ B_{N,d} be such that E_{t,N,d}(Γ) = M := (M_{1,0}ᵀ, ..., M_{N,d}ᵀ)ᵀ ∈ S_{t,N,d} and let δM = (δM_{1,0}ᵀ, ..., δM_{N,d}ᵀ)ᵀ ∈ T_M S_{t,N,d}. Problem (Deformation problem for PBC): denoting P = (P_{1,0}ᵀ, ..., P_{N,d}ᵀ)ᵀ, find δP ∈ T_P P_{N,d} such that Λ(t) := B((P₁ + δP₁, ..., P_N + δP_N), t) satisfies Λ(t_{i,j}) = M_{i,j} + δM_{i,j} for all i ∈ {1, ..., N} and j ∈ {0, ..., d}. (22 / 32)

Deformation piecewise Bézier curve. Proposition (Deformation polygons): taking δP = B_{t,N,d}⁻¹ δM, the curve Ψ_{N,d}(P + δP) is a solution of the "Deformation problem for PBC". δP ∈ T_P P_{N,d} is called the deformation polygons and Ψ_{N,d}(δP) ∈ T_{B((P₁,...,P_N),t)} B_{N,d} is called the deformation piecewise Bézier curve. (23 / 32)

Back to shape optimization. Let ω ⊂ E be such that ∂ω is a piecewise continuous curve and F : P(E) → R⁺ the objective functional. The geometric gradient ∇F(ω) : M ∈ ∂ω → ∇F(ω)(M) ∈ T_M E gives a perturbation of each point of the frontier that decreases the objective functional. (24 / 32)

Basic idea of the approach. The space of admissible shapes is Ω_{N,d} := {ω ∈ P(E) | ∂ω ∈ B^c_{N,d}}; let ω ∈ Ω_{N,d} be such that ∂ω = B((P₁, ..., P_N), t). Let M = (M_{1,0}ᵀ, ..., M_{N,d}ᵀ)ᵀ = E_{t,N,d}(B((P₁, ..., P_N), t)). To obtain a better shape we compute δM = (∇F(ω)(M_{1,0})ᵀ, ..., ∇F(ω)(M_{N,d})ᵀ)ᵀ. Then we compute δP = B_{t,N,d}⁻¹ δM and let λ(t) = B((P + δP), t); we have: Proposition: λ(t_{i,j}) = M_{i,j} + ∇F(ω)(M_{i,j}) for all i ∈ {1, ..., N} and j ∈ {0, ..., d}. (25 / 32)

Basic theoretical contribution. Let Ω = {ω ∈ P(E) | ∂ω ∈ C⁰_p([0, 1], E)} and let F : Ω → R⁺ be a smooth function such that ∇F(ω) : ∂ω → TE is everywhere well defined. For every γ ∈ B_{N,d} we associate "the" shape whose boundary is γ. Proposition: for every pair of integers N and d and every compatible subdivision t, we associate to F a vector field V_F : B_{N,d} → T B_{N,d} by B((P), t) → Ψ_{N,d}( B_{t,N,d}⁻¹ ( (∇F(B((P), t))(B((P), t_{1,0})))ᵀ, ..., (∇F(B((P), t))(B((P), t_{N,d})))ᵀ )ᵀ ).
(26 / 32)

Optimal shapes and fixed points. Proposition: let ω ∈ Ω be such that ∇F(ω) ≡ 0; then, for all N and d and every compatible subdivision t, there is γ ∈ B_{N,d} satisfying V_F(γ) = 0. In other words, every optimum of F induces at least one fixed point of V_F over B_{N,d}. (27 / 32)

Meta-algorithm for shape optimization. Input: an initial shape ω s.t. ∂ω = B((P), t) ∈ B_{N,d}. Output: the control polygon P of the frontier of a local minimum of F(ω).
λ ← B((P), t)
while criterion not satisfied do
  δP ← B_{t,N,d}⁻¹(∇F(ω)(E_{t,N,d}(λ)))
  P ← P + δP
  λ ← Ψ_{N,d}(P)
end while
(28 / 32)

Snake-like algorithm for omnidirectional image segmentation. Image segmentation can be interpreted as a shape optimization problem using the "snake" approach. The geometric gradient is built from: a "balloon" force making the contour expand; the gradient of the intensity of the image (vanishing at the contours). We use a classical approach: a Canny filter to detect contours. This problem is used to detect free space for an autonomous robot with a catadioptric sensor. (29-30 / 32)

Snake-like algorithm for classical image segmentation. Joint work with Ouiddad Labbani-I. and Pauline Merveilleux-O. of Université de Picardie - Jules Verne. (31 / 32)

Snake-like algorithm for omnidirectional image segmentation. Joint work with Ouiddad Labbani-I. and Pauline Merveilleux-O. of Université de Picardie - Jules Verne. Ruatta (XLIM) Free Forms for Shapes Optimisation GSI 2013 (32 / 32)
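The evaluation/interpolation duality and the deformation-polygon step used by the meta-algorithm above can be sketched numerically: build the collocation matrix B_{t,d} = (b_{j,d}(t_i)), evaluate sample points M = B_{t,d} P, and recover the control-polygon update δP = B_{t,d}⁻¹ δM from prescribed displacements of the samples. This is a minimal single-patch illustration with assumed names, not the author's code.

```python
import numpy as np
from math import comb

def bernstein_matrix(ts, d):
    """Collocation matrix B_{t,d} with entries b_{j,d}(t_i)."""
    return np.array([[comb(d, j) * (1 - t) ** (d - j) * t ** j for j in range(d + 1)]
                     for t in ts])

d = 3
ts = np.linspace(0.0, 1.0, d + 1)                                 # subdivision t_0 = 0 < ... < t_d = 1
P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])    # control polygon

B = bernstein_matrix(ts, d)
M = B @ P                                      # evaluation: sample points on the curve

# prescribe displacements of the sample points (e.g. a geometric gradient at the samples)
dM = np.array([[0.0, 0.1], [0.0, 0.2], [0.0, 0.2], [0.0, 0.1]])
dP = np.linalg.solve(B, dM)                    # deformation polygon: dP = B^{-1} dM

P_new = P + dP                                 # updated control polygon
assert np.allclose(bernstein_matrix(ts, d) @ P_new, M + dM)   # new curve interpolates M + dM
```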

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

Random Spatial Structure of Geometric Deformations and Bayesian Nonparametrics Xavier Pennec Asclepios Research Project INRIA Sophia Antipolis Christof Seiler Department of Statistics Stanford University Susan Holmes Department of Statistics Stanford University GSI2013 - Geometric Science of Information, Paris, http://www.gsi2013.org/ 2 Clinical question Group 1: Back pain patients Group 2: Abdominal pain patients 3 Clinical question What is the right basis to compare anatomical structures? Geometric differences between groups? Learn shape and number of parts from data. 4 Example: Motion-based segmentation of 3D objects [Soumya Ghosh, Erik B. Sudderth, Matthew Loper, and Michael J. Black, From Deformations to Parts: Motion-based Segmentation of 3D Objects, NIPS 2012] Reference pose for female body Five example poses out of 56 Manual segmentati on ddCR P 5 Template Random deformations Motion-based partitioning of geometric deformations Patient 1 Patient 2 … deform Deformations parameterized as stationary velocity fields and estimated using: [Vercauteren et al., NeuroImage 2009] [Ashburner et al., NeuroImage 2007][Hernandez et al., ICCV 2007] [Lorenzi et al., NeuroImage 2013] 6 Bayesian model of the anatomy [V. Arsigny, O. Commowick, N. Ayache, X. Pennec, A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration, J Math Imaging Vis 2009] Observed velocity vectors Subset of voxel coordinates: Distribution on partitions Velocity vector noise: Multivariate normal with mean 0 and covariance Deformation parameter: Multivariate normal with mean 0 and diagonal covariance 7 Prior on deformation parameters Hyperparamet ers: Prior: Concentration matrix Skew symmetric part / rotation Symmetric part / scaling (and shearing) Translati on Connection: Affine transformations (motion) and velocity field? Let t = 1, velocity vectors are consistent with the transformation A = exp(M). linear ODE with analytic solution: Initial condition at t=0 8 Simulated samples from prior aw M i.i.d. from multivariate normal. perparameters: Mean = 0, variance rotation, scaling = 0.01, and translation = 1. Decompose into rotation angle (Rodrigues' formula, SVD) and volume change (determinant): Histogramof rotation angles Degree Frequency 0 5 10 15 20 050010001500200025003000 Histogramof volumechanges Scale factor Frequency 0.5 1.0 1.5 2.0 01000200030004000 9 x11 x12 x13 x21 x22 x23 x11 x12 x13 x21 x22 x23 Sample link for x11 x11 x12 x13 x21 x22 x23 Sample link for all voxels x11 x12 x13 x21 x22 x23 Partitions are given by the link structure Distances between voxels Decay function: Self linking probability Prior on spatial partitions: Distance dependent Chinese Restaurant Process [D.M. Blei, P.I. 
Frazier, Distance Dependent Chinese Restaurant Processes, Journal of Machine Learning Research 12 (2011) 2383-2410] 10 Sample partitions with the Gibbs sampler x11 x12 x13 x21 x22 x23 Step 1: Delete link x11 x12 x13 x21 x22 x23 x11 x12 x13 x21 x22 x23 Step 2: Sample new link Splitting partitions Rejoining partitions x11 x12 x13 x21 x22 x23 Step 3: New partitions Same as before 11 Sample partitions with the Gibbs sampler x11 x12 x13 x21 x22 x23 Step 1: Delete link x11 x12 x13 x21 x22 x23 x11 x12 x13 x21 x22 x23 Step 2: Sample new link Joining partitions Linking to itself x11 x12 x13 x21 x22 x23 Step 3: New partitions Same as before 12 Link to data / Model selection Fixed velocity noise hyperparameter Voxel coordinates Observed: Prior on deformation parameters How well does the data (velocity field) fit a given model (partition) for all possible parameters? Answer: Marginal likelihood 13 Inference with the Gibbs sampler x11 x12 x13 x21 x22 x23 Separate partitions Joined partitions x11 x12 x13 x21 x22 x23 Marginal likelihood for all t velocity fields Results – 2D Velocity Fields 14 Target: 15 Results – 3D Velocity Fields Front view Lateral view Scaling Rotation Translation Rotation 16 Results – 3D Velocity Fields Front view Initializ e Step 1 Step 2 Rotation and translation Scaling and translation 17 Results – 3D Velocity Fields Templat e Patient 1 Patient 2 Partitio n 18 Results – 3D Velocity Fields – Spine Template Step 10 Step 20 Step 30 Abdomin al pain Back pain 19 Conclusions Nonparametric way of estimating the number and structure of partitions. Incorporating uncertainty in a Bayesian fashion (avoiding overfitting). Prior with medically intuitive interpretation. Histogramof rotation angles Degree Frequency 0 5 10 15 20 050010001500200025003000 Histogramof volume changes Scale factor Frequency 0.5 1.0 1.5 2.0 01000200030004000 x11 x12 x13 x21 x22 x23 20 Next step and open questions With more data are partitions of the two groups drawn from the same distribution? Group 1 Group 2 Compute posterior of deformation parameters Histogramof rotation angles Degree Frequency 0 5 10 15 20 050010001500200025003000 Histogramof volume changes Scale factor Frequency 0.5 1.0 1.5 2.0 01000200030004000 21 Thanks for your attention! 22 23 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines [Vercauteren et al., NeuroImage 2009] [Ashburner et al., NeuroImage 2007][Hernandez et al., ICCV 2007] [Lorenzi et al., NeuroImage 2013] 24 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines 25 Data: Geometric deformations as stationary velocity fields Velocity vectors Streamlines 26 Dirichlet Process is the de Finetti measure for the Chinese Restaurant Process Two different ways to sample from exchangeable distributions. "parallel" construction First sampling some latent object that then renders all the sequence elements conditionally independent. • [Gosh, van der Vaart, Fundamentals of Nonparametric Bayesian Inference (Chapters 1-5 of book draft)] • [J. 
Pitman, Combinatorial Stochastic Processes, 2002] • [Question on http://metaoptimize.com/, http://tinyurl.com/n8wtjgy] 27 Prior on affine transformations Affine group: Multiplication of elements of the affine group: First order Baker-Campbell-Hausdorff terms Jordan/Schur decomposition Lie algebraic representation of affine group: 28 Regional Bayesian Linear Regression Prior on deformation parameters: Likelihood given velocity field and fixed noise parameter: Velocity vector noise: 29 Probability distribution over partitions of exchangeable “things”, order doesn’t matter: Random partitions with the Chinese Restaurant Process • [Ghosh, van der Vaart, Fundamentals of Nonparametric Bayesian Inference (Chapters 1-5 of book draft)] • [J. Pitman, Combinatorial Stochastic Processes, 2002] • [QA on http://metaoptimize.com/, http://tinyurl.com/n8wtjgy] Total number of partitions z1z2 z3 z4 z5 … Partition k = 1 Partition k = 2 nk = 3 nk = 2 =2 =2 =1 =1 =1 New partition z2 z3 z4 z1 z5 … 30 Prior on partitions with dependencies Time series t1 …t2 t2 Spatial data x11 …x12 x13 x21 …x22 x23 … … … Graphs g1 g2 g3 g4 … … [S. N. MacEachern, Dependent Dirichlet processes] 2000 [N.J. Foti, S. Williamson, A survey of non-exchangeable priors for Bayesian nonparametric models] 2012 [D.M. Cifarelli, E. Regazzini, Nonparametric statistical problems under partial exchangeability] (in italian) 1978 31 Some implementation details for the spatial ddCPR [Richard Socher, Christopher D. Manning, A Gibbs Sampler for Spatial Clustering with the Distance-dependent Chinese Restaurant Process] In original paper by D. Blei focuses on time series data. In spatial data special care need to be taken for cyclic links. x12 x22 Solution: Recursive function. 32 Not marginal invariant: Links that are unobserved influence the distribution. x11 x12 x13 x21 x22 x23 If x12 is unobserved x11 x13 x21 x22 x23 Prior on spatial partitions: Distance dependent Chinese Restaurant Process then x11 and x12 are not in the same partition. 33 Applications of ddCRP: Image segmentation ddCR with different α rddC RP Thresholded GP [Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, and David M. Blei, Spatial distance dependent Chinese restaurant processes for image segmentation, NIPS 2011] 34 Prior on affine transformations Representation in homogeneous coordinates: Decomposition into translation, rotation and scaling: Transform a point: 35 Overview Results – 2D Velocity Fields 36 Target:
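To make the spatial prior concrete, the following is a small sketch (ours, not the authors' code) of drawing one partition from a distance-dependent Chinese Restaurant Process: each voxel samples an outgoing link with probability proportional to a decay function of pairwise distance, plus a self-link mass α, and the clusters are the connected components of the resulting link graph. The exponential decay, the length scale and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def sample_ddcrp_partition(coords, alpha=1.0, scale=1.0, rng=None):
    """Draw one partition from a distance-dependent CRP prior (sketch).

    coords : (n, 2) or (n, 3) voxel coordinates.
    alpha  : self-link mass; scale : length scale of the decay f(d) = exp(-d/scale).
    """
    rng = np.random.default_rng(rng)
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.exp(-dist / scale)        # decay function applied to pairwise distances
    np.fill_diagonal(w, alpha)       # self-linking probability proportional to alpha
    links = np.array([rng.choice(n, p=w[i] / w[i].sum()) for i in range(n)])
    graph = coo_matrix((np.ones(n), (np.arange(n), links)), shape=(n, n))
    n_clusters, labels = connected_components(graph, directed=False)
    return n_clusters, labels        # number of clusters and a label per voxel
```

A Gibbs sweep as in Steps 1–3 above then resamples one voxel's link at a time, scoring each candidate link by its decay weight times the marginal likelihood of the partitions it would merge.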

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo

TEMPLATE ESTIMATION FOR LARGE DATABASE: A DIFFEOMORPHIC ITERATIVE CENTROID METHOD USING CURRENTS CLAIRE CURY, JOAN A. GLAUNES AND OLIVIER COLLIOT GSI2013 - Geometric Science of Information Note : send me an email at claire.cury.pro @ gmail.com if you want the pptx version of this presentation. INTRODUCTION  Computational Anatomy (CA):  Analysis of anatomical structures variability  Characterizing differences between normal and pathological anatomies.  Link between function and structures Template-based analysis in CA:  Population data is encoded in the template coordinate system, then statistics are processed on this data. Large Deformation Diffeomorphic Metric Mapping (LDDMM) methods:  provides diffeomorphic maps: invertible smooth transformations that preserve topology.  Defines a metric distance that can be used to quantify the similarity between two shapes. GSI2013 - Geometric Science of Information 2 INTRODUCTION Template estimation methods in the LDDMM framework:  J. Glaunès and S. Joshi (MFCA 2006).  S. Durrleman et al. (MFCA 2008, MICCAI 2012).  J. Ma et al. (NeuroImage 2008). GSI2013 - Geometric Science of Information 3 S1 S2 S3 S4 S5 T S1 S2 S3 S4 S5 T S1 S2 S3 S4 S5 TJ0 INTRODUCTION  All these methods need a lot of computation time, which is a limitation for the study of large database.  example: a matching from one surface to another (with about 3000 vertices each): around 30 minutes  Template estimation (N≈100) until convergence: few days or some weeks.  To study large databases : need to go faster  We can increase the convergence speed by providing a better initialization to the template optimization process.  We propose an Iterative Centroid method (IC). GSI2013 - Geometric Science of Information 4 MATHEMATICAL SETUP: LDDMM FRAMEWORK  Large Deformation Diffeomorphic Metric Mapping:  to quantify the difference between shapes.  2 shapes can be connected by a continuum of intermediate anatomically plausible shapes (shape space idea).  Diffeomorphic maps act on the whole 3D space, so spatial organization is preserved. GSI2013 - Geometric Science of Information 5 MATHEMATICAL SETUP: LDDMM FRAMEWORK  In LDDMM framework deformation maps  : R3  R3 are generated by integration of time-varying vector fields vt(x) : vt belong to a RKHS V , the norm controls the regularity of the maps  One can define a right invariant distance on the diffeomorphisms group Geodesic shooting: The last diffeomorphism at t=1 is completely parameterized by the initial condition GSI2013 - Geometric Science of Information 6 The position of point x at time t The velocity of point x at time t The point i of the surface at time t Momentum vector of point i at time t MATHEMATICAL SETUP: CURRENTS GSI2013 - Geometric Science of Information 7  Framework of currents (Vaillant and Glaunès 2005) has been chosen to measure dissimilarities between anatomical structures. Interests  Point correspondence solved  Robust to different sampling and topologies  The set of surfaces gets embedded in a vector space : currents can be added, subtracted  If S is a surface, [S] is a current, i.e. a continuous linear map from a space of differential 2-forms to R : with  a differential 2-form of R3 MATHEMATICAL SETUP: CURRENTS  The space of current W* is the dual space of a RKHS of 2- forms W. Scalar product :  Optimal match, is the diffeomorphism minimizing J : GSI2013 - Geometric Science of Information 8 TEMPLATE ESTIMATION METHOD USED We used the method presented by J. Glaunès and S. 
Joshi (MFCA 2006): Estimates a template given a collection of unlabeled points sets or surfaces Let Si be N surfaces in R3. In the framework of LDDMM and currents the template estimation problem is posed as a minimum mean squared error estimation problem: The template is composed by all meshes of the population  Alternated optimization: we successively match each on the template [S] = , then we update the template, and we iterate this whole loop Standard initialization: i= Id , which is equivalent to GSI2013 - Geometric Science of Information 9 THE ITERATIVE CENTROID METHOD  Centroid computed iteratively via currents and LDDMM General idea : GSI2013 - Geometric Science of Information 10 THE ITERATIVE CENTROID METHOD WAY 1  We have N shapes Si :  Fast process. There is (N-1) matching of 1 to 1 surfaces. GSI2013 - Geometric Science of Information 11 Start with a first subject : B1=S1  We iterate the following process: •Bi is matched to Si+1  we obtain the deformation map •Bi+1 is set as . Bi is transported along the geodesic and stopped at time t = 1/(i+1). We have N shapes Si : This is a slower way, at each step we add one surface more to the centroid. At the end the Centroid is a combination of N surfaces. Start with a first subject : B1=S1  We iterate the following process: •Bi is matched to Si+1  we obtain the deformation map •Bi+1 is set as where ui(x,t) = -vi(x,1-t), ui is the reverse flow. THE ITERATIVE CENTROID METHOD WAY 2 GSI2013 - Geometric Science of Information 12 DATA  Human hippocampi, small cerebral structures related with memory process. Base of datasets: 95 human hippocampi segmented by SACHA (Chupin et al. NeuroImage, 2009) from Magnetic Resonance Images (MRI) of the IMAGEN database.  We build 3 datasets from this database GSI2013 - Geometric Science of Information 13 DATA RealData: 95 hippocampus meshes from the database IMAGEN.  Rigid alignment to a typical subject.  Meshes from RealData have between 1716 and 2256 vertices GSI2013 - Geometric Science of Information 12 DATA Data1: one subject S0 is decimated to keep about 100 vertices and then deformed using geodesic shooting in random directions composed with small translations and rotations  We have 500 subjects  Data1 is a large database with simple meshes and mainly global deformations GSI2013 - Geometric Science of Information 15 DATA Data2: From S0 , we decimate less (about 1000 vertices), and we match via LDDMM this mesh to the 95 hippocampi  We have 95 subjects  Data2 has more local variability. Closer to the anatomical truth GSI2013 - Geometric Science of Information 12 THE ITERATIVE CENTROID METHOD: RESULTS GSI2013 - Geometric Science of Information 17 Data2 : Iterative Centroid computed via Way 1. T =1.5h Data2 : Iterative Centroid computed via Way 2. 
T=3.5h THE ITERATIVE CENTROID METHOD: RESULTS GSI2013 - Geometric Science of Information 18 RESULTS : EFFECT OF SUBJECT ORDERING GSI2013 - Geometric Science of Information 19 RESULTS : EFFECTS OF INITIALIZATION AND ORDERING GSI2013 - Geometric Science of Information 20 Data1 C1 C2 C3 Std init 41.1 41.1 40.6 C1 0 0.67 1.17 C2 0.67 0 1.11 C3 1.17 1.11 0 Data2 C1 C2 C3 Std init 20.5 20.2 20.7 C1 0 0.53 0.67 C2 0.53 0 0.84 C3 0.67 0.84 0 RealData C1 C2 C3 Std init 27.4 26.7 26.5 C1 0 7.03 6.24 C2 7.03 0 1.86 C3 6.24 1.86 0 Template initialized by: Std init C1 C2 C3 Data1 0.0062 0.0056 0.0059 0.0212 Data2 0.0077 0.0086 0.0060 0.0206 RealData 0.0073 0.0060 0.0088 0.0094  Distance ||.||W* between template estimated from standard initialization and from I.C. with different orderings.  Is the computed template correctly centered ? •We calculated the ratio: •With vector field corresponding to the initial momentum vector of the deformation from the template to subject i RESULTS : EFFECT OF THE NUMBER OF ITERATIONS GSI2013 - Geometric Science of Information 21 W∗-distance ratios between the I.C. computed with x% of the population and the total population with different orderings. 0 10 20 30 40 50 60 70 80 90 100 Data1 (n=500) 0 20 40 60 80 100 120 0 10 20 30 40 50 60 70 80 90 100 Data2 (n=95) 0 20 40 60 80 100 120 0 10 20 30 40 50 60 70 80 90 100 RealData (n=95) RESULTS : COMPUTATION TIME GSI2013 - Geometric Science of Information 22  A GPU implementation was used for the kernel computation.  One matching takes between one and five minutes.  Template estimation stopped after 7 loops of alternated optimization. Time computation Standard initialization I.C. Initialized by an I.C. Saving(%) Data1 (n=500, nbPoints=135) 96 h 1,8 h 25 h (26,8h) 72% Data2 (n=95, nbPoints=1001) 21 h 1,5 h 12 h (13,5h) 36% RealData (n=95,nbPoints≈2000) 99 h 2,7 h 26 h (28,7h) 71% CONCLUSION AND PERSPECTIVES  We would like to analyze the Way 2 of the Iterative Centroid Method. Compare Way 1 and Way 2  We also would like to test this method  with others template estimation method  as a template for the analysis of the population, compared to a template estimation  Is an actual and precise template estimation process really required ? GSI2013 - Geometric Science of Information 23 This method provides quickly a centroid in a couple of hours  The method presented here is used as initialization for template estimation method, in order to increase the convergence speed.  This method need an order, but the ordering has an insignificant impact on the template estimation method. THANK YOU FOR YOUR ATTENTION GSI2013 - Geometric Science of Information
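The "Way 1" recursion is easy to isolate from the LDDMM machinery. In the sketch below, match and transport are placeholders for the diffeomorphic surface matching and for flowing the current centroid along the resulting geodesic, so only the bookkeeping of the iterative centroid is shown; in a flat vector space the same recursion is just the running mean of the shapes.

```python
def iterative_centroid(shapes, match, transport):
    """'Way 1' iterative centroid (sketch).

    shapes    : list of N shapes S_1, ..., S_N.
    match     : callable (B, S) -> deformation registering B onto S
                (e.g. an LDDMM matching; assumed to be provided).
    transport : callable (B, deformation, t) -> shape obtained by flowing B
                along the geodesic of the deformation, stopped at time t.
    """
    B = shapes[0]                                     # B_1 = S_1
    for i, S in enumerate(shapes[1:], start=1):
        deformation = match(B, S)                     # register current centroid to S_{i+1}
        B = transport(B, deformation, 1.0 / (i + 1))  # stop the geodesic at t = 1/(i+1)
    return B
```

Because each step involves a single pairwise matching, the whole centroid costs N − 1 matchings, which is what makes it usable as a cheap initialisation of the template estimation.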

Keynote speech 2 (Hirohiko Shima)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo
Voir la vidéo
Voir la vidéo

. . . . . . . . . .. . . Geometry of Hessian Structures Hirohiko Shima h-shima@c-able.ne.jp Yamaguchi University 2013/8/29 Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 1 / 27 . . . . . . . ..1 Hessian Structures . ..2 Hessian Structures and K¨ahlerian Structures . ..3 Dual Hessian Structures . ..4 Hessian Curvature Tensor . ..5 Regular Convex Cones . ..6 Hessian Structures and Affine Differential Geometry . ..7 Hessian Structures and Information Geometry . ..8 Invariant Hessian Structures Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 2 / 27 . . . . . . Preface In 1964 Prof. Koszul was sent to Japan by the French Government and gave lectures at Osaka University. I was a student in those days and attended the lectures together with the late Professor Matsushima and Murakami. The topics of the lectures were a theory of flat manifolds with flat connection D and closed 1-form α such that Dα is positive definite. α being a closed 1-form it is locally expressed as α = dφ, and so Dα = Ddφ is just a Hessian metric in our view point. This is the ultimate origin of the notion of Hessian structures and the starting point of my research. Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 3 / 27 . . . . . . 1. Hessian structures . Definition (Hessian metric) .. . . .. . . M : manifold with flat connection D A Riemannian metric g on M is said to be a Hessian metric if g can be locally expressed by g = Ddφ, gij = ∂2 φ ∂xi ∂xj , where {x1 , · · · , xn } is an affine coordinate system w.r.t. D. (D, g) : Hessian structure on M (M, D, g) : Hessian manifold The function φ is called a potential of (D, g). Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 4 / 27 . . . . . . . Definition (difference tensor γ) .. . . .. . . Let γ be the difference tensor between the Levi-Civita connection ∇ for g and the flat connection D; γ = ∇ − D, γX Y = ∇X Y − DX Y . γi jk(component of γ)=Γi jk(Christoffel symbol for g) . Proposition (characterizations of Hessian metric) .. . . . Let (M, D) be a flat manifold and g a Riemannian metric on M. The following conditions are equivalent. (1) g is a Hessian metric. (2) (DX g)(Y , Z) = (DY g)(X, Z) Codazzi equation i.e. the covariant tensor Dg is symmetric. (3) g(γX Y , Z) = g(Y , γX Z) Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 5 / 27 . . . . . . 2. Hessian Structures and K¨ahlerian Structures . Definition (K¨ahlerian metric) .. . . .. . . A complex manifold with Hermitian metric g = ∑ i,j gij dzi d¯zj is called a K¨ahlerian metric if g is expressed by complex Hessian gij = ∂2 ψ ∂zi ∂¯zj , where {z1 , · · · , zn } is a holomorphic coordinate system. Chen and Yau called Hessian metrics affine K¨ahlerian metrics. . Proposition .. . . .. . . Let TM be the tangent bundle over Hessian manifold (M, D, g). Then TM is a complex manifold with K¨ahlerian metric gT = ∑n i,j=1 gij dzi d¯zj where zi = xi + √ −1dxi . Hirohiko Shima (Yamaguchi University) Geometry of Hessian Structures 2013/8/29 6 / 27 . . . . . . . Example (tangent bundle of paraboloid) .. . . .. . . Ω = { x ∈ Rn | xn − 1 2 n−1∑ i=1 (xi )2 > 0 } paraboloid φ = log { xn − 1 2 n−1∑ i=1 (xi )2 }−1 g = Ddφ : Hessian metric on Ω TΩ ∼= Ω + √ −1Rn ⊂ Cn : tube domain over Ω TΩ ∼
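As a concrete check of the definition (not part of the slides), the potential of the paraboloid example can be differentiated symbolically in dimension n = 2 to confirm that g = Ddφ is a positive-definite Hessian metric on the domain:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
phi = -sp.log(x2 - x1**2 / 2)       # potential of the paraboloid domain x2 > x1^2/2

g = sp.hessian(phi, [x1, x2])       # Hessian metric g_ij = d^2 phi / (dx_i dx_j)
g = g.applyfunc(sp.simplify)
print(g)                            # with u = x2 - x1^2/2: [[(u + x1^2)/u^2, -x1/u^2], [-x1/u^2, 1/u^2]]
print(sp.simplify(g.det()))         # determinant 1/u^3 > 0 and g_22 = 1/u^2 > 0, so g is positive definite
```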

ORAL SESSION 3 Differential Geometry in Signal Processing (Michel Berthier)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

A Riemannian Fourier Transform via Spin Representations Geometric Science of Information 2013 T. Batard - M. Berthier - Outline of the talk The Fourier transform for multidimensional signals - Examples Three simple ideas The Riemannian Fourier transform via spin representations Applications to filtering The Fourier transform for multidimensional signals The problem : How to define a Fourier transform for a signal φ : Rd −→ Rn that does not reduce to componentwise Fourier transforms and that takes into account the (local) geometry of the graph associated to the signal ? Framework of the talk : the signal φ : Ω ⊂ R2 −→ Rn is a grey-level image, n = 1, or a color image, n = 3. In the latter case, we want to deal with the full color information in a really non marginal way. Many already existing propositions (without geometric considerations) : • T. Ell and S.J. Sangwine transform : Fµφ(U) = ∫ R2 φ(X)exp(−µ⟨X, U⟩)dX (1) where φ : R2 −→ H0 is a color image and µ is a pure unitary quaternion encoding the grey axis. Fµφ = A∥ exp[µθ∥] + A⊥ exp[µθ⊥]ν (2) where ν is a unitary quaternion orthogonal to µ. Allows to define an am- plitude and a phase in the chrominance and in the luminance. The Fourier transform for multidimensional signals The problem : How to define a Fourier transform for a signal φ : Rd −→ Rn that does not reduce to componentwise Fourier transforms and that takes into account the (local) geometry of the graph associated to the signal ? Framework of the talk : the signal φ : Ω ⊂ R2 −→ Rn is a grey-level image, n = 1, or a color image, n = 3. In the latter case, we want to deal with the full color information in a really non marginal way. Many already existing propositions (without geometric considerations) : • T. Bülow transform : Fijφ(U) = ∫ R2 exp(−2iπx1u1)φ(X)exp(−2jπu2x2)dX (1) where φ : R2 −→ R. Fijφ(U) = Fccφ(U) − iFscφ(U) − jFcsφ(U) + kFssφ(U) (2) Allows to analyse the symetries of the signal with respect to the x and y variables. The Fourier transform for multidimensional signals The problem : How to define a Fourier transform for a signal φ : Rd −→ Rn that does not reduce to componentwise Fourier transforms and that takes into account the (local) geometry of the graph associated to the signal ? Framework of the talk : the signal φ : Ω ⊂ R2 −→ Rn is a grey-level image, n = 1, or a color image, n = 3. In the latter case, we want to deal with the full color information in a really non marginal way. Many already existing propositions (without geometric considerations) : • M. Felsberg transform : Fe1e2e3 φ(U) = ∫ R2 exp(−2πe1e2e3⟨U, X⟩)φ(X)dX (1) where φ(X) = φ(x1e1 + x2e2) = φ(x1, x2)e3 is a real valued function defined on R2 (a grey level image). The coefficient e1e2e3 is the pseudos- calar of the Clifford algebra R3,0. This transform is well adapted to the monogenic signal. The Fourier transform for multidimensional signals The problem : How to define a Fourier transform for a signal φ : Rd −→ Rn that does not reduce to componentwise Fourier transforms and that takes into account the (local) geometry of the graph associated to the signal ? Framework of the talk : the signal φ : Ω ⊂ R2 −→ Rn is a grey-level image, n = 1, or a color image, n = 3. In the latter case, we want to deal with the full color information in a really non marginal way. Many already existing propositions (without geometric considerations) : • F. Brackx et al. transform : F±φ(U) = ( 1 √ 2π )n ∫ Rn exp( i π 2 ΓU) × exp(−i⟨U, X⟩)φ(X)dX (1) where ΓU is the angular Dirac operator. 
For φ : R2 −→ R0,2 ⊗ C F±φ(U) = 1 2π ∫ R2 exp(±U ∧ X)φ(X)dX (2) where exp(±U ∧ X) is a bivector. Three simple ideas ..1 The abstract Fourier transform is defined through the action of a group. • Shift theorem : Fφα(u) = e2iπαu Fφ(u) (3) where φα(x) = φ(x + α). Here, the involved group is the group of trans- lations of R. The action is given by (α, x) −→ x + α := τα(x) (4) The mapping (group morphism) χu : τα −→ e2iπuα = χu(α) ∈ S1 (5) is a so-called character of the group (R, +). The Fourier transform reads Fφ(u) = ∫ R χu(−x)φ(x)dx (6) .. Spinor Fourier Three simple ideas ..1 The abstract Fourier transform is defined through the action of a group. • More precisely : – By means of χu, every element of the group is represented as a unit complex number that acts by multiplication on the values of the function. Every u gives a representation and the Fourier transform is defined on the set of representations. – If the group G is abelian, we only deal with the group morphisms from G to S1 (characters). Three simple ideas ..1 The abstract Fourier transform is defined through the action of a group. • Some transforms : – G = (Rn, +) : we recover the usual Fourier transform. – G = SO(2, R) : this corresponds to the theory of Fourier series. – G = Z/nZ : we obtain the discrete Fourier transform. – In the non abelian case one has to deal with the equivalence classes of unitary irreducible representations (Pontryagin dual). Some of these irreducible representations are infinite dimensional. Applications to ge- neralized Fourier descriptors with the group of motions of the plane, to shearlets,... Three simple ideas ..1 The abstract Fourier transform is defined through the action of a group. • The problem : Find a good way to represent the group of translations (R2, +) in order to make it act naturally on the values (in Rn) of a multidimensional function Three simple ideas ..2 The vectors of Rn can be considered as generalized numbers. • Usual identifications : X = (x1, x2) ∈ R2 ↔ z = x1 + ix2 ∈ C (3) X = (x1, x2, x3, x4) ∈ R4 ↔ q = x1 + ix2 + jx3 + kx4 ∈ H (4) The fields C and H are the Clifford algebras R0,1 (of the vector space R with the quadratic form Q(x) = −x2) and R0,2 (of the vector space R2 with the quadratic form Q(x1, x2) = −x2 1 − x2 2). • Clifford algebras : the vector space Rn with the quadratic form Qp,q is embedded in an algebra Rp,q of dimension 2n that contains scalars, vectors and more generally multivectors such as the bivector u ∧ v = 1 2 (uv − vu) (5) Three simple ideas ..2 The vectors of Rn can be considered as generalized numbers. • The spin groups : the group Spin(n) is the group of elements of R0,n that are products x = n1n2 · · · n2k (3) of an even number of unit vectors of Rn. • Some identifications : Spin(2) ≃ S1 (4) Spin(3) ≃ H1 (5) Spin(4) ≃ H1 × H1 (6) • Natural idea : replace the group morphisms from (R2, +) to S1 , the cha- racters, by group morphisms from (R2, +) to Spin(n), the spin characters. Three simple ideas ..2 The vectors of Rn can be considered as generalized numbers. • The problem : Compute the spin characters, i.e. the group morphisms from (R2, +) to Spin(n) Find meaningful representation spaces for the action of the spin characters Three simple ideas ..2 The vectors of Rn can be considered as generalized numbers. • Spin(3) characters : χu1,u2,B : (x1, x2) −→ exp 1 2 [ B A ( x1 x2 )] = exp 1 2 [(x1u1 + x2u2)B] (3) where A = (u1 u2) is the matrix of frequencies and B = ef with e and f two orthonormal vectors of R3. .. 
Spinor Fourier Three simple ideas ..2 The vectors of Rn can be considered as generalized numbers. • Spin(4) and Spin(6) characters : (x1, x2) −→ exp 1 2 [ (B1 B2) A ( x1 x2 )] (3) (x1, x2) −→ exp 1 2 [ (B1 B2 B3) A ( x1 x2 )] (4) where A is a 2 × 2, resp. 2 × 3, real matrix and Bi = eifi for i = 1, 2, resp. i = 1, 2, 3, with (e1, e2, f1, f2), resp. (e1, e2, e3, f1, f2, f3), an orthonormal basis of R4, resp. R6. Three simple ideas ..3 The spin characters are parametrized by bivectors. • Fundamental remark : the spin characters are as usual parametrized by frequencies, the entries of the matrix A. But they are also parametrized by bivectors, B, B1 and B2, B1, B2 and B3, depending on the context. • How to involve the geometry ? it seems natural to parametrize the spin characters by the bivector corresponding to the tangent plane of the image graph, more precisely by the field of bivectors corresponding to the fiber bundle of the image graph. Three simple ideas ..3 The spin characters are parametrized by bivectors. • Several possibilities for dealing with representation spaces for the action of the spin characters : – Using Spin(3) characters and the generalized Weierstrass representa- tion of surface (T. Friedrich) : in “Quaternion and Clifford Fourier Trans- form and Wavelets (E. Hitzer and S.J. Sangwine Eds), Trends in Mathe- matics, Birkhauser, 2013. – Using Spin(4) and Spin(6) characters and the so-called standard re- presentations of the spin groups : in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Differential Geometry in Signal Pro- cessing, Vol 7, Issue 4, 2013. The Riemannian Fourier transform The spin representations of Spin(n) are defined through complex represen- tations of Clifford algebras. They do not “descend” to the orthogonal group SO(n, R) (since they send −1 to −Identity contrary to the standard represen- tations). These are the representations used in physics. The complex spin representation of Spin(3) is the group morphism ζ3 : Spin(3) −→ C(2) (5) obtained by restricting to Spin(3) ⊂ (R3,0 ⊗ C)0 a complex irreducible repre- sentation of R3,0. An color image is considered as a section .. Spinor Fourier σφ : (x1, x2) −→ 3∑ k=1 (0, φk (x1, x2)) ⊗ gk (6) of the spinor bundle PSpin(E3(Ω)) ×ζ3 C2 (7) where E3(Ω) = Ω × R3 and (g1, g2, g3) is the canonical basis of R3. The Riemannian Fourier transform Dealing with spinor bundles allows varying spin characters and the most natural choice for the field of bivectors B := B(x1, x2) which generalized the field of tangent planes is B = γ1g1g2 + γ2g1g3 + γ3g2g3 (8) with γ1 = 1 δ γ2 = √∑3 k=1 φ2 k,x2 δ γ3 = − √∑3 k=1 φ2 k,x1 δ δ = 1 + 2∑ j=1 3∑ k=1 φ2 k,xj (9) The operator B· acting on the sections of S(E3(Ω)), where · denotes the Clif- ford multiplication, is represented by the 2 × 2 complex matrix field B· = ( iγ1 −γ2 − iγ3 γ2 − iγ3 −iγ1 ) (10) Since B2 = −1 this operator has two eigenvalue fields i and −i. Consequently, every section σ of S(E3(Ω)) can be decomposed into σ = σB + + σB − where σB + = 1 2 (σ − iB · σ) σB − = 1 2 (σ + iB · σ) (11) The Riemannian Fourier transform The Riemannian Fourier transform of σφ is given by .. Usual Fourier FBσφ(u1, u2) = ∫ R2 χu1,u2,B(x1,x2)(−x1, −x2) · σφ(x1, x2)dx1dx2 (12) .. Spin characters .. 
Image section The decomposition of a section σφ associated to a color image leads to φ(x1, x2) = ∫ R2 3∑ k=1 [ φk+ (u1, u2)eu1,u2 (x1, x2) √ 1 − γ1 2 +φk− −1 (u1, u2)e−u1,−u2 (x1, x2) √ 1 + γ1 2 ] ⊗ gkdu1du2 (13) where φk+ = φk √ 1 − γ1 2 φk− = φk √ 1 + γ1 2 (14) Low-pass filtering Figure: Left : Original - Center : + Component - Right : - Component Low-pass filtering (a) + Component (b) Variance : 10000 (c) Variance : 1000 (d) Variance : 100 (e) Variance : 10 (f) Variance → 0 Figure: Low-pass filtering on the + component Low-pass filtering (a) - Component (b) Variance : 10000 (c) Variance : 1000 (d) Variance : 100 (e) Variance : 10 (f) Variance → 0 Figure: Low-pass filtering on the - component Thank you for your attention !
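Two ingredients of the construction are easy to reproduce numerically: the bivector coefficient fields (γ1, γ2, γ3) computed from the image derivatives, with δ = (1 + Σ_{j,k} φ²_{k,xj})^{1/2} so that B² = −1 as required above, and the splitting of a C²-valued section into the ± eigencomponents of the Clifford action B·. The function names and the finite-difference derivatives below are our choices, not the authors' code.

```python
import numpy as np

def bivector_coefficients(im):
    """Fields (gamma1, gamma2, gamma3) of the bivector B for a colour image im (H, W, 3)."""
    d_x1, d_x2 = np.gradient(im.astype(float), axis=(0, 1))   # channel-wise derivatives
    s1 = np.sum(d_x1**2, axis=-1)
    s2 = np.sum(d_x2**2, axis=-1)
    delta = np.sqrt(1.0 + s1 + s2)          # normalisation giving B^2 = -1
    return 1.0 / delta, np.sqrt(s2) / delta, -np.sqrt(s1) / delta

def split_section(sigma, g1, g2, g3):
    """Split a C^2-valued section sigma (H, W, 2) into sigma_+ and sigma_-,
    the eigencomponents of the 2x2 matrix field representing B."""
    B = np.empty(sigma.shape[:2] + (2, 2), dtype=complex)
    B[..., 0, 0] = 1j * g1
    B[..., 0, 1] = -g2 - 1j * g3
    B[..., 1, 0] = g2 - 1j * g3
    B[..., 1, 1] = -1j * g1
    Bs = np.einsum('...ij,...j->...i', B, sigma)   # pointwise Clifford action B . sigma
    return 0.5 * (sigma - 1j * Bs), 0.5 * (sigma + 1j * Bs)
```

The low-pass filtering experiments above then amount to standard frequency-domain filtering applied separately to the + and − components before recombining them.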

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo

Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives K-Centroids-Based Supervised Classification of Texture Images using the SIRV modeling Aurélien Schutz Lionel Bombrun Yannick Berthoumieu IMS Laboratory - CNRS UMR5218, Groupe Signal 28-30 august 2013 Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 1 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Database classification Database classification Musical genres Images databases Textured images databases Video databases Propositions Information geometry Centroid ¯θi [Choy2007] [Fisher1925], [Burbea1982], [Pennec1999], [Banerjee2005], [Amari2007], [Nielsen2009] Bayesian framework of classification Intrinsic prior p(θ | Hi ) [Bayes1763], [Whittaker1915], [Robert1996], [Bernardo2003] Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 2 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Prior capability of handling the intra-class diversity Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 3 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 4 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 5 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Bayesian decision Data space X where x ∼ P Parameter space Θ PΘ a prior parametric model on Θ Riemannian manifold G the Fisher information matrix Nc classes, D = {Hi } Nc i=1 decision space Prerequisites : likelihood p(x | θ, Hi ), prior p(θ | Hi ), 0-1 loss L Decision rule on X : high computational complexity Xi = x | ˆHi = arg min Hj ∈D − log Θj p(x | θ, Hj )p(θ | Hj ).dθ Decision rule on Θ : minimizing conditional risk Duda, Bayesian decision theory Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 6 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Intra-class parametric p(θ | Hi) Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 7 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Intra-class parametric p(θ | Hi) ¯θi centroid of the class i = 1, . . . , Nc Schutz, A. and Bombrun, L. and Berthoumieu, Y. 
(Labo IMS) K-CB of images with SIRV 28-30 august 2013 7 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Definition of intrinsic prior based on Jeffrey divergence Prior that follow a Gaussian distribution on manifold Θ p(θ | Hi ) = Zi exp − 1 2 (γ¯θi ,θ(1))T Ci γ¯θi ,θ(1) Pennec, Xavier, "Probabilities and statistics on Riemannian manifolds : basic tools for geometric measurements," NSIP, 1999 Proposition Intrinsic prior as Gaussian distribution on manifold Θ, with λi = (¯θi , σ2 i ) p(θ | λi , Hi ) |G(¯θi )|1/2 (σi √ 2π)d exp − 1 2σ2 i J(p(· | θ), p(· | ¯θi )) Jeffrey divergence J(p(· | θ), p(· | ¯θi )) = X (p(x | θ) − p(x | ¯θi )) log p(x | θ) p(x | ¯θi ) .dx Fisher, R.A., "Theory of statistical estimation," Proc. Cambridge Phil. Soc., 22, pp. 700-–725, 1925 Burbea, Jacob et Rao, C.Radhakrishna, "Entropy differential metric, distance and divergence measures in probability spaces : A unified approach ," Journal of Multivariate Analysis, 4, pp. 575—596, 1982 Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 8 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Optimal decision on Θ Decision on X based on Empirical Bayes, Xi = x Hi = arg min Hj ∈D − log Θi p(Xi | θ, Hi )p(θ | Hi ).dθ Kass, R. E. and Steffey, D., "Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)," 1989 Miyata, Y., "Fully Exponential Laplace Approximations Using Asymptotic Modes," Journal of the American Statistical Association, 2004 Proposition Decision on X, Laplace approximation Xi x ˆλi = arg min λj ∈D d 2 log{2σ2 j + 1} + 1 2σ2 j J(p(· | ˆθ(x)), p(· | ¯θj )) ˆθ could be maximum likelihood estimator for p(x | θ, Hi ) [Miyata2004] Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 9 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 10 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Space/scale decomposition Reel/complex wave- lets Gabor Steerable filters Bandelets Grouplets Dual-Tree Mallat, S. A, "Theory for multiresolution signal decomposition : The wavelet representation," IEEE PAMI, 1989 Do, M. and Vetterli, M., "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance," IEEE IP, 2002 Choy, S.-K. and Tong, C.-S., "Supervised Texture Classification Using Characteristic Generalized Gaussian Density," Journal of Mathematical Imaging and Vision, 2007 Schutz, A. and Bombrun, L. and Berthoumieu, Y. 
(Labo IMS) K-CB of images with SIRV 28-30 august 2013 11 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Stochastic models for likelihood p(x | θ, Hi) Spherically Invariant Random Vector (SIRV) x = g √ τ : g multivariate gaussian distribution Σ τ Weibull distribution a Joint distribution y = (τ, g), θ = (Σ, a) p(y | θ) = pG (g | Σ)pw (τ | a) Separability of Jeffrey divergence J(p(· | θ), p(· | θ )) = J(pG (· | Σ), pG (· | Σ ))+J(pw (· | a), pw (· | a )) Bombrun, L., Lasmar, N.-E., Berthoumieu, Y. and Verdoolaege, G., Multivariate texture retrieval using the SIRV representation and the geodesic distance, 2011 Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 12 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Centroid ¯θi computation State of the art : exponential families ; centred multivariate Gaussian ¯ΣR,i = 1 Ni Ni n=1 Σ−1 n −1 and ¯ΣL,i = 1 Ni Ni n=1 Σn Banerjee, A., Merugu, S., Dhillon, I. and Ghosh, J., Clustering with Bregman divergences, 2005 Nielsen, F. and Nock, R. Sided and Symmetrized Bregman Centroids, 2009 Steepest descent algorithm for Weibull centroid Dekker, T. J., Finding a zero by means of successive linear interpolation, 1969 Brent, R. P., An algorithm with guaranteed convergence for finding a zero of a function, 1971 Proposition Separated estimation of each centroid. ¯θi = (1 − i )¯ΣR,i + i ¯ΣL,i , arg min a∈R+ 1 Ni Ni n=1 J(pw (· | an), pw (· | a)) Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 13 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 14 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Unique centroid versus multiple centroids (K-CB) Varma, M. and Zisserman, A., A Statistical Approach to Texture Classification from Single Images, 2005 Several centroids per class (¯θi,k )K k=1, likelihood K-CB with binary weights wk pm(θ | (Hi,k )K k=1) = K k=1 wk Zi,k exp − 1 2σ2 i J(p(· | θ), p(· | ¯θi,k )) Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 15 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Algorithms for K-CB K-means ( Hard C-Means ) Proposition 1. Assignment of parametric vector θ Θi,k = θ | ˆθi,k = arg min θi,l ∈Hi 1 2σ2 i J(p(· | θ), p(· | ¯θi,l )) 2. Update ¯θi,k ¯θi,k = arg min ¯θ∈Θ Θi,k 1 2σ2 i J(p(· | θ), p(· | ¯θ)).dθ Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 16 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Textured image database Vision Texture database (VisTex) Brodatz database Schutz, A. and Bombrun, L. and Berthoumieu, Y. 
(Labo IMS) K-CB of images with SIRV 28-30 august 2013 17 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Vistex, SIRV Weibull, Jeffrey divergence 2/16 5/16 8/16 11/16 14/16 85 90 95 100 No. of training sample Averagekappaindex(%) 1−CB [1] 3−CB 1−NN Spatial Database K 1-NN 1-CB [1] K-CB neigh. NTr = K NTr = NSa/2 NTr = NSa/2 3 × 3 VisTex 3 83.7 % ±2.0 90.4 % ±1.3 96.8 % ±1.2 Brodatz 10 50.6 % ±2.6 79.9 % ±1.5 96.2 % ±1.2 1 × 1 VisTex 3 78.7 % ±2.3 72.7 % ±2.0 88.9 % ±1.7 Brodatz 10 65.8 % ±2.7 70 % ±1 97 % ±2 [1] Choy, S.K., Tong, C.S. : Supervised texture classification using characteristic generalized Gaussian density. Journal of Mathematical Imaging and Vision 29 (Aug. 2007) 35–47 Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 18 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 19 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Conclusions et Perspectives Conclusion 1. Bayesian classification theory and Information geometry 2. Concentred Gaussian distribution as prior p(θ | Hi ) intrinsic when p(θ) or L depends on Fisher information matrix G(θ) Decision rule done on Θ 3. K-Centroids based (K-CB) classification Diversity intra-class too high : a class, K centroids K-means on each class Numerical application : K-CB performances close to 1-NN performances K-CB give a low computing complexity Perspectives 1. K-CB with Possibilistic Fuzzy C-Means (PFCM) algorithm 2. Adapting the number of centroid needed by class Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 20 / 21 Introduction Bayesian classification and Information geometry Textured images K-CB and results Conclusion and Perspectives Content Brodatz, P., Textures : A Photographic Album for Artists and Designers, 1966. Schutz, A. and Bombrun, L. and Berthoumieu, Y. (Labo IMS) K-CB of images with SIRV 28-30 august 2013 21 / 21
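Stripped of the SIRV specifics, the K-CB training and decision rules reduce to a hard C-means loop per class followed by a nearest-centroid rule. In the sketch below, J stands for any divergence on the parameter space (e.g. the Jeffrey divergence used in the talk) and centroid for the corresponding barycentre routine; both are passed in as callables, and the σ-dependent terms of the full decision rule are omitted, so this is only the skeleton of the method.

```python
import numpy as np

def kcb_fit(params_by_class, J, centroid, K=3, n_iter=20, seed=0):
    """K-Centroids-Based learning (sketch): hard C-means inside each class."""
    rng = np.random.default_rng(seed)
    model = {}
    for label, thetas in params_by_class.items():
        cent = [thetas[i] for i in rng.choice(len(thetas), K, replace=False)]
        for _ in range(n_iter):
            # 1. assignment: attach each parameter vector to its closest centroid
            groups = [[] for _ in range(K)]
            for th in thetas:
                groups[int(np.argmin([J(th, c) for c in cent]))].append(th)
            # 2. update: recompute each centroid as the J-barycentre of its group
            cent = [centroid(g) if g else c for g, c in zip(groups, cent)]
        model[label] = cent
    return model

def kcb_predict(theta, model, J):
    """Assign theta to the class owning the nearest of its K centroids."""
    return min(model, key=lambda label: min(J(theta, c) for c in model[label]))
```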

Keynote speech 3 (Giovanni Pistone)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

GSI2013 - Geometric Science of Information, Paris, 28-30 August 2013 Dimensionality reduction for classification of stochastic fibre radiographs C.T.J. Dodson1 and W.W. Sampson2 School of Mathematics1 and School of Materials2 University of Manchester UK ctdodson@manchester.ac.uk Abstract Dimensionality reduction helps to identify small numbers of essential features of stochastic fibre networks for classification of image pixel density datasets from experimental radiographic measurements of commercial samples and simulations. Typical commercial macro-fibre networks use finite length fibres suspended in a fluid from which they are continuously deposited onto a moving bed to make a continuous web; the fibres can cluster to differing degrees, primarily depending on the fluid turbulence, fibre dimensions and flexibility. Here we use information geometry of trivariate Gaussian spatial distributions of pixel density among first and second neighbours to reveal features related to sizes and density of fibre clusters. Introduction Much analytic work has been done on modelling of the statistical geometry of stochastic fibre networks and their behaviour in regard to strength, fluid ingress or transfer [1, 5, 7]. Using complete sampling by square cells, their areal density distribution is typically well represented by a log-gamma or a (truncated) Gaussian distribution of variance that decreases monotonically with increasing cell size; the rate of decay is dependent on fibre and fibre cluster dimensions. Clustering of fibres is well-approximated by Poisson processes of Poisson clusters of differing density and size. A Poisson fibre network is a standard reference structure for any given size distribution of fibres; its statistical geometry is well-understood for finite and infinite fibres. Figure : 1. Electron micrographs of four stochastic fibrous materials. Top left: Nonwoven carbon fibre mat; Top right: glass fibre filter; Bottom left: electrospun nylon nanofibrous network (Courtesy S.J. Eichhorn and D.J. Scurr); Bottom right: paper using wood cellulose fibres—typically flat ribbonlike, of length 1 to 2mm and width 0.02 to 0.03mm. Figure : 2. Areal density radiographs of three paper networks made from natural wood cellulose fibres, of order 1mm in length, with constant mean density but different distributions of fibres. Each image represents a square region of side length 5 cm; darker regions correspond to higher coverage. The left image is similar to that expected for a Poisson process of the same fibres, so typical real samples exhibit clustering of fibres. Spatial statistics We use information geometry of trivariate Gaussian spatial distributions of pixel density with covariances among first and second neighbours to reveal features related to sizes and density of fibre clusters, which could arise in one, two or three dimensions—the graphic shows a grey level barcode for the ordered sequence of the 20 amino acids in a yeast genome, a 1-dimensional stochastic texture. Saccharomyces CerevisiaeAmino Acids SC1 For isotropic spatial processes, which we consider here, the variables are means over shells of first and second neighbours, respectively, which share the population mean with the central pixel. For anisotropic networks the neighbour groups would be split into more, orthogonal, new variables to pick up the spatial anisotropy in the available spatial directions. Typical sample data Figure : 3. Trivariate distribution of areal density values for a typical newsprint sample. 
Left: source radiograph; centre: histogram of pixel densities ˜βi , average of first neighbours ˜β1,i and second neighbours ˜β2,i ; right: 3D scatter plot of ˜βi , ˜β1,i and ˜β2,i . Information geodesic distances between multivariate Gaussians What we know analytically is the geodesic distance between two multivariate Gaussians, fA, fB, of the same number n of variables in two particular cases [2]: Dµ(fA, fB) when they have a common mean µ but different covariances ΣA, ΣB and DΣ(fA, fB) when they have a common covariance Σ but different means µA, µB. The general case is not known analytically but for the purposes of studying the stochastic textures arising from areal density arrays of samples of stochastic fibre networks, a satisfactorily discriminating approximation is D(fA , fB ) ≈ Dµ(fA , fB ) + DΣ(fA , fB ). Information geodesic distance between multivariate Gaussians [2] (1). µA = µB, ΣA = ΣB = Σ : fA = (n, µA, Σ), fB = (n, µB, Σ) Dµ(fA , fB ) = µA − µB T · Σ−1 · µA − µB . (1) (2). µA = µB = µ, ΣA = ΣB : fA = (n, µ, ΣA), fB = (n, µ, ΣB) DΣ(fA , fB ) = 1 2 n j=1 log2 (λj), (2) with {λj} = Eig(ΣA−1/2 · ΣB · ΣA−1/2 ). From the form of DΣ(fA, fB) in (2) it may be seen that an approximate monotonic relationship arises with a more easily computed symmetrized log-trace function given by ∆Σ(fA, fB) = log 1 2n Tr(ΣA−1/2 · ΣB · ΣA−1/2 ) + Tr(ΣB−1/2 · ΣA · ΣB−1/2 ) . (3) This is illustrated by the plot of DΣ(fA, fB) from equation (2) on ∆Σ(fA, fB) from equation (3) in Figure 4 for 185 trivariate Gaussian covariance matrices. For comparing relative proximity, this is a better measure near zero than the symmetrized Kullback-Leibler distance [6] in those multivariate Gaussian cases so far tested and may be quicker for handling large batch processes. 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.0 0.2 0.4 0.6 0.8 1.0 1.2 DΣ(fA, fB) ∆Σ(fA, fB) Figure : 4. Plot of DΣ(fA , fB ) from (2) on ∆Σ(fA , fB ) from (3) for 185 trivariate Gaussian covariance matrices. Dimensionality reduction for data sets 1. Obtain mutual ‘information distances’ D(i, j) among the members of the data set of textures X1, X2, .., XN each with 250×250 pixel density values. 2. The array of N × N differences D(i, j) is a symmetric positive definite matrix with zero diagonal. This is centralized by subtracting row and column means and then adding back the grand mean to give CD(i, j). 3. The centralized matrix CD(i, j) is again symmetric positive definite with diagonal zero. We compute its N eigenvalues ECD(i), which are necessarily real, and find the N corresponding N-dimensional eigenvectors VCD(i). 4. Make a 3 × 3 diagonal matrix A of the first three eigenvalues of largest absolute magnitude and a 3 × N matrix B of the corresponding eigenvectors. The matrix product A · B yields a 3 × N matrix and its transpose is an N × 3 matrix T, which gives us N coordinate values (xi, yi, zi) to embed the N samples in 3-space. 2 4 6 8 10 12 14 2 4 6 8 10 12 14 -0.5 0.0 0.5 -0.2 0.0 0.2 -0.2 -0.1 0.0 0.1 Figure : 5. DΣ(fA , fB ) as a cubic-smoothed surface (left), contour plot (right), trivariate Gaussian information distances among 16 datasets of 1mm pixel density differences between a Poisson network and simulated networks from 1mm fibres, same mean density different clustering. Embedding: subgroups show numbers of fibres in clusters and cluster densities. 2 4 6 8 10 12 2 4 6 8 10 12 -1 0 1 0.0 0.5 -0.2 0.0 0.2 Figure : 6. 
DΣ(fA , fB ) as a cubic-smoothed surface (left), contour plot (right), for trivariate Gaussian information distances among 16 datasets of 1mm pixel density arrays for simulated networks made from 1mm fibres, each network with the same mean density but with different clustering. Embedding: subgroups show numbers of fibres in clusters and cluster densities; the solitary point is an unclustered Poisson network. 2 4 6 8 10 12 2 4 6 8 10 12 0 1 0.0 0.5 -0.2 0.0 0.2 Figure : 7. DΣ(fA , fB ) as a cubic-smoothed surface (left), and as a contour plot (right), for trivariate Gaussian information distances among 16 simulated Poisson networks made from 1mm fibres, with different mean density, using pixels at 1mm scale. Second row: Embedding of the same Poisson network data, showing the effect of mean network density. -5 0 5 0 2 4 -1.0 -0.5 0.0 0.5 Figure : 8. Embedding using 182 trivariate Gaussian distributions for samples from a data set of radiographs of commercial papers. The embedding separates different forming methods into subgroups. References [1] K. Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics. Springer-Verlag, New York, Berlin, 2008, Chapter 9 with W.W. Sampson, Stochasic Fibre Networks pp 161-194. [2] C. Atkinson and A.F.S. Mitchell. Rao’s distance measure. Sankhya: Indian Journal of Statistics 48, A, 3 (1981) 345-365. [3] K.M. Carter, R. Raich and A.O. Hero. Learning on statistical manifolds for clustering and visualization. In 45th Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, 2007. https://wiki.eecs.umich.edu/global/data/hero/images/c/c6/Kmcarter- learnstatman.pdf [4] K.M. Carter Dimensionality reduction on statistical manifolds. PhD thesis, University of Michigan, 2009. http://tbayes.eecs.umich.edu/kmcarter/thesis [5] M. Deng and C.T.J. Dodson. Paper: An Engineered Stochastic Structure. Tappi Press, Atlanta, 1994. [6] F. Nielsen, V. Garcia and R. Nock. Simplifying Gaussian mixture models via entropic quantization. In Proc. 17th European Signal Processing Conference, Glasgow, Scotland 24-28 August 2009, pp 2012-2016. [7] W.W. Sampson. Modelling Stochastic Fibre Materials with Mathematica. Springer-Verlag, New York, Berlin, 2009.
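The four-step dimensionality reduction listed above translates directly into a few lines of linear algebra. The sketch below assumes the N × N matrix of pairwise information distances D(i, j) has already been computed, e.g. as Dμ + DΣ from equations (1)–(2).

```python
import numpy as np

def embed_from_distances(D, dim=3):
    """Embed N samples in R^dim from an N x N symmetric matrix of pairwise
    information distances, following steps 2-4 above."""
    D = np.asarray(D, dtype=float)
    # Step 2: centralise - subtract row and column means, add back the grand mean
    C = D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()
    # Step 3: eigen-decompose the centralised symmetric matrix
    evals, evecs = np.linalg.eigh(C)
    # Step 4: keep the `dim` eigenvalues of largest absolute magnitude and scale
    # the corresponding eigenvectors; each row gives one embedding coordinate triple
    order = np.argsort(np.abs(evals))[::-1][:dim]
    return evecs[:, order] * evals[order]
```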

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Nonparametric Information Geometry http://www.giannidiorestino.it/GSI2013-talk.pdf Giovanni Pistone de Castro Statistics Initiative Moncalieri, Italy August 30, 2013 Abstract The differential-geometric structure of the set of positive densities on a given measure space has raised the interest of many mathematicians after the discovery by C.R. Rao of the geometric meaning of the Fisher information. Most of the research is focused on parametric statistical models. In series of papers by author and coworkers a particular version of the nonparametric case has been discussed. It consists of a minimalistic structure modeled according the theory of exponential families: given a reference density other densities are represented by the centered log likelihood which is an element of an Orlicz space. This mappings give a system of charts of a Banach manifold. It has been observed that, while the construction is natural, the practical applicability is limited by the technical difficulty to deal with such a class of Banach spaces. It has been suggested recently to replace the exponential function with other functions with similar behavior but polynomial growth at infinity in order to obtain more tractable Banach spaces, e.g. Hilbert spaces. We give first a review of our theory with special emphasis on the specific issues of the infinite dimensional setting. In a second part we discuss two specific topics, differential equations and the metric connection. The position of this line of research with respect to other approaches is briefly discussed. References in • GP, GSI2013 Proceedings. A few typos corrected in arXiv:1306.0480; • GP, arXiv:1308.5312 • If µ1, µ2 are equivalent measures on the same sample space, a statistical model has two representations L1(x; θ)µ1(dx) = L2(x; θ)µ2(dx). • Fisher’s score is a valid option s(x; θ) = d dθ ln Li (x; θ), i = 1, 2, and Eθ [sθ] = 0. • Each density q equivalent to p is of the form q(x) = ev(x) p(x) Ep [ev ] = exp (v(x) − ln (Ep [ev ])) p(x), where v is a random variable such that Ep [ev ] < +∞. • To avoid borderline cases, we actually require Ep eθv < +∞, θ ∈ I open ⊃ [0, 1]. • Finally, we require Ep [v] = 0. Plan Part I Exponential manifold Part II Vector bundles Part III Deformed exponential Part I Exponential manifold Sets of densities Definition P1 is the set of real random variables f such that f dµ = 1, P≥ the convex set of probability densities, P> the convex set of strictly positive probability densities: P> ⊂ P≥ ⊂ P1 • We define the (differential) geometry of these spaces in a way which is meant to be a non-parametric generalization of Information Geometry • We try to avoid the use of explicit parameterization of the statistical models and therefore we use a parameter free presentation of differential geometry. • We construct a manifold modeled on an Orlicz space. • We look for applications to applications intrisically non parametric, i.e. Statistical Physics, Information Theory, Optimization, Filtering. Banach manifold Definition 1. Let P be a set, E ⊂ P a subset, B a Banach space. A 1-to-1 mapping s : E → B is a chart if the image s(E) = S ⊂ B is open. 2. Two charts s1 : E1 → B1, s2 : E2 → B2, are both defined on E1 ∩ E2 and are compatible if s1(E1 ∩ E2) is an open subset of B1 and the change of chart mapping s2 ◦ s−1 1 : s1(E1 ∩ E2) s−1 1 // E1 ∩ E2 s2 // s2(E1 ∩ E2) is smooth. 3. An atlas is a set of compatible charts. • Condition 2 implies that the model spaces B1 and B2 are isomorphic. 
• In our case: P = P>, the atlas has a chart sp for each p ∈ P> such that sp(p) = 0 and two domains Ep1 and Ep2 are either equal or disjoint. Charts on P> Model space Orlicz Φ-space If φ(y) = cosh y − 1, the Orlicz Φ-space LΦ (p) is the vector space of all random variables such that Ep [Φ(αu)] is finite for some α > 0. Properties of the Φ-space 1. u ∈ LΦ (p) if, and only if, the moment generating function α → Ep [eαu ] is finite in a neighborhood of 0. 2. The set S≤1 = u ∈ LΦ (p) Ep [Φ(u)] ≤ 1 is the closed unit ball of a Banach space with norm u p = inf ρ > 0 Ep Φ u ρ ≤ 1 . 3. u p = 1 if either Ep [Φ(u)] = 1 or Ep [Φ(u)] < 1 and Ep Φ u ρ = ∞ for ρ < 1. If u p > 1 then u p ≤ Ep [Φ(u)]. In particular, lim u p→∞ Ep [Φ (u)] = ∞. Example: boolean state space • In the case of a finite state space, the moment generating function is finite everywhere, but its computation can be challenging. • Boolean case: Ω = {+1, −1} n , uniform density p(x) = 2−n , x ∈ Ω. A generic real function on Ω has the form u(x) = α∈L ˆu(α)xα , with L = {0, 1} n , xα = n i=1 xαi i , ˆu(α) = 2−n x∈Ω u(x)xα . • The moment generating function of u under the uniform density p is Ep etu = B∈B(ˆu) α∈Bc cosh(tˆu(α)) α∈B sinh(tˆu(α)), where B(ˆu) are those B ⊂ Supp ˆu such that α∈B α = 0 mod 2. • Ep [Φ(tu)] = B∈B0(ˆu) α∈Bc cosh(tˆu(α)) α∈B sinh(tˆu(α)) − 1, where B0(ˆu) are those B ⊂ Supp ˆu such that α∈B α = 0 mod 2 and α∈Supp ˆu α = 0. Example : the sphere is not smooth in general • p(x) ∝ (a + x)−3 2 e−x , x, a > 0. • For the random variable u(x) = x, the function Ep [Φ(αu)] = 1 ea Γ −1 2, a ∞ 0 (a+x)−3 2 e−(1−α)x + e−(1+α)x 2 dx−1 is convex lower semi-continuous on α ∈ R, finite for α ∈ [−1, 1], infinite otherwise, hence not smooth. −1.0 −0.5 0.0 0.5 1.0 0.00.20.40.60.81.0 \alpha E_p(\Phi(\alphau) q q Isomorphism of LΦ spaces Theorem LΦ (p) = LΦ (q) as Banach spaces if p1−θ qθ dµ is finite on an open neighborhood I of [0, 1]. It is an equivalence relation p q and we denote by E(p) the class containing p. The two spaces have equivalent norms Proof. Assume u ∈ LΦ (p) and consider the convex function C : (s, θ) → esu p1−θ qθ dµ. The restriction s → C(s, 0) = esu p dµ is finite on an open neighborhood Jp of 0; the restriction θ → C(0, θ) = p1−θ qθ dµ is finite on the open set I ⊃ [0, 1]. hence, there exists an open interval Jq 0 where s → C(s, 1) = esu q dµ is finite. q q J_p J_q I e-charts Definition (e-chart) For each p ∈ P>, consider the chart sp : E(p) → LΦ 0 (p) by q → sp(q) = log q p + D(p q) = log q p − Ep log q p For u ∈ LΦ 0 (p) let Kp(u) = ln Ep [eu ] the cumulant generating function of u and let Sp the interior of the proper domain. Define ep : Sp u → eu−Kp(u) · p ep ◦ sp is the identity on E(p) and sp ◦ ep is the identity on Sp. Theorem (Exponential manifold) {sp : E (p)|p ∈ P>} is an affine atlas on P>. Cumulant functional • The divergence q → D(p q) is represented in the chart centered at p by Kp(u) = log Ep [eu ], where q = eu−Kp(u) · p, u ∈ Bp = LΦ 0 (p). • Kp : Bp → R≥ ∪ {+∞} is convex and its proper domain Dom (Kp) contains the open unit ball of Tp. • Kp is infinitely Gˆateaux-differentiable on the interior Sp of its proper domain and analytic on the unit ball of Bp. • For all v, v1, v2, v3 ∈ Bp the first derivatives are: d Kpuv = Eq [v] d2 Kpu(v1, v2) = Covq (v1, v2) d3 Kpu(v1, v2, v3) = Covq(v1, v2, v3) Change of coordinate The following statements are equivalent: 1. q ∈ E (p); 2. p q; 3. E (p) = E (q); 4. ln q p ∈ LΦ (p) ∩ LΦ (q). 1. 
If p, q ∈ E(p) = E(q), the change of coordinate sq ◦ ep(u) = u − Eq [u] + ln p q − Eq ln p q is the restriction of an affine continuous mapping. 2. u → u − Eq [u] is an affine transport from Bp = LΦ 0 (p) unto Bq = LΦ 0 (q). Summary p q =⇒ E (p) sp // Sp sq◦s−1 p  I // Bp d(sq◦s−1 p )  I // LΦ (p) E (q) sq // Sq I // Bq I // LΦ (q) • If p q, then E (p) = E (q) and LΦ (p) = LΦ (q). • Bp = LΦ 0 (p), Bq = LΦ 0 (q) • Sp = Sq and sq ◦ s−1 p : Sp → Sq is affine sq ◦ s−1 p (u) = u − Eq [u] + ln p q − Eq ln p q • The tangent application is d(sq ◦ s−1 p )(v) = v − Eq [v] (does not depend on p) Duality Young pair (N–function) • φ−1 = φ∗, • Φ(x) = |x| 0 φ(u) du • Φ∗(y) = |y| 0 φ∗(v) dv • |xy| ≤ Φ(x) + Φ∗(y) 0 1 2 3 4 5 050100150 v phi φ∗(u) φ(v) Φ∗(x) Φ(y) ln (1 + u) ev − 1 (1 + |x|) ln (1 + |x|) − |x| e|y| − 1 − |y| sinh−1 u sinh v |x| sinh−1 |x| − √ 1 + x2 + 1 cosh y − 1 • LΦ∗ (p) × LΦ (p) (v, u) → u, v p = Ep [uv] • u, v p ≤ 2 u Φ∗,p v Φ,p • (LΦ∗ (p)) = LΦ (p) because Φ∗(ax) ≤ a2 Φ∗(x) if a > 1 (∆2). m-charts For each p ∈ P>, consider a second type of chart on f ∈ P1 : ηp : f → ηp(f ) = f p − 1 Definition (Mixture manifold) The chart is defined for all f ∈ P1 such that f /p − 1 belongs to ∗ Bp = LΦ+ 0 (p). The atlas (ηp : ∗ E (p)), p ∈ P> defines a manifold on P1 . If the sample space is not finite, such a map does not define charts on P>, nor on P≥. Example: N(µ, Σ), det Σ = 0 I G = (2π)− n 2 (det Σ)− 1 2 exp − 1 2 (x − µ)T Σ−1 (x − µ) µ ∈ Rn , Σ ∈ Symn + . ln f (x) f0(x) = − 1 2 ln (det Σ) − 1 2 (x − µ)T Σ−1 (x − µ) + 1 2 xT x = 1 2 xT (I − Σ−1 )x + µT Σ−1 x − 1 2 µT Σ−1 µ − 1 2 ln (det Σ) Ef0 ln f f0 = 1 2 (n − Tr Σ−1 ) − 1 2 µT Σ−1 µ − 1 2 ln (det Σ) u(x) = ln f (x) f0(x) − Ef0 ln f f0 = 1 2 xT (I − Σ−1 )x + µT Σ−1 x − 1 2 (n − Tr Σ−1 ) Kf0 (u) = − 1 2 (n − Tr Σ−1 ) + 1 2 µT Σ−1 µ + 1 2 ln (det Σ) Example: N(µ, Σ), det Σ = 0 II G as a sub-manifold of P> G = x → eu(x)−K(u) f0(x) u ∈ H1,2 ∩ Sf0 • H1,2 is the Hemite space of total degree 1 and 2, that is the vector space generated by the Hermite polynomials X1, . . . , Xn, (X2 1 − 1), . . . , (X2 n − 1), X1X2, . . . , Xn−1Xn • If the matrix S, Sii = βii − 1 2 , Sij
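The Gaussian example can be checked numerically. The sketch below (not part of the talk; NumPy only, with an arbitrary mean and covariance, and taking f0 to be the standard normal density, as the slide's formula for ln f/f0 indicates) verifies that exp(u(x) − K_{f0}(u))·f0(x) reproduces the N(µ, Σ) density with u and K_{f0}(u) as written above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)            # an arbitrary SPD covariance
Sigma_inv = np.linalg.inv(Sigma)

def gauss_pdf(x, m, S):
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        (2 * np.pi) ** len(x) * np.linalg.det(S))

def u(x):
    # u(x) = 1/2 x^T (I - Sigma^{-1}) x + mu^T Sigma^{-1} x - 1/2 (n - tr Sigma^{-1})
    return (0.5 * x @ (np.eye(n) - Sigma_inv) @ x
            + mu @ Sigma_inv @ x
            - 0.5 * (n - np.trace(Sigma_inv)))

# K_{f0}(u) = -1/2 (n - tr Sigma^{-1}) + 1/2 mu^T Sigma^{-1} mu + 1/2 ln det Sigma
K = (-0.5 * (n - np.trace(Sigma_inv))
     + 0.5 * mu @ Sigma_inv @ mu
     + 0.5 * np.log(np.linalg.det(Sigma)))

for _ in range(5):
    x = rng.normal(size=n)
    f0 = gauss_pdf(x, np.zeros(n), np.eye(n))   # reference density N(0, I)
    f = gauss_pdf(x, mu, Sigma)                 # target density N(mu, Sigma)
    assert np.isclose(np.exp(u(x) - K) * f0, f)
print("exp(u - K) * f0 coincides with the N(mu, Sigma) density")
```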

ORAL SESSION 4 Relational Metric (Jean-François Marcotorchino)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

A general framework for comparing heterogeneous binary relations. Julien Ah-Pine (julien.ah-pine@eric.univ-lyon2.fr), University of Lyon - ERIC Lab, GSI 2013, Paris, 28/08/2013.

Outline: 1 Introduction. 2 Kendall's general coefficient Γ. 3 Another view of Kendall's Γ (Relational Matrices; Reinterpreting Kendall's Γ using RM; The Weighted Indeterminacy Deviation Principle). 4 Extending Kendall's Γ for heterogeneous BR (Heterogeneous BR; A geometrical framework; Similarities of order t > 0). 5 A numerical example. 6 Conclusion.

Binary relations (BR). A Binary Relation (BR) R over a finite set A = {a, . . . , i, j, . . . , n} of n items is a subset of A × A. If (i, j) ∈ R we say "i is in relation with j for R", denoted iRj.
Equivalence Relations (ER) are reflexive, symmetric and transitive BR.
Order Relations (OR) are of different types: preorders, partial orders and total (or linear, or complete) orders. If ties and missing values: preorders (reflexive, transitive BR). If no tie but missing values: partial orders (reflexive, antisymmetric, transitive BR). If no tie and no missing value: total orders (reflexive, antisymmetric, transitive and complete BR).

Equivalence Relations and qualitative variables. ER are related to qualitative or nominal categorical variables.
Example, color of eyes: x = (Brown, Brown, Blue, Blue, Green) for items (a, b, c, d, e).
X is the ER "has the same color of eyes as" and can be represented by a graph and its adjacency matrix (AM), denoted X, such that ∀i, j: Xij = 1 if iXj and Xij = 0 otherwise:

      a b c d e
  a   1 1 . . .
  b   1 1 . . .
  c   . . 1 1 .
  d   . . 1 1 .
  e   . . . . 1
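A small sketch (not from the presentation) of the construction used in the example above: the relational matrix of the equivalence relation induced by a qualitative variable, X_ij = 1 exactly when items i and j take the same value, together with a check of reflexivity, symmetry and transitivity. The eye-colour data are the slide's.

```python
import numpy as np

def er_matrix(x):
    """Relational matrix of the equivalence relation 'has the same value as'."""
    x = np.asarray(x)
    return (x[:, None] == x[None, :]).astype(int)

# Color-of-eyes example from the slide
x = ["Brown", "Brown", "Blue", "Blue", "Green"]
X = er_matrix(x)
print(X)
# [[1 1 0 0 0]
#  [1 1 0 0 0]
#  [0 0 1 1 0]
#  [0 0 1 1 0]
#  [0 0 0 0 1]]

# Reflexivity, symmetry and transitivity of the relation
assert np.all(np.diag(X) == 1)
assert np.array_equal(X, X.T)
assert np.all((X @ X > 0) <= (X > 0))   # i~j and j~k imply i~k
```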
Order Relations and quantitative variables. OR are related to quantitative variables.
Example, ranking of items: x = (1, 2, 4, 3, 5) for items (a, b, c, d, e).
X is the OR "has a lower rank than" and its AM X is again such that ∀i, j: Xij = 1 if iXj and Xij = 0 otherwise:

      a b c d e
  a   1 1 1 1 1
  b   . 1 1 1 1
  c   . . 1 . 1
  d   . . 1 1 1
  e   . . . . 1

How to compare the relationships between BR? We are given two variables of measurements x and y of the same kind (both qualitative or both quantitative).
How can we measure the proximity between the BR underlying the two variables?
How to deal with heterogeneity?
When ER have different number of categories and different distributions? For example: x = (A, A, B, B, C); y = (D, D, D, D, E).
When OR are of different types? For example: x = (1, 2, 4, 3, 5); y = (1, 1, 1, 4, 5).
Kendall's Γ coefficient. In statistics, Kendall in [Kendall(1948)] proposed a general correlation coefficient in order to define a broad family of association measures between x and y:

$\Gamma(x, y) = \dfrac{\sum_{i,j} X_{ij} Y_{ij}}{\sqrt{\sum_{i,j} X_{ij}^2 \, \sum_{i,j} Y_{ij}^2}}$   (1)

where X and Y are two square matrices derived from x and y.

Particular cases of Γ, given in [Vegelius and Janson(1982), Kendall(1948)].

Particular cases of Γ as for ER (nˣᵤ is the number of items in category u of x and pₓ is the number of categories of x):
  Tchuprow's T:  Xij = n/nˣᵤ − 1 if xi = xj (u being their common category), −1 if xi ≠ xj
  J-index:       Xij = pₓ − 1 if xi = xj, −1 if xi ≠ xj

Particular cases of Γ as for OR (for Spearman's ρa, xi is the rank of item i):
  Kendall's τa:   Xij = 1 if xi < xj, −1 if xi > xj
  Spearman's ρa:  Xij = xi − xj

Relational Matrices (RM) and some properties. AM of BR have particular properties and they are more specifically called Relational Matrices (RM) by Marcotorchino in the Relational Analysis approach [Marcotorchino and Michaud(1979)]. For instance, the relational properties of X can be expressed as linear equations on X: reflexivity, ∀i (Xii = 1); symmetry, ∀i, j (Xij − Xji = 0); antisymmetry, ∀i, j (Xij + Xji ≤ 1); completeness (totality), ∀i ≠ j (Xij + Xji ≥ 1); transitivity, ∀i, j, k (Xij + Xjk − Xik ≤ 1).
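Before the relational reinterpretation, here is a short sketch (not from the presentation) of the general coefficient Γ of equation (1) with the two order-relation scorings from the table above (Kendall's τa signs and Spearman's ρa rank differences); the rankings are the heterogeneous example quoted earlier. With rank differences, Γ reduces to the Pearson correlation of the ranks, which the last line checks.

```python
import numpy as np

def gamma(X, Y):
    """Kendall's general coefficient (1): sum X_ij Y_ij / sqrt(sum X_ij^2 * sum Y_ij^2)."""
    return (X * Y).sum() / np.sqrt((X ** 2).sum() * (Y ** 2).sum())

def tau_scores(r):
    r = np.asarray(r, dtype=float)
    return np.sign(r[None, :] - r[:, None])   # +1 if r_i < r_j, -1 if r_i > r_j

def rho_scores(r):
    r = np.asarray(r, dtype=float)
    return r[:, None] - r[None, :]            # X_ij = r_i - r_j

# Rankings from the slides (heterogeneous OR example)
x = [1, 2, 4, 3, 5]
y = [1, 1, 1, 4, 5]

print("Gamma with Kendall's tau_a scores :", gamma(tau_scores(x), tau_scores(y)))
print("Gamma with Spearman's rho_a scores:", gamma(rho_scores(x), rho_scores(y)))

# With rank differences, Gamma equals the Pearson correlation of the rank vectors
rx, ry = np.asarray(x, float), np.asarray(y, float)
assert np.isclose(gamma(rho_scores(x), rho_scores(y)), np.corrcoef(rx, ry)[0, 1])
```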
Reinterpreting Kendall's Γ using RM. There have been several works about using RM to reformulate association measures in order to better understand their differences [Marcotorchino(1984-85), Ghashghaie(1990), Najah Idrissi(2000), Ah-Pine and Marcotorchino(2010)]. In this work, we propose to reinterpret Kendall's Γ in terms of RM in a way which emphasizes the so-called weighted indeterminacy deviation principle.
1 We give the definition of the opposite of an ER and of an OR.
2 We introduce Λ, our formulation of Kendall's Γ in terms of RM of BR, RM of opposites of BR and weighting schemes.
3 We show how Λ yields well-known association measures.
4 We explain the weighted indeterminacy deviation principle.
Opposite relation of an ER and of an OR. We introduce the opposite relation X̄ of an ER or an OR X. If x is a categorical variable then X̄ is the complement relation of X: X̄ij

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

Comparison of linear modularization criteria of networks using relational metric Patricia Conde C´espedes LSTA, Paris 6 August 2013 Thesis supervised by J.F. Marcotorchino (Thales Scientific Director) Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 1 / 35 Outline 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 2 / 35 Introduction and objective Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 3 / 35 Introduction and objective Description of the problem Objective: Compare the partitions found by different linear criteria Nowadays, we can find networks everywhere: (biology, computer programming, marketing, etc). Some practical applications are: Cyber-Marketing: Cyber-Security: It is difficult to analyse a network directly because of its big size. Therefore, we need to decompose it in clusters or modules ⇐⇒ modularize it. Different modularization criteria have been formulated in different contexts in the last few years and we need to compare them. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 4 / 35 Introduction and objective Graph partition Definition of module or community. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 5 / 35 Mathematical Relational representations of criteria Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 6 / 35 Mathematical Relational representations of criteria Mathematical Relational modeling Modularizing a graph G(V, E) ⇔ defining an equivalence relation on V . Let X be a square matrix of order N = |V | defining an equivalence relation on V as follows: xii = 1 if i and i are in the same cluster ∀i, i ∈ V × V 0 otherwise (1) We present a modularization criterion as a linear function to optimize: Max X F(A, X) (2) subject to the constraints of an equivalence relation: xii ∈ {0, 1} Binarity (3) xii = 1 ∀i Reflexivity xii − xi i = 0 ∀(i, i ) Symmetry xii + xi i − xii ≤ 1 ∀(i, i , i ) Transitivity Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 7 / 35 Mathematical Relational representations of criteria Properties verified by linear criteria Every linear criteria is separable as it can be written in the general form (it is possible to separate the data from the variables): F(X) = N i=1 N i =1 φ(aii )xii + constant (4) where aii is the general term of the adjacency matrix A and φ(aii ) is a function of the adjacency matrix only. 
Besides, the criterion is balanced if it can be written in the form: F(X) = N i=1 N i =1 φ(aii )xii + N i=1 N i =1 ¯φ(aii )¯xii (5) Where: ¯xii = 1 − xii represents the opposite relation of X, noted ¯X. φ(aii ) ≥ 0 ∀i, i and ¯φ(aii ) ≥ 0 ∀i, i are non negative functions verifying: N i=1 N i =1 φii > 0 and N i=1 N i =1 ¯φii > 0. As we will see later the functions φ and ¯φ behave as ”costs”. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 8 / 35 Mathematical Relational representations of criteria The property of Linear balance Given a graph If ¯φii = 0 ∀i, i all the nodes are clustered together, then κ = 1. If φii = 0 ∀i, i all nodes are separated, then κ = N If N i=1 N i =1 φii = N i=1 N i =1 ¯φii the criterion is a null model and therefore it has a resolution limit. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 9 / 35 Mathematical Relational representations of criteria Existing linear functions Function Relational notation Zahn-Condorcet (1785, 1964) FZC(X) = N i=1 N i =1 (aii xii + ¯aii ¯xii ) Owsi´nski-Zadro˙zny (1986) FZOZ (X) = N i=1 N i =1 ((1−α)aii xii +α¯aii ¯xii ) with 0 < α < 1 Newman-Girvan (2004) FNG(X) = 1 2M N i=1 N i =1 aii − ai.a.i 2M xii Table: Linear criteria Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 10 / 35 Mathematical Relational representations of criteria Three new linear criteria Function Relational notation Deviation to uniformity (2013) FUNIF(X) = 1 2M N i=1 N i =1 aii − 2M N2 xii Deviation to indeter- mination (2013) FDI(X) = 1 2M N i=1 N i =1 aii − ai. N − a.i N + 2M N2 xii Balanced modularity (2013) FBM (X) = N i=1 N i =1 (aii − Pii ) xii + (¯aii − ¯Pii )¯xii where Pii = ai.a.i 2M and ¯Pii = ¯aii − (N−ai.)(N−a.i ) N2−2M Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 11 / 35 Mathematical Relational representations of criteria Interpretation of new linear criteria Uniformity structure Indetermination structure Duality independance and indetermination structure: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 12 / 35 Mathematical Relational representations of criteria Some properties of these new criteria Whereas the Newman-Girvan modularity is based on the ”deviation from independance” structure, the DI index criterion is based on the ”deviation to the indetermination” structure. All these three new criteria are null models as Newman-Girvan modularity. The balanced modularity is a balanced version of Newman-Girvan modularity. If all the nodes had the same degree : dav = N i=1 ai N = 2M N all the new criteria would have the same behavior as Newman-Girvan modularity does: FNG ≡ FUNIF ≡ FBM ≡ FDI. 
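A compact sketch (not from the talk) that evaluates three of the criteria above in their relational form, given an adjacency matrix A and a partition encoded as the equivalence-relation matrix X of equation (1); the toy graph (two triangles joined by one edge) and the partitions are arbitrary illustrations.

```python
import numpy as np

def relation_matrix(labels):
    """X_ii' = 1 if i and i' are in the same cluster (equation (1))."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def criteria(A, labels):
    A = np.asarray(A, dtype=float)
    X = relation_matrix(labels)
    N = A.shape[0]
    deg = A.sum(axis=1)              # a_i. (= a_.i for an undirected graph)
    two_m = A.sum()                  # 2M
    ng   = ((A - np.outer(deg, deg) / two_m) * X).sum() / two_m
    unif = ((A - two_m / N ** 2) * X).sum() / two_m
    di   = ((A - deg[:, None] / N - deg[None, :] / N + two_m / N ** 2) * X).sum() / two_m
    return {"Newman-Girvan": ng,
            "Deviation to uniformity": unif,
            "Deviation to indetermination": di}

# Toy graph: two triangles joined by a single edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(criteria(A, [0, 0, 0, 1, 1, 1]))   # the "natural" two-cluster partition
print(criteria(A, [0, 0, 0, 0, 0, 0]))   # everything in one cluster
```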
Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 13 / 35 Algorithm and some results Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 14 / 35 Algorithm and some results The number of clusters The Louvain algorithm is easy to adapt to Separable criteria. Data Jazz Internet N = 198 N = 69949 M = 2742 M = 351380 Function κ κ Zahn-Condorcet 38 40123 Owsi´nski-Zadro˙zny 6 α = 2% 456 α < 1% Newman-Girvan 4 46 Deviation to uniformity 20 173 Deviation to indetermination 6 45 Balanced modularity 5 46 How to explain these differences? Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 15 / 35 Comparison of criteria Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 16 / 35 Comparison of criteria Impact of merging two clusters Now let us suppose we want to merge two clusters C1 and C2 in the network of sizes n1 and n2 respectively. Let us suppose as well they are connected by l edges and they have average degree d1 av et d2 av respectively. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 17 / 35 Comparison of criteria Impact of merging two clusters What is the contribution of merging two clusters to the value of each criterion? The contribution C of merging two clusters will be: C = n1 i∈C1 n2 i ∈C2 (φii − ¯φii ) (6) The objective is to compare function φ(.) to function ¯φ(.) If C > 0 the criterion merges the two clusters, the contribution is a gain. If C < 0 the criterion separates the two clusters, the contribution is a cost. l is the number of edges between clusters C1 and C2. ¯l is the number of missing edges between clusters C1 and C2. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 18 / 35 Comparison of criteria Contribution of merging two clusters The contribution for the Zahn-Condorcet criterion: CZC = (l − ¯l) = l − n1n2 2 (7) The Zahn-Condorcet criterion requires that the connexions within the cluster be bigger than the absence of connexions ⇐⇒ the number of connections l between C1 and C2 must be at least as half as the possible connexions between the two subgraphs. This criterion does not have resolution limit as the contribution depends only upon local properties: l, ¯l, n1, n2. The contribution does not depend on the size of the network. With this criterion we obtain many small clusters or cliques, some of them are sigle nodes. 
Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 19 / 35 Comparison of criteria Contribution of merging two clusters The contribution for the Owsi´nski-Zadro˙zny criterion: COZ = (l − αn1n2) 0 < α < 1 (8) As this Zahn-Condorcet criterion is so exigent we obtain many small clusters, the Owsi´nski-Zadro˙zny criterion gives the choice to define the minimum required percentage of within-cluster α edges. This coefficient defines the balance between φ and ¯φ. For α = 0.5 the wsi´nski-Zadro˙zny criterion ≡ the Zahn-Condorcet criterion. α is defined by the user as the minimum required fraction of within- cluster edges. This criterion does not have resolution limit as the contribution depend only upon local properties: l, ¯l, n1, n2. The contribution does not depend on the size of the network. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 20 / 35 Comparison of criteria Impact of merging two clusters The contribution for the Newman-Girvan criterion: CNG = l − n1n2 d1 avd2 av 2M (9) The contribution depends on the degree distribution of the clusters. This criterion has a resolution limit since the contribution depends on global properties of the whole network M. The optimal partition has no clusters with a single node. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 21 / 35 Comparison of criteria Impact of merging two clusters The contribution for the deviation to Uniformity criterion: CUNIF = l − n1n22M N2 (10) This criterion is a particular case of Zahn-Condorcet (or Owsi´nski-Zadro˙zny) criterion with α = 2M N2 which can be interpreted as a density occupancy of edges among the nodes δ. To merge the two clusters l n1n2 > δ the fraction of within clusters edges must be greater than the global density of edges δ. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 22 / 35 Comparison of criteria Impact of merging two clusters The contribution for the deviation to indetermination criterion: CDI = l − n1n2 d1 av N + d2 av N − 2M N2 (11) The contribution depends on the degree distribution of the clusters and on their sizes. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. It favors big cluster with high average degree and small clusters with low average degree. So, the degree distribution of each cluster obtained by this criterion tends to be more homogeneous than that of the clusters obtained by optimizing the Newman-Girvan criterion. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 23 / 35 Comparison of criteria Impact of merging two clusters The contribution for the Balanced modularity criterion: CBM = 2l + n1n2 (N − d1 av)(N − d2 av) N2 − 2M − n1n2 − n1n2 d1 avd2 av 2M (12) The contribution depends on the degree distribution of the clusters and on the sizes of the clusters. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. 
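To make the comparison concrete, here is a small sketch (not from the talk) of the merge contributions (7)-(11) above, for two clusters of sizes n1, n2 with average degrees d1_av, d2_av, joined by l edges, in a graph with N nodes and M edges; the numerical values in the call are arbitrary. A positive value means the criterion favours merging the two clusters.

```python
def merge_contributions(l, n1, n2, d1_av, d2_av, N, M, alpha=0.5):
    """Contributions (7)-(11): > 0 means the criterion merges the two clusters.
    With alpha = 0.5 the Owsinski-Zadrozny contribution reduces to Zahn-Condorcet."""
    return {
        "Zahn-Condorcet":             l - n1 * n2 / 2,
        "Owsinski-Zadrozny":          l - alpha * n1 * n2,
        "Newman-Girvan":              l - n1 * n2 * d1_av * d2_av / (2 * M),
        "Deviation to uniformity":    l - n1 * n2 * 2 * M / N ** 2,
        "Deviation to indetermination":
            l - n1 * n2 * (d1_av / N + d2_av / N - 2 * M / N ** 2),
    }

# Two sparsely connected communities in a moderately dense graph (arbitrary values)
print(merge_contributions(l=4, n1=20, n2=20, d1_av=6, d2_av=6, N=200, M=1000))
```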
Depending upon δ and dav this criterion behaves like a regulator between the Newman-Girvan criterion and the deviation to indetermination criterion. On one hand the degree distribution within clusters is more homogeneous than that found with Newman-Girvan criterion. On the other hand, the degree distribution within clusters is more heterogeneous than that found with deviation to indetermination criterion. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 24 / 35 Comparison of criteria Summary by criterion Criterion Characteristics of the clustering Zahn-Condorcet Clusters have a fraction of within cluster edges greater than 50%. Owsi´nski-Zadro˙zny A generalization of ZC criterion where the user defines the minimum fraction of within cluster edges. Deviation to Unifor- mity The OZ criterion for α = δ the density of edges among the nodes. Newman-Girvan It has a resolution limit and the optimal clus- tering does not contain isolated nodes. Deviation to uniformity Within cluster degree distribution is more homogeneous than that found by the Newman-Girvan criterion. Balanced modularity It behaves like a regulator between the NG criterion and the DI criterion. . Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 25 / 35 Applications Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 26 / 35 Applications ”Data: Zachary karate club network” A network of friendships between the 34 members of a karate club at a US university. N = 34 nodes, M = 78 edges, dav = 4.6 and δ = 0.13 Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 27 / 35 Applications ”Data: Zachary karate club network” The number of clusters per criterion (with Louvain Algorithm): Criterion Number of clusters Single nodes Zahn-Condorcet 19 12 Owsi´nski-Zadro˙zny 7 (α = 0.2) 3 Deviation to uniformity 6 2 Newman-Girvan 4 Deviation to indetermination 4 Balanced modularity 4 The partitions found with Newman-Girvan, Deviation to indetermination and the Balanced modularity are the same. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 28 / 35 Applications ”Data: Zachary karate club network” Density of within cluster edges: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 29 / 35 Applications ”Data: Zachary karate club network” Partitions obtained by the criteria: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 30 / 35 Applications The Jazz network A network of jazz musicians, N = 198, M = 2742, dav ∼= 27.7 and δ = 0.14. 
The clusters found by three criteria (nj is the size of cluster j, dj_av its average degree, σj the standard deviation and cvj the coefficient of variation of the degree within cluster j):

Newman-Girvan:
  nj   dj_av   σj     cvj
  62   32.3    18.5   0.57
  53   30.5    16.2   0.53
  61   20.3    14.1   0.69
  22   28.4    20.1   0.71

Balanced modularity:
  nj   dj_av   σj     cvj
  60   33.1    18.2   0.55
  53   31.3    16.3   0.52
  61   20.3    14.1   0.69
  23   26      19.4   0.75
  1    1       0      0

Deviation to indetermination:
  nj   dj_av   σj     cvj
  63   19.8    14.2   0.71
  63   33.7    16     0.48
  18   13.8    5.2    0.37
  51   36.4    17.7   0.49
  2    2.5     2.1    0.85
  1    1       0      0

The Jazz network: the coefficient of variation for the three criteria (figure).

Conclusions. We presented 6 modularization criteria in Relational Analysis notation, which allowed us to easily calculate their contribution, cost or gain, when merging two clusters. We analysed important characteristics of the different criteria. We compared the differences found in the partitions provided by each criterion. Although the 3 criteria we introduced have nearly the same properties, they differ depending mainly on the degree distribution, on the sizes of the clusters and on global characteristics of the graph.

Thanks for your attention!

Patricia Conde Céspedes (LSTA, Paris 6), Comparison of linear modularization criteria of networks using relational metric, August 2013.

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

On Prime-valent Symmetric Bicirculants and Cayley Snarks. Klavdija Kutnar, University of Primorska, Paris, 2013. Joint work with Ademir Hujdurović and Dragan Marušič.

Snarks. A snark is a connected, bridgeless cubic graph with chromatic index equal to 4; a non-snark is a bridgeless cubic 3-edge-colorable graph. The Petersen graph is a snark. The Blanuša snarks are not vertex-transitive.

Vertex-transitive graph. An automorphism of a graph X = (V, E) is an isomorphism of X with itself; thus each automorphism α of X is a permutation of the vertex set V which preserves adjacency. A graph is vertex-transitive if its automorphism group acts transitively on vertices.

Cayley graph. A vertex-transitive graph is a Cayley graph if its automorphism group has a regular subgroup. Given a group G and a subset S of G \ {1} such that S = S⁻¹, the Cayley graph Cay(G, S) has vertex set G and edges of the form {g, gs} for all g ∈ G and s ∈ S. Cay(G, S) is connected if and only if G = ⟨S⟩.
Example: the Cayley graph Cay(Z7, {±1, ±2}) on the left-hand side and the Petersen graph on the right-hand side (figure).

Snarks. Any other snarks amongst vertex-transitive graphs, in particular Cayley graphs?
Nedela, Škoviera, Combin., 2001: If there exists a Cayley snark, then there is a Cayley snark Cay(G, {a, x, x⁻¹}) where x has odd order, a² = 1, and G = ⟨a, x⟩ is either a non-abelian simple group, or G has a unique non-trivial proper normal subgroup H which is either simple non-abelian or the direct product of two isomorphic non-abelian simple groups, and |G : H| = 2.
Potočnik, JCTB, 2004: The Petersen graph is the only vertex-transitive snark containing a solvable transitive subgroup of automorphisms.

The hunting for vertex-transitive/Cayley snarks is essentially a special case of the Lovász question regarding hamiltonian paths/cycles: existence of a hamiltonian cycle implies that the graph is 3-edge-colorable, and thus a non-snark. The hamiltonicity problem is hard; the snark problem is hard too, but should be easier to deal with.
The Coxeter graph is not a snark (easy) vs the Coxeter graph is not hamiltonian (harder).
Hamiltonian cycles in cubic Cayley graphs (hard) vs Cayley snarks (still hard, but easier).
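The statement that the Petersen graph is a snark can be verified by brute force. The sketch below (not from the talk; it assumes the standard outer-pentagon/inner-pentagram labelling) searches for a proper 3-edge-colouring by backtracking and finds none, so the chromatic index is 4; K4 is included as a 3-edge-colourable cubic control case.

```python
from itertools import combinations

# Petersen graph: outer 5-cycle 0..4, spokes, and inner pentagram on vertices 5..9
edges = ([(i, (i + 1) % 5) for i in range(5)] +
         [(i, i + 5) for i in range(5)] +
         [(5 + i, 5 + (i + 2) % 5) for i in range(5)])

def three_edge_colourable(edges):
    colour = {}
    def ok(e, c):
        u, v = e
        # no already-coloured edge sharing an endpoint may carry colour c
        return all(colour[f] != c for f in colour if u in f or v in f)
    def backtrack(k):
        if k == len(edges):
            return True
        for c in range(3):
            if ok(edges[k], c):
                colour[edges[k]] = c
                if backtrack(k + 1):
                    return True
                del colour[edges[k]]
        return False
    return backtrack(0)

print("Petersen graph 3-edge-colourable?", three_edge_colourable(edges))   # False
# A non-snark for comparison: K4 is cubic, bridgeless and 3-edge-colourable
k4 = list(combinations(range(4), 2))
print("K4 3-edge-colourable?", three_edge_colourable(k4))                  # True
```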
Types of cubic Cayley graphs Cay(G, S):
Type 1: S consists of 3 involutions; no snarks; nothing known about hamiltonian cycles except YES for the case when two involutions commute (Cherkassov, Sjerve).
Type 2: S = {a, x, x⁻¹}, where a² = 1 and x is of even order; no snarks; nothing known about hamiltonian cycles.
Type 3: S = {a, x, x⁻¹}, where a² = 1 and x is of odd order. See next slides.

Partial results for Type 3 graphs. A (2, s, t)-generated group is a group G = ⟨a, x | a²
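A minimal sketch (not from the talk) of the Cayley-graph definition recalled earlier, restricted to cyclic groups: it builds Cay(Z_n, S) for a symmetric connection set S and illustrates the connectivity criterion Cay(G, S) connected ⟺ ⟨S⟩ = G on the slide's example Cay(Z7, {±1, ±2}); the disconnected case Cay(Z8, {±2}) is an added illustration.

```python
import numpy as np

def cayley_cyclic(n, S):
    """Adjacency matrix of Cay(Z_n, S); S must satisfy S = -S (mod n) and 0 not in S."""
    S = {s % n for s in S}
    assert 0 not in S and S == {(-s) % n for s in S}
    A = np.zeros((n, n), dtype=int)
    for g in range(n):
        for s in S:                       # edges {g, g + s}
            A[g][(g + s) % n] = 1
    return A

def is_connected(A):
    n = len(A)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in np.flatnonzero(A[v]):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

A = cayley_cyclic(7, {1, -1, 2, -2})      # the example from the slides
print("4-regular:", set(A.sum(axis=1)) == {4})
print("Cay(Z_7, {+-1, +-2}) connected:", is_connected(A))                       # True
print("Cay(Z_8, {+-2}) connected:", is_connected(cayley_cyclic(8, {2, -2})))    # False: <2> != Z_8
```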

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and Applications to Large Graphs and Networks Modularity Jean-Franc¸ois Marcotorchino Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6 August 2013 Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 1 / 29 Outline 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 2 / 29 Goal of the Presentation Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 3 / 29 Goal of the Presentation Goal of the Presentation Exhibiting a Relationship between Monge & Condorcet (1781-1785) 1 Using the Optimal Transport Theory, based on G. Monge (1781) and L. Kantorovitch MK Problem, for defining two alternatives for measuring ”correlation” within ”stressed contingency structures” according to M. Frechet’s first attempt of 1951. 2 Introducing two extended variants of the MKP Problem concerned with Spatial interaction Models: The ”Alan Wilson’s Entropy Model” and the Minimal Trade Model. 3 Deriving and Justifying from those models two ”dual structures” of correlation measures: Deviation from Independance (Mutual Information Index), Deviation from Indetermination (Indetermination Index). Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 4 / 29 Goal of the Presentation Goal of the Presentation 1 Justifying this duality through the so called Monge’s Conditions. 2 Translating those specific situations into very differentiate but usual indexes (the Tchuprow - χ2: and the Janson-Vegelius’s Index). 3 Explaining ”Deviation from Indetermination” by its filiation with the ”Relational Analysis scheme” of A. de Condorcet. 4 Applying this principle to ”Graphs Modularization Criteria”. 
Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 5 / 29 The optimal transport problem: Monge and Monge-Kantorovich Problems Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 6 / 29 The optimal transport problem: Monge and Monge-Kantorovich Problems The Monge-Kantorovich Problem The Monge-Kantorovich Problem: P[π∗ ] = inf π∈Π(µ,ν) X×Y c(x, y)dπ(x, y) (1) The linear Monge-Kantorovich problem has a dual formulation: D[ϕ, ψ] = sup (ϕ,ψ) { X ϕdµ+ Y ψdν : c(x, y) ≥ ϕ(x)+ψ(y) on X ×Y } (2) Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 7 / 29 The optimal transport problem: Monge and Monge-Kantorovich Problems The Monge-Kantorovich Duality Theorem (Kantorovich duality) If there exists π∗ ∈ Π(µ, ν) and an admissible pair (ϕ∗, ψ∗) ∈ £ such that: X×Y c(x, y)dπ∗ (x, y) = X ϕ∗ (x)dµ(x) + Y ψ∗ (y)dν(y) then π∗ is an Optimal Transport Plan and the pair (ϕ∗, ψ∗) solves the problem (2). So there is no gap between the values: inf π P[π] = sup (ϕ,ψ) D[ϕ, ψ] Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 8 / 29 Extensions and variants of the MKP problem Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 9 / 29 Extensions and variants of the MKP problem The discrete version of the MKP problem min π p u=1 q v=1 c(u, v)πuv (3) subject to: q v=1 πuv = µu ∀u ∈ {1, 2, ..., p} (4) p u=1 πuv = νv ∀v ∈ {1, 2, ..., q} (5) p u=1 q v=1 πuv = 1 (6) πuv ≥ 0 ∀u ∈ {1, ..., p}; v ∈ {1, ..., q} (7) Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 10 / 29 Extensions and variants of the MKP problem Variants of the MKP problem The Alan Wilson’s Entropy Model: Objective function Subject to Optimal solution max π − p u=1 q v=1 πuv ln πuv Contraints (4),(5) and (7) π∗ uv = µuνv∀(u, v) n∗ uv = nu.n.v N The Minimal Trade Model: Objective function Subject to Optimal solution min π p u=1 q v=1 πuv − 1 pq 2 Contraints (4),(5) and (7) π∗ uv = µu q + νv p − 1 pq n∗ uv = nu. 
q + n.v p − N pq Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 11 / 29 Extensions and variants of the MKP problem The Continuous version of the Minimal Trade Problem The optimal solution of the Continuous version of the Minimal Trade Problem, obtained by considering the Kantorovich duality (2), is given by: π∗ (x, y) = f(x) B + g(y) A − 1 AB ∀ (x, y) ∈ [a, b] × [c, d] where π : [a, b] × [c, d] −→ [0, 1] is defined on the product of two closed intervals of the cartesian plan; A = (b − a) and B = (d − c) are the respective lengths of those intervals; µ and ν (the marginals of π) have densities f and g respectively. Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 12 / 29 Monge and Anti-Monge matrices and some related structural properties Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 13 / 29 Monge and Anti-Monge matrices and some related structural properties The Monge and anti-Monge Matrices Definition A p × q real matrix {cuv} is called a Monge matrix, if C satisfies the so called Monge’s property: cuv + cu v ≤ cuv + cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (8) Reciprocally, an ”Inverse Monge Matrix” (or Anti Monge matrix) C satisfies the following inequality: cuv + cu v ≥ cuv + cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (9) In case both inequalities (8) and (9) hold: cuv + cu v = cuv + cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (10) Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 14 / 29 Monge and Anti-Monge matrices and some related structural properties The Monge and anti-Monge Matrices Theorem Let {πuv} be a p × q real nonnegative frequency Matrix, then the following properties hold and are equivalent: i) If {πuv} is a Monge and Anti-Monge Matrix then: πuv + πu v = πuv + πu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q ii) πuv = µu q + νv p − 1 pq is a minimizer of the Minimal Trade Model. iii) All the sub tables {u, v, u , v } of size 2 × 2 with 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q have the sum of their diagonals equal to the sum of their anti-diagonals. 
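A quick numerical check (not from the talk) of the Minimal Trade optimum π*_uv = µ_u/q + ν_v/p − 1/(pq) and of the theorem above: such a table has the prescribed margins and satisfies the Monge and anti-Monge equality on every 2×2 subtable. The margins µ, ν are arbitrary test values.

```python
import numpy as np

mu = np.array([0.3, 0.5, 0.2])             # row margins (p = 3)
nu = np.array([0.25, 0.25, 0.3, 0.2])      # column margins (q = 4)
p, q = len(mu), len(nu)

# Closed-form optimum of the Minimal Trade model
pi = mu[:, None] / q + nu[None, :] / p - 1 / (p * q)

# It has the prescribed margins...
assert np.allclose(pi.sum(axis=1), mu) and np.allclose(pi.sum(axis=0), nu)

# ...and satisfies property (10): equal diagonal and anti-diagonal sums
# on every 2x2 subtable, i.e. pi[u,v] + pi[u',v'] == pi[u,v'] + pi[u',v]
for u in range(p):
    for u2 in range(u + 1, p):
        for v in range(q):
            for v2 in range(v + 1, q):
                assert np.isclose(pi[u, v] + pi[u2, v2], pi[u, v2] + pi[u2, v])
print("minimal-trade optimum: margins and Monge/anti-Monge equality verified")
```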
Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 15 / 29 Monge and Anti-Monge matrices and some related structural properties The Log-Monge and Log-Anti-Monge Matrices Definition A p × q positive real matrix {cuv} is called a Log Monge matrix, if C satisfies the Log-Monge’s property: ln cuv+ln cu v ≤ ln cuv +ln cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (11) Reciprocally, an ”Inverse Log-Monge Matrix” (or Log-Anti-Monge matrix) C satisfies the following inequality: ln cuv+ln cu v ≥ ln cuv +ln cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (12) In case both inequalities (11) and (12) hold: ln cuv+ln cu v = ln cuv +ln cu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q (13) Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 16 / 29 Monge and Anti-Monge matrices and some related structural properties The Log-Monge and Log-Anti-Monge Matrices Theorem Let {πuv} be a p × q real positive frequency Matrix, then the following properties hold and are equivalent: i) If {πuv} is a Log-Monge and Log-Anti-Monge Matrix then: ln πuv + ln πu v = ln πuv + ln πu v ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q ii) πuv = µuνv ∀ 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q is a minimizer of the Alan Wilson’s Program of Spatial Interaction System based upon Entropy Model, with fixed Margins. iii) All the sub tables {u, v, u , v } of size 2 × 2 with 1 ≤ u < u ≤ p, 1 ≤ v < v ≤ q have the product of their diagonal terms equal to the product of their anti-diagonals terms. Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 17 / 29 Monge and Anti-Monge matrices and some related structural properties A contingency table X\Y 1 . . . v . . . q Total 1 n11 . . . n1v . . . n1q n1. ... ... ... ... ... u nu1 . . . nuv . . . nuq nu. ... ... ... ... ... p np1 . . . npv . . . npq np. Total n.1 . . . n.v . . . n.q n.. where: nuv = Nπuv : quantity of mass transported from u ∈ X to v ∈ Y . nu. = Nπu.: total mass located originaly at u. n.v = Nπ.v: total mass transported to v. n.. = N Total exchange mass. 
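The multiplicative counterpart, sketched the same way (not from the talk): the independence table π_uv = µ_u ν_v has the same margins and satisfies the Log-Monge and Log-Anti-Monge equality (13), i.e. equal products along the diagonal and anti-diagonal of every 2×2 subtable.

```python
import numpy as np

mu = np.array([0.3, 0.5, 0.2])
nu = np.array([0.25, 0.25, 0.3, 0.2])
pi = np.outer(mu, nu)                      # independence structure: pi_uv = mu_u * nu_v
p, q = pi.shape

assert np.allclose(pi.sum(axis=1), mu) and np.allclose(pi.sum(axis=0), nu)

# Property (13): diagonal products equal anti-diagonal products on every 2x2 subtable
for u in range(p):
    for u2 in range(u + 1, p):
        for v in range(q):
            for v2 in range(v + 1, q):
                assert np.isclose(pi[u, v] * pi[u2, v2], pi[u, v2] * pi[u2, v])
print("independence table: margins and Log-Monge/Log-Anti-Monge equality verified")
```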
Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 18 / 29 Monge and Anti-Monge matrices and some related structural properties Example of applications of Monge’s conditions on two Contingency Tables subject to the same marginals Indetermination structure: A + B = C + D Independance Structure: AB = CD Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 19 / 29 Duality related to ”Independence” and ”Indetermination” structures Plan 1 Goal of the Presentation 2 The optimal transport problem: Monge and Monge-Kantorovich Problems 3 Extensions and variants of the MKP problem 4 Monge and Anti-Monge matrices and some related structural properties 5 Duality related to ”Independence” and ”Indetermination” structures 6 Relational Analysis Approach 7 A new Graph modularization criterion Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 20 / 29 Duality related to ”Independence” and ”Indetermination” structures The Mutual Information index (MI) and the The Deviation to indetermination Index (IND) The Mutual Information index (MI) ρMI: ρMI = SX + SY − S(X,Y ) where S(X,Y ) = − p u=1 q v=1 πuv ln πuv; SX = − p u=1 µu ln µu and SY = − q v=1 νv ln νv. The Deviation to indetermination Index (IND) ρIND: ρIND(X, Y ) = K(X,Y ) − KX − KY where K(X,Y ) = pq p u=1 q v=1 πuv − 1 pq 2 ; K(X) = p p u=1 µu − 1 p 2 and K(Y ) = q q v=1 νv − 1 q 2 . Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 21 / 29 Duality related to ”Independence” and ”Indetermination” structures Duality between independence and indetermination structures ∀X, Y |X ∼ µ , Y ∼ ν , (X, Y ) ∼ π The Independence case The Indetermination case S(X,Y ) ≤ SX + SY K(X,Y ) ≥ KX + KY with equality in case of inde- pendence with equality in case of indetermination ρMI(X, Y ) = SX +SY −S(X,Y ) = p u=1 q v=1 πuv ln πuv µuνv ρIND(X, Y ) = K(X,Y ) − KX − KY = pq p u=1 q v=1 πuv − µu q + νv p + 1 pq 2 Jean-Franc¸ois Marcotorchino (Thales Communications et S´ecurit´e, TCS and LSTA, Paris 6)Optimal Transport and Minimal Trade Problem, Impacts on Relational Metrics and ApplicatioAugust 2013 22 / 29 Duality related to ”Independence” and ”Indetermination” structures Translation of the duality between Independance and Indetermination into Contingency Correlation Measures Mutual Information Index behaves as the χ2 does in the Neibourhood of Independance: ρMI(X, Y ) ∼= p u=1 q v=1 (πuv − µuνv)2 µuνv = 1 n.. Fχ2 [π] The Janson-Vegelius index is fully derived from the Indetermination index ρIND(X, Y ): JV (X, Y ) = pq p u=1 q v=1 π2 uv − p p u=1 µ2 u. − q q v=1 ν2 .v + 1 p(p − 2) p u=1 µ2 u. 
Relational Analysis Approach

Principle: representing relations between objects by binary coding. A partition is nothing but an equivalence relation on the set of objects, which is represented by a relational N × N matrix X whose entries are defined as follows:
x_ij = 1 if i and j belong to the same cluster, and 0 otherwise. (14)
As X codes an equivalence relation, it must be reflexive, symmetric and transitive; these properties can be turned into linear constraints on the general terms of the relational matrix X. The inverse relation of X is x̄_ij = 1 − x_ij for all (i, j).

The Relational Transfer Principle:
Σ_{u=1..p} Σ_{v=1..q} π²_uv = (1/N²) Σ_{i=1..N} Σ_{j=1..N} x_ij y_ij;
Σ_{u=1..p} μ²_u. = (1/N²) Σ_{i=1..N} Σ_{j=1..N} x_ij;
Σ_{v=1..q} ν²_.v = (1/N²) Σ_{i=1..N} Σ_{j=1..N} y_ij;
Σ_{u=1..p} Σ_{v=1..q} π²_uv / (μ_u. ν_.v) = Σ_{i=1..N} Σ_{j=1..N} (x_ij / x_i.)(y_ij / y_.j).

Using the Relational Transfer Principle we get:
N² ρ_IND(X, Y) = pq Σ_{i=1..N} Σ_{j=1..N} x_ij y_ij − p Σ_{i=1..N} Σ_{j=1..N} x_ij − q Σ_{i=1..N} Σ_{j=1..N} y_ij + N². (15)

Origin of the Indetermination concept: when ρ_IND(X, Y) = 0 we get
Σ_i Σ_j x_ij y_ij + Σ_i Σ_j x̄_ij ȳ_ij ("votes in favor") = Σ_i Σ_j x̄_ij y_ij + Σ_i Σ_j x_ij ȳ_ij ("votes against").

A new Graph modularization criterion
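To make the binary coding concrete, here is a small sketch (my own, assuming NumPy) that builds the relational matrices X, Y of two partitions and checks the first relational transfer identity Σ π²_uv = (1/N²) Σ x_ij y_ij.

```python
import numpy as np

def relational_matrix(labels):
    """N x N binary matrix: x[i, j] = 1 iff objects i and j are in the same cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

labels_X = [0, 0, 1, 1, 2, 2]            # partition X of N = 6 objects
labels_Y = [0, 0, 0, 1, 1, 1]            # partition Y of the same objects
X, Y = relational_matrix(labels_X), relational_matrix(labels_Y)
N = len(labels_X)

# Contingency table pi[u, v] = (# objects in cluster u of X and cluster v of Y) / N
pi = np.zeros((3, 2))
for a, b in zip(labels_X, labels_Y):
    pi[a, b] += 1.0 / N

lhs = np.sum(pi ** 2)                    # sum_uv pi_uv^2
rhs = np.sum(X * Y) / N ** 2             # (1/N^2) sum_ij x_ij y_ij
print(np.isclose(lhs, rhs))              # True: the relational transfer identity
```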
Graph Modularization Criteria

The Newman-Girvan criterion:
(1/(2M)) Σ_{i=1..N} Σ_{i'=1..N} [ a_ii' − a_i. a_.i' / (2M) ] x_ii'

The deviation to indetermination criterion:
(1/(2M)) Σ_{i=1..N} Σ_{i'=1..N} [ a_ii' − a_i./N − a_.i'/N + 2M/N² ] x_ii'
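As an illustration (not part of the talk), both criteria can be evaluated for a candidate partition coded by its relational matrix; a sketch assuming NumPy:

```python
import numpy as np

def modularity(A, labels, kind="newman"):
    """Evaluate a partition (cluster labels) on an undirected graph with adjacency
    matrix A, using the Newman-Girvan criterion or the deviation-to-indetermination
    criterion from the slide above."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    M = A.sum() / 2.0                        # total number (or weight) of edges
    deg = A.sum(axis=1)                      # a_i. = a_.i for an undirected graph
    X = np.asarray(labels)[:, None] == np.asarray(labels)[None, :]   # relational matrix
    if kind == "newman":
        B = A - np.outer(deg, deg) / (2.0 * M)
    else:                                    # deviation to indetermination
        B = A - deg[:, None] / N - deg[None, :] / N + 2.0 * M / N ** 2
    return np.sum(B * X) / (2.0 * M)

# Two triangles joined by one edge: vertices {0,1,2} and {3,4,5}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = [0, 0, 0, 1, 1, 1]
print(modularity(A, labels, "newman"), modularity(A, labels, "indetermination"))
```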

ORAL SESSION 5 Discrete Metric Spaces (Michel Deza)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

Counting the number of solutions of K DMDGP instances
Leo Liberti (IBM "T.J. Watson" Research Center, Yorktown Heights, NY, USA, and LIX, Ecole Polytechnique, Palaiseau, France), Carlile Lavor, Jorge Alencar, Germano Abud (Dept. of Applied Math., University of Campinas, Campinas – SP, Brazil)

Contents: 1. Introduction (Distance Geometry Problem (DGP), Discretizable Molecular DGP (DMDGP), Incongruence, Probability 1, Partial Reflections); 2. The Main Result (Counting Incongruent Realizations).

Distance Geometry Problem (DGP). Given an integer K > 0 and an undirected simple graph G = (V, E) whose edges are weighted by d : E → R+, is there a function x : V → R^K such that ‖x(u) − x(v)‖ = d({u, v}) for all {u, v} ∈ E? In other words: find an embedding (or a realization) of G in R^K such that the Euclidean distances in R^K match the given edge weights. The problem is NP-complete for K = 1 and strongly NP-hard for K > 1; the important case is K = 3.

Discretizable Molecular DGP (DMDGP). Given an undirected simple graph G = (V, E) whose edges are weighted by d : E → R+, and such that there is an order v1, v2, . . . , vn of V satisfying
(a) ∀i ∈ {4, . . . , n}, ∀j, k ∈ {i − 3, . . . , i} : {vj, vk} ∈ E,
(b) ∀i ∈ {2, . . . , n − 1}, d(vi−1, vi+1) < d(vi−1, vi) + d(vi, vi+1),
is there an embedding x : V → R³ such that ‖x(u) − x(v)‖ = d({u, v}) for all {u, v} ∈ E?

For each vertex vi (with i ≥ 4), if we know the realizations of its 3 immediate predecessors, as well as their distances to vi, then there are, with probability 1, two possible positions for vi.
[Figure: the two candidate positions for vi arise from intersecting the spheres centred at the three immediate predecessors, e.g. S²(i − 3, d_{i−3,i}) ∩ S²(i − 2, d_{i−2,i}) ∩ S²(i − 1, d_{i−1,i}).]

Let E_D = {{v, v − j} : j ∈ {1, . . . , K}}, E_P = E \ E_D and m = |E|. A discrete search is possible (Branch-and-Prune). Let X be the set of all realizations; our goal is to determine the cardinality of X.
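A minimal sketch (my own illustration, assuming NumPy) of the discretization step: given the positions of the three immediate predecessors and their distances to v_i, the two candidate positions are obtained by intersecting three spheres, here via standard trilateration formulas.

```python
import numpy as np

def two_candidate_positions(p1, p2, p3, r1, r2, r3):
    """Intersect the spheres S(p1, r1), S(p2, r2), S(p3, r3) in R^3.
    With probability 1 (strict simplex inequalities), the result is two points."""
    # Orthonormal frame: origin p1, x-axis towards p2, (x, y)-plane through p3.
    ex = (p2 - p1) / np.linalg.norm(p2 - p1)
    i = np.dot(ex, p3 - p1)
    ey = p3 - p1 - i * ex
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(p2 - p1)
    j = np.dot(ey, p3 - p1)
    # Classical trilateration: coordinates of the intersection in that frame.
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2 - 2 * i * x) / (2 * j)
    z = np.sqrt(max(r1**2 - x**2 - y**2, 0.0))    # strictly positive with probability 1
    base = p1 + x * ex + y * ey
    return base + z * ez, base - z * ez

p1, p2, p3 = np.array([0., 0., 0.]), np.array([1.5, 0., 0.]), np.array([1., 1., 0.])
target = np.array([0.5, 0.7, 0.9])                # hypothetical true position of v_i
r1, r2, r3 = [np.linalg.norm(target - p) for p in (p1, p2, p3)]
print(two_candidate_positions(p1, p2, p3, r1, r2, r3))   # target and its mirror image
```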
Standard BP. [Figure sequence: incremental slides showing a run of the standard Branch-and-Prune search tree.]

Incongruence. Two sets in R^K are congruent if there is a sequence of translations, rotations and reflections that turns one into the other. X is partially correct in this respect: there is a "four level symmetry". Half of the realizations in X are reflections of the other half, along the plane through the first K vertices.

Probability 1. The theory supporting the BP algorithm is based on d satisfying the strict simplex inequalities. The intersection of K spheres in R^K might have uncountable cardinality, or be a singleton set; such degenerate cases lie on manifolds of Lebesgue measure zero in R^K. The probability of uniformly sampling d such that it yields a YES K DMDGP instance satisfying the strict simplex inequalities is 1, so we state most of our results "with probability 1".
Partial Reflections. For x ∈ X and K < v ≤ n, let R^v_x be the reflection along the hyperplane through x_{v−K}, . . . , x_{v−1}. The partial reflection operator with respect to x is:
g_v(x) = (x_1, . . . , x_{v−1}, R^v_x(x_v), R^v_x(x_{v+1}), . . . , R^v_x(x_n)).
[Figure: the action of the reflection R^v_x in R^K.]

For v > u > K and x ∈ X, we define a product between partial reflections:
g_u g_v(x) = g_u(g_v(x)) = g_u(x_1, . . . , x_{v−1}, R^v_x(x_v), . . . , R^v_x(x_n)) = (x_1, . . . , x_{u−1}, R^u_x(x_u), . . . , R^u_x(x_{v−1}), R^u_{g_v(x)}(x_v), . . . , R^u_{g_v(x)}(x_n)).
Let Γ_D = {g_v : v > K} and G_D = ⟨Γ_D⟩, the invariant group of the set of realizations X_D (found by the BP) of G_D = (V, E_D).

Assuming E_P ≠ ∅. For {u, w} ∈ E_P with u < w, define S_uw = {u + K + 1, . . . , w}. G_P is the subgroup of G_D generated by Γ_P = {g_v : v > K ∧ ∀{u, w} ∈ E_P (v ∉ S_uw)}.
[Figure: on the left, the set X_D; on the right, the effect of the pruning edge {1, 4} on X_D.]

Counting Incongruent Realizations. There is an integer ℓ such that |X| = 2^ℓ with probability 1. We can easily refine this:
Proposition. With probability 1, |X| = 2^|Γ_P|.
Proof. G_D ≅ C_2^{n−K}, so that |G_D| = 2^{n−K}. Since G_P ≤ G_D, |G_P| divides |G_D|; moreover |G_P| = 2^|Γ_P|. The action of G_P on X has only one orbit: G_P x = X for all x ∈ X. Every partial reflection operator is an involution, so gx = g'x implies g'g = 1, whence g = g' and |G_P x| = |G_P|. Therefore, for any x ∈ X, |X| = |G_P x| = |G_P| = 2^|Γ_P|.

Thank you!
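A tiny sketch (my own, not from the paper) of the counting formula above: given n, K and the pruning edges E_P, it lists Γ_P and returns 2^|Γ_P|.

```python
def count_realizations(n, K, pruning_edges):
    """Number of realizations |X| = 2^{|Gamma_P|} (with probability 1) for a K-DMDGP
    instance with vertices 1..n and pruning edge set E_P (pairs (u, w) with u < w)."""
    gamma_p = []
    for v in range(K + 1, n + 1):             # candidate generators g_v, v > K
        blocked = any(u + K + 1 <= v <= w for (u, w) in pruning_edges)
        if not blocked:                        # v lies in no S_uw = {u+K+1, ..., w}
            gamma_p.append(v)
    return 2 ** len(gamma_p), gamma_p

# Example: n = 7, K = 3, one pruning edge {1, 6}: S_16 = {5, 6} blocks g_5 and g_6.
print(count_realizations(7, 3, [(1, 6)]))      # (4, [4, 7])
```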

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo

Discretization Orders for Distance Geometry
Discretizable DG group: Carlile Lavor, Leo Liberti, Nelson Maculan, Antonio Mucherino. GSI'13, Paris, France, August 28th 2013.

The Distance Geometry Problem (for molecular conformations). Let G = (V, E, d) be a simple weighted undirected graph, where V is the set of vertices of G (the set of atoms), E is the set of edges of G (the set of known distances), E′ ⊂ E is the subset of E where distances are exact, and d gives the weights associated to the edges of G: the numerical value of each weight corresponds to the known distance, and it can be an interval.
Definition (The DGP). Determine whether there exists a function x : V → ℜ^K for which, for all edges (u, v) ∈ E, ‖x_u − x_v‖ = d(u, v).

Sphere intersections. In the 3-dimensional space, the intersection of 2 spheres gives one circle, of 3 spheres gives two points, and of 2 spheres and 1 spherical shell gives two disjoint curves. Spheres and spherical shells can be centered in known vertex positions, while their radii are related to the distance information. All this is true with probability 1: the reference vertices cannot be aligned, and the strict triangular inequality needs to be satisfied. Generalization to any dimension K: the volume of the (K − 1)-simplex defined by the reference vertices needs to be strictly positive (simplex inequalities).

The Branch & Prune algorithm. The Branch & Prune (BP) algorithm is based on the idea of branching over all possible positions for each vertex, and of pruning by using additional information not used in the discretization process. In this tree, it is supposed that all available distances are exact. D sample (exact) distances can be taken from interval distances.

Importance of orders. The definition of an order on the vertices in V allows us to ensure that vertex coordinates are available when needed. Given (1) a simple weighted undirected graph G = (V, E, d) and (2) a vertex v ∈ V, how can we identify K vertices w_i, i = 1, 2, . . . , K, for which the coordinates of every w_i are available and every edge (w_i, v) ∈ E? We refer to w_i as a reference vertex for v, and to (w_i, v) as a reference distance for v.

Definition of order. An order for V is a sequence r : N → V ∪ {0} with length |r| ∈ N (for which r_i = 0 for all i > |r|) such that, for each v ∈ V, there is an index i ∈ N for which r_i = v.
Some facts about orders: they allow for vertex repetitions (|r| ≥ |V|); however, each vertex can be used as a reference only once, since the simplex inequalities (generally satisfied with probability 1) would not be satisfied if the same vertex were used twice as a reference.

Counting the reference vertices. Let r be an order for V and consider the following counters:
α(r_i): counter of adjacent predecessors of r_i;
β(r_i): counter of adjacent successors of r_i;
α_ex(r_i): counter of adjacent predecessors of r_i related to an exact distance.
A necessary condition for V to admit a discretization order is that, for any order r on V, ∀i ∈ {1, 2, . . . , |r|}, α(r_i) + β(r_i) ≥ K.

Discretization orders. We refer to any order r that allows for the discretization of the DGP as a discretization order. There are two big families: discretization orders with consecutive reference vertices (for each r_i, the reference vertices always immediately precede r_i in the order) and discretization orders without consecutive reference vertices (any vertex with rank < i can be a reference for r_i). For K = 3: in the consecutive case, the reference vertices for r_i are searched only in the "window" [r_{i−K}, . . . , r_{i−1}] and the simplex inequality must be satisfied on the window; in the non-consecutive case, the reference vertices for r_i are in the "big window" [r_1, . . . , r_{i−1}] and the simplex inequality must be satisfied for the K reference vertices inside the big window.

Why different discretization assumptions? Different motivations: historical reasons, the habit to handcraft orders, symmetry properties of BP trees, the methods for intersecting the spheres (and spherical shells), and the interest in methods for automatic detection of discretization orders. Consecutivity: IN FAVOUR vs. AGAINST.

The ordering problem. Definition. Given a simple weighted undirected graph G = (V, E, d) and a positive integer K, establish whether there exists an order r such that:
(a) G_C = (V_C, E_C) ≡ G[{r_1, r_2, . . . , r_K}] is a clique and E_C ⊂ E′;
(b) ∀i ∈ {K + 1, . . . , |r|}, α(r_i) ≥ K and α_ex(r_i) ≥ K − 1.
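A small sketch (my own, in plain Python) of the counters and of condition (b) above, for a candidate order without the consecutivity assumption and without vertex repetitions:

```python
def check_order(order, edges, exact_edges, K):
    """Print alpha, beta, alpha_ex for each position of the order and check
    condition (b): alpha(r_i) >= K and alpha_ex(r_i) >= K - 1 for i > K."""
    E = {frozenset(e) for e in edges}
    Eex = {frozenset(e) for e in exact_edges}
    ok = True
    for i, v in enumerate(order):
        pred, succ = order[:i], order[i + 1:]
        a = sum(frozenset((v, u)) in E for u in pred)
        aex = sum(frozenset((v, u)) in Eex for u in pred)
        b = sum(frozenset((v, u)) in E for u in succ)
        if i >= K and not (a >= K and aex >= K - 1):
            ok = False
        print(f"r_{i + 1} = {v}: alpha = {a}, beta = {b}, alpha_ex = {aex}")
    return ok

edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4), (2, 5), (3, 5), (4, 5)]
exact = edges            # here all distances are taken to be exact
print(check_order([1, 2, 3, 4, 5], edges, exact, K=3))   # True
```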
Remarks: this problem is NP-complete when K is not fixed; with no consecutivity assumption it is solvable in polynomial time when K is known; when dealing with proteins, K = 3.

A greedy algorithm:
0: reorder(G)
while (a valid order r is not found yet) do
    let i = 0;
    find a K-clique C in G with exact distances;
    // position C at the beginning of the new order
    for (all vertices v in C) do
        let i = i + 1; let r_i = v;
    end for
    // greedy search
    while (V is not covered) do
        v = arg max{ α(u) : u not yet in the order (∄ j ≤ i with r_j = u) and α_ex(u) ≥ K − 1 };
        if (α(v) < K) then
            break the inner loop: there are no possible orderings starting from C;
        end if
        // adding the vertex to the order
        let i = i + 1; let r_i = v;
    end while
end while
return r;

An order for the protein backbone. [Figure: an order for the protein backbone, automatically obtained by the greedy algorithm.]

Computational experiments. NMR-like instances are considered in these experiments, including protein backbones and side chains.

PDB name | naa | n   | |E|  | D  | LDE
1brv     | 4   | 51  | 368  | 3  | 2.10e-4
1brv     | 8   | 98  | 853  | 9  | 5.88e-4
1ccq     | 6   | 114 | 1181 | 3  | 1.16e-4
1ccq     | 10  | 183 | 2169 | 8  | 1.63e-4
1acz     | 6   | 94  | 929  | 3  | 1.63e-4
1acz     | 13  | 199 | 2144 | 3  | 1.95e-4
1acz     | 21  | 308 | 3358 | 10 | 4.93e-4
1k1v     | 6   | 110 | 1236 | 3  | 3.04e-4
1k1v     | 18  | 317 | 4169 | 3  | 3.66e-4
1k1v     | 30  | 519 | 7068 | 3  | 5.63e-4

All instances were automatically reordered by the greedy algorithm, and the BP algorithm was invoked for finding one solution.
[Figure: two solutions obtained during the experiments; on the left, a 4-amino-acid fragment of 1brv, and on the right, an 18-amino-acid fragment of 1k1v.]

Work in progress: can we find orders that help BP in finding solutions? For example, minimize in length the subsequences in the order having no pruning distances, or (for proteins) maximize the interval distances that are related to pairs of hydrogen atoms.

Other work in progress: identify clusters of solutions in BP solution sets; find a way of avoiding discretizing the intervals; improve and tailor the parallel versions of BP to interval data; (for proteins) use real NMR data and compare our results to what is currently available in the PDB.

Thanks!

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo
Voir la vidéo
Voir la vidéo
Voir la vidéo

Studying new classes of graph metrics
Pavel Chebotarev, Russian Academy of Sciences: Institute of Control Sciences, pavel4e@gmail.com
GSI'2013 – Geometric Science of Information, Paris – Ecole des Mines, August 28, 2013

Classical graph distances: 1. shortest path distance; 2. weighted shortest path distance; 3. resistance distance. Are other distances needed?

Distance. Let M be an arbitrary set. A distance on M is a function d : M × M → R such that for all x, y, z ∈ M:
1. d(x, y) ≥ 0;
2. d(x, y) = 0 iff x = y;
3. d(x, y) = d(y, x);
4. d(x, y) + d(y, z) ≥ d(x, z).
A shorter definition: d : M × M → R such that for all x, y, z ∈ M,
2. d(x, y) = 0 iff x = y;
4'. d(x, y) + d(x, z) ≥ d(y, z).
[M.M. Deza, E. Deza, Encyclopedia of Distances, Springer, 2013.]

The shortest path distance. The shortest path distance on a graph G, d_s(i, j), is the length of a shortest path between i and j in G. A graph and its distance matrix:
[Figure: graph G on vertices 1–5 with edges {1,2}, {2,3}, {2,4}, {3,4}, {4,5}.]
D_s(G) =
( 0 1 2 2 3 )
( 1 0 1 1 2 )
( 2 1 0 1 2 )
( 2 1 1 0 1 )
( 3 2 2 1 0 )
[F. Buckley, F. Harary, Distance in Graphs, Addison-Wesley, 1990.]

The weighted shortest path distance. Let G be a weighted graph with weighted adjacency matrix A; the weights are positive. The weighted shortest path distance is
d_ws(i, j) = min_π Σ_{e ∈ E(π)} ℓ_e,
where the minimum is taken over all paths π from i to j and ℓ_e = 1/w_e is the weight-based length of edge e (w_e being its weight). If w_e is the conductivity of e, then ℓ_e is the resistance of e.
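For concreteness (my own sketch, assuming NumPy and SciPy), the distance matrices used throughout the talk can be computed for this example graph; here the shortest path matrix D_s:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Example graph on vertices 1..5 with edges {1,2}, {2,3}, {2,4}, {3,4}, {4,5}.
A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

D_s = shortest_path(A, unweighted=True)   # matches the matrix D_s(G) shown above
print(D_s)
```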
The resistance distance. The resistance distance d_r(i, j) is the effective resistance between i and j in the electrical network corresponding to G. Gerald Subak-Sharpe was the first to study this distance: G.E. Sharpe, Solution of the (m + 1)-terminal resistive network problem by means of metric geometry, in: Proc. First Asilomar Conference on Circuits and Systems, Pacific Grove, CA (November 1967) 319–328.

Rediscovering and review: A.D. Gvishiani, V.A. Gurvich, Metric and ultrametric spaces of resistances, Russian Math. Surveys 42 (1987) 235–236; D.J. Klein, M. Randić, Resistance distance, J. Math. Chem. 12 (1993) 81–95; F. Harary: "electric metric". A nice paper: R.B. Bapat, Resistance distance in graphs, Math. Student 68 (1999) 87–98. A short review is in: Y. Yang, D.J. Klein, A recursion formula for resistance distances and its applications, DAM, 2013, in press.
Resistance distance: connections. The resistance distance is proportional to the commute-time distance in the corresponding Markov chain. It is expressed as follows:
d_r(i, j) = ℓ⁺_ii + ℓ⁺_jj − 2ℓ⁺_ij,
where (ℓ⁺_ij)_{n×n} = L⁺ is the Moore-Penrose pseudoinverse of L, and L = diag(A1) − A is the Laplacian matrix of G; here diag(A1) is the diagonal matrix of weighted vertex degrees.

Example. For any tree, the resistance distance coincides with the shortest path distance! For our graph:
D_s(G) =
( 0 1 2 2 3 )
( 1 0 1 1 2 )
( 2 1 0 1 2 )
( 2 1 1 0 1 )
( 3 2 2 1 0 )
D_r(G) =
( 0    1    5/3  5/3  8/3 )
( 1    0    2/3  2/3  5/3 )
( 5/3  2/3  0    2/3  5/3 )
( 5/3  2/3  2/3  0    1   )
( 8/3  5/3  5/3  1    0   )
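A short verification sketch (mine, assuming NumPy): compute L⁺ for the example graph and recover D_r(G) from d_r(i, j) = ℓ⁺_ii + ℓ⁺_jj − 2ℓ⁺_ij.

```python
import numpy as np

A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

L = np.diag(A.sum(axis=1)) - A           # Laplacian L = diag(A1) - A
Lp = np.linalg.pinv(L)                   # Moore-Penrose pseudoinverse L+
d = np.diag(Lp)
D_r = d[:, None] + d[None, :] - 2 * Lp   # d_r(i,j) = l+_ii + l+_jj - 2 l+_ij
print(np.round(3 * D_r))                 # 3*D_r has the integer entries of the matrix above
```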
A combinatorial interpretation of the resistance distance. For our graph, d_r(i, j) = f_[2](i, j) / T: the number of spanning 2-component forests separating i and j, divided by the number T of spanning trees of G.

Connection with the graph structure. For our graph,
d_r(1, 2) + d_r(2, 5) = d_r(1, 5), while d_r(1, 3) + d_r(3, 5) > d_r(1, 5).
Definition. d(·, ·) : V² → R is cutpoint additive provided that d(i, j) + d(j, k) = d(i, k) iff all i → k paths pass through j.
As we could conjecture... Theorem (Gvishiani, Gurvich, 1992). The electric metric is cutpoint additive. However, the shortest path distance only satisfies the "if" part: d(i, j) + d(j, k) = d(i, k) does not imply [i−j−k] (1 + 1 = 2). Observation: the Euclidean distance satisfies a similar condition obtained by replacing "path" with "line segment".

Proximity measures. Let's think of proximity measures... Applications: the World Wide Web, social networks, semantic networks, transport networks and others cry "Measure us!", so functions that "measure" networks are necessary, including proximity measures. Families of proximity measures: spanning forest measures, reliability measures, path measures, walk measures.

The spanning forest measure: Q = (I + L)^{-1}, Q = (q_ij)_{n×n}.
Matrix Forest Theorem (Ch. & Shamis, '95). q_ij = f_ij / f, where f is the number of spanning rooted forests in G and f_ij is the number of those that have i in a tree rooted at j. Q is a proximity measure having natural properties. For networks, the number of forests is replaced by the weight of forests; the weight is the product of edge weights.
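A quick sketch (mine, assuming NumPy) computing the forest measure Q = (I + L)^{-1} for the example graph; since L1 = 0, each row of Q sums to 1, in agreement with the Matrix Forest Theorem (Σ_j f_ij = f).

```python
import numpy as np

A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

L = np.diag(A.sum(axis=1)) - A
Q = np.linalg.inv(np.eye(5) + L)   # spanning forest measure Q = (I + L)^{-1}
print(np.round(Q, 3))
print(Q.sum(axis=1))               # each row sums to 1: f_i1 + ... + f_in = f
```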
Transitional measures. Theorem (a property of the forest measure). The matrix Q satisfies q_ij q_jk ≤ q_ik q_jj (the transition inequality), and q_ij q_jk = q_ik q_jj iff all paths from i to k in G contain j (the bottleneck identity). Definition. In this case we say that Q determines a transitional measure on G.

More generally, a matrix S = (s_ij) ∈ R^{n×n} satisfies the transition inequality if for all i, j, k = 1, . . . , n,
s_ij s_jk ≤ s_ik s_jj. (1)
Is there any connection between transitional measures and cutpoint additive distances?

The connection reliability measure. Let the edge weight w_ij ∈ (0, 1] be the intactness probability of the (i, j) edge. Definition. Let p_ij be the i → j connection reliability, i.e., the probability that at least one path from i to j remains intact, provided that the edge failures are independent; P = (p_ij) is the matrix of pairwise connection reliabilities.

Is the connection reliability a transitional measure? Theorem. For any graph G with edge weights w^p_ij ∈ (0, 1], the matrix P = (p_ij) of connection reliabilities determines a transitional measure on G. A representation of the connection reliabilities: p_ij = Σ_k Pr(P_k) − · · · (an inclusion–exclusion sum over the paths P_k from i to j).

Path measures. Theorem. There exists α_0 > 0 such that for every α ∈ (0, α_0), the matrix P_α = (p^α_ij) of α-path proximities determines a transitional measure on G.

The walk-counting proximity measures. Walks and paths... The number of walks is infinite. Consider the G → G_t transformation, multiplying the edge weights by t > 0; the weight of a walk is the product of its edge weights. Definition. Let r^t_ij be the total weight of all i → j walks in G_t, if it is finite.
R_t = (r^t_ij) is the matrix of t-walk proximities in G (if it is finite). [Leo Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1953) 39–43.]

The walk proximity measures. www.[...].com uses this index (which generalizes PageRank). Computation of R_t: (tA)^k provides the total weights of the k-length walks in G, so
R_t = (r^t_ij) = Σ_{k=0}^{∞} (tA)^k,
where A is the weighted adjacency matrix. Observation (corollary of Frobenius' theorem): R_t is finite iff t < ρ^{-1}, where ρ is the spectral radius of A; in this case R_t = (I − tA)^{-1}.

Walks determine a transitional measure! For our example, Σ_{k=0}^{∞} (1·A)^k is infinite, while
R_0.4 = Σ_{k=0}^{∞} (0.4·A)^k ≈
( 5.63 5.94 4.34 4.90 1.96 )
( 5.94 8.92 6.50 7.34 2.94 )
( 4.34 6.50 5.98 5.94 2.38 )
( 4.90 7.34 5.94 7.52 3.01 )
( 1.96 2.94 2.38 3.01 2.20 )
is a measure of proximity; t = 0.4 sets the proportion in which shorter and longer walks are counted.
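A sketch (mine, assuming NumPy) reproducing R_0.4 = (I − 0.4A)^{-1} for the example graph; t must be below 1/ρ(A), which holds here for t = 0.4. The code also checks the bottleneck identity at the cutpoint 2 lying on every 1 → 4 path.

```python
import numpy as np

A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

t = 0.4
rho = max(abs(np.linalg.eigvals(A)))
assert t < 1 / rho                        # R_t is finite only for t < 1/rho
R = np.linalg.inv(np.eye(5) - t * A)      # R_t = sum_k (tA)^k = (I - tA)^{-1}
print(np.round(R, 2))                     # matches the R_0.4 shown above

# Bottleneck identity: all 1 -> 4 paths pass through 2, so r12 * r24 = r14 * r22.
print(np.isclose(R[0, 1] * R[1, 3], R[0, 3] * R[1, 1]))   # True
```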
Theorem. For any G, the matrix R_t with 0 < t < ρ^{-1}, where ρ is the spectral radius of A, determines a transitional measure on G. For the example above, r_12 · r_24 = r_14 · r_22: 5.94 · 7.34 ≈ 4.90 · 8.92 (bottleneck identity).

Maybe all proximity measures are transitional measures... Of course not. The path measure with a large enough α and the communicability measure e^A are not [E. Estrada, The communicability distance in graphs, Linear Algebra and its Applications 436 (2012) 4317–4328]. In fact, this property is not typical. Is there any connection between transitional measures and cutpoint additive distances?

Central theorem (the transformation theorem). If S = (s_ij)_{n×n} determines a transitional measure for some graph G and has positive off-diagonal entries, then D = (d_ij)_{n×n} defined by
d(i, j) = −γ ln ( s_ij / √(s_ii s_jj) ), i, j = 1, . . . , n, γ > 0, (2)
is the matrix of a cutpoint additive distance on V(G).

Corollary. For any connected G, the matrices of walk weights, of path weights with a small enough α, of the weights of spanning forests, and of connection reliabilities all produce cutpoint additive metrics by applying the logarithmic cosine transform (2). Logarithmic cosine transform: transitional measures → cutpoint additive distances.

The properties of walk distances: "topological" and matrix representations.

Walk distances (an example). For our graph, the logarithmic cosine transform gives:
D_0.3 ≈
( 0    1.18 2.19 2.17 3.59 )
( 1.18 0    1.01 0.98 2.40 )
( 2.19 1.01 0    1.02 2.44 )
( 2.17 0.98 1.02 0    1.42 )
( 3.59 2.40 2.44 1.42 0    )
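A sketch (mine, assuming NumPy) of the logarithmic cosine transform (2) applied to the walk proximities R_t; the scaling constant γ used for the D_0.3 matrix above is not stated here, so with γ = 1 the numbers are only indicative, but cutpoint additivity holds for any γ > 0.

```python
import numpy as np

def walk_distances(A, t, gamma=1.0):
    """Cutpoint additive walk distances: logarithmic cosine transform of R_t = (I - tA)^{-1}."""
    n = A.shape[0]
    R = np.linalg.inv(np.eye(n) - t * A)      # t-walk proximities, requires t < 1/rho(A)
    d = np.diag(R)
    return -gamma * np.log(R / np.sqrt(np.outer(d, d)))

A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

D = walk_distances(A, t=0.3)
print(np.round(D, 2))
# Cutpoint additivity: every 2 -> 5 path passes through 4, so d(2,4) + d(4,5) = d(2,5).
print(np.isclose(D[1, 3] + D[3, 4], D[1, 4]))   # True
```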
Here, for instance, d(2, 4) + d(4, 5) = d(2, 5) ≈ 2.40: vertex 4 is a cutpoint between 2 and 5.

The e-Walk distances (a different family). Transform the edge weights by w(α) = (w/ρ) e^{−1/(αw)}, α > 0. The construction is similar to that of the walk distances:
1. R_α = Σ_{k=0}^{∞} (A(α))^k = (I − A(α))^{-1};
2. d^{eW}_α(i, j) = −θ_α α ln ( r_ij / √(r_ii r_jj) ),
where θ_α is the normalizing factor.

Asymptotics of the e-Walk distances.
Theorem (on e-Walk distances as α → 0+). For any vertices i, j ∈ V(G), lim_{α→0+} d^{eW}_α(i, j) = d_ws(i, j), where d_ws(·, ·) is the weighted shortest path distance.
Theorem (on e-Walk distances as α → ∞). For any vertices i, j ∈ V(G) with j ≠ i,
lim_{α→∞} d^{eW}_α(i, j) = (θ_∞/2) [ p_i^{-1} ((ρI − A_{\bar j\bar j})^{-1} Ǎ_{\bar j})_i + p_j^{-1} ((ρI − A_{\bar i\bar i})^{-1} Ǎ_{\bar i})_j ] p,
where p is the Perron vector of A, Ǎ = (ǎ_ij)_{n×n} results from A by replacing the nonzero entries by 1, and Ǎ_{\bar i} is Ǎ with the ith row removed.

The Long e-Walk distance. Define d^{LeW}(i, j) = lim_{α→∞} d^{eW}_α(i, j), i, j ∈ V(G).
A "topological" expression for the Long e-Walk distance. Theorem. For any vertices i, j ∈ V(G) with i ≠ j, d^{LeW}(i, j) = (θ_∞/(2ρ)) (c_{i(j)} + c_{j(i)}), where c_{i(j)} is the weight of C_{i(j)}, a set of specific "cycles with a jump".
A relationship between d^{LeW}(i, j) and d^{LW}(i, j). Theorem. If θ_∞ = (2/n) · p^T(A/ρ)p / (p^T Ǎ p), then d^{LeW}(i, j) = d^{LW}(i, j) for all i, j ∈ V(G).

Logarithmic forest distances, as a special case of walk distances.
1. Q_α = (I + L_α)^{-1}, where L_α = diag(A_α 1) − A_α and A_α are the Laplacian and weighted adjacency matrices of the graph G_α that differs from G by its edge weights: w_α = φ_α(w).
2. The logarithmic forest distances: d_α(i, j) = −0.5 θ ln ( q_ij(α) / √(q_ii(α) q_jj(α)) ).
The marginal cases of the logarithmic forest distance are the shortest path distance and the resistance distance.

The logarithmic forest distances are walk distances. Theorem (actually, a sketch of the theorem). For any connected graph G, the family of logarithmic forest distances with any edge weight transformation φ_α(w) coincides with a certain family of modified walk distances obtained through balancing the graphs G_α by loops.
[Figure: a graph G and a balance-graph G' of G, obtained by attaching loops so that the weighted vertex degrees become uniform.]

Is there any connection between the long walk distance and the resistance distance?

The resistance distance is also a long walk distance. Corollary. For any connected G, the resistance distance in G coincides with the long walk distance d^{LW}(i, j) in G', where G' is any balance-graph of G.

A novel expression for the resistance distance. Corollary. For any connected graph G on n vertices, let L be the Laplacian matrix of G and let d_r(·, ·) be the resistance distance on V(G). Then for any i, j ∈ V(G) with j ≠ i,
d_r(i, j) = n^{-1} [ (L_{\bar j\bar j})^{-1}_i + (L_{\bar i\bar i})^{-1}_j ] 1,
where 1 is the vector of n − 1 ones and (L_{\bar j\bar j})^{-1}_i is the ith row of the inverse of the principal submatrix L_{\bar j\bar j}.

An inverse simulation: is the long walk distance equal to the resistance distance in a certain modified graph?

A "resistance" representation of the long walk distance. Theorem. Suppose that A_{n×n} is the weighted adjacency matrix of a connected graph G, p is the Perron vector of A, and A' = P'AP' with P' = diag(p') and p' = (√n / ‖p‖₂) p. Then the long walk distance in G coincides with the resistance distance in the graph G' whose weighted adjacency matrix is A'.

Are there SIMPLE matrix representations of the long walk distance?

Simple representations of the long walk distance. Corollary (of the previous theorem).
d^{LW}(i, j) = det((L'_{\bar i\bar i})_{\bar j\bar j}) / det(L'_{\bar i\bar i}), j ≠ i;
d^{LW}(i, j) = ℓ'⁺_ii + ℓ'⁺_jj − 2ℓ'⁺_ij,
where L'⁺ = (ℓ'⁺_ij) is the pseudoinverse of L' = P'(ρI − A)P'.
Remark. Since L' is symmetric and irreducible, L'⁺ coincides with the group inverse L'^# = (L' + J)^{-1} − J, where J = (1/n) 11^T.
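A numerical sketch (mine, assuming NumPy) of the "simple representation" above: build L' = P'(ρI − A)P' from the Perron vector of the example graph and obtain the long walk distances from its pseudoinverse, exactly as the resistance distances were obtained from L⁺.

```python
import numpy as np

A = np.zeros((5, 5))
for i, j in [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

w, V = np.linalg.eigh(A)
rho = w[-1]                                   # spectral radius of A
p = np.abs(V[:, -1])                          # Perron vector of A
p_prime = np.sqrt(len(p)) * p / np.linalg.norm(p)
P = np.diag(p_prime)
Lp = np.linalg.pinv(P @ (rho * np.eye(5) - A) @ P)   # pseudoinverse of L' = P'(rho I - A)P'
d = np.diag(Lp)
D_LW = d[:, None] + d[None, :] - 2 * Lp       # long walk distances
print(np.round(D_LW, 3))
```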
Expressions for the long walk distance in terms of B = ρI − A. Theorem (Ch., R. Bapat, R. Balaji).
d^{LW}(i, j) = (n p_j²)^{-1} · det((B_{\bar i\bar i})_{\bar j\bar j}) / det(B_{\bar i\bar i}), j ≠ i;
d^{LW}(i, j) = z^T(i, j) B⁺ z(i, j) / n,
where B = ρI − A, p is normalized so that ‖p‖₂ = 1, and z(i, j) is the n-vector whose ith element is 1/p_i, whose jth element is −1/p_j, and whose other elements are 0. B = ρI − A is the para-Laplacian matrix of G.

Cutpoint additive metrics on P4. The distances are listed in descending order of d(1, 4); each picture presents the projection of the distance-obeying polygon onto the plane parallel to the line segments (1, 4) and (2, 3). [Figures: cutpoint additive metrics on P4, and cutpoint non-additive metrics on P4.]

Classical graph distances, revisited: 1. shortest path distance; 2. weighted shortest path distance; 3. resistance distance. Are other distances needed?

Are new graph metrics needed in practice? Yen, Saerens, Mantrach, Shimbo (2008) performed machine clustering with noisy data and used a parametric family of dissimilarity measures that reduce to the shortest path and the resistance distance at the marginal values of the parameter and are not distances in general.
“The results obtained for intermediate values of θ are usually better than those obtained with the commute-time and the shortest-path kernels.” “We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions.” (von Luxburg, Radl, Hein, 2010) There is a strong demand for alternative graph distances. Pavel Chebotarev New classes of graph metrics Page 63 Finally, some problems Study the the family of p-resistance distances by Alamgir and von Luxburg (2011). Pavel Chebotarev New classes of graph metrics Page 64 Finally, some problems Study the the family of p-resistance distances by Alamgir and von Luxburg (2011). Study links of the e-distances with entropy, free energy, etc. Pavel Chebotarev New classes of graph metrics Page 64 Finally, some problems Study the the family of p-resistance distances by Alamgir and von Luxburg (2011). Study links of the e-distances with entropy, free energy, etc. Study the new distances in the context of clustering on large random graphs in high dimensions. Pavel Chebotarev New classes of graph metrics Page 64 Finally, some problems Study the the family of p-resistance distances by Alamgir and von Luxburg (2011). Study links of the e-distances with entropy, free energy, etc. Study the new distances in the context of clustering on large random graphs in high dimensions. Study connections of the new distances with Ricci curvature on graphs. Pavel Chebotarev New classes of graph metrics Page 64 Thank you! Pavel Chebotarev New classes of graph metrics Page 65 Some references F. Buckley, F. Harary, Distance in Graphs, Addison-Wesley, Redwood City, CA, 1990. P.Yu. Chebotarev, E.V. Shamis. On proximity measures for graph vertices, Automation and Remote Control 59 (1998) No. 10, Part 2 1443–1459. P. Chebotarev, The walk distances in graphs, Discr. Appl. Math. 160 (2012) 1484-1500. P. Chebotarev, The graph bottleneck identity, Adv. Appl. Math. 47 (2011) 403–413. P. Chebotarev, A Class of graph-geodetic distances generalizing the shortest-path and the resistance distances, Discr. Appl. Math. 159 (2011) 295–302. P. Chebotarev, R. Agaev, Forest matrices around the Laplacian matrix, Linear Algebra Appl. 356 (2002) 253–274. P. Chebotarev, E. Shamis, The forest metrics for graph vertices, Electron. Notes Discrete Math. 11 (2002) 98–107. H. Chen, F. Zhang, Resistance distance and the normalized Laplacian spectrum, Discrete Appl. Math. 155 (2007) 654–661. R.L. Graham, A.J. Hoffman, H. Hosoya, On the distance matrix of a directed graph, J. Graph Theory 1 (1977) 85–88. A.D. Gvishiani, V.A. Gurvich, Dynamical Classification Problems and Convex Programming in Applications, Nauka (Science), Moscow, 1992 (in Russian). U. von Luxburg, A. Radl, M. Hein, Getting lost in space: Large sample analysis of the commute distance, Working paper of Saarland University, Saarbrücken, Germany, 2010. L. Yen, M. Saerens, A. Mantrach, M. Shimbo, A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances, 14th ACM SIGKDD Intern. Conf. on Knowledge Discovery & Data Mining, 2008, pp. 785–793. 
Pavel Chebotarev New classes of graph metrics Page 66 Definitions: hitting walk, cycle, commute cycle A hitting v0 → vm walk is a v0 → vm walk containing only one occurrence of vm. A v0 → vm walk is called a v0 → v0 cycle if1 vm = v0. A v0 → v0 cycle is called a v0 vm commute cycle if it contains vm and at most two occurrences of v0. 1 Such a walk is also called a closed walk. Pavel Chebotarev New classes of graph metrics Page 67 A balance-graph of G G is a balance-graph of G if G is obtained from G by attaching some loops and assigning the loop weights that provide G with uniform weighted vertex degrees. Balancing G by loops means constructing a balance-graph of G. Pavel Chebotarev New classes of graph metrics Page 68 On the simplest logarithmic forest distances Corollary For any connected graph G, if ϕα(w) = αw, then the family of logarithmic forest distances with A = R+ coincides with the family of walk distances calculated for any balance-graph of G. Pavel Chebotarev New classes of graph metrics Page 69 On the simplest logarithmic forest distances Corollary For any connected graph G, if ϕα(w) = αw, then the family of logarithmic forest distances with A = R+ coincides with the family of walk distances calculated for any balance-graph of G. G is a balance-graph of G 2 2 1 1 G G The simplest logarithmic forest distances on G coincide with the walk distances on G . Pavel Chebotarev New classes of graph metrics Page 69
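A small sketch of "balancing by loops" as just defined: loops are attached so that every weighted vertex degree reaches a common value (NumPy; we assume the convention that a loop of weight w contributes w, not 2w, to its vertex degree, so adjust if the source uses the other convention):

```python
import numpy as np

def balance_graph(A, target=None):
    """Attach loops so all weighted vertex degrees become equal.

    A      : symmetric weighted adjacency matrix with zero diagonal.
    target : common degree to reach; defaults to the maximum degree of A.
    Assumes a loop of weight w adds w to its vertex's weighted degree.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    t = deg.max() if target is None else target
    if t < deg.max():
        raise ValueError("target degree must be at least the maximum degree")
    return A + np.diag(t - deg)   # loop weights t - deg_i on the diagonal

if __name__ == "__main__":
    A = np.array([[0., 2., 0.],
                  [2., 0., 1.],
                  [0., 1., 0.]])
    G_bal = balance_graph(A)
    print(G_bal.sum(axis=1))      # uniform weighted degrees: [3. 3. 3.]
```

The resulting balance-graph can then be fed to the walk-distance or resistance-distance computations sketched earlier.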

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Tessellabilities, Reversibilities, and Decomposabilities of Polytopes ― A Survey ― École nationale supérieure des mines de Paris Paris, August 28, 2013 Jin Akiyama: Tokyo University of Science Ikuro Sato: Miyagi Cancer Center Hyunwoo Seong: The University of Tokyo Tokyo, Japan 1. P1-TILES AND P2-TILES 2 3 A P1-tile is a polygon which tiles the plane with translations only. Two families of convex P1-tiles : (1) parallelograms and (2) hexagons with three pairs of opposite sides parallel and of the same lengths (P1-hexagons). Parallelogram P1-hexagon Parallelepiped(PP) Rhombic Dodecahedron(RD) Hexagonal Prism(HP) Elongated Rhombic Dodecahedron(ERD) Truncated Octahedron(TO) 4 F1 F2 F3 F4 F5 A 3-dimensional P1-tile is a polyhedron which tiles the space with translations only. Five families of convex 3-dimensional P1-tiles (Fedorov) : 10 Triangle Quadrilateral P2-pentagon (BC∥ED) P2-hexagon (QPH) (AB∥ED and |AB|=|ED|) Theorem A Every convex P2-tile belongs to one of the following four families: F1 F2 F3 F4 A P2-tile is a polygon which tiles the plane by translations and 180° rotations only. 11 Determine all convex 3-dimensional P2-tiles, i.e., convex polyhedra each of which tiles the space in P2-manner. (cf) triangular prism, … A net of a convex polyhedron P is defined to be a connected planar object obtained by cutting the surface of P. An ART (almost regular tetrahedron) is a tetrahedron with four congruent faces. CG Theorem B (J.A(2007)) Every net (convex or concave) of an ART tiles the plane in P2-manner. Artworks Artworks Artworks 2. REVERSIBILITY 17 18 Volvox, a kind of green alga known as one of the most simple colonial (≒ multicellular) organisms, reproduces itself by reversing its interior offspring and its surface. Theorem C (J.A. (2007)) If a pair of polygons A and B is reversible, then each of them tiles the plane by translations and 180°rotations only (P2- tiling). 19 A : red quadrilateral, B: blue triangle CG CG 20 Let Π be the set of the five Platonic 1, σ2, σ3, σ4. Then Φ = {σ1, . . ., σ4} is an element set for Π, and the decomposition of each Platonic solid into these elements is summarized in Table 3. Theorem D ( J.A., I. Sato, H. Seong (2013)) For an arbitrary convex P2-tile P and an arbitrary family Fi (i= 1, 2, 3, and 4) of convex P2-tiles, there exists a polygon Q ∈ Fi such that the pair P and Q is reversible. A king in a cage 22 Spider ⇔ Geisha 23 A 3-dimensional P1-tile is said to be canonical if it is convex and symmetric with respect to each orthogonal axis. F5F4F3F2F1 24 UFO ⇔ Alien CG Let Π be the set of the five Platonic 1, σ2, σ3, σ4. Then Φ = {σ1, . . ., σ4} is an element set for Π, and the decomposition of each Platonic solid into these elements is summarized in Table 3. Theorem E ( J.A., I. Sato, H. Seong (2011)) For an arbitrary canonical 3-dimensional P1-tile P and an arbitrary family Fi (i= 1, 2, 3, 4, and 5) of canonical 3- dimensional P1-tiles, there exists a polyhedron Q ∈ Fi such that the pair P and Q is reversible. Cube -> Hexagonal Prism 25 CG Hexagonal Prism -> Truncated Octahedron Rhombic Dodecahedron -> Elongated Rhombic Dodecahedron CG CG 3. TILINGS AND ATOMS 26 2 2 2 6 6 2 6 2 2 2 4 4 3 2 23 A symmetric pair of pentadra Pentadron is a convex pentahedron whose net is as follows: 28 Tetrapak is a special kind of ART(tetrahedron with four congruent faces) made by pentadra as follows: 29 Theorem F (J.A.) A tetrapak tiles the space and its net tiles the plane. 
Problem Determine all convex polyhedra, each of which tiles the space and one of its nets tiles the plane. Theorem G (J.A, G.Nakamura, I.Sato (2012)) Every convex 3-dimensional P1-tile (or its affine- stretching transform) can be constructed by copies of a pentadron. 31 Cube Hexagonal prism 32 33 Truncated octahedron Rhombic dodecahedron 35 Elongated rhombic dodecahedron

ORAL SESSION 6 Computational Information Geometry (Frank Nielsen)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

A new implementation of k-MLE for mixture modelling of Wishart distributions Christophe Saint-Jean Frank Nielsen Geometric Science of Information 2013 August 28, 2013 - Mines Paris Tech Application Context (1) 2/31 We are interested in clustering varying-length sets of multivariate observations of same dim. p. X1 =   3.6 0.05 −4. 3.6 0.05 −4. 3.6 0.05 −4.   , . . . , XN =       5.3 −0.5 2.5 3.6 0.5 3.5 1.6 −0.5 4.6 −1.6 0.5 5.1 −2.9 −0.5 6.1       Sample mean is a good but not discriminative enough feature. Second order cross-product matrices tXi Xi may capture some relations between (column) variables. Application Context (2) 3/31 The problem is now the clustering of a set of p × p PSD matrices : χ = x1 = t X1X1, x2 = t X2X2, . . . , xN = t XNXN Examples of applications : multispectral/DTI/radar imaging, motion retrieval system, ... Application Context (2) 3/31 The problem is now the clustering of a set of p × p PSD matrices : χ = x1 = t X1X1, x2 = t X2X2, . . . , xN = t XNXN Examples of applications : multispectral/DTI/radar imaging, motion retrieval system, ... Outline of this talk 4/31 1 MLE and Wishart Distribution Exponential Family and Maximum Likehood Estimate Wishart Distribution Two sub-families of the Wishart Distribution 2 Mixture modeling with k-MLE Original k-MLE k-MLE for Wishart distributions Heuristics for the initialization 3 Application to motion retrieval Reminder : Exponential Family (EF) 5/31 An exponential family is a set of parametric probability distributions EF = {p(x; λ) = pF (x; θ) = exp { t(x), θ + k(x) − F(θ)|θ ∈ Θ} Terminology: λ source parameters. θ natural parameters. t(x) sufficient statistic. k(x) auxiliary carrier measure. F(θ) the log-normalizer: differentiable, strictly convex Θ = {θ ∈ RD|F(θ) < ∞} is an open convex set Almost all commonly used distributions are EF members but uniform, Cauchy distributions. Reminder : Maximum Likehood Estimate (MLE) 6/31 Maximum Likehood Estimate principle is a very common approach for fitting parameters of a distribution ˆθ = argmax θ L(θ; χ) = argmax θ N i=1 p(xi ; θ) = argmin θ − 1 N N i=1 log p(xi ; θ) assuming a sample χ = {x1, x2, ..., xN} of i.i.d observations. Log density have a convenient expression for EF members log pF (x; θ) = t(x), θ + k(x) − F(θ) It follows ˆθ = argmax θ N i=1 log pF (xi ; θ) = argmax θ N i=1 t(xi ), θ − NF(θ) MLE with EF 7/31 Since F is a strictly convex, differentiable function, MLE exists and is unique : F(ˆθ) = 1 N N i=1 t(xi ) Ideally, we have a closed form : ˆθ = F−1 1 N N i=1 t(xi ) Numerical methods including Newton-Raphson can be successfully applied. Wishart Distribution 8/31 Definition (Central Wishart distribution) Wishart distribution characterizes empirical covariance matrices for zero-mean gaussian samples: Wd (X; n, S) = |X| n−d−1 2 exp − 1 2tr(S−1X) 2 nd 2 |S| n 2 Γd n 2 where for x > 0, Γd (x) = π d(d−1) 4 d j=1 Γ x − j−1 2 is the multivariate gamma function. Remarks : n > d − 1, E[X] = nS The multivariate generalization of the chi-square distribution. 
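A quick numerical sanity check of the density just displayed, written directly from the formula and compared with SciPy's Wishart implementation (SciPy is assumed; the dimension, degrees of freedom and scale matrix are arbitrary example values):

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart

def log_wishart(X, n, S):
    """log W_d(X; n, S) written directly from the displayed density."""
    d = X.shape[0]
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_s = np.linalg.slogdet(S)
    return (0.5 * (n - d - 1) * logdet_x
            - 0.5 * np.trace(np.linalg.solve(S, X))
            - 0.5 * n * d * np.log(2.0)
            - 0.5 * n * logdet_s
            - multigammaln(0.5 * n, d))

if __name__ == "__main__":
    d, n = 3, 8                                   # n > d - 1
    S = np.eye(d) + 0.3 * np.ones((d, d))
    X = wishart(df=n, scale=S).rvs(random_state=42)
    print(log_wishart(X, n, S))
    print(wishart(df=n, scale=S).logpdf(X))       # should match
```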
Wishart Distribution as an EF 9/31 It’s an exponential family: log Wd (X; θn, θS ) = < θn, log |X| >R + < θS , − 1 2 X >HS + k(X) − F(θn, θS ) with k(X) = 0 and (θn, θS ) = ( n − d − 1 2 , S−1 ), t(X) = (log |X|, − 1 2 X), F(θn, θS ) = θn + (d + 1) 2 (d log(2) − log |θS |)+log Γd θn + (d + 1) 2 MLE for Wishart Distribution 10/31 In the case of the Wishart distribution, a closed form would be obtained by solving the following system ˆθ = F−1 1 N N i=1 t(xi ) ≡    d log(2) − log |θS | + Ψd θn + (d+1) 2 = ηn − θn + (d+1) 2 θ−1 S = ηS (1) with ηn and ηS the expectation parameters and Ψd the derivative of the log Γd . Unfortunately, no closed-form solution is known. Two sub-families of the Wishart Distribution (1) 11/31 Case n fixed (n = 2θn + d + 1) Fn(θS ) = nd 2 log(2) − n 2 log |θS | + log Γd n 2 kn(X) = n − d − 1 2 log |X| Case S fixed (S = θ−1 S ) FS (θn) = θn + d + 1 2 log |2S| + log Γd θn + d + 1 2 kS (X) = − 1 2 tr(S−1 X) Two sub-families of the Wishart Distribution (2) 12/31 Both are exponential families and MLE equations are solvable ! Case n fixed: − n 2 ˆθ−1 S = 1 N N i=1 − 1 2 Xi =⇒ ˆθS = Nn N i=1 Xi −1 (2) Case S fixed : ˆθn = Ψ−1 d 1 N N i=1 log |Xi | − log |2S| − d + 1 2 , ˆθn > 0 (3) with Ψ−1 d the functional reciprocal of Ψd . An iterative estimator for the Wishart Distribution 13/31 Algorithm 1: An estimator for parameters of the Wishart Input: A sample X1, X2, . . . , XN of Sd ++ Output: Final values of ˆθn and ˆθS Initialize ˆθn with some value > 0; repeat Update ˆθS using Eq. 2 with n = 2ˆθn + d + 1; Update ˆθn using Eq. 3 with S the inverse matrix of ˆθS ; until convergence of the likelihood; Questions and open problems 14/31 From a sample of Wishart matrices, distr. parameters are recovered in few iterations. Major question : do you have a MLE ? probably ... Minor question : sample size N = 1 ? Under-determined system Regularization by sampling around X1 Mixture Models (MM) 15/31 A additive (finite) mixture is a flexible tool to model a more complex distribution m: m(x) = k j=1 wj pj (x), 0 ≤ wj ≤ 1, k j=1 wj = 1 where pj are the component distributions of the mixture, wj the mixing proportions. In our case, we consider pj as member of some parametric family (EF) m(x; Ψ) = k j=1 wj pFj (x; θj ) with Ψ = (w1, w2, ..., wk−1, θ1, θ2, ..., θk) Expectation-Maximization is not fast enough [5] ... Original k-MLE (primal form.) in one slide 16/31 Algorithm 2: k-MLE Input: A sample χ = {x1, x2, ..., xN}, F1, F2, ..., Fk Bregman generator Output: Estimate ˆΨ of mixture parameters A good initialization for Ψ (see later); repeat repeat foreach xi ∈ χ do zi = argmaxj log ˆwj pFj (xi ; ˆθj ); foreach Cj := {xi ∈ χ|zi = j} do ˆθj = MLEFj (Cj ); until Convergence of the complete likelihood; Update mixing proportions : ˆwj = |Cj |/N until Further convergence of the complete likelihood; k-MLE’s properties 17/31 Another formulation comes with the connection between EF and Bregman divergences [3]: log pF (x; θ) = −BF∗ (t(x) : η) + F∗ (t(x)) + k(x) Bregman divergence BF (. : .) associated to a strictly convex and differentiable function F : Original k-MLE (dual form.) 
in one slide 18/31 Algorithm 3: k-MLE Input: A sample χ = {y1 = t(x1), y2 = x2, ..., yn = t(xN)}, F∗ 1 , F∗ 2 , ..., F∗ k Bregman generator Output: ˆΨ = (ˆw1, ˆw2, ..., ˆwk−1, ˆθ1 = F∗(ˆη1), ..., ˆθk = F∗(ˆηk)) A good initialization for Ψ (see later); repeat repeat foreach xi ∈ χ do zi = argminj BF∗ j (yi : ˆηj ) − log ˆwj ; foreach Cj := {xi ∈ χ|zi = j} do ˆηj = xi ∈Cj yi /|Cj | until Convergence of the complete likelihood; Update mixing proportions : ˆwj = |Cj |/N until Further convergence of the complete likelihood; k-MLE for Wishart distributions 19/31 Practical considerations impose modifications of the algorithm: During the assignment empty clusters may appear (High dimensional data get this worse). A possible solution is to consider Hartigan and Wang’s strategy [6] instead of Lloyd’s strategy: Optimally transfer one observation at a time Update the parameters of involved clusters. Stop when no transfer is possible. This should guarantees non-empty clusters [7] but does not work when considering weighted clusters... Get back to an“old school”criterion : |Czi | > 1 Experimentally shown to perform better in high dimension than the Lloyd’s strategy. k-MLE - Hartigan and Wang 20/31 Criterion for potential transfer (Max): log ˆwzi pFzi (xi ; ˆθzi ) log ˆwz∗ i pFz∗ i (xi ; ˆθzi ∗ ) < 1 with z∗ i = argmaxj log ˆwj pFj (xi ; ˆθj ) Update rules : ˆθzi = MLEFj (Czi \{xi }) ˆθz∗ i = MLEFj (Cz∗ i ∪ {xi }) OR Criterion for potential transfer (Min): BF∗ (yi : ηz∗ i ) − log wz∗ i BF∗ (yi : ηzi ) − log wzi < 1 with z∗ i = argminj (BF∗ (yi : ηj ) − log wj ) Update rules : ηzi = |Czi |ηzi − yi |Czi | − 1 ηz∗ i = |Cz∗ i |ηz∗ i + yi |Cz∗ i | + 1 Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. distance to the nearest center.” Towards a good initialization... 21/31 Classical initializations methods : random centers, random partition, furthest point (2-approximation), ... Better approach is k-means++ [8]: “Sampling prop. to sq. 
distance to the nearest center.” Fast and greedy approximation : Θ(kN) Probabilistic guarantee of good initialization: OPTF ≤ k-meansF ≤ O(log k)OPTF Dual Bregman divergence BF∗ may replace the square distance Heuristic to avoid to fix k 22/31 K-means imposes to fix k, the number of clusters We propose on-the-fly cluster creation together with the k-MLE++ (inspired by DP-k-means [9]) : “Create cluster when there exists observations contributing too much to the loss function with already selected centers” Heuristic to avoid to fix k 22/31 K-means imposes to fix k, the number of clusters We propose on-the-fly cluster creation together with the k-MLE++ (inspired by DP-k-means [9]) : “Create cluster when there exists observations contributing too much to the loss function with already selected centers” Heuristic to avoid to fix k 22/31 K-means imposes to fix k, the number of clusters We propose on-the-fly cluster creation together with the k-MLE++ (inspired by DP-k-means [9]) : “Create cluster when there exists observations contributing too much to the loss function with already selected centers” It may overestimate the number of clusters... Initialization with DP-k-MLE++ 23/31 Algorithm 4: DP-k-MLE++ Input: A sample y1 = t(X1), . . . , yN = t(XN), F , λ > 0 Output: C a subset of y1, . . . , yN, k the number of clusters Choose first seed C = {yj }, for j uniformly random in {1, 2, . . . , N}; repeat foreach yi do compute pi = BF∗ (yi : C)/ N i =1 BF∗ (yi : C) where BF∗ (yi : C) = minc∈CBF∗ (yi : c) ; if ∃pi > λ then Choose next seed s among y1, y2, . . . , yN with prob. pi ; Add selected seed to C : C = C ∪ {s} ; until all pi ≤ λ; k = |C|; Motion capture 24/31 Real dataset: Motion capture of contemporary dancers (15 sensors in 3d). Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Remark: Size of each sub-motion is known (so its θn) Application to motion retrieval(1) 25/31 Motion capture data can be view as matrices Xi with different row sizes but same column size d. The idea is to describe Xi through one mixture model parameters ˆΨi . Mixture parameters can be viewed as a sparse representation of local dynamics in Xi . Application to motion retrieval(2) 26/31 Comparing two movements amounts to compute a dissimilarity measure between ˆΨi and ˆΨj . Remark 1 : with DP-k-MLE++, the two mixtures would not probably have the same number of components. Remark 2 : when both mixtures have one component, a natural choice is KL(Wd (.; ˆθ)||Wd (.; ˆθ )) = BF∗ (ˆη : ˆη ) = BF (ˆθ : ˆθ) A closed form is always available ! 
No closed form exists for KL divergence between general mixtures. Application to motion retrieval(3) 27/31 A possible solution is to use the CS divergence [10]: CS(m : m ) = − log m(x)m (x)dx m(x)2dx m (x)2dx It has a analytic formula for m(x)m (x)dx = k j=1 k j =1 wj wj exp F(θj +θj )−(F(θj )+F(θj )) Note that this expression is well defined since natural parameter space Θ = R+ ∗ × Sp ++ is a convex cone. Implementation 28/31 Early specific code in MatlabTM. Today implementation in Python (based on pyMEF [2]) Ongoing proof of concept (with Herranz F., Beuriv´e A.) Conclusions - Future works 29/31 Still some mathematical work to be done: Solve MLE equations to get F∗ = ( F)−1 then F∗ Characterize our estimator for full Wishart distribution. Complete and validate the prototype of system for motion retrieval. Speeding-up algorithm: computational/numerical/algorithmic tricks. library for bregman divergences learning ? Possible extensions: Reintroduce mean vector in the model : Gaussian-Wishart Online k-means -> online k-MLE ... References I 30/31 Nielsen, F.: k-MLE: A fast algorithm for learning statistical mixture models. In: International Conference on Acoustics, Speech and Signal Processing. (2012) pp. 869–872 Schwander, O. and Nielsen, F. pyMEF - A framework for Exponential Families in Python in Proceedings of the 2011 IEEE Workshop on Statistical Signal Processing Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J. Clustering with bregman divergences. Journal of Machine Learning Research (6) (2005) 1705–1749 Nielsen, F., Garcia, V.: Statistical exponential families: A digest with flash cards. http://arxiv.org/abs/0911.4863 (11 2009) Hidot, S., Saint Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: Application to movement clustering. Pattern Recognition Letters 31(14) (2010) 2318–2324 References II 31/31 Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1) (1979) 100–108 Telgarsky, M., Vattani, A.: Hartigan’s method: k-means clustering without Voronoi. In: Proc. of International Conference on Artificial Intelligence and Statistics (AISTATS). (2010) pp. 820–827 Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (2007) pp. 1027–1035 Kulis, B., Jordan, M.I.: Revisiting k-means: New algorithms via Bayesian nonparametrics. In: International Conference on Machine Learning (ICML). (2012) Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: International Conference on Pattern Recognition (ICPR). (2012) pp. 1723–1726
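A sketch of the alternating estimator of Algorithm 1 (Eqs. 2 and 3 above), with Ψ_d written as a sum of digamma functions and Ψ_d⁻¹ obtained by bracketed root finding. SciPy is assumed, and the bracketing, the fixed iteration count and the initial θ_n are our choices rather than the authors':

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq
from scipy.stats import wishart

def psi_d(x, d):
    """Derivative of log Gamma_d: Psi_d(x) = sum_{j=1..d} psi(x - (j-1)/2)."""
    return sum(digamma(x - 0.5 * j) for j in range(d))

def psi_d_inv(y, d):
    """Numerical inverse of Psi_d (increasing for x > (d-1)/2)."""
    lo = 0.5 * (d - 1) + 1e-8
    hi = lo + 1.0
    while psi_d(hi, d) < y:
        hi *= 2.0
    return brentq(lambda x: psi_d(x, d) - y, lo, hi)

def fit_wishart(Xs, n_iter=50, theta_n0=1.0):
    """Alternating MLE for (theta_n, theta_S) = ((n-d-1)/2, S^{-1})."""
    d, N = Xs[0].shape[0], len(Xs)
    mean_X = sum(Xs) / N
    mean_logdet = np.mean([np.linalg.slogdet(X)[1] for X in Xs])
    theta_n = theta_n0
    for _ in range(n_iter):                       # fixed count instead of a convergence test
        n = 2.0 * theta_n + d + 1.0
        S = mean_X / n                            # Eq. (2): theta_S = N n (sum X_i)^{-1}
        rhs = mean_logdet - (d * np.log(2.0) + np.linalg.slogdet(S)[1])  # log|2S| = d log2 + log|S|
        theta_n = psi_d_inv(rhs, d) - 0.5 * (d + 1.0)                    # Eq. (3), needs theta_n > 0
    return 2.0 * theta_n + d + 1.0, S             # (n_hat, S_hat)

if __name__ == "__main__":
    d, n_true = 3, 10.0
    S_true = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
    Xs = list(wishart(df=n_true, scale=S_true).rvs(size=500, random_state=1))
    n_hat, S_hat = fit_wishart(Xs)
    print(n_hat)    # close to 10 with enough samples
    print(S_hat)    # close to S_true
```

On synthetic data the generating parameters are recovered after a handful of iterations, in line with the remark above that the parameters are recovered in few iterations.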

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

Hypothesis testing, information divergence and computational geometry Frank Nielsen Frank.Nielsen@acm.org www.informationgeometry.org Sony Computer Science Laboratories, Inc. August 2013, GSI, Paris, FR c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/20 The Multiple Hypothesis Testing (MHT) problem Given a rv. X with n hypothesis H1 : X ∼ P1, ..., Hn : X ∼ Pn, decide for a IID sample x1, ..., xm ∼ X which hypothesis holds true? Pm correct = 1 − Pm error Asymptotic regime: α = − 1 m log Pm e , m → ∞ c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 2/20 Bayesian hypothesis testing (preliminaries) prior probabilities: wi = Pr(X ∼ Pi ) > 0 (with n i=1 wi = 1) conditional probabilities: Pr(X = x|X ∼ Pi ). Pr(X = x) = n i=1 Pr(X ∼ Pi )Pr(X = x|X ∼ Pi ) = n i=1 wi Pr(X|Pi ) Let ci,j = cost of deciding Hi when in fact Hj is true. Matrix [cij ]= cost design matrix Let pi,j(u) = probability of making this decision using rule u. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 3/20 Bayesian detector Minimize the expected cost: EX [c(r(x))], c(r(x)) = i  wi j=i ci,jpi,j(r(x))   Special case: Probability of error Pe obtained for ci,i = 0 and ci,j = 1 for i = j: Pe = EX   i  wi j=i pi,j(r(x))     The maximum a posteriori probability (MAP) rule considers classifying x: MAP(x) = argmaxi∈{1,...,n} wi pi (x) where pi (x) = Pr(X = x|X ∼ Pi ) are the conditional probabilities. → MAP Bayesian detector minimizes Pe over all rules [8] c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 4/20 Probability of error and divergences Without loss of generality, consider equal priors ( w1 = w2 = 1 2): Pe = x∈X p(x) min(Pr(H1|x), Pr(H2|x))dν(x) (Pe > 0 as soon as suppp1 ∩ suppp2 = ∅) From Bayes’ rule Pr(Hi |X = x) = Pr(Hi )Pr(X=x|Hi ) Pr(X=x) = wi pi (x)/p(x) Pe = 1 2 x∈X min(p1(x), p2(x))dν(x) Rewrite or bound Pe using tricks of the trade: Trick 1. ∀a, b ∈ R, min(a, b) = a+b 2 − |a−b| 2 , Trick 2. ∀a, b > 0, min(a, b) ≤ minα∈(0,1) aαb1−α, c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 5/20 Probability of error and total variation Pe = 1 2 x∈X p1(x) + p2(x) 2 − |p1(x) − p2(x)| 2 dν(x), = 1 2 1 − 1 2 x∈X |p1(x) − p2(x)|dν(x) Pe = 1 2 (1 − TV(P1, P2)) total variation metric distance: TV(P, Q) = 1 2 x∈X |p(x) − q(x)|dν(x) → Difficult to compute when handling multivariate distributions. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 6/20 Bounding the Probability of error Pe min(a, b) ≤ minα∈(0,1) aαb1−α for a, b > 0, upper bound Pe: Pe = 1 2 x∈X min(p1(x), p2(x))dν(x) ≤ 1 2 min α∈(0,1) x∈X pα 1 (x)p1−α 2 (x)dν(x). C(P1, P2) = − log min α∈(0,1) x∈X pα 1 (x)p1−α 2 (x)dν(x) ≥ 0, Best error exponent α∗ [7]: Pe ≤ wα∗ 1 w1−α∗ 2 e−C(P1,P2) ≤ e−C(P1,P2) Bounding technique can be extended using any quasi-arithmetic α-means [13, 9]... c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 7/20 Computational information geometry Exponential family manifold [4]: M = {pθ | pθ(x) = exp(t(x)⊤ θ − F(θ))} Dually flat manifolds [1] enjoy dual affine connections [1]: (M, ∇2F(θ), ∇(e), ∇(m)). η = ∇F(θ), θ = ∇F∗ (η) Canonical divergence from Young inequality: A(θ1, η2) = F(θ1) + F∗ (η2) − θ⊤ 1 η2 ≥ 0 F(θ) + F∗ (η) = θ⊤ η c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 
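A finite-alphabet numerical sketch of the identities above for equal priors: Pe = ½(1 − TV(P1, P2)) and the Chernoff upper bound Pe ≤ ½ e^{−C(P1, P2)} (SciPy is used for the one-dimensional optimization over α; the two distributions are arbitrary examples):

```python
import numpy as np
from scipy.optimize import minimize_scalar

p1 = np.array([0.5, 0.3, 0.1, 0.1])
p2 = np.array([0.1, 0.2, 0.3, 0.4])

# MAP probability of error with equal priors w1 = w2 = 1/2
pe = 0.5 * np.minimum(p1, p2).sum()

# Total variation identity: Pe = (1 - TV(P1, P2)) / 2
tv = 0.5 * np.abs(p1 - p2).sum()
print(np.isclose(pe, 0.5 * (1.0 - tv)))          # True

# Chernoff information C = -log min_alpha sum_x p1^alpha p2^(1-alpha)
coeff = lambda a: np.sum(p1 ** a * p2 ** (1.0 - a))
res = minimize_scalar(coeff, bounds=(1e-6, 1 - 1e-6), method="bounded")
C = -np.log(res.fun)

# Upper bound: Pe <= w1^a* w2^(1-a*) e^{-C} = (1/2) e^{-C} for equal priors
print(pe, 0.5 * np.exp(-C), pe <= 0.5 * np.exp(-C))
```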
8/20 MAP decision rule and additive Bregman Voronoi diagrams KL(pθ1 : pθ2 ) = B(θ2 : θ1) = A(θ2 : η1) = A∗ (η1 : θ2) = B∗ (η1 : η2) Canonical divergence (mixed primal/dual coordinates): A(θ2 : η1) = F(θ2) + F∗ (η1) − θ⊤ 2 η1 ≥ 0 Bregman divergence (uni-coordinates, primal or dual): B(θ2 : θ1) = F(θ2) − F(θ1) − (θ2 − θ1)⊤ ∇F(θ1) log pi (x) = −B∗ (t(x) : ηi ) + F∗ (t(x)) + k(x), ηi = ∇F(θi ) = η(Pθi ) Optimal MAP decision rule: MAP(x) = argmaxi∈{1,...,n}wi pi (x) = argmaxi∈{1,...,n} − B∗ (t(x) : ηi ) + log wi , = argmini∈{1,...,n}B∗ (t(x) : ηi ) − log wi → nearest neighbor classifier [2, 10, 15, 16] c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 9/20 MAP & nearest neighbor classifier Bregman Voronoi diagrams (with additive weights) are affine diagrams [2]. argmini∈{1,...,n}B∗ (t(x) : ηi ) − log wi ◮ point location in arrangement [3] (small dims), ◮ Divergence-based search trees [16], ◮ GPU brute force [6]. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 10/20 Geometry of the best error exponent: binary hypothesis On the exponential family manifold, Chernoff α-coefficient [5]: cα(Pθ1 : Pθ2 ) = pα θ1 (x)p1−α θ2 (x)dµ(x) = exp(−J (α) F (θ1 : θ2)), Skew Jensen divergence [14] on the natural parameters: J (α) F (θ1 : θ2) = αF(θ1) + (1 − α)F(θ2) − F(θ (α) 12 ), Chernoff information = Bregman divergence for exponential families: C(Pθ1 : Pθ2 ) = B(θ1 : θ (α∗) 12 ) = B(θ2 : θ (α∗) 12 ) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 11/20 Geometry of the best error exponent: binary hypothesis Chernoff distribution P∗ [12]: P∗ = Pθ∗ 12 = Ge(P1, P2) ∩ Bim(P1, P2) e-geodesic: Ge(P1, P2) = {E (λ) 12 | θ(E (λ) 12 ) = (1 − λ)θ1 + λθ2, λ ∈ [0, 1]}, m-bisector: Bim(P1, P2) : {P | F(θ1) − F(θ2) + η(P)⊤ ∆θ = 0}, Optimal natural parameter of P∗: θ∗ = θ (α∗) 12 = argminθ∈ΘB(θ1 : θ) = argminθ∈ΘB(θ2 : θ). → closed-form for order-1 family, or efficient bisection search. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 12/20 Geometry of the best error exponent: binary hypothesis P∗ = Pθ∗ 12 = Ge(P1, P2) ∩ Bim(P1, P2) pθ1 pθ2 pθ∗ 12 m-bisector e-geodesic Ge(Pθ1 , Pθ2 ) η-coordinate system Pθ∗ 12 C(θ1 : θ2) = B(θ1 : θ∗ 12) Bim(Pθ1 , Pθ2 ) c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 13/20 Geometry of the best error exponent: multiple hypothesis n-ary MHT [8] from minimum pairwise Chernoff distance: C(P1, ..., Pn) = min i,j=i C(Pi , Pj ) Pm e ≤ e−mC(Pi∗ ,Pj∗ ) , (i∗ , j∗ ) = argmini,j=iC(Pi , Pj ) Compute for each pair of natural neighbors [3] Pθi and Pθj , the Chernoff distance C(Pθi , Pθj ), and choose the pair with minimal distance. (Proof by contradiction using Bregman Pythagoras theorem.) → Closest Bregman pair problem (Chernoff distance fails triangle inequality). c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 14/20 Hypothesis testing: Illustration η-coordinate system Chernoff distribution between natural neighbours c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 15/20 Summary Bayesian multiple hypothesis testing... ... from the viewpoint of computational geometry. ◮ probability of error & best MAP Bayesian rule ◮ total variation & Pe, upper-bounded by the Chernoff distance. ◮ Exponential family manifolds: ◮ MAP rule = NN classifier (additive Bregman Voronoi diagram) ◮ best error exponent from intersection geodesic/bisector for binary hypothesis, ◮ best error exponent from closest Bregman pair for multiple hypothesis. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 
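The geometric characterization above can be exercised on a concrete exponential family. The sketch below uses univariate normal distributions in natural coordinates (the log-normalizer F written here is the standard one for that family, and the parameter values are arbitrary): the best exponent α* is located by root finding on the Bregman balance condition B(θ1 : θα) = B(θ2 : θα) along the e-geodesic, and cross-checked against maximizing the skew Jensen divergence.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def nat(mu, sigma2):
    """Natural parameters of N(mu, sigma^2): (mu/sigma^2, -1/(2 sigma^2))."""
    return np.array([mu / sigma2, -0.5 / sigma2])

def F(theta):
    """Log-normalizer of the univariate normal family in natural coordinates."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def gradF(theta):
    t1, t2 = theta
    return np.array([-t1 / (2.0 * t2), t1 * t1 / (4.0 * t2 * t2) - 0.5 / t2])

def bregman(tp, tq):
    return F(tp) - F(tq) - gradF(tq) @ (tp - tq)

theta1, theta2 = nat(0.0, 1.0), nat(2.5, 3.0)
geodesic = lambda a: a * theta1 + (1.0 - a) * theta2           # e-geodesic

def balance(a):            # B(theta1 : theta_a) - B(theta2 : theta_a)
    th = geodesic(a)
    return bregman(theta1, th) - bregman(theta2, th)

alpha_star = brentq(balance, 1e-9, 1.0 - 1e-9)                 # bisection-type search
C = bregman(theta1, geodesic(alpha_star))                      # Chernoff information

# Cross-check: alpha* also maximizes the skew Jensen divergence J_F^(alpha)
J = lambda a: a * F(theta1) + (1 - a) * F(theta2) - F(geodesic(a))
res = minimize_scalar(lambda a: -J(a), bounds=(1e-9, 1 - 1e-9), method="bounded")
print(alpha_star, res.x)   # agree
print(C, J(res.x))         # C(P1, P2) = B(theta1 : theta_{alpha*}) = max_alpha J_F^(alpha)
```

Both routes return the same α*, and the common value of the two Bregman divergences at θ_{α*} reproduces C(P1, P2), as stated on the slides.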
16/20 Thank you 28th-30th August, Paris. @incollection{HTIGCG-GSI-2013, year={2013}, booktitle={Geometric Science of Information}, volume={8085}, series={Lecture Notes in Computer Science}, editor={Frank Nielsen and Fr\’ed\’eric Barbaresco}, title={Hypothesis testing, information divergence and computational geometry}, publisher={Springer Berlin Heidelberg}, author={Nielsen, Frank}, pages={241-248} } c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 17/20 Bibliographic references I Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000. Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock. Bregman Voronoi diagrams. Discrete & Computational Geometry, 44(2):281–307, 2010. Jean-Daniel Boissonnat and Mariette Yvinec. Algorithmic Geometry. Cambridge University Press, New York, NY, USA, 1998. Lawrence D. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory. Institute of Mathematical Statistics, Hayworth, CA, USA, 1986. Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952. Vincent Garcia, Eric Debreuve, Frank Nielsen, and Michel Barlaud. k-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching. In IEEE International Conference on Image Processing (ICIP), pages 3757–3760, 2010. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 18/20 Bibliographic references II Martin E. Hellman and Josef Raviv. Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, 16:368–372, 1970. C. C. Leang and D. H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280–282, January 1997. Frank Nielsen. Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. submitted, 2012. Frank Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2012. preliminary, technical report on arXiv. Frank Nielsen. Hypothesis testing, information divergence and computational geometry. In Frank Nielsen and Fr´ed´eric Barbaresco, editors, Geometric Science of Information, volume 8085 of Lecture Notes in Computer Science, pages 241–248. Springer Berlin Heidelberg, 2013. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 19/20 Bibliographic references III Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters (SPL), 20(3):269–272, March 2013. Frank Nielsen. Pattern learning and recognition on statistical manifolds: An information-geometric review. In Edwin Hancock and Marcello Pelillo, editors, Similarity-Based Pattern Recognition, volume 7953 of Lecture Notes in Computer Science, pages 1–25. Springer Berlin Heidelberg, 2013. Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, 2011. Frank Nielsen, Paolo Piro, and Michel Barlaud. Bregman vantage point trees for efficient nearest neighbor queries. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo (ICME), pages 878–881, 2009. Paolo Piro, Frank Nielsen, and Michel Barlaud. Tailored Bregman ball trees for effective nearest neighbors. 
In European Workshop on Computational Geometry (EuroCG), LORIA, Nancy, France, March 2009. IEEE. c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 20/20

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

The exponential family in abstract information theory Jan Naudts and Ben Anthonis Universiteit Antwerpen Paris, August 2013 1 Outline Fisher information Example Abstract Information Theory Assumptions Examples of generalized divergences Deformed exponential families Conclusions 2 Fisher information The standard expression for the Fisher information matrix Ik,l (θ) = Eθ ∂ ∂θk ln pθ ∂ ∂θl ln pθ is a relevant quantity when pθ belongs to the exponential family. A different quantity is needed in the general case. It involves the Kullback-Leibler divergence D(p||pθ) = Ep ln p pθ . Remember that Ik,l (θ) = ∂2 ∂θk ∂θl D(p||pθ) p=pθ . 3 The divergence D(p||pθ) is a measure for the distance between an arbitrary state p and a point pθ of the statistical manifold. Let D(p||M) denote the minimal ‘distance’. Let Fθ denote the ‘fiber’ of all points p for which D(p||M) = D(p||pθ). minimum contrast leaf, Eguchi 1992 Definition The extended Fisher information of a pdf p (not necessarily in M) is Ik,l (p) = ∂2 ∂θk ∂θl D(p||pθ) p∈Fθ . 4 Note that on the manifold the two definitions coincide Ik,l(θ) = Ik,l (pθ). Proposition Ik,l (p) is covariant. Proof Let η be a function of θ. One calculates ∂2 ∂θk ∂θl D(x||θ) = ∂2 ∂ηm∂ηn D(x||θ) ∂ηm ∂θk ∂ηn ∂θl + ∂ ∂ηm D(x||θ) ∂2 ηm ∂θk ∂θl . The latter term vanishes because p ∈ Fθ. The former term is manifestly covariant. 5 Proposition If pθ belongs to the exponential family then Ik,l(p) is constant on the fibre Fθ. Proof p, pθ satisfies the Pythagorean relation D(p||pη) = D(p||pθ) + D(pθ||pη). Hence taking derivatives w.r.t. η only involves D(pθ||pη). Only afterwards put η = θ. One concludes that Ik,l (p) = Ik,l (pθ). Coordinate-independent method to verify that M is not an exponential family! 6 Example Suggested by H. Matsuzoe. Consider the manifold of normal distributions pµ,σ(x) with mean µ and standard deviation σ pµ,σ(x) = 1 √ 2πσ2 e−(x−µ)2 /2σ2 . Consider the submanifold M of normal distributions for which µ = σ pθ(x) = 1 √ 2πθ2 e−(x−θ)2 /2θ2 . Question Is M an exponential family? Answer It is known to be curved (Efron, 1975). Let us show that I(pµ,σ) is not constant along fibers Fθ. 7 The Kullback-Leibler divergence D(pµ,σ||pθ) is minimal when θ is the positive root of the equation θ2 + µθ = µ2 + σ2 . The Fisher information I(pµ,σ) equals I(pµ,σ) = θ2 + µ2 + σ2 θ4 . It is not constant on Fθ — it cannot be written as a function of θ. This implies that M is not an exponential family. 8 Abstract Information Theory Our aims ◮ Formulate the notion of an exponential family in the context of abstract information theory ◮ If M is not an exponential family w.r.t. the Kullback-Leibler divergence, can it be exponential w.r.t. some other divergence? Abstract information theory does not rely on probability theory. We try to bring classical and quantum information theory together in a single formalism. 9 A generalized divergence is a map D : X × M → [0, +∞] between two different spaces. ◮ A divergence is generically asymmetric in its two arguments. This is an indication that the two arguments play a different role. X is the space of data sets, M is a manifold of models. ◮ D(x||m) has the meaning of a loss of information when the data set x is replaced by the model point m. ◮ In the classical setting X is the space of empirical measures, M is a statistical manifold. One has in this case M ⊂ X. 10 Assumptions Let Q denote a linear space of continuous real functions of X. Instead of q(x) we write x|q to stress that Q is not an algebra. 
In the classical setting Q is the space of random variables. In the quantum setting Q is a space of operators on a Hilbert space. We consider a class of generalized divergences which can be written into the form D(x||m) = ξ(m) − ζ(x) − x|Lm , where ξ and ζ are real functions and L : M → Q is a map from the manifold M into the linear space Q. We assume in addition a compatibility and a consistency condition — see a later slide. 11 For instance, the quantities ln p, ln pθ appearing in the Kullback-Leibler divergence D(p||pθ) = Ep ln p − Ep ln pθ = p| ln p − p| ln pθ are used as random variables and belong to Q. One can define a map L : M → Q by Lpθ = ln pθ and write the divergence as D(p||pθ) = ξ(pθ) − ζ(p) − p|Lpθ with ξ(pθ) = 0, ζ(p) = −Ep ln p and p|q = Epq. The quantity ξ(pθ) has been called the corrector by Flemming Topsøe. ζ(p) is the entropy. We call L the logarithmic map. 12 Compatibility condition For each x ∈ X there exists a unique point m ∈ M which minimizes the divergence D(x||m). This means that each point of X belongs to some fiber Fm. Consistency condition Each point m of M can be approached by points x of Fm in the sense that D(x||m) can be made arbitrary small. 13 Example: Bregman divergence A divergence of the Bregman type is defined by D(x||m) = a F(x(a)) − F(m(a)) − (x(a) − m(a))f(m(a)) = a x(a) m(a) du [f(u) − f(m(a))] , where F is any strictly convex function defined on the interval (0, 1] and f = F′ is its derivative. L.M. Bregman, The relaxation method to find the common point of convex sets and its applications to the solution of problems in convex programming, USSR Comp. Math. Math. Phys. 7 (1967) 200–217. 14 In the notations of our abstract information theory one has ◮ x|q = Ex q; ◮ Lm(a) = f(m(a)); ◮ ζ(x) = − a F(x(a)); ◮ ξ(m) = a m(a)f(m(a)) − a F(m(a)). 15 Note that the Bregman divergence can be written as D(x||m) = a f(m(a)) f(x(a)) du [g(u) − x(a)]. g is the inverse function of f. N. Murata, T. Takenouchi, T. Kanamori, S. Eguchi, Information Geometry of U-Boost and Bregman Divergence, Neural Computation 16, 1437–1481 (2004). In the language of non-extensive statistical physics is f the deformed logarithm, g the deformed exponential function. The Kullback-Leibler divergence is recovered by taking F(u) = u ln u − 1. This implies g(u) = eu and f(u) = ln u. 16 Deformed exponential families A parametrized exponential family is of the form mθ(a) = c(a) exp(−α(θ) − θk Hk (a)). physicists′ notation This implies a logarithmic map of the form Lmθ(a) = ln mθ(a) c(a) = −α(θ) − θk Hk (a). It is obvious to generalize this definition by replacing the exponential function by a deformed exponential function. J. Naudts, J. Ineq. Pure Appl. Math. 5 102 (2004). S. Eguchi, Sugaku Expositions (Amer. Math. Soc.) 19, 197–216 (2006). P. D. Grünwald and A. Ph. Dawid, Ann. Statist. 32,1367–1433 (2004). 17 Can we give a definition of a deformed exponential family - which relies only on the divergence? - which does not involve canonical coordinates? - which has a geometric interpretation? Lafferty 1999: additive models mθ minimizes d(m||m0) + θk EmHk . This is a constraint maximum entropy principle. Our proposal: The Fisher information I(x) is constant along the fibers of minimal divergence. This property is a minimum requirement for a distribution to be a (deformed) exponential family. It is satisfied for the deformed exponential families based on Bregman type divergences. 18 Csiszár type of divergences Csiszár type of divergence D(x||m) = a m(a)F x(a) m(a) . 
The choice F(u) = u ln u reproduces Kullback-Leibler. Example In the context of non-extensive statistical mechanics both Csiszár and Bregman type divergences are being used. Fix the deformation parameter q = 1, 0 < q < 2. Csiszár Dq(x||m) = 1 q − 1 a x(a) x(a) m(a) q−1 − 1 , Bregman Dq(x||m) = 1 q − 1 a x(a) m(a)1−q − x(a)1−q + a [m(a) − x(a)] m(a)1−q . 19 Introduce the q-deformed exponential function expq(u) = [1 + (1 − q)u] 1/(1−q) + . The distribution of the form mθ(a) = expq(−α(θ) − θk Hk (a)) is a deformed exponential family relative to the Bregman type divergence, but not relative to the Csiszár type divergence. In the latter case the extended Fisher info is given by Ik,l (x) = z(x) ∂2 α ∂θk ∂θl with ∂α ∂θk = − 1 z(θ) a x(a)q Hk (a) and z(x) = a x(a)q . If q = 1 then z(x) = 1 and the extended Fisher info is constant along Fθ. If q = 1 it is generically not constant along Fθ. 20 Conclusions ◮ We consider Fisher information not only on the statistical manifold of model states but also for empirical measures. ◮ If the model is an exponential family then the Fisher information is constant along fibers of minimal divergence. ◮ We extend the notion of an exponential family to an abstract setting of information theory ◮ In the abstract setting the definition of a generalized exponential family only depends on the choice of the divergence. 21
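The normal example discussed earlier (the μ = σ submanifold) can be checked numerically. The sketch below (NumPy; the specific fiber and sample points are our choices) uses the positive root of θ² + μθ = μ² + σ² as the minimizer of KL(p_{μ,σ} || p_θ), evaluates the extended Fisher information as the second derivative of the divergence at that minimizer, and compares it with the slide's expression (θ² + μ² + σ²)/θ⁴. The values vary along a fixed fiber, which is the numerical content of the claim that M is not an exponential family:

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """KL(N(mu1, s1^2) || N(mu2, s2^2)), standard closed form."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

def theta_star(mu, sigma):
    """Minimizer over the submanifold p_theta = N(theta, theta^2):
    positive root of theta^2 + mu*theta = mu^2 + sigma^2."""
    return 0.5 * (-mu + np.sqrt(mu**2 + 4.0 * (mu**2 + sigma**2)))

def extended_fisher(mu, sigma, h=1e-4):
    """Second derivative of theta -> KL(p_{mu,sigma} || p_theta) at the minimizer."""
    t = theta_star(mu, sigma)
    f = lambda th: kl_normal(mu, sigma, th, th)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

# points (mu, sigma) on one fiber F_theta0: mu^2 + sigma^2 = theta0^2 + mu*theta0
theta0 = 1.0
for mu in [0.2, 0.5, 0.8, 1.0]:
    sigma = np.sqrt(theta0**2 + mu * theta0 - mu**2)
    I_num = extended_fisher(mu, sigma)
    I_slide = (theta0**2 + mu**2 + sigma**2) / theta0**4   # expression from the talk
    print(mu, round(I_num, 4), round(I_slide, 4))
# the printed values change with mu although theta0 is fixed:
# I(p) is not constant on the fiber, so the mu = sigma family is not exponential.
```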

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Variational Problem in Euclidean Space With Density Lakehal BELARBI1 and Mohamed BELKHELFA2 1Departement de Math´ematiques, Universit´e de Mostaganem B.P.227,27000,Mostaganem, Alg´erie. 2Laboratoire de Physique Quantique de la Mati`ere et Mod´elisations Math´ematiques (LPQ3M), Universit´e de Mascara B.P.305 , 29000,Route de Mamounia Mascara, Alg´erie. GEOMETRIC SCIENCE OF INFORMATION Paris-Ecole des Mines 28,29 and 30 August 2013 1 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Outline 1 Introduction : • What is a manifold with density. • Examples of a manifold with density. 2 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Outline 1 Introduction : • What is a manifold with density. • Examples of a manifold with density. 2 Preliminaries: 2 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Outline 1 Introduction : • What is a manifold with density. • Examples of a manifold with density. 2 Preliminaries: 3 Plateau’s problem in R3 with density. • Theorem. • Motivation. 2 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Outline 1 Introduction : • What is a manifold with density. • Examples of a manifold with density. 2 Preliminaries: 3 Plateau’s problem in R3 with density. • Theorem. • Motivation. 4 The Divergence operator in manifolds with density. 2 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References What is a manifold with density A manifold with density is a Riemannian manifold Mn with positive density function eϕ used to weight volume and hyperarea (and sometimes lower-dimensional area and length).In terms of underlying Riemannian volume dV0 and area dA0 , the new weighted volume and area are given by dV = eϕ .dV0, dA = eϕ .dA0. 3 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Examples of a manifold with density One of the first examples of a manifold with density appeared in the realm of probability and statistics, Euclidean space with the Gaussian density e−π|x| (see ([13]) for a detailed exposition in the context of isoperimetric problems). 
4 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References For reasons coming from the study of diffusion processes,Bakry and ´Emery ([1]) defined a generalization of the Ricci tensor of Riemannian manifold Mn with density eϕ (or the ∞−Bakry-´Emery-Ricci tensor) by Ric∞ ϕ = Ric − Hessϕ, (1) where Ric denotes the Ricci curvature of Mn and Hessϕ the Hessian of ϕ. 5 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References By Perelman in ([11],1.3,p.6),in a Riemannian manifold Mn with density eϕ in order for the Lichnerovicz formula to hold, the corresponding ϕ−scalar curvature is given by S∞ ϕ = S − 2∆ϕ− | ϕ |2 , (2) where S denotes the scalar curvature of Mn.Note that this is different than taking the trace of Ric∞ ϕ which is S − ∆ϕ. 6 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Following Gromov ([6],p.213), the natural generalization of the mean curvature of hypersurfaces on a manifold with density eϕ is given by Hϕ = H − 1 n − 1 d ϕ dN , (3) where H is the Riemannian mean curvature and N is the unit normal vector field of hypersurface . 7 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References For a 2-dimensional smooth manifold with density eϕ , Corwin et al.([5],p.6) define a generalized Gauss curvature Gϕ = G − ∆ϕ. (4) and obtain a generalization of the Gauss-Bonnet formula for a smooth disc D: D Gϕ + ∂D κϕ = 2π, (5) where κϕ is the inward one-dimensional generalized mean curvature as (1.3) and the integrals are with respect to unweighted Riemannian area and arclength ([9],p.181). 8 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Bayle ([2]) has derived the first and second variation formulae for the weighted volume functional (see also [9],[10],[13]).From the first variation formula, it can be shown that an immersed submanifold Nn−1 in Mn is minimal if and only if the generalized mean curvature Hϕ vanishes (Hϕ = 0). Doan The Hieu and Nguyen Minh Hoang ([8]) classified ruled minimal surfaces in R3 with density Ψ = ez. 9 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References In ([4]) , we have previously written the equation of minimal surfaces in R3 with linear density Ψ = eϕ (in the case ϕ(x, y, z) = x, ϕ(x, y, z) = y and ϕ(x, y, z) = z), and we gave some solutions of the equation of minimal graphs in R3 with linear density Ψ = eϕ. 
In ([3]),we gave a description of ruled minimal surfaces by geodesics straight lines in Heisenberg space H3 with linear density Ψ = eϕ = eαx+βy+γz,where (α, β, γ) ∈ R3 − {(0, 0, 0)} (in particular ϕ(x, y, z) = αx and ϕ(x, y, z) = βy), and we gave the ∞−Bakry-´Emery Ricci curvature tensor and the ϕ−scalar curvature of Heisenberg space H3 with radial density e−aρ2+c,where ρ = x2 + y2 + z2. 10 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References In this section, we introduce notations, definitions, and preliminary facts which are used throughout this paper. We deal with two-dimensional surfaces in Euclidean 3-space.We assume that the surface is given parametrically by X : U ⊆ R2 → R3. We denote the parameters by u and v. We denote the partial derivatives with respect to u and v by the corresponding subscripts. The normal vector N to the surface at a given point is defined by N = Xu ∧ Xv Xu ∧ Xv . 11 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References The first fundamental form of the surface is the metric that is induced on the tangent space at each point of the surface.The (u, v) coordinates define a basis for the tangent space.This basis consists of the vectors Xu and Xv .In this basis the matrix of the first fundamental form is E F F G , where E = Xu.Xu, F = Xu.Xv , and G = Xv .Xv . In this basis, the second fundamental form of the surface is given by the matrix : L M M N , where L = −Xu.Nu, M = −Xu.Nv , and N = −Xv .Nv . 12 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Definition ([12]) The area AX(R) of the part X(R) of a surface patch X : U ⊆ R2 → R3 corresponding to a region R ⊆ U is AX(R) = R Xu ∧ Xv dudv. and Xu ∧ Xv = (EG − F2 ) 1 2 . 13 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References We shall now study a family of surface St parameterized by Xt : U → R3 in R3 with density eϕt ,where U is an subset of R2 independent of t,and t lies in some open interval ] − , [,for some > 0. Let S = S0 and eϕ0 = eϕ .The family is required to be smooth, in the sense that the map (u, v, t) → Xt(u, v) from the open subset {(u, v, t)/(u, v) ∈ U, t ∈] − , [} of R3 to R3 is smooth. 14 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References We shall now study a family of surface St parameterized by Xt : U → R3 in R3 with density eϕt ,where U is an subset of R2 independent of t,and t lies in some open interval ] − , [,for some > 0. Let S = S0 and eϕ0 = eϕ .The family is required to be smooth, in the sense that the map (u, v, t) → Xt(u, v) from the open subset {(u, v, t)/(u, v) ∈ U, t ∈] − , [} of R3 to R3 is smooth. 
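A symbolic sketch (SymPy assumed) of the quantities just defined, extended with the weighted mean curvature H_φ = H − ½ dφ/dN recalled earlier. The test surface (a plane) and the density e^x are illustrative choices of ours; the sign of H depends on the chosen normal orientation, but the vanishing of H_φ does not:

```python
import sympy as sp

u, v, a = sp.symbols('u v a', real=True)

# Parametrized surface X(u, v): here the plane z = a*v, an illustrative choice
X = sp.Matrix([u, v, a * v])

Xu, Xv = X.diff(u), X.diff(v)
E, F, G = Xu.dot(Xu), Xu.dot(Xv), Xv.dot(Xv)        # first fundamental form

n = Xu.cross(Xv)
n = n / sp.sqrt(n.dot(n))                           # unit normal N

L = n.dot(X.diff(u, 2))                             # second fundamental form
M = n.dot(Xu.diff(v))                               # (equals -Xu.Nu, -Xu.Nv, -Xv.Nv above)
N2 = n.dot(X.diff(v, 2))

# standard mean curvature of a parametrized surface
H = sp.simplify((E * N2 - 2 * F * M + G * L) / (2 * (E * G - F**2)))

grad_phi = sp.Matrix([1, 0, 0])                     # gradient of phi = x for density e^x
H_phi = sp.simplify(H - sp.Rational(1, 2) * grad_phi.dot(n))

print(H, H_phi)   # 0 0 : the plane z = a*y has H = 0 and grad(phi).N = 0, so H_phi = 0
```

The same routine, applied to a graph z = f(x, y), reproduces the weighted minimality condition H_φ = 0 discussed in the next slides.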
The surface variation of the family is the function η : U → R3 given by η = ∂Xt ∂t /t=0, Let γ be a simple closed curve that is contained,along with its interior int(γ),in U. Then γ corresponds to a closed curve γt = Xt ◦ γ in the surface St, and we define the ϕt−area function Aϕt (t) in R3 with density eϕt to be the area of the surface St inside γt in R3 with density eϕt : Aϕt (t) = int(γ) eϕt dAXt . 14 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Theorem Theorem With the above notation, assume that the surface variation ηt vanishes along the boundary curve γ.Then, ∂Aϕt (t) ∂t /t=0 = Aϕ(0) = −2 int(γ) Hϕ.η.N.eϕ .(EG − F2 ) 1 2 dudv, (6) where Hϕ = H − 1 2 ϕ.N is the ϕ−mean curvature of S in R3 with density eϕ , E, F and G are the coefficients of its first fundamental form,and N is the standard unit normal of S. 15 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Motavation If S in R3 with density eϕ has the smallest ϕ−area among all surfaces in R3 with density eϕ with the given boundary curve γ, then Aϕ must have an absolute minimum at t = 0, so Aϕ(0) = 0 for all smooth families of surfaces as above. This means that the integral in Eq.(6) must vanish for all smooth functions ζ = η.N : U → R. This can happen only if the term that multiplies ζ in the integrand vanishes,in other words only ifHϕ = 0. This suggests the following definition. 16 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Definition A minimal surface in R3 with density eϕ is a surface whose ϕ−mean curvature is zero everywhere. 17 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Proposition The minimal equation of surface S : z = f (x, y) in R3 with linear density ex given by the parametrization: X : (x, y) → (x, y, f (x, y)) , where (x, y) ∈ R2 is 1 + ∂f ∂x 2 ∂2f ∂y2 + ∂f ∂x + 1 + ∂f ∂y 2 ∂2f ∂x2 + ∂f ∂x −2 ∂f ∂x . ∂f ∂y . ∂2f ∂x∂y − ∂f ∂x = 0. 18 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Example The surface S in R3 with linear density ex defined by the parametrization : X : (x, y) → x, y, − a2 √ 1 + a2 arcsin(βe − 1+a2 a2 x ) + ay + b + γ , where (x, y) ∈ R2 , a, b, β ∈ R∗ is minimal. 19 / 28 Lakehal BELARBI and Mohamed BELKHELFA Variational Problem in Euclidean Space With Density Introduction Preliminaries Plateau’s problem in R3 with density The Divergence operator in manifolds with density References Let (Mn, g) be a Riemannian manifold equipped with the Riemannian metric g.For any smooth function f on M, the gradient f is a vector field on M, which is locally coordinates x1, x2....., xn has the form ( f )i = gij ∂f ∂xj , where summation is assumed over repeated indices. 
For any smooth vector field F on M, the divergence div F is the scalar function on M given in local coordinates by
div F = (1/√(det g_{ij})) ∂/∂x^i ( √(det g_{ij}) F^i ).
Let ν be the Riemannian volume on M, that is, ν = √(det g_{ij}) dx¹ ⋯ dxⁿ. By the divergence theorem, for any smooth function f and any smooth vector field F such that either f or F has compact support,
∫_M f div F dν = −∫_M ⟨∇f, F⟩ dν,   (7)
where ⟨·, ·⟩ = g(·, ·). In particular, if F = ∇ψ for a function ψ, then
∫_M f div ∇ψ dν = −∫_M ⟨∇f, ∇ψ⟩ dν,   (8)
provided one of the functions f, ψ has compact support. The operator Δ := div ∘ ∇ is called the Laplace (or Laplace-Beltrami) operator of the Riemannian manifold M. From (8) we obtain the Green formulas
∫_M f Δψ dν = −∫_M ⟨∇f, ∇ψ⟩ dν = ∫_M ψ Δf dν.   (9)

Now let μ be another measure on M defined by dμ = e^ϕ dν, where ϕ is a smooth function on M. The triple (Mⁿ, g, μ) is called a weighted manifold, or manifold with density. The associated divergence div_μ is defined by
div_μ F = (1/(e^ϕ √(det g_{ij}))) ∂/∂x^i ( e^ϕ √(det g_{ij}) F^i ),
and the Laplace-Beltrami operator Δ_μ of (Mⁿ, g, μ) is defined by
Δ_μ := div_μ ∘ ∇ = (1/e^ϕ) div(e^ϕ ∇·) = Δ· + ⟨∇ϕ, ∇·⟩.   (10)
It is easy to see that the Green formulas hold with respect to the measure μ, that is,
∫_M f Δ_μ ψ dμ = −∫_M ⟨∇f, ∇ψ⟩ dμ = ∫_M ψ Δ_μ f dμ,   (11)
provided f or ψ belongs to C₀^∞(M).

Theorem. Let S be a surface in M³ with density Ψ = e^ϕ. Then
div_ϕ N = −2 H_ϕ,   (12)
where H_ϕ is the ϕ-mean curvature of the surface S and N is the unit normal vector field of S.

Proof. By definition we have
div_ϕ N = (1/e^ϕ) div(e^ϕ N) = div N + ⟨∇ϕ, N⟩
= Σ_{i=1}^{2} ⟨∇_{e_i} N, e_i⟩ + ⟨∇ϕ, N⟩
= Σ_{i=1}^{2} ( e_i⟨e_i, N⟩ − ⟨∇_{e_i} e_i, N⟩ ) + ⟨∇ϕ, N⟩
= −2⟨H N, N⟩ + ⟨∇ϕ, N⟩
= −2( H − (1/2)⟨∇ϕ, N⟩ ) = −2 H_ϕ,
where we have used that ⟨e_i, N⟩ = 0 and the definition of the mean curvature vector.

References
[1] D. Bakry, M. Émery, Diffusions hypercontractives, Séminaire de Probabilités XIX (1983/1984), Lecture Notes in Math. 1123 (1985), 177-206.
[2] V. Bayle, Propriétés de concavité du profil isopérimétrique et applications, graduate thesis, Institut Fourier, Univ. Joseph-Fourier, Grenoble I, 2004.
[3] L. Belarbi, M. Belkhelfa, Heisenberg space with density, submitted.
[4] L. Belarbi, M. Belkhelfa, Surfaces in R³ with density, i-manager's Journal on Mathematics 1(1) (2012), 34-48.
[5] I. Corwin, N. Hoffman, S. Hurder, V. Sesum, Y. Xu, Differential geometry of manifolds with density, Rose-Hulman Und. Math. J. 7(1) (2006).
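The relation Δ_μ = Δ + ⟨∇ϕ, ∇·⟩ in (10) is easy to check symbolically. The following SymPy sketch (an illustration of mine, not from the talk) does so on Euclidean R³ with the linear density e^x and an arbitrary test function.

```python
# SymPy check: on Euclidean R^3 with density e^phi, the weighted Laplacian satisfies
#   (1/e^phi) div(e^phi grad f) = Delta f + <grad(phi), grad(f)>.
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
coords = (x, y, z)

phi = x                                   # linear density e^x
f = sp.exp(-x**2) * sp.sin(y) * z         # arbitrary smooth test function

grad = lambda h: sp.Matrix([sp.diff(h, c) for c in coords])
div = lambda V: sum(sp.diff(V[i], coords[i]) for i in range(3))

lhs = sp.exp(-phi) * div(sp.exp(phi) * grad(f))       # div_mu(grad f)
rhs = div(grad(f)) + grad(phi).dot(grad(f))           # Delta f + <grad phi, grad f>

print(sp.simplify(lhs - rhs))   # -> 0
```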
[6] M. Gromov, Isoperimetry of waists and concentration of maps, Geom. Funct. Anal. 13 (2003), 178-215.
[7] J. Lott, C. Villani, Ricci curvature for metric-measure spaces via optimal transport, Ann. of Math. 169(3) (2009), 903-991.
[8] N. Minh, D. T. Hieu, Ruled minimal surfaces in R³ with density e^z, Pacific J. Math. 243(2) (2009), 277-285.
[9] F. Morgan, Geometric Measure Theory: A Beginner's Guide, fourth edition, Academic Press, 2009.
[10] F. Morgan, Manifolds with density, Notices Amer. Math. Soc. 52 (2005), 853-858.
[11] G. Ya. Perelman, The entropy formula for the Ricci flow and its geometric applications, preprint, http://www.arxiv.org/abs/math.DG/0211159, 2002.
[12] A. Pressley, Elementary Differential Geometry, Second Edition, Springer, 2010.
[13] C. Rosales, A. Cañete, V. Bayle, F. Morgan, On the isoperimetric problem in Euclidean space with density, Calc. Var. Partial Differential Equations 31(1) (2008), 27-46.

THANK YOU FOR YOUR ATTENTION

ORAL SESSION 7 Hessian Information Geometry I (Michel Nguiffo Boyom)

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Complexification of Information Geometry in view of quantum estimation theory

Introduction
As H. Shima pointed out in his book:
• M : manifold with an affine structure (flat connection) (θ^i)  ⇒  TM : tangent bundle with a complex structure;
• M : manifold with a Hessian structure ((θ^i), g, ψ), g_ij(θ) = ∂_i ∂_j ψ(θ), ∂_i = ∂/∂θ^i (a dually flat structure)  ⇒  TM : tangent bundle with a Kähler structure, with Kähler potential ψ(θ).

Introduction (cont.)
A similar situation appears in the context of quantum estimation theory, where M will be replaced with a (classical and quantum) exponential family, and TM will be replaced with the complex projective space (the set of quantum pure states).

Classical Exponential Families
Let
• X : a finite set,
• P = P(X) := { p | p : X → (0, 1), Σ_{x∈X} p(x) = 1 },
• M = { p_θ | θ ∈ Θ (⊂ R^n) } (⊂ P), where
  p_θ(x) = p_0(x) exp[ Σ_{i=1}^{n} θ^i f_i(x) − ψ(θ) ],
  ψ(θ) := log Σ_{x∈X} p_0(x) exp[ Σ_{i=1}^{n} θ^i f_i(x) ].
We assume that {1, f_1, …, f_n} are linearly independent, which implies that θ ↦ p_θ is injective.
Geometrical Structure of Exponential Family
• Fisher information metric: g_ij = E_θ[∂_i log p_θ ∂_j log p_θ] = ∂_i ∂_j ψ(θ)
  (⇒ Cramér-Rao inequality: V(estimator) ≥ [g_ij]^{−1}).
• e-, m-connections (affine coordinates and the corresponding flat connections):
  θ^i → ∇^(e),  η_i := E_θ[f_i] → ∇^(m).
• Duality: Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z)
  ⇒ (M, g, ∇^(e), ∇^(m)) is dually flat.
• η̂ := (f_1, …, f_n) is an estimator achieving the Cramér-Rao bound (an efficient estimator).
• P itself is an exponential family.
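A quick numerical sanity check (mine, not part of the talk): for a classical exponential family on a finite set, the Fisher information metric g_ij = E_θ[∂_i log p_θ ∂_j log p_θ], equivalently the covariance of the sufficient statistics, coincides with the Hessian ∂_i∂_jψ(θ) of the potential. The reference density p0 and the statistics f below are arbitrary random choices.

```python
# NumPy sketch: Fisher metric of a finite exponential family equals Hessian of psi.
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 2                          # |X| = 6 sample points, n = 2 parameters
p0 = rng.random(d); p0 /= p0.sum()   # reference density p0
f = rng.standard_normal((n, d))      # sufficient statistics f_1, f_2 on X

def psi(theta):
    return np.log(np.sum(p0 * np.exp(theta @ f)))

theta = np.array([0.3, -0.7])
pt = p0 * np.exp(theta @ f - psi(theta))        # p_theta

# Fisher metric = covariance of sufficient statistics (score = f - E[f])
A = f - (f @ pt)[:, None]
g_fisher = (A * pt) @ A.T

# Hessian of psi by central finite differences
eps = 1e-4
g_hess = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei, ej = np.eye(n)[i], np.eye(n)[j]
        g_hess[i, j] = (psi(theta + eps*(ei + ej)) - psi(theta + eps*(ei - ej))
                        - psi(theta - eps*(ei - ej)) + psi(theta - eps*(ei + ej))) / (4 * eps**2)

print(np.allclose(g_fisher, g_hess, atol=1e-6))   # True
```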
Quantum State Space
Let H ≅ C^d be a Hilbert space with an inner product ⟨·|·⟩, and define
L(H) := { A | A : H → H linear } = {linear operators},
L_h(H) := { A ∈ L(H) | A = A* } = {hermitian operators},
S̄ := { ρ ∈ L_h(H) | ρ ≥ 0, Tr[ρ] = 1 } = {quantum states} = ∪_{r=1}^{d} S_r,
where S_r := { ρ ∈ S̄ | rank ρ = r }. We mainly treat S_1 and S_d in the sequel.

SLD Fisher Metric
Given a manifold M = { ρ_θ | θ = (θ^i) ∈ Θ } ⊂ S̄, let
• L_{θ,i} ∈ L_h(H) be such that ∂ρ_θ/∂θ^i = (1/2)(ρ_θ L_{θ,i} + L_{θ,i} ρ_θ): the Symmetric Logarithmic Derivatives, or SLDs, of M;
• g_ij := Re Tr[ ρ_θ L_{θ,i} L_{θ,j} ].
⇒ g = [g_ij] defines a Riemannian metric on M. In particular, every S_r becomes a Riemannian space with g.

SLD Fisher Metric (cont.)
• The metric g is a quantum version of the classical Fisher metric, and is called the SLD metric.
• A quantum version of the Cramér-Rao inequality: V(estimator) ≥ [g_ij]^{−1} (Helstrom, 1967).
• It is the minimum monotone metric (Petz, 1996).
• Every S_r becomes a Riemannian space with the SLD metric. How about the e-, m-connections and the dualistic structure?
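As a concrete illustration (mine, not from the talk), the sketch below computes the SLD metric for a hypothetical one-parameter qubit family by solving the defining equation ∂ρ/∂θ = (ρL + Lρ)/2 as a Sylvester equation with scipy.linalg.solve_sylvester; for a Bloch vector of constant length r = 0.8 rotating in a plane, the SLD metric is the constant r².

```python
# SLD Fisher metric of a one-parameter qubit family, via the Sylvester equation
#   rho L + L rho = 2 d(rho)/d(theta),   g(theta) = Re Tr[rho L L].
import numpy as np
from scipy.linalg import solve_sylvester

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rho(theta, r=0.8):
    # Bloch-vector family of mixed qubit states rotating in the x-z plane
    return 0.5 * (I2 + r * (np.cos(theta) * sz + np.sin(theta) * sx))

def sld_metric(theta, h=1e-6, r=0.8):
    drho = (rho(theta + h, r) - rho(theta - h, r)) / (2 * h)
    L = solve_sylvester(rho(theta, r), rho(theta, r), 2 * drho)
    return np.real(np.trace(rho(theta, r) @ L @ L))

print(sld_metric(0.3))   # ~ 0.64, i.e. r^2, constant along this family
```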
r = d : faithful states
• S_d = { ρ ∈ S̄ | ρ > 0 } = {faithful states}.
• Since S_d is an open subset of the affine space { A | A = A*, Tr A = 1 }, the m-connection ∇^(m) on S_d is defined as the natural flat connection.
• The e-connection ∇^(e) is defined as the dual of ∇^(m) with respect to g:
  Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z).
• R^(e) = 0 (curvature) but T^(e) ≠ 0 (torsion), so (S_d, g, ∇^(e), ∇^(m)) is not dually flat.

r = 1 : pure states
• S_1 = { |ξ⟩⟨ξ| : ξ ∈ H, ‖ξ‖ = 1 } = {pure states}.
• S_1 ≅ P(H) := (H \ {0})/∼ (the complex projective space), where ξ_1 ∼ ξ_2 ⇔ ∃c ∈ C, ξ_1 = c ξ_2.
• The SLD metric g on S_1 coincides with the well-known Fubini-Study metric on P(H) (up to constant).

S_1 ≅ P(H) as a complex manifold
• A (1, 1)-tensor field J satisfying J² = −1 (an almost complex structure) is canonically defined by
  J(∂/∂x^j) = ∂/∂y^j,  J(∂/∂y^j) = −∂/∂x^j,
  for an arbitrary holomorphic (complex analytic) coordinate system (z^j) = (x^j + √−1 y^j).
• g(JX, JY) = g(X, Y).
• A differential 2-form ω is defined by ω(X, Y) = g(X, JY).
• g (or (J, g, ω)) is a Kähler metric in the sense that ω is a symplectic form, dω = 0, or equivalently that there is a function f, called a Kähler potential, satisfying ω = (√−1/2) ∂∂̄f.
Kähler potential
Let
a_jk = g(∂/∂x^j, ∂/∂x^k) = g(∂/∂y^j, ∂/∂y^k),
b_jk = g(∂/∂y^j, ∂/∂x^k) = −g(∂/∂x^j, ∂/∂y^k).
Then f is a Kähler potential iff
a_jk = (1/4)( ∂²f/∂x^j∂x^k + ∂²f/∂y^j∂y^k )  and  b_jk = (1/4)( ∂²f/∂x^j∂y^k − ∂²f/∂y^j∂x^k ).
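As a sanity check of this criterion in one complex dimension (my own, not from the slides): with the affine-chart normalization g = (dx² + dy²)/(1 + x² + y²)² for the Fubini-Study metric on P(C²) (an assumed normalization; the slides fix the SLD/Fubini-Study metric only up to a constant), the potential f = log(1 + x² + y²) satisfies both identities.

```python
# SymPy check of the Kaehler-potential criterion for the Fubini-Study metric of CP^1
# in the chart z = x + i y, with g = (dx^2 + dy^2)/(1 + x^2 + y^2)^2 assumed.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = sp.log(1 + x**2 + y**2)

a11 = 1 / (1 + x**2 + y**2)**2            # g(d/dx, d/dx) = g(d/dy, d/dy)
b11 = 0                                    # g(d/dy, d/dx) vanishes for this metric

lhs_a = sp.Rational(1, 4) * (sp.diff(f, x, 2) + sp.diff(f, y, 2))
lhs_b = sp.Rational(1, 4) * (sp.diff(f, x, y) - sp.diff(f, y, x))

print(sp.simplify(lhs_a - a11))   # -> 0
print(sp.simplify(lhs_b - b11))   # -> 0
```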
Quasi-Classical Exponential Family (QCEF)
M = { ρ_θ | θ ∈ R^n } ⊂ S̄ is called a quasi-classical exponential family when it is represented as
ρ_θ = exp[ (1/2)( Σ_i θ^i F_i − ψ(θ) ) ] ρ_0 exp[ (1/2)( Σ_i θ^i F_i − ψ(θ) ) ],
where {F_1, …, F_n} ⊂ L_h(H), [F_i, F_j] := F_iF_j − F_jF_i = 0 (commutative), {ρ_0, F_1ρ_0, …, F_nρ_0} are linearly independent, and
ψ(θ) = log Tr[ ρ_0 exp( Σ_j θ^j F_j ) ].

Properties of QCEFs
• The e-, m-connections are defined by the affine coordinates and corresponding flat connections
  θ^i → ∇^(e),  η_i := Tr[ρ_θ F_i] → ∇^(m).
• (M, g, ∇^(e), ∇^(m)) is dually flat, where g is the SLD metric.
• Suppose M ⊂ S_d. Then M is e-autoparallel in S_d, and (g, ∇^(e), ∇^(m)) on M is induced from (S_d, g, ∇^(e), ∇^(m)).
• (F_1, …, F_n) is an estimator for the coordinates (η_1, …, η_n) achieving the SLD Cramér-Rao bound.

Properties of QCEFs (cont.)
• Since the {F_i} are commutative, there exist an orthonormal basis {|x⟩}_{x∈X} of eigenvectors, with X = {1, 2, …, d = dim H}, and functions (eigenvalues) f_i : X → R (i = 1, …, n) such that F_i = Σ_{x∈X} f_i(x) |x⟩⟨x|. Then we have
  p_θ(x) := ⟨x|ρ_θ|x⟩ = p_0(x) exp[ Σ_i θ^i f_i(x) − ψ(θ) ]
  (a classical exponential family), and M = {ρ_θ} ≅ {p_θ} with respect to (g, ∇^(e), ∇^(m)).
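The classical reduction just stated is easy to verify numerically. The following sketch (my illustration, with an arbitrary random ρ_0 and diagonal, hence commuting, F_i) checks that the diagonal of ρ_θ in the common eigenbasis reproduces the classical exponential family p_θ(x) = p_0(x) exp[Σ_i θ^i f_i(x) − ψ(θ)], and that ρ_θ is correctly normalized.

```python
# QCEF with commuting (diagonal) F_i reduces to a classical exponential family.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d, n = 4, 2
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho0 = A @ A.conj().T
rho0 /= np.trace(rho0).real                      # random faithful reference state

f = rng.standard_normal((n, d))                  # eigenvalues f_i(x) of diagonal F_i
F = [np.diag(f[i]).astype(complex) for i in range(n)]

theta = np.array([0.4, -0.2])
S = sum(t * Fi for t, Fi in zip(theta, F))
psi = np.log(np.trace(rho0 @ expm(S)).real)

E = expm(0.5 * (S - psi * np.eye(d)))            # exp[(sum theta F - psi)/2]
rho_theta = E @ rho0 @ E

p_quantum = np.real(np.diag(rho_theta))                     # <x|rho_theta|x>
p_classical = np.real(np.diag(rho0)) * np.exp(theta @ f - psi)

print(np.isclose(np.trace(rho_theta).real, 1.0))            # normalization -> True
print(np.allclose(p_quantum, p_classical))                   # classical reduction -> True
```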
Complexification of a pure state QCEF
Let M = {ρ_θ} be a quasi-classical exponential family,
ρ_θ = exp[ (1/2)( Σ_i θ^i F_i − ψ(θ) ) ] ρ_0 exp[ (1/2)( Σ_i θ^i F_i − ψ(θ) ) ]
(with the same assumptions on {F_i} as before), and suppose that M ⊂ S_1(H) ≅ P(H). For z = (z^1, …, z^n) ∈ C^n, z^i = θ^i + √−1 y^i with θ^i, y^i real, let
ρ_z := exp[ (1/2)( Σ_i z^i F_i − ψ(θ) ) ] ρ_0 exp[ (1/2)( Σ_i z̄^i F_i − ψ(θ) ) ] = U_y ρ_θ U_y*,
where U_y := exp[ (√−1/2) Σ_i y^i F_i ] is unitary.

Complexification of pure state QCEF (cont.)
Letting V be a neighborhood of R^n in C^n for which V ∋ z ↦ ρ_z is injective, define M̃ := { ρ_z | z ∈ V } (⊃ M = { ρ_θ | θ ∈ R^n }).
[Diagram: the map z ↦ ρ_z sends V ⊂ C^n into S_1 ≅ P(H), extending the map R^n ∋ θ ↦ ρ_θ ∈ M.]
• M̃ is a complex (holomorphic) submanifold of S_1 with a holomorphic coordinate system (z^i), and hence is Kähler with respect to g_M̃ = (Fubini-Study)|_M̃.
• When n = d − 1, M̃ is open in S_1.
• 4ψ(θ) gives a Kähler potential on M̃: ω_M̃ := ω|_M̃ = 2√−1 ∂∂̄ψ.
Complexification of pure state QCEF (cont.)
• (M̃, (η_i, y^i)) forms a Darboux coordinate system:
  ω_M̃ = Σ_{i=1}^{n} dη_i ∧ dy^i.
• Letting ∇^(m) be the flat connection with affine coordinates (η_i ; y^i) and ∇^(e) its dual with respect to g_M̃, we have
  ∇^(e) ∘ J = J ∘ ∇^(m)  and  ∇^(e) ω_M̃ = ∇^(m) ω_M̃ = 0.
This is similar to the case of Shima's observation on M and TM.
Relation to parallel displacement
The three conditions above can be restated as follows:
• duality: ∀X, Y, Z, Xg(Y, Z) = g(∇^(e)_X Y, Z) + g(Y, ∇^(m)_X Z);
• ∇^(e) ∘ J = J ∘ ∇^(m): ∀X, Y, ∇^(e)_X J(Y) = J(∇^(m)_X Y);
• ∇^(e) ω = ∇^(m) ω = 0, where
  ∇^(e) ω = 0 ⇔ ∀X, Y, Z, Xω(Y, Z) = ω(∇^(e)_X Y, Z) + ω(Y, ∇^(e)_X Z),
  ∇^(m) ω = 0 ⇔ ∀X, Y, Z, Xω(Y, Z) = ω(∇^(m)_X Y, Z) + ω(Y, ∇^(m)_X Z).
In terms of parallel displacement, write X →(e) X′ (resp. X →(m) X′) when X′ is the ∇^(e)- (resp. ∇^(m)-) parallel transport of X along a given curve. Then:
• g(X, Y) = g(X′, Y′) whenever X →(e) X′ and Y →(m) Y′ (duality);
• ω(X, Y) = ω(X′, Y′) whenever X →(e) X′ and Y →(e) Y′, or X →(m) X′ and Y →(m) Y′;
• X →(e) X′ iff J(X) →(m) J(X′), and X →(m) X′ iff J(X) →(e) J(X′).

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Fisher Information Geometry of the Barycenter of Probability Measures Mitsuhiro Itoh and Hiroyasu Satoh Institute of Mathematics, University of Tsukuba, Japan and Tokyo Denki University, Japan Motivation. Consider the following character- ization problem. Let (Xo, go) be a Damek-Ricci space. Let (X, g) be an Hadamard manifold, a simply connected complete Riemannian manifold of nonpositive curvature. Assume (X, g) ∼= (Xo, go) (quasi-isometric). Then, is (X, g) itself Damek- Ricci ? Here (Xo, go) is Damek-Ricci, an R-extention of a generalized Heisenberg group N. A Damek- Ricci space is a solvable Lie group with a left invariant metric. A Damek-Ricci space is Riemannian homogeneous and of nonpos- itive curvature. Moreover, a Damek-Ricci space is harmonic, namely, mean curvature of a geodesic sphere is a function of radius. A Damek-Ricci space is a rank one symmetric space of noncompact type, when it is strictly negative curvature. RHn, CHn, HHn and a Cayley hyperbolic space QH2 exhaust the rank one symmetric spaces of noncompact type. §1 Barycenter and barycenter-isometric maps Denote by ∂X the ideal boundary of (X, g). Let P+(∂X) = P+(∂X, dθ) be the space of probability measures on ∂X, absolutely con- tinuous with respect to the canonical mea- sure dθ and having positive density function. So any µ ∈ P+(∂X) is written as µ(θ) = f(θ)dθ, θ ∈ ∂X, f(θ) > 0. Let Bθ(x) = B(x, θ), x ∈ X, θ ∈ ∂X be the Busemann function on X associated to θ, normalized at a reference point o, defined by Bθ(x) = lim t→∞ {d(x, γ(t)) − t}, where γ(t) denotes the geodesic starting o and going to θ. It holds |∇Bθ(·)| = 1 at any point x. Further we have the so-called Busemann cocycle formula with respect to a Riemannian isom- etry φ of (X, g) Bθ(φx) = Bφ−1θ(x) + Bθ(o) ∀ (x, θ) ∈ X × ∂X See [G-J-T]. Definition 1.1. Let µ ∈ P+(∂X). Then a point y ∈ X is called a barycenter of µ, if the function Bµ : X → R, defined by Bµ(x) = ∫ ∂X Bθ(x)dµ(θ) (1) takes a least value at y. Note the Busemann function and hence the function Bµ(x) is convex in a non-positively curved manifold X. To discuss the existence and uniqueness of barycenter we need fur- ther the strict convexity hypothesis on the Busemann function. Proposition 1.1. Let (X, g) be an Hadamard manifold. Assume that the Hessian DdB(x,θ) of any Busemann function B(x, θ) is strictly positive, except for the gradient direction ∇B(·, θ). Then, there exists uniquely a barycenter for every µ ∈ P+(∂X). See [B-C-G-1, Appendice A] for the proof. We have thus a map bar : P+(∂X, dθ) → X; µ → y, and call it the barycenter map and write y = bar(µ). Note. bar(ˆφ♯µ) = φ(bar(µ)) for a Riemannian isometry φ of (X, g). Here ˆφ denotes the bijective map (homeomorphism) : ∂X → ∂X induced from φ. Remark. in [B-C-G-1] Besson, Courtois and Gallot utilize the notion of barycenter to as- sert the Mostow rigidity of hyperbolic man- ifolds. In fact, let f : ∂X → ∂X be a cer- tain map, where X = RHn, n ≥ 3, a real hyperbolic space. Then, there exists a map F : X → X; F(y) = bar(f∗µy), y ∈ X asso- ciated to the map f, where µy ∈ P+(∂X, dθ) is a special probability measure, called Pois- son kernel probability measure, appeared in §2 and they showed F : X → X is an isometry by using Schwarz’s iequality lemma([B-C-G- 1]). Now, let Φ : ∂X → ∂X be a bijective map (homeomorphism) and Φ♯ : P+(∂X, dθ) → P+(∂X, dθ) be the push-forward map induced from Φ. Φ♯ satisfies from definition of push-forward map ∫ θ∈∂X f(θ) d[Φ♯µ](θ) = ∫ θ∈∂X (f ◦ Φ)(θ) dµ(θ) for any function f = f(θ) on ∂X. Definition 1.2. 
We consider the following sit- uation: The map Φ♯ yields a bijective map φ : X → X satisfying bar ◦ Φ♯ = φ ◦ bar in the diagram P+(∂X, dθ) Φ♯ −→ P+(∂X, dθ) (2) ↓ bar ↓ bar X φ −→ X We call such a φ a barycenter-isometric map of (X, g) and denote it bar(Φ). Lemma 1.1. The composition φ◦φ1 of barycenter- isometric maps φ = bar(Φ), φ1 = bar(Φ1) is also barycenter-isometric, with φ ◦ φ1 = bar(Φ ◦ Φ1). Proof. With respect to their composition one can check bar(Φ ◦ Φ1) = bar(Φ) ◦ bar(Φ1) (3) Theorem 1.1. Let φ : X → X be a barycenter- isometric map induced from a homeomorphic map Φ : ∂X → ∂X. Assume that φ is of C1, then φ is a Riemannian isometric map of (X, g), i.e., φ fulfills φ∗g = g. (4) For its proof we need the notion of Fisher in- formation geometry together with the Pois- son kernel. §2. Poisson kernel and Fisher Information Geometry Now we assume that an Hadamard manifold (X, g) admits Poisson kernel. Definition 2.1. A function P(x, θ) of (x, θ) ∈ X ×∂X is called Poisson kernel, when (i) it induces the fundamental solution of the Dirichlet problem at the ideal boundary ∆u = 0 on X and u|∂X = f for a given data f ∈ C(∂X) so the solution u is described as u = u(x) = ∫ ∂X P(x, θ)f(θ)dθ, (ii) P(x, θ) > 0 for any (x, θ). Then, the mea- sure P(x, θ)dθ is a probability measure on ∂X parametrized by a point x of X and (iii) P(o, θ) = 1 for any θ(normalization at the ref. point o). (iv) limx→θ1 P(x, θ) = 0, ∀θ, θ1 ∈ ∂X, θ1 ̸= θ A Damek-Ricci space admits a Poisson kernel described specifically as P(x, θ) = exp{−QB(x, θ)}. in terms of B(x, θ) and the volume entropy Q > 0. See [B-C-G-1], [I-S-1], [I-S-2],[I-S-3], [A-B] Lemma 2.1. µx := P(x, θ)dθ ∈ P+(∂X) is a probability measure, parametrized in x for which bar(µx) = x. For a point x ∈ X, let bar−1(x) := {µ ∈ P+(∂X) | bar(µ) = x}. Then, the set bar−1(x) ⊂ P+(∂X) is path- connected and we can discuss the tangent space Tµbar−1(x) to bar−1(x), and then ν ∈ TµP+(∂X) belongs to Tµbar−1(x) if and only if ∫ θ dB(x,θ)(U) dν(θ) = 0 for any tangent vector U ∈ TxX. Now we take µx = P(x, θ)dθ. Then µx ∈ bar−1(x), seen as before. Let Θ : X → P+(∂X); x → µx be the canon- ical map, which we call Poisson kernel map. Proposition 2.1. Let x be a fixed point and U a tangent vector at x. For any ν ∈ Tµxbar−1(x) G(dΘx(U), ν) = 0, where G is the Fisher information metric de- fined on the space P+(∂X). From the proposition we have the fibration of P+(∂X) over the Hadamard manifold X whose fibre over x is bar−1(x). Further the Poisson kernel map Θ : X → P+(∂X) gives a cross section of the fibration. Proof of Proposition 2.1. Since P(x, θ) = exp{−QB(x, θ)}, dΘx(U) = −Q dB(x,θ)(U) µx which we denote by νo. Then, from definition of the Fisher information metric we have Gµx(νo, ν) = ∫ ∂X dνo dµx dν dµx dµx = ∫ −QdB(x,θ)(U)P(x, θ) P(x, θ) × f(θ) P(x, θ) P(x, θ)dθ = −Q ∫ dB(x,θ)(U)dν(θ) which must be zero, since ν = f(θ)dθ belongs to Tµxbar−1(x). Remark. At µx ∈ P+(∂X) the tangent space TµxP+(∂X) is written in an orthogonal direct sum as TµxP+(∂X) = dΘx(TxX) ⊕ Tµxbar−1(x) (5) with respect to the Fisher information metric G. Remark. (5) is valid also with respect to the L2-inner product < f, f1 >= ∫ ∂X f(θ) f1(θ) dθ. Here, the differential of the Poisson kernel map (dΘ)x : TxX → TµxP+(∂X) is injec- tive. In fact, assume that (dΘ)x(U) = 0 in TµxP+(∂X) for U ∈ TxX. Then, this means dB(x,θ)(U)P(x, θ)dθ = 0. Since P(x, θ) > 0, this implies dB(x,θ)(U) = 0 for any θ. To get a conclusion that U = 0 from this we assume U is not zero and then may assume U is unit. 
Then, we have a geodesic γ(t) = expx tU and hence a point θo = [γ] so dB(x,θo)(U) = −1. This is a contradiction and thus the map dΘx is injective. Proof of Theorem 1.1. For a x ∈ X let y = ϕx, where ϕ = bar(Φ). From definition of barycenter for any µ ∈ bar−1(x) ∫ dB(x,θ)(U) dµ(θ) = 0, ∀U ∈ TxX. Since Φ♯µ ∈ bar−1(y) for µ ∈ bar−1(x), y is a barycenter of Φ♯µ if and only if ∫ dB(y,θ)(V ) d(Φ♯µ)(θ) = 0, ∀V ∈ TyX. Since θ = Φ−1Φθ, from this we have the fol- lowing ∫ dB(y,Φ−1Φθ)(V ) d(Φ♯µ)(θ) = ∫ (Φ♯dB(y,Φθ)(V )dµ)(θ) = ∫ dB(y,Φθ)(V )dµ(θ) = 0 which is valid for any µ ∈ bar−1(x) and in- dicates that dB(y,Φθ)(V )dµ(θ) is orthogonal to the tangent space Tµbar−1(x). In par- ticular, dB(y,Φθ)(V )dµx(θ) belongs from (5) to dΘx(TxX). So, we conclude that for any V ∈ TϕxX there exists U ∈ TxX such that dB(ϕx,Φθ)(V ) = dB(x,θ)(U). The vector V depends on a vector U so we may write V = dϕxU, where dϕx is the defferential map : TxX → TϕxX of the map ϕ. Then, we may assume ⟨∇B(ϕx,Φθ), dϕx(U)⟩ϕx = ⟨∇B(x,θ)), U⟩x, which is reduced into, by using the formal adjoint dϕ∗ x : TϕxX → TxX ⟨dϕ∗ x∇B(ϕx,Φθ), U⟩x = ⟨∇B(x,θ)), U⟩x, for any U. As a consequence of this, the gradient vector fields must satisfy dϕ∗ x∇B(ϕx,Φθ) = ∇B(x,θ) (6) for any x in X and θ ∈ ∂X. Now take an arbitrary unit vector V ∈ TϕxX. So, we have V = ∇B(ϕx,Φθ) for some θ. Then from the above equation we have |dϕ∗ xV | = |dϕ∗ x∇B(ϕx,Φθ)| = |∇B(x,θ)| = 1 where we used |∇B(x,θ)| = 1. This holds for any unit vector so dϕ∗ x and hence dϕx : TxX → TϕxX is a linear isometry and hence ϕ : X → X is a Riemannian isometry of (X, g). §3 Quasi-isometries and quasi-geodesics Let X be an Hadamard manifold with the ideal boundary ∂X. Definition 3.1 Let φ : X → X be a (smooth) map. It is called rough-isometric , or quasi- isometric, when φ satisfies the following, that is, there exist λ > 1 and k > 0 such that for any points x, x′ in X 1 λ d(x, x′) − k < d(φ(x), φ(x′)) < λd(x, y) + k.(7) Note a rough-isometric map is not necessarily continuous. See [Bourd], More generally, a map f : X1 → X2 is called a (λ, k)-quasi-isometric map, if there exist con- stants λ > 1 and k > 0 such that λ−1d1(x, x′) − k < d2(Fx, Fx′) < λd1(x, x′) + k A quasi-isometric map is a generalization of an isometric map. Note we say a (λ, k)-quasi-isometric map sim- ply a quasi-isometric map by abbreviating, when we do not mention the constants λ, k, precisely We say that metric spaces X1 and X2 are quasi-isometric, X1 ∼= X2 (quasi-isometric), if they satisfy one of the following two con- ditions; (i) There exist quasi-isometric maps f : X1 → X2 and g : X2 → X1 and a positive num- ber ε such that g ◦ f and f ◦ g are in an ε- neighborhood of the identity maps idX1 , idX2 , respectively. (ii) There exist a quasi-isometry f : X1 → X2 and ε > 0 such that f(X1) is ε-dense in X2. Let (X, g) be a Riemannian manifold which is quasi-isometric to another Riemannian mani- fold (Xo, go). Then any Riemannian isometry of (Xo, go) induces a bijective quasi-isometric map of (X, g). A curve c : R → X is called a quasi-geodesic, if c is a quasi-isometric map, that is, λ−1|t′ − t| − k < d(c(t), c(t′)) < λ|t′ − t| + k, t, t′ ∈ R for some λ > 1, k > 0. We also call a curve c : [a, b] → X a quasi-geodesic segment, when it satisfies the above inequality in any t, t′ ∈ [a, b]. A geodesic is quasi-geodesic. A quasi-isometric map f : (Xo, go) → (X, g) maps a geodesic γ : R → Xo into a quasi- geodesic f ◦ γ : R → X. Moreover, it holds that let φ : X → X be a quasi-isometric and γ : R → X be a quasi-geodesic. 
Then the curve φ ◦ γ : R → X is quasi-geodesic. Let F : ∂Xgeod → ∂Xq−geod : [γ]geod → [γ]q−geod(8) be the inclusion map. If an Hadamard mani- fold (X, g) satisfies a certain negative curva- ture condition or a hyperbolicity condition, then the F is bijective. In fact, if the cur- vature satisfies K < −k2 < 0, so is F. See [K-1] for a strictly negative curvature case and [Bourd], [K-2] for the case of manifolds satisfying the hyperbolicity condition. Now we consider the following situation: an Hadamard manifold (X, g) is quasi-isometric with another Hadamard manifold (Xo, go) which is equipped with isometries. An isometry φ of (Xo, go) gives rise of a quasi- isometric bijective map of (X, g). So, φ in- duces a bijective map ˆφ : ∂Xq−geod → ∂Xq−geod, since, for any quasi-geodesic σ, φ◦σ is quasi- geodesic, and if σ ∼ σ1, then φ ◦ σ ∼ φ ◦ σ1. However, ∂Xq−geod is identified with ∂Xgeod = ∂X by the natural map F. So, φ induces a bijective map ˜φ = F ◦ ˆφ ◦ F−1 : ∂X → ∂X. References [A-N] S.Amari and H.Nagaoka, Methods of Information Geometry, AMS,2000. [B-G-S] W.Ballmann, M.Gromov and V.Schroeder, Manifolds of Nonpositive Curvature, Birkh¨auser, 1985, Boston. [Bar] F. Barbaresco, Chap.9 Information Ge- ometry of Covariance Matrix: Cartan-Siegel Homogeneous Bounded Domains, Mostow/Berger Fibration and Fr´echet Median, pdf file, 2013. [Bern-T-V] J. Berndt, F.Tricerri and L.Vanhecke, Generalized Heisenberg Groups and Damek- Ricci Harmonic Spaces, Lecture Notes, 1598, Springer, 1991. [B-C-G-1] G.Besson, G.Courtois and S.Gallot, Entropies et Rigidit´es des espaces localement sym´etriques de courbure strictement n´egative, Geom Func. Anal. 5(1995), 731-799. [B-C-G-2] G.Besson, G.Courtois and S.Gallot, A simple and constructive proof of Mostow’s rigidity and the minimal entropy theorems, Erg. Th. Dyn. Sys., 16(1996), 623-649. [Bourd] M. Bourdon, Structure conforme au bord et flot g´eod´esique d’un CAT(-1)-espace, L’Enseignement Math., 41(1995), 63-102. [D-E] E. Douady, C. Earle, Conformally nat- ural extension of homeomorphisms of the cir- cle, Acta Math., 157(1986), 23-48. [G-J-T] Y. Guivarc’h, L. Ji and J.C. Tay- lor, Compactifications of Symmetric Spaces, Birkh¨auser, 1997. [I-Sat-1] M.Itoh and H.Satoh, Information geometry of Poisson Kernels on Damek-Ricci spaces, Tokyo J.Math., 33(2010), xx-xx. [I-Sat-2] M.Itoh and H.Satoh, Fisher Infor- mation Geometry, Poisson Kernel and Asymp- totically Harmonicity, Differ. Geom. Appl., 29(2011), S107-S115. [I-Shi] M.Itoh and Y.Shishido, Fisher Infor- mation Metric and Poisson Kernels, Differ. Geom. Appl., 26(2008), 347-356. [K-1] G.Knieper, Hyperbolic Dynamics and Riemannian Geometry, in Handbook of Dy- namical Systems, Vol.1A, edited by B.Hasselblatt and A.Katok, 453-545, Elsevier Science B.V., 2002. [K-2] G.Knieper, New results on noncompact harmonic manifold, Comment. Math. Helv., 87(2012),669-703. [S] T.Sakai, Riemannian Geometry, AMS, 2000.
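To make Definition 1.1 and Lemma 2.1 tangible, here is a small numerical sketch (my own illustration, not from the talk) in the Poincaré disk model of RH² with Q = 1, where I assume the standard closed forms B_θ(x) = log(|x − θ|²/(1 − |x|²)) for the Busemann function normalized at the origin and P(x, θ) = (1 − |x|²)/|x − θ|² = e^{−B_θ(x)} for the Poisson kernel. Minimizing the discretized B_μ for μ = μ_y recovers y, and the uniform boundary measure has barycenter at the origin.

```python
# Barycenters of boundary measures in the Poincare disk (RH^2), Definition 1.1 / Lemma 2.1.
import numpy as np
from scipy.optimize import minimize

m = 400
angles = 2 * np.pi * np.arange(m) / m
boundary = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # discretized ideal boundary

def B_mu(x, weights):
    r2 = np.dot(x, x)
    if r2 >= 1.0:
        return np.inf                       # stay inside the disk
    return np.sum(weights * np.log(np.sum((boundary - x)**2, axis=1) / (1.0 - r2)))

def barycenter(weights):
    res = minimize(B_mu, x0=np.zeros(2), args=(weights,), method='Nelder-Mead')
    return np.round(res.x, 3)

y = np.array([0.5, -0.3])
poisson = (1 - np.dot(y, y)) / np.sum((boundary - y)**2, axis=1)   # P(y, theta)
mu_y = poisson / poisson.sum()

print(barycenter(mu_y))                     # approximately [ 0.5 -0.3]: bar(mu_x) = x
print(barycenter(np.full(m, 1.0 / m)))      # uniform measure: approximately [0. 0.]
```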

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Foliations on Affinely Flat Manifolds. Information Geometry
Robert Wolak, Jagiellonian University, Krakow (Poland)
joint work with Michel Nguiffo Boyom, UMR CNRS 5149, I3M, Département de Mathématiques et de Modélisation, Université Montpellier 2, Montpellier
GSI2013 - Geometric Science of Information, MINES ParisTech, Paris, 28-08-2013 - 30-08-2013

Contents
1. Algebraic preliminaries: Koszul-Vinberg algebra (KV-algebra), algebroid of Koszul-Vinberg, twisted KV-cochain complex, Chevalley-Eilenberg complex of the twisted module
2. Foliations on locally flat manifolds
3. Fisher information metric
4. Dual pairs of connections
5. Foliations on locally flat manifolds (cont.): Ehresmann connections, topological properties

An algebra A is an R-vector space endowed with a bilinear map μ : A × A → A. This map μ is the multiplication map of A. For a, b ∈ A, ab will stand for μ(a, b). Given an algebra A, the Koszul-Vinberg anomaly (KV-anomaly) of A is the three-linear map KV : A³ → A defined by
KV(a, b, c) = (ab)c − a(bc) − (ba)c + b(ac).

Definition. An algebra A is called a Koszul-Vinberg algebra (KV-algebra) if its KV-anomaly vanishes identically.

Definition. An algebroid of Koszul-Vinberg is a couple (V, a), where V is a vector bundle over the base manifold M whose module of sections Γ(V) has the structure of a Koszul-Vinberg algebra over R, and a is a homomorphism of the vector bundle V into TM satisfying the following properties:
(i) (fs) · s′ = f(s s′) for all s, s′ ∈ Γ(V) and f ∈ C^∞(M, R);
(ii) s · (fs′) = (a(s)f) s′ + f(s s′).

Remark.
(i) If we equip Γ(V) with the bracket [s, s′] = s s′ − s′ s, then the Koszul-Vinberg algebroid (V, a) becomes a Lie algebroid.
(ii) The condition [s, fs′] = (a(s)f) s′ + f[s, s′] ensures that a is a homomorphism of Lie algebras, i.e. a([s, s′]) = [a(s), a(s′)].
The vector space spanned by the commutators [a, b] = ab − ba of a KV-algebra A is a Lie algebra, denoted A_L.

Let A be a Koszul-Vinberg algebra; an A-module is a vector space W equipped with a right action and a left action of A related by the following equalities: for any a, b ∈ A and any w ∈ W,
a(bw) − (ab)w = b(aw) − (ba)w  and  a(wb) − (aw)b = w(ab) − (wa)b.
Let A = (X(M), ·) be the algebra with multiplication given by X · Y = D_X Y; then A is a Koszul-Vinberg algebra and the space T(M) of tensors on M is a two-sided A-module. T(M) is bigraded by the subspaces T^{p,q}(M) of tensors of type (p, q).

Twisted KV-cochain complex
Let A be a KV-algebra and let W be a two-sided KV-module over A. We equip the vector space W with the left module structure A × W → W defined by
a * w = aw − wa,  for all a ∈ A, w ∈ W.   (1)
One has KV(a, b, w) = (a, b, w) − (b, a, w) = 0, where (a, b, w) = (ab) * w − a * (b * w).

Definition. The left KV-module structure defined by (1) is called the twisted KV-module structure derived from the two-sided KV-module W. The vector space W endowed with the twisted module structure is denoted W_τ.

The map (a, w) → a * w defines on W_τ a left module structure over the Lie algebra A_L. The complex C_CE(A_L, W_τ) is called the Chevalley-Eilenberg complex of the twisted module. Let A be a KV-algebra and let W be a two-sided KV-module over A.
We consider the graded vector space C_KV(A, W_τ) = ⊕_{q∈Z} C^q_KV(A, W_τ), where C^q_KV(A, W_τ) = {0} if q < 0, C^0_KV(A, W_τ) = W_τ, and for q ≥ 1, C^q_KV(A, W_τ) = Hom_R(⊗^q A, W_τ). If there is no risk of confusion, C(A, W_τ) will stand for C_KV(A, W_τ).

Let us define the linear mapping d : C^q(A, W_τ) → C^{q+1}(A, W_τ): for all w ∈ W_τ, f ∈ C^q(A, W_τ), a ∈ A and ζ = a_1 ⊗ … ⊗ a_{q+1} ∈ ⊗^{q+1} A,
(dw)(a) = −aw + wa,
(df)(ζ) = Σ_{i=1}^{q+1} (−1)^i { a_i * (f(∂_i ζ)) − f(a_i . ∂_i ζ) },   (2)
where the action a_i . ∂_i ζ is defined by the standard tensor product extension.

Theorem.
(i) The pair (C(A, W_τ), d) is a cochain complex whose q-th cohomology space is denoted H^q_KV(A, W_τ).
(ii) The graded space C_N(A, W_τ) = W ⊕ ⊕_{q>0} Hom(∧^q A, W_τ) is a subcomplex of (C(A, W_τ), d) whose cohomology coincides with the cohomology of the Chevalley-Eilenberg complex C_CE(A_L, W_τ).

Let (M, ∇) be a locally flat manifold, A_∇ = (X(M), ∇) the KV-algebra associated to (M, ∇), and W_τ = C^∞(M) the left KV-module over A_∇ under the covariant derivative. Let C_0(A_∇, W_τ) be the vector subspace of C(A_∇, W_τ) formed by cochains of order 0; thus C_0(A_∇, W_τ) consists of the C^∞(M)-multilinear mappings.

Theorem. The second cohomology space H²_0(A_∇, W_τ) can be decomposed as follows:
H²_0(A_∇, W_τ) = H²_dR(M) ⊕ H⁰(A_∇, Hom(S²A_∇, W_τ)),   (3)
where H²_dR(M) is the 2nd de Rham cohomology space of M.

H(A_∇, W_τ) = ⊕_{q≥0} H^q(A_∇, W_τ) is a geometric invariant of (M, ∇); b_q(∇) = dim H^q_0(A_∇, C^∞(M)) is the q-th Betti number of (M, ∇), and b_q(M) = dim H^q_dR(M, R) is the classical q-th Betti number of M. One has b_q(M) ≤ b_q(∇).
M. Nguiffo Boyom, F. Ngakeu, P. M. Byande, R. Wolak, KV-cohomology and differential geometry of affinely flat manifolds. Information geometry, African Diaspora Journal of Mathematics, Special Volume in Honor of Prof. Augustin Banyaga, Vol. 14, no. 2, pp. 197-226 (2012).

Definition. Let (M, ∇) be a locally flat manifold.
(i) A totally geodesic foliation F of (M, ∇) is called an affine foliation.
(ii) A totally geodesic foliation F of M is transversally euclidean if its normal bundle TM/TF is endowed with a ∇-parallel (pseudo-)euclidean scalar product.
Let Q(M) = Hom_{C^∞(M)}(S²A, W_τ) be the vector space of tensorial quadratic forms on (sections of) TM. For σ ∈ H⁰_KV(A, Q(M)), let σ̄ be the quadratic form on TM/ker σ deduced from σ, and let sign(σ) be the Morse index of σ̄. We define the following numerical invariants.

Definition. We set
ρ_∇(M) = min{ ρ_∇(σ) = dim ker σ, σ ∈ H⁰(A, Q(M)) }
and
S_∇(M) = min{ S_∇(σ) = dim ker σ + sign(σ), σ ∈ H⁰(A, Q(M)) }.

Let (Ξ, Ω) be a measurable set and Θ ⊂ R^n a connected subset.

Definition. A connected open subset Θ ⊂ R^n is an n-dimensional statistical model for a measurable set (Ξ, Ω) if there exists a real-valued positive function p : Θ × Ξ → R subject to the following requirements:
(i) for every fixed ξ ∈ Ξ the function θ → p(θ, ξ) is smooth;
(ii) for every fixed θ ∈ Θ the function ξ → p(θ, ξ) is a probability density in (Ξ, Ω), viz. ∫_Ξ p(θ, ξ) dξ = 1;
(iii) for every fixed ξ ∈ Ξ there exists a couple (θ, θ′) such that p(θ, ξ) ≠ p(θ′, ξ).
Let ∇ be a torsion-free linear connection on the manifold Θ and set ℓ(θ, ξ) = log(p(θ, ξ)).

At each point θ ∈ Θ we define the family {q_{(θ,ξ)}} of bilinear forms. Let (X, Y) be a couple of smooth vector fields on Θ. We put
q_{(θ,ξ)}(X, Y) = −(∇ dℓ)(X, Y)(θ, ξ).
Since ∇ is torsion-free, q_{(θ,ξ)}(X, Y) is symmetric with respect to the couple (X, Y).

Definition. The Fisher information g of the local model (Θ, p) is the mathematical expectation of the bilinear form q_{(θ,ξ)}:
g(X, Y)(θ) = ∫_Ξ p(θ, ξ) q_{(θ,ξ)}(X, Y) dξ.
The Fisher information g does not depend on the choice of the symmetric connection ∇. The Fisher information g is positive semi-definite; when g is definite it is called the Fisher metric of the model (Θ, p).
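A one-line illustration (mine, not from the talk): for the one-parameter model p(θ, ξ) = θ e^{−θξ} on Ξ = (0, ∞), taking ∇ to be the flat connection on Θ ⊂ R, the expectation of q_{(θ,ξ)} gives the familiar Fisher information 1/θ², and agrees with E[(∂_θ log p)²].

```python
# SymPy sketch: Fisher information of the exponential model p = theta*exp(-theta*xi)
# as the expectation of q = -(D d log p), compared with E[(d log p / d theta)^2].
import sympy as sp

theta = sp.symbols('theta', positive=True)
xi = sp.symbols('xi', positive=True)

p = theta * sp.exp(-theta * xi)
l = sp.log(p)

q = -sp.diff(l, theta, 2)                                     # q_(theta,xi)(d_theta, d_theta)
g = sp.integrate(p * q, (xi, 0, sp.oo))                        # Fisher information
g_alt = sp.integrate(p * sp.diff(l, theta)**2, (xi, 0, sp.oo))

print(sp.simplify(g), sp.simplify(g_alt))   # both equal 1/theta**2
```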
Let (X, Y) be a couple of smooth vector fields on Θ. We put q_(θ,ξ)(X, Y) = −(∇ d ln)(X, Y)(θ, ξ). Since ∇ is torsion free, q_(θ,ξ)(X, Y) is symmetric in (X, Y).
Definition. The Fisher information g of the local model (Θ, p) is the mathematical expectation of the bilinear form q_(θ,ξ): g(X, Y)(θ) = ∫_Ξ p(θ, ξ) q_(θ,ξ)(X, Y) dξ.
The Fisher information g does not depend on the choice of the symmetric connection ∇. The Fisher information g is positive semi-definite. When g is definite it is called the Fisher metric of the model (Θ, p). GSI2013 Foliations on Affinely Flat Manifolds. 13/17
The dualistic relation between linear connections. Definition. In a Riemannian manifold (M, g), a couple (∇, ∇*) of linear connections are dual to each other if the identity X g(Y, Z) = g(∇_X Y, Z) + g(Y, ∇*_X Z) holds for all vector fields X, Y, Z on the manifold M.
Consider a dual pair (∇, ∇*) in a Riemannian manifold (M, g) and assume that both (M, ∇) and (M, ∇*) are locally flat structures. They define the pair ([ρ_∇], [ρ_∇*]) of conjugacy classes of canonical representations. Therefore we have the following two properties.
Theorem. The pair ([ρ_∇], [ρ_∇*]) does not depend on the choice of the Riemannian structure g. GSI2013 Foliations on Affinely Flat Manifolds. 14/17
Theorem. Every locally flat manifold (M, ∇) whose second twisted cohomology space H^2_0(A_∇, W_τ) differs from the de Rham cohomology space H^2_dR(M) is either a flat (pseudo-)Riemannian manifold or is foliated by a pair (F, F*) of g-orthogonal foliations for every Riemannian metric g. Moreover, these foliations are totally geodesic w.r.t. the g-dual pair (D, D*) (respectively). GSI2013 Foliations on Affinely Flat Manifolds. 15/17
TM = TF ⊕ TF*. Define a torsion-free linear connection by setting D̃_(X1,X2)(Y1, Y2) = (D_X1 Y1 + [X2, Y1], D*_X2 Y2 + [X1, Y2]) for all (X1, X2), (Y1, Y2) ∈ Γ(TF) × Γ(TF*). D̃ is the unique torsion-free linear connection which preserves (F, F*). GSI2013 Foliations on Affinely Flat Manifolds. 16/17
Assume that one of the connections (D, D*) is geodesically complete. The foliations are then Ehresmann connections for the other, and:
– the universal coverings of the leaves of the foliation F (respectively F*) are D-affinely (respectively D*-affinely) isomorphic;
– the universal covering M̃ of the manifold M is the product K × L, where K is the universal covering of the leaves of the foliation F and L is the universal covering of the leaves of the foliation F*.
Assume that the connection D is complete. Then the restriction of D to the leaves of F is complete and each leaf of F is a geodesically complete locally flat manifold, so its universal covering is diffeomorphic to R^p, where p is the dimension of the leaves of F. The same is true if the connection D* is complete. GSI2013 Foliations on Affinely Flat Manifolds. 17/17
Merci. Thank you.
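As a concrete numerical illustration of the Fisher information just defined, the following Python sketch (not part of the talk; the Gaussian model, the flat coordinate connection on Θ and the finite-difference scheme are assumptions made purely for illustration) evaluates g(X, Y)(θ) = ∫_Ξ p(θ, ξ) q_(θ,ξ)(X, Y) dξ by numerical integration and compares the result with the classical closed form diag(1/σ², 2/σ²) for the normal family.

import numpy as np

# Numerical check of the Fisher information of the talk:
#   g(X, Y)(theta) = \int_Xi p(theta, xi) q_(theta,xi)(X, Y) dxi,
#   q_(theta,xi)(d_i, d_j) = -(nabla d ln)(d_i, d_j) = -d_i d_j log p(theta, xi),
# taking nabla to be the flat coordinate connection on Theta.
# Model used purely for illustration: the Gaussian family, theta = (mu, sigma).

def log_p(theta, xi):
    mu, sigma = theta
    return -0.5 * ((xi - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

def fisher_matrix(mu, sigma, h=1e-4):
    xi = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 40001)   # integration grid over Xi
    dx = xi[1] - xi[0]
    p = np.exp(log_p((mu, sigma), xi))
    theta = np.array([mu, sigma])
    g = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i], np.eye(2)[j]
            # central-difference approximation of d_i d_j log p (xi held fixed)
            hess = (log_p(theta + h * ei + h * ej, xi) - log_p(theta + h * ei - h * ej, xi)
                    - log_p(theta - h * ei + h * ej, xi) + log_p(theta - h * ei - h * ej, xi)) / (4 * h * h)
            g[i, j] = np.sum(p * (-hess)) * dx                  # expectation of q_(theta,xi)
    return g

mu, sigma = 0.3, 1.7
print(np.round(fisher_matrix(mu, sigma), 4))
print(np.round(np.diag([1 / sigma**2, 2 / sigma**2]), 4))       # closed form diag(1/sigma^2, 2/sigma^2)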

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Hessian structures on deformed exponential families MATSUZOE Hiroshi Nagoya Institute of Technology joint works with HENMI Masayuki (The Institute of Statistical Mathematics) 1 Statistical manifolds and statistical models 2 Deformed exponential family 3 Geometry of deformed exponential family (1) 4 Geometry of deformed exponential family (2) 5 Maximum q-likelihood estimator 1 PRELIMINARIES 1 Preliminaries 1.1 Geometry of statistical models   Definition 1.1 S is a statistical model or a parametric model on Ω def ⇐⇒ S is a set of probability densities with parameter ξ ∈ Ξ s.t. S = { p(x; ξ) ∫ Ω p(x; ξ)dx = 1, p(x; ξ) > 0, ξ ∈ Ξ ⊂ Rn } .   We regard S as a manifold with a local coordinate system {Ξ; ξ1 , . . . , ξn }   gF = (gF ij) is the Fisher metric (Fisher information matrix) of S def ⇐⇒ gF ij(ξ) := ∫ Ω ∂ ∂ξi log p(x; ξ) ∂ ∂ξj log p(x; ξ)p(x; ξ)dx = ∫ Ω ∂ipξ ( ∂ ∂ξj log pξ ) dx = Eξ[∂ilξ∂jlξ]   ∂ipξ def ⇐⇒ mixture representation, ∂ilξ = ( ∂ipξ pξ ) def ⇐⇒ exponential representation. (the score function) 2/29 1 PRELIMINARIES A statistical model Se is an exponential family def ⇐⇒ Se = { p(x; θ) p(x; θ) = exp[C(x) + n∑ i=1 θi Fi(x) − ψ(θ)] } , C, F1, · · · , Fn : random variables on Ω ψ : a function on the parameter space Θ The coordinate system [θi ] is called the natural parameters.   Proposition 1.2 For an exponential family Se, (1) ∇(1) is flat (2) [θi ] is an affine coordinate, i.e., Γ (1) k ij ≡ 0   For simplicity, assume that C = 0. gF ij(θ) = E[(∂i log p(x; θ))(∂j log p(x; θ))] = E[−∂i∂j log p(x; θ)] = E[∂i∂jψ(θ)] = ∂i∂jψ(θ) :the Fisher metric CF ijk(θ) = E[(∂i log p(x; θ))(∂j log p(x; θ))(∂k log p(x; θ))] = ∂i∂j∂kψ(θ) :the cubic form The triplets (Se, ∇(e) , gF ) and (Se, ∇(m) , gF ) are Hessian manifolds. Remark: (S, ∇(α) , gF ) is an invariant statistical manifold. 3/29 1 PRELIMINARIES Normal distributions Ω = R, n = 2, ξ = (µ, σ) ∈ R2 + (the upper half plane). S = { p(x; µ, σ) p(x; µ, σ) = 1 √ 2πσ exp [ − (x − u)2 2σ2 ]} The Fisher metric is (gij) = 1 σ2 ( 1 0 0 2 ) ( S is a space of constant negative curvature − 1 2 ) .   ∇(1) and ∇(−1) are flat affine connections. In addition, θ1 = µ σ2 , θ2 = − 1 2σ2 ψ(θ) = − (θ1 )2 4θ2 + 1 2 log ( − π θ2 ) =⇒ p(x; µ, σ) = 1 √ 2πσ exp [ − (x − u)2 2σ2 ] = exp [ xθ1 + x2 θ2 − ψ(θ) ] . {θ1 , θ2 }: natural parameters. (∇(1) -geodesic coordinate system) η1 = E[x] = µ, η2 = E [ x2 ] = σ2 + µ2 . {η1, η2}: moment parameters. (∇(−1) -geodesic coordinate system)   4/29 1 PRELIMINARIES Finite sample space Ω = {x0, x1, · · · , xn}, dim Sn = n p(xi; η) = { ηi (1 ≤ i ≤ n) 1 − ∑n j=1 ηj (i = 0) Ξ = { {η1, · · · , ηn} ηi > 0 (∀ i), ∑n j=1 ηj < 1 } (an n-dimensional simplex) The Fisher metric: (gij) = 1 η0      1 + η0 η1 1 · · · 1 1 1 + η0 η2 ... ... ... ... 1 · · · · · · 1 + η0 ηn      , where η0 = 1 − n∑ j=1 ηj. ( Sn is a space of constant positive curvature 1 4 ) . 5/29 1 PRELIMINARIES Finite sample space Ω = {x0, x1, · · · , xn}, dim Sn = n p(xi; η) = { ηi (1 ≤ i ≤ n) 1 − ∑n j=1 ηj (i = 0) Ξ = { {η1, · · · , ηn} ηi > 0 (∀ i), ∑n j=1 ηj < 1 } (an n-dimensional simplex)   {θ1 , · · · , θn }: natural parameters. (∇(1) -geodesic coordinate system) where θi = log p(xi) − log p(x0) = log ηi 1 − ∑n j=1 ηj ψ(θ) = log  1 + n∑ j=1 eθj   {η1, · · · , ηn}: moment parameters. (∇(−1) -geodesic coordinate sys- tem)   6/29 1 PRELIMINARIES   Proposition 1.3 For Se, the following hold: (1) (Se, gF , ∇(e) , ∇(m) ) is a dually flat space. (2) {θi } is a ∇(e) -affine coordinate system on Se. (3) ψ(θ) is the potential of gF w.r.t. {θi }: gF ij(θ) = ∂i∂jψ(θ). 
(4) Set the expectations of Fi(x) by ηi =Eθ[Fi(x)] =⇒ {ηi} is the dual coordinate system of {θi } with respect to gM . (5) Set ϕ(η) = Eθ[log pθ]. =⇒ ϕ(η) is the potential of gF w.r.t. {ηi}.   Since (Se, gF , ∇(e) , ∇(m) ) is a dually flat space, the Legendre transfor- mation holds. ∂ψ ∂θi = ηi, ∂ϕ ∂ηi = θi , ψ(p) + ϕ(p) − m∑ i=1 θi (p)ηi(p) = 0 gF ij = ∂2 ψ ∂θi∂θj , CF ijk = ∂3 ψ ∂θi∂θj∂θk 7/29 1 PRELIMINARIES Kullback-Leibler divergence (or relative entropy on S def ⇐⇒ DKL(p, r) = ∫ Ω p(x) log p(x) r(x) dx = Ep[log p(x) − log r(x)] ( = ψ(r) + ϕ(p) − n∑ i=1 θi (r)ηi(p) = D(r, p) ) For Se, DKL coincides with the canonical divergence D on a dually flat space (Se, ∇(m) , gF ). Construction of a divergence from an estimating function  s(x; ξ) =   ∂/∂ξ1 log p(x; ξ) ... ∂/∂ξn log p(x; ξ)  : the score function of p(x; ξ) (estimating function) by Integrating of the score function and by taking an expectation, dKL(p, r) := ∫ Ω p(x; ξ) log r(x; ξ′ )dx the cross entropy on S The KL-divergence is given by the difference of cross entropies. DKL(p, r) = dKL(p, p) − dKL(p, r)   8/29 1 PRELIMINARIES 1 Statistical manifolds and statistical models 2 Deformed exponential family 3 Geometry of deformed exponential family (1) 4 Geometry of deformed exponential family (2) 5 Maximum q-likelihood estimator 9/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. FAMILY) 2 Deformed exponential family (χ-exp. family) χ : (0, ∞) → (0, ∞) : strictly increasing χ-exponential, χ-logarithm  Definition 2.1 logχ x := ∫ x 1 1 χ(t) dt χ-logarithm expχ x := 1 + ∫ x 0 λ(t)dt χ-exponential where λ(logχ t) = χ(t)   Usually, the χ-exponential is called ϕ-exponential in statistical physics. In this talk, ϕ is used as the dual potential on a dually flat space.   Example 2.2 In the case χ(t) = tq , we have ∫ x 1 1 χ(t) dt = ∫ x 1 1 tq dt = x1−q − 1 1 − q = logq x q-logarithm λ(t) = (1 + (1 − q)t) q 1−q 1 + ∫ x 0 λ(t) dt = (1 + (1 − q)x) 1 1−q q-exponential   10/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. FAMILY) χ : (0, ∞) → (0, ∞) : strictly increasing χ-exponential, χ-logarithm  Definition 2.1 logχ x := ∫ x 1 1 χ(t) dt χ-logarithm expχ x := 1 + ∫ x 0 λ(t)dt χ-exponential where λ(logχ t) = χ(t)   F1(x), . . . , Fn(x) : functions on Ω θ = {θ1 , . . . , θn } : parameters S = { p(x, θ) p(x; θ) > 0, ∫ Ω p(x; θ)dx = 1 } : statistical model   Definition 2.3 Sχ = {p(x; θ)} : χ-exponential family, deformed exponential family def ⇐⇒ Sχ := { p(x, θ)p(x; θ) = expχ [ n∑ i=1 θi Fi(x) − ψ(θ) ] , p(x, θ) ∈ S }   11/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. FAMILY)   Proposition 2.4 (discrete distributions) The set of discrete distributions is a χ-exponential family for any χ   (Proof) Ω = {x0, x1, . . . , xn} Sn = { p(x; η) ηi > 0, n∑ i=0 ηi = 1, p(x; η) = n∑ i=0 ηiδi(x) } , η0 = 1 − n∑ i=1 ηi Set θi = logχ p(xi) − logχ p(x0) = logχ ηi − logχ η0 Then logχ p(x) = logχ ( n∑ i=0 ηiδi(x) ) = n∑ i=1 ( logχ ηi − logχ η0 ) δi(x) + logχ(η0) ψ(θ) = − logχ η0 12/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. FAMILY) Finite sample space Ω = {x0, x1, · · · , xn}, dim Sn = n p(xi; η) = { ηi (1 ≤ i ≤ n) 1 − ∑n j=1 ηj (i = 0) Ξ = { {η1, · · · , ηn} ηi > 0 (∀ i), ∑n j=1 ηj < 1 } (an n-dimensional simplex)   {θ1 , · · · , θn }: natural parameters. (∇(1) -geodesic coordinate system) where θi = log p(xi) − log p(x0) = log ηi 1 − ∑n j=1 ηj ψ(θ) = log  1 + n∑ j=1 eθj   {η1, · · · , ηn}: moment parameters. (∇(−1) -geodesic coordinate sys- tem)   13/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. 
FAMILY) Example 2.5 (Student t-distribution (q-normal distribution)) Ω = R, n = 2, ξ = (µ, σ) ∈ R2 + (the upper half plane), q > 1. p(x; µ, σ) = 1 zq [ 1 − 1 − q 3 − q (x − µ)2 σ2 ] 1 1−q Set θ1 = 2 3 − q zq−1 q · µ σ2 , θ2 = − 1 3 − q zq−1 q · 1 σ2 . Then logq pq(x) = 1 1 − q (p1−q − 1) = 1 1 − q { 1 z1−q q ( 1 − 1 − q 3 − q (x − µ)2 σ2 ) − 1 } = 2µzq−1 q (3 − q)σ2 x − zq−1 q (3 − q)σ2 x2 − zq−1 q 3 − q · µ2 σ2 + zq−1 q − 1 1 − q = θ1 x + θ2 x2 − ψ(θ) ψ(θ) = − (θ1 )2 4θ2 − zq−1 q − 1 1 − q   The set of Student t-distributions is a q-exponential family.   14/29 2 DEFORMED EXPONENTIAL FAMILY (χ-EXP. FAMILY) 1 Statistical manifolds and statistical models 2 Deformed exponential family 3 Geometry of deformed exponential family (1) 4 Geometry of deformed exponential family (2) 5 Maximum q-likelihood estimator 15/29 3 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (1) 3 Geometry of deformed exponential family (1) Sχ : a deformed exponential family ψ(θ) : strictly convex (normalization for Sχ)   sχ (x; θ) = ( (sχ )1 (x; θ), . . . , (sχ )n (x; θ) )T is the χ-score function def ⇐⇒ (sχ )i (x; θ) = ∂ ∂θi logχ p(x; θ), (i = 1, . . . , n). (1)   Statistical structure for Sχ  Riemannian metric gM : gM ij (θ) = ∫ Ω ∂ip(x; θ)∂j logχ p(x; θ) dx Dual affine connections ∇M(e) , ∇M(m) : Γ M(e) ij,k (θ) = ∫ Ω ∂kp(x; θ)∂i∂j logχ p(x; θ)dx Γ M(m) ij,k (θ) = ∫ Ω ∂i∂jp(x; θ)∂k logχ p(x; θ)dx   (Sχ, ∇M(e) , gM ) and (Sχ, ∇M(m) , gM ) are Hessian manifolds. 16/29 3 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (1)   Proposition 3.1 For Sχ, the following hold: (1) (Sχ, gM , ∇M(e) , ∇M(m) ) is a dually flat space. (2) {θi } is a ∇M(e) -affine coordinate system on Sχ. (3) Ψ(θ) is the potential of gM with respect to {θi }, that is, gM ij (θ) = ∂i∂jΨ(θ). (4) Set the expectations of Fi(x) by ηi = Eθ[Fi(x)]. =⇒ {ηi} is the dual coordinate system of {θi } with respect to gM . (5) Set Φ(η) = −Iχ(pθ). =⇒ Φ(η) is the potential of gM with respect to {ηi}.   Iχ(pθ) = − ∫ Ω {Uχ(p(x; θ)) + (p(x; θ) − 1)Uχ(0)} dx, where Uχ(t) = ∫ t 1 logχ(s) ds, Uχ(0) = lim t→+0 Uχ(t) < ∞. the generalized entropy functional Ψ(θ) = ∫ Ω p(x; θ) logχ p(x; θ)dx + Iχ(pθ) + ψ(θ), the generalized Massieu potential 17/29 3 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (1) Construction of β-divergence (β = 1 − q)   uq(x; θ): a weighted score function def ⇐⇒ uq(x; θ) = (u1 q(x; θ), . . . , un q (x; θ))T ui q(x; θ) = p(x; θ)1−q si (x; θ) − Eθ[p(x; θ)1−q si (x; θ)].   From the definition of q-logarithm function, uq(x; θ) is written by ui q(x; θ) = ∂ ∂θi { 1 1 − q p(x; θ)1−q − 1 2 − q ∫ Ω p(x; θ)2−q dx } = ∂ ∂θi logq p(x; θ) − Eθ [ ∂ ∂θi logq p(x; θ) ] Hence, this estimating function is the bias-corrected q-score function. 18/29 3 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (1) By integrating uq(x; θ), and taking the expectation, we define a cross entropy by d1−q(p, r) = − 1 1 − q ∫ Ω p(x; θ)r(x; θ)1−q + 1 2 − q ∫ Ω r(x; θ)2−q dx Then the β-divergence (β = 1 − q) is given by D1−q(p, r) = −d1−q(p, p) + d1−q(p, r) = 1 (1 − q)(2 − q) ∫ Ω p(x)2−q dx − 1 1 − q ∫ Ω p(x)r(x)1−q dx + 1 2 − q ∫ Ω r(x)2−q dx Remark 3.2 A β-divergence D1−q induces Hessian manifolds (Sq, ∇M(m) , gM ) and (Sq, ∇M(e) , gM ). 
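As a small sanity check of the β-divergence D_{1−q} constructed above (β = 1 − q), here is a short sketch, added for illustration only, that evaluates it on a grid for two Gaussian densities; the specific densities and the grid are arbitrary choices. It confirms numerically that the divergence is nonnegative and approaches the Kullback-Leibler divergence as q → 1.

import numpy as np

# beta-divergence with beta = 1 - q, as defined above:
#   D_{1-q}(p, r) = 1/((1-q)(2-q)) ∫ p^{2-q} - 1/(1-q) ∫ p r^{1-q} + 1/(2-q) ∫ r^{2-q}

x = np.linspace(-15.0, 15.0, 20001)
dx = x[1] - x[0]

def gauss(mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p, r = gauss(0.0, 1.0), gauss(0.7, 1.3)
integral = lambda f: np.sum(f) * dx            # simple Riemann sum on the uniform grid

def D_beta(p, r, q):
    return (integral(p ** (2 - q)) / ((1 - q) * (2 - q))
            - integral(p * r ** (1 - q)) / (1 - q)
            + integral(r ** (2 - q)) / (2 - q))

kl = integral(p * np.log(p / r))               # Kullback-Leibler divergence, for comparison
for q in (0.5, 0.9, 0.999):
    print(q, D_beta(p, r, q))                  # nonnegative for each q
print("KL:", kl)                               # D_{1-q}(p, r) approaches KL(p, r) as q -> 1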
19/29 3 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (1) 1 Statistical manifolds and statistical models 2 Deformed exponential family 3 Geometry of deformed exponential family (1) 4 Geometry of deformed exponential family (2) 5 Maximum q-likelihood estimator 20/29 4 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (2) 4 Geometry of deformed exponential family (2)   Definition 4.1 Pχ(x) : the escort distribution of p(x; θ), def ⇐⇒ Pχ(x; θ) = 1 Zχ(θ) χ(p(x; θ)), Zχ(θ) = ∫ Ω χ(p(x; θ))dx Eχ,θ[f(x)] : the χ-expectation of p(x) def ⇐⇒ the expectation of f(x) with respect to the escort distribution: Eχ,θ[f(x)] = ∫ f(x)Pχ(x; θ)dx = 1 Zχ(θ) ∫ f(x)χ(p(x; θ))dx     Definition 4.2 Sχ = {p(x; θ)}: a deformed exponential family gχ ij(θ) = ∂i∂jψ(θ) : the χ-Fisher information metric Cχ ijk(θ) = ∂i∂j∂kψ(θ) : the χ-cubic form   Set Γ χ(e) ij,k := Γ χ(0) ij,k − 1 2 Cχ ijk, Γ χ(m) ij,k := Γ χ(0) ij,k + 1 2 Cχ ijk, 21/29 4 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (2)   Proposition 4.3 For Sχ, the following hold: (1) (Sχ, gχ , ∇χ(e) , ∇χ(m) ) is a dually flat space. (2) {θi } is a ∇χ(e) -affine coordinate system on Sχ. (3) ψ is the potential of gχ with respect to {θi }, that is, gχ ij(θ) = ∂i∂jψ(θ). (4) Set the χ-expectation of Fi(x) by ηi = Eχ,θ[Fi(x)]. =⇒ {ηi} is the dual coordinate system of {θi } with respect to gχ . (5) Set ϕ(η) = Eχ,θ[logχ p(x; θ)] =⇒ ϕ(η) is the potential of gχ with respect to {ηi}.   Proof: Statements 1, 2 and 3 are obtained from the definition of χ-Fisher metric and χ-cubic form. Statements 4 and 5 follow the fact that Eχ,θ[logχ p(x; θ)] = Eχ,θ [ n∑ i=1 θi Fi(x) − ψ(θ) ] = n∑ i=1 θi ηi − ψ(θ) 22/29 4 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (2) The generalized relative entropy (or χ-relative entropy) of Sχ by Dχ (p, r) = Eχ,p[logχ p(x) − logχ r(x)]. The generalized relative entropy Dχ of Sχ coincides with the canonical divergence D(r, p) on (Sχ, ∇χ(e) , gχ ). In fact, Dχ (pθ, rθ′) = Eχ,p [( n∑ i=1 θi Fi(x) − ψ(θ) ) − ( n∑ i=1 (θ′ )i Fi(x) − ψ(θ′ ) )] = ψ(θ′ ) + ( n∑ i=1 θi ηi − ψ(θ) ) − n∑ i=1 (θ′ )i ηi = D(rθ′, pθ). Tsallis relative entropy (q-exponential case)  Dq (p, r) = Eq,p [ logq p(x) − logq r(x) ] = 1 − ∫ p(x)q r(x)1−q dx (1 − q)Zq(p) = q Zq(p) D(1−2q) (p, r). The Tsallis relative entropy is conformal to α-divergence (α = 1−2q).   23/29 4 GEOMETRY OF DEFORMED EXPONENTIAL FAMILY (2) The generalized relative entropy (or χ-relative entropy) of Sχ by Dχ (p, r) = Eχ,p[logχ p(x) − logχ r(x)]. Construction of χ-relative entropy  sχ (x; θ): the χ-score function def ⇐⇒ (sχ )i (x; θ) = ∂ ∂θi logχ p(x; θ), (i = 1, . . . , n). p The χ-score is unbiased w.r.t. χ-expectation, Eχ,θ[(sχ )i (x; θ)] = 0. =⇒ We regard that sχ (x; θ) is a generalization of estimating function. By integrating the χ-score function, we define the χ-cross entropy by dχ (p, r) = − ∫ Ω P (x) logχ r(x)dx. Then we obtain the generalized relative entropy by Dχ (p, r) = −dχ (p, p) + dχ (p, r) = Eχ,p[logχ p(x) − logχ r(x)].   24/29 5 MAXIMUM Q-LIKELIHOOD ESTIMATORS 5 Maximum q-likelihood estimators 5.1 The q-independence X ∼ p1(x), Y ∼ p2(y) X and Y are independent def ⇐⇒ p(x, y) = p1(x)p2(y). ⇐⇒ p(x, y) = exp [log p1(x) + log p2(x)] (p1(x) > 0, p2(y) > 0)   x > 0, y > 0 and x1−q + y1−q − 1 > 0 (q > 0). x ⊗q y : the q-product of x and y def ⇐⇒ x ⊗q y := [ x1−q + y1−q − 1 ] 1 1−q = expq [ logq x + logq y ]   expq x ⊗q expq y = expq(x + y), logq(x ⊗q y) = logq x + logq y. 
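Since logq, expq and the q-product are given by explicit closed forms, a brief numerical sketch (illustrative only; the value q = 0.7 and the test points are arbitrary) can confirm the identities expq x ⊗q expq y = expq(x + y) and logq(x ⊗q y) = logq x + logq y stated above.

import numpy as np

q = 0.7   # an arbitrary deformation parameter, q > 0, q != 1

def log_q(x):                       # q-logarithm: (x^{1-q} - 1) / (1 - q)
    return (x ** (1 - q) - 1) / (1 - q)

def exp_q(x):                       # q-exponential: (1 + (1-q) x)^{1/(1-q)}
    return (1 + (1 - q) * x) ** (1 / (1 - q))

def q_product(x, y):                # x (x)_q y = [x^{1-q} + y^{1-q} - 1]^{1/(1-q)}
    return (x ** (1 - q) + y ** (1 - q) - 1) ** (1 / (1 - q))

x, y = 0.8, 1.3
print(np.isclose(log_q(exp_q(x)), x))                             # log_q and exp_q are inverse
print(np.isclose(q_product(exp_q(x), exp_q(y)), exp_q(x + y)))    # exp_q(x+y) = exp_q(x) (x)_q exp_q(y)
print(np.isclose(log_q(q_product(2.0, 3.0)), log_q(2.0) + log_q(3.0)))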
X and Y : q-independent with m-normalization (mixture normalization) def ⇐⇒ pq(x, y) = p1(x) ⊗ p2(y) Zp1,p2 where Zp1,p2 = ∫ ∫ XY p1(x) ⊗q p2(y)dxdy 25/29 5 MAXIMUM Q-LIKELIHOOD ESTIMATORS 5.2 Geometry for q-likelihood estimators Sq = {p(x; ξ)|ξ ∈ Ξ} : a q-exponential family {x1, . . . , xN} : N-observations from p(x; ξ) ∈ Sq.   Lq(ξ) : q-likelihood function def ⇐⇒ Lq(ξ) = p(x1; ξ) ⊗q p(x2; ξ) ⊗q · · · ⊗q p(xN; ξ) ( ⇐⇒ logq Lq(ξ) = N∑ i=1 logq p(xi; ξ) )   In the case q → 1, Lq is the standard likelihood function on Ξ.   expq(x1 + x2 + · · · + xN) = expq x1 ⊗q expq x2 ⊗q · · · ⊗q expq xN = expq x1 · expq ( x2 1 + (1 − q)x1 ) · · · expq ( xN 1 + (1 − q) ∑N−1 i=1 xi )   Each measurement influences the others.    26/29 5 MAXIMUM Q-LIKELIHOOD ESTIMATORS 5.2 Geometry for q-likelihood estimators Sq = {p(x; ξ)|ξ ∈ Ξ} : a q-exponential family {x1, . . . , xN} : N-observations from p(x; ξ) ∈ Sq.   Lq(ξ) : q-likelihood function def ⇐⇒ Lq(ξ) = p(x1; ξ) ⊗q p(x2; ξ) ⊗q · · · ⊗q p(xN; ξ) ( ⇐⇒ logq Lq(ξ) = N∑ i=1 logq p(xi; ξ) )   In the case q → 1, Lq is the standard likelihood function on Ξ.   ˆξ : the maximum q-likelihood estimator def ⇐⇒ ˆξ = arg max ξ∈Ξ Lq(ξ) ( = arg max ξ∈Ξ logq Lq(ξ) ) .     the q-likelihood is maximum ⇐⇒ the canonical divergence (Tsallis relative entropy) is minimum.   27/29 5 MAXIMUM Q-LIKELIHOOD ESTIMATORS Summary (in the case of q-exponential) β-divergence (Sq, gM , ∇M(e) , ∇M(m) )  estimating function uq(x; θ): ui q(x; θ) = ∂ ∂θi logq p(x; θ) − Eθ [ ∂ ∂θi logq p(x; θ) ] Riemannian metric gM : gM ij (θ) = ∫ Ω ∂ip(x; θ)∂j logq p(x; θ)dx dual coordinates {ηi}: ηi = Ep[Fi(x)]   Tsallis relative entropy (Sq, gq , ∇q(e) , ∇q(m) )  estimating function (sq )(x; θ): (sq )i (x; θ) = ∂ ∂θi logq p(x; θ) (unbiased under q-expectation) Riemannian metric gq : gq ij(θ) = ∂2 ∂θiθj ψ(θ) dual coordinates {ηi}: ηi = Eq,p[Fi(x)]   28/29 5 MAXIMUM Q-LIKELIHOOD ESTIMATORS Summary (in the case of q-exponential) β-divergence (Sq, gM , ∇M(e) , ∇M(m) )  estimating function uq(x; θ): ui q(x; θ) = ∂ ∂θi logq p(x; θ) − Eθ [ ∂ ∂θi logq p(x; θ) ] Riemannian metric gM : gM ij (θ) = ∫ Ω ∂ip(x; θ)∂j logq p(x; θ)dx dual coordinates {ηi}: ηi = Ep[Fi(x)]   Tsallis relative entropy (Sq, gq , ∇q(e) , ∇q(m) )  estimating function (sq )(x; θ): (sq )i (x; θ) = ∂ ∂θi logq p(x; θ) (unbiased under q-expectation) Riemannian metric gq : gq ij(θ) = ∂2 ∂θiθj ψ(θ) dual coordinates {ηi}: ηi = Eq,p[Fi(x)]   The notion of expectations, independence are determined from a geometric structure of the statistical model. 29/29
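To make the maximum q-likelihood estimator concrete, the following sketch, which is an added illustration rather than part of the slides, applies the definition ξ̂ = arg max Σ_i logq p(x_i; ξ) to the family of discrete distributions on a finite sample space (a χ-exponential family by Proposition 2.4). A short Lagrange-multiplier computation, not in the slides, suggests the maximizer is η̂_j ∝ n_j^{1/q}, where n_j are the observed counts; the numerical optimum below agrees with this and, as q → 1, approaches the empirical frequencies given by ordinary maximum likelihood.

import numpy as np
from scipy.optimize import minimize

q = 0.8
counts = np.array([12, 7, 3, 8])          # hypothetical observation counts n_j on {x_0, ..., x_3}

def log_q(x):
    return (x ** (1 - q) - 1) / (1 - q)

def neg_q_loglik(eta_free):
    eta = np.append(eta_free, 1 - eta_free.sum())    # last probability from the simplex constraint
    if np.any(eta <= 0):
        return np.inf
    return -np.dot(counts, log_q(eta))               # -log_q L_q = -sum_i log_q p(x_i; eta)

res = minimize(neg_q_loglik, x0=np.full(3, 0.25), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
eta_hat = np.append(res.x, 1 - res.x.sum())

print("max q-likelihood:", np.round(eta_hat, 4))
print("n_j^{1/q}, normed:", np.round(counts ** (1 / q) / (counts ** (1 / q)).sum(), 4))  # conjectured closed form
print("empirical freq.  :", np.round(counts / counts.sum(), 4))                          # q -> 1 limit (ordinary MLE)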

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Isometric Reeb Flow and Related Results on Hermitian Symmetric Spaces of Rank 2 Young Jin Suh Department of Mathematics Kyungpook National University Taegu 702-701, Korea Ecole des Meines, Paris, France Geometric Science of Information, GSI’13 28-30th, August, 2013 E-mail: yjsuh@knu.ac.kr August 29, 2013 Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Contents 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Contents 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Contents 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Hermitian Symmetric Spaces Hereafter let us note that HSSP means Hermitian Symmetric Space. HSSP of compact type with rank 1: CPm, QPm HSSP of noncompact type with rank 1: CHm, QHm. HSSP of compact type with rank 2: SU(2 + q)/S(U(2)×U(q)), Qm, SO(8)/U(4), Sp(2)/U(2) and (e6(−78), SO(10) + R) HSSP of compact type with rank 2: SU(2, q)/S(U(2)×U(q)), Q∗m, SO∗(8)/U(4), Sp(2, R)/U(2) and (e6(2), SO(10) + R) (See Helgason [6], [7]). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Hermitian Symmetric Spaces Hereafter let us note that HSSP means Hermitian Symmetric Space. HSSP of compact type with rank 1: CPm, QPm HSSP of noncompact type with rank 1: CHm, QHm. HSSP of compact type with rank 2: SU(2 + q)/S(U(2)×U(q)), Qm, SO(8)/U(4), Sp(2)/U(2) and (e6(−78), SO(10) + R) HSSP of compact type with rank 2: SU(2, q)/S(U(2)×U(q)), Q∗m, SO∗(8)/U(4), Sp(2, R)/U(2) and (e6(2), SO(10) + R) (See Helgason [6], [7]). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Hermitian Symmetric Spaces Hereafter let us note that HSSP means Hermitian Symmetric Space. HSSP of compact type with rank 1: CPm, QPm HSSP of noncompact type with rank 1: CHm, QHm. HSSP of compact type with rank 2: SU(2 + q)/S(U(2)×U(q)), Qm, SO(8)/U(4), Sp(2)/U(2) and (e6(−78), SO(10) + R) HSSP of compact type with rank 2: SU(2, q)/S(U(2)×U(q)), Q∗m, SO∗(8)/U(4), Sp(2, R)/U(2) and (e6(2), SO(10) + R) (See Helgason [6], [7]). 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Hermitian Symmetric Spaces Hereafter let us note that HSSP means Hermitian Symmetric Space. HSSP of compact type with rank 1: CPm, QPm HSSP of noncompact type with rank 1: CHm, QHm. HSSP of compact type with rank 2: SU(2 + q)/S(U(2)×U(q)), Qm, SO(8)/U(4), Sp(2)/U(2) and (e6(−78), SO(10) + R) HSSP of compact type with rank 2: SU(2, q)/S(U(2)×U(q)), Q∗m, SO∗(8)/U(4), Sp(2, R)/U(2) and (e6(2), SO(10) + R) (See Helgason [6], [7]). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Hypersurfaces in Hermitian Symmetric Spaces Let M be a hypersurfaces in a Hermitian Symmetric Space ¯M with Kaehler structure J. AX = − ¯ X N : Weingarten formula Here A: the shape operator of M in ¯M. ξ = −JN : the Reeb vector field. JX = φX + η(X)N, X ξ = φAX for any vector field X∈Γ(M). Then (φ, ξ, η, g): almost contact structure on a hypersurface M Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Define) A hypersurfcae M: Isometric Reeb Flow ⇐⇒ Lξg = 0 ⇐⇒ g(dφt X, dφt Y) = g(X, Y) for any X, Y∈Γ(M), where φt denotes a one parameter group, which is said to be an isometric Reeb flow of M, defined by dφt dt = ξ(φt (p)), φ0(p) = p, ˙φ0(p) = ξ(p). Note) Lξg = 0 ⇐⇒ jξi + iξj = 0, ξ: skew-symmetric ⇐⇒ g( X ξ, Y) + g( Y ξ, X) = 0 ⇐⇒ g((φA − Aφ)X, Y) = 0 for any X, Y∈Γ(M). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Define) A hypersurfcae M: Isometric Reeb Flow ⇐⇒ Lξg = 0 ⇐⇒ g(dφt X, dφt Y) = g(X, Y) for any X, Y∈Γ(M), where φt denotes a one parameter group, which is said to be an isometric Reeb flow of M, defined by dφt dt = ξ(φt (p)), φ0(p) = p, ˙φ0(p) = ξ(p). Note) Lξg = 0 ⇐⇒ jξi + iξj = 0, ξ: skew-symmetric ⇐⇒ g( X ξ, Y) + g( Y ξ, X) = 0 ⇐⇒ g((φA − Aφ)X, Y) = 0 for any X, Y∈Γ(M). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow In the future homeogeneous hypersurfaces in HSSP satisfying certain geometric conditions might be solved completely as follows: Problem 1 Classify all of homogeneous hypersurfaces in HSSP. In this talk let us consider hypersurfaces with isometric Reeb flow in Hermitian Symmetric Spaces as follows: Problem 2 If M is a complete hypersurface in HSSP ¯M with isometric Reeb flow, then M becomes homogeneous ? Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow In the future homeogeneous hypersurfaces in HSSP satisfying certain geometric conditions might be solved completely as follows: Problem 1 Classify all of homogeneous hypersurfaces in HSSP. In this talk let us consider hypersurfaces with isometric Reeb flow in Hermitian Symmetric Spaces as follows: Problem 2 If M is a complete hypersurface in HSSP ¯M with isometric Reeb flow, then M becomes homogeneous ? 
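The chain of equivalences in the Note above, (Lξ g)(X, Y) = g(∇_X ξ, Y) + g(∇_Y ξ, X) = g((φA − Aφ)X, Y) with ∇_X ξ = φAX, A symmetric and φ skew-symmetric, rests on the matrix identity φA + (φA)ᵀ = φA − Aφ. A tiny sketch (illustrative only, with random matrices standing in for A and φ in an orthonormal frame) checks it.

import numpy as np

rng = np.random.default_rng(0)
n = 6
S = rng.standard_normal((n, n))
A = S + S.T                    # a symmetric "shape operator" in an orthonormal frame
K = rng.standard_normal((n, n))
phi = K - K.T                  # a skew-symmetric "structure tensor"

# With g the identity, g(phi A X, Y) + g(phi A Y, X) is the bilinear form of
# phi A + (phi A)^T, and (phi A)^T = -A phi because A^T = A, phi^T = -phi.
lhs = phi @ A + (phi @ A).T
rhs = phi @ A - A @ phi
print(np.allclose(lhs, rhs))   # True: L_xi g vanishes exactly when phi A = A phi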
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Note 1) In CPm, CHm and QPm with isometric Reeb flow (See Okumura 1976, Montil and Romero 1986, Perez and Martinez 1986 ). Note 2) In G2(Cm+2), G∗ 2(Cm+2) and complex quadric Qm = SO(m + 2)/SO(2)SO(m) with isometric Reeb flow (See Berndt and Suh, 2002 and 2012, Suh, 2013, Berndt and Suh, 2013 ). Note 3) In near future, in noncompact complex quadric Qm∗ = SO(2, m)/SO(2)SO(m) with isometric Reeb flow will be classified. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Note 1) In CPm, CHm and QPm with isometric Reeb flow (See Okumura 1976, Montil and Romero 1986, Perez and Martinez 1986 ). Note 2) In G2(Cm+2), G∗ 2(Cm+2) and complex quadric Qm = SO(m + 2)/SO(2)SO(m) with isometric Reeb flow (See Berndt and Suh, 2002 and 2012, Suh, 2013, Berndt and Suh, 2013 ). Note 3) In near future, in noncompact complex quadric Qm∗ = SO(2, m)/SO(2)SO(m) with isometric Reeb flow will be classified. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Note 1) In CPm, CHm and QPm with isometric Reeb flow (See Okumura 1976, Montil and Romero 1986, Perez and Martinez 1986 ). Note 2) In G2(Cm+2), G∗ 2(Cm+2) and complex quadric Qm = SO(m + 2)/SO(2)SO(m) with isometric Reeb flow (See Berndt and Suh, 2002 and 2012, Suh, 2013, Berndt and Suh, 2013 ). Note 3) In near future, in noncompact complex quadric Qm∗ = SO(2, m)/SO(2)SO(m) with isometric Reeb flow will be classified. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Complex Projective Space Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Montiel and Romero classified hypersurfaces in CHm with isometric Reeb flow as follows: Theorem 1.1 (Montiel and Romero 1986) Let M be a real hypersurfaces in CHm with isometric Reeb flow. Then we have the following (A) M is an open part of a tube around a totally geodesic CHk in CHm, (C) geodesic hypersphere, (D) horosphere. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Montiel and Romero classified hypersurfaces in CHm with isometric Reeb flow as follows: Theorem 1.1 (Montiel and Romero 1986) Let M be a real hypersurfaces in CHm with isometric Reeb flow. Then we have the following (A) M is an open part of a tube around a totally geodesic CHk in CHm, (C) geodesic hypersphere, (D) horosphere. 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Montiel and Romero classified hypersurfaces in CHm with isometric Reeb flow as follows: Theorem 1.1 (Montiel and Romero 1986) Let M be a real hypersurfaces in CHm with isometric Reeb flow. Then we have the following (A) M is an open part of a tube around a totally geodesic CHk in CHm, (C) geodesic hypersphere, (D) horosphere. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Montiel and Romero classified hypersurfaces in CHm with isometric Reeb flow as follows: Theorem 1.1 (Montiel and Romero 1986) Let M be a real hypersurfaces in CHm with isometric Reeb flow. Then we have the following (A) M is an open part of a tube around a totally geodesic CHk in CHm, (C) geodesic hypersphere, (D) horosphere. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Complex Two-Plane Grassmannians Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow When the maximal complex subbundle C (resp. quaternionic subbundle) of M in G2(Cm+2) is invariant, that is AC⊂C (resp. AQ⊂Q) , we say M is Hopf (resp. curvature adapted). Berndt and Suh (Monat, 1999) have classified real hypersurfaces in G2(Cm+2) as follows: Theorem 1.2 A real hypersurface of G2(Cm+2), m≥3, is Hopf and curvature adapted if and only if it is congruent to (A) a tube over a totally geodesic G2(Cm+1) in G2(Cm+2), (B) a tube over a totally geodesic totally real QPn, m = 2n, in G2(Cm+2). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow When the maximal complex subbundle C (resp. quaternionic subbundle) of M in G2(Cm+2) is invariant, that is AC⊂C (resp. AQ⊂Q) , we say M is Hopf (resp. curvature adapted). Berndt and Suh (Monat, 1999) have classified real hypersurfaces in G2(Cm+2) as follows: Theorem 1.2 A real hypersurface of G2(Cm+2), m≥3, is Hopf and curvature adapted if and only if it is congruent to (A) a tube over a totally geodesic G2(Cm+1) in G2(Cm+2), (B) a tube over a totally geodesic totally real QPn, m = 2n, in G2(Cm+2). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow When the maximal complex subbundle C (resp. quaternionic subbundle) of M in G2(Cm+2) is invariant, that is AC⊂C (resp. AQ⊂Q) , we say M is Hopf (resp. curvature adapted). Berndt and Suh (Monat, 1999) have classified real hypersurfaces in G2(Cm+2) as follows: Theorem 1.2 A real hypersurface of G2(Cm+2), m≥3, is Hopf and curvature adapted if and only if it is congruent to (A) a tube over a totally geodesic G2(Cm+1) in G2(Cm+2), (B) a tube over a totally geodesic totally real QPn, m = 2n, in G2(Cm+2). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Berndt and Suh (Monat. 2002) have given a classification of hypersurfaces in G2(Cm+2), m≥3 wih isometric Reeb flow as follows: Theorem 1.3 Let M be a real hypersurface in G2(Cm+2), m≥3, with isometric Reeb flow. 
Then M is locally congruent to (A) a tube over a totally geodesic G2(Cm+1) in G2(Cm+2). The two singular orbits are totally geodesically embedded CPm and G2(Cm+1), Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Homogeneous Hypersurfaces Isometric Reeb Flow Berndt and Suh (Monat. 2002) have given a classification of hypersurfaces in G2(Cm+2), m≥3 wih isometric Reeb flow as follows: Theorem 1.3 Let M be a real hypersurface in G2(Cm+2), m≥3, with isometric Reeb flow. Then M is locally congruent to (A) a tube over a totally geodesic G2(Cm+1) in G2(Cm+2). The two singular orbits are totally geodesically embedded CPm and G2(Cm+1), Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow The Riemannian symmetric space SU(2, m)/S(U(2)×U(m)) is a connected, simply connected, irreducible Riemannian symmetric space of noncompact type with rank 2. Let G = SU(2, m) and K = S(U(2)×U(m)), and denote by G and K the corresponding Lie algebra. Let B denotes the Cartan Killing form of G and by P the orthogonal complement of K in G with respect to B. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow The decomposition G = K⊕P is a Cartan decomposition of G = su(2, m). The Cartan involution θ∈Aut(g) on su(2, m) is given by θ(A) = I2,mAI2,m for A∈su(2, m), where I2,m = −I2 02,m 0m,2 Im Then < X, Y >= −B(X, θY): a positive definite Ad(K)-invariant on G. Its restriction to P: a Riemannian metrc g, where g: the Killing metric on SU(2, m)/S(U(2)×U(m)). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Killing Cartan forms related to sl(n, C) The Killing Cartan form B(X, Y) of sl(n, C) is given by B(X, Y) = 2nTrXY for any X, Y∈sl(n, C). In su(m + 2) = {X∈M(m + 2, C)|X∗ + X = 0, TrX = 0}, B(X, Y) is negative definite, because B(X, X) = −2nTrXX∗≤0. So < X, Y >= −B(X, Y). In su(2, m) = {X∈M(m + 2, C)|X∗I2,m + I2,mX = 0, TrX = 0}, the product < X, Y >= −B(X, θY), θ2 = I, is positive definite, because < X, X > = −2nTrXθX = −2nTrXI2,mXI2,m = 2nTrXX∗ I2 2,m = 2nTrXX∗ ≥0. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Killing Cartan forms related to sl(n, C) The Killing Cartan form B(X, Y) of sl(n, C) is given by B(X, Y) = 2nTrXY for any X, Y∈sl(n, C). In su(m + 2) = {X∈M(m + 2, C)|X∗ + X = 0, TrX = 0}, B(X, Y) is negative definite, because B(X, X) = −2nTrXX∗≤0. So < X, Y >= −B(X, Y). In su(2, m) = {X∈M(m + 2, C)|X∗I2,m + I2,mX = 0, TrX = 0}, the product < X, Y >= −B(X, θY), θ2 = I, is positive definite, because < X, X > = −2nTrXθX = −2nTrXI2,mXI2,m = 2nTrXX∗ I2 2,m = 2nTrXX∗ ≥0. 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Killing Cartan forms related to sl(n, C) The Killing Cartan form B(X, Y) of sl(n, C) is given by B(X, Y) = 2nTrXY for any X, Y∈sl(n, C). In su(m + 2) = {X∈M(m + 2, C)|X∗ + X = 0, TrX = 0}, B(X, Y) is negative definite, because B(X, X) = −2nTrXX∗≤0. So < X, Y >= −B(X, Y). In su(2, m) = {X∈M(m + 2, C)|X∗I2,m + I2,mX = 0, TrX = 0}, the product < X, Y >= −B(X, θY), θ2 = I, is positive definite, because < X, X > = −2nTrXθX = −2nTrXI2,mXI2,m = 2nTrXX∗ I2 2,m = 2nTrXX∗ ≥0. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Let C = {X∈TM|JX∈TM} : the maximal complex subbundle and Q = {X∈TM|JX⊂TM} the maximal quaternionic subbundle for M in SU(2, m)/S(U(2)×U(m)). When C and Q of TM are both invariant by the shape operator A of M , we write h(C, C⊥ ) = 0 and h(Q, Q⊥ ) = 0, where h denotes the second fundamental form defined by g(h(X, Y), N) = g(AX, Y) for any X, Y on M. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow By using the theory of Focal points and the method due to P.B. Eberlein, Berndt and Suh proved the following (See Int. J. Math., 2012) Theorem 2.1 Let M be a connected hypersurface in SU2,m/S(U2Um), m≥2. Then h(C, C⊥) = 0 and h(Q, Q⊥) = 0 if and only if M is congruent to an open part of the following: (A) a tube around a totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) , or (B) a tube around a totally geodesic HHn in SU2,m/S(U2Um), m = 2n, (C) a horosphere whose center at infinity is singular . Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow By using the theory of Focal points and the method due to P.B. Eberlein, Berndt and Suh proved the following (See Int. J. Math., 2012) Theorem 2.1 Let M be a connected hypersurface in SU2,m/S(U2Um), m≥2. Then h(C, C⊥) = 0 and h(Q, Q⊥) = 0 if and only if M is congruent to an open part of the following: (A) a tube around a totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) , or (B) a tube around a totally geodesic HHn in SU2,m/S(U2Um), m = 2n, (C) a horosphere whose center at infinity is singular . Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow By using the theory of Focal points and the method due to P.B. Eberlein, Berndt and Suh proved the following (See Int. J. Math., 2012) Theorem 2.1 Let M be a connected hypersurface in SU2,m/S(U2Um), m≥2. Then h(C, C⊥) = 0 and h(Q, Q⊥) = 0 if and only if M is congruent to an open part of the following: (A) a tube around a totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) , or (B) a tube around a totally geodesic HHn in SU2,m/S(U2Um), m = 2n, (C) a horosphere whose center at infinity is singular . Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow By using the theory of Focal points and the method due to P.B. Eberlein, Berndt and Suh proved the following (See Int. J. Math., 2012) Theorem 2.1 Let M be a connected hypersurface in SU2,m/S(U2Um), m≥2. 
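The positivity computation on the preceding slide, ⟨X, X⟩ = −B(X, θX) = 2n Tr(X X*) ≥ 0 on su(2, m) with θ(A) = I_{2,m} A I_{2,m}, can be verified numerically. The sketch below is illustrative only: it builds a random element of su(2, m) from an explicit block parametrisation (an assumption of the sketch, checked against the defining conditions X* I_{2,m} + I_{2,m} X = 0 and Tr X = 0) and compares −Tr(X θ(X)) with Tr(X X*); the common factor 2n = 2(m + 2) is omitted on both sides.

import numpy as np

rng = np.random.default_rng(1)
m = 4
I2m = np.diag([-1, -1] + [1] * m).astype(complex)        # I_{2,m} = diag(-I_2, I_m)

def random_su_2m():
    # X = [[A, B], [B*, D]] with A* = -A, D* = -D, Tr A + Tr D = 0 solves
    # X* I_{2,m} + I_{2,m} X = 0 and Tr X = 0 (a parametrisation assumed for this sketch).
    A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    D = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    A = (A - A.conj().T) / 2
    D = (D - D.conj().T) / 2
    D = D - (np.trace(A) + np.trace(D)) / m * np.eye(m)   # enforce Tr X = 0, keeping D skew-Hermitian
    B = rng.standard_normal((2, m)) + 1j * rng.standard_normal((2, m))
    return np.block([[A, B], [B.conj().T, D]])

X = random_su_2m()
theta_X = I2m @ X @ I2m                                   # Cartan involution theta
print(np.allclose(X.conj().T @ I2m + I2m @ X, 0))          # X lies in su(2, m)
print(np.allclose(np.trace(X), 0))
lhs = -np.trace(X @ theta_X)
rhs = np.trace(X @ X.conj().T)
print(np.allclose(lhs, rhs), lhs.real >= 0)                # <X, X> is positive (up to the 2n factor)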
Then h(C, C⊥) = 0 and h(Q, Q⊥) = 0 if and only if M is congruent to an open part of the following: (A) a tube around a totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) , or (B) a tube around a totally geodesic HHn in SU2,m/S(U2Um), m = 2n, (C) a horosphere whose center at infinity is singular . Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Horosphere Let Ht = cos te1 + sin te2∈A: a unit normal to a horosphere Mt , where A denotes a maximal abelian subspace of P for the E. Cartan’s decomposition G = K⊕P. Here a horosphere is given by Mt = SHt ·o, where SHt denotes the Lie subgroup of G corresponding to the Lie subalgebra SH = S RH, S = A⊕N and N = ⊕λ∈Σ+ Gλ for the Iwasawa decomposition G = K⊕A⊕N with corresponding G = KAN. The shape operator of a horosphere Mt is given by AH = ad(H). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Characterization of type (A) and a Horosphere In this subsection we introduce a classification with isometric Reeb flow in SU2,m/S(U2Um) as follows (See Suh, Advances in Applied Math., 2013): Theorem 2.5 Let M be a connected orientable real hypersurface in SU2,m/S(U2Um), m ≥ 3. Then the Reeb flow on M is isometric if and only if M is congruent to an open part of the following: (A) a tube around some totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) or, (C) a horosphere whose center at infinity is singular. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Characterization of type (A) and a Horosphere In this subsection we introduce a classification with isometric Reeb flow in SU2,m/S(U2Um) as follows (See Suh, Advances in Applied Math., 2013): Theorem 2.5 Let M be a connected orientable real hypersurface in SU2,m/S(U2Um), m ≥ 3. Then the Reeb flow on M is isometric if and only if M is congruent to an open part of the following: (A) a tube around some totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) or, (C) a horosphere whose center at infinity is singular. 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Characterization of type (A) and a Horosphere In this subsection we introduce a classification with isometric Reeb flow in SU2,m/S(U2Um) as follows (See Suh, Advances in Applied Math., 2013): Theorem 2.5 Let M be a connected orientable real hypersurface in SU2,m/S(U2Um), m ≥ 3. Then the Reeb flow on M is isometric if and only if M is congruent to an open part of the following: (A) a tube around some totally geodesic SU2,m−1/S(U2Um−1) in SU2,m/S(U2Um) or, (C) a horosphere whose center at infinity is singular. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Characterization of type (B) and a Horosphere Definition For a real hypersurface M in SU2,m/S(U2Um) is said to be a contact ⇐⇒ ∃ a non-zero constant function ρ defined on M such that φA + Aφ = kφ, k = 2ρ, The condition is equivalent to g((φA + Aφ)X, Y) = 2dη(X, Y), where dη is defined by dη(X, Y) = ( X η)Y − ( Y η)X for any X, Y on M in SU2,m/S(U2Um). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Then we give another classification in noncompact complex two-plane Grassmannian SU2,m/S(U2Um) in terms of the contact hypersurface as follows: Theorem 2.6 Let M be a contact real hypersurface in SU2,m/S(U2Um) with constant mean curvature. Then one of the following statements holds: (B) M is an open part of a tube around a totally geodesic HHn in SU2,2n/S(U2U2n), m = 2n, (C) M is an open part of a horosphere in SU2,m/S(U2Um) whose center at infinity is singular and of type JN ⊥ JN. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Then we give another classification in noncompact complex two-plane Grassmannian SU2,m/S(U2Um) in terms of the contact hypersurface as follows: Theorem 2.6 Let M be a contact real hypersurface in SU2,m/S(U2Um) with constant mean curvature. Then one of the following statements holds: (B) M is an open part of a tube around a totally geodesic HHn in SU2,2n/S(U2U2n), m = 2n, (C) M is an open part of a horosphere in SU2,m/S(U2Um) whose center at infinity is singular and of type JN ⊥ JN. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow Then we give another classification in noncompact complex two-plane Grassmannian SU2,m/S(U2Um) in terms of the contact hypersurface as follows: Theorem 2.6 Let M be a contact real hypersurface in SU2,m/S(U2Um) with constant mean curvature. Then one of the following statements holds: (B) M is an open part of a tube around a totally geodesic HHn in SU2,2n/S(U2U2n), m = 2n, (C) M is an open part of a horosphere in SU2,m/S(U2Um) whose center at infinity is singular and of type JN ⊥ JN. 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The Reeb flow on a real hypersurface in G2(Cm+2) is isometric if and only if M is an open part of a tube around a totally geodesic G2(Cm+1) ⊂ G2(Cm+2). In view of the previous results a natural expectation would lead to the totally geodesic Qm−1 ⊂ Qm. Surprisingly, this is not the case. In fact, we will prove Theorem 3.1 Let M be a real hypersurface of the complex quadric Qm, m ≥ 3. The Reeb flow on M is isometric if and only if m is even, say m = 2k, and M is an open part of a tube around a totally geodesic CPk ⊂ Q2k . Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The homogeneous quadratic equation Qm = {z∈Cm+2 |z2 1 + . . . + z2 m+2 = 0}⊂CPm+1 defines a complex hypersurface in complex projective space CPm+1 = SUm+2/S(Um+1U1). For a unit normal vector N of Qm at a point [z] ∈ Qm we denote by AN the shape operator of Qm in CPm+1 with respect to N. The shape operator is an involution on T[z]Qm and T[z]Qm = V(AN) ⊕ JV(AN), where V(AN) is the (+1)-eigenspace and JV(AN) is the (−1)-eigenspace of AN. Geometrically this means that AN defines a real structure on the complex vector space T[z]Qm, or equivalently, is a complex conjugation on T[z]Qm. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The Riemannian curvature tensor ¯R of Qm can be expressed as follows: ¯R(X, Y)Z = g(Y, Z)X − g(X, Z)Y + g(JY, Z)JX −g(JX, Z)JY − 2g(JX, Y)JZ +g(AY, Z)AX − g(AX, Z)AY +g(JAY, Z)JAX − g(JAX, Z)JAY. A nonzero tangent vector W ∈ T[z]Qm is called singular if it is tangent to more than one maximal flat in Qm. 1. If a conjugation A ∈ A[z] such that W ∈ V(A), then W is singular, that is A-principal. 2. If a conjugation A ∈ A[z] and orthonormal vectors X, Y ∈ V(A) such that W/||W|| = (X + JY)/ √ 2, then W is said to be A-isotropic. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The Riemannian curvature tensor ¯R of Qm can be expressed as follows: ¯R(X, Y)Z = g(Y, Z)X − g(X, Z)Y + g(JY, Z)JX −g(JX, Z)JY − 2g(JX, Y)JZ +g(AY, Z)AX − g(AX, Z)AY +g(JAY, Z)JAX − g(JAX, Z)JAY. A nonzero tangent vector W ∈ T[z]Qm is called singular if it is tangent to more than one maximal flat in Qm. 1. If a conjugation A ∈ A[z] such that W ∈ V(A), then W is singular, that is A-principal. 2. 
If a conjugation A ∈ A[z] and orthonormal vectors X, Y ∈ V(A) such that W/||W|| = (X + JY)/ √ 2, then W is said to be A-isotropic. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let M be a real hypersurface of Qm and denote ξ = −JN, where N is a (local) unit normal vector field of M. For A ∈ A[z] and X ∈ T[z]M we decompose AX as follows: AX = BX + ρ(X)N where BX is the tangential component of AX and ρ(X) = g(AX, N) = g(X, AN) = g(X, AJξ) = −g(X, JAξ) = g(JX, Aξ). Since JX = φX + η(X)N and Aξ = Bξ + ρ(ξ)N we also have ρ(X) = g(φX, Bξ) + η(X)ρ(ξ) = g(−φBξ + ρ(ξ)ξ, X). We also define δ = g(N, AN) = g(JN, JAN) = −g(JN, AJN) = −g(ξ, Aξ). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let M be a real hypersurface of Qm and denote ξ = −JN, where N is a (local) unit normal vector field of M. For A ∈ A[z] and X ∈ T[z]M we decompose AX as follows: AX = BX + ρ(X)N where BX is the tangential component of AX and ρ(X) = g(AX, N) = g(X, AN) = g(X, AJξ) = −g(X, JAξ) = g(JX, Aξ). Since JX = φX + η(X)N and Aξ = Bξ + ρ(ξ)N we also have ρ(X) = g(φX, Bξ) + η(X)ρ(ξ) = g(−φBξ + ρ(ξ)ξ, X). We also define δ = g(N, AN) = g(JN, JAN) = −g(JN, AJN) = −g(ξ, Aξ). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let M be a real hypersurface of Qm and denote ξ = −JN, where N is a (local) unit normal vector field of M. For A ∈ A[z] and X ∈ T[z]M we decompose AX as follows: AX = BX + ρ(X)N where BX is the tangential component of AX and ρ(X) = g(AX, N) = g(X, AN) = g(X, AJξ) = −g(X, JAξ) = g(JX, Aξ). Since JX = φX + η(X)N and Aξ = Bξ + ρ(ξ)N we also have ρ(X) = g(φX, Bξ) + η(X)ρ(ξ) = g(−φBξ + ρ(ξ)ξ, X). We also define δ = g(N, AN) = g(JN, JAN) = −g(JN, AJN) = −g(ξ, Aξ). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem 1 Introduction Homogeneous Hypersurfaces Isometric Reeb Flow 2 Hyperbolic Grassmannians Hypersurfaces in SU2,m/S(U2Um) Isometric Reeb Flow 3 Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Geometric Descriptions of the Tube We assume that m is even, say m = 2k. The map CPk → Q2k ⊂ CP2k+1 , [z1, . . . , zk+1] → [z1, . . . , zk+1, iz1, . . . , izk+1] gives an embedding of CPk into Q2k as a totally geodesic complex submanifold. Define a complex structure j on C2k+2 by j(z1, . . . , zk+1, zk+2, . . . , z2k+2) = (−zk+2, . . . , −z2k+2, z1, . . . , zk+1). Then j2 = −I and note that ij = ji. We can then identify C2k+2 with Ck+1 ⊕ jCk+1 and get T[z]CPk = {X + jiX | X ∈ Ck+1 [z]} = {X + ijX|X∈V(A¯z)}. 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem The normal space becomes ν[z]CPk = A¯z(T[z]CPk ) = {X − ijX|X∈V(A¯z)}. The normal N of T[z]CPk : A-isotropic, the four vectors {N, JN, AN, JAN}: pairwise orthonormal. The normal Jacobi operator ¯RN is given by ¯RNZ = ¯R(Z, N)N = Z − g(Z, N)N + 3g(Z, JN)JN −g(Z, AN)AN − g(Z, JAN)JAN. Both T[z]CPk and ν[z]CPk are invariant under RN, and RN has three eigenvalues 0, 1, 4 according to RN⊕[AN], T[z]Q2k ([N]⊕[AN]) and RJN. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Principal Curvatures and Spaces of the Tube To calculate the principal curvatures of the tube of radius 0 < r < π/2 around CPk : the standard Jacobi field method as described in Section 8.2 of Berndt, Console and Olmos. Let γ: the geodesic in Q2k with γ(0) = [z] and ˙γ(0) = N. γ⊥ : the parallel subbundle of TQ2k along γ defined by γ⊥ γ(t) = T[γ(t)]Q2k R˙γ(t). Let us define the γ⊥-valued tensor field R⊥ γ along γ by R⊥ γ(t)X = R(X, ˙γ(t))˙γ(t). Now consider the End(γ⊥)-valued differential equation Y + R⊥ γ ◦ Y = 0. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let D be the unique solution of this differential equation with initial values D(0) = I 0 0 0 , D (0) = 0 0 0 I , where the decomposition of the matrices is with respect to γ⊥ [z] = T[z]CPk ⊕ (ν[z]CPk RN) and I denotes the identity transformation on the corresponding space. Then the shape operator S(r) of the tube of radius 0 < r < π/2 around CPk with respect to ˙γ(r) is given by S(r) = −D (r) ◦ D−1 (r). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let D be the unique solution of this differential equation with initial values D(0) = I 0 0 0 , D (0) = 0 0 0 I , where the decomposition of the matrices is with respect to γ⊥ [z] = T[z]CPk ⊕ (ν[z]CPk RN) and I denotes the identity transformation on the corresponding space. Then the shape operator S(r) of the tube of radius 0 < r < π/2 around CPk with respect to ˙γ(r) is given by S(r) = −D (r) ◦ D−1 (r). Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Let D be the unique solution of this differential equation with initial values D(0) = I 0 0 0 , D (0) = 0 0 0 I , where the decomposition of the matrices is with respect to γ⊥ [z] = T[z]CPk ⊕ (ν[z]CPk RN) and I denotes the identity transformation on the corresponding space. Then the shape operator S(r) of the tube of radius 0 < r < π/2 around CPk with respect to ˙γ(r) is given by S(r) = −D (r) ◦ D−1 (r). 
Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem If we decompose γ⊥ [z] further into γ⊥ [z] = (T[z]CPk [AN]) ⊕ [AN] ⊕ (ν[z]CPk [N]) ⊕ RJN, we get by explicit computation that S(r) =     0 0 0 0 0 tan(r) 0 0 0 0 − cot(r) 0 0 0 0 −2 cot(2r)     with respect to that decomposition. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem If we decompose γ⊥ [z] further into γ⊥ [z] = (T[z]CPk [AN]) ⊕ [AN] ⊕ (ν[z]CPk [N]) ⊕ RJN, we get by explicit computation that S(r) =     0 0 0 0 0 tan(r) 0 0 0 0 − cot(r) 0 0 0 0 −2 cot(2r)     with respect to that decomposition. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Proposition 3.1 Let M be the tube of radius 0 < r < π/2 around the totally geodesic CPk in Q2k . Then the following hold: 1. M is a Hopf hypersurface. 2. The normal bundle of M consists of A-isotropic singular. 3. M has four distinct constant principal curvatures. principal curvature eigenspace multiplicity 0 C Q 2 tan(r) TCPk (C Q) 2k − 2 − cot(r) νCPk CνM 2k − 2 −2 cot(2r) F 1 4. Sφ = φS. 5. The Reeb flow on M is an isometric flow. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Proposition 3.1 Let M be the tube of radius 0 < r < π/2 around the totally geodesic CPk in Q2k . Then the following hold: 1. M is a Hopf hypersurface. 2. The normal bundle of M consists of A-isotropic singular. 3. M has four distinct constant principal curvatures. principal curvature eigenspace multiplicity 0 C Q 2 tan(r) TCPk (C Q) 2k − 2 − cot(r) νCPk CνM 2k − 2 −2 cot(2r) F 1 4. Sφ = φS. 5. The Reeb flow on M is an isometric flow. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Proposition 3.1 Let M be the tube of radius 0 < r < π/2 around the totally geodesic CPk in Q2k . Then the following hold: 1. M is a Hopf hypersurface. 2. The normal bundle of M consists of A-isotropic singular. 3. M has four distinct constant principal curvatures. principal curvature eigenspace multiplicity 0 C Q 2 tan(r) TCPk (C Q) 2k − 2 − cot(r) νCPk CνM 2k − 2 −2 cot(2r) F 1 4. Sφ = φS. 5. The Reeb flow on M is an isometric flow. Y.J.Suh Isometric Reeb Flow on Hermitian Symmetric Spaces Introduction Hyperbolic Grassmannians Complex Quadrics Real hypersurfaces in Q2k Tubes around the totally geodesic CPk ⊂ Q2k Proof of Main Theorem Proposition 3.1 Let M be the tube of radius 0 < r < π/2 around the totally geodesic CPk in Q2k . Then the following hold: 1. M is a Hopf hypersurface. 2. The normal bundle of M consists of A-isotropic singular. 3. M has four distinct constant principal curvatures. principal curvature eigenspace multiplicity 0 C Q 2 tan(r) TCPk (C Q) 2k − 2 − cot(r) νCPk CνM 2k − 2 −2 cot(2r) F 1 4. Sφ = φS. 5. The Reeb flow on M is an isometric flow. 
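On each eigenbundle of R⊥_γ with eigenvalue c, the Jacobi equation above becomes the scalar problem D'' + cD = 0, and the corresponding entry of the shape operator is S(r) = −D'(r)/D(r). The following symbolic sketch (illustrative; the eigenvalues 0, 1, 1, 4 and the initial conditions are those of the decomposition just described) reproduces the entries 0, tan r, −cot r and −2 cot 2r of the matrix displayed on the next slide.

import sympy as sp

t, r = sp.symbols('t r', positive=True)

def shape_entry(c, D0, dD0):
    # Scalar Jacobi equation D'' + c D = 0 with D(0) = D0, D'(0) = dD0;
    # the corresponding shape-operator entry of the tube is S(r) = -D'(r) / D(r).
    D = sp.Function('D')
    sol = sp.dsolve(sp.Eq(D(t).diff(t, 2) + c * D(t), 0), D(t),
                    ics={D(0): D0, D(t).diff(t).subs(t, 0): dD0}).rhs
    return sp.simplify(-sol.diff(t).subs(t, r) / sol.subs(t, r))

# eigenvalue of the normal Jacobi operator, with D(0)=1, D'(0)=0 on the tangent part
# of CP^k and D(0)=0, D'(0)=1 on the normal part:
print(shape_entry(0, 1, 0))   # equals 0            on C ∩ Q
print(shape_entry(1, 1, 0))   # equals tan(r)       on T CP^k minus (C ∩ Q)
print(shape_entry(1, 0, 1))   # equals -cot(r)      on nu CP^k minus C nu M
print(shape_entry(4, 0, 1))   # equals -2*cot(2r)   on F = R J N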
Outline: 1. Introduction (Homogeneous Hypersurfaces; Isometric Reeb Flow); 2. Hyperbolic Grassmannians (Hypersurfaces in SU2,m/S(U2Um); Isometric Reeb Flow); 3. Complex Quadrics (Real hypersurfaces in Q^{2k}; Tubes around the totally geodesic CP^k ⊂ Q^{2k}; Proof of Main Theorem).

Proof of Main Theorem
Now we investigate real hypersurfaces in Q^m for which the Reeb flow is isometric. From this we get a complete expression for the covariant derivative of the shape operator:

(∇_X S)Y = {dα(X)η(Y) + g((αSφ − S²φ)X, Y) + δη(Y)ρ(X) + δg(BX, φY) + η(BX)ρ(Y)}ξ
         + {η(Y)ρ(X) + g(BX, φY)}Bξ + g(BX, Y)φBξ − ρ(Y)BX − η(Y)φX − η(BY)φBX.

Lemma 3.1. Let M be a real hypersurface in Q^m, m ≥ 3, with isometric Reeb flow. Then the normal vector field N is A-isotropic everywhere.

From the Proposition and the Lemma, the principal curvature function α is constant. Writing X = Y + Z with Y ∈ T_λ and Z ∈ T_µ, we then get

(λ² − αλ)Y + (µ² − αµ)Z = (S² − αS)X = X.

By virtue of this equation, we can assert the following propositions:

Proposition 3.2. Let M be a real hypersurface in Q^m, m ≥ 3, with isometric Reeb flow. Then the distributions Q and C ⊖ Q = [Bξ] are invariant under the shape operator.

Proposition 3.3. Let M be a real hypersurface in Q^m, m ≥ 3, with isometric Reeb flow. Then m is even, say m = 2k, and the real structure A maps T_λ onto T_µ, and vice versa.
For each point [z] ∈ M we denote by γ_[z] the geodesic in Q^{2k} with γ_[z](0) = [z] and γ̇_[z](0) = N_[z], and by F the smooth map

F : M → Q^m,   [z] ↦ γ_[z](r).

F is the displacement of M at distance r in the direction of N. The differential d_[z]F of F at [z] can be computed by d_[z]F(X) = Z_X(r), where Z_X is the Jacobi vector field along γ_[z] with Z_X(0) = X and Z'_X(0) = −SX. Since N is A-isotropic, R̄_N = R̄(·, N)N has the three constant eigenvalues 0, 1, 4 with corresponding eigenbundles νM ⊕ (C ⊖ Q) = νM ⊕ T_ν, Q = T_λ ⊕ T_µ and F = T_α.

Rigidity of totally geodesic submanifolds then implies that M is an open part of a tube of radius r around a k-dimensional connected, complete, totally geodesic complex submanifold P of Q^{2k}. Klein classified the totally geodesic submanifolds of Q^{2k}: the focal submanifold P is either a totally geodesic Q^k ⊂ Q^{2k} or a totally geodesic CP^k ⊂ Q^{2k}. It follows that M is an open part of a tube around CP^k.

References
J. Berndt and Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians, Monatshefte für Math. 127 (1999), 1-14.
J. Berndt and Y.J. Suh, Isometric flows on real hypersurfaces in complex two-plane Grassmannians, Monatshefte für Math. 137 (2002), 87-98.
S. Montiel and A. Romero, On some real hypersurfaces of a complex hyperbolic space, Geom. Dedicata 20 (1986), 245-261.
M. Okumura, On some real hypersurfaces of a complex projective space, Trans. Amer. Math. Soc. 212 (1975), 355-364.
References II
J.D. Perez and Y.J. Suh, Real hypersurfaces of quaternionic projective space satisfying ∇_{U_i}R = 0, Diff. Geom. Appl. 7 (1997), 211-217.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with commuting Ricci tensor, J. of Geom. and Physics 60 (2010), 1792-1805.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with parallel Ricci tensor, Proc. Royal Soc. Edinburgh 142(A) (2012), 1309-1324.
References III
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with Reeb parallel Ricci tensor, J. of Geom. and Physics 64 (2013), 1-11.
Y.J. Suh, Real hypersurfaces in complex two-plane Grassmannians with harmonic curvature, Journal de Math. Pures Appl. 100 (2013), 16-33.
J. Berndt and Y.J. Suh, Real hypersurfaces in the noncompact Grassmannians SU2,m/S(U2·Um), International J. of Math. 23 (2012), 1250103 (35 pages), http://arxiv.org/abs/0911.3081.
References IV
J. Berndt and Y.J. Suh, Real hypersurfaces with isometric Reeb flow in complex two-plane Grassmannians, Monatshefte für Math. 137 (2002), 87-98.
Y.J. Suh, Hypersurfaces with isometric Reeb flow in complex hyperbolic two-plane Grassmannians, Advances in Applied Mathematics 50 (2013), 645-659.
J. Berndt and Y.J. Suh, Real hypersurfaces with isometric Reeb flow in complex quadrics, International J. Math. 24 (2013), in press.
J. Berndt, S. Console and C. Olmos, Submanifolds and Holonomy, Research Notes in Mathematics 434, Chapman & Hall/CRC, 2003.
References V
P.B. Eberlein, Geometry of Nonpositively Curved Manifolds, Chicago Lectures in Math., The Univ. of Chicago Press, 1996.
A.W. Knapp, Lie Groups Beyond an Introduction, Progress in Math., Birkhäuser, 2002.
S. Helgason, Differential Geometry, Lie Groups and Symmetric Spaces, Graduate Studies in Mathematics 34, Amer. Math. Soc., 2001.
S. Helgason, Geometric Analysis on Symmetric Spaces, 2nd edition, Math. Surveys and Monographs 39, Amer. Math. Soc., 2008.

THANKS FOR YOUR ATTENTION!

ORAL SESSION 8 Computational Aspects of Information Geometry in Statistics (chaired by Frank Critchley)

Creative Commons None (All rights reserved)
See the video
See the video

A General Metric for Riemannian Hamiltonian Monte Carlo
Michael Betancourt, University College London, August 30th, 2013

I'm going to talk about probability and geometry, but not information geometry! Instead our interest is Bayesian inference,

π(θ|D) ∝ π(D|θ) π(θ).

Markov Chain Monte Carlo admits the practical analysis and manipulation of posteriors even in high dimensions. Markov transitions can be thought of as an "average" isomorphism that preserves a target distribution: given the target space (Ω, B(Ω), π) and a second probability space of maps t : Ω → Ω, the averaged transition T satisfies πT = π.

Random Walk Metropolis and the Gibbs sampler have been the workhorse Markov transitions.

Random Walk Metropolis:
T(θ, θ') = N(θ' | θ, σ²) min(1, π(θ')/π(θ)),
or, as a random map: ε ∼ N(0, σ²), a | ε ∼ Ber(min(1, π(θ + ε)/π(θ))), t : θ → θ + a·ε.

Gibbs sampler:
T(θ, θ') = ∏_i π(θ'_i | θ_{j\i}),
or, as a random map: t_i : θ_i → ε, ε ∼ π(ε | θ_{j\i}).

MCMC performance is limited by complex posteriors, which are common in large dimensions. Random walk Metropolis sampling explores only slowly, and Gibbs sampling doesn't fare much better: RWM and Gibbs explore incoherently in large dimensions.

How do we generate coherent transitions? Take the sample space to be a smooth manifold M, with target (M, B(M), π). Hamiltonian flow is a coherent, measure-preserving map

T : M → T*M → T*M → M   (random lift, Hamiltonian flow, marginalization).

We just need to define a lift from the sample space to its cotangent bundle,

π(q) → π(p|q) π(q),   H = −log π(p|q) − log π(q) = T + V,

with kinetic energy T = −log π(p|q) and potential energy V = −log π(q).

Quadratic kinetic energies with constant metrics emulate dynamics on a Euclidean manifold:

π(p|q) = N(0, M),   T = ½ p_i p_j (M⁻¹)^{ij}.

The coherent flow carries the Markov chain along the target distribution, avoiding random-walk behavior.
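As an illustration of this lift, flow and marginalize scheme, here is a minimal Euclidean HMC sketch in Python. It is a reconstruction for illustration only: the leapfrog integrator, the step size, the trajectory length and the standard-normal toy target are assumptions, not details taken from the talk.

import numpy as np

def leapfrog(q, p, grad_V, eps, n_steps):
    # Approximate the Hamiltonian flow of H = V(q) + 0.5 * p.p with the leapfrog integrator.
    p = p - 0.5 * eps * grad_V(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_V(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_V(q)
    return q, p

def hmc_step(q, V, grad_V, eps=0.1, n_steps=20, rng=np.random):
    # Lift: draw a momentum from pi(p|q) = N(0, I); flow; Metropolis-correct; marginalize.
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_V, eps, n_steps)
    h0 = V(q) + 0.5 * p0 @ p0
    h1 = V(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h0 - h1:   # accept with probability min(1, exp(-dH))
        return q_new
    return q

# Toy usage: a 50-dimensional standard normal target (assumed purely for illustration).
V = lambda q: 0.5 * q @ q
grad_V = lambda q: q
q = np.zeros(50)
for _ in range(1000):
    q = hmc_step(q, V, grad_V)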
Unfortunately, Euclidean HMC (EHMC) is sensitive to large variations in curvature, as well as to variations in the target density, since each momentum refresh supplies only ⟨T⟩ = d/2 of energy. These weaknesses are particularly evident in hierarchical models,

π(x, v) = ∏_{i=1}^{n} π(x_i | v) π(v),

where the potential can vary by ΔV ≈ 250 across the funnel of such a model.

Quadratic kinetic energies with dynamic metrics emulate dynamics on a Riemannian manifold:

π(p|q) = N(0, Σ(q)),   T = ½ p_i p_j (Σ⁻¹(q))^{ij} + ½ log |Σ(q)|.

Optimal numerical integration suggests using the Hessian, Σ(q)_{ij} = ∂_i∂_j V(q), but the Hessian isn't positive-definite. Fisher-Rao, Σ(q)_{ij} = E_D[∂_i∂_j V(q|D)], is both impractical and ineffective: it replaces the actual local curvature ∂_i∂_j V(q|D) with an expectation over data. We can regularize without appealing to expectations:

Σ_{ij}(q) = [exp(αH) + exp(−αH)]_{ik} · H_{kl} · [exp(αH) − exp(−αH)]⁻¹_{lj}.

The "SoftAbs" metric serves as a differentiable absolute value of the Hessian. [Figure: the regularized eigenvalue λ' plotted against λ; the smoothing scale is 1/α.] The SoftAbs metric locally standardizes the target distribution, and the log determinant admits full exploration of the funnel. The SoftAbs metric admits a general-purpose, practical implementation of RHMC. [Figure: funnel samples in the (x1, v) plane.]
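A minimal sketch of the SoftAbs regularization in Python follows. It assumes the eigenvalue map λ → λ coth(αλ), which is the eigendecomposition form of the matrix expression above; the example Hessian is invented, and this is an illustration rather than the talk's implementation.

import numpy as np

def softabs_metric(H, alpha=1.0):
    # Eigendecompose the symmetric Hessian and map each eigenvalue
    # lambda -> lambda * coth(alpha * lambda), a smooth, strictly positive
    # surrogate for |lambda|; reassemble to obtain the SoftAbs metric.
    lam, Q = np.linalg.eigh(H)
    lam_reg = lam / np.tanh(alpha * lam)
    lam_reg = np.where(np.abs(lam) < 1e-12, 1.0 / alpha, lam_reg)  # limit as lambda -> 0
    return Q @ np.diag(lam_reg) @ Q.T

# Toy usage: an indefinite Hessian is mapped to a positive-definite metric.
H = np.array([[2.0, 0.5, 0.0],
              [0.5, -1.0, 0.3],
              [0.0, 0.3, 1e-6]])
Sigma = softabs_metric(H, alpha=1.0)
print(np.linalg.eigvalsh(Sigma))   # all eigenvalues are now strictly positive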

Creative Commons None (All rights reserved)
See the video
See the video
See the video
See the video

Computational Information Geometry (CIG) in Statistics: foundations
Karim Anaya-Izquierdo, FC, Paul Marriott and Paul Vos (Bath, OU, Waterloo and East Carolina)
GSI'13: Paris, August 2013
Outline: Introduction; Discretisation; Extended Multinomial IG; Example 1 (continued); Conclusion

OVERALL AIM: The power and elegance of IG have yet to be fully exploited in statistical practice, to which end the overall aim of CIG here is to provide tools to help resolve outstanding problems in statistical science, via an operational `universal space of all possible models', such problems including:
- (local-to-global) sensitivity analysis
- handling both data and model uncertainty
- inference in graphical & related models
- transdimensional & other issues in MCMC
- mixture estimation (see PM's talk)
KEY IDEA: NB: a statistical model ↔ (sample space Ω, {probability distributions on Ω}). Represent inference problems arising in such models inside adequately large but finite dimensional spaces. In these embedding spaces, the building blocks of IG in statistics are explicit, computable & algorithmically usable. Modulo a possible initial discretisation, for a r.v. of interest, an operational universal model space ↔ the simplex

∆k := {π = (π0, π1, ..., πk) : πi ≥ 0, ∑_{i=0}^{k} πi = 1},   (1)

having a unique label for each vertex, representing the r.v. Multinomials on k + 1 categories ↔ int(∆k), the relative interior of ∆k, and (1) allows distributions with different support sets.
(One Iteration of) Statistical Science: Working Problem Formulation: WPF = (Q, p.c., model, data, inference).
- A question Q takes the form `what is θQ ≡ θQ[F]?', so that θQ has the same (= population) meaning in all models.
- Perturbations of the problem formulation are pertinent, so sensitivity analyses are sensible.
- Perturb (weight) the data via the CSF: see CALB (2001), JRSS B.
- Focus: (perturb) the working model, M say, a set of (often, explicitly parameterised) distributions on Ω.

AGENDA: Represent the working model M by a subset of ∆k (cf. coarse-graining). Use the IG of ∆k to numerically compute statistically important features of M, including:
- properties of the likelihood (which can be nontrivial here)
- adequacy of first order asymptotic methods, notably via higher order asymptotic expansions
- curvature based dimension reduction
- mixture model structure/inference (see PM's talk).
Focus: ideas, not proofs (given in the arXiv paper [2]).
KEY QUESTION: The CIG approach is inherently discrete and finite. Sometimes this is without loss; in general there is an appropriate theory for suitably fine partitions:
- cost: some loss of generality (an obvious equivalence relation ~ is induced)
- benefit: an excellent foundation for a computational theory.
Meanwhile, FMP (finite measurement precision) means that models can (arguably, should) be seen as fundamentally categorical. This poses the key question: what is the effect on the inferential objects of interest of a particular selection of such categories? Addressed in Theorems 1 & 2 but, first, ...

EXAMPLE 1: leukaemia patient data
- 43 survival times Z from diagnosis, measured in days
- Q: what is the mean survival time µ ≡ µ[F]?
- for (later) expository purposes: suppose Z ~ Exponential, but only observe the censored value Y = min{Z, t}, so that Y is a 1-D curved EF inside a 2-D regular EF [PM & West (2002)]
- t chosen to give a reasonable, but not perfect, fit.
This directly illustrates 2 points (a numerical sketch follows the list):
1. whereas the model is continuous, the data are discrete, so there is ZERO loss in treating them as sparse categorical;
2. a further level of coarseness using bin size = 4 days produces effectively NO inferential loss ...
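To make point 2 concrete, here is a minimal numerical sketch in Python comparing the binned log-likelihood of a censored exponential model at bin sizes of 1 day and 4 days. The sample, the censoring time t and the grid of µ values are invented for illustration and are not the leukaemia data.

import numpy as np

def binned_loglik(y, censored, t, mu, width):
    # Bin the uncensored survival times into intervals of the given width on [0, t)
    # and treat censoring at t as its own category; return the resulting log-likelihood.
    edges = np.arange(0.0, t + width, width)
    probs = np.exp(-edges[:-1] / mu) - np.exp(-edges[1:] / mu)   # P(bin) under Exp(mean mu)
    idx = np.minimum((y[~censored] / width).astype(int), len(probs) - 1)
    ll = np.sum(np.log(probs[idx]))
    ll += np.sum(censored) * (-t / mu)                           # log P(Z > t) = -t/mu
    return ll

# Invented data: 43 exponential survival times censored at t = 1500 days.
rng = np.random.default_rng(0)
t = 1500.0
z = rng.exponential(scale=1000.0, size=43)
y = np.minimum(z, t)
censored = z >= t

mus = np.linspace(600.0, 1800.0, 25)
for width in (1.0, 4.0):
    ll = np.array([binned_loglik(y, censored, t, mu, width) for mu in mus])
    print(width, np.round(ll - ll.max(), 3))   # the two normalised curves nearly coincide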
EXAMPLE 1: log-likelihood for the interest parameter µ. [Figure: (a) log-likelihoods for µ, with circles for bin size 1 day and a solid line for bin size 4 days; (b) the full exponential family in the (θ1, θ2) natural parameters.]

Information loss under discretisation: for continuous r.v.'s we need to truncate & discretise Ω into a finite number of bins. Theorems 1 & 2 show that the associated information loss can be made arbitrarily small. Key: control bin-conditional moments of the r.v.'s of interest, uniformly in the parameters of the model.

THEOREM 1: likelihood ratios. Let f(x; θ), θ ∈ Θ, be a family of density functions with common support X ⊆ R^d, each continuously differentiable on r.i.(X) ≠ ∅; let X be compact and {‖∂f(x; θ)/∂x‖ : x ∈ X} be uniformly bounded in θ ∈ Θ. Then, ∀ε > 0 and ∀ sample sizes N > 0, ∃ a finite measurable partition {B_i}_{i=0}^{k(ε,N)} of X such that, for all (x1, ..., xN) ∈ X^N and for all (θ0, θ) ∈ Θ²,

| log[ Lik_c(θ)/Lik_c(θ0) ] − log[ Lik_d(θ)/Lik_d(θ0) ] | ≤ ε,

where Lik_c and Lik_d are the likelihood functions for the continuous and discretised distributions respectively.

Theorem 2 considers discretisation of an EF, so that the tools of classical IG can be applied. In general a discretised full EF is not a full EF, and there is information loss. However, Theorem 2 shows that this loss can be made inferentially unimportant: all IG results on the two families can be made arbitrarily close.

THEOREM 2: Amari structure. Let f(x; θ) = ν(x) exp{θ^T s(x) − ψ(θ)}, x ∈ X, θ ∈ Θ, be an EF satisfying the regularity conditions of Amari (1990), p. 16, with s(x) uniformly continuous and s(X) compact. Then, ∀ε > 0, ∃ a finite measurable partition {B_i}_{i=0}^{k(ε)} of X such that, for all choices of bin labels s_i ∈ s(B_i), all terms of Amari's IG for f(x; θ) can be approximated to the relevant order of ε by the corresponding terms for the discretised family {(π_i(θ), s_i) : π_i(θ) := ∫_{B_i} f(x; θ) dx, s_i ∈ s(B_i)}. In particular, ...
THEOREM 2 (continued):
(a) for all θ ∈ Θ, and any norm, ‖µ_c(θ) − µ_d(θ)‖ = O(ε), where µ_c(θ) = ∫_X x f(x; θ) dx and µ_d(θ) = ∑_i s_i π_i(θ);
(b) the expected Fisher information matrices for θ of f(x; θ) and of {π_i(θ)}, denoted I_c(θ) and I_d(θ) respectively, satisfy ‖I_c(θ) − I_d(θ)‖_∞ = O(ε²);
(c) the skewness tensors [Amari (1990), p. 105] T_c(θ) and T_d(θ), for f(x; θ) and {π_i(θ)} respectively, satisfy ‖T_c(θ) − T_d(θ)‖_∞ = O(ε³).

EXTENSIONS: Above, a compactness condition keeps the geometry finite. A later paper treats the case where compactness is not needed: there, the `space of all distributions' = (the closure of) an infinite-dimensional simplex, extending classical IG; convergence issues arise, and appropriate Hilbert space structures are used, especially to bound the loss of inferential information when moving to a finite (and hence computable) simplex.

BACKGROUND: IG ↔ ±1 affine geometries, non-linearly related via duality & Fisher information. In a full EF context, the +1 geometry ↔ the natural parameterisation and the −1 geometry ↔ the mixture parameterisation. Closures of EFs have been studied by, e.g., Barndorff-Nielsen ('78), Brown ('86), Lauritzen ('96) & Rinaldo (2006) and, in the infinite-dimensional case, Csiszar & Matus (2005). Here, rather than pointwise limits, the focus is on limits of families of distributions.

The IG theory follows Amari (1990) via Murray & Rice's (1993) affine space construction, extended by Marriott (2002). Recall that the r.v.'s take values in a finite set of categories (bins) B = {B_i}_{i∈I}, so a distribution = the set of corresponding probabilities {π_i}_{i∈I}; we identify bin B_i with its label i ∈ I = {0, ..., k}.

−1 affine space structure over distributions on B: (A_mix, V_mix, +), where A_mix = {{a_i}_{i∈I} : ∑_{i∈I} a_i = 1}, V_mix = {{v_i}_{i∈I} : ∑_{i∈I} v_i = 0} and `+' is the usual addition of sequences. ∆k is a −1-convex subset of (A_mix, V_mix, +).

+1 affine space structure over distributions on B: sets of distributions with the same support form a simplicial complex; support ↔ ∅ ≠ F ⊆ I, where `F' connotes `face'. Each F has a separate +1 structure (A_exp,F, V_exp,F, ⊕_F): defining ∼_F on A_F := {{a_i}_{i∈F} : a_i > 0} by {a_i} ∼_F {b_i} ⟺ ∃λ > 0 s.t. a_i = λ b_i for all i ∈ F,
we put A_exp,F := A_F / ∼_F and V_exp,F := {{v_i}_{i∈F} : v_i ∈ R}, defining ⊕_F by ⟨{a_i}⟩ ⊕_F {v_i} := ⟨{a_i exp(v_i)}⟩.

Extended TRInomial IG (obvious extensions give the general case in [2]):
- ∆2: bin probabilities π = (π0, π1, π2), πi ≥ 0
- panels (a) to (d) show ±1 geodesics in ±1 parameters
- panels (a), (c) ↔ ∆2 in −1 (mixture) parameters; panels (b), (d) ↔ +1 (natural) parameters (each πi > 0)
- c^T = (1, 2, 3), X ~ Trinomial(1; π)
- (a), (b): blue lines = level sets of E(c^T X) = −1 geodesics
- (c), (d): black lines = 1-D full EFs* = +1 geodesics (*with probabilities of the form πi exp(θci)/∑_{j=0}^{2} πj exp(θcj))
- these −1-parallel blue lines & +1-parallel black lines are everywhere orthogonal w.r.t. the Fisher information metric.

[Figure: (a) −1-geodesics in the −1-simplex; (b) −1-geodesics in the +1-simplex; (c) +1-geodesics in the −1-simplex; (d) +1-geodesics in the +1-simplex.]

- (b): −1 geodesics are nonlinear in +1 parameters, and vice versa: see (c).
- (a): −1 geodesics extend naturally to the boundary in −1 parameters.
- (c): limits of +1 geodesics lie in the boundary of ∆2; define the +1 closure so that these continuous limits are defined `at ∞' in +1 parameters (shown schematically as the dotted triangle in panel (b)); this is key to understanding the simplicial nature of the +1 geometry.

SHAPE OF THE LOG-LIKELIHOOD: the natural spaces for CIG are high-dimensional simplicial structures, so a primary question is the behaviour of the log-likelihood l(·) in them. Two important issues:
- typically, the sample size N << k = dimension of the simplex
- ∆k contains sub-simplices of varying support,
so standard intuition about the shape of l(·) will not hold; in particular, the standard χ²-approximation to the distribution of the deviance fails.
Discretising, the data {x_i}_{i=1}^{N} ~ f(x; θ) become counts {n_i}_{i∈I} ~ Multinomial(N; π(θ)), with I = {0, ..., k}; write I = P ∪ Z where P := {i : n_i > 0} and Z := {i : n_i = 0}.

SHAPE OF THE LOG-LIKELIHOOD (continued):
- observed face := the face spanned by the vertices (bins) in P
- unobserved face := the face spanned by the vertices (bins) in Z.
The log-likelihood l(·) is strictly concave on the observed face, strictly decreasing in the normal direction from it to the unobserved face and, otherwise, constant. For more on the geometry of the observed face, see PM's talk.
Theorem 3: Let V_mix := {{v_i}_{i∈I} : ∑_{i∈I} v_i = 0}, V_0 := {{v_i}_{i∈I} ∈ V_mix : v_i = 0, i ∈ P} and, for any i_Z ∈ Z, V_{i_Z} := {{v_i}_{i∈I} ∈ V_mix : v_i = 0, i ∈ Z \ {i_Z}}. Then:
- V_0 is a linear subspace of V_mix
- l(·) is constant on −1-affine subspaces of the form π + V_0
- V_mix has the direct sum decomposition V_0 ⊕ V_{i_Z}.

Spectrum of the Fisher information: denote all bin probabilities except π0 by π_(0) := (π1, ..., πk)^T. Viewed as the covariance matrix of the score, N⁻¹ times the Fisher information matrix for the +1 parameters is

I(π) := diag(π_(0)) − π_(0) π_(0)^T,

whose explicit spectral decomposition is, in all cases, an example of interlacing eigenvalue results. Accordingly, the Fisher spectrum mimics key features of the bin probabilities. Of central importance:
- eigenvalues are exponentially small ⟺ the same is true of the {πi}_{i=0}^{k}
- the Fisher information matrix is singular ⟺ one of the {πi}_{i=0}^{k} vanishes.
[Again, typically, 2 eigenvalues are close whenever 2 corresponding πi are.]

In particular, if {πi}_{i=1}^{k} comprise g > 1 distinct values λ1 > ... > λg > 0, with λi occurring mi times, then the spectrum of I(π) comprises g simple eigenvalues {λ̃i}_{i=1}^{g}, the roots of an explicit polynomial, satisfying

λ1 > λ̃1 > ... > λg > λ̃g ≥ 0,

together, if g < k, with {λi : mi > 1}, each such λi having multiplicity mi − 1, while λ̃g > 0 ⟺ π0 > 0. [Further, each λ̃i (i < g) is typically (much) closer to λi than to λi+1, making it a near replicate of λi.]
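A small numerical sketch of this interlacing in Python (the particular bin probabilities are invented for illustration):

import numpy as np

# Invented bin probabilities with g = 3 distinct positive values among pi_1..pi_k
# (each repeated), plus pi_0 determined by the sum-to-one constraint.
pi_rest = np.array([0.20, 0.20, 0.10, 0.10, 0.10, 0.05, 0.05, 0.05])
pi0 = 1.0 - pi_rest.sum()            # = 0.15 > 0, so the smallest eigenvalue is > 0

# Fisher information (per observation) for the +1 parameters of the multinomial.
I = np.diag(pi_rest) - np.outer(pi_rest, pi_rest)
eigs = np.sort(np.linalg.eigvalsh(I))[::-1]

print("distinct bin probabilities:", sorted(set(pi_rest), reverse=True))
print("Fisher eigenvalues:        ", np.round(eigs, 4))
# Each repeated value 0.20, 0.10, 0.05 reappears as an eigenvalue with multiplicity m_i - 1,
# interlaced with g simple eigenvalues, one strictly between consecutive distinct values
# (and the last strictly positive because pi_0 > 0).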
CLOSURE: Given a full EF embedded in a high-dimensional sparse simplex, an important question is to identify its limit points: how it is connected to the boundary. Panel (c) in the trinomial example above illustrates that 1-D EF limits lie at vertices, and which vertex is determined by the rank order of the components of the tangent vector of the +1-geodesic. In general (see [2]), finding the limit points ↔ finding redundant linear constraints; this can be converted, via duality, into finding extremal points in a finite-dimensional affine space; cf. Geyer (2009): directions of recession.

Total positivity and the convex hull: The −1-convex hull of an EF is of great interest, mixture models being widely used in statistical science. Explored further in PM's talk, we simply state the main result here. It follows easily from the total positivity of EFs that, generically, convex hulls are of maximal dimension k; here, `generically' means that the +1 tangent vector which defines the EF has components which are all distinct.
Theorem 4: The −1-convex hull of an open subset of a generic 1-D EF is of full dimension.

EXAMPLE 1 (continued): leukaemia patient data. Return now to Example 1 to illustrate the above results, in particular to show an application of dimension reduction based on IG. Recall, we have:
- 43 survival times Z from diagnosis, measured in days
- Q: what is the mean survival time µ ≡ µ[F]?
- for expository purposes: suppose Z ~ Exponential, but only observe the censored value Y = min{Z, t}, so that Y is a 1-D curved EF inside a 2-D regular EF [PM & West (2002)]
- t chosen to give a reasonable, but not perfect, fit.

DIMENSION REDUCTION: [Figure: (a) log-likelihoods for µ; (b) the full exponential family in the (θ1, θ2) natural parameters.]
(a): the plot of l(µ) shows appreciable skewness, suggesting that standard first order asymptotics can be improved by the higher order asymptotic methods of classical IG.
(b): in +1 parameters, the solid curve is Y's 1-D curved EF embedded in the 2-D full EF and the dashed lines are contours of l(·) for the full EF. It is clear, even visually, that Y has low +1 curvature on this inferential scale, so its curved EF behaves inferentially like a 1-D full EF and we can use the Marriott & Vos (2004) dimension reduction techniques.

[Figure: (c) distribution of the MLE of µ.] Panel (c) shows how well a saddlepoint-based approximation does at approximating the distribution of µ̂_MLE.

CONCLUSION: The power and elegance of IG have yet to be fully exploited in statistical practice, to which end the overall aim of CIG here is to provide tools to help resolve outstanding problems in statistical science, via an operational `universal space of all possible models', ...
such problems including:
- (local-to-global) sensitivity analysis
- handling both data and model uncertainty
- inference in graphical & related models
- transdimensional & other issues in MCMC
- mixture estimation (see PM's talk)

Creative Commons None (All rights reserved)
See the video
See the video

Computational information geometry in statistics: mixture modelling
Paul Marriott, University of Waterloo
GSI2013 - Geometric Science of Information
Outline: Introduction; Examples; Extended Multinomial Models; Inference on Mixtures; Total positivity and local mixing; New algorithm; Examples; Conclusions

Overview
• Joint work with Karim Anaya-Izquierdo, Frank Critchley and Paul Vos
• This paper applies the tools of computational information geometry, see Frank's talk
• High dimensional extended multinomial families as proxies for the 'space of all distributions'
• Look in the inferentially demanding area of statistical mixture modelling.

Overview
• We look at, and show the relationship between, different geometric approaches to mixture modelling
• Lindsay's data dependent, finite dimensional affine space
• Our extended multinomial embedding space
• Show a new algorithm which uses the full Information Geometry of the problem to its advantage
• Exploit the idea of polytope approximation in the 'correct' geometry

Mixture models
• Mixture models form an extremely flexible class of models
• Used when some data are not observed, when there are hidden dependence structures, or when there is unexplained heterogeneity
• They are of the form ∑_i ρ_i f_X(x; θ_i) or ∫ f_X(x; θ) dQ(θ)
• Consider ρ0 N(µ0, σ0²) + ρ1 N(µ1, σ1²) + ρ2 N(µ2, σ2²)

[Figure: a sequence of mixing distributions on the simplex (ρ1, ρ2) and the corresponding mixed densities.]
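As a small concrete illustration of this three-component form, here is a minimal Python sketch evaluating such a mixture density; the particular weights, means and variances are invented for illustration only.

import numpy as np

def normal_pdf(x, mu, sigma2):
    # Density of N(mu, sigma2) evaluated at x.
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def mixture_pdf(x, rho, mu, sigma2):
    # Finite mixture: sum_i rho_i * N(x; mu_i, sigma2_i), with the rho_i on the simplex.
    return sum(r * normal_pdf(x, m, s2) for r, m, s2 in zip(rho, mu, sigma2))

# Invented three-component example of the form rho0*N(mu0, s0^2) + rho1*N(mu1, s1^2) + rho2*N(mu2, s2^2).
rho = [0.5, 0.3, 0.2]          # mixing distribution (a point in the simplex)
mu = [-1.0, 0.0, 1.5]
sigma2 = [0.2, 1.0, 0.1]

x = np.linspace(-3.0, 3.0, 7)
print(np.round(mixture_pdf(x, rho, mu, sigma2), 4))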
Convex Geometry
• Inference for mixture models can be problematic
• They can be 'too flexible', hence can overfit
• The likelihood function can have multiple modes, singularities and be unbounded
• The underlying structure is not a manifold, so have to be careful using calculus
• Inference questions where Z ∼ ∫ f(z; θ) dQ(θ):
  1. what can we learn about E(Z)?
  2. what can we learn about Q?
  3. can we predict the next value of Z?
Mixture of binomial distributions
• First example comes from Kupper and Haseman (1978)
• Concerns the frequency of death of implanted foetuses in laboratory animals
• It could be expected that there is underlying clustering, hence mixture modelling is appropriate
• The paper states: 'simple one-parameter binomial and Poisson models generally provide poor fits to this type of binary data'
• It is of interest to look in a 'neighbourhood' of these models
• The extended multinomial space is a natural place to define such a 'neighbourhood'
• Our new computational algorithm is used for inference

Tripod model
• Second example is the tripod, discussed in Zwiernik and Smith (2011)
• Graphical model: terminal nodes 1, 2, 3, each joined to an internal node H
• Binary variables Xi, i = 1, 2, 3, on the terminal nodes, assumed independent given the binary variable at the internal node H
• H is unobserved
• This gives a very complex likelihood structure, which is problematic for inference

Extended Multinomial
• Look at discrete models
• The space of distributions is simplicial
• Boundaries are where probabilities are zero
• Information geometry of extended multinomial models
• Applications to graphical models and elsewhere
• A proxy for the space of all models
• The IG is explicit: is it computable?

Convex Geometry
• Lindsay's (1995) fundamental result characterises the maximum likelihood estimate in the class of all mixtures of fX(x; θ)
• Finds the Q which maximises the likelihood of ∫ f(x; θ) dQ(θ) over all possible Q when f(x; θ) is an exponential family
• This is called the non-parametric maximum likelihood estimate of Q
• Uses results from finite-dimensional convex geometry
• Tangent spaces are replaced by tangent cones
• Asymptotic limits are mixtures of χ² distributions

Convex Geometry
• Lindsay's geometry lies in an affine space which is determined by the observed data
• In particular, it is always finite dimensional, and the dimension is determined by the number of distinct observations
• Define L(θ) = (L1(θ), . . . , LN*(θ)) to represent the N* distinct likelihood values
• The likelihood on the space of mixtures is defined on the convex hull of the image of the map θ → (L1(θ), . . . , LN*(θ)) ⊂ R^N*
• Find the non-parametric likelihood estimate, f(y; Q), by maximising a concave function over this convex set
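A small numerical sketch of this construction: likelihood vectors L(θ) are evaluated on a grid of candidate support points and the (concave) mixture log-likelihood is maximised over weights on their convex hull, here with a simple EM-type fixed-point update (my own choice of optimiser, not the algorithm of the talk). The data and grid are illustrative, not the Kupper-Haseman data.

    import numpy as np
    from scipy.stats import binom

    y = np.array([0, 0, 1, 1, 2, 3, 5, 7])            # illustrative binomial counts, n = 7 trials each
    n = 7

    thetas = np.linspace(0.01, 0.99, 50)               # candidate support points theta_i
    L = binom.pmf(y[:, None], n, thetas[None, :])      # L[j, i] = f(y_j; theta_i)

    # maximise sum_j log( sum_i q_i L[j, i] ), which is concave in the weight vector q on the simplex
    q = np.full(len(thetas), 1.0 / len(thetas))
    for _ in range(500):
        resp = L * q
        resp /= resp.sum(axis=1, keepdims=True)        # responsibilities
        q = resp.mean(axis=0)                          # monotone (EM) update of the weights
    print("log-likelihood:", np.log(L @ q).sum())
    print("support points with weight > 1%:", np.round(thetas[q > 0.01], 3))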
Embedding in Extended Multinomial
• In our examples we can embed the families in a different affine space
• Assume a discrete sample space, with data of the form (n0, n1, . . . , nk)
• Define Δ^k := { π = (π0, π1, . . . , πk) : πi ≥ 0, Σ_{i=0}^k πi = 1 }
• Embed the unmixed model in Δ^k and look at its convex hull
• Define the observed face P to be determined by the index set of the strictly positive observed counts
• The affine structure of Lindsay is determined by the vertices of P (Theorem 1)

Embedding in Extended Multinomial
• Definition: Define ΠL to be the Euclidean orthogonal projection from the simplex Δ^k to the smallest vector space containing the vertices indexed by P
• Theorem: (a) The likelihood on the simplex is completely determined by the likelihood on the image of ΠL; in particular, all elements of the pre-image of ΠL have the same likelihood value. (b) ΠL maps −1-convex hulls in the simplex to the convex hull of Lindsay's geometry

[Figure: (a) the simplex with vertices (1,0,0), (0,1,0), (0,0,1) and the origin (0,0,0), with the observed face marked; (b) the projected picture, with the observed face marked.]

Embedding in Extended Multinomial
• There are some definite advantages to working in the larger space
• Can define a new search algorithm which exploits the information geometry of the full simplex
• Enables finessing the label-switching problem encountered by many other methods
• Lindsay's geometry captures the −1-affine and likelihood structure; it does not capture the full information geometry
• For example, the expected Fisher information cannot be represented
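To make the embedding concrete, a short sketch (my own illustration: a binomial(k, θ) family viewed as a curve π(θ) in Δ^k, with the observed face and the projection ΠL read off literally from the definitions above):

    import numpy as np
    from scipy.stats import binom

    k = 7
    counts = np.array([3, 0, 2, 0, 1, 0, 0, 4])        # illustrative counts (n_0, ..., n_k)

    def pi_curve(theta):
        # the unmixed binomial(k, theta) model as a point of the simplex Delta^k
        return binom.pmf(np.arange(k + 1), k, theta)

    observed_face = np.flatnonzero(counts > 0)          # indices of strictly positive counts
    print("observed face vertex indices:", observed_face)

    def Pi_L(pi):
        # Euclidean orthogonal projection onto the coordinate subspace spanned by the observed-face vertices
        out = np.zeros_like(pi)
        out[observed_face] = pi[observed_face]
        return out

    print(np.round(Pi_L(pi_curve(0.3)), 4))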
Total positivity and local mixing
• Two seemingly contradictory results:
• Theorem: The −1-convex hull of an open subset of a generic one-dimensional exponential family π(θ) is of full dimension
• Anaya and Marriott (2007) show that, under regularity conditions but for many applications, mixtures of exponential families have accurate low-dimensional representations: local mixtures
• If the curve π(θ), for θ ∈ U ⊂ Θ, lies 'close' to a low-dimensional −1-affine subspace, then all mixtures over U ⊂ Θ also lie 'close' to this space
• Such subspaces are determined by −1-curvature
• Can get good approximations using polygonal approximations

Polygonal approximations
• Given a norm ||·||, the curve π(θ) and the polygonal path ∪Si, define the distance function, for each θ, by d(π(θ)) := inf_{π' ∈ ∪Si} ||π(θ) − π'||
• Which norm?
• Define the inner product ⟨v, w⟩_π := Σ_{i=0}^k v_i w_i / π_i for v, w ∈ Vmix and π such that πi > 0 for all i
• This defines a preferred point metric as discussed in Critchley et al. (1993); further, let ||·||_π be the corresponding norm

Polygonal approximations
• As motivation for using such a metric, consider the Taylor expansion of the log-likelihood around π̂: ℓ(π) − ℓ(π̂) ≈ −(N/2) ||π − π̂||²_π̂
• Theorem: Let π(θ) be an exponential family, and {θi} a finite and fixed set of support points such that d(π(θ)) ≤ ε for all θ. Further, denote by π̂NP and π̂ the maximum likelihood estimates in the convex hulls of π(θ) and {π(θi) | i = 1, . . . , M} respectively, and by π̂G, with components π̂G_i := ni/N, the global maximiser in the simplex. Then
  ℓ(π̂NP) − ℓ(π̂) ≤ ε N ||π̂G − π̂NP||_π̂ + o(ε)   (1)

[Figure: the mixture fit using polygonal approximation; panels show the data and fitted probabilities/proportions over counts 0 to 7, the mixing proportions over the support points, and the directional derivative.]

Polygonal approximation: tripod example
[Figure: the bipod model; the space of unmixed independent distributions, showing the ruled-surface structure.]

Conclusions
• We looked at, and showed the relationship between, different geometric approaches to mixture modelling: Lindsay's data-dependent, finite-dimensional affine space, and our extended multinomial embedding space
• We showed a new algorithm which uses the full information geometry of the problem to its advantage
• We exploited the idea of polytope approximation in the correct geometry
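A closing numerical check of the preferred-point quadratic approximation used above (illustrative counts; ℓ is the multinomial log-likelihood up to an additive constant):

    import numpy as np

    counts = np.array([5, 9, 2, 1, 4])                  # illustrative counts (n_0, ..., n_k)
    N = counts.sum()
    pi_hat = counts / N                                  # global maximiser in the simplex

    def loglik(pi):
        return (counts * np.log(pi)).sum()               # multinomial log-likelihood, constant dropped

    def sq_norm(v, pi):
        return (v ** 2 / pi).sum()                       # preferred point metric ||v||^2_pi = sum_i v_i^2 / pi_i

    pi = np.array([0.25, 0.41, 0.10, 0.05, 0.19])        # a nearby point of the simplex
    print(loglik(pi) - loglik(pi_hat))                   # exact log-likelihood difference
    print(-0.5 * N * sq_norm(pi - pi_hat, pi_hat))       # quadratic approximation; the two numbers are close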

Voir la vidéo

Visualizing projective shape space. John Kent, University of Leeds, j.t.kent@leeds.ac.uk, http://maths.leeds.ac.uk/~john. GSI, August 2013.

Overview
This talk is about a camera view of a “scene”, where the scene contains a set of collinear points in the plane (using a one-dimensional film), or a set of coplanar points in three dimensions (using a two-dimensional film). We are interested in the information in the scene that is invariant to the location of the focal point of the camera and the orientation of the film. Thus we are looking for features in the scene that are invariant under the group of projective transformations. Such features are known as projective invariants. The collection of information in projective invariants is called “projective shape”. Unfortunately, projective invariants, as usually formulated, are not suitable for quantitative statistical analysis — there is no obvious metric between different sets of projective invariants. The purpose of this talk is to give a standardized representation of projective shape that is amenable to metric comparisons.

The simplest case — 4 collinear points
For much of the talk we focus on the simplest case (k = 4 points in m = 1 dimension), where there is just one projective invariant — the cross ratio. We then generalize the methodology to higher values of k and m. The next slides illustrate the main issues. First is a figure containing a scene of 4 collinear points, a focal point of a camera, and a linear film. The effect of changing camera position is then illustrated by two images from my back garden taken from different positions.

[Figure: camera view of 4 collinear points. Photographs: my back garden, views 1 and 2 of the lanterns.]

The cross ratio
Given four numbers u1, . . . , u4 (representing coordinates for four labelled collinear points in a two-dimensional scene), the cross ratio is defined by
  τ = {(u2 − u1)(u4 − u3)} / {(u3 − u1)(u4 − u2)}.
It can be shown that the cross ratio is the one and only projective invariant in this situation. If the landmarks are re-labelled (there are 24 permutations), the cross ratio takes 6 possible forms (spanning all of R if the original value of τ is restricted to the interval (0, 1/2)):
  τ, 1 − τ, 1/(1 − τ), 1/τ, −(1 − τ)/τ, −τ/(1 − τ),
with corresponding ranges
  (0, 1/2), (1/2, 1), (1, 2), (2, ∞), (−∞, −1), (−1, 0).

Cross ratios in the back garden
From the two images of my back garden, I extracted the coordinates of the lanterns and computed τ in each case. The answers are very similar (as expected): τ1 = 0.489, τ2 = 0.487.

Unsuitability of the cross ratio for metric comparisons
The behavior of the cross ratio under relabelling underscores its unsuitability for metric comparisons. In particular, if we want to compare two cross ratios near 0 (e.g. τ1 = 0.1, τ2 = 0.01), they look very close together on the τ scale (|0.1 − 0.01| = 0.09), but quite far apart on the 1/τ scale (|10 − 100| = 90), which means the labelling of the landmarks affects metric comparisons between cross ratios. What to do? We shall look at a geometric solution (limited to 4 collinear landmarks) and an algebraic solution (more landmarks and higher dimensions).
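A small sketch of the cross ratio and its behaviour under relabelling (illustrative coordinates, not the lantern measurements):

    import numpy as np
    from itertools import permutations

    def cross_ratio(u):
        u1, u2, u3, u4 = u
        return (u2 - u1) * (u4 - u3) / ((u3 - u1) * (u4 - u2))

    u = np.array([0.0, 1.0, 2.5, 4.0])      # four labelled collinear points
    print("tau =", cross_ratio(u))

    # under the 24 relabellings the cross ratio takes only 6 distinct values:
    # tau, 1-tau, 1/(1-tau), 1/tau, -(1-tau)/tau, -tau/(1-tau)
    values = {round(cross_ratio(u[list(p)]), 6) for p in permutations(range(4))}
    print(sorted(values))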
Geometric standardization for 4 collinear landmarks
Suppose the four landmarks are labelled A, B, C, D in increasing order on the line. Draw two semi-circles, one with diameter AC and the other with diameter BD. The two semicircles intersect in a point O, say. Make this point the focal point of the camera. Switch from linear film to circular film. The image of a landmark is now a pair of antipodal points on the circle. The angles AOC and BOD are right angles. The angle AOB, δ say, is related to the cross ratio by τ = sin²δ. Further, under relabellings the cross ratio takes the following forms in terms of δ: sin²δ, cos²δ, sec²δ, csc²δ, −tan²δ, −cot²δ.

[Figure: geometric choice of the preferred focal point O above the ordered landmarks A, B, C, D. Figure: the standardized image of 4 collinear points on circular film (the standardized configuration Y).]

Homogeneous coordinates
To understand why this choice of focal point is useful for metric comparisons, we need to do some algebraic calculations. The first step is to construct homogeneous coordinates. Starting with the four real coordinates u1, . . . , u4, construct a 4 × 2 “augmented” configuration matrix by adding a column of ones,
  X = [u1 1; u2 1; u3 1; u4 1] = [x1^T; x2^T; x3^T; x4^T],
where xi^T denotes the ith row of X, and think of each row as defined only up to a scalar multiple. (In general X is a k × p matrix, p = m + 1.)

Projective shape as an equivalence class of matrices
It can be shown that the projective shape is precisely the information in X that is invariant under the transformations X → DXB, where D = diag(di) is a k × k diagonal nonsingular matrix (the distance between the focal point and each landmark in the scene is unknown), and B (p × p) is nonsingular (representing the effect of focal point position). Thus projective shape can be described in terms of an equivalence class of matrices. How can we choose a preferred element of the equivalence class?

Tyler standardization for projective shape — 1
For projective shape recall that X ≡ DXB. Let us choose D and B so that after standardization (a) the rows of X are unit vectors, xi^T xi = 1, i = 1, . . . , k, and (b) the columns of X are orthonormal, up to a factor k/p, X^T X = (k/p) Ip. Choice of D: since each row of X is defined only up to a multiplicative constant, we can scale each row of X so the first element is 1 (the conventional choice, appropriate for flat film) or to have norm 1 (the Tyler choice, appropriate for spherical film), in both cases with the focal point of the camera at the origin. The existence of a solution for D and B is due to Dave Tyler, who developed a similar result in the context of robust estimation of a covariance matrix in multivariate analysis. In general D and B must be found numerically using an iterative algorithm.

Tyler standardization for projective shape — 2
Let Y = DXB denote the Tyler standardized configuration after using the optimal D and B. Then the rows yi, i = 1, . . . , k, are unit vectors and the columns are orthonormal up to a factor k/p. On our spherical film the yi are “uniformly spread” around the unit sphere in R^p in terms of their moment of inertia matrix, Y^T Y = Σ yi yi^T = (k/p) Ip. Note that Y is unique up to (a) multiplying each row of Y by ±1, and (b) multiplying Y on the right by a p × p orthogonal matrix. How to remove these remaining indeterminacies in Y?
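The slides say D and B must be found numerically but do not spell out the iteration, so the following is only a plausible sketch of my own (alternating row normalization with a column-whitening step) applied to an illustrative configuration:

    import numpy as np

    def tyler_standardize(X, iters=1000, tol=1e-10):
        """Return Y ~ DXB with (approximately) unit-norm rows and Y^T Y = (k/p) I_p.
        Sketch only: convergence of this alternating scheme is assumed, not proved here."""
        k, p = X.shape
        Y = np.asarray(X, dtype=float).copy()
        for _ in range(iters):
            Y /= np.linalg.norm(Y, axis=1, keepdims=True)          # rescale rows (choice of D)
            Lchol = np.linalg.cholesky(Y.T @ Y)
            Y = Y @ np.linalg.inv(Lchol).T * np.sqrt(k / p)        # whiten columns (choice of B)
            if np.max(np.abs(np.linalg.norm(Y, axis=1) - 1.0)) < tol:
                break                                              # both conditions (approximately) hold
        return Y

    u = np.array([0.0, 1.0, 2.5, 4.0])
    X = np.column_stack([u, np.ones(4)])     # homogeneous coordinates, k = 4, p = 2
    Y = tyler_standardize(X)
    print(np.round(np.linalg.norm(Y, axis=1), 6))    # rows have norm ~ 1
    print(np.round(Y.T @ Y, 6))                      # ~ (k/p) I_p = 2 I_2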
Embedding (continued)
(c) Hence, at least for p = 2, M is a representation of the projective shape of Y.

Tyler standardization for 4 collinear points
In the case k = 4, p = 2 it can be shown that a standardized configuration Y takes the form
  Y = [v(−δ/2)^T; v(δ/2)^T; v(π/2 − δ/2)^T; v(π/2 + δ/2)^T] = [c −s; c s; s c; s −c],
where v(θ) = (cos θ, sin θ)^T, c = cos(δ/2), s = sin(δ/2), 0 < δ < π/4, unique up to (a) permutation of the landmarks, (b) the sign of each row, and (c) rotation/reflection of the data around the circle. Then τ is related to δ by one of the trig functions sin²δ, cos²δ, sec²δ, csc²δ, −tan²δ, −cot²δ, depending on the permutation.

[Figure: the standardized representation of 4 collinear points, i.e. the standardized configuration Y on the circle.]

Embedding for 4 collinear points
In this case
  Y = [c −s; c s; s c; s −c], with c = cos(δ/2), s = sin(δ/2), where 0 < δ < π/2. Then
  M = [1 C 0 S; C 1 S 0; 0 S 1 C; S 0 C 1], where C = cos δ, S = sin δ.
Note m12² + m13² + m14² = 1, with one structural 0, so M can be represented as the edges of a spherical triangle in the unit sphere in R^3.

[Figure: projective shape space for 4 collinear points as a spherical triangle; edges/vertices are labelled A=C, A=B, A=D and A~C, A~D, A~B, with cross ratio values 0, 0.5, 1, 2, −1, ±∞ marked.]

Interpretation of the spherical triangle
The position of the structural 0 in M is closely related to the ordering of the landmarks. In particular it identifies which pairs of landmarks are perpendicular in the circular film image. In our earlier picture with ordered landmarks A, B, C, D, the angles AOC (and hence also BOD) were right angles. At one end of this edge (i.e. a vertex of the spherical triangle), landmarks A & B coalesce (as do landmarks C & D). At the other vertex, landmarks A & D coalesce (as do landmarks B & C).

Why corners?
Why does the spherical triangle representation of projective shape space for 4 collinear landmarks have corners? In terms of the cross ratio, τ = {(B − A)(D − C)} / {(C − A)(D − B)}, there seems no reason for corners. E.g. hold A < C < D fixed and let B vary through the extended real line. Then the cross ratio varies in a bijective fashion through the extended real line. If we avoid the singularity at B = D, then the cross ratio is an infinitely differentiable function of B. In particular, there is no hint of a singularity as B passes through A and C. But at these points (B = A and B = C) the cross ratio takes the values 0 and 1, respectively, corresponding to two of the vertices in projective shape space. Where do these singularities (i.e. vertices or corners) come from?

The reasons for corners
(a) The first answer is that when B approaches one of the other three landmarks, e.g. B → A, Tyler standardization forces the other two landmarks to come together as well. Thus the single-pair singularity in the simple cross ratio description (B = A) is actually a double-pair singularity (B = A, D = C) in the Tyler-standardized description.
(b) Further, there are two distinct ways to move away from a singularity (e.g. B = A, D = C) in terms of the separation of the landmarks. On one edge (the lower edge of the spherical triangle) we have A separated from C (and hence B separated from D). On the other edge (the left edge of the spherical triangle) we have A separated from D (and hence B separated from C).
(c) The rank of the Tyler standardized configuration Y drops from 2 to 1 at the corners.
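A quick numerical check of the explicit standardized form and the embedding M given above (the value of δ is arbitrary):

    import numpy as np

    delta = 0.6
    c, s = np.cos(delta / 2), np.sin(delta / 2)
    Y = np.array([[c, -s],
                  [c,  s],
                  [s,  c],
                  [s, -c]])              # standardized configuration for k = 4, p = 2

    M = np.abs(Y @ Y.T)                  # absolute inner product matrix, m_ij = |y_i^T y_j|
    print(np.round(M, 4))                # off-diagonals are cos(delta), sin(delta) and one structural 0 per row
    print(M[0, 1]**2 + M[0, 2]**2 + M[0, 3]**2)   # = 1, so (m12, m13, m14) lies on the unit sphere in R^3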
Further ideas I: Statistical issues
It is possible to do distribution theory in some simple cases (e.g. 4 iid normally distributed landmarks on the line), but the results are complicated, the pdfs have singularities at the corners of the spherical triangle, and such models are not very realistic. A more promising approach is to look in more detail at the effect of small-scale variability about a fixed configuration/projective shape. But the pose of the object affects the distribution of projective shape.

Further ideas II: Four types of projective shape space
In many cases there is partial information about the camera: (a) oriented vs. unoriented, and (b) directional vs. axial.
(a) In an oriented camera we know the side of the scene that the camera lies on; that is, mathematically, we know whether det(B) is positive or negative. Conversely, for an unoriented camera, the sign of det(B) is unknown.
(b) In a directional camera we know whether an image point lies between the focal point of the camera and the corresponding real-world point, or whether the focal point lies between the image point and the real-world point. In an axial camera this information is not available. Mathematically, in terms of the k × k diagonal matrix D, we require di > 0 for a directional camera, and merely di ≠ 0 for an axial camera.

Which version of projective shape space to use?
Projective geometry focuses mainly on an unoriented axial camera. However, in real life a camera is usually oriented and directional. We now illustrate these ideas for the simplest situation of k = 4 collinear points (m = 1 dimension).

Comments for 4 collinear points
Directional vs. axial: for a directional camera, the red “X”s are observed; for an axial camera, we cannot distinguish each red “X” from the opposite point on the circle.
Oriented vs. unoriented: for an oriented camera, we see the circle as given; for an unoriented camera, we cannot distinguish the circle from its reflection.

ORAL SESSION 9 Optimization on Matrix Manifolds (Silvere Bonnabel)


A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices. Martin Kleinsteuber, joint work with Hao Shen. Research Group for Geometric Optimization and Machine Learning, www.gol.ei.tum.de. August 29, 2013.

Outline
• Mixing model and problem statement
• Separating with second order information
• The geometric setting for NUJD
• Uniqueness result for complex NUJD
• Conclusion

The complex linear BSS model
• Mixing model: w(t) = A s(t), A ∈ Gl(m)
• s(t) = [s1(t), . . . , sm(t)]^T: m-dimensional complex signal
• w(t): observed signals
• Mixing matrix A ∈ Gl(m), the set of all invertible complex (m × m)-matrices
• Task: recover s(t) given w(t) only, via the demixing model y(t) = X^H w(t)
• Demixing matrix X ∈ Gl(m)

Second-order statistics based ICA approaches
• Idea: use the uncorrelatedness assumption on the source signals to estimate the mixing matrix
• Covariance of the observations: C_w(t) := E[w(t) w^H(t)] = A C_s(t) A^H, where C_s(t) := E[s(t) s^H(t)]
• C_s(t) is diagonal; for non-stationary signals, C_w(ti) ≠ C_w(tj)
• Estimate A by simultaneously diagonalizing a set of covariance matrices
• Pseudo-covariance of the observations: R_w(t) := E[w(t) w^T(t)] = A R_s(t) A^T
• R_s(t) is diagonal; for non-stationary signals, R_w(ti) ≠ R_w(tj)
• Estimate A by simultaneously diagonalizing a set of pseudo-covariance matrices
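A brief numerical illustration of the mixing model and the two second-order statistics (random, made-up mixing matrix and sources; sample averages stand in for the expectations):

    import numpy as np

    rng = np.random.default_rng(0)
    m, T = 3, 20000

    # complex sources with non-circular components, so both the covariance and pseudo-covariance carry information
    scales = rng.uniform(0.5, 2.0, size=(m, 1))
    s = scales * (rng.standard_normal((m, T)) + 0.3j * rng.standard_normal((m, T)))

    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))   # mixing matrix in Gl(m)
    w = A @ s                                                            # observations w(t) = A s(t)

    C_w = (w @ w.conj().T) / T        # sample covariance,        E[w w^H] = A C_s A^H
    R_w = (w @ w.T) / T               # sample pseudo-covariance, E[w w^T] = A R_s A^T
    C_s = (s @ s.conj().T) / T
    print(np.allclose(C_w, A @ C_s @ A.conj().T))   # True: the mixing structure holds exactly for sample moments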
Rs(t) is diagonal for non-stationary signals: Rw (ti) = Rw (tj) estimate A by simultaneously diagonalizing a set of Pseudo-covariance matrices Slide 5/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen Second-Order Statistics Based ICA Approaches Time-delayed (pseudo-)correlation Sw (t, τ) := E[w(t)w† (t + τ)] = ASs(t, τ)A† . Ss(t, τ) is diagonal, † = T, H Sw are not Hermitian or Symmetric in general estimate A by simultaneously diagonalizing a set of time-delayed (pseudo-)correlation estimate A by combining all the second order information Slide 6/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen Second-Order Statistics Based ICA Approaches Time-delayed (pseudo-)correlation Sw (t, τ) := E[w(t)w† (t + τ)] = ASs(t, τ)A† . Ss(t, τ) is diagonal, † = T, H Sw are not Hermitian or Symmetric in general estimate A by simultaneously diagonalizing a set of time-delayed (pseudo-)correlation estimate A by combining all the second order information Slide 6/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen Second-Order Statistics Based ICA Approaches Time-delayed (pseudo-)correlation Sw (t, τ) := E[w(t)w† (t + τ)] = ASs(t, τ)A† . Ss(t, τ) is diagonal, † = T, H Sw are not Hermitian or Symmetric in general estimate A by simultaneously diagonalizing a set of time-delayed (pseudo-)correlation estimate A by combining all the second order information Slide 6/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen The SUT-algorithm [Eriksson, Koivunen 2004] Particular task Diagonalize Cw (t) and Rw (t) simultaneously via XH Cw (t)X, and XH Rw (t)X∗ Pseudo-Code: 1. Diagonalize C = UΦUH via SVD 2. Compute R = Φ−1/2UHRU∗Φ−1/2 3. Diagonalize R = VΨVT via Takagi factorization 4. Output X = UΦ−1/2V Slide 7/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen The SUT-algorithm [Eriksson, Koivunen 2004] Particular task Diagonalize Cw (t) and Rw (t) simultaneously via XH Cw (t)X, and XH Rw (t)X∗ Pseudo-Code: 1. Diagonalize C = UΦUH via SVD 2. Compute R = Φ−1/2UHRU∗Φ−1/2 3. Diagonalize R = VΨVT via Takagi factorization 4. Output X = UΦ−1/2V Slide 7/22 | A Geometric Framework for Non-unitary Joint Diagonalization of Complex Symmetric Matrices | Martin Kleinsteuber | Fachgebiet Geometrische Optimierung und Maschinelles Lernen Non-unitary joint diagonalization Problem Given a set of complex symmetric (Pseudo-covariance-) matrices {Ri}i=1,...,n, find X ∈ Gl(m) such that XTRiX are all diagonal. Permutation and scale ambiguity of solutions X is solution ⇐⇒ XDΠ is solution D diagonal, Π Permutation Optimization methods like to have isolated solutions, what to do? 
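A minimal NumPy sketch of these four steps follows (our own illustration, not the authors' Matlab code from www.gol.ei.tum.de); the takagi helper is a generic-case implementation via a real symmetric embedding and assumes non-degenerate singular values.

```python
import numpy as np

def takagi(A):
    """Takagi factorization A = V diag(psi) V^T of a complex symmetric matrix A.
    Sketch via the real symmetric embedding M = [[B, C], [C, -B]] of A = B + iC:
    an eigenvector [x; y] of M with eigenvalue sigma >= 0 yields a Takagi vector
    v = x + i*y with A conj(v) = sigma v (assumes non-degenerate singular values)."""
    n = A.shape[0]
    B, C = A.real, A.imag
    M = np.block([[B, C], [C, -B]])
    vals, vecs = np.linalg.eigh(M)            # eigenvalues come in +/- sigma pairs
    idx = np.argsort(vals)[::-1][:n]          # keep the n largest (non-negative) ones
    V = vecs[:n, idx] + 1j * vecs[n:, idx]
    return V, vals[idx]

def sut(C, R):
    """Strong uncorrelating transform [Eriksson, Koivunen 2004]:
    returns X with X^H C X = I and X^H R X^* diagonal, for a Hermitian positive
    definite covariance C and a complex symmetric pseudo-covariance R."""
    phi, U = np.linalg.eigh(C)                # C = U diag(phi) U^H
    W = U @ np.diag(phi ** -0.5)              # whitening: W^H C W = I
    R_tilde = W.conj().T @ R @ W.conj()       # Phi^{-1/2} U^H R U^* Phi^{-1/2}
    V, _ = takagi(R_tilde)
    return W @ V                              # X = U Phi^{-1/2} V
```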
Non-unitary joint diagonalization (NUJD). Problem: given a set of complex symmetric (pseudo-covariance) matrices {R_i}, i = 1, ..., n, find X ∈ Gl(m) such that all X^T R_i X are diagonal. Permutation and scale ambiguity of solutions: X is a solution ⇔ XDΠ is a solution, with D diagonal and Π a permutation. Optimization methods like to have isolated solutions — what to do?

The geometric setting for NUJD. Let D := {D ∈ Gl(m) | D diagonal}, consider the equivalence classes [X] := {XD ∈ Gl(m) | D ∈ D}, and define the complex oblique projective (COP) manifold Op := {[X] | X ∈ Gl(m)}.
Let f: Gl(m) → R be a function that measures simultaneous diagonality, e.g. the reconstruction error
f(X) = Σ_{i=1}^n (1/4) ‖R_i − X^(−T) ddiag(X^T R_i X) X^(−1)‖_F².
Then f(X) = f(XD), so f naturally induces a function f̂ on Op. Idea: optimize f̂ on the COP manifold — a lower-dimensional search space and a chance to have non-degenerate minima.
Op is an open and dense Riemannian submanifold of a product of copies of CP^(m−1); tangent spaces, geodesics and parallel transport coincide locally. It thus remains to consider CP^(m−1) = S^m / S^1 with S^m := {x ∈ C^m | x^H x = 1} [Absil, Mahony, Sepulchre: Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008]. There is no representation of CP^(m−1) in C^m; one could use rank-one projection matrices in C^(m×m), but here quotient-space properties are used instead. The tangent space at x ∈ CP^(m−1) is identified with the horizontal lift
T_x CP^(m−1) = ∩_{z∈S^1} T_{xz} S^m = ∩_{z∈S^1} {h ∈ C^m | Re(h^H x z) = 0} = {h ∈ C^m | h^H x = 0}.
Result A: the geodesics in CP^(m−1) through x are given by γ(t) = exp(t(h x^H − x h^H)) x.
Result B: the parallel transport from T_{γ(0)} CP^(m−1) to T_{γ(t)} CP^(m−1) along the geodesic γ is τ(t) = exp(t(h x^H − x h^H)) h.
These results transfer straightforwardly to Op via the product manifold structure, providing all ingredients for a geometric conjugate gradient (CG) method for minimizing f̂.

Separation performance. The proposed CG is compared, in terms of the Amari error, with AC/DC (an alternating algorithm minimizing the direct-fit cost function) and with an off-norm CG (a CG algorithm minimizing the off-norm cost function). [Bar chart of Amari errors; the numerical values are not recoverable from the slides.]

Uniqueness result for complex NUJD. It is well known that CG has good local convergence properties if the minimum has a non-degenerate Hessian. Under what conditions on the source signals is the minimizer isolated on the COP manifold? Equivalently: under what conditions on the source signals is the diagonalizer unique, up to permutation and diagonal scaling?
Let D_k = diag(d_{k1}, ..., d_{km}), k = 1, ..., K, be the (diagonal) pseudo-covariances of the sources, and for a fixed diagonal position i let d_i := [d_{1i}, ..., d_{Ki}]^T ∈ C^K collect the i-th diagonal entry of each matrix.
Result (Kleinsteuber, Shen 2013): the simultaneous diagonalizer is unique up to permutation and scaling if and only if |c(d_i, d_j)| ≠ 1 for all i ≠ j, where c(v, w) := v^H w / (‖v‖ ‖w‖) if v ≠ 0 and w ≠ 0, and c(v, w) := 1 otherwise.

Literature:
- Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008).
- Kleinsteuber, M., Shen, H.: Uniqueness analysis of non-unitary matrix joint diagonalization. IEEE Transactions on Signal Processing 61(7) (2013) 1786–1796.
- Shen, H., Kleinsteuber, M.: Complex blind source separation via simultaneous strong uncorrelating transform. In: Proc. 9th International Conference on Latent Variable Analysis and Signal Separation, LNCS vol. 6365, Springer-Verlag, Berlin/Heidelberg (2010) 287–294.

Conclusion. Simultaneous diagonalization of matrices based on second-order statistics; the complex oblique projective manifold is the appropriate geometric setting; under generic conditions on the sources, the diagonalizer is isolated on the COP manifold. Preprints and Matlab codes are available at www.gol.ei.tum.de.

Appendix: properties of complex signals. Second-order stationarity: (i) E[s(t)] = E[s(t + τ)]; (ii) E[s(t_1) s(t_2)] = E[s(t_1 + τ) s(t_2 + τ)]. Circularity: s(t) and e^(iα) s(t) have the same probability distribution; this implies E[s(t)²] = 0 and motivates the circularity coefficient λ_{s(t)} := |E[s(t)²]| / E[|s(t)|²].
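As an illustration of the reconstruction-error cost and the uniqueness condition above, here is a small NumPy sketch; the function names are ours, not part of the authors' package.

```python
import numpy as np

def ddiag(M):
    """Keep only the diagonal of a square matrix."""
    return np.diag(np.diag(M))

def reconstruction_error(X, Rs):
    """f(X) = sum_i 1/4 * || R_i - X^{-T} ddiag(X^T R_i X) X^{-1} ||_F^2."""
    Xinv = np.linalg.inv(X)
    cost = 0.0
    for R in Rs:
        D = ddiag(X.T @ R @ X)
        cost += 0.25 * np.linalg.norm(R - Xinv.T @ D @ Xinv, 'fro') ** 2
    return cost

def uniqueness_gap(d):
    """Largest |c(d_i, d_j)| over i != j, where row i of d stacks the i-th diagonal
    entry of the source pseudo-covariances D_1, ..., D_K; the diagonalizer is unique
    (up to permutation and scaling) iff the returned value differs from 1."""
    m = d.shape[0]
    worst = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            ni, nj = np.linalg.norm(d[i]), np.linalg.norm(d[j])
            c = 1.0 if ni == 0 or nj == 0 else abs(d[i].conj() @ d[j]) / (ni * nj)
            worst = max(worst, c)
    return worst
```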

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

An extrinsic look at the Riemannian Hessian
Pierre-Antoine Absil (UCLouvain), Robert Mahony (Australian National University), Jochen Trumpf (Australian National University). GSI 2013, Paris, 29 August 2013. Theme: easy-to-implement Newton-type optimization methods on manifolds.

Broader topic: easy-to-implement Newton-type methods on manifolds. Given a manifold M, i.e. a set endowed (often implicitly) with a manifold structure (a collection of compatible charts), and a function f: M → R that is smooth in the sense of the manifold structure, the task is to compute a local minimizer of f. Approach: Newton-type methods on manifolds.

Some specific manifolds and related applications:
- Stiefel manifold St(p, n) = {X ∈ R^(n×p) : X^T X = I_p} and the orthogonal group O_n = St(n, n). Applications: computer vision, principal component analysis, independent component analysis, ...
- Grassmann manifold Gr(p, n): the set of all p-dimensional subspaces of R^n. Applications: various dimensionality reduction problems, ...
- Low-rank symmetric PSD manifold R^(n×p)_* / O_p ≃ {Y Y^T : Y ∈ R^(n×p)_*}, where R^(n×p)_* is the set of all full-rank n × p matrices. Applications: low-rank approximation of positive-definite matrices, e.g. for metric learning.
- Fixed-rank manifold M(p, m × n) = {X ∈ R^(m×n) : rank(X) = p}. Applications: low-rank approximation of matrices, e.g. for recommender systems.
- Shape manifolds. Applications: shape analysis, e.g. for medical applications.
- Oblique manifold R^(n×p)_* / S_diag+ ≃ {Y ∈ R^(n×p)_* : diag(Y^T Y) = I_p}. Applications: independent component analysis, factor analysis (oblique Procrustes problem), ...
- Flag manifold R^(n×p)_* / S_upp*: elements can be viewed as p-tuples of nested linear subspaces (V_1, ..., V_p) with dim(V_i) = i and V_i ⊂ V_{i+1}. Applications: analysis of the QR algorithm, ...

Reminder: Newton in R^n. Required: a (smooth) real-valued function f on R^n. Iteration x_k ∈ R^n → x_{k+1} ∈ R^n:
1. Solve the Newton equation Hess f · η_k = −∂f(x_k) for the unknown η_k ∈ T_{x_k} R^n ≃ R^n, where ∂f(x) := [∂_1 f(x), ..., ∂_n f(x)]^T and, for all z_x ∈ T_x M, Hess f · z_x := D_{z_x}(∂f).
2. Set x_{k+1} := x_k + η_k.

Newton on Riemannian submanifolds (with the Levi-Civita connection). Required: a Riemannian submanifold M of a Euclidean space E; a retraction R on M; a real-valued function f on M; an extension f̄ of f to E. Iteration x_k ∈ M → x_{k+1} ∈ M:
1. Solve the Newton equation Hess f · η_k = −grad f(x_k) for the unknown η_k ∈ T_{x_k} M, where grad f(x) = P_x(∂f̄(x)), with P_x the orthogonal projection onto T_x M, and, for all z_x ∈ T_x M, Hess f · z_x := P_x D_{z_x}(grad f).
2. Set x_{k+1} := R(x_k, η_k).

An extrinsic look at the Riemannian Hessian. For all z_x ∈ T_x M (dropping the subscript x to lighten the notation),
Hess f · z = P_x D_z(grad f) = P_x D_z(P ∂f̄) = P_x (P_x ∂²f̄(x) z + (D_z P) ∂f̄(x)) = P_x ∂²f̄(x) z + P_x (D_z P) ∂f̄(x).

Theorem (Weingarten map): P_x (D_z P) u = P_x (D_z P) P_x^⊥ u = −P_x D_z(P^⊥ U) =: A_x(z, P_x^⊥ u), for all x ∈ M, z ∈ T_x M, u ∈ T_x E ≃ E and every extension U of u. The symbol A_x stands for the Weingarten map of the submanifold M of the Euclidean space E.
Proof. For the first equality, observe that 0 = D_z(P P^⊥) = (D_z P) P_x^⊥ + P_x (D_z P^⊥) = (D_z P) P_x^⊥ − P_x (D_z P); multiplying by P_x on the left and using the identity P_x P_x = P_x yields P_x (D_z P) = P_x (D_z P) P_x^⊥. For the second equality, observe that, for all extensions U of u, −P_x D_z(P^⊥ U) = −P_x (D_z P^⊥) U − P_x P_x^⊥ (D_z U) = −P_x (D_z P^⊥) U = P_x (D_z P) U.

The Stiefel manifold. St(p, n) = {X ∈ R^(n×p) : X^T X = I_p} is the set of orthonormal p-frames in R^n. The orthogonal projector onto T_X St(p, n) is
P_X U = (I − X X^T) U + X (X^T U − U^T X)/2 = U − X (X^T U + U^T X)/2.
For Z ∈ T_X M and W ∈ T_X^⊥ M: P_X (D_Z P) W = −Z X^T W − X (Z^T W + W^T Z)/2.

The Grassmann manifold. Gr_{m,n} is the set of m-dimensional subspaces of R^n; equivalently, it can be viewed as the set of rank-m orthogonal projectors in R^n, Gr_{m,n} = {X ∈ R^(n×n) : X^T = X, X² = X, tr X = m}. It is known that P_X = ad²_X, with ad_X A := [X, A] := XA − AX and ad²_X := ad_X ∘ ad_X. For all Z ∈ T_X Gr_{m,n} and all W ∈ T_X^⊥ Gr_{m,n}: P_X (D_Z P) W = −ad_X ad_W Z. One recovers herewith the Hessian formula given by Helmke et al. (arXiv:0709.2205).

The fixed-rank manifold. M_p(m × n) is the set of all m × n matrices of rank p. The projector onto T_X M_p(m × n) is
P_X W = P_U W P_V + P_U^⊥ W P_V + P_U W P_V^⊥ = W P_V + P_U W − P_U W P_V,
where P_U := U U^T and P_U^⊥ := I − P_U. For Z ∈ T_X M_p(m × n) and W ∈ T_X^⊥ M_p(m × n): P_X (D_Z P) W = W Z^T (X^+)^T + (X^+)^T Z^T W.
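A minimal NumPy sketch of the Stiefel formulas above (our own illustration, not the authors' code): the tangent projection, the Weingarten/curvature term, and the extrinsic Hessian formula Hess f · Z = P_X ∂²f̄(X)[Z] + P_X (D_Z P)(P_X^⊥ ∂f̄(X)).

```python
import numpy as np

def proj_stiefel(X, U):
    """Orthogonal projection onto T_X St(p, n): P_X U = U - X (X^T U + U^T X)/2."""
    S = X.T @ U
    return U - X @ (S + S.T) / 2

def weingarten_stiefel(X, Z, W):
    """Curvature term P_X (D_Z P) W = -Z X^T W - X (Z^T W + W^T Z)/2
    for Z tangent and W normal at X."""
    S = Z.T @ W
    return -Z @ (X.T @ W) - X @ (S + S.T) / 2

def riemannian_hessian_stiefel(X, Z, egrad, ehess_Z):
    """Extrinsic Hessian: P_X ehess_Z + P_X (D_Z P)(normal part of egrad),
    where egrad and ehess_Z are the Euclidean gradient and the Euclidean Hessian
    applied to Z of an extension f-bar of f."""
    normal_grad = egrad - proj_stiefel(X, egrad)
    return proj_stiefel(X, ehess_Z) + weingarten_stiefel(X, Z, normal_grad)
```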
Take-home message. There are several reasons to optimize a real-valued function on a manifold (Stiefel manifold, Grassmann manifold, fixed-rank manifold, ...). Newton's method is the archetypal second-order method. Newton's method on a submanifold: solve Hess f · η_k = −grad f(x_k), then set x_{k+1} := R(x_k, η_k). Recent results for the Hessian: Hess f · z = P_x ∂²f̄(x) z + P_x (D_z P) ∂f̄(x), with formulas for P_x and P_x D_z P on several specific manifolds (ref: PAA, Mahony & Trumpf, http://sites.uclouvain.be/absil/2013.01). Recent results for retractions: projection-like techniques to construct R (ref: PAA, Malick, http://sites.uclouvain.be/absil/2010.038). A freely available toolbox for optimization on manifolds: www.manopt.org.

Projection-like retractions on submanifolds (ref: PAA, Malick, http://sites.uclouvain.be/absil/2010.038).
- A retraction on M is a smooth mapping R: TM → M such that R(x, 0_x) = x and (d/dt) R(x, tu)|_{t=0} = u for all (x, u) ∈ TM.
- A retractor on a d-dimensional submanifold M of an n-dimensional Euclidean space E is a smooth mapping D: TM → Gr(n − d, E) such that, for all x ∈ M, D(x, 0) is transverse to T_x M.
- Consider the affine space x + u + D(x, u) and let R(x, u) be the point of M ∩ (x + u + D(x, u)) nearest to x + u.
- Theorem: R is a retraction on M.

Orthographic retraction on the fixed-rank manifold. Now M := M_p(m × n), the set of all m × n matrices of rank p. Let X = U [Σ_0 0; 0 0] V^T with Σ_0 ∈ R^(p×p) the diagonal matrix of non-zero singular values, and let Z = U [A C; B 0] V^T be in T_X M_p(m × n). The orthographic retraction R on M is given by
R(X, Z) = U [Σ_0 + A, C; B, B(Σ_0 + A)^(−1) C] V^T = U [Σ_0 + A; B] [I, (Σ_0 + A)^(−1) C] V^T.
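A NumPy sketch of this orthographic retraction formula (our illustration; it assumes Σ_0 + A is invertible and that the point X is given through a full SVD):

```python
import numpy as np

def orthographic_retraction(U, s, V, Z):
    """Orthographic retraction on the fixed-rank manifold M_p(m x n).
    The current point is X = U[:, :p] @ diag(s) @ V[:, :p].T, with U (m x m) and
    V (n x n) orthogonal factors of a full SVD and s the p non-zero singular values;
    Z is a tangent matrix at X. Returns R(X, Z) = U [[S0+A, C], [B, B(S0+A)^{-1}C]] V^T."""
    p = len(s)
    U1, U2 = U[:, :p], U[:, p:]
    V1, V2 = V[:, :p], V[:, p:]
    A = U1.T @ Z @ V1
    B = U2.T @ Z @ V1
    C = U1.T @ Z @ V2
    M = np.diag(s) + A
    top = np.hstack([M, C])
    bottom = np.hstack([B, B @ np.linalg.solve(M, C)])
    return U @ np.vstack([top, bottom]) @ V.T
```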

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

Discrete curve fitting on manifolds
Nicolas Boumal, joint work with Pierre-Antoine Absil, Université catholique de Louvain, August 2013.

Motivation: interpolation on SO(3), e.g. smoothing and interpolating a sequence of rotations.

The regression problem in R²: a balance between fitting and smoothness. Each data point p_i corresponds to a fixed time t_i; regression is about denoising and filling the gaps.

The regression problem in R² can be seen as an optimization problem: minimize, over some curve space Γ̂ whose dimension may be infinite,
Ê(γ) = Σ_{i=1}^N ‖p_i − γ(t_i)‖²  (penalty on misfit)
  + λ ∫_{t_1}^{t_N} ‖γ̇(t)‖² dt  (penalty on velocity)
  + μ ∫_{t_1}^{t_N} ‖γ̈(t)‖² dt  (penalty on acceleration),
where λ and μ (≥ 0) balance fitting versus smoothness.

We discretize the curves γ, hence reverting to finite-dimensional optimization: each point γ_i corresponds to a fixed time τ_i, and the curve space becomes Γ = R^n × ... × R^n ≡ R^(N_d × n).

We thus need a new objective E defined over the new curve space Γ: the misfit term becomes Σ_{i=1}^N ‖p_i − γ_{s_i}‖², the velocity penalty becomes λ Σ_i α_i ‖v_i‖², and the acceleration penalty becomes μ Σ_i β_i ‖a_i‖², where v_i and a_i are finite-difference velocities and accelerations and α_i, β_i are quadrature weights.

What if the data lies on a manifold? Manifolds are smoothly "curved" spaces; a simple toy example is the sphere S² in R³. More exciting manifolds discussed in this work: P_n^+ (positive-definite matrices) and SO(n).

The regression problem on S². We need a few concepts from Riemannian geometry to define discrete regression on S². Redefine E over Γ = S² × ... × S²: the misfit term Σ ‖p_i − γ_{s_i}‖² becomes Σ dist²(p_i, γ_{s_i}).

Finite differences are linear combinations, but S² is not a vector space. The linear combination a_i = (γ_{i+1} − 2γ_i + γ_{i−1}) / Δτ² can be rewritten as a_i = ((γ_{i+1} − γ_i) + (γ_{i−1} − γ_i)) / Δτ². Now we can interpret the terms: γ_{i+1} − γ_i is a vector rooted at γ_i and pointing toward γ_{i+1}.

Logarithms on manifolds generalize differences, and we use them to define geometric finite differences: Log_a(b) is a vector rooted at a, lying in the tangent space to S² at a and pointing toward b; furthermore ‖Log_a(b)‖ = dist(a, b).
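On the unit sphere the log map has a closed form; here is a small NumPy sketch (our own illustration, not part of the talk):

```python
import numpy as np

def sphere_dist(a, b):
    """Geodesic distance on S^2 between unit vectors a and b."""
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

def sphere_log(a, b):
    """Log map on S^2: the tangent vector at a pointing toward b,
    with norm equal to dist(a, b)."""
    w = b - np.dot(a, b) * a              # component of b orthogonal to a
    nw = np.linalg.norm(w)
    return np.zeros(3) if nw < 1e-12 else (sphere_dist(a, b) / nw) * w
```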
Replacing b − a by Log_a(b) gives the geometric finite differences
v_i = Log_{γ_i}(γ_{i+1}) / Δτ,  a_i = (Log_{γ_i}(γ_{i+1}) + Log_{γ_i}(γ_{i−1})) / Δτ².

We now have a proper objective for manifolds:
E(γ) = Σ_{i=1}^N dist²(p_i, γ_{s_i})  (penalty on misfit)
  + λ Σ_{i=1}^{N_d−1} α_i ‖Log_{γ_i}(γ_{i+1}) / Δτ‖²  (penalty on velocity)
  + μ Σ_{i=2}^{N_d−1} β_i ‖(Log_{γ_i}(γ_{i+1}) + Log_{γ_i}(γ_{i−1})) / Δτ²‖²  (penalty on acceleration),
minimized over Γ = S² × ... × S², a finite-dimensional manifold. The constraint γ ∈ Γ is tough for standard software.

To minimize E, we use Manopt, a Matlab toolbox for optimization on manifolds. Manopt is a user-friendly, documented package which gives access to (1) a large collection of manifold descriptions, (2) a number of solvers (including Riemannian trust-regions), and (3) helper tools to get things right. It is available at www.manopt.org.

In a nutshell: (1) we defined the discrete regression problem in R^n; (2) then generalized it to manifolds as an optimization problem on Γ; (3) and we feed it to Manopt.

Example of convergence on S² with geometric nonlinear CG and iterative refinement. [Convergence plots not reproduced.]

It works well, but the manifold has to be "gentle": you need to compute the logarithmic map and its derivatives. Second-order methods seem to help, but they require more work, and that may not always be possible. If your manifold is not nice enough, perhaps you can make do with an approximate log map.
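A minimal NumPy sketch of the discrete objective above, with unit weights α_i = β_i = 1 for simplicity and with the log map and distance passed in as functions (for example the sphere versions sketched earlier); this is our illustration, not the authors' Manopt implementation:

```python
import numpy as np

def discrete_regression_cost(gamma, data, s_idx, dtau, lam, mu, log_map, dist):
    """E(gamma) = sum_i dist^2(p_i, gamma_{s_i})
                 + lam * sum_i || log_map(gamma_i, gamma_{i+1}) / dtau ||^2
                 + mu  * sum_i || (log_map(gamma_i, gamma_{i+1})
                                   + log_map(gamma_i, gamma_{i-1})) / dtau^2 ||^2
    gamma : (Nd, dim) array, the discretized curve (one manifold point per row)
    data  : (N, dim) array of data points p_i
    s_idx : indices s_i matching each data point to a curve point."""
    misfit = sum(dist(p, gamma[s]) ** 2 for p, s in zip(data, s_idx))
    vel = sum(np.linalg.norm(log_map(gamma[i], gamma[i + 1]) / dtau) ** 2
              for i in range(len(gamma) - 1))
    acc = sum(np.linalg.norm((log_map(gamma[i], gamma[i + 1])
                              + log_map(gamma[i], gamma[i - 1])) / dtau ** 2) ** 2
              for i in range(1, len(gamma) - 1))
    return misfit + lam * vel + mu * acc
```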

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information
Roman V. Belavkin, School of Science and Technology, Middlesex University, London NW4 4BT, UK. August 29, 2013. This work was supported by EPSRC grant EP/H031936/1.

Main result: Shannon-Pythagorean theorem. A joint measure (state) w has marginals q and p, and w defines a transformation T: P → P taking q to T(q) = p. The three states w, q ⊗ q and q ⊗ p form a "triangle" whose sides are the information distances I(w, q ⊗ q), I(w, q ⊗ p) and I(p, q); the theorem below relates them by a Pythagorean identity.

Outline: duality of observables and states; quantum information distance; law of cosines and Shannon-Pythagorean theorem; discussion.

Duality: observables and states. Observables live in a space X and states in its dual Y, paired by ⟨·, ·⟩: X × Y → C, e.g. ⟨x, y⟩ := Σ x_i y_i, ⟨x, y⟩ := ∫ x dy, or ⟨x, y⟩ := tr{xy}.
- X is a *-algebra with 1 ∈ X and involution (x*z)* = z*x; the positive cone is X_+ := {x : x = z*z for some z ∈ X}; observables are the self-adjoint elements x = x*.
- Y is the dual of X, with involution ⟨x, y*⟩ = ⟨x*, y⟩*; the positive cone is Y_+ := {y : ⟨x, y⟩ ≥ 0 for all x ∈ X_+}; states are the elements y ≥ 0 with ⟨1, y⟩ = 1.
- The base of Y_+ is the set of all states (the statistical manifold): P(X) := {p ∈ Y_+ : ⟨1, p⟩ = 1}.
- Transposition: for every z ∈ X there is a transposed action on Y with ⟨zx, y⟩ = ⟨x, zᵀy⟩, and Y is a left (resp. right) module over X ⊆ Y with respect to this action.

Exponents and logarithms. Define by the power series e^x := Σ_{n≥0} x^n / n! and ln y := Σ_{n≥1} ((−1)^(n−1) / n)(y − 1)^n. For commuting elements (xz = zx, yz = zy) these are group homomorphisms: e^(x+z) = e^x e^z and ln(yz) = ln y + ln z. For the tensor product ⊗ and the Kronecker sum ⊕: e^(x⊕z) = e^x ⊗ e^z and ln(y ⊗ z) = ln y ⊕ ln z. Because X ⊆ Y, we can consider exp: X → Y and ln: Y → X.

Quantum information distance. Additivity axiom (Khinchin, 1957): I(p_1 ⊗ p_2, q_1 ⊗ q_2) = I(p_1, q_1) + I(p_2, q_2).
Let F: Y → R ∪ {∞} and F*: X → R ∪ {∞} be dual closed convex functions, F*(x) := sup{⟨x, y⟩ − F(y)}, F** = F. Their subdifferentials ∂F: Y → 2^X and ∂F*: X → 2^Y are inverses of each other (Moreau, 1967; Rockafellar, 1974): ∂F(y) ∋ x ⟺ y ∈ ∂F*(x). Example: F(y) = ⟨ln y − 1, y⟩ and F*(x) = ⟨1, e^x⟩, with ∇F(y) = ln y and ∇F*(x) = e^x.

Additive quantum information distance. Definition (I: Y × Y → R ∪ {∞}):
I(y, z) := ⟨ln y − ln z, y⟩ − ⟨1, y − z⟩,
with gradient ∇_y I(y, z) = ln y − ln z, Hessian ∇²_y I(y, z) = y^(−1), and dual I*(x, z) := ⟨1, e^x z⟩.
Non-commutativity: ln(e^(x+z)) = x + z only if xz = zx, so there are several candidates for the Radon-Nikodym derivative y/z: y/z := exp(ln y − ln z) (Araki, 1975; Umegaki, 1962), or y/z := y^(1/2) z^(−1) y^(1/2), resp. z^(−1/2) y z^(−1/2) (Belavkin & Staszewski, 1984). Correspondingly, I(y, z) := I**(y, z) = sup{⟨x, y⟩ − I*(x, z)}, with I*(x, z) := ⟨1, z^(1/2) e^x z^(1/2)⟩ or I*(x, z) := ⟨1, e^(x/2) z e^(x/2)⟩.
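For finite-dimensional density matrices the Araki-Umegaki form of this distance reduces to I(y, z) = tr(y(ln y − ln z)) − tr(y − z). A small NumPy/SciPy sketch (our own illustration), which for diagonal matrices reduces to the classical Kullback-Leibler divergence:

```python
import numpy as np
from scipy.linalg import logm

def quantum_divergence(y, z):
    """Araki-Umegaki information distance I(y, z) = tr(y (ln y - ln z)) - tr(y - z)
    for positive-definite Hermitian matrices y, z; the second term vanishes when
    both have unit trace (i.e. are states)."""
    return float(np.real(np.trace(y @ (logm(y) - logm(z)) - (y - z))))

# Sanity check: for commuting (diagonal) states this is the Kullback-Leibler divergence.
p = np.diag([0.2, 0.8])
q = np.diag([0.5, 0.5])
kl = float(np.sum(np.diag(p) * np.log(np.diag(p) / np.diag(q))))
print(abs(quantum_divergence(p, q) - kl))   # ~0 up to numerical error
```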
Law of cosines and Shannon-Pythagorean theorem.

Theorem (logarithmic law of cosines): for any w, y, z in Y with finite distances,
I(w, z) = I(w, y) + I(y, z) − ⟨ln y − ln z, y − w⟩.
Proof. The first-order Taylor expansion of I(·, z) at y gives I(w, z) = I(y, z) + ⟨∇I(y, z), w − y⟩ + R_1(y, w), where ∇I(y, z) = ln y − ln z and the remainder is R_1(y, w) = I(w, y).
Corollary (log-Pythagorean theorem): if ⟨ln y − ln z, y − w⟩ = 0, then I(w, z) = I(w, y) + I(y, z).

Theorem (inequality for information): I(y, z) ≥ ⟨1, (y − z)²⟩ / (2 max{‖y‖_∞, ‖z‖_∞}).
Proof. Recall that I(y, z) is the remainder R_1(z, y) in the Taylor expansion I(y, w) = I(z, w) + ⟨∇I(z, w), y − z⟩ + R_1(z, y), and
R_1(z, y) = ∫_0^1 (1 − t) ⟨1, ∇²I(z + t(y − z), w)(y − z)²⟩ dt = (1/2) ⟨1, ∇²I(ξ, w)(y − z)²⟩ for some ξ ∈ [z, y).
Corollary (Stratonovich, 1975): I(p, q) + I(q, p) ≥ ⟨1, (p − q)²⟩ for all p, q ∈ P(X).

Theorem (Shannon-Pythagorean): let w ∈ P(A ⊗ B), q ∈ P(A), p ∈ P(B), with A ⊆ B. If q = ⟨1, w⟩_B and p = ⟨1, w⟩_A are the marginals of w, then
I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q).
Proof. By the law of cosines,
I(w, q ⊗ q) = I(w, q ⊗ p) + I(q ⊗ p, q ⊗ q) − ⟨ln(q ⊗ p) − ln(q ⊗ q), q ⊗ p − w⟩.
By additivity, I(q ⊗ p, q ⊗ q) = I(p, q). The last term vanishes: ln(q ⊗ p) − ln(q ⊗ q) = 1_A ⊗ b with b := ln p − ln q, and ⟨b, p⟩ = ⟨1_A ⊗ b, w⟩ = ⟨1_A ⊗ b, q ⊗ p⟩ because p = ⟨1, w⟩_A = ⟨1, q ⊗ p⟩_A, so ⟨1_A ⊗ b, q ⊗ p − w⟩ = 0.

Discussion: applications to optimisation of dynamical systems. w ∈ P(A ⊗ B) defines a channel (Markov morphism or operation) T: P(A) ∋ q ↦ Tq = p ∈ P(B). Then I(p(t), q) = I(T^t q, q) is a divergence in t ∈ N_0, I(w, q ⊗ p) is the capacity of T, and I(w, q ⊗ q) is the hypotenuse of T.
Information-theoretic variational problems:
- Type I: maximize E_p{u} = ⟨u, p⟩ subject to I(p, q) ≤ λ.
- Type III: maximize E_w{v} = ⟨v, w⟩ subject to I(w, q ⊗ p) ≤ γ.
- Combining them (I + III = IV) gives the constraint I(w, q ⊗ q) ≤ γ + λ.

References:
- Araki, H. (1975). Relative entropy of states of von Neumann algebras. Publications of the Research Institute for Mathematical Sciences, 11(3), 809–833.
- Belavkin, V. P., & Staszewski, P. (1984). Relative entropy in C*-algebraic statistical mechanics. Reports in Mathematical Physics, 20, 373–384.
- Khinchin, A. I. (1957). Mathematical foundations of information theory. New York: Dover.
- Moreau, J.-J. (1967). Fonctionnelles convexes. Paris: Collège de France.
- Rockafellar, R. T. (1974). Conjugate duality and optimization (Vol. 16). PA: Society for Industrial and Applied Mathematics.
- Stratonovich, R. L. (1975). Information theory. Moscow, USSR: Sovetskoe Radio. (In Russian)
- Umegaki, H. (1962). Conditional expectation in an operator algebra. IV. Entropy and information. Kodai Mathematical Seminar Reports, 14(2), 59–85.
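For finite-dimensional density matrices, both the additivity axiom and the Shannon-Pythagorean identity above can be checked numerically. The sketch below is our own illustration (it reuses the Araki-Umegaki form of I from the previous sketch, here renamed I_div, and our own partial_trace helper) and checks both identities on random full-rank states.

```python
import numpy as np
from scipy.linalg import logm

def I_div(y, z):
    """Araki-Umegaki distance I(y, z) = tr(y (ln y - ln z)) - tr(y - z)."""
    return float(np.real(np.trace(y @ (logm(y) - logm(z)) - (y - z))))

def random_state(n, rng):
    """Random full-rank density matrix (Hermitian, positive-definite, unit trace)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = a @ a.conj().T + 1e-6 * np.eye(n)
    return rho / np.trace(rho)

def partial_trace(w, dA, dB, keep):
    """Partial trace of a (dA*dB x dA*dB) state; keep='A' traces out B and vice versa."""
    w4 = w.reshape(dA, dB, dA, dB)
    return np.trace(w4, axis1=1, axis2=3) if keep == 'A' else np.trace(w4, axis1=0, axis2=2)

rng = np.random.default_rng(0)

# Additivity: I(p1 ⊗ p2, q1 ⊗ q2) = I(p1, q1) + I(p2, q2)
p1, q1, p2, q2 = (random_state(2, rng) for _ in range(4))
print(abs(I_div(np.kron(p1, p2), np.kron(q1, q2)) - I_div(p1, q1) - I_div(p2, q2)))

# Shannon-Pythagorean identity: I(w, q ⊗ q) = I(w, q ⊗ p) + I(p, q),
# with q, p the marginals of a bipartite state w (equal local dimensions).
d = 2
w = random_state(d * d, rng)
q = partial_trace(w, d, d, keep='A')
p = partial_trace(w, d, d, keep='B')
print(abs(I_div(w, np.kron(q, q)) - I_div(w, np.kron(q, p)) - I_div(p, q)))
```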

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)

Some remarks on the intrinsic Cramér-Rao bound
Axel Barrau and Silvère Bonnabel (Mines ParisTech), GSI 2013, Paris, August 29th.

Introduction. Problem: estimate a covariance matrix Σ given a sample of X ∼ N(0, Σ) (sample covariance matrix estimation). S. T. Smith has proven (S. T. Smith, "Covariance, Subspace, and Intrinsic Cramér-Rao Bounds", IEEE Trans. Signal Process., vol. 53, no. 5, May 2005) that, for the natural distance d on the cone of covariance matrices, E(d²(Σ, Σ̂)) admits a lower bound which is constant, i.e. independent of Σ. This result can be seen as a consequence of information geometry, or as a consequence of the invariances of the problem.

Outline: 1. Cramér-Rao bound in classical estimation theory; 2. intrinsic Cramér-Rao bound; 3. invariant parametric families.

Cramér-Rao bound in classical estimation theory. For a parametric family of densities p(x|θ), θ ∈ R^n, the classical Fisher information matrix is I_{i,j}(θ) = E(∂/∂θ_i log p(x|θ) · ∂/∂θ_j log p(x|θ)), and the Cramér-Rao lower bound for unbiased estimators is Var(θ̂) ⪰ I(θ)^(−1).
Fisher metric: the Riemannian metric defined by the local scalar product dθ^T I(θ) dθ; the Fisher distance is the geodesic distance associated with the Fisher metric.
Examples: for a location parameter, p(x|θ) = f(x − θ), the Fisher distance is proportional to the Euclidean distance; for a scale parameter, p(x|θ) = (1/θ) f(x/θ), the Fisher distance is proportional to d(θ_1, θ_2) = ‖log(θ_1) − log(θ_2)‖.

Intrinsic Cramér-Rao bound. Normal coordinates: θ lives on a manifold M endowed with a Riemannian metric g_θ; an orthonormal basis X_1, ..., X_n of the tangent space defines a set of local coordinates through (a_1, ..., a_n) ↦ exp_θ(a_1 X_1 + ... + a_n X_n), in which g_θ becomes the Euclidean scalar product.
Basic statistical tools (X. Pennec, "Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements", Journal of Mathematical Imaging and Vision, 25:127–164, 2006): the exponential coordinates map M to its tangent space at θ; the bias of an estimator θ̂ is b(θ) = E(exp_θ^(−1)(θ̂)) and its covariance is C(θ) = Cov(exp_θ^(−1)(θ̂)).
Examples: estimation of a covariance matrix in statistics; subspace estimation in signal processing; pose estimation in robotics.
The intrinsic Fisher information matrix is defined using the local coordinates, I_{i,j}(θ) = E(∂/∂θ_i log p(x|θ) · ∂/∂θ_j log p(x|θ)), and the intrinsic Cramér-Rao lower bound without bias reads C(θ) ⪰ I(θ)^(−1) + curvature terms.
Intrinsic root mean square error: let d(·,·) denote the Riemannian distance on M.
Definition: ε²_θ := E(d(θ, θ̂)²) = E(‖exp_θ^(−1)(θ̂)‖²) = E(Tr(exp_θ^(−1)(θ̂) exp_θ^(−1)(θ̂)^T)) = Tr[C(θ)]. If d is the Fisher distance, then I(θ) = Id in normal coordinates; neglecting the curvature terms, C(θ) ⪰ I(θ)^(−1) = Id, hence ε²_θ = Tr(C(θ)) ≥ n.
Application: sample covariance matrix estimation, p(x|Σ) = N(0, Σ). The Fisher metric is the natural metric G_Σ(D, D) = Tr((D Σ^(−1))²), and, as proved by Smith, ε²_Σ ≥ n(n + 1)/2, which does not depend on Σ.

Invariant parametric families. Invariances: consider a parametric family p(x|θ) and assume there exist two actions of a group G: (g, x) ↦ φ_g(x) on the sample space X and (g, θ) ↦ ρ_g(θ) on M. Definition (invariance under the action of G): y = φ_g(x) has density function p_y(y | ρ_g(θ)) = p_x(x | θ).
Example: radioactive decay, with law p(t|θ) = (1/θ) exp(−t/θ). This law has to be insensitive to a change of units (for instance from minutes to seconds): θ ↦ Θ = 60 θ and t ↦ T = 60 t, and the density of T given Θ is again of the same form, p_T(T|Θ) = (1/Θ) exp(−T/Θ).

Properties of invariant families. Proposition: if p(x|θ) is invariant under the actions ρ_g and φ_g of a group G, then for any G-invariant metric the intrinsic Cramér-Rao bound takes the same value at θ and at ρ_g(θ): ∀(θ, g), ε²_{ρ_g(θ)} = ε²_θ. Corollary: if, moreover, ρ_g acts transitively on M, then the Cramér-Rao bound on the mean square error associated with any G-invariant metric on M is constant.

Examples:
- Wahba's problem: estimate a rotation R from noisy measurements Y_i = R^T b_i + W_i. For any right-invariant metric we have ε²_R = const.
- Sample covariance matrix estimation: p(x|Σ) = (2π)^(−n/2) (det Σ)^(−1/2) exp(−x^T Σ^(−1) x / 2). The family is invariant under the action of GL_n(R) given by ρ_A(Σ) = A Σ A^T; as the natural metric of the cone of covariance matrices has the same invariance, we have ε²_Σ = const.

Conclusions. The constant lower bound found by Smith has two interpretations: it is a general property of the Fisher metric, and it is a consequence of the invariances of the problem. Further result: an optimal estimator respects the invariances of the system.

Questions?
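The invariance argument can be illustrated numerically: under the natural (affine-invariant) metric, the intrinsic error of the sample covariance has the same distribution for every Σ. A small NumPy/SciPy sketch (our own illustration; function names are ours):

```python
import numpy as np
from scipy.linalg import sqrtm, logm, inv

def natural_distance(S1, S2):
    """Natural (affine-invariant) distance on the SPD cone:
    d(S1, S2) = || log(S1^{-1/2} S2 S1^{-1/2}) ||_F."""
    s = inv(sqrtm(S1))
    return np.linalg.norm(logm(s @ S2 @ s), 'fro')

def mean_square_intrinsic_error(Sigma, n_samples, n_trials, rng):
    """Monte-Carlo estimate of E[d^2(Sigma, Sigma_hat)] for the sample covariance
    of n_samples i.i.d. draws from N(0, Sigma)."""
    L = np.linalg.cholesky(Sigma)
    errs = []
    for _ in range(n_trials):
        X = rng.standard_normal((n_samples, Sigma.shape[0])) @ L.T   # rows ~ N(0, Sigma)
        errs.append(natural_distance(Sigma, X.T @ X / n_samples) ** 2)
    return np.mean(errs)

# The two estimates below agree up to Monte-Carlo noise although the two Sigma
# are very different, reflecting the invariance under Sigma -> A Sigma A^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
print(mean_square_intrinsic_error(np.eye(3), 200, 500, rng))
print(mean_square_intrinsic_error(A @ A.T + np.eye(3), 200, 500, rng))
```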

ORAL SESSION 10 Optimal Transport Theory (Gabriel Peyré)


A primal-dual approach for a total variation Wasserstein flow
Martin Benning¹, Luca Calatroni², Bertram Düring³, Carola-Bibiane Schönlieb⁴
¹ Magnetic Resonance Research Centre, University of Cambridge, UK; ² Cambridge Centre for Analysis, University of Cambridge, UK; ³ Department of Mathematics, University of Sussex, UK; ⁴ Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK.
Geometric Science of Information 2013, École des Mines, Paris, 28-30 August 2013.

Outline: 1. The problem; 2. Primal-dual formulation of the problem (a relaxed optimality system of PDEs, the numerical approach); 3. Numerical results (the 1-D case, the 2-D case with applications to denoising); 4. Conclusions.

A highly nonlinear fourth-order PDE
For a regular domain Ω ⊂ R^d, d = 1, 2, we consider

  u_t = ∇·(u ∇q),  q ∈ ∂|Du|(Ω),  in Ω × (0, T),   u(0, x) = u₀(x) ≥ 0 in Ω,

where ∫_Ω u₀ dx = 1 and the total variation of u over Ω is defined as

  |Du|(Ω) = sup { ∫_Ω u ∇·p dx : p ∈ C₀^∞(Ω; R^d), ‖p‖_∞ ≤ 1 }.

Subgradients of TV can be characterised such that q ∈ ∂|Du|(Ω) implies q = −∇·(∇u/|∇u|) wherever |∇u| ≠ 0, which makes the problem above a nonlinear fourth-order PDE with severe restrictions and constraints for its numerical solution.

An L²-Wasserstein flow for density smoothing
An equivalent problem was investigated by Burger, Franek, Schönlieb (2012). Therein, a smoothed version u of a given probability density u₀ was computed as a minimiser of

  ½ W₂(u₀L^d, uL^d)²  (L²-Wasserstein distance)  +  α E(u)  (smoothing term)

for different choices of E(u) (Dirichlet energy, log-entropy, Fisher information, total variation, ...); u₀ could, e.g., be a noisy MRI image or represent real-world data (earthquake or fire measurements). Previous work in imaging by means of the Wasserstein distance: S. Haker, L. Zhu and A. Tannenbaum (2004) for image registration; G. Peyré et al. (2013) for image colour transfer; X. Bresson, T. Chan et al. (2009) for image segmentation; L. P. S. Demers et al. (2010) for particle image velocimetry.

The L²-Wasserstein metric
Let (Ω, d) be a metric space. The L²-Wasserstein distance between two probability measures µ¹, µ² ∈ P₂(Ω) (the space of probability measures on Ω with integrable second moment) is defined by

  W₂(µ¹, µ²)² := min_{Π ∈ Γ(µ¹, µ²)} ∫_{Ω×Ω} d(x, y)² dΠ(x, y),

where Γ(µ¹, µ²) denotes the space of pairings γ ∈ P(Ω × Ω) such that µ¹ is the first marginal of γ and µ² is the second marginal of γ. The definition can be extended to p-Wasserstein distances.

Why TV-Wasserstein?
Compared to smoother regularisers, TV is capable of preserving discontinuities and structures when regularising densities (Rudin, Osher, Fatemi '92). Interest in image processing: discontinuities are the edges of the image, i.e. characteristic features in many imaging applications (bone density and brain images, ...). The combination of TV and the Wasserstein fidelity term gives mass conservation (u₀ an initial probability measure implies a regularised probability density u) and introduces a higher-order smoothing that reduces TV artifacts.

The minimisation problem and our PDE
The problem ½ W₂(u₀L^d, uL^d)² + α E(u) has to be interpreted as a time-discrete approximation of a solution of the gradient flow of E with respect to the L²-Wasserstein metric: it represents one timestep of De Giorgi's minimising movement scheme. Solving

  u^{k+1} := argmin_u ½ W₂(u^k L^d, uL^d)² + (t_{k+1} − t_k) E(u)

provides an iterative approach (JKO scheme) to approximately solve diffusion equations of the type u_t = ∇·(u ∇E'(u)); for E = TV this reads u_t ∈ ∇·(u ∇∂|Du|(Ω)), which is our PDE.

Previous results and our goal
In their work Burger, Franek, Schönlieb have shown: existence results (by standard techniques of the calculus of variations); self-similarity properties of the solutions; numerical results with augmented Lagrangian schemes solving the minimisation problem (for a fixed α, this means computing one timestep of the minimising movement scheme). We want to study the dynamics of the corresponding gradient flow (multiple timesteps), finding a numerical scheme providing its discrete approximation.

An alternative approach
The original equation we consider is u_t = ∇·(u ∇q), q ∈ ∂|Du|(Ω). Can we formulate the problem differently? By definition of the subdifferential,

  q ∈ ∂|Du|(Ω)  ⟺  |Du|(Ω) − ∫_Ω qu dx ≤ |Dv|(Ω) − ∫_Ω qv dx  for all v ∈ L²(Ω).

So, if u ∈ BV(Ω) ⊂ L²(Ω) is the solution of min_{u ∈ BV(Ω)} |Du|(Ω) − ∫_Ω qu dx, then q ∈ ∂|Du|(Ω). Writing the total variation in its dual form and replacing the constraint ‖p‖_∞ ≤ 1 by a penalty, this becomes

  min_{u ∈ BV(Ω)} sup_{p ∈ C₀^∞(Ω; R²)} { ∫_Ω u ∇·p dx − (1/ε) F(|p| − 1) − ∫_Ω qu dx }  ⇒  q ∈ ∂|Du|(Ω),

where the penalty term (1/ε)F(|p| − 1) with 0 < ε ≪ 1 measures the weight of the penalisation (Benning, Müller).

The relaxed problem
Merging the original equation with the optimality conditions with respect to u and p, we get the system

  u_t = ∇·(u ∇q),   q = ∇·p,   0 = −∇u − (1/ε) F'(|p| − 1),

and the nonlinearity is now encoded in the penalty term F'. A typical choice for F is F(|p| − 1) = ½ max{|p| − 1, 0}², with F'(|p| − 1) = 1_{{|p| ≥ 1}} sgn(p)(|p| − 1). We can then linearise F' via its first-order Taylor approximation.

A damped Newton method to solve the system
We discretise the differential operators and compute the numerical approximation of the solution u using the following scheme:

  (U^{(k)}_{n+1} − U_n)/Δt = ∇·(U_n ∇Q^{(k)}_{n+1}),
  Q^{(k)}_{n+1} = ∇·P^{(k)}_{n+1},
  0 = −∇U^{(k)}_{n+1} − (1/ε) F'(P^{(k−1)}_{n+1}) − (1/ε) F''(P^{(k−1)}_{n+1})(P^{(k)}_{n+1} − P^{(k−1)}_{n+1}) − τ_k (P^{(k)}_{n+1} − P^{(k−1)}_{n+1}).

Outer iterations (n subscripts) handle the time evolution; an inner process (k superscripts) produces approximations of U_{n+1}, Q_{n+1} and P_{n+1} via Newton's method; the damping sequence τ_k guarantees the invertibility of the operators defining the system: it starts from a large τ₀ and decreases to ensure quick convergence.

Numerical ingredients
1. Discretisation of the differential operators using forward differences (for ∇) and backward differences (for ∇·), thus preserving adjointness; 2. Neumann boundary conditions; 3. computational domains: closed and bounded (cartesian products of) intervals; 4. the matrix defining the linear system in each Newton step has block structure, so the operators are inverted numerically using the Schur complement; 5. stopping criterion for the inner Newton loop: ‖U^{(k)}_{n+1} − U^{(k−1)}_{n+1}‖₂ / ‖U^{(k)}_{n+1}‖₂ ≤ tol.

Some 1-D examples
We compare the TV-Wasserstein approach with the standard TV one. [Figure: solutions for the TV and TV-Wasserstein flows for (a) a Gaussian initial condition, (b) a characteristic-function initial condition, (c) a staircase initial condition; ε = 10⁻⁵, τ₀ = 1.] Features. Similar to TV: preservation of structure (i.e. discontinuities). Different from TV: a decrease of intensity goes together with an enlargement of the support (because of mass conservation); the background stays constant, unlike TV solutions, which converge to their mean.

2-D results
[Figure: initial condition, TV result, TV-Wasserstein result.] The intensity of the square decreases, but the intensity of the background stays constant (different from TV); as the intensity of the square decreases, the support enlarges due to the mass conservation property.

2-D results: applications to denoising
[Figures: original and noisy pyramid, TV-Wasserstein and TV results, showing reduced staircasing; noisy LEGO image, TV result, TV-Wasserstein result.] Applications in MRI: the images of interest are densities restored from undersampled measurements and/or corrupted by noise or blur.

Recap and future directions
Tackling directly the non-smoothness of the higher-order TV subgradients and relaxing via a penalty term leads to a system of nonlinear PDEs. The numerical solution is computed efficiently by a nested damped Newton method that computes the numerical approximation of the solution in each time iteration. The results preserve the mass-conservation property and perform well in density smoothing (e.g. denoising in imaging), reducing artifacts compared to lower-order models. Q1: Rigorous analysis of the scheme? Barrier term? Stability properties? Q2: From the analysis of the 1-D case, more insight into the theory underlying the TV-Wasserstein gradient flow (joint work with M. Burger, D. Matthes). Thanks for listening! e-mail: l.calatroni@maths.cam.ac.uk
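As a minimal numerical illustration of the Wasserstein fidelity term used above (a sketch, not the authors' primal-dual solver): in one dimension the L²-Wasserstein distance between two densities reduces to the L² distance between their quantile (inverse-CDF) functions, which is easy to evaluate on a grid.

```python
# 1-D W2 distance between discrete densities via quantile functions (illustrative only).
import numpy as np

def w2_1d(u, v, x):
    """W2 distance between two discrete densities u, v sampled on a uniform grid x."""
    dx = x[1] - x[0]
    u = u / (u.sum() * dx)                      # normalise to unit mass
    v = v / (v.sum() * dx)
    Fu = np.cumsum(u) * dx                      # cumulative distribution functions
    Fv = np.cumsum(v) * dx
    t = np.linspace(1e-6, 1 - 1e-6, 2000)       # probability levels
    qu = np.interp(t, Fu, x)                    # quantile functions F^{-1}(t)
    qv = np.interp(t, Fv, x)
    return np.sqrt(np.mean((qu - qv) ** 2))     # approximate integral over (0, 1)

x = np.linspace(-3.0, 3.0, 600)
gauss = np.exp(-x ** 2 / 0.1)                   # a narrow Gaussian bump
box = ((x > -0.5) & (x < 0.5)).astype(float)    # a characteristic-function density
print(w2_1d(gauss, box, x))
```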


Dual Methods for Optimal Transport
Quentin Mérigot, LJK / CNRS / Université de Grenoble.
Geometric Science of Information 2013, Paris, August 28, 2013.

Computational optimal transport
• For unit weights α_i, β_j = 1: Hungarian algorithm, linear programming.
• General α_i, β_j: Bertsekas' auction algorithm.
• Source and target with density: Benamou-Brenier '00; Loeper-Rapetti '05; Angenent-Haker-Tannenbaum '03; Benamou-Froese-Oberman '12.
• Source with density, finite target: Aurenhammer, Hoffmann, Aronov '98; Oliker-Prussner '89; Caffarelli-Kochengin-Oliker '04; Kitagawa '12.

Optimal transport: Monge's problem
µ = a probability measure on X with density, ν = a probability measure on a finite set Y. Monge problem: T_c(µ, ν) := min{ C_c(T) ; T#µ = ν }.
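For a sense of the linear-programming baseline listed on the first slide (a small illustrative sketch, not Mérigot's dual or semi-discrete method): a discrete Kantorovich problem between weighted point clouds can be solved directly as a linear program, which is only practical at small sizes.

```python
# A tiny discrete Kantorovich problem solved as a linear program with SciPy.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 5, 6
x = rng.random((m, 2))                   # source points with weights alpha
y = rng.random((n, 2))                   # target points with weights beta
alpha = np.full(m, 1.0 / m)
beta = np.full(n, 1.0 / n)
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost

# marginal constraints on the transport plan: row sums = alpha, column sums = beta
A_eq = []
for i in range(m):
    r = np.zeros((m, n)); r[i, :] = 1; A_eq.append(r.ravel())
for j in range(n):
    c = np.zeros((m, n)); c[:, j] = 1; A_eq.append(c.ravel())
b_eq = np.concatenate([alpha, beta])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
print("squared W2 cost:", res.fun)       # optimal transport cost
```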


The Tangent Earth Mover's Distance
Ofir Pele (Ariel University), Ben Taskar (University of Washington).

Motivation
Computer vision / machine learning problems built on distances: image retrieval, descriptor matching, k-nearest-neighbour classification, support vector machines, clustering, ...

Outline: motivation; the Earth Mover's Distance; the Tangent Distance; the Tangent Earth Mover's Distance; experimental results; future work.

The Earth Mover's Distance
• The Earth Mover's Distance for probability distributions (Rubner, Tomasi, Guibas, IJCV 2000).
• Pele and Werman 08: ÊMD, a new EMD definition. Differences between ÊMD and EMD: EMD is scale invariant, ÊMD is scale variant; EMD is a partial match, ÊMD not necessarily; if the ground distance is a metric, ÊMD is a metric.
• EMD shortcoming: it does not differentiate between a global transformation and local, non-structured ones. Our solution: the Tangent Earth Mover's Distance.

The Tangent Distance (Simard et al. 98)
• What we want: a distance that is invariant to small global transformations.
• Main idea: approximate the transforms of a pattern by its tangent plane at the pattern.
• Tangent distance shortcoming: it is not robust to small local deformations. Our solution: the Tangent Earth Mover's Distance.

The Tangent Earth Mover's Distance
• The tangent part: small global movement for free (for example, to the right).
• The EMD part: arbitrary movements that cost.
• Example of a tangent vector: for a feature histogram (0 0 0 0 0 1 1 2), a tangent vector (0 0 0 1 1 1 −1 −2) gives the transformed histogram (0 0 0 1 1 2 0 0).
• Efficient computation: previous works on the efficient computation of the EMD can also be used to accelerate the TEMD computation: Ling and Okada 2007 (Manhattan grids); Pele and Werman 2009 (thresholded ground distances); and combinations thereof.

Experiments
• 10 classes: people in Africa, beaches, outdoor buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.
• 5 queries from each class; the distance of each image to the query and to its reflection is computed and the minimum is taken.
• Image representations: SIFT; 32×48 L*a*b* image.

Why thresholded distances?
• EMD shortcomings: poor performance with outliers; long computation time. Our solutions: thresholded distances between bins; efficient algorithms.
• Robust distances: beyond some value, very high distances all indicate outliers and should make the same difference. Usually a negative exponent is used, because it is robust, smooth, monotonic and a metric (Ruzon and Tomasi 01), but the input is always discrete anyway, and the exponent changes small distances; colour distance should be thresholded (robust). [Figure: distance from blue, exponential vs. thresholded ground distance.]

FastEMD (Pele and Werman, ICCV 2009)
• For any thresholded ground distance, the number of edges with cost different from the threshold is small, so the flow network can be transformed: original network to simplified network; flow between exactly corresponding bins (a Monge sequence for metrics); empty bins and their edges are removed.
• Many successful applications of FastEMD: superpixel comparison, image retargeting, object-class segmentation, semantic scene / surface layout, image segmentation.

Results
[Figures: retrieval curves and normalized AUC using SIFT and using colour images, comparing the proposed variants (our '13, with c = 1, 2 and ground distances D1, D2/D20) against our ECCV 08 and ICCV 09 distances, QF/A2, L1, L2, EMD-L1 (Ling & Okada 07) and the tangent distance (Simard et al. 98).]

Near future work
• Faster algorithms for the Tangent Earth Mover's Distance: greedy algorithms for a quick hot start; a decomposition idea: break the problem into easy-to-solve "pieces" and then enforce consistency.
• Efficient and accurate segmentation of images using the Tangent Earth Mover's Distance.

Far future work
"Big Data" calls for complex non-linear models; complex distances are a perfect fit for this; new research is needed on how to learn efficiently with such distances (cascades, nearest neighbours, ...).

Papers & code are / will be at my website ("Ofir Pele"): http://www.seas.upenn.edu/~ofirpele/
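A minimal illustration of the EMD itself (a sketch, not the FastEMD or TEMD code): for one-dimensional histograms of equal total mass and ground distance |i − j|, the Earth Mover's Distance reduces to the L1 distance between cumulative sums; FastEMD instead solves a min-cost flow with a thresholded ground distance min(d, t), which this shortcut does not cover. The histograms below are the ones from the tangent-vector example above.

```python
# EMD between 1-D histograms of equal total mass, ground distance |i - j|.
import numpy as np

def emd_1d(p, q):
    """EMD = L1 distance between cumulative sums (equal-mass 1-D case only)."""
    assert np.isclose(p.sum(), q.sum())
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

p = np.array([0, 0, 0, 0, 0, 1, 1, 2], dtype=float)   # histogram from the slides
q = np.array([0, 0, 0, 1, 1, 2, 0, 0], dtype=float)   # the same mass shifted two bins
print(emd_1d(p, q))   # 4 units of mass moved by 2 bins: prints 8.0
```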

ORAL SESSION 11 Probability on Manifolds (Marc Arnaudon)


Group Action Induced Distances on Spaces of Linear Stochastic Processes
Bijan Afsari and René Vidal (JHU).

Motivation: classification and clustering of high-dimensional time-series data
• High-dimensional dynamic data arise in econometrics, video surveillance, biomedical applications, ...
• How to classify and cluster such data? A Linear Dynamical System (LDS) based approach: model time-series data as the output of LDSs and do statistics on the space of models, i.e., on a space of LDSs.

Example: classification and clustering of human actions
• Model with / learn LDSs with an m-dimensional input (e.g. standard white Gaussian noise), an n-dimensional state and a p-dimensional output (video or features); typically n, m ≪ p.
• Classification and clustering on LDS spaces: choose a distance; 1-nearest neighbour, nearest mean, k-means clustering, ...
[1] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91-109, 2003.
[2] P. Van Overschee and B. De Moor. Subspace algorithms for the stochastic identification problem. Automatica, 29(3):649-660, 1993.

Some LDS basics
• n: order of the LDS; (p, m): size or dimension of the LDS.
• State-space representation with realization R = (A, B, C); equivalent representations exist (e.g. vector ARMA or transfer-function representations), but the state-space representation is advantageous because fast learning / system identification algorithms are available, e.g. [1, 2].
• Important fact we will visit again: we distinguish between a realization R and the LDS M it realizes; an LDS has an equivalence class of realizations, all of which are indistinguishable from input-output relations.

Problem: pattern recognition for time-series data via statistical analysis on spaces of LDSs
• Consider p-dimensional time series and model them with an LDS of order n and size (p, m).
• Statistical analysis on spaces of LDSs: choose an appropriate space containing the models; geometrize it (e.g. define a distance, find shortest paths); develop tools for statistical analysis on it: probability distributions, averaging algorithms, PCA.
• We assume that all LDSs have the same order and size, motivated by implementational and theoretical considerations.
• This is not a new problem: the 1-D version, under different disguises, is an old problem, but not fully addressed or solved in full generality; for 1-D AR models a nice theory exists and is widely used, e.g. in speech processing [Amari, Math. Systems Theory 1987; Amari & Nagaoka, Methods of Information Geometry, AMS 2000; Gray & Markel, IEEE Trans. ASSP 1976; Barbaresco]. The high-dimensional version is more recent (e.g. activity recognition); some theoretical frameworks already exist, but they are not computationally friendly.

Target space and ambient spaces
• Target space: processes generated by LDSs of size (p, m) and order n. From processes to spectra: under Gaussianity, identify a process with its power spectral density (PSD) matrix, a p × p matrix. Parameterizing this space is difficult because the PSD's dependence on (A, B, C) is highly nonlinear.
• A panoramic view: any distance in an ambient space induces an extrinsic distance; ambient spaces may be infinite- or finite-dimensional. Most available distances (e.g. the Itakura-Saito divergence and its variants) live on infinite-dimensional ambient spaces; specializing them to smaller spaces is possible but practically difficult [Amari 1987; Ravishanker, Melnick & Tsai, J. Time Series Analysis 1990]. We bypass this difficulty by directly comparing realizations, so that the ambient space is small.
• Existing approaches in the control theory literature: our target space of fixed-order, fixed-size LDSs is an important space in control theory; its topology has been studied [Hanzon 1989; Hazewinkel 1977; Hazewinkel & Kalman 1976] and Riemannian distances have been proposed [Hanzon 1989; Krishnaprasad, PhD thesis, Harvard 1977], but they are computationally very demanding, especially in high dimensions.

How to go from spectra to realizations (A, B, C)?
• Internal and input symmetries: the group of n × n non-singular matrices (acting on the state basis) and the group of m × m orthogonal matrices (acting on the m-dimensional unit-variance Gaussian input) act on realizations; a realization and its transform generate the same output process.
• When is the converse true? Under a minimum-phase condition (a rank condition stated in terms of a complex variable, i.e. frequency) and certain extra rank conditions, that is, on some submanifolds of the realization space. For example: C is a tall, full-rank matrix; (A, B) is controllable; (A, C) is observable.

LDS space as the base space of a principal fiber bundle
• Under these rank conditions the action is free and proper, so the LDS space is a smooth quotient manifold (this follows from a theorem in differential geometry [Lee, Introduction to Smooth Manifolds, Springer 2002]); it locally looks like a product space, but not globally.
• The pair (realization space, LDS space) forms a principal fiber bundle with the above structure group.
• Road map: instead of comparing PSDs, we compare realizations while taking the group action into account.

The alignment distance: a group action induced distance
• Let a group-invariant distance on the realization space be given. Slide one realization along its fiber until it is aligned with the other, i.e. minimize the realization-space distance over the structure group. This defines a true distance on the LDS space [Younes, Shapes and Diffeomorphisms, Springer 2010].
• Difficulty: since the group of non-singular matrices is non-compact, constructing such an invariant distance is difficult. Computational advantage: once the group is compact, many (extrinsic) invariant distances are available.

Standardize then align the realizations
• Reduction of the structure group: a standardized (orthogonal) subbundle on which only the maximal compact subgroup acts; the non-compact part of the structure group can be thrown out safely. This is a basic fact from the differential geometry of fiber bundles [Kobayashi & Nomizu, Foundations of Differential Geometry I, Wiley 1963] (it is also consequential in quantum gauge theory), and the proof is essentially based on Gram-Schmidt orthogonalization.
• There is no canonical reduction; depending on the application we might prefer one over another. A similar notion of standardization is used in Kendall's shape analysis theory [Kendall, Barden, Carne & Le, Shape and Shape Theory, Wiley 1999].

An example: tall, full-rank LDSs
• Tall, full-rank LDSs of order n and size (p, m) appear in video analysis and in generalized dynamic factor models [Doretto et al. 2003; Deistler et al., European Journal of Control 2010].
• Standardize via an SVD of C: the standardized C belongs to a Stiefel manifold, and this form appears naturally as the output of a fast system identification algorithm [Doretto et al. 2003].
• The simplest alignment distance compares standardized realizations after an orthogonal change of state basis; a fast Jacobi-type algorithm is available to compute it [Jimenez et al., Fast Jacobi-type algorithm for computing distances between linear dynamical systems, ECC 2013]. For other LDS spaces, other standardizations are possible via methods known as balancing.

The alignment distance: pros and cons
• It is not an intrinsic distance on our target space, but it does not come from an infinite-dimensional ambient space, and in some instances it can naturally preserve the system order in averaging.
• Optimization on the orthogonal group is non-convex and local minimizers are possible; but even for intrinsic Riemannian distances, not every solution of the geodesic equation is length-minimizing. Instead of a (large) set of ODEs we solve a static optimization.

More examples of alignment distances
The basic definition can be extended in various ways, for example: dropping the B factor; norms or distances other than the Frobenius norm (e.g. the nuclear norm or the 1-norm); distances on other spaces (e.g. the Stiefel manifold for C); distances that are insensitive to scaling; suitable realization submanifolds.

Align-and-average algorithm
• Consider tall, full-rank LDSs and take realizations R₁, ..., R_N. Coordinate descent: alternate between aligning the realizations and computing the average. This decouples into alignment, Euclidean averaging and projection onto the Stiefel manifold: align each realization with the current average by finding an orthogonal matrix; average in the Euclidean sense and orthonormalize C; iterate.
• The iterates (LDSs) are almost surely tall, full-rank and minimum-phase; stability of the average LDS, however, is not guaranteed. [Afsari, Chaudhry, Ravichandran & Vidal, CVPR 2012; Deistler et al. 2010]
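A simplified sketch of the "standardize then align" idea described above (assumptions: the B factor is ignored and the aligning orthogonal matrix is taken from a Procrustes problem on the C factor alone; the actual alignment distance minimizes a weighted sum of A-, B- and C-terms over the orthogonal group, e.g. with the Jacobi-type algorithm of Jimenez et al.):

```python
# "Standardize then align" sketch for tall, full-rank LDS realizations (A, C).
import numpy as np

def standardize(A, C):
    """Orthonormalize the columns of C (reduced SVD) and change the state basis accordingly."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    T = np.diag(s) @ Vt                    # new state x' = T x, so C' = C T^{-1} = U
    return T @ A @ np.linalg.inv(T), U

def align_distance(A1, C1, A2, C2):
    """Align realization 2 to realization 1 with Q from Procrustes on C, then compare."""
    U, _, Vt = np.linalg.svd(C2.T @ C1)
    Q = U @ Vt                             # argmin over O(n) of ||C1 - C2 Q||_F
    dA = np.linalg.norm(A1 - Q.T @ A2 @ Q, 'fro') ** 2
    dC = np.linalg.norm(C1 - C2 @ Q, 'fro') ** 2
    return np.sqrt(dA + dC)

rng = np.random.default_rng(2)
n, p = 3, 10
A, C = 0.5 * rng.standard_normal((n, n)), rng.standard_normal((p, n))
A1, C1 = standardize(A, C)
P = np.linalg.qr(rng.standard_normal((n, n)))[0]   # an orthogonal change of state basis
A2, C2 = P @ A1 @ P.T, C1 @ P.T                    # an equivalent realization of the same LDS
print(align_distance(A1, C1, A2, C2))              # prints a value close to 0
```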


Integral Geometry of Linearly Combined Gaussian and Student t, and Skew t Random Fields Yann Gavet, Ola Ahmad and Jean-Charles Pinoli École Nationale Supérieure des Mines de Saint-Etienne, LGF 5307, France ahmad@emse.fr, gavet@emse.fr, pinoli@emse.fr GSI2013 - Geometric Science of Information, Paris August 28-30 2013 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work General stochastic problem: Y = Data + error e.g., Data is a matrix of unknown random variables of N dimensions represented on a group of voxels, (2D or 3D images). How can they be approximated or represented? Non-parametric methods "often numeric" have no reference of probability. Parametric methods: e.g.; Random fields theory priori knowledge of Y based on the measurements. few significant parameters & geometric information that control and interpret some physical problems. 4 / 32 General stochastic problem: Y = Data + error e.g., Data is a matrix of unknown random variables of N dimensions represented on a group of voxels, (2D or 3D images). How can they be approximated or represented? Non-parametric methods "often numeric" have no reference of probability. Parametric methods: e.g.; Random fields theory priori knowledge of Y based on the measurements. few significant parameters & geometric information that control and interpret some physical problems. 4 / 32 Application example: Total hip implant 5 / 32 Statistical analysis via stochastic modelling Real phenomenon: biology, physics, mechanics, ... Stochastic represen- tation of problem Experimental observa- tions & measurements Geometric features (MF or LKCs,...) cal- culated from the model Emprical features Parameters estimation. Validity testning of model. Decision & analysis of phenomena. 6 / 32 Why Gaussian random fields? Completely characterized by their first and second order moments, mean and covariance function. Smooth and twice-differentiable Why not Gaussian random field ? Real observations are often not Gaussian 7 / 32 Why Gaussian random fields? Completely characterized by their first and second order moments, mean and covariance function. Smooth and twice-differentiable Why not Gaussian random field ? Real observations are often not Gaussian 7 / 32 Example: Worn engineered surface Rough Skewed Heavy-tailed distributions Need to go beyond the Gaussian 8 / 32 Beyond Gaussian random fields Related Gaussian RFs F : Rk R, f(x) = F(g(x)) g1, ..., gk are i.i.d Gaussian RFs. Examples: χ2 , F, t RFs Mixed random fields : High flexibility f(x) = β1Z(x) + β2G(x), G, Z are independent random fields. 9 / 32 Beyond Gaussian random fields Related Gaussian RFs F : Rk R, f(x) = F(g(x)) g1, ..., gk are i.i.d Gaussian RFs. Examples: χ2 , F, t RFs Mixed random fields : High flexibility f(x) = β1Z(x) + β2G(x), G, Z are independent random fields. 
9 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 10 / 32 Random Fields A random field Y(x) : x S indexed by some space S, (e.g., S RN ), satisfies that any arbitrary p collection, (Y(x1), ..., Y(xp)) follows a multivariate probability density function with (p p) covariance matrix ΩY 11 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 12 / 32 Excursion sets Excursion sets A random set over a level h of Y: Eh = x S : Y(x) h Example: thresholding the surface at some height level 13 / 32 Excursion sets Excursion sets A random set over a level h of Y: Eh = x S : Y(x) h Example: thresholding the surface at some height level 13 / 32 Integral geometry Estimation of intrinsic volumes of Eh E[ k (Eh(Y, S))] = N−k j=0 j + k j j+k (S)ρj(h) 0(.) = χ(.) : Euler-Poincaré characteristic j(.) : j th dimensional volume ρj(.) : EC densities How to get j and ρj? 14 / 32 Integral geometry Estimation of intrinsic volumes of Eh E[ k (Eh(Y, S))] = N−k j=0 j + k j j+k (S)ρj(h) 0(.) = χ(.) : Euler-Poincaré characteristic j(.) : j th dimensional volume ρj(.) : EC densities How to get j and ρj? 14 / 32 Integral geometry Estimation of intrinsic volumes of Eh E[ k (Eh(Y, S))] = N−k j=0 j + k j j+k (S)ρj(h) 0(.) = χ(.) : Euler-Poincaré characteristic j(.) : j th dimensional volume ρj(.) : EC densities How to get j and ρj? 14 / 32 Integral geometry Estimation of intrinsic volumes of Eh E[ k (Eh(Y, S))] = N−k j=0 j + k j j+k (S)ρj(h) 0(.) = χ(.) : Euler-Poincaré characteristic j(.) : j th dimensional volume ρj(.) : EC densities How to get j and ρj? 14 / 32 Integral geometry Estimation of intrinsic volumes of Eh E[ k (Eh(Y, S))] = N−k j=0 j + k j j+k (S)ρj(h) 0(.) = χ(.) : Euler-Poincaré characteristic j(.) : j th dimensional volume ρj(.) : EC densities How to get j and ρj? 14 / 32 Integral geometry j(S) : N(S) = σ−N S det(Λ(x)) 1=2 dx N−1(S) = 1 2 σ−(N−1) @S det(Λ@S(x)) 1=2 N−1(dx) ρj(.) : Morse theory: ρj(h) = E ˙Y+ (j)det( ¨Y|j−1) ˙Y|j−1 = 0, Y = h p ˙Y|j−1 (0; h) 15 / 32 Integral geometry j(S) : N(S) = σ−N S det(Λ(x)) 1=2 dx N−1(S) = 1 2 σ−(N−1) @S det(Λ@S(x)) 1=2 N−1(dx) ρj(.) : Morse theory: ρj(h) = E ˙Y+ (j)det( ¨Y|j−1) ˙Y|j−1 = 0, Y = h p ˙Y|j−1 (0; h) 15 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 16 / 32 Gaussian t random field Definition Y(x) = G(x) + βT (x), β > 0, x S G: is a stationary Gaussian random field. T : is a homogeneous student’s t random field with ν degrees of freedom. Linear transformed pdf at each fixed point x of both normal and t pdfs: pY (y) = Γ +1 2 2πβΓ 2 2 1=2 ∞ −∞ 1 + (y u)2 β2ν −ν+1 2 e−u2 2 du 17 / 32 EC densities of Gaussian t random field Theorem [Ahmad and Pinoli(2013a)] The EC densities, ρj(.) 
of a two-dimensional real-valued Gaussian t random field with ν 2 degrees of freedom, and β > 0 are given, at level h, by: where ΛG = λGI2, and Λ = λI2 is the second spectral moments matrix of G, and Λ = λI2 is associated with T 18 / 32 Simulation example Simulated and analytical Minkowski functionals for the Gaussian−t random field of 5 degrees of freedom and β = 0.2. 19 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 20 / 32 Application to surface characterization Machined surface observed from Polyethylene material [Ahmad and Pinoli(2012)]: Fitting the empirical and analytical intrinsic volumes of the real surface and the Gaussian−t random field of 5 degrees of freedom and β = 1.2 21 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 22 / 32 Skew t Random field Definition G0(x), G1(x), ..., Gk (x), (x S), i.i.d stationary centred Gaussian random fields, with (N N) spectral moment matrix Λ. Z Normal(0, 1) is independent of G0, G1, ..., Gk : Y(x) = δ Z + 1 δ2G0(x) k i=1 G2 i (x)/k 1=2 , , δ2 < 1 (1) defines a skew t RF with k degrees of freedom, and skewness index α = δ/ 1 δ2 23 / 32 Example: two-dimensional skew t RFs 24 / 32 EC densities Theorem [Ahmad and Pinoli(2013c)] The EC densities, ρj(.) of a two-dimensional real-valued skew t random field with k degrees of freedom and skewness parameter α R, are given by: 25 / 32 Simulation example Simulated and analytical Minkowski functionals for the skew−t random field of 5 degrees of freedom and skewness index α = 0.7. 26 / 32 Outline Introduction Background & Motivation Preliminaries Random Fields Integral geometry of random fields Linear mixtures random fields Gaussian t random field Application Skew random fields Skew t random field Application Conclusions and Future work 27 / 32 Application to worn engineered surfaces Worn engineered surface observed from Polyethylene material [Ahmad and Pinoli(2013b)]: Fitting the empirical and analytical intrinsic volumes of the real surface and the skew−t random field of 6 degrees of freedom and δ = 0.5 28 / 32 Application to worn engineered surfaces Worn engineered surface observed from Polyethylene material [Ahmad and Pinoli(2013b)]: 28 / 32 Conclusion Random fields are computationally feasible, voxel-based, probabilistic models that can be used to approximate and represent some physical problems. Integral geometry provides interesting geometric information of the excursion sets, called intrinsic volumes. These geometric characteristics can be calculated analytically to fit the real measurements with some probabilistic model, and to estimate its parameters. Skew t random field is an appropriate model for statistical representation of worn engineered surfaces. 29 / 32 Future Work Using skew t random field for statistical analysis of surface roughness evolution. Space-scale random fields for multi-scale characterization. Space-time random fields for prediction of future behaviour, and for estimation of roughness evolution of rough engineered surfaces. Opened question: Intrinsic volumes of probabilistic models of non explicit or closed analytical form. 
References
Ola Ahmad and Jean-Charles Pinoli. On the linear combination of the Gaussian and Student's t random field and the integral geometry of its excursion sets. Statistics & Probability Letters, 83(2):559-567, 2013a. ISSN 0167-7152. doi: 10.1016/j.spl.2012.10.022.
Ola Suleiman Ahmad and Jean-Charles Pinoli. On the linear combination of the Gaussian and Student-t random fields and the geometry of its excursion sets. In Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering and Computer Science 2012, WCECS 2012, 24-26 October, San Francisco, USA, pages 1-5, 2012.
Ola Suleiman Ahmad and Jean-Charles Pinoli. Lipschitz-Killing curvatures of the excursion sets of skew Student-t random fields. In 2nd Annual International Conference on Computational Mathematics, Computational Geometry & Statistics, volume 1, Feb 2013b. doi: 10.5176/2251-1911_CMCGS13.05.
Ola Suleiman Ahmad and Jean-Charles Pinoli. Lipschitz-Killing curvatures of the excursion sets of skew Student's t random fields. Stochastic Models, 29(2):273-289, 2013c. ISSN 1532-6349. doi: 10.1080/15326349.2013.783290.

Thank You for Your Attention
ahmad@emse.fr, gavet@emse.fr, pinoli@emse.fr

Creative Commons Aucune (Tous droits réservés) Aucune (Tous droits réservés)
Voir la vidéo

Nonlinear Modeling and Processing Using Empirical Intrinsic Geometry with Application to Biomedical Imaging
Ronen Talmon¹, Yoel Shkolnisky², and Ronald Coifman¹
¹ Mathematics Department, Yale University
² Applied Mathematics Department, Tel Aviv University
Geometric Science of Information (GSI 2013), August 28-30, 2013, Paris
Introduction
•  Example for Intrinsic Modeling I: Molecular Dynamics
   –  Consider a molecule oscillating stochastically in water, for example Alanine Dipeptide.
   –  Due to the coherent structure of molecular motion, we assume that the configuration at any given time is essentially described by a small number of structural variables.
   –  In the Alanine case, we will discover two factors, corresponding to the dihedral angles.
   [Figure: Alanine Dipeptide molecule with atoms numbered 1-10]
•  Example for Intrinsic Modeling I (continued)
   –  We observe three atoms of the molecule for a certain period, three other atoms for a second period, and the rest in the last period.
   –  The task is to describe the positions of all atoms at all times; more precisely, to derive intrinsic variables that correspond to the dihedral angles and to describe their relation to the positions of all atoms.
   –  We always derive the same intrinsic variables (angles) from partial observations, independently of the specific atoms we observe.
   –  If we learn the model, we can describe the positions of all atoms.
   [Figure: Alanine Dipeptide molecule with atoms numbered 1-10]
•  Example for Intrinsic Modeling II: Predicting Epileptic Seizures
   –  Goal: to warn the patient prior to the seizure (when medication or surgery are not viable).
   –  Data: intracranial EEG recordings.
   [Figure: icEEG recording (samples) and its time-frequency spectrogram (frequency in Hz vs. time frames)]
•  Example for Intrinsic Modeling II (continued)
   –  Our assumption: the measurements are controlled by underlying processes that represent the brain activity.
   –  Main idea: predict seizures based on the "brain activity processes".
   –  Challenges: noisy data, unknown model, and no available examples.
   [Figure: icEEG recording and its time-frequency spectrogram]
•  Manifold Learning
   –  Represent the data as points in a high dimensional space.
   –  The points lie on a low dimensional structure (manifold) that is governed by latent factors.
   –  For example, atom trajectories and the dihedral angles.
•  Manifold Learning
   –  Traditional manifold learning techniques:
      –  Laplacian eigenmaps [Belkin & Niyogi, 03']
      –  Diffusion maps [Coifman & Lafon, 05'; Singer & Coifman, 08']
   [Diagram: Manifold Learning → Parameterization of the manifold]
Empirical Intrinsic Geometry
•  Formulation – "State" Space
   –  Dynamical model: let θ_t be a d-dimensional underlying process (the state), in time index t, that evolves according to
          dθ_t^i = a^i(θ_t) dt + dw_t^i,  i = 1, ..., d,
      where the a^i are unknown drift coefficients and the w_t^i are independent white noises.
   –  Measurement modality: let z_t be the measured signal, given by z_t = g(y_t, v_t), where
      –  y_t is the clean observation component, drawn from the time-varying pdf f(y; θ_t),
      –  v_t is a corrupting noise (independent of y_t),
      –  g is an arbitrary measurement function.
   –  The goal: recover and track θ_t given z_t.
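A small, self-contained Python sketch (not from the talk) of this state-space formulation, with an arbitrary drift a(θ), an arbitrary θ-dependent observation density f(y; θ), and an arbitrary nonlinear measurement function g; every modelling choice below is an assumption made for illustration.

    # Illustrative toy instance of the state-space model: theta_t (2-D state) -> y_t -> z_t.
    import numpy as np

    rng = np.random.default_rng(0)
    T, d, dt = 5000, 2, 1e-2

    theta = np.zeros((T, d))
    for t in range(1, T):
        drift = -theta[t - 1]                                  # a(theta): pull towards the origin
        theta[t] = theta[t - 1] + drift * dt + np.sqrt(dt) * rng.standard_normal(d)

    # Clean observation y_t drawn from a theta-dependent pdf f(y; theta_t):
    # here y_t ~ N(theta_t[0], exp(theta_t[1])), an arbitrary choice.
    y = theta[:, 0] + np.exp(theta[:, 1] / 2) * rng.standard_normal(T)

    # Measurement z_t = g(y_t, v_t): nonlinear function of y corrupted by independent noise v.
    v = 0.3 * rng.standard_normal(T)
    z = np.tanh(y) + v

    # The task is then to recover/track theta_t given only the measurements z_t.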
•  Manifold Learning for Time Series
   –  The general outline:
      –  Construct an affinity matrix (kernel) between the measurements z_t, e.g.,
             k(z_t, z_s) = exp{ −‖z_t − z_s‖² / ε }
      –  Normalize the kernel to obtain a Laplace operator [Chung, 97'].
      –  The spectral decomposition (eigenvectors) represents the underlying factors:
             φ_i ∈ R^N  ↔  θ^i ∈ R^N
   [Diagram: Measurement Modality → Manifold Learning]
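A minimal numpy sketch (not from the talk) of this standard construction: Gaussian affinities between measurements, row-normalization into a Markov (diffusion) operator, and a spectral decomposition whose leading nontrivial eigenvectors parameterize the underlying factors. The bandwidth ε, the number of coordinates, and the function name diffusion_map are choices made for this example.

    import numpy as np

    def diffusion_map(Z, epsilon, n_coords=2):
        # Z: (T, m) array of measurements z_t. Returns leading nontrivial eigenvectors of P.
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
        K = np.exp(-sq / epsilon)                              # k(z_t, z_s) = exp(-||z_t - z_s||^2 / eps)
        P = K / K.sum(axis=1, keepdims=True)                   # Markov normalization (Laplace-type operator: I - P)
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)
        return vecs[:, order[1:n_coords + 1]].real             # skip the trivial constant eigenvector

    # Example usage with the toy measurements z from the earlier sketch (2-lag delay embedding):
    # coords = diffusion_map(np.column_stack([z[:-1], z[1:]]), epsilon=0.5)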
•  Intrinsic Modeling
   –  The mapping between the observable data and the underlying processes is often stochastic and contains measurement noise:
      –  repeated observations of the same phenomenon usually yield different measurement realizations;
      –  the measurements may be performed using different instruments/sensors.
   –  Each set of related measurements of the same phenomenon will have a different geometric structure,
      –  depending on the instrument and the specific realization,
      –  which poses a problem for standard manifold learning methods.
•  Intrinsic Modeling
   [Diagram: Intrinsic Embedding; Observable Domain I; Observable Domain II; Partial Observation I-A; Partial Observation I-B]
•  How to Obtain an Intrinsic Model?
   –  Q: Does the Euclidean distance between the measurements, e.g. in the kernel
             k(z_t, z_s) = exp{ −‖z_t − z_s‖² / ε },
      convey the information? (Realizations of a random process and measurement noise.)
   –  A: We propose a new paradigm – Empirical Intrinsic Geometry (EIG) [Talmon & Coifman, PNAS, 13']:
      –  find a proper high dimensional representation;
      –  find an intrinsic distance measure that is robust to measurement noise and modality.
•  Geometric Interpretation
   –  Exploit perturbations to explore and learn the tangent plane.
   –  Compare the points based on the principal directions of the tangent planes ("local PCA").
   [Diagram: Underlying Process; Measurement 1; Measurement 2]
•  The Mahalanobis Distance
   –  We view the local histograms as feature vectors for each measurement: z_t → h_t.
   –  For each feature vector, we compute the local covariance matrix in a temporal neighborhood of length L,
          C_t = (1/L) \sum_{s=t-L+1}^{t} (h_s − μ_t)(h_s − μ_t)^T,
      where μ_t is the local mean.
   –  Define a symmetric C-dependent distance between feature vectors.
   Definition (Mahalanobis distance):
          d_C²(z_t, z_s) = ½ (h_t − h_s)^T (C_t^{−1} + C_s^{−1}) (h_t − h_s)
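A short Python sketch (not from the talk) of the construction just defined: local-histogram feature vectors h_t, local covariances C_t over a temporal window of length L, and the modified Mahalanobis distance d_C². The window lengths, histogram support, bin count, and the small ridge term are assumptions made for illustration.

    import numpy as np

    def histogram_features(z, window=50, bins=16, support=(-3.0, 3.0)):
        # h_t: normalized histogram of the measurements in a short window ending at t.
        edges = np.linspace(support[0], support[1], bins + 1)
        return np.array([np.histogram(z[max(0, t - window):t + 1], bins=edges, density=True)[0]
                         for t in range(len(z))])

    def local_covariances(H, L=100):
        # C_t: covariance of the feature vectors in a temporal neighborhood of length L.
        T, m = H.shape
        C = np.empty((T, m, m))
        for t in range(T):
            W = H[max(0, t - L + 1):t + 1]
            C[t] = np.eye(m) if W.shape[0] < 2 else np.cov(W.T) + 1e-6 * np.eye(m)
        return C

    def mahalanobis_sq(H, C, t, s):
        # d_C^2(z_t, z_s) = 0.5 * (h_t - h_s)^T (C_t^{-1} + C_s^{-1}) (h_t - h_s)
        d = H[t] - H[s]
        return 0.5 * d @ (np.linalg.inv(C[t]) + np.linalg.inv(C[s])) @ d

    # Usage with the toy measurements z from the earlier sketch:
    # H = histogram_features(z); C = local_covariances(H); d2 = mahalanobis_sq(H, C, 500, 2500)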
•  Results
   Assumption: the histograms are linear transformations of the pdf
          p(z; θ) = \int_{g(y,v)=z} f(y; θ) q(v) dy dv.
   –  Each histogram bin can be expressed as
          h_t^j = \int_{z ∈ H_j} p(z; θ) dz,
      where the H_j are the histogram bins.
   –  Relying on the independence of the processes:
   Lemma: in the histogram domain, any stationary noise is a linear transformation.
•  Results
   Assumption:
   –  The process h_t can be described as a (possibly nonlinear) bi-Lipschitz function of the underlying process.
   –  We rely on a first order approximation of the measurement function,
          h_t = J_t^T θ_t + ε_t,
      where J_t is the Jacobian, defined as J_t^{ji} = ∂h^j / ∂θ^i.
   Theorem [Talmon & Coifman, PNAS, 13']: the Mahalanobis distance
   –  is invariant under linear transformations, and thus, by the Lemma, noise resilient;
   –  approximates the Euclidean distance between samples of the underlying process, i.e.,
          ‖θ_t − θ_s‖² = d_C²(z_t, z_s) + O(‖h_t − h_s‖⁴).
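The invariance claim can be checked numerically: if an invertible linear map A is applied to the feature vectors, the local covariances become A C Aᵀ and d_C² is unchanged. A minimal sketch (not from the talk), with arbitrary dimensions and random data:

    import numpy as np

    rng = np.random.default_rng(0)
    m = 8                                                      # feature dimension (arbitrary)
    h_t, h_s = rng.standard_normal(m), rng.standard_normal(m)
    C_t = np.cov(rng.standard_normal((m, 50)))                 # arbitrary positive-definite covariances
    C_s = np.cov(rng.standard_normal((m, 50)))

    def d2(ht, hs, Ct, Cs):
        d = ht - hs
        return 0.5 * d @ (np.linalg.inv(Ct) + np.linalg.inv(Cs)) @ d

    A = rng.standard_normal((m, m))                            # generic invertible linear transformation
    same = np.isclose(d2(h_t, h_s, C_t, C_s),
                      d2(A @ h_t, A @ h_s, A @ C_t @ A.T, A @ C_s @ A.T))
    print(same)                                                # True (up to numerical precision)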
•  Relationship to Information Geometry
   Q: Does the structure of the measurements convey the information?
   A: The local densities of the measurements do, and not particular realizations.
   –  Information Geometry [Amari & Nagaoka, 00']: use the Kullback-Leibler divergence, approximated by the Fisher metric
          D(p(z_t; θ) ‖ p(z_{t'}; θ)) = Δθ_t^T I_t Δθ_t,
      where I_t is the Fisher information matrix.
   –  EIG uses a similar, data-driven metric: consider the features
          l_t^j = α_j log(h_t^j).
   Theorem:
   –  I_t = J_t^T J_t   (of the underlying manifold dimensionality)
   –  C_t = J_t J_t^T   (of the feature vector dimensionality)
•  Anisotropic Kernel
   –  Let {z_t} be a set of measurements; for each measurement, we compute the local histogram and covariance.
   –  Construct a symmetric affinity matrix that
      –  approximates the Euclidean distances between samples of the underlying process, and
      –  is invariant to the measurement modality and resilient to noise.
   –  The corresponding Laplace operator can recover the underlying process.
   –  Compute